# Quick start: Using Jack models

## Prerequisites

**Note:** Run these commands in a terminal from the root of the Jack repository.

Download GloVe [[1]](#ref1) vectors:
> `sh data/GloVe/download.sh`

Download the pretrained FastQA [[2]](#ref2) and DAM [[3]](#ref3) models:
> `wget -O fastqa.zip https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1`

> `wget -O dam.zip http://data.neuralnoise.com/jack/natural_language_inference/dam.zip`

Unpack the models for use:
> `unzip fastqa.zip`

> `unzip dam.zip`
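
Before loading the readers, it can help to confirm that both archives were unpacked. The following is a minimal sketch, assuming the archives extract to `fastqa/` and `dam/` directories in the Jack root (the paths the readers are loaded from later in this guide); `missing_paths` is a hypothetical helper, not part of Jack.

```python
import os


def missing_paths(root, required=("fastqa", "dam")):
    """Return the names in `required` that are not directories under `root`."""
    return [name for name in required
            if not os.path.isdir(os.path.join(root, name))]


# Check the current working directory (assumed to be the Jack root).
missing = missing_paths(".")
if missing:
    print("Missing model directories: " + ", ".join(missing))
else:
    print("All model directories found.")
```

If a directory is reported missing, re-run the corresponding `wget` and `unzip` commands above.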

First, let's get the imports sorted:

In [1]:
%load_ext autoreload
%autoreload 2
import os
os.chdir('..')    # change dir to Jack root

In [2]:
from jack import readers
from jack.core import QASetting
from jack.io.load import load_jack
from notebooks.prettyprint import QAPrettyPrint, print_nli

## Use case: Question Answering (QA)

Load the (previously downloaded) pretrained FastQA [[2]](#ref2) model:

In [3]:
fastqa_reader = readers.reader_from_file("./fastqa")

INFO:tensorflow:Restoring parameters from ./fastqa/model_module


Let's define a reading comprehension _paragraph_ and a _question_ from the SQuAD [[4]](#ref4) corpus:

In [4]:
paragraph = """It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. 
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary."""

question = "To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?"

We merge them into a single `QASetting` data structure. This structure requires a _question_ and a _list of supporting documents_:

In [5]:
qa_setting = QASetting(question=question, support=[paragraph])
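
To ask several questions about the same paragraph, one setting per question is needed. The sketch below shows one way to build such a list. `SettingSketch` is a stand-in used only so the example runs without Jack, `build_settings` is a hypothetical helper, and the second question is made up for illustration. With Jack installed, passing `QASetting` as `factory` should work in the same way, since only the `question` and `support` keyword arguments shown above are used.

```python
from collections import namedtuple

# Stand-in for illustration only; it is NOT Jack's QASetting class.
SettingSketch = namedtuple("SettingSketch", ["question", "support"])


def build_settings(questions, paragraph, factory=SettingSketch):
    """Create one setting per question, each supported by the same paragraph."""
    return [factory(question=q, support=[paragraph]) for q in questions]


settings = build_settings(
    [
        "To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
        "What is at the end of the main drive?",  # hypothetical second question
    ],
    "It is a replica of the grotto at Lourdes, France.",
)
print(len(settings), "settings built")
```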

Pass `qa_setting` (the paragraph and the question) to the reader to get the _answers_. The reader takes a list of settings:

In [6]:
answers = fastqa_reader([qa_setting])

The reader returns one list of answers per input setting, so the best answer to our question is `answers[0][0]`:

In [7]:
answers[0][0].text

'Saint Bernadette Soubirous'

...together with the answer span, which we use to highlight the answer in the text:

In [8]:
QAPrettyPrint(paragraph, answers[0][0].span)

...and the score of the answer:

In [9]:
answers[0][0].score

0.9918955
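
The span-based highlighting shown earlier can be reproduced by hand. This is a minimal sketch, assuming the span is a `(start, end)` pair of character offsets into the paragraph with an exclusive end; that layout is an assumption rather than something confirmed here, and `highlight` is a hypothetical helper, not the implementation of `QAPrettyPrint`.

```python
def highlight(text, span, marker="**"):
    """Wrap the characters text[start:end] in `marker` on both sides."""
    start, end = span
    if not 0 <= start <= end <= len(text):
        raise ValueError("span %r is out of range for the text" % (span,))
    return text[:start] + marker + text[start:end] + marker + text[end:]


sentence = "the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858."
answer = "Saint Bernadette Soubirous"
# Locate the answer by string search here; a reader would supply the span directly.
start = sentence.index(answer)
print(highlight(sentence, (start, start + len(answer))))
```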

We can also predict the top _k_ answers of our model instead of just the best scoring one:

In [10]:
top_k = 10
fastqa_reader.model_module.set_topk(top_k)
answers = fastqa_reader([qa_setting])
for i, a in enumerate(answers[0]):
    print("Answer %d:   %s \t (score: %.5f)" % (i, a.text, a.score))

Answer 0:   Saint Bernadette Soubirous 	 (score: 0.99190)
Answer 1:   Saint Bernadette Soubirous 	 (score: 0.99190)
Answer 2:   Bernadette Soubirous 	 (score: 0.00798)
Answer 3:   Saint Bernadette 	 (score: 0.00006)
Answer 4:   Soubirous 	 (score: 0.00003)
Answer 5:   Saint Bernadette Soubirous in 	 (score: 0.00002)
Answer 6:   Saint Bernadette Soubirous in 1858 	 (score: 0.00000)
Answer 7:   to Saint Bernadette Soubirous 	 (score: 0.00000)
Answer 8:   Saint Bernadette Soubirous in 1858. 	 (score: 0.00000)
Answer 9:   Saint 	 (score: 0.00000)
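
The top-_k_ list above contains the same answer text twice (answers 0 and 1). If only distinct answer strings matter, the list can be filtered after prediction. The sketch below keeps the first, highest-ranked occurrence of each text. `AnswerSketch` is a stand-in so the example runs without Jack, and `unique_answers` is a hypothetical helper that relies only on the `text` attribute used above.

```python
from collections import namedtuple

# Stand-in for illustration only; it is NOT Jack's answer class.
AnswerSketch = namedtuple("AnswerSketch", ["text", "score"])


def unique_answers(answers):
    """Keep the first occurrence of each answer text, preserving rank order."""
    seen = set()
    unique = []
    for a in answers:
        if a.text not in seen:
            seen.add(a.text)
            unique.append(a)
    return unique


ranked = [
    AnswerSketch("Saint Bernadette Soubirous", 0.99190),
    AnswerSketch("Saint Bernadette Soubirous", 0.99190),
    AnswerSketch("Bernadette Soubirous", 0.00798),
]
for i, a in enumerate(unique_answers(ranked)):
    print("Answer %d:   %s \t (score: %.5f)" % (i, a.text, a.score))
```

Note that identical texts may come from different positions in the paragraph, so filtering by text discards that distinction.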


## Use case: Natural Language Inference (NLI)

We first load a pretrained DAM [[3]](#ref3) model:

In [11]:
dam_reader = readers.reader_from_file("./dam")

INFO:tensorflow:Restoring parameters from ./dam/model_module


Next, we define a premise and two hypotheses from the SNLI corpus [[5]](#ref5):

In [12]:
premise = "A wedding party is taking pictures."
hypothesis1 = "A group of people is celebrating."
hypothesis2 = "A rock band is giving a concert."

In the NLI case, the answer is a label among {_"entailment"_, _"neutral"_, _"contradiction"_}.

We use the same `QASetting` data structure as above, with the hypothesis as the _question_ and the premise as the single supporting document:

In [13]:
snli_setting1 = QASetting(question=hypothesis1, support=[premise])
snli_setting2 = QASetting(question=hypothesis2, support=[premise])

We generate predictions by calling the reader with these inputs:

In [14]:
prediction = dam_reader([snli_setting1])
print_nli(premise, hypothesis1, prediction[0][0].text)

prediction = dam_reader([snli_setting2])
print_nli(premise, hypothesis2, prediction[0][0].text)

A wedding party is taking pictures.	--(entailment)-->	A group of people is celebrating.
A wedding party is taking pictures.	--(contradiction)-->	A rock band is giving a concert.
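
The output format above is easy to reproduce when post-processing predictions outside the notebook. This is a minimal sketch that mirrors the printed lines and rejects labels outside the three-way label set. `format_nli` and `NLI_LABELS` are hypothetical names and are not the implementation of `print_nli`.

```python
NLI_LABELS = ("entailment", "neutral", "contradiction")


def format_nli(premise, hypothesis, label):
    """Render one prediction as 'premise --(label)--> hypothesis', tab-separated."""
    if label not in NLI_LABELS:
        raise ValueError("unexpected NLI label: %r" % (label,))
    return "%s\t--(%s)-->\t%s" % (premise, label, hypothesis)


print(format_nli("A wedding party is taking pictures.",
                 "A group of people is celebrating.",
                 "entailment"))
```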


We can again inspect the score of a prediction. Here `prediction` holds the result of the most recent call, the one for `hypothesis2`:

In [15]:
print(prediction[0][0].score)

0.99598


## References:

<a id='ref1'>[1]</a> Jeffrey Pennington, Richard Socher, and Christopher Manning. <a href='http://www.aclweb.org/anthology/D14-1162'>"GloVe: Global Vectors for Word Representation."</a> Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.

<a id='ref2'>[2]</a> Dirk Weissenborn, Georg Wiese, and Laura Seiffe. <a href='http://www.aclweb.org/anthology/K17-1028'>"Making Neural QA as Simple as Possible but not Simpler."</a> Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL). 2017.

<a id='ref3'>[3]</a> Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. <a href='http://www.aclweb.org/anthology/D16-1244'>"A Decomposable Attention Model for Natural Language Inference."</a> Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2016.


<a id='ref4'>[4]</a> Pranav Rajpurkar, et al. <a href='http://www.anthology.aclweb.org/D/D16/D16-1264.pdf'>"SQuAD: 100,000+ Questions for Machine Comprehension of Text."</a> Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2016.


<a id='ref5'>[5]</a> Samuel Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. <a href='http://www.aclweb.org/anthology/D15-1075'>"A large annotated corpus for learning natural language inference."</a> Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2015.
