# Document QA

This is an implementation of the paper [Simple and Effective Multi-Paragraph Reading Comprehension](https://arxiv.org/pdf/1710.10723.pdf) (Clark and Gardner, 2017).
- Code is adapted from [https://github.com/allenai/document-qa](https://github.com/allenai/document-qa)
- Please contact the original authors for questions and suggestions. 

This model, by [Clark and Gardner, 2017](https://arxiv.org/pdf/1710.10723.pdf), addresses the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. The authors propose training models to produce well-calibrated confidence scores for their results on individual paragraphs. During training, the method samples multiple paragraphs from each document and uses a shared-normalization objective that encourages the model to produce globally correct output. This method is then combined with a state-of-the-art pipeline for training models on document QA data.
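
To make the shared-normalization idea concrete, here is a minimal pure-Python sketch for a single score type (for example, answer-start scores). It follows the objective as described in the paper: one softmax is taken over the tokens of all paragraphs sampled for a question, instead of one softmax per paragraph. The function names and the list-based data layout are assumptions made for illustration, and this is not the TensorFlow code used in this repository.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def shared_norm_loss(paragraph_scores, answer_positions):
    """Negative log-likelihood with one softmax shared across paragraphs.

    paragraph_scores: list of lists, one list of token scores per paragraph.
    answer_positions: list of sets, the correct token indices per paragraph
                      (an empty set for paragraphs without an answer).
    """
    # The normalizer covers every token of every paragraph for this question.
    all_scores = [s for scores in paragraph_scores for s in scores]
    # The numerator covers only the tokens marked as correct.
    correct = [scores[i]
               for scores, positions in zip(paragraph_scores, answer_positions)
               for i in sorted(positions)]
    if not correct:
        raise ValueError("at least one paragraph must contain an answer")
    return log_sum_exp(all_scores) - log_sum_exp(correct)
```

Because the normalizer spans all paragraphs, a high score on a paragraph that does not contain the answer increases the loss, which is what pushes the scores to be comparable across paragraphs.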


## Setup
### Dependencies
We require Python >= 3.5, TensorFlow 1.10, and a handful of other supporting libraries.
TensorFlow should be installed separately following the docs; the commands below use the `tensorflow-gpu` pip package. To install the other dependencies, run the commands below from a terminal on the Linux DLVM.

```

conda create --name <myenv> python=3.6
source activate <myenv>
cd /home/<user>/notebooks
git clone https://github.com/antriv/Transfer_Learning_Text.git
cd Transfer_Learning_Text/Transfer_Learning/document-qa/
pip install -r requirements.txt
pip install tensorflow-gpu
python -m nltk.downloader punkt stopwords
```

To make this environment available in a Linux DLVM JupyterHub:

```
source activate <myenv>
pip install ipykernel
sudo "/home/<user>/.conda/envs/<myenv>/bin/python" -m ipykernel install --name <myenv> --display-name "<myenv>"
```

To run this notebook, choose the kernel `"<myenv>"`.

### Word Vectors
The models we train use the Common Crawl 840-billion-token GloVe word vectors from [here](https://nlp.stanford.edu/projects/glove/).

```
cd /home/<user>/notebooks/Transfer_Learning_Text/Transfer_Learning/document-qa/
mkdir -p docqa/glove
cd docqa/glove
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
rm glove.840B.300d.zip
```
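
As a rough illustration of what the unzipped file contains, the sketch below parses GloVe's plain-text format, where each line holds a token followed by its vector components separated by spaces. The function names are hypothetical and this is not the loader used by the repository. Splitting from the right is a defensive choice, on the assumption that some tokens may themselves contain spaces.

```python
def parse_glove_lines(lines, dim=300, vocab=None):
    """Parse GloVe text lines into a {token: [float, ...]} dictionary.

    Splitting from the right keeps tokens that contain spaces intact.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or truncated lines
        token = parts[0]
        if vocab is not None and token not in vocab:
            continue  # load only the words we need to keep memory low
        vectors[token] = [float(x) for x in parts[1:]]
    return vectors


def load_glove(path, dim=300, vocab=None):
    """Read a GloVe text file such as docqa/glove/glove.840B.300d.txt."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove_lines(handle, dim=dim, vocab=vocab)
```

Passing a `vocab` set restricts loading to the words that actually occur in your questions and documents, which matters because the full file is several gigabytes.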

### Pre-Trained Models
We have four pre-trained models:

1. "squad": Our model trained on the standard SQuAD dataset. This model is listed on the SQuAD leaderboard as BiDAF + Self Attention.

2. "squad-shared-norm": Our model trained on document-level SQuAD using the shared-norm approach.

3. "triviaqa-web-shared-norm": Our model trained on TriviaQA web with the shared-norm approach. This is the model we used to submit scores to the TriviaQA leaderboard.

4. "triviaqa-unfiltered-shared-norm": Our model trained on TriviaQA unfiltered with the shared-norm approach. This is the model that powers our demo.

The models can be downloaded [here](https://drive.google.com/open?id=1Hj9WBQHVa__bqoD5RIOPu2qDpvfJQwjR).

The models use the cuDNN implementation of GRUs by default, which means they can only be run on
the GPU. We also have much slower, but CPU compatible, versions [here](https://drive.google.com/open?id=1NRmb2YilnZOfyKULUnL7gu3HE5nT0sMy).

Once the models are downloaded, create the directories to store them in:

```
cd /home/<user>/notebooks/Transfer_Learning_Text/Transfer_Learning/document-qa/
mkdir -p pretrained_models
mkdir -p models
```

## Testing from Linux DLVM Terminal

### User Input
`docqa/scripts/1_run_on_user_documents.py` serves as a heavily commented example of how to run our models and pre-processing pipeline on other kinds of text. For example:

```
source activate <myenv>
cd /home/<user>/notebooks/Transfer_Learning_Text/Transfer_Learning/document-qa/
python docqa/scripts/1_run_on_user_documents.py /path/to/model/directory \
  "Who wrote the satirical essay 'A Modest Proposal'?" \
  ~/data/triviaqa/evidence/wikipedia/A_Modest_Proposal.txt \
  ~/data/triviaqa/evidence/wikipedia/Jonathan_Swift.txt
```
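
If you would rather call the script from Python than from the shell, the sketch below assembles the same argument order shown above: model directory, question, then one or more document paths. The helper names are hypothetical, and the check for at least one document is an assumption based on the example rather than on the script's documented behavior.

```python
import subprocess
import sys

SCRIPT = "docqa/scripts/1_run_on_user_documents.py"


def build_command(model_dir, question, documents, python=sys.executable):
    """Return the argv list: model directory, question, then document paths."""
    if not documents:
        raise ValueError("pass at least one document path")
    return [python, SCRIPT, model_dir, question] + list(documents)


def run(model_dir, question, documents):
    """Run the script from the document-qa directory and wait for it to finish."""
    # A list argv bypasses the shell, so quotes in the question need no escaping.
    return subprocess.run(build_command(model_dir, question, documents), check=True)
```

Building the command as a list avoids the quoting problems that questions containing apostrophes or question marks can cause on a shell command line.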
 
### Demo on static document
You may try any question while keeping the paragraph content constant. Here we use Harry Shum's book "Future Computed" as our static document.

`python docqa/scripts/2_run_on_static_documents.py /path/to/model/directory "What is AI Law?"`

### Bot-Like Experience on static document
You may create a bot using your own paragraph or document. We use the triviaqa-unfiltered-shared-norm model for this test, with Harry Shum's book "Future Computed" as our static document.
Next, we need to operationalize the model on this document. We use a Python Flask API to serve the model locally.
```
python docqa/scripts/3_run_flask_api_local_on_static_documents.py
```
This serves the model on port 5000. To test the bot locally, we can run:
```
python docqa/scripts/4_local_static_request.py "What is AI Law?"
```
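
For readers who want to query the local server from their own code, here is a minimal client sketch using only the standard library. The route (`/`) and the JSON payload shape (`{"question": ...}`) are assumptions made for illustration; check `3_run_flask_api_local_on_static_documents.py` and `4_local_static_request.py` for the route and payload the server actually expects.

```python
import json
import urllib.request


def build_question_request(question, host="localhost", port=5000, route="/"):
    """Build (but do not send) a POST request carrying one question.

    The route and the {"question": ...} payload are hypothetical; match them
    to whatever the Flask script in this repository actually exposes.
    """
    url = "http://{}:{}{}".format(host, port, route)
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=30, **kwargs):
    """Send the question to the running server and return the raw response text."""
    request = build_question_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating request construction from sending makes it easy to inspect the URL and payload before the Flask server is running.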

## Testing from this Notebook

### Bot-Like Experience on Future Computed Book

You may create a bot using your own paragraph or document. We use the triviaqa-unfiltered-shared-norm model for this test, with Harry Shum's book "Future Computed" as our static document.

Next, we need to operationalize the model on this document. We use a Python Flask API to serve the model locally.

In JupyterHub, click the "New" button at the top right and open a "Terminal". Inside the JupyterHub "Terminal":

```
cd notebooks/document-qa
"/home/<user>/.conda/envs/<myenv>/bin/python" docqa/scripts/3_run_flask_api_local_on_static_documents.py
```
This serves the model locally on the DLVM on port 5000, with the Flask server running in the JupyterHub Terminal.


To test the model locally, run the following in a notebook cell (adjust the interpreter path to match your environment):

`!"/home/antriv/.conda/envs/tf_1.2_py35/bin/python" docqa/scripts/4_local_static_request.py "what is AI Law"`

The model returns:

`AI law will emerge`


We can keep changing the question to test the model on different questions from this notebook.