## Question answering with BERT and SQuAD
The Stanford Question Answering Dataset (SQuAD) (https://rajpurkar.github.io/SQuAD-explorer/) is a reading comprehension dataset consisting of questions posed by crowd workers on a set of Wikipedia articles. The answer to each question is a segment of text, or span, from the corresponding reading passage; in SQuAD 2.0, a question may also be unanswerable.
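
To make the "answer is a span of the passage" idea concrete, here is a minimal sketch of one paragraph entry in the SQuAD v1.1 JSON layout, where each answer carries its text and a character offset (`answer_start`) into the context. The record is hand-written for illustration (the sentence is borrowed from the demo context used later in this playground, and the `id` is made up); it is not read from the dataset files.

```python
# Hypothetical SQuAD v1.1-style paragraph entry (illustrative, not taken from the dataset).
context = ("Founded in 1972, InterDigital is listed on NASDAQ and is "
           "included in the S&P SmallCap 600.")
answer_text = "1972"

record = {
    "context": context,
    "qas": [{
        "id": "demo-0",  # made-up identifier
        "question": "When was InterDigital established?",
        # answer_start is a character offset into the context string
        "answers": [{"text": answer_text,
                     "answer_start": context.index(answer_text)}],
    }],
}

def answer_spans(paragraph):
    """Return (question, span) pairs, re-extracting each span from the context by offset."""
    pairs = []
    for qa in paragraph["qas"]:
        for ans in qa["answers"]:
            start = ans["answer_start"]
            end = start + len(ans["text"])
            pairs.append((qa["question"], paragraph["context"][start:end]))
    return pairs

pairs = answer_spans(record)
print(pairs)
```

Slicing the context with the stored offset recovers exactly the answer text, which is what makes span prediction (a start position and an end position) a natural output for the model.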

Bidirectional Encoder Representations from Transformers (BERT) (https://arxiv.org/abs/1810.04805) is a pre-training technique for natural language processing (NLP), created and published in 2018 by Jacob Devlin and his colleagues at Google.

In this playground we create a SQuAD model by using a pre-trained BERT-base model as the backbone, adding a fully connected layer to the end of the model, and training it on the SQuAD dataset.
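
As a rough illustration of what the added fully connected layer computes, the NumPy sketch below maps each token's 768-dimensional backbone output to 2 numbers, interpreted here as a "start" logit and an "end" logit. This is only a sketch under assumptions: the random `hidden` array stands in for the real BERT output, the sequence length and the 0.02 initialization scale are arbitrary, and the column order (start first, end second) is an assumption rather than something confirmed by Fireball.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, hidden_size = 16, 768   # 768 matches BERT-base; seq_len is arbitrary here
hidden = rng.normal(size=(seq_len, hidden_size))   # stand-in for the backbone output

# Fully connected layer with 2 outputs per token.
# Assumed convention: column 0 = start logit, column 1 = end logit.
weights = rng.normal(scale=0.02, size=(hidden_size, 2))
biases = np.zeros(2)

logits = hidden @ weights + biases   # shape: (seq_len, 2)
start_logits, end_logits = logits[:, 0], logits[:, 1]

print(logits.shape, weights.size + biases.size)
```

The head is tiny compared to the backbone (768 × 2 weights plus 2 biases), which is why it can be trained on its own for a short while before the whole model is fine-tuned.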

**Note:** If the SQuAD dataset is not available on this machine, the following code can take longer the first time it is executed because the dataset needs to be downloaded and initialized. Please be patient and avoid interrupting the process during the download.

In [1]:
from fireball import Model, myPrint
from fireball.datasets.squad import SquadDSet

import time, os

gpus = "upto4"

# Preparing the dataset and model (Downloading them if necessary)
SquadDSet.download()
Model.downloadFromZoo("BertUncasedL12O768H12NoPool.npz", "./Models/")
 
trainDs,testDs = SquadDSet.makeDatasets("Train,Test", batchSize=128, version=1 )
SquadDSet.printDsInfo(trainDs, testDs)
SquadDSet.printStats(trainDs, testDs)

Downloading from "https://fireball.s3.us-west-1.amazonaws.com/data/SQuAD/SQuAD.zip" ...
Extracting "/pa/home/hamidirads/data/SQuAD/SQuAD.zip" ...
Deleting "/pa/home/hamidirads/data/SQuAD/SQuAD.zip" ...
Initializing tokenizer from "/data/SQuAD/vocab.txt" ... Done. (Vocab Size: 30522)
SquadDSet Dataset Info:
    Dataset Location ............................... /data/SQuAD/
    Number of Training Samples ..................... 87844
    Number of Test Samples ......................... 10833
    Dataset Version ................................ 1
    Max Seq. Len ................................... 384
    +----------------------+--------------+--------------+
    | Parameter            | Training     | Test         |
    +----------------------+--------------+--------------+
    | NumQuestions         | 87451        | 10570        |
    | NumGoodQuestions     | 87451        | 10570        |
    | NumAnswers           | 87844        | 35556        |
    | NumContexts          | 18896        

## Create Fireball model, print model information, and train on SQuAD dataset
Now we create a model and initialize its parameters from the pre-trained BERT-base model. The last fully connected layer is initialized randomly. The file ```BertUncasedL12O768H12NoPool.npz``` contains the parameters extracted from Google's original pre-trained BERT-base.

In [2]:

layersInfo='EMB_L512_O768_S.02:None,LN_E1e-12:None:DO_R0.1;' \
           '12*BERT_O768_I3072_H12:GELU;' \
           'FC_O2:None:L2R,ANSWER'

# For the learning rate, we start at 0.00004 and train only the last fully connected
# layer (keeping the main BERT model fixed) for 100 batches. After that, we train the
# whole model end-to-end for 900 more batches. Then we change the learning rate to
# 0.00003 and train for 3000 batches, before changing it to 0.00002 and training until
# the end of training.
learningRate = [(0,4e-5),(100,'trainAll'),(1000,3e-5),(4000,2e-5)]

model = Model.makeFromFile("Models/BertUncasedL12O768H12NoPool.npz",
                           name='Bert-SQuAD', layersInfo=layersInfo,
                           trainDs=trainDs, testDs=testDs,
                           batchSize=32, numEpochs=2, regFactor=0.0001,
                           learningRate=learningRate, optimizer='Adam',
                           gpus=gpus)

model.printLayersInfo()
model.initSession()
model.printNetConfig()
model.train()
model.save("Models/BertSquad.fbm")


Reading from "Models/BertUncasedL12O768H12NoPool.npz" ... Done.
Creating the fireball model "Bert-SQuAD" ... 
Done.

Scope            InShape       Comments                 OutShape      Activ.   Post Act.        # of Params
---------------  ------------  -----------------------  ------------  -------  ---------------  -----------
IN_EMB           ≤512 2                                 ≤512 768      None                      23,835,648 
S1_L1_LN         ≤512 768                               ≤512 768      None     DO:0.1           1,536      
S2_L1_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      7,087,872  
S2_L2_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      7,087,872  
S2_L3_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      7,087,872  
S2_L4_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      7,087,872  
S2_L5_BERT   
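
The parameter counts in the table above can be reproduced with a little arithmetic. The sketch below assumes the standard BERT-base layout (word, position, and token-type embeddings; four attention projections with biases; a two-layer feed-forward block; two layer normalizations per encoder layer) and uses the vocabulary size of 30522 reported by the tokenizer. Whether Fireball groups its tensors exactly this way is an assumption; the totals simply happen to match.

```python
vocab_size, max_positions, token_types = 30522, 512, 2
hidden, intermediate = 768, 3072

def dense(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden                                    # scale and shift vectors
embedding = (vocab_size + max_positions + token_types) * hidden
attention = 4 * dense(hidden, hidden)                      # query, key, value, output
feed_forward = dense(hidden, intermediate) + dense(intermediate, hidden)
encoder_layer = attention + layer_norm + feed_forward + layer_norm
qa_head = dense(hidden, 2)                                 # the added FC_O2 layer

print(f"IN_EMB     {embedding:,}")
print(f"S1_L1_LN   {layer_norm:,}")
print(f"BERT layer {encoder_layer:,}")
print(f"QA head    {qa_head:,}")
```

Under these assumptions the embedding, layer normalization, and per-layer counts come out to 23,835,648, 1,536, and 7,087,872, the same values listed in the `# of Params` column.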

### Some Notes about the above results:
- The warning message above means that the 2 tensors (weights and biases) of the fully connected layer ```S3_L1_FC``` were initialized randomly because they are not in the ```BertUncasedL12O768H12NoPool.npz``` model file. This is expected because we added this layer to use the general-purpose BERT model for the task of question answering.
- In the **Network Configuration** section, "Non-Transfered Tensors" are the tensors that were not transferred from the pre-trained model and were therefore initialized randomly, as explained above.
- The notation "≤512" in the "InShape" and "OutShape" columns means that the input sequence can be at most 512 tokens long for this model.
- This is an example of how flexible the definition of "Learning Rate" is in Fireball: a single list sets the rate at given batch numbers and also marks the point where the whole model starts training. Review the comments and syntax of the learning rate in the above code.
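
To help read the `learningRate` list in the code above, here is a small sketch that interprets such a schedule the way the code comments describe it. The helper `schedule_at` is hypothetical and is not part of Fireball; it assumes the entries are sorted by batch number and that each entry takes effect starting at its own batch index.

```python
def schedule_at(schedule, batch):
    """Return (learning_rate, train_all) in effect at the given batch index.

    Hypothetical helper: assumes `schedule` is sorted by batch number and
    that an entry applies from its batch index onward.
    """
    rate, train_all = None, False
    for start, value in schedule:
        if batch < start:
            break
        if value == 'trainAll':
            train_all = True      # from here on, the whole model is trained
        else:
            rate = value          # a numeric entry changes the learning rate
    return rate, train_all

learningRate = [(0, 4e-5), (100, 'trainAll'), (1000, 3e-5), (4000, 2e-5)]

for batch in (0, 99, 100, 999, 1000, 3999, 4000):
    print(batch, schedule_at(learningRate, batch))
```

Read this way, the schedule has four phases: head-only training at 4e-5, full-model training at 4e-5, then full-model training at 3e-5 and finally at 2e-5.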


## Quick inference demonstration
Here we have a "context", a paragraph about InterDigital copied from Wikipedia, and 3 questions related to it. We use our model to answer the questions.

In [3]:
context = r"""
InterDigital is a technology research and development company that provides wireless and video technologies for 
mobile devices, networks, and services worldwide. Founded in 1972, InterDigital is listed on NASDAQ and is 
included in the S&P SmallCap 600. InterDigital had 2020 revenue of $359 million and a portfolio of about 
32,000 U.S. and foreign issued patents and patent applications.
"""

print(context)
questions = [
    "When was InterDigital established?",
    "How much was InterDigital's revenue in 2020?",
    "What does InterDigital provide?",
]

for i, question in enumerate(questions):
    sample, spans = testDs.tokenizer.makeModelInput(context, question, returnSpans=True)

    startTok, endTok = model.inferOne(sample)

    answer = testDs.tokenizer.getTextFromTokSpan(sample[0], context, spans[0], startTok, endTok)
    print("\nQ%d: %s\n    %s"%(i+1, question, answer))
    


InterDigital is a technology research and development company that provides wireless and video technologies for 
mobile devices, networks, and services worldwide. Founded in 1972, InterDigital is listed on NASDAQ and is 
included in the S&P SmallCap 600. InterDigital had 2020 revenue of $359 million and a portfolio of about 
32,000 U.S. and foreign issued patents and patent applications.


Q1: When was InterDigital established?
    1972

Q2: How much was InterDigital's revenue in 2020?
    $359 million 

Q3: What does InterDigital provide?
    wireless and video technologies 
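
The demo above receives a start token and an end token from `model.inferOne`. How Fireball picks them internally is not shown here; the sketch below illustrates one common rule for span models, assuming per-token start and end scores are available: choose the pair with the highest combined score, subject to the end not preceding the start and the span not exceeding a maximum length. The function name, the toy scores, and the `max_answer_len` default are all made up for illustration.

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[start] + end_logits[end]
    subject to start <= end < start + max_answer_len."""
    best, best_score = (0, 0), -np.inf
    for s in range(len(start_logits)):
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy scores: taking each argmax independently would give start=1, end=0,
# which is not a valid span.
start_logits = np.array([0.1, 2.0, 0.3, 0.2, 0.0])
end_logits = np.array([3.0, 0.1, 0.4, 1.5, 0.2])

span = best_span(start_logits, end_logits)
print(span)
```

The constraint matters because the start and end scores are produced independently for each token, so nothing in the model output itself guarantees a well-formed span.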


## Evaluating the model
This code runs inference on all questions in the test dataset and compares the results with the ground truth. The evaluation scores are calculated based on the original evaluation code provided with the SQuAD dataset.

In [4]:
model = Model.makeFromFile("Models/BertSquad.fbm", testDs=testDs, gpus=gpus)   
model.initSession()
results = model.evaluate()


Reading from "Models/BertSquad.fbm" ... Done.
Creating the fireball model "Bert-SQuAD" ... Done.
  Processed 10833 Samples. (Time: 55.83 Sec.)                              

    Exact Match: 80.851
    f1:          87.902
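
For readers unfamiliar with the two scores, the sketch below shows how Exact Match and F1 are typically computed for a single question, in the spirit of the SQuAD v1.1 evaluation script: answers are normalized (lowercased, with punctuation and English articles removed), and the best score over all ground-truth answers is kept. It is a simplified re-implementation written for illustration, not the code Fireball runs, and the example answers are made up.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))

def f1(prediction, truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

def best_over_truths(metric, prediction, truths):
    """A question can have several reference answers; keep the best score."""
    return max(metric(prediction, t) for t in truths)

em_score = best_over_truths(exact_match, "$359 million",
                            ["359 million", "$359 million."])
f1_score = best_over_truths(f1, "wireless and video technologies",
                            ["wireless and video technologies for mobile devices"])
print(em_score, round(f1_score, 3))
```

The dataset-level numbers reported above are these per-question scores averaged over all questions and expressed as percentages.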



## Also look at

[Reducing number of parameters of BERT/SQuAD Model](BertSquad-Reduce.ipynb)

[Pruning BERT/SQuAD Model](BertSquad-Prune.ipynb)

[Quantizing BERT/SQuAD Model](BertSquad-Quantize.ipynb)

[Exporting BERT/SQuAD Model to ONNX](BertSquad-ONNX.ipynb)

[Exporting BERT/SQuAD Model to TensorFlow](BertSquad-TF.ipynb)

[Exporting BERT/SQuAD Model to CoreML](BertSquad-CoreML.ipynb)

---

[Fireball Playgrounds](../Contents.ipynb)

