# Question Answering with BERT and HuggingFace

This notebook shows a simple example of using pre-trained BERT models from HuggingFace. It covers:
- Loading a pre-trained model for question answering
- Evaluating the model on some simple examples
- Fine-tuning the model on a new dataset
- Evaluating the fine-tuned model

HuggingFace pipelines provide pre-trained transformer models for a variety of tasks such as sentiment analysis and question answering: [Link](https://huggingface.co/docs/transformers/v4.21.0/en/main_classes/pipelines)

Two parameters are given to the pipeline: `task` (16 options at the time of writing) and `model` (a model identifier or an actual instance of a pretrained model inheriting from PreTrainedModel for PyTorch). When calling the pipeline with a specific model, the `task` parameter can sometimes be omitted, as the model already carries that information.


For question answering we can refer to the pipeline documentation: [Link](https://huggingface.co/docs/transformers/main_classes/pipelines#the-pipeline-abstraction). We use this model: [Model](https://huggingface.co/distilbert-base-cased-distilled-squad). The model's documentation describes what it has been trained on.

In [1]:
#!pip install transformers

In [4]:
from transformers import pipeline



In [5]:
QA = pipeline(task="question-answering", model="distilbert-base-cased-distilled-squad")

If we choose not to use the pipeline, we have to use a tokenizer to process the sentence first. Then, under `torch.no_grad()`, we pass the dictionary returned by the tokenizer to the model. The model page gives an example: [Link](https://huggingface.co/distilbert-base-cased-distilled-squad). The examples usually don't have much detail, and you may need to try a few times to become familiar with how they actually work.

Sometimes even the format of the values returned by the model is not discussed or shown!

## Simple test 1

In [32]:
context = """
I have to pick up the kids from the school at 5 in the afternoon today. But tomorrow I have to do that two hours earlier
"""

In [33]:
questions = [
    "When should I be at the school today?",
    "What time should I pick them up tomorrow?",
    "From where should I pick them up?",
]

answers = QA(question=questions, context=context)

print(answers)


[{'score': 0.3463056683540344, 'start': 47, 'end': 65, 'answer': '5 in the afternoon'}, {'score': 0.4067622125148773, 'start': 47, 'end': 65, 'answer': '5 in the afternoon'}, {'score': 0.8328907489776611, 'start': 33, 'end': 43, 'answer': 'the school'}]


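We can see that the model returns a list of dictionaries, one per question. The text of the answer is stored under the key 'answer', and 'score' holds the model's confidence in that answer (a value between 0 and 1). The location of the answer in the context is also returned, as character offsets under the 'start' and 'end' keys.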
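As a quick check of what 'start' and 'end' mean, the sketch below slices the context string with the offsets copied from the output above. It uses no model, only the values already printed; the `answers` list here is a trimmed copy of that output, not a fresh pipeline call.

```python
context = """
I have to pick up the kids from the school at 5 in the afternoon today. But tomorrow I have to do that two hours earlier
"""

# Offsets and answers copied from the pipeline output shown above.
answers = [
    {"start": 47, "end": 65, "answer": "5 in the afternoon"},
    {"start": 33, "end": 43, "answer": "the school"},
]

for a in answers:
    # 'start' is inclusive and 'end' is exclusive, like a Python slice.
    span = context[a["start"]:a["end"]]
    print(repr(span), span == a["answer"])
```

Note that the offsets count the newline at the very start of the triple-quoted string, so they refer to the context exactly as it was passed to the pipeline.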

In [36]:
for question, answer in zip(questions, answers):
    print(question, "\n>> " + answer['answer'] + "\nCorrectness score: " + str(answer['score']) + "\n")

When should I be at the school today? 
>> 5 in the afternoon
Correctness score: 0.3463056683540344

What time should I pick them up tomorrow? 
>> 5 in the afternoon
Correctness score: 0.4067622125148773

From where should I pick them up? 
>> the school
Correctness score: 0.8328907489776611



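Well, OK, but not so smart perhaps. The first and third answers are correct, but the second is wrong: the right answer is 3 in the afternoon. The model is extractive, so it can only return a span copied from the context and cannot work out "two hours earlier".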

## Simple test 2

A paragraph from Wikipedia about the 1906 San Francisco earthquake.

In [38]:
context = """
The 1906 earthquake preceded the development of the Richter magnitude scale by three decades. 
The most widely accepted estimate for the magnitude of the quake on the modern moment magnitude scale is 7.9. 
Values from 7.7 to as high as 8.3 have been proposed.
According to findings published in the Journal of Geophysical Research, 
severe deformations in the earth's crust took place both before and after the earthquake's impact. 
Accumulated strain on the faults in the system was relieved during the earthquake, 
which is the supposed cause of the damage along the 450-kilometre-long (280 mi) segment of the San Andreas plate boundary.
The 1906 rupture propagated both northward and southward for a total of 296 miles (476 km).
Shaking was felt from Oregon to Los Angeles, and as far inland as central Nevada.

A strong foreshock preceded the main shock by about 20 to 25 seconds. 
The strong shaking of the main shock lasted about 42 seconds. 
There were decades of minor earthquakes, more than at any other time in the historical record for northern California, 
before the 1906 quake. Previously interpreted as precursory activity to the 1906 earthquake, 
they have been found to have a strong seasonal pattern,
and are now believed to be due to large seasonal sediment loads in coastal bays that overlie faults as a result of 
the erosion caused by hydraulic mining in the later years of the California Gold Rush.
"""

In [39]:
questions = ["Was Ritcher scale known in 1906?",
             "What is the magnitude estimate of 1906 earthquake?",
             "What is the highest proposed magnitude estimate of 1906 earthquake?",
             "How long did the main shock last?"]

answers = QA(question=questions, context=context)

for question, answer in zip(questions, answers):
    print(question, "\n>> " + answer['answer'] + "\nCorrectness score: " + str(answer['score']) + "\n")

Was Ritcher scale known in 1906? 
>> The 1906 rupture propagated both northward and southward
Correctness score: 0.021126631647348404

What is the magnitude estimate of 1906 earthquake? 
>> 7.9
Correctness score: 0.763514518737793

What is the highest proposed magnitude estimate of 1906 earthquake? 
>> 7.9
Correctness score: 0.37356507778167725

How long did the main shock last? 
>> 42 seconds
Correctness score: 0.6660254597663879
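The scores above suggest a simple way to handle answers the model is unsure about: discard anything below a cutoff. The sketch below applies that idea to the results just printed. The threshold of 0.5 is a hypothetical value picked for illustration, not a recommended setting, and the `answers` list is copied from the output above rather than produced by a new pipeline call.

```python
questions = [
    "Was Ritcher scale known in 1906?",
    "What is the magnitude estimate of 1906 earthquake?",
    "What is the highest proposed magnitude estimate of 1906 earthquake?",
    "How long did the main shock last?",
]

# Scores and answers copied from the output shown above.
answers = [
    {"score": 0.021126631647348404,
     "answer": "The 1906 rupture propagated both northward and southward"},
    {"score": 0.763514518737793, "answer": "7.9"},
    {"score": 0.37356507778167725, "answer": "7.9"},
    {"score": 0.6660254597663879, "answer": "42 seconds"},
]

THRESHOLD = 0.5  # hypothetical cutoff, chosen for illustration only

# Keep only the answers the model is reasonably confident about.
kept = [a["answer"] for a in answers if a["score"] >= THRESHOLD]

for question, a in zip(questions, answers):
    shown = a["answer"] if a["score"] >= THRESHOLD else "(no confident answer)"
    print(question, "\n>> " + shown + "\n")
```

With this cutoff, the yes/no question and the "highest proposed" question, both of which the model got wrong, are filtered out, while the two correct answers remain.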



# Pre-trained model evaluation

We try to evaluate the performance of the model on the [TyDi QA](https://ai.google.com/research/tydiqa) dataset.

The dataset has been downloaded locally from [Link](https://storage.googleapis.com/nlprefresh-public/tydiqa_data.zip)

In [41]:
from datasets import load_from_disk

In [46]:
path = './Data/tydiqa_data'
tydiqa_data = load_from_disk(path)
tydiqa_data

DatasetDict({
    train: Dataset({
        features: ['passage_answer_candidates', 'question_text', 'document_title', 'language', 'annotations', 'document_plaintext', 'document_url'],
        num_rows: 9211
    })
    validation: Dataset({
        features: ['passage_answer_candidates', 'question_text', 'document_title', 'language', 'annotations', 'document_plaintext', 'document_url'],
        num_rows: 1031
    })
})