# Introduction

This notebook is intended to experiment with different technologies for the task of **Question Answering**.

There are two different **types of models**:
- *Open Domain* - They do not require a context to be passed along with the question
- *Reading Comprehension* - They find the answer within a given context

Such models can follow two different **approaches**:
- *Open Book* - The model can access an external source of information
- *Closed Book* - The model can only access what has been encoded in its parameters

The **components** of an Open Domain Q&A system are:
- *Retriever* - It finds relevant contexts from an external source given the question (this is the component that differentiates an Open Domain model from a Reading Comprehension one)
- *Reader* - It locates the position in the context where the answer to the question is
- *Generator* - An alternative to the Reader: it generates the answer text instead of extracting a span from the context
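
To make the Retriever/Reader split concrete, here is a minimal sketch in plain Python. It is only an illustration under strong assumptions: the retriever ranks contexts by word overlap with the question, and the "reader" simply returns the best-matching sentence. The names `tokenize`, `retrieve`, `toy_reader` and `answer_open_domain`, as well as the two-sentence corpus, are hypothetical and are not used elsewhere in this notebook.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())


def retrieve(question, corpus, top_k=1):
    """Toy retriever: rank contexts by the number of tokens shared with the question."""
    question_tokens = set(tokenize(question))
    ranked = sorted(
        corpus,
        key=lambda context: len(question_tokens & set(tokenize(context))),
        reverse=True,
    )
    return ranked[:top_k]


def toy_reader(question, context):
    """Toy reader: return the sentence sharing the most tokens with the question."""
    question_tokens = set(tokenize(question))
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(question_tokens & set(tokenize(s))))


def answer_open_domain(question, corpus, reader=toy_reader):
    """Open Domain flow: retrieve a context first, then let the reader answer from it."""
    best_context = retrieve(question, corpus, top_k=1)[0]
    return reader(question, best_context)


# Hypothetical two-document corpus standing in for the external source
corpus = [
    "Beyonce was born and raised in Houston, Texas. She rose to fame in the late 1990s.",
    "Physics is the natural science that studies matter. It is one of the oldest disciplines.",
]

print(answer_open_domain("Where was Beyonce raised?", corpus))
```

A Reading Comprehension model corresponds to calling the reader directly with a context supplied by the user, skipping `retrieve` entirely.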



## Datasets

### SQuAD 2.0

SQuAD (Stanford Question Answering Dataset) is a hugely popular dataset containing question and answer pairs based on Wikipedia articles, covering topics ranging from Beyonce to Physics.

Both the training and dev sets can be downloaded at this [link](https://rajpurkar.github.io/SQuAD-explorer/).

In [1]:
# Import Standard Libraries
import json

In [2]:
# Read data
with open('./../../../data/squad_train_v2.0.json', 'rb') as file:
    squad_train_data = json.load(file)

with open('./../../../data/squad_dev_v2.0.json', 'rb') as file:
    squad_dev_data = json.load(file)

In [3]:
squad_train_data['data'][0]['paragraphs'][0] # The first paragraph is about Beyonce

{'qas': [{'question': 'When did Beyonce start becoming popular?',
   'id': '56be85543aeaaa14008c9063',
   'answers': [{'text': 'in the late 1990s', 'answer_start': 269}],
   'is_impossible': False},
  {'question': 'What areas did Beyonce compete in when she was growing up?',
   'id': '56be85543aeaaa14008c9065',
   'answers': [{'text': 'singing and dancing', 'answer_start': 207}],
   'is_impossible': False},
  {'question': "When did Beyonce leave Destiny's Child and become a solo singer?",
   'id': '56be85543aeaaa14008c9066',
   'answers': [{'text': '2003', 'answer_start': 526}],
   'is_impossible': False},
  {'question': 'In what city and state did Beyonce  grow up? ',
   'id': '56bf6b0f3aeaaa14008c9601',
   'answers': [{'text': 'Houston, Texas', 'answer_start': 166}],
   'is_impossible': False},
  {'question': 'In which decade did Beyonce become famous?',
   'id': '56bf6b0f3aeaaa14008c9602',
   'answers': [{'text': 'late 1990s', 'answer_start': 276}],
   'is_impossible': False},
  {'q

In [4]:
# Retrieve all the Q&A pairs
def extract_qa_pairs(squad_data):
    """Flatten SQuAD data into a list of question/answer/context dictionaries."""
    q_a_data = []

    # Loop through groups -> paragraphs -> qa_pairs
    for group in squad_data['data']:
        for paragraph in group['paragraphs']:

            # Retrieve context
            context = paragraph['context']

            for qa_pair in paragraph['qas']:

                # Retrieve question
                question = qa_pair['question']

                # Use 'answers' if not empty, otherwise fall back to 'plausible_answers'
                if qa_pair.get('answers'):
                    answer_list = qa_pair['answers']
                elif qa_pair.get('plausible_answers'):
                    answer_list = qa_pair['plausible_answers']
                else:
                    # No answer is given
                    answer_list = []

                # Retrieve just the text from each answer
                answer_list = [item['text'] for item in answer_list]

                # Remove duplicates while keeping the original order
                answer_list = list(dict.fromkeys(answer_list))

                # Add each answer to the dataset
                for answer in answer_list:
                    # Append dictionary sample to parsed squad
                    q_a_data.append({
                        'question': question,
                        'answer': answer,
                        'context': context
                    })

    return q_a_data

# Apply the same parsing to the training and dev sets
q_a_train_data = extract_qa_pairs(squad_train_data)
q_a_dev_data = extract_qa_pairs(squad_dev_data)
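
One consequence of the parsing above is that questions marked `is_impossible` are kept whenever they carry `plausible_answers`, so unanswerable questions end up in the flattened data with an answer the context does not actually support. The sketch below shows one possible way to keep that information by carrying the flag along. The record `toy_squad` is hand-made to mimic the SQuAD 2.0 layout (it is not real dataset content), and `flatten_with_flag` is a hypothetical helper, not part of this notebook's pipeline.

```python
# Hand-made record that mimics the SQuAD 2.0 layout (not real dataset content)
toy_squad = {
    "data": [{
        "title": "Toy article",
        "paragraphs": [{
            "context": "The river is 120 km long and flows north.",
            "qas": [
                {"question": "How long is the river?", "id": "toy-1",
                 "answers": [{"text": "120 km", "answer_start": 13}],
                 "is_impossible": False},
                {"question": "How deep is the river?", "id": "toy-2",
                 "answers": [],
                 "plausible_answers": [{"text": "120 km", "answer_start": 13}],
                 "is_impossible": True},
            ],
        }],
    }]
}


def flatten_with_flag(squad_data):
    """Flatten the nested layout, keeping the `is_impossible` flag of each question."""
    rows = []
    for group in squad_data["data"]:
        for paragraph in group["paragraphs"]:
            for qa_pair in paragraph["qas"]:
                # Same fallback as above: 'answers' first, then 'plausible_answers'
                candidates = qa_pair.get("answers") or qa_pair.get("plausible_answers") or []
                # dict.fromkeys removes duplicate texts while keeping their order
                for text in dict.fromkeys(item["text"] for item in candidates):
                    rows.append({
                        "question": qa_pair["question"],
                        "answer": text,
                        "context": paragraph["context"],
                        "is_impossible": qa_pair.get("is_impossible", False),
                    })
    return rows


rows = flatten_with_flag(toy_squad)
for row in rows:
    print(row["question"], "->", row["answer"], "| impossible:", row["is_impossible"])
```

With the flag available, the unanswerable rows can later be filtered out or evaluated separately, depending on what the experiment needs.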

In [5]:
q_a_train_data[0]

{'question': 'When did Beyonce start becoming popular?',
 'answer': 'in the late 1990s',
 'context': 'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".'}

In [6]:
# Save data
with open('./../../../data/squad_processed_train.json', 'w') as file:
    json.dump(q_a_train_data, file)

with open('./../../../data/squad_processed_dev.json', 'w') as file:
    json.dump(q_a_dev_data, file)

# BERT Model

In [7]:
# Import Standard Libraries
import json
from transformers import BertTokenizer, BertForQuestionAnswering, pipeline



In [8]:
with open('./../../../data/squad_processed_train.json', 'r') as file:
    q_a_train_data = json.load(file)

with open('./../../../data/squad_processed_dev.json', 'r') as file:
    q_a_dev_data = json.load(file)

In [9]:
# Instantiate the tokenizer and the model
tokenizer = BertTokenizer.from_pretrained('deepset/bert-base-cased-squad2')
model = BertForQuestionAnswering.from_pretrained('deepset/bert-base-cased-squad2')

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [10]:
# Define the pipeline
qa_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)

In [11]:
# Retrieve a few answers
answers = []

for sample in q_a_train_data[:5]:

    answer = qa_pipeline({
        'question': sample['question'],
        'context': sample['context']
    })

    answers.append({
        'True Label': sample['answer'],
        'Prediction': answer['answer'],
        'Confidence': answer['score']
    })

In [12]:
answers

[{'True Label': 'in the late 1990s',
  'Prediction': 'late 1990s',
  'Confidence': 0.5621357560157776},
 {'True Label': 'singing and dancing',
  'Prediction': 'singing and dancing',
  'Confidence': 0.9938411116600037},
 {'True Label': '2003',
  'Prediction': '(2003),',
  'Confidence': 0.9965661764144897},
 {'True Label': 'Houston, Texas',
  'Prediction': 'Houston, Texas,',
  'Confidence': 0.847782552242279},
 {'True Label': 'late 1990s',
  'Prediction': '1990s',
  'Confidence': 0.6779066324234009}]
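
The predictions above are spans of the context, each with a confidence score. Extractive readers typically get there by assigning every token a "start" score and an "end" score and then picking the best valid pair. The sketch below illustrates that idea in plain Python with made-up scores. It is a simplified assumption about how such a decoder can work, not the actual implementation of the `pipeline` used above, and the names `softmax`, `best_span` and `max_answer_len` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) of the most probable span with start <= end."""
    start_probs = softmax(start_scores)
    end_probs = softmax(end_scores)
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        # Only consider ends at or after the start, within the length limit
        for j in range(i, min(i + max_answer_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Made-up tokens and scores, for illustration only
tokens = ["rose", "to", "fame", "in", "the", "late", "1990s"]
start_scores = [0.1, 0.0, 0.2, 1.0, 0.5, 3.0, 1.5]
end_scores = [0.0, 0.1, 0.3, 0.2, 0.1, 0.5, 4.0]

start, end, confidence = best_span(start_scores, end_scores)
print(tokens[start:end + 1], round(confidence, 3))
```

Because the span boundaries are chosen at the token level, small differences such as a leading "in the" or surrounding punctuation are to be expected in the predictions.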

# Model Evaluation

## Exact Match (EM)

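Exact Match scores a prediction as 1 only if it is identical to the true label, and as 0 otherwise. The metric is strict, since an exact match is quite hard to achieve: a prediction such as `(2003),` counts as wrong against the true label `2003`, even though it contains the right answer.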

In [14]:
# Retrieve the exact matches
exact_matches = []

for answer in answers:

    if answer['Prediction'] == answer['True Label']:

        exact_matches.append(1)

    else:

        exact_matches.append(0)

print(f'Exact matches: {sum(exact_matches)/len(exact_matches)}')

Exact matches: 0.2
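
Several of the misses above differ from the true label only by punctuation. A common way to soften the metric is to normalize both strings before comparing them. The sketch below lowercases the text, strips punctuation, drops English articles and collapses whitespace, which is similar in spirit to the normalization used by the official SQuAD evaluation script. The helpers `normalize_answer` and `exact_match` are hypothetical names, and the sample pairs are copied from the predictions shown earlier.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """Return 1 if the normalized strings are identical, 0 otherwise."""
    return int(normalize_answer(prediction) == normalize_answer(truth))


# (true label, prediction) pairs copied from the predictions shown earlier
sample_answers = [
    ("in the late 1990s", "late 1990s"),
    ("singing and dancing", "singing and dancing"),
    ("2003", "(2003),"),
    ("Houston, Texas", "Houston, Texas,"),
    ("late 1990s", "1990s"),
]

normalized_matches = [exact_match(pred, truth) for truth, pred in sample_answers]
print(f"Normalized exact matches: {sum(normalized_matches) / len(normalized_matches)}")
# prints "Normalized exact matches: 0.6"
```

Normalization recovers the two punctuation-only misses, while the two predictions that drop words from the label (`late 1990s` and `1990s`) still count as wrong.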
