# Question Answering with Hugging Face

This notebook uses Hugging Face's BERT implementation on the SQuAD 2.0 dataset to answer questions. Each example in the dataset pairs a context paragraph with questions whose answers must be extracted from that paragraph. Version 2.0 also includes unanswerable questions, so a successful model must determine when no answer can be found in the context paragraph. The original SQuAD paper and the SQuAD 2.0 paper are linked below.

https://arxiv.org/pdf/1606.05250.pdf


https://arxiv.org/pdf/1806.03822.pdf




In [0]:
!pip install tensorflow
!pip install transformers
!pip install tensorflow_datasets
!pip install simpletransformers
!pip install apex



# Example: Question Answering

In the following cell, we:

**1.** Import everything from the `transformers` library, as well as `tensorflow` and `tensorflow_datasets`.
<br>
<br>
**2.** Use the Simple Transformers library. It is built on top of Hugging Face's `transformers` and simplifies some of the steps while still allowing customization. https://github.com/ThilinaRajapakse/simpletransformers
<br>
<br>
**3.** Use the question answering model from Simple Transformers.

https://github.com/ThilinaRajapakse/simpletransformers#question-answering

<br>
<br>
**4.** Use the SQuAD 2.0 dataset (Stanford Question Answering Dataset), training on `train-v2.0.json`. For more information, see https://rajpurkar.github.io/SQuAD-explorer/

In [0]:
%tensorflow_version 2.x

In [0]:
import tensorflow as tf
from transformers import * 
import tensorflow_datasets
import torch
import os
import pandas as pd
from simpletransformers.question_answering import QuestionAnsweringModel






In [0]:
train_args = {
    'learning_rate': 3e-5,
    'num_train_epochs': 2,
    'max_seq_length': 384,
    'doc_stride': 128,
    'overwrite_output_dir': True,
    'reprocess_input_data': False,
    'train_batch_size': 2,
    'gradient_accumulation_steps': 8,
}

model = QuestionAnsweringModel('bert', 'bert-base-cased', args=train_args)

For the curious, here are the default hyperparameters of the model:

```
self.args = {
    'output_dir': 'outputs/',
    'cache_dir': 'cache_dir/',

    'fp16': True,
    'fp16_opt_level': 'O1',
    'max_seq_length': 512,
    'train_batch_size': 8,
    'gradient_accumulation_steps': 1,
    'eval_batch_size': 8,
    'num_train_epochs': 1,
    'weight_decay': 0,
    'learning_rate': 4e-5,
    'adam_epsilon': 1e-8,
    'warmup_ratio': 0.06,
    'warmup_steps': 0,
    'max_grad_norm': 1.0,

    'logging_steps': 50,
    'save_steps': 2000,
    'evaluate_during_training': False,

    'overwrite_output_dir': False,
    'reprocess_input_data': False,

    'process_count': cpu_count() - 2 if cpu_count() > 2 else 1,
    'n_gpu': 1,
    'silent': False,

    'doc_stride': 384,
    'max_query_length': 64,
    'n_best_size': 20,
    'max_answer_length': 100,
    'null_score_diff_threshold': 0.0
}
```
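
Two of the values overridden in `train_args` are easier to reason about with a little arithmetic. The sketch below shows that a `train_batch_size` of 2 with 8 `gradient_accumulation_steps` gives 2 × 8 = 16 examples per optimizer update, and that a long context is covered by overlapping windows whose start positions advance by `doc_stride` tokens. The helper names and the figure of 350 context tokens per window are hypothetical and chosen only for illustration; the library's own windowing also has to reserve room for the question and special tokens.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps):
    """Number of examples contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps


def window_starts(num_context_tokens, window_size, doc_stride):
    """Start offsets of overlapping windows that together cover the context.

    Illustrative helper (not a simpletransformers function): each window
    holds up to `window_size` context tokens and the next window starts
    `doc_stride` tokens later.
    """
    starts = []
    start = 0
    while True:
        starts.append(start)
        # Stop once the current window reaches the end of the context.
        if start + window_size >= num_context_tokens:
            break
        start += doc_stride
    return starts


# Values from train_args above.
print(effective_batch_size(2, 8))

# Assumption: roughly 350 of the 384 positions are left for context tokens.
print(window_starts(800, 350, 128))
```

A smaller `doc_stride` produces more, more heavily overlapping windows per paragraph, which increases the number of training features and therefore the training time.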

## To access the data, go to https://rajpurkar.github.io/SQuAD-explorer/ and download the SQuAD 2.0 training set (`train-v2.0.json`) and dev set (`dev-v2.0.json`).

In [0]:
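import json

# Load the SQuAD 2.0 training file downloaded from the SQuAD explorer page
with open('train-v2.0.json', 'r') as f:
    train = json.load(f)

# Flatten the topics into one list of paragraphs (each has 'context' and 'qas')
train_dataset = [item for topic in train['data'] for item in topic['paragraphs']]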
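
To make the flattening step concrete, here is a minimal sketch on a tiny hand-written dictionary that mimics the nesting of the SQuAD JSON file: a top-level `data` list of topics, each with a list of `paragraphs`, each holding a `context` string and a `qas` list. The titles, contexts, and ids below are invented for illustration and are not taken from the real dataset.

```python
# Toy stand-in for the loaded JSON (contents are made up).
squad_like = {
    "version": "v2.0",
    "data": [
        {
            "title": "Topic_A",
            "paragraphs": [
                {
                    "context": "Alpha is first.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is first?",
                            "is_impossible": False,
                            "answers": [{"text": "Alpha", "answer_start": 0}],
                        }
                    ],
                },
                {"context": "Beta is second.", "qas": []},
            ],
        },
        {
            "title": "Topic_B",
            "paragraphs": [{"context": "Gamma is third.", "qas": []}],
        },
    ],
}

# Same comprehension as in the cell above: drop the topic level and
# keep one flat list of paragraph dictionaries.
flat = [item for topic in squad_like["data"] for item in topic["paragraphs"]]

print(len(flat))
print(sorted(flat[0].keys()))
```

The topic titles are discarded by the flattening, which is fine here because the model only needs each paragraph's context and its questions.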


In [0]:
train

In [0]:
type(train_dataset)


list

In [0]:
model.train_model(train_dataset)

In [0]:
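# Load the SQuAD 2.0 dev file and flatten it the same way as the training data
with open('dev-v2.0.json', 'r') as f:
    dev_data = json.load(f)

dev_data = [item for topic in dev_data['data'] for item in topic['paragraphs']]

# Predict an answer for every question in the dev set
preds = model.predict(dev_data)

os.makedirs('results', exist_ok=True)

# Map each question id to its predicted answer text
submission = {pred['id']: pred['answer'] for pred in preds}

with open('results/submission.json', 'w') as f:
    json.dump(submission, f)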

Converting to features started.


100%|██████████| 11873/11873 [01:22<00:00, 143.82it/s]
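
The submission file written above is a single JSON object mapping each question id to a predicted answer string. The sketch below builds the same mapping from two made-up predictions and lists the ids predicted as unanswerable. It relies on the convention of the official SQuAD 2.0 evaluation script that an empty string means "no answer"; whether the model in this notebook emits empty strings for such questions is an assumption to check against the actual `preds`.

```python
import json

# Toy predictions in the shape used above: dicts with 'id' and 'answer'.
toy_preds = [
    {"id": "q1", "answer": "France"},
    {"id": "q2", "answer": ""},  # empty string: predicted as unanswerable
]

# Same dict comprehension as in the notebook cell.
submission = {pred["id"]: pred["answer"] for pred in toy_preds}

# Ids for which no answer was predicted.
no_answer_ids = [qid for qid, answer in submission.items() if answer == ""]

serialized = json.dumps(submission)
print(serialized)
print(no_answer_ids)
```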






In [0]:
# Both inputs are lists of dicts, so the DataFrame constructor can be used directly
test_df = pd.DataFrame(dev_data)
preds_df = pd.DataFrame(preds)

In [0]:
test_df


| | qas | context |
|---|---|---|
| 0 | [{'question': 'In what country is Normandy loc... | The Normans (Norman: Nourmands; French: Norman... |
| 1 | [{'question': 'Who was the duke in the battle ... | The Norman dynasty had a major political, cult... |
| 2 | [{'question': 'What is the original meaning of... | The English name "Normans" comes from the Fren... |
| 3 | [{'question': 'When was the Duchy of Normandy ... | In the course of the 10th century, the initial... |
| 4 | [{'question': 'Who upon arriving gave the orig... | Before Rollo's arrival, its populations did no... |
| ... | ... | ... |
| 1199 | [{'question': 'Where does centripetal force go... | where is the mass of the object, is the velo... |
| 1200 | [{'question': 'What is the only form potential... | A conservative force that acts on a closed sys... |
| 1201 | [{'question': 'What is sometimes impossible to... | For certain physical scenarios, it is impossib... |
| 1202 | [{'question': 'In what treatment are nonconser... | The connection between macroscopic nonconserva... |


In [0]:
preds_df

| | id | answer |
|---|---|---|
| 0 | 56ddde6b9a695914005b9628 | Francia. The distinct cultural and ethnic iden... |
| 1 | 56ddde6b9a695914005b9629 | Francia. The distinct cultural and ethnic iden... |
| 2 | 56ddde6b9a695914005b962a | Francia. The distinct cultural and ethnic iden... |
| 3 | 56ddde6b9a695914005b962b | Francia. The distinct cultural and ethnic iden... |
| 4 | 56ddde6b9a695914005b962c | Francia. The distinct cultural and ethnic iden... |
| ... | ... | ... |
| 11868 | 5737aafd1c456719005744ff | arcane units of force include the sthène, whic... |
| 11869 | 5ad28ad0d7d075001a4299cc | recated; however it still sees use for some pu... |
| 11870 | 5ad28ad0d7d075001a4299cd | arcane units of force include the sthène, whic... |
| 11871 | 5ad28ad0d7d075001a4299ce | recated; however it still sees use for some pu... |


In [0]:
test_df['qas'].iloc[0]

[{'answers': [{'answer_start': 0, 'text': ' '}],
  'id': '56ddde6b9a695914005b9628',
  'is_impossible': False,
  'question': 'In what country is Normandy located?'},
 {'answers': [{'answer_start': 0, 'text': ' '}],
  'id': '56ddde6b9a695914005b9629',
  'is_impossible': False,
  'question': 'When were the Normans in Normandy?'},
 {'answers': [{'answer_start': 0, 'text': ' '}],
  'id': '56ddde6b9a695914005b962a',
  'is_impossible': False,
  'question': 'From which countries did the Norse originate?'},
 {'answers': [{'answer_start': 0, 'text': ' '}],
  'id': '56ddde6b9a695914005b962b',
  'is_impossible': False,
  'question': 'Who was the Norse leader?'},
 {'answers': [{'answer_start': 0, 'text': ' '}],
  'id': '56ddde6b9a695914005b962c',
  'is_impossible': False,
  'question': 'What century did the Normans first gain their separate identity?'},
 {'answers': [{'answer_start': 0, 'text': ' '}],
  'id': '5ad39d53604f3c001a3fe8d1',
  'is_impossible': True,
  'plausible_answers': [{'answer_sta

In [0]:
test_df['context'].iloc[0]

'The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.'

In [0]:
preds_df['answer'].iloc[0]

'Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of'
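
Checking one row at a time gets slow across thousands of questions. As a minimal sketch, the questions and predictions can be lined up by their shared `id`. The paragraph, questions, and predictions below are invented stand-ins with the same shape as `dev_data` and `preds`; the variable names are illustrative only.

```python
# Toy stand-ins shaped like dev_data (list of paragraphs) and preds.
paragraphs = [
    {
        "context": "Alpha is the first letter of the Greek alphabet.",
        "qas": [
            {"id": "q1", "question": "What is the first Greek letter?",
             "is_impossible": False},
            {"id": "q2", "question": "Who invented the alphabet?",
             "is_impossible": True},
        ],
    }
]
toy_preds = [{"id": "q1", "answer": "Alpha"}, {"id": "q2", "answer": ""}]

# Look up predictions by question id.
answer_by_id = {pred["id"]: pred["answer"] for pred in toy_preds}

# One row per question, with its prediction alongside (None if missing).
rows = [
    {
        "id": qa["id"],
        "question": qa["question"],
        "is_impossible": qa["is_impossible"],
        "prediction": answer_by_id.get(qa["id"]),
    }
    for paragraph in paragraphs
    for qa in paragraph["qas"]
]

for row in rows:
    print(row["id"], "|", row["question"], "->", repr(row["prediction"]))
```

Passing `rows` to `pd.DataFrame` gives a single table in which each question sits next to its predicted answer, which makes it easier to scan for mismatches.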