# Question Answering - BERT

Tutorial: https://towardsdatascience.com/question-answering-with-bert-xlnet-xlm-and-distilbert-using-simple-transformers-4d8785ee762a

In [1]:
import json
from multiprocessing import cpu_count
from simpletransformers.question_answering import QuestionAnsweringModel



In [None]:
%%writefile setup.sh
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./

Writing setup.sh


In [None]:
!sh setup.sh

## 1. Load Data

In [7]:
train_file = 'data/train-v2.0.json'
dev_file = 'data/dev-v2.0.json'

# Number of paragraphs (not questions) to keep from each split;
# each paragraph can carry several questions.
n_samples = 500

with open(train_file, 'r') as f:
    train_data = json.load(f)

with open(dev_file, 'r') as f:
    dev_data = json.load(f)

# Flatten topics -> paragraphs and keep the first n_samples paragraphs.
train_data = [item for topic in train_data['data'] for item in topic['paragraphs']][:n_samples]
dev_data = [item for topic in dev_data['data'] for item in topic['paragraphs']][:n_samples]

In [8]:
# Explore context and Q&A
train_data[0]

{'qas': [{'question': 'When did Beyonce start becoming popular?',
   'id': '56be85543aeaaa14008c9063',
   'answers': [{'text': 'in the late 1990s', 'answer_start': 269}],
   'is_impossible': False},
  {'question': 'What areas did Beyonce compete in when she was growing up?',
   'id': '56be85543aeaaa14008c9065',
   'answers': [{'text': 'singing and dancing', 'answer_start': 207}],
   'is_impossible': False},
  {'question': "When did Beyonce leave Destiny's Child and become a solo singer?",
   'id': '56be85543aeaaa14008c9066',
   'answers': [{'text': '2003', 'answer_start': 526}],
   'is_impossible': False},
  {'question': 'In what city and state did Beyonce  grow up? ',
   'id': '56bf6b0f3aeaaa14008c9601',
   'answers': [{'text': 'Houston, Texas', 'answer_start': 166}],
   'is_impossible': False},
  {'question': 'In which decade did Beyonce become famous?',
   'id': '56bf6b0f3aeaaa14008c9602',
   'answers': [{'text': 'late 1990s', 'answer_start': 276}],
   'is_impossible': False},
  {'q
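
Each `answer_start` in the record above is a character offset into the paragraph's `context` string (the `context` key is cut off in the truncated output, but it is part of the SQuAD v2.0 layout). A minimal sketch of a sanity check on those offsets follows. It runs on a small made-up paragraph rather than the real data, and `check_answer_spans` is a hypothetical helper name.

```python
def check_answer_spans(paragraph):
    """Return ids of questions whose answer text is not found at answer_start."""
    context = paragraph['context']
    bad_ids = []
    for qa in paragraph['qas']:
        # Unanswerable questions (is_impossible=True) have nothing to check.
        if qa.get('is_impossible', False):
            continue
        for answer in qa['answers']:
            start = answer['answer_start']
            end = start + len(answer['text'])
            if context[start:end] != answer['text']:
                bad_ids.append(qa['id'])
                break
    return bad_ids


# Toy paragraph in the same layout as train_data[0] (made-up content).
toy_paragraph = {
    'context': 'Ada Lovelace was born in London in 1815.',
    'qas': [
        {'id': 'q1', 'question': 'Where was Ada Lovelace born?',
         'answers': [{'text': 'London', 'answer_start': 25}],
         'is_impossible': False},
        {'id': 'q2', 'question': 'When was Ada Lovelace born?',
         # Deliberately wrong offset so the check has something to report.
         'answers': [{'text': '1815', 'answer_start': 0}],
         'is_impossible': False},
        {'id': 'q3', 'question': 'Who was her tutor?',
         'answers': [], 'is_impossible': True},
    ],
}

print(check_answer_spans(toy_paragraph))
```

On the real data, the same function could be applied to every item in `train_data` before training.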

## 2. Train Q&A

In [None]:
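train_args = {
    'learning_rate': 3e-5,
    'num_train_epochs': 2,
    'max_seq_length': 384,             # maximum input length in tokens
    'doc_stride': 128,                 # stride between windows when a long context is split
    'overwrite_output_dir': True,
    'reprocess_input_data': False,     # reuse cached features when they exist
    'train_batch_size': 2,
    'gradient_accumulation_steps': 8,  # effective batch size: 2 * 8 = 16
    'output_dir': 'outputs/',          # checkpoints and the final model
    'cache_dir': 'cache_dir/'          # cached features
}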

model = QuestionAnsweringModel('bert', 'bert-base-cased', args=train_args)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-cased and a
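
`max_seq_length` and `doc_stride` in `train_args` matter when a paragraph is longer than the model input: SQuAD-style preprocessing then splits the paragraph into overlapping windows, and `doc_stride` is how far each window start advances. The sketch below shows the idea on plain token lists. It is an illustration, not the library's code: `sliding_windows` is a hypothetical helper, and it ignores the question and special tokens that also count toward `max_seq_length` in the real pipeline.

```python
def sliding_windows(tokens, max_len, stride):
    """Split tokens into overlapping windows of at most max_len tokens.

    Each window starts `stride` tokens after the previous one, so full
    neighbouring windows share max_len - stride tokens when stride < max_len.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError('max_len and stride must be positive')
    windows = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + max_len]
        windows.append(window)
        # Stop once a window reaches the end of the token list.
        if start + len(window) == len(tokens):
            break
        start += min(len(window), stride)
    return windows


# Small numbers so the overlap is easy to see.
for window in sliding_windows(list(range(10)), max_len=4, stride=2):
    print(window)
```

With the values used here (384 and 128), a 500-token paragraph would give two windows under this simplified scheme; a paragraph that fits in one window is left whole.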

In [None]:
model.train_model(train_data)

convert squad examples to features: 100%|██████████| 3840/3840 [00:33<00:00, 115.15it/s]
add example index and unique id: 100%|██████████| 3840/3840 [00:00<00:00, 698717.08it/s]


Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic


Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
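
The `Gradient overflow` lines above come from dynamic loss scaling (`loss_scale : dynamic` in the Apex settings): when the scaled fp16 gradients overflow, the optimizer step is skipped and the scale is halved, which matches the 32768.0 and 16384.0 values in the log. Below is a toy model of that behaviour, not Apex's implementation. The class name, the starting scale of 65536, and the growth window of 2000 steps are assumptions made for illustration.

```python
class ToyDynamicLossScaler:
    """Toy dynamic loss scaler: halve on overflow, grow after a clean run."""

    def __init__(self, init_scale=65536.0, factor=2.0, growth_window=2000):
        self.scale = init_scale
        self.factor = factor
        self.growth_window = growth_window  # assumed value, for illustration
        self.clean_steps = 0

    def update(self, overflow):
        """Record one step; return True if the optimizer step should run."""
        if overflow:
            # Gradients contained inf/nan: shrink the scale and skip the step.
            self.scale /= self.factor
            self.clean_steps = 0
            return False
        self.clean_steps += 1
        if self.clean_steps == self.growth_window:
            # A long run without overflow: try a larger scale again.
            self.scale *= self.factor
            self.clean_steps = 0
        return True


scaler = ToyDynamicLossScaler()
for overflow in [True, True, False]:
    stepped = scaler.update(overflow)
    print(f'overflow={overflow} stepped={stepped} scale={scaler.scale}')
```

A couple of skipped steps early in training is the usual way the scaler finds a workable scale; it is not an error in itself.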






## 3. Prediction

In [10]:
model = QuestionAnsweringModel('bert', 'outputs/', use_cuda=False)

In [11]:
context = """

  The Apollo program, also known as Project Apollo, was the third United States 
  human spaceflight program carried out by the National Aeronautics and Space 
  Administration (NASA), which accomplished landing the first humans on the Moon 
  from 1969 to 1972. First conceived during Dwight D. Eisenhower's administration 
  as a three-man spacecraft to follow the one-man Project Mercury which put the 
  first Americans in space, Apollo was later dedicated to President John F. Kennedy's 
  national goal of "landing a man on the Moon and returning him safely to the Earth" 
  by the end of the 1960s, which he proposed in a May 25, 1961, address to Congress. 
  Project Mercury was followed by the two-man Project Gemini (1962–66). The first 
  manned flight of Apollo was in 1968.

"""


question1 = "What year did the first manned Apollo flight occur?"
question2 = "What is the name of the president that envisioned the project?"
question3 = "Who set the goal of the project?"

In [12]:
qa = [  
      {  
          'context': context,  
          'qas': [  
              {'id': '0', 'question': question1},  
              {'id': '1', 'question': question2},
              {'id': '2', 'question': question3}  
          ]  
      }  
    ]
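
The nested structure above mirrors the training data, minus the answers. A small helper can build it from one context and a list of questions. This is only a convenience sketch: `build_qa_input` is a hypothetical name, and using the list index as the `id` simply follows the cell above.

```python
def build_qa_input(context, questions):
    """Wrap one context and its questions in the list-of-dicts prediction format."""
    return [{
        'context': context,
        'qas': [
            {'id': str(i), 'question': question}
            for i, question in enumerate(questions)
        ],
    }]


example = build_qa_input(
    'The first manned flight of Apollo was in 1968.',
    ['What year did the first manned Apollo flight occur?'],
)
print(example)
```

With the variables defined earlier, `build_qa_input(context, [question1, question2, question3])` would produce the same structure as `qa`.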

In [13]:
model.predict(qa, n_best_size=1)

convert squad examples to features: 100%|██████████| 3/3 [00:00<00:00, 42.65it/s]
add example index and unique id: 100%|██████████| 3/3 [00:00<00:00, 4379.71it/s]


([{'id': '0', 'answer': ['1968', '']},
  {'id': '1', 'answer': ['Dwight D. Eisenhower', '']},
  {'id': '2', 'answer': ['President John F. Kennedy', '']}],
 [{'id': '0', 'probability': [0.9999999710069993, 2.8993000642041257e-08]},
  {'id': '1', 'probability': [0.9999994344280717, 5.655719282884439e-07]},
  {'id': '2', 'probability': [0.9999998544126524, 1.4558734768641542e-07]}])
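
The output above is a pair of parallel lists: one with answer candidates and one with their probabilities. The sketch below pairs them by `id` and keeps the first candidate for each question. It assumes, as the output suggests, that candidates are ordered best first. `top_answers` is a hypothetical helper, and the literals are copied from the output above.

```python
def top_answers(answers, probabilities):
    """Map each question id to its first (answer, probability) pair."""
    prob_by_id = {item['id']: item['probability'] for item in probabilities}
    return {
        item['id']: (item['answer'][0], prob_by_id[item['id']][0])
        for item in answers
    }


answers = [
    {'id': '0', 'answer': ['1968', '']},
    {'id': '1', 'answer': ['Dwight D. Eisenhower', '']},
    {'id': '2', 'answer': ['President John F. Kennedy', '']},
]
probabilities = [
    {'id': '0', 'probability': [0.9999999710069993, 2.8993000642041257e-08]},
    {'id': '1', 'probability': [0.9999994344280717, 5.655719282884439e-07]},
    {'id': '2', 'probability': [0.9999998544126524, 1.4558734768641542e-07]},
]

for qid, (text, prob) in top_answers(answers, probabilities).items():
    print(qid, text, prob)
```

In the notebook, the two lists could be taken directly from the call, for example `answers, probabilities = model.predict(qa, n_best_size=1)`.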