# **Question Answering❓**
with a fine-tuned RoBERTa model on NewsQA.  

Question answering comes in many forms. Here we look at extractive QA: answering a question about a passage by highlighting the span of the passage that contains the answer. This means fine-tuning a model that predicts a start position and an end position in the passage. More specifically, we fine-tune the [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2) model (a RoBERTa model already fine-tuned on SQuAD 2.0) on the [NewsQA](https://huggingface.co/datasets/lucadiliello/newsqa) dataset.
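
To make the "start position and end position" idea concrete, here is a minimal sketch of how a span can be chosen from per-token start and end scores. The helper `best_span`, the `max_answer_len` cap, and the toy scores are assumptions for illustration only; the notebook itself simply takes an independent argmax of each score vector, which can produce an end that precedes the start.

```python
from typing import List, Tuple

def best_span(start_logits: List[float],
              end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score,
    subject to start <= end and a maximum span length (hypothetical helper)."""
    best_score = float("-inf")
    best = (0, 0)
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

# Toy scores: an independent argmax would pick start=1, end=0 (an invalid span)
start = [0.1, 2.0, 0.3, 0.2]
end = [3.0, 0.1, 1.5, 0.4]
print(best_span(start, end))  # (1, 2)
```

Scoring start and end jointly costs a little more than two argmax calls, but it guarantees a well-formed span.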

I followed [this tutorial](https://github.com/angelosps/Question-Answering), which shows how to fine-tune BERT on SQuAD 2.0; here the data is a custom NewsQA dataset instead.

In [1]:
!pip install transformers



In [2]:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from torch.optim import AdamW  # the AdamW shipped with transformers is deprecated
from torch.utils.data import DataLoader, Dataset
import json
from tqdm import tqdm



In [3]:
!pip install datasets



In [4]:
# Load the NewsQA dataset
from datasets import load_dataset
newsqa_dataset = load_dataset('lucadiliello/newsqa')

### **Get data 📁**

Let's extract the contexts, questions, answer spans, and answer strings into plain Python lists.

In [5]:
def read_newsqa_data(dataset):
    contexts = []
    questions = []
    answers = []
    string_ans = []

    for item in dataset:
        context = item['context']
        question = item['question']
        answer = {'answer_start': item['labels'][0]['start'][0], 'answer_end': item['labels'][0]['end'][0]}  # Assuming there's only one answer
        string_answer = item['answers'][0]
        
        contexts.append(context)
        questions.append(question)
        answers.append(answer)
        string_ans.append(string_answer)
    return contexts, questions, answers, string_ans

In [6]:
train_contexts, train_questions, train_answers, train_str_ans = read_newsqa_data(newsqa_dataset['train'].select(list(range(5000))))
valid_contexts, valid_questions, valid_answers, valid_str_ans = read_newsqa_data(newsqa_dataset['validation'].select(list(range(1000))))

In [7]:
train_str_ans[:5]

['19',
 'February.',
 'rape and murder',
 'Moninder Singh Pandher',
 'Moninder Singh Pandher']

### **Tokenization 🔢**

In [8]:
# Initialize the RoBERTa tokenizer
tokenizer = AutoTokenizer.from_pretrained('deepset/roberta-base-squad2')
train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True)
valid_encodings = tokenizer(valid_contexts, valid_questions, truncation=True, padding=True)

Next we need to convert the character start/end positions into token start/end positions. Why? The model operates on tokens rather than characters, so the labels must give the index of the first and last token that contain the answer, not the character offsets in the context.
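
As a rough illustration of that mapping, the sketch below uses a plain whitespace tokenizer instead of the real subword tokenizer. The helpers `tokenize_with_offsets` and `char_to_token_index` are hypothetical stand-ins for what `encodings.char_to_token` does in the next cell, and the example sentence is only a toy input.

```python
import re
from typing import List, Optional, Tuple

def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and record each token's [start, end) character span."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def char_to_token_index(offsets: List[Tuple[str, int, int]],
                        char_pos: int) -> Optional[int]:
    """Return the index of the token covering char_pos, or None if no token
    covers it (for example, when the position falls on whitespace)."""
    for i, (_, start, end) in enumerate(offsets):
        if start <= char_pos < end:
            return i
    return None

context = "Strank was killed in action on the island"
answer = "killed in action"
tokens = tokenize_with_offsets(context)

char_start = context.index(answer)
char_end = char_start + len(answer) - 1  # inclusive index of the last character

token_start = char_to_token_index(tokens, char_start)
token_end = char_to_token_index(tokens, char_end)
print(token_start, token_end)  # 2 4
```

A position that no token covers returns `None`; with the real tokenizer this also happens when the answer is cut off by truncation, which is why the next cell substitutes a fallback index in that case.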

In [9]:
# Convert character start/end positions to token start/end positions
def add_token_positions(encodings, answers):
    start_positions = []
    end_positions = []
    for i in range(len(answers)):
        char_start = answers[i]['answer_start']
        char_end = answers[i]['answer_end']

        token_start = encodings.char_to_token(i, char_start)
        token_end = encodings.char_to_token(i, char_end)

        start_positions.append(token_start)
        end_positions.append(token_end)

        if token_start is None:
            start_positions[-1] = tokenizer.model_max_length
        if token_end is None:
            end_positions[-1] = tokenizer.model_max_length

    encodings.update({'start_positions': start_positions, 'end_positions': end_positions})

In [10]:
add_token_positions(train_encodings, train_answers)

In [11]:
add_token_positions(valid_encodings, valid_answers)

In [12]:
class NewsQA_Dataset(Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}

    def __len__(self):
        return len(self.encodings.input_ids)

## Creating the dataset using the class

In [13]:
train_dataset = NewsQA_Dataset(train_encodings)
valid_dataset = NewsQA_Dataset(valid_encodings)

In [14]:
# Create dataloaders for training and validation
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=16)

## Importing the model

In [15]:
# Initialize the RoBERTa model for question answering
model = AutoModelForQuestionAnswering.from_pretrained('deepset/roberta-base-squad2')

In [16]:
num_layers = model.config.num_hidden_layers
print(f"Number of layers: {num_layers}")

Number of layers: 12


In [17]:
num_layers_to_freeze = 6
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:num_layers_to_freeze]:
    for param in layer.parameters():
        param.requires_grad = False

In [18]:
# Check if GPU is available and move the model accordingly
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

RobertaForQuestionAnswering(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (Lay

### Model Hyperparameters

In [19]:
# Initialize the optimizer (frozen parameters receive no gradients, so they are not updated)
optimizer = AdamW(model.parameters(), lr=5e-5)
# Number of passes over the training set
num_epochs = 100



## Training the Model

In [20]:
model.train()
# Training loop
for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch in tqdm(train_loader, desc=f'Epoch {epoch + 1}', dynamic_ncols=True):
        inputs = {key: value.to(device) for key, value in batch.items()}

        # Forward pass
        outputs = model(**inputs)
        loss = outputs.loss
        total_loss += loss.item()
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Calculate and print the average loss for this epoch
    avg_loss = total_loss / len(train_loader)
    print(f'Epoch {epoch + 1} - Avg Loss: {avg_loss:.4f}')

Epoch 1: 100%|██████████| 313/313 [02:54<00:00,  1.80it/s]

Epoch 1 - Avg Loss: 2.3683
Epoch 2 - Avg Loss: 1.6652
Epoch 3 - Avg Loss: 1.3087
Epoch 4 - Avg Loss: 1.0093
Epoch 5 - Avg Loss: 0.7864
Epoch 6 - Avg Loss: 0.6383
Epoch 7 - Avg Loss: 0.5240
Epoch 8 - Avg Loss: 0.4586
Epoch 9 - Avg Loss: 0.4072
Epoch 10 - Avg Loss: 0.3764
Epoch 11 - Avg Loss: 0.3328
Epoch 12 - Avg Loss: 0.3267
Epoch 13 - Avg Loss: 0.2928
Epoch 14 - Avg Loss: 0.2733
Epoch 15 - Avg Loss: 0.2682
Epoch 16 - Avg Loss: 0.2472
Epoch 17 - Avg Loss: 0.2488
Epoch 18 - Avg Loss: 0.2349
Epoch 19 - Avg Loss: 0.2119
Epoch 20 - Avg Loss: 0.2154
Epoch 21 - Avg Loss: 0.2141
Epoch 22 - Avg Loss: 0.2040
Epoch 23 - Avg Loss: 0.2064
Epoch 24 - Avg Loss: 0.1972
Epoch 25 - Avg Loss: 0.1967
Epoch 26 - Avg Loss: 0.2019
Epoch 27 - Avg Loss: 0.1755
Epoch 28 - Avg Loss: 0.1799
Epoch 29 - Avg Loss: 0.1979
Epoch 30 - Avg Loss: 0.1800
Epoch 31 - Avg Loss: 0.1682
Epoch 32 - Avg Loss: 0.1650
Epoch 33 - Avg Loss: 0.1477
Epoch 34 - Avg Loss: 0.1589
Epoch 35 - Avg Loss: 0.1588
Epoch 36 - Avg Loss: 0.1644
Epoch 37 - Avg Loss: 0.1656
Epoch 38 - Avg Loss: 0.1628
Epoch 39 - Avg Loss: 0.1612
Epoch 40 - Avg Loss: 0.1479
Epoch 41 - Avg Loss: 0.1464
Epoch 42 - Avg Loss: 0.1583
Epoch 43 - Avg Loss: 0.1428
Epoch 44 - Avg Loss: 0.1507
Epoch 45 - Avg Loss: 0.1443
Epoch 46 - Avg Loss: 0.1502
Epoch 47 - Avg Loss: 0.1395
Epoch 48 - Avg Loss: 0.1349
Epoch 49 - Avg Loss: 0.1431
Epoch 50 - Avg Loss: 0.1311
Epoch 51 - Avg Loss: 0.1412
Epoch 52 - Avg Loss: 0.1394
Epoch 53 - Avg Loss: 0.1348
Epoch 54 - Avg Loss: 0.1330
Epoch 55 - Avg Loss: 0.1378
Epoch 56 - Avg Loss: 0.1286
Epoch 57 - Avg Loss: 0.1211
Epoch 58 - Avg Loss: 0.1196
Epoch 59 - Avg Loss: 0.1286
Epoch 60 - Avg Loss: 0.1452
Epoch 61 - Avg Loss: 0.1323
Epoch 62 - Avg Loss: 0.1142
Epoch 63 - Avg Loss: 0.1117
Epoch 64 - Avg Loss: 0.1270
Epoch 65 - Avg Loss: 0.1257
Epoch 66 - Avg Loss: 0.1300
Epoch 67 - Avg Loss: 0.1190
Epoch 68 - Avg Loss: 0.1231
Epoch 69 - Avg Loss: 0.1189
Epoch 70 - Avg Loss: 0.1220
Epoch 71 - Avg Loss: 0.1286
Epoch 72 - Avg Loss: 0.1095
Epoch 73 - Avg Loss: 0.1231
Epoch 74 - Avg Loss: 0.1256
Epoch 75 - Avg Loss: 0.1213
Epoch 76 - Avg Loss: 0.1095
Epoch 77 - Avg Loss: 0.1279
Epoch 78 - Avg Loss: 0.1153
Epoch 79 - Avg Loss: 0.1196
Epoch 80 - Avg Loss: 0.1151
Epoch 81 - Avg Loss: 0.1055
Epoch 82 - Avg Loss: 0.1119
Epoch 83 - Avg Loss: 0.1254
Epoch 84 - Avg Loss: 0.1030
Epoch 85 - Avg Loss: 0.1110
Epoch 86 - Avg Loss: 0.1130
Epoch 87 - Avg Loss: 0.1184
Epoch 88 - Avg Loss: 0.1005
Epoch 89 - Avg Loss: 0.1134
Epoch 90 - Avg Loss: 0.1186
Epoch 91 - Avg Loss: 0.1100
Epoch 92 - Avg Loss: 0.1034
Epoch 93 - Avg Loss: 0.1128
Epoch 94 - Avg Loss: 0.1114
Epoch 95 - Avg Loss: 0.1025
Epoch 96 - Avg Loss: 0.1095
Epoch 97 - Avg Loss: 0.1183
Epoch 98 - Avg Loss: 0.1041
Epoch 99 - Avg Loss: 0.1100
Epoch 100 - Avg Loss: 0.1037

## Saving the Model

In [21]:
# Save the fine-tuned model if needed
model.save_pretrained('local_fine_tuned_roberta_on_newsqa')
tokenizer.save_pretrained('local_fine_tuned_roberta_on_newsqa')

('local_fine_tuned_roberta_on_newsqa/tokenizer_config.json',
 'local_fine_tuned_roberta_on_newsqa/special_tokens_map.json',
 'local_fine_tuned_roberta_on_newsqa/vocab.json',
 'local_fine_tuned_roberta_on_newsqa/merges.txt',
 'local_fine_tuned_roberta_on_newsqa/added_tokens.json',
 'local_fine_tuned_roberta_on_newsqa/tokenizer.json')

In [22]:
# Reload the fine-tuned tokenizer and model from disk
fine_tuned_tokenizer = AutoTokenizer.from_pretrained('local_fine_tuned_roberta_on_newsqa')
fine_tuned_model = AutoModelForQuestionAnswering.from_pretrained('local_fine_tuned_roberta_on_newsqa')

In [23]:
fine_tuned_model.to(device)


## Inference

In [24]:
# Perform inference
question = "What war was the Iwo Jima battle a part of?"
context = "One of the Marines shown in a famous World War II photograph raising the U.S. flag on Iwo Jima was posthumously awarded a certificate of U.S. citizenship on Tuesday.\n\nThe Marine Corps War Memorial in Virginia depicts Strank and five others raising a flag on Iwo Jima.\n\nSgt. Michael Strank, who was born in Czechoslovakia and came to the United States when he was 3, derived U.S. citizenship when his father was naturalized in 1935. However, U.S. Citizenship and Immigration Services recently discovered that Strank never was given citizenship papers.\n\nAt a ceremony Tuesday at the Marine Corps Memorial -- which depicts the flag-raising -- in Arlington, Virginia, a certificate of citizenship was presented to Strank\'s younger sister, Mary Pero.\n\nStrank and five other men became national icons when an Associated Press photographer captured the image of them planting an American flag on top of Mount Suribachi on February 23, 1945.\n\nStrank was killed in action on the island on March 1, 1945, less than a month before the battle between Japanese and U.S. forces there ended.\n\nJonathan Scharfen, the acting director of CIS, presented the citizenship certificate Tuesday.\n\nHe hailed Strank as a true American hero and a wonderful example of the remarkable contribution and sacrifices that immigrants have made to our great republic throughout its history."

In [25]:
# Tokenize the question and passage
inputs = fine_tuned_tokenizer(question, context, return_tensors="pt").to(device)

# Perform inference
with torch.no_grad():
    outputs = fine_tuned_model(**inputs)
    start_idx = torch.argmax(outputs.start_logits)  # most likely first answer token
    end_idx = torch.argmax(outputs.end_logits) + 1  # +1 so the slice includes the last token

# Decode the answer tokens back to text
answer = fine_tuned_tokenizer.convert_tokens_to_string(fine_tuned_tokenizer.convert_ids_to_tokens(inputs["input_ids"][0][start_idx:end_idx]))

print("Question:", question)
print("Answer:", answer)

Question: What war was the Iwo Jima battle a part of?
Answer:  II


In [26]:
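def get_prediction(context, question):
  # Use the reloaded model: it is in eval mode (dropout off), whereas `model` is still in train mode
  inputs = fine_tuned_tokenizer(question, context, truncation=True, return_tensors='pt').to(device)
  with torch.no_grad():
    outputs = fine_tuned_model(**inputs)

  # Most likely start token and (exclusive) end token
  answer_start = torch.argmax(outputs.start_logits)
  answer_end = torch.argmax(outputs.end_logits) + 1

  answer = fine_tuned_tokenizer.convert_tokens_to_string(fine_tuned_tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][answer_start:answer_end]))

  return answer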

def normalize_text(s):
  """Lowercase the text, strip punctuation and articles, and collapse whitespace."""
  import string, re
  def remove_articles(text):
    regex = re.compile(r"\b(a|an|the)\b", re.UNICODE)
    return re.sub(regex, " ", text)
  def white_space_fix(text):
    return " ".join(text.split())
  def remove_punc(text):
    exclude = set(string.punctuation)
    return "".join(ch for ch in text if ch not in exclude)
  def lower(text):
    return text.lower()

  return white_space_fix(remove_articles(remove_punc(lower(s))))

def exact_match(prediction, truth):
    return bool(normalize_text(prediction) == normalize_text(truth))

def compute_f1(prediction, truth):
  pred_tokens = normalize_text(prediction).split()
  truth_tokens = normalize_text(truth).split()

  # if either the prediction or the truth is no-answer then f1 = 1 if they agree, 0 otherwise
  if len(pred_tokens) == 0 or len(truth_tokens) == 0:
    return int(pred_tokens == truth_tokens)

  common_tokens = set(pred_tokens) & set(truth_tokens)

  # if there are no common tokens then f1 = 0
  if len(common_tokens) == 0:
    return 0

  prec = len(common_tokens) / len(pred_tokens)
  rec = len(common_tokens) / len(truth_tokens)

  return round(2 * (prec * rec) / (prec + rec), 2)

def question_answer(context, question, answer):
  prediction = get_prediction(context, question)
  em_score = exact_match(prediction, answer)
  f1_score = compute_f1(prediction, answer)

  print(f'Question: {question}')
  print(f'Prediction: {prediction}')
  print(f'True Answer: {answer}')
  print(f'Exact match: {em_score}')
  print(f'F1 score: {f1_score}\n')
    
  return f1_score
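
One detail worth noting about `compute_f1` above: it measures overlap with a set intersection, so a token that appears several times in both strings is counted once. SQuAD-style evaluation typically counts overlap with multisets instead. The sketch below shows that variant under the assumption that the inputs are already normalized token lists; `multiset_f1` is a hypothetical name, not part of the notebook.

```python
from collections import Counter
from typing import List

def multiset_f1(pred_tokens: List[str], truth_tokens: List[str]) -> float:
    """Token-level F1 where repeated tokens count as many times as they match."""
    # If either side is empty, F1 is 1 only when both are empty
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)

    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

pred = "cat cat dog".split()
truth = "cat cat".split()
score = multiset_f1(pred, truth)
print(round(score, 2))  # 0.8 (the set-based version gives 0.4 for the same pair)
```

For short answers without repeated words the two variants agree, so the difference only matters for longer or repetitive spans.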

In [27]:
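f1 = 0
# NOTE: the output recorded below was produced with the loop variable misnamed `contexts`,
# so every question was scored against the global `context` (the Iwo Jima passage) from the inference cell.
for context, question, answer in zip(valid_contexts, valid_questions, valid_str_ans):
    f1 += question_answer(context, question, answer)
avg_f1_score = f1 / len(valid_questions)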

Question: What will be nominated?
Prediction: Sgt. Michael Strank,
True Answer: three different videos
Exact match: False
F1 score: 0

Question: What does the Harrison Ford video feature?
Prediction:  the flag-raising
True Answer: getting his chest waxed,
Exact match: False
F1 score: 0

Question: What videos will you send?
Prediction: 
True Answer: environmental
Exact match: False
F1 score: 0

Question: What is Ford getting waxed?
Prediction: 
True Answer: his chest
Exact match: False
F1 score: 0

Question: Who got his chest waxed?
Prediction: Sgt. Michael Strank,
True Answer: Harrison Ford
Exact match: False
F1 score: 0

Question: How do you send in your video?
Prediction:  planting an American flag
True Answer: Use the iReport form
Exact match: False
F1 score: 0

Question: What type of videos should you nominate?
Prediction:  famous World
True Answer: think are the best.
Exact match: False
F1 score: 0

Question: What did Steve Bruce describe Amire Zaki as?
Prediction: 
True Answer: u

In [28]:
print(f"Average F1 score={avg_f1_score}")

Average F1 score=0.004669999999999997
