### USEFUL LINKS
1. [Huggingface Documentation for Fine-tuning with Custom Datasets](https://huggingface.co/transformers/custom_datasets.html?fbclid=IwAR0HHEEUFfsT9wUkTb-l_nYTxfH2Twq0j99NDfw0WdhEAkgq7NFx_U7eTbQ)
2. [Going through SQuAD v2.0 Dataset](https://towardsdatascience.com/how-to-fine-tune-a-q-a-transformer-86f91ec92997) 

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Download Dataset

In [None]:
import os

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
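# makedirs also creates missing parent folders and does not fail if the folder already exists
os.makedirs('/content/drive/MyDrive/Dataset/squad', exist_ok=True)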

- [SQuAD Repo](https://rajpurkar.github.io/SQuAD-explorer/)
- Our [Training Dataset](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json) and [Testing Dataset](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)

In [None]:
url = 'https://rajpurkar.github.io/SQuAD-explorer/dataset/'

In [None]:
# for making requests to the url
import requests

Download the dataset files and save them into the Drive directory

In [None]:
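for file in ['train-v2.0.json', 'dev-v2.0.json']:
  res = requests.get(f'{url}{file}', stream=True) # request the file, streaming the body
  res.raise_for_status() # stop early if the download failed
  
  # write the response to drive in chunks (a 4-byte chunk size is needlessly slow)
  with open(f'/content/drive/MyDrive/Dataset/squad/{file}', 'wb') as f:
    for chunk in res.iter_content(chunk_size=8192):
      f.write(chunk)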
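
The download cell writes the response body to disk chunk by chunk. Below is a minimal sketch of that pattern in isolation. `write_chunks` is a hypothetical helper (not part of `requests`), and the byte chunks are a stand-in for what `res.iter_content(...)` would yield, so the cell runs without network access.

```python
import os
import tempfile

def write_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path and return the byte count."""
    total = 0
    with open(dest_path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total

# Stand-in for res.iter_content(...): three small byte chunks.
fake_chunks = [b'{"version": ', b'"v2.0", ', b'"data": []}']
dest = os.path.join(tempfile.mkdtemp(), 'toy.json')
written = write_chunks(fake_chunks, dest)
print(written, os.path.getsize(dest))
```

Comparing the number of bytes written with the size on disk is a cheap sanity check before parsing the file with `json.load`.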

## Data Preparation

In [None]:
import json

In [None]:
# opening our dataset
with open('/content/drive/MyDrive/Dataset/squad/train-v2.0.json', 'rb') as f: # reading binary
    squad_dict = json.load(f) # load it into a dictionary
squad_dict.keys()


dict_keys(['version', 'data'])

In [None]:
# extracting title
for group in squad_dict['data']:
  print(group['title'])

Beyoncé
Frédéric_Chopin
Sino-Tibetan_relations_during_the_Ming_dynasty
IPod
The_Legend_of_Zelda:_Twilight_Princess
Spectre_(2015_film)
2008_Sichuan_earthquake
New_York_City
To_Kill_a_Mockingbird
Solar_energy
Kanye_West
Buddhism
American_Idol
Dog
2008_Summer_Olympics_torch_relay
Genome
Comprehensive_school
Republic_of_the_Congo
Prime_minister
Institute_of_technology
Wayback_Machine
Dutch_Republic
Symbiosis
Canadian_Armed_Forces
Cardinal_(Catholicism)
Iranian_languages
Lighting
Separation_of_powers_under_the_United_States_Constitution
Architecture
Human_Development_Index
Southern_Europe
BBC_Television
Arnold_Schwarzenegger
Plymouth
Heresy
Warsaw_Pact
Materialism
Christian
Sony_Music_Entertainment
Oklahoma_City
Hunter-gatherer
United_Nations_Population_Fund
Russian_Soviet_Federative_Socialist_Republic
Alexander_Graham_Bell
Pub
Internet_service_provider
Comics
Saint_Helena
Aspirated_consonant
Hydrogen
Space_Race
Web_browser
BeiDou_Navigation_Satellite_System
Canon_law
Communications_in_Som

In [None]:
# next let's get into paragraphs:
squad_dict['data'][1]['paragraphs']

[{'context': 'Frédéric François Chopin (/ˈʃoʊpæn/; French pronunciation: \u200b[fʁe.de.ʁik fʁɑ̃.swa ʃɔ.pɛ̃]; 22 February or 1 March 1810 – 17 October 1849), born Fryderyk Franciszek Chopin,[n 1] was a Polish and French (by citizenship and birth of father) composer and a virtuoso pianist of the Romantic era, who wrote primarily for the solo piano. He gained and has maintained renown worldwide as one of the leading musicians of his era, whose "poetic genius was based on a professional technique that was without equal in his generation." Chopin was born in what was then the Duchy of Warsaw, and grew up in Warsaw, which after 1815 became part of Congress Poland. A child prodigy, he completed his musical education and composed his earlier works in Warsaw before leaving Poland at the age of 20, less than a month before the outbreak of the November 1830 Uprising.',
  'qas': [{'answers': [{'answer_start': 182, 'text': 'Polish and French'}],
    'id': '56cbd2356d243a140015ed66',
    'is_impossi

In [None]:
# next we want to access the context of each paragraph:
squad_dict['data'][1]['paragraphs'][0]['context']

#squad_dict['data'][1]['paragraphs'][2]['context']

'Frédéric François Chopin (/ˈʃoʊpæn/; French pronunciation: \u200b[fʁe.de.ʁik fʁɑ̃.swa ʃɔ.pɛ̃]; 22 February or 1 March 1810 – 17 October 1849), born Fryderyk Franciszek Chopin,[n 1] was a Polish and French (by citizenship and birth of father) composer and a virtuoso pianist of the Romantic era, who wrote primarily for the solo piano. He gained and has maintained renown worldwide as one of the leading musicians of his era, whose "poetic genius was based on a professional technique that was without equal in his generation." Chopin was born in what was then the Duchy of Warsaw, and grew up in Warsaw, which after 1815 became part of Congress Poland. A child prodigy, he completed his musical education and composed his earlier works in Warsaw before leaving Poland at the age of 20, less than a month before the outbreak of the November 1830 Uprising.'

In [None]:
# `group` still holds the last entry left over from the title loop above
for passage in group['paragraphs']:
  print(passage['context'])


Before the 20th century, the term matter included ordinary matter composed of atoms and excluded other energy phenomena such as light or sound. This concept of matter may be generalized from atoms to include any objects having mass even when at rest, but this is ill-defined because an object's mass can arise from its (possibly massless) constituents' motion and interaction energies. Thus, matter does not have a universal definition, nor is it a fundamental concept in physics today. Matter is also used loosely as a general term for the substance that makes up all observable physical objects.
All the objects from everyday life that we can bump into, touch or squeeze are composed of atoms. This atomic matter is in turn made up of interacting subatomic particles—usually a nucleus of protons and neutrons, and a cloud of orbiting electrons. Typically, science considers these composite particles matter because they have both rest mass and volume. By contrast, massless particles, such as photo

In [None]:
# each context has several questions and answers
# (`passage` is the last paragraph left over from the loop above)
for qa in passage['qas']:
  print(qa)
  print(qa['question'])

{'plausible_answers': [{'text': 'matter', 'answer_start': 485}], 'question': 'Physics has broadly agreed on the definition of what?', 'id': '5a7e070b70df9f001a875439', 'answers': [], 'is_impossible': True}
Physics has broadly agreed on the definition of what?
{'plausible_answers': [{'text': 'Alfvén', 'answer_start': 327}], 'question': 'Who coined the term partonic matter?', 'id': '5a7e070b70df9f001a87543a', 'answers': [], 'is_impossible': True}
Who coined the term partonic matter?
{'plausible_answers': [{'text': 'Gk. common matter', 'answer_start': 350}], 'question': 'What is another name for anti-matter?', 'id': '5a7e070b70df9f001a87543b', 'answers': [], 'is_impossible': True}
What is another name for anti-matter?
{'plausible_answers': [{'text': 'a specifying modifier', 'answer_start': 529}], 'question': 'Matter usually does not need to be used in conjunction with what?', 'id': '5a7e070b70df9f001a87543c', 'answers': [], 'is_impossible': True}
Matter usually does not need to be used in

In [None]:
# after extracting the questions we need to extract the answers
# the answers come as a list
if 'plausible_answers' in qa.keys():
  access = 'plausible_answers'
else:
  access = 'answers'
for answer in qa[access]:
  print(answer)


{'text': 'physics', 'answer_start': 37}


In [None]:
# so we will do all of these things in one loop
contexts = []
questions = []
answers = []
for group in squad_dict['data']: # squad_dict has 2 keys, version and data
  for passage in group['paragraphs']: # accessing paragraphs of data
    context = passage['context'] 
    for qa in passage['qas']:
      question = qa['question']
      if 'plausible_answers' in qa.keys():
        access = 'plausible_answers'
      else:
        access = 'answers'
      for answer in qa[access]:
        contexts.append(context) # appending context, question and answer into our lists
        questions.append(question)
        answers.append(answer)
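
To make the nesting easier to follow, here is the same flattening logic applied to a tiny hand-made dictionary. `toy_squad` is invented for illustration and only mimics the SQuAD v2.0 layout (`data` → `paragraphs` → `qas` → `answers` / `plausible_answers`).

```python
toy_squad = {
    'version': 'v2.0',
    'data': [{
        'title': 'Toy',
        'paragraphs': [{
            'context': 'Chopin grew up in Warsaw.',
            'qas': [
                # answerable question: uses 'answers'
                {'question': 'Where did Chopin grow up?',
                 'id': 'q1', 'is_impossible': False,
                 'answers': [{'text': 'Warsaw', 'answer_start': 18}]},
                # unanswerable question: 'answers' is empty,
                # 'plausible_answers' holds a tempting but wrong span
                {'question': 'Where did Chopin die?',
                 'id': 'q2', 'is_impossible': True, 'answers': [],
                 'plausible_answers': [{'text': 'Warsaw', 'answer_start': 18}]},
            ],
        }],
    }],
}

rows = []
for group in toy_squad['data']:
    for passage in group['paragraphs']:
        for qa in passage['qas']:
            access = 'plausible_answers' if 'plausible_answers' in qa else 'answers'
            for answer in qa[access]:
                rows.append((passage['context'], qa['question'], answer))

for context, question, answer in rows:
    print(question, '->', answer)
```

Both questions produce a row. With this logic the unanswerable question contributes its plausible answer as if it were a real one, which is worth keeping in mind when reading the accuracy numbers later.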

In [None]:
contexts[:5]

['Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".',
 'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead s

In [None]:
questions[:5]

['When did Beyonce start becoming popular?',
 'What areas did Beyonce compete in when she was growing up?',
 "When did Beyonce leave Destiny's Child and become a solo singer?",
 'In what city and state did Beyonce  grow up? ',
 'In which decade did Beyonce become famous?']

In [None]:
answers[:5]

[{'answer_start': 269, 'text': 'in the late 1990s'},
 {'answer_start': 207, 'text': 'singing and dancing'},
 {'answer_start': 526, 'text': '2003'},
 {'answer_start': 166, 'text': 'Houston, Texas'},
 {'answer_start': 276, 'text': 'late 1990s'}]

In [None]:
# Now let's look at one full entry of our dictionary
squad_dict['data'][1]

{'paragraphs': [{'context': 'Frédéric François Chopin (/ˈʃoʊpæn/; French pronunciation: \u200b[fʁe.de.ʁik fʁɑ̃.swa ʃɔ.pɛ̃]; 22 February or 1 March 1810 – 17 October 1849), born Fryderyk Franciszek Chopin,[n 1] was a Polish and French (by citizenship and birth of father) composer and a virtuoso pianist of the Romantic era, who wrote primarily for the solo piano. He gained and has maintained renown worldwide as one of the leading musicians of his era, whose "poetic genius was based on a professional technique that was without equal in his generation." Chopin was born in what was then the Duchy of Warsaw, and grew up in Warsaw, which after 1815 became part of Congress Poland. A child prodigy, he completed his musical education and composed his earlier works in Warsaw before leaving Poland at the age of 20, less than a month before the outbreak of the November 1830 Uprising.',
   'qas': [{'answers': [{'answer_start': 182, 'text': 'Polish and French'}],
     'id': '56cbd2356d243a140015ed66'

In [None]:
# the previous loop has to be run for both datasets, so we wrap it in a function
def read_squad(path):
  with open(path, 'rb') as f: # reading binary
    squad_dict = json.load(f)
  
  contexts = []
  questions = []
  answers = []
  for group in squad_dict['data']: # squad_dict has 2 keys, version and data
    for passage in group['paragraphs']: # accessing paragraphs of data
      context = passage['context'] 
      for qa in passage['qas']:
        question = qa['question']
        if 'plausible_answers' in qa.keys():
          access = 'plausible_answers'
        else:
          access = 'answers'
        for answer in qa[access]:
          contexts.append(context) # appending context, question and answer into our lists
          questions.append(question)
          answers.append(answer)
  return contexts, questions, answers

In [None]:
train_contexts, train_questions, train_answers = read_squad('/content/drive/MyDrive/Dataset/squad/train-v2.0.json') # train dataset
val_contexts, val_questions, val_answers = read_squad('/content/drive/MyDrive/Dataset/squad/dev-v2.0.json') # validation dataset

In [None]:
train_answers[0]

{'answer_start': 269, 'text': 'in the late 1990s'}

We already have the answer start index; we still need the answer end index.

In [None]:
def add_end_index(answers, contexts):
  for answer, context in zip(answers, contexts):
    gold_text = answer['text'] # answer we are looking for
    start_idx = answer['answer_start']
    end_idx = start_idx + len(gold_text)

    # if the slice matches the answer text, end_idx is the correct end index
    if context[start_idx : end_idx] == gold_text:
      answer['answer_end'] = end_idx
    else:
      # otherwise the annotated position may be off by 1 or 2 characters
      for n in [1, 2]:
        if context[start_idx-n : end_idx-n] == gold_text:
          answer['answer_start'] = start_idx - n
          answer['answer_end'] = end_idx - n
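
A small worked example of the end-index logic on made-up data. The function is repeated so the cell is self-contained; the context and the three answers are invented to cover a correct offset, an offset that is one character too late, and an answer text that does not occur at all.

```python
def add_end_index(answers, contexts):
    for answer, context in zip(answers, contexts):
        gold_text = answer['text']
        start_idx = answer['answer_start']
        end_idx = start_idx + len(gold_text)
        if context[start_idx:end_idx] == gold_text:
            answer['answer_end'] = end_idx
        else:
            # try shifting the span back by 1 or 2 characters
            for n in [1, 2]:
                if context[start_idx - n:end_idx - n] == gold_text:
                    answer['answer_start'] = start_idx - n
                    answer['answer_end'] = end_idx - n

context = 'Chopin grew up in Warsaw.'
toy_answers = [
    {'text': 'Warsaw', 'answer_start': 18},  # correct offset
    {'text': 'Warsaw', 'answer_start': 19},  # annotated one character too late
    {'text': 'Krakow', 'answer_start': 18},  # text not in the context at all
]
add_end_index(toy_answers, [context] * len(toy_answers))
for answer in toy_answers:
    print(answer)

# answers that could not be aligned never receive an 'answer_end' key
unresolved = [a for a in toy_answers if 'answer_end' not in a]
print(len(unresolved))
```

The third answer stays without `answer_end`, so any later code that reads `answer['answer_end']` would raise a `KeyError` for it. Counting such cases, as `unresolved` does here, is a quick way to check whether the real data has any.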

Now we can apply it to our train and validation datasets

In [None]:
add_end_index(train_answers, train_contexts)
add_end_index(val_answers, val_contexts)

Viewing our changes

In [None]:
train_answers[:5]

[{'answer_end': 286, 'answer_start': 269, 'text': 'in the late 1990s'},
 {'answer_end': 226, 'answer_start': 207, 'text': 'singing and dancing'},
 {'answer_end': 530, 'answer_start': 526, 'text': '2003'},
 {'answer_end': 180, 'answer_start': 166, 'text': 'Houston, Texas'},
 {'answer_end': 286, 'answer_start': 276, 'text': 'late 1990s'}]

---
## Tokenize/Encoding

In [None]:
!pip install transformers --quiet
from transformers import DistilBertTokenizerFast # smaller version of BERT, quicker, faster

# initialize tokenizer
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')


In [None]:
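# tokenize; truncation=True cuts inputs longer than the model's maximum length, padding=True pads the shorter ones to a common length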
train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True)
val_encodings = tokenizer(val_contexts, val_questions, truncation=True, padding=True)

In [None]:
train_encodings.keys()

dict_keys(['input_ids', 'attention_mask'])

In [None]:
len(train_encodings['input_ids'])

130319

In [None]:
# decoding an input id
# truncation caps each example at the model's 512-token limit; padding=True fills shorter examples with [PAD] tokens up to the longest sequence in the batch
tokenizer.decode(train_encodings['input_ids'][0])

'[CLS] beyonce giselle knowles - carter ( / biːˈjɒnseɪ / bee - yon - say ) ( born september 4, 1981 ) is an american singer, songwriter, record producer and actress. born and raised in houston, texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of r & b girl - group destiny\'s child. managed by her father, mathew knowles, the group became one of the world\'s best - selling girl groups of all time. their hiatus saw the release of beyonce\'s debut album, dangerously in love ( 2003 ), which established her as a solo artist worldwide, earned five grammy awards and featured the billboard hot 100 number - one singles " crazy in love " and " baby boy ". [SEP] when did beyonce start becoming popular? [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
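
The `[PAD]` tokens in the decoded text go hand in hand with the `attention_mask`: real tokens get a 1, padding gets a 0. A minimal pure-Python sketch of that relationship follows. `pad_batch` is a hypothetical helper, not the tokenizer's implementation, and apart from the special-token ids the token ids are made up.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # special-token ids of the BERT uncased vocabulary

def pad_batch(batch_ids, pad_id=PAD_ID):
    """Pad every sequence to the longest one and build the matching attention mask."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch_ids]
    return {'input_ids': input_ids, 'attention_mask': attention_mask}

# layout: [CLS] context [SEP] question [SEP]; ids other than the special ones are invented
batch = [
    [CLS_ID, 2001, 2002, 2003, SEP_ID, 2004, SEP_ID],
    [CLS_ID, 2001, SEP_ID, 2004, SEP_ID],
]
encoded = pad_batch(batch)
print(encoded['input_ids'][1])
print(encoded['attention_mask'][1])
```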

The tokenizer is great, but it doesn't produce the start and end token positions of our answers; it only produces `'input_ids'` and `'attention_mask'`. For that, we define a custom `add_token_postions` function.

In [None]:
train_answers[0]

{'answer_end': 286, 'answer_start': 269, 'text': 'in the late 1990s'}

In [None]:
train_answers[0]['answer_start']

269

In [None]:
train_encodings.char_to_token(0, train_answers[0]['answer_start'])

67

In [None]:
train_encodings.char_to_token(0, train_answers[0]['answer_end']) # returns None: answer_end points one character past the answer (here a space), and a space belongs to no token
# so we need some extra logic to handle this
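
Why does the end index map to nothing? A toy illustration with a whitespace tokenizer makes it visible. `toy_offsets` and `toy_char_to_token` are hypothetical stand-ins for illustration only; the real tokenizer splits text differently, but the idea of mapping a character index to the token that covers it is the same.

```python
import re

def toy_offsets(text):
    """Return the (start, end) character span of every whitespace-separated token."""
    return [m.span() for m in re.finditer(r'\S+', text)]

def toy_char_to_token(offsets, char_index):
    """Index of the token covering char_index, or None if no token covers it."""
    for token_index, (start, end) in enumerate(offsets):
        if start <= char_index < end:
            return token_index
    return None

text = 'rose to fame in the late 1990s as lead singer'
gold = 'in the late 1990s'
answer_start = text.index(gold)          # 13
answer_end = answer_start + len(gold)    # 30, one past the last answer character

offsets = toy_offsets(text)
print(toy_char_to_token(offsets, answer_start))    # token index of 'in'
print(toy_char_to_token(offsets, answer_end))      # the space after '1990s'
print(toy_char_to_token(offsets, answer_end - 1))  # token index of '1990s'
```

Because `answer_end` is an exclusive index, it lands on the character after the answer. Stepping back by one character puts it on the last answer token again.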

In [None]:
def add_token_postions(encodings, answers):
  # as we are collecting all the positions we need a list
  start_positions = []
  end_positions = []

  for i in range(len(answers)):
    start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
    end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'])) 

    # char_to_token returns None when answer_end points at a space (or at a
    # character that was truncated away), so step back one character at a
    # time until it lands on a token
    go_back = 1
    while end_positions[-1] is None:
      end_positions[-1] = encodings.char_to_token(i, answers[i]['answer_end']-go_back)
      go_back +=1
    # the start position can also be None when the context was truncated and
    # the answer no longer fits within the maximum token length
    if start_positions[-1] is None:
      start_positions[-1] = tokenizer.model_max_length 
  # add the positions to our encodings
  encodings.update({
      'start_positions' : start_positions,
      'end_positions' : end_positions
  })
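
The two fallbacks in this function behave quite differently when an answer is cut off by truncation. The sketch below replays them on hand-written character offsets. Everything here is a toy assumption: `MAX_LENGTH` stands in for `tokenizer.model_max_length`, `char_to_token` is a simplified lookup, and the `>= 0` guard in the loop is our addition so the toy loop always terminates.

```python
MAX_LENGTH = 4  # hypothetical, tiny stand-in for tokenizer.model_max_length

def char_to_token(offsets, char_index):
    """Index of the token whose character span covers char_index, else None."""
    for i, (start, end) in enumerate(offsets):
        if start <= char_index < end:
            return i
    return None

def token_span(offsets, answer_start, answer_end):
    start = char_to_token(offsets, answer_start)
    end = char_to_token(offsets, answer_end)
    go_back = 1
    # walk back one character at a time until a token is found
    while end is None and answer_end - go_back >= 0:
        end = char_to_token(offsets, answer_end - go_back)
        go_back += 1
    if start is None:
        start = MAX_LENGTH  # sentinel for "answer start was truncated away"
    return start, end

# character spans of: rose to fame in the late 1990s as lead singer
full = [(0, 4), (5, 7), (8, 12), (13, 15), (16, 19),
        (20, 24), (25, 30), (31, 33), (34, 38), (39, 45)]
truncated = full[:MAX_LENGTH]  # only the first four tokens survive

print(token_span(full, 13, 30))       # 'in the late 1990s', nothing truncated
print(token_span(truncated, 20, 30))  # 'late 1990s', fully truncated away
```

In the truncated case the start becomes the sentinel while the end walks back to the last surviving token, so the end position is smaller than the start position. In this toy setting that pair does not describe a real span, which suggests such examples deserve a closer look in the real data.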

In [None]:
# add the token positions to our training and validation encodings
add_token_postions(train_encodings, train_answers)
add_token_postions(val_encodings, val_answers)

In [None]:
# now let's look at the keys again
train_encodings.keys()

dict_keys(['input_ids', 'attention_mask', 'start_positions', 'end_positions'])

In [None]:
train_encodings['start_positions'][:10]

[67, 55, 128, 47, 69, 81, 124, 91, 69, 72]

In [None]:
train_encodings['end_positions'][:10]

[70, 57, 129, 50, 70, 85, 126, 93, 70, 73]
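
Note that the end position is inclusive: for the first example the start is 67 and the end is 70, which covers the four tokens of "in the late 1990s". A toy sketch of how a position pair maps back to answer text, using an invented token list rather than the real tokenizer output:

```python
# invented tokens; in the notebook these would come from the tokenizer
tokens = ['rose', 'to', 'fame', 'in', 'the', 'late', '1990s', 'as', 'lead', 'singer']
start_position, end_position = 3, 6

# the end position points at the last answer token, so the slice needs +1
answer_tokens = tokens[start_position:end_position + 1]
answer_text = ' '.join(answer_tokens)
print(answer_text)
```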

We've now prepared our data and have everything we need; we just need to transform it into the correct format for training with PyTorch.
For this, we need to build a dataset object: `SquadDataset`

In [None]:
import torch
class SquadDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}

    def __len__(self):
        return len(self.encodings.input_ids)

In [None]:
# applying this to our datasets
train_dataset = SquadDataset(train_encodings)
val_dataset = SquadDataset(val_encodings)
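
The dataset class turns a dictionary of columns into per-example rows, and a loader then groups rows into batches. Here is a torch-free sketch of that indexing pattern. `ToyDataset` and `batches` are hypothetical names, and the batching only approximates what a `DataLoader` does (no tensors, no shuffling).

```python
class ToyDataset:
    """Same indexing idea as SquadDataset, but with plain lists instead of tensors."""
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        # pick row idx out of every column
        return {key: val[idx] for key, val in self.encodings.items()}

    def __len__(self):
        return len(self.encodings['input_ids'])

def batches(dataset, batch_size):
    """Yield dictionaries of lists, one per batch, in order."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        rows = [dataset[i] for i in range(start, stop)]
        yield {key: [row[key] for row in rows] for key in rows[0]}

toy_encodings = {
    'input_ids': [[101, 7, 102], [101, 8, 102], [101, 9, 102]],
    'start_positions': [1, 1, 1],
    'end_positions': [1, 1, 1],
}
toy_dataset = ToyDataset(toy_encodings)
toy_batches = list(batches(toy_dataset, batch_size=2))
print(toy_dataset[0])
print([len(b['input_ids']) for b in toy_batches])
```

The last batch is smaller than the others whenever the dataset size is not a multiple of the batch size, which matters later when per-batch scores are averaged.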

---
## Fine-tune

Our data is now fully ready for use by our model. All that is left is to set up our PyTorch environment and initialize the `DataLoader`, which we will use to load batches during training.

In [None]:
from transformers import DistilBertForQuestionAnswering
model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this mode

In [None]:
from torch.utils.data import DataLoader
from torch.optim import AdamW # the AdamW shipped with transformers is deprecated in favour of the PyTorch one
from tqdm import tqdm # progressbar

In [None]:
# defining device type: use the gpu if available, otherwise the cpu
if torch.cuda.is_available():
  device = torch.device("cuda:0")
  print("GPU Allocated")
else:
  device = torch.device('cpu')
  print("CPU Allocated")

model.to(device)
model.train()

# initializing optimizer, learning rate
optim = AdamW(model.parameters(), lr=5e-5)

GPU Allocated


In [None]:
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

In [None]:
train_encodings.keys()

dict_keys(['input_ids', 'attention_mask', 'start_positions', 'end_positions'])

In [None]:
# train for 3 epochs
for epoch in range(3):
    loop = tqdm(train_loader)
    for batch in loop: # for each batch within loop
        optim.zero_grad()
        
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        start_positions = batch['start_positions'].to(device)
        end_positions = batch['end_positions'].to(device)
        
        # feeding them into our model for training
        outputs = model(input_ids, attention_mask=attention_mask, start_positions=start_positions, end_positions=end_positions)

        # because we passed the true start and end positions, the model also returns the loss as its first output
        loss = outputs[0]

        # backpropagate the loss and update the weights
        loss.backward()
        optim.step()

        # for visualizing
        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(loss=loss.item())
    
    # saving Epoch
    

#model.eval()

Epoch 0: 100%|██████████| 8145/8145 [1:46:28<00:00,  1.28it/s, loss=1.02]
Epoch 1: 100%|██████████| 8145/8145 [1:46:32<00:00,  1.27it/s, loss=0.808]
Epoch 2: 100%|██████████| 8145/8145 [1:46:31<00:00,  1.27it/s, loss=0.632]
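
The loss shown in the progress bar is computed inside the model. To our understanding, the question-answering head scores every token as a possible start and as a possible end, and the loss is the average of the two cross-entropy losses. A minimal pure-Python sketch for a single example follows; the function names are hypothetical and the logits are invented.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def qa_loss(start_logits, end_logits, start_position, end_position):
    """Average of the start-position and end-position cross-entropy losses."""
    start_loss = cross_entropy(start_logits, start_position)
    end_loss = cross_entropy(end_logits, end_position)
    return (start_loss + end_loss) / 2

# an undecided model over 4 tokens: every position is equally likely
uniform = qa_loss([0.0] * 4, [0.0] * 4, start_position=1, end_position=2)

# a model that is confident about the correct positions
confident = qa_loss([0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0],
                    start_position=1, end_position=2)
print(uniform, confident)
```

For the undecided model the loss equals `log(4)`, and it shrinks toward zero as the logits at the correct positions grow, which is the trend visible across the three epochs.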


In [None]:
model_path = '/content/drive/MyDrive/Dataset/squad/DistilBert Trained Model'
model.save_pretrained(model_path) # saving the model
tokenizer.save_pretrained(model_path) # saving tokenizer

('/content/drive/MyDrive/Dataset/squad/DistilBert Trained Model/tokenizer_config.json',
 '/content/drive/MyDrive/Dataset/squad/DistilBert Trained Model/special_tokens_map.json',
 '/content/drive/MyDrive/Dataset/squad/DistilBert Trained Model/vocab.txt',
 '/content/drive/MyDrive/Dataset/squad/DistilBert Trained Model/added_tokens.json')

In [None]:
model.eval() # switching to evaluation mode (disables dropout)

DistilBertForQuestionAnswering(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            

## Model Validation

In [None]:
val_loader = DataLoader(val_dataset, batch_size=16)

acc = [] # accuracy

loop = tqdm(val_loader)
for batch in loop: # for each batch within loop
    with torch.no_grad(): 
      input_ids = batch['input_ids'].to(device)
      attention_mask = batch['attention_mask'].to(device)
      start_true = batch['start_positions'].to(device) # true values
      end_true = batch['end_positions'].to(device) # true values
      
      # feeding them into our model for validation
      outputs = model(input_ids, attention_mask=attention_mask)
      start_pred = torch.argmax(outputs['start_logits'], dim=1) # predicted values
      end_pred = torch.argmax(outputs['end_logits'], dim=1) # predicted values

      # compare the predicted positions with the true positions,
      # then compute the batch accuracy and append it to the list
      acc.append(((start_pred == start_true).sum()/len(start_pred)).item())
      acc.append(((end_pred == end_true).sum()/len(end_pred)).item())
      

100%|██████████| 1640/1640 [07:02<00:00,  3.88it/s]


In [None]:
outputs

QuestionAnsweringModelOutput([('start_logits',
                               tensor([[-10.7700,  -1.8402,  -0.1913,  ..., -13.7909, -13.7782, -13.8991],
                                       [-10.7700,  -1.8402,  -0.1913,  ..., -13.7909, -13.7782, -13.8991],
                                       [-10.7700,  -1.8402,  -0.1913,  ..., -13.7909, -13.7782, -13.8991],
                                       ...,
                                       [ -9.9024,   2.5822,   3.8301,  ..., -13.6726, -13.6707, -13.7631],
                                       [ -8.7607,   2.2173,   2.8455,  ..., -13.7040, -13.7118, -13.7065],
                                       [-10.6318,   0.0849,   1.2742,  ..., -13.8317, -13.8203, -13.8625]],
                                      device='cuda:0')),
                              ('end_logits',
                               tensor([[ -9.1727,  -5.2830,  -3.2011,  ..., -12.5255, -12.5294, -12.3281],
                                       [ -9.1727,  -5.283

In [None]:
# `outputs` holds only the last batch; the loop above does the same for every batch
outputs['start_logits'] # the highest start logit marks the most probable start token
torch.argmax(outputs['start_logits'], dim=1) # dim=1 searches along the token axis, giving the index of the highest logit for each example in the batch


tensor([66, 66, 66, 66,  2, 17, 49, 99], device='cuda:0')
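
What `argmax` along the token axis does can be shown in a few lines of plain Python. `argmax_rows` is a hypothetical helper and the logits are invented; each row is one example, each column one token position.

```python
def argmax_rows(logits):
    """For every row, return the column index of the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

toy_start_logits = [
    [-10.8, -1.8, -0.2, -13.8],  # highest score at token 2
    [-8.8, 2.9, 2.2, -13.7],     # highest score at token 1
]
toy_start_pred = argmax_rows(toy_start_logits)
print(toy_start_pred)
```

Start and end positions are picked independently this way, so nothing prevents a predicted end from landing before the predicted start.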

### Checking Accuracy

In [None]:
# checking match
(start_pred == start_true)

tensor([False, False, False, False,  True, False, False, False],
       device='cuda:0')

In [None]:
# calculating accuracy
(start_pred == start_true).sum()/len(start_pred) # it will give us accuracy within tensor

tensor(0.1250, device='cuda:0')

In [None]:
((start_pred == start_true).sum()/len(start_pred)).item() # taking out values from tensor

# very poor accuracy on final batch

0.125

In [None]:
# printing accuracy list
acc

[0.9375,
 1.0,
 0.75,
 0.6875,
 0.6875,
 0.9375,
 0.8125,
 0.5,
 0.625,
 0.75,
 0.5,
 0.625,
 0.875,
 0.9375,
 0.75,
 0.8125,
 0.625,
 0.8125,
 0.4375,
 0.75,
 0.8125,
 0.9375,
 0.6875,
 0.75,
 0.5625,
 0.75,
 0.375,
 0.5,
 0.8125,
 0.9375,
 0.9375,
 0.9375,
 0.8125,
 0.8125,
 0.875,
 0.6875,
 0.625,
 0.6875,
 0.6875,
 0.5625,
 0.625,
 0.6875,
 0.625,
 0.5,
 0.5,
 0.875,
 0.375,
 0.5,
 0.75,
 0.9375,
 0.5,
 0.75,
 0.4375,
 0.625,
 0.375,
 0.5,
 0.1875,
 0.5,
 0.3125,
 0.5,
 0.375,
 0.5625,
 0.625,
 0.6875,
 0.4375,
 0.5625,
 0.375,
 0.8125,
 0.375,
 0.5,
 0.375,
 0.5,
 0.6875,
 0.6875,
 0.5,
 0.1875,
 0.5,
 0.875,
 0.4375,
 0.5625,
 0.625,
 0.8125,
 0.375,
 0.875,
 0.5,
 0.8125,
 0.5,
 0.3125,
 0.75,
 0.4375,
 0.25,
 0.4375,
 0.3125,
 0.5,
 0.4375,
 0.5625,
 0.4375,
 0.5,
 0.5625,
 0.875,
 0.625,
 0.5625,
 0.5625,
 0.375,
 0.625,
 0.8125,
 0.6875,
 0.6875,
 0.9375,
 0.5,
 0.5,
 0.8125,
 0.6875,
 0.75,
 0.5625,
 0.3125,
 0.5,
 0.6875,
 0.4375,
 0.25,
 0.5625,
 0.75,
 0.4375,
 0.8125,
 0

In [None]:
sum(acc)/len(acc) # printing overall acc

0.6387576219512195
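
Two details of this number are worth spelling out. It is a mean of per-batch scores, so a smaller final batch weighs each of its examples more heavily, and it scores start and end positions separately rather than requiring both to be right. A minimal sketch of the differences on invented toy predictions; all function names are hypothetical.

```python
def batch_mean_accuracy(batches):
    """Average of per-batch accuracies, as when averaging a list of batch scores."""
    scores = [sum(p == t for p, t in zip(pred, true)) / len(pred)
              for pred, true in batches]
    return sum(scores) / len(scores)

def example_accuracy(batches):
    """Accuracy over all examples, so every example has the same weight."""
    correct = sum(p == t for pred, true in batches for p, t in zip(pred, true))
    total = sum(len(pred) for pred, _ in batches)
    return correct / total

def span_exact_match(start_pred, end_pred, start_true, end_true):
    """Fraction of examples where both the start and the end are correct."""
    hits = sum(sp == st and ep == et
               for sp, ep, st, et in zip(start_pred, end_pred, start_true, end_true))
    return hits / len(start_true)

# one full batch of 4 and one smaller final batch of 2: (predicted, true)
toy_batches = [([1, 2, 3, 4], [1, 2, 3, 4]), ([5, 6], [5, 0])]
print(batch_mean_accuracy(toy_batches))  # 0.75
print(example_accuracy(toy_batches))     # 5 correct out of 6

# starts are all correct, one end is wrong
print(span_exact_match([1, 5], [3, 9], [1, 5], [3, 8]))  # 0.5
```

In the last call the averaged start and end accuracy would be 0.75, while only half of the spans are fully correct, so a span-level exact match is the stricter measure.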

So we got roughly 64% accuracy (0.6388), averaged over the per-batch scores for exactly matching start and end token positions. What does that mean?

In [None]:
# for last batch
start_true

tensor([158, 158, 158, 158,   2,  18,  50, 100], device='cuda:0')

In [None]:
start_pred

tensor([66, 66, 66, 66,  2, 17, 49, 99], device='cuda:0')

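In the last batch the start-position accuracy was only 12.5%, yet three of the seven misses are off by a single token (17 vs 18, 49 vs 50, 99 vs 100). So even where the model is not exactly right it is often close, which is not perfect but still a decent result. Overall it reaches about 64% exact-match accuracy on the token positions.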
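
To put a number on "close", we can count a prediction as a hit when it lies within a small tolerance of the true position. The values below are the `start_true` and `start_pred` tensors of the last batch shown above; `accuracy_within` is a hypothetical helper, not a standard SQuAD metric.

```python
start_true = [158, 158, 158, 158, 2, 18, 50, 100]
start_pred = [66, 66, 66, 66, 2, 17, 49, 99]

def accuracy_within(pred, true, tolerance=0):
    """Fraction of predictions at most `tolerance` tokens away from the truth."""
    hits = sum(abs(p - t) <= tolerance for p, t in zip(pred, true))
    return hits / len(true)

exact = accuracy_within(start_pred, start_true)
near = accuracy_within(start_pred, start_true, tolerance=1)
print(exact)  # 0.125
print(near)   # 0.5
```

Allowing a one-token slack lifts this batch from 12.5% to 50%, while the four predictions of 66 against a true position of 158 remain far off.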