<a href="https://colab.research.google.com/github/dansjack/ad-470-group2/blob/agt-proj3/chatbot3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#First Steps
GPU Info

To reproduce the reported (best) results, run the notebook on a Tesla P100 GPU.

In [19]:
!nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



Next, we install the Hugging Face transformers library, which provides the interface to the pre-trained BERT model.

In [20]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.19.2-py3-none-any.whl (4.2 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.6.0-py3-none-any.whl (84 kB)
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Unin

Import necessary libraries

Next, we import the libraries used throughout the notebook.

In [21]:
import torch
import json
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
import re
import string
import collections
from transformers import BertTokenizerFast, BertForQuestionAnswering
from transformers.tokenization_utils_base import BatchEncoding
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm

pd.set_option('max_colwidth', 500)
%matplotlib inline

Enable CUDA

Select the CUDA device when a GPU is available and fall back to the CPU otherwise. Running on a GPU speeds up the tensor computations and therefore the training of our models.

In [22]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

Set random seed

Set a constant random seed in order to get the same (deterministic) outputs every time we run our models.

In [23]:
seed = 17064
def reset_seed():
  random.seed(seed)
  np.random.seed(seed)
  torch.manual_seed(seed)
  torch.cuda.manual_seed_all(seed)
reset_seed()
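
As a minimal sketch of why re-seeding matters, the snippet below uses only Python's standard `random` module (the helper name `draw` is hypothetical and not part of the notebook). The same idea applies to the NumPy and PyTorch generators seeded in `reset_seed`, although some GPU operations can still be non-deterministic even with fixed seeds.

```python
import random

SEED = 17064  # same value as the notebook's seed


def draw(n=3):
    # Re-seed, then draw n pseudo-random integers
    random.seed(SEED)
    return [random.randint(0, 99) for _ in range(n)]


first = draw()
second = draw()
print(first == second)  # True: re-seeding restarts the same sequence
```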

BERT model definition

Here we define the name of the pre-trained BERT model from Hugging Face that we will use.

In [24]:
BERT_MODEL_NAME = 'bert-base-uncased'

Dataset loading
Now, let's load our datasets:

Define json preprocessing function

First, we define a function that takes the path of a SQuAD JSON dataset file and returns a dataframe with three columns: the questions, their answers as (start_position, end_position, answer_text) tuples, and the corpus (context paragraph) that contains the answer to each question. The positions are character offsets into the corpus, and end_position is exclusive.

In [25]:
def squad_load_from_json(json_file_path: str):
  """Load a SQuAD v2.0 JSON file into a DataFrame of questions, answers and corpuses."""
  with open(json_file_path, "r", encoding="utf-8") as f:
    json_data = json.load(f)['data']
  questions = []
  answers = []
  corpuses = []
  for category in json_data:
    for paragraph in category['paragraphs']:
      context = paragraph['context']
      for qa in paragraph['qas']:
        corpuses.append(context)
        question = qa['question']
        questions.append(question)
        # Unanswerable questions list their candidates under 'plausible_answers'
        if qa['is_impossible']:
          ans_list = qa['plausible_answers']
        else:
          ans_list = qa['answers']
        if len(ans_list) == 0:
          print("Question ", question, " has no answers")
        # Store each distinct answer as (start_char, end_char, text); the set drops duplicates
        ans_set = set()
        for ans in ans_list:
          ans_set.add((ans['answer_start'], ans['answer_start']+len(ans['text']), ans['text']))
        answers.append(list(ans_set))
  return pd.DataFrame(data={'question':questions, 'answer':answers, 'corpus':corpuses})
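
To make the (start_position, end_position, answer_text) convention concrete, here is a small sketch on a hypothetical one-question sample written in the SQuAD v2.0 layout. The sample text is invented for illustration. Only the field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`, `text`, `is_impossible`) come from the dataset format used by the loader above.

```python
import json

# Hypothetical one-question sample in the SQuAD v2.0 layout
sample = json.loads("""
{"data": [{"title": "Normans", "paragraphs": [{
  "context": "Normandy is a region in France.",
  "qas": [{"question": "In what country is Normandy located?",
           "is_impossible": false,
           "answers": [{"text": "France", "answer_start": 24}]}]
}]}]}
""")

paragraph = sample["data"][0]["paragraphs"][0]
context = paragraph["context"]

spans = []
for qa in paragraph["qas"]:
    for ans in qa["answers"]:
        start = ans["answer_start"]
        end = start + len(ans["text"])  # exclusive end, as in the loader
        spans.append((start, end, ans["text"]))

print(spans)           # [(24, 30, 'France')]
print(context[24:30])  # France
```

Slicing the corpus with the stored offsets returns the answer text, which is a quick sanity check for any row of the resulting dataframe.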

Download the datasets

Next, we download the train and validation datasets from the SQuAD website and convert them to pandas dataframes with the function above.

In [26]:
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json

--2022-05-24 21:11:58--  https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42123633 (40M) [application/json]
Saving to: ‘train-v2.0.json.1’


2022-05-24 21:11:59 (173 MB/s) - ‘train-v2.0.json.1’ saved [42123633/42123633]

--2022-05-24 21:11:59--  https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4370528 (4.2M) [application/json]
Saving to: ‘dev-v2.0.json.1’


2022-05-24 21:11:59 (185 MB/s) - ‘dev-v2.0.json.1’ saved [4370528/4370528]
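
The log above shows the files being saved as `train-v2.0.json.1` and `dev-v2.0.json.1`, which is what wget does when a file with the requested name already exists. A possible way to avoid duplicate downloads when re-running the cell is sketched below in plain Python. The helper names `download_if_missing` and `fetch_squad` are hypothetical and not part of the notebook.

```python
import os
import urllib.request

BASE_URL = "https://rajpurkar.github.io/SQuAD-explorer/dataset/"


def download_if_missing(url: str, path: str) -> bool:
    """Download url to path unless path already exists.

    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True


def fetch_squad(target_dir: str = ".") -> None:
    # Fetch both SQuAD v2.0 files, skipping any that are already present
    for name in ("train-v2.0.json", "dev-v2.0.json"):
        download_if_missing(BASE_URL + name, os.path.join(target_dir, name))
```

Calling `fetch_squad()` once per session would then leave a single copy of each file under the names the loading cells expect.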



Train Dataset

In [27]:
train_dataset = squad_load_from_json("train-v2.0.json")
train_dataset

Unnamed: 0,question,answer,corpus
0,When did Beyonce start becoming popular?,"[(269, 286, in the late 1990s)]","Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debu..."
1,What areas did Beyonce compete in when she was growing up?,"[(207, 226, singing and dancing)]","Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debu..."
2,When did Beyonce leave Destiny's Child and become a solo singer?,"[(526, 530, 2003)]","Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debu..."
3,In what city and state did Beyonce grow up?,"[(166, 180, Houston, Texas)]","Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debu..."
4,In which decade did Beyonce become famous?,"[(276, 286, late 1990s)]","Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debu..."
...,...,...,...
130314,Physics has broadly agreed on the definition of what?,"[(485, 491, matter)]","The term ""matter"" is used throughout physics in a bewildering variety of contexts: for example, one refers to ""condensed matter physics"", ""elementary matter"", ""partonic"" matter, ""dark"" matter, ""anti""-matter, ""strange"" matter, and ""nuclear"" matter. In discussions of matter and antimatter, normal matter has been referred to by Alfvén as koinomatter (Gk. common matter). It is fair to say that in physics, there is no broad consensus as to a general definition of matter, and the term ""matter"" usu..."
130315,Who coined the term partonic matter?,"[(327, 333, Alfvén)]","The term ""matter"" is used throughout physics in a bewildering variety of contexts: for example, one refers to ""condensed matter physics"", ""elementary matter"", ""partonic"" matter, ""dark"" matter, ""anti""-matter, ""strange"" matter, and ""nuclear"" matter. In discussions of matter and antimatter, normal matter has been referred to by Alfvén as koinomatter (Gk. common matter). It is fair to say that in physics, there is no broad consensus as to a general definition of matter, and the term ""matter"" usu..."
130316,What is another name for anti-matter?,"[(350, 367, Gk. common matter)]","The term ""matter"" is used throughout physics in a bewildering variety of contexts: for example, one refers to ""condensed matter physics"", ""elementary matter"", ""partonic"" matter, ""dark"" matter, ""anti""-matter, ""strange"" matter, and ""nuclear"" matter. In discussions of matter and antimatter, normal matter has been referred to by Alfvén as koinomatter (Gk. common matter). It is fair to say that in physics, there is no broad consensus as to a general definition of matter, and the term ""matter"" usu..."
130317,Matter usually does not need to be used in conjunction with what?,"[(529, 550, a specifying modifier)]","The term ""matter"" is used throughout physics in a bewildering variety of contexts: for example, one refers to ""condensed matter physics"", ""elementary matter"", ""partonic"" matter, ""dark"" matter, ""anti""-matter, ""strange"" matter, and ""nuclear"" matter. In discussions of matter and antimatter, normal matter has been referred to by Alfvén as koinomatter (Gk. common matter). It is fair to say that in physics, there is no broad consensus as to a general definition of matter, and the term ""matter"" usu..."


In [28]:
# Draw a reproducible 30% random sample of the training set
train_split = train_dataset.sample(frac=0.3,random_state=200)
train_split

Unnamed: 0,question,answer,corpus
79919,When were the Chronicles of Huayang compiled?,"[(339, 364, the Jin dynasty (265–420))]","The existence of the early state of Shu was poorly recorded in the main historical records of China. It was, however, referred to in the Book of Documents as an ally of the Zhou. Accounts of Shu exist mainly as a mixture of mythological stories and historical legends recorded in local annals such as the Chronicles of Huayang compiled in the Jin dynasty (265–420), with folk stories such as that of Emperor Duyu (杜宇) who taught the people agriculture and transformed himself into a cuckoo after ..."
97596,Which one of Muhammad's wives had a particular impact on his view of women and education?,"[(556, 564, Khadijah)]","According to the Sunni scholar Ibn ʻAsākir in the 12th century, there were opportunities for female education in the medieval Islamic world, writing that women could study, earn ijazahs (academic degrees), and qualify as scholars and teachers. This was especially the case for learned and scholarly families, who wanted to ensure the highest possible education for both their sons and daughters. Ibn ʻAsakir had himself studied under 80 different female teachers in his time. Female education in ..."
25768,"When did the term ""Near East"" acquire considerable disrepute?","[(21, 37, the 19th century)]","In the last years of the 19th century the term ""Near East"" acquired considerable disrepute in eyes of the English-speaking public as did the Ottoman Empire itself. The cause of the onus was the Hamidian Massacres of Armenians because they were Christians, but it seemed to spill over into the protracted conflicts of the Balkans. For a time, ""Near East"" meant primarily the Balkans. Robert Hichens' book The Near East (1913) is subtitled Dalmatia, Greece and Constantinople."
2660,What location was the focus of the Austrian leg of Spectre's production?,"[(311, 327, Ice Q Restaurant)]","Filming started in Austria in December 2014, with production taking in the area around Sölden—including the Ötztal Glacier Road, Rettenbach glacier and the adjacent ski resort and cable car station—and Obertilliach and Lake Altaussee, before concluding in February 2015. Scenes filmed in Austria centred on the Ice Q Restaurant, standing in for the fictional Hoffler Klinik, a private medical clinic in the Austrian Alps. Filming included an action scene featuring a Land Rover Defender Bigfoot a..."
12923,What type of climate does Oklahoma city?,"[(20, 45, humid subtropical climate)]","Oklahoma City has a humid subtropical climate (Köppen: Cfa), with frequent variations in weather daily and seasonally, except during the consistently hot and humid summer months. Prolonged and severe droughts (sometimes leading to wildfires in the vicinity) as well as very heavy rainfall leading to flash flooding and flooding occur with some regularity. Consistent winds, usually from the south or south-southeast during the summer, help temper the hotter weather. Consistent northerly winds du..."
...,...,...,...
68737,How does Alberta classify provincial legislation?,"[(31, 61, incorporated or unincorporated)]","Canada allows nonprofits to be incorporated or unincorporated. Nonprofits may incorporate either federally, under Part II of the Canada Business Corporations Act or under provincial legislation. Many of the governing Acts for Canadian nonprofits date to the early 1900s, meaning that nonprofit legislation has not kept pace with legislation that governs for-profit corporations; particularly with regards to corporate governance. Federal, and in some provinces (such as Ontario), incorporation is..."
101101,What was Kaunitz of Austria willing to trade for French help in capturing Silesia?,"[(185, 239, willing to trade Austrian Netherlands for France's aid)]","Years later, Kaunitz kept trying to establish France's alliance with Austria. He tried as hard as he could for Austria to not get entangled in Hanover's political affairs, and was even willing to trade Austrian Netherlands for France's aid in recapturing Silesia. Frustrated by this decision and by the Dutch Republic's insistence on neutrality, Britain soon turned to Russia. On September 30, 1755, Britain pledged financial aid to Russia in order to station 50,000 troops on the Livonian-Lithun..."
21437,Who governs the church?,"[(53, 88, the Dean and Chapter of Westminster)]","Westminster Abbey is a collegiate church governed by the Dean and Chapter of Westminster, as established by Royal charter of Queen Elizabeth I in 1560, which created it as the Collegiate Church of St Peter Westminster and a Royal Peculiar under the personal jurisdiction of the Sovereign. The members of the Chapter are the Dean and four canons residentiary, assisted by the Receiver General and Chapter Clerk. One of the canons is also Rector of St Margaret's Church, Westminster, and often hold..."
63703,What is the former Bayer Pharmaceutical campus used for?,"[(442, 471, laboratory and research space)]","Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven fo..."
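
The sampling step can be illustrated on a toy dataframe (the data here is hypothetical). With a fixed `random_state`, `DataFrame.sample` returns the same rows on every call, and the sampled rows keep their original index labels, which is why the index in the table above is no longer consecutive.

```python
import pandas as pd

# Hypothetical 10-row frame standing in for the training set
df = pd.DataFrame({"question": [f"q{i}" for i in range(10)]})

a = df.sample(frac=0.3, random_state=200)
b = df.sample(frac=0.3, random_state=200)

print(len(a))                   # 3 rows: 30% of 10
print(a.index.equals(b.index))  # True: same random_state, same rows
```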


Validation Dataset
 

In [29]:
validation_dataset = squad_load_from_json("dev-v2.0.json")
validation_dataset

Question  What was proven in 2001 in regard to the solid oxygen phase?  has no answers
Question  What was discovered in 2006 in regard to O4?  has no answers
Question  What does air in equilibrium with water contain?  has no answers
Question  What is paired oxygen?  has no answers
Question  Why are O molecules paramagnetic?  has no answers
Question  What is formula for the reactive oxygen ion?  has no answers
Question  What are products of oxygen use in organisms?   has no answers
Question  What began to accumulate 5.2 billion years ago?  has no answers
Question  What is red in both the liquid and solid states?  has no answers
Question  Why do polar oceans support reduced amounts of life?  has no answers
Question  What involves delivering a gas stream that is 9% to 93% O2?  has no answers
Question  What do oxoacids evolve from?  has no answers
Question  What is the essential purpose of supplementation?   has no answers
Question  In case of cabin pressurization, what is available to pas

Unnamed: 0,question,answer,corpus
0,In what country is Normandy located?,"[(159, 165, France)]","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
1,When were the Normans in Normandy?,"[(87, 117, in the 10th and 11th centuries), (94, 117, 10th and 11th centuries)]","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
2,From which countries did the Norse originate?,"[(256, 283, Denmark, Iceland and Norway)]","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
3,Who was the Norse leader?,"[(308, 313, Rollo)]","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
4,What century did the Normans first gain their separate identity?,"[(671, 683, 10th century), (671, 675, 10th), (649, 683, the first half of the 10th century)]","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
...,...,...,...
11868,What is the seldom used force unit equal to one thousand newtons?,"[(665, 671, sthène)]","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."
11869,What does not have a metric counterpart?,"[(4, 15, pound-force)]","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."
11870,What is the force exerted by standard gravity on one ton of mass?,"[(82, 96, kilogram-force)]","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."
11871,What force leads to a commonly used unit of mass?,"[(195, 209, kilogram-force)]","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."


Exploding datasets on multiple answers

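Now we explode the datasets on the answer list field, so that every row holds exactly one (start_position, end_position, answer_text) tuple. A question with several answers becomes several rows, and the new index column keeps the original row number of each question.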

In [30]:
train_dataset = train_split.explode('answer').reset_index()
train_dataset

Unnamed: 0,index,question,answer,corpus
0,79919,When were the Chronicles of Huayang compiled?,"(339, 364, the Jin dynasty (265–420))","The existence of the early state of Shu was poorly recorded in the main historical records of China. It was, however, referred to in the Book of Documents as an ally of the Zhou. Accounts of Shu exist mainly as a mixture of mythological stories and historical legends recorded in local annals such as the Chronicles of Huayang compiled in the Jin dynasty (265–420), with folk stories such as that of Emperor Duyu (杜宇) who taught the people agriculture and transformed himself into a cuckoo after ..."
1,97596,Which one of Muhammad's wives had a particular impact on his view of women and education?,"(556, 564, Khadijah)","According to the Sunni scholar Ibn ʻAsākir in the 12th century, there were opportunities for female education in the medieval Islamic world, writing that women could study, earn ijazahs (academic degrees), and qualify as scholars and teachers. This was especially the case for learned and scholarly families, who wanted to ensure the highest possible education for both their sons and daughters. Ibn ʻAsakir had himself studied under 80 different female teachers in his time. Female education in ..."
2,25768,"When did the term ""Near East"" acquire considerable disrepute?","(21, 37, the 19th century)","In the last years of the 19th century the term ""Near East"" acquired considerable disrepute in eyes of the English-speaking public as did the Ottoman Empire itself. The cause of the onus was the Hamidian Massacres of Armenians because they were Christians, but it seemed to spill over into the protracted conflicts of the Balkans. For a time, ""Near East"" meant primarily the Balkans. Robert Hichens' book The Near East (1913) is subtitled Dalmatia, Greece and Constantinople."
3,2660,What location was the focus of the Austrian leg of Spectre's production?,"(311, 327, Ice Q Restaurant)","Filming started in Austria in December 2014, with production taking in the area around Sölden—including the Ötztal Glacier Road, Rettenbach glacier and the adjacent ski resort and cable car station—and Obertilliach and Lake Altaussee, before concluding in February 2015. Scenes filmed in Austria centred on the Ice Q Restaurant, standing in for the fictional Hoffler Klinik, a private medical clinic in the Austrian Alps. Filming included an action scene featuring a Land Rover Defender Bigfoot a..."
4,12923,What type of climate does Oklahoma city?,"(20, 45, humid subtropical climate)","Oklahoma City has a humid subtropical climate (Köppen: Cfa), with frequent variations in weather daily and seasonally, except during the consistently hot and humid summer months. Prolonged and severe droughts (sometimes leading to wildfires in the vicinity) as well as very heavy rainfall leading to flash flooding and flooding occur with some regularity. Consistent winds, usually from the south or south-southeast during the summer, help temper the hotter weather. Consistent northerly winds du..."
...,...,...,...,...
39091,68737,How does Alberta classify provincial legislation?,"(31, 61, incorporated or unincorporated)","Canada allows nonprofits to be incorporated or unincorporated. Nonprofits may incorporate either federally, under Part II of the Canada Business Corporations Act or under provincial legislation. Many of the governing Acts for Canadian nonprofits date to the early 1900s, meaning that nonprofit legislation has not kept pace with legislation that governs for-profit corporations; particularly with regards to corporate governance. Federal, and in some provinces (such as Ontario), incorporation is..."
39092,101101,What was Kaunitz of Austria willing to trade for French help in capturing Silesia?,"(185, 239, willing to trade Austrian Netherlands for France's aid)","Years later, Kaunitz kept trying to establish France's alliance with Austria. He tried as hard as he could for Austria to not get entangled in Hanover's political affairs, and was even willing to trade Austrian Netherlands for France's aid in recapturing Silesia. Frustrated by this decision and by the Dutch Republic's insistence on neutrality, Britain soon turned to Russia. On September 30, 1755, Britain pledged financial aid to Russia in order to station 50,000 troops on the Livonian-Lithun..."
39093,21437,Who governs the church?,"(53, 88, the Dean and Chapter of Westminster)","Westminster Abbey is a collegiate church governed by the Dean and Chapter of Westminster, as established by Royal charter of Queen Elizabeth I in 1560, which created it as the Collegiate Church of St Peter Westminster and a Royal Peculiar under the personal jurisdiction of the Sovereign. The members of the Chapter are the Dean and four canons residentiary, assisted by the Receiver General and Chapter Clerk. One of the canons is also Rector of St Margaret's Church, Westminster, and often hold..."
39094,63703,What is the former Bayer Pharmaceutical campus used for?,"(442, 471, laboratory and research space)","Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven fo..."


In [31]:
validation_dataset = validation_dataset.explode('answer').reset_index()
validation_dataset

Unnamed: 0,index,question,answer,corpus
0,0,In what country is Normandy located?,"(159, 165, France)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
1,1,When were the Normans in Normandy?,"(87, 117, in the 10th and 11th centuries)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
2,1,When were the Normans in Normandy?,"(94, 117, 10th and 11th centuries)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
3,2,From which countries did the Norse originate?,"(256, 283, Denmark, Iceland and Norway)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
4,3,Who was the Norse leader?,"(308, 313, Rollo)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ..."
...,...,...,...,...
16328,11868,What is the seldom used force unit equal to one thousand newtons?,"(665, 671, sthène)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."
16329,11869,What does not have a metric counterpart?,"(4, 15, pound-force)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."
16330,11870,What is the force exerted by standard gravity on one ton of mass?,"(82, 96, kilogram-force)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."
16331,11871,What force leads to a commonly used unit of mass?,"(195, 209, kilogram-force)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ..."


#Tokenization process

After loading the datasets, and before using them for training and evaluation, we must tokenize them to compute the start and end token positions of each gold answer.
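To make the character-to-token mapping concrete, here is a minimal sketch that uses a toy regex tokenizer instead of BERT's WordPiece tokenizer. The helper names `tokenize_with_offsets` and `char_span_to_token_span` are hypothetical and only illustrate the idea behind `char_to_token`; they are not part of the `transformers` API.

```python
import re

def tokenize_with_offsets(text):
    """Split text into word and punctuation tokens, keeping (start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

def char_span_to_token_span(tokens, char_start, char_end):
    """Map a [char_start, char_end) character span to inclusive token indices."""
    tok_start = tok_end = None
    for i, (_, start, end) in enumerate(tokens):
        # First token that contains the first character of the answer
        if tok_start is None and start <= char_start < end:
            tok_start = i
        # Token that contains the last character of the answer
        if start <= char_end - 1 < end:
            tok_end = i
    return tok_start, tok_end

# Context and answer taken from one of the training rows in this notebook
context = "Oklahoma City has a humid subtropical climate (Köppen: Cfa)."
answer = "humid subtropical climate"
char_start = context.index(answer)
char_end = char_start + len(answer)

tokens = tokenize_with_offsets(context)
tok_start, tok_end = char_span_to_token_span(tokens, char_start, char_end)
recovered = " ".join(tok for tok, _, _ in tokens[tok_start:tok_end + 1])
print(char_start, char_end, tok_start, tok_end, recovered)
```

The toy tokenizer places the answer at tokens 4 to 6. The training table in this notebook lists 5 and 7 for the same row, which is consistent with BERT prepending one `[CLS]` token to the sequence.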

**Download BERT tokenizer**

In [32]:
tokenizer = BertTokenizerFast.from_pretrained(BERT_MODEL_NAME)

**Calculate start and end tokens of the answers**

In [33]:
def calculate_tokenized_ans_indices(dataset: pd.DataFrame):
  ans_tok_start = []
  ans_tok_end = []
  ans_tok_text = []
  for idx, ans in enumerate(dataset['answer'].values):
    if not pd.isna(ans):
      ans_text_start = ans[0]
      ans_text_end = ans[1]
      ans_text = ans[2]
      encoding = tokenizer.encode_plus(text=dataset['corpus'].values[idx], text_pair=dataset['question'].values[idx], max_length=512, padding='max_length', truncation=True)
      ans_start = encoding.char_to_token(0, ans_text_start)
      ans_end = encoding.char_to_token(0, ans_text_end-1)
      # Handle truncated answers
      if ans_start is None:
        ans_start = ans_end = tokenizer.model_max_length
      elif ans_end is None:
        ans_end = [i for i, inp in enumerate(encoding['input_ids']) if inp == tokenizer.sep_token_id][0]
      # NOTE: clamping a truncated answer end to the first [SEP] is wrong, because it gives free score points on truncated answers. This was fixed in the cross-evaluation script.
      ans_text_tok = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(encoding['input_ids'][ans_start:ans_end+1]))
    else:
      ans_start = ans_end = tokenizer.model_max_length
      ans_text_tok = ""
    ans_tok_start.append(ans_start)
    ans_tok_end.append(ans_end)
    ans_tok_text.append(ans_text_tok)
  dataset['ans_start_tok'] = ans_tok_start
  dataset['ans_end_tok'] = ans_tok_end
  dataset['ans_tok_text'] = ans_tok_text
  return dataset

In [34]:
train_dataset = calculate_tokenized_ans_indices(train_dataset)
train_dataset

Unnamed: 0,index,question,answer,corpus,ans_start_tok,ans_end_tok,ans_tok_text
0,79919,When were the Chronicles of Huayang compiled?,"(339, 364, the Jin dynasty (265–420))","The existence of the early state of Shu was poorly recorded in the main historical records of China. It was, however, referred to in the Book of Documents as an ally of the Zhou. Accounts of Shu exist mainly as a mixture of mythological stories and historical legends recorded in local annals such as the Chronicles of Huayang compiled in the Jin dynasty (265–420), with folk stories such as that of Emperor Duyu (杜宇) who taught the people agriculture and transformed himself into a cuckoo after ...",66,73,the jin dynasty ( 265 – 420 )
1,97596,Which one of Muhammad's wives had a particular impact on his view of women and education?,"(556, 564, Khadijah)","According to the Sunni scholar Ibn ʻAsākir in the 12th century, there were opportunities for female education in the medieval Islamic world, writing that women could study, earn ijazahs (academic degrees), and qualify as scholars and teachers. This was especially the case for learned and scholarly families, who wanted to ensure the highest possible education for both their sons and daughters. Ibn ʻAsakir had himself studied under 80 different female teachers in his time. Female education in ...",108,111,khadijah
2,25768,"When did the term ""Near East"" acquire considerable disrepute?","(21, 37, the 19th century)","In the last years of the 19th century the term ""Near East"" acquired considerable disrepute in eyes of the English-speaking public as did the Ottoman Empire itself. The cause of the onus was the Hamidian Massacres of Armenians because they were Christians, but it seemed to spill over into the protracted conflicts of the Balkans. For a time, ""Near East"" meant primarily the Balkans. Robert Hichens' book The Near East (1913) is subtitled Dalmatia, Greece and Constantinople.",6,8,the 19th century
3,2660,What location was the focus of the Austrian leg of Spectre's production?,"(311, 327, Ice Q Restaurant)","Filming started in Austria in December 2014, with production taking in the area around Sölden—including the Ötztal Glacier Road, Rettenbach glacier and the adjacent ski resort and cable car station—and Obertilliach and Lake Altaussee, before concluding in February 2015. Scenes filmed in Austria centred on the Ice Q Restaurant, standing in for the fictional Hoffler Klinik, a private medical clinic in the Austrian Alps. Filming included an action scene featuring a Land Rover Defender Bigfoot a...",65,67,ice q restaurant
4,12923,What type of climate does Oklahoma city?,"(20, 45, humid subtropical climate)","Oklahoma City has a humid subtropical climate (Köppen: Cfa), with frequent variations in weather daily and seasonally, except during the consistently hot and humid summer months. Prolonged and severe droughts (sometimes leading to wildfires in the vicinity) as well as very heavy rainfall leading to flash flooding and flooding occur with some regularity. Consistent winds, usually from the south or south-southeast during the summer, help temper the hotter weather. Consistent northerly winds du...",5,7,humid subtropical climate
...,...,...,...,...,...,...,...
39091,68737,How does Alberta classify provincial legislation?,"(31, 61, incorporated or unincorporated)","Canada allows nonprofits to be incorporated or unincorporated. Nonprofits may incorporate either federally, under Part II of the Canada Business Corporations Act or under provincial legislation. Many of the governing Acts for Canadian nonprofits date to the early 1900s, meaning that nonprofit legislation has not kept pace with legislation that governs for-profit corporations; particularly with regards to corporate governance. Federal, and in some provinces (such as Ontario), incorporation is...",7,9,incorporated or unincorporated
39092,101101,What was Kaunitz of Austria willing to trade for French help in capturing Silesia?,"(185, 239, willing to trade Austrian Netherlands for France's aid)","Years later, Kaunitz kept trying to establish France's alliance with Austria. He tried as hard as he could for Austria to not get entangled in Hanover's political affairs, and was even willing to trade Austrian Netherlands for France's aid in recapturing Silesia. Frustrated by this decision and by the Dutch Republic's insistence on neutrality, Britain soon turned to Russia. On September 30, 1755, Britain pledged financial aid to Russia in order to station 50,000 troops on the Livonian-Lithun...",42,51,willing to trade austrian netherlands for france's aid
39093,21437,Who governs the church?,"(53, 88, the Dean and Chapter of Westminster)","Westminster Abbey is a collegiate church governed by the Dean and Chapter of Westminster, as established by Royal charter of Queen Elizabeth I in 1560, which created it as the Collegiate Church of St Peter Westminster and a Royal Peculiar under the personal jurisdiction of the Sovereign. The members of the Chapter are the Dean and four canons residentiary, assisted by the Receiver General and Chapter Clerk. One of the canons is also Rector of St Margaret's Church, Westminster, and often hold...",9,14,the dean and chapter of westminster
39094,63703,What is the former Bayer Pharmaceutical campus used for?,"(442, 471, laboratory and research space)","Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven fo...",101,104,laboratory and research space


In [35]:
validation_dataset = calculate_tokenized_ans_indices(validation_dataset)
validation_dataset

Unnamed: 0,index,question,answer,corpus,ans_start_tok,ans_end_tok,ans_tok_text
0,0,In what country is Normandy located?,"(159, 165, France)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ...",41,41,france
1,1,When were the Normans in Normandy?,"(87, 117, in the 10th and 11th centuries)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ...",26,31,in the 10th and 11th centuries
2,1,When were the Normans in Normandy?,"(94, 117, 10th and 11th centuries)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ...",28,31,10th and 11th centuries
3,2,From which countries did the Norse originate?,"(256, 283, Denmark, Iceland and Norway)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ...",63,67,"denmark, iceland and norway"
4,3,Who was the Norse leader?,"(308, 313, Rollo)","The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (""Norman"" comes from ""Norseman"") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants ...",73,74,rollo
...,...,...,...,...,...,...,...
16328,11868,What is the seldom used force unit equal to one thousand newtons?,"(665, 671, sthène)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ...",158,160,sthene
16329,11869,What does not have a metric counterpart?,"(4, 15, pound-force)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ...",2,4,pound - force
16330,11870,What is the force exerted by standard gravity on one ton of mass?,"(82, 96, kilogram-force)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ...",18,21,kilogram - force
16331,11871,What force leads to a commonly used unit of mass?,"(195, 209, kilogram-force)","The pound-force has a metric counterpart, less commonly used than the newton: the kilogram-force (kgf) (sometimes kilopond), is the force exerted by standard gravity on one kilogram of mass. The kilogram-force leads to an alternate, but rarely used unit of mass: the metric slug (sometimes mug or hyl) is that mass that accelerates at 1 m·s−2 when subjected to a force of 1 kgf. The kilogram-force is not a part of the modern SI system, and is generally deprecated; however it still sees use for ...",50,53,kilogram - force


**Create custom dataset class**

Now let's create the custom dataset class, which also handles tokenization. This way, tokenization happens on the fly during training and validation, and we don't consume RAM storing all the encodings in advance.

In [36]:
class SQuAD_Dataset(Dataset):
  def __init__(self, data: pd.DataFrame):
    self.data = data

  def __len__(self):
    return len(self.data)

  def __getitem__(self, idx):
    encoding = tokenizer(text=self.data['corpus'].values[idx], text_pair=self.data['question'].values[idx], max_length=512, padding='max_length', truncation=True, return_tensors='pt').to(device)
    return {
        'ans_start_tok': torch.tensor(self.data['ans_start_tok'].values[idx], dtype=torch.long, device=device),
        'ans_end_tok': torch.tensor(self.data['ans_end_tok'].values[idx], dtype=torch.long, device=device),
        'input_ids': encoding['input_ids'][0], 
        'attention_mask': encoding['attention_mask'][0],
        'token_type_ids': encoding['token_type_ids'][0]
    }
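
The on-the-fly pattern used by the class above can be sketched without PyTorch or a real tokenizer. This is only an illustration under stated assumptions: `LazyEncodedDataset` and `toy_encode` are hypothetical names, and the toy encoder splits on whitespace rather than using WordPiece.

```python
class LazyEncodedDataset:
    """Map-style dataset that encodes a row only when it is indexed."""

    def __init__(self, rows, encode):
        self.rows = rows        # raw (context, question) pairs are all that is stored
        self.encode = encode    # hypothetical stand-in for the tokenizer call
        self.encode_calls = 0   # counter, present only to make the laziness visible

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        self.encode_calls += 1
        context, question = self.rows[idx]
        return self.encode(context, question)

def toy_encode(context, question, max_length=8):
    """Toy encoder: whitespace tokens, truncated and padded to max_length."""
    tokens = (context.split() + ["[SEP]"] + question.split())[:max_length]
    return tokens + ["[PAD]"] * (max_length - len(tokens))

rows = [
    ("Rollo led the Norse", "Who led the Norse?"),
    ("Normandy is in France", "Where is Normandy?"),
]
dataset = LazyEncodedDataset(rows, toy_encode)
calls_before = dataset.encode_calls  # nothing has been encoded yet
first = dataset[0]                   # encoding happens here, on access
print(calls_before, dataset.encode_calls, first)
```

The trade-off is that every row is re-encoded each time it is accessed, so each epoch spends extra compute on tokenization in exchange for the lower memory footprint.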

In [37]:
train_squad_dataset = SQuAD_Dataset(data=train_dataset)

In [38]:
val_squad_dataset = SQuAD_Dataset(data=validation_dataset)

**Create Dataloaders**

In [39]:
BATCH_SIZE=8

In [40]:
train_dataloader = DataLoader(train_squad_dataset, batch_size=BATCH_SIZE, shuffle=False)
validation_dataloader = DataLoader(val_squad_dataset, batch_size=1, shuffle=False)

For the validation dataloader, I set the batch size to 1.

**Train & Evaluation methods**

Now, let's define the training function. It follows the same logic as the previous ones. The only major difference is how the validation loss is calculated. For each question in the validation dataset, I take the minimum loss among all of its answers, because the model's goal is to get as close as possible to any one valid answer. Including every answer's loss (even for answers far from the one that gives the minimum loss) would add noise for no reason and distort the train-validation loss plot. To calculate the validation loss this way, we must process the answers one at a time, which is why I set batch_size=1 on the validation dataloader.

In [41]:
def train(model: BertForQuestionAnswering, validation_dataloader: DataLoader, learning_rate: float, epochs: int):
  # Define the optimizer
  opt = torch.optim.AdamW(model.parameters(), lr=learning_rate)
  # Initialize train and validation losses lists
  train_losses = []
  validation_losses = []
  # Train for given # of epochs
  for epoch in range(epochs):
    model.train()
    t_losses = []
    for batch in tqdm(train_dataloader):
      # Delete previously stored gradients
      opt.zero_grad()
      # Get loss and outputs from the model
      start_positions = batch['ans_start_tok'] # we can do this because we know that SQuAD train dataset questions always have exactly 1 answer
      end_positions = batch['ans_end_tok']
      out = model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'], token_type_ids=batch['token_type_ids'], start_positions=start_positions, end_positions=end_positions)
      # Perform backpropagation starting from the loss returned for this batch
      loss = out[0]
      loss.backward()
      # Update model's weights based on the gradients calculated during backprop
      opt.step()
      t_losses.append(loss.item())

    # Calculate train loss in current epoch
    train_loss = np.mean(t_losses) 
    train_losses.append(train_loss)
    with torch.no_grad():
      model.eval()
      v_losses = []
      tmp_losses = []
      cur_idx = -1
      for val_idx, val_batch in enumerate(tqdm(validation_dataloader)):
        start_positions = val_batch['ans_start_tok'] # each validation row holds exactly 1 answer (a question with several answers spans several rows)
        end_positions = val_batch['ans_end_tok']
        out = model(input_ids=val_batch['input_ids'], attention_mask=val_batch['attention_mask'], token_type_ids=val_batch['token_type_ids'], start_positions=start_positions, end_positions=end_positions)
        loss = out[0]
        batch_idx = validation_dataset['index'].values[val_idx]
        if cur_idx != batch_idx:
          cur_idx = batch_idx
          if len(tmp_losses) > 0:
            v_losses.append(min(tmp_losses))
          tmp_losses = []
        tmp_losses.append(loss.item())
      if len(tmp_losses) > 0:
        v_losses.append(min(tmp_losses))
      val_loss = np.mean(v_losses) 
      validation_losses.append(val_loss)

    # Print current epoch status
    print(f"Epoch {epoch:3}: Loss = {train_loss:.5f} Val_loss = {val_loss:.5f}")

  return validation_losses
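
The min-over-answers aggregation in the validation part of the loop above can be isolated in a few lines. This is a minimal sketch, not the notebook's code: the helper name `min_loss_per_question` and the loss values are made up for illustration, and it assumes that rows for the same question are adjacent, as they are in the validation dataframe shown earlier.

```python
from itertools import groupby
from statistics import mean

def min_loss_per_question(question_ids, losses):
    """Mean, over questions, of the smallest loss among each question's answers.

    Assumes rows that belong to the same question are adjacent.
    """
    per_question = []
    for _, group in groupby(zip(question_ids, losses), key=lambda pair: pair[0]):
        per_question.append(min(loss for _, loss in group))
    return mean(per_question)

# Question 1 has two gold answers, like "When were the Normans in Normandy?"
question_ids = [0, 1, 1, 2]
losses = [0.5, 2.0, 1.0, 1.5]  # illustrative values, one loss per answer row

val_loss = min_loss_per_question(question_ids, losses)
naive_loss = mean(losses)
print(val_loss, naive_loss)
```

With these made-up numbers, the per-question minimum gives 1.0, while averaging every row gives 1.25: the worse of the two answers to question 1 no longer inflates the validation loss.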

Model Loading

Now let's load the pre-trained bert-base-uncased model.

In [42]:
reset_seed()

In [43]:
model = BertForQuestionAnswering.from_pretrained(BERT_MODEL_NAME).to(device)
model

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForQuestionAnswering: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased a

BertForQuestionAnswering(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_

In [44]:
validation_losses = train(model, validation_dataloader, learning_rate=1e-5, epochs=3)

  1%|          | 48/4887 [31:44<53:19:17, 39.67s/it]


KeyboardInterrupt: ignored

In [None]:
torch.save(model.state_dict(), "model.pt")