# Error analysis

This notebook contains the source code for the error analysis described in the project report. The analysis uses the wrong answers that each model produced on the SQuAD v1.1 dev set.

## Imports

To import the project's source files, we first add the `src` folder to Python's module search path (`sys.path`)$\dots$

In [13]:
import sys

sys.path.insert(0, "src")
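A relative entry such as `"src"` is resolved against the current working directory, so the import only works when the notebook is started from the project root. The sketch below is one possible way to make that step less fragile; the helper name `add_to_sys_path` is hypothetical and not part of the project.

```python
import sys
from pathlib import Path


def add_to_sys_path(folder: str) -> str:
    """Insert the absolute form of `folder` at the front of sys.path (once)."""
    resolved = str(Path(folder).resolve())
    # Avoid piling up duplicate entries when the cell is re-run.
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumption: the notebook is started from the project root, as in the cell above.
src_path = add_to_sys_path("src")
```

The entry is still derived from the working directory, but it is frozen as an absolute path at the moment the cell runs, so a later `os.chdir` would not change where modules are looked up.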

Then, we can import packages as usual$\dots$

In [16]:
import os
import json

import numpy as np
from transformers.trainer_utils import set_seed

import config

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Initialization

### PyTorch and NumPy

Set the random seed to a fixed number for reproducible results$\dots$

In [17]:
set_seed(config.RANDOM_SEED)

## Wrong answers loading

Load the errors that each model made on the SQuAD v1.1 dev set$\dots$

In [30]:
with open('results/wrong/baseline.json') as f:
    baseline_errors = json.load(f)
with open('results/wrong/bidaf.json') as f:
    bidaf_errors = json.load(f)
with open('results/wrong/bert.json') as f:
    bert_errors = json.load(f)
with open('results/wrong/distilbert.json') as f:
    distilbert_errors = json.load(f)
with open('results/wrong/electra.json') as f:
    electra_errors = json.load(f)
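The five `open` calls above follow the same pattern, so they could also be written as a loop. The following is a minimal sketch under the assumption that every model has a `<name>.json` file in the same folder; `MODEL_NAMES` and `load_errors` are hypothetical names introduced only for illustration.

```python
import json
from pathlib import Path

# Assumption: file stems match the ones used in the cell above.
MODEL_NAMES = ["baseline", "bidaf", "bert", "distilbert", "electra"]


def load_errors(folder, names=MODEL_NAMES):
    """Return {model name: parsed JSON content} for each `<folder>/<name>.json`."""
    errors = {}
    for name in names:
        with open(Path(folder) / f"{name}.json", encoding="utf-8") as f:
            errors[name] = json.load(f)
    return errors
```

With this helper, `errors = load_errors("results/wrong")` would give `errors["electra"]` in place of `electra_errors`. The rest of the notebook keeps the separate variables.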

Observe how many errors each model makes$\dots$

In [31]:
print(f"The Baseline model makes {len(baseline_errors)} errors")
print(f"The BiDAF model makes {len(bidaf_errors)} errors")
print(f"The BERT model makes {len(bert_errors)} errors")
print(f"The DistilBERT model makes {len(distilbert_errors)} errors")
print(f"The ELECTRA model makes {len(electra_errors)} errors")

The Baseline model makes 8276 errors
The BiDAF model makes 4222 errors
The BERT model makes 2731 errors
The DistilBERT model makes 2789 errors
The ELECTRA model makes 2062 errors
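Raw counts are easier to compare once they are expressed as a share of the dev set. The sketch below assumes that the error files cover the full SQuAD v1.1 dev set (10,570 questions); the notebook does not state which criterion marks an answer as wrong, so the resulting rates should be read with that caveat. The names `SQUAD_V11_DEV_SIZE` and `error_rate` are illustrative.

```python
# Assumption: errors were counted over the whole SQuAD v1.1 dev set.
SQUAD_V11_DEV_SIZE = 10_570


def error_rate(n_errors, n_questions=SQUAD_V11_DEV_SIZE):
    """Fraction of questions that a model answered wrongly."""
    return n_errors / n_questions


# Counts copied from the output above.
error_counts = {
    "Baseline": 8276,
    "BiDAF": 4222,
    "BERT": 2731,
    "DistilBERT": 2789,
    "ELECTRA": 2062,
}

for name, count in error_counts.items():
    print(f"{name}: {error_rate(count):.1%} of the dev questions")
```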


## Wrong answers analysis

Compute the common errors among all the models and the common errors among the best models, i.e. BiDAF and ELECTRA$\dots$

In [35]:
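# sorted() gives a deterministic order: iterating a set of strings can differ
# between Python sessions, which would change the random subset drawn later.
all_common_errors = sorted(
    set(electra_errors.keys())
    & set(bidaf_errors.keys())
    & set(baseline_errors.keys())
    & set(bert_errors.keys())
    & set(distilbert_errors.keys())
)
best_common_errors = sorted(set(electra_errors.keys()) & set(bidaf_errors.keys()))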

In [36]:
print(f"The number of common errors between all the models is {len(all_common_errors)}")
print(f"The number of common errors between BiDAF and ELECTRA is {len(best_common_errors)}")

The number of common errors between all the models is 940
The number of common errors between BiDAF and ELECTRA is 1535
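The size of an intersection is hard to interpret without the sizes of the sets involved. One option is the Jaccard index (intersection over union) for each pair of models. This is a sketch on toy data rather than part of the original analysis; `common_ids` and `pairwise_jaccard` are hypothetical helpers.

```python
from itertools import combinations


def common_ids(*error_dicts):
    """Question IDs present in every dict, sorted for a stable order."""
    ids = set(error_dicts[0])
    for d in error_dicts[1:]:
        ids &= set(d)
    return sorted(ids)


def pairwise_jaccard(errors_by_model):
    """Map each pair of model names to |A & B| / |A | B| over their error IDs."""
    scores = {}
    for (a, errs_a), (b, errs_b) in combinations(errors_by_model.items(), 2):
        ids_a, ids_b = set(errs_a), set(errs_b)
        union = ids_a | ids_b
        scores[(a, b)] = len(ids_a & ids_b) / len(union) if union else 0.0
    return scores


# Toy data standing in for the real error dictionaries.
toy = {
    "model_a": {"q1": {}, "q2": {}, "q3": {}},
    "model_b": {"q2": {}, "q3": {}, "q4": {}},
    "model_c": {"q3": {}},
}
shared = common_ids(*toy.values())
scores = pairwise_jaccard(toy)
```

With the counts printed above, the BiDAF/ELECTRA score would be 1535 / (4222 + 2062 - 1535) ≈ 0.32, and about 74% of ELECTRA's errors (1535 of 2062) are also BiDAF errors.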


Take a random subset of errors (among the common ones between BiDAF and ELECTRA)$\dots$

In [37]:
random_best_common_errors = np.random.choice(best_common_errors, 50, replace=False)
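A seeded generator only reproduces the same subset if the candidates arrive in the same order on every run. When the candidates come from a set of strings, that order can change between Python sessions because of hash randomization. The sketch below shows one way to pin both the order and the generator using only the standard library; it is an alternative to `np.random.choice`, not the notebook's own method, and `sample_ids` is a hypothetical name.

```python
import random


def sample_ids(ids, k, seed):
    """Draw up to `k` distinct IDs, reproducibly for a given `seed`."""
    ordered = sorted(ids)      # fixed order, whatever order the input had
    rng = random.Random(seed)  # local generator; global random state is untouched
    return rng.sample(ordered, min(k, len(ordered)))


# Same elements in a different container and order give the same draw.
draw_from_set = sample_ids({"q4", "q1", "q3", "q2"}, 2, seed=0)
draw_from_list = sample_ids(["q2", "q3", "q1", "q4"], 2, seed=0)
```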

Show the selected errors$\dots$

In [38]:
for e in random_best_common_errors:
    context = electra_errors[e]["context"]
    question = electra_errors[e]["question"]
    answers = electra_errors[e]["answers"]
    electra_pred = electra_errors[e]["prediction"]
    bidaf_pred = bidaf_errors[e]["prediction"]
    print(f"Context: {context}")
    print(f"Question: {question}")
    print(f"Answers: {answers}")
    print(f"Predictions: [ELECTRA] {electra_pred} [BiDAF] {bidaf_pred}")
    print()

Context: One key figure in the plans for what would come to be known as American Empire, was a geographer named Isiah Bowman. Bowman was the director of the American Geographical Society in 1914. Three years later in 1917, he was appointed to then President Woodrow Wilson's inquiry in 1917. The inquiry was the idea of President Wilson and the American delegation from the Paris Peace Conference. The point of this inquiry was to build a premise that would allow for U.S authorship of a 'new world' which was to be characterized by geographical order. As a result of his role in the inquiry, Isiah Bowman would come to be known as Wilson's geographer. 
Question: Who besides Woodrow Wilson himself had the idea for the inquiry?
Answers: ['american delegation from paris peace conference', 'american delegation from paris peace conference', 'american delegation from paris peace conference', 'american delegation from paris peace conference', 'american delegation from paris peace conference']
Predic