LAST LINKs:
* https://colab.research.google.com/drive/1WIk2bxglElfZewOHboPFNj8H44_VAyKE?usp=sharing#scrollTo=Gw3IZYrfKl4Z
* https://medium.com/analytics-vidhya/fine-tune-a-roberta-encoder-decoder-model-trained-on-mlm-for-text-generation-23da5f3c1858
* https://huggingface.co/course/chapter7/7?fw=tf

LINKs:
* https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb

* https://github.com/Michael-M-Mike/Unibo-NLP-Assignments/blob/main/A2_Seq2Seq_Abstractive_Question_Answering_(QA)_on_CoQA/distilroberta_42.ipynb

# Assignment 2

**Credits**: Andrea Galassi, Federico Ruggeri, Paolo Torroni

**Keywords**: Transformers, Question Answering, CoQA

## Overview

### Problem

Question Answering (QA) on the [CoQA](https://stanfordnlp.github.io/coqa/) dataset, a conversational QA dataset.

### Task

Given a question $Q$ and a text passage $P$, the task is to generate the answer $A$.<br>
$\rightarrow A$ can be: (i) free-form text or (ii) unanswerable.

**Note**: a question $Q$ can refer to previous dialogue turns. <br>
$\rightarrow$ the dialogue history $H$ may be a valuable input for providing the correct answer $A$.

### Models

We are going to experiment with transformer-based models to define the following models:

1.  $A = f_\theta(Q, P)$

2. $A = f_\theta(Q, P, H)$

where $f_\theta$ is the transformer-based model we have to define, with parameters $\theta$.
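As a rough illustration, the two variants can share one model and differ only in how their text inputs are assembled. The sketch below is not part of the assignment: the function name `build_inputs`, the `Q A;` layout of the history, and the `</s>` separator are assumptions made for this example (the separator mirrors the one used by the history helper in this notebook).

```python
from typing import List, Optional, Tuple

SEP = "</s>"  # assumed separator between history and passage


def build_inputs(question: str, passage: str,
                 history: Optional[List[Tuple[str, str]]] = None) -> Tuple[str, str]:
    """Return a (question, context) pair for f(Q, P) or f(Q, P, H).

    `history` is a list of previous (question, answer) turns. When it is
    empty or None, the context is just the passage, i.e. the f(Q, P) case.
    """
    if not history:
        return question, passage
    # Serialise previous turns as "Q1A1;Q2A2;..." and prepend them to P
    history_text = "".join(f"{q}{a};" for q, a in history)
    return question, f"{history_text}{SEP}{passage}"


# f(Q, P): no history
q1, c1 = build_inputs("Where did she live?", "Cotton lived in a barn.")

# f(Q, P, H): one previous turn
q2, c2 = build_inputs("Where did she live?", "Cotton lived in a barn.",
                      history=[("What color was Cotton?", "white")])
```

The returned pair can then be passed to a tokenizer as the first and second text segment.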

## The CoQA dataset

<center>
    <img src="https://drive.google.com/uc?export=view&id=16vrgyfoV42Z2AQX0QY7LHTfrgektEKKh" width="750"/>
</center>

For detailed information about the dataset, feel free to check the original [paper](https://arxiv.org/pdf/1808.07042.pdf).



## Rationales

Each QA pair comes with a rationale $R$: a text span extracted from the given text passage $P$. <br>
$\rightarrow$ $R$ is not a required output, but it can be used as additional information at training time!
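Since $R$ is a span of $P$, it can be recovered from the passage with the character offsets stored alongside each answer. The sketch below uses the first sentence of the CoQA sample passage and the offsets from the dataset snippet in this notebook; the variable names are chosen for illustration only.

```python
# First sentence of the sample passage P
story = ("Once upon a time, in a barn near a farm house, "
         "there lived a little white kitten named Cotton.")

# Answer record for turn 1 (offsets index into `story`)
answer = {"span_start": 59, "span_end": 93, "input_text": "white"}

# The rationale R is the slice of P between the two offsets
rationale = story[answer["span_start"]:answer["span_end"]]
print(rationale)  # a little white kitten named Cotton
```

Here the free-form answer `white` is contained in the rationale, but in general the answer may paraphrase it.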

## Dataset Statistics

* **127k** QA pairs.
* **8k** conversations.
* **7** diverse domains: Children's Stories, Literature, Mid/High School Exams, News, Wikipedia, Reddit, Science.
* Average conversation length: **15 turns** (i.e., QA pairs).
* Almost **half** of CoQA questions refer back to **conversational history**.
* Only **train** and **validation** sets are available.

## Dataset snippet

The dataset is stored in JSON format. Each dialogue is represented as follows:

```
{
    "source": "mctest",
    "id": "3dr23u6we5exclen4th8uq9rb42tel",
    "filename": "mc160.test.41",
    "story": "Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton. 
    Cotton lived high up in a nice warm place above the barn where all of the farmer's horses slept. [...]",   % <-- $P$
    "questions": [
        {
            "input_text": "What color was Cotton?",   % <-- $Q_1$
            "turn_id": 1
        },
        {
            "input_text": "Where did she live?",
            "turn_id": 2
        },
        [...]
    ],
    "answers": [
        {
            "span_start": 59,   % <-- $R_1$ start index
            "span_end": 93,     % <-- $R_1$ end index
            "span_text": "a little white kitten named Cotton",   % <-- $R_1$
            "input_text": "white",   % <-- $A_1$
            "turn_id": 1
        },
        [...]
    ]
}
```
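Questions and answers live in two separate lists and are linked only by `turn_id`. A minimal sketch of pairing them into flat records is shown below; the function name `flatten_dialogue` and the output keys (`question`, `answer`, `rationale`) are hypothetical choices for this example.

```python
from typing import Dict, List


def flatten_dialogue(dialogue: Dict) -> List[Dict]:
    """Pair each question with the answer that shares its turn_id."""
    answers = {a["turn_id"]: a for a in dialogue["answers"]}
    records = []
    for q in dialogue["questions"]:
        a = answers.get(q["turn_id"])
        if a is None:  # skip questions without a matching answer
            continue
        records.append({
            "id": dialogue["id"],
            "turn_id": q["turn_id"],
            "question": q["input_text"],   # Q
            "answer": a["input_text"],     # A
            "rationale": a["span_text"],   # R
        })
    return records


example = {
    "id": "3dr23u6we5exclen4th8uq9rb42tel",
    "questions": [{"input_text": "What color was Cotton?", "turn_id": 1}],
    "answers": [{"span_text": "a little white kitten named Cotton",
                 "input_text": "white", "turn_id": 1}],
}
records = flatten_dialogue(example)
```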

### Simplifications

Each dialogue also contains an additional field `additional_answers`. For simplicity, we **ignore** this field and only consider one ground-truth answer $A$ and text rationale $R$.

Only 1.3% of the questions in CoQA are unanswerable. For simplicity, we **ignore** those QA pairs.

# [0] Functions and imports

In [1]:
# %%capture
# !pip install datasets
# !pip install transformers
# !pip install tensorflow_addons
# !pip install allennlp-models


# NOTE:
#     - SEED AND ERROR
#     - LENGTH OF INPUTS AND OUTPUTS - https://towardsdatascience.com/to-distil-or-not-to-distil-bert-roberta-and-xlnet-c777ad92f8
#     - DATASET SIZES
#     - WARNINGS DURING MODEL CREATION AND TRAINING
#     - HOW TO USE TEXT SPANS

In [22]:
from IPython.display import display_html, clear_output
from itertools import chain,cycle
import plotly.express as px
from copy import deepcopy
import urllib.request
import transformers
import numpy as np
import json
import time
import os
import torch
import random 
import warnings
import pandas as pd
from tqdm import tqdm

from sklearn.model_selection import GroupShuffleSplit
from datasets import *
from transformers import AutoTokenizer, PreTrainedTokenizerFast, EncoderDecoderModel, Seq2SeqTrainingArguments, Seq2SeqTrainer, AdamW, DataCollatorForSeq2Seq

# warnings.filterwarnings(action='once')    
warnings.filterwarnings(action='ignore')

os.environ['TOKENIZERS_PARALLELISM'] = 'true'

# Display dataframes 
def display(*args,titles=cycle([''])):
    html_str=''
    for df,title in zip(args, chain(titles,cycle(['</br>'])) ):
        html_str+='<th style="text-align:left"><td style="vertical-align:top">'
        html_str+=f'<h4 style="text-align: left;">{title}</h4>'
        html_str+=df.to_html().replace('table','table style="display:inline"')
        html_str+='</td></th>'
    display_html(html_str,raw=True)
    
# Setting seeds for reproducibility
def set_reproducibility(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    transformers.set_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

    os.environ['TF_DETERMINISTIC_OPS'] = '1'

# Check tokenizer special tokens
def check_tokens(tokenizer):
    # Map each special-token type (e.g. 'bos_token') to its token string
    special_tokens = tokenizer.special_tokens_map
    special_ids = tokenizer.convert_tokens_to_ids(list(special_tokens.values()))
    print("Special tokens:")
    for token_type, token in special_tokens.items():
        print(f"{token_type}: {token}")
    # Print each special-token type with its corresponding ID
    for token_type, token_id in zip(special_tokens.keys(), special_ids):
        print(f"{token_type}: {token_id}")
        
# Compute metrics in the trainer
def compute_metrics(pred,tokenizer):
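    labels = pred.label_ids
    preds = pred.predictions

    # Label positions padded with -100 (ignored by the loss) cannot be decoded,
    # so map them back to the pad token before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    labels_text = tokenizer.batch_decode(labels, skip_special_tokens=True)
    preds_text = tokenizer.batch_decode(preds, skip_special_tokens=True)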
    
    squad_scores=[]
    for i in range(len(preds_text)):
        squad_scores.append(compute_f1(str(preds_text[i]), str(labels_text[i])))
    mean_squad_f1 = sum(squad_scores)/len(squad_scores)

    return {"squad_f1_score": mean_squad_f1}

# Generate Answers
def generate_answers(test_loader,model,tokenizer,dataset,verbose=False):
    i=0
    generated_answers = []
    squad_scores = []
    for batch in tqdm(test_loader):
        

        example = batch['input_ids'].to(device)
        att_mask = batch['attention_mask'].to(device)
        generated_ids = model.generate(input_ids=example, 
                                          attention_mask=att_mask,
                                          max_length=max_length_answer
                                         )
        

        generated_answer = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
        ground_truth = dataset['answer'][i:i+batch_size]
        
        
        squad = []
        for j in range(len(generated_answer)):                
            sq = compute_f1(generated_answer[j], ground_truth[j])
            generated_answers.append(generated_answer[j])
            squad_scores.append(sq)
            squad.append(sq)
                

        if verbose:
            print(dataset['question'][i:i+batch_size])
            print(f'Generated ans: {generated_answer}')
            print(f'True ans: {ground_truth}')
            print(f'SQuAD F1 score: {squad}')
            
#        for idx in range(batch_size):
#          for idx in range(int(batch_size/4)):
#             print(f'Question: [{dataset["test"]["question"][i+idx]}]')
#             print(f'\tGT: {ground_truth[idx]}\t-\tGENERATED: {generated_answers[idx]}')
        i+=batch_size  
            
            
    mean_squad_f1 = sum(squad_scores)/len(squad_scores)
    print(f'Mean SQuAD F1-score: {mean_squad_f1}')   
    
    dataset = dataset.add_column("generated_answer", generated_answers)
    dataset = dataset.add_column("squad_score", squad_scores)

    return dataset
        
# Add history to context    
def add_history_to_context(df):
    print('Adding the history to each entry of the dataframe...')
    # Sort the dataframe by 'id' and 'turn_id'
    df = df.sort_values(['id', 'turn_id'])
    # Group the dataframe by 'id'
    groups = df.groupby('id')
    # Create an empty list to store the updated rows
    new_rows = []
    # Iterate over each group
    for _, group in tqdm(groups):
        # Initialize an empty string for the history 
        history = ''
        # Iterate over each row in the group
        for i, row in group.iterrows():
            # Concatenate 'input_text_x' and 'input_text_y' for each row
            if row['turn_id'] > 1: # only consider previous turn_ids
                prev_rows = group.loc[group['turn_id'] < row['turn_id'], ['input_text_x', 'input_text_y']]
                history = ''.join(prev_rows['input_text_x'] + prev_rows['input_text_y'] + ';')
            else:
                history = ''
            # Update the 'history_context' column for the current row
            new_row = row.copy()
            if history == '':
                new_row['history_context'] = row['story']
            else:
                new_row['history_context'] = history+'</s>'+row['story']
            # Append the updated row to the list of new rows
            new_rows.append(new_row)
    # Create a new dataframe with the updated rows
    result_df = pd.DataFrame(new_rows)
    print('History added.')
    return result_df

# Show the worst 5 predictions per source
def show_worst_five(dataset):
    df = pd.DataFrame(dataset)
    
    for source in df['source'].unique():
        source_df = df[df['source'] == source]
        sorted_df = source_df.sort_values(by='squad_score', ascending=True)
        worst_5_df = sorted_df.head(5)
        print(f"Source: {source}")
        display(worst_5_df[['context', 'question', 'answer', 'generated_answer', 'squad_score']])
        print()
   
load_model = False

In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")     

print("Using device:", device)

Using device: cuda


In [4]:
!nvidia-smi

Wed Mar 29 13:54:35 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.49       Driver Version: 528.49       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P0    33W / 125W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### SQuAD metric

In [5]:
"""
https://github.com/allenai/allennlp-models/blob/main/allennlp_models/rc/tools/squad.py

Functions taken from [the official evaluation script]
(https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)
for SQuAD version 2.0.
"""
import collections
import re
import string
from typing import Callable, Sequence, TypeVar, Tuple


def make_qid_to_has_ans(dataset):
    qid_to_has_ans = {}
    for article in dataset:
        for p in article["paragraphs"]:
            for qa in p["qas"]:
                qid_to_has_ans[qa["id"]] = bool(qa["answers"])
    return qid_to_has_ans


def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""

    def remove_articles(text):
        regex = re.compile(r"\b(a|an|the)\b", re.UNICODE)
        return re.sub(regex, " ", text)

    def white_space_fix(text):
        return " ".join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return "".join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_articles(remove_punc(lower(s))))


def get_tokens(s):
    if not s:
        return []
    return normalize_answer(s).split()


def compute_exact(a_pred: str, a_gold: str) -> int:
    return int(normalize_answer(a_pred) == normalize_answer(a_gold))


def compute_f1(a_pred: str, a_gold: str) -> float:
    pred_toks = get_tokens(a_pred)
    gold_toks = get_tokens(a_gold)
    common = collections.Counter(pred_toks) & collections.Counter(gold_toks)  # type: ignore[var-annotated]
    num_same = sum(common.values())
    if len(pred_toks) == 0 or len(gold_toks) == 0:
        # If either is no-answer, then F1 is 1 if they agree, 0 otherwise
        return float(pred_toks == gold_toks)
    if num_same == 0:
        return 0.0
    precision = 1.0 * num_same / len(pred_toks)
    recall = 1.0 * num_same / len(gold_toks)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1


_P = TypeVar("_P")
_G = TypeVar("_G")
_T = TypeVar("_T", int, float, Tuple[int, ...], Tuple[float, ...])


def metric_max_over_ground_truths(
    metric_fn: Callable[[_P, _G], _T], prediction: _P, ground_truths: Sequence[_G]
) -> _T:
    scores_for_ground_truths = []
    for ground_truth in ground_truths:
        score = metric_fn(prediction, ground_truth)
        scores_for_ground_truths.append(score)
    return max(scores_for_ground_truths)


def get_metric_score(prediction: str, gold_answers: Sequence[str]) -> Tuple[int, float]:
    exact_scores = metric_max_over_ground_truths(compute_exact, prediction, gold_answers)
    f1_scores = metric_max_over_ground_truths(compute_f1, prediction, gold_answers)
    return exact_scores, f1_scores
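To make the metric concrete, here is a compact, self-contained re-implementation of the token-level F1 above, applied to a hypothetical prediction. The names `normalize` and `token_f1` are illustrative; the logic follows `normalize_answer` and `compute_f1`.

```python
import collections
import re
import string


def normalize(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalisation."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        # If either side is empty, score 1 only when both are empty
        return float(pred == ref)
    overlap = sum((collections.Counter(pred) & collections.Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# 3 shared tokens; prediction has 4 tokens, gold has 3
# precision = 3/4, recall = 3/3, F1 = 2 * 0.75 * 1.0 / 1.75 = 6/7
score = token_f1("gymnastics and playing soccer", "gymnastics and soccer")
print(round(score, 4))  # 0.8571
```

Because articles and punctuation are stripped first, a prediction such as `A dog.` scores 1.0 against the gold answer `a dog`.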

## [Task 1] Remove unanswerable QA pairs

Write your own script to remove unanswerable QA pairs from both train and validation sets.
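One possible shape for such a script is sketched below on plain Python records rather than on the real files. It assumes, as the filtering code in this notebook does, that an unanswerable turn is marked by the answer text `unknown`; the record layout and the name `drop_unanswerable` are illustrative.

```python
from typing import Dict, Iterable, List

UNANSWERABLE = "unknown"  # assumed marker for unanswerable turns


def drop_unanswerable(records: Iterable[Dict]) -> List[Dict]:
    """Keep only QA pairs whose answer is not the unanswerable marker."""
    return [r for r in records if r["answer"] != UNANSWERABLE]


qa_pairs = [
    {"turn_id": 1, "question": "What color was Cotton?", "answer": "white"},
    {"turn_id": 2, "question": "Who was her best friend?", "answer": "unknown"},
]
answerable = drop_unanswerable(qa_pairs)
print(len(answerable))  # 1
```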

## Dataset Download


In [6]:
class DownloadProgressBar(tqdm):
    def update_to(self, b=1, bsize=1, tsize=None):
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)
        
def download_url(url, output_path):
    with DownloadProgressBar(unit='B', unit_scale=True,
                             miniters=1, desc=url.split('/')[-1]) as t:
        urllib.request.urlretrieve(url, filename=output_path, reporthook=t.update_to)

def download_data(data_path, url_path, suffix):    
    if not os.path.exists(data_path):
        os.makedirs(data_path)
        
    data_path = os.path.join(data_path, f'{suffix}.json')

    if not os.path.exists(data_path):
        print(f"Downloading CoQA {suffix} data split... (it may take a while)")
        # download_url already saves the file (with a progress bar),
        # so no second urlretrieve call is needed
        download_url(url=url_path, output_path=data_path)
        print("Download completed!")

In [7]:
# Train data
train_url = "https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json"
download_data(data_path='coqa', url_path=train_url, suffix='train')

# Test data
test_url = "https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json"
download_data(data_path='coqa', url_path=test_url, suffix='test')  # <-- Why test? See next slides for an answer!

#### Data Inspection

Spend some time carefully checking the dataset format and how to retrieve the tasks' inputs and outputs!

In [8]:
# Creating Dataframes and removing unanswerable questions
with open('coqa/train.json') as f:
    train_data = json.load(f)
with open('coqa/test.json') as f:
    test_data = json.load(f)

qas = pd.json_normalize(train_data['data'], ['questions'], ['source', 'id', 'story'])
ans = pd.json_normalize(train_data['data'], ['answers'],['id'])
train_val_df = pd.merge(qas,ans, left_on=['id','turn_id'], right_on=['id','turn_id'])
train_val_df = train_val_df.loc[train_val_df['input_text_y']!='unknown']

qas = pd.json_normalize(test_data['data'], ['questions'], ['source', 'id', 'story'])
ans = pd.json_normalize(test_data['data'], ['answers'],['id'])
test_df = pd.merge(qas,ans, left_on=['id','turn_id'], right_on=['id','turn_id'])
test_df = test_df.loc[test_df['input_text_y']!='unknown']

In [9]:
# Removing bad turns
train_val_df = train_val_df.loc[(train_val_df['bad_turn_x'] != 'True') & (train_val_df['bad_turn_y'] != 'True')]

# Removing equal text/answer entries
train_val_df = train_val_df[train_val_df.story != train_val_df.input_text_y]
test_df = test_df[test_df.story != test_df.input_text_y]

# Removing entries with empty answers
train_val_df = train_val_df[train_val_df['input_text_y'].str.len()>0]
test_df = test_df[test_df['input_text_y'].str.len()>0]

In [10]:
# Text preprocess
def preprocess(ds,columns):
    ds = ds.replace(r'\n',' ', regex=True)
#     ds = ds.replace(r'[^\w\s]+', ' ', regex=True)
#     for feature in columns:
#         ds[feature] = ds[feature].str.lower().str.strip()
        
    return ds

columns = ['story', 'input_text_x', 'span_text', 'input_text_y']

train_val_df = preprocess(train_val_df,columns)
test_df = preprocess(test_df,columns)

## [Task 2] Train, Validation and Test splits

CoQA only provides a train and validation set since the test set is hidden for evaluation purposes.

We'll consider the provided validation set as a test set. <br>
$\rightarrow$ Write your own script to:
* Split the train data into train and validation splits (80% train and 20% val)
* Perform splits such that a dialogue appears in one split only! (i.e., split at dialogue level)
* Perform splitting using the following seed for reproducibility: 42
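The requirements above can be sketched with the standard library alone, as an alternative to a grouped splitter such as scikit-learn's `GroupShuffleSplit`. The function name `split_by_dialogue` and the record layout are assumptions for this example. Note that the 80/20 ratio is applied to dialogues, so the ratio of QA pairs is only approximately 80/20.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_dialogue(records: Sequence[Dict], val_fraction: float = 0.2,
                      seed: int = 42) -> Tuple[List[Dict], List[Dict]]:
    """Split QA records so that every dialogue id lands in exactly one split."""
    # Sort before shuffling so the result depends only on the seed
    dialogue_ids = sorted({r["id"] for r in records})
    random.Random(seed).shuffle(dialogue_ids)
    n_val = round(len(dialogue_ids) * val_fraction)
    val_ids = set(dialogue_ids[:n_val])
    train = [r for r in records if r["id"] not in val_ids]
    val = [r for r in records if r["id"] in val_ids]
    return train, val


# Toy data: 10 dialogues with 3 turns each
records = [{"id": f"d{i}", "turn_id": t} for i in range(10) for t in (1, 2, 3)]
train, val = split_by_dialogue(records)

# No dialogue is shared between the two splits
assert not {r["id"] for r in train} & {r["id"] for r in val}
```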

#### Reproducibility Memo

Refer back to tutorial 2 for how to fix a specific random seed for reproducibility!

In [11]:
# Train/Validation Split
set_reproducibility(42)

train_inds, val_inds = next(GroupShuffleSplit(test_size=.20, n_splits=2, random_state = 42).split(train_val_df, groups=train_val_df['id']))

train_df = train_val_df.iloc[train_inds]
val_df = train_val_df.iloc[val_inds].reset_index()

# Add the history_context column to the datasets
train_df = add_history_to_context(train_df)
val_df = add_history_to_context(val_df)
test_df = add_history_to_context(test_df)

Adding the history to each entry of the dataframe...


100%|██████████████████████████████████████████████████████████████████████████████| 5754/5754 [01:05<00:00, 87.43it/s]


History added.
Adding the history to each entry of the dataframe...


100%|██████████████████████████████████████████████████████████████████████████████| 1439/1439 [00:16<00:00, 86.96it/s]


History added.
Adding the history to each entry of the dataframe...


100%|████████████████████████████████████████████████████████████████████████████████| 500/500 [00:05<00:00, 83.67it/s]


History added.
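The history prefix can also be built in a single pass per dialogue by accumulating the previous turns, instead of re-filtering the group for every row. The sketch below works on plain dictionaries; the name `history_contexts` and the keys `question` and `answer` are hypothetical, while the `Q A;` layout and the `</s>` separator follow `add_history_to_context`.

```python
from typing import Dict, List


def history_contexts(turns: List[Dict], story: str, sep: str = "</s>") -> List[str]:
    """Build the history-prefixed context for every turn of one dialogue."""
    contexts = []
    history = ""  # running "Q1A1;Q2A2;..." prefix of the previous turns
    for turn in sorted(turns, key=lambda t: t["turn_id"]):
        # The first turn has no history, so its context is the story alone
        contexts.append(f"{history}{sep}{story}" if history else story)
        history += f"{turn['question']}{turn['answer']};"
    return contexts


story = "Hannah Harvey was a ten year old that had many friends in school."
turns = [
    {"turn_id": 1, "question": "Who is Hannah?", "answer": "a ten year old"},
    {"turn_id": 2, "question": "Where did she live?", "answer": "New York"},
    {"turn_id": 3, "question": "What did she do there?", "answer": "gymnastics and soccer"},
]
contexts = history_contexts(turns, story)
print(contexts[1])
```

Each dialogue is processed in time linear in its number of turns, which may help with the runtime on the full training set.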


In [12]:
# Checking the Dataframes
print(f'Training set [{train_df.shape}]')
print(f'\tFeatures: {list(train_df.columns)}')
display(train_df.head())
#display(train_df.loc[11:15,['id', 'input_text_x', 'input_text_y', 'span_text']])

print(f'Validation set [{val_df.shape}]')
print(f'\tFeatures: {list(val_df.columns)}')
display(val_df.head())
#display(val_df.loc[11:15,['id', 'input_text_x', 'input_text_y', 'span_text']])

print(f'Test set [{test_df.shape}]')
print(f'\tFeatures: {list(test_df.columns)}')
display(test_df.head())
#display(test_df.loc[11:15,['id', 'input_text_x', 'input_text_y', 'span_text']])

Training set [(85823, 12)]
	Features: ['input_text_x', 'turn_id', 'bad_turn_x', 'source', 'id', 'story', 'span_start', 'span_end', 'span_text', 'input_text_y', 'bad_turn_y', 'history_context']


| | input_text_x | turn_id | bad_turn_x | source | id | story | span_start | span_end | span_text | input_text_y | bad_turn_y | history_context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 47030 | Who is Hannah? | 1 | | mctest | 3018q3zvoiqh6tkjkzarysii242ra6 | Hannah Harvey was a ten year old that had many friends in school. [...] | 18 | 32 | a ten year old | a ten year old | | Hannah Harvey was a ten year old that had many friends in school. [...] |
| 47031 | Where did she live? | 2 | | mctest | 3018q3zvoiqh6tkjkzarysii242ra6 | Hannah Harvey was a ten year old that had many friends in school. [...] | 79 | 87 | New York | New York | | Who is Hannah?a ten year old;`</s>`Hannah Harvey was a ten year old that had many friends in school. [...] |
| 47032 | What did she do there? | 3 | | mctest | 3018q3zvoiqh6tkjkzarysii242ra6 | Hannah Harvey was a ten year old that had many friends in school. [...] | 106 | 135 | gymnastics and playing soccer | gymnastics and soccer | | Who is Hannah?a ten year old;Where did she live?New York;`</s>`Hannah Harvey was a ten year old that had many friends in school. [...] |
| 47033 | Did she have any pets? | 4 | | mctest | 3018q3zvoiqh6tkjkzarysii242ra6 | Hannah Harvey was a ten year old that had many friends in school. [...] | 282 | 303 | Jackson, Hannah's dog | a dog | | Who is Hannah?a ten year old;Where did she live?New York;What did she do there?gymnastics and soccer;`</s>`Hannah Harvey was a ten year old that had many friends in school. [...] |
| 47034 | What did it do? | 5 | | mctest | 3018q3zvoiqh6tkjkzarysii242ra6 | Hannah Harvey was a ten year old that had many friends in school. [...] | 291 | 325 | Hannah's dog, was acting different | it was acting different | | Who is Hannah?a ten year old;Where did she live?New York;What did she do there?gymnastics and soccer;Did she have any pets?a dog;`</s>`Hannah Harvey was a ten year old that had many friends in school. [...] |


Validation set [(21452, 13)]
	Features: ['index', 'input_text_x', 'turn_id', 'bad_turn_x', 'source', 'id', 'story', 'span_start', 'span_end', 'span_text', 'input_text_y', 'bad_turn_y', 'history_context']


Unnamed: 0,index,input_text_x,turn_id,bad_turn_x,source,id,story,span_start,span_end,span_text,input_text_y,bad_turn_y,history_context
3212,17365,What sport are they playing?,1,,cnn,3018q3zvoiqh6tkjkzarysii34aary,"(CNN) -- Five-time winner Roger Federer opened his U.S. Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3.",9,60,Five-time winner Roger Federer opened his U.S. Open,tennis,,"(CNN) -- Five-time winner Roger Federer opened his U.S. Open account Monday with a straight sets win over Santiago Giraldo in New York. 
Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3."
3213,17366,What event was it?,2,,cnn,3018q3zvoiqh6tkjkzarysii34aary,"(CNN) -- Five-time winner Roger Federer opened his U.S. Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3.",9,60,Five-time winner Roger Federer opened his U.S. Open,the U.S. Open,,"What sport are they playing?tennis;</s>(CNN) -- Five-time winner Roger Federer opened his U.S. 
Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3."
3214,17367,where?,3,,cnn,3018q3zvoiqh6tkjkzarysii34aary,"(CNN) -- Five-time winner Roger Federer opened his U.S. Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3.",97,137,win over Santiago Giraldo in New York.,in New York.,,"What sport are they playing?tennis;What event was it?the U.S. Open;</s>(CNN) -- Five-time winner Roger Federer opened his U.S. 
Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3."
3215,17368,Who is the five time winner mentioned?,4,,cnn,3018q3zvoiqh6tkjkzarysii34aary,"(CNN) -- Five-time winner Roger Federer opened his U.S. Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3.",8,40,Five-time winner Roger Federer,Roger Federer,,"What sport are they playing?tennis;What event was it?the U.S. Open;where?in New York.;</s>(CNN) -- Five-time winner Roger Federer opened his U.S. 
Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3."
3216,17369,Who does he defeat?,5,,cnn,3018q3zvoiqh6tkjkzarysii34aary,"(CNN) -- Five-time winner Roger Federer opened his U.S. Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3.",96,137,win over Santiago Giraldo in New York.,Santiago Giraldo,,"What sport are they playing?tennis;What event was it?the U.S. 
Open;where?in New York.;Who is the five time winner mentioned?Roger Federer;</s>(CNN) -- Five-time winner Roger Federer opened his U.S. Open account Monday with a straight sets win over Santiago Giraldo in New York. Despite surrendering his serve three times, the 30-year-old Swiss enjoyed a relatively comfortable style=""display:inline"" match against the Colombian, ranked 54 in the world, winning 6-4 6-3 6-2 on the Arthur Ashe Stadium court. Leading 5-1 in the opening set, a number of uncharacteristic errors from Federer saw him squander a double-break advantage before he finally rallied to win 6-4. The second and third sets were more straight-forward, though the world number three will be concerned about his winners-to-unforced errors ratio -- he finished with 36 winners and 35 unforced errors. ""It was quite up and down, getting used to the conditions,"" admitted Federer, in quotes carried by usopen.org. ""I don't think I've ever played my best in the first round but it's important to come through them and come up with a good feeling."" Home favorite Mardy Fish was ruthlessly efficient as he easily dispatched Germany's Tobias Kamke 6-2 6-2 6-1. However fellow American Ryan Harrison was not so fortunate. The 19-year old lost out to big-serving Croat Marin Cilic, 6-2, 7-5, 7-6 (8/6). Seventh seed Gael Monfils ruined the U.S. Open debut of Grigor Dimitrov of Bulgaria with a battling 7-6, 6-3, 6-4 victory, while Czech Tomas Berdych, the number nine seed, beat French qualifier Romain Jouan 6-2, 7-6 (7/4), 6-1. Elsewhere, French 13th seed Richard Gasquet trounced Ukrainian Sergiy Stakhovsky 6-4 6-4 6-0, Serbian Janko Tipsarevic ousted France's Augustin Gensse 6-2 7-5 6-0, while Czech Radek Stepanek beat Germany's Philipp Kohlschreiber 6-4 6-1 6-3."

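The `history_context` values in the samples above appear to follow one pattern: each previous turn is written as question followed directly by answer and a `;`, the turns are concatenated, and the result is joined to the passage with `</s>`. For the first turn of a conversation, the field holds only the passage. The preprocessing code that produced this column is not shown here, so the following is only a minimal sketch inferred from the printed rows. The function name `build_history_context` and the constant `SEP` are hypothetical, and the line break that appears inside the printed fields is not reproduced.

```python
from typing import List, Tuple

# Separator observed in the printed samples between the dialogue history and the passage.
SEP = "</s>"


def build_history_context(history: List[Tuple[str, str]], story: str) -> str:
    """Prepend previous (question, answer) turns to the passage.

    Hypothetical helper: it mirrors the format seen in the samples,
    not necessarily the code that generated them.
    """
    if not history:
        # First turn of a conversation: the samples contain only the passage.
        return story
    # Each turn is rendered as "<question><answer>;" with no space in between.
    turns = "".join(f"{question}{answer};" for question, answer in history)
    return f"{turns}{SEP}{story}"


story = (
    "(CNN) -- Five-time winner Roger Federer opened his U.S. Open account "
    "Monday with a straight sets win over Santiago Giraldo in New York."
)
history = [
    ("What sport are they playing?", "tennis"),
    ("What event was it?", "the U.S. Open"),
]

print(build_history_context([], story))
print(build_history_context(history, story))
```

With this layout, a model of the form $A = f_\theta(Q, P, H)$ can receive $H$ and $P$ as a single text field. Keeping the separator identical to the tokenizer's own separator token is a design choice that should be checked against the tokenizer actually used.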

Test set [(7917, 10)]
	Features: ['input_text_x', 'turn_id', 'source', 'id', 'story', 'span_start', 'span_end', 'span_text', 'input_text_y', 'history_context']


Unnamed: 0,input_text_x,turn_id,source,id,story,span_start,span_end,span_text,input_text_y,history_context
5257,What's Rubio going to decide in the next few weeks?,1,cnn,3018q3zvoiqh6tkjkzarysii2vbarg,"(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class.",49,74,on running for president,running for president,"(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. 
In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class."
5258,Does he feel confident about it?,2,cnn,3018q3zvoiqh6tkjkzarysii2vbarg,"(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class.",10,129,Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could,yes,"What's Rubio going to decide in the next few weeks?running for president;</s>(CNN)Sen. 
Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class."
5259,What policy is he not in favor of?,3,cnn,3018q3zvoiqh6tkjkzarysii2vbarg,"(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class.",371,425,defund the president's executive action on immigration,immigration,"What's Rubio going to decide in the next few weeks?running for president;Does he feel confident about it?yes;</s>(CNN)Sen. 
Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class."
5260,What type of people does he think are dangerous to the West?,4,cnn,3018q3zvoiqh6tkjkzarysii2vbarg,"(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class.",773,828,Radicalized individuals 'very real threat' to the West,Radicalized individuals,"What's Rubio going to decide in the next few weeks?running for president;Does he feel confident about it?yes;What policy is he not in favor of?immigration;</s>(CNN)Sen. 
Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class."
5261,Who are some likely competition to him?,5,cnn,3018q3zvoiqh6tkjkzarysii2vbarg,"(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. 
Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class.",830,906,With power players Mitt Romney and Jeb Bush now considered likely contenders,Mitt Romney and Jeb Bush,"What's Rubio going to decide in the next few weeks?running for president;Does he feel confident about it?yes;What policy is he not in favor of?immigration;What type of people does he think are dangerous to the West?Radicalized individuals;</s>(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. 
Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class."
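
The `history_context` values in the rows above look like the previous question/answer turns, each written as question + answer + `;`, followed by `</s>` and the story. A minimal sketch of that construction, inferred only from the rows shown (the helper name `build_history_context` and its arguments are hypothetical, not the notebook's own code):

```python
def build_history_context(previous_turns, story, sep_token="</s>"):
    """Concatenate earlier (question, answer) turns, then the separator and the story.

    Assumption: the format is inferred from the sample rows, not taken from the notebook.
    """
    history = "".join(f"{question}{answer};" for question, answer in previous_turns)
    return f"{history}{sep_token}{story}"


turns = [
    ("What's Rubio going to decide in the next few weeks?", "running for president"),
    ("Does he feel confident about it?", "yes"),
]
example = build_history_context(turns, "(CNN)Sen. Marco Rubio said ...")
print(example)
```

Keeping the history in front of the separator means the story is the part that gets cut when the second sequence is truncated from the right.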


Now we check whether any dialogue appears in both the training and validation sets.

In [13]:
set_train = set(train_df['id'])
set_val = set(val_df['id'])

# The splits overlap if they share at least one dialogue id
overlap = not set_train.isdisjoint(set_val)

print('Overlap' if overlap else 'No overlap')

No overlap


In [14]:
# Relevant features for the dataset
featurez = ['story', 'input_text_x', 'span_text', 'input_text_y', 'history_context', 'source']

# Map the raw column names to the names used in the rest of the notebook
rename_map = {'input_text_x': 'question', 'story': 'context',
              'input_text_y': 'answer', 'span_text': 'text'}

# Dataframes to Datasets
train_df_to_ds = train_df[featurez].rename(columns=rename_map)
val_df_to_ds = val_df[featurez].rename(columns=rename_map)
test_df_to_ds = test_df[featurez].rename(columns=rename_map)

Since the dataset is large and we care more about justifying our choices than about obtaining the best results, we extract only a portion of it.
The next step is truncating the input lengths. The pre-trained models we test can process sequences of up to 512 tokens, so our truncation length is at most this value. Moreover, we sort the datasets by the combined length of the 'context' and 'question' fields, so that as few examples as possible need truncating.
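
A minimal sketch of the selection described above, assuming a hypothetical `count_tokens` stand-in (whitespace splitting) in place of the real tokenizer. Examples are sorted by combined context and question length, and the kept portion is rounded down to a whole number of batches. The helper names and the toy data are illustrative only.

```python
def count_tokens(text):
    # Stand-in for a real tokenizer: whitespace tokens only (assumption).
    return len(text.split())


def select_shortest(examples, ratio, batch_size):
    """Keep the shortest `ratio` percent of examples, rounded down to full batches."""
    ordered = sorted(
        examples,
        key=lambda ex: count_tokens(ex["context"]) + count_tokens(ex["question"]),
    )
    n_keep = (round(len(ordered) * ratio / 100) // batch_size) * batch_size
    return ordered[:n_keep]


# Toy data: 40 examples whose contexts have 40, 39, ..., 1 tokens.
toy = [{"context": "word " * n, "question": "why ?"} for n in range(40, 0, -1)]
subset = select_shortest(toy, ratio=50, batch_size=8)
print(len(subset))
```

Sorting first means that the examples most likely to exceed the length limit are the ones left out of the subset.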

In [15]:
model_checkpoint_M1 = 'distilroberta-base'
# Tokenizer
tokenizer_M1 = AutoTokenizer.from_pretrained(model_checkpoint_M1)
assert isinstance(tokenizer_M1, PreTrainedTokenizerFast)

# Setting the BOS and EOS token
tokenizer_M1.bos_token = tokenizer_M1.cls_token
tokenizer_M1.eos_token = tokenizer_M1.sep_token
check_tokens(tokenizer_M1)

Special tokens:
bos_token: <s>
eos_token: </s>
unk_token: <unk>
sep_token: </s>
pad_token: <pad>
cls_token: <s>
mask_token: <mask>
bos_token: 0
eos_token: 2
unk_token: 3
sep_token: 2
pad_token: 1
cls_token: 0
mask_token: 50264


In [27]:
def tokenize_string(string_1, string_2=None):
    if string_2 is not None:
        tokens = tokenizer_M1(string_1,string_2)
    else:
        tokens = tokenizer_M1(string_1)
    return tokens

story_lengths = [len(tokenizer_M1(x,y)["input_ids"]) for (x,y) in zip(train_val_df['story'],\
                                                               train_val_df['input_text_x'])]
fig_inputs = px.box(list(story_lengths),
                   title="Tokenized Stories and Questions Lengths Distribution")
fig_inputs.show()

Token indices sequence length is longer than the specified maximum sequence length for this model (715 > 512). Running this sequence through the model will result in indexing errors


In [16]:
max_length_input = 512 #min(512,round(np.quantile(list(set(inputs_lengths)), .05))) 
print(f'Max length:{max_length_input}')

Max length:512


In [29]:
answers_lengths = [len(tokenizer_M1(x)["input_ids"]) for x in train_val_df['input_text_y']]

fig_inputs = px.box(list(answers_lengths),
                    title="Tokenized Answers Lengths Distribution", 
                    color_discrete_sequence=['red'])
fig_inputs.show()

In [17]:
max_length_answer = 13
# max_length_answer = round(np.quantile(list(set(outputs_lengths)), .75))
print(f'Max length (upper fence): {max_length_answer}')

Max length (upper fence): 13


In [31]:
import plotly.graph_objs as go
from plotly.subplots import make_subplots

def plot_sources(train_df, val_df):
    sources = train_df["source"].unique()

    fig = make_subplots(rows=1, cols=2, subplot_titles=('Sources of stories in train set', 'Sources of stories in validation set'))

    fig.add_trace(go.Bar(x=train_df["source"].value_counts().index.tolist(), 
                         y=train_df["source"].value_counts().values.tolist(),
                         name='Train Data',
                         marker=dict(color='#1f77b4')), 
                  row=1, col=1)

    fig.add_trace(go.Bar(x=val_df["source"].value_counts().index.tolist(), 
                         y=val_df["source"].value_counts().values.tolist(),
                         name='Validation Data',
                         marker=dict(color='#ff7f0e')), 
                  row=1, col=2)

    fig.update_layout(height=500, width=1300, showlegend=True, 
                      legend=dict(x=0.8, y=1.15, orientation="h"))

    fig.show()

plot_sources(train_df, val_df)

In [18]:
# Datasets Batch split
batch_size = 8
ratio = 5

train_samples = (round(train_df_to_ds.shape[0] * ratio / 100) // batch_size) * batch_size

val_samples = (round(val_df_to_ds.shape[0] * ratio / 100) // batch_size) * batch_size
test_samples = (round(test_df_to_ds.shape[0] * ratio / 100) // batch_size) * batch_size

train_dataset = Dataset.from_dict(train_df_to_ds.iloc[:train_samples])
val_dataset = Dataset.from_dict(val_df_to_ds.iloc[:val_samples])
test_dataset = Dataset.from_dict(test_df_to_ds.iloc[:test_samples])

dataset_COQA = DatasetDict({'train':train_dataset,'validation':val_dataset,'test':test_dataset})
print(dataset_COQA)

DatasetDict({
    train: Dataset({
        features: ['context', 'question', 'text', 'answer', 'history_context', 'source'],
        num_rows: 4288
    })
    validation: Dataset({
        features: ['context', 'question', 'text', 'answer', 'history_context', 'source'],
        num_rows: 1072
    })
    test: Dataset({
        features: ['context', 'question', 'text', 'answer', 'history_context', 'source'],
        num_rows: 392
    })
})


In [19]:
def prepare_features(batch, tokenizer, max_length_input, max_length_answer,history=False,test=False):
    
    if not test:
        # Character-level window centered on the rationale span.
        # NOTE: `truncated_contexts` is not passed to the tokenizer below, which
        # truncates the second sequence from the right instead.
        truncated_contexts = []
        for context, span, answ in zip(batch['context'], batch['text'], batch['answer']):
            max_length_context = max_length_input - len(answ)
            # Start and end character indices of the span in the context
            # (`str.find` returns -1 when the span is absent)
            start = context.find(span)
            end = start + len(span)

            # Widen the window symmetrically, clipped to the context bounds
            start = max(0, start - (max_length_context - len(span)) // 2)
            end = min(len(context), end + (max_length_context - len(span)) // 2)

            truncated_contexts.append(context[start:end])

    # Tokenize the question together with either the history-augmented context or the plain context
    second_sequence = batch['history_context'] if history else batch['context']
    encoded_batch_inputs = tokenizer(
        batch['question'],
        second_sequence,
        max_length=max_length_input,
        truncation='only_second',
        padding='max_length',
        return_tensors='pt'
    )

    # Tokenize the Answer column
    encoded_batch_labels = tokenizer(
        batch['answer'],
        max_length=max_length_answer,
        padding='max_length',
        truncation=True,
        return_tensors='pt'
    )
    
    encoded_batch_inputs['labels'] = encoded_batch_labels.input_ids

    return encoded_batch_inputs
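
The character window that `prepare_features` computes around the rationale span can be isolated as a small helper. This is a sketch under assumptions: `center_window` is a hypothetical name, the budget is counted in characters rather than tokens, and a span missing from the context falls back to the start of the context.

```python
def center_window(context, span, max_chars):
    """Return a slice of `context` centered on `span`, about `max_chars` long."""
    start = context.find(span)
    if start == -1:
        # Assumption: fall back to the head of the context when the span is absent.
        return context[:max_chars]
    end = start + len(span)
    # Characters of surrounding text kept on each side of the span
    margin = max(0, max_chars - len(span)) // 2
    return context[max(0, start - margin):min(len(context), end + margin)]


context = "a" * 100 + "RATIONALE" + "b" * 100
window = center_window(context, "RATIONALE", max_chars=29)
print(window)
```

Centering on the rationale keeps the evidence for the answer inside the window, which truncating from the right does not guarantee.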

* [M1] DistilRoBERTa (distilroberta-base)

In [20]:
# Tokenizing the Dataset
tokenized_datasets_M1 = DatasetDict()

# Use the `prepare_features` functions
tokenized_datasets_M1['train'] = dataset_COQA['train'].map(
    lambda batch: prepare_features(batch, tokenizer_M1, max_length_input, max_length_answer),
    batched=True,
    batch_size=batch_size,
    remove_columns=dataset_COQA['train'].column_names 
    #remove_columns=[x for x in dataset_COQA['train'].column_names if x != 'source']
)

# Use the `prepare_features` functions
tokenized_datasets_M1['validation'] = dataset_COQA['validation'].map(
    lambda batch: prepare_features(batch, tokenizer_M1, max_length_input, max_length_answer),
    batched=True,
    batch_size=batch_size,
    remove_columns=dataset_COQA['validation'].column_names 
    #remove_columns=[x for x in dataset_COQA['train'].column_names if x != 'source']
)

# Use the `prepare_features` functions
tokenized_datasets_M1['test'] = dataset_COQA['test'].map(
    lambda batch: prepare_features(batch, tokenizer_M1, max_length_input, max_length_answer, test=True),
    batched=True,
    batch_size=batch_size,
    remove_columns=dataset_COQA['test'].column_names 
    #remove_columns=[x for x in dataset_COQA['train'].column_names if x != 'source']
)

print(tokenized_datasets_M1)

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 4288
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 1072
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 392
    })
})


## [Task 3] Model definition

Write your own script to define the following transformer-based models from [huggingface](https://HuggingFace.co/).

* [M1] DistilRoBERTa (distilroberta-base)
* [M2] BERTTiny (bert-tiny)

**Note**: Remember to install the ```transformers``` python package!

**Note**: We consider small transformer models for computational reasons!

When pre-trained weights are loaded into a target model, any layers present in the checkpoint but not in the target model are discarded. Conversely, any layers present in the target model but not in the checkpoint are initialized according to the target model's initialization strategy.

This behavior is expected whenever the architecture of the target model differs from that of the pre-trained model. Here, the encoder drops the `lm_head` weights, while the decoder's cross-attention layers are newly initialized, as the warnings printed on loading report. The mismatch does not by itself imply poor performance, but the newly initialized layers are untrained, so the target model generally has to be fine-tuned on the downstream task to perform well.
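
The two cases can be illustrated with plain sets of parameter names. This is only a sketch: the names below are shortened, hypothetical examples modeled on the warnings printed when the model is loaded, not a real state dict, and `compare_parameter_names` is not a library function.

```python
def compare_parameter_names(checkpoint_names, target_names):
    """Split names into loaded, discarded (checkpoint only) and newly initialized (target only)."""
    checkpoint_names, target_names = set(checkpoint_names), set(target_names)
    return {
        "loaded": sorted(checkpoint_names & target_names),
        "discarded": sorted(checkpoint_names - target_names),
        "newly_initialized": sorted(target_names - checkpoint_names),
    }


# Hypothetical, abbreviated names for illustration.
checkpoint = ["embeddings.word_embeddings.weight", "lm_head.bias"]
target = ["embeddings.word_embeddings.weight", "layer.0.crossattention.self.key.weight"]
report = compare_parameter_names(checkpoint, target)
print(report)
```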

* [M1] DistilRoBERTa (distilroberta-base)

In [23]:
# Load Model
if load_model:
    model_checkpoint_M1 = 'model_M1_42'
    model_M1 = EncoderDecoderModel.from_pretrained(model_checkpoint_M1)
else:
    model_checkpoint_M1 = 'distilroberta-base'
    model_M1 = EncoderDecoderModel.from_encoder_decoder_pretrained(model_checkpoint_M1, model_checkpoint_M1, tie_encoder_decoder=False)

# Model special tokens
model_M1.config.decoder_start_token_id = tokenizer_M1.cls_token_id
model_M1.config.eos_token_id = tokenizer_M1.sep_token_id
model_M1.config.pad_token_id = tokenizer_M1.pad_token_id
model_M1.config.vocab_size = model_M1.config.encoder.vocab_size

# Model hyperparams
model_M1.config.max_length = max_length_answer
model_M1.config.min_length = 1
model_M1.config.no_repeat_ngram_size = 1
model_M1.config.early_stopping = True
model_M1.config.repetition_penalty= 3.
model_M1.config.num_beams = 8

print(f"Parameters #: {model_M1.num_parameters()}")

model_M1.to(device)

Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForCausalLM were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['roberta.encoder.layer.2.crossattention.self.key.weight', 'roberta.encoder.layer.1.crossattention.self.key.weight', 'roberta.encoder.layer.0.crossattention.sel

Parameters #: 178472025


EncoderDecoderModel(
  (encoder): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-5): 6 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): L

In [24]:
# Load Model
if load_model:
    model_checkpoint_M1_H = 'model_M1_H_42'
    model_M1_H = EncoderDecoderModel.from_pretrained(model_checkpoint_M1_H)
else:
    model_checkpoint_M1 = 'distilroberta-base'
    model_M1_H = EncoderDecoderModel.from_encoder_decoder_pretrained(model_checkpoint_M1, model_checkpoint_M1, tie_encoder_decoder=False)

# Model special tokens
model_M1_H.config.decoder_start_token_id = tokenizer_M1.cls_token_id
model_M1_H.config.eos_token_id = tokenizer_M1.sep_token_id
model_M1_H.config.pad_token_id = tokenizer_M1.pad_token_id
model_M1_H.config.vocab_size = model_M1_H.config.encoder.vocab_size

# Model hyperparams
model_M1_H.config.max_length = max_length_answer
model_M1_H.config.min_length = 1
model_M1_H.config.no_repeat_ngram_size = 1
model_M1_H.config.early_stopping = True
model_M1_H.config.repetition_penalty= 3.
model_M1_H.config.num_beams = 8

print(f"Parameters #: {model_M1_H.num_parameters()}")

model_M1_H.to(device)

Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForCausalLM were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['roberta.encoder.layer.2.crossattention.self.key.weight', 'roberta.encoder.layer.1.crossattention.self.key.weight', 'roberta.encoder.layer.0.crossattention.sel

Parameters #: 178472025


EncoderDecoderModel(
  (encoder): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-5): 6 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): L

## [Task 4] Question generation with text passage $P$ and question $Q$ + [Task 6] Train and evaluate $f_\theta(P, Q)$

**[Task 4]**:
We want to define $f_\theta(P, Q)$. 

Write your own script to implement $f_\theta$ for each model: M1 and M2.

#### Formulation

Consider a dialogue on text passage $P$. 

For each question $Q_i$ at dialogue turn $i$, your model should take $P$ and $Q_i$ and generate $A_i$.

**[Task 6]**:
Write your own script to train and evaluate your $f_\theta(P, Q)$.

#### Instructions

* Perform multiple train/evaluation seed runs: [42, 2022, 1337].$^1$
* Evaluate your models with the following metric: SQuAD F1-score.$^2$
* Fine-tune each transformer-based model for **3 epochs**.
* Report the evaluation SQuAD F1-score computed on the validation and test sets.

$^1$ Remember what we said about code reproducibility in Tutorial 2!

$^2$ You can use the ```allennlp``` python package for a quick implementation of the SQuAD F1-score: ```from allennlp_models.rc.tools import squad```.
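
For reference, the SQuAD F1-score mentioned above compares a predicted answer and a reference answer as bags of normalized tokens. The sketch below follows the usual SQuAD v1.1 normalization (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace). It is written for illustration, the function names are hypothetical, and it may differ in details from the `allennlp` implementation.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def squad_f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often as it appears in both
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(squad_f1("15 years", "tennis"))
print(squad_f1("in New York", "New York City"))
```

Because the score depends only on token overlap, a fluent but unrelated answer such as `15 years` scores zero against `tennis`.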

A data collator is a function that assembles several examples from a dataset into a batch 
and applies batch-level operations such as padding or stacking. Padding brings the 
sequences in a batch to a common length, so that they can be stacked into tensors and 
processed together in one forward/backward pass. The collator therefore handles the 
formatting of each batch, allowing the data to be processed efficiently by the deep learning framework.
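
As an illustration of what a collator does, the sketch below pads a list of variable-length examples to the longest one in the batch. It is a simplified stand-in, not the `DataCollatorForSeq2Seq` implementation: the function name `collate_batch` and the toy token ids are hypothetical, the pad id `1` is taken from the tokenizer output shown earlier, and `-100` is the label value that PyTorch's cross-entropy loss ignores by default.

```python
def collate_batch(examples, pad_token_id=1, label_pad_id=-100):
    """Pad `input_ids` and `labels` to the longest sequence in the batch."""
    max_inputs = max(len(ex["input_ids"]) for ex in examples)
    max_labels = max(len(ex["labels"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        n_pad = max_inputs - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding
        batch["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * n_pad)
        batch["labels"].append(ex["labels"] + [label_pad_id] * (max_labels - len(ex["labels"])))
    return batch


# Toy token ids, for illustration only.
examples = [
    {"input_ids": [0, 713, 2], "labels": [0, 9904, 2]},
    {"input_ids": [0, 713, 16, 10, 2], "labels": [0, 2]},
]
batch = collate_batch(examples)
print(batch)
```

Padding labels with an ignored value keeps padded positions from contributing to the training loss.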

* [M1] DistilRoBERTa (distilroberta-base)

In [26]:
# Initialize the data collator
data_collator_M1 = DataCollatorForSeq2Seq(tokenizer=tokenizer_M1, model=model_M1)

epochs = 3

# Training hyperparameters
training_args_M1 = Seq2SeqTrainingArguments(
    output_dir='./M1_Checkpoints',
    evaluation_strategy="epoch",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    overwrite_output_dir=True,
    fp16=True if device == 'cuda' else False, 
    num_train_epochs = epochs,
    weight_decay=0.01,
    logging_steps=10,
    load_best_model_at_end=True,
    save_total_limit=1
)

# Optimizer and scheduler
optimizer_M1 = AdamW(model_M1.parameters(),lr= 5e-5)
train_steps  = epochs * len(tokenized_datasets_M1['train']) // batch_size  # integer number of optimizer steps
scheduler_M1 = transformers.get_cosine_schedule_with_warmup(optimizer=optimizer_M1,num_warmup_steps=50,num_training_steps=train_steps)
optimizers_M1 = optimizer_M1, scheduler_M1

# Trainer definition
trainer_M1 = Seq2SeqTrainer(
    model=model_M1,
    tokenizer=tokenizer_M1,
    args=training_args_M1,
    compute_metrics=lambda pred: compute_metrics(pred, tokenizer_M1),
    train_dataset=tokenized_datasets_M1['train'],
    eval_dataset=tokenized_datasets_M1['validation'],
    optimizers=optimizers_M1,
    data_collator=data_collator_M1
)
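
The learning-rate multiplier produced by a cosine schedule with linear warmup can be sketched in plain Python. This is an approximation of the usual half-cosine shape written for illustration, and the function name `cosine_warmup_multiplier` is hypothetical. The values `50` and `1608` are the warmup and total step counts used in this notebook.

```python
import math


def cosine_warmup_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # Half-cosine decay from 1 down to 0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


base_lr = 5e-5
for step in (0, 25, 50, 829, 1608):
    print(step, base_lr * cosine_warmup_multiplier(step, 50, 1608))
```

The learning rate rises linearly during the first 50 steps, peaks at the base value, and then decays smoothly to zero at the final step.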

In [27]:
os.environ["WANDB_DISABLED"] = "true"

if not load_model:
    # Start the training
    trainer_M1.train()

***** Running training *****
  Num examples = 4288
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 1608
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


Epoch,Training Loss,Validation Loss,Squad F1 Score
1,2.0617,1.942835,0.039101
2,1.5525,1.84626,0.018203
3,2.0232,1.825752,0.013523


Saving model checkpoint to ./M1_Checkpoints\checkpoint-500
Configuration saved in ./M1_Checkpoints\checkpoint-500\config.json
Model weights saved in ./M1_Checkpoints\checkpoint-500\pytorch_model.bin
tokenizer config file saved in ./M1_Checkpoints\checkpoint-500\tokenizer_config.json
Special tokens file saved in ./M1_Checkpoints\checkpoint-500\special_tokens_map.json
Deleting older checkpoint [M1_Checkpoints\checkpoint-500] due to args.save_total_limit
Deleting older checkpoint [M1_Checkpoints\checkpoint-1000] due to args.save_total_limit
Deleting older checkpoint [M1_Checkpoints\checkpoint-1500] due to args.save_total_limit
Deleting older checkpoint [M1_Checkpoints\checkpoint-2000] due to args.save_total_limit
Deleting older checkpoint [M1_Checkpoints\checkpoint-2500] due to args.save_total_limit
Deleting older checkpoint [M1_Checkpoints\checkpoint-3000] due to args.save_total_limit
Deleting older checkpoint [M1_Checkpoints\checkpoint-3500] due to args.save_total_limit
Deleting older c

In [None]:
# Create the output directory if it does not exist yet
os.makedirs('model_M1_42', exist_ok=True)

if not load_model:
    trainer_M1.save_model('model_M1_42')

#### Evaluation on the Validation Set

In [35]:
# Create a DataLoader for the dataset using the data collator
test_loader_M1 = torch.utils.data.DataLoader(tokenized_datasets_M1['validation'], 
                                          batch_size=batch_size, 
                                          collate_fn=data_collator_M1)

# Generate answers
generate_answers(test_loader_M1,model_M1,tokenizer_M1,dataset_COQA['validation'])

['What sport are they playing?', 'What event was it?', 'where?', 'Who is the five time winner mentioned?', 'Who does he defeat?', 'how did federer describe the conditions', 'How did he describe his playing', 'who is the home favorite?']
Generated ans: ['15 years', '15 years', '15 years', '15 years', '15 years', '15 years', '15 years', '15 years']
True ans: ['tennis', 'the U.S. Open', 'in New York.', 'Roger Federer', 'Santiago Giraldo', '"It was quite up and down, getting used to the conditions,"', '"I don\'t think I\'ve ever played my best in the first round but it\'s important to come through them and come up with a good feeling."', 'Mardy Fish is']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


['who is the american?', 'how old is he?', 'did he win?', 'When did Betty start working?', 'What kind of colors does she like to wear?', 'Is Jack interested in fashion?', 'What does he wear in summer?', 'Where has Alice shopped for clothes?']
Generated ans: ['15 years', '15 years', '15 years', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her']
True ans: ['Ryan Harrison', '19', 'no', 'This year.', 'Bright', 'No', 'a T-shirt', 'street markets']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


['Why?', 'What does she often put on her clothes?', 'Does she wear T-shirts too?', 'Who wears different buttons?', 'Does Betty shop for clothes less often than she used to?', 'How often does Alice shop for clothes?', 'Does she tend to wear the same things often?', 'Where do the people talking live?']
Generated ans: ['He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her']
True ans: ['They are cheap.', 'flowers', 'Probably.', 'Alice', 'Yes', 'Once a month.', 'No', 'France']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


['who is the article about?', 'what is his profession', 'where does he teach?', 'what does he want for fathers day', 'what is his festival called?', 'Has he written anything?', 'what?', 'what is it called?']
Generated ans: ['In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city']
True ans: ['Brian Greene,', 'physics and mathematics', 'Columbia University', 'his children to develop a passion for science.', 'World Science Festival.', 'yes', "a children's book", 'Icarus at the Edge of Time."']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


['who is Icarus?', 'how old is he in the story?', 'where is he living?', 'who is driving the ship?', 'what does his father announce?', 'Does he listen to his father?', 'what does he use?', 'did he build it?']
Generated ans: ['In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city']
True ans: ['a boy', '14', 'a space ship', 'Icarus', 'We are making an emergency course diversion to avoid an uncharted black hole', 'no', 'his own small spacecraft.', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['does his plan work?', 'what happens?', 'how far does he travel ahead?', 'is he found?', 'What state did the performance take place in?', 'Was there an audience?', 'How many people?', 'What was the name of the venue?']
Generated ans: ['In the city', 'In the city', 'In the city', 'In the city', 'April,000', 'April,000', 'April,000', 'April,000']
True ans: ['no', 'He miscalculates', '10,000 years into the future', 'yes', 

  5%|████▎                                                                             | 7/134 [00:01<00:30,  4.17it/s]

['Who was the performance in honor of?', 'What had she done to be honored?', 'Who were the singers at the show?', 'Which award had all of them won before?', 'How was McCartney selected as the winner?', 'How many people voted?', 'How long was voting open for?', 'Where could you vote?']
Generated ans: ['April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000']
True ans: ['Liz McCartney', 'helped survivors of Hurricane Katrina', 'Christina Aguilera ,Alicia Keys and John Legend', 'Grammy Award', 'online voting', 'More than 1 million', 'six weeks', 'CNN.com.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


  7%|█████▌                                                                            | 9/134 [00:02<00:27,  4.53it/s]

["Who was the program's host?", "What does Cooper think it's relieving to know?", 'When was the program aired?', 'What song did Keys perform?', 'Which record is that from?', 'Who did Legend perform with?', 'What kind of people does the campaign praise?', 'What song did Aguilera sing?']
Generated ans: ['April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000']
True ans: ['Anderson Cooper,', 'to know that there are people like these heroes', 'Thanksgiving night.', 'Superwoman', 'As I Am', "If You're Out There", 'people who care more for others than they do for themselves', 'Beautiful']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['How long did the first goal take?', 'How much did he cost when he left Manchester United?', 'Who is this story about?', 'What sport is played in this story?', 'Whose shot rebounded off the post?', "What's real's new generation called?", 'Did he play for anyone else?', 'How many did the coach sign?']

  7%|██████                                                                           | 10/134 [00:02<00:26,  4.69it/s]

['Does Ronaldo celebrate after signing?', 'What was the score before he signed on?', 'what city is the most populous in New York', 'how many people live there', 'what does it border', 'who does it have the same boundaries as', 'since when', 'is it the most densely populated county']
Generated ans: ['15 years', '15 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years']
True ans: ['yes', '2-1', 'Brooklyn', '2,629,150 residents', 'the borough of Queens at the southwestern end of Long Island', 'Kings County', '1896', 'no']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


  8%|██████▋                                                                          | 11/134 [00:02<00:26,  4.68it/s]

['which is', 'which county is the 4th smallest', 'with new york gone, what rank would Brooklyn have under most populous in US', 'After who', 'and who else', 'Brooklyn was independent until what year', 'what is there official motto', 'What is the starting date of Mosaic?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14th years']
True ans: ['New York', 'Kings County', 'third', 'Los Angeles', 'Chicago', '1898', 'Unity makes strength', 'the 3rd millennium BC.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


  9%|███████▎                                                                         | 12/134 [00:03<00:26,  4.57it/s]

['Where did it start?', 'Where have Bronze age pebble mosaics been located?', 'In what part of Greece were Pebble Mosaics made?', 'During what period did mosaic art flourish?', 'What was the period between 6th to the 15th century named?', 'In which century were mosaics found in Macedonia ?', 'In what city?', 'At what location?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['Mesopotamia', 'in Tiryns', 'Mycenean', 'the 6th to the 15th centuries', 'during the Byzantine Empire', '4th century BC', 'Aegae', 'in the palace-city']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 10%|███████▊                                                                         | 13/134 [00:03<00:34,  3.55it/s]

['What was found in Durres?', 'When?', 'Who sent for Benson?', 'Was Benson old or young?', 'How about the Captain?', 'Why did he not send Benson to Fort Prescott?', 'Who had Benson brought?', 'Where did he bring them?']
Generated ans: ['14th years', '14th years', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['The Beauty of Durrës', '4th-century BC', 'Captain Moore', 'old', 'young', 'he wanted him there', 'Joe and Darry', 'the fort']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.14285714285714288, 0.0, 0.0]


 10%|████████▍                                                                        | 14/134 [00:03<00:38,  3.16it/s]

['Does the Captain want Benson to abandon the boys?', 'Where should the boys go back to?', 'Who is willing to die fighting?', "Whose heart will break if they boys don't return?", 'Who reached out to shake hands first?', 'Who needed to ask for the right to go with the men?', 'Where was he going?', 'What did he need there?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could was has had to a the be', 'He could was has had to a the be', 'He could was has had to a the be']
True ans: ['no', 'East.', 'Captain Moore', "his mother's", 'Benson', 'Berenger', 'first hostel', 'make arrangements for the relay of horses']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 11%|█████████                                                                        | 15/134 [00:04<00:38,  3.11it/s]

['What else?', 'Who is she?', 'When was Torx created?', 'by who?', 'Is it trademarked?', 'What is it?', "What's special about it?", 'What do people like to call it?']
Generated ans: ['He could was has had to a the be', 'He could was has had to a the be', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['reception of Veronique', "Eustacie's maid,", '1967', 'Camcar Textron,', 'yes', 'a type of screw head', 'a 6-point star-shaped pattern', 'star']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 12%|█████████▋                                                                       | 16/134 [00:04<00:34,  3.44it/s]

['Why?', 'Is there an official generic name?', 'What is it?', 'Does this have a shortened version?', 'What/', 'Who made this the official name?', 'Are there other types of screw heads?', 'What are they?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['as in star screwdriver', 'Yes', 'hexalobular internal', 'Yes', '6lobe', 'International Organization for Standardization', 'Yes', 'Phillips head or slot head screws']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 13%|██████████▎                                                                      | 17/134 [00:04<00:31,  3.71it/s]

['Which one makes the driver cam out?', 'Why?', 'What do Torx stop?', 'What does a slot head do best?', 'Where are Torx starting to get more popular?', 'Do you see them on cars?', 'What is the subject of the article?', 'Where is it?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14th years', '14th years']
True ans: ['Phillips heads', 'to prevent overtightening', 'cam-out.', 'achieve a desired torque consistently.', 'in construction industries.', 'Yes', 'Toulouse', 'France']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 13%|██████████▉                                                                      | 18/134 [00:04<00:29,  3.91it/s]

['What industry is it the center of?', 'What is headquartered there?', 'What river is the does the city lie on?', 'Is it a large city?', 'What is the latest stated population?', 'Is it considered the large city in France?', 'Is there a university there?', 'IS it new?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['the European aerospace industry', 'Airbus', 'the River Garonne', 'Yes', '466,297', 'No', 'Yes', 'No']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 14%|███████████▍                                                                     | 19/134 [00:05<00:34,  3.35it/s]

['How old is it?', 'Is it the largest university in France?', 'Had Emily seen Mrs. Ellmother recently?', 'Did she look well?', 'Was anyone with her?', 'How were clothes?', 'what stood out?', 'What did Emily say she was afraid had happened to her?']
Generated ans: ['14th years', '14th years', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['It was founded in 1229', 'No', 'no', 'no', 'yes', 'loose', 'the big bones of her face', 'suffering from illness,']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 15%|████████████                                                                     | 20/134 [00:05<00:37,  3.03it/s]

['how did she reply?', 'and what did she want?', 'were the women the same age?', 'Who was introduced to Francine?', 'by who?', 'was Francine old?', 'Did Mrs Ellmother enter the room swiftly?', 'how then?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ["It's the life I'm leading that wears me down", 'want work and change', 'no', 'Mrs. Ellmother', 'Emily', 'no', 'no', 'reluctantly']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 16%|████████████▋                                                                    | 21/134 [00:06<00:39,  2.86it/s]

['did she have a nickname?', 'who gave it to her?', "how did she take Emily's hand?", 'did she sound the same asalways?', 'how then?', 'and?', 'When did Ben Franklin create glasses', 'what kind']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['who gave it to her?', 'her late mistress', 'doubtingly.', 'no', 'with hardly a vestige left of her former firmness of voice', 'manner.', '1784', 'bifocal']
SQuAD F1 score: [0.26666666666666666, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 16%|█████████████▎                                                                   | 22/134 [00:06<00:39,  2.85it/s]

['what kind did the chinese use?', 'what did they use them for?', 'Did Leah like to eat fish when she was young?', 'What did she want to know about fish?', 'and what did her dad anwser?', 'What did Leah want to know after that?', 'What did her dad anwser to that?', 'Was he going to show her how to catch a fish?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['colored glasses', 'fashion', 'yes', 'Where fish come from', 'People have to catch them', 'how to catch a fish', 'People have caught fish from a pole, line, and hook for a long time', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.20000000000000004, 0.22222222222222224, 0.0, 0.0]


 17%|█████████████▉                                                                   | 23/134 [00:06<00:37,  2.95it/s]

['What did he want to take with them?', 'What did Leah pick for lunch?', 'Did she bring any other food?', 'Where did they drive to?', 'What time of day was it?', 'What did her dad show her how to put on the hook?', 'What else did he say fish like?', 'Whas it sunny out?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['Lunch', 'fries with ketchup and a bean sandwich', 'candy bears', 'a nearby lake', 'early', 'a worm', 'grasshoppers, corn, or tiny fish', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 18%|██████████████▌                                                                  | 24/134 [00:07<00:37,  2.94it/s]

['Did she get bored?', 'What happends after a short wait?', 'What did leah do next?', 'Was she happy to catch a fish?', 'Who was in walmart?', 'what did he pick up in the store?', 'Who saw him?', 'what did he say?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['yes', 'the float went under', 'lifted up her line and took the small fish out of the water.', 'yes', 'John Crawford', 'A BB gun', 'Ronald Ritchie', 'A black man was "walking around with a gun in the store," and "pointing it at people,"']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.10526315789473682]


 19%|███████████████                                                                  | 25/134 [00:07<00:36,  2.96it/s]

['Then what?', 'Did he live?', 'What were his last words?', 'who shot him?', 'Where was the Walmart located ?', 'How old was he?', 'What is the hardest job in sports?', 'Who hated it?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'toa a the father', 'toa a the father']
True ans: ['Crawford was shot', 'No', '"It\'s not real"', 'Police', 'Beavercreek', '22', 'Hockey enforcers', 'The players']
SQuAD F1 score: [0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 19%|███████████████▋                                                                 | 26/134 [00:07<00:34,  3.17it/s]

['Which one is speaking about his hatred in the article?', 'How long has he been playing?', 'Who is he speaking too?', 'Where is it at?', 'What sad thing is happening to enforcers during this summer time?', 'How many?', 'Who is the most recent?', 'Where was he found?']
Generated ans: ['tothe the his', 'toa a the father', 'toa a the father', 'toa a the father', 'tothe the his', 'toa a the father', 'toa a the father', 'toa a the father']
True ans: ['Georges Laraque', '12-year', 'Cybulski & Company radio program', 'Canada', 'Deaths', 'Three', 'Wade Belak', 'his apartment']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 20%|████████████████▎                                                                | 27/134 [00:07<00:32,  3.33it/s]

['Where was his apartment located?', 'How many hockey teams are there?', 'Who is one of the most fearsome enforcer?', 'What was his nickname?', 'Was he currently playing?', 'Why not?', 'What happened to him?', 'Was it a homicide?']
Generated ans: ['toa a the father', 'toa a the father', 'toa a the father', 'toa a the father', 'toa a the father', 'toa a the father', 'toa a the father', 'toa a the father']
True ans: ['Toronto', '30', 'Derek Boogaard', 'Boogeyman', 'No', 'trying to recover from concussions sustained in on-ice bouts.', 'was found dead', 'No']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 21%|████████████████▉                                                                | 28/134 [00:08<00:31,  3.32it/s]

['Where was he found?', 'How old was he?', 'What kind of mother does Lulu have?', 'How many people and dogs pile into a bed?', 'What does the narrator call her household?', "What is the narrator's name?", 'What does she call her mom?', 'Who wrote a book?']
Generated ans: ['toa a the father', 'toa a the father', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her']
True ans: ['in his Minneapolis home', '28', 'a strict mother', 'six', 'Tiger', 'Sophia', 'Tiger Mom', 'Tiger Mom']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 22%|█████████████████▌                                                               | 29/134 [00:08<00:31,  3.30it/s]

['What is her book called?', "How long did Sophia spend making her mom's card?", 'What occasion was the card for?', 'True or False: Tiger Mom loved the card.', 'Why not?', 'What instrument does Sophia play?', 'Who received 2 nominations for the Nobel Prize?', 'Who praised Shen?']
Generated ans: ['He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', 'He was could it to a her', '14,000', '14,000']
True ans: ['Battle Hymn of the Tiger Mother', '30 second;', 'birthday', 'no', "because Sophia didn't put her heart into it", 'piano', 'Shen Congwen', 'Mo Yan']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.28571428571428575, 0.0, 0.0, 0.0]


 22%|██████████████████▏                                                              | 30/134 [00:08<00:29,  3.58it/s]

['Why did Mo feel close to Shen?', 'Who else did Mo commend?', 'And?', 'Did Mo finish school?', 'How much school did he complete?', 'What about Shen?', 'What did they do after quitting school?', 'What did Mo call this experience?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['they have similar life experiences', 'Lu Xun', 'Mao Dun', 'no', 'fifth grade', 'high school', 'joined the army', 'learning from the book of life']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 23%|██████████████████▋                                                              | 31/134 [00:08<00:27,  3.80it/s]

['Are the themes of their writing similar?', 'What is the basis of their writings?', "What is Mo's hometown?", "And Shen's?", 'What did Mo say he learned from Shen?', 'What is unusual for Shen in his writings?', 'Does Mo try to replicate this in his writing?', 'Why?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['yes', 'their hometown', 'Gaomi', 'Xiangxi', 'how to deal with characters in a fiction', 'he has a humanistic touch towards all of his characters', 'yes', 'it shows the ability of a novelist']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 24%|███████████████████▎                                                             | 32/134 [00:09<00:30,  3.30it/s]

['What quality did Mo learn from Lu?', 'And Lao?', 'Which army is mentioned?', 'Did they win?', 'Who disagreed?', 'Were there injuries?', 'Whose ear was hurt?', 'And whose hat was damaged?']
Generated ans: ['14,000', '14,000', 'He could had was would to not a be the it is', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['depth', 'humor', 'Cosy Moments', 'yes', 'Billy Windsor', 'yes', "Comrade Brady's", 'Psmith']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 25%|███████████████████▉                                                             | 33/134 [00:09<00:33,  3.00it/s]

['Who was removed from the scheme?', "Where wouldn't Psmith want to meet him?", 'What name does Psmith call him?', 'Who may have he hit?', 'Or perhaps who else?', 'Does Psmith think Repetto will be around forever?', 'Where might he go?', 'A large or small one?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Comrade Repetto', 'in a lonely road', "Nature's sand-baggers", 'his nurse', 'his young brother', 'no', 'in a cell', 'small']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 25%|████████████████████▌                                                            | 34/134 [00:10<00:35,  2.84it/s]

['Who will put him there?', 'For a long time?', 'Does Billy agree?', 'What does he think Repetto will do?', 'How many men caught him?', 'How many toughs would swear differently?', 'Where will they say he was?', 'Does this person consider themself to be a joker?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He was could to it.']
True ans: ['the Law', 'at least a brief spell', 'no', 'prove an alibi', 'three', 'thirty', 'five miles away', 'not any more']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 26%|█████████████████████▏                                                           | 35/134 [00:10<00:33,  2.93it/s]

['Was it because of a traumatic experience?', 'What happened?', 'Did he call for help?', 'Who did he call to?', "Why didn't they come?", 'Why?', "Did he call for help when he didn't need it?", 'Where were they swimming?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['yes', "couldn't swim", 'yes', 'his friends', 'thought he was joking', 'he always played jokes', 'yes', 'in the sea']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.4444444444444445, 0.22222222222222224, 0.0, 0.0]


 27%|█████████████████████▊                                                           | 36/134 [00:10<00:32,  2.99it/s]

['Did they wind up saving his life?', 'How?', 'Who else did he play jokes on?', 'How did he joke with her?', 'Was that a lie?', 'How did his mother respond?', 'What did she say?', 'Who is Susie Wolff?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', '15 years']
True ans: ['yes', 'They took him to the hospital', 'his mother', 'told her that his brother fell from the open window', 'yes', 'shouted at him', '"If you do it again, I\'ll hit you."', 'A Williams development driver.']
SQuAD F1 score: [0.0, 0.20000000000000004, 0.0, 0.0, 0.0, 0.0, 0.15384615384615385, 0.0]
['How long has she worked for Willams?', 'Who played in the Wimbledon titles match mentioned in the story?', 'Who was her opponent?', 'Who won the match?', 'What was the score?', 'How old is the victor now?', 'How many grand slams has she won?', 'When did she capture her last one?']

 29%|███████████████████████▌                                                         | 39/134 [00:11<00:23,  4.03it/s]

['At what venue?', 'Was this match an easy one for her?', "Was Serena's opponent healthy for the match?", 'What was wrong with her?', 'How many games in a row did Williams win at the end?', 'How did she celebrate her triumph?', 'Who did she hug?', 'Anyone else?']
Generated ans: ['15 years', '15 years', '15 years', '15 years', '15 years', '15 years', '15 years', '15 years']
True ans: ['The All England Club.', 'No.', 'No.', 'A respiratory problem.', 'Five.', "By climbing into the players' box .", 'Her sister Venus.', 'Her physiotherapist Esther Lee.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['How long was she out of action?', 'What caused her inactivity?', 'What was the sickness?', 'Was she calm after she won?', 'Did she think she would get another Wimbledon title?', 'what infected the population?', 'what does it consist of?', 'where is it located?']
Generated ans: ['15 years', '15 years', '15 years', '15 years', '15 years', '14 years', '14 years', '14 years']

 30%|████████████████████████▏                                                        | 40/134 [00:11<00:25,  3.65it/s]

['how many did it kill?', 'how many americans?', 'how long has it been around?', 'what percentage is active?', 'is it deadly?', 'what is the abbreviation for it?', 'how many got sick?', 'What was the name of the victim?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', 'He could had was to not a the be']
True ans: ['1.5 million deaths', '5–10%', 'ancient times.', '10%', 'yes', 'TB', 'One', 'Petra Anderson']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 31%|████████████████████████▊                                                        | 41/134 [00:12<00:28,  3.30it/s]

['How old was she?', 'What was a miracle?', 'Was her pastor involved?', 'What is this type of miracle called?', 'What does that mean?', 'What is his example for that?', 'Was she born with this problem?', 'What was she shot with?']
Generated ans: ['He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be']
True ans: ['22', 'brain abnormality', 'No', 'prevenient grace', 'God working ahead of time for a particular event in the future', 'Batman', 'Yes', 'shotgun']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 31%|█████████████████████████▍                                                       | 42/134 [00:12<00:29,  3.11it/s]

['Was the damage life threatening?', 'Where is she from?', 'What was the religious leaders name?', 'What congregation did he belong to?', 'Which is where?', 'What is his title?', 'Where did the bullet stop?', 'How many pieces hit her?']
Generated ans: ['He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be']
True ans: ['yes', 'Colorado', 'Brad Strait', 'Cherry Creek Presbyterian Church', 'Englewood, Colorado', 'senior pastor', 'rear of her brain', 'Three']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 32%|█████████████████████████▉                                                       | 43/134 [00:12<00:30,  3.01it/s]

['How many struck her face?', 'How many in total?', 'What company is this article from?', "Who are Michelle Obama's girls?", 'Were they getting heavy?', 'What do the studies show about diets', 'Who is susan ringwood?', 'Do girls idolize their mothers?']
Generated ans: ['He could had was to not a the be', 'He could had was to not a the be', 'He could had was to not a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be']
True ans: ['one', 'Four', 'CNN', 'Malia and Sasha', 'yes', 'they show that the more children diet, the more likely they are to become obese', 'chief executive of the eating disorders charity', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.1111111111111111, 0.0, 0.0]


 33%|██████████████████████████▌                                                      | 44/134 [00:13<00:29,  3.05it/s]

['Is there a genetic component to eating disorders', 'Should food be" good or bad"?', 'Should food be feared?', 'What is the pressure on girls today?', 'How old is Carly?', 'Who does she blame?', 'Who has an eating message?', 'When did Carly go on her first diet?']
Generated ans: ['He was had to a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be', 'He was had to a the be']
True ans: ['yes', 'no', 'no', 'immense', '40', 'her mother', "America's First Lady", 'when she was 10']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.22222222222222224]


 34%|███████████████████████████▏                                                     | 45/134 [00:13<00:31,  2.83it/s]

['What did she lose?', 'Is it wrong just to blame mothers?', 'Who always competed?', 'Was it healthy?', 'What did George find great interest in?', 'Did he ever find someone who collected them?', 'From where?', 'What did he have?']
Generated ans: ['He was had to a the be', 'He was had to a the be', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it']
True ans: ['puppy fat', 'yes', 'George and Richard', 'no', 'old dictionaries', 'yes', 'Australia', 'a first edition']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 34%|███████████████████████████▊                                                     | 46/134 [00:13<00:32,  2.71it/s]

['A common one?', 'Who thought they would have millions of dollars?', 'Did he?', 'How many of them ended up with a bookstore?', 'Where were they located?', 'Were they very successful?', 'Why not?', 'Who quit first?']
Generated ans: ['He could was had to not a be of the her', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was had to not a be of the her']
True ans: ['No', 'Richard', 'no', 'Both', 'Coleford High Street', 'no', 'hard to make much money from books', 'Richard']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.11764705882352941, 0.0]


 35%|████████████████████████████▍                                                    | 47/134 [00:14<00:34,  2.55it/s]

['Did this make the other one very happy in the long run?', 'What was the uncommon package he received covered in?', 'What did he notice was on it?', 'Of whom?', 'Was it his obituary?', 'What was it then?', 'Who had an impulse?', 'What did she order?']
Generated ans: ['He could was had to not a be of the his it', 'He could was have had to a be the not it would', 'He could was had to not a be of the his it', 'He could was had to not a be of the his it', 'He could was have had to a be the not it would', 'He could was had to not a be of the his it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Yes', 'newspaper', 'a photo', 'Richard', 'No', 'A story about a bookseller.', 'Mrs. Linley', 'a carriage']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 36%|█████████████████████████████                                                    | 48/134 [00:14<00:35,  2.44it/s]

['Did she want use if for herself?', 'How much time must pass before Westerfield can be brought back?', 'To where?', 'Who did Linley write a letter to?', 'Who did she send it by?', 'Who had a sense of duty?', 'Does she have a son or daughter?', 'Did Presty discover things?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Yes', 'An hour', 'Mount Morven;', 'Mrs. MacEdwin,', 'by her maid', 'Mrs. Presty', 'A daughter.', 'Yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 37%|█████████████████████████████▌                                                   | 49/134 [00:15<00:36,  2.30it/s]

['Who loved Kitty?', 'What is the title of this chapter?', 'And the number?', 'How was the governess received?', 'Who declared, "How very ungrateful"', 'What brightened the interval of expectation?', 'What had just departed?', "Where did Linley's freedom of action begin?"]
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Sydney', 'The Husband', '17', 'With utmost kindness', 'Mrs. Presty', 'A domestic event.', 'The carriage', 'At the bedside.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 37%|██████████████████████████████▏                                                  | 50/134 [00:15<00:36,  2.31it/s]

['Did it end there, too?', "What had been hanging from Lightfoot's antlers?", 'What kind of animal was Peter?', 'What kind of animal was Jumper?', 'Were they related?', 'How?', 'What kind of animal was Lightfoot?', 'When did Peter last see him?']
Generated ans: ['He could had was would to not a be the have it', 'He could was had to a the be.', 'He could was had to a the be.', 'He could was had to a the be.', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could was had to a the be.', 'He could had was would to not a the it is have']
True ans: ['Yes', 'rags', 'a rabbit', 'a hare', 'yes', 'they were cousinss', 'a deer', 'since the last winter']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 38%|██████████████████████████████▊                                                  | 51/134 [00:16<00:35,  2.35it/s]

['Did Lightfoot say he lost something?', 'What?', 'Did he say he got new ones?', "Was Peter able to believe Lightfoot's story?", 'Who was able to believe it?', "What's one of the reasons that Jumper does?", "What's another reason ?", 'Was Jumper big?']
Generated ans: ['He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have']
True ans: ['yes', 'his antlers', 'yes', 'no', 'Jumper the Hare', "because he saw Lightfoot's old antlers after they had fallen off", 'and he often saw Lightfoot while his new ones were growing', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.1904761904761905, 0.09523809523809525, 0.0]


 39%|███████████████████████████████▍                                                 | 52/134 [00:16<00:34,  2.40it/s]

['Had he been eavesdropping?', 'Why was Peter snappish with him?', 'Did Peter want to believe Lightfoot?', 'Did Lightfoot like to tell everybody things they might not believe?', 'What did Peter say to Lightfoot about it?', 'Who was angry?', 'From what country?', 'Why were they angry?']
Generated ans: ['He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', 'He could had was would to not a the it is have', '14,000', '14,000', '14,000']
True ans: ['yes', 'Jumper had startled him', 'yes', 'no', '"I\'m trying to believe it,"', 'workers', 'France', 'proposed layoffs']
SQuAD F1 score: [0.0, 0.14285714285714288, 0.0, 0.0, 0.26666666666666666, 0.0, 0.0, 0.0]


 40%|████████████████████████████████                                                 | 53/134 [00:16<00:29,  2.79it/s]

['How many people would be laid off?', 'Was the company willing to compromise?', 'What did they do to express their anger?', 'When?', 'Did they let anyone out?', 'Who?', 'Why?', 'How many others did they keep inside?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['more than 700', 'no', 'they held executives hostage', 'Tuesday', 'yes', 'Mr. Petit', 'he has heart problems', 'four']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 40%|████████████████████████████████▋                                                | 54/134 [00:17<00:25,  3.13it/s]

['Were they trying to hurt them?', 'What city did this incident start in?', 'How many people were protesting in front of the building?', 'What was the company called?', 'Who is its vice president?', 'What does Bernard Patrick do?', 'Who did he speak to?', 'How many workers participated in the hostage situation?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['no', 'Grenoble', 'About 500', 'Caterpillar', 'Chris Schena', 'union representative', 'CNN', 'Hundreds']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 41%|█████████████████████████████████▏                                               | 55/134 [00:17<00:23,  3.34it/s]

["Who is a spokesman for the workers' union?", 'Whose chief announced a new manifesto?', 'What does he say?', 'Does he declare the United States a threat to the world?', "What's the name of the chief?", 'Is Hezbollah a political party in Lebanon?', 'Has Hezbollah claimed responsibility for terrorist attacks?', 'Is he rejecting the terrorist label?']
Generated ans: ['14,000', 'Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe']
True ans: ['Nicolas Benoit', "Hezbollah's", 'he calls for the liberation of Jerusalem', 'Yes', 'Hassan Nasrallah', 'Yes', 'Yes', 'Yes']
SQuAD F1 score: [0.0, 0.0, 0.2222222222222222, 0.0, 0.0, 0.0, 0.0, 0.0]


 42%|█████████████████████████████████▊                                               | 56/134 [00:17<00:22,  3.54it/s]

['What does he say about Hezbollah?', 'Which counties were praised by him?', 'Does Nasrallah appear in public?', 'how old was she?', 'what happened for 2 weeks?', 'when was the first such case known?', 'who was it?', 'how did she speak after?']
Generated ans: ['Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city']
True ans: ['it is a "resistance" force', 'Iran and Syria', 'No', '47', 'she was speechless', 'during the Second World War', 'a Norwegian woman', 'with a German-sounding accent']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 43%|██████████████████████████████████▍                                              | 57/134 [00:17<00:20,  3.78it/s]

['did Hasnip have the same problem?', 'did she call anyone?', 'who?', 'who else had the problem?', 'what accent did she have?', 'did others feel that she had it?', 'what needs to happen to get the problem?', 'any other reasons?']
Generated ans: ['In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city']
True ans: ['she found herself talking with what seemed to be a French accent', 'yes', 'a friend', 'an English woman named Annie', 'a Scottish accent', 'no', 'damage to several parts of the brain', 'a stroke']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 43%|███████████████████████████████████                                              | 58/134 [00:17<00:18,  4.00it/s]

['is the accent real?', 'who says it is not?', 'is this problem rare?', 'did the people like the norwegian woman after?', 'who was she injured by?', 'what had happened to Wendy to get it?', 'how do the victims use syllables?', 'What nation did Alexander rule?']
Generated ans: ['In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', 'In the city', '14 years']
True ans: ['no', 'a phonetician', 'yes', 'no', 'the German military', 'a brain injury', 'they lengthened them', 'Evidence: As the forceful king of Macedonia']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 44%|███████████████████████████████████▋                                             | 59/134 [00:18<00:18,  4.15it/s]

['Who would like to play basketball at the White House?', 'How old is he?', 'Where is he from?', "What's his current occupation?", 'For what network?', 'Who did he vote for?', 'Has he met him?', 'How many three point shots has he made?']
Generated ans: ['In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison']
True ans: ['Reggie Miller', '43', 'Indiana Pacers', 'an NBA analyst', 'TNT', 'Obama', 'no', '2,560']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 45%|████████████████████████████████████▎                                            | 60/134 [00:18<00:17,  4.12it/s]

['Has Obama played basketball?', 'When?', 'For whom?', 'What does he want to add to the White house?', 'Where would he put it?', 'What type of belief system is Christianity?', 'WHat is it founded on?', 'How many believers are there?']
Generated ans: ['In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison', 'EU of the Europe', 'Republican of the Europe', 'EU of the Europe']
True ans: ['yes', '1979', 'Punahou High School', 'a basketball court', 'in place of the bowling alley', 'monotheisti', 'on the life and teachings of Jesus Christ', 'over 2.4\xa0billion followers']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.28571428571428575, 0.0, 0.2, 0.0]


 46%|████████████████████████████████████▊                                            | 61/134 [00:18<00:17,  4.07it/s]

['Does that mean it is a minor religion?', 'How does it rank among the global faiths?', 'In how many nations is it popular?', 'WHat do they call the main figure in their religion?', 'Do they have another name for him?', 'WHat is it?', 'What is written about his life?', 'Why is it called that?']
Generated ans: ['EU of the Europe', 'Republican of the Europe', 'Republican of the Europe', 'EU of the Europe', 'EU of the Europe', 'EU of the Europe', 'EU of the Europe', 'EU of the Europe']
True ans: ['no', "It is the world's largest religion", 'in 158 countries and territories', 'Jesus Christ', 'yes', 'Son of God', 'gospel', 'because it means good news']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.3333333333333333, 0.0, 0.0]


 46%|█████████████████████████████████████▍                                           | 62/134 [00:19<00:20,  3.57it/s]

['WHo wrote them?', 'What choice does a family have when a member is diagnosed with dementia?', 'What choice did the Lazzara family make?', 'Who was diagnosed with dementia?', 'When?', 'Who largely bore the responsibility for his care?', 'How was she related to Lazzara Sr.?', 'What were some of her duties?']
Generated ans: ['EU of the Europe', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['Matthew, Mark, Luke, and John', 'place them in a facility, or care for them themselves', 'cared for them at home', 'Anthony Lazzara Sr.', '1996', 'Gail', 'he was her father-in-law', 'She fed him, bathed him and changed his diapers']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4444444444444445, 0.0]


 47%|██████████████████████████████████████                                           | 63/134 [00:19<00:21,  3.26it/s]

["Why couldn't her husband help?", 'What was his career?', 'Did they care for Lazzara Sr. until his death?', 'When did they stop caring for him at home?', 'Where did he go, then?', 'At what age did he die?', 'In what year?', 'In what war had he served?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['he was on the road', 'truck driver', 'no', 'Two years ago', 'a local Veterans Affairs facility', '95', '2008', 'World War II']
SQuAD F1 score: [0.4444444444444445, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 48%|██████████████████████████████████████▋                                          | 64/134 [00:19<00:22,  3.11it/s]

['What career did he also share with his son?', 'Did it take a toll on the marriage between Gail and Anthony?', 'Did they divorce?', 'Are Gail and Anthony the same age?', 'How old are they?', "Why didn't Gail leave the relationship?", 'What was the other name for Operation Sea Lion', 'What is it?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', '14th years', 'Spain of the France']
True ans: ['truck driver', 'yes', 'no', 'yes', '56', "she couldn't walk away from her father-in-law", 'Operation Sealion', "Nazi Germany's code name for the plan for an invasion of the United Kingdom during the Battle of Britain in the Second World War."]
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09090909090909091]


 49%|███████████████████████████████████████▎                                         | 65/134 [00:20<00:20,  3.33it/s]

['Who was the German supreme leader?', 'What was his other title?', 'After what event he divided about this invasion?', 'Did he really wanted to invade?', 'What was the alternative?', 'Did it succeed?', 'Did he make  preparations for amphibious attack?', 'Was his force experienced for that?']
Generated ans: ['Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France']
True ans: ['Adolf Hitler', 'the German Führer', 'the Fall of France', 'no', 'a peace agreement', 'no', 'no', 'no']
SQuAD F1 score: [0.0, 0.0, 0.6666666666666666, 0.0, 0.0, 0.0, 0.0, 0.0]


 49%|███████████████████████████████████████▉                                         | 66/134 [00:20<00:19,  3.50it/s]

['Which force was better suited for that?', 'Where they did that?', 'When?', 'Did he try air and naval superiority instead?', 'Over which channel?', 'Did his force make it?', 'Did his High Command doubt it?', 'How about he himself?']
Generated ans: ['Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France']
True ans: ['the Japanese', 'at the Battle of Wuhan', 'in 1938', 'yes', 'the English Channel', 'no', 'yes', 'he had serious doubts about the prospects for success']
SQuAD F1 score: [0.0, 0.28571428571428575, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 50%|████████████████████████████████████████▌                                        | 67/134 [00:20<00:21,  3.11it/s]

["what was put over Dick's head?", 'did he know where he was being taken?', 'whose idea was it to use the boat?', 'why did he want to use it?', 'did it work?', 'where did he think he was in relation to camp?', 'what did they say to Dick when they first took him?', 'what did he think was going to happen?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['It was a large towel', 'No', "It was Jackson's idea", 'To bewilder Dick, so he would not know where he was being taken', 'Yes', 'The head of the lake', 'They said "Don\'t dare to make a sound!"  "If you do, you\'ll be struck senseless"', 'He had thought they were g

 51%|█████████████████████████████████████████                                        | 68/134 [00:21<00:22,  2.94it/s]

['did he think he was going to be hazed?', 'how did he feel about that?', 'what was Dick doing before they took him?', 'do he go to sleep right away?', 'was he worn out?', 'What school did Jeremey Shu-How Lin graduate from?', 'Did he have an athletic scholarship there?', 'What sport does he play']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', '15 years', '15 years', '15,000']
True ans: ['Yes', 'He was far from pleased', 'He was retiring as usual for the night', 'Yes', 'Yes', 'Harvard University', 'no', 'basketball']
SQuAD F1 score: [0.0, 0.26666666666666666, 0.23529411764705882, 0.0, 0.0, 0.0, 0.0, 0.0]


 51%|█████████████████████████████████████████▋                                       | 69/134 [00:21<00:19,  3.27it/s]

["He'a a first. How?", 'Where was he born?', 'When?', 'Who did he play for his first season?', 'Was he drafted?', 'He was the first NBA player to what?', "What's the name of his following?", 'What does the Associated Press call him?']
Generated ans: ['15,000', '15 years', '15 years', '15 years', '15 years', '15,000', '15 years', '15 years']
True ans: ['the first Chinese-American professional basketball player', 'California,', '1988', 'Golden State Warriors.', 'no', 'record at least 20 points and seven assists in each of his first four starts,', 'Linsanity.', 'the most Surprising story in the NBA"']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 52%|██████████████████████████████████████████▎                                      | 70/134 [00:21<00:19,  3.35it/s]

['How long did he play with the Houston Rockets?', "What day did Rick and his friend's play?", 'And where did they play at?', 'Who took them there?', 'and who is she to Rick?', "And what were the friend's names?", 'What did Rick play on first?', 'And Chris']
Generated ans: ['15 years', 'He could to it.', 'He could to it.', 'He could to it.', 'He was could to it.', 'He was could to it.', 'He could to it.', 'He was could to it.']
True ans: ['less than two weeks', 'Tuesday', 'at a playground near his house', 'Trish', 'his mom', 'Andrew and Chris', 'the monkey bars', 'the swings']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 53%|██████████████████████████████████████████▉                                      | 71/134 [00:21<00:18,  3.38it/s]

['what about Andrew?', 'Where did the mom sit?', 'doing what?', 'what happened at 6?', 'so what did Trish do?', 'Who followed Trish to her car?', "Who didn't make it to the car?", 'why?']
Generated ans: ['He was could to it.', 'He could to it.', 'He was could to it.', 'He could to it.', 'He was could to it.', 'He was could to it.', 'He could to it.', 'He was could to it.']
True ans: ['the slide', 'on a bench', 'reading', 'it started to rain', 'put her book inside of her jacket', 'Rick and Andrew', 'Chris', 'he tripped and fell']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.22222222222222224]


 54%|███████████████████████████████████████████▌                                     | 72/134 [00:22<00:19,  3.25it/s]

['Was he hurt?', 'How so?', 'What did the mom have to put over it?', 'What did Ben want Tom to do with him?', 'Could he go?', 'What was his chore?', 'Who asked him to do it?', 'what day was this?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He could to it.', 'He was could to a the play.', 'He was could to a the play.', 'He was could to a the play.', 'He was could to a the play.', 'He was could to a the play.']
True ans: ['yes', 'He scabbed his knee', 'a bandage', 'to go swimming', 'no', 'whitwashing the fence', 'his Aunt Polly', 'Saturday']
SQuAD F1 score: [0.0, 0.22222222222222224, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0]


 54%|████████████████████████████████████████████▏                                    | 73/134 [00:22<00:20,  3.03it/s]

['Did he have money to burn in his pockets?', 'Who was the first victim of his con?', 'Who took the opportunity for a a flying toy?', 'Did anyone trade a puppy?', 'Who was shouting out loud?', 'who is going to prison?', 'what is his occupation?', 'is he 60 years old?']
Generated ans: ['He was could to a the play.', 'He could had was to not a the it liked would have', 'He was could to it.', 'He was could to a the play.', 'He was could to a the play.', 'April,000', 'April,000', 'April,000']
True ans: ['no', 'Ben Rogers', 'Billy', 'no', 'Ben Rogers', 'Wesley Snipes', 'Actor', 'No']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 55%|████████████████████████████████████████████▋                                    | 74/134 [00:22<00:17,  3.36it/s]

['what is his age?', 'what is the name of the prison?', 'what city is it in?', 'which state?', 'why was he prosecuted?', 'how many years did he fail to submit the required documents?', 'which years?', 'did he make a great deal of money?']
Generated ans: ['April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000']
True ans: ['48', 'McKean Federal Correctional Institution', 'Lewis Run', 'Pennsylvania', 'for not filing tax returns', 'three', '1999, 2000 and 2001', 'Yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 56%|█████████████████████████████████████████████▎                                   | 75/134 [00:23<00:16,  3.63it/s]

['how much?', "why didn't he submit the required documents?", 'who conducted the discussion?', 'on what show?', 'on which channel?', 'on which day of the week?', 'was it conducted during the day?', 'when was it conducted?']
Generated ans: ['April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000']
True ans: ['$40 million dollars', 'he was involved in a tax resisters group.', 'Larry King', '"Larry King Live"', 'CNN', 'Tuesday', 'No', 'at night.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 57%|█████████████████████████████████████████████▉                                   | 76/134 [00:23<00:15,  3.84it/s]

['how long is his prison term?', 'What channel is Mad Men on?', 'Is it a hit?', 'What season is it?', 'Who are some of the characters?', 'What decade does it portray?', 'What are a lot of people excited to see?', "How does Janie Lambert remember the late '60s?"]
Generated ans: ['April,000', 'Same.', 'Same.', 'Same.', 'Same.', 'Same.', 'Same.', 'Same.']
True ans: ['3 years', 'AMC', 'yes', 'seventh season', 'Don, Peggy, Pete', "'60s", 'the fabulous clothes .', 'more colorful and vibrant.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 57%|██████████████████████████████████████████████▌                                  | 77/134 [00:23<00:17,  3.26it/s]

['How many men are there?', 'Where?', 'At what establishment?', 'Where were they seated?', 'Who is working on their behalfs?', 'Who has the happy outlook?', 'What was his reaction?', 'Did he have the appearance of a grown man?']
Generated ans: ['He could had was would to not a the be it have', 'He could had was would to not a the be it have', 'He could had was would to not a the be it have', 'He could had was would to not a the be it have', 'He could was had to the a be of his', 'He could was had to a the be of his', 'He could had was would to not a the be it have', 'He could had was would to not a the be it have']
True ans: ['Three', 'New York', 'An exclusive club', 'Their accustomed table', 'Chicago brokers', 'Higgins', 'He shrugged', 'No']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.16666666666666669, 0.0]


 58%|███████████████████████████████████████████████▏                                 | 78/134 [00:24<00:19,  2.94it/s]

['What did they sell?', 'How much?', 'Was there a bill of lading?', 'Did he appear to be dishonest?', 'Who needed to be called?', 'Who questioned the agent?', 'And who else?', 'What was his last name?']
Generated ans: ['He could had was would to not a the be it have', 'He could had was would to not a the be it have', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Stock', "Ten million dollars' worth", 'no', 'no', 'the police', 'Matt', 'Andy', 'Dilks']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 59%|███████████████████████████████████████████████▊                                 | 79/134 [00:24<00:19,  2.77it/s]

['When did he take off?', 'What did he steal?', 'Who did he drive off with?', 'What is Billy?', 'And it was hitched to?', "What was Matt's occupation?", 'Who worked with him?', 'Was the thief emboldened?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Less than two hours ago', 'the cases of goods', 'Billy', 'the horse', 'a wagon', 'auctioneer.', 'Andy', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 60%|████████████████████████████████████████████████▎                                | 80/134 [00:24<00:20,  2.69it/s]

['Was there an order?', 'Was it in writing?', 'Was the bandit a woman?', 'How old was Matt?', 'Who was stared at?', "What does the Gospel of Luke is account of who's life?", "Who was barren before Mary's conception?", 'At the time of her betrothal Joseph was what age?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', '14th years', '14th years', '14th years']
True ans: ['yes', 'yes', 'no', 'he was young', 'the freight agent', "Mary's", 'Anne', 'thirty']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.30769230769230765, 0.0, 0.0, 0.0, 0.0]


 61%|█████████████████████████████████████████████████▌                               | 82/134 [00:25<00:15,  3.45it/s]

['How old was Mary?', 'Who was present at the Crucifixion of Jesus?', 'Was her body corrupt? (Mary)', 'What is known as the Assumption?', "Which Gospel begins with Mary's life?", 'Hannah took who to the Tabernacle?', 'Which Angel appeared to Mary?', 'Who is the player that is the hero of this story?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '15 years']
True ans: ['12–14 years old', 'Mary', 'no', 'her incorrupt body was assumed directly into Heaven', 'The Gospel of Luke', 'Samuel', 'Gabriel', 'Karim Benzema']
SQuAD F1 score: [0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['What did he do?', 'What team does he play for?', 'In what minute did he score the goal?', 'Which player was frustrated during the game?', 'Why?', 'How far is he behind the record holder?', 'Who is his chief rival?', 'What number does Messi have?']
Generated ans: ['15 years', '15 years', '15 years', '15 years', '15 years', '15 years', '15 years', '15 y

 62%|██████████████████████████████████████████████████▏                              | 83/134 [00:25<00:13,  3.72it/s]

['What team does he face this Wednesday?', 'is the article discussing a land locked country?', 'what is it?', 'where?', 'what is it called?', 'anything else?', 'is it the 12 biggest island in the world?', 'what rank is it?']
Generated ans: ['15 years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['Ajax', 'No', 'a large island', 'in the north Atlantic Ocean', 'Great Britain', 'Britain', 'no', '9th']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 63%|██████████████████████████████████████████████████▊                              | 84/134 [00:25<00:12,  3.93it/s]

['how many people live there?', 'as of when?', 'is that the largest population of any island?', 'what is the most populated one?', 'when did it become a single kingdom?', 'what made the joining official?', 'what two lands combined?', 'what Scottish monarch ruled the land?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['aboutt 61\xa0million people,', '. In 2011', 'No', 'Java in Indonesia and Honshu in Japan.', '1707', '1707 Acts of Union.', 'Kingdom of Englandand the Kingdom of Scotland', 'King Jame']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 63%|███████████████████████████████████████████████████▍                             | 85/134 [00:25<00:11,  4.10it/s]

['when?', 'when did the UK form?', 'Who is the article about?', 'How many titles has he won so far?', 'Who is he up against for his seventh final?', 'Who did he just beat?', 'Is he happy?', 'How many times has he lost at the French Open?']
Generated ans: ['14th years', '14th years', '15 years', '15 years', '15 years', '15 years', '15 years', '15 years']
True ans: ['16003', '1922', 'Rafael Nadal', '11 career grand slam titles.', 'David Ferrer', 'Nicolas Almagro', 'yes', 'once']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 64%|███████████████████████████████████████████████████▉                             | 86/134 [00:26<00:11,  4.22it/s]

['Have him and his opponent in the semi final played together before?', 'Who does he consider to be one of the best players in the world?', 'How many grand slams does Bjorg have?', 'Who died?', 'Anyone else?', 'Where?', 'How?', 'What did they wear?']
Generated ans: ['15 years', '15 years', '15 years', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the']
True ans: ['Yes', 'Ferrer,', '11', 'Five Al-Shabaab militants', 'Three AU soldiers and a civilian', 'Somalia', 'detonated a car bomb at the entrance', 'Somali military uniforms']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 65%|████████████████████████████████████████████████████▌                            | 87/134 [00:26<00:10,  4.29it/s]

['Did a fight breakout?', 'Who is starting the fight?', 'Why?', 'At what place?', 'Where there?', 'What is there?', 'Is it the biggest?', 'When did Vienna start their death registry?']
Generated ans: ['Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', '14,000']
True ans: ['yes', 'Al-Shabaab militants', 'to implement a stricter form of Islamic law', 'Somalia', 'Mogadishu', 'the Halane military base', 'yes', '1607']
SQuAD F1 score: [0.0, 0.0, 0.22222222222222224, 0.0, 0.0, 0.0, 0.0, 0.0]


 66%|█████████████████████████████████████████████████████▏                           | 88/134 [00:26<00:10,  4.34it/s]

['When does it end?', 'How was it written in the beginning?', 'Who has been looking it over?', 'From where?', 'What are they hoping to find?', 'When did Mozart die?', 'What month?', 'How old was he?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['1920', 'it was handwritten', 'Dr. Richard H.C. Zegers and his colleagues', 'the University of Amsterdam', "clues to Mozart's death", '1791', 'December', '35']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 66%|█████████████████████████████████████████████████████▊                           | 89/134 [00:26<00:10,  4.38it/s]

['Are there rumors about how he died', 'Who might have poisoned him?', 'Where could he have picked up trichinosis?', 'Can that be fatal?', 'What did he do while on his deathbed?', 'What?', 'Who painted a picture of him?', "Why couldn't he flip over?"]
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['yes', 'jealous rivals', 'undercooked pork', 'yes', 'sing', '"Requiem"', 'Johann Georg Edlinger', 'his body was swollen']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 67%|██████████████████████████████████████████████████████▍                          | 90/134 [00:27<00:10,  4.28it/s]

['What is a current theory on his cause of death?', 'From what?', 'What do people with Americanphobia not like?', 'What can they dislike about the United States?', 'What specific policies might they dislike?', 'Is there other names for Americanphobia/', 'How many?', 'What are they?']
Generated ans: ['14,000', '14,000', 'Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe', '14 years', 'Republican of the Europe', 'Republican of the Europe']
True ans: ['kidney damage', 'a strep infection', 'the United States', 'the people', 'foreign policy', 'yes', 'two', 'Anti-Americanism and anti-American sentiment']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 68%|███████████████████████████████████████████████████████                          | 91/134 [00:27<00:10,  4.23it/s]

['What did the term start as?', 'What did it change into?', "Who is Brendon O'Connor?", 'Where is he employed?', 'Does he this this is a consistent thing?', 'Who is Marie-France Toinet?', 'Does she feel the term is justified if it does not imply systematic opposition?', 'How much of America does she say it needs to include?']
Generated ans: ['Republican of the Europe', 'Republican of the Europe', 'Republican of the Europe', '14 years', 'Republican of the Europe', '14 years', 'Republican of the Europe', 'Republican of the Europe']
True ans: ['a composite of stereotypes, prejudices and criticisms', 'a politically based criticism', 'A political scientist', 'the United States Studies Centre', 'no', 'a French scholar', 'no', 'all of it']
SQuAD F1 score: [0.2222222222222222, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3333333333333333]


 69%|███████████████████████████████████████████████████████▌                         | 92/134 [00:27<00:11,  3.68it/s]

['Where are the strongest negative opinions of the U.S.?', 'How many other areas?', 'Is it strong in part of Europe?', 'What about China', 'Where is it the weakest?', 'How does the U.S. try to act?', 'Where did Clinch find himself?', "On what day did Michelle's wedding take place?"]
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', 'Republican of the Europe', 'He was had could to a the be of her', 'He was had to a the be.']
True ans: ['in the Arab world', 'four', 'yes', 'yes', 'Sub-Saharan Africa and most parts of Southeast Asia', 'like a world policeman', 'the palace', 'October 3']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 69%|████████████████████████████████████████████████████████▏                        | 93/134 [00:28<00:11,  3.44it/s]

['What year?', 'How long after that did was this article written?', 'What will she be doing on her anniversary?', 'Is this her first choice?', 'How does she feel about it?', 'What is her last name?', 'What is her maiden name?', 'Where did they get married']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['1992', 'Twenty years', 'attending a presidential debate', 'no', 'excited', 'Robinson', 'Robinson', 'Chicago']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 70%|████████████████████████████████████████████████████████▊                        | 94/134 [00:28<00:12,  3.27it/s]

['Who is her husband?', 'What will he be doing on that night?', 'Who is he debating with?', 'Who was the president on this night?', 'What is she worried about?', 'What rule does she mention?', 'Is she concerned about her performance?', 'Who does she feel will be critiquing her?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['Barack Obama', 'debate', 'Romney', 'Obama', 'following the rules', "you don't want to clap", 'yes', 'all eyes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.20000000000000004, 0.0, 0.0]


 72%|██████████████████████████████████████████████████████████                       | 96/134 [00:28<00:10,  3.60it/s]

['Is she the best person to judge her husband?', 'What does she think she should not be judging?', 'What city will they be in?', 'How old is Wang Le?', 'What is WeChat?', 'How many people use it?', 'Who runs it?', 'Where are they located?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', '15 million', '15 million', '15 million', '15 million', '15 million']
True ans: ['no', 'style or techniques', 'Denver', '28', 'a moblie messaging system', '400 million', 'Tencent', 'China']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]

["They are China's largest what?", 'What color envelopes are money given in?', 'Is this a Japanese tradition?', 'What then?', 'When are these gifts given?', 'What companies are making money off of this tradition?', 'Red symbolizes what?', 'What payment system is run by Alibaba?']
Generated ans: ['15 million', '15 million', '15 million', '15 million', '15 million', '15 million', '15 million', '15 million']
True a

 72%|██████████████████████████████████████████████████████████▋                      | 97/134 [00:29<00:09,  3.84it/s]

['How many users does it allow to send gifts?', "What is Alibaba's music app?", 'What did WeChat do to it?', 'Was this a long time ago?', 'What does Alipay control 82.6 percent of?', 'What percent does Tenpay account for?', 'What company is Tenpay associated with?', 'How many men were convicted of plotting a sea hijack?']
Generated ans: ['15 million', '15 million', '15 million', '15 million', '15 million', '15 million', '15 million', '14,000']
True ans: ['190 million', 'Xiami', 'blocked it', 'no', 'the Chinese mobile phone payment market', '10', 'Tencent', 'Five']
SQuAD F1 score: [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 73%|███████████████████████████████████████████████████████████▏                     | 98/134 [00:29<00:08,  4.03it/s]

['What cast were they?', 'How many years are they sentenced to prison?', 'Who were they caught by?', 'In what month and year?', 'Did they claim to be innocent?', 'Did the court accept it?', 'Was their ship destroyed?', 'Who were the pirates then handed over to?']
Generated ans: ['14 years', '14,000', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years']
True ans: ['Somali', 'Five', 'Danish navy', 'January 2009', 'Yes', 'No', 'Yes', 'Dutch authorities']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 74%|███████████████████████████████████████████████████████████▊                     | 99/134 [00:29<00:10,  3.43it/s]

['Was anything relevant to fishing found in their boat?', 'How many days do they have to file an appeal?', 'Who thought there was a silence before the storm?', 'Who was forbidden to enter the house?', 'Whose house was it?', 'Who stood in the centre of the group?', 'Were they holding hands?', 'Whose elbow was on the mantlpiece?']
Generated ans: ['14 years', '14 years', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['No', 'Two weeks (14 days)', 'Saton', 'Saton', 'Mr. Rochester', 'Saton and Lois', 'yes', 'Rochester']
SQuAD F1 score: [0.0, 0.3333333333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 75%|███████████████████████████████████████████████████████████▋                    | 100/134 [00:30<00:11,  3.05it/s]

['Did he look mad?', 'What did his face look like?', 'Who looked anxiously at them?', 'And who else stood on the side, holding his peace?', "Had Lois consented to be Saton's wife?", "Did he still need Rochester's approval?", "Where would they go if they couldn't be married there?", 'What chapter is this?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['no', 'expressionless', 'Mary', 'Vandermere', 'yes', 'no', 'the Comtesse', 'XXXVI']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 75%|████████████████████████████████████████████████████████████▎                   | 101/134 [00:30<00:11,  2.85it/s]

["What's the name of it?", 'What seemed to ebb slowly away in the silence?', 'Did something terrify Mary?', 'Who seemed terrified?', 'Who did the onus of more speech rest with?', 'How many conditions did Saton obey?', 'what date did Peter force his way in?', 'the time?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could was had to a be of the his it would', 'He could was had to not a the be']
True ans: ['THE CHARLATAN UNMASKED', 'courage', 'No', 'Pauline', 'Saton', 'One', 'March 2002', '5 pm']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 76%|████████████████████████████████████████████████████████████▉                   | 102/134 [00:30<00:11,  2.73it/s]

['which room did Will see him in first?', 'how many storeys in the house?', 'was Peter clean?', 'what did Will think he was?', 'how did Will trap him?', 'where did the police take him?', 'what charges?', 'who cantacted Will a few weeks later?']
Generated ans: ['He could was had to not a the be', 'He could was had to a be of the his her', 'He could was had to a be of the his it would', 'He could was had to not a the be', 'He could was had to a be of the his it would', 'He could was had to not a the be', 'He could was had to not a the be', 'He could was had to not a the be']
True ans: ['bedroom', 'Five', 'no', 'a thief', 'pulled his jacket down to trap his arms', 'Pentonville Prison', 'breaking and entering', 'Kim Smith']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.22222222222222224, 0.0, 0.0, 0.0]


 77%|█████████████████████████████████████████████████████████████▍                  | 103/134 [00:31<00:11,  2.64it/s]

['what was being trialled?', 'how long had Peter spent in jail?', 'what did he hope for?', 'AND', "What is Peter's Surname?", 'how old are they?', "where is Will's home?", 'how do they greet each other?']
Generated ans: ['He could was had to a be of the his it would', 'He could was had to not a the be', 'He could was had to not a the be', 'He could was had to a be of the his her', 'He could was had to a be of the his it would', 'He could was had to a be of the his her', 'He could was had to a be of the his her', 'He could was had to not a the be']
True ans: ['restorative justice', '18 years', 'he could get clean', 'do something useful', 'Woolf', '55', 'North London', 'with a hug']
SQuAD F1 score: [0.0, 0.0, 0.36363636363636365, 0.0, 0.0, 0.0, 0.0, 0.0]


 78%|██████████████████████████████████████████████████████████████                  | 104/134 [00:31<00:11,  2.64it/s]

['were they always friends?', 'how long had Peter been taking drugs?', 'What is the highest of the bishops?', 'What religion has a see?', 'How many of those have an archiepiscopical rank?', 'Where is one country where there is only one see?', 'What is another?', 'Why are there only one there?']
Generated ans: ['He could was had to a be of the his her', 'He could was had to not a the be', '14th years', '14th years', 'Spain of the France', '14th years', '14th years', '14th years']
True ans: ['no', '30 years', 'archbishop', 'the Roman Catholic Church', '77', 'Luxembourg', 'Luxembourg', 'too small to be divided into several dioceses']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 78%|██████████████████████████████████████████████████████████████▋                 | 105/134 [00:32<00:11,  2.58it/s]

['Who is normally in charge of a see?', 'Who had a part of their body that was hooked', 'What part of her body?', 'How many chins did she have', 'What is she compared to', 'Is she fat?', 'Is she short?', 'Is she a creature?']
Generated ans: ['14th years', 'He could had was would to not a be the have it', 'He could had was would to not the be a it is', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['bishop', 'Madame Coutras did', 'her nose', 'three', 'a ship in full sail', 'Yes', 'No', 'Yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 79%|███████████████████████████████████████████████████████████████▎                | 106/134 [00:32<00:10,  2.56it/s]

['What kind?', 'Did she like to talk?', 'What was she wearing', 'What does the Doctor have', 'Who gave it to him', 'What is the doctors name', 'Does the narrator want to see it?', 'Where is the role of teacher normally carried out?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'Sederal']
True ans: ['imposing', 'Yes', 'straight-fronted corsets', 'the picture', 'Strickland', 'Coutras', 'Yes', 'at a school or other place of formal education']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 80%|███████████████████████████████████████████████████████████████▉                | 107/134 [00:32<00:09,  2.96it/s]

['Do some teachers have to continue their education after they qualify to teach?', 'What do teachers use lesson plans for?', 'Where do teachers normally obtain their professional qualifications?', 'What are some subjects teachers provide in structions in?', 'Does formal education ever take place through home schooling?', 'What is the process of contunining their formal education after they have become qualified to teach called?', 'In different cultureds to teachers roles vary?', 'Who assists in informal learning?']
Generated ans: ['Sederal', 'July,000', 'Sederal', 'Sederal', 'Sederal', 'Sederal', 'Sederal', 'Sederal']
True ans: ['yes', 'to facilitate student learning', 'a university or college', 'the arts, religion, civics, community roles, or life skills', 'yes', 'professional development', 'yes', 'a family member, or by anyone with knowledge or skills in the wider community setting']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 81%|████████████████████████████████████████████████████████████████▍               | 108/134 [00:32<00:07,  3.31it/s]

['Is the role of a teacher ongoing?', 'What does the article say Einstein exscaped?', 'Did the FBI have a file on him?', 'How long was it?', 'What injustices was he vocal about?', 'Was that it?', 'How many times was he married?', 'Did he have any kids?']
Generated ans: ['Sederal', '15 years', '15 years', '15 years', '15 years', '15 years', '15 years', '14,000']
True ans: ['yes, often', "Hitler's Germany", 'Yes', '1,400 pages', 'racial prejudice', 'No, Fascism too', 'twice', 'Yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 81%|█████████████████████████████████████████████████████████████████               | 109/134 [00:33<00:07,  3.30it/s]

['What were his five papers from 1905 about?', 'What theory did he pen around a decade later?', 'What inventions are listed as being made possible due to his ideas and theories?', 'How did he feel about his children?', 'What did he use to charm people?', 'How old was Chuck when he came home from the hospital?', 'Why did he want to come home?', 'What did he give the person in the story?']
Generated ans: ['15 years', '15 years', '15 years', '14,000', '15 years', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['Space and time', 'Theory of relativity.', 'Computers and satellites are two of them', 'He was indifferent toward them', 'Poetry', 'thirteen', 'to die', 'a piece of crumpled paper']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.28571428571428575, 0.0]


 82%|█████████████████████████████████████████████████████████████████▋              | 110/134 [00:33<00:07,  3.20it/s]

['What was written on the paper?', 'What was on the list?', 'What was one of the times he had fun with his family?', 'Why did his family get pulled over?', "What was the policewoman's reaction to the costumes?", 'What joke did she tell?', 'Why did his dad say they were speeding?', 'who died?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was had to a the be.']
True ans: ['a list', 'all the fun he had with his family', 'when they dressed up as fruits', 'speeding', 'she laughed', '"Where are you all heading -- a salad bar?"', 'because his kids were getting so ripe that they were starting to draw flies', 'Olivia Wise']
SQuAD F1 score: [0.0, 0.16666666666666666, 0.0, 0.0, 0.0, 0.0, 0.10526315789473682, 0.0]


 83%|██████████████████████████████████████████████████████████████████▎             | 111/134 [00:33<00:07,  3.12it/s]

['on what day?', 'did he die of natural causes?', 'what killed her?', 'was she 47 when she passed?', 'how old was she?', 'was she well known?', 'for what?', 'did she do that in her bathroom?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['Monday', 'Yes', 'a brain tumor', 'No', '16', 'Yes', 'she recorded a song and it went viral', 'No']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 84%|██████████████████████████████████████████████████████████████████▊             | 112/134 [00:34<00:07,  3.05it/s]

['where did she do it?', 'where was it located?', 'what month did this happen?', 'was it an original work?', 'what was it?', "who's?", 'what was it called?', 'is that a popular work?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.']
True ans: ['in a studio', 'Toronto', 'September', 'No', "a cover of someone else's song", 'Katy Perry', 'Roar', 'Yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 84%|███████████████████████████████████████████████████████████████████▍            | 113/134 [00:34<00:07,  2.86it/s]

['where did she release it?', 'when?', 'did the original artist know about it?', 'did she like it?', 'who was weary ?', 'of what ?', 'what did they disire ?', 'what would they welcome ?']
Generated ans: ['He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He was had to a the be.', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not']
True ans: ['YouTube', 'October', 'Yes', 'Yes', 'England', 'Samoa', 'peace', 'result']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 85%|████████████████████████████████████████████████████████████████████            | 114/134 [00:35<00:07,  2.72it/s]

['of what ?', 'who was found of smoma ?', 'was he famous ?', 'did the story teller like becker ?', 'why ?', 'and what else ?', 'who followed him ?', 'did he think he could fix something ?']
Generated ans: ['He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had to a the be of his is', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had to a the be of his is']
True ans: ['German settlement', 'Dr. Knappe', 'yes', 'no', 'seems to me both false', 'foolish', 'Knappe', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.13333333333333333, 0.0, 0.0, 0.0]


 86%|████████████████████████████████████████████████████████████████████▋           | 115/134 [00:35<00:07,  2.63it/s]

['what was it ?', 'with who ?', 'how had fever ?', 'and where was the stay ?', 'who was ready to wink ?', 'during what ?', 'how many atates are mentioned ?', 'name one ?']
Generated ans: ['He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not', 'He could was has had is to a the be of not']
True ans: ['breach', 'English consul', 'New Guinea', 'islands', 'English consul', 'the process', 'Three', 'suspicion']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 87%|█████████████████████████████████████████████████████████████████████▎          | 116/134 [00:35<00:06,  2.93it/s]

['When was the Linux kernel released?', 'By whom?', 'True or False: Most Linux distributions are just the kernel.', 'What is something many Linux distributions share?', 'Do they also share libraries?', 'Where do many of those come from?', 'What organization uses the GNU name?', 'Does the Chromebook use something from Linux?']
Generated ans: ['EU of the Europe', 'EU of the Europe', 'EU of the Europe', 'EU of the Europe', 'Republican of the', 'EU of the Europe', 'EU of the Europe', 'Republican of the']
True ans: ['September 17, 1991', 'Linus Torvalds', 'false', 'use of the word "Linux" in their name', 'yes', 'GNU project', 'TiVo', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0]


 87%|█████████████████████████████████████████████████████████████████████▊          | 117/134 [00:36<00:05,  3.20it/s]

['What?', 'What market does it lead?', 'Does the Chromebook cost less than three hundred dollars?', 'What percentage of notebooks sold for less than three hundred are Chromebooks?', 'What was Linux first made for?', 'What structure was it based on?', 'Is it on phones now?', 'What kind?']
Generated ans: ['EU of the Europe', 'EU of the Europe', 'Republican of the', 'Republican of the', 'EU of the Europe', 'EU of the Europe', 'Republican of the', 'EU of the Europe']
True ans: ['Chrome\xa0OS', 'US K–12 education', 'yes', 'nearly 20%', 'personal computers', 'Intel x86', 'yes', 'smartphones']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 88%|██████████████████████████████████████████████████████████████████████▍         | 118/134 [00:36<00:05,  2.98it/s]

['What is an example of a big iron system?', 'And one more?', 'True or False: Linux is used by the majority of desktop computers.', 'What percentage uses it?', 'Who had died?', 'What did the servant need to get?', 'For whom?', 'What else did he need to get ready?']
Generated ans: ['Republican of the', 'EU of the Europe', 'EU of the Europe', 'EU of the Europe', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['mainframe computers', 'TOP500 supercomputers.', 'no', '2.3%', 'Isabella', 'To get mourning', 'His daughter', 'Arrange a room']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.15384615384615383, 0.0, 0.0]


 89%|███████████████████████████████████████████████████████████████████████         | 119/134 [00:36<00:05,  2.81it/s]

['For whom?', 'How was the news of the tragedy delivered?', 'How was it decorated?', 'Whose home-coming did it announce?', 'Who was excited about the home-coming?', 'What was she wearing the night of the return?', 'Was she very sad?', 'Did she go to greet them alone?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ["For the master's nephew", 'By letter', 'Edged with black', "The master's", 'Catherine', 'A black frock', 'No', 'No']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 90%|███████████████████████████████████████████████████████████████████████▋        | 120/134 [00:37<00:05,  2.68it/s]

['Who was she talking about?', 'Was he older than her, or younger?', 'By how much?', 'Did they walk quickly?', 'What had her aunt sent her?', 'Where did she keep it?', 'What was the name of the servant who had accompanied her?', 'Did she walk to the gate?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Linton', 'Younger', 'Six months', 'No', "A lock of Linton's hair", 'In a little glass box', 'Ellen', 'No']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 90%|████████████████████████████████████████████████████████████████████████▏       | 121/134 [00:37<00:04,  2.62it/s]

['where are people sending cards?', 'a person?', 'who is sending things?', 'from where?', 'where is Big Ben?', 'where at?', 'what is it?', 'how is it displayed?']
Generated ans: ['He could was had to a be of the his it is', 'He could was have had to a be the not it is', 'He could was had to a be of the his it is', 'He could was have had to a be the not it is', 'He could was had to a be of the his it is', 'He could was have had to a be the not it is', 'He could was have had to a be the not it is', 'He could was had to a be of the his it is']
True ans: ['to Big Ben', 'No', 'People', 'all over the world', 'London', 'in a tower', 'the parliament building', 'hanging up']
SQuAD F1 score: [0.15384615384615383, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 91%|████████████████████████████████████████████████████████████████████████▊       | 122/134 [00:38<00:05,  2.29it/s]

['what are the 4 faces?', 'are they considered friendly?', 'who enjoys seeing them?', 'what sound does it make?', 'how is the sound made?', 'how often?', 'what did Big Ben replace?', 'why?']
Generated ans: ['He could was had to a be of the his it is', 'He could was had to a be of the his it is', 'He could was had to a be of the his it is', 'He could was have had to a be the not it is', 'He could was have had to a be the not it is', 'He could was have had to a be the not it is', 'He could was had to a be of the his it is', 'He could was have had to a be the not it is']
True ans: ['clocks', 'YEs', 'The people', 'Bong!', 'the bell striking', 'on the hour', 'the old parliament clock', 'the old parliament building was burned down']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.125]


 92%|█████████████████████████████████████████████████████████████████████████▍      | 123/134 [00:38<00:04,  2.23it/s]

['when?', "how long did it take to complete Big Ben's clock?", 'how much longer for the tower?', 'who was it named after?', 'what is the article about?', 'how is it defined?', 'what must it do to work correctly?', 'are they also called something else?']
Generated ans: ['He could was have had to a be the not it is', 'He could was had to a be of the his it is', 'He could was have had to a be the not it is', 'He could was had to a be of the his it is', '14,000', '14,000', '14,000', '14,000']
True ans: ['in 1834', 'two years', 'Five more years', 'Benjamin Hall', 'The immune system', 'system of many biological structures', 'detect a wide variety of agents', 'pathogens']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 93%|██████████████████████████████████████████████████████████████████████████      | 124/134 [00:38<00:04,  2.46it/s]

['is the worm helpful?', 'do pathogens change?', 'slowly?', 'how are brains protected?', 'are they solid?', 'what are they composed of?', 'What are they trying to improve?', 'What is that?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', '14,000', '14,000', 'April,000', 'April 25']
True ans: ['No', 'Yes', 'No', 'brain barriers separate the peripheral immune system from the neuroimmune system', 'No', 'blood and fluids', 'SAT', 'college exam']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 93%|██████████████████████████████████████████████████████████████████████████▋     | 125/134 [00:39<00:03,  2.84it/s]

['Is it an important one?', 'About how many people take it?', 'WHat is one of the updates to it?', 'When was that last used?', 'Does everyone have to take it on paper?', 'Is this a new development?', 'WHat is one of the ways to prepare for it?', 'WIll this have long term benefits?']
Generated ans: ['April 25', 'April,000', 'April 25', 'April,000', 'April,000', 'April 25', 'April 25', 'April 25']
True ans: ['yes', '1.7 million students', 'Scoring will return to a 1,600-point scale', '2004', 'no', 'yes', 'flashcards', 'no']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 94%|███████████████████████████████████████████████████████████████████████████▏    | 126/134 [00:39<00:02,  3.17it/s]

['Is there a written portion?', 'Is it mandatory?', 'Who decides whether it is necessary?', 'What other topics are covered?', 'Is this test the only option?', 'What is the other(s)?', 'Where is it preferred?', 'Why?']
Generated ans: ['April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April,000', 'April 25']
True ans: ['yes', 'no', 'students', 'a few areas, like algebra', 'no', 'ACT', 'central U.S.', "it is taken by almost every junior in 13 states as part of those states' testing scheme ."]
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 95%|███████████████████████████████████████████████████████████████████████████▊    | 127/134 [00:39<00:02,  3.07it/s]

['WHat is a criticsm against the SAT?', 'How old is Cob?', 'Are his parents alive?', 'What is he sick with?', 'Who is his new mother?', 'Does he have any siblings?', 'Where does he live?', 'What does he do during the day?']
Generated ans: ['April 25', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['students from wealthier families do better on the exam', 'six', 'no', 'a disease', 'Joy', 'Amy', 'on the farm', 'watched adults and elder children bicycle']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 96%|████████████████████████████████████████████████████████████████████████████▍   | 128/134 [00:40<00:01,  3.10it/s]

['Does he ride a bike?', 'What about now?', 'Was it given to him?', 'How often did he ride it?', 'Who is the suspect?', 'What is she suspected of doing?', 'What is her name?', 'How old was she?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'toNo.', 'toNo.', 'toNo.', 'toNo.']
True ans: ['not at first', 'yes', 'yes', 'every day', 'Casey Anthony', 'being involved in the disappearance of her daughter', 'Caylee', 'Three']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 96%|█████████████████████████████████████████████████████████████████████████████   | 129/134 [00:40<00:01,  3.39it/s]

['Has the mother cooperated with the authorities?', 'In what ways has she been un cooperative?', 'When was she arrested?', 'Is she on Bail?', 'Was this a famous case?', 'How so?', 'Do cops think the daughter is living?', 'What makes them think this?']
Generated ans: ['toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.']
True ans: ['no', "she's led investigators down the wrong path and lied to them", 'on July 16', 'yes', 'yes', 'Anthony and her daughter have garnered national headlines and served as fodder for nightly crime shows.', 'no', "investigators found evidence of human decomposition in the trunk of Anthony's car"]
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 97%|█████████████████████████████████████████████████████████████████████████████▌  | 130/134 [00:40<00:01,  3.49it/s]

['What was the mother accused of ?', 'Where was evidence found?', 'What was found on the computer?', 'Who held vigils?', 'Where were they held (vigils)?', 'What was the mother officially charged with?', 'When did Caylee dissapear?', 'Who did nancy Grace interview?']
Generated ans: ['toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.', 'toNo.']
True ans: ["her daughter's disappearance", 'in the car', 'internet searches for chloroform', 'protesters', "outside Anthony's home", 'child neglect, making false official statements and obstructing a criminal investigation', 'June', 'Casey Anthony\'s "babysitter"']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 99%|██████████████████████████████████████████████████████████████████████████████▊ | 132/134 [00:41<00:00,  4.02it/s]

['Were there visitors?', 'What happened to them?', 'How?', 'When did it happen?', 'Who arrested them?', 'Was one of the visitors religious?', 'What nationality were they?', 'What does DPRK mean?']
Generated ans: ['14 years.', '14 years', '14 years.', '14 years', '14 years', '14 years', '14 years.', '14 years.']
True ans: ['yes', 'they were detained', 'for"perpetrating hostile acts."', 'one was in late April', 'North Korea', 'yes', 'American', "Democratic People's Republic of Korea."]
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['Was there any reason for the arrest?', 'what village was the avalanche?', 'how many different avalanches are mentioned?', 'what region are they generally a threat during the winter?', 'how far is Peth Hallan from Srinagar?', 'how many people were killed Friday?', 'how many days of snowfall were there?', 'is it a flat region or a mountainous one?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '1

 99%|███████████████████████████████████████████████████████████████████████████████▍| 133/134 [00:41<00:00,  4.24it/s]

['name another place that had an avalanche', 'how far is that from Srinagar?', 'did anyone die in that avalanche?', 'were they all men?', 'how far is 75 miles in kilometers?', 'where did another avalanche occur?', 'where is that located?', 'when did it hit?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years']
True ans: ['Nayal', '75 miles', 'three people', 'no', '120', 'Gulab Bagh', 'north Kashmir', 'Friday morning']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


100%|████████████████████████████████████████████████████████████████████████████████| 134/134 [00:41<00:00,  3.22it/s]

['was anyone injured?', 'how many?', 'was anyone killed?', 'how many people were moved from Waltengo Nar and Gulab Bagh?', 'what were they put up in?', "Where was his dad's fishing gear?", 'What type of flying object frightened Alex?', 'According to his dad, did he have a reason to be frightened by it?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', 'He was could to a the play.', 'He was could to a the play.', 'He was could to a the play.']
True ans: ['yes', 'Three', 'yes', '300', 'makeshift rescue centers', 'the porch', 'a bat', 'no']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Mean SQuAD F1-score: 0.013805417236112767





Dataset({
    features: ['context', 'question', 'text', 'answer', 'history_context', 'source', 'generated_answer', 'squad_score'],
    num_rows: 1072
})
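
The per-answer `SQuAD F1 score` values logged above are token-overlap scores between each generated answer and its reference. The scoring function the notebook actually uses is not shown in this section, so the following is only a minimal sketch that assumes the standard SQuAD v1.1 recipe: lowercase, strip punctuation and the articles *a/an/the*, then compute F1 over the shared tokens. The names `normalize_answer` and `squad_f1` are illustrative, not taken from the notebook.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def squad_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Pair taken from the test-set log: shared tokens are "was" and "it",
# so precision = 2/5 and recall = 2/4.
print(squad_f1("He was could to it.", "it was his birthday."))
```

Under this assumption the sketch agrees with the logged values it was checked against: the pair above gives roughly 0.444, and `'He could was have had to a be the not it is'` against `'the old parliament building was burned down'` gives 0.125. It also shows why the rare non-zero scores in the log come from incidental overlap on function words such as "was", "it" or "he" rather than from correct answers.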

#### Evaluation on the Test Set

In [30]:
# Create a DataLoader for the test split using the data collator
test_loader_M1 = torch.utils.data.DataLoader(
    tokenized_datasets_M1['test'],
    batch_size=batch_size,
    collate_fn=data_collator_M1,
)

# Generate answers for the test split and store them back in the dataset
dataset_COQA['test'] = generate_answers(test_loader_M1, model_M1, tokenizer_M1, dataset_COQA['test'])

  2%|█▋                                                                                 | 1/49 [00:00<00:24,  1.98it/s]

["What's Rubio going to decide in the next few weeks?", 'Does he feel confident about it?', 'What policy is he not in favor of?', 'What type of people does he think are dangerous to the West?', 'Who are some likely competition to him?', 'Does he think they have a lot of money and credibility?', 'What was the title of his book?', 'When was it released?']
Generated ans: ['to20', 'to15 years', 'to20', 'to20', 'to15 years', 'to15 years', 'to15 years', 'to20']
True ans: ['running for president', 'yes', 'immigration', 'Radicalized individuals', 'Mitt Romney and Jeb Bush', 'yes', 'American Dreams', 'Tuesday']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


  6%|█████                                                                              | 3/49 [00:00<00:12,  3.64it/s]

["Has he decided if he'd be a better president or senator?", 'What happens the longer you wait?', "When's Romney considering making his own bid?", 'What type of resources are Bush and Romney trying to get?', 'What branch of Chemistry seeks ways to use raw materials to make industrial products?', 'Where do these raw materials originate from?', 'What is chemurgy called today?', "What was Carver's full name?"]
Generated ans: ['to15 years', 'to15 years', 'to15 years', 'to20', '14 years', '14 years', '14 years', '14 years']
True ans: ['not yet', 'the harder it becomes', '2016', 'big-money supporters', 'chemurgy.', 'farm products', 'the science of synthetics', 'George Washington Carver']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['Was his initial step in his process to analyze parts of plants?', 'To determine what?', 'Did he combine substances to make new things?', 'What did he not care about in regards to the products he made?', 'Where was he a scientist?', 'Did he get offers

  8%|██████▊                                                                            | 4/49 [00:01<00:13,  3.35it/s]

['On what kind of disease was he an expert?', 'Did he have superior knowledge on one type especially?', 'Which one?', 'Where did he send his specimens?', 'What was the last name of the man who invented the electric light?', 'And his first name?', 'What did he offer Carver?', 'was justin sad?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', 'He was could to it.']
True ans: ['plant disease', 'Yes', 'the fungus variety', 'the United States Department of Agriculture.', 'Edison', 'Thomas', 'a laboratory in Detroit to carry out food research.', 'no']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 10%|████████▍                                                                          | 5/49 [00:01<00:13,  3.15it/s]

['was he overjoyed?', 'why?', 'did he receive gifts', 'what were they?', 'who greeted Justin in after he woke up?', 'what types of diversion did the kids engage in?', 'which ones?', 'did they enjoy food?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['yes', 'it was his birthday.', 'yes', 'a basketball, a robot toy, a new bike and some super hero toys', 'his mom', 'They played games', 'tag and football.', 'yes']
SQuAD F1 score: [0.0, 0.4444444444444445, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 12%|██████████▏                                                                        | 6/49 [00:01<00:13,  3.21it/s]

['what food?', 'who was at the function?', 'did they bring anything?', 'what?', 'was Justin grateful?', 'did it take him a long time to snooze?', 'why?', 'did he go to sleep sad?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['cake and ice cream', "Justin's friends", 'yes', 'presents.', 'yes', 'no', 'because of the exciting day he had', 'no']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1818181818181818, 0.0]


 14%|███████████▊                                                                       | 7/49 [00:02<00:17,  2.47it/s]

['What is the title of the chapter?', 'Who is badly injured?', 'Does Dave think he will recover?', 'What does he tell Merwell to do?', 'How did Dave leave?', 'What did he throw off the boat?', 'What insult had Dave been called?', 'True or False: This made Dave very angry.']
Generated ans: ['He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is']
True ans: ['The meeting of the gee eyes', 'Merwell', 'yes', 'keep a civil tongue in your head', 'he left on an ice-boat', 'three skates', '"poorhouse rat"', 'true']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.14285714285714288, 0.0, 0.0, 0.0]


 16%|█████████████▌                                                                     | 8/49 [00:02<00:16,  2.47it/s]

['Was he able to keep his temper?', 'Had he ever been able to?', 'Who does Dave say he wants to find?', 'Did Poole want him to leave?', 'What was Poole afraid Dave would do to him?', 'Who did he threaten to tell if Dave did?', 'Did Dave hit him?', 'Where does Dave say Poole needs to bring Merwell?']
Generated ans: ['He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is']
True ans: ['no', 'no', 'Henshaw and the others.', 'no', 'hit him', 'Doctor Clay', 'no', 'the Hall']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 18%|███████████████▏                                                                   | 9/49 [00:03<00:15,  2.51it/s]

["What is Poole's first name?", "And Merwell's?", "What is Dave's last name?", 'Did Poole shout at him in a strong voice?', 'where are follicles located?', 'what layer?', 'do follicles produce wool?', 'does wool impede heat transfer?']
Generated ans: ['He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'He could had was would to not a the have it is', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town']
True ans: ['Nat', 'Link', 'Porter', 'no', 'In the skin.', 'The upper layer.', 'Yes.', 'Yes.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 20%|████████████████▋                                                                 | 10/49 [00:03<00:13,  2.85it/s]

['what desert people use wool?', 'who else?', 'what is wool?', 'what is wool from goats called?', 'is wool chemically similar to cotton?', 'what is cotton mainly made of?', 'can you get wool from rabbits?', 'what kind?']
Generated ans: ['Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town']
True ans: ['Bedouins.', 'Tuaregs.', 'The textile fiber obtained from sheep and other animals', 'Mohair.', 'No.', 'Cellulose.', 'Yes.', 'Angora.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 24%|████████████████████                                                              | 12/49 [00:03<00:10,  3.56it/s]

['how many types of fiber do primary follicles produce', 'what are they?', 'do secondary follicles produce three types as well?', 'how many do they produce?', 'which type?', 'is wool bulkier than other textiles?', 'is it bad at retaining heat?', 'what do the fibers hold?']
Generated ans: ['Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town', 'Republican of the town']
True ans: ['Three.', 'Kemp, medullated fibers and true wool fibers.', 'No.', 'One.', 'True wool fibers.', 'Yes.', 'No.', 'Air.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['What did the man in this story do better than anyone else?', 'How old was he when he won the Formula One?', 'Did he use the same auto when he ran?', 'Who does he rate as the best?', 'How many times did Jackie Stewart take the cup?', 'What makes the difference between good and great?', 'How many did Stewart take?'

 29%|███████████████████████▍                                                          | 14/49 [00:04<00:08,  4.24it/s]

['When?', 'Where?', 'What were his stats?', 'Is enforcing the law the entire goal of police?', 'What is their main activity concerned with?', 'In the 17-1800s, what was one other thing they were focused on?', 'Anything else?', 'Is it true that there has been corruption in the police department?']
Generated ans: ['15 years', '15 years', '15 years', '14 years', '14 years', '14 years', '14 years', '14 years']
True ans: ['1968', 'at Hockenheim', 'he won 25 of his 73 Formula One races', 'No', 'preservation of order', 'maintaining the class system', 'protection of private property', 'Yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['Just a few instances?', 'Is the force paid for privately?', 'How is it funded then?', 'What do they call an entity like that?', 'Are all police forces paid that way?', 'Who gives them their power?', 'How many main tasks are they asked to do?', "Do they protect people's personal property?"]
Generated ans: ['14 years', '14 years', '14 years', '14 yea

 31%|█████████████████████████                                                         | 15/49 [00:04<00:08,  3.82it/s]

['What did Mrs. Smith deliver?', 'how did she travel to do it?', 'who made them?', 'who did she make them for?', 'how old were they?', 'on what day did she do this?', 'where did Mr.s Jones live?', 'what did he like?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['pies', 'in a hot air balloon', 'Mrs. Smith', 'her neighbors', 'all ages', 'Sunday', 'down the street', 'strawberry pie.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 33%|██████████████████████████▊                                                       | 16/49 [00:05<00:09,  3.57it/s]

['where did he pick his up from?', 'what did Mrs. Kenner like?', 'what would she gather hers in?', 'who enjoyed the chocolate kind?', 'were they siblings?', 'where were they when they received theirs?', 'how did they get there?', 'Who wanted the peach kind?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['the roof', 'apple pie', 'a big basket', 'Bobby and Sue', 'yes', 'the top of a hill', 'on their bicycles', 'Mr. Tevo']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 35%|████████████████████████████▍                                                     | 17/49 [00:05<00:09,  3.46it/s]

['how would he get his?', 'How did Josh get his?', "what was it's name?", 'what did the whole neighborhood do as this happened?', 'what instrument did the worker take the boy to?', 'did the boy think he could do it?', 'what else did he try?', 'what else?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['in a big box', 'his dog chased after them', 'Rex', 'clap', 'A piano', 'no', 'a guitar.', 'the drums']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 37%|██████████████████████████████                                                    | 18/49 [00:05<00:09,  3.26it/s]

['did he want the guitar?', 'what order did he try each of the instruments?', 'which did he pick?', 'did his parents like that choice?', 'why?', 'who talked her into it?', 'did the boy join a band?', 'was he still playing drums?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['no', 'piano, guitar, drums', 'the drums', "his mom didn't", 'she thought that he would be too loud', "The boy's dad", 'yes', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.15384615384615385, 0.0, 0.0, 0.0]


 39%|███████████████████████████████▊                                                  | 19/49 [00:05<00:09,  3.31it/s]

['for who?', 'when?', 'How long is the Mountain rampart?', 'How many roads does it have?', 'Is it well lit?', 'What blocks the light?', 'Does Friedrich want to put up a fight to keep it?', 'Where is he?']
Generated ans: ['He was could to it.', 'He was could to it.', 'toNo.', 'toYes.', 'tothe the father', 'tothe the father', 'toYes.', 'toYes.']
True ans: ['the most popular rock band in the world', 'Twenty years later', 'three or four hundred miles', 'twelve or twenty', 'no', 'endless Pandour doggery', 'no', 'Camenz']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 41%|█████████████████████████████████▍                                                | 20/49 [00:06<00:08,  3.53it/s]

['Who found him there?', 'What was there good news about?', 'Is Valori surprised about anything?', 'How many thing?', 'What is Neisse?', 'Has it been made stronger?', 'How was it before?', 'What else has gotten stronger?']
Generated ans: ['toYes.', 'toYes.', 'toYes.', 'tothe the father', 'toYes.', 'toYes.', 'tothe the father', 'tothe the father']
True ans: ['Valori', 'Fontenoy', 'yes', 'Two', 'a Fortress', 'yes', 'impregnable', 'the Army']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 43%|███████████████████████████████████▏                                              | 21/49 [00:06<00:08,  3.47it/s]

['Is Valori a woman or man?', 'What has been untended?', "Who was Winterfeld's chief?", 'Where is he?', 'What does he have with him?', 'Where is Margraf Karl?', 'How many people went to the fair?', 'who?']
Generated ans: ['toYes.', 'toYes.', 'tothe the father', 'toYes.', 'toYes.', 'tothe the father', 'He was could to it.', 'He was could to it.']
True ans: ['a man', 'Upper Silesia', 'General Hautcharmoi', 'Ratibor', 'his small Detachment', 'at Jagerndorf', 'Two', 'Billy and Sandy']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 45%|████████████████████████████████████▊                                             | 22/49 [00:06<00:08,  3.30it/s]

['what fair?', 'who else was going to be there?', 'who was he?', 'was he funny?', "what'd they do so they were allowed to go?", 'did they clean their rooms too?', 'Was Bob there?', 'what was he doing?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['the one in the neighborhood', 'Bob', 'the clown', 'Yes', 'Their chores', 'Yes', 'Yes', 'giving cotton candy']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 47%|██████████████████████████████████████▍                                           | 23/49 [00:07<00:07,  3.28it/s]

['anything else?', 'Was there a line?', 'Is Joey a male or female?', 'Who is he?', 'Who named him?', 'Who made him?', 'Why did she make him?', 'When was spaghetti night?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to a it.', 'He was could to it.', 'He was could to a it.', 'He was could to a it.', 'He was could to it.']
True ans: ['candy apples', 'yes', 'Male.', 'A piece of spaghetti.', 'Marsha', 'Her mom.', 'It was spaghetti night.', 'Tuesday night.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4444444444444445, 0.0]


 49%|████████████████████████████████████████▏                                         | 24/49 [00:07<00:07,  3.29it/s]

['Where does he live?', 'Does he stay fresh?', 'How?', 'Does she carry him?', 'What does buenos aires mean?', 'does it belong to the provinces?', 'what was it considered?', 'when was it removed from the province?']
Generated ans: ['He was could to a it.', 'He was could to a it.', 'He was could to a it.', 'He was could to a it.', '14th years', '14th years', '14th years', '14th years']
True ans: ['A plastic bag.', 'Yes.', "Marsha's mom told her to soak him in water every few days.", 'es', '"fair winds" or "good airs"', 'No', 'it is an autonomous district', '1880']
SQuAD F1 score: [0.0, 0.0, 0.11764705882352941, 0.0, 0.0, 0.0, 0.0, 0.0]


 51%|█████████████████████████████████████████▊                                        | 25/49 [00:07<00:06,  3.56it/s]

['it is the capial of what?', 'what is its population?', 'what part of the continent can you find it?', 'Who did the citizens elect in 1996?', 'also known as?', 'did they always elect mayors?', 'how was it done before?', 'by who?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['Argentina', '17 million', 'southeastern coast', 'chief of government', 'mayor', 'No', 'directly appointed', 'the President of the Republic']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 55%|█████████████████████████████████████████████▏                                    | 27/49 [00:08<00:05,  4.05it/s]

['what was its origanal name?', 'how about its formal name?', 'what does that translate to?', 'How many towns were added to the city limits after it was removed from the province?', 'what were they?', 'are they still part of the city?', 'What continent is Buenos Aires found on?', 'what century was it found in?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['Real de Nuestra Señora Santa María del Buen Ayre', 'Ciudad Autónoma de Buenos Aires', 'Autonomous City of Buenos Aires', 'Two', 'Belgrano and Flores', 'Yes', 'South America', '16th']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
["How long was the Prince's trip?", 'On what day does it begin?', 'Where?', 'In what country?', 'And where would the end?', 'Was the next day as full of excitement as the first?', 'Where did he go?', 'What is that?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '

 57%|██████████████████████████████████████████████▊                                   | 28/49 [00:08<00:04,  4.26it/s]

['Did he admire the temple from afar?', 'In what country is he expected to see military stuff?', 'Who would be putting on the show?', 'What would he visit on Sunday?', 'Is that all?', 'What else would he see?', 'what is the article about?', 'is it quantitative?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years']
True ans: ['No', 'The Bahamas', 'the Royal Bahamian Defence Force', 'Rawson Square', 'No', 'everal Bahamian islands', 'Molecular biology', 'yes']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 61%|██████████████████████████████████████████████████▏                               | 30/49 [00:08<00:04,  4.63it/s]

['what is molecular biology?', 'who is mentioned as describing it?', 'when?', 'in?', 'do the researchers use general techniques or specific?', 'what do they combine them with?', 'when was the study of gene carried out?', 'what does biology focus on?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years']
True ans: ['concerns the molecular basis of biological activity between biomolecules in the various systems of a cel', 'William Astbury', '1961', '"Nature', 'specific', 'techniques and ideas from genetics and biochemistry.', '. In the early 2000s', 'molecules']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
['how?', 'or?', 'what happens indirectly?', 'has there been a tradition used?', 'and has it been used for long?', 'what does the tradition study?', 'in what way?', 'was the 2000 study prominent?']
Generated ans: ['14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years', '14 years']
True ans

 63%|███████████████████████████████████████████████████▉                              | 31/49 [00:09<00:05,  3.56it/s]

['as what?', 'has computer science been used?', 'Who felt there was a silence before the storm?', 'Who stood in the centre of the little group?', 'Whose elbow was on the mantelpiece?', 'Who looked anxiously at them?', 'And who stood on the side and held his peace?', 'What was beginning to slowly ebb away from Saton?']
Generated ans: ['14 years', '14 years', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['sub-fields of molecular biology', 'yes', 'Even Saton', 'Saton and Lois', 'Rochester', 'Mary', 'Vandermere', 'courage']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 65%|█████████████████████████████████████████████████████▌                            | 32/49 [00:09<00:06,  2.83it/s]

['Whose hand was he holding?', 'Had she consented to be his wife?', 'Who had shrunk back, terrified?', "Where did Saton say he could take Lois if she couldn't be married there?", 'What is the title of the chapter?', 'When Saton turned toward Rochester was he defiant or scared?', "Who was forbidden to enter the house (Rochester's)?", 'What else was he not supposed to do?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['Lois', 'yes', 'Pauline', 'the Comtesse', 'THE CHARLATAN UNMASKED', 'defiant', 'Saton', "hold any communication with Rochester's ward"]
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 67%|███████████████████████████████████████████████████████▏                          | 33/49 [00:10<00:05,  2.68it/s]

['What is Lois today?', "Do they still have to have Rochestere's approval?", "Was Saton still holding Lois' hand when he turned to Rochester again?", 'What number chapter is this?', 'Did Saton obey 2 conditions set?', 'Who was in a temper?', 'What kind of temper?', 'Who was restraining him?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['she is her own mistress', 'no', 'yes', 'XXXVI', 'no', 'Dan', 'fine temper', 'Seth,']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 69%|████████████████████████████████████████████████████████▉                         | 34/49 [00:10<00:05,  2.63it/s]

['How did he turn to his partner?', 'How did seth reply to him?', "What happened to Seth's house?", 'How much did the man get paid?', 'Was the man a runaway criminal?', "Why didn't the man raise his hands?", 'What was his name?', 'What does dan think is a shame?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it']
True ans: ['savagely', 'soothingly.', ',burned down', 'fifteen cents', 'yes', 'he felt bad', 'Jip Collins', 'letting him go']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.15384615384615383, 0.0, 0.0]


 71%|██████████████████████████████████████████████████████████▌                       | 35/49 [00:10<00:05,  2.63it/s]

['True or False: Seth and Dan are partners.', 'True or False: Seth thinks Dan should have beaten up Jip.', 'What kind of name does Seth want dan to avoid?', 'Is Dan okay with having that name?', 'How many children does he have?', 'Who is he talking to?', 'What show were they on?', 'What were they talking about?']
Generated ans: ['He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'He could had was would to not a be the have it', 'In the prison', 'In the prison', 'In the prison', 'In the prison']
True ans: ['true', 'false', 'a bruiser.', 'yes', 'Six.', 'A.J. Hammer.', 'Showbiz Tonight.', 'The film "Killing Them Softly,"']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 73%|████████████████████████████████████████████████████████████▏                     | 36/49 [00:11<00:04,  3.00it/s]

['Who is his partner?', 'What kind of show is it going to be?', 'Who is directing?', 'Where is he from?', 'What is intriguing about it?', 'What speeches are throughout ?', 'What else does he do besides act?', 'What is the nagging question?']
Generated ans: ['In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison', 'In the prison']
True ans: ['Angelina Jolie.', 'A mob movie.', 'Andrew Dominik.', 'Australia,', 'The image is more important than the actual substance in America.', 'Political', 'An activist.', 'When he plans to marry Angelina Jolie.']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.1818181818181818, 0.0, 0.0, 0.0]


 76%|█████████████████████████████████████████████████████████████▉                    | 37/49 [00:11<00:03,  3.33it/s]

['What are the comparisons to?', 'What else does he pitch?', 'who made the announcement, the office of the president OR the the armed forces?', 'of what country?', 'do they have a history of unstable transitions?', 'who was absent Thursday morning?', 'what leadership had been changed?', 'who was the new commander of the Army?']
Generated ans: ['In the prison', 'In the prison', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the']
True ans: ['The financial crisis.', 'Perfume.', 'the armed forces', 'Paraguay', 'yes', 'the President', 'military commanders', 'Brig. Gen. Bartolome Ramon Pineda Ortiz']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 78%|███████████████████████████████████████████████████████████████▌                  | 38/49 [00:11<00:03,  3.63it/s]

['were any commanders retained?', 'who?', 'who took over the Navy?', 'were other changes forthcoming?', 'who denied coup rumors?', 'when was the last coup?', 'were there other attempts?', 'when?']
Generated ans: ['Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the']
True ans: ['yes', 'Cibar Benitez', 'Rear Adm. Egberto Emerito Orie Benegas', 'yes', 'Benitez', '1989', 'yes', 'in 1996 and 2000']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 80%|█████████████████████████████████████████████████████████████████▎                | 39/49 [00:11<00:02,  3.71it/s]

['how many shakeups have there been since Lugo took office?', "what was Lugo's job previously", 'does he have any children?', 'was he a priest when he became a father?', 'was it considered normal, or shocking?', 'how has he fared working with the legislature?', "What's another name for the First Persian Empire?", 'Who founded it?']
Generated ans: ['Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Republican of the', 'Spain of the France', 'Spain of the France']
True ans: ['three', 'Catholic bishop', 'yes', 'yes', 'shocking', 'struggled', 'The Achaemenid Empire', 'Cyrus the Great']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 82%|██████████████████████████████████████████████████████████████████▉               | 40/49 [00:12<00:02,  3.84it/s]

['By when had Persians settled in the southwestern portion of the Iranian plateau?', 'Who did Cyrus the Great defeat from there?', 'Who did Alexander the Great admire?', 'By when did Alexander conquer most of the empire?', 'Was that empire one of the largest in history?', 'What did it extend to in the east?', 'And in the west?', 'How many kilometers was it?']
Generated ans: ['Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France']
True ans: ['the 7th century BC', 'the Medes, Lydia, and the Neo-Babylonian Empire', 'Cyrus the Great', 'by 330 BC', 'yes', 'the Indus Valley', 'the Balkans and Eastern Europe', '5.5 million square kilometers']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 84%|████████████████████████████████████████████████████████████████████▌             | 41/49 [00:12<00:02,  3.93it/s]

['What kind of administration is it notable for?', 'Through what?', 'Name one kind of infrastructure they built?', 'Can you name another?', 'And yet another?', 'And one more?', 'Who was it the antagonist for?', 'Is it known for emancipation?']
Generated ans: ['Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France', 'Spain of the France']
True ans: ['centralised and bureaucratic', 'satraps under the King of Kings', 'road systems', 'a postal system', 'civil services', 'a large professional army', 'the Greek city-states', 'yes']
SQuAD F1 score: [0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 86%|██████████████████████████████████████████████████████████████████████▎           | 42/49 [00:12<00:01,  4.02it/s]

['Of whom?', 'Did its sucesses inspire later empires?', 'What country is Tuscany in?', 'Which part of Italy is it in?', 'How many places there were named World Heritage Sites?', 'One of them is the center of what city?', 'Is that the capital?', 'What is its name in Italian?']
Generated ans: ['Spain of the France', 'Spain of the France', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['the Jewish exiles in Babylon', 'yes', 'Italy', 'Central', 'Seven', 'Florence', 'Yes', 'Firenze']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 88%|███████████████████████████████████████████████████████████████████████▉          | 43/49 [00:12<00:01,  4.17it/s]

['Is the Leaning Tower of Pisa one of the heritage sites?', 'How many nature reserves are there?', 'How many tourists did they get in 2012?', 'What museums are in Tuscany?', 'Is Tuscany landlocked?', "What's the most visited location that borders water?", 'What movement was born there?', 'How many people live there?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '14th years']
True ans: ['No', '120', '1.834 million', 'Uffizi and the Pitti Palace', 'No', 'Castiglione della Pescaia', 'Italian Renaissance', 'About 3.8 million']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 90%|█████████████████████████████████████████████████████████████████████████▋        | 44/49 [00:12<00:01,  4.29it/s]

['What Pienza location is a heritage site?', 'When was it designated that?', 'Which city had the second most tourists in the area?', 'Was Florence ranked higher or lower?', 'What alcohol is made there?', 'Does it have its own cultural identity?', "What's the name of the show the story is about?", 'What channel is it on?']
Generated ans: ['14th years', '14th years', '14th years', '14th years', '14th years', '14th years', '15,000', '15,000']
True ans: ['The Centre.', '1996.', 'Pisa', 'Higher', 'Wine.', 'Yes.', 'Running Man', 'SBS']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 92%|███████████████████████████████████████████████████████████████████████████▎      | 45/49 [00:13<00:00,  4.37it/s]

['Where is that channel based?', 'What day is the show aired?', 'And what day is it translated for the Internet?', 'What kind of show is it?', 'Who hosts it?', 'What kind of style does he have?', 'Do people like him?', "Who's the strongest man?"]
Generated ans: ['15,000', '15,000', '15,000', '15,000', '15,000', '15 years.', '15 years.', '15 years.']
True ans: ['South Korea', 'Sunday', 'Monday', 'variety show', 'Liu Zaishi', 'friendly, witty and lovely', 'yes', 'Jin Zhongguo']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 94%|████████████████████████████████████████████████████████████████████████████▉     | 46/49 [00:13<00:00,  4.34it/s]

['Does he have a nickname?', 'What\'s the meaning of "mong"?', 'Who is given that nickname?', 'Is Song Zhixiao a man or woman?', 'Why is she good at the mission?', 'Why do Koreans like the show so much?', 'Are they all on the same team?', 'What are some stars who have been on?']
Generated ans: ['15 years.', '15,000', '15 years.', '15,000', '15 years.', '15,000', '15 years.', '15 years.']
True ans: ['Sparta-kooks', 'confused', 'Song Zhixiao', 'woman', 'superior ability to capture', 'South Korean stars', 'no', "Li Minhao, Girls'Generation, Jin Xiuxian"]
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 96%|██████████████████████████████████████████████████████████████████████████████▋   | 47/49 [00:13<00:00,  4.04it/s]

['What can you learn from the show?', 'Who scored triple century for his country?', 'Was it the first for his country?', 'Which country he played for?', 'In what sports?', 'Who they were facing?', 'What was the ranking of England then?', 'What runs England needed to avoid innings defeat?']
Generated ans: ['15 years.', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000', '14,000']
True ans: ['team spirit', 'Hashim Amla', 'Yes', 'South Africa', 'Cricket', 'England', 'No.1', '150']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


 98%|████████████████████████████████████████████████████████████████████████████████▎ | 48/49 [00:14<00:00,  3.68it/s]

['How many wickets were left?', 'What was the score in his stand for third wicket?', 'Who was his partner for that?', 'What was his scored at the stand?', 'What was crying?', 'what did the author feed it?', 'what did the author feed it?', 'then what did the cat do?']
Generated ans: ['14,000', '14,000', '14,000', '14,000', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ['Six', '377', 'Jacques Kallis', '182', 'a cat', 'yes', 'fish', 'fell asleep']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


100%|██████████████████████████████████████████████████████████████████████████████████| 49/49 [00:14<00:00,  3.41it/s]

['where was the author?', 'was the father alive by then?', 'how far did the author travel?', 'who did the author check about the cat with?', 'what did he find out?', 'by who?', 'what did he call him?', 'how long has he had him?']
Generated ans: ['He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.', 'He was could to it.']
True ans: ["his father's apartment.", 'no', 'a thousand miles', 'neighbors', 'the cat was abandoned', 'his owner', 'Willis', 'five years']
SQuAD F1 score: [0.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0, 0.0]
Mean SQuAD F1-score: 0.005920208576271001





## [Task 5] Answer generation with text passage $P$, question $Q$ and dialogue history $H$ + [Task 6] Train and evaluate $f_\theta(P, Q, H)$

**[Task 5]**: We want to define $f_\theta(P, Q, H)$. Write your own script to implement $f_\theta$ for each model: M1 and M2.

#### Formulation

Consider a dialogue on text passage $P$. 

For each question $Q_i$ at dialogue turn $i$, your model should take $P$, $Q_i$, and $H = \{ Q_0, A_0, \dots, Q_{i-1}, A_{i-1} \}$ to generate $A_i$.
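
One possible way to feed $H$ to the model is to flatten the previous turns into text and prepend them to the current question $Q_i$. The sketch below assumes this flattening strategy; the function name `build_question_with_history`, the separator string, and the `max_turns` cap are illustrative choices, not part of the assignment or of the `prepare_features` function used later in this notebook.

```python
from typing import List, Optional, Tuple


def build_question_with_history(
    question: str,
    history: List[Tuple[str, str]],
    sep: str = " </s> ",
    max_turns: Optional[int] = None,
) -> str:
    """Prepend previous (question, answer) turns to the current question.

    `sep` and `max_turns` are hypothetical knobs: pick the separator token of
    your tokenizer, and cap the history if the inputs get truncated.
    """
    if max_turns is not None:
        # Keep only the most recent turns (none at all if max_turns == 0).
        history = history[-max_turns:] if max_turns > 0 else []
    turns = [f"{q} {a}" for q, a in history]
    return sep.join(turns + [question])


history = [("Who had a cat?", "Lisa"), ("What color was it?", "white")]
print(build_question_with_history("Where did it live?", history))
# Who had a cat? Lisa </s> What color was it? white </s> Where did it live?
```

The resulting string would then be tokenized together with the passage $P$. Truncating only the passage side is a reasonable default, since dropping the question or the latest turns is likely to hurt more than dropping the tail of $P$.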

**[Task 6]**: Write your own script to train and evaluate your $f_\theta(P, Q, H)$.

#### Instructions

* Perform multiple train/evaluation seed runs: [42, 2022, 1337].$^1$
* Evaluate your models with the following metric: SQuAD F1-score.$^2$
* Fine-tune each transformer-based model for **3 epochs**.
* Report the SQuAD F1-score computed on the validation and test sets.

$^1$ Remember what we said about code reproducibility in Tutorial 2!

$^2$ You can use the ```allennlp``` Python package for a quick implementation of the SQuAD F1-score: ```from allennlp_models.rc.tools import squad```.
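
For reference, the token-level F1 that the SQuAD metric computes can be sketched in a few lines of plain Python. This is a minimal re-implementation following the logic of the official SQuAD v1.1 evaluation script (lowercasing, removing punctuation and English articles, then measuring token overlap); it is not the `allennlp` code itself, and the function names are chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def squad_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


score = squad_f1(
    "In the prison",
    "The image is more important than the actual substance in America.",
)
print(round(score, 4))  # 0.1818
```

The example pair is taken from the evaluation log above: only the token `in` overlaps, so precision is 1/2, recall is 1/9, and F1 is 2/11. This also shows why a degenerate output can still receive a small non-zero score on long gold answers.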

* [M1] DistilRoBERTa (distilroberta-base)

In [55]:
# Tokenizing the Dataset
tokenized_datasets_M1_H = DatasetDict()

# Use the `prepare_features` functions
tokenized_datasets_M1_H['train'] = dataset_COQA['train'].map(
    lambda batch: prepare_features(batch, tokenizer_M1, max_length_input, max_length_answer, history=True),
    batched=True,
    batch_size=batch_size,
    remove_columns=dataset_COQA['train'].column_names 
    #remove_columns=[x for x in dataset_COQA['train'].column_names if x != 'source']
)

# Use the `prepare_features` functions
tokenized_datasets_M1_H['validation'] = dataset_COQA['validation'].map(
    lambda batch: prepare_features(batch, tokenizer_M1, max_length_input, max_length_answer, history=True),
    batched=True,
    batch_size=batch_size,
    remove_columns=dataset_COQA['validation'].column_names 
    #remove_columns=[x for x in dataset_COQA['train'].column_names if x != 'source']
)

# Use the `prepare_features` functions
tokenized_datasets_M1_H['test'] = dataset_COQA['test'].map(
    lambda batch: prepare_features(batch, tokenizer_M1, max_length_input, max_length_answer, history=True, test=True),
    batched=True,
    batch_size=batch_size,
    remove_columns=dataset_COQA['test'].column_names 
    #remove_columns=[x for x in dataset_COQA['train'].column_names if x != 'source']
)

print(tokenized_datasets_M1_H)

100%|███████████████████████████████████████████████████████████████████████████| 21455/21455 [01:43<00:00, 207.97ba/s]
100%|█████████████████████████████████████████████████████████████████████████████| 5363/5363 [00:27<00:00, 193.95ba/s]
100%|█████████████████████████████████████████████████████████████████████████████| 1979/1979 [00:09<00:00, 203.44ba/s]

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 85820
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 21452
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 7916
    })
})





In [58]:
import math

# Initialize the data collator
data_collator_M1_H = DataCollatorForSeq2Seq(tokenizer=tokenizer_M1, model=model_M1_H)

epochs = 3

# Training hyperparameters
training_args_M1_H = Seq2SeqTrainingArguments(
    output_dir='./M1_Checkpoints_H',
    evaluation_strategy="epoch",
    # Recent transformers versions require the save and evaluation strategies
    # to match when load_best_model_at_end=True
    save_strategy="epoch",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    overwrite_output_dir=True,
    fp16=(device == 'cuda'),
    num_train_epochs=epochs,
    weight_decay=0.01,
    logging_steps=10,
    load_best_model_at_end=True,
    save_total_limit=1
)

# Optimizer and scheduler
optimizer_M1_H = AdamW(model_M1_H.parameters(), lr=5e-5)
# The scheduler expects an integer number of optimizer steps
# (assumes a single device and no gradient accumulation)
train_steps = epochs * math.ceil(len(tokenized_datasets_M1_H['train']) / batch_size)
scheduler_M1_H = transformers.get_cosine_schedule_with_warmup(
    optimizer=optimizer_M1_H,
    num_warmup_steps=50,
    num_training_steps=train_steps
)
optimizers_M1_H = optimizer_M1_H, scheduler_M1_H

# Named trainer_M1_H so that the training and saving cells below can find it
trainer_M1_H = Seq2SeqTrainer(
    model=model_M1_H,
    tokenizer=tokenizer_M1,
    args=training_args_M1_H,
    compute_metrics=lambda pred: compute_metrics(pred, tokenizer_M1),
    train_dataset=tokenized_datasets_M1_H['train'],
    eval_dataset=tokenized_datasets_M1_H['validation'],
    optimizers=optimizers_M1_H,
    data_collator=data_collator_M1_H
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
Using the `WAND_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


In [59]:
os.environ["WANDB_DISABLED"] = "true"

if not load_model:
    trainer_M1_H.train()

In [60]:
# Create the output directory if needed, then save the fine-tuned model
os.makedirs('model_M1_H_42', exist_ok=True)

if not load_model:
    trainer_M1_H.save_model('model_M1_H_42')

#### Evaluation on the Validation Set

In [34]:
# Create a DataLoader for the validation split using the data collator
val_loader_M1_H = torch.utils.data.DataLoader(tokenized_datasets_M1_H['validation'],
                                              batch_size=batch_size,
                                              collate_fn=data_collator_M1_H)

# Generate answers
generate_answers(val_loader_M1_H, model_M1_H, tokenizer_M1, dataset_COQA['validation'])

NameError: name 'tokenized_datasets_M1_H' is not defined

#### Evaluation on the Test Set

In [None]:
# Create a DataLoader for the dataset using the data collator
test_loader_M1_H = torch.utils.data.DataLoader(tokenized_datasets_M1_H['test'], 
                                          batch_size=batch_size, 
                                          collate_fn=data_collator_M1_H)

# Generate answers
generate_answers(test_loader_M1_H,model_M1_H,tokenizer_M1,dataset_COQA['test'],verbose=True)

## [Task 7] Error Analysis

Perform a simple and short error analysis as follows:
* Group dialogues by ```source``` and report the worst 5 model errors for each source (w.r.t. SQuAD F1-score).
* Inspect the observed results and try to provide some comments (e.g., do the models make errors when faced with a particular question type?)$^1$

$^1$ Check the [paper](https://arxiv.org/pdf/1808.07042.pdf) for some valuable information about question/answer types (e.g., Table 6, Table 8) 
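
The grouping step can be sketched as follows. This is a minimal illustration, not the `show_worst_five` function used in the next cell: it assumes each evaluated example is available as a dictionary with at least a `source` and a `squad_score` key, and the helper name `worst_k_per_source` is hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List


def worst_k_per_source(records: List[dict], k: int = 5) -> Dict[str, List[dict]]:
    """Return, for every source, the k records with the lowest SQuAD F1."""
    by_source = defaultdict(list)
    for record in records:
        by_source[record["source"]].append(record)
    return {
        source: heapq.nsmallest(k, rows, key=lambda r: r["squad_score"])
        for source, rows in by_source.items()
    }


# Toy records; in practice they would carry the question, gold answer and
# generated answer as well, so that the errors can be inspected by hand.
records = [
    {"source": "cnn", "question": "who?", "squad_score": 0.0},
    {"source": "cnn", "question": "when?", "squad_score": 0.5},
    {"source": "race", "question": "Do people like him?", "squad_score": 0.25},
]
for source, rows in worst_k_per_source(records, k=1).items():
    print(source, [row["question"] for row in rows])
```

When most scores are exactly 0.0, as in the runs above, the "worst five" are effectively an arbitrary sample of failures, so it may be more informative to also break the errors down by question or answer type.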

In [33]:
show_worst_five(dataset_COQA['test'])


Source: cnn


Unnamed: 0,context,question,answer,generated_answer,squad_score
0,"(CNN)Sen. Marco Rubio said he'll make a decision on running for president in the next few weeks and feels confident that he could obtain the resources to ""credibly run a campaign and win,"" despite an increasingly crowded GOP field taking shape. In a wide-ranging interview with CNN's Wolf Blitzer, the Florida Republican also gave his thoughts on Republican efforts to defund the president's executive action on immigration, and further explained why he won't support the administration's new Cuba policy. 2016 Rubio, who released a new book ""American Dreams"" on Tuesday, said he's still deciding whether he thinks he can be more effective as president or as a senator under the new majority. He has already said he won't run for both offices in 2016. Marco Rubio: Radicalized individuals 'very real threat' to the West With power players Mitt Romney and Jeb Bush now considered likely contenders, Rubio said ""they're both credible and well-funded"" candidates but argued there would still be room for his campaign if he decides to plow forward. ""I'm confident that if we decide to run for president ... we will have the funding and the resources necessary to credibly run a campaign and win,"" he said. ""But I understand that the longer you wait, the harder it becomes to do that,"" he added. Romney tells donors he's considering 2016 bid Bush and Romney have already been in active in talking with big-money supporters and securing financial resources, making it strategically more difficult for other potential contenders like Rubio to lock down support from the GOP's donor class.",What's Rubio going to decide in the next few weeks?,running for president,to20,0.0
297,"Asuncion, Paraguay (CNN) -- Paraguay installed new top military commanders, but President Fernando Lugo, who had ordered the change in leadership, was not present for the ceremony. Lugo's absence Thursday morning attracted attention given his administration's silence on the sudden change in the leadership of the country's army, air force and navy. The president's decision to replace the top brass came a day after he publicly dismissed rumors about a military coup. Brig. Gen. Bartolome Ramon Pineda Ortiz was named as the new army commander. Brig. Gen. Hugo Gilberto Aranda Chamorro and Rear Adm. Egberto Emerito Orie Benegas took over the top posts at the air force and navy, respectively. The announcement came from the armed forces, not the president's office. Cibar Benitez, commander of the armed forces, was the only top leader to retain his post. Other changes would be forthcoming in the lower ranks, said Benitez at the swearing-in ceremony, but he denied there was any truth to talk of a coup. Paraguay's history is filled with unstable style=""display:inline"" transitions of power since it emerged from dictatorship in 1989. Although there hasn't been a coup since that year, there were attempted coups in 1996 and 2000, and President Raul Cubas resigned amid controversy in 1999. The military shakeup is the third since Lugo took office. The former Catholic bishop was elected to a five-year term last year. His victory brought an end to six decades of one-party rule in Paraguay, but the honeymoon did not last long. In April, Lugo admitted that he fathered a child while he was still a priest and that he may have fathered more. The revelation, which came as a shock to most, hurt his political image. Calls for his resignation began, and have continued as Lugo has struggled to push reforms through a majority-opposition legislature.",who?,Cibar Benitez,Republican of the,0.0
296,"Asuncion, Paraguay (CNN) -- Paraguay installed new top military commanders, but President Fernando Lugo, who had ordered the change in leadership, was not present for the ceremony. Lugo's absence Thursday morning attracted attention given his administration's silence on the sudden change in the leadership of the country's army, air force and navy. The president's decision to replace the top brass came a day after he publicly dismissed rumors about a military coup. Brig. Gen. Bartolome Ramon Pineda Ortiz was named as the new army commander. Brig. Gen. Hugo Gilberto Aranda Chamorro and Rear Adm. Egberto Emerito Orie Benegas took over the top posts at the air force and navy, respectively. The announcement came from the armed forces, not the president's office. Cibar Benitez, commander of the armed forces, was the only top leader to retain his post. Other changes would be forthcoming in the lower ranks, said Benitez at the swearing-in ceremony, but he denied there was any truth to talk of a coup. Paraguay's history is filled with unstable style=""display:inline"" transitions of power since it emerged from dictatorship in 1989. Although there hasn't been a coup since that year, there were attempted coups in 1996 and 2000, and President Raul Cubas resigned amid controversy in 1999. The military shakeup is the third since Lugo took office. The former Catholic bishop was elected to a five-year term last year. His victory brought an end to six decades of one-party rule in Paraguay, but the honeymoon did not last long. In April, Lugo admitted that he fathered a child while he was still a priest and that he may have fathered more. The revelation, which came as a shock to most, hurt his political image. Calls for his resignation began, and have continued as Lugo has struggled to push reforms through a majority-opposition legislature.",were any commanders retained?,yes,Republican of the,0.0
295,"Asuncion, Paraguay (CNN) -- Paraguay installed new top military commanders, but President Fernando Lugo, who had ordered the change in leadership, was not present for the ceremony. Lugo's absence Thursday morning attracted attention given his administration's silence on the sudden change in the leadership of the country's army, air force and navy. The president's decision to replace the top brass came a day after he publicly dismissed rumors about a military coup. Brig. Gen. Bartolome Ramon Pineda Ortiz was named as the new army commander. Brig. Gen. Hugo Gilberto Aranda Chamorro and Rear Adm. Egberto Emerito Orie Benegas took over the top posts at the air force and navy, respectively. The announcement came from the armed forces, not the president's office. Cibar Benitez, commander of the armed forces, was the only top leader to retain his post. Other changes would be forthcoming in the lower ranks, said Benitez at the swearing-in ceremony, but he denied there was any truth to talk of a coup. Paraguay's history is filled with unstable style=""display:inline"" transitions of power since it emerged from dictatorship in 1989. Although there hasn't been a coup since that year, there were attempted coups in 1996 and 2000, and President Raul Cubas resigned amid controversy in 1999. The military shakeup is the third since Lugo took office. The former Catholic bishop was elected to a five-year term last year. His victory brought an end to six decades of one-party rule in Paraguay, but the honeymoon did not last long. In April, Lugo admitted that he fathered a child while he was still a priest and that he may have fathered more. The revelation, which came as a shock to most, hurt his political image. Calls for his resignation began, and have continued as Lugo has struggled to push reforms through a majority-opposition legislature.",who was the new commander of the Army?,Brig. Gen. Bartolome Ramon Pineda Ortiz,Republican of the,0.0
294,"Asuncion, Paraguay (CNN) -- Paraguay installed new top military commanders, but President Fernando Lugo, who had ordered the change in leadership, was not present for the ceremony. Lugo's absence Thursday morning attracted attention given his administration's silence on the sudden change in the leadership of the country's army, air force and navy. The president's decision to replace the top brass came a day after he publicly dismissed rumors about a military coup. Brig. Gen. Bartolome Ramon Pineda Ortiz was named as the new army commander. Brig. Gen. Hugo Gilberto Aranda Chamorro and Rear Adm. Egberto Emerito Orie Benegas took over the top posts at the air force and navy, respectively. The announcement came from the armed forces, not the president's office. Cibar Benitez, commander of the armed forces, was the only top leader to retain his post. Other changes would be forthcoming in the lower ranks, said Benitez at the swearing-in ceremony, but he denied there was any truth to talk of a coup. Paraguay's history is filled with unstable style=""display:inline"" transitions of power since it emerged from dictatorship in 1989. Although there hasn't been a coup since that year, there were attempted coups in 1996 and 2000, and President Raul Cubas resigned amid controversy in 1999. The military shakeup is the third since Lugo took office. The former Catholic bishop was elected to a five-year term last year. His victory brought an end to six decades of one-party rule in Paraguay, but the honeymoon did not last long. In April, Lugo admitted that he fathered a child while he was still a priest and that he may have fathered more. The revelation, which came as a shock to most, hurt his political image. Calls for his resignation began, and have continued as Lugo has struggled to push reforms through a majority-opposition legislature.",what leadership had been changed?,military commanders,Republican of the,0.0



Source: race


Unnamed: 0,context,question,answer,generated_answer,squad_score
12,"George Washington Carver showed that plant life was more than just food for animals and humans. Carver's first step was to analyze plant parts to find out what they were made of. He then combined these simpler isolated substances with other substances to create new products. The branch of chemistry that studies and finds ways to use raw materials from farm products to make industrial products is called chemurgy. Carver was one of the first and greatest chemurgists of all time. Today the science of chemurgy is better known as the science of synthetics . Each day people depend on and use synthetics made from raw materials. All his life Carver battled against the disposal of waste materials, and warned of the growing need to develop substitutes for the natural substances being used up by humans. Carver never cared about getting credit for the new products he created. He never tried to patent his discoveries or get wealthy from them. He turned down many offers to leave Tuskegee Institute to become a scientist in private industry. Thomas Edison, inventor of the electric light, offered him a laboratory in Detroit to carry out food research. When the United States government made him a collaborator in the Mycology and Plant Disease Survey of the Department of Agriculture, he accepted the position with the understanding that he wouldn't leave Tuskegee. An authority on plant disease--especially of the fungus variety--Carver sent hundreds of specimens to the United States Department of Agriculture. At the peak of his career, Carver's fame and influence were known on every continent.",What branch of Chemistry seeks ways to use raw materials to make industrial products?,chemurgy.,14 years,0.0
357,"Running Man is a variety show which is aired in SBS, a famous South Korean TV channel. The show broadcasts on Sunday every week. The translation can be watched on the Internet every Monday. It's very interesting and funny. In the program, everyone should keep running. Here are some information of its hosts and hostess. Liu Zaishi, the main host of the show, is known as National Moderator(,). His friendly, witty and lovely hosting style makes him become one of the most popular hosts and comedians in South Korean. Jin Zhongguo, the strongest man on the show, is known as Sparta-kooks . During the race, he can capture others quickly. But sometimes, he can be very cute. Song Zhixiao, the beautiful actress who is also called Mong Zhi, where ""mong"" means ""confused"", because of her facial expressions which makes her seem confused. During the race mission, she is ace because of her superior ability to capture. Young people in Korea love the program very much. Why? Because some South Korean stars will be invited to take part in the race every week . They are divided into several teams with MCs. Many stars have participated in the program, for example, Li Minhao, Girls'Generation , Jin Xiuxian etc. What's more, the program is not only relaxing but also educational--- It teaches people the importance of team spirit.",What kind of style does he have?,"friendly, witty and lovely",15 years.,0.0
358,"Running Man is a variety show which is aired in SBS, a famous South Korean TV channel. The show broadcasts on Sunday every week. The translation can be watched on the Internet every Monday. It's very interesting and funny. In the program, everyone should keep running. Here are some information of its hosts and hostess. Liu Zaishi, the main host of the show, is known as National Moderator(,). His friendly, witty and lovely hosting style makes him become one of the most popular hosts and comedians in South Korean. Jin Zhongguo, the strongest man on the show, is known as Sparta-kooks . During the race, he can capture others quickly. But sometimes, he can be very cute. Song Zhixiao, the beautiful actress who is also called Mong Zhi, where ""mong"" means ""confused"", because of her facial expressions which makes her seem confused. During the race mission, she is ace because of her superior ability to capture. Young people in Korea love the program very much. Why? Because some South Korean stars will be invited to take part in the race every week . They are divided into several teams with MCs. Many stars have participated in the program, for example, Li Minhao, Girls'Generation , Jin Xiuxian etc. What's more, the program is not only relaxing but also educational--- It teaches people the importance of team spirit.",Do people like him?,yes,15 years.,0.0
359,"Running Man is a variety show which is aired in SBS, a famous South Korean TV channel. The show broadcasts on Sunday every week. The translation can be watched on the Internet every Monday. It's very interesting and funny. In the program, everyone should keep running. Here are some information of its hosts and hostess. Liu Zaishi, the main host of the show, is known as National Moderator(,). His friendly, witty and lovely hosting style makes him become one of the most popular hosts and comedians in South Korean. Jin Zhongguo, the strongest man on the show, is known as Sparta-kooks . During the race, he can capture others quickly. But sometimes, he can be very cute. Song Zhixiao, the beautiful actress who is also called Mong Zhi, where ""mong"" means ""confused"", because of her facial expressions which makes her seem confused. During the race mission, she is ace because of her superior ability to capture. Young people in Korea love the program very much. Why? Because some South Korean stars will be invited to take part in the race every week . They are divided into several teams with MCs. Many stars have participated in the program, for example, Li Minhao, Girls'Generation , Jin Xiuxian etc. What's more, the program is not only relaxing but also educational--- It teaches people the importance of team spirit.",Who's the strongest man?,Jin Zhongguo,15 years.,0.0
360,"Running Man is a variety show which is aired in SBS, a famous South Korean TV channel. The show broadcasts on Sunday every week. The translation can be watched on the Internet every Monday. It's very interesting and funny. In the program, everyone should keep running. Here are some information of its hosts and hostess. Liu Zaishi, the main host of the show, is known as National Moderator(,). His friendly, witty and lovely hosting style makes him become one of the most popular hosts and comedians in South Korean. Jin Zhongguo, the strongest man on the show, is known as Sparta-kooks . During the race, he can capture others quickly. But sometimes, he can be very cute. Song Zhixiao, the beautiful actress who is also called Mong Zhi, where ""mong"" means ""confused"", because of her facial expressions which makes her seem confused. During the race mission, she is ace because of her superior ability to capture. Young people in Korea love the program very much. Why? Because some South Korean stars will be invited to take part in the race every week . They are divided into several teams with MCs. Many stars have participated in the program, for example, Li Minhao, Girls'Generation , Jin Xiuxian etc. What's more, the program is not only relaxing but also educational--- It teaches people the importance of team spirit.",Does he have a nickname?,Sparta-kooks,15 years.,0.0



Source: mctest


Unnamed: 0,context,question,answer,generated_answer,squad_score
31,"One morning, Justin woke up very excited. He was excited because it was his birthday. He went downstairs to eat breakfast. When he got downstairs his mom said, ""Happy Birthday."" ""Thank you!"" Justin said. ""Am I going to have lots of presents?"" he asked his mom. ""Yes, Justin. When your friends come over for your birthday party you'll get lots!"" ""Hooray!"" Justin said, eating his breakfast. Later that day, Justin's friends came over for his birthday party, and they brought over lots of presents. They ate cake and ice cream. They played games like tag and played with a football. After they were done playing Justin got to open his presents. He saw lots of presents. Red ones, blue ones, tall ones, round ones. ""Thank you!"" Justin said, as he started to open his presents. He got a basketball, a robot toy, a new bike and some super hero toys! After all of Justin's friends left, Justin fell asleep very fast because of the exciting day he had and he was happy he got all of those nice things.",was justin sad?,no,He was could to it.,0.0
133,"A boy was trying to pick out what instrument that he wanted to play. His parents wanted him to pick a good one because playing an instrument was very important to them. So, the boy went to a music store with his parents. When he got there he didn't know where to start, so the boy walked up to the front and asked the man who worked there for help. First the worker brought the boy to a piano. ""No way,"" said the boy, ""that looks way too hard!"" The worker laughed at this, and then brought out a guitar. The boy thought that guitars were too popular and wanted to play something that not many other people would play. Finally, the worker brought the boy to the drums. ""That's it! That is a cool instrument that I could really get into."" However, his mom wasn't so happy with this because she thought that he would be too loud. The boy's dad stepped in and talked her into it because he knew that if the boy liked what he did, he would do way better. Twenty years later, and the boy was the drummer in a band. It the most popular rock band in the world.",did the boy think he could do it?,no,He was could to it.,0.0
134,"A boy was trying to pick out what instrument that he wanted to play. His parents wanted him to pick a good one because playing an instrument was very important to them. So, the boy went to a music store with his parents. When he got there he didn't know where to start, so the boy walked up to the front and asked the man who worked there for help. First the worker brought the boy to a piano. ""No way,"" said the boy, ""that looks way too hard!"" The worker laughed at this, and then brought out a guitar. The boy thought that guitars were too popular and wanted to play something that not many other people would play. Finally, the worker brought the boy to the drums. ""That's it! That is a cool instrument that I could really get into."" However, his mom wasn't so happy with this because she thought that he would be too loud. The boy's dad stepped in and talked her into it because he knew that if the boy liked what he did, he would do way better. Twenty years later, and the boy was the drummer in a band. It the most popular rock band in the world.",what else did he try?,a guitar.,He was could to it.,0.0
135,"A boy was trying to pick out what instrument that he wanted to play. His parents wanted him to pick a good one because playing an instrument was very important to them. So, the boy went to a music store with his parents. When he got there he didn't know where to start, so the boy walked up to the front and asked the man who worked there for help. First the worker brought the boy to a piano. ""No way,"" said the boy, ""that looks way too hard!"" The worker laughed at this, and then brought out a guitar. The boy thought that guitars were too popular and wanted to play something that not many other people would play. Finally, the worker brought the boy to the drums. ""That's it! That is a cool instrument that I could really get into."" However, his mom wasn't so happy with this because she thought that he would be too loud. The boy's dad stepped in and talked her into it because he knew that if the boy liked what he did, he would do way better. Twenty years later, and the boy was the drummer in a band. It the most popular rock band in the world.",what else?,the drums,He was could to it.,0.0
136,"A boy was trying to pick out what instrument that he wanted to play. His parents wanted him to pick a good one because playing an instrument was very important to them. So, the boy went to a music store with his parents. When he got there he didn't know where to start, so the boy walked up to the front and asked the man who worked there for help. First the worker brought the boy to a piano. ""No way,"" said the boy, ""that looks way too hard!"" The worker laughed at this, and then brought out a guitar. The boy thought that guitars were too popular and wanted to play something that not many other people would play. Finally, the worker brought the boy to the drums. ""That's it! That is a cool instrument that I could really get into."" However, his mom wasn't so happy with this because she thought that he would be too loud. The boy's dad stepped in and talked her into it because he knew that if the boy liked what he did, he would do way better. Twenty years later, and the boy was the drummer in a band. It the most popular rock band in the world.",did he want the guitar?,no,He was could to it.,0.0



Source: gutenberg


Unnamed: 0,context,question,answer,generated_answer,squad_score
48,"CHAPTER XII THE MEETING OF THE GEE EYES When Link Merwell went down again Dave looked at Nat Poole, thinking that lad might possibly attack him. But the dudish fellow was too scared to do anything but back away to a safe distance. ""Don--don't you dare to hit me, Porter!"" he cried, in a trembling voice. ""Don't you dare! If you do I'll tell Doctor Clay!"" ""If you behave yourself I'll not lay my fingers on you, Nat Poole,"" was the reply. ""Merwell brought this on himself--you know that as well as I do."" ""He's pretty badly hurt, I fear."" ""Oh, he'll come around all right,"" answered Dave. ""You had better see to it that he gets to the Hall safely."" ""Are you going to leave me?"" ""Yes, I want to find Henshaw and the others."" Nat Poole wanted to argue, but he did not dare. Dave waited until Link Merwell sat up and opened his eyes. Then he leaped on the ice-boat and flung off the three skates he found there. ""Going away?"" mumbled Merwell, when he could speak. ""Yes, and after this, Link Merwell, see that you keep a civil tongue in your head,"" answered Dave, and then he trimmed the sail of the ice-boat, shoved the craft around, and started for the river. Dave was a good deal ""worked up,"" but he had not deemed it wise to let his enemies see it. To be called a ""poorhouse rat"" had stung him to the quick, and once again when touched on that subject he had found his temper as ungovernable as ever.",What is the title of the chapter?,The meeting of the gee eyes,He could had was would to not a the have it is,0.0
253,"CHAPTER XXXVI THE CHARLATAN UNMASKED There seemed for the next few minutes to be a somewhat singular abstention from any desire to interfere with the two people who stood in the centre of the little group, hand-in-hand. Saton, after his first speech, and after Lois had given him her hands, had turned a little defiantly toward Rochester, who remained, however, unmoved, his elbow resting upon the broad mantelpiece, his face almost expressionless. Vandermere, too, stood on one side and held his peace, though the effort with which he did so was a visible one. Lady Mary looked anxiously towards them. Pauline had shrunk back, as though something in the situation terrified her. Even Saton himself felt that it was the silence before the storm. The courage which he had summoned up to meet a storm of disapproval, began to ebb slowly away in the face of this unnatural silence. It was clear that the onus of further speech was to rest with him. Still retaining Lois' hand, he turned toward Rochester. ""You have forbidden me to enter your house, or to hold any communication with your ward until she was of age, Mr. Rochester,"" he said. ""One of your conditions I have obeyed. With regard to the other, I have done as I thought fit. However, to-day she is her own mistress. She has consented to be my wife. I do not need to ask for your consent or approval. If you are not willing that she should be married from your roof, I can take her at once to the Comtesse, who is prepared to receive her.""",When Saton turned toward Rochester was he defiant or scared?,defiant,He could had was would to not a be the have it,0.0
252,"CHAPTER XXXVI THE CHARLATAN UNMASKED There seemed for the next few minutes to be a somewhat singular abstention from any desire to interfere with the two people who stood in the centre of the little group, hand-in-hand. Saton, after his first speech, and after Lois had given him her hands, had turned a little defiantly toward Rochester, who remained, however, unmoved, his elbow resting upon the broad mantelpiece, his face almost expressionless. Vandermere, too, stood on one side and held his peace, though the effort with which he did so was a visible one. Lady Mary looked anxiously towards them. Pauline had shrunk back, as though something in the situation terrified her. Even Saton himself felt that it was the silence before the storm. The courage which he had summoned up to meet a storm of disapproval, began to ebb slowly away in the face of this unnatural silence. It was clear that the onus of further speech was to rest with him. Still retaining Lois' hand, he turned toward Rochester. ""You have forbidden me to enter your house, or to hold any communication with your ward until she was of age, Mr. Rochester,"" he said. ""One of your conditions I have obeyed. With regard to the other, I have done as I thought fit. However, to-day she is her own mistress. She has consented to be my wife. I do not need to ask for your consent or approval. If you are not willing that she should be married from your roof, I can take her at once to the Comtesse, who is prepared to receive her.""",What is the title of the chapter?,THE CHARLATAN UNMASKED,He could had was would to not a be the have it,0.0
251,"CHAPTER XXXVI THE CHARLATAN UNMASKED There seemed for the next few minutes to be a somewhat singular abstention from any desire to interfere with the two people who stood in the centre of the little group, hand-in-hand. Saton, after his first speech, and after Lois had given him her hands, had turned a little defiantly toward Rochester, who remained, however, unmoved, his elbow resting upon the broad mantelpiece, his face almost expressionless. Vandermere, too, stood on one side and held his peace, though the effort with which he did so was a visible one. Lady Mary looked anxiously towards them. Pauline had shrunk back, as though something in the situation terrified her. Even Saton himself felt that it was the silence before the storm. The courage which he had summoned up to meet a storm of disapproval, began to ebb slowly away in the face of this unnatural silence. It was clear that the onus of further speech was to rest with him. Still retaining Lois' hand, he turned toward Rochester. ""You have forbidden me to enter your house, or to hold any communication with your ward until she was of age, Mr. Rochester,"" he said. ""One of your conditions I have obeyed. With regard to the other, I have done as I thought fit. However, to-day she is her own mistress. She has consented to be my wife. I do not need to ask for your consent or approval. If you are not willing that she should be married from your roof, I can take her at once to the Comtesse, who is prepared to receive her.""",Where did Saton say he could take Lois if she couldn't be married there?,the Comtesse,He could had was would to not a be the have it,0.0
250,"CHAPTER XXXVI THE CHARLATAN UNMASKED There seemed for the next few minutes to be a somewhat singular abstention from any desire to interfere with the two people who stood in the centre of the little group, hand-in-hand. Saton, after his first speech, and after Lois had given him her hands, had turned a little defiantly toward Rochester, who remained, however, unmoved, his elbow resting upon the broad mantelpiece, his face almost expressionless. Vandermere, too, stood on one side and held his peace, though the effort with which he did so was a visible one. Lady Mary looked anxiously towards them. Pauline had shrunk back, as though something in the situation terrified her. Even Saton himself felt that it was the silence before the storm. The courage which he had summoned up to meet a storm of disapproval, began to ebb slowly away in the face of this unnatural silence. It was clear that the onus of further speech was to rest with him. Still retaining Lois' hand, he turned toward Rochester. ""You have forbidden me to enter your house, or to hold any communication with your ward until she was of age, Mr. Rochester,"" he said. ""One of your conditions I have obeyed. With regard to the other, I have done as I thought fit. However, to-day she is her own mistress. She has consented to be my wife. I do not need to ask for your consent or approval. If you are not willing that she should be married from your roof, I can take her at once to the Comtesse, who is prepared to receive her.""","Who had shrunk back, terrified?",Pauline,He could had was would to not a be the have it,0.0



Source: wikipedia


Unnamed: 0,context,question,answer,generated_answer,squad_score
68,"Wool is the textile fiber obtained from sheep and other animals, including cashmere and mohair from goats, qiviut from muskoxen, angora from rabbits, and other types of wool from camelids. Wool mainly consists of protein together with a few percent lipids. In this regard it is chemically quite distinct from the more dominant textile, cotton, which is mainly cellulose. Wool is produced by follicles which are small cells located in the skin. These follicles are located in the upper layer of the skin called the epidermis and push down into the second skin layer called the dermis as the wool fibers grow. Follicles can be classed as either primary or secondary follicles. Primary follicles produce three types of fiber: kemp, medullated fibers and true wool fibers. Secondary follicles only produce true wool fibers. Medullated fibers share nearly identical characteristics to hair and are long but lack crimp and elasticity. Kemp fibers are very coarse and shed out. Wool's scaling and crimp make it easier to spin the fleece by helping the individual fibers attach to each other, so they stay together. Because of the crimp, wool fabrics have greater bulk than other textiles, and they hold air, which causes the fabric to retain heat. Wool has a high specific heat coefficient, so it impedes heat transfer in general. This effect has benefited desert peoples, as Bedouins and Tuaregs use wool clothes for insulation.",where are follicles located?,In the skin.,Republican of the town,0.0
318,"The Achaemenid Empire, also called the First Persian Empire, was an empire based in Western Asia, founded by Cyrus the Great. Ranging at its greatest extent from the Balkans and Eastern Europe proper in the west to the Indus Valley in the east, it was one of the largest empires in history, spanning 5.5 million square kilometers, and was larger than any previous empire in history. It is equally notable style=""display:inline"" for its successful model of a centralised, bureaucratic administration (through satraps under the King of Kings), for building infrastructure such as road systems and a postal system, the use of an official language across its territories, and the development of civil services and a large professional army. The empire's successes inspired similar systems in later empires. It is noted in Western history as the antagonist of the Greek city-states during the Greco-Persian Wars and for the emancipation of the Jewish exiles in Babylon. By the 7th century BC, the Persians had settled in the southwestern portion of the Iranian Plateau in the region of Persis, which came to be their . From this region, Cyrus the Great advanced to defeat the Medes, Lydia, and the Neo-Babylonian Empire, establishing the Achaemenid Empire. Alexander the Great, an avid admirer of Cyrus the Great, conquered most of the empire by 330 BC. Upon his death, most of the empire's former territory came under the rule of the Ptolemaic Kingdom and Seleucid Empire, in addition to other minor territories which gained independence at that time. The Iranian population of the central plateau reclaimed power by the second century BC under the Parthian Empire.",And in the west?,the Balkans and Eastern Europe,Spain of the France,0.0
317,"The Achaemenid Empire, also called the First Persian Empire, was an empire based in Western Asia, founded by Cyrus the Great. Ranging at its greatest extent from the Balkans and Eastern Europe proper in the west to the Indus Valley in the east, it was one of the largest empires in history, spanning 5.5 million square kilometers, and was larger than any previous empire in history. It is equally notable style=""display:inline"" for its successful model of a centralised, bureaucratic administration (through satraps under the King of Kings), for building infrastructure such as road systems and a postal system, the use of an official language across its territories, and the development of civil services and a large professional army. The empire's successes inspired similar systems in later empires. It is noted in Western history as the antagonist of the Greek city-states during the Greco-Persian Wars and for the emancipation of the Jewish exiles in Babylon. By the 7th century BC, the Persians had settled in the southwestern portion of the Iranian Plateau in the region of Persis, which came to be their . From this region, Cyrus the Great advanced to defeat the Medes, Lydia, and the Neo-Babylonian Empire, establishing the Achaemenid Empire. Alexander the Great, an avid admirer of Cyrus the Great, conquered most of the empire by 330 BC. Upon his death, most of the empire's former territory came under the rule of the Ptolemaic Kingdom and Seleucid Empire, in addition to other minor territories which gained independence at that time. The Iranian population of the central plateau reclaimed power by the second century BC under the Parthian Empire.",What did it extend to in the east?,the Indus Valley,Spain of the France,0.0
316,"The Achaemenid Empire, also called the First Persian Empire, was an empire based in Western Asia, founded by Cyrus the Great. Ranging at its greatest extent from the Balkans and Eastern Europe proper in the west to the Indus Valley in the east, it was one of the largest empires in history, spanning 5.5 million square kilometers, and was larger than any previous empire in history. It is equally notable style=""display:inline"" for its successful model of a centralised, bureaucratic administration (through satraps under the King of Kings), for building infrastructure such as road systems and a postal system, the use of an official language across its territories, and the development of civil services and a large professional army. The empire's successes inspired similar systems in later empires. It is noted in Western history as the antagonist of the Greek city-states during the Greco-Persian Wars and for the emancipation of the Jewish exiles in Babylon. By the 7th century BC, the Persians had settled in the southwestern portion of the Iranian Plateau in the region of Persis, which came to be their . From this region, Cyrus the Great advanced to defeat the Medes, Lydia, and the Neo-Babylonian Empire, establishing the Achaemenid Empire. Alexander the Great, an avid admirer of Cyrus the Great, conquered most of the empire by 330 BC. Upon his death, most of the empire's former territory came under the rule of the Ptolemaic Kingdom and Seleucid Empire, in addition to other minor territories which gained independence at that time. The Iranian population of the central plateau reclaimed power by the second century BC under the Parthian Empire.",Was that empire one of the largest in history?,yes,Spain of the France,0.0
315,"The Achaemenid Empire, also called the First Persian Empire, was an empire based in Western Asia, founded by Cyrus the Great. Ranging at its greatest extent from the Balkans and Eastern Europe proper in the west to the Indus Valley in the east, it was one of the largest empires in history, spanning 5.5 million square kilometers, and was larger than any previous empire in history. It is equally notable style=""display:inline"" for its successful model of a centralised, bureaucratic administration (through satraps under the King of Kings), for building infrastructure such as road systems and a postal system, the use of an official language across its territories, and the development of civil services and a large professional army. The empire's successes inspired similar systems in later empires. It is noted in Western history as the antagonist of the Greek city-states during the Greco-Persian Wars and for the emancipation of the Jewish exiles in Babylon. By the 7th century BC, the Persians had settled in the southwestern portion of the Iranian Plateau in the region of Persis, which came to be their . From this region, Cyrus the Great advanced to defeat the Medes, Lydia, and the Neo-Babylonian Empire, establishing the Achaemenid Empire. Alexander the Great, an avid admirer of Cyrus the Great, conquered most of the empire by 330 BC. Upon his death, most of the empire's former territory came under the rule of the Ptolemaic Kingdom and Seleucid Empire, in addition to other minor territories which gained independence at that time. The Iranian population of the central plateau reclaimed power by the second century BC under the Parthian Empire.",By when did Alexander conquer most of the empire?,by 330 BC,Spain of the France,0.0





# Assignment Evaluation

Assignment points will be awarded for each task as follows:

* Task 1, Pre-processing $\rightarrow$ 0.5 points.
* Task 2, Dataset Splitting $\rightarrow$ 0.5 points.
* Tasks 3 and 4, Models Definition $\rightarrow$ 1.0 points.
* Tasks 5 and 6, Models Training and Evaluation $\rightarrow$ 2.0 points.
* Task 7, Analysis $\rightarrow$ 1.0 points.
* Report $\rightarrow$ 1.0 points.

**Total** = 6 points <br>

We may award an additional 0.5 points for outstanding submissions. 
 
**Speed Bonus** = 0.5 extra points <br>

# Report

We apply the rules described in Assignment 1 regarding the report.
* Write a clear and concise report following the given Overleaf template (**max 2 pages**).
* Report validation and test results in a table.$^1$
* **Avoid reporting** code snippets or copy-pasted terminal outputs $\rightarrow$ **Provide a clean schema** of what you want to show.

# Comments and Organization

Remember to comment your code properly (it is not necessary to comment every single line) and don't forget to describe your work!

Structure your code for readability and maintenance. If you work with Colab, use sections.

This helps you write clean, modular code that is easy to read and debug (notebooks can be quite tricky from time to time).

# FAQ (READ THIS!)

---

**Question**: Does Task 3 also include the data tokenization and conversion steps?

**Answer:** Yes! These steps are usually straightforward since ```transformers``` also offers a specific tokenizer for each model.

**Example**: 

```
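from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoded_text = tokenizer(text)
# Alternatively, truncate explicitly to at most 512 tokens.
# Note: tokenizer.tokenize() only returns a list of token strings, so call the tokenizer itself.
inputs = tokenizer(text, add_special_tokens=True, truncation=True, max_length=min(max_length, 512))
input_ids, attention_mask = inputs['input_ids'], inputs['attention_mask']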
```

**Suggestion**: Hugging Face's documentation is full of tutorials and user-friendly APIs.

---
---

**Question**: I'm hitting an **out-of-memory error** when training my models. Do you have any suggestions?

**Answer**: Here are some common workarounds:

1. Try decreasing the mini-batch size.
2. Try applying a different padding strategy (if you are applying padding): e.g. pad to a quantile of the sequence lengths instead of the maximum sequence length.
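
The quantile-based padding idea can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the helper names `quantile_length` and `pad_or_truncate`, the padding id `0`, the toy sequences, and the nearest-rank definition of the quantile are all assumptions made for the example.

```python
import math

def quantile_length(lengths, q=0.95):
    """Return the smallest length that covers at least a fraction q of the sequences."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths)
    # Nearest-rank quantile: take the ceil(q * n)-th smallest value
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Truncate to max_len, then right-pad with pad_id (assumed to be 0) up to max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))

# Toy token-id sequences (hypothetical data): a single outlier dominates the maximum length
sequences = [[1] * n for n in (5, 7, 8, 9, 10, 11, 12, 12, 14, 200)]
max_len = quantile_length([len(s) for s in sequences], q=0.9)
batch = [pad_or_truncate(s, max_len) for s in sequences]
```

With these toy lengths, padding to the maximum would make every sequence 200 tokens long, whereas the 0.9 quantile caps them at 14 and truncates only the outlier. With a Hugging Face tokenizer, the chosen length would typically be passed as `max_length` together with `truncation=True`.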

---
---

# Contact

For any doubts, questions, issues, or requests for help, you can always contact us at the following email addresses:

Teaching Assistants:

* Andrea Galassi -> a.galassi@unibo.it
* Federico Ruggeri -> federico.ruggeri6@unibo.it

Professor:

* Paolo Torroni -> p.torroni@unibo.it

# The End!

Questions?