In this tutorial, we will create a Question Answering BERT model using Hugging Face.  
<br>
Question Answering is a **token-level task**: we **classify** each token to see which one is the `start` and which one is the `end` of the answer. We then slice `[start: end+1]` to extract the answer.   
Note that a **token-level task** usually uses all tokens for prediction, while a **sentence-level task** usually uses only the *\[CLS\]* token.   
<br>
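As a minimal sketch of the `[start: end+1]` slicing described above, the snippet below picks the highest-scoring start and end token from two made-up score lists and slices the answer out. The tokens, the scores, and the helper name `extract_answer` are illustrative assumptions, not output of the model built later.

```python
def extract_answer(tokens, start_scores, end_scores):
    """Pick the best-scoring start and end token and return the tokens between them."""
    start = max(range(len(tokens)), key=lambda i: start_scores[i])
    end = max(range(len(tokens)), key=lambda i: end_scores[i])
    if end < start:
        # An end before the start is not a valid span
        return []
    return tokens[start:end + 1]


# Hypothetical tokens and per-token scores, for illustration only
tokens = ["[CLS]", "who", "?", "[SEP]", "saint", "bernadette", "soubirous", "[SEP]"]
start_scores = [0.1, 0.0, 0.0, 0.0, 2.5, 0.3, 0.2, 0.0]
end_scores = [0.1, 0.0, 0.0, 0.0, 0.2, 0.4, 3.1, 0.0]

answer = extract_answer(tokens, start_scores, end_scores)
print(answer)
```

Taking the two argmaxes independently is the simplest decoding rule; it can produce an end before the start, which is why the sketch returns an empty span in that case.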
You can use `AutoModelForQuestionAnswering` directly, but here we will use ONLY `AutoModel` and build the Question Answering head ourselves.  
The structure of `AutoModelForQuestionAnswering` looks like this:
```
=========================================================================================================
Layer (type:depth-idx)                                  Output Shape              Param #
=========================================================================================================
DistilBertForQuestionAnswering                          [16, 384]                 --
├─DistilBertModel: 1-1                                  [16, 384, 768]            --
│    └─Embeddings: 2-1                                  [16, 384, 768]            --
│    │    └─Embedding: 3-1                              [16, 384, 768]            23,440,896
│    │    └─Embedding: 3-2                              [1, 384, 768]             393,216
│    │    └─LayerNorm: 3-3                              [16, 384, 768]            1,536
│    │    └─Dropout: 3-4                                [16, 384, 768]            --
│    └─Transformer: 2-2                                 [16, 384, 768]            --
│    │    └─ModuleList: 3-5                             --                        42,527,232
├─Dropout: 1-2                                          [16, 384, 768]            --
├─Linear: 1-3                                           [16, 384, 2]              1,538
=========================================================================================================
```
The model we build on top of `AutoModel` has two branches, `start` and `end`, instead of a single `Linear(768, 2)`.  Both branches are `Linear(768, hidden_dim) -> GELU -> Linear(hidden_dim, 1)`.  
The structure looks like this:
```
=========================================================================================================
Layer (type:depth-idx)                                  Output Shape              Param #
=========================================================================================================
DistBERT                                                [16, 384]                 --
├─pretrain_model: 1-1                                   [16, 384, 768]            --
│    └─Embeddings: 2-1                                  [16, 384, 768]            --
│    │    └─Embedding: 3-1                              [16, 384, 768]            23,440,896
│    │    └─Embedding: 3-2                              [1, 384, 768]             393,216
│    │    └─LayerNorm: 3-3                              [16, 384, 768]            1,536
│    │    └─Dropout: 3-4                                [16, 384, 768]            --
│    └─Transformer: 2-2                                 [16, 384, 768]            --
│    │    └─ModuleList: 3-5                             --                        42,527,232
├─start: 1-1                                            [16, 384, 1]              --
│    └─Linear: 2-1                                      [16, 384, 512]            393,728
│    └─GELU: 2-2                                        [16, 384, 512]            --
│    └─Linear: 2-3                                      [16, 384, 1]              513
├─end: 1-1                                              [16, 384, 1]              --
│    └─Linear: 2-1                                      [16, 384, 512]            393,728
│    └─GELU: 2-2                                        [16, 384, 512]            --
│    └─Linear: 2-3                                      [16, 384, 1]              513
=========================================================================================================
```
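The parameter counts in the two summaries can be checked by hand: a `Linear(in, out)` layer has `in * out` weights plus `out` biases. The sketch below does this arithmetic; `linear_params` is a hypothetical helper, and `hidden_dim = 512` is read off the output shapes in the summary above.

```python
def linear_params(in_features, out_features):
    """Number of trainable parameters in a Linear layer (weights + biases)."""
    return in_features * out_features + out_features


embed_dim, hidden_dim = 768, 512

# Single head used by AutoModelForQuestionAnswering: Linear(768, 2)
default_head = linear_params(embed_dim, 2)

# One branch of our model: Linear(768, 512) -> GELU -> Linear(512, 1)
first_layer = linear_params(embed_dim, hidden_dim)
second_layer = linear_params(hidden_dim, 1)
branch = first_layer + second_layer

print(default_head)               # matches the 1,538 in the first summary
print(first_layer, second_layer)  # matches 393,728 and 513 in the second summary
print(2 * branch)                 # total parameters in the start and end branches
```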

In [1]:
!pip install git+https://github.com/brianbt/btorch
!pip install transformers
!pip install datasets

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/brianbt/btorch
  Cloning https://github.com/brianbt/btorch to /tmp/pip-req-build-ez00ithp
  Running command git clone -q https://github.com/brianbt/btorch /tmp/pip-req-build-ez00ithp
Collecting torchinfo
  Downloading torchinfo-1.7.0-py3-none-any.whl (22 kB)
Building wheels for collected packages: btorch
  Building wheel for btorch (setup.py) ... done
  Created wheel for btorch: filename=btorch-0.0.1-py3-none-any.whl size=56341 sha256=bc85f983443e23e2155fbacd29a47c30d0644c529f3f1accb2e3f002a61e7009
  Stored in directory: /tmp/pip-ephem-wheel-cache-r66yslyw/wheels/fa/ef/1e/1248ce8683f1b6fd8e6552260da8c1dcfbb352d899fef03d72
Successfully built btorch
Installing collected packages: torchinfo, btorch
Successfully installed btorch-0.0.1 torchinfo-1.7.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/

# Import

In [2]:
import torch
from tqdm import tqdm

# Btorch
import btorch
from btorch import nn
import btorch.nn.functional as F

# Hugging Face
from datasets import load_dataset
from transformers import AutoTokenizer, AutoConfig, AutoModel, AutoModelForQuestionAnswering

# Load dataset

In [3]:
datasets = load_dataset("squad")

Downloading and preparing dataset squad/plain_text (download: 33.51 MiB, generated: 85.63 MiB, post-processed: Unknown size, total: 119.14 MiB) to /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453...


Dataset squad downloaded and prepared to /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453. Subsequent calls will reuse this data.


In [4]:
print(datasets)
# Let's check what one data point looks like
example = next(iter(datasets['train']))
display(example)
print(example.keys())

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})


{'answers': {'answer_start': [515], 'text': ['Saint Bernadette Soubirous']},
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'id': '5733be284776f41900661182',
 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
 'title': 'University_of_Notre_Dame'}

dict_keys(['id', 'title', 'context', 'question', 'answers'])


# Create HuggingFace BERT 🤗

In [5]:
model_name = 'distilbert-base-uncased'

#https://huggingface.co/docs/transformers/v4.20.1/en/main_classes/configuration#transformers.PretrainedConfig
#https://huggingface.co/docs/transformers/v4.20.1/en/model_doc/roberta#transformers.RobertaConfig
#https://huggingface.co/roberta-base/blob/main/config.json
config = AutoConfig.from_pretrained(
    model_name, 
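    output_hidden_states = True,
    output_attentions = False,
    # Note: DistilBertConfig names its dropout field `dropout`, so `hidden_dropout_prob`
    # is not picked up here (the printed config below still shows "dropout": 0.1)
    hidden_dropout_prob = 0.2,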
) 
print(config)

# Use above config to create our BERT model
pretrain_model = AutoModel.from_pretrained(
    model_name,
    config = config
)
# pretrain_model = AutoModelForQuestionAnswering.from_pretrained(
#     model_name,
#     config=config
# )

# Create a BERT tokenizer
#https://huggingface.co/docs/transformers/v4.20.1/en/model_doc/roberta#transformers.RobertaTokenizer
#https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__
tokenizer = AutoTokenizer.from_pretrained(model_name)


DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.20.1",
  "vocab_size": 30522
}



Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_transform.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Quick tutorial on tokenizer usage for Question Answering

In [6]:
tokenized_example = tokenizer(
    example["question"],
    example["context"],
    max_length=100,
    truncation="only_second",
    return_overflowing_tokens=True, # also return the tokens cut off by `max_length`, as extra chunks
    return_offsets_mapping=True,  
    stride=2,
)
print(tokenized_example.keys())

dict_keys(['input_ids', 'attention_mask', 'offset_mapping', 'overflow_to_sample_mapping'])


In [7]:
display(example["question"])
display(example["context"])
print(tokenized_example['input_ids']) # 2 chunks, with lengths [100, 96]
# If `return_overflowing_tokens` is False, a single list of length 100 is returned,
# i.e. only the first 100 tokens are kept

'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?'

'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'

[[101, 2000, 3183, 2106, 1996, 6261, 2984, 9382, 3711, 1999, 8517, 1999, 10223, 26371, 2605, 1029, 102, 6549, 2135, 1010, 1996, 2082, 2038, 1037, 3234, 2839, 1012, 10234, 1996, 2364, 2311, 1005, 1055, 2751, 8514, 2003, 1037, 3585, 6231, 1997, 1996, 6261, 2984, 1012, 3202, 1999, 2392, 1997, 1996, 2364, 2311, 1998, 5307, 2009, 1010, 2003, 1037, 6967, 6231, 1997, 4828, 2007, 2608, 2039, 14995, 6924, 2007, 1996, 5722, 1000, 2310, 3490, 2618, 4748, 2033, 18168, 5267, 1000, 1012, 2279, 2000, 1996, 2364, 2311, 2003, 1996, 13546, 1997, 1996, 6730, 2540, 1012, 3202, 2369, 1996, 13546, 2003, 1996, 24665, 102], [101, 2000, 3183, 2106, 1996, 6261, 2984, 9382, 3711, 1999, 8517, 1999, 10223, 26371, 2605, 1029, 102, 1996, 24665, 23052, 1010, 1037, 14042, 2173, 1997, 7083, 1998, 9185, 1012, 2009, 2003, 1037, 15059, 1997, 1996, 24665, 23052, 2012, 10223, 26371, 1010, 2605, 2073, 1996, 6261, 2984, 22353, 2135, 2596, 2000, 3002, 16595, 9648, 4674, 2061, 12083, 9711, 2271, 1999, 8517, 1012, 2012, 1996, 22

In [8]:
# This returns a list of 2 strings. Each one has the form '[CLS] A [SEP] B [SEP]'
# A is example["question"]
# B is a chunk of example["context"]
# A is the same in both strings, B is different
# String 1's B ends at: ...immediately behind the basilica is the gr [SEP]
# String 2's B starts at: [SEP] the grotto, a marian...
# The original B is: ...Immediately behind the basilica is the Grotto, a Marian...
# The two tokens 'the' and 'gr' are repeated because we set stride=2
# With stride=3, three tokens would be repeated ('is the gr'), e.g.
# String 1's B ends at: ...immediately behind the basilica is the gr [SEP]
# String 2's B starts at: [SEP] is the grotto, a marian...
[tokenizer.decode(i) for i in tokenized_example['input_ids']]

['[CLS] to whom did the virgin mary allegedly appear in 1858 in lourdes france? [SEP] architecturally, the school has a catholic character. atop the main building\'s gold dome is a golden statue of the virgin mary. immediately in front of the main building and facing it, is a copper statue of christ with arms upraised with the legend " venite ad me omnes ". next to the main building is the basilica of the sacred heart. immediately behind the basilica is the gr [SEP]',
 '[CLS] to whom did the virgin mary allegedly appear in 1858 in lourdes france? [SEP] the grotto, a marian place of prayer and reflection. it is a replica of the grotto at lourdes, france where the virgin mary reputedly appeared to saint bernadette soubirous in 1858. at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ), is a simple, modern stone statue of mary. [SEP]']
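The chunk lengths above can be reproduced with a simplified model of how `max_length` and `stride` interact. This is a sketch of the observed behaviour, not the tokenizer's actual implementation: `chunk_lengths` is a hypothetical helper, and the context length of 158 tokens is inferred from the chunk lengths printed in this notebook (15 question tokens plus 3 special tokens leave 82 context tokens per chunk).

```python
def chunk_lengths(n_question, n_context, max_length, stride, n_special=3):
    """Simplified model of truncation="only_second" with overflowing tokens.

    Returns one (context_start, context_end, total_length) tuple per chunk,
    where the context indices are token positions within the context.
    """
    window = max_length - n_question - n_special  # context tokens per chunk
    if window <= stride:
        raise ValueError("stride must be smaller than the context window")
    chunks, start = [], 0
    while True:
        end = min(start + window, n_context)
        chunks.append((start, end, n_question + n_special + (end - start)))
        if end == n_context:
            break
        start = end - stride  # step back so that `stride` tokens are repeated
    return chunks


# 15 question tokens, [CLS] + 2 x [SEP]; 158 context tokens is an inferred value
chunks = chunk_lengths(n_question=15, n_context=158, max_length=100, stride=2)
for context_start, context_end, total in chunks:
    print(context_start, context_end, total)
```

The second chunk starts two tokens before the first one ends, which is the two-token overlap visible in the decoded strings.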

In [9]:
print([len(i) for i in tokenized_example['offset_mapping']])
[print(i[:25]) for i in tokenized_example['offset_mapping']]
print()
# offset_mapping gives the character indices of each token in its source string
# (0,0) is the [CLS] token; it is NOT in the original sentence, so it maps to [0:0]
# (0,2) is 'To'. example["question"][0:2] returns 'To'
# (56,59) is 'lou'. example["question"][56:59] returns 'Lou'
# In the middle of the first list, we see '(70, 71), (0, 0), (0, 13)'
# This is the '...A [SEP] B...' part.
# (70,71) refers to A: example["question"][70:71] -> '?'
# (0,13) refers to B: example["context"][0:13] -> 'Architectural'

[100, 96]
[(0, 0), (0, 2), (3, 7), (8, 11), (12, 15), (16, 22), (23, 27), (28, 37), (38, 44), (45, 47), (48, 52), (53, 55), (56, 59), (59, 63), (64, 70), (70, 71), (0, 0), (0, 13), (13, 15), (15, 16), (17, 20), (21, 27), (28, 31), (32, 33), (34, 42)]
[(0, 0), (0, 2), (3, 7), (8, 11), (12, 15), (16, 22), (23, 27), (28, 37), (38, 44), (45, 47), (48, 52), (53, 55), (56, 59), (59, 63), (64, 70), (70, 71), (0, 0), (369, 372), (373, 375), (375, 379), (379, 380), (381, 382), (383, 389), (390, 395), (396, 398)]
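To see what these offsets mean, the short sketch below slices the question string with a few of the `(start, end)` pairs printed above. The offsets are copied from the output; only the variable names are made up for illustration.

```python
question = "To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?"

# A few (start, end) character offsets taken from the first printed list
offsets = [(0, 0), (0, 2), (3, 7), (56, 59), (59, 63), (70, 71)]

# Each offset pair slices the original string back out; (0, 0) is a special token
pieces = [question[start:end] for start, end in offsets]
print(pieces)
```

Note that 'Lourdes' is split into two word pieces, so it is covered by two consecutive offset pairs.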



In [10]:
# None marks a special token ([CLS] or [SEP])
# 0 means the token is from A (question), 1 means the token is from B (context)
print(tokenized_example.sequence_ids())

[None, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, None, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, None]
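One use of `sequence_ids()` is locating where the context sits inside the tokenized input. The sketch below rebuilds the list printed above (1 special token, 15 question tokens, 1 special token, 82 context tokens, 1 special token) and scans it for the first and last context token; `context_bounds` is a hypothetical helper name.

```python
def context_bounds(sequence_ids):
    """Return the indices of the first and last token whose sequence id is 1."""
    idx = 0
    while sequence_ids[idx] != 1:
        idx += 1
    start = idx
    while idx < len(sequence_ids) and sequence_ids[idx] == 1:
        idx += 1
    return start, idx - 1


# Same layout as the printed output: [CLS], question, [SEP], context, [SEP]
sequence_ids = [None] + [0] * 15 + [None] + [1] * 82 + [None]
bounds = context_bounds(sequence_ids)
print(len(sequence_ids), bounds)
```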


In [11]:
tokenized_example['overflow_to_sample_mapping']

[0, 0]

In [12]:
# This is from here https://huggingface.co/docs/transformers/tasks/question_answering#preprocess
# Takes a batch of examples and returns the tokenized inputs with start/end token positions.
def preprocess_function(examples, max_length=384):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=max_length,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    answers = examples["answers"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        answer = answers[i]
        start_char = answer["answer_start"][0]
        end_char = answer["answer_start"][0] + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Find the start and end of the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        # If the answer is not fully inside the context, label it (0, 0)
        if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise it's the start and end token positions
            idx = context_start
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs
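The core of `preprocess_function` is converting a character span into a token span using the offsets. The sketch below isolates that step on a toy sentence with one token per word; `char_span_to_token_span`, the sentence, and its offsets are illustrative assumptions, and the whole offset list is treated as the context.

```python
def char_span_to_token_span(offsets, start_char, end_char):
    """Map a character span to (start_token, end_token) using token offsets.

    Returns (0, 0) when the span lies outside the tokens, mirroring the
    labelling rule used in preprocess_function.
    """
    first, last = 0, len(offsets) - 1
    if offsets[first][0] > end_char or offsets[last][1] < start_char:
        return 0, 0
    # Walk forward past every token that starts at or before the answer start
    idx = first
    while idx <= last and offsets[idx][0] <= start_char:
        idx += 1
    start_token = idx - 1
    # Walk backward past every token that ends at or after the answer end
    idx = last
    while idx >= first and offsets[idx][1] >= end_char:
        idx -= 1
    end_token = idx + 1
    return start_token, end_token


context = "The cat sat on the mat"
offsets = [(0, 3), (4, 7), (8, 11), (12, 14), (15, 18), (19, 22)]  # one per word
answer_text = "sat on"
start_char = context.index(answer_text)
end_char = start_char + len(answer_text)

span = char_span_to_token_span(offsets, start_char, end_char)
print(span)
```

Slicing the word list with `[span[0]: span[1] + 1]` gives back the two answer words, which is the same `[start: end+1]` convention used throughout this tutorial.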

In [13]:
max_length = 384
a=preprocess_function(datasets['train'][:5])
print(a.keys())
print(a['input_ids'])
print(a['attention_mask'])
print(a['start_positions'])
print(a['end_positions'])
# Note that here [CLS] is `101`, [SEP] is `102` and [PAD] is `0`

dict_keys(['input_ids', 'attention_mask', 'start_positions', 'end_positions'])
[[101, 2000, 3183, 2106, 1996, 6261, 2984, 9382, 3711, 1999, 8517, 1999, 10223, 26371, 2605, 1029, 102, 6549, 2135, 1010, 1996, 2082, 2038, 1037, 3234, 2839, 1012, 10234, 1996, 2364, 2311, 1005, 1055, 2751, 8514, 2003, 1037, 3585, 6231, 1997, 1996, 6261, 2984, 1012, 3202, 1999, 2392, 1997, 1996, 2364, 2311, 1998, 5307, 2009, 1010, 2003, 1037, 6967, 6231, 1997, 4828, 2007, 2608, 2039, 14995, 6924, 2007, 1996, 5722, 1000, 2310, 3490, 2618, 4748, 2033, 18168, 5267, 1000, 1012, 2279, 2000, 1996, 2364, 2311, 2003, 1996, 13546, 1997, 1996, 6730, 2540, 1012, 3202, 2369, 1996, 13546, 2003, 1996, 24665, 23052, 1010, 1037, 14042, 2173, 1997, 7083, 1998, 9185, 1012, 2009, 2003, 1037, 15059, 1997, 1996, 24665, 23052, 2012, 10223, 26371, 1010, 2605, 2073, 1996, 6261, 2984, 22353, 2135, 2596, 2000, 3002, 16595, 9648, 4674, 2061, 12083, 9711, 2271, 1999, 8517, 1012, 2012, 1996, 2203, 1997, 1996, 2364, 3298, 1006, 1998, 199

## Preprocess Dataset

In [14]:
# Here we use Hugging Face's built-in `map` to transform the entire dataset with the preprocess_function() defined above,
# so each data point becomes lists of numbers instead of strings
tokenized_datasets = datasets.map(lambda x:preprocess_function(x,max_length), batched=True, remove_columns=datasets["train"].column_names)

print(datasets) #Before
print(tokenized_datasets) #After



DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'start_positions', 'end_positions'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'start_positions', 'end_positions'],
        num_rows: 10570
    })
})


In [15]:
# This is the original dataset
cnt = 0
for i in datasets['train']:
  print('Question>>>')
  print(i['question'])
  print('Context>>>')
  display(i['context'])
  print('Answers>>>')
  print(i['answers'])
  print('======')
  cnt += 1
  if cnt >= 5:
    break

Question>>>
To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?
Context>>>


'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'

Answers>>>
{'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}
Question>>>
What is in front of the Notre Dame Main Building?
Context>>>


'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'

Answers>>>
{'text': ['a copper statue of Christ'], 'answer_start': [188]}
Question>>>
The Basilica of the Sacred heart at Notre Dame is beside to which structure?
Context>>>


'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'

Answers>>>
{'text': ['the Main Building'], 'answer_start': [279]}
Question>>>
What is the Grotto at Notre Dame?
Context>>>


'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'

Answers>>>
{'text': ['a Marian place of prayer and reflection'], 'answer_start': [381]}
Question>>>
What sits on top of the Main Building at Notre Dame?
Context>>>


'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'

Answers>>>
{'text': ['a golden statue of the Virgin Mary'], 'answer_start': [92]}


In [16]:
# This is the transformed dataset
cnt = 0
for data in tokenized_datasets['train']:
  decoded_str = tokenizer.decode(data['input_ids'])
  decoded_answer = tokenizer.decode(data['input_ids'][data['start_positions']:data['end_positions']+1])
  print('inputs>>>')
  display(decoded_str)
  print('answer>>>')
  print(decoded_answer)
  print('GT')
  print(data['start_positions'],', ',data['end_positions'])
  print("==============")
  cnt += 1
  if cnt >= 5:
    break

inputs>>>


'[CLS] to whom did the virgin mary allegedly appear in 1858 in lourdes france? [SEP] architecturally, the school has a catholic character. atop the main building\'s gold dome is a golden statue of the virgin mary. immediately in front of the main building and facing it, is a copper statue of christ with arms upraised with the legend " venite ad me omnes ". next to the main building is the basilica of the sacred heart. immediately behind the basilica is the grotto, a marian place of prayer and reflection. it is a replica of the grotto at lourdes, france where the virgin mary reputedly appeared to saint bernadette soubirous in 1858. at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ), is a simple, modern stone statue of mary. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD

answer>>>
saint bernadette soubirous
GT
130 ,  137
inputs>>>


'[CLS] what is in front of the notre dame main building? [SEP] architecturally, the school has a catholic character. atop the main building\'s gold dome is a golden statue of the virgin mary. immediately in front of the main building and facing it, is a copper statue of christ with arms upraised with the legend " venite ad me omnes ". next to the main building is the basilica of the sacred heart. immediately behind the basilica is the grotto, a marian place of prayer and reflection. it is a replica of the grotto at lourdes, france where the virgin mary reputedly appeared to saint bernadette soubirous in 1858. at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ), is a simple, modern stone statue of mary. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [P

answer>>>
a copper statue of christ
GT
52 ,  56
inputs>>>


'[CLS] the basilica of the sacred heart at notre dame is beside to which structure? [SEP] architecturally, the school has a catholic character. atop the main building\'s gold dome is a golden statue of the virgin mary. immediately in front of the main building and facing it, is a copper statue of christ with arms upraised with the legend " venite ad me omnes ". next to the main building is the basilica of the sacred heart. immediately behind the basilica is the grotto, a marian place of prayer and reflection. it is a replica of the grotto at lourdes, france where the virgin mary reputedly appeared to saint bernadette soubirous in 1858. at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ), is a simple, modern stone statue of mary. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]

answer>>>
the main building
GT
81 ,  83
inputs>>>


'[CLS] what is the grotto at notre dame? [SEP] architecturally, the school has a catholic character. atop the main building\'s gold dome is a golden statue of the virgin mary. immediately in front of the main building and facing it, is a copper statue of christ with arms upraised with the legend " venite ad me omnes ". next to the main building is the basilica of the sacred heart. immediately behind the basilica is the grotto, a marian place of prayer and reflection. it is a replica of the grotto at lourdes, france where the virgin mary reputedly appeared to saint bernadette soubirous in 1858. at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ), is a simple, modern stone statue of mary. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 

answer>>>
a marian place of prayer and reflection
GT
95 ,  101
inputs>>>


'[CLS] what sits on top of the main building at notre dame? [SEP] architecturally, the school has a catholic character. atop the main building\'s gold dome is a golden statue of the virgin mary. immediately in front of the main building and facing it, is a copper statue of christ with arms upraised with the legend " venite ad me omnes ". next to the main building is the basilica of the sacred heart. immediately behind the basilica is the grotto, a marian place of prayer and reflection. it is a replica of the grotto at lourdes, france where the virgin mary reputedly appeared to saint bernadette soubirous in 1858. at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ), is a simple, modern stone statue of mary. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]

answer>>>
a golden statue of the virgin mary
GT
33 ,  39


## Quick tutorial on DistBERT usage

In [17]:
# Get 3 data point for testing
data = tokenized_datasets['train'][:3]
print(torch.tensor(data['input_ids']).shape)
# Hugging Face DistilBERT returns two things here, based on how we set the config
out = pretrain_model(torch.tensor(data['input_ids']), attention_mask=torch.tensor(data['attention_mask']))
print(out.keys())
# hidden_states holds all the hidden states #List[(N,T,D)]
# last_hidden_state is the last one
# Unlike BERT, DistilBERT returns no pooler_output (in BERT this is tanh(Linear(CLS hidden state)))
# torch.equal(out['hidden_states'][-1], out['last_hidden_state']) -> True
# Note that ``out['last_hidden_state'][:,0,:]`` is the [CLS] token and is used for sentence-level prediction
print(out['last_hidden_state'].shape)

torch.Size([3, 384])
odict_keys(['last_hidden_state', 'hidden_states'])
torch.Size([3, 384, 768])


# Wrap tokenizer and DistBERT in Btorch

DistilBERT works similarly to BERT.  
Here we use the ``last_hidden_state`` (one vector per token) to do classification.  
We pass ``last_hidden_state`` into two branches to predict the ``start`` and ``end`` positions.


In [59]:
class DistBERT(nn.Module):
    def __init__(self, pretrain_model, hidden_dim, freeze_pretrain=True):
        super(DistBERT, self).__init__()
        self.pretrain_model = pretrain_model
        self.embed_size = pretrain_model.embeddings.word_embeddings.embedding_dim
        self.hidden_dim = hidden_dim
        self.start = nn.Sequential(
                         nn.Linear(self.embed_size, hidden_dim),
                         nn.GELU(),
                         nn.Linear(hidden_dim, 1))
        self.end = nn.Sequential(
                         nn.Linear(self.embed_size, hidden_dim),
                         nn.GELU(),
                         nn.Linear(hidden_dim, 1))
        if freeze_pretrain:
            btorch.utils.trainer.freeze(self.pretrain_model)
        else:
            btorch.utils.trainer.unfreeze(self.pretrain_model)
    def forward(self, x, attn_mask):
        x = self.pretrain_model(x, attention_mask=attn_mask)['last_hidden_state']
        start = self.start(x).squeeze(-1)
        end = self.end(x).squeeze(-1)
        return start, end

    @classmethod
    def train_epoch(cls, net, criterion, trainloader, optimizer, epoch_idx, device='cuda', config=None, **kwargs):
        """This is the very basic training function for one epoch. Override this function when necessary
            
        Returns:
            (float or dict): train_loss
        """
        net.train()
        train_loss = 0
        pbar = tqdm(enumerate(trainloader), total=len(trainloader), disable=(kwargs.get("verbose", 1) == 0))
        batch_idx = 1
        for batch_idx, (inputs) in pbar:
            # inputs['input_ids'] arrives as a List[Tensor] (one tensor per token position), i.e. transposed, so transpose it back to (N,T)
            text = btorch.utils.to_tensor(inputs['input_ids']).to(net.device()).T #(N,T)
            attn_mask = btorch.utils.to_tensor(inputs['attention_mask']).to(net.device()).T #(N,T)
            start_targets = inputs['start_positions'].to(net.device())
            end_targets = inputs['end_positions'].to(net.device())
            optimizer.zero_grad()
            start_pred, end_pred = net(text, attn_mask)
            start_loss = criterion(start_pred, start_targets)
            end_loss = criterion(end_pred, end_targets)
            loss = start_loss + end_loss
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

            pbar.set_description(
                f"epoch {epoch_idx + 1} iter {batch_idx}: train loss {loss.item():.5f}.")
        return train_loss / (batch_idx + 1)

    @classmethod
    def test_epoch(cls, net, criterion, testloader, scoring=None, epoch_idx=0, device='cuda', config=None, **kwargs):
        """This is the very basic evaluating function for one epoch. Override this function when necessary

        Args:
            scoring (Callable, optional): A scoring function that take in ``y_true`` and ``model_output``.
              Usually, this is your evaluation metric, like accuracy.
              If provided, this method return a dict that include both loss and score.
              This scoring function should return the **sum** (set ``reduction=sum``) of the score of a batch.
              The function signature must be ``scoring(y_true=, model_output=)``.
              
        Returns:
            (float or dict): eval_loss
        """
        net.eval()
        test_loss = 0
        test_score = 0
        total = 0
        with torch.inference_mode():
            for batch_idx, (inputs) in enumerate(testloader):
                # inputs['input_ids'] arrives as a List[Tensor] (one tensor per token position), i.e. transposed, so transpose it back to (N,T)
                text = btorch.utils.to_tensor(inputs['input_ids']).to(net.device()).T #(N,T)
                attn_mask = btorch.utils.to_tensor(inputs['attention_mask']).to(net.device()).T #(N,T)
                start_targets = inputs['start_positions'].to(net.device())
                end_targets = inputs['end_positions'].to(net.device())
                start_pred, end_pred = net(text, attn_mask)
                start_loss = criterion(start_pred, start_targets)
                end_loss = criterion(end_pred, end_targets)
                test_loss += (start_loss + end_loss).item()
                if scoring is not None:
                    score = scoring(model_output=(start_pred, end_pred), y_true=(start_targets, end_targets))
                    test_score += score
                total += len(text)
        if scoring is None:
            return test_loss / (batch_idx + 1)
        return {'loss': test_loss / (batch_idx + 1), 'score': test_score / total}
    @classmethod
    def predict_(cls, net, loader, device='cuda', config=None):
        """This is the very basic predicting function. Override this function when necessary
            
        Returns:
            (list or dict): predict results
        """
        net.to(device)
        net.eval()
        out = {}
        with torch.inference_mode():
            for batch_idx, (inputs) in enumerate(loader):
                text = btorch.utils.to_tensor(inputs['input_ids']).to(net.device()).T #(N,T)
                attn_mask = btorch.utils.to_tensor(inputs['attention_mask']).to(net.device()).T #(N,T)
                start_pred, end_pred = net(text, attn_mask)
                start_pred_raw = start_pred.max(1)[1]
                end_pred_raw = end_pred.max(1)[1]

                for i in range(len(start_pred_raw)):
                    out[text[i]] = text[i][start_pred_raw[i]: end_pred_raw[i]+1]
        return out
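
The `scoring` argument of `test_epoch` expects a callable with the signature `scoring(y_true=, model_output=)` that returns the **sum** of the score over a batch. Below is a minimal sketch of an exact-match scorer in that shape. The names `exact_match_sum` and `argmax` are hypothetical, and plain Python lists stand in for tensors so the example runs on its own; with real tensors, a vectorized `argmax(dim=1)` comparison would be the more usual form.

```python
def argmax(row):
    """Index of the largest value in a 1-D sequence of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def exact_match_sum(y_true, model_output):
    """Count samples whose predicted start AND end both match the targets.

    Returns the sum over the batch (not the mean), as `test_epoch` expects.
    """
    start_true, end_true = y_true
    start_logits, end_logits = model_output
    hits = 0
    for s_row, e_row, s_t, e_t in zip(start_logits, end_logits, start_true, end_true):
        if argmax(s_row) == int(s_t) and argmax(e_row) == int(e_t):
            hits += 1
    return hits


# Two samples with four positions each (hypothetical logits).
start_logits = [[0.1, 2.0, 0.3, 0.0], [1.5, 0.2, 0.1, 0.0]]
end_logits = [[0.0, 0.1, 3.0, 0.2], [0.1, 0.2, 0.3, 2.5]]
score = exact_match_sum(y_true=([1, 0], [2, 1]), model_output=(start_logits, end_logits))
print(score)  # 1: only the first sample matches on both start and end
```

Because `test_epoch` divides the accumulated score by the number of samples, returning the sum here yields the exact-match rate over the whole loader.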

In [78]:
def predict_directly(net, x, tokenizer):
    """Decode the predicted answer span for one sample or a batch of samples."""
    input_ids = torch.tensor(x['input_ids']).to(net.device())
    attn_mask = btorch.utils.to_tensor(x['attention_mask']).to(net.device())
    # A single sample has shape (T,); add a batch dimension so the model sees (1,T)
    single_data = input_ids.dim() == 1
    if single_data:
        input_ids = input_ids.unsqueeze(0)
        attn_mask = attn_mask.unsqueeze(0)
    # No gradients are needed for prediction
    with torch.inference_mode():
        start, end = net(input_ids, attn_mask)
    # Most likely start and end token index for each sample
    start_pred = start.argmax(1)
    end_pred = end.argmax(1)
    preds = {}
    for i in range(len(start_pred)):
        ids = x['input_ids'] if single_data else x['input_ids'][i]
        preds[i] = tokenizer.decode(ids[start_pred[i]: end_pred[i]+1])
    return preds
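
Taking the argmax of `start` and `end` independently can give an end index that comes before the start index, or a very long span. A common alternative, which the code above does not use, is to search for the best pair with `start <= end` and a cap on span length. Here is a minimal sketch on plain Python lists; `best_span` and the default `max_answer_len=30` are assumptions for illustration:

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with the highest summed logit,
    subject to start <= end and at most `max_answer_len` tokens."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Hypothetical logits: the independent argmax gives start=2, end=1 (an empty slice).
start_logits = [0.1, 0.2, 4.0, 0.3, 0.1]
end_logits = [0.1, 5.0, 0.2, 3.0, 0.1]
print(best_span(start_logits, end_logits))  # (2, 3)
```

The double loop costs O(T x `max_answer_len`) per sample, which is fine for T = 384. It could be made faster by only considering the top-k start and end candidates.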

In [61]:
# Model
model = DistBERT(pretrain_model, 512, freeze_pretrain=False)

# Loss & Optimizer & Config
model._config['max_epoch'] = 1
model._lossfn = nn.CrossEntropyLoss()
model._optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
# model._config['save'] = './checkpoints/'
# model._config['save_every_epoch_checkpoint'] = 1

# Set GPU
device = model.auto_gpu()

auto_gpu: using GPU (Tesla T4)
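
The loss used here treats each head as a classification over the `T` token positions: `nn.CrossEntropyLoss` is applied to the start logits and to the end logits separately, and the two losses are added. Here is a worked sketch of that arithmetic for a single sample in plain Python. The helper `cross_entropy` and the logits are hypothetical; note that `nn.CrossEntropyLoss` additionally averages over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# One sample with T = 4 positions (hypothetical logits); the answer spans tokens 1..2.
start_logits = [0.0, 2.0, 0.0, 0.0]
end_logits = [0.0, 0.0, 2.0, 0.0]

# Same structure as `loss = start_loss + end_loss` in train_epoch.
loss = cross_entropy(start_logits, 1) + cross_entropy(end_logits, 2)
print(round(loss, 4))
```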


## Let's look at the results for one batch

In [36]:
# Wrap the dataset in a DataLoader and take the first batch
dl = torch.utils.data.DataLoader(tokenized_datasets['train'], batch_size=8)
i = next(iter(dl))

text = btorch.utils.to_tensor(i['input_ids']).to('cuda').T #(N,T)
attn_mask = btorch.utils.to_tensor(i['attention_mask']).to('cuda').T #(N,T)
print(text.shape)
print(attn_mask.shape)

o = model(text, attn_mask)

torch.Size([8, 384])
torch.Size([8, 384])


In [37]:
print('start_shape:' ,o[0].shape)
print('end_shape:' ,o[1].shape)

start_shape: torch.Size([8, 384])
end_shape: torch.Size([8, 384])


In [38]:
{'input_ids':text, 'attention_mask':attn_mask}

{'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         ...,
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0]], device='cuda:0'),
 'input_ids': tensor([[  101,  2000,  3183,  ...,     0,     0,     0],
         [  101,  2054,  2003,  ...,     0,     0,     0],
         [  101,  1996, 13546,  ...,     0,     0,     0],
         ...,
         [  101,  2043,  2106,  ...,     0,     0,     0],
         [  101,  2129,  2411,  ...,     0,     0,     0],
         [  101,  2054,  2003,  ...,     0,     0,     0]], device='cuda:0')}
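
The `attention_mask` above marks real tokens with 1 and `[PAD]` tokens with 0, yet the model still outputs a logit for each of the 384 positions. One optional safeguard, which is not part of the pipeline in this tutorial, is to ignore padded positions before taking the argmax. Here is a minimal sketch on plain Python lists; `masked_argmax` is a hypothetical helper:

```python
def masked_argmax(logits, attention_mask):
    """Argmax over positions whose mask is 1; masked-out positions are ignored."""
    masked = [x if m == 1 else float("-inf") for x, m in zip(logits, attention_mask)]
    return max(range(len(masked)), key=lambda i: masked[i])


logits = [0.2, 1.0, 0.5, 3.0]   # hypothetical; the last position is padding
attention_mask = [1, 1, 1, 0]
print(masked_argmax(logits, attention_mask))  # 1, although the raw argmax is 3
```

This only excludes padding. Question tokens also have mask 1, so excluding them would need a separate mask.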

In [42]:
model

DistBERT(
  (pretrain_model): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in_

## Fit

In [24]:
# model.overfit_small_batch(tokenized_datasets['train'])

In [41]:
model.fit(tokenized_datasets['train'], validation_data=tokenized_datasets['validation'], batch_size=32, workers=4)

epoch 1 iter 2737: train loss 1.85738.: 100%|██████████| 2738/2738 [52:03<00:00,  1.14s/it]


Epoch 0: Training loss: 3.1285910488394073. Testing loss: 2.314729990299549
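
To put these loss values in context, it helps to compare them with a model that guesses uniformly over all positions. With a sequence length of 384 and the loss defined as start loss plus end loss, the baseline can be computed as below (a back-of-the-envelope sketch, not something the training code computes):

```python
import math

seq_len = 384  # sequence length used in this tutorial

# A uniform guess over all positions costs ln(seq_len) per head;
# the total loss is the start loss plus the end loss.
uniform_baseline = 2 * math.log(seq_len)
print(f"{uniform_baseline:.2f}")  # 11.90
```

The losses reported above, about 3.13 on the training set and 2.31 on the validation set, are well below this baseline after a single epoch.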


## Predict

In [66]:
predicted_ans = predict_directly(model, tokenized_datasets['validation'][:5], tokenizer)
predicted_ans

torch.Size([5, 384]) torch.Size([5, 384])
tensor([ 46,  57,  78,  43, 118], device='cuda:0') tensor([ 47,  58,  92,  44, 141], device='cuda:0')


{0: 'denver broncos',
 1: 'carolina panthers',
 2: "levi's stadium in the san francisco bay area at santa clara, california",
 3: 'denver broncos',
 4: 'gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals'}

In [81]:
for idx in range(10):
  print('Question>>>')
  display(tokenizer.decode(tokenized_datasets['validation'][idx]['input_ids']))
  print('GT Answer>>>')
  print(tokenizer.decode(tokenized_datasets['validation'][idx]['input_ids'][tokenized_datasets['validation'][idx]['start_positions']:tokenized_datasets['validation'][idx]['end_positions']+1]))
  print('Pred Answer>>>')
  print(predict_directly(model, tokenized_datasets['validation'][idx], tokenizer))
  print("====")

Question>>>


'[CLS] which nfl team represented the afc at super bowl 50? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PA

GT Answer>>>
denver broncos
Pred Answer>>>
{0: 'denver broncos'}
====
Question>>>


'[CLS] which nfl team represented the nfc at super bowl 50? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PA

GT Answer>>>
carolina panthers
Pred Answer>>>
{0: 'carolina panthers'}
====
Question>>>


'[CLS] where did super bowl 50 take place? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [P

GT Answer>>>
santa clara, california
Pred Answer>>>
{0: "levi's stadium in the san francisco bay area at santa clara, california"}
====
Question>>>


'[CLS] which nfl team won super bowl 50? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD

GT Answer>>>
denver broncos
Pred Answer>>>
{0: 'denver broncos'}
====
Question>>>


'[CLS] what color was used to emphasize the 50th anniversary of the super bowl? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [

GT Answer>>>
golden
Pred Answer>>>
{0: 'gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals'}
====
Question>>>


'[CLS] what was the theme of super bowl 50? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [

GT Answer>>>
" golden anniversary "
Pred Answer>>>
{0: 'golden anniversary'}
====
Question>>>


'[CLS] what day was the game played on? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]

GT Answer>>>
february 7, 2016
Pred Answer>>>
{0: 'february 7, 2016'}
====
Question>>>


'[CLS] what is the afc short for? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]

GT Answer>>>
american football conference
Pred Answer>>>
{0: 'american football conference'}
====
Question>>>


'[CLS] what was the theme of super bowl 50? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [

GT Answer>>>
" golden anniversary "
Pred Answer>>>
{0: 'golden anniversary'}
====
Question>>>


'[CLS] what does afc stand for? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi\'s stadium in the san francisco bay area at santa clara, california. as this was the 50th super bowl, the league emphasized the " golden anniversary " with various gold - themed initiatives, as well as temporarily suspending the tradition of naming each super bowl game with roman numerals ( under which the game would have been known as " super bowl l " ), so that the logo could prominently feature the arabic numerals 50. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [

GT Answer>>>
american football conference
Pred Answer>>>
{0: 'american football conference'}
====
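
Several predictions above overlap with the ground truth without matching it exactly, for example `golden anniversary` against `" golden anniversary "`. A token-overlap F1 gives partial credit in such cases. Below is a simplified sketch: `token_f1` is a hypothetical helper that splits on whitespace only, whereas the official SQuAD metric also normalizes the text (lowercasing, removing punctuation and articles) before comparing.

```python
from collections import Counter


def token_f1(prediction, ground_truth):
    """Token-overlap F1 between two whitespace-tokenized strings."""
    pred_tokens = prediction.split()
    gt_tokens = ground_truth.split()
    if not pred_tokens or not gt_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gt_tokens)
    common = Counter(pred_tokens) & Counter(gt_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("denver broncos", "denver broncos"))
print(token_f1("golden anniversary", '" golden anniversary "'))
print(token_f1(
    "levi's stadium in the san francisco bay area at santa clara, california",
    "santa clara, california",
))
```

The first pair scores 1.0. The second is penalized only for the missing quote tokens, and the third has perfect recall but low precision because the predicted span is too long.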
