## BERT

I am using the pre-trained model provided by [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers).

In [1]:
from transformers import BertTokenizer, BertForQuestionAnswering
import torch

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

### Dataset

I think we can use this dataset: [https://www.kaggle.com/stanfordu/stanford-question-answering-dataset](https://www.kaggle.com/stanfordu/stanford-question-answering-dataset).
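For orientation, the SQuAD 1.1 JSON files nest questions under paragraphs, and paragraphs under articles. The sketch below flattens that layout into (question, context, answer, offset) tuples. The `sample` record is a made-up placeholder with the same nesting, not an entry from the dataset, and `iter_examples` is a hypothetical helper name.

```python
def iter_examples(squad):
    """Yield (question, context, answer_text, answer_start) from SQuAD-style JSON."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield qa["question"], context, answer["text"], answer["answer_start"]


# Placeholder record with the same nesting as the SQuAD 1.1 files (not real data).
sample = {
    "data": [
        {
            "title": "toy",
            "paragraphs": [
                {
                    "context": "The sky is blue.",
                    "qas": [
                        {
                            "id": "toy-0",
                            "question": "What color is the sky?",
                            "answers": [{"text": "blue", "answer_start": 11}],
                        }
                    ],
                }
            ],
        }
    ]
}

examples = list(iter_examples(sample))
for question, context, text, start in examples:
    # answer_start is a character offset into the context.
    assert context[start : start + len(text)] == text
```

With the real files, the same function could be applied to the result of `json.load` on the downloaded JSON.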

The question and context below are a sample extracted from the Stanford Question Answering Dataset (SQuAD). Section 1.2 (Example 1) uses a model that has not been fine-tuned on SQuAD 1.1. Section 1.3 (Example 2) uses a model that has already been fine-tuned on SQuAD 1.1.

In [3]:
question, context = (
    "Where did Super Bowl 50 take place?",
    '',
)

### Example 1

Load the WordPiece tokenizer.

In [4]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

Encode the input.

In [15]:
encoding = tokenizer.encode_plus(question, context)
input_ids, token_type_ids, attention_mask = encoding["input_ids"], encoding["token_type_ids"], encoding["attention_mask"]
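As a rough picture of what the encoding holds for a question/context pair: BERT expects `[CLS] question [SEP] context [SEP]`, with segment id 0 for the question part and 1 for the context part. The sketch below rebuilds that layout with a whitespace split standing in for the real WordPiece tokenizer. `toy_encode_pair` is a hypothetical helper, and it returns tokens rather than vocabulary ids.

```python
def toy_encode_pair(question, context):
    """Mimic the layout of a BERT sentence-pair encoding (tokens, not ids)."""
    q_tokens = question.lower().split()
    c_tokens = context.lower().split()
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + c_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question and the first [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1)
    # No padding is added here, so every position is attended to.
    attention_mask = [1] * len(tokens)
    return {
        "tokens": tokens,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


enc = toy_encode_pair("Who wrote it?", "Ada wrote it.")
empty = toy_encode_pair("Who wrote it?", "")
```

In this toy version, an empty context leaves only the final `[SEP]` in segment 1, so there is no context token an answer span could point to.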

Load the pre-trained model.

In [6]:
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased").to(device)

BertForQuestionAnswering(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_

In [7]:
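# Index the outputs so this works whether the model returns a tuple or a ModelOutput.
outputs = model(
    torch.tensor([input_ids]).to(device),
    token_type_ids=torch.tensor([token_type_ids]).to(device),
    attention_mask=torch.tensor([attention_mask]).to(device),
)
start_scores, end_scores = outputs[0], outputs[1]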
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)

Get the output.

In [8]:
answer = " ".join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1])
answer

''

### Example 2

Load the WordPiece tokenizer.

In [5]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

Encode the input.

In [11]:
# Encode the question together with the context, as in Example 1.
encoding = tokenizer.encode_plus(question, context)
input_ids, token_type_ids, attention_mask = encoding["input_ids"], encoding["token_type_ids"], encoding["attention_mask"]

Load the pre-trained model.

In [12]:
model = BertForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad").to(device)

In [13]:
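# Index the outputs so this works whether the model returns a tuple or a ModelOutput.
outputs = model(
    torch.tensor([input_ids]).to(device),
    token_type_ids=torch.tensor([token_type_ids]).to(device),
    attention_mask=torch.tensor([attention_mask]).to(device),
)
start_scores, end_scores = outputs[0], outputs[1]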
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)

Get the output.

In [14]:
answer = " ".join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1])
answer

''
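Joining tokens with spaces leaves WordPiece continuation markers (`##`) in the answer whenever a word was split into several pieces. A minimal way to undo that is sketched below. `join_wordpieces` is a hypothetical helper; the tokenizer's own `convert_tokens_to_string` method is the usual way to do this.

```python
def join_wordpieces(tokens):
    """Merge WordPiece tokens back into words; a leading '##' marks a continuation."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            # Glue the continuation piece onto the previous word.
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


text = join_wordpieces(["token", "##ization", "works"])
```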

## Conclusions

I think we must use the model from Section 1.2 and fine-tune it on the Stanford Question Answering Dataset (SQuAD).
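For that fine-tuning step, the usual objective for extractive question answering is the average of two cross-entropy losses, one over start positions and one over end positions. The sketch below computes it for a single example in plain Python, only to show the arithmetic. `cross_entropy` and `qa_loss` are illustrative names. In practice, passing `start_positions` and `end_positions` to `BertForQuestionAnswering` makes the model return this kind of loss itself.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target index, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def qa_loss(start_logits, end_logits, start_pos, end_pos):
    """Average of the start and end cross-entropy losses for one example."""
    return 0.5 * (cross_entropy(start_logits, start_pos) + cross_entropy(end_logits, end_pos))


# Uninformative scores versus scores that favor the gold positions (1 and 2).
uniform = qa_loss([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], 1, 2)
confident = qa_loss([0.0, 9.0, 0.0, 0.0], [0.0, 0.0, 9.0, 0.0], 1, 2)
```

With uniform scores over four tokens the loss equals `log(4)`; it shrinks as the scores concentrate on the gold start and end positions.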