# HuggingFace Q&A 

Based on this [tutorial](https://huggingface.co/course/chapter7/7?fw=pt). 

In this notebook I follow the Huggingface guide to fine-tune DistilBERT on the SQuAD dataset for extractive question answering.

## Imports

In [10]:
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import DefaultDataCollator
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
from transformers import pipeline

## Loading Data 

In [2]:
squad = load_dataset("squad")

Downloading builder script:   0%|          | 0.00/1.97k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/1.02k [00:00<?, ?B/s]

Downloading and preparing dataset squad/plain_text (download: 33.51 MiB, generated: 85.63 MiB, post-processed: Unknown size, total: 119.14 MiB) to C:\Users\ngrec\.cache\huggingface\datasets\squad\plain_text\1.0.0\d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453...


Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/8.12M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.05M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/2 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/87599 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/10570 [00:00<?, ? examples/s]

Dataset squad downloaded and prepared to C:\Users\ngrec\.cache\huggingface\datasets\squad\plain_text\1.0.0\d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453. Subsequent calls will reuse this data.


  0%|          | 0/2 [00:00<?, ?it/s]

### Visualize an example

In [3]:
squad["train"][0]

{'id': '5733be284776f41900661182',
 'title': 'University_of_Notre_Dame',
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
 'answers': {'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}}

## Preprocessing Data 

[HuggingFace video on preprocessing](https://youtu.be/qgaM0weJHpA) 

In this section we:

1. Load the pre trained distilBERT tokenizer 
2. Apply a pre processing function to the data (**Note**: This function is taken directly from the tutorial)

In [4]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/483 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/226k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/455k [00:00<?, ?B/s]

In [5]:
def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    answers = examples["answers"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        answer = answers[i]
        start_char = answer["answer_start"][0]
        end_char = answer["answer_start"][0] + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Find the start and end of the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        # If the answer is not fully inside the context, label it (0, 0)
        if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise it's the start and end token positions
            idx = context_start
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs

In [6]:
tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names)

  0%|          | 0/88 [00:00<?, ?ba/s]

  0%|          | 0/11 [00:00<?, ?ba/s]

## Training

In [9]:
data_collator = DefaultDataCollator()
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")


training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()

Downloading:   0%|          | 0.00/256M [00:00<?, ?B/s]

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this mode

  0%|          | 0/16425 [00:00<?, ?it/s]

Saving model checkpoint to ./results\checkpoint-500
Configuration saved in ./results\checkpoint-500\config.json


{'loss': 2.8749, 'learning_rate': 1.9391171993911722e-05, 'epoch': 0.09}


Model weights saved in ./results\checkpoint-500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-1000
Configuration saved in ./results\checkpoint-1000\config.json


{'loss': 1.7325, 'learning_rate': 1.8782343987823442e-05, 'epoch': 0.18}


Model weights saved in ./results\checkpoint-1000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-1000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-1000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-1500
Configuration saved in ./results\checkpoint-1500\config.json


{'loss': 1.5901, 'learning_rate': 1.8173515981735163e-05, 'epoch': 0.27}


Model weights saved in ./results\checkpoint-1500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-1500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-1500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-2000
Configuration saved in ./results\checkpoint-2000\config.json


{'loss': 1.4664, 'learning_rate': 1.756468797564688e-05, 'epoch': 0.37}


Model weights saved in ./results\checkpoint-2000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-2000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-2000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-2500
Configuration saved in ./results\checkpoint-2500\config.json


{'loss': 1.4108, 'learning_rate': 1.69558599695586e-05, 'epoch': 0.46}


Model weights saved in ./results\checkpoint-2500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-2500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-2500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-3000
Configuration saved in ./results\checkpoint-3000\config.json


{'loss': 1.3446, 'learning_rate': 1.634703196347032e-05, 'epoch': 0.55}


Model weights saved in ./results\checkpoint-3000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-3000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-3000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-3500
Configuration saved in ./results\checkpoint-3500\config.json


{'loss': 1.2888, 'learning_rate': 1.573820395738204e-05, 'epoch': 0.64}


Model weights saved in ./results\checkpoint-3500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-3500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-3500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-4000
Configuration saved in ./results\checkpoint-4000\config.json


{'loss': 1.2434, 'learning_rate': 1.5129375951293761e-05, 'epoch': 0.73}


Model weights saved in ./results\checkpoint-4000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-4000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-4000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-4500
Configuration saved in ./results\checkpoint-4500\config.json


{'loss': 1.2414, 'learning_rate': 1.4520547945205482e-05, 'epoch': 0.82}


Model weights saved in ./results\checkpoint-4500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-4500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-4500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-5000
Configuration saved in ./results\checkpoint-5000\config.json


{'loss': 1.2332, 'learning_rate': 1.39117199391172e-05, 'epoch': 0.91}


Model weights saved in ./results\checkpoint-5000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-5000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-5000\special_tokens_map.json
***** Running Evaluation *****
  Num examples = 10570
  Batch size = 16


  0%|          | 0/661 [00:00<?, ?it/s]

{'eval_loss': 1.1453055143356323, 'eval_runtime': 63.5102, 'eval_samples_per_second': 166.43, 'eval_steps_per_second': 10.408, 'epoch': 1.0}


Saving model checkpoint to ./results\checkpoint-5500
Configuration saved in ./results\checkpoint-5500\config.json


{'loss': 1.177, 'learning_rate': 1.330289193302892e-05, 'epoch': 1.0}


Model weights saved in ./results\checkpoint-5500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-5500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-5500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-6000
Configuration saved in ./results\checkpoint-6000\config.json


{'loss': 0.9725, 'learning_rate': 1.2694063926940641e-05, 'epoch': 1.1}


Model weights saved in ./results\checkpoint-6000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-6000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-6000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-6500
Configuration saved in ./results\checkpoint-6500\config.json


{'loss': 0.9778, 'learning_rate': 1.2085235920852361e-05, 'epoch': 1.19}


Model weights saved in ./results\checkpoint-6500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-6500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-6500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-7000
Configuration saved in ./results\checkpoint-7000\config.json


{'loss': 0.9969, 'learning_rate': 1.147640791476408e-05, 'epoch': 1.28}


Model weights saved in ./results\checkpoint-7000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-7000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-7000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-7500
Configuration saved in ./results\checkpoint-7500\config.json


{'loss': 0.9872, 'learning_rate': 1.08675799086758e-05, 'epoch': 1.37}


Model weights saved in ./results\checkpoint-7500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-7500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-7500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-8000
Configuration saved in ./results\checkpoint-8000\config.json


{'loss': 0.9602, 'learning_rate': 1.025875190258752e-05, 'epoch': 1.46}


Model weights saved in ./results\checkpoint-8000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-8000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-8000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-8500
Configuration saved in ./results\checkpoint-8500\config.json


{'loss': 0.9631, 'learning_rate': 9.64992389649924e-06, 'epoch': 1.55}


Model weights saved in ./results\checkpoint-8500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-8500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-8500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-9000
Configuration saved in ./results\checkpoint-9000\config.json


{'loss': 0.9517, 'learning_rate': 9.04109589041096e-06, 'epoch': 1.64}


Model weights saved in ./results\checkpoint-9000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-9000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-9000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-9500
Configuration saved in ./results\checkpoint-9500\config.json


{'loss': 0.9366, 'learning_rate': 8.432267884322679e-06, 'epoch': 1.74}


Model weights saved in ./results\checkpoint-9500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-9500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-9500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-10000
Configuration saved in ./results\checkpoint-10000\config.json


{'loss': 0.935, 'learning_rate': 7.823439878234399e-06, 'epoch': 1.83}


Model weights saved in ./results\checkpoint-10000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-10000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-10000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-10500
Configuration saved in ./results\checkpoint-10500\config.json


{'loss': 0.9539, 'learning_rate': 7.214611872146119e-06, 'epoch': 1.92}


Model weights saved in ./results\checkpoint-10500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-10500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-10500\special_tokens_map.json
***** Running Evaluation *****
  Num examples = 10570
  Batch size = 16


  0%|          | 0/661 [00:00<?, ?it/s]

{'eval_loss': 1.0922610759735107, 'eval_runtime': 64.6042, 'eval_samples_per_second': 163.612, 'eval_steps_per_second': 10.232, 'epoch': 2.0}


Saving model checkpoint to ./results\checkpoint-11000
Configuration saved in ./results\checkpoint-11000\config.json


{'loss': 0.9225, 'learning_rate': 6.605783866057839e-06, 'epoch': 2.01}


Model weights saved in ./results\checkpoint-11000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-11000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-11000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-11500
Configuration saved in ./results\checkpoint-11500\config.json


{'loss': 0.7918, 'learning_rate': 5.996955859969558e-06, 'epoch': 2.1}


Model weights saved in ./results\checkpoint-11500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-11500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-11500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-12000
Configuration saved in ./results\checkpoint-12000\config.json


{'loss': 0.7585, 'learning_rate': 5.388127853881279e-06, 'epoch': 2.19}


Model weights saved in ./results\checkpoint-12000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-12000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-12000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-12500
Configuration saved in ./results\checkpoint-12500\config.json


{'loss': 0.7621, 'learning_rate': 4.779299847792998e-06, 'epoch': 2.28}


Model weights saved in ./results\checkpoint-12500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-12500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-12500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-13000
Configuration saved in ./results\checkpoint-13000\config.json


{'loss': 0.757, 'learning_rate': 4.170471841704719e-06, 'epoch': 2.37}


Model weights saved in ./results\checkpoint-13000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-13000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-13000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-13500
Configuration saved in ./results\checkpoint-13500\config.json


{'loss': 0.7686, 'learning_rate': 3.5616438356164386e-06, 'epoch': 2.47}


Model weights saved in ./results\checkpoint-13500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-13500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-13500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-14000
Configuration saved in ./results\checkpoint-14000\config.json


{'loss': 0.7631, 'learning_rate': 2.9528158295281586e-06, 'epoch': 2.56}


Model weights saved in ./results\checkpoint-14000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-14000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-14000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-14500
Configuration saved in ./results\checkpoint-14500\config.json


{'loss': 0.7601, 'learning_rate': 2.343987823439878e-06, 'epoch': 2.65}


Model weights saved in ./results\checkpoint-14500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-14500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-14500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-15000
Configuration saved in ./results\checkpoint-15000\config.json


{'loss': 0.7685, 'learning_rate': 1.7351598173515982e-06, 'epoch': 2.74}


Model weights saved in ./results\checkpoint-15000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-15000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-15000\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-15500
Configuration saved in ./results\checkpoint-15500\config.json


{'loss': 0.756, 'learning_rate': 1.1263318112633182e-06, 'epoch': 2.83}


Model weights saved in ./results\checkpoint-15500\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-15500\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-15500\special_tokens_map.json
Saving model checkpoint to ./results\checkpoint-16000
Configuration saved in ./results\checkpoint-16000\config.json


{'loss': 0.7335, 'learning_rate': 5.17503805175038e-07, 'epoch': 2.92}


Model weights saved in ./results\checkpoint-16000\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-16000\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-16000\special_tokens_map.json
***** Running Evaluation *****
  Num examples = 10570
  Batch size = 16


  0%|          | 0/661 [00:00<?, ?it/s]



Training completed. Do not forget to share your model on huggingface.co/models =)




{'eval_loss': 1.1355029344558716, 'eval_runtime': 64.9822, 'eval_samples_per_second': 162.66, 'eval_steps_per_second': 10.172, 'epoch': 3.0}
{'train_runtime': 5218.5399, 'train_samples_per_second': 50.358, 'train_steps_per_second': 3.147, 'train_loss': 1.0779655988394217, 'epoch': 3.0}


TrainOutput(global_step=16425, training_loss=1.0779655988394217, metrics={'train_runtime': 5218.5399, 'train_samples_per_second': 50.358, 'train_steps_per_second': 3.147, 'train_loss': 1.0779655988394217, 'epoch': 3.0})

## Using the Model

In [15]:

model_checkpoint = "./results/checkpoint-16000"
question_answerer = pipeline("question-answering", model=model_checkpoint)

loading configuration file ./results/checkpoint-16000\config.json
Model config DistilBertConfig {
  "_name_or_path": "./results/checkpoint-16000",
  "activation": "gelu",
  "architectures": [
    "DistilBertForQuestionAnswering"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.18.0",
  "vocab_size": 30522
}

loading configuration file ./results/checkpoint-16000\config.json
Model config DistilBertConfig {
  "_name_or_path": "./results/checkpoint-16000",
  "activation": "gelu",
  "architectures": [
    "DistilBertForQuestionAnswering"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 

In [16]:
## Context and question from tutorial 

context = """
🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""
question = "Which deep learning libraries back 🤗 Transformers?"
question_answerer(question=question, context=context)

{'score': 0.9715139269828796,
 'start': 78,
 'end': 105,
 'answer': 'Jax, PyTorch and TensorFlow'}

In [17]:
## Context from britannica.com on dogs 

context = """
A dog is a domestic mammal of the family Canidae and the order Carnivora. Its scientific name is Canis lupus familiaris. Dogs are a subspecies of the gray wolf, and they are also related to foxes and jackals. Dogs are one of the two most ubiquitous and most popular domestic animals in the world
"""
question = "What is the scientific name for dogs"
question_answerer(question=question, context=context)

{'score': 0.9615740776062012,
 'start': 98,
 'end': 120,
 'answer': 'Canis lupus familiaris'}

In [18]:
question = "Who is mans best friend?"
question_answerer(question=question, context=context)

{'score': 0.04664964973926544, 'start': 1, 'end': 6, 'answer': 'A dog'}

In [19]:
question = "Who is greg?"
question_answerer(question=question, context=context)

{'score': 0.03929330036044121,
 'start': 1,
 'end': 27,
 'answer': 'A dog is a domestic mammal'}

In [20]:
question = "what is a cat?"
question_answerer(question=question, context=context)

{'score': 0.1440318524837494, 'start': 3, 'end': 6, 'answer': 'dog'}