# Question Answering (QA) is different from classification.

> In classification, the model outputs a class label (e.g., positive/negative sentiment).

> In extractive QA, the model must predict the start and end positions of the answer inside a given passage.

Example:

Context: "The Eiffel Tower is located in Paris, France."

Question: "Where is the Eiffel Tower located?"

Answer: "Paris, France"
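
SQuAD-style data stores each answer as its text plus the character offset where it starts in the context. As a minimal sketch of that representation for the example above, the span can be located with plain `str.find` (the variable names here are illustrative, not part of any library):

```python
context = "The Eiffel Tower is located in Paris, France."
answer = "Paris, France"

# Character offset of the answer inside the context (-1 if it is absent)
start_char = context.find(answer)
end_char = start_char + len(answer)  # exclusive end offset

print(start_char, end_char)          # 31 44
print(context[start_char:end_char])  # Paris, France
```

The model itself works on token positions rather than characters, so these offsets still have to be mapped onto tokens during preprocessing.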

In [None]:
!pip install transformers datasets evaluate tokenizers

Collecting evaluate
  Downloading evaluate-0.4.5-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.5-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.5


In [None]:
from datasets import load_dataset
import pprint
dataset = load_dataset("squad")
print(dataset)
pprint.pprint(dataset["train"][0])


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})
{'answers': {'answer_start': [515], 'text': ['Saint Bernadette Soubirous']},
 'context': 'Architecturally, the school has a Catholic character. Atop the '
            "Main Building's gold dome is a golden statue of the Virgin Mary. "
            'Immediately in front of the Main Building and facing it, is a '
            'copper statue of Christ with arms upraised with the legend '
            '"Venite Ad Me Omnes". Next to the Main Building is the Basilica '
            'of the Sacred Heart. Immediately behind the basilica is the '
            'Grotto, a Marian place of prayer and reflection. It is a replica '
            'of the grotto at Lourdes, France where the Virgin Mary reputedly '
            'appeared to Saint Berna

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        truncation=True,
        padding="max_length",
        max_length=384,
        return_offsets_mapping=True
    )

    start_positions = []
    end_positions = []
    for i, offset in enumerate(inputs["offset_mapping"]):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

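        # None marks "not found", e.g. when truncation cut the answer out
        token_start = token_end = None
        for j, (start, end) in enumerate(offset):
            if sequence_ids[j] == 1:  # context tokens only
                if start <= start_char < end:
                    token_start = j
                if start < end_char <= end:
                    token_end = j
        if token_start is None or token_end is None:
            # Answer is not fully inside the (possibly truncated) context:
            # label the example with index 0, the [CLS] token
            token_start = token_end = 0
        start_positions.append(token_start)
        end_positions.append(token_end)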

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    inputs.pop("offset_mapping")
    return inputs

tokenized_datasets = dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=dataset["train"].column_names
    )
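
The core of the preprocessing step is the mapping from a character span to token indices through the offset mapping. The sketch below reproduces that logic without `transformers`, using a hypothetical regex tokenizer (`toy_offsets`) as a stand-in for BERT's WordPiece tokenizer. Real token boundaries will differ; only the span-matching conditions are the same as in `preprocess_function`.

```python
import re

def toy_offsets(text):
    """Split into words and punctuation, returning (token, start, end) triples.
    A stand-in for a real tokenizer's offset mapping, for illustration only."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def char_span_to_token_span(offsets, start_char, end_char):
    """Map the character span [start_char, end_char) to inclusive token indices."""
    token_start = token_end = None
    for j, (_, start, end) in enumerate(offsets):
        if start <= start_char < end:   # token containing the first answer character
            token_start = j
        if start < end_char <= end:     # token containing the last answer character
            token_end = j
    return token_start, token_end

context = "The Eiffel Tower is located in Paris, France."
answer = "Paris, France"
start_char = context.find(answer)

offsets = toy_offsets(context)
tokens = [tok for tok, _, _ in offsets]
span = char_span_to_token_span(offsets, start_char, start_char + len(answer))

print(span)                            # (6, 8)
print(tokens[span[0]:span[1] + 1])     # ['Paris', ',', 'France']
```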



In [None]:
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
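
This warning is expected: `bert-base-uncased` was pre-trained without a question-answering head, so `qa_outputs` starts from random values and is learned during fine-tuning. Conceptually, that head is a single linear layer that maps each token's hidden vector to two numbers, a start logit and an end logit. A toy sketch in plain Python follows, with made-up sizes and random weights (the real layer operates on 768-dimensional hidden vectors; all names below are illustrative):

```python
import random

random.seed(0)
hidden_size, seq_len = 4, 5  # toy sizes, chosen only for illustration

# Randomly initialized head: one weight row and one bias per output (start, end)
weight = [[random.gauss(0, 0.02) for _ in range(hidden_size)] for _ in range(2)]
bias = [0.0, 0.0]

# Stand-in for the encoder output: one hidden vector per token
hidden_states = [[random.random() for _ in range(hidden_size)]
                 for _ in range(seq_len)]

def qa_head(hidden_states):
    """Apply the linear layer per token and split into start/end logits."""
    logits = [
        [sum(w * h for w, h in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in hidden_states
    ]
    start_logits = [pair[0] for pair in logits]
    end_logits = [pair[1] for pair in logits]
    return start_logits, end_logits

start_logits, end_logits = qa_head(hidden_states)
print(len(start_logits), len(end_logits))  # 5 5: one logit of each kind per token
```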


In [None]:
from transformers import TrainingArguments, Trainer
import evaluate
import numpy as np

squad_metric = evaluate.load("squad")


def compute_metrics(eval_pred):
    start_logits, end_logits = eval_pred.predictions
    start_labels, end_labels = eval_pred.label_ids

    start_preds = np.argmax(start_logits, axis=1)
    end_preds = np.argmax(end_logits, axis=1)

    predictions = []
    references = []

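    # Loop over each example. This indexes the full validation split, so it
    # assumes eval_dataset is its first N rows in their original order.
    for i in range(len(start_preds)):
        input_ids = tokenized_datasets["validation"][i]["input_ids"]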
        # decode predicted answer
        pred_text = tokenizer.decode(input_ids[start_preds[i]:end_preds[i]+1])
        # decode true answer
        true_text = tokenizer.decode(input_ids[start_labels[i]:end_labels[i]+1])

        predictions.append({"id": str(i), "prediction_text": pred_text})
        references.append({"id": str(i), "answers": {"text": [true_text], "answer_start": [0]}})

    results = squad_metric.compute(predictions=predictions, references=references)
    return results


training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.1,
    logging_dir='./logs',
    logging_steps=100,
    save_strategy="epoch",
    report_to="none"  # disable external loggers such as wandb
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"].select(range(10000)),  # small subset for Colab
    eval_dataset=tokenized_datasets["validation"].select(range(2000)),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

trainer.train()


| Epoch | Training Loss | Validation Loss | Exact Match | F1 |
|---|---|---|---|---|
| 1 | 1.033 | 1.447192 | 53.7 | 66.116648 |
| 2 | 0.7662 | 1.45378 | 56.95 | 68.674484 |
| 3 | 0.5892 | 1.533821 | 57.8 | 69.751597 |


TrainOutput(global_step=1875, training_loss=0.8466719563802083, metrics={'train_runtime': 2306.7455, 'train_samples_per_second': 13.005, 'train_steps_per_second': 0.813, 'total_flos': 5879177026560000.0, 'train_loss': 0.8466719563802083, 'epoch': 3.0})
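
The `global_step=1875` value can be checked by hand from the training setup. The arithmetic below assumes a single device and no gradient accumulation, which matches the defaults since the `TrainingArguments` above do not override them.

```python
import math

train_examples = 10_000   # size of the training subset passed to Trainer
batch_size = 16           # per_device_train_batch_size
epochs = 3                # num_train_epochs

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 625 1875
```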

In [None]:
# Evaluate on validation
trainer.evaluate()

{'eval_loss': 1.5338207483291626,
 'eval_exact_match': 57.8,
 'eval_f1': 69.75159699522395,
 'eval_runtime': 48.7281,
 'eval_samples_per_second': 41.044,
 'eval_steps_per_second': 2.565,
 'epoch': 3.0}
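
To make the `eval_exact_match` and `eval_f1` numbers concrete, here is a simplified sketch of how SQuAD-style scores are computed for a single prediction. It follows the usual SQuAD v1.1 normalization (lowercasing, removing punctuation and English articles) but is not the `evaluate` implementation itself, which additionally takes the best score over several reference answers and averages over the dataset.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris, France", "paris france"))  # 1.0
print(f1_score("in Paris", "Paris, France"))         # 0.5
```

A partially correct answer such as "in Paris" scores 0 on Exact Match but still earns partial credit on F1, which is why F1 is consistently higher than Exact Match in the results above.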

In [None]:
import torch

# Make sure the model is on the correct device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def answer_question(question, context):
    """
    Given a question and a context, return the model's predicted answer.
    """
    # Tokenize and move inputs to the same device as the model
    inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384).to(device)

    # Get model outputs
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the most likely start and end token positions
    start_idx = torch.argmax(outputs.start_logits)
    end_idx = torch.argmax(outputs.end_logits)

    # Decode the answer from the input_ids
    answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx+1])

    return answer

# Example usage
question_1 = "Who developed the theory of relativity?"
context_1 = "Albert Einstein developed the theory of relativity in the early 20th century."

question_2 = "In which century did Einstein develop his theory?"
context_2 = "Albert Einstein developed the theory of relativity in the early 20th century."

print("Q1:", question_1)
print("A1:", answer_question(question_1, context_1))
print("\nQ2:", question_2)
print("A2:", answer_question(question_2, context_2))


Q1: Who developed the theory of relativity?
A1: albert einstein

Q2: In which century did Einstein develop his theory?
A2: 20th century
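
`answer_question` takes the argmax of the start and end logits independently, so nothing prevents the predicted end from landing before the predicted start, in which case the decoded answer is empty. A common refinement is to score start/end pairs jointly and keep only valid spans. The sketch below shows that idea on toy logits; `best_span` and `max_answer_len` are hypothetical names, and the function does not restrict the search to context tokens, which a fuller version would also do.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) maximizing start_logit + end_logit
    subject to start <= end < start + max_answer_len."""
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best

# Toy logits where independent argmax picks start=3 and end=1 (an empty span)
start_logits = [0.1, 0.2, 1.5, 2.0]
end_logits = [0.0, 3.0, 0.3, 1.0]

start, end, score = best_span(start_logits, end_logits)
print(start, end)  # 1 1
```

With plain Python lists this is a quadratic loop; for real model outputs the logits would first be converted with `.tolist()` or the search limited to the top few candidates.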


This project demonstrates how to adapt a pre-trained language model, BERT, to extractive Question Answering. Fine-tuning on SQuAD, a dataset of questions, contexts, and answers, teaches the model to locate the start and end of the answer within the provided text. The pipeline tokenizes each question-context pair and trains the model to predict the token indices of the answer span.

After 3 training epochs on a 10,000-example subset of the training split, the model reached an Exact Match score of 57.8 and an F1 score of 69.75 on the first 2,000 validation examples. Exact Match is the percentage of predictions that match the reference answer exactly after normalization; F1 measures token overlap between prediction and reference, combining precision and recall. Validation loss rose slightly from epoch 1 to epoch 3 (1.447 to 1.534) while training loss kept falling, which suggests the model was beginning to overfit the small training subset even as the span metrics improved. Overall, the fine-tuned model extracts answers from text with reasonable accuracy given the limited training budget.