<a href="https://colab.research.google.com/github/aminul01-g/qa-transformers-finetune/blob/main/Fine_Tuning_Transformers_4_Q_A.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction: Difference between QA and Classification

Question answering (QA) and classification are different kinds of NLP task. In classification, the model assigns one of a fixed set of labels to an input text, as in spam detection or sentiment analysis. In QA, the model reads a context passage and must extract or generate the answer to a specific question. QA is harder than classification because the answer is not limited to a fixed label set: in extractive QA, the setting used in this notebook, it can be any span of text in the given context. The model therefore has to align the question with the context and predict where the answer starts and where it ends.
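
To make the contrast concrete, here is a minimal sketch with invented scores (no model is involved, and every number is an assumption for illustration). Classification picks one entry from a fixed label list, while extractive QA picks a start and an end position inside the context.

```python
# Toy scores, invented for illustration; a real model would produce these.
labels = ["negative", "positive"]
label_scores = [0.2, 0.8]

# Classification: choose one label from a fixed, predefined set.
predicted_label = labels[label_scores.index(max(label_scores))]

# Extractive QA: choose a start and an end position inside the context.
context_tokens = ["albert", "einstein", "developed", "the", "theory", "of", "relativity"]
start_scores = [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
end_scores = [0.1, 0.8, 0.0, 0.0, 0.0, 0.0, 0.1]

start = start_scores.index(max(start_scores))
end = end_scores.index(max(end_scores))
answer = " ".join(context_tokens[start:end + 1])

print(predicted_label)  # positive
print(answer)           # albert einstein
```

The number of possible answers grows with the context length, which is one reason QA needs token-level outputs rather than a single label.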

In [86]:
#install required libraries
!pip install transformers datasets evaluate



In [87]:
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import AutoModelForQuestionAnswering
from transformers import TrainingArguments, Trainer
import evaluate

In [88]:
#load dataset
dataset = load_dataset("squad")
print(dataset)
print(dataset['train'][0])

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})
{'id': '5733be284776f41900661182', 'title': 'University_of_Notre_Dame', 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome
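
Each SQuAD record stores its answers as a dictionary with parallel `text` and `answer_start` lists, where `answer_start` is a character offset into `context`. The sketch below uses a hypothetical record in that layout (it is not taken from the dataset) to show how the character span of the first answer can be recovered; `answer_char_span` is an illustrative helper name.

```python
# Hypothetical record in the SQuAD layout (not taken from the dataset).
example = {
    "context": "Albert Einstein developed the theory of relativity in the early 20th century.",
    "question": "Who developed the theory of relativity?",
    "answers": {"text": ["Albert Einstein"], "answer_start": [0]},
}

def answer_char_span(record):
    """Return (start_char, end_char) of the first answer, end exclusive."""
    text = record["answers"]["text"][0]
    start = record["answers"]["answer_start"][0]
    return start, start + len(text)

start_char, end_char = answer_char_span(example)
recovered = example["context"][start_char:end_char]
print(start_char, end_char, recovered)
```

Slicing the context with this span should give back the answer text, which is a cheap sanity check before converting character offsets to token positions.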

In [89]:
# load the tokenizer that matches the model checkpoint
model_checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [90]:
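# Tokenization-only helper (no labels); it is not used later in this notebook.
def preprocess_training_examples(examples):
    return tokenizer(
        examples["question"],
        examples["context"],
        truncation="only_second",  # truncate the context, never the question
        max_length=384,
        stride=128,  # only takes effect together with return_overflowing_tokens=True
        return_offsets_mapping=False,
        padding="max_length"
    )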

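# tokenize and add start/end token labels for the first answer of each example
def preprocess_with_labels(examples):
    inputs = tokenizer(
        examples["question"],
        examples["context"],
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length"
    )
    start_positions = []
    end_positions = []
    for i, offset in enumerate(inputs["offset_mapping"]):
        # character span of the first answer (end is exclusive)
        start_char = examples["answers"][i]["answer_start"][0]
        end_char = start_char + len(examples["answers"][i]["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # first and last token of the context (sequence id 1)
        context_start = 0
        while sequence_ids[context_start] != 1:
            context_start += 1
        context_end = len(sequence_ids) - 1
        while sequence_ids[context_end] != 1:
            context_end -= 1

        if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
            # answer was cut off by truncation: point both labels at [CLS] (index 0)
            start_positions.append(0)
            end_positions.append(0)
            continue

        # find start token: last context token that begins at or before start_char
        idx = context_start
        while idx <= context_end and offset[idx][0] <= start_char:
            idx += 1
        start_positions.append(idx - 1)

        # find end token: first context token that ends at or after end_char
        idx = context_end
        while idx >= context_start and offset[idx][1] >= end_char:
            idx -= 1
        end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    inputs.pop("offset_mapping")
    return inputs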
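
The `stride=128` and `max_length=384` arguments above hint at a sliding-window scheme for contexts that do not fit into one input. The sketch below shows the general idea on a plain list; `sliding_windows` is a hypothetical helper, not a tokenizer API, and it is simplified: the real tokenizer would also repeat the question and special tokens in every window, and it only returns the extra windows when `return_overflowing_tokens=True` is set.

```python
def sliding_windows(tokens, window, overlap):
    """Split tokens into chunks of at most `window` items, where
    consecutive chunks share `overlap` items."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this chunk already reaches the end of the sequence
        start += step
    return chunks

# Ten toy "tokens", windows of four with an overlap of two.
chunks = sliding_windows(list(range(10)), window=4, overlap=2)
for chunk in chunks:
    print(chunk)
```

Because neighbouring windows overlap, an answer that straddles one window boundary is likely to appear whole in the next window.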

In [91]:
tokenized_datasets = dataset.map(preprocess_with_labels, batched=True)

In [92]:
# model setup: reuse the checkpoint name so the model and tokenizer always match
model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [93]:
#fine tuning
args = TrainingArguments(
    "bert-qa",
    eval_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_dir="./logs",
    report_to="none",
    push_to_hub=False,
)

In [94]:
# load the SQuAD evaluation metric (exact match and F1)
metric = evaluate.load("squad")

In [95]:
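# metric computation function
# NOTE: this is a placeholder. It ignores the model's predictions and scores a
# fixed dummy answer against itself, so it always reports exact_match = 100 and
# f1 = 100, whatever the model predicts.
def compute_metrics(eval_pred):
    predictions, labels = eval_pred  # unpacked but not used
    return metric.compute(predictions=[{"id":"0","prediction_text":"test"}],
                          references=[{"id":"0","answers":{"text":["test"],
                                                           "answer_start":[0]}}])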

In [96]:
# select small random subsets for training and validation
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(3000))
small_val_dataset = tokenized_datasets["validation"].shuffle(seed=42).select(range(1500))
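
The `compute_metrics` function above scores a fixed dummy prediction, so its numbers do not depend on the model. A minimal sketch of a metric that does use the predictions is shown here. It assumes that the `Trainer` passes `predictions` as `(start_logits, end_logits)` and `labels` as `(start_positions, end_positions)`, which is my understanding of its behaviour for question-answering models rather than something shown in this notebook. It measures token-level agreement with the training labels, not the official SQuAD exact match or F1; `compute_span_metrics` and `argmax` are hypothetical helper names.

```python
def argmax(row):
    """Index of the largest value in a 1-D sequence (list or array)."""
    return max(range(len(row)), key=lambda i: row[i])

def compute_span_metrics(eval_pred):
    # Assumed layout: predictions = (start_logits, end_logits),
    # labels = (start_positions, end_positions).
    predictions, labels = eval_pred
    start_logits, end_logits = predictions
    start_labels, end_labels = labels

    n = len(start_labels)
    start_hits = end_hits = span_hits = 0
    for i in range(n):
        start_ok = argmax(start_logits[i]) == int(start_labels[i])
        end_ok = argmax(end_logits[i]) == int(end_labels[i])
        start_hits += start_ok
        end_hits += end_ok
        span_hits += start_ok and end_ok  # both ends must match
    return {
        "start_accuracy": start_hits / n,
        "end_accuracy": end_hits / n,
        "span_exact_match": span_hits / n,
    }

# Toy input with two examples and three token positions each.
toy_predictions = ([[0.1, 2.0, 0.3], [1.0, 0.2, 0.1]],   # start logits
                   [[0.0, 0.1, 3.0], [0.2, 0.9, 0.1]])   # end logits
toy_labels = ([1, 0], [2, 2])
result = compute_span_metrics((toy_predictions, toy_labels))
print(result)
```

Reproducing the official SQuAD scores would additionally require mapping the predicted token span back to text for every example, which this sketch leaves out.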

In [97]:
# trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=small_train_dataset,
    eval_dataset=small_val_dataset,
    tokenizer=tokenizer,  # recent transformers releases deprecate this in favor of processing_class
    compute_metrics=compute_metrics,
)

trainer.train()




Epoch,Training Loss,Validation Loss,Exact Match,F1
1,No log,2.120969,100.0,100.0
2,No log,1.899053,100.0,100.0


TrainOutput(global_step=376, training_loss=2.499564637529089, metrics={'train_runtime': 517.1901, 'train_samples_per_second': 11.601, 'train_steps_per_second': 0.727, 'total_flos': 1175835405312000.0, 'train_loss': 2.499564637529089, 'epoch': 2.0})
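
The step count and the `No log` entries in the table above can be checked with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and the default `logging_steps` of 500; none of these is stated explicitly in the notebook.

```python
import math

num_examples = 3000           # size of small_train_dataset
batch_size = 16               # per_device_train_batch_size
epochs = 2                    # num_train_epochs
default_logging_steps = 500   # TrainingArguments default (assumed unchanged)

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)          # 188 376
print(total_steps < default_logging_steps)   # True
```

The total of 376 matches `global_step=376` in the output. Since training ends before step 500, the training loss is never logged, which is the likely reason the table shows `No log`; a smaller `logging_steps` value would make it appear.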

In [98]:
# inference
import torch

question = "Who developed the theory of relativity?"
context = "Albert Einstein developed the theory of relativity in the early 20th century."

inputs = tokenizer(question, context, return_tensors="pt")

# move inputs to the same device as the model
device = model.device
inputs = {k: v.to(device) for k, v in inputs.items()}

# run in eval mode without tracking gradients
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# most likely start and end token positions, as plain Python ints
start_idx = int(outputs.start_logits.argmax())
end_idx = int(outputs.end_logits.argmax())

all_tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
answer = tokenizer.convert_tokens_to_string(all_tokens[start_idx:end_idx + 1])
print("Answer:", answer)

Answer: albert einstein
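
Taking the argmax of the start and end logits independently, as in the inference cell above, can produce an end position that lies before the start position. A common remedy is to search for the best-scoring valid span instead. The following is a minimal sketch on plain lists; `best_span` and `max_answer_len` are hypothetical names, and a fuller version would also restrict the search to context tokens.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return the (start, end) pair with the highest summed score,
    subject to start <= end and a span of at most max_answer_len tokens."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

# Toy scores where independent argmax gives start=2 and end=0 (invalid).
start_scores = [0.1, 0.2, 5.0, 0.3]
end_scores = [4.0, 0.1, 0.2, 3.0]
span = best_span(start_scores, end_scores)
print(span)  # (2, 3)
```

With real model outputs, the lists would come from `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`.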


### Reflection

Through this assignment, I learned how QA differs fundamentally from classification tasks. Fine-tuning a pre-trained BERT model on SQuAD taught me how tokenization, context-question alignment, and span prediction work. I also gained hands-on experience using Hugging Face’s Trainer API for training and evaluating models. Testing the model on custom questions helped me understand model predictions and limitations. Overall, this assignment improved my practical skills in applying Transformers to real-world NLP tasks.