# Fine-tuning a model with the Trainer API

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [2]:
!pip install datasets evaluate transformers[sentencepiece]


Collecting evaluate
  Downloading evaluate-0.4.5-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.5-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.5


## 🗂️ Data Preparation for Fine-Tuning a Model with the Trainer API

**In this section, we:**

- 📥 **Load the MRPC dataset** (sentence pairs labeled as paraphrases or not: 3,668 training, 408 validation, and 1,725 test pairs)
- 🔤 **Load the BERT tokenizer** to convert text into input IDs
- ✂️ **Tokenize the dataset** – prepare sentence pairs for the model
- 🛒 **Set up a data collator** for dynamic padding during batching
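Dynamic padding means each batch is padded only to the length of its own longest sequence, not to one global maximum. The sketch below illustrates that idea in plain Python; `pad_batch` is a hypothetical helper (not how `DataCollatorWithPadding` is implemented), the token IDs are made up, and `pad_id=0` is an assumption that matches the `[PAD]` token of `bert-base-uncased`.

```python
from typing import Dict, List


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad each sequence in one batch to the longest sequence in that batch (hypothetical helper)."""
    max_len = max(len(ids) for ids in batch)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two batches with made-up token IDs: each is padded only to its own longest sequence
print(pad_batch([[1, 2, 3], [4, 5, 6, 7, 8]])["input_ids"])  # [[1, 2, 3, 0, 0], [4, 5, 6, 7, 8]]
print(pad_batch([[9], [8, 7]])["input_ids"])                  # [[9, 0], [8, 7]]
```

Padding per batch keeps short batches short, which saves computation compared with padding the whole dataset to its longest pair up front.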

In [4]:
# Import the necessary libraries
from datasets import load_dataset                       # For loading datasets
from transformers import AutoTokenizer, DataCollatorWithPadding  # For tokenization and padding

# Load the GLUE MRPC dataset, which contains pairs of sentences and labels indicating if they are paraphrases
raw_datasets = load_dataset("glue", "mrpc")

# Specify the pretrained model checkpoint for BERT (uncased version)
checkpoint = "bert-base-uncased"

# Load the tokenizer corresponding to the pretrained BERT model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Define a function to tokenize pairs of sentences, truncating pairs longer than the model's maximum input length
def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

# Apply the tokenizer function to the entire dataset with batching for speed and efficiency
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

# Initialize a data collator that dynamically pads inputs in each batch to the longest sequence in that batch
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

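`tokenize_function` passes `sentence1` and `sentence2` to the tokenizer as a pair. For BERT, a pair is encoded as `[CLS] A [SEP] B [SEP]`, and `token_type_ids` mark which segment each token belongs to (0 for the first sentence, 1 for the second). With `truncation=True` the tokenizer uses the `longest_first` strategy, trimming tokens from the longer sentence until the pair fits. The following is a simplified sketch under stated assumptions: `encode_pair` is hypothetical, it works on whole-word strings instead of WordPiece IDs, and its tie-breaking rule (trim the second sentence when both are equally long) is an assumption, not documented tokenizer behavior.

```python
from typing import Dict, List


def encode_pair(tokens_a: List[str], tokens_b: List[str], max_length: int) -> Dict[str, List]:
    """Encode a sentence pair BERT-style as [CLS] A [SEP] B [SEP] (simplified, hypothetical sketch)."""
    if max_length < 3:
        raise ValueError("max_length must leave room for [CLS] and two [SEP] tokens")
    a, b = list(tokens_a), list(tokens_b)
    budget = max_length - 3  # positions left for real tokens after the 3 special tokens
    # longest_first-style truncation: drop one token at a time from the end of the longer sentence
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()  # assumption: on a tie, trim the second sentence
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment IDs: 0 for [CLS], sentence A and its [SEP]; 1 for sentence B and the final [SEP]
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return {"tokens": tokens, "token_type_ids": token_type_ids}


pair = encode_pair(["the", "cat", "sat", "down"], ["a", "dog"], max_length=8)
print(pair["tokens"])          # ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'a', 'dog', '[SEP]']
print(pair["token_type_ids"])  # [0, 0, 0, 0, 0, 1, 1, 1]
```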

## 🏋️ Training & Fine-Tuning with the Trainer API

**In this section, we:**
- ⚙️ **Define training arguments** (output directory, evaluation frequency, number of epochs, and more)
- 🧠 **Load our classification model** (BERT with a sequence classification head)
- 🤖 **Set up the Trainer** to run the training loop and evaluation logic
- 🚀 **Start fine-tuning** the model on our preprocessed dataset

In [3]:
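# Import the required libraries
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification

# Step 1: Define training arguments; "test-trainer" is the output directory for checkpoints.
# Everything else keeps its default (e.g. 3 epochs, batch size 8 per device, a loss log every 500 steps).
# Depending on the transformers version, the Trainer may also report to every installed logging
# integration (e.g. Weights & Biases, which asks for an API key); pass report_to="none" to turn this off.
training_args = TrainingArguments("test-trainer")

# Step 2: Load BERT with a freshly initialized two-class classification head
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Step 3: Set up the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    processing_class=tokenizer,
)

# Step 4: Fine-tune the model on MRPC
trainer.train()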

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Step | Training Loss |
|------|---------------|
| 500  | 0.5528        |
| 1000 | 0.2938        |


TrainOutput(global_step=1377, training_loss=0.3427915109095577, metrics={'train_runtime': 272.6267, 'train_samples_per_second': 40.363, 'train_steps_per_second': 5.051, 'total_flos': 405114969714960.0, 'train_loss': 0.3427915109095577, 'epoch': 3.0})
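The step counts in this run follow from the dataset size and the `TrainingArguments` defaults. A quick check of the arithmetic, assuming a single GPU and the defaults of 8 examples per device per batch, 3 epochs, no gradient accumulation, and a training-loss log every 500 steps:

```python
import math

# MRPC training-set size and the TrainingArguments defaults assumed for the first run
num_train_examples = 3668
per_device_train_batch_size = 8  # default
num_train_epochs = 3             # default
logging_steps = 500              # default

# The final, smaller batch of each epoch still counts as one step, hence the ceiling
steps_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Steps at which the Trainer logs the training loss
logged_steps = list(range(logging_steps, total_steps + 1, logging_steps))

print(steps_per_epoch, total_steps)  # 459 1377
print(logged_steps)                  # [500, 1000]
```

This matches `global_step=1377` and explains why the loss table has exactly two rows.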

## 📊 Evaluation During Fine-Tuning

**In this section, we:**
- 🔍 Use the `Trainer.predict()` method to get model predictions on the validation set.
- 🎯 Define a `compute_metrics()` function that converts raw model outputs (logits) into class predictions and calculates evaluation metrics like accuracy and F1 score.
- 📈 Pass `compute_metrics()` to the `Trainer` so it reports validation performance automatically during training.
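Here is a plain-Python sketch of what `compute_metrics` computes: take the argmax of each row of logits to get a class, then compute accuracy and binary F1. It assumes label 1 (paraphrase) is the positive class for F1; `accuracy_and_f1` is a hypothetical helper, not the `evaluate` implementation, and the logits and labels are made up.

```python
from typing import Dict, List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest logit, i.e. the predicted class (first index wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_and_f1(logits: List[List[float]], labels: List[int]) -> Dict[str, float]:
    predictions = [argmax(row) for row in logits]
    pairs = list(zip(predictions, labels))
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    # Binary F1 with label 1 (paraphrase) treated as the positive class
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Made-up logits for 4 sentence pairs (columns: class 0, class 1) and their true labels
logits = [[2.0, -1.0], [0.1, 0.9], [-0.5, 1.5], [1.2, 0.3]]
labels = [0, 1, 1, 1]
scores = accuracy_and_f1(logits, labels)
print({name: round(value, 4) for name, value in scores.items()})  # {'accuracy': 0.75, 'f1': 0.8}
```

F1 is reported alongside accuracy because it focuses on the positive class; a model that mostly predicts the majority label can score a decent accuracy while its F1 stays low.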

In [5]:
# Import the required libraries
import numpy as np
import evaluate
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification

# Load the standard MRPC metric from the GLUE benchmark (accuracy and F1) once,
# instead of reloading it every time the Trainer evaluates
metric = evaluate.load("glue", "mrpc")

# Step 1: Define the compute_metrics function
def compute_metrics(eval_preds):
    # Unpack the predictions (logits) and labels from the eval_preds tuple
    logits, labels = eval_preds

    # Convert logits to predicted class indices by taking the argmax over the class dimension
    predictions = np.argmax(logits, axis=-1)

    # Compute and return the evaluation metrics (accuracy and F1)
    return metric.compute(predictions=predictions, references=labels)

# Step 2: Set up TrainingArguments with evaluation at the end of each epoch
training_args = TrainingArguments(
    "test-trainer",
    eval_strategy="epoch",
    fp16=True,                      # mixed precision; requires a CUDA GPU
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size: 4 x 4 = 16 per device
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
)

# Step 3: Load a fresh model instance so this run starts again from the pretrained checkpoint
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Step 4: Initialize the Trainer with the compute_metrics function
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    processing_class=tokenizer,       # `tokenizer=` is deprecated in favor of `processing_class=`
    compute_metrics=compute_metrics,  # pass our metric function
)

# Step 5: Train the model; validation metrics are reported after each epoch
trainer.train()

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


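| Epoch | Training Loss | Validation Loss | Accuracy | F1       |
|-------|---------------|-----------------|----------|----------|
| 1     | No log        | 0.456454        | 0.813725 | 0.874587 |
| 2     | No log        | 0.369663        | 0.835784 | 0.879713 |
| 3     | 0.436400      | 0.421193        | 0.835784 | 0.886248 |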
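The table shows "No log" for the training loss in epochs 1 and 2. A plausible explanation, assuming a single GPU and the default `logging_steps=500`: with a batch size of 4 and 4 accumulation steps, the run takes only 230 optimizer steps per epoch, so the first training-loss log at step 500 falls in epoch 3. The arithmetic:

```python
import math

# Settings from the second TrainingArguments; logging_steps is the assumed default
num_train_examples = 3668          # MRPC training pairs
per_device_train_batch_size = 4
gradient_accumulation_steps = 4    # effective batch size 16 on one device
num_train_epochs = 3
logging_steps = 500                # assumed Trainer default

# Forward/backward passes (batches) per epoch on a single device
batches_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
# Assumption: a final, incomplete accumulation window still triggers one optimizer step,
# which is consistent with the global_step=690 reported for this run
steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_train_epochs

# 1-based epoch in which optimizer step 500 (the first training-loss log) happens
first_log_epoch = math.ceil(logging_steps / steps_per_epoch)

print(batches_per_epoch, steps_per_epoch, total_steps, first_log_epoch)  # 917 230 690 3
```

If a per-epoch training loss is wanted, setting `logging_strategy="epoch"` in `TrainingArguments` should log it once per epoch instead; that variant was not run here.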


TrainOutput(global_step=690, training_loss=0.3720386615697888, metrics={'train_runtime': 233.7853, 'train_samples_per_second': 47.069, 'train_steps_per_second': 2.951, 'total_flos': 377531475559680.0, 'train_loss': 0.3720386615697888, 'epoch': 3.0})
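With `lr_scheduler_type="cosine"` the learning rate decays from `learning_rate` toward zero along a half cosine over the run. Below is a sketch of that curve, assuming no warmup (the notebook sets neither `warmup_steps` nor `warmup_ratio`) and the 690 optimizer steps reported above; `cosine_lr` is a hypothetical helper, not the scheduler from the transformers library.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-5, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps: optional linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase already completed, from 0.0 to 1.0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Start, midpoint and end of the 690-step run
for step in (0, 345, 690):
    print(step, f"{cosine_lr(step, total_steps=690):.2e}")
```

Compared with a linear decay, the cosine shape keeps the learning rate closer to its peak early on and flattens out near the end of training.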