# 🤗 Hugging Face Trainer - Detailed Training Lab
This notebook is a comprehensive guide to using Hugging Face's `Trainer` API for training and fine-tuning transformer models. It provides detailed explanations of each component, including dataset loading, tokenization, model selection, training argument configuration, custom optimizer integration, metric evaluation, and model saving.


## 📋 Overview
We will walk through the following steps:
1. **Load and explore a dataset**
2. **Tokenize and preprocess the data**
3. **Load a pre-trained model for classification**
4. **Configure training arguments**
5. **Customize the optimizer and learning rate scheduler**
6. **Define the Hugging Face `Trainer`**
7. **Train and evaluate the model**
8. **Save the fine-tuned model**

## 📦 Install Required Libraries
Install the Hugging Face Transformers, Datasets, and Evaluate packages. Uncomment the line below if they are not already installed.

In [1]:
#!pip install transformers datasets evaluate -q

## 📚 Import Libraries
We import all necessary components from the Transformers and Datasets libraries.

In [2]:
import os

import evaluate
import numpy as np
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    get_scheduler,
)

# Keep Weights & Biases from syncing or prompting for a login during training
os.environ["WANDB_MODE"] = "offline"
os.environ["WANDB_DISABLED"] = "true"


## 🗂️ Load and Explore the Dataset
We use the IMDb dataset, which is a binary sentiment classification dataset (positive/negative reviews).

In [3]:
dataset = load_dataset("imdb")
dataset = dataset.shuffle(seed=42)
dataset['train'][0]


{'text': 'There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier\'s plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it\'s the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...',
 'label': 1}

## ✂️ Tokenize the Dataset
We use the tokenizer that matches the pre-trained checkpoint to turn the raw text into input IDs and attention masks, the numeric inputs the model expects. With `truncation=True`, reviews longer than the model's maximum input length are cut to fit. No padding is applied at this stage.

In [4]:
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["text"], truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")
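
To make "input IDs" and "attention masks" concrete, here is a minimal, dependency-free sketch. The vocabulary, the `toy_encode` helper, and the whitespace splitting are assumptions for illustration only; DistilBERT's real tokenizer splits text into WordPiece subwords and uses a far larger vocabulary.

```python
# Toy illustration only: a hypothetical 6-entry vocabulary, not DistilBERT's.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "great": 4, "movie": 5}


def toy_encode(text, max_length=6):
    """Map whitespace-split words to IDs, add special tokens, and truncate."""
    words = text.lower().split()
    # Reserve two positions for [CLS] and [SEP], then truncate the rest.
    words = words[: max_length - 2]
    ids = [TOY_VOCAB["[CLS]"]]
    ids += [TOY_VOCAB.get(w, TOY_VOCAB["[UNK]"]) for w in words]
    ids.append(TOY_VOCAB["[SEP]"])
    # Every position holds a real token, so the mask is all ones (no padding yet).
    return {"input_ids": ids, "attention_mask": [1] * len(ids)}


short_enc = toy_encode("Great movie")
long_enc = toy_encode("Great movie with a truly great ending")
print(short_enc)
print(long_enc)
```

The second review is cut to `max_length` positions, and words missing from the toy vocabulary map to `[UNK]`. The two encodings have different lengths, which is why padding still has to happen later, when examples are grouped into batches.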

## 🧠 Load Pre-trained Model
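We load a pre-trained DistilBERT model with a sequence classification head. `num_labels=2` sets the head up for binary classification. The head is not part of the pre-trained checkpoint, so its weights are newly initialized, which is what the warning below reports.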

In [5]:
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## ⚙️ Configure Training Arguments
`TrainingArguments` is a configuration class to customize the training process. You can set batch sizes, learning rate, evaluation strategy, logging, weight decay, and other parameters.

In [6]:
training_args = TrainingArguments(
    output_dir="./results",           # Where to store model checkpoints
    report_to="none",                 # Disable logging integrations; use "wandb" to enable W&B
    eval_strategy="epoch",            # Evaluate after every epoch
    save_strategy="epoch",            # Save a checkpoint after every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,                # Weight decay coefficient, reused by the custom AdamW optimizer below
    logging_dir="./logs",
    logging_steps=10,
    load_best_model_at_end=True,      # Reload the best checkpoint (lowest eval loss by default)
)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


## 🛠 Custom Optimizer and Learning Rate Scheduler
Instead of letting the `Trainer` build its default optimizer and scheduler, we define our own:
- `AdamW` is Adam with decoupled weight decay, a common choice for fine-tuning transformers
- `get_scheduler` builds a learning rate schedule by name; here, a linear decay to zero with no warmup steps
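
The sketch below shows, for a single scalar parameter, what "decoupled" weight decay means: the decay shrinks the parameter directly instead of being added to the gradient. The `adamw_step` function and its default hyperparameters are assumptions for illustration; the notebook itself relies on `torch.optim.AdamW`.

```python
import math


def adamw_step(p, grad, m, v, t, lr=2e-5, wd=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a scalar parameter p at step t (1-based)."""
    # Decoupled weight decay: shrink the parameter directly.
    p = p * (1.0 - lr * wd)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


# With a zero gradient, only the weight decay changes the parameter.
p_decay_only, _, _ = adamw_step(p=1.0, grad=0.0, m=0.0, v=0.0, t=1)
# With a non-zero gradient, the first step moves by roughly lr.
p_after, m1, v1 = adamw_step(p=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p_decay_only, p_after)
```

Because the decay term never passes through the adaptive scaling by `v_hat`, its strength depends only on `lr * wd`, not on the gradient history.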

In [7]:
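# Define custom optimizer
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=training_args.learning_rate,
    weight_decay=training_args.weight_decay,
)

# Set up the learning rate scheduler. The step count must match the data the
# Trainer actually sees: the 2,000-example training subset selected below,
# on a single device with no gradient accumulation.
num_train_examples = 2000
steps_per_epoch = -(-num_train_examples // training_args.per_device_train_batch_size)  # ceiling division
num_training_steps = steps_per_epoch * int(training_args.num_train_epochs)
lr_scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)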
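
As a rough illustration of what the `"linear"` schedule does, the following dependency-free sketch computes the factor by which the base learning rate is multiplied at each optimizer step. The function name `linear_lr_multiplier` and the 1,000-step run length are hypothetical; the scheduler returned by `get_scheduler` is what the notebook actually uses.

```python
def linear_lr_multiplier(step, num_training_steps, num_warmup_steps=0):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear warm-up from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 down to 0 at num_training_steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 2e-5
total_steps = 1000  # hypothetical run length
for step in (0, 250, 500, 1000):
    print(step, base_lr * linear_lr_multiplier(step, total_steps))

# If training stops after 250 steps but the schedule was built for 1000,
# the learning rate only ever falls to 75% of its starting value.
final_multiplier = linear_lr_multiplier(250, total_steps)
```

The schedule only reaches zero if `num_training_steps` equals the number of optimizer steps the `Trainer` really runs, so it should be derived from the dataset that is passed to the `Trainer`, not from the full dataset.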

## 📏 Define Evaluation Metrics
We use `accuracy` as the evaluation metric. The `compute_metrics` function will be called by the `Trainer` during evaluation.

In [8]:
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
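
To see what `compute_metrics` calculates, here is a small pure-Python sketch of the same two steps: take the highest-scoring class per row, then compare with the labels. The helper names and the logits are made up for illustration; the notebook uses `np.argmax` and the `evaluate` accuracy metric.

```python
def argmax(row):
    """Index of the largest value in a list (first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def toy_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Hypothetical logits for four reviews: columns are [negative, positive].
logits = [[2.1, -1.3], [-0.4, 0.9], [0.2, 0.1], [-1.5, 3.0]]
labels = [0, 1, 1, 1]
result = toy_accuracy(logits, labels)
print(result)
```

The third review is misclassified (its logits slightly favor class 0 while the label is 1), so three of four predictions are correct.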

## 🧪 Define the Trainer
The `Trainer` class handles the training loop, evaluation, and checkpoint saving. You pass in the model, datasets, tokenizer, training arguments, metrics function, and optimizer/scheduler. To keep the run short, we train on 2,000 examples and evaluate on 1,000.

In [9]:
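trainer = Trainer(
    model=model,
    args=training_args,
    # Small subsets keep the run short; the dataset was already shuffled after loading
    train_dataset=tokenized_datasets["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized_datasets["test"].select(range(1000)),
    tokenizer=tokenizer,  # Enables dynamic padding; recent Transformers releases rename this to processing_class
    compute_metrics=compute_metrics,
    optimizers=(optimizer, lr_scheduler),
)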
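
Because the tokenization step did not pad, the sequences in a batch have different lengths. When a tokenizer is passed to `Trainer` and no `data_collator` is given, `Trainer` falls back to `DataCollatorWithPadding`, which pads each batch to its longest sequence. The following is a dependency-free sketch of that idea; `pad_batch`, the token IDs, and the pad ID of 0 are assumptions for illustration, and the real collator also returns tensors and carries the labels along.

```python
def pad_batch(features, pad_token_id=0):
    """Pad every sequence in a batch to the length of the longest one."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # Padding positions get mask 0 so the model ignores them.
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
    return batch


# Hypothetical token IDs for two reviews of different lengths.
features = [
    {"input_ids": [101, 2307, 3185, 102], "attention_mask": [1, 1, 1, 1]},
    {"input_ids": [101, 2919, 102], "attention_mask": [1, 1, 1]},
]
batch = pad_batch(features)
print(batch)
```

Padding per batch rather than to a fixed global length avoids spending compute on padding tokens when a batch happens to contain only short reviews.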


## 🚀 Train the Model
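We now train the model using the `train()` method of the `Trainer`. In the results below, training loss keeps falling while validation loss rises after the first epoch, which suggests the model starts to overfit this small training subset.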

In [10]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.2464 | 0.287871 | 0.884 |
| 2 | 0.2217 | 0.368066 | 0.885 |
| 3 | 0.188 | 0.435541 | 0.878 |


TrainOutput(global_step=375, training_loss=0.24622172705332437, metrics={'train_runtime': 353.0413, 'train_samples_per_second': 16.995, 'train_steps_per_second': 1.062, 'total_flos': 790006588340928.0, 'train_loss': 0.24622172705332437, 'epoch': 3.0})
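
The numbers in `TrainOutput` can be checked by hand. This short calculation reproduces `global_step` and the throughput figures from the settings used above, assuming a single device and no gradient accumulation.

```python
import math

num_train_samples = 2000   # size of the training subset passed to Trainer
batch_size = 16            # per_device_train_batch_size
num_epochs = 3             # num_train_epochs
train_runtime = 353.0413   # seconds, from TrainOutput

# Assumes a single device and no gradient accumulation.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
samples_per_second = num_train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

The result is 125 steps per epoch and 375 steps in total, matching `global_step=375`, and the two rates round to the reported `train_samples_per_second` and `train_steps_per_second`.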

## 📊 Evaluate the Model
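Use the `evaluate()` method to get performance metrics on the evaluation dataset. Because `load_best_model_at_end=True`, the model being evaluated is the best checkpoint rather than the final one: the loss and accuracy below match epoch 1, which had the lowest validation loss.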

In [11]:
trainer.evaluate()

{'eval_loss': 0.28787073493003845,
 'eval_accuracy': 0.884,
 'eval_runtime': 15.7478,
 'eval_samples_per_second': 63.501,
 'eval_steps_per_second': 4.001,
 'epoch': 3.0}
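
The choice of "best" checkpoint depends on the criterion. The sketch below replays the selection on the per-epoch results from the training log, first by lowest evaluation loss (the default when `metric_for_best_model` is not set) and then by highest accuracy. The `history` list is simply those logged values retyped by hand.

```python
# Per-epoch results copied from the training log above.
history = [
    {"epoch": 1, "eval_loss": 0.287871, "eval_accuracy": 0.884},
    {"epoch": 2, "eval_loss": 0.368066, "eval_accuracy": 0.885},
    {"epoch": 3, "eval_loss": 0.435541, "eval_accuracy": 0.878},
]

# Default criterion: lowest evaluation loss.
best_by_loss = min(history, key=lambda r: r["eval_loss"])
# Alternative criterion: highest accuracy.
best_by_accuracy = max(history, key=lambda r: r["eval_accuracy"])

print(best_by_loss["epoch"], best_by_accuracy["epoch"])
```

Here the two criteria disagree (epoch 1 versus epoch 2). To have the `Trainer` select by accuracy instead, `metric_for_best_model="accuracy"` can be set in `TrainingArguments`.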

## 💾 Save the Fine-tuned Model
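After training, save the model to disk for later use or deployment. `save_model` writes the model weights and configuration to the given directory, along with the tokenizer that was passed to the `Trainer`.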

In [12]:
trainer.save_model("./fine-tuned-imdb")
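
A possible way to reuse the saved model is sketched below. The helper names `logits_to_label` and `load_and_predict` are hypothetical, and the sketch assumes the directory `./fine-tuned-imdb` contains both the model and the tokenizer. The label mapping follows IMDb's convention (0 is negative, 1 is positive).

```python
import math

ID2LABEL = {0: "negative", 1: "positive"}  # IMDb label convention


def logits_to_label(logits):
    """Convert raw class logits to a (label, probability) pair via softmax."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def load_and_predict(text, model_dir="./fine-tuned-imdb"):
    """Reload the saved model and classify one review (needs torch and transformers)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_label(logits)


# The post-processing step on its own, with made-up logits.
label, prob = logits_to_label([-1.0, 1.0])
print(label, round(prob, 3))
```

Once the save cell has run, a call such as `load_and_predict("A thoroughly enjoyable film.")` would return the predicted label and its probability.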