<a href="https://colab.research.google.com/github/Priscilla97/llm-rag-foundations/blob/main/02_fine_tuning/2_Fine_tuning_a_model_with_the_Trainer_API_or_Keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning a model with the Trainer API

Install the Transformers, Datasets, Evaluate, and Accelerate libraries to run this notebook (the Trainer needs Accelerate when used with PyTorch).

In [None]:
!pip install datasets evaluate accelerate "transformers[sentencepiece]"

## Preparation
The code examples below assume you have already executed the examples in the previous section. Here is a short recap of what you need:

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)


def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
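To see what tokenize_function produces for a sentence pair, here is a minimal pure-Python sketch of the layout BERT-style tokenizers use: `[CLS] sentence1 [SEP] sentence2 [SEP]`, plus token_type_ids marking which sentence each token belongs to. The special-token ids 101 and 102 are assumed to match bert-base-uncased, the other ids are arbitrary placeholders rather than real word pieces, and build_pair_inputs is a hypothetical helper, not part of the Transformers API.

```python
# Assumed special-token ids ([CLS] = 101, [SEP] = 102 in bert-base-uncased)
CLS_ID, SEP_ID = 101, 102


def build_pair_inputs(ids_a, ids_b):
    """Hypothetical helper: lay out a sentence pair as [CLS] A [SEP] B [SEP]."""
    input_ids = [CLS_ID] + ids_a + [SEP_ID] + ids_b + [SEP_ID]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    attention_mask = [1] * len(input_ids)  # no padding has been added yet
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


pair = build_pair_inputs([2023, 2003], [2009, 2001, 1012])
print(pair["input_ids"])       # [101, 2023, 2003, 102, 2009, 2001, 1012, 102]
print(pair["token_type_ids"])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

No padding happens at this stage; padding each batch is left to the data collator.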

## Training
1) Define a **TrainingArguments** object that contains all the *hyperparameters* the Trainer will use for training and evaluation.

The only argument you have to provide is an **output directory** where the trained model and its checkpoints will be **saved**. For everything else, the defaults work reasonably well for basic fine-tuning.

For more advanced configuration, see: https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu

If you want to automatically upload your model to the Hub during training, pass along **push_to_hub=True** in the TrainingArguments.



In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments("test-trainer")

2) Define the model.

As in the previous chapter, we will use the **AutoModelForSequenceClassification** class, with two labels:

In [None]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

You will get a **warning** after instantiating this pretrained model. This is because BERT has not been pretrained on classifying pairs of sentences, so the **head of the pretrained model has been discarded** and a **new head** suitable for sequence classification has been added instead.

The warnings indicate that some **weights were not used** (the ones corresponding to the dropped pretraining head) and that some others were **randomly initialized** (the ones for the new head).
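A minimal NumPy sketch of what the new head amounts to, assuming a simplified head: a single linear layer mapping a 768-dimensional sentence representation (the hidden size of bert-base) to 2 logits. The real BertForSequenceClassification head also applies dropout, and the random vectors below only stand in for the output of the pretrained body.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_labels, batch_size = 768, 2, 4  # 768 = hidden size of bert-base

# Stand-in for the sentence representations produced by the pretrained body
pooled_output = rng.standard_normal((batch_size, hidden_size))

# The new classification head: a linear layer whose weights start out random
# (std 0.02, BERT's initializer_range), which is what the warning reports
W = rng.normal(0.0, 0.02, size=(hidden_size, num_labels))
b = np.zeros(num_labels)

logits = pooled_output @ W + b  # one score per label for each example
print(logits.shape)  # (4, 2)
```

Until this head is trained, its outputs are essentially arbitrary, which is why the warning suggests training the model on a downstream task before using it for predictions.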


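3) Define a **Trainer** by passing it all the objects constructed up to now:
- the model,
- the training_args,
- the training and validation datasets,
- our data_collator,
- our tokenizer, passed as the processing_class.

The **processing_class** parameter is a newer addition that *tells the Trainer which tokenizer to use for processing*; it replaces the older **tokenizer** argument.

When a tokenizer is passed as processing_class, the default **data_collator** used by the Trainer is already a DataCollatorWithPadding, so passing data_collator=data_collator is optional here; it is kept to be explicit.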
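The idea behind DataCollatorWithPadding is dynamic padding: each batch is padded only to the length of its own longest sequence, not to a global maximum. The sketch below illustrates this in plain Python with a hypothetical pad_batch helper. It assumes a pad token id of 0 (the [PAD] id in bert-base-uncased) and, unlike the real collator, only handles input_ids and attention_mask and returns lists instead of tensors.

```python
PAD_ID = 0  # assumed pad token id ([PAD] in bert-base-uncased)


def pad_batch(features, pad_id=PAD_ID):
    """Hypothetical helper: pad every sequence to the longest one in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
    return batch


features = [
    {"input_ids": [101, 7592, 102]},
    {"input_ids": [101, 7592, 2088, 999, 102]},
]
batch = pad_batch(features)
print(batch["input_ids"])       # [[101, 7592, 102, 0, 0], [101, 7592, 2088, 999, 102]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Padding per batch rather than to the dataset's maximum length avoids wasting computation on padding tokens when most sequences are short.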


In [None]:
from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    processing_class=tokenizer,
)

To fine-tune the model on our dataset, we just have to call the train() method of our Trainer:

In [None]:
trainer.train()

This will start the **fine-tuning** (on a GPU, if one is available) and report the *training loss* every 500 steps.

It won’t, however, tell you how well (or badly) your model is performing, because:

- we didn't tell the Trainer to **evaluate** during training. This is done by setting **eval_strategy** in TrainingArguments (the default, "no", skips evaluation) to either:
  - **"steps"** (evaluate every **eval_steps** steps), or
  - **"epoch"** (evaluate at the end of each epoch);
- we didn't provide a **compute_metrics**() function to calculate a metric during said evaluation (otherwise the evaluation would only report the loss, which is not a very intuitive number).
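To make the step counts concrete, here is a back-of-the-envelope sketch for MRPC. It assumes the TrainingArguments defaults (per_device_train_batch_size=8, num_train_epochs=3, logging_steps=500), a single device, no gradient accumulation, and a hypothetical eval_steps=400. It is a simplified model of when the Trainer logs and evaluates, not its actual scheduling code.

```python
import math

num_train_examples = 3668        # size of the MRPC training split
per_device_train_batch_size = 8  # TrainingArguments default
num_train_epochs = 3             # TrainingArguments default
logging_steps = 500              # TrainingArguments default

# The last, partial batch of each epoch still counts as a step
steps_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Training loss is logged every logging_steps optimizer steps
logging_at = list(range(logging_steps, total_steps + 1, logging_steps))

# eval_strategy="epoch": evaluate once at the end of each epoch
eval_at_epoch = [steps_per_epoch * (e + 1) for e in range(num_train_epochs)]

# eval_strategy="steps" with a hypothetical eval_steps=400
eval_steps = 400
eval_at_steps = list(range(eval_steps, total_steps + 1, eval_steps))

print(steps_per_epoch, total_steps)  # 459 1377
print(logging_at)                    # [500, 1000]
print(eval_at_epoch)                 # [459, 918, 1377]
print(eval_at_steps)                 # [400, 800, 1200]
```

Under these assumptions, the default run logs the training loss only twice over its 1,377 steps.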

## Evaluation
Let’s see how we can build a useful **compute_metrics**() function and use it the next time we train.

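The function must take an **EvalPrediction object**, which can be unpacked like a named tuple into two fields:
- **predictions**: the model outputs (logits) on the evaluation dataset,
- **label_ids**: the corresponding labels,

and must return a **dictionary** mapping strings (metric names) to floats (metric values).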
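A minimal sketch of that contract, using a hypothetical FakeEvalPrediction named tuple as a stand-in for the real EvalPrediction and a hypothetical compute_accuracy function that relies on NumPy only (no metric download needed):

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in with the same two fields as EvalPrediction
FakeEvalPrediction = namedtuple("FakeEvalPrediction", ["predictions", "label_ids"])


def compute_accuracy(eval_preds):
    logits, labels = eval_preds         # unpack like a tuple
    preds = np.argmax(logits, axis=-1)  # class with the highest logit
    # Return plain Python floats keyed by metric name
    return {"accuracy": float((preds == labels).mean())}


eval_preds = FakeEvalPrediction(
    predictions=np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5]]),  # made-up logits
    label_ids=np.array([0, 0, 1]),
)
print(compute_accuracy(eval_preds))  # 2 of the 3 predictions are correct
```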

To get some predictions from our model, we can use the **Trainer.predict()** method:

In [None]:
predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)

(408, 2) (408,)

The output of the predict() method is another named tuple with three fields: **predictions, label_ids, and metrics**.

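The **metrics field** will only contain the **loss** on the dataset passed, plus some timing metrics (how long prediction took, in total and on average). Once a compute_metrics() function is passed to the Trainer, its results are reported in this field as well.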

As you can see, predictions is a **two-dimensional array** with shape 408 x 2 (408 being the number of elements in the dataset we used). Those are the **logits** for each element of the dataset we passed to predict().

To transform them into predictions that we can compare to our labels, we need to take the index with the maximum value on the second axis:

In [None]:
import numpy as np

preds = np.argmax(predictions.predictions, axis=-1)
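Taking the argmax of raw logits is enough because softmax is monotonic: converting logits to probabilities never changes which class has the largest score. A small NumPy check on made-up logits:

```python
import numpy as np

# Toy logits for 3 examples and 2 classes (made-up numbers)
logits = np.array([[1.2, -0.3],
                   [-0.8, 2.1],
                   [0.05, 0.04]])

preds = np.argmax(logits, axis=-1)  # index of the largest logit per row

# Numerically stable softmax: subtract the row maximum before exponentiating
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

print(preds)                                         # [0 1 0]
print(np.array_equal(preds, probs.argmax(axis=-1)))  # True
```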

We can now compare those preds to the labels.

We can load the metrics associated with the MRPC dataset as easily as we loaded the dataset, this time with the **evaluate.load() function.**

The object returned has a **compute()** method we can use to do the metric calculation:

In [None]:
import evaluate

metric = evaluate.load("glue", "mrpc")
metric.compute(predictions=preds, references=predictions.label_ids)

{'accuracy': 0.8578431372549019, 'f1': 0.8996539792387542}
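For intuition about what those two numbers mean, here is a hand-written sketch of accuracy and F1, assuming (as in the GLUE MRPC setup) that F1 is computed for the positive class, label 1 ("equivalent"). The accuracy_and_f1 helper is hypothetical and runs on toy predictions, not on the model outputs above.

```python
import numpy as np


def accuracy_and_f1(preds, labels, positive_label=1):
    """Hypothetical helper: accuracy plus binary F1 for the positive class."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    accuracy = float((preds == labels).mean())
    # Count true positives, false positives and false negatives
    tp = int(np.sum((preds == positive_label) & (labels == positive_label)))
    fp = int(np.sum((preds == positive_label) & (labels != positive_label)))
    fn = int(np.sum((preds != positive_label) & (labels == positive_label)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}


print(accuracy_and_f1(preds=[1, 0, 0, 1, 1], labels=[1, 0, 1, 1, 0]))
```

Here 3 of 5 predictions are correct (accuracy 0.6), and with 2 true positives, 1 false positive and 1 false negative, precision and recall are both 2/3, so F1 is also 2/3.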

Because the model head is **randomly initialized**, the metrics you get may differ slightly from the ones shown here.

Here, we can see our model has an **accuracy** of 85.78% on the validation set and an **F1 score** of 89.97%.

Wrapping everything together, we get our compute_metrics() function:

In [None]:
def compute_metrics(eval_preds):
    metric = evaluate.load("glue", "mrpc")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

To see it in action, reporting metrics at the end of each epoch, here is how we define a new Trainer with this compute_metrics() function:

In [None]:
training_args = TrainingArguments("test-trainer", eval_strategy="epoch")  # evaluate at the end of every epoch
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
)

Note that we create a new TrainingArguments with its eval_strategy set to "epoch" and a new model; otherwise, we would just be continuing the training of the model we have already trained. To launch a new training run, we execute:

In [None]:
trainer.train()

## Advanced Training Features

The Trainer comes with many built-in features that make modern deep learning best practices accessible:

### Mixed Precision Training:
Use fp16=True in your training arguments for faster training and reduced memory usage (this requires a supported GPU):

In [None]:
training_args = TrainingArguments(
    "test-trainer",
    eval_strategy="epoch",
    fp16=True,  # Enable mixed precision
)
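Mixed precision speeds things up on GPUs with fast float16 support, but float16 has a narrow range. This NumPy sketch shows the problem and the usual fix, loss scaling: a tiny gradient underflows to zero in float16 unless it is scaled up before the cast and scaled back down in float32. The scale factor 2**16 is an illustrative choice; when fp16=True, the Trainer manages loss scaling for you.

```python
import numpy as np

# Largest finite float16 value: anything above it overflows to inf
print(float(np.finfo(np.float16).max))

tiny_grad = np.float32(1e-8)
print(np.float16(tiny_grad))  # underflows to 0.0 in float16

# Loss scaling: multiply before casting down, divide again in float32
scale = np.float32(2.0 ** 16)           # illustrative scale factor
scaled = np.float16(tiny_grad * scale)  # about 6.55e-4, representable in float16
recovered = np.float32(scaled) / scale
print(recovered)  # close to 1e-8 again
```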

### Gradient Accumulation:
To get a larger effective batch size when GPU memory is limited:

In [None]:
training_args = TrainingArguments(
    "test-trainer",
    eval_strategy="epoch",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # Effective batch size = 4 * 4 = 16
)
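Gradient accumulation works because, for a loss averaged over equally sized micro-batches, averaging the micro-batch gradients gives the same result as the full-batch gradient. A NumPy sketch with a toy linear regression model (random data, mean squared error) checks this; it illustrates the idea behind gradient_accumulation_steps rather than the Trainer's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 3))  # 16 examples, 3 features (toy data)
y = rng.standard_normal(16)
w = rng.standard_normal(3)        # current weights of a linear model


def grad_mse(Xb, yb, w):
    """Gradient of mean((Xb @ w - yb) ** 2) with respect to w."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)


# One step on the full batch of 16
full_grad = grad_mse(X, y, w)

# Four micro-batches of 4: each gradient is divided by the number of
# accumulation steps and summed before the single optimizer update
accumulation_steps = 4
accumulated = np.zeros_like(w)
for Xb, yb in zip(np.split(X, accumulation_steps), np.split(y, accumulation_steps)):
    accumulated += grad_mse(Xb, yb, w) / accumulation_steps

print(np.allclose(full_grad, accumulated))  # True
```

Only one micro-batch of activations has to fit in memory at a time, at the cost of running more forward and backward passes per optimizer update.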

### Learning Rate Scheduling:
The Trainer uses linear decay by default, but you can customize this:

In [None]:
training_args = TrainingArguments(
    "test-trainer",
    eval_strategy="epoch",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",  # Try different schedulers
)
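To compare the shapes of the two schedules, here is a pure-Python sketch of the learning-rate multiplier over training, with optional linear warmup. The formulas are assumptions modeled on the usual linear-decay and half-cosine schedules, not code copied from the library; the actual learning rate at a given step is learning_rate times the multiplier.

```python
import math


def linear_multiplier(step, warmup_steps, total_steps):
    """Assumed linear warmup, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def cosine_multiplier(step, warmup_steps, total_steps):
    """Assumed linear warmup, then half a cosine wave from 1 down to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


base_lr, total_steps = 2e-5, 1000  # illustrative values, no warmup
for step in (0, 250, 500, 750, 1000):
    lr_lin = base_lr * linear_multiplier(step, 0, total_steps)
    lr_cos = base_lr * cosine_multiplier(step, 0, total_steps)
    print(f"step {step:4d}: linear={lr_lin:.2e}  cosine={lr_cos:.2e}")
```

Cosine stays above linear decay in the first half of training and below it in the second half, flattening out as it approaches zero.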