# FINE-TUNING

This is an example of fine-tuning a pre-trained model for a summarization task.

## 1) SET UP THE ENVIRONMENT

Install required libraries and import the packages.

In [None]:
!pip install -q transformers datasets peft


In [None]:
import os
import transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration, Seq2SeqTrainer, Seq2SeqTrainingArguments
from datasets import load_dataset
from peft import get_peft_model, LoraConfig, TaskType

## 2) LOAD THE MODEL AND THE DATASET

We'll use the "t5-small" model and the CNN/DailyMail dataset (version 3.0.0), a standard dataset for summarizing news articles. We only load a small subset of it (the first 1% of the training split) so we can test the fine-tuning process without a large computational demand.
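To get a sense of how small the subset is: the CNN/DailyMail 3.0.0 training split has 287,113 articles, so the `train[:1%]` slice loaded below keeps about 2,871 of them, and the trainer later uses only the first 200. The helper below is a hypothetical illustration of that arithmetic, assuming the percentage boundary is rounded to the nearest whole example (the `datasets` library's own rounding rule may differ at edge cases).

```python
def percent_slice_size(num_rows: int, percent: float) -> int:
    """Hypothetical helper: rows kept by a `split="train[:N%]"`-style slice.

    Assumes the boundary is rounded to the nearest whole row.
    """
    return round(num_rows * percent / 100)


CNN_DM_TRAIN_ROWS = 287_113  # size of the cnn_dailymail 3.0.0 train split

subset_size = percent_slice_size(CNN_DM_TRAIN_ROWS, 1)
print(subset_size)  # 2871
```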

In [None]:
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")




You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565



## 3) INFERENCE WITH THE PRE-TRAINED MODEL

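First, we define a function that preprocesses the dataset for the model: it tokenizes the articles (inputs, truncated and padded to 512 tokens) and the highlights (targets, 128 tokens), and stores the target token ids as `labels`. We apply it with the `map` function to obtain the tokenized dataset `train_data`, which we'll use for fine-tuning.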
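In `preprocess_function`, `padding="max_length"` also pads the `labels`, so their padded positions hold T5's pad id (0) and the loss is computed on them as well. A common convention, not used in this notebook, is to replace those ids with -100, the `ignore_index` that T5's cross-entropy loss skips. The pure-Python sketch below shows only that masking step on toy token ids; the helper name `mask_label_padding` is hypothetical.

```python
from typing import List

T5_PAD_TOKEN_ID = 0   # t5-small's tokenizer uses id 0 for <pad>
IGNORE_INDEX = -100   # default ignore_index of torch.nn.CrossEntropyLoss


def mask_label_padding(label_ids: List[List[int]], pad_token_id: int = T5_PAD_TOKEN_ID) -> List[List[int]]:
    """Replace pad ids in each label sequence with IGNORE_INDEX so the loss skips them."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in sequence]
        for sequence in label_ids
    ]


# Toy batch: two label sequences padded to length 6 (ids are made up; 1 is T5's </s>).
batch = [[363, 19, 8, 1, 0, 0], [100, 1, 0, 0, 0, 0]]
print(mask_label_padding(batch))
# [[363, 19, 8, 1, -100, -100], [100, 1, -100, -100, -100, -100]]
```

Inside `preprocess_function`, the assignment would become `model_inputs["labels"] = mask_label_padding(labels["input_ids"])`.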

Then, we define the input text (an arbitrary "long" piece of text that we want to summarize).
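The original T5 checkpoints, including `t5-small`, were pre-trained on a multi-task mixture in which each input starts with a task prefix such as `"summarize: "`. Neither the preprocessing nor the inference cells in this notebook add one, which is worth keeping in mind when comparing the outputs. A minimal sketch of adding the prefix, assuming the same `"summarize: "` string T5 used; `add_task_prefix` is a hypothetical helper, and its output would be passed to the tokenizer as before.

```python
from typing import List

SUMMARIZE_PREFIX = "summarize: "  # task prefix used for summarization in T5's pre-training mixture


def add_task_prefix(texts: List[str], prefix: str = SUMMARIZE_PREFIX) -> List[str]:
    """Prepend the task prefix to every input text, stripping stray surrounding whitespace."""
    return [prefix + text.strip() for text in texts]


prefixed = add_task_prefix([" France is a country located in Western Europe. "])
print(prefixed[0])  # summarize: France is a country located in Western Europe.
```

For training this would be `inputs = add_task_prefix(examples['article'])`; for inference, `tokenizer(add_task_prefix([input_text])[0], ...)`.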

Finally, we call the model to generate a summary. This gives us the model's output on the summarization task before fine-tuning.
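The `generate` call uses beam search (`num_beams=4`) with `length_penalty=2.0`. As we understand Hugging Face's beam search, each finished hypothesis is scored roughly as its summed log-probability divided by `length ** length_penalty`; because log-probabilities are negative, a penalty above 1.0 favours longer outputs. The log-probabilities below are made up and only illustrate that scoring rule, not the model's actual beams.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalised score of a finished beam: sum of token log-probs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two made-up finished beams: a short one and a longer one with a lower total log-probability.
SHORT_BEAM = (-6.0, 10)   # (sum of log-probs, number of generated tokens)
LONG_BEAM = (-20.0, 30)

for penalty in (0.0, 1.0, 2.0):
    short_score = beam_score(*SHORT_BEAM, penalty)
    long_score = beam_score(*LONG_BEAM, penalty)
    winner = "long" if long_score > short_score else "short"
    print(f"length_penalty={penalty}: short={short_score:.4f}, long={long_score:.4f} -> {winner} beam kept")
```

With the notebook's `length_penalty=2.0`, the same pair of beams flips in favour of the longer one.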

In [None]:
def preprocess_function(examples):
    inputs = examples['article']
    targets = examples['highlights']
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")

    labels = tokenizer(text_target=targets, max_length=128, truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_data = dataset.map(preprocess_function, batched=True)


In [None]:
input_text = """ France is a country located in Western Europe. It is known for
its rich history, cultural heritage, and famous landmarks such as the Eiffel
Tower, the Louvre Museum, and the Palace of Versailles. Paris, its capital,
is often referred to as 'the City of Light' and is one of the most visited
cities in the world. France has been a major player in world history, from the
French Revolution to its colonial empire and its role in both World Wars.
Today, it remains a global leader in politics, culture, and economy. """

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)

# Generate the summary
summary_ids = model.generate(inputs['input_ids'], max_length=150, min_length=30, length_penalty=2.0, num_beams=4, early_stopping=True)

# Decode the summary
summary_before = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Show the summary
print(summary_before)

, from the French Revolution to its colonial empire and its role in both World Wars. Today, it remains a global leader in politics, culture, and economy.


## 4) SET UP TRAINING CONFIGURATION FOR THE FINE-TUNING PROCESS

1. Define a LoRA configuration
2. Apply LoRA to the model with the PEFT library (`get_peft_model`), producing the model we will fine-tune.
3. Set the training configuration (`training_args`).
4. Create the Trainer (Seq2SeqTrainer)
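To get a feel for how small the LoRA update is, the arithmetic below counts the trainable parameters implied by `r=4` and `target_modules=["q", "v"]`. It assumes `t5-small`'s published dimensions (6 encoder and 6 decoder blocks, `d_model = 512`, 8 heads of size 64, so each `q`/`v` projection is 512×512) and that the names `q` and `v` match both the self-attention and the decoder's cross-attention projections. Calling `peft_model.print_trainable_parameters()` after `get_peft_model` should report the actual figure.

```python
# Back-of-the-envelope count of LoRA parameters for the config in the next cell.
# Assumed t5-small dimensions: 6 encoder blocks, 6 decoder blocks, q/v projections of 512 x 512.
D_MODEL = 512
INNER_DIM = 8 * 64          # num_heads * d_kv
NUM_ENCODER_BLOCKS = 6
NUM_DECODER_BLOCKS = 6

r = 4
lora_alpha = 1
target_modules = ["q", "v"]


def lora_params_per_linear(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) to a frozen linear layer."""
    return rank * in_features + out_features * rank


# Encoder blocks have one attention layer (self-attention); decoder blocks have two
# (self-attention and cross-attention), each with its own q and v projections.
attention_layers = NUM_ENCODER_BLOCKS * 1 + NUM_DECODER_BLOCKS * 2
adapted_linears = attention_layers * len(target_modules)

trainable = adapted_linears * lora_params_per_linear(D_MODEL, INNER_DIM, r)
scaling = lora_alpha / r  # factor applied to the low-rank update B @ A

print(adapted_linears, trainable, scaling)  # 36 147456 0.25
```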

In [None]:
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=4,
    lora_alpha=1,
    target_modules=["q", "v"],
    lora_dropout=0.1,
)

peft_model = get_peft_model(model, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    auto_find_batch_size=True,
    learning_rate=3e-2,
    num_train_epochs=2,
    use_cpu=False
)

trainer = Seq2SeqTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_data.select(range(200)),
    processing_class=tokenizer  # replaces the deprecated `tokenizer` argument (transformers >= 4.46)
)




Before training starts, you may be asked for a wandb API key. Weights & Biases (wandb) requests an API key so it can log training metrics and run details to the wandb platform, which provides visualization and tracking tools for monitoring and analyzing ML projects.

We didn't add wandb explicitly in our code, but some machine learning libraries, such as Hugging Face's Transformers, integrate with it automatically when it is installed.

We don't need these tracking features here, so we disable wandb. We can also make sure it is not called during training by explicitly removing the `WandbCallback` before training starts.
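An alternative that avoids environment variables is to tell the trainer not to report to any logging integration, through the `report_to` argument of the training arguments. A sketch that reuses the notebook's own settings and only adds `report_to="none"`:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    auto_find_batch_size=True,
    learning_rate=3e-2,
    num_train_epochs=2,
    use_cpu=False,
    report_to="none",  # attach no logging integrations (wandb, tensorboard, ...)
)
```

The `Seq2SeqTrainer` would then need to be re-created with these arguments, since the logging callbacks are chosen when the trainer is constructed.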

In [None]:
os.environ["WANDB_MODE"] = "disabled"  # Disable W&B globally

# Remove potential existing WandbCallbacks before training
# trainer.remove_callback(transformers.integrations.WandbCallback)

And now, we **FINE-TUNE** the model.

In [None]:
trainer.train()

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.




TrainOutput(global_step=50, training_loss=2.276714019775391, metrics={'train_runtime': 795.1331, 'train_samples_per_second': 0.503, 'train_steps_per_second': 0.063, 'total_flos': 54317914521600.0, 'train_loss': 2.276714019775391, 'epoch': 2.0})
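The numbers in `TrainOutput` can be reconciled with the configuration: 200 training examples, 2 epochs and 50 optimizer steps imply 8 examples per step, which matches the default `per_device_train_batch_size` of 8 that `auto_find_batch_size` starts from (assuming a single device and no gradient accumulation). The throughput figures then follow from the reported runtime.

```python
import math

num_examples = 200          # train_data.select(range(200))
num_epochs = 2
batch_size = 8              # assumed: default per_device_train_batch_size, single device
train_runtime = 795.1331    # seconds, from TrainOutput

steps_per_epoch = math.ceil(num_examples / batch_size)
global_step = steps_per_epoch * num_epochs
samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = global_step / train_runtime

print(global_step, round(samples_per_second, 3), round(steps_per_second, 3))  # 50 0.503 0.063
```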

After training, we save the model and then load it again.
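Saving a PEFT model stores only the small LoRA matrices, not a full copy of `t5-small`. If a standalone checkpoint is preferred, PEFT's `merge_and_unload()` folds each adapter into its base weight as W' = W + (lora_alpha / r) · B·A. The toy pure-Python sketch below reproduces only that arithmetic on 2×2 matrices with made-up numbers; `matmul` and `merge_lora` are hypothetical helpers, not PEFT functions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, lora_alpha: float, r: int) -> Matrix:
    """Return W + (lora_alpha / r) * B @ A, the weight a merged LoRA layer would use."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [[w[i][j] + scaling * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (out=2, in=2)
A = [[2.0, 4.0]]               # lora_A: r=1 x in=2
B = [[1.0], [3.0]]             # lora_B: out=2 x r=1
print(merge_lora(W, A, B, lora_alpha=1, r=1))  # [[3.0, 4.0], [6.0, 13.0]]
```

In the notebook, the equivalent would be `merged = peft_model.merge_and_unload()` followed by `merged.save_pretrained(...)` with a directory of your choice.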

In [None]:
# Save
trainer.save_model("./results")  # For a PEFT model, this saves the LoRA adapter weights and adapter_config.json, not the full base model
tokenizer.save_pretrained("./results")  # Save the tokenizer

('./results/tokenizer_config.json',
 './results/special_tokens_map.json',
 './results/spiece.model',
 './results/added_tokens.json')

In [None]:
# Load

fine_tuned_model = T5ForConditionalGeneration.from_pretrained("./results")  # With peft installed, transformers reads adapter_config.json and loads t5-small plus the LoRA adapter
tokenizer = T5Tokenizer.from_pretrained("./results")  # Tokenizer saved alongside the adapter

## 5) INFERENCE WITH THE FINE-TUNED MODEL

Now we give the same task to the fine-tuned model, so we can compare the model's summaries before and after fine-tuning.

We repeat the same process: define the input text, tokenize it, generate the summary with the fine-tuned model, decode it and show the result.

In [None]:
input_text = """ France is a country located in Western Europe. It is known for
its rich history, cultural heritage, and famous landmarks such as the Eiffel
Tower, the Louvre Museum, and the Palace of Versailles. Paris, its capital,
is often referred to as 'the City of Light' and is one of the most visited
cities in the world. France has been a major player in world history, from the
French Revolution to its colonial empire and its role in both World Wars.
Today, it remains a global leader in politics, culture, and economy. """

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)

# Generate the summary using the fine-tuned model
summary_ids = fine_tuned_model.generate(
    inputs['input_ids'],
    max_length=150,
    min_length=30,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True
)

# Decode the summary
summary_finetuned = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Show the result
print(summary_finetuned)

France is a country located in Western Europe. France is known for its rich history, cultural heritage, and famous landmarks such as the Eiffel Tower. France has been a major player in world history, from the French Revolution to its colonial empire.


## 6) CONCLUSION


In [None]:
print('COMPARING THE TWO ANSWERS:\n')
print('Pre-trained model: \n')
display(summary_before)
print('\n')
print('Fine-tuned model: \n')
display(summary_finetuned)

COMPARING THE TWO ANSWERS:

Pre-trained model: 



', from the French Revolution to its colonial empire and its role in both World Wars. Today, it remains a global leader in politics, culture, and economy.'



Fine-tuned model: 



'France is a country located in Western Europe. France is known for its rich history, cultural heritage, and famous landmarks such as the Eiffel Tower. France has been a major player in world history, from the French Revolution to its colonial empire.'

Comparing the two outputs for the same text, the pre-trained model returns only a fragment copied from the end of the input (it even starts with a comma), whereas the fine-tuned model produces a coherent summary of the main points. Although the training parameters were not chosen to optimize performance, the fine-tuned model still gives a clearly better summary on this example.