## Working with Transformers in the HuggingFace Ecosystem

In this laboratory exercise we will learn how to work with the HuggingFace ecosystem to adapt models to new tasks. As you will see, much of what is required is *investigation* into the inner workings of the HuggingFace abstractions. With a little work and a little trial and error, it is fairly easy to get a working adaptation pipeline up and running.

### Exercise 1: Sentiment Analysis (warm up)

In this first exercise we will start from a pre-trained BERT transformer and build up a model able to perform text sentiment analysis. Transformers are complex beasts, so we will build up our pipeline in several explorative and incremental steps.

#### Exercise 1.1: Dataset Splits and Pre-trained model
There are many sentiment analysis datasets, but we will use one of the smallest ones available: the [Cornell Rotten Tomatoes movie review dataset](cornell-movie-review-data/rotten_tomatoes), which consists of 5,331 positive and 5,331 negative processed sentences from the Rotten Tomatoes movie reviews.

**Your first task**: Load the dataset and figure out what splits are available and how to get them. Spend some time exploring the dataset to see how it is organized. Note that we will be using the [HuggingFace Datasets](https://huggingface.co/docs/datasets/en/index) library for downloading, accessing, splitting, and batching data for training and evaluation.

In [2]:
from datasets import load_dataset, get_dataset_split_names

# Check which splits are available
dataset_name = "rotten_tomatoes"
print("Available splits:", get_dataset_split_names(dataset_name))

Available splits: ['train', 'validation', 'test']


In [3]:
# Load the full dataset
dataset = load_dataset(dataset_name)
print("\nDataset structure:")
print(dataset)


Dataset structure:
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})


In [4]:
# Explore each split
for split_name in dataset.keys():
    split_data = dataset[split_name]
    print(f"\n--- Split: {split_name} ---")
    print(f"Number of examples: {len(split_data)}")
    print(f"Features: {split_data.features}")

    # Show a few examples
    print(f"\nFirst 3 examples of {split_name}:")
    for i in range(min(3, len(split_data))):
        example = split_data[i]
        print(f"Example {i+1}:")
        print(f"  Text: {example['text']}")
        print(f"  Label: {example['label']}")
        print()


--- Split: train ---
Number of examples: 8530
Features: {'text': Value(dtype='string', id=None), 'label': ClassLabel(names=['neg', 'pos'], id=None)}

First 3 examples of train:
Example 1:
  Text: the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
  Label: 1

Example 2:
  Text: the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .
  Label: 1

Example 3:
  Text: effective but too-tepid biopic
  Label: 1


--- Split: validation ---
Number of examples: 1066
Features: {'text': Value(dtype='string', id=None), 'label': ClassLabel(names=['neg', 'pos'], id=None)}

First 3 examples of validation:
Example 1:
  Text: compassionately explores the seemingly irreconcilable situation between co

In [5]:
# Check the class distribution
if 'train' in dataset:
    train_labels = list(dataset['train']['label'])  # plain list, so .count() is available
    unique_labels = set(train_labels)
    print(f"\nUnique classes: {unique_labels}")

    for label in unique_labels:
        count = train_labels.count(label)
        print(f"Label {label}: {count} examples")


Unique classes: {0, 1}
Label 0: 4265 examples
Label 1: 4265 examples
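
The same count can be done in a single pass with `collections.Counter`, which also makes it easy to report class fractions. A minimal sketch on a hypothetical toy label list (`toy_labels` and `class_distribution` are made up for illustration; in the notebook the input would be the `label` column of a split):

```python
from collections import Counter

def class_distribution(labels):
    """Return {label: (count, fraction)} for an iterable of integer labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}

# Hypothetical toy labels, not taken from the dataset
toy_labels = [0, 1, 1, 0, 1, 0]
for label, (n, frac) in class_distribution(toy_labels).items():
    print(f"Label {label}: {n} examples ({frac:.1%})")
```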


## Recap: Exploring the Rotten Tomatoes Dataset
The **Cornell Rotten Tomatoes** dataset is already organized for a **sentiment analysis** task.

#### Available Splits:
- **Train**: 8,530 examples (for training)
- **Validation**: 1,066 examples (for validation during training)
- **Test**: 1,066 examples (for the final evaluation)

#### Data Features:
- **Text**: movie reviews as text strings
- **Label**: sentiment labels, encoded as:
  - `0 = neg` (negative)
  - `1 = pos` (positive)

### Observations

#### Perfect Balance
The training set is **perfectly balanced**:
- 4,265 **negative** examples (`label 0`)
- 4,265 **positive** examples (`label 1`)

> This is **ideal for avoiding bias** toward one class, and it makes plain accuracy a meaningful metric.

#### Text Quality
The reviews are:
- Of **variable length** (from short to elaborate)
- Written in **natural, descriptive language**
- About **different film genres**

#### Ready for BERT
The dataset is **ready** to be processed by:
- The **BERT tokenizer** (to convert text into tokens)
- A **classification pipeline** (binary classification)
- **Fine-tuning** of a pre-trained model

---

#### Exercise 1.2: A Pre-trained BERT and Tokenizer

The model we will use is a *very* small BERT transformer called [Distilbert](https://huggingface.co/distilbert/distilbert-base-uncased). This model was trained (using self-supervised learning) on the same corpus as BERT, but using the full BERT base model as a *teacher*.

**Your next task**: Load the Distilbert model and corresponding tokenizer. Use the tokenizer on a few samples from the dataset and pass the tokens through the model to see what outputs are provided. I suggest you use the [`AutoModel`](https://huggingface.co/transformers/v3.0.2/model_doc/auto.html) class (and the `from_pretrained()` method) to load the model and `AutoTokenizer` to load the tokenizer.

In [1]:
from transformers import AutoTokenizer, AutoModel

# Load the DistilBERT model and its tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

print(f"Tokenizer: {type(tokenizer)}")
print(f"Model: {type(model)}")

Tokenizer: <class 'transformers.models.distilbert.tokenization_distilbert_fast.DistilBertTokenizerFast'>
Model: <class 'transformers.models.distilbert.modeling_distilbert.DistilBertModel'>


In [14]:
# Take a few example sentences from the dataset
sample_texts = [
    dataset["train"][0]["text"],
    dataset["train"][1]["text"],
    dataset["train"][100]["text"]
]

# Print the examples
print("Sample texts:")
for i, text in enumerate(sample_texts):
    print(f"{i+1}. {text}")

Sample texts:
1. the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
2. the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .
3. chicago is sophisticated , brash , sardonic , completely joyful in its execution .


In [16]:
import torch
print("\n" + "="*80)
print("TOKENIZER + MODEL OUTPUT ON DATASET SAMPLES")
print("="*80)

# 3. For each example: tokenize and run it through the model
for i, text in enumerate(sample_texts):
    print(f"\n--- EXAMPLE {i+1} ---")
    print(f"Text: {text}")
    # sample_texts holds training examples 0, 1 and 100
    print(f"Label: {dataset['train'][i if i < 2 else 100]['label']}")

    # Tokenization
    tokens = tokenizer.tokenize(text)
    print(f"Tokens ({len(tokens)}): {tokens[:10]}...")  # First 10 tokens

    # Prepare the model input
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    print(f"Input shape: {inputs['input_ids'].shape}")

    # Forward pass through the model
    with torch.no_grad():
        outputs = model(**inputs)

    print(f"Output keys: {list(outputs.keys())}")
    print(f"Last hidden state shape: {outputs.last_hidden_state.shape}")

    # last_hidden_state holds the representation of every token
    # Shape: (batch_size, sequence_length, hidden_size)
    print(f"Hidden size (feature dimensions): {outputs.last_hidden_state.shape[-1]}")

    # Representation of the [CLS] token (first token, used for classification)
    cls_representation = outputs.last_hidden_state[0, 0, :]  # [0,0,:] = first batch element, first token
    print(f"[CLS] representation shape: {cls_representation.shape}")
    print(f"[CLS] first 5 values: {cls_representation[:5]}")

    # DistilBERT has no pooler, so this branch is skipped for this model
    if hasattr(outputs, 'pooler_output') and outputs.pooler_output is not None:
        print(f"Pooler output shape: {outputs.pooler_output.shape}")
        print(f"Pooler first 5 values: {outputs.pooler_output[0, :5]}")


TOKENIZER + MODEL OUTPUT ON DATASET SAMPLES

--- EXAMPLE 1 ---
Text: the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
Label: 1
Tokens (45): ['the', 'rock', 'is', 'destined', 'to', 'be', 'the', '21st', 'century', "'"]...
Input shape: torch.Size([1, 47])
Output keys: ['last_hidden_state']
Last hidden state shape: torch.Size([1, 47, 768])
Hidden size (feature dimensions): 768
[CLS] representation shape: torch.Size([768])
[CLS] first 5 values: tensor([-0.0332, -0.0168,  0.0194, -0.0257, -0.1380])

--- EXAMPLE 2 ---
Text: the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .
Label: 1
Tokens (50): ['the', 'gorgeous', '##ly', 'elaborate', 'continuation', 'of', '"', 'the', 'lord', 'of']...


In [17]:
# 4. Model information
print("\n" + "="*80)
print("MODEL INFORMATION")
print("="*80)
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Vocab size: {tokenizer.vocab_size:,}")
print(f"Hidden size: {model.config.hidden_size}")
print(f"Num layers: {model.config.n_layers}")
print(f"Num attention heads: {model.config.n_heads}")
print(f"Max length: {tokenizer.model_max_length}")


MODEL INFORMATION
Total parameters: 66,362,880
Vocab size: 30,522
Hidden size: 768
Num layers: 6
Num attention heads: 12
Max length: 512


In [18]:
# 5. Compare inputs of different lengths
print("\n" + "="*80)
print("COMPARING DIFFERENT LENGTHS")
print("="*80)

short_text = "Great movie!"
long_text = sample_texts[0]  # Longer text

for label, text in [("SHORT", short_text), ("LONG", long_text)]:
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)

    print(f"{label}: {len(tokenizer.tokenize(text))} tokens → output shape {outputs.last_hidden_state.shape}")


COMPARING DIFFERENT LENGTHS
SHORT: 3 tokens → output shape torch.Size([1, 5, 768])
LONG: 45 tokens → output shape torch.Size([1, 47, 768])


### Notes on DistilBERT
- **DistilBERT**: about 66M parameters, 6 layers, and a vocabulary of about 30k tokens (30,522).
- **Subword tokenization**: the tokenizer splits words that are not in its vocabulary into sub-tokens (e.g. "gorgeously" → "gorgeous" + "##ly") and automatically adds the special tokens [CLS] and [SEP], which is why a 45-token sentence produces an input of length 47.
- **Model output**: a 768-dimensional representation for every token. The [CLS] token (first in the sequence) attends to the whole sentence, so its final hidden state is commonly used as a sentence-level summary; it is the representation we will use to classify sentiment.
- **Variable-length input**: a single sentence can be passed at its own length, with no padding. Padding (together with the attention mask) is only needed when sentences of different lengths are batched into one tensor.
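
The sub-token splitting can be illustrated with a toy greedy longest-match-first splitter, the matching strategy that WordPiece-style tokenizers use at inference time. This is only a sketch: `toy_vocab`, `wordpiece`, and `encode` are hypothetical names, the vocabulary is made up, and the real DistilBERT tokenizer additionally normalizes text and splits on punctuation.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single lowercase word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

def encode(text, vocab):
    """Whitespace-split the text, split each word, and add the special tokens."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens

# Hypothetical five-entry vocabulary, for illustration only
toy_vocab = {"the", "gorgeous", "##ly", "elaborate", "movie"}
print(encode("the gorgeously elaborate movie", toy_vocab))
# ['[CLS]', 'the', 'gorgeous', '##ly', 'elaborate', 'movie', '[SEP]']
```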

---

#### Exercise 1.3: A Stable Baseline

In this exercise I want you to:
1. Use Distilbert as a *feature extractor* to extract representations of the text strings from the dataset splits;
2. Train a classifier (your choice, but an SVM from Scikit-learn is an easy choice).
3. Evaluate performance on the validation and test splits.

These results are our *stable baseline* -- the **starting** point on which we will (hopefully) improve in the next exercise.

**Hint**: There are a number of ways to implement the feature extractor, but probably the best is to use a [feature extraction `pipeline`](https://huggingface.co/tasks/feature-extraction). You will need to interpret the output of the pipeline and extract only the `[CLS]` token from the *last* transformer layer. *How can you figure out which output that is?*

In [22]:
from transformers import pipeline

# Create the feature extraction pipeline
feature_extractor = pipeline(
    "feature-extraction",
    model="distilbert-base-uncased", 
    tokenizer="distilbert-base-uncased",
    return_tensors=True
)

Device set to use cuda:0


In [33]:
# Let's see what the pipeline returns
text = dataset["train"][0]["text"]
features = feature_extractor(text)
print(f"Shape: {features.shape}")  # [1, 47, 768]
print(f"CLS token shape: {features[0, 0, :].shape}")  # [768]
print(f"CLS token first 5 values: {features[0, 0, :5]}")

Shape: torch.Size([1, 47, 768])
CLS token shape: torch.Size([768])
CLS token first 5 values: tensor([-0.0332, -0.0168,  0.0194, -0.0257, -0.1380])


torch.Size([1, 47, 768]) --> 1 = batch size, 47 = sequence length (number of tokens), 768 = hidden size
The [CLS] token is the first token of the sequence.
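
The indexing can be checked without loading the model. The sketch below uses a random NumPy array as a stand-in for the pipeline output of one 47-token sentence; `cls_vector` and `fake_output` are hypothetical names, and the shape check assumes the pipeline returns one `(1, seq_len, hidden)` array per input text, as observed above.

```python
import numpy as np

def cls_vector(pipeline_output):
    """Select the first-token ([CLS]) vector from a (1, seq_len, hidden) output."""
    arr = np.asarray(pipeline_output)
    if arr.ndim != 3 or arr.shape[0] != 1:
        raise ValueError(f"expected shape (1, seq_len, hidden), got {arr.shape}")
    return arr[0, 0, :]

# Stand-in for a real pipeline output (random values, not model features)
rng = np.random.default_rng(0)
fake_output = rng.normal(size=(1, 47, 768))
print(cls_vector(fake_output).shape)  # (768,)
```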

In [43]:
batch_texts = ["Hello world", "Another text"]
batch_features = feature_extractor(batch_texts)
print(f"Type: {type(batch_features)}")
print(f"Length: {len(batch_features)}")
print(f"First element shape: {batch_features[0].shape}")
print(f"Second element shape: {batch_features[1].shape}")

Type: <class 'list'>
Length: 2
First element shape: torch.Size([1, 4, 768])
Second element shape: torch.Size([1, 4, 768])


In [51]:
from tqdm import tqdm
import numpy as np
import torch

# Function that extracts the [CLS] features, with a progress bar
def extract_cls(texts, batch_size):
    all_features = []
    total_batches = (len(texts) + batch_size - 1) // batch_size
    
    print(f"Processing {len(texts)} texts in {total_batches} batches")
    
    with tqdm(total=total_batches, desc="Extracting features", unit="batch") as pbar:
        for i in range(0, len(texts), batch_size):
            batch_texts = texts[i:i+batch_size]
            
            with torch.no_grad():
                batch_features = feature_extractor(batch_texts)
            
            for text_features in batch_features:
                cls_token = text_features[0, 0, :]
                all_features.append(cls_token.cpu().numpy())
            
            pbar.update(1)
    
    features_array = np.vstack(all_features)
    print(f"Features extracted: {features_array.shape}")
    return features_array
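
The batch arithmetic in `extract_cls` can be isolated in a few lines. The sketch below (with hypothetical helper names `iter_batches` and `num_batches`) shows the same ceiling division and slicing:

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def num_batches(n_items, batch_size):
    """Ceiling division without floating point."""
    return (n_items + batch_size - 1) // batch_size

print(num_batches(8530, 64))  # 134
print(num_batches(1066, 64))  # 17
```

The last batch is simply shorter than the others when the number of texts is not a multiple of the batch size.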


In [52]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
import numpy as np

batch_size=64

# Texts and labels of each split as plain Python lists
train_texts, train_labels = list(dataset["train"]["text"]), list(dataset["train"]["label"])
val_texts, val_labels = list(dataset["validation"]["text"]), list(dataset["validation"]["label"])
test_texts, test_labels = list(dataset["test"]["text"]), list(dataset["test"]["label"])

# Extract features from the dataset splits
print("Extracting features from the training set...")
train_features = extract_cls(train_texts, batch_size)

print("Extracting features from the validation set...")
val_features = extract_cls(val_texts, batch_size)

print("Extracting features from the test set...")
test_features = extract_cls(test_texts, batch_size)

# Now train the SVM
print("Training SVM classifier...")
svm_classifier = SVC(kernel='rbf', C=1.0, random_state=42)
svm_classifier.fit(train_features, train_labels)

print("Evaluation on the validation set:")
val_predictions = svm_classifier.predict(val_features)
val_accuracy = accuracy_score(val_labels, val_predictions)
print(f"Validation Accuracy: {val_accuracy:.4f}")

print("Evaluation on the test set:")
test_predictions = svm_classifier.predict(test_features)
test_accuracy = accuracy_score(test_labels, test_predictions)
print(f"Test Accuracy: {test_accuracy:.4f}")

print(classification_report(test_labels, test_predictions, target_names=['Negative', 'Positive']))

Extracting features from the training set...
Processing 8530 texts in 134 batches
Extracting features: 100%|██████████| 134/134 [00:35<00:00,  3.83batch/s]
Features extracted: (8530, 768)
Extracting features from the validation set...
Processing 1066 texts in 17 batches
Extracting features: 100%|██████████| 17/17 [00:04<00:00,  3.78batch/s]
Features extracted: (1066, 768)
Extracting features from the test set...
Processing 1066 texts in 17 batches
Extracting features: 100%|██████████| 17/17 [00:04<00:00,  3.92batch/s]
Features extracted: (1066, 768)
Training SVM classifier...
Evaluation on the validation set:
Validation Accuracy: 0.8143
Evaluation on the test set:
Test Accuracy: 0.7946
              precision    recall  f1-score   support

    Negative       0.77      0.84      0.80       533
    Positive       0.82      0.75      0.79       533

    accuracy                           0.79      1066
   macro avg       0.80      0.79      0.79      1066
weighted avg       0.80      0.79      0.79      1066



### Results

- **Validation accuracy**: **81.43%**
- **Test accuracy**: **79.46%**
- **Roughly balanced performance**: F1 is 0.80 for the *negative* class and 0.79 for the *positive* class, although the classifier leans slightly toward predicting *negative* (recall 0.84 vs. 0.75)

#### DistilBERT as a Feature Extractor
- The **[CLS]** token of **DistilBERT** (768 dimensions) provides **sentence representations** informative enough for a simple classifier
- No fine-tuning of the transformer is needed

#### Considerations
- The **frozen features + SVM** approach is **much faster and lighter** than fine-tuning: it needs one forward pass per example and no backpropagation through the transformer (feature extraction for all three splits took under a minute on the GPU used here)
- It only requires a **one-off extraction** of the features → well suited to limited resources
- A **test accuracy of 79.5%** is a **solid** starting point
- It confirms that **pre-trained representations** carry useful signal for sentiment analysis
- It shows that reasonable performance is achievable with **modest computational resources**
- These results are the reference point against which to compare any improvement obtained through fine-tuning or with different architectures.

-----
### Exercise 2: Fine-tuning Distilbert

In this exercise we will fine-tune the Distilbert model to (hopefully) improve sentiment analysis performance.

#### Exercise 2.1: Token Preprocessing

The first thing we need to do is *tokenize* our dataset splits. Our current datasets return a dictionary with *strings*, but we want *input token ids* (i.e. the output of the tokenizer). This is easy enough to do by hand, but the HuggingFace `Dataset` class provides convenient, efficient, and *lazy* methods. See the documentation for [`Dataset.map`](https://huggingface.co/docs/datasets/v3.5.0/en/package_reference/main_classes#datasets.Dataset.map).

**Tip**: Verify that your new datasets are returning for every element: `text`, `label`, `input_ids`, and `attention_mask`.

In [54]:
# Tokenization function: tokenizes the texts and returns input_ids and attention_mask
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,        # Truncate sequences that are too long
        padding=True,           # Pad to the longest sequence in each map() batch
        max_length=512,         # Maximum length
        return_tensors=None     # Return lists, not tensors (lists work better with HuggingFace Datasets)
    )

In [55]:
# Tokenize the datasets
print("Tokenizing the training set...")
train_dataset_tokenized = dataset["train"].map(
    tokenize_function,
    batched=True,              # Process in batches for efficiency
    desc="Tokenizing train"
)

print("Tokenizing the validation set...")
val_dataset_tokenized = dataset["validation"].map(
    tokenize_function,
    batched=True,
    desc="Tokenizing validation"
)

print("Tokenizing the test set...")
test_dataset_tokenized = dataset["test"].map(
    tokenize_function,
    batched=True,
    desc="Tokenizing test"
)

Tokenizing the training set...
Tokenizing the validation set...
Tokenizing the test set...

In [57]:
# Check the results
print("\n" + "="*50)
print("TOKENIZATION CHECK")
print("="*50)

# Inspect one example from the training set
sample = train_dataset_tokenized[0]
print("Available keys:", list(sample.keys()))
print(f"text: {sample['text'][:100]}...")
print(f"label: {sample['label']}")
print(f"input_ids length: {len(sample['input_ids'])}")
print(f"attention_mask length: {len(sample['attention_mask'])}")
print(f"input_ids (first 10): {sample['input_ids'][:10]}")
print(f"attention_mask (first 10): {sample['attention_mask'][:10]}")



TOKENIZATION CHECK
Available keys: ['text', 'label', 'input_ids', 'attention_mask']
text: the rock is destined to be the 21st century's new " conan " and that he's going to make a splash eve...
label: 1
input_ids length: 70
attention_mask length: 70
input_ids (first 10): [101, 1996, 2600, 2003, 16036, 2000, 2022, 1996, 7398, 2301]
attention_mask (first 10): [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


In [58]:
# Check that every dataset has the right columns
required_columns = ['text', 'label', 'input_ids', 'attention_mask']

for split_name, split_data in [
    ("train", train_dataset_tokenized),
    ("validation", val_dataset_tokenized), 
    ("test", test_dataset_tokenized)
]:
    print(f"\n{split_name} dataset:")
    print(f"  Columns: {split_data.column_names}")
    print(f"  Size: {len(split_data)}")
    
    # Check that all the required columns are present
    missing_cols = [col for col in required_columns if col not in split_data.column_names]
    if missing_cols:
        print(f"Missing columns: {missing_cols}")
    else:
        print(f"All required columns present!")


train dataset:
  Columns: ['text', 'label', 'input_ids', 'attention_mask']
  Size: 8530
All required columns present!

validation dataset:
  Columns: ['text', 'label', 'input_ids', 'attention_mask']
  Size: 1066
All required columns present!

test dataset:
  Columns: ['text', 'label', 'input_ids', 'attention_mask']
  Size: 1066
All required columns present!


In [59]:
# Print token statistics
print("\n" + "="*50)
print("TOKENIZATION STATISTICS")
print("="*50)

def compute_token_stats(dataset_split, split_name):
    # Lengths of the stored input_ids, which include any padding added at tokenization time
    token_lengths = [len(example['input_ids']) for example in dataset_split]

    print(f"\n{split_name} set:")
    print(f"  Min length: {min(token_lengths)}")
    print(f"  Max length: {max(token_lengths)}")
    print(f"  Mean length: {sum(token_lengths) / len(token_lengths):.1f}")
    print(f"  Examples with max length (512): {sum(1 for l in token_lengths if l == 512)}")

compute_token_stats(train_dataset_tokenized, "Training")
compute_token_stats(val_dataset_tokenized, "Validation")
compute_token_stats(test_dataset_tokenized, "Test")


TOKENIZATION STATISTICS

Training set:
  Min length: 63
  Max length: 78
  Mean length: 67.3
  Examples with max length (512): 0

Validation set:
  Min length: 54
  Max length: 72
  Mean length: 70.9
  Examples with max length (512): 0

Test set:
  Min length: 52
  Max length: 67
  Mean length: 66.1
  Examples with max length (512): 0
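
Because `tokenize_function` pads inside `map`, the lengths reported above appear to be padded lengths (the longest sequence in each processing batch) rather than the true number of tokens per review. Summing the attention mask recovers the real lengths. A minimal sketch on hypothetical masks, assuming the usual convention that 1 marks a real token and 0 marks padding (`real_lengths`, `length_stats`, and `toy_masks` are illustrative names):

```python
def real_lengths(attention_masks):
    """Number of non-padding tokens per example (mask: 1 = real token, 0 = padding)."""
    return [sum(mask) for mask in attention_masks]

def length_stats(lengths):
    """Min, max, and mean of a non-empty list of lengths."""
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }

# Hypothetical masks for three examples padded to length 6
toy_masks = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
print(length_stats(real_lengths(toy_masks)))
```

On the tokenized splits the same functions could be applied to the `attention_mask` column.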


In [66]:
# Decoding test (check reversibility)
print("\n" + "="*50)
print("DECODING TEST")
print("="*50)

sample_tokens = train_dataset_tokenized[0]['input_ids']
decoded_text = tokenizer.decode(sample_tokens, skip_special_tokens=True)
original_text = train_dataset_tokenized[0]['text']

print(f"Original: {original_text}")
print("--------")
print(f"Decoded:  {decoded_text}")


DECODING TEST
Original: the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
--------
Decoded:  the rock is destined to be the 21st century ' s new " conan " and that he ' s going to make a splash even greater than arnold schwarzenegger, jean - claud van damme or steven segal.


#### Exercise 2.2: Setting up the Model to be Fine-tuned

In this exercise we need to prepare the base Distilbert model for fine-tuning for a *sequence classification task*. This means, at the very least, appending a new, randomly-initialized classification head connected to the `[CLS]` token of the last transformer layer. Luckily, HuggingFace already provides an `AutoModel` for just this type of instantiation: [`AutoModelForSequenceClassification`](https://huggingface.co/transformers/v3.0.2/model_doc/auto.html#automodelforsequenceclassification). You will want to instantiate one of these for fine-tuning.
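
To make "a classification head connected to the `[CLS]` token" concrete, here is a NumPy sketch of what such a head computes. It assumes two linear layers with a ReLU in between (dropout omitted, as at inference time) applied to the first-token vector; the weights are random stand-ins, and the function and variable names are hypothetical rather than the library's.

```python
import numpy as np

def classification_head(cls_vec, w_pre, b_pre, w_out, b_out):
    """Linear -> ReLU -> Linear on the first-token vector (dropout omitted)."""
    hidden = np.maximum(cls_vec @ w_pre + b_pre, 0.0)
    return hidden @ w_out + b_out

rng = np.random.default_rng(0)
hidden_size, num_labels = 768, 2

# Randomly initialized head, mirroring the fact that the new head starts untrained
w_pre = rng.normal(scale=0.02, size=(hidden_size, hidden_size))
b_pre = np.zeros(hidden_size)
w_out = rng.normal(scale=0.02, size=(hidden_size, num_labels))
b_out = np.zeros(num_labels)

cls_batch = rng.normal(size=(4, hidden_size))  # stand-in for four [CLS] vectors
logits = classification_head(cls_batch, w_pre, b_pre, w_out, b_out)
print(logits.shape)  # (4, 2)
```

Each row of `logits` holds one unnormalized score per class; the predicted label is the index of the larger score.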

In [68]:
from transformers import (
    AutoModelForSequenceClassification, 
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer, 
    TrainingArguments
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np

# 1. Device setup
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")

# 2. Load the model for sequence classification
print("\nLoading model for sequence classification...")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,  # Binary classification (positive/negative)
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1}
)

# Move the model to the selected device (GPU if available)
model = model.to(device)

# Tokenizer (if not already loaded)
if 'tokenizer' not in globals():
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

print(f"Model loaded: {model.__class__.__name__}")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Model device: {next(model.parameters()).device}")

# 3. Print the model architecture
print("\n" + "="*60)
print("MODEL ARCHITECTURE")
print("="*60)
print(model)

print("\n" + "="*60)
print("ARCHITECTURE DETAILS")
print("="*60)

# Show the top-level layers
for name, module in model.named_children():
    print(f"{name}: {module.__class__.__name__}")
    if hasattr(module, 'config'):
        if hasattr(module.config, 'hidden_size'):
            print(f"  Hidden size: {module.config.hidden_size}")
        if hasattr(module.config, 'n_layers'):
            print(f"  Layers: {module.config.n_layers}")
        if hasattr(module.config, 'n_heads'):
            print(f"  Attention heads: {module.config.n_heads}")

# Details of the classifier head
if hasattr(model, 'classifier'):
    print(f"\nClassifier head:")
    print(f"  Input features: {model.classifier.in_features}")
    print(f"  Output features: {model.classifier.out_features}")
    print(f"  Classifier parameters: {sum(p.numel() for p in model.classifier.parameters()):,}")

# Trainable vs. frozen parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTrainable parameters: {trainable_params:,}")
print(f"Total parameters: {total_params:,}")
print(f"Trainable percentage: {100 * trainable_params / total_params:.1f}%")

Using device: cuda
GPU: NVIDIA GeForce RTX 3060
GPU Memory: 12.9GB

Loading model for sequence classification...


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model loaded: DistilBertForSequenceClassification
Number of parameters: 66,955,010
Model device: cuda:0

MODEL ARCHITECTURE
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_lay

### Considerations
- DistilBERT base (66.4M parameters) + classification head (592,130 additional parameters, for 66,955,010 in total)
- 6 transformer layers, 12 attention heads, hidden size 768
- Classification head: pre_classifier (768→768, 590,592 parameters) + classifier (768→2, 1,538 parameters) + dropout

Difference from the baseline: instead of using fixed features + SVM, all 67M parameters are now updated specifically for the sentiment analysis task.
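
The parameter counts printed above can be cross-checked by hand. A linear layer has `in_features * out_features` weights plus `out_features` biases, so the two head layers account exactly for the difference between the bare model and the classification model (`linear_params` is a hypothetical helper; dropout has no parameters):

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

hidden_size, num_labels = 768, 2
pre_classifier = linear_params(hidden_size, hidden_size)   # 590,592
classifier = linear_params(hidden_size, num_labels)        # 1,538
head_total = pre_classifier + classifier                   # 592,130

base_model = 66_362_880  # parameter count printed for the bare DistilBERT model
print(f"{head_total:,} head parameters, {base_model + head_total:,} in total")
# 592,130 head parameters, 66,955,010 in total
```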

#### Exercise 2.3: Fine-tuning Distilbert

Finally. In this exercise you should use a HuggingFace [`Trainer`](https://huggingface.co/docs/transformers/main/en/trainer) to fine-tune your model on the Rotten Tomatoes training split. Setting up the trainer will involve (at least):


1. Instantiating a [`DataCollatorWithPadding`](https://huggingface.co/docs/transformers/en/main_classes/data_collator) object which is what *actually* does your batch construction (by padding all sequences to the same length).
2. Writing an *evaluation function* that will measure the classification accuracy. This function takes a single argument which is a tuple containing `(logits, labels)` which you should use to compute classification accuracy (and maybe other metrics like F1 score, precision, recall) and return a `dict` with these metrics.  
3. Instantiating a [`TrainingArguments`](https://huggingface.co/docs/transformers/v4.51.1/en/main_classes/trainer#transformers.TrainingArguments) object using some reasonable defaults.
4. Instantiating a `Trainer` object using your train and validation splits, your data collator, and function to compute performance metrics.
5. Calling `trainer.train()`, waiting, waiting some more, and then calling `trainer.evaluate()` to see how it did.
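
To see what the padding in step 1 amounts to, here is a plain-Python sketch of padding to the longest sequence in a batch. It is not the library's implementation: `pad_batch` is a hypothetical helper, the token ids are made up, and `pad_id=0` is assumed from the `padding_idx=0` shown in the printed embedding layer.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence of token ids to the longest one in this batch."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical token ids; 101 and 102 stand for [CLS] and [SEP]
toy_batch = [[101, 2307, 3185, 102], [101, 2919, 102]]
padded = pad_batch(toy_batch)
print(padded["input_ids"])       # [[101, 2307, 3185, 102], [101, 2919, 102, 0]]
print(padded["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Padding per batch at collation time keeps each batch only as wide as its longest sequence, which is why padding the whole dataset in advance is not required.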

**Tip**: When prototyping this laboratory I discovered the HuggingFace [Evaluate library](https://huggingface.co/docs/evaluate/en/index) which provides evaluation metrics. However, I found that its insufferable layers of abstraction get in the way of actually computing metrics. I suggest just using the Scikit-learn metrics...
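
For reference, the metrics themselves are simple enough to compute directly from `(logits, labels)`. The sketch below uses only NumPy and reports precision, recall, and F1 for the positive class, so its numbers will differ from Scikit-learn's averaged variants; `binary_metrics` and the toy inputs are hypothetical.

```python
import numpy as np

def binary_metrics(logits, labels, positive=1):
    """Accuracy plus positive-class precision, recall, and F1 from raw logits."""
    logits, labels = np.asarray(logits), np.asarray(labels)
    preds = logits.argmax(axis=1)  # predicted class = index of the largest logit
    tp = int(np.sum((preds == positive) & (labels == positive)))
    fp = int(np.sum((preds == positive) & (labels != positive)))
    fn = int(np.sum((preds != positive) & (labels == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": float(np.mean(preds == labels)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical logits for four examples and their true labels
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 1, 1]
print(binary_metrics(toy_logits, toy_labels))
```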

In [70]:
# 2. Data collator: builds each batch, padding sequences to a common length
data_collator = DataCollatorWithPadding(
    tokenizer=tokenizer,
    padding=True,
    return_tensors="pt"
)

# 3. Evaluation function
def compute_metrics(eval_pred):
    """
    Compute classification metrics.
    eval_pred is a tuple (predictions, labels).
    """
    predictions, labels = eval_pred

    # The predictions are logits; pick the class with the highest score
    predicted_labels = np.argmax(predictions, axis=1)

    # Compute the metrics with scikit-learn
    accuracy = accuracy_score(labels, predicted_labels)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predicted_labels, average='weighted'
    )

    return {
        'accuracy': accuracy,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

# 4. Training arguments
training_args = TrainingArguments(
    output_dir='./distilbert-finetuned-sentiment',
    num_train_epochs=3,                    # Number of epochs
    per_device_train_batch_size=16,        # Batch size for training
    per_device_eval_batch_size=64,         # Batch size for evaluation
    warmup_steps=500,                      # Learning-rate warmup steps
    weight_decay=0.01,                     # Weight decay for regularization
    logging_dir='./logs',                  # Directory for the logs
    logging_steps=100,                     # Log every 100 steps
    eval_strategy="epoch",                 # Evaluate at the end of every epoch (renamed from evaluation_strategy)
    save_strategy="epoch",                 # Save at the end of every epoch
    load_best_model_at_end=True,           # Reload the best checkpoint when training ends
    metric_for_best_model="accuracy",      # Metric used to pick the best checkpoint
    greater_is_better=True,                # Higher accuracy = better
    save_total_limit=2,                    # Keep at most 2 checkpoints on disk
    seed=42,                               # Seed for reproducibility
    dataloader_num_workers=2,              # Workers for data loading
    report_to=[],                          # Disable reporting (wandb, etc.); empty list instead of None
)

print("Training arguments configured:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Train batch size: {training_args.per_device_train_batch_size}")
print(f"  Eval batch size: {training_args.per_device_eval_batch_size}")

# 5. Trainer setup
print("Setting up the Trainer...")
trainer = Trainer(
    model=model,                           # The model to fine-tune
    args=training_args,                    # Training arguments
    train_dataset=train_dataset_tokenized, # Tokenized training set
    eval_dataset=val_dataset_tokenized,    # Tokenized validation set
    tokenizer=tokenizer,                   # Tokenizer (recent Transformers releases warn that this argument is deprecated)
    data_collator=data_collator,           # Data collator for padding
    compute_metrics=compute_metrics,       # Function that computes the metrics
)

print("Trainer set up successfully!")

# 6. Training
print("\n" + "="*60)
print("STARTING FINE-TUNING")
print("="*60)

# Information before training.
# The step counts use floor division, so the last partial batch is not counted.
steps_per_epoch = len(train_dataset_tokenized) // training_args.per_device_train_batch_size
print(f"Training examples: {len(train_dataset_tokenized)}")
print(f"Validation examples: {len(val_dataset_tokenized)}")
print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total training steps: {training_args.num_train_epochs * steps_per_epoch}")

# Start training
print("\nStarting training...")
trainer.train()

print("\n" + "="*60)
print("TRAINING COMPLETE")
print("="*60)

# 7. Evaluation
print("Final evaluation on the validation set...")
eval_results = trainer.evaluate()

print("Validation results:")
for key, value in eval_results.items():
    if isinstance(value, float):
        print(f"  {key}: {value:.4f}")
    else:
        print(f"  {key}: {value}")

# 8. Test set evaluation
print("\nEvaluating on the test set...")
test_results = trainer.evaluate(eval_dataset=test_dataset_tokenized)

print("Test results:")
for key, value in test_results.items():
    if isinstance(value, float):
        print(f"  {key}: {value:.4f}")
    else:
        print(f"  {key}: {value}")

# 9. Comparison with the baseline
print("\n" + "="*60)
print("COMPARISON WITH BASELINE")
print("="*60)
print("Baseline (DistilBERT + SVM):")
print("  Test Accuracy: 0.7946")
print("\nFine-tuned DistilBERT:")

# Format the accuracy only if it is present (a float format on 'N/A' would raise)
test_accuracy = test_results.get('eval_accuracy')
if test_accuracy is not None:
    print(f"  Test Accuracy: {test_accuracy:.4f}")
    improvement = test_accuracy - 0.7946
    print(f"\nImprovement: {improvement:+.4f} ({improvement*100:+.2f} pp)")
else:
    print("  Test Accuracy: N/A")

# 10. Save the final model
print("\nSaving the fine-tuned model...")
trainer.save_model("./distilbert-finetuned-final")
tokenizer.save_pretrained("./distilbert-finetuned-final")
print("Model saved to ./distilbert-finetuned-final")

Training arguments configured:
  Epochs: 3
  Train batch size: 16
  Eval batch size: 64
Setting up the Trainer...
Trainer set up successfully!

STARTING FINE-TUNING
Training examples: 8530
Validation examples: 1066
Steps per epoch: 533
Total training steps: 1599

Starting training...

Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3982,0.371052,0.842402,0.842346,0.842884,0.842402
2,0.2691,0.413695,0.837711,0.8377,0.837807,0.837711
3,0.0912,0.617461,0.841463,0.841413,0.841898,0.841463

TRAINING COMPLETE
Final evaluation on the validation set...

Validation results:
  eval_loss: 0.3711
  eval_accuracy: 0.8424
  eval_f1: 0.8423
  eval_precision: 0.8429
  eval_recall: 0.8424
  eval_runtime: 9.5888
  eval_samples_per_second: 111.1720
  eval_steps_per_second: 1.7730
  epoch: 3.0000

Evaluating on the test set...

Test results:
  eval_loss: 0.3988
  eval_accuracy: 0.8358
  eval_f1: 0.8358
  eval_precision: 0.8359
  eval_recall: 0.8358
  eval_runtime: 9.3502
  eval_samples_per_second: 114.0080
  eval_steps_per_second: 1.8180
  epoch: 3.0000

COMPARISON WITH BASELINE
Baseline (DistilBERT + SVM):
  Test Accuracy: 0.7946

Fine-tuned DistilBERT:
  Test Accuracy: 0.8358

Improvement: +0.0412 (+4.12 pp)

Saving the fine-tuned model...
Model saved to ./distilbert-finetuned-final


### Results

| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1     | 0.398         | 0.371           | 84.24%              |
| 2     | 0.269         | 0.414           | 83.77%              |
| 3     | 0.091         | 0.617           | 84.15%              |

- **Final accuracy on the test set**: **83.58%**  
- **Loss on the test set**: **0.3988**  
- **F1 score on the test set**: **0.8358**

---

### Conclusions

- The **fine-tuned** model **clearly outperformed** the baseline obtained with the SVM on frozen DistilBERT features  
  (**79.46% → 83.58%**, **+4.12 percentage points**)

- **End-to-end fine-tuning** let the model adapt all of its **67M parameters** to the specific task of **sentiment analysis**, yielding **more effective representations** than the frozen-features approach

- The **validation loss rises**  
  (*0.371 → 0.414 → 0.617*)  
  while the **training loss collapses**  
  (*0.398 → 0.091*),  
  a clear sign of **overfitting** from the second epoch onward  
  → validation accuracy also peaks at epoch 1, so **a single epoch** already appears sufficient in this run; the reported results come from that checkpoint, which `load_best_model_at_end=True` restored (its `eval_loss` of 0.3711 matches epoch 1)

- The **balanced metrics** (*precision/recall ~83.6%*) suggest that the model is **not biased** toward one class. These are weighted averages, however, and weighted recall always equals accuracy, so per-class metrics would be needed to confirm this
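
To check class balance more directly, one can look at per-class precision and recall instead of weighted averages. The sketch below uses made-up toy labels (not the Rotten Tomatoes predictions) and a hypothetical helper `per_class_report`; it also shows that support-weighted recall is the same number as accuracy, so on its own it cannot reveal a bias toward one class.

```python
def per_class_report(labels, preds, classes=(0, 1)):
    """Return accuracy, per-class precision/recall, and support-weighted recall."""
    n = len(labels)
    accuracy = sum(l == p for l, p in zip(labels, preds)) / n
    report, weighted_recall = {}, 0.0
    for c in classes:
        tp = sum(l == c and p == c for l, p in zip(labels, preds))
        support = sum(l == c for l in labels)      # true examples of class c
        predicted = sum(p == c for p in preds)     # examples predicted as class c
        recall = tp / support if support else 0.0
        precision = tp / predicted if predicted else 0.0
        report[c] = {"precision": precision, "recall": recall}
        weighted_recall += (support / n) * recall
    return accuracy, report, weighted_recall

# Toy data: a classifier that favours class 1
labels = [0, 0, 0, 0, 1, 1, 1, 1]
preds = [0, 0, 1, 1, 1, 1, 1, 1]
accuracy, report, weighted_recall = per_class_report(labels, preds)
print(accuracy, weighted_recall)
print(report)
```

With scikit-learn, the same per-class view should be available by passing `average=None` to `precision_recall_fscore_support`.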

In short: fine-tuning DistilBERT gives a **clear gain in accuracy** and **better adaptation to the task**, at the price of a **higher computational cost**
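
The overfitting pattern can also be read off the logged metrics programmatically. The following is a small sketch in plain Python; the numbers are copied from the training log above, while the `history` structure and the helper `first_overfit_epoch` are assumptions made for illustration.

```python
# Per-epoch metrics copied from the fine-tuning log
history = [
    {"epoch": 1, "train_loss": 0.3982, "val_loss": 0.371052, "val_accuracy": 0.842402},
    {"epoch": 2, "train_loss": 0.2691, "val_loss": 0.413695, "val_accuracy": 0.837711},
    {"epoch": 3, "train_loss": 0.0912, "val_loss": 0.617461, "val_accuracy": 0.841463},
]

def first_overfit_epoch(history):
    """Return the first epoch whose validation loss rose while training loss fell."""
    for prev, cur in zip(history, history[1:]):
        if cur["val_loss"] > prev["val_loss"] and cur["train_loss"] < prev["train_loss"]:
            return cur["epoch"]
    return None

best_by_loss = min(history, key=lambda row: row["val_loss"])["epoch"]
best_by_accuracy = max(history, key=lambda row: row["val_accuracy"])["epoch"]
overfit_from = first_overfit_epoch(history)
print(best_by_loss, best_by_accuracy, overfit_from)
```

Transformers also ships an `EarlyStoppingCallback` that can be passed to the `Trainer` to stop automatically once the monitored metric stops improving; whether that is worthwhile for a three-epoch run is a judgment call.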

-----
### Exercise 3: Choose at Least One


#### Exercise 3.1: Efficient Fine-tuning for Sentiment Analysis (easy)

In Exercise 2 we fine-tuned the *entire* DistilBERT model on Rotten Tomatoes. This is expensive, even for a small model. Find an *efficient* way to fine-tune DistilBERT on the Rotten Tomatoes dataset (or some other dataset).

**Hint**: You could check out the [HuggingFace PEFT library](https://huggingface.co/docs/peft/en/index) for some state-of-the-art approaches that should "just work". How else might you go about making fine-tuning more efficient without having to change your training pipeline from above?

In [72]:
!pip install peft

Collecting peft
  Downloading peft-0.15.2-py3-none-any.whl.metadata (13 kB)
Downloading peft-0.15.2-py3-none-any.whl (411 kB)
Installing collected packages: peft
Successfully installed peft-0.15.2


In [74]:
# Exercise 3.1: Efficient fine-tuning with LoRA (PEFT)

from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load the base model (same as before)
print("\nLoading base model")
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1}
).to(device)

# LoRA configuration
print("Configuring LoRA")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,           # Sequence classification task
    inference_mode=False,                 # Training mode
    r=16,                                 # Rank of the LoRA decomposition (common choices: 8-16)
    lora_alpha=32,                        # Scaling parameter (often 2x r)
    lora_dropout=0.1,                     # Dropout on the LoRA layers
    target_modules=["q_lin", "v_lin"],    # Query and value projections in DistilBERT's attention
    bias="none",                          # Do not adapt the biases
)

Using device: cuda

Loading base model

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Configuring LoRA
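
To make the roles of `r` and `lora_alpha` concrete, here is a minimal sketch of the LoRA update for a single linear layer in plain Python. It is not PEFT's implementation: the tiny dimensions, the random weights, and the helpers `matvec` and `lora_forward` are assumptions chosen for readability. The frozen weight `W` is left untouched and a low-rank product `B A`, scaled by `lora_alpha / r`, is added on top.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, r, lora_alpha):
    """y = W x + (lora_alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]

random.seed(0)
d, r, lora_alpha = 8, 2, 4   # toy sizes; the scaling is 2.0, as with r=16 and lora_alpha=32
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen, d x d
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # trainable, r x d
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, starts at zero
x = [random.gauss(0, 1) for _ in range(d)]

# With B at zero the adapted layer reproduces the frozen layer exactly
y_start = lora_forward(W, A, B, x, r, lora_alpha)
print(y_start == matvec(W, x))

# After a (pretend) update to B, only the low-rank term changes the output
B[0][0] = 1.0
y_updated = lora_forward(W, A, B, x, r, lora_alpha)
```

Starting `B` at zero means training begins from the pre-trained model's behaviour, and the number of new weights per layer is `r * d + d * r` rather than `d * d`.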


In [77]:
# Apply LoRA to the model
print("Applying LoRA to the model")
lora_model = get_peft_model(base_model, lora_config)

# Parameter analysis using the PEFT helper
print("\n" + "="*60)
print("PARAMETER ANALYSIS - EFFICIENCY COMPARISON")
print("="*60)

print("FULL FINE-TUNING:")
# Note: get_peft_model injects the adapters into base_model in place, so this
# count is taken on the already-wrapped model and includes the adapter weights.
full_total = sum(p.numel() for p in base_model.parameters())
print(f"  Total parameters: {full_total:,}")
print(f"  Trainable parameters: {full_total:,} (100%)")

print("\nLORA FINE-TUNING:")
lora_model.print_trainable_parameters()

# Compute the efficiency figures by hand
lora_trainable = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
efficiency_gain = full_total / lora_trainable
# Share of parameters left frozen (a proxy, not a measured memory saving)
memory_reduction = (full_total - lora_trainable) / full_total * 100

print("\nEFFICIENCY GAINS:")
print(f"  Trainable-parameter reduction: {efficiency_gain:.1f}x")
print(f"  Memory savings: {memory_reduction:.1f}%")

Applying LoRA to the model

PARAMETER ANALYSIS - EFFICIENCY COMPARISON
FULL FINE-TUNING:
  Total parameters: 67,842,052
  Trainable parameters: 67,842,052 (100%)

LORA FINE-TUNING:
trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075

EFFICIENCY GAINS:
  Trainable-parameter reduction: 76.5x
  Memory savings: 98.7%
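
The 887,042 trainable parameters reported by `print_trainable_parameters()` can be reproduced with a little arithmetic. The sketch below assumes DistilBERT's shapes as described earlier (6 layers, hidden size 768) and assumes, as the numbers suggest, that PEFT keeps the classification head (`pre_classifier` and `classifier`) trainable for the `SEQ_CLS` task in addition to the LoRA matrices.

```python
hidden = 768        # DistilBERT hidden size
layers = 6          # transformer layers
r = 16              # LoRA rank from lora_config
targets = 2         # q_lin and v_lin in each layer
num_labels = 2

# Each adapted hidden x hidden linear layer gets A (r x hidden) and B (hidden x r)
lora_params = layers * targets * (r * hidden + hidden * r)

# Classification head (weights + biases), assumed to stay trainable
pre_classifier = hidden * hidden + hidden
classifier = hidden * num_labels + num_labels
head_params = pre_classifier + classifier

trainable = lora_params + head_params
print(lora_params, head_params, trainable)  # 294912 592130 887042
```

Under these assumptions only about a third of the trainable parameters are LoRA weights; the rest belong to the freshly initialized classification head, which has to be trained in any case.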


In [83]:
# LoRA training setup - version compatible with recent PEFT
from transformers import DataCollatorWithPadding, TrainingArguments, Trainer
from sklearn.metrics import accuracy_score
import numpy as np

# Custom Trainer for PEFT compatibility
class LoRACompatibleTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        """
        Override that drops num_items_in_batch (swallowed by **kwargs),
        which caused problems with PEFT.
        """
        # Forward pass WITHOUT num_items_in_batch
        outputs = model(**inputs)

        # The model computes the loss itself when the batch contains labels
        loss = outputs.loss

        return (loss, outputs) if return_outputs else loss

# Simple data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Simplified compute_metrics
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, predictions)}

# Simplified training arguments
training_args = TrainingArguments(
    output_dir="./lora-results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-4,  # Higher learning rate with LoRA
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    logging_dir="./lora-logs",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    report_to=[],
    seed=42,
)

# Use the custom Trainer instead of the standard one
trainer = LoRACompatibleTrainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset_tokenized,
    eval_dataset=val_dataset_tokenized,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

print("Starting LoRA training with the custom Trainer...")
trainer.train()

print("Training complete!")
eval_results = trainer.evaluate()
print("Validation results:", eval_results)

test_results = trainer.evaluate(eval_dataset=test_dataset_tokenized)
print("Test results:", test_results)

# Final comparison
print("\n" + "="*60)
print("FINAL COMPARISON: FULL vs LORA FINE-TUNING")
print("="*60)
print("Full Fine-tuning:")
print("  Test Accuracy: 0.8358")

print("\nLoRA Fine-tuning:")
# Format the accuracy only if it is present (a float format on 'N/A' would raise)
lora_accuracy = test_results.get('eval_accuracy')
if lora_accuracy is not None:
    print(f"  Test Accuracy: {lora_accuracy:.4f}")
    print(f"  Accuracy difference: {lora_accuracy - 0.8358:+.4f}")
else:
    print("  Test Accuracy: N/A")

Starting LoRA training with the custom Trainer...

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4158,0.439776,0.82364
2,0.3311,0.36282,0.840525
3,0.2743,0.373496,0.841463

Training complete!

Validation results: {'eval_loss': 0.37349554896354675, 'eval_accuracy': 0.8414634146341463, 'eval_runtime': 1.4363, 'eval_samples_per_second': 742.164, 'eval_steps_per_second': 46.646, 'epoch': 3.0}
Test results: {'eval_loss': 0.43192407488822937, 'eval_accuracy': 0.8292682926829268, 'eval_runtime': 1.4608, 'eval_samples_per_second': 729.724, 'eval_steps_per_second': 45.864, 'epoch': 3.0}

FINAL COMPARISON: FULL vs LORA FINE-TUNING
Full Fine-tuning:
  Test Accuracy: 0.8358

LoRA Fine-tuning:
  Test Accuracy: 0.8293
  Accuracy difference: -0.0065


## Final Remarks

### Results

| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1     | 0.416         | 0.440           | 82.36%              |
| 2     | 0.331         | 0.363           | 84.05%              |
| 3     | 0.274         | 0.373           | 84.15%              |

- **Final accuracy on the test set**: **82.93%**  
- **Final loss on the test set**: **0.4319**  
- **Training time**: **1 min 28 s**

---

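### LoRA Efficiency

- **Trainable parameters**: **887,042** out of **67,842,052** total (**1.31%**)  
- **Parameter reduction**: **76.5×** fewer parameters to train  
- **Memory savings**: **98.7%** (the share of parameters left frozen; this bounds the savings in gradient and optimizer-state memory and is not a measured reduction in total memory)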

---

### Performance Comparison (SVM vs Full Fine-tuning vs LoRA Fine-tuning)

| Approach             | Training Type                       | Test Accuracy | Trainable Parameters  | Training Time    | Comment                                                     |
|----------------------|-------------------------------------|---------------|-----------------------|------------------|-------------------------------------------------------------|
| Baseline (SVM)       | Feature extraction + classifier     | 79.46%        | 0 (DistilBERT frozen) | ~40 seconds      | Fast but limited; a good starting point                     |
| Full Fine-tuning     | End-to-end on the whole model       | 83.58%        | 67.8M (100%)          | ~3 minutes       | Best performance, but computationally expensive             |
| **LoRA Fine-tuning** | Parameter-efficient with adapters   | **82.93%**    | **0.9M (1.31%)**      | **~1.5 minutes** | **Good compromise**: competitive performance at reduced cost |

---

### Conclusions

- **LoRA is highly effective**: it reaches **99.2%** of the full fine-tuning accuracy while training only **1.31%** of the parameters
- **Good performance/efficiency trade-off**: the loss in accuracy (**−0.65 percentage points** relative to full fine-tuning) is **small** compared with the savings in computation
- **Faster training**: roughly **2× faster** (~1.5 vs ~3 minutes), with **98.7%** fewer trainable parameters
- **Practical approach**: well suited to **production** applications with resource constraints that still need **competitive performance**
- **Robust implementation**: the **custom Trainer** worked around the compatibility problems between recent versions of `PEFT` and `Transformers`, which shows the **flexibility of the approach**

**LoRA** proves to be a **key technique** for **Parameter-Efficient Fine-Tuning**, allowing **transformer** models to be adapted efficiently to specific tasks at **low computational cost**.

#### Exercise 3.2: Fine-tuning a CLIP Model (harder)

Use a (small) CLIP model like [`openai/clip-vit-base-patch16`](https://huggingface.co/openai/clip-vit-base-patch16) and evaluate its zero-shot performance on a small image classification dataset like ImageNette or TinyImageNet. Fine-tune (using a parameter-efficient method!) the CLIP model to see how much improvement you can squeeze out of it.

**Note**: There are several ways to adapt the CLIP model; you could fine-tune the image encoder, the text encoder, or both. Or, you could experiment with prompt learning.

**Tip**: CLIP probably already works very well on ImageNet and ImageNet-like images. For extra fun, look for an image classification dataset with different image types (e.g. *sketches*).
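
As a starting point for the zero-shot evaluation, the scoring step can be sketched independently of any model. The example below is a toy in plain Python: the three-dimensional embeddings, the class names, the `logit_scale` of 100, and the helper `zero_shot_classify` are all assumptions for illustration. In a real pipeline the vectors would come from CLIP's image encoder and from its text encoder applied to prompts such as "a photo of a {label}".

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_classify(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each class prompt."""
    image_emb = normalize(image_emb)
    logits = []
    for emb in text_embs:
        emb = normalize(emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(image_emb, emb)))
    top = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs

# Hypothetical class prompts and hand-made embeddings
class_names = ["dog", "church", "parachute"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
image_emb = [0.8, 0.2, 0.1]

pred, probs = zero_shot_classify(image_emb, text_embs)
print(class_names[pred])
```

Zero-shot accuracy is then the fraction of images whose predicted class matches the label, which gives the baseline any parameter-efficient fine-tuning should improve on.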

In [4]:
# Your code here.

#### Exercise 3.3: Choose your Own Adventure

There are a *ton* of interesting and fun models on the HuggingFace hub. Pick one that does something interesting and adapt it in some way to a new task. Or, combine two or more models into something more interesting or fun. The sky's the limit.

**Note**: Reach out to me by email or on the Discord if you are unsure about anything.

In [5]:
# Your code here.