# Contronto tra il fine-tuning completo e il fine-tuning con LoRA

#### Configurazioni

Installazione della libreria `loralib` per implementare la Low-Rank Adaptation.

In [1]:
!pip install loralib

Collecting loralib
  Downloading loralib-0.1.2-py3-none-any.whl.metadata (15 kB)
Downloading loralib-0.1.2-py3-none-any.whl (10 kB)
Installing collected packages: loralib
Successfully installed loralib-0.1.2


Carico i file `lora_utilis.py` e `models.py`.

In [2]:
import sys
sys.path.append('/kaggle/input/lora-utils/')

Importo i moduli necessari.

In [3]:
import os
import random
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import transformers
import lora_utils, models

Impostazione del seme casuale per la riproducibilità.

In [4]:
seed_value = 42

os.environ['PYTHONHASHSEED'] = str(seed_value)
random.seed(seed_value)
np.random.seed(seed_value)
torch.manual_seed(seed_value)

# Imposto il seme casuale anche per i calcoli CUDA
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)  
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

## Sentiment Analisys

### 1. Ottenimento dei dati e preprocessing

Confronto il fine-tuning completo e quello che usa LoRA innanzitutto sul task di Sentiment Analysis (classificazione binaria) utilizzando il dataset SST-2.

SST-2 contiene esempi che consistono in frasi tratte da recensioni di film le cui etichette sono 1 se la recensione positiva, 0 altrimenti.

Utilizzo la libreria `datasets` di Hugging Face per caricare il dataset e dividerlo in training set, validation set e test set.

In [5]:
from datasets import load_dataset
from sklearn.model_selection import train_test_split

dataset = load_dataset('sst2')


data = dataset['train'].shuffle(seed=42)

temp_data, test_data, temp_labels, test_labels = train_test_split(data['sentence'], 
                                                  data['label'], 
                                                  test_size=1024, 
                                                  random_state=42,
                                                  stratify=data['label'])

train_data, val_data, train_labels, val_labels = train_test_split(data['sentence'], 
                                                  data['label'],
                                                  train_size=10000,
                                                  test_size=1000, 
                                                  random_state=42,
                                                  stratify=data['label'])

from collections import Counter

print("Dimensioni dei set:")
print(f"Train: {len(train_data)}")
print(f"Validation: {len(val_data)}")
print(f"Test: {len(test_data)}")

# Verifica distribuzione delle etichette
print("\nDistribuzione delle etichette:")
print(f"Train: {Counter(train_labels)}")
print(f"Validation: {Counter(val_labels)}")
print(f"Test: {Counter(test_labels)}")


README.md:   0%|          | 0.00/5.27k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/3.11M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/72.8k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/148k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/67349 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/872 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1821 [00:00<?, ? examples/s]

Dimensioni dei set:
Train: 10000
Validation: 1000
Test: 1024

Distribuzione delle etichette:
Train: Counter({1: 5578, 0: 4422})
Validation: Counter({1: 558, 0: 442})
Test: Counter({1: 571, 0: 453})


Creo una classe Dataset personalizzata in cui viene effettuata la tokenizzaione delle recensioni e la conversione dei dati in tensori.

In [6]:
from torch.utils.data import Dataset

class IMDBDataset(Dataset):

    def __init__(self, sentences, labels, tokenizer, max_len):
        self.sentences = sentences
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len
    
    def __len__(self):
        return len(self.sentences)
    
    def __getitem__(self,index):
        sentence = self.sentences[index]
        label = self.labels[index]
        
        encoding = self.tokenizer.encode_plus(
            sentence,
            add_special_tokens=True,
            max_length=self.max_len,
            truncation=True,
            return_token_type_ids=True,
            padding="max_length",
            return_attention_mask=True,
            return_tensors='pt')
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'token_type_ids': encoding["token_type_ids"].flatten(),
            'labels': torch.tensor(label, dtype=torch.float)
            }

Inizializzo il Tokenizer BERT per tokenizzare le frasi e creo i dataset personalizzati.

In [7]:
from transformers import BertTokenizer
from torch.utils.data import DataLoader

MAX_SEQ_LEN = 128

# Inizializza il Tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

#Ottieni i dataset
training_data = IMDBDataset(sentences = train_data,
                           labels = train_labels,
                           tokenizer = tokenizer,
                           max_len = MAX_SEQ_LEN)

validation_data = IMDBDataset(sentences = val_data,
                           labels = val_labels,
                           tokenizer = tokenizer,
                           max_len = MAX_SEQ_LEN)

test_data = IMDBDataset(sentences = test_data,
                           labels = test_labels,
                           tokenizer = tokenizer,
                           max_len = MAX_SEQ_LEN)

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]



### 2. Configurazione dei modelli

In [8]:
from transformers import BertModel

class BERTClassifier(nn.Module):
    
    def __init__(self, lora: bool = False, r: int = 16):
        super(BERTClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = torch.nn.Dropout(p=0.3)
        self.linear = nn.Linear(self.bert.config.hidden_size, 1)

        if lora:
            print("Adding LoRA to BERT")
            lora_utils.add_lora_to_bert(self.bert, r=r)
            lora_utils.mark_only_lora_as_trainable(self.bert)

    
    def forward(self, input_ids, attention_mask, token_type_ids):
        output_bert = self.bert(
            input_ids, 
            attention_mask=attention_mask, 
            token_type_ids=token_type_ids
        )
        output_dropout = self.dropout(output_bert.pooler_output)
        output = self.linear(output_dropout)
        return output

In [9]:
# Device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

In [10]:
# MODELLO STANDARD
full_model = BERTClassifier(lora=False)
full_model.to(device)

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

BERTClassifier(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwis

In [11]:
# MODELLO CON LORA
lora_model = BERTClassifier(lora=True, r=16)
lora_model.to(device)

Adding LoRA to BERT


BERTClassifier(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=False)
              (key): Linear(in_features=768, out_features=768, bias=False)
              (value): Linear(in_features=768, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

### 3. Addestramento dei modelli

Definisco una serie di funzioni per l'addestramento e la valutazione di un modello.

In [12]:
import time
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


# Funzione di training e valutazione
def train_and_evaluate_model(model, model_name, train_loader, val_loader, criterion, optimizer, scheduler, device, epochs=4):
    history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
    best_accuracy = 0
    
    start_time = time.time()

    for epoch in range(epochs):
        print(f"\nEpoch {epoch + 1}/{epochs}")
        print('-' * 10)

        # Training
        train_loss, train_acc = train_model(model, train_loader, criterion, optimizer, scheduler, device)
        print(f"Train loss: {train_loss:.4f}, Train accuracy: {train_acc:.4f}")

        # Valutazione
        val_loss, val_acc, val_f1, val_auc = eval_model(model, val_loader, criterion, device)
        print(f"Validation loss: {val_loss:.4f}, Validation accuracy: {val_acc:.4f}")

        # Salvataggio del modello migliore
        if val_acc > best_accuracy:
            torch.save(model.state_dict(),  f"sa_{model_name}_best_model_state.bin")
            best_accuracy = val_acc

        # Salva le metriche
        history["train_loss"].append(train_loss)
        history["train_acc"].append(train_acc)
        history["val_loss"].append(val_loss)
        history["val_acc"].append(val_acc)
    
    end_time = time.time()
    total_training_time = end_time - start_time
    
    return history, total_training_time

In [13]:
# Funzione di training 
def train_model(model, data_loader, criterion, optimizer, scheduler, device):
  
    model = model.train() # imposto il modello in modalità di aggiornamento
    
    total_loss = 0
    all_preds = []
    all_labels = []

    for batch in data_loader:
        
        # Sposto i dati sul device
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        token_type_ids = batch['token_type_ids'].to(device)
        labels = batch["labels"].unsqueeze(1).to(device)

        #  --- Forward pass ---
        
        optimizer.zero_grad()
 
        outputs = model(
            input_ids = input_ids,
            attention_mask = attention_mask,
            token_type_ids = token_type_ids
        )
        
        loss = criterion(outputs, labels)


        # --- Backward pass ---
        
        
        loss.backward()

        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        optimizer.step()
        scheduler.step()

        total_loss += loss.item()
        
        preds = (torch.sigmoid(outputs) > 0.5).long()    # trasformo i dati grezzi in etichette binarie
        
        all_preds.extend(preds.detach().cpu().numpy())
        all_labels.extend(labels.detach().cpu().numpy())

    avg_loss = total_loss / len(data_loader)
    accuracy = accuracy_score(all_labels, all_preds)
   
    return avg_loss, accuracy

In [14]:
# Funzione di valutazione
def eval_model(model, data_loader, criterion, device):
    model = model.eval()

    total_loss = 0
    total_preds = []
    total_labels = []
    total_probs = []

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch["input_ids"].to(device)
            attention_masks = batch["attention_mask"].to(device)
            token_type_ids = batch['token_type_ids'].to(device)
            labels = batch["labels"].unsqueeze(1).to(device)
            
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_masks,
                token_type_ids = token_type_ids
            )
            #logits = outputs.logits
            
            loss = criterion(outputs, labels)

            total_loss += loss.item()
            
            probs = torch.sigmoid(outputs)
            preds = (probs > 0.5).long()  # Trasformo in etichette binarie
            
            total_preds.extend(preds.detach().cpu().numpy())
            total_labels.extend(labels.detach().cpu().numpy())
            total_probs.extend(probs.detach().cpu().numpy())

    avg_loss = total_loss / len(data_loader)
    accuracy = accuracy_score(total_labels, total_preds)
    f1 = f1_score(total_labels, total_preds, average="weighted")
    auc = roc_auc_score(total_labels, total_probs)
    
    return avg_loss, accuracy, f1, auc

Definisco le configurazion principali per il training.

In [15]:
# Parametri principali
learning_rate = 3e-5
EPOCHS = 4
BATCH_SIZE = 32

# Creo i DataLoader
train_loader = DataLoader(training_data, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(validation_data, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)

total_steps = len(train_loader) * EPOCHS

# Funzione di loss
criterion = torch.nn.BCEWithLogitsLoss() 

# Ottimizzatore
optimizer_bert = torch.optim.AdamW(params = full_model.parameters(), lr = learning_rate)

# Scheduler
scheduler_bert = transformers.get_linear_schedule_with_warmup(optimizer = optimizer_bert,
                                                       num_warmup_steps = 0,
                                                       num_training_steps = total_steps)

In [16]:
history_bert, total_time_bert = train_and_evaluate_model(
    full_model, "full_model", train_loader, val_loader, criterion, optimizer_bert, scheduler_bert, device, epochs=4
)
print(f"\nBERT Training Time: {total_time_bert:.2f} seconds, {total_time_bert/60:.2f} minutes.")


Epoch 1/4
----------
Train loss: 0.3386, Train accuracy: 0.8509
Validation loss: 0.2542, Validation accuracy: 0.9030

Epoch 2/4
----------
Train loss: 0.1497, Train accuracy: 0.9514
Validation loss: 0.3192, Validation accuracy: 0.9090

Epoch 3/4
----------
Train loss: 0.0731, Train accuracy: 0.9796
Validation loss: 0.3948, Validation accuracy: 0.9080

Epoch 4/4
----------
Train loss: 0.0394, Train accuracy: 0.9903
Validation loss: 0.4160, Validation accuracy: 0.9130

BERT Training Time: 487.85 seconds, 8.13 minutes.


In [17]:
# Parametri principali
learning_rate = 5e-4
EPOCHS = 4
BATCH_SIZE = 16

# Creo i DataLoader
train_loader = DataLoader(training_data, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(validation_data, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)

total_steps = len(train_loader) * EPOCHS

# Funzione di loss
criterion = torch.nn.BCEWithLogitsLoss() 

# Ottimizzatore
optimizer_lora = torch.optim.AdamW(filter(lambda p: p.requires_grad, lora_model.parameters()), lr = learning_rate)


# Scheduler
scheduler_lora = transformers.get_linear_schedule_with_warmup(optimizer = optimizer_lora,
                                                       num_warmup_steps = 0,
                                                       num_training_steps = total_steps)

In [18]:
history_lora, total_time_lora = train_and_evaluate_model(
    lora_model,"lora_model", train_loader, val_loader, criterion, optimizer_lora, scheduler_lora, device, epochs=4
)
print(f"BERT with LoRA Training Time: {total_time_lora:.2f} seconds, {total_time_lora/60:.2f} minutes.")


Epoch 1/4
----------
Train loss: 0.4155, Train accuracy: 0.7987
Validation loss: 0.3120, Validation accuracy: 0.8710

Epoch 2/4
----------
Train loss: 0.2785, Train accuracy: 0.8889
Validation loss: 0.2840, Validation accuracy: 0.8820

Epoch 3/4
----------
Train loss: 0.2465, Train accuracy: 0.9006
Validation loss: 0.2802, Validation accuracy: 0.8890

Epoch 4/4
----------
Train loss: 0.2284, Train accuracy: 0.9083
Validation loss: 0.2834, Validation accuracy: 0.8930
BERT with LoRA Training Time: 364.23 seconds, 6.07 minutes.


### 4. Valutazione dei modelli
Valuto i modello calcolando la loss sul test set, l'accuracy, l'F1-score e ROC AUC.

In [19]:
full_model.load_state_dict(torch.load("sa_full_model_best_model_state.bin"))

test_loss, test_acc, test_f1, test_auc = eval_model(full_model, test_loader, criterion, device)
print(f"Full Fine-Tuning - Test loss: {test_loss:.4f}, Accuracy: {test_acc:.4f}, F1 score: {test_f1:.4f}, ROC AUC: {test_auc:.4f}")

  full_model.load_state_dict(torch.load("sa_full_model_best_model_state.bin"))


Full Fine-Tuning - Test loss: 0.3161, Accuracy: 0.9307, F1 score: 0.9305, ROC AUC: 0.9738


In [20]:
lora_model.load_state_dict(torch.load("sa_lora_model_best_model_state.bin"))

lora_test_loss, lora_test_acc, lora_test_f1, lora_test_auc = eval_model(lora_model, test_loader, criterion, device)
print(f"LoRA Fine-Tuning - Test loss: {lora_test_loss:.4f}, Accuracy: {lora_test_acc:.4f}, F1 score: {lora_test_f1:.4f}, ROC AUC: {lora_test_auc:.4f}")

  lora_model.load_state_dict(torch.load("sa_lora_model_best_model_state.bin"))


LoRA Fine-Tuning - Test loss: 0.2291, Accuracy: 0.9141, F1 score: 0.9138, ROC AUC: 0.9701


## Natural Language Inference

Proseguo confrontando i due tipi di fine-tuning sul task di Natural Language Inference utilizzando il dataset MNLI.

MNLI contiene esempi che consistono in una coppia di frasi (premessa e ipotesi). Le etichette indicano la relazione tra queste due frasi.  
Le possibili etichette sono:
- **Entailment**: l'ipotesi è implicata dalla premessa.
- **Contradiction**: l'ipotesi contraddice la premessa.
- **Neutral**: nessuna relazione specifica.
  
Utilizzo sempre la libreria `datasets` di Hugging Face per caricare il dataset e dividerlo in training set, validation set e test set.

In [21]:
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from collections import Counter

# Carico il datast
dataset = load_dataset('glue', 'mnli')
print(dataset)


# Divido i dati in training set, validation set e test set
data = dataset['train'].shuffle(seed=42)

temp_premises, test_premises, temp_hypotheses, test_hypotheses, temp_labels, test_labels = train_test_split(data['premise'], 
                                                  data['hypothesis'],                
                                                  data['label'], 
                                                  test_size=1024, 
                                                  random_state=42,
                                                  stratify=data['label'])

train_premises, val_premises, train_hypotheses, val_hypotheses, train_labels, val_labels = train_test_split(data['premise'], 
                                                  data['hypothesis'],
                                                  data['label'],
                                                  train_size=10000,
                                                  test_size=1000, 
                                                  random_state=42,
                                                  stratify=data['label'])

print("Dimensioni dei set:")
print(f"Train: {len(train_premises)}")
print(f"Validation: {len(val_premises)}")
print(f"Test: {len(test_premises)}")

# Verifica distribuzione delle etichette
print("\nDistribuzione delle etichette:")
print(f"Train: {Counter(train_labels)}")
print(f"Validation: {Counter(val_labels)}")
print(f"Test: {Counter(test_labels)}")

README.md:   0%|          | 0.00/35.3k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/52.2M [00:00<?, ?B/s]

(…)alidation_matched-00000-of-00001.parquet:   0%|          | 0.00/1.21M [00:00<?, ?B/s]

(…)dation_mismatched-00000-of-00001.parquet:   0%|          | 0.00/1.25M [00:00<?, ?B/s]

test_matched-00000-of-00001.parquet:   0%|          | 0.00/1.22M [00:00<?, ?B/s]

test_mismatched-00000-of-00001.parquet:   0%|          | 0.00/1.26M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/392702 [00:00<?, ? examples/s]

Generating validation_matched split:   0%|          | 0/9815 [00:00<?, ? examples/s]

Generating validation_mismatched split:   0%|          | 0/9832 [00:00<?, ? examples/s]

Generating test_matched split:   0%|          | 0/9796 [00:00<?, ? examples/s]

Generating test_mismatched split:   0%|          | 0/9847 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 392702
    })
    validation_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9815
    })
    validation_mismatched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9832
    })
    test_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9796
    })
    test_mismatched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9847
    })
})
Dimensioni dei set:
Train: 10000
Validation: 1000
Test: 1024

Distribuzione delle etichette:
Train: Counter({2: 3334, 1: 3333, 0: 3333})
Validation: Counter({2: 334, 1: 333, 0: 333})
Test: Counter({2: 342, 0: 341, 1: 341})


Creo una classe Dataset personalizzata in cui viene effettuata la tokenizzaione delle recensioni e la conversione dei dati in tensori.

In [22]:
from torch.utils.data import Dataset

class MNLIDataset(Dataset):

    def __init__(self, premises, hypotheses , labels, tokenizer, max_len):
        self.premises = premises
        self.hypotheses = hypotheses
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len
    
    def __len__(self):
        return len(self.premises)
    
    def __getitem__(self,index):
        premise = self.premises[index]
        hyphotesis = self.hypotheses[index]
        label = self.labels[index]
        
        encoding = self.tokenizer.encode_plus(
            premise,
            hyphotesis,
            add_special_tokens=True,
            max_length=self.max_len,
            truncation=True,
            return_token_type_ids=True,
            padding="max_length",
            return_attention_mask=True,
            return_tensors='pt')
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'token_type_ids': encoding["token_type_ids"].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
            }

Inizializzo il Tokenizer BERT per tokenizzare le frasi e creo i dataset personalizzati.

In [23]:
from transformers import BertTokenizer
from torch.utils.data import DataLoader

MAX_SEQ_LEN = 256 

# Inizializza il Tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

#Ottieni i dataset
training_data = MNLIDataset(premises = train_premises,
                            hypotheses = train_hypotheses,
                            labels = train_labels,
                            tokenizer = tokenizer,
                            max_len = MAX_SEQ_LEN)

validation_data = MNLIDataset(premises = train_premises,
                            hypotheses = train_hypotheses,
                            labels = train_labels,
                            tokenizer = tokenizer,
                            max_len = MAX_SEQ_LEN)

test_data = MNLIDataset(premises = train_premises,
                            hypotheses = train_hypotheses,
                            labels = train_labels,
                            tokenizer = tokenizer,
                            max_len = MAX_SEQ_LEN)



### 2. Costruzione dei modelli

In [24]:
from transformers import BertModel

class NLIBERTClassifier(nn.Module):
    
    def __init__(self, num_classes: int = 3, lora: bool = False, r: int = 16):
        super(NLIBERTClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(p=0.3)
        self.linear = nn.Linear(self.bert.config.hidden_size, num_classes)

        if lora:
            print("Adding LoRA to BERT")
            lora_utils.add_lora_to_bert(self.bert, r=r)
            lora_utils.mark_only_lora_as_trainable(self.bert)

    def forward(self, input_ids, attention_mask, token_type_ids):
        output_bert = self.bert(
            input_ids, 
            attention_mask=attention_mask,
            token_type_ids=token_type_ids
        )
        output_dropout = self.dropout(output_bert.pooler_output)
        logits = self.linear(output_dropout)
        return logits


In [25]:
# Device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

In [26]:
# MODELLO STANDARD
full_model = NLIBERTClassifier(lora=False)
full_model.to(device)

NLIBERTClassifier(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

In [27]:
# MODELLO CON LORA
lora_model = NLIBERTClassifier(lora=True, r=32)
lora_model.to(device)

Adding LoRA to BERT


NLIBERTClassifier(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=False)
              (key): Linear(in_features=768, out_features=768, bias=False)
              (value): Linear(in_features=768, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elem

### 3. Addestramento dei modelli

Definisco una serie di funzioni per l'addestramento e la valutazione di un modello.

In [28]:
import time
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


# Funzione di training e valutazione
def train_and_evaluate_model(model, model_name, train_loader, val_loader, criterion, optimizer, scheduler, device, epochs=4):
    history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
    best_accuracy = 0
    
    start_time = time.time()

    for epoch in range(epochs):
        print(f"\nEpoch {epoch + 1}/{epochs}")
        print('-' * 10)

        # Training
        train_loss, train_acc = train_model(model, train_loader, criterion, optimizer, scheduler, device)
        print(f"Train loss: {train_loss:.4f}, Train accuracy: {train_acc:.4f}")

        # Valutazione
        val_loss, val_acc, val_f1, val_auc = eval_model(model, val_loader, criterion, device)
        print(f"Validation loss: {val_loss:.4f}, Validation accuracy: {val_acc:.4f}")

        # Salvataggio del modello migliore
        if val_acc > best_accuracy:
            torch.save(model.state_dict(),  f"nli_{model_name}_best_model_state.bin")
            best_accuracy = val_acc

        # Salva le metriche
        history["train_loss"].append(train_loss)
        history["train_acc"].append(train_acc)
        history["val_loss"].append(val_loss)
        history["val_acc"].append(val_acc)
    
    end_time = time.time()
    total_training_time = end_time - start_time
    
    return history, total_training_time

In [29]:
# Funzione di training 
def train_model(model, data_loader, criterion, optimizer, scheduler, device):

    model = model.train()  # Imposto il modello in modalità di aggiornamento

    total_loss = 0
    all_preds = []
    all_labels = []

    for batch in data_loader:

        # Sposto i dati sul device
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        token_type_ids = batch['token_type_ids'].to(device)
        labels = batch["labels"].to(device)

        # --- Forward pass ---
        optimizer.zero_grad()

        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids = token_type_ids
        )

        loss = criterion(outputs, labels)

        # --- Backward pass ---
        loss.backward()

        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        optimizer.step()
        scheduler.step()

        total_loss += loss.item()

        preds = torch.argmax(outputs, dim=1)  # Predizioni multiclasse

        all_preds.extend(preds.detach().cpu().numpy())
        all_labels.extend(labels.detach().cpu().numpy())

    avg_loss = total_loss / len(data_loader)
    accuracy = accuracy_score(all_labels, all_preds)

    return avg_loss, accuracy

In [30]:
# Funzione di valutazione
def eval_model(model, data_loader, criterion, device):
    model = model.eval()

    total_loss = 0
    all_preds = []
    all_labels = []
    all_probs = []

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            token_type_ids = batch['token_type_ids'].to(device)
            labels = batch["labels"].to(device)

            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids = token_type_ids
            )

            loss = criterion(outputs, labels)

            total_loss += loss.item()

            probs = torch.softmax(outputs, dim=1)  # Probabilità multiclasse
            preds = torch.argmax(probs, dim=1)  # Predizioni multiclasse

            all_preds.extend(preds.detach().cpu().numpy())
            all_labels.extend(labels.detach().cpu().numpy())
            all_probs.extend(probs.detach().cpu().numpy())

    avg_loss = total_loss / len(data_loader)
    accuracy = accuracy_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds, average="weighted")
    auc = roc_auc_score(all_labels, all_probs, multi_class="ovr")
    
    return avg_loss, accuracy, f1, auc


Definisco le configurazion principali per il training.

In [31]:
# Parametri principali
learning_rate = 3e-5
EPOCHS = 4
BATCH_SIZE = 32

# Creo i DataLoader
train_loader = DataLoader(training_data, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(validation_data, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)

total_steps = len(train_loader) * EPOCHS

# Funzione di loss
criterion = torch.nn.CrossEntropyLoss()

# Ottimizzatore
optimizer_bert = torch.optim.AdamW(params = full_model.parameters(), lr = learning_rate)

# Scheduler
scheduler_bert = transformers.get_linear_schedule_with_warmup(optimizer = optimizer_bert,
                                                       num_warmup_steps = 0,
                                                       num_training_steps = total_steps)

In [32]:
history_bert, total_time_bert = train_and_evaluate_model(
    full_model, "full_model", train_loader, val_loader, criterion, optimizer_bert, scheduler_bert, device, epochs=4
)
print(f"\nBERT Training Time: {total_time_bert:.2f} seconds, {total_time_bert/60:.2f} minutes.")


Epoch 1/4
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.8532, Train accuracy: 0.5946


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.4901, Validation accuracy: 0.8167

Epoch 2/4
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.5057, Train accuracy: 0.8080


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.2371, Validation accuracy: 0.9298

Epoch 3/4
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.2730, Train accuracy: 0.9076


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.1121, Validation accuracy: 0.9701

Epoch 4/4
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.1563, Train accuracy: 0.9523


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.0707, Validation accuracy: 0.9811

BERT Training Time: 1318.83 seconds, 21.98 minutes.


In [33]:
# Parametri principali
learning_rate = 5e-4
EPOCHS = 8
BATCH_SIZE = 16

# Creo i DataLoader
train_loader = DataLoader(training_data, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(validation_data, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)

total_steps = len(train_loader) * EPOCHS

# Funzione di loss
criterion = torch.nn.CrossEntropyLoss()

# Ottimizzatore
optimizer_lora = torch.optim.AdamW(filter(lambda p: p.requires_grad, lora_model.parameters()), lr = learning_rate)


# Scheduler
scheduler_lora = transformers.get_cosine_schedule_with_warmup(optimizer = optimizer_lora,
                                                       num_warmup_steps = 0,
                                                       num_training_steps = total_steps)

In [34]:
history_lora, total_time_lora = train_and_evaluate_model(
    lora_model,"lora_model", train_loader, val_loader, criterion, optimizer_lora, scheduler_lora, device, epochs=EPOCHS
)
print(f"BERT with LoRA Training Time: {total_time_lora:.2f} seconds, {total_time_lora/60:.2f} minutes.")


Epoch 1/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 1.0021, Train accuracy: 0.4758


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.7936, Validation accuracy: 0.6518

Epoch 2/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.7844, Train accuracy: 0.6569


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.6511, Validation accuracy: 0.7334

Epoch 3/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.6960, Train accuracy: 0.7108


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.5896, Validation accuracy: 0.7588

Epoch 4/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.6497, Train accuracy: 0.7382


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.5537, Validation accuracy: 0.7803

Epoch 5/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.6209, Train accuracy: 0.7505


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.5262, Validation accuracy: 0.7930

Epoch 6/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.5908, Train accuracy: 0.7606


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.5074, Validation accuracy: 0.8007

Epoch 7/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.5685, Train accuracy: 0.7728


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.5005, Validation accuracy: 0.8060

Epoch 8/8
----------


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 0.5693, Train accuracy: 0.7718


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Validation loss: 0.4991, Validation accuracy: 0.8058
BERT with LoRA Training Time: 2116.90 seconds, 35.28 minutes.


### 4. Valutazione dei modelli
Valuto i modello calcolando la loss sul test set, l'accuracy, l'F1-score e ROC AUC.

In [35]:
full_model.load_state_dict(torch.load("nli_full_model_best_model_state.bin"))

test_loss, test_acc, test_f1, test_auc = eval_model(full_model, test_loader, criterion, device)
print(f"Full Fine-Tuning - Test loss: {test_loss:.4f}, Accuracy: {test_acc:.4f}, F1 score: {test_f1:.4f}, ROC AUC: {test_auc:.4f}")

  full_model.load_state_dict(torch.load("nli_full_model_best_model_state.bin"))
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Full Fine-Tuning - Test loss: 0.0708, Accuracy: 0.9811, F1 score: 0.9811, ROC AUC: 0.9973


In [36]:
lora_model.load_state_dict(torch.load("nli_lora_model_best_model_state.bin"))

lora_test_loss, lora_test_acc, lora_test_f1, lora_test_auc = eval_model(lora_model, test_loader, criterion, device)
print(f"LoRA Fine-Tuning - Test loss: {lora_test_loss:.4f}, Accuracy: {lora_test_acc:.4f}, F1 score: {lora_test_f1:.4f}, ROC AUC: {lora_test_auc:.4f}")

  lora_model.load_state_dict(torch.load("nli_lora_model_best_model_state.bin"))
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


LoRA Fine-Tuning - Test loss: 0.5005, Accuracy: 0.8060, F1 score: 0.8057, ROC AUC: 0.9314
