# Contronto tra il fine-tuning completo e il fine-tuning che utilizza LoRA


**Configurazioni**  
Installazione delle librerie necessarie.

In [20]:
!pip install transformers datasets torch loralib

  pid, fd = os.forkpty()
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




Carico i file `lora_utilis.py` e `models.py`.

In [21]:
import sys
sys.path.append('/kaggle/input/lora-utils/')

Importo i moduli necessari.

In [22]:
import torch
from transformers import AutoTokenizer, AutoModel
import lora_utils, models

## Sentiment analisys

### 1. Ottenimento dei dati e preprocessing
Confronto il fine-tuning completo e quello che usa LoRA innanzitutto sul task di classificazione binaria del sentimento utilizzando il dataset SST-2.

SST-2 contiene esempi che consistono in frasi tratte da recensioni di film le cui etichette sono 1 se la recensione positiva, 0 altrimenti.

Utilizzo la libreria `datasets` di Hugging Face per caricare il dataset e dividerlo in training sat, velidation set e test set.

In [36]:
from datasets import load_dataset

dataset = load_dataset('sst2')

print(dataset)

train_data = dataset['train'].shuffle(seed=42).select(range(10000))
val_data = dataset['validation']
test_data = dataset['test']

README.md:   0%|          | 0.00/5.27k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/3.11M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/72.8k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/148k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/67349 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/872 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1821 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 872
    })
    test: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 1821
    })
})


In [46]:
from collections import Counter

labels = [example['label'] for example in val_data]
print(Counter(labels))

Counter({1: 444, 0: 428})


In [9]:
# Tokenizzazione

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(batch):
    return tokenizer(batch["sentence"], padding="max_length", truncation=True, max_length=512)

train_data = train_data.map(tokenize_function, batched=True)
val_data = val_data.map(tokenize_function, batched=True)
test_data = test_data.map(tokenize_function, batched=True)

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

Map:   0%|          | 0/872 [00:00<?, ? examples/s]

Map:   0%|          | 0/1821 [00:00<?, ? examples/s]

In [10]:
# Conversione dei dati in tensori

def to_tensor(dataset, shuffle=True):
    return torch.utils.data.DataLoader(
        dataset.with_format("torch"), batch_size=16, shuffle=shuffle
    )

train_loader = to_tensor(train_data)
val_loader = to_tensor(val_data)
test_loader = to_tensor(test_data)

### 2. Configurazione dei modelli

Carico il modello BERT pre-addestrato e ne costruisco un altro che utilizza LoRA.  *

Modello stan*dard.

In [11]:
from transformers import AutoModelForSequenceClassification

full_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
full_model.to("cuda")

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

*Modello con LoRA*.

In [12]:
lora_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
lora_utils.add_lora_to_bert(lora_model.bert, r=8)
lora_utils.mark_only_lora_as_trainable(lora_model.bert)
lora_model.to("cuda")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=False)
              (key): Linear(in_features=768, out_features=768, bias=False)
              (value): Linear(in_features=768, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps

### 3. Addestramento dei modelli

In [13]:
import time
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR


# funzione di perdita e ottimizzatore
criterion = torch.nn.CrossEntropyLoss()
full_optimizer = AdamW(full_model.parameters(), lr=2e-5)
lora_optimizer = AdamW(filter(lambda p: p.requires_grad, lora_model.parameters()), lr=2e-4)


epochs = 5

# Aggiungi il Learning Rate Scheduler lineare
full_scheduler = LinearLR(full_optimizer, start_factor=1.0, end_factor=0.1, total_iters=epochs)  # Modifica total_iters in base alle epoche
lora_scheduler = LinearLR(lora_optimizer, start_factor=1.0, end_factor=0.1, total_iters=epochs)


In [14]:
# Funzione per l'addestramento
def train_model(model, dataloader, optimizer, scheduler, epochs=5):

    model.train()
    start_time = time.time()

    for epoch in range(epochs):
        epoch_loss = 0
        for batch in dataloader:
            optimizer.zero_grad()

            # Sposto i dati su GPU
            input_ids = batch["input_ids"].to("cuda")
            attention_mask = batch["attention_mask"].to("cuda")
            labels = batch["label"].to("cuda")

            # Calcolo della loss
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()

            # Loss per l'epoca
            epoch_loss += loss.item()

        # Learning Rate Scheduler
        scheduler.step()

        print(f"Epoch {epoch + 1}/{epochs} - Loss: {epoch_loss:.4f}")

    end_time = time.time()
    training_time = end_time - start_time
    
    return training_time

In [15]:
# Addestramento modello full fine-tuning

print("Inizio addestramento modello full Fine-Tuning ...")
full_training_time = train_model(full_model, train_loader, full_optimizer, full_scheduler, epochs)

print(f"Tempo totale di addestramento Full Fine-Tuning: {full_training_time:.2f} secondi")

Inizio addestramento modello full Fine-Tuning ...
Epoch 1/5 - Loss: 185.9664
Epoch 2/5 - Loss: 79.4242
Epoch 3/5 - Loss: 32.9163
Epoch 4/5 - Loss: 19.9763
Epoch 5/5 - Loss: 10.8248
Tempo totale di addestramento full Fine-Tuning: 2687.65 secondi


In [16]:
# Addestramento dei modello con LoRA
print("Inizio addestramento modello con LoRA ...")
lora_training_time = train_model(lora_model, train_loader, lora_optimizer, lora_scheduler, epochs)

print(f"Tempo totale di addestramento LoRA: {lora_training_time:.2f} secondi")

Inizio addestramento modello con LoRA ...
Epoch 1/5 - Loss: 301.9535
Epoch 2/5 - Loss: 191.0833
Epoch 3/5 - Loss: 172.0531
Epoch 4/5 - Loss: 162.6645
Epoch 5/5 - Loss: 158.8940
Tempo totale di addestramento LoRA: 2062.00 secondi


### 4. Valutazione dei modelli

Calcolo l'accuracy e l'F1-score per valutare le prestazioni di entrambi i modelli.

In [49]:
from sklearn.metrics import accuracy_score, f1_score

def evaluate_model(model, dataloader):
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch["input_ids"].to("cuda")
            attention_mask = batch["attention_mask"].to("cuda")
            labels = batch["label"].to("cuda")

            outputs = model(input_ids=input_ids, attention_mask=attention_mask)
            preds = torch.argmax(outputs.logits, axis=1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    acc = accuracy_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds, average="weighted")
    return acc, f1

In [50]:
full_acc, full_f1 = evaluate_model(full_model, val_loader)
lora_acc, lora_f1 = evaluate_model(lora_model, val_loader)
print(f"Full Fine-Tuning - Accuracy: {full_acc}, F1: {full_f1}")
print(f"LoRA Fine-Tuning - Accuracy: {lora_acc}, F1: {lora_f1}")

print(f"Tempo totale di addestramento Full Fine-Tuning: {full_training_time:.2f} secondi")
print(f"Tempo totale di addestramento LoRA: {lora_training_time:.2f} secondi")

Full Fine-Tuning - Accuracy: 0.9094036697247706, F1: 0.9093481969598188
LoRA Fine-Tuning - Accuracy: 0.893348623853211, F1: 0.8933170394170504
Tempo totale di addestramento Full Fine-Tuning: 2687.65 secondi
Tempo totale di addestramento LoRA: 2062.00 secondi


### 5. Salvataggio dei moduli LoRA

In [53]:
def save_lora_parameters(model, file_path):
    lora_params = {}
    for name, param in model.named_parameters():
        if "lora_" in name and param.requires_grad:
            lora_params[name] = param.detach().cpu()
    torch.save(lora_params, file_path)
    print(f"LoRA parameters saved to {file_path}")

save_lora_parameters(lora_model.bert, "lora_adapter_params.pt")

LoRA parameters saved to lora_adapter_params.pt


In [54]:
def load_lora_parameters(model, file_path):
    lora_params = torch.load(file_path)
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in lora_params:
                param.copy_(lora_params[name])
    print(f"LoRA parameters loaded from {file_path}")

load_lora_parameters(lora_model.bert, "lora_adapter_params.pt")


LoRA parameters loaded from lora_adapter_params.pt


  lora_params = torch.load(file_path)


## Classificazione di testi

### 1. Ottenimento dei dati e preprocessing

Proseguo confrontando i due tipi di fine-tuning sul task di topic classificazione utilizzando il dataset AG News.AG News T-2 contiene esempi che consistono in articoli di notizie etichettati in 4 classi: World, Sport, Buisness, Sci/Tech.

Utilizzo sempre la libreria `datasets` di Hugging Face per caricare il dataset e dividerlo in trainion set e test set.

In [55]:
dataset2 = load_dataset('ag_news')

print(dataset2)

train_data = dataset2['train'].shuffle(seed=42).select(range(10000))
test_data = dataset2['test']

README.md:   0%|          | 0.00/8.07k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/18.6M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/1.23M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/120000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/7600 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
})


In [62]:
labels = [example['label'] for example in test_data]
print(Counter(labels))

Counter({2: 1900, 3: 1900, 1: 1900, 0: 1900})


In [57]:
# Tokenizzazione
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


def tokenize_function(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

train_data = train_data.map(tokenize_function, batched=True)
test_data = test_data.map(tokenize_function, batched=True)



# Conversione dei dati in tensori
train_loader = to_tensor(train_data)
test_loader = to_tensor(test_data)

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

Map:   0%|          | 0/7600 [00:00<?, ? examples/s]

### 2. Configurazione dei modelli

*Modello standard*

In [58]:
full_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4
)
full_model.to("cuda")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

*Modello con LoRA*

In [59]:
lora_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4
)
lora_utils.add_lora_to_bert(lora_model.bert, r=8)
lora_utils.mark_only_lora_as_trainable(lora_model.bert)
lora_model.to("cuda")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=False)
              (key): Linear(in_features=768, out_features=768, bias=False)
              (value): Linear(in_features=768, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps

### 3. Addestramento dei modelli

In [63]:
epochs = 5

# Aggiungi il Learning Rate Scheduler lineare
full_scheduler = LinearLR(full_optimizer, start_factor=1.0, end_factor=0.1, total_iters=epochs)  # Modifica total_iters in base alle epoche
lora_scheduler = LinearLR(lora_optimizer, start_factor=1.0, end_factor=0.1, total_iters=epochs)

In [64]:
# Addestramento modello full fine-tuning

print("Inizio addestramento modello full Fine-Tuning ...")
full_training_time = train_model(full_model, train_loader, full_optimizer, full_scheduler, epochs)

print(f"Tempo totale di addestramento Full Fine-Tuning: {full_training_time:.2f} secondi")

Inizio addestramento modello full Fine-Tuning ...
Epoch 1/5 - Loss: 887.2023
Epoch 2/5 - Loss: 888.3462
Epoch 3/5 - Loss: 889.7485
Epoch 4/5 - Loss: 889.6065
Epoch 5/5 - Loss: 889.1687
Tempo totale di addestramento Full Fine-Tuning: 2652.91 secondi


In [65]:
# Addestramento dei modello con LoRA
print("Inizio addestramento modello con LoRA ...")
lora_training_time = train_model(lora_model, train_loader, lora_optimizer, lora_scheduler, epochs)

print(f"Tempo totale di addestramento LoRA: {lora_training_time:.2f} secondi")

Inizio addestramento modello con LoRA ...
Epoch 1/5 - Loss: 906.1448
Epoch 2/5 - Loss: 905.2233
Epoch 3/5 - Loss: 904.6969
Epoch 4/5 - Loss: 905.7273
Epoch 5/5 - Loss: 906.4725
Tempo totale di addestramento LoRA: 2062.74 secondi


### 4. Valutazione dei modelli

Calcolo l'accuracy e l'F1-score per valutare le prestazioni di entrambi i modelli.

In [66]:
full_acc, full_f1 = evaluate_model(full_model, test_loader)
lora_acc, lora_f1 = evaluate_model(lora_model, test_loader)
print(f"Full Fine-Tuning - Accuracy: {full_acc}, F1: {full_f1}")
print(f"LoRA Fine-Tuning - Accuracy: {lora_acc}, F1: {lora_f1}")

print(f"Tempo totale di addestramento Full Fine-Tuning: {full_training_time:.2f} secondi")
print(f"Tempo totale di addestramento LoRA: {lora_training_time:.2f} secondi")

Full Fine-Tuning - Accuracy: 0.2530263157894737, F1: 0.1106988190196375
LoRA Fine-Tuning - Accuracy: 0.23578947368421052, F1: 0.1248002773231829
Tempo totale di addestramento Full Fine-Tuning: 2652.91 secondi
Tempo totale di addestramento LoRA: 2062.74 secondi


## Natural Language Inference

### 1. Ottenimento dei dati e preprocessing

Proseguo confrontando i due tipi di fine-tuning sul task di Natural Language Inference utilizzando il dataset MNLI.

MNLI contiene esempi che consistono in una coppia di frasi (premessa e ipotesi). Le etichette indicano la relazione tra queste due frasi. Le possibili etichette sono:
- *Entailment*: l'ipotesi è implicata dalla premessa.
- *Contradiction*: l'ipotesi contraddice la premessa.
- *Neutral*: nessuna relazione specifica.

Utilizzo sempre la libreria `datasets` di Hugging Face per caricare il dataset e dividerlo in training set, validation set e test set.

In [73]:
dataset = load_dataset('glue', 'mnli')

print(dataset)

# Ottengo train, validation e test set
train_data = dataset['train'].shuffle(seed=42).select(range(10000))  
val_data_matched = dataset['validation_matched']  # matched: gli esempi provengono dagli stessi domini presenti nel set di addestramento.
val_data_mismatched = dataset['validation_mismatched'].select(range(5000))  # mismatched: gli esempi provengono da domini diversi rispetto a quelli del set di addestramento.
test_data_matched = dataset['test_matched'] 
test_data_mismatched = dataset['test_mismatched']

DatasetDict({
    train: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 392702
    })
    validation_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9815
    })
    validation_mismatched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9832
    })
    test_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9796
    })
    test_mismatched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9847
    })
})


In [74]:
labels = [example['label'] for example in val_data_mismatched]
print(Counter(labels))

Counter({0: 1797, 2: 1663, 1: 1540})


In [75]:
# Tokenizzazione

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(batch):
    return tokenizer(
        batch["premise"], 
        batch["hypothesis"], 
        padding="max_length", 
        truncation=True, 
        max_length=512
    )

train_data = train_data.map(tokenize_function, batched=True)
val_data_matched = val_data_matched.map(tokenize_function, batched=True)
val_data_mismatched = val_data_mismatched.map(tokenize_function, batched=True)
test_data_matched = test_data_matched.map(tokenize_function, batched=True)
test_data_mismatched = test_data_mismatched.map(tokenize_function, batched=True)

# Rimozione delle colonne non necessarie
columns_to_keep = ["input_ids", "attention_mask", "label"]

train_data = train_data.remove_columns([col for col in train_data.column_names if col not in columns_to_keep])
val_data_matched = val_data_matched.remove_columns([col for col in val_data_matched.column_names if col not in columns_to_keep])
val_data_mismatched = val_data_mismatched.remove_columns([col for col in val_data_mismatched.column_names if col not in columns_to_keep])
test_data_matched = test_data_matched.remove_columns([col for col in test_data_matched.column_names if col not in columns_to_keep])
test_data_mismatched = test_data_mismatched.remove_columns([col for col in test_data_mismatched.column_names if col not in columns_to_keep])



# Conversione dei dati

def to_tensor(dataset):
    return torch.utils.data.DataLoader(
        dataset.with_format("torch"), 
        batch_size=16, 
        collate_fn=lambda x: {
            key: torch.stack([d[key] for d in x]) for key in x[0].keys()
        }
    )

train_loader = to_tensor(train_data)
val_matched_loader = to_tensor(val_data_matched)
val_mismatched_loader = to_tensor(val_data_mismatched)
test_matched_loader = to_tensor(test_data_matched)
test_mismatched_loader = to_tensor(test_data_mismatched)

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

Map:   0%|          | 0/9815 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/9796 [00:00<?, ? examples/s]

Map:   0%|          | 0/9847 [00:00<?, ? examples/s]

### 2. Configurazione dei modelli

*Modello standard*

In [77]:
full_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)
full_model.to("cuda")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

*Modello con LoRA*.

In [78]:
lora_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)
lora_utils.add_lora_to_bert(lora_model.bert, r=8)
lora_utils.mark_only_lora_as_trainable(lora_model.bert)
lora_model.to("cuda")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=False)
              (key): Linear(in_features=768, out_features=768, bias=False)
              (value): Linear(in_features=768, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps

### 3. Addestramento dei modelli

In [79]:
epochs = 5

# Aggiungi il Learning Rate Scheduler lineare
full_scheduler = LinearLR(full_optimizer, start_factor=1.0, end_factor=0.1, total_iters=epochs)  # Modifica total_iters in base alle epoche
lora_scheduler = LinearLR(lora_optimizer, start_factor=1.0, end_factor=0.1, total_iters=epochs)

In [80]:
# Addestramento modello full fine-tuning

print("Inizio addestramento modello full Fine-Tuning ...")
full_training_time = train_model(full_model, train_loader, full_optimizer, full_scheduler, epochs)

print(f"Tempo totale di addestramento Full Fine-Tuning: {full_training_time:.2f} secondi")

Inizio addestramento modello full Fine-Tuning ...
Epoch 1/5 - Loss: 693.9929
Epoch 2/5 - Loss: 695.3533
Epoch 3/5 - Loss: 695.7772
Epoch 4/5 - Loss: 695.7490
Epoch 5/5 - Loss: 693.7836
Tempo totale di addestramento Full Fine-Tuning: 2652.42 secondi


In [None]:
# Addestramento dei modello con LoRA
print("Inizio addestramento modello con LoRA ...")
lora_training_time = train_model(lora_model, train_loader, lora_optimizer, lora_scheduler, epochs)

print(f"Tempo totale di addestramento LoRA: {lora_training_time:.2f} secondi")

Inizio addestramento modello con LoRA ...
Epoch 1/5 - Loss: 695.5546
Epoch 2/5 - Loss: 695.5251
Epoch 3/5 - Loss: 695.2860
Epoch 4/5 - Loss: 695.5838


### 4. Valutazione dei modelli

In [82]:
full_acc, full_f1 = evaluate_model(full_model, test_loader)
lora_acc, lora_f1 = evaluate_model(lora_model, test_loader)
print(f"Full Fine-Tuning - Accuracy: {full_acc}, F1: {full_f1}")
print(f"LoRA Fine-Tuning - Accuracy: {lora_acc}, F1: {lora_f1}")

print(f"Tempo totale di addestramento Full Fine-Tuning: {full_training_time:.2f} secondi")
print(f"Tempo totale di addestramento LoRA: {lora_training_time:.2f} secondi")

Full Fine-Tuning - Accuracy: 0.24960526315789475, F1: 0.11765386179251446
LoRA Fine-Tuning - Accuracy: 0.24052631578947367, F1: 0.1337429014304005
Tempo totale di addestramento Full Fine-Tuning: 2652.42 secondi
Tempo totale di addestramento LoRA: 2062.75 secondi
