<a href="https://colab.research.google.com/github/Buggia11/Transformer-Architecture/blob/main/Fine_Tuning_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. Package installation and Hugging Face login**

In [None]:
# Install packages
!pip install huggingface_hub
!pip install -q transformers accelerate peft datasets torch



In [None]:
# Log in to Hugging Face
from huggingface_hub import notebook_login
notebook_login()





# **2. Model import and configuration**

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,      # Load the weights in float16
    device_map="auto"                # Use the GPU if available
)

# Configure the pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.eos_token_id

# Enable gradient checkpointing (saves memory during training)
model.gradient_checkpointing_enable()

print("✅ Model loaded")
print(f"   - GPU memory: ~{torch.cuda.memory_allocated()/1024**3:.2f} GB")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.

`torch_dtype` is deprecated! Use `dtype` instead!

✅ Model loaded
   - GPU memory: ~3.31 GB
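
As a rough sanity check on the figure above, the float16 footprint of the weights can be estimated from the parameter count. The sketch below assumes 2 bytes per parameter and takes the base parameter count from the totals that PEFT reports later in this notebook (all parameters minus the LoRA adapters); `estimate_weight_memory` is a hypothetical helper, not part of any library.

```python
def estimate_weight_memory(num_params: int, bytes_per_param: int = 2) -> dict:
    """Estimate the memory needed to hold model weights.

    Returns the size in bytes, in GB (10**9 bytes) and in GiB (1024**3 bytes,
    the unit the notebook's print statement actually computes).
    """
    total_bytes = num_params * bytes_per_param
    return {
        "bytes": total_bytes,
        "gb": total_bytes / 10**9,
        "gib": total_bytes / 1024**3,
    }

# Assumption: base parameter count = all params reported by PEFT (1,778,177,536)
# minus the trainable LoRA params (1,089,536).
BASE_PARAMS = 1_778_177_536 - 1_089_536

estimate = estimate_weight_memory(BASE_PARAMS, bytes_per_param=2)  # float16
print(f"{estimate['gb']:.2f} GB, {estimate['gib']:.2f} GiB")  # 3.55 GB, 3.31 GiB
```

Under these assumptions the estimate comes to about 3.31 GiB, in line with the value printed by the cell above.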


2.1 Testing the original model before fine-tuning, as a baseline for later comparison


In [None]:
print("\n" + "="*50)
print("🔬 ORIGINAL MODEL TEST (Before Fine-Tuning)")
print("="*50 + "\n")

# Medical test prompt
test_prompt = "Approaches to immunotherapy of cancer"

# Prepare the input for the model
inputs = tokenizer(
    test_prompt,
    return_tensors="pt",
    padding=True
).to(model.device)

# Generate the response
print(f"💬 Prompt: '{test_prompt}'")
print("⏳ Generating...\n")

with torch.no_grad():  # Skip gradient computation to save memory
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,           # At most 100 new tokens
        temperature=0.5,
        do_sample=True,               # Sample for variability
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

# Decode only the generated part (without the prompt)
generated_text = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],  # Keep only the new tokens
    skip_special_tokens=True                      # Strip special tokens
)

print("🤖 Original Model Response:")
print(f"   {generated_text}")
print("\n" + "="*50 + "\n")


🔬 ORIGINAL MODEL TEST (Before Fine-Tuning)

💬 Prompt: 'Approaches to immunotherapy of cancer'
⏳ Generating...

🤖 Original Model Response:
   , including immunotherapy and immunosuppression, are increasingly important in the treatment of cancer. The following are some approaches:
1. Immunotherapy with
   a. Immunotherapy
   b. Immunotherapy
   c. Immunotherapy
2. Immunotherapy with
   a. Immunotherapy
   b. Immunotherapy
   c. Immunotherapy
3. Immunotherapy with
   a. Immunotherapy
   b. Immunotherapy
   c. Immunotherapy

Wait, I
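
The `temperature` argument used in the generation call rescales the logits before sampling: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it. A minimal pure-Python sketch of that rescaling follows. The logits are made-up numbers for illustration, and `softmax_with_temperature` is a hypothetical helper, not the `transformers` implementation.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens (illustrative only)
example_logits = [2.0, 1.0, 0.0]

for t in (1.0, 0.5):
    probs = softmax_with_temperature(example_logits, temperature=t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At the lower temperature the top token takes a larger share of the probability mass, so sampled text varies less from run to run.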




# **3. Dataset import and preparation for training**

In [None]:
from datasets import load_dataset

ds = load_dataset("TimSchopf/medical_abstracts", "default")
print(ds)
print(ds['train'][0])


DatasetDict({
    train: Dataset({
        features: ['condition_label', 'medical_abstract'],
        num_rows: 11550
    })
    test: Dataset({
        features: ['condition_label', 'medical_abstract'],
        num_rows: 2888
    })
})
{'condition_label': 5, 'medical_abstract': 'Tissue changes around loose prostheses. A canine model to investigate the effects of an antiinflammatory agent. The aseptically loosened prosthesis provided a means for investigating the in vivo and in vitro activity of the cells associated with the loosening process in seven dogs. The cells were isolated and maintained in culture for sufficient periods of time so that their biologic activity could be studied as well as the effect of different agents added to the cells in vivo or in vitro. The biologic response as determined by interleukin-1 and prostaglandin E2 activity paralleled the roentgenographic appearance of loosening and the technetium images and observations made at the time of revision surgery. The 

In [None]:
def tokenize_function(examples):
    return tokenizer(
        examples["medical_abstract"],  # Column containing the medical text
        truncation=True,                # Truncate if too long
        max_length=512,                 # At most 512 tokens
        padding="max_length"            # Pad up to 512 tokens
    )

# Apply tokenization to the whole dataset
tokenized_datasets = ds.map(
    tokenize_function,
    batched=True,                      # Process several examples at once
    remove_columns=ds["train"].column_names  # Drop the original columns
)

# Add the "labels"
def add_labels(examples):
    # Labels equal the inputs (the model learns to predict the next token)
    examples["labels"] = examples["input_ids"].copy()
    return examples

tokenized_datasets = tokenized_datasets.map(add_labels, batched=True)

print("✅ Dataset tokenized!")
print(f"   - Train examples: {len(tokenized_datasets['train'])}")
print(f"   - Test examples: {len(tokenized_datasets['test'])}")

✅ Dataset tokenized!
   - Train examples: 11550
   - Test examples: 2888
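
Because every example is padded to 512 tokens, copying `input_ids` into `labels` also includes the padding positions in the loss. A common alternative is to set the labels at padded positions to -100, the value that the cross-entropy loss in PyTorch skips by default. The sketch below shows that masking on plain Python lists; `mask_padding_labels` is a hypothetical helper and the token IDs are made up.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

def mask_padding_labels(batch):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX.

    `batch` is a dict of lists, the shape that a batched `map` call
    passes to its function.
    """
    batch["labels"] = [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(batch["input_ids"], batch["attention_mask"])
    ]
    return batch

# Made-up example: two sequences padded to length 5 (pad id 0 is illustrative)
example_batch = {
    "input_ids": [[11, 12, 13, 0, 0], [21, 22, 23, 24, 25]],
    "attention_mask": [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]],
}
masked = mask_padding_labels(example_batch)
print(masked["labels"])  # [[11, 12, 13, -100, -100], [21, 22, 23, 24, 25]]
```

In this notebook, a function like this could take the place of `add_labels` in the second `map` call, since the tokenizer output already contains `attention_mask`.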


3.1 LoRA configuration



*   Instead of updating all of the model's parameters (about 1.5 billion), LoRA adds small adapter matrices (about 1.1 million parameters with the settings used here) and trains only those
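
To make the idea concrete: for a frozen weight matrix W, LoRA learns two small matrices A (r × d_in) and B (d_out × r) and computes W x + (alpha / r) · B (A x). The sketch below implements that forward pass in plain Python on toy dimensions. The numbers are made up and the helper names are hypothetical; this is not the `peft` implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, lora_alpha, r):
    """Frozen base projection plus the scaled low-rank update B @ (A @ x)."""
    scaling = lora_alpha / r
    base_out = matvec(W, x)              # frozen path
    update = matvec(B, matvec(A, x))     # trainable low-rank path
    return [b + scaling * u for b, u in zip(base_out, update)]

# Toy sizes (illustrative): d_in = 3, d_out = 2, rank r = 1
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]          # frozen weights, shape (d_out, d_in)
A = [[0.5, 0.5, 0.0]]          # trainable, shape (r, d_in)
B_zero = [[0.0], [0.0]]        # B starts at zero, so the adapter is a no-op at first
B_trained = [[1.0], [-1.0]]    # made-up values standing in for a trained B
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, lora_alpha=2, r=1))     # [7.0, 2.0]
print(lora_forward(W, A, B_trained, x, lora_alpha=2, r=1))  # [10.0, -1.0]
```

The toy call uses `lora_alpha=2` and `r=1`, which gives the same scaling factor (2) as `lora_alpha=16` with `r=8`.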



In [None]:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

print("🔧 Configuring LoRA for efficient fine-tuning...\n")

# Prepare the model for training (freezes the base weights)
model = prepare_model_for_kbit_training(model)

# LoRA configuration (trains only small parts of the model)
lora_config = LoraConfig(
    r=8,                              # Rank of the LoRA adapter matrices
    lora_alpha=16,                    # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Modules that receive LoRA adapters
    lora_dropout=0.05,                # Dropout on the LoRA layers to limit overfitting
    bias="none",                      # Do not train the biases
    task_type="CAUSAL_LM"             # Task type (language modeling)
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)

print("✅ LoRA configured!")
print("\n📊 Trainable parameters:")
model.print_trainable_parameters()
print("="*50 + "\n")

🔧 Configuring LoRA for efficient fine-tuning...

✅ LoRA configured!

📊 Trainable parameters:
trainable params: 1,089,536 || all params: 1,778,177,536 || trainable%: 0.0613
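
The trainable-parameter count printed above can be reproduced by hand: each adapted linear layer of shape d_in → d_out adds r × (d_in + d_out) parameters. The sketch assumes that the base architecture has 28 decoder blocks, a hidden size of 1536, and a 256-dimensional `v_proj` output (grouped-query attention). These dimensions are assumptions about the model config, not values shown in this notebook, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(r, layer_shapes, num_layers):
    """Count LoRA parameters: r * (d_in + d_out) per adapted layer, per block."""
    per_block = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)
    return per_block * num_layers

# Assumed dimensions (not printed in the notebook): hidden size 1536,
# 28 decoder blocks, q_proj 1536 -> 1536, v_proj 1536 -> 256.
ASSUMED_SHAPES = {
    "q_proj": (1536, 1536),
    "v_proj": (1536, 256),
}
ASSUMED_NUM_LAYERS = 28

total = lora_param_count(
    r=8,
    layer_shapes=list(ASSUMED_SHAPES.values()),
    num_layers=ASSUMED_NUM_LAYERS,
)
print(f"{total:,}")  # 1,089,536 under these assumptions
```

Doubling `r` or adding more entries to `target_modules` scales this count linearly, which is a quick way to gauge the cost of a different LoRA configuration.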



# **4. Training configuration**

4.1 Reducing the dataset to speed up training

In [None]:
# Reduce the dataset to a manageable subset
train_subset = tokenized_datasets["train"].select(range(2000))  # 2000 training examples
test_subset = tokenized_datasets["test"].select(range(500))      # 500 test examples

In [None]:
from transformers import TrainingArguments, Trainer
import os

# Disable wandb (logging tool not needed here)
os.environ["WANDB_DISABLED"] = "true"

print("⚙️ Configuring training parameters...\n")

training_args = TrainingArguments(
    output_dir="./results",                    # Where checkpoints are saved

    # TRAINING PARAMETERS
    num_train_epochs=1,                        # 1 epoch (one pass over the training set)
    per_device_train_batch_size=1,             # 1 example at a time (to save memory)
    per_device_eval_batch_size=1,              # 1 example at a time for evaluation
    gradient_accumulation_steps=8,             # Accumulate 8 steps = simulates batch_size=8

    # LEARNING RATE
    learning_rate=2e-4,                        # Learning rate
    warmup_steps=100,                          # Learning rate ramps up over the first 100 steps

    # EVALUATION
    eval_strategy="steps",                     # Evaluate every N steps
    eval_steps=1000,                            # Evaluate every 1000 steps (more than the ~250 steps of this run)

    # SAVING
    save_strategy="steps",                     # Save every N steps
    save_steps=1000,                            # Save every 1000 steps
    save_total_limit=2,                        # Keep only the last 2 checkpoints

    # MEMORY OPTIMIZATIONS
    fp16=True,                                 # Mixed-precision (float16) training
    gradient_checkpointing=True,               # Saves memory during backprop

    # LOGGING
    logging_steps=100,                          # Log the loss every 100 steps
    logging_dir="./logs",                      # Where logs are saved

    # OTHER
    load_best_model_at_end=True,               # Load the best model at the end
    metric_for_best_model="eval_loss",         # Use eval_loss to pick the best model
    report_to="none",                          # Do not report to external services
    push_to_hub=False                          # Do not upload to the Hugging Face Hub
)

print("✅ Parameters configured!")
print(f"   - Epochs: {training_args.num_train_epochs}")
print(f"   - Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"   - Learning rate: {training_args.learning_rate}")
print(f"   - Evaluate every: {training_args.eval_steps} steps")
print("="*50 + "\n")

⚙️ Configuring training parameters...

✅ Parameters configured!
   - Epochs: 1
   - Effective batch size: 8
   - Learning rate: 0.0002
   - Evaluate every: 1000 steps
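
With these settings it helps to work out how many optimizer steps the run actually contains, since `eval_steps`, `save_steps`, `logging_steps`, and `warmup_steps` are all counted in optimizer steps. The sketch below assumes a single GPU, the 2000-example training subset defined earlier, and the linear warmup-then-decay schedule that `TrainingArguments` uses when no scheduler is specified. The helper names are hypothetical.

```python
import math

def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs=1, num_devices=1):
    """Number of optimizer updates for the whole run."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    return math.ceil(batches_per_epoch / grad_accum) * epochs

def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total_steps = optimizer_steps(num_examples=2000, per_device_batch=1, grad_accum=8)
print(f"Optimizer steps: {total_steps}")

# How often is each periodic interval reached within the run?
intervals = {"logging_steps": 100, "eval_steps": 1000, "save_steps": 1000}
for name, interval in intervals.items():
    print(f"{name}={interval}: reached {total_steps // interval} time(s)")

# Learning rate at a few points of the schedule
for step in (0, 50, 100, 175, 250):
    lr = linear_schedule_lr(step, base_lr=2e-4, warmup_steps=100, total_steps=total_steps)
    print(f"step {step:3d}: lr = {lr:.2e}")
```

Under these assumptions the run has 250 optimizer steps, so the 1000-step evaluation and checkpoint intervals are never reached, and warmup takes up 40% of the run. Lowering `eval_steps` and `save_steps` (for example to 50 or 100) would be one way to get intermediate evaluations.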



# **5. Model training**

In [None]:
# === BLOCK 8: CREATE THE TRAINER AND START TRAINING ===
print("🚀 Preparing the Trainer...\n")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_subset,
    eval_dataset=test_subset
)

print("="*50)
print("🏋️ TRAINING STARTED")
print("="*50)
print("⏰ Estimated time: ~30-60 minutes")
print("💡 Tip: keep the tab open and monitor progress\n")

# Start training
trainer.train()

print("\n" + "="*50)
print("✅ TRAINING COMPLETE!")
print("="*50 + "\n")

# Save the final model
print("💾 Saving model...")
trainer.save_model("./medical-deepseek-lora")
tokenizer.save_pretrained("./medical-deepseek-lora")
print("✅ Model saved to: ./medical-deepseek-lora")
print("="*50 + "\n")

The model is already on multiple devices. Skipping the move to device specified in `args`.


🚀 Preparing the Trainer...

🏋️ TRAINING STARTED
⏰ Estimated time: ~30-60 minutes
💡 Tip: keep the tab open and monitor progress



`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss,Validation Loss



✅ TRAINING COMPLETE!

💾 Saving model...
✅ Model saved to: ./medical-deepseek-lora



# **6. Fine-tuned model: loading and testing**

In [None]:
# === BLOCK 10: LOAD THE FINE-TUNED MODEL ===
from peft import PeftModel
import torch

print("📥 Loading fine-tuned model...\n")

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the trained LoRA adapters
finetuned_model = PeftModel.from_pretrained(
    base_model,
    "./medical-deepseek-lora"
)

# Load the tokenizer
finetuned_tokenizer = AutoTokenizer.from_pretrained("./medical-deepseek-lora")

print("✅ Fine-tuned model loaded!")
print("="*50 + "\n")

📥 Loading fine-tuned model...

✅ Fine-tuned model loaded!



In [None]:
# === BLOCK 11: TEST THE FINE-TUNED MODEL ===
print("="*50)
print("🧪 FINE-TUNED MODEL TEST")
print("="*50 + "\n")

# Same prompt as in the baseline test
test_prompt = "Approaches to immunotherapy of cancer"

# Prepare the input
inputs = finetuned_tokenizer(
    test_prompt,
    return_tensors="pt",
    padding=True
).to(finetuned_model.device)

# Generate the response
print(f"💬 Prompt: '{test_prompt}'")
print("⏳ Generating...\n")

with torch.no_grad():
    outputs = finetuned_model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,              # Note: the baseline test used temperature=0.5
        do_sample=True,
        pad_token_id=finetuned_tokenizer.pad_token_id
    )

# Decode only the generated part
finetuned_response = finetuned_tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)

print("🤖 Fine-Tuned Model Response:")
print(f"   {finetuned_response}\n")
print("="*50 + "\n")

🧪 FINE-TUNED MODEL TEST

💬 Prompt: 'Approaches to immunotherapy of cancer'
⏳ Generating...

🤖 Fine-Tuned Model Response:
   . The immunotherapy approach is one of the most promising and promising of the various forms of cancer therapy. It is considered more effective than conventional treatment for some cancers, and in some cases, it may be the only form of treatment available. The immunotherapy approach is also considered to be of great importance in the management of cancers of the liver, gastrointestinal tract, and the stomach. In recent years, the use of immunotherapy has become more common in the treatment of cancer, as the immunotherapy approach


