**Environment and Library Setup**

In [None]:
%%capture
import os, re

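# Outside Colab a plain install is enough; on Colab, pin packages to match the preinstalled torch.
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    import torch
    # Extract "major.minor" from the torch version string (e.g. "2.9" from "2.9.0+cu126")
    torch_ver = re.match(r"[0-9]{1,}\.[0-9]{1,}", str(torch.__version__)).group(0)

    # Pick the xformers build that matches the installed torch release
    xformers_pkg = "xformers==" + ("0.0.33.post1" if torch_ver=="2.9" else "0.0.32.post2" if torch_ver=="2.8" else "0.0.29.post3")

    !pip install --no-deps bitsandbytes accelerate {xformers_pkg} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth

# Pin transformers and trl to the versions used throughout this notebook
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2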

**Model Initialization (Llama 3.2)**

In [None]:
from unsloth import FastLanguageModel
import torch

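# Model parameter configuration
CONF_MAX_SEQ_LEN = 2048     # maximum sequence length, in tokens
CONF_DTYPE = None           # None lets Unsloth choose a dtype for the detected GPU
CONF_4BIT_LOADING = True    # load the weights with 4-bit quantization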

spam_model, spam_tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length = CONF_MAX_SEQ_LEN,
    dtype = CONF_DTYPE,
    load_in_4bit = CONF_4BIT_LOADING,
)

==((====))==  Unsloth 2025.12.9: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
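
As a back-of-the-envelope check on why 4-bit loading matters on a 14.7 GB Tesla T4, the sketch below estimates the storage needed for the weights alone. The parameter counts are copied from the training banner printed later in this notebook, and the sketch assumes that the banner total includes the LoRA adapters. It ignores quantization metadata, activations, and any layers kept in higher precision, so treat the numbers as rough lower bounds.

```python
# Parameter counts copied from the training banner in this notebook.
TOTAL_PARAMS = 3_237_063_680    # assumed to be base model plus LoRA adapters
ADAPTER_PARAMS = 24_313_856     # trainable LoRA parameters
BASE_PARAMS = TOTAL_PARAMS - ADAPTER_PARAMS


def weight_gib(num_params: int, bits_per_param: float) -> float:
    """Storage for the weights alone, in GiB (no activations, no metadata)."""
    return num_params * bits_per_param / 8 / 1024**3


fp16_gib = weight_gib(BASE_PARAMS, 16)
four_bit_gib = weight_gib(BASE_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gib:.2f} GiB")
print(f" 4-bit weights: ~{four_bit_gib:.2f} GiB")
```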


**Applying LoRA Adapters (Low-Rank Adaptation)**

In [None]:
# Attach LoRA adapters to the model for fine-tuning
spam_model = FastLanguageModel.get_peft_model(
    spam_model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",

    use_gradient_checkpointing = "unsloth",
    random_state = 42,
    use_rslora = False,
    loftq_config = None,
)
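
The trainable-parameter count that the training banner reports (24,313,856) can be reproduced by hand from the LoRA settings above. Each adapted linear layer gains two low-rank matrices, so it adds `r * (in_features + out_features)` parameters. The layer dimensions below are assumptions about Llama 3.2 3B (hidden size 3072, MLP size 8192, key/value width 1024, 28 layers); they are not stated anywhere in this notebook.

```python
# Assumed Llama 3.2 3B dimensions (not taken from this notebook).
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 1024          # 8 key/value heads x head size 128
NUM_LAYERS = 28
RANK = 16              # matches r = 16 in get_peft_model

# (in_features, out_features) for every module listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in the A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```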

**Data Formatting Pipeline**

In [None]:
from unsloth.chat_templates import get_chat_template

spam_tokenizer = get_chat_template(
    spam_tokenizer,
    chat_template = "llama-3.1",
)

def prepare_training_data(batch):
    """
    Render each conversation of a batch into a single chat-template string.
    The text is not tokenized here (tokenize = False); the trainer does that later.
    """
    conversations_list = batch["conversations"]

    formatted_prompts = [
        spam_tokenizer.apply_chat_template(conv, tokenize = False, add_generation_prompt = False)
        for conv in conversations_list
    ]
    return { "text" : formatted_prompts, }
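
To make the shape of the `text` field concrete, here is a simplified stand-in for the rendering step. The function `render_llama3` is hypothetical and only imitates the header and end-of-turn markers that appear elsewhere in this notebook; the real `llama-3.1` template may also prepend a default system block, so this is an approximation rather than the tokenizer's exact output.

```python
def render_llama3(messages: list[dict]) -> str:
    """Simplified imitation of apply_chat_template(..., tokenize=False)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    return "".join(parts)


# Hypothetical training example; the labels are assumptions for illustration.
example = [
    {"role": "user", "content": "Win a FREE cruise, reply NOW!"},
    {"role": "assistant", "content": "spam"},
]
rendered = render_llama3(example)
print(rendered)
```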

**Loading the Local Dataset**

In [None]:
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Name of the locally generated file
LOCAL_DATASET_FILE = "train_unsloth.jsonl"

# Load the dataset from the file system
raw_data = load_dataset("json", data_files=LOCAL_DATASET_FILE, split="train")

train_dataset = standardize_sharegpt(raw_data)
train_dataset = train_dataset.map(prepare_training_data, batched = True,)

Generating train split: 0 examples [00:00, ? examples/s]

Unsloth: Standardizing formats (num_proc=2):   0%|          | 0/4457 [00:00<?, ? examples/s]

Map:   0%|          | 0/4457 [00:00<?, ? examples/s]
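
The notebook does not show what a line of `train_unsloth.jsonl` looks like. The sketch below assumes ShareGPT-style records (`from`/`value` keys) and illustrates the kind of renaming that a standardization step performs before the chat template is applied. The sample record and the helper `to_role_content` are hypothetical, and the real `standardize_sharegpt` may handle more cases than this.

```python
import json

# Assumed mapping from ShareGPT speaker names to chat-template roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_role_content(record: dict) -> dict:
    """Rename ShareGPT-style keys to the role/content keys used by chat templates."""
    return {
        "conversations": [
            {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in record["conversations"]
        ]
    }


# One hypothetical JSONL line (a single JSON object per line).
line = json.dumps({
    "conversations": [
        {"from": "human", "value": "Free entry in a weekly prize draw, text WIN"},
        {"from": "gpt", "value": "spam"},
    ]
})
standardized = to_role_content(json.loads(line))
print(standardized)
```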

**Trainer Configuration (SFTTrainer)**

In [None]:
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq

# Trainer instance
spam_trainer = SFTTrainer(
    model = spam_model,
    tokenizer = spam_tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = CONF_MAX_SEQ_LEN,
    data_collator = DataCollatorForSeq2Seq(tokenizer = spam_tokenizer),
    packing = False,

    # Training parameters
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.001,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "my_spam_model_outputs",
        report_to = "none",
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=4):   0%|          | 0/4457 [00:00<?, ? examples/s]
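
The numbers in `SFTConfig` imply two things worth checking by hand: how much of the dataset 60 steps actually cover, and what the learning rate looks like over time. The sketch below assumes the usual linear schedule with warmup (ramp from 0 to the base rate over `warmup_steps`, then linear decay to 0 at `max_steps`); `linear_lr` is a hypothetical helper, not part of TRL.

```python
BATCH_PER_DEVICE = 2
GRAD_ACCUM = 4
MAX_STEPS = 60
WARMUP_STEPS = 10
BASE_LR = 2e-4
NUM_EXAMPLES = 4457     # dataset size reported in the cell outputs

effective_batch = BATCH_PER_DEVICE * GRAD_ACCUM   # single GPU
examples_seen = effective_batch * MAX_STEPS
epoch_fraction = examples_seen / NUM_EXAMPLES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, remaining)


print(f"Effective batch size: {effective_batch}")
print(f"Examples seen: {examples_seen} ({epoch_fraction:.1%} of the dataset)")
for step in (0, 5, 10, 35, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```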

**Prompt Masking**

In [None]:
from unsloth.chat_templates import train_on_responses_only

# Configure the trainer to compute the loss on the responses only
spam_trainer = train_on_responses_only(
    spam_trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

Map (num_proc=6):   0%|          | 0/4457 [00:00<?, ? examples/s]
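
Conceptually, response-only training replaces the labels of every prompt token with an ignore index so that only assistant tokens contribute to the loss. The sketch below shows that idea on plain lists of token ids. The helper names and the toy ids are hypothetical, `-100` is the default ignore index of PyTorch's cross-entropy loss, and the Unsloth implementation may differ in its details.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def find_all(sequence: list[int], pattern: list[int]) -> list[int]:
    """Start positions of every occurrence of pattern inside sequence."""
    n = len(pattern)
    return [i for i in range(len(sequence) - n + 1) if sequence[i:i + n] == pattern]


def mask_prompts(input_ids: list[int], instruction_ids: list[int],
                 response_ids: list[int]) -> list[int]:
    """Keep labels only for tokens that follow a response marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    instruction_starts = find_all(input_ids, instruction_ids)
    for start in find_all(input_ids, response_ids):
        begin = start + len(response_ids)
        # A response ends where the next user turn begins, or at the end.
        later = [i for i in instruction_starts if i >= begin]
        end = later[0] if later else len(input_ids)
        labels[begin:end] = input_ids[begin:end]
    return labels


# Toy ids: [1, 2] stands for the user header, [1, 3] for the assistant header.
ids = [0, 1, 2, 10, 11, 9, 1, 3, 20, 21, 9]
labels = mask_prompts(ids, instruction_ids=[1, 2], response_ids=[1, 3])
print(labels)
```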

**Running Training**

In [None]:
# Check GPU stats before training
gpu_info = torch.cuda.get_device_properties(0)
# Peak reserved memory so far, kept as a baseline for later comparison
initial_mem = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_mem = round(gpu_info.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU: {gpu_info.name}. Max memory: {max_mem} GB.")

# Start the actual training run
training_results = spam_trainer.train()

GPU: Tesla T4. Max memory: 14.741 GB.


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 4,457 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)


Step,Training Loss
1,4.5987
2,3.1253
3,4.6383
4,3.7146
5,1.9712
6,0.7155
7,0.323
8,0.5869
9,0.108
10,0.0807


**Inference Test (Verification)**

In [None]:
from unsloth import FastLanguageModel
from transformers import TextStreamer


FastLanguageModel.for_inference(spam_model)


msg_test = [
    {"role": "user", "content": "URGENT! You have won a FREE iPhone 15. Click here: http://bit.ly/fake"},
]

# Build the input tensors
input_ids = spam_tokenizer.apply_chat_template(
    msg_test,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")


print("--- MODEL RESPONSE ---")
text_streamer = TextStreamer(spam_tokenizer, skip_prompt = True)
_ = spam_model.generate(
    input_ids = input_ids,
    streamer = text_streamer,
    max_new_tokens = 64,
    use_cache = True,
    temperature = 0.1,
    min_p = 0.1
)

--- MODEL RESPONSE ---
It seems like you're trying to test my response. The link you provided appears to be a fake giveaway.<|eot_id|>
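
The `temperature = 0.1` and `min_p = 0.1` arguments both narrow the set of tokens the model can sample from. The sketch below illustrates the general idea on a toy logit vector: divide the logits by the temperature, convert them to probabilities, then drop every token whose probability falls below `min_p` times the top probability. The function is hypothetical, the order of the two steps is an assumption, and in Transformers these settings only matter when sampling is enabled.

```python
import math


def min_p_filter(logits: list[float], temperature: float = 0.1,
                 min_p: float = 0.1) -> list[float]:
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]      # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


toy_logits = [2.0, 1.9, 0.0]
sharp = min_p_filter(toy_logits, temperature=0.1, min_p=0.1)
flat = min_p_filter(toy_logits, temperature=1.0, min_p=0.1)
print("temperature 0.1:", [round(p, 3) for p in sharp])
print("temperature 1.0:", [round(p, 3) for p in flat])
```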


**GGUF Export (Download)**

In [None]:
# Save the model in the quantized GGUF format (q4_k_m)
SAVE_FORMAT = "q4_k_m"

if True:
    print(f"Starting model conversion to {SAVE_FORMAT} format...")
    spam_model.save_pretrained_gguf(
        "model",
        spam_tokenizer,
        quantization_method = SAVE_FORMAT
    )
    print("Conversion complete. Check the 'model' folder to download the file.")

Starting model conversion to q4_k_m format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00002.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 13025.79it/s]


Note: tokenizer.model not found (this is OK for non-SentencePiece models)


Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [03:02<00:00, 91.15s/it]


Unsloth: Merge process complete. Saved to `/content/model`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF f16 might take 3 minutes.
\        /    [2] Converting GGUF f16 to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['Llama-3.2-3B-Instruct.F16.gguf']
Unsloth: [2] Converting GGUF f16 into q4_k_m. This might take 10 minutes...
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['Llama-3.2-3B-Instruct.Q4_K_M.gguf']
Unsloth: example usage for text only LLMs: llama-cli --model Llama-3.2-3B-Instruct.Q4
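
Before downloading the exported file, a quick sanity check is to read its header. The sketch below assumes the GGUF version 2 or later layout (a 4-byte `GGUF` magic, a little-endian `uint32` version, then two `uint64` counts). `read_gguf_header` is a hypothetical helper, and the demonstration runs on a synthetic header with made-up counts rather than on the real export, whose exact path is not shown above.

```python
import os
import struct
import tempfile


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header (assumes format version 2 or later)."""
    with open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file, magic bytes were {magic!r}")
        (version,) = struct.unpack("<I", handle.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", handle.read(16))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Demonstration on a synthetic header; the counts are made up.
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<IQQ", 3, 255, 30))
header = read_gguf_header(tmp.name)
os.remove(tmp.name)
print(header)
```

To inspect the real export, the same function can be pointed at the generated `.gguf` file once its location on disk is known.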