# 💧 LFM2 - SFT with TRL

This tutorial demonstrates how to fine-tune our LFM2 models, e.g. [`LiquidAI/LFM2-1.2B`](https://huggingface.co/LiquidAI/LFM2-1.2B), using the TRL library.

Follow along if this is your first time using TRL, or lift individual code snippets into your own workflow.

## 🎯 What you'll find:
- **SFT** (Supervised Fine-Tuning) - Basic instruction following
- **LoRA + SFT** - Using LoRA (from PEFT) to SFT while on constrained hardware

## 📋 Prerequisites:
- **GPU Runtime**: Select GPU in `Runtime` → `Change runtime type`
- **Hugging Face Account**: For accessing models and datasets



# 📦 Installation & Setup

First, let's install all the required packages:


In [None]:
%uv pip install transformers==4.54.0 trl>=0.18.2 peft>=0.15.2

Using Python 3.12.6 environment at: /usr/local
Audited 3 packages in 73ms
Note: you may need to restart the kernel to use updated packages.
Using Python 3.12.6 environment at: /usr/local
Audited 1 package in 8ms
Note: you may need to restart the kernel to use updated packages.


Next, let's verify that the packages were installed correctly:

In [None]:
import torch
import transformers
import trl
import os

print(f"📦 PyTorch version: {torch.__version__}")
print(f"🤗 Transformers version: {transformers.__version__}")
print(f"📊 TRL version: {trl.__version__}")

📦 PyTorch version: 2.8.0+cu129
🤗 Transformers version: 4.54.0
📊 TRL version: 0.20.0


# Loading the model from Transformers 🤗



In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from IPython.display import display, HTML, Markdown
import torch



# Model used for the outputs shown below; set to "LiquidAI/LFM2-1.2B" to fine-tune LFM2
model_id = "google/gemma-3-4b-it"
print("📚 Loading tokenizer...")

tokenizer = AutoTokenizer.from_pretrained(model_id)

print("🧠 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
#   attn_implementation="flash_attention_2" #<- uncomment on compatible GPU
)

print("✅ Local model loaded successfully!")
print(f"🔢 Parameters: {model.num_parameters():,}")
print(f"📖 Vocab size: {len(tokenizer)}")
print(f"💾 Model size: ~{model.num_parameters() * 2 / 1e9:.1f} GB (bfloat16)")

📚 Loading tokenizer...
🧠 Loading model...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

✅ Local model loaded successfully!
🔢 Parameters: 4,300,079,472
📖 Vocab size: 262145
💾 Model size: ~8.6 GB (bfloat16)
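
The size printed above is the parameter count times the bytes per parameter. As a rough sketch, the same arithmetic can be repeated for other dtypes. The helper name `weight_memory_gb` is hypothetical, and the estimate covers the weights only (no activations, gradients, or optimizer state).

```python
# Bytes used to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}

def weight_memory_gb(num_params: int, dtype: str = "bfloat16") -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

num_params = 4_300_079_472  # parameter count reported in the output above
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(num_params, dtype):.1f} GB")
```

Training needs noticeably more memory than this figure, which is one motivation for the LoRA setup later in the notebook.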


# 🎯 Part 1: Supervised Fine-Tuning (SFT)

SFT teaches the model to follow instructions by training on input-output pairs (instruction vs response). This is the foundation for creating instruction-following models.

## Load an SFT Dataset

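The cell below loads a local JSON file, `./discourse_qa.json`, and holds out 10% of its rows for evaluation. To train on a public dataset instead, such as [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), replace the `load_dataset` call; a slice in the `split` parameter (for example, the first 5k samples) keeps the run short.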

In [None]:
from datasets import load_dataset

print("📥 Loading SFT dataset...")
dataset = load_dataset("json", data_files="./discourse_qa.json")
# Drop the raw columns; the formatted `text` column is what we train on
dataset = dataset.remove_columns(["question", "answer"])
dataset = dataset.shuffle(seed=42)
# Hold out 10% of the rows for evaluation
dataset = dataset["train"].train_test_split(test_size=0.1, seed=42)
train_dataset_sft = dataset["train"]
eval_dataset_sft = dataset["test"]

print("✅ SFT Dataset loaded:")
print(f"   📚 Train samples: {len(train_dataset_sft)}")
print(f"   🧪 Eval samples: {len(eval_dataset_sft)}")
print(f"\n📝 Single Sample: {train_dataset_sft[0]['text']}")

📥 Loading SFT dataset...
✅ SFT Dataset loaded:
   📚 Train samples: 7950
   🧪 Eval samples: 884

📝 Single Sample: You are a patient that has gone to do an interview with a psychologist. The psychologist will ask you a series of questions and you will answer them in a natural way:

### Input:
mhm .

### Expected Response:
and they have like a dredge .
 it's a big suction hose .
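
The sample above appears to follow a fixed template: an instruction, then `### Input:` and `### Expected Response:` sections. Here is a minimal sketch of how such a `text` field could be assembled. `build_text` is a hypothetical helper, and the assumption that the removed `question` and `answer` columns fed the two sections is not confirmed by the notebook.

```python
INSTRUCTION = (
    "You are a patient that has gone to do an interview with a psychologist. "
    "The psychologist will ask you a series of questions and you will answer "
    "them in a natural way:"
)

def build_text(question: str, answer: str, instruction: str = INSTRUCTION) -> str:
    """Join instruction, input, and response into a single training string."""
    return (
        f"{instruction}\n\n"
        f"### Input:\n{question}\n\n"
        f"### Expected Response:\n{answer}"
    )

# Hypothetical row, using the field names of the columns dropped earlier
row = {"question": "mhm .", "answer": "and they have like a dredge ."}
print(build_text(row["question"], row["answer"]))
```

With a `datasets` object, the same function could be applied row by row through `dataset.map`, returning a dict with a `text` key.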


# 🎛️ Part 2: LoRA + SFT (Parameter-Efficient Fine-tuning)

LoRA (Low-Rank Adaptation) allows efficient fine-tuning by only training a small number of additional parameters. Perfect for limited compute resources!


## Wrap the model with PEFT

We specify the target modules that will be fine-tuned while the rest of the model's weights remain frozen. Feel free to modify the `r` (rank) value:
- higher -> better approximation of full-finetuning
- lower -> needs even less compute resources
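
To make the rank trade-off concrete: LoRA keeps a weight matrix of shape `d_out x d_in` frozen and trains two small matrices, `A` (`r x d_in`) and `B` (`d_out x r`), so each adapted layer adds `r * (d_in + d_out)` trainable parameters. The sketch below uses a hypothetical 2048 x 2048 projection, not a dimension taken from the model in this notebook.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)

d_in = d_out = 2048  # hypothetical layer size, for illustration only
full = d_in * d_out  # parameters in the frozen weight matrix

for r in (4, 16, 64):
    added = lora_params(d_in, d_out, r)
    print(f"r={r:>2}: {added:>7,} trainable params ({100 * added / full:.2f}% of the layer)")
```

The added parameter count grows linearly with `r`, which is why lowering the rank reduces memory and compute.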

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

GLU_MODULES = ["w1", "w2", "w3"]
MHA_MODULES = ["q_proj", "k_proj", "v_proj", "out_proj"]
CONV_MODULES = ["in_proj", "out_proj"]

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=16,  # <- lower values = fewer parameters
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=GLU_MODULES + MHA_MODULES + CONV_MODULES,
    bias="none",
    modules_to_save=None,
)

lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()

print("✅ LoRA configuration applied!")
print(f"🎛️  LoRA rank: {lora_config.r}")
print(f"📊 LoRA alpha: {lora_config.lora_alpha}")
print(f"🎯 Target modules: {lora_config.target_modules}")

trainable params: 41,549,824 || all params: 4,341,629,296 || trainable%: 0.9570
✅ LoRA configuration applied!
🎛️  LoRA rank: 64
📊 LoRA alpha: 16
🎯 Target modules: {'w1', 'v_proj', 'k_proj', 'in_proj', 'w3', 'w2', 'out_proj', 'q_proj'}
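
As a sanity check on the numbers printed above, the trainable percentage is the ratio of the two counts, and subtracting the adapter parameters recovers the base parameter count reported when the model was loaded. The values below are copied from the outputs in this notebook.

```python
trainable = 41_549_824   # from print_trainable_parameters()
total = 4_341_629_296    # base weights + LoRA adapter weights

print(f"trainable%: {100 * trainable / total:.4f}")
print(f"base params: {total - trainable:,}")
```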


## Launch Training

We are now ready to launch SFT training with the LoRA-wrapped model.

In [None]:
from trl import SFTConfig, SFTTrainer

lora_sft_config = SFTConfig(
    output_dir="./lfm2-sft-lora",
    num_train_epochs=10,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_steps=100,  # takes precedence over warmup_ratio, so only one is set
    logging_steps=10,
    save_strategy="epoch",
    eval_strategy="epoch",  # must match save_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    report_to="none",  # the string "none" disables logging integrations
)

print("🏗️  Creating LoRA SFT trainer...")
lora_sft_trainer = SFTTrainer(
    model=lora_model,
    args=lora_sft_config,
    train_dataset=train_dataset_sft,
    eval_dataset=eval_dataset_sft,
    processing_class=tokenizer,
)

print("\n🚀 Starting LoRA + SFT training...")
lora_sft_trainer.train()

print("🎉 LoRA + SFT training completed!")

lora_sft_trainer.save_model()
print(f"💾 LoRA model saved to: {lora_sft_config.output_dir}")

🏗️  Creating LoRA SFT trainer...


Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.



🚀 Starting LoRA + SFT training...


Epoch,Training Loss,Validation Loss
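
For a sense of scale, the number of optimizer steps follows from the dataset size and batch settings. The sketch below assumes a single GPU and the default `gradient_accumulation_steps=1` (neither is stated in the notebook), and compares a fixed 100-step warmup with a 20% warmup ratio. In `transformers`, a positive `warmup_steps` takes precedence over `warmup_ratio`.

```python
import math

train_samples = 7950   # train split size printed earlier
batch_size = 2         # per_device_train_batch_size
epochs = 10            # num_train_epochs
num_devices = 1        # assumption: single GPU
grad_accum = 1         # assumption: library default

# One optimizer step consumes batch_size * num_devices * grad_accum samples.
steps_per_epoch = math.ceil(train_samples / (batch_size * num_devices * grad_accum))
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"100 warmup steps = {100 / total_steps:.2%} of training")
print(f"warmup_ratio=0.2 = {round(0.2 * total_steps)} steps")
```

Under these assumptions the two warmup settings differ by almost two orders of magnitude, so it is worth choosing one deliberately.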
