# Fine-Tuning LLMs: Practical Examples

This notebook provides runnable examples for fine-tuning LLMs using:
1. **Vertex AI** (Gemini supervised fine-tuning) — managed, no GPU required locally
2. **Hugging Face TRL** with LoRA/QLoRA — open-source, full control

See also: `docs/FINE_TUNING_LLM_GUIDE.md` for the full conceptual guide.

## Part 1: Data Preparation

Both approaches start with well-formatted training data. Let's create example datasets.

In [None]:
import json
from pathlib import Path

# Example: Fine-tuning a model to be a video production assistant
# that gives structured ad recommendations

training_examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a video production assistant specializing in ad concepts. Always respond with structured JSON containing: concept_name, target_emotion, visual_description, key_message, and duration_seconds."},
            {"role": "user", "content": "Create an ad concept for a premium yoga mat brand targeting beginners."},
            {"role": "assistant", "content": json.dumps({
                "concept_name": "First Step",
                "target_emotion": "empowerment",
                "visual_description": "A beginner unrolls a yoga mat in a sunlit living room. Gentle morning light. Close-ups of hands and feet finding balance. Transition to confident poses.",
                "key_message": "Every journey starts with a single pose.",
                "duration_seconds": 30
            }, indent=2)}
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a video production assistant specializing in ad concepts. Always respond with structured JSON containing: concept_name, target_emotion, visual_description, key_message, and duration_seconds."},
            {"role": "user", "content": "Create an ad concept for an organic skincare brand targeting millennials."},
            {"role": "assistant", "content": json.dumps({
                "concept_name": "Skin Deep",
                "target_emotion": "authenticity",
                "visual_description": "Split screen: left side shows harsh chemical formulas, right side shows ingredients growing in nature. Camera follows a woman choosing the natural path. Ends with product close-up.",
                "key_message": "What you put on your skin matters.",
                "duration_seconds": 15
            }, indent=2)}
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a video production assistant specializing in ad concepts. Always respond with structured JSON containing: concept_name, target_emotion, visual_description, key_message, and duration_seconds."},
            {"role": "user", "content": "Create an ad concept for a fitness app targeting busy professionals."},
            {"role": "assistant", "content": json.dumps({
                "concept_name": "5-Minute Transform",
                "target_emotion": "motivation",
                "visual_description": "Fast-paced montage: alarm clock, coffee, commute, desk. Then a pause — phone notification. Quick workout in office clothes. Energized return to work.",
                "key_message": "No gym. No excuses. Just 5 minutes.",
                "duration_seconds": 30
            }, indent=2)}
        ]
    },
]

print(f"Created {len(training_examples)} training examples")
print(f"\nExample format:")
print(json.dumps(training_examples[0], indent=2)[:500] + "...")
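
Because the assistant turns are meant to be strict JSON, it can help to check each example before saving it. The following is a minimal sketch, not part of the original notebook: `REQUIRED_KEYS` and `validate_example` are hypothetical helper names, and the key list is copied from the system prompt above.

```python
import json

# Keys the system prompt asks the assistant to return (hypothetical constant).
REQUIRED_KEYS = {
    "concept_name",
    "target_emotion",
    "visual_description",
    "key_message",
    "duration_seconds",
}


def validate_example(example: dict) -> list:
    """Return a list of problems found in one chat-format example (empty if none)."""
    problems = []
    messages = example.get("messages", [])
    roles = [m.get("role") for m in messages]

    if roles[:1] != ["system"]:
        problems.append("first message should be the system prompt")
    if not roles or roles[-1] != "assistant":
        problems.append("last message should be the assistant reply")

    for message in messages:
        if message.get("role") != "assistant":
            continue
        try:
            payload = json.loads(message.get("content", ""))
        except json.JSONDecodeError:
            problems.append("assistant content is not valid JSON")
            continue
        if not isinstance(payload, dict):
            problems.append("assistant content is not a JSON object")
            continue
        missing = REQUIRED_KEYS - payload.keys()
        if missing:
            problems.append(f"missing keys: {sorted(missing)}")
    return problems


# Small self-contained demo with one deliberately incomplete example.
demo = {
    "messages": [
        {"role": "system", "content": "You are a video production assistant."},
        {"role": "user", "content": "Create an ad concept."},
        {"role": "assistant", "content": json.dumps({"concept_name": "First Step"})},
    ]
}
print(validate_example(demo))
```

In the notebook, the same check could be run as `[validate_example(ex) for ex in training_examples]` before writing the JSONL files.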

In [None]:
# Save as JSONL (standard format for both Vertex AI and HuggingFace)

output_dir = Path("../datasets")
output_dir.mkdir(parents=True, exist_ok=True)  # Also create missing parent directories

# HuggingFace format (messages)
hf_path = output_dir / "training_data_hf.jsonl"
with open(hf_path, "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
print(f"HuggingFace format saved to: {hf_path}")

# Vertex AI format (systemInstruction + contents with 'model' role)
vertex_examples = []
for example in training_examples:
    vertex_ex = {
        "systemInstruction": {
            "parts": [{"text": example["messages"][0]["content"]}]
        },
        "contents": []
    }
    for msg in example["messages"][1:]:
        role = "model" if msg["role"] == "assistant" else "user"
        vertex_ex["contents"].append({
            "role": role,
            "parts": [{"text": msg["content"]}]
        })
    vertex_examples.append(vertex_ex)

vertex_path = output_dir / "training_data_vertex.jsonl"
with open(vertex_path, "w") as f:
    for example in vertex_examples:
        f.write(json.dumps(example) + "\n")
print(f"Vertex AI format saved to: {vertex_path}")

# Verify
print(f"\nVertex AI format example:")
print(json.dumps(vertex_examples[0], indent=2)[:500] + "...")
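
With more than a handful of examples, it is common to hold some back for validation so that overfitting shows up during tuning. Below is a minimal sketch of a seeded split. `split_examples` is a hypothetical helper, and the 20% default is an assumption, not a recommendation from this notebook.

```python
import random


def split_examples(examples, val_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, validation) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(examples)  # Copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    if n_val >= len(shuffled):
        raise ValueError("need at least two examples to split")
    return shuffled[n_val:], shuffled[:n_val]


# Demo on placeholder records; in the notebook this would be training_examples.
records = [{"id": i} for i in range(10)]
train, val = split_examples(records, val_fraction=0.2, seed=42)
print(len(train), len(val))
```

Each part can then be written to its own JSONL file using the same loop as above.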

## Part 2: Fine-Tuning on Vertex AI (Gemini)

Managed fine-tuning — no local GPU required. Requires:
- A Google Cloud project with Vertex AI enabled
- Training data uploaded to Google Cloud Storage
- The `google-genai` package (`pip install google-genai`)

In [None]:
# Step 1: Upload training data to GCS
# (Run this from your terminal or uncomment below)
#
# gsutil cp datasets/training_data_vertex.jsonl gs://YOUR_BUCKET/fine-tuning/training_data.jsonl

# Configuration
PROJECT_ID = "artful-striker-483214-b0"  # Your GCP project
LOCATION = "us-central1"
TRAINING_DATA_URI = "gs://YOUR_BUCKET/fine-tuning/training_data.jsonl"  # Update this
TUNED_MODEL_NAME = "video-ad-assistant-v1"

print(f"Project: {PROJECT_ID}")
print(f"Location: {LOCATION}")
print(f"Training data: {TRAINING_DATA_URI}")
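
As an alternative to `gsutil`, the upload can be done from Python. This is a sketch under assumptions: it needs the separate `google-cloud-storage` package (not listed in the requirements above) and application-default credentials, and `parse_gcs_uri` and `upload_to_gcs` are hypothetical helper names.

```python
def parse_gcs_uri(uri: str) -> tuple:
    """Split 'gs://bucket/path/to/blob' into (bucket, blob_path)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, blob = uri[len(prefix):].partition("/")
    if not bucket or not blob:
        raise ValueError(f"URI must include a bucket and an object path: {uri!r}")
    return bucket, blob


def upload_to_gcs(local_path: str, gcs_uri: str, project: str) -> None:
    """Upload a local file to the given gs:// URI (requires google-cloud-storage)."""
    # Imported lazily so the parsing helper works without the package installed.
    from google.cloud import storage

    bucket_name, blob_name = parse_gcs_uri(gcs_uri)
    client = storage.Client(project=project)
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)


print(parse_gcs_uri("gs://YOUR_BUCKET/fine-tuning/training_data.jsonl"))
```

In the notebook, the call would look like `upload_to_gcs(str(vertex_path), TRAINING_DATA_URI, PROJECT_ID)` once `YOUR_BUCKET` is replaced with a real bucket name.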

In [None]:
# Step 2: Launch fine-tuning job using Google Gen AI SDK (recommended)

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
    http_options=HttpOptions(api_version="v1"),
)

# Launch supervised fine-tuning
tuning_job = client.tunings.tune(
    base_model="gemini-2.5-flash",
    training_dataset={
        "gcs_uri": TRAINING_DATA_URI,
    },
    config={
        "tuned_model_display_name": TUNED_MODEL_NAME,
        "epoch_count": 3,
        "learning_rate_multiplier": 1.0,
    },
)

print(f"Tuning job started: {tuning_job.name}")
print(f"State: {tuning_job.state}")

In [None]:
# Step 3: Monitor the tuning job

import time

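# The Gen AI SDK returns a snapshot of the job, so re-fetch it on every poll.
while not tuning_job.has_ended:
    print(f"Status: {tuning_job.state}")
    time.sleep(120)  # Check every 2 minutes
    tuning_job = client.tunings.get(name=tuning_job.name)

if not tuning_job.has_succeeded:
    raise RuntimeError(f"Tuning job ended in state {tuning_job.state}")

print("\nTuning complete!")
print(f"Tuned model: {tuning_job.tuned_model.model}")
print(f"Endpoint: {tuning_job.tuned_model.endpoint}")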
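
Tuning jobs can run for hours, so a polling loop benefits from a timeout. The sketch below is a generic helper that is independent of any SDK. `poll_until` and its parameters are hypothetical names, and the six-hour default timeout is an arbitrary assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval_s: float = 120.0,
    timeout_s: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"still not done after {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake job that finishes on the third poll (no real waiting).
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
final = poll_until(lambda: next(states), lambda s: s == "SUCCEEDED", sleep=lambda _: None)
print(final)
```

Assuming the Gen AI SDK client from Step 2, the call might look like `poll_until(lambda: client.tunings.get(name=tuning_job.name), lambda job: job.has_ended)`.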

In [None]:
# Step 4: Test the fine-tuned model

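from google.genai.types import GenerateContentConfig

# In the Gen AI SDK, a tuned model is served through its endpoint resource.
tuned_model_name = tuning_job.tuned_model.endpoint

response = client.models.generate_content(
    model=tuned_model_name,
    contents="Create an ad concept for a sustainable fashion brand targeting Gen Z.",
    # Reuse the training system prompt so inference matches the tuned format.
    config=GenerateContentConfig(
        system_instruction=training_examples[0]["messages"][0]["content"],
    ),
)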

print("Fine-tuned model response:")
print(response.text)
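
Since the model was tuned to return structured JSON, the response can be checked programmatically rather than by eye. This is a minimal sketch: `strip_code_fence` and `parse_ad_concept` are hypothetical helpers, and the fence-stripping step assumes the model may sometimes wrap its JSON in a Markdown code fence.

```python
import json

REQUIRED_KEYS = (
    "concept_name",
    "target_emotion",
    "visual_description",
    "key_message",
    "duration_seconds",
)
FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this cell readable


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence if the text has one."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        lines = cleaned.splitlines()
        cleaned = "\n".join(lines[1:-1])  # Drop the opening and closing fence lines
    return cleaned.strip()


def parse_ad_concept(text: str) -> dict:
    """Parse a model response and verify that every required key is present."""
    payload = json.loads(strip_code_fence(text))
    if not isinstance(payload, dict):
        raise ValueError("response is not a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return payload


# Demo with a hand-written response; in the notebook this would be response.text.
sample = json.dumps({key: "placeholder" for key in REQUIRED_KEYS})
print(sorted(parse_ad_concept(sample)))
```

A `ValueError` here (including `json.JSONDecodeError`, which subclasses it) is a quick signal that the tuned model has drifted from the target format.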

## Part 3: Fine-Tuning Open-Source Models with Hugging Face TRL

Full control over the training process. Requires:
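- A GPU (local or cloud). A 16 GB T4 is enough for small models and for QLoRA on 7B models; plain 16-bit LoRA on a 7B model needs more memory, since the weights alone take about 14 GB. The `bf16=True` setting used below requires an Ampere-or-newer GPU.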
- `pip install trl peft transformers datasets bitsandbytes accelerate`

In [None]:
# Install dependencies (uncomment to run)
# !pip install trl peft transformers datasets bitsandbytes accelerate torch

In [None]:
# Example: LoRA fine-tuning with SFTTrainer

from datasets import Dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig

# Load our custom dataset
dataset = Dataset.from_json(str(hf_path))
print(f"Dataset size: {len(dataset)} examples")
print(f"Columns: {dataset.column_names}")
print(f"\nFirst example messages: {dataset[0]['messages'][:1]}")

In [None]:
# Configure LoRA
peft_config = LoraConfig(
    r=16,                        # Rank — controls adapter capacity
    lora_alpha=16,               # Scaling factor (typically equal to r)
    lora_dropout=0.05,           # Light regularization
    bias="none",                 # Don't train bias terms
    target_modules="all-linear", # Apply LoRA to ALL linear layers
    task_type="CAUSAL_LM",       # Causal language modeling
)

# Configure training
training_args = SFTConfig(
    output_dir="./sft-output",
    
    # Training duration
    num_train_epochs=3,           # Number of passes over the data
    max_steps=-1,                 # -1 = use num_train_epochs
    
    # Batch size
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4, # Effective batch = 2 * 4 = 8
    
    # Learning rate
    learning_rate=2e-4,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",
    
    # Memory optimization
    bf16=True,                    # Needs Ampere or newer; use fp16=True on T4/V100
    gradient_checkpointing=True,
    
    # Logging
    logging_steps=5,
    save_steps=50,
    save_total_limit=3,
)

print("Training config ready.")
print(f"  LoRA rank: {peft_config.r}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  Epochs: {training_args.num_train_epochs}")
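
Two numbers in this configuration are worth working out by hand: how many parameters a rank-`r` adapter adds to one linear layer, and how many optimizer steps the run takes. The sketch below uses the standard LoRA count `r * (d_in + d_out)` and a single-GPU step estimate. The 4096 x 4096 layer size is a hypothetical example, and the exact step count reported by the trainer may differ slightly by library version.

```python
import math


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = max(1, math.ceil(num_examples / effective_batch))
    return effective_batch, steps_per_epoch * epochs


# Hypothetical 4096 x 4096 projection with the rank used above (r=16).
added = lora_params(4096, 4096, 16)
full = 4096 * 4096
print(f"LoRA params: {added:,} vs full layer: {full:,} ({added / full:.2%})")

# The three-example dataset with batch size 2, accumulation 4, and 3 epochs.
print(optimizer_steps(num_examples=3, per_device_batch=2, grad_accum=4, epochs=3))
```

With only three examples, the run takes about three optimizer steps, so `save_steps=50` never triggers and `logging_steps=5` produces no intermediate logs. Lower both when experimenting on tiny datasets.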

In [None]:
# Train (requires GPU)
# Using a small model for demonstration — replace with your target model

MODEL_NAME = "Qwen/Qwen3-0.6B"  # Small model for testing
# MODEL_NAME = "meta-llama/Llama-3.1-8B"  # Production-scale model

trainer = SFTTrainer(
    model=MODEL_NAME,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)

# Start training
trainer.train()

# Save the LoRA adapter
trainer.save_model("./video-ad-assistant-lora")
print("\nTraining complete! Adapter saved to ./video-ad-assistant-lora")

In [None]:
# Inference with the fine-tuned LoRA model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model + LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model = PeftModel.from_pretrained(base_model, "./video-ad-assistant-lora")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

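# Generate (use the same system prompt as in training)
messages = [
    {"role": "system", "content": "You are a video production assistant specializing in ad concepts. Always respond with structured JSON containing: concept_name, target_emotion, visual_description, key_message, and duration_seconds."},
    {"role": "user", "content": "Create an ad concept for a sustainable fashion brand targeting Gen Z."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # Append the assistant turn marker so the model starts its reply
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
prompt_len = inputs["input_ids"].shape[1]  # Skip the prompt tokens when decoding
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)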

print("Fine-tuned model response:")
print(response)

In [None]:
# Optional: Merge LoRA adapter into base model for production
# This eliminates any inference overhead from the adapter

merged_model = model.merge_and_unload()
merged_model.save_pretrained("./video-ad-assistant-merged")
tokenizer.save_pretrained("./video-ad-assistant-merged")
print("Merged model saved to ./video-ad-assistant-merged")

## Part 4: QLoRA Example (4-bit Quantization)

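QLoRA loads the frozen base model in 4-bit precision and trains LoRA adapters on top of it, which makes larger models (e.g., 70B) trainable on limited hardware. The example below uses an 8B model.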

In [None]:
import torch
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16, # Compute in bf16
    bnb_4bit_use_double_quant=True,        # Double quantization for extra savings
)

# LoRA config (same as before)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Training config — smaller batch size for memory efficiency
training_args = SFTConfig(
    output_dir="./qlora-output",
    num_train_epochs=3,
    per_device_train_batch_size=1,     # Smaller batch for large models
    gradient_accumulation_steps=8,      # Effective batch = 1 * 8 = 8
    learning_rate=2e-4,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=5,
    # Forwarded to from_pretrained() when SFTTrainer is given the model as a string.
    # This belongs in SFTConfig; SFTTrainer itself does not accept model_init_kwargs.
    model_init_kwargs={"quantization_config": bnb_config},
)

# Train with QLoRA
trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",  # Or any model
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)

trainer.train()
trainer.save_model("./qlora-adapter")
print("QLoRA training complete!")
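
To see why 4-bit loading matters, a back-of-the-envelope estimate of weight memory is enough. The sketch below counts only the base model weights. It ignores activations, adapter weights, optimizer state, and quantization constants, so real usage is higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Compare 16-bit and 4-bit storage for the two model sizes mentioned in this notebook.
for label, params in [("8B", 8e9), ("70B", 70e9)]:
    bf16 = weight_memory_gb(params, 16)
    nf4 = weight_memory_gb(params, 4)
    print(f"{label}: {bf16:.0f} GB in 16-bit vs {nf4:.0f} GB in 4-bit")
```

By this estimate, a 70B model drops from roughly 140 GB of weights to roughly 35 GB, which is what brings it within reach of a single large GPU or a small multi-GPU node.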

## Summary: Which Approach to Use?

| Scenario | Recommended Approach |
|----------|---------------------|
| Quick experiment, small data | Vertex AI managed fine-tuning |
| Production Gemini deployment | Vertex AI managed fine-tuning |
| Full control over training | Hugging Face TRL + LoRA |
| Large model, limited GPU | Hugging Face TRL + QLoRA |
| Maximum accuracy, large data | Full fine-tuning (multi-GPU) |
| This project (Vertex AI stack) | **Vertex AI managed fine-tuning** |