# 🚀 Auto-Tuner: Fine-Tune Llama 3 with Unsloth

This notebook fine-tunes Llama 3 8B (loaded in 4-bit, with LoRA adapters) on your generated dataset using Unsloth, which advertises roughly 2x faster training with about 60% less memory.

**Requirements:**
- Free T4 GPU (Runtime → Change runtime type → T4 GPU)
- Dataset (`.jsonl`) uploaded to Google Drive under `Finetune_Jobs/datasets/`

**Time:** ~20-30 minutes for 100 examples

## 📋 Configuration

**IMPORTANT:** Update these values before running!

In [1]:
# ============================================================================
# CONFIGURATION - UPDATE THESE VALUES
# ============================================================================

# Dataset filename (must exist in Drive: Finetune_Jobs/datasets/)
DATASET_FILENAME = "dataset-20251125_095111.jsonl"  # ← CHANGE THIS

# Model name (will be saved to Drive: Finetune_Jobs/models/)
MODEL_NAME = "Financial gpt"  # ← CHANGE THIS

# Training settings
MAX_SEQ_LENGTH = 2048        # Context window size
BATCH_SIZE = 2               # Larger = faster but more memory
GRADIENT_ACCUMULATION = 4    # Effective batch size = 2 * 4 = 8
LEARNING_RATE = 2e-4         # Learning rate
NUM_EPOCHS = 3               # Training epochs
WARMUP_STEPS = 5             # Warmup steps

print("✅ Configuration loaded")
print(f"Dataset: {DATASET_FILENAME}")
print(f"Model: {MODEL_NAME}")

✅ Configuration loaded
Dataset: dataset-20251125_095111.jsonl
Model: Financial gpt
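
The training settings above determine how many optimizer updates a run performs. As a rough sketch of that arithmetic for the 100-example case mentioned at the top: the helper name `estimate_optimizer_steps` is hypothetical, and the exact count reported by the trainer may differ slightly depending on how it handles a final partial accumulation window.

```python
import math

def estimate_optimizer_steps(num_examples, batch_size, grad_accum, epochs):
    """Hypothetical helper: approximate number of optimizer updates for a run."""
    effective_batch = batch_size * grad_accum  # examples consumed per update
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

effective_batch, total_steps = estimate_optimizer_steps(
    num_examples=100, batch_size=2, grad_accum=4, epochs=3
)
print(f"Effective batch size: {effective_batch}")
print(f"Estimated optimizer steps: {total_steps}")
```

With `WARMUP_STEPS = 5`, warmup covers only the first few of those updates.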


## 🔗 Step 1: Mount Google Drive

In [2]:
from google.colab import drive
import os

# Mount Drive
drive.mount('/content/drive')

# Set paths
DRIVE_ROOT = "/content/drive/MyDrive/Finetune_Jobs"
DATASET_PATH = f"{DRIVE_ROOT}/datasets/{DATASET_FILENAME}"
MODEL_OUTPUT_DIR = f"{DRIVE_ROOT}/models/{MODEL_NAME}"

# Create directories if they don't exist
os.makedirs(f"{DRIVE_ROOT}/datasets", exist_ok=True)
os.makedirs(f"{DRIVE_ROOT}/models", exist_ok=True)

# Verify dataset exists
if not os.path.exists(DATASET_PATH):
    raise FileNotFoundError(
        f"❌ Dataset not found: {DATASET_PATH}\n\n"
        f"Please upload {DATASET_FILENAME} to Drive: Finetune_Jobs/datasets/"
    )

print("✅ Drive mounted")
print(f"✅ Dataset found: {DATASET_PATH}")
print(f"✅ Model will be saved to: {MODEL_OUTPUT_DIR}")

Mounted at /content/drive
✅ Drive mounted
✅ Dataset found: /content/drive/MyDrive/Finetune_Jobs/datasets/dataset-20251125_095111.jsonl
✅ Model will be saved to: /content/drive/MyDrive/Finetune_Jobs/models/Financial gpt
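
Because `MODEL_NAME` is used directly as a directory name, the output path above contains a space. That works in Python, but it can be awkward in shell commands. One optional approach, sketched here with a hypothetical helper `safe_dir_name` that is not part of the notebook, is to normalize the name before building the path:

```python
import re

def safe_dir_name(name):
    """Hypothetical helper: lowercase the name and replace other character runs with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")

print(safe_dir_name("Financial gpt"))
```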


## 📦 Step 2: Install Unsloth

In [3]:
%%capture
# %%capture suppresses all output from this cell, including the pip logs
# and the confirmation print below.

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

print("✅ Unsloth installed")

## 🤖 Step 3: Load Base Model (Llama 3 8B)

In [8]:
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=MAX_SEQ_LENGTH,
    dtype=None,
    load_in_4bit=True,
)

# Llama 3-style chat template, set explicitly on the tokenizer.
# A system message, if present, is prepended to the first user turn.
tokenizer.chat_template = (
    "{% if messages[0]['role'] == 'system' %}"
    "{% set loop_messages = messages[1:] %}"
    "{% set system_message = messages[0]['content'] %}"
    "{% else %}"
    "{% set loop_messages = messages %}"
    "{% set system_message = false %}"
    "{% endif %}"
    "{% for message in loop_messages %}"
    "{% if (message['role'] == 'user') != (loop.index % 2 == 1) %}"
    "{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}"
    "{% endif %}"
    "{% if loop.index == 1 and system_message != false %}"
    "{% set content = system_message + '\n' + message['content'] %}"
    "{% else %}"
    "{% set content = message['content'] %}"
    "{% endif %}"
    "{% if message['role'] == 'user' %}"
    "{{ '<|start_header_id|>user<|end_header_id|>\n' + content + '<|eot_id|>' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n' + content + '<|eot_id|>' }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n' }}"
    "{% endif %}"
)

print("✅ Base model loaded (Llama 3 8B 4-bit)")
print("Model size: ~4.5GB")
print(f"Max sequence length: {MAX_SEQ_LENGTH}")

==((====))==  Unsloth 2025.11.4: Fast Llama patching. Transformers: 4.57.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
✅ Base model loaded (Llama 3 8B 4-bit)
Model size: ~4.5GB
Max sequence length: 2048


## 🎛️ Step 4: Add LoRA Adapters

In [5]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

print("✅ LoRA adapters added")
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")

Unsloth 2025.11.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


✅ LoRA adapters added
Trainable parameters: 41,943,040
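
The trainable parameter count reported above can be reproduced by hand. The sketch below assumes the layer dimensions from the published Llama 3 8B configuration (hidden size 4096, key/value projection width 1024, MLP intermediate size 14336, 32 layers); those values are not stated in this notebook. Each LoRA adapter on a projection with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights.

```python
# Assumed Llama 3 8B dimensions (from its published config, not from this notebook)
HIDDEN = 4096          # hidden size
KV_DIM = 1024          # 8 key/value heads * 128 head dim (grouped-query attention)
INTERMEDIATE = 14336   # MLP intermediate size
NUM_LAYERS = 32
RANK = 16              # r=16, as in the cell above

# (in_features, out_features) for each targeted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank, projections, num_layers):
    # Each adapter adds A (rank x in) and B (out x rank): rank * (in + out) weights
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in projections.values())
    return per_layer * num_layers

total = lora_param_count(RANK, PROJECTIONS, NUM_LAYERS)
print(f"Trainable parameters: {total:,}")
```

Under these assumptions the total works out to 41,943,040, matching the cell output.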


## 📊 Step 5: Load and Preview Dataset

In [6]:
from datasets import load_dataset
import json

dataset = load_dataset("json", data_files=DATASET_PATH, split="train")

print(f"✅ Dataset loaded: {len(dataset)} examples")
print(f"\nFirst example:")
print(json.dumps(dataset[0], indent=2))

print(f"\n📋 Preview of conversations:")
for i in range(min(3, len(dataset))):
    messages = dataset[i]['messages']
    user_msg = next((m['content'] for m in messages if m['role'] == 'user'), '')
    assistant_msg = next((m['content'] for m in messages if m['role'] == 'assistant'), '')
    print(f"\nExample {i+1}:")
    print(f"  User: {user_msg[:80]}...")
    print(f"  Assistant: {assistant_msg[:80]}...")

Generating train split: 0 examples [00:00, ? examples/s]

✅ Dataset loaded: 100 examples

First example:
{
  "messages": [
    {
      "role": "user",
      "content": "Calculate DCF valuation for a company with $15M revenue, 25% annual growth for 3 years, 12% discount rate, 2% terminal growth"
    },
    {
      "role": "assistant",
      "content": "DCF Valuation Analysis:\nProjected Cash Flows:\nYear 1: $18.75M (25% growth)\nYear 2: $23.44M (25% growth)\nYear 3: $29.3M (25% growth)\nTerminal Value: $29.3M \u00d7 1.02 / (0.12 - 0.02) = $317.39M\nPresent Values:\nPV Year 1-3: $43.95M\nPV Terminal: $173.11M\nEnterprise Value: $217.06M\nRecommendation: Fair value is $217.06M. BUY if trading below $195.65M (10% margin of safety)."
    }
  ]
}

📋 Preview of conversations:

Example 1:
  User: Calculate DCF valuation for a company with $15M revenue, 25% annual growth for 3...
  Assistant: DCF Valuation Analysis:
Projected Cash Flows:
Year 1: $18.75M (25% growth)
Year ...

Example 2:
  User: Analyze portfolio with 70% stocks ($70k), 30% bonds 
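
The chat template set in Step 3 raises an exception when roles do not alternate user/assistant, so it can help to check records before formatting. The following is a minimal sketch, not part of the original notebook: `validate_record` is a hypothetical helper, and the two sample lines are made up for illustration rather than taken from the dataset.

```python
import json

def validate_record(record):
    """Hypothetical check: return a list of problems found in one dataset record."""
    problems = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    # An optional leading system message is skipped, as in the chat template
    turns = messages[1:] if messages[0].get("role") == "system" else messages
    for position, message in enumerate(turns):
        expected = "user" if position % 2 == 0 else "assistant"
        if message.get("role") != expected:
            problems.append(
                f"turn {position}: expected role '{expected}', got {message.get('role')!r}"
            )
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"turn {position}: empty or non-string content")
    return problems

# Two illustrative JSONL lines (made up for this sketch)
sample_lines = [
    '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]}',
    '{"messages": [{"role": "assistant", "content": "Hello"}]}',
]
for line_number, line in enumerate(sample_lines, start=1):
    issues = validate_record(json.loads(line))
    print(line_number, "OK" if not issues else issues)
```

In the notebook, the same check could be applied to each element of `dataset` before calling `dataset.map`.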

## 🔄 Step 6: Format Dataset for Training

In [9]:
def format_chat(example):
    """Format messages using chat template"""
    messages = example['messages']
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    return {"text": text}

dataset = dataset.map(format_chat, batched=False)

print("✅ Dataset formatted for training")
print(f"\nFormatted example (first 500 chars):")
print(dataset[0]['text'][:500] + "...")

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

✅ Dataset formatted for training

Formatted example (first 500 chars):
<|start_header_id|>user<|end_header_id|>
Calculate DCF valuation for a company with $15M revenue, 25% annual growth for 3 years, 12% discount rate, 2% terminal growth<|eot_id|><|start_header_id|>assistant<|end_header_id|>
DCF Valuation Analysis:
Projected Cash Flows:
Year 1: $18.75M (25% growth)
Year 2: $23.44M (25% growth)
Year 3: $29.3M (25% growth)
Terminal Value: $29.3M × 1.02 / (0.12 - 0.02) = $317.39M
Present Values:
PV Year 1-3: $43.95M
PV Terminal: $173.11M
Enterprise Value: $217.06M
Rec...
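
To make the formatted string above easier to reason about, the template logic from Step 3 can be mirrored in plain Python. This is a sketch for illustration only: `render_chat` is a hypothetical function, and the actual formatting in this notebook is done by `tokenizer.apply_chat_template`.

```python
def render_chat(messages, add_generation_prompt=False):
    """Hypothetical plain-Python mirror of the Jinja chat template from Step 3."""
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]
        loop_messages = messages[1:]
    else:
        system_message = None
        loop_messages = messages

    parts = []
    for index, message in enumerate(loop_messages, start=1):
        # Odd turns must be 'user'; even turns must not be 'user'
        if (message["role"] == "user") != (index % 2 == 1):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        content = message["content"]
        if index == 1 and system_message is not None:
            content = system_message + "\n" + content  # fold system text into first turn
        if message["role"] in ("user", "assistant"):
            parts.append(
                f"<|start_header_id|>{message['role']}<|end_header_id|>\n{content}<|eot_id|>"
            )
    if add_generation_prompt:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n")
    return "".join(parts)

example = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]
print(render_chat(example))
```

Training uses `add_generation_prompt=False` so each example ends with the assistant turn, while inference sets it to `True` so the model continues from an open assistant header.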


## 🏋️ Step 7: Configure Trainer

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION,
        warmup_steps=WARMUP_STEPS,
        num_train_epochs=NUM_EPOCHS,
        learning_rate=LEARNING_RATE,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

print("✅ Trainer configured")
print(f"Effective batch size: {BATCH_SIZE * GRADIENT_ACCUMULATION}")
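
The combination of `warmup_steps` and `lr_scheduler_type="linear"` means the learning rate ramps up linearly during warmup and then decays linearly toward zero. The sketch below illustrates that shape with a hypothetical function `linear_schedule_lr`. The value `total_steps=39` is an assumed estimate for 100 examples at an effective batch size of 8 over 3 epochs, not a number read from the trainer.

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Hypothetical sketch of linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup period
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr down to 0 over the remaining steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 1, 5, 20, 39):
    lr = linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=39)
    print(f"step {step:>2}: lr = {lr:.2e}")
```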

## 🚀 Step 8: Start Training!

**This will take 20-30 minutes. Don't close the browser!**

In [None]:
import time

print("🚀 Starting training...\n")
start_time = time.time()

trainer_stats = trainer.train()

elapsed = time.time() - start_time
print(f"\n✅ Training complete!")
print(f"Time: {elapsed/60:.1f} minutes")
print(f"Final loss: {trainer_stats.training_loss:.4f}")

## 💾 Step 9: Save Model to Google Drive

In [None]:
# Saves the LoRA adapter weights and tokenizer files (not merged full-model weights)
model.save_pretrained(MODEL_OUTPUT_DIR)
tokenizer.save_pretrained(MODEL_OUTPUT_DIR)

# Save metadata
import json
import time

metadata = {
    "model_name": MODEL_NAME,
    "dataset": DATASET_FILENAME,
    "base_model": "unsloth/llama-3-8b-bnb-4bit",
    "training_loss": float(trainer_stats.training_loss),
    "num_examples": len(dataset),
    "num_epochs": NUM_EPOCHS,
    "training_time_minutes": elapsed / 60,
    "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")
}

with open(f"{MODEL_OUTPUT_DIR}/metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)

print(f"✅ Model saved to: {MODEL_OUTPUT_DIR}")
print(f"\nMetadata:")
print(json.dumps(metadata, indent=2))

## 🧪 Step 10: Test the Model

In [None]:
FastLanguageModel.for_inference(model)

test_messages = [
    {"role": "user", "content": "Hello! Can you help me?"}
]

inputs = tokenizer.apply_chat_template(
    test_messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

print("🤖 Testing model...\n")
print("User: Hello! Can you help me?\n")

outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=128,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens; outputs[0] also contains the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(f"Assistant: {response}")
print("\n🎉 All done! Your model is ready to use.")