# 🦙 Fine-tune LLaMA 3.1 with Unsloth

This notebook fine-tunes the LLaMA 3.1 8B Instruct model to generate structured JSON output, using QLoRA (a 4-bit quantized base model plus trainable LoRA adapters) through Unsloth.

## Prerequisites
- Google Colab with GPU runtime (T4 or better)
- Hugging Face account with access to LLaMA 3.1

In [None]:
# Install dependencies (trl provides the SFTTrainer imported below)
!pip install -q unsloth
!pip install -q transformers datasets peft accelerate bitsandbytes trl

In [None]:
# Imports
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
import torch

## 1. Load Base Model with 4-bit Quantization

In [None]:
# Configuration
max_seq_length = 2048
dtype = None  # Auto-detect
load_in_4bit = True

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print("Model loaded successfully!")
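
As a rough illustration of why `load_in_4bit=True` matters on a 16 GB T4, the sketch below estimates the memory needed for the weights alone at different precisions. The parameter count is an assumed approximation, and the estimate ignores activations, LoRA weights, optimizer state, and quantization metadata, so actual usage will be higher.

```python
# Back-of-the-envelope weight-memory estimate (weights only, not a measurement).
PARAM_COUNT = 8.03e9  # assumption: approximate parameter count of an 8B model


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


for label, bits in [("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gib(PARAM_COUNT, bits):.1f} GiB")
```

Under these assumptions, the 16-bit weights alone nearly fill a T4, while the 4-bit weights leave room for activations and adapter training.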

## 2. Add LoRA Adapters

In [None]:
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
)

print("LoRA adapters added!")
model.print_trainable_parameters()
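
To sanity-check the trainable parameter count, the following sketch counts the LoRA parameters by hand. Each adapted linear layer gains two matrices, `A` (rank x in_features) and `B` (out_features x rank). The layer dimensions are assumptions taken from the published LLaMA 3.1 8B configuration and are not read from the loaded model, so adjust them for other checkpoints.

```python
# Assumed LLaMA 3.1 8B dimensions; adjust for other models.
HIDDEN = 4096         # hidden_size
INTERMEDIATE = 14336  # intermediate_size (MLP width)
KV_DIM = 1024         # num_key_value_heads (8) * head_dim (128)
NUM_LAYERS = 32       # num_hidden_layers
RANK = 16             # matches r=16 in get_peft_model

# (in_features, out_features) for each targeted projection
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one adapter: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"Trainable LoRA parameters: {total:,}")
```

With these assumed dimensions the total is 41,943,040, well under 1% of the base model. Doubling `r` doubles this count, which is worth keeping in mind when tuning the rank.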

## 3. Load and Prepare Dataset

In [None]:
# Define prompt template
PROMPT_TEMPLATE = """### Instruction:
You are an AI that converts natural language instructions into structured JSON action plans.
Given the following instruction, output a valid JSON with these fields:
- object: the object to manipulate
- initial_position: where the object currently is
- action: what to do (move, rotate, scale)
- target_position: the destination or target state

### Input:
{instruction}

### Response:
{output}"""

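def format_prompts(examples):
    """Render each instruction/output pair into a single training string."""
    texts = []
    for instruction, output in zip(examples['instruction'], examples['output']):
        text = PROMPT_TEMPLATE.format(
            instruction=instruction,
            output=output
        )
        # Append EOS so the model learns to stop after the JSON object
        texts.append(text + tokenizer.eos_token)
    return {"text": texts}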

In [None]:
# Load dataset (from a local JSONL file or an in-memory sample)
# Option 1: Load from local JSONL
# dataset = load_dataset('json', data_files='../data/train.jsonl')

# Option 2: Create sample dataset for demo
sample_data = {
    "instruction": [
        "Move the red box to the blue platform",
        "Rotate the green sphere 90 degrees",
        "Scale the yellow cube to twice its size",
    ],
    "output": [
        '{"object": "red box", "initial_position": "floor", "action": "move", "target_position": "top of blue platform"}',
        '{"object": "green sphere", "initial_position": "center", "action": "rotate", "target_position": "90 degrees clockwise"}',
        '{"object": "yellow cube", "initial_position": "origin", "action": "scale", "target_position": "2x original size"}',
    ]
}

from datasets import Dataset
dataset = Dataset.from_dict(sample_data)
dataset = dataset.map(format_prompts, batched=True)

print(f"Dataset size: {len(dataset)}")
print(f"Sample:\n{dataset[0]['text'][:500]}...")
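
For Option 1, the notebook does not show what `train.jsonl` contains. The sketch below assumes a schema that mirrors `sample_data`: one JSON object per line with string fields `instruction` and `output`, where `output` is itself a JSON string. The helper name `read_jsonl_columns` is hypothetical; it reads such a file into the same column layout and fails early if a target is not valid JSON.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl_columns(path):
    """Read a JSONL file into the column-oriented dict used by sample_data."""
    columns = {"instruction": [], "output": []}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            json.loads(record["output"])  # raises if the target is not valid JSON
            for key in columns:
                columns[key].append(record[key])
    return columns


# Demo: write one record in the assumed format, then read it back.
record = {
    "instruction": "Move the red box to the blue platform",
    "output": json.dumps({
        "object": "red box",
        "initial_position": "floor",
        "action": "move",
        "target_position": "top of blue platform",
    }),
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.jsonl"
    path.write_text(json.dumps(record) + "\n", encoding="utf-8")
    columns = read_jsonl_columns(path)

print(columns["instruction"])
```

Validating targets before training is cheap and catches malformed examples that would otherwise teach the model to emit broken JSON.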

## 4. Training Configuration

In [None]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    max_steps=100,  # Increase for full training
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=10,
    save_steps=50,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=42,
)
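
The interaction between batch size, gradient accumulation, and `max_steps` is easy to misjudge, so here is a small sketch of the arithmetic. The helper `training_budget` is hypothetical and assumes a single GPU and full batches; the real epoch count differs slightly when the last batch of an epoch is smaller.

```python
def training_budget(per_device_batch, grad_accum, max_steps, dataset_size, num_devices=1):
    """Summarize how much data a step-limited training run consumes."""
    effective_batch = per_device_batch * grad_accum * num_devices
    samples_seen = effective_batch * max_steps
    return {
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
        "approx_epochs": samples_seen / dataset_size,
    }


# Values from the TrainingArguments above, with the 3-example demo dataset
budget = training_budget(per_device_batch=2, grad_accum=4, max_steps=100, dataset_size=3)
print(budget)
```

With the three-example demo dataset, 100 steps correspond to hundreds of passes over the same data, so the demo run mostly shows that the pipeline works. For a real dataset, scale `max_steps` to the number of epochs you actually want.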

In [None]:
# Initialize trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=training_args,
)

## 5. Train the Model

In [None]:
# Start training
trainer_stats = trainer.train()

print("Training complete!")
print(f"Training time: {trainer_stats.metrics['train_runtime']:.2f} seconds")

## 6. Save the Model

In [None]:
# Save LoRA adapters
model.save_pretrained("text-to-action-lora")
tokenizer.save_pretrained("text-to-action-lora")

print("Model saved to 'text-to-action-lora'")

In [None]:
# Optional: Save merged model for easier inference
# model.save_pretrained_merged("text-to-action-merged", tokenizer, save_method="merged_16bit")

## 7. Quick Inference Test

In [None]:
# Test inference
FastLanguageModel.for_inference(model)

test_instruction = "Move the purple cylinder to the corner"

# Reuse the training template with an empty response so the prompt matches training exactly
test_prompt = PROMPT_TEMPLATE.format(instruction=test_instruction, output="")

inputs = tokenizer(test_prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.1,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())
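
Printing the response does not tell you whether it is usable. A minimal sketch of a validity check follows, assuming the four fields and three actions named in the prompt template. The helper `parse_action_plan` is hypothetical: it extracts the outermost `{...}` span, parses it, and returns `None` when anything is missing.

```python
import json

REQUIRED_FIELDS = ("object", "initial_position", "action", "target_position")
ALLOWED_ACTIONS = {"move", "rotate", "scale"}


def parse_action_plan(text):
    """Return the parsed plan as a dict, or None if the text is not a valid plan."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the text
    try:
        plan = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(plan, dict):
        return None
    if any(field not in plan for field in REQUIRED_FIELDS):
        return None
    if plan["action"] not in ALLOWED_ACTIONS:
        return None
    return plan


example = ('{"object": "purple cylinder", "initial_position": "center", '
           '"action": "move", "target_position": "corner"}')
print(parse_action_plan(example))
print(parse_action_plan("I cannot help with that."))
```

In the notebook, you would pass the text after `### Response:` to this function. Whether to reject unknown actions or merely flag them is a design choice that depends on the downstream consumer.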

## Next Steps

1. **Expand dataset** - Add more diverse instruction-action pairs
2. **Hyperparameter tuning** - Experiment with LoRA rank and learning rate
3. **Evaluation** - Measure output quality on a held-out test set
4. **Deploy** - Export for Ollama or serve with FastAPI
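
For the evaluation step, one possible starting point is sketched below. It scores raw model outputs against reference JSON strings on three measures: valid-JSON rate, exact-match rate, and per-field accuracy. The function name, the metric choice, and the strict string comparison are all assumptions; free-text fields such as `target_position` may need a more lenient comparison.

```python
import json

FIELDS = ("object", "initial_position", "action", "target_position")


def score_predictions(predictions, references):
    """Compare raw model outputs against reference JSON strings."""
    valid = exact = 0
    field_hits = {field: 0 for field in FIELDS}
    for pred_text, ref_text in zip(predictions, references):
        ref = json.loads(ref_text)
        try:
            pred = json.loads(pred_text)
        except json.JSONDecodeError:
            continue  # invalid JSON counts as a miss on every metric
        if not isinstance(pred, dict):
            continue
        valid += 1
        exact += pred == ref
        for field in FIELDS:
            field_hits[field] += pred.get(field) == ref.get(field)
    n = len(references)  # assumes a non-empty test set
    return {
        "valid_json_rate": valid / n,
        "exact_match_rate": exact / n,
        "field_accuracy": {f: hits / n for f, hits in field_hits.items()},
    }


references = [
    '{"object": "red box", "initial_position": "floor", "action": "move", "target_position": "shelf"}',
    '{"object": "green sphere", "initial_position": "center", "action": "rotate", "target_position": "90 degrees"}',
]
predictions = [
    '{"object": "red box", "initial_position": "floor", "action": "move", "target_position": "shelf"}',
    'not json at all',
]
scores = score_predictions(predictions, references)
print(scores)
```

Tracking the valid-JSON rate separately from field accuracy helps distinguish formatting failures from content errors.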