# Colab 1: Full Fine-tuning with TinyLlama
This notebook demonstrates full fine-tuning (not LoRA) using Unsloth.

## Key Points:
- Full fine-tuning updates ALL model parameters
- Using TinyLlama (1.1B parameters) as our base model
- Dataset: the first 5,000 examples of `yahma/alpaca-cleaned`, formatted for instruction tuning
- We'll compare outputs BEFORE and AFTER training to see the difference
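
Because every parameter is trainable, GPU memory has to hold the weights, a gradient for each weight, and the optimizer state. A rough back-of-the-envelope sketch follows. It assumes 16-bit weights and gradients and two 8-bit AdamW moment buffers (in line with the `optim="adamw_8bit"` setting used in the training cell), and it ignores activations and framework overhead. The function name `estimate_training_bytes` is made up for this illustration.

```python
# Rough memory estimate for full fine-tuning (illustrative assumptions only).
def estimate_training_bytes(n_params, weight_bytes=2, grad_bytes=2, optim_bytes=2):
    """Bytes for weights + gradients + optimizer state, excluding activations.

    Defaults assume 16-bit weights and gradients plus two 8-bit AdamW
    moments (1 byte each); real usage depends on the precision setup.
    """
    return n_params * (weight_bytes + grad_bytes + optim_bytes)


n_params = 1_100_000_000  # "1.1B" rounded; the notebook prints the exact count
eight_bit = estimate_training_bytes(n_params)
full_adam = estimate_training_bytes(n_params, optim_bytes=8)  # two fp32 moments

print(f"8-bit AdamW : {eight_bit / 1e9:.1f} GB")
print(f"32-bit AdamW: {full_adam / 1e9:.1f} GB")
```

Under these assumptions the optimizer choice alone changes the estimate by a factor of two, which is one reason the notebook uses an 8-bit optimizer.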

In [None]:
# Install Unsloth (simplified version that works)
!pip install -q "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install -q --no-deps xformers trl peft accelerate bitsandbytes

In [None]:
# Import unsloth before transformers/trl so its patches are applied first
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

In [None]:
# Model configuration
max_seq_length = 2048
dtype = None  # Auto-detect (bfloat16 where the GPU supports it, otherwise float16)
load_in_4bit = False  # Full fine-tuning needs unquantized weights

# Load TinyLlama model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/tinyllama",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    full_finetuning=True,  # Available in recent Unsloth releases; remove if your version rejects it
)

# IMPORTANT: For full fine-tuning, we DON'T call get_peft_model (that would add LoRA adapters)
# Instead, we put the base model itself into training mode
model = FastLanguageModel.for_training(model)

# Count trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print("Full Fine-tuning Mode:")
print(f"Trainable params: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")
print(f"Total params: {total_params:,}")
print("\nALL parameters will be updated during training!")
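
As a cross-check on the printed total, the parameter count can be reproduced from the architecture alone. The sketch below assumes the published TinyLlama-1.1B configuration (hidden size 2048, MLP size 5632, 22 layers, 32 attention heads, 4 key/value heads, vocabulary 32,000, untied output head, no biases); verify these against `model.config` before relying on them. `count_llama_params` is a hypothetical helper, not an Unsloth function.

```python
def count_llama_params(hidden, intermediate, layers, heads, kv_heads, vocab):
    """Count weights of a Llama-style decoder (no biases, untied lm_head)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v projections
    mlp = 3 * hidden * intermediate                        # gate, up, down projections
    norms = 2 * hidden                                     # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden                        # input embedding + lm_head
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


# Assumed TinyLlama-1.1B values; check model.config in the notebook.
total = count_llama_params(
    hidden=2048, intermediate=5632, layers=22, heads=32, kv_heads=4, vocab=32000
)
print(f"{total:,}")  # 1,100,048,384 under these assumptions
```

If the notebook reports a different total, the loaded checkpoint probably differs from the assumed configuration (for example, a padded vocabulary).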

## Test Model BEFORE Training
Let's see how the model performs before any fine-tuning:

In [None]:
# Define the Alpaca prompt format
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Test function: generate a response for one instruction using the Alpaca prompt
def test_model(model, tokenizer, instruction, input_text=""):
    FastLanguageModel.for_inference(model)  # Switch Unsloth to inference mode
    prompt = alpaca_prompt.format(instruction, input_text, "")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,  # Needed for temperature to take effect; otherwise decoding is greedy
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the response marker (whole string if the marker is missing)
    return response.split("### Response:", 1)[-1].strip()

print("=" * 50)
print("BEFORE TRAINING:")
print("=" * 50)
print("\nQ: What is 2+2?")
print("A:", test_model(model, tokenizer, "What is 2+2?"))
print("\nQ: Name three colors")
print("A:", test_model(model, tokenizer, "Name three colors"))
print("\nQ: Write a haiku about coding")
print("A:", test_model(model, tokenizer, "Write a haiku about coding"))
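
To make the prompt handling concrete without loading a model, the sketch below renders the Alpaca template for one instruction and applies a response-extraction step to a hand-written string standing in for decoded model output. The stand-in text is invented for illustration and is not a real generation; `extract_response` is a hypothetical helper name.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

MARKER = "### Response:"


def extract_response(decoded):
    """Return the text after the response marker (whole string if it is missing)."""
    return decoded.split(MARKER, 1)[-1].strip()


# Empty response slot: the model is expected to continue from here.
prompt = ALPACA_PROMPT.format("What is 2+2?", "", "")
print(prompt.endswith("### Response:\n"))

# Invented stand-in for tokenizer.decode(...) output: the prompt plus a continuation.
fake_decoded = prompt + "4"
print(extract_response(fake_decoded))
```

Splitting on the marker avoids the off-by-offset slice that `str.find` produces when the marker is absent and it returns -1.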

In [None]:
# Prepare the Alpaca dataset
EOS_TOKEN = tokenizer.eos_token  # Appended so the model learns where a response ends

def formatting_prompts_func(examples):
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        # Without EOS_TOKEN, generation tends to run on until max_new_tokens
        texts.append(alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN)
    return {"text": texts}

# Load dataset (first 5,000 samples only, to keep the demo short)
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:5000]")
dataset = dataset.map(formatting_prompts_func, batched=True)

print(f"Training on {len(dataset)} samples")
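
`dataset.map(..., batched=True)` passes the function a dict of columns (lists), not a list of rows. The sketch below runs the same kind of formatting logic on a two-row toy batch so the input and output shapes are visible. The rows, the shortened template, and the `"</s>"` end-of-sequence string are placeholders chosen for illustration; with a real tokenizer the value would come from `tokenizer.eos_token`.

```python
TEMPLATE = "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"  # shortened stand-in
EOS_TOKEN = "</s>"  # placeholder; use tokenizer.eos_token with a real tokenizer


def format_batch(examples):
    """Turn a column-oriented batch into one training string per row."""
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        texts.append(TEMPLATE.format(instruction, input_text, output) + EOS_TOKEN)
    return {"text": texts}


toy_batch = {
    "instruction": ["Add the numbers.", "Name a primary color."],
    "input": ["2 and 3", ""],
    "output": ["5", "Red"],
}
formatted = format_batch(toy_batch)
print(len(formatted["text"]))
print(formatted["text"][0])
```

Each output string ends with the end-of-sequence marker, which gives the model a signal for where a response stops.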

In [None]:
# Training configuration for FULL fine-tuning
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
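        per_device_train_batch_size=1,  # Small batch size to fit full fine-tuning in memory
        gradient_accumulation_steps=8,  # Effective batch size: 1 x 8 = 8 sequences per optimizer step
        warmup_steps=10,
        max_steps=200,  # 200 steps x 8 sequences = 1,600 examples, less than one pass over the subset
        learning_rate=1e-5,  # Kept low because every weight in the model is being updated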
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir="outputs_full_finetuning",
        save_steps=50,
        report_to="none",  # Disable wandb
    ),
)

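# The BEFORE test switched the model to inference mode; switch back before training
FastLanguageModel.for_training(model)

# Disable the KV cache during training (it only helps during generation)
model.config.use_cache = False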
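
The settings above determine how much data the run actually sees and how the learning rate evolves. The sketch below works through both: the effective batch size and dataset coverage, and a linear-warmup-plus-cosine curve written out by hand. The schedule function is a plain-Python approximation of what `lr_scheduler_type="cosine"` with `warmup_steps=10` is expected to produce (single GPU assumed), not the trainer's own code.

```python
import math

per_device_batch = 1
grad_accum = 8
max_steps = 200
warmup_steps = 10
base_lr = 1e-5
dataset_size = 5000

effective_batch = per_device_batch * grad_accum  # sequences per optimizer step
examples_seen = effective_batch * max_steps
print(f"Effective batch size: {effective_batch}")
print(f"Examples seen: {examples_seen} ({examples_seen / dataset_size:.0%} of the subset)")


def lr_at(step):
    """Linear warmup to base_lr, then half-cosine decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 10, 105, 200):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

With these numbers the run covers about a third of the 5,000-example subset, so the model never sees most of the data; raising `max_steps` is the direct way to change that.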

In [None]:
# Start training
print("Starting FULL fine-tuning training...")
print("This updates ALL model parameters.")
print("Watch the loss decrease over time:\n")

trainer_stats = trainer.train()

print("\nTraining completed!")
print(f"Final loss: {trainer_stats.training_loss:.4f}")
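
To check the loss trend numerically rather than by eye, the logged values can be pulled out of `trainer.state.log_history`, which the Hugging Face `Trainer` fills with one dict per logging event. The sketch below uses a small hand-made list in that shape (the numbers are invented placeholders, not results from this run) and compares the mean loss of the first and last few logs. `loss_trend` is a hypothetical helper.

```python
def loss_trend(log_history, window=3):
    """Return (mean of first `window` losses, mean of last `window` losses)."""
    # Only per-step logging entries carry a "loss" key; the final summary entry does not.
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < window:
        raise ValueError("not enough logged losses")
    first = sum(losses[:window]) / window
    last = sum(losses[-window:]) / window
    return first, last


# Invented placeholder values shaped like trainer.state.log_history.
example_history = [
    {"step": 10, "loss": 1.50},
    {"step": 20, "loss": 1.40},
    {"step": 30, "loss": 1.30},
    {"step": 40, "loss": 1.25},
    {"step": 50, "loss": 1.20},
    {"step": 60, "loss": 1.15},
    {"step": 60, "train_loss": 1.30},  # summary entry without a "loss" key
]

first, last = loss_trend(example_history)
print(f"first {first:.2f} -> last {last:.2f}")
# In the notebook: first, last = loss_trend(trainer.state.log_history)
```

Averaging over a window smooths out the step-to-step noise that a batch size of 8 sequences tends to produce.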

## Test Model AFTER Training
Now let's see how the model performs after full fine-tuning:

In [None]:
# Re-enable cache for inference
model.config.use_cache = True

print("=" * 50)
print("AFTER TRAINING:")
print("=" * 50)
print("\nQ: What is 2+2?")
print("A:", test_model(model, tokenizer, "What is 2+2?"))
print("\nQ: Name three colors")
print("A:", test_model(model, tokenizer, "Name three colors"))
print("\nQ: Write a haiku about coding")
print("A:", test_model(model, tokenizer, "Write a haiku about coding"))
print("\nQ: Explain photosynthesis in simple terms")
print("A:", test_model(model, tokenizer, "Explain photosynthesis in simple terms"))
print("\nQ: What is the capital of France?")
print("A:", test_model(model, tokenizer, "What is the capital of France?"))

In [None]:
# Save the model
model.save_pretrained("tinyllama_full_finetuned")
tokenizer.save_pretrained("tinyllama_full_finetuned")
print("Model saved successfully!")

# Check the size of the saved checkpoint on disk (top-level files only)
import os

save_dir = "tinyllama_full_finetuned"
model_size = sum(
    os.path.getsize(os.path.join(save_dir, name))
    for name in os.listdir(save_dir)
    if os.path.isfile(os.path.join(save_dir, name))
)
print(f"Full model size: {model_size / 1024 / 1024:.2f} MiB")
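
`os.listdir` only sees files at the top level of the save directory. A variant that also counts nested files is sketched below with `os.walk`, exercised on a temporary directory so it runs without a trained model. The helper name `dir_size_bytes` is made up for this example. For orientation, 1.1B parameters stored in 16 bits come to roughly 2.2 GB, so a result far from that suggests a different dtype or missing files.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path`, including subdirectories."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"\0" * 1024)
    os.makedirs(os.path.join(tmp, "nested"))
    with open(os.path.join(tmp, "nested", "b.bin"), "wb") as f:
        f.write(b"\0" * 2048)
    demo_size = dir_size_bytes(tmp)

print(demo_size)
# In the notebook: dir_size_bytes("tinyllama_full_finetuned") / 1024**2 gives MiB
```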

## Key Observations about Full Fine-tuning:

### What Happened:
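1. **ALL parameters updated**: 100% of the model's 1.1B parameters were trainable, as the parameter count printed after loading confirms
2. **Memory intensive**: Gradients and optimizer state are kept for every weight, so this needs more GPU memory than LoRA would
3. **Task adaptation**: With 200 steps (about 1,600 examples), the model mainly learns the Alpaca instruction/response format; use the before/after outputs to judge how far this went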

### Common Issues & Solutions:
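- **Repetitive outputs**: Check that each training example ends with the EOS token, then increase training steps and use more diverse data
- **Wrong answers**: Model needs more training or better data quality
- **Memory errors**: The batch size is already 1 here, so reduce `max_seq_length` or use LoRA instead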
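
"Repetitive outputs" can be quantified before deciding whether more training is needed. One simple option, sketched below, is the fraction of distinct word n-grams in a response (often called distinct-n); values near 1.0 mean little repetition. This metric is an addition for illustration and is not part of the notebook's workflow; `distinct_n` is a hypothetical helper and the sample strings are made up.

```python
def distinct_n(text, n=2):
    """Fraction of unique word n-grams in `text` (1.0 = no repeated n-gram)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 1.0  # too short to contain any n-gram, so nothing repeats
    return len(set(ngrams)) / len(ngrams)


varied = "red green and blue are three common colors"
looping = "red red red red red red red red"

print(f"{distinct_n(varied):.2f}")
print(f"{distinct_n(looping):.2f}")
```

Running this on the before and after outputs of the same prompt gives a crude but comparable number for how much looping the model does.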

### Pros of Full Fine-tuning:
- Highest performance ceiling, since no weight is frozen
- Complete adaptation to the new task
- No adapter weights to load or merge at inference time

### Cons of Full Fine-tuning:
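- High memory requirements (weights, gradients, and optimizer state for every parameter)
- Slower training steps than LoRA
- Risk of catastrophic forgetting
- Large model files to store: each fine-tuned variant is a full copy of the model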

### When to Use:
- When you need maximum performance
- When you have enough GPU memory
- When fine-tuning for a very different domain
- When you don't need to preserve original capabilities

### Video Recording Points:
1. Show the parameter count (100% trainable)
2. Demonstrate before/after outputs
3. Explain memory usage differences
4. Show training loss decreasing
5. Compare this with LoRA in Colab 2
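
For the LoRA comparison, it helps to know roughly how many parameters an adapter would train. The sketch below counts LoRA weights for a Llama-style model, assuming rank 16 applied to all seven attention and MLP projections and the published TinyLlama-1.1B dimensions. The rank, target modules, and dimensions are assumptions, since the Colab 2 settings are not shown in this notebook.

```python
def lora_params(hidden, intermediate, layers, heads, kv_heads, rank):
    """LoRA adds rank * (d_in + d_out) weights per adapted linear layer."""
    kv_dim = kv_heads * (hidden // heads)
    shapes = [
        (hidden, hidden),        # q_proj
        (hidden, kv_dim),        # k_proj
        (hidden, kv_dim),        # v_proj
        (hidden, hidden),        # o_proj
        (hidden, intermediate),  # gate_proj
        (hidden, intermediate),  # up_proj
        (intermediate, hidden),  # down_proj
    ]
    return layers * sum(rank * (d_in + d_out) for d_in, d_out in shapes)


FULL_PARAMS = 1_100_048_384  # total under the assumed TinyLlama-1.1B configuration
adapter = lora_params(
    hidden=2048, intermediate=5632, layers=22, heads=32, kv_heads=4, rank=16
)
print(f"LoRA trainable: {adapter:,} ({100 * adapter / FULL_PARAMS:.2f}% of full)")
```

Under these assumptions LoRA trains on the order of 1% of the weights that full fine-tuning updates, which is the contrast the parameter-count printout is meant to highlight.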