# Fine-Tuning Your First LLM: A Hands-On Tutorial with LoRA

**Welcome to the complete hands-on tutorial for fine-tuning language models!**

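In this notebook, we'll walk through the entire process of fine-tuning a small open-source LLM (`distilgpt2`) using LoRA (Low-Rank Adaptation), which freezes the base weights and trains small low-rank adapter matrices instead. The model is small enough to experiment with on modest hardware, which makes it a good fit for learning and teaching.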

## What We'll Cover:
1. Environment setup
2. Model and dataset selection
3. Data preprocessing
4. LoRA configuration
5. Training
6. Testing and inference
7. Saving models

Let's get started! 🚀

## Step 0: Environment Setup

First, let's install the required libraries. If you're in Google Colab or are missing any dependencies, uncomment the `pip install` line before running the cell.

In [None]:
# Install required packages (uncomment if needed)
# !pip install -q "transformers>=4.44.0" "datasets>=2.19.0" peft accelerate bitsandbytes sentencepiece

# Import all necessary libraries
import torch
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer,
    pipeline,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

print("✅ All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Step 1: Choose Base Model and Configure Settings

We'll start with a small model for faster training and experimentation.

In [None]:
# Configuration
BASE_MODEL = "distilgpt2"  # Small & fast for demos
EOS_TOKEN = "<|endoftext|>"  # End-of-sequence token for GPT-2 models (must equal tokenizer.eos_token)
MAX_LEN = 512              # Maximum sequence length

print(f"📋 Configuration:")
print(f"   Base Model: {BASE_MODEL}")
print(f"   EOS Token: {EOS_TOKEN}")
print(f"   Max Length: {MAX_LEN}")

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"   Device: {device}")

## Step 2: Load and Explore the Dataset

We'll use a cleaned version of the Alpaca dataset (`yahma/alpaca-cleaned`). Each example has an `instruction`, an optional `input`, and an `output`, which makes it a good fit for teaching a model to follow directions.

In [None]:
# Load the Alpaca dataset
print("📚 Loading Alpaca dataset...")
ds = load_dataset("yahma/alpaca-cleaned")

print(f"Dataset structure: {ds}")
print(f"Number of training examples: {len(ds['train'])}")

# Let's look at a few examples
print("\n🔍 Sample examples:")
for i in range(3):
    example = ds['train'][i]
    print(f"\n--- Example {i+1} ---")
    print(f"Instruction: {example['instruction'][:100]}...")
    print(f"Input: {example['input'][:50]}...")
    print(f"Output: {example['output'][:100]}...")

## Step 3: Set Up Tokenizer and Data Preprocessing

We'll create a consistent prompt template and tokenize our data.

In [None]:
# Load tokenizer
print("🔤 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)

# Handle models without explicit pad tokens
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    print("   Set pad_token = eos_token")

print(f"   Vocabulary size: {len(tokenizer)}")
print(f"   EOS token ID: {tokenizer.eos_token_id}")
print(f"   PAD token ID: {tokenizer.pad_token_id}")

In [None]:
def format_example(example):
    """Convert each example into a training-ready format with proper prompt template"""
    inst = example["instruction"].strip()
    inp = example.get("input", "").strip()
    out = example["output"].strip()
    
    # Create prompt template
    if inp:
        prompt = f"### Instruction:\n{inst}\n\n### Input:\n{inp}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{inst}\n\n### Response:\n"
    
    # Complete training text = prompt + response + EOS
    text = prompt + out + EOS_TOKEN
    return {"text": text}

# Apply formatting to the dataset
print("🔄 Formatting examples...")
formatted = ds["train"].map(format_example, remove_columns=ds["train"].column_names)

# Show a formatted example
print("\n📝 Example of formatted text:")
print("=" * 50)
print(formatted[0]["text"][:500] + "...")
print("=" * 50)

In [None]:
# Create train/validation split
print("📊 Creating train/validation split...")
split = formatted.train_test_split(test_size=0.01, seed=42)
train_ds, val_ds = split["train"], split["test"]

print(f"   Training examples: {len(train_ds)}")
print(f"   Validation examples: {len(val_ds)}")

def tokenize(batch):
    """Convert text to tokens with proper padding/truncation"""
    return tokenizer(
        batch["text"],
        truncation=True,
        max_length=MAX_LEN,
        padding="max_length",
    )

# Tokenize both datasets
print("🔢 Tokenizing datasets...")
tokenized_train = train_ds.map(tokenize, batched=True, remove_columns=["text"])
tokenized_val = val_ds.map(tokenize, batched=True, remove_columns=["text"])

print("   ✅ Tokenization complete!")
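
Padding every example to `MAX_LEN` is simple, but the model still processes the pad positions. The sketch below estimates that overhead and compares it with padding each batch only to its longest example. The `lengths` list is made up for illustration (it is not measured from Alpaca), and both helper functions are hypothetical.

```python
# Hypothetical token counts for eight examples (illustrative only, not measured).
lengths = [42, 97, 130, 61, 512, 205, 88, 150]
MAX_LEN = 512
BATCH_SIZE = 4

def tokens_with_fixed_padding(lengths, max_len):
    """Every example occupies max_len positions after truncation and padding."""
    return len(lengths) * max_len

def tokens_with_dynamic_padding(lengths, max_len, batch_size):
    """Each batch is padded only to the longest (truncated) example it contains."""
    clipped = [min(n, max_len) for n in lengths]
    total = 0
    for start in range(0, len(clipped), batch_size):
        batch = clipped[start:start + batch_size]
        total += max(batch) * len(batch)
    return total

real = sum(min(n, MAX_LEN) for n in lengths)
fixed = tokens_with_fixed_padding(lengths, MAX_LEN)
dynamic = tokens_with_dynamic_padding(lengths, MAX_LEN, BATCH_SIZE)

print(f"Real tokens:              {real}")
print(f"Fixed padding to MAX_LEN: {fixed}")
print(f"Per-batch padding:        {dynamic}")
```

If you switch to per-batch padding, the labels need padding too. `DataCollatorForSeq2Seq` is one collator that pads `labels` with -100.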

In [None]:
def add_labels(batch):
    """For causal LM, labels are input_ids with padding tokens masked out"""
    labels = np.array(batch["input_ids"])
    # Mask padding tokens in labels (set to -100 so they're ignored in loss calculation)
    labels[np.array(batch["attention_mask"]) == 0] = -100
    batch["labels"] = labels.tolist()
    return batch

# Add labels to both datasets
print("🏷️ Adding labels...")
tokenized_train = tokenized_train.map(add_labels, batched=True)
tokenized_val = tokenized_val.map(add_labels, batched=True)

# Verify the data structure
print(f"   Dataset features: {tokenized_train.features}")
print(f"   First example shape - input_ids: {len(tokenized_train[0]['input_ids'])}, labels: {len(tokenized_train[0]['labels'])}")
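
To make the masking step concrete, here is a minimal sketch on a toy sequence, using plain lists instead of NumPy. The token ids are made up, and `mask_padding` is a hypothetical helper. It also shows why masking by `attention_mask` matters when the pad token and the EOS token share an id: masking by token id would hide the real EOS as well.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]

# Toy example: three real tokens, one EOS token (id 9), then two pad positions.
# Pad and EOS share id 9 here, mirroring pad_token = eos_token in the notebook.
input_ids      = [5, 6, 7, 9, 9, 9]
attention_mask = [1, 1, 1, 1, 0, 0]

labels = mask_padding(input_ids, attention_mask)
print(labels)  # [5, 6, 7, 9, -100, -100]

# Masking by token id instead also removes the genuine EOS from the loss.
by_id = [token if token != 9 else IGNORE_INDEX for token in input_ids]
print(by_id)   # [5, 6, 7, -100, -100, -100]
```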

## Step 4: Load Base Model and Configure LoRA

Now we'll load our base model and set up LoRA for parameter-efficient fine-tuning.

In [None]:
# Set up quantization for memory efficiency (needs CUDA and the bitsandbytes package)
import importlib.util

bnb_kwargs = {}
if torch.cuda.is_available() and importlib.util.find_spec("bitsandbytes") is not None:
    bnb_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
    print("✅ 8-bit quantization enabled")
else:
    print("⚠️ Quantization not available: loading the model without it")

device_map = "auto" if torch.cuda.is_available() else None

# Load the base model
print(f"🤖 Loading base model: {BASE_MODEL}...")
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    trust_remote_code=False,
    device_map=device_map,
    **bnb_kwargs
)

# Prepare for low-bit fine-tuning if quantized
if bnb_kwargs:
    base_model = prepare_model_for_kbit_training(base_model)
    print("   ✅ Model prepared for k-bit training")

print(f"   Model loaded on: {base_model.device}")
print(f"   Model parameters: {sum(p.numel() for p in base_model.parameters()):,}")

In [None]:
# Configure LoRA
print("⚙️ Configuring LoRA...")

lora_config = LoraConfig(
    r=16,                    # Rank of adaptation matrices (lower = fewer params)
    lora_alpha=32,           # Scaling: updates are multiplied by lora_alpha / r (2x the rank is a common choice)
    target_modules=["c_attn"],  # Which layers to adapt (GPT-2 specific)
    lora_dropout=0.05,       # Dropout for regularization
    bias="none",
    task_type="CAUSAL_LM",
)

# Wrap base model with LoRA
model = get_peft_model(base_model, lora_config)

# Print trainable parameters
model.print_trainable_parameters()

print("\n📊 LoRA Configuration:")
print(f"   Rank (r): {lora_config.r}")
print(f"   Alpha: {lora_config.lora_alpha}")
print(f"   Target modules: {lora_config.target_modules}")
print(f"   Dropout: {lora_config.lora_dropout}")
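
As a rough cross-check of the trainable-parameter count, the arithmetic can be done by hand. This sketch assumes distilgpt2's published dimensions (6 transformer blocks, hidden size 768, and a fused `c_attn` projection from 768 to 3 x 768). `lora_param_count` is a hypothetical helper, not part of PEFT.

```python
def lora_param_count(in_features, out_features, r):
    """A is (r x in_features) and B is (out_features x r): r * (in + out) weights."""
    return r * (in_features + out_features)

# Assumed distilgpt2 dimensions (see the lead-in); adjust for other models.
N_LAYERS = 6
HIDDEN = 768
R = 16
LORA_ALPHA = 32

per_layer = lora_param_count(HIDDEN, 3 * HIDDEN, R)
total = N_LAYERS * per_layer
scaling = LORA_ALPHA / R  # the adapter update is multiplied by alpha / r

print(f"LoRA weights per c_attn layer: {per_layer:,}")
print(f"LoRA weights in total:         {total:,}")
print(f"Scaling factor alpha / r:      {scaling}")
```

The total should be close to the trainable count reported by `model.print_trainable_parameters()`. Doubling `r` doubles the adapter size.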

## Step 5: Set Up Training Configuration

Let's configure the training hyperparameters. These are reasonable starting values for a quick demo run, not tuned settings.

In [None]:
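# Data collator: the datasets are already padded and carry their own labels,
# so the default collator only has to stack them into tensors.
# (DataCollatorForLanguageModeling(mlm=False) would rebuild the labels and mask every
# pad-token id; because pad_token == eos_token here, that would also mask the real EOS.)
from transformers import default_data_collator

data_collator = default_data_collator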

# Training arguments
training_args = TrainingArguments(
    output_dir="distilgpt2-alpaca-lora",
    num_train_epochs=1,                  # Start with 1 epoch
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,       # Effective batch size = 16
    warmup_ratio=0.03,
    learning_rate=2e-4,                  # LoRA can handle higher learning rates
    weight_decay=0.0,
    logging_steps=50,
    eval_strategy="steps",               # Called evaluation_strategy in older transformers releases
    eval_steps=200,
    save_steps=200,
    save_total_limit=2,
    bf16=torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8,  # Ampere+ GPUs
    fp16=torch.cuda.is_available() and torch.cuda.get_device_capability()[0] < 8,   # Older GPUs
    report_to="none",                    # Disable wandb/tensorboard
)

print("🎯 Training Configuration:")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Batch size per device: {training_args.per_device_train_batch_size}")
print(f"   Gradient accumulation steps: {training_args.gradient_accumulation_steps}")
print(f"   Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"   Learning rate: {training_args.learning_rate}")
print(f"   Mixed precision: BF16={training_args.bf16}, FP16={training_args.fp16}")
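
It helps to know how many optimizer steps a run will take before starting it. The sketch below is an approximation: `training_schedule` is a hypothetical helper, the dataset size of 1,600 is made up, and the `Trainer`'s exact rounding may differ slightly.

```python
import math

def training_schedule(num_examples, per_device_batch, grad_accum, epochs,
                      warmup_ratio, num_devices=1):
    """Approximate step counts for one training run."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps

# 1,600 is an illustrative dataset size; substitute len(tokenized_train).
effective_batch, total_steps, warmup_steps = training_schedule(
    num_examples=1_600,
    per_device_batch=4,
    grad_accum=4,
    epochs=1,
    warmup_ratio=0.03,
)

print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps:      {total_steps}")
print(f"Warmup steps:         {warmup_steps}")
```

With `eval_steps=200`, a run shorter than 200 optimizer steps finishes without any mid-training evaluation, so lower `eval_steps` when experimenting on a small subset.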

In [None]:
# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=data_collator,
)

print("✅ Trainer initialized successfully!")
print(f"   Training samples: {len(tokenized_train)}")
print(f"   Validation samples: {len(tokenized_val)}")

## Step 6: Train the Model!

Now for the exciting part - let's train our model!

In [None]:
# Start training
print("🚀 Starting training...")
print("This may take a while depending on your hardware.")
print("Watch the loss values - they should decrease over time.\n")

# Train the model
trainer.train()

print("\n🎉 Training completed!")
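
Loss values are easier to interpret as perplexity. A minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the loss values below are hypothetical, not results from this notebook):

```python
import math

def perplexity(mean_cross_entropy):
    """Perplexity is exp(loss) when loss is the mean per-token cross-entropy in nats."""
    return math.exp(mean_cross_entropy)

# Hypothetical loss values for illustration only.
for loss in (3.0, 2.0, 1.5):
    print(f"loss={loss:.2f} -> perplexity={perplexity(loss):.2f}")
```

After training, `trainer.evaluate()` returns a dictionary whose `eval_loss` entry can be passed to this function.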

## Step 7: Test Your Fine-Tuned Model

Let's see how well your model follows instructions!

In [None]:
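# Create text generation pipeline
print("🔮 Setting up inference pipeline...")

model.eval()  # Switch off dropout for inference

# A model loaded with device_map="auto" is already placed on a device,
# so only pass `device` when no device map is present.
pipe_kwargs = {}
if getattr(model, "hf_device_map", None) is None:
    pipe_kwargs["device"] = 0 if torch.cuda.is_available() else -1

gen = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **pipe_kwargs
)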

def build_prompt(instruction, inp=""):
    """Create properly formatted prompts for inference"""
    if inp:
        return f"### Instruction:\n{instruction}\n\n### Input:\n{inp}\n\n### Response:\n"
    else:
        return f"### Instruction:\n{instruction}\n\n### Response:\n"

print("✅ Pipeline ready for testing!")

In [None]:
# Test with sample instructions
test_instructions = [
    "Explain the Moon's phases in one friendly paragraph for a 10-year-old.",
    "Write a short poem about programming.",
    "List three benefits of exercise.",
    "Explain what machine learning is in simple terms."
]

print("🧪 Testing the fine-tuned model:\n")

for i, instruction in enumerate(test_instructions, 1):
    print(f"{'='*60}")
    print(f"Test {i}: {instruction}")
    print(f"{'='*60}")
    
    prompt = build_prompt(instruction)
    
    outputs = gen(
        prompt,
        max_new_tokens=128,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
        pad_token_id=tokenizer.pad_token_id
    )
    
    # Extract just the response (everything after the prompt)
    generated_text = outputs[0]["generated_text"]
    response = generated_text[len(prompt):].strip()
    
    print(f"Response: {response}")
    print("\n")
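
Small models can keep generating past the answer and start a new `### Instruction:` block. One way to tidy the output is to cut the response at the first template marker. This is a sketch: `extract_response` and its `stop_markers` default are hypothetical, and the generated text below is invented for illustration.

```python
def extract_response(generated_text, prompt,
                     stop_markers=("### Instruction:", "### Input:")):
    """Drop the echoed prompt, then cut at the earliest marker that starts a new turn."""
    if generated_text.startswith(prompt):
        response = generated_text[len(prompt):]
    else:
        response = generated_text
    for marker in stop_markers:
        cut = response.find(marker)
        if cut != -1:
            response = response[:cut]
    return response.strip()

# Invented example of a model that runs on into a second turn.
prompt = "### Instruction:\nSay hi.\n\n### Response:\n"
generated = prompt + "Hi there!\n\n### Instruction:\nSay bye.\n\n### Response:\nBye!"

print(extract_response(generated, prompt))  # Hi there!
```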

## Step 8: Interactive Testing

Try your own instructions!

In [None]:
def test_model_interactive(instruction, input_text="", max_tokens=128, temperature=0.7):
    """Interactive function to test the model with custom instructions"""
    prompt = build_prompt(instruction, input_text)
    
    outputs = gen(
        prompt,
        max_new_tokens=max_tokens,
        do_sample=True,
        top_p=0.9,
        temperature=temperature,
        pad_token_id=tokenizer.pad_token_id
    )
    
    generated_text = outputs[0]["generated_text"]
    response = generated_text[len(prompt):].strip()
    
    print(f"🤖 Model Response:")
    print(f"{response}")
    
    return response

# Example usage - modify these as you like!
print("🎮 Interactive Testing - Try your own instructions!\n")

# Uncomment and modify these lines to test with your own instructions:
# test_model_interactive("Write a haiku about artificial intelligence")
# test_model_interactive("Explain quantum computing to a child")
# test_model_interactive("Give me 3 cooking tips for beginners")

## Step 9: Save Your Fine-Tuned Model

Let's save our work so we can use it later!

In [None]:
# Save the LoRA adapter (lightweight option)
print("💾 Saving LoRA adapter...")
adapter_path = "distilgpt2-alpaca-lora/adapter"
model.save_pretrained(adapter_path)
print(f"   ✅ Adapter saved to: {adapter_path}")

# Save tokenizer
tokenizer.save_pretrained(adapter_path)
print(f"   ✅ Tokenizer saved to: {adapter_path}")

# Check saved files
import os
saved_files = os.listdir(adapter_path)
print(f"\n📁 Saved files: {saved_files}")

In [None]:
# Optional: Merge and save full model (larger but standalone)
print("🔄 Creating merged model (optional)...")

try:
    # Merge LoRA weights with base model
    merged_model = model.merge_and_unload()
    
    # Save merged model
    merged_path = "distilgpt2-alpaca-merged"
    merged_model.save_pretrained(merged_path)
    tokenizer.save_pretrained(merged_path)
    
    print(f"   ✅ Merged model saved to: {merged_path}")
    
    # Check file sizes
    def get_folder_size(path):
        total = 0
        for dirpath, dirnames, filenames in os.walk(path):
            for filename in filenames:
                total += os.path.getsize(os.path.join(dirpath, filename))
        return total / (1024 * 1024)  # MB
    
    adapter_size = get_folder_size(adapter_path)
    merged_size = get_folder_size(merged_path)
    
    print(f"\n📊 File sizes:")
    print(f"   Adapter only: {adapter_size:.1f} MB")
    print(f"   Merged model: {merged_size:.1f} MB")
    
except Exception as e:
    print(f"   ⚠️ Could not create merged model: {e}")
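
Conceptually, merging folds the adapter into the base weight: W' = W + (alpha / r) * B A. The toy sketch below works this out with plain Python lists to show that the merged layer gives the same output as the base layer plus the adapter. `matmul` and `merge_lora` are hypothetical helpers, and all matrix values are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy 2x2 layer with rank r = 1.
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, in_features)
B = [[0.5], [1.0]]  # shape (out_features, r)
x = [[3.0], [4.0]]  # one input column vector

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]

# Unmerged forward pass: W x + (alpha / r) * B (A x)
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

print(matmul(merged, x) == unmerged)  # True
```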

## Step 10: Loading Your Saved Model

Here's how to load your fine-tuned model later:

In [None]:
# Example: How to load your saved model later
print("📖 How to load your saved model:")

loading_code = '''
# To load the LoRA adapter:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "distilgpt2-alpaca-lora/adapter")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilgpt2-alpaca-lora/adapter")

# Or, if you saved the merged model:
model = AutoModelForCausalLM.from_pretrained("distilgpt2-alpaca-merged")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2-alpaca-merged")
'''

print(loading_code)

## 🎉 Congratulations!

You've successfully fine-tuned your first language model! Here's what you accomplished:

✅ **Loaded and preprocessed** a real instruction dataset  
✅ **Configured LoRA** for efficient fine-tuning  
✅ **Trained a language model** to follow instructions  
✅ **Generated text** with your custom model  
✅ **Saved your work** for future use  

## Next Steps

Now that you understand the basics, here are some ideas to explore:

### 🚀 Scale Up
- Try larger models like `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- Increase sequence length to 1024 or 2048 tokens
- Train for more epochs (2-3)

### 🎛️ Experiment with LoRA Settings
- Higher rank for complex tasks: `r = 32` or `r = 64`
- Different target modules for other architectures (for example `q_proj` and `v_proj` in Llama-style models)
- Adjust learning rates between `1e-4` and `5e-4`

### 📊 Custom Datasets
- Fine-tune on your own instruction data
- Try domain-specific datasets (code, medical, legal)
- Create conversational datasets for chatbots

### 🛠️ Advanced Features
- Multi-GPU training by launching the script with `accelerate launch` or `torchrun`
- Evaluation metrics like BLEU or ROUGE
- Model deployment with FastAPI or Gradio
- Push models to Hugging Face Hub

## Resources for Continued Learning

- [Hugging Face Course](https://huggingface.co/course/)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [PEFT Documentation](https://huggingface.co/docs/peft/)
- [Community Discussions](https://discuss.huggingface.co/)

**Happy fine-tuning! 🤖✨**

---

## Troubleshooting Section

If you encounter issues, here are common solutions:

### CUDA Out of Memory
```python
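# These are TrainingArguments settings (plus MAX_LEN), not standalone variables.

# Reduce the per-device batch size (raise accumulation to keep the effective batch at 16)
per_device_train_batch_size = 2
gradient_accumulation_steps = 8

# Or reduce the sequence length (then re-run the tokenization cell)
MAX_LEN = 256

# Or use gradient checkpointing (trades extra compute for lower memory)
gradient_checkpointing = True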
```

### Poor Generation Quality
```python
# Adjust the sampling arguments passed to gen(...)
temperature = 0.8     # Higher = more varied output; try lower values if text is incoherent
top_p = 0.95          # Larger nucleus = more diverse word choices
max_new_tokens = 256  # Allow longer responses before generation stops
```
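
To see what these two knobs do, here is a small sketch of temperature scaling and nucleus (top-p) filtering over a made-up next-token distribution. The logits are hypothetical, and the helpers illustrate the idea rather than reproduce the library's implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize with a softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Return indices of the smallest set of tokens whose probability mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token scores

for temperature in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_filter(probs, top_p=0.9)
    print(f"T={temperature}: top prob={probs[0]:.2f}, tokens kept by top_p=0.9: {len(kept)}")
```

Lower temperatures concentrate probability on the top token, while a larger `top_p` lets more low-probability tokens into the sampling pool.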

### Model Not Following Instructions
- Check that your prompt format matches training exactly
- Train for more epochs
- Verify your dataset quality
- Try a larger LoRA rank (`r = 32`)
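
A quick way to check the first point is to confirm that a training text starts with exactly the prompt you send at inference. A minimal sketch (`prompt_matches_training` is a hypothetical helper, and the example strings are illustrative):

```python
def prompt_matches_training(training_text, inference_prompt):
    """True when the training text begins with exactly the prompt used at inference."""
    return training_text.startswith(inference_prompt)

# A training text in the notebook's template (response and EOS string are illustrative).
train_text = (
    "### Instruction:\nList three colors.\n\n"
    "### Response:\nRed, green, blue.<|endoftext|>"
)

good_prompt = "### Instruction:\nList three colors.\n\n### Response:\n"
bad_prompt = "Instruction: List three colors.\nResponse:"  # different headers and spacing

print(prompt_matches_training(train_text, good_prompt))  # True
print(prompt_matches_training(train_text, bad_prompt))   # False
```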

### Slow Training
- Enable mixed precision (`bf16=True` or `fp16=True`)
- Use the largest per-device batch size that fits in memory (gradient accumulation saves memory but does not make training faster)
- Consider using a smaller model for experimentation