# Module 09: Fine-Tuning Transformers

**Difficulty**: ⭐⭐⭐ Advanced  
**Estimated Time**: 140 minutes  
**Prerequisites**: Modules 07-08

## Learning Objectives

1. Understand fine-tuning vs pre-training
2. Use the Hugging Face Trainer API
3. Implement parameter-efficient fine-tuning (LoRA, adapters)
4. Optimize hyperparameters for fine-tuning
5. Handle common fine-tuning challenges
6. Evaluate fine-tuned models

## Fine-Tuning: Transfer Learning for NLP

**Idea**: Leverage pre-trained knowledge for specific tasks.

### Why Fine-Tune?

✅ Better performance with less data  
✅ Faster training (vs from scratch)  
✅ State-of-the-art results  
✅ Accessible with limited compute  

### Fine-Tuning Workflow:

1. Choose pre-trained model
2. Prepare task-specific dataset
3. Add task head (if needed)
4. Fine-tune with smaller learning rate
5. Evaluate and iterate

## Setup

In [None]:
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import Trainer, TrainingArguments
from transformers import DataCollatorWithPadding
from datasets import load_dataset
import evaluate

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('✓ Ready!')

## 1. Using Hugging Face Trainer

**Trainer API**: Wraps the training loop (batching, optimization, evaluation, checkpointing) behind a single `Trainer` class.

In [None]:
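# Load a small, shuffled subset. The IMDB train split is ordered by label,
# so taking 'train[:1000]' directly would yield examples from only one class.
dataset = load_dataset('imdb', split='train').shuffle(seed=42).select(range(1000))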

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)

# Tokenize (truncate only; the Trainer pads each batch dynamically when given the tokenizer)
def tokenize_function(examples):
    return tokenizer(examples['text'], truncation=True, max_length=512)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

print('✓ Data prepared!')

In [None]:
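# Hold out 20% of the examples for evaluation
split = tokenized_datasets.train_test_split(test_size=0.2, seed=42)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',  # named eval_strategy in newer transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split['train'],
    eval_dataset=split['test'],  # required because evaluation runs every epoch
    tokenizer=tokenizer,  # newer transformers releases call this processing_class
)

print('✓ Trainer ready! (Run trainer.train() to start)')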
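
By default the `Trainer` reports only the loss during evaluation. Below is a minimal sketch of a metric function that could be passed to it as `compute_metrics=compute_metrics`, assuming it receives a `(logits, labels)` pair of NumPy arrays. The toy inputs are made up for illustration and are not part of the notebook's dataset.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Return accuracy for a (logits, labels) pair as passed by Trainer."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {'accuracy': float((predictions == labels).mean())}

# Toy check with hand-made logits for 4 examples and 2 classes (illustrative only)
toy_logits = np.array([[2.0, 0.1], [0.3, 1.5], [1.2, 0.2], [0.1, 0.9]])
toy_labels = np.array([0, 1, 1, 1])
print(compute_metrics((toy_logits, toy_labels)))  # 3 of 4 correct
```

Accuracy suits a balanced binary task like this one; for imbalanced data, a metric such as F1 may be more informative.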

## 2. Parameter-Efficient Fine-Tuning (PEFT)

**Problem**: Fine-tuning all parameters of a large model is expensive in memory, compute, and storage.

**Solution**: Update only a small number of parameters and keep the rest frozen.

### LoRA (Low-Rank Adaptation)

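Instead of updating the full weight matrix $W \in \mathbb{R}^{d \times k}$, freeze it and learn a low-rank update $\Delta W = AB$, where $A \in \mathbb{R}^{d \times r}$ and $B \in \mathbb{R}^{r \times k}$ are small matrices with rank $r \ll \min(d, k)$. The adapted layer then uses $W + \Delta W$.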

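**Benefits**:
- Far fewer trainable parameters (the LoRA paper reports up to 10,000x fewer for GPT-3 175B; the saving is smaller for small models)
- Performance often comparable to full fine-tuning
- One frozen base model can serve multiple tasks by swapping small adapters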
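
To make the low-rank idea concrete, here is a rough NumPy sketch for a single weight matrix. The sizes (`d = k = 768`, `r = 8`), the `alpha / r` scaling, and the choice to zero-initialize the second factor are assumptions for illustration; libraries such as `peft` handle these details internally.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 768, 768, 8, 32  # illustrative sizes, not tied to a specific model

W = rng.normal(size=(d, k))              # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(d, r))  # trainable factor, small random init
B = np.zeros((r, k))                     # trainable factor, zero init so the update starts at 0

def lora_forward(x):
    """Compute x @ (W + (alpha / r) * A @ B) without materializing the full update."""
    return x @ W + (alpha / r) * (x @ A) @ B

x = rng.normal(size=(2, d))
assert np.allclose(lora_forward(x), x @ W)  # no change before any training

full_params = d * k        # parameters updated by full fine-tuning of W
lora_params = r * (d + k)  # parameters in A and B together
print(full_params, lora_params, full_params / lora_params)
```

For this single matrix the saving is 48x (589,824 versus 12,288 parameters). The ratio for a whole model depends on its size and on which modules receive adapters.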

In [None]:
from peft import LoraConfig, get_peft_model

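# Configure LoRA
lora_config = LoraConfig(
    task_type='SEQ_CLS',  # sequence classification: keeps the classifier layer trainable
    r=8,  # Rank of the low-rank update
    lora_alpha=32,  # Scaling factor (the update is scaled by lora_alpha / r)
    target_modules=['q_lin', 'v_lin'],  # DistilBERT's query and value projections
    lora_dropout=0.1,
    bias='none',
)

# Apply LoRA (create a new Trainer with this wrapped model before training)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
print('✓ LoRA applied!')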

**Exercise**: Fine-tune with LoRA

1. Fine-tune DistilBERT (or BERT, using `target_modules=['query', 'value']`) with LoRA on sentiment analysis
2. Compare with full fine-tuning
3. Measure: performance, training time, memory
4. Try different LoRA ranks

In [None]:
# YOUR CODE HERE

## Summary

### Key Concepts:

1. **Transfer Learning**: Pre-train → Fine-tune
2. **Trainer API**: Simplified training
3. **LoRA**: Parameter-efficient fine-tuning
4. **Hyperparameter Tuning**: Learning rate, batch size, epochs

### Best Practices:

- Use a smaller learning rate than in pre-training (e.g., `2e-5`)
- Use learning-rate warm-up for stability
- Monitor validation metrics for overfitting
- Save checkpoints regularly
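
As a rough illustration of warm-up, the sketch below computes a learning rate that rises linearly to its peak and then decays linearly to zero, a shape commonly used for fine-tuning. The function name `lr_at_step` and the step counts are hypothetical and chosen only for the example.

```python
def lr_at_step(step, peak_lr=2e-5, warmup_steps=100, total_steps=1000):
    """Linear warm-up to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        # Warm-up phase: scale the peak rate by the fraction of warm-up completed
        return peak_lr * step / warmup_steps
    # Decay phase: scale by the fraction of post-warm-up steps remaining
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)

for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step))
```

With the `Trainer`, a comparable schedule is usually requested through training arguments such as `warmup_steps` rather than written by hand.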

### What's Next?

Modules 10-14: Applications (classification, NER, QA, generation, project)