# Week 10 Lab: Fine-tuning & Prompt Engineering

## Learning Objectives
- Understand different fine-tuning strategies
- Implement LoRA (Low-Rank Adaptation)
- Practice prompt engineering techniques
- Compare full fine-tuning vs parameter-efficient methods

## Prerequisites
```bash
pip install transformers torch numpy matplotlib peft
```

In [None]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Setup
print('Week 10: Fine-tuning & Prompt Engineering')
print('=' * 50)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

## Part 1: Understanding Fine-tuning

Fine-tuning adapts a pre-trained model to a specific task. Let's count the model's parameters and see how they are distributed across its modules.

In [None]:
# Load a pre-trained model and count parameters
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Count total and trainable parameters
def count_parameters(model):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

total, trainable = count_parameters(model)
print(f"Model: {model_name}")
print(f"Total parameters: {total:,}")
print(f"Trainable parameters: {trainable:,}")
print(f"Model size: {total * 4 / 1e6:.1f} MB (float32)")
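
The float32 size printed above covers only the weights. As a rough sketch of why full fine-tuning needs more memory than that, the estimate below adds one gradient and the two moment buffers that AdamW keeps for each trainable parameter. The helper name `estimate_training_memory_mb`, the 66-million parameter count, and the 600,000-parameter trainable subset are illustrative assumptions, and activations and framework overhead are ignored.

```python
def estimate_training_memory_mb(total_params, trainable_params, bytes_per_param=4):
    """Rough memory estimate in MB (hypothetical helper; ignores activations)."""
    weights = total_params * bytes_per_param               # every weight is stored
    gradients = trainable_params * bytes_per_param         # one gradient per trainable weight
    adam_states = 2 * trainable_params * bytes_per_param   # AdamW first and second moments
    return (weights + gradients + adam_states) / 1e6


n = 66_000_000            # assumed placeholder, not read from the model
small_subset = 600_000    # assumed size of a small trainable subset

full = estimate_training_memory_mb(n, n)
partial = estimate_training_memory_mb(n, small_subset)
print(f"Full fine-tuning:        ~{full:.0f} MB")
print(f"Small trainable subset:  ~{partial:.0f} MB")
```

In the notebook, the `total` and `trainable` values returned by `count_parameters(model)` could be passed in instead of the placeholders.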

In [None]:
# Visualize parameter distribution by top-level module (the first component of each parameter name)
layer_params = {}
for name, param in model.named_parameters():
    layer_type = name.split('.')[0]
    if layer_type not in layer_params:
        layer_params[layer_type] = 0
    layer_params[layer_type] += param.numel()

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.barh(list(layer_params.keys()), 
               [v/1e6 for v in layer_params.values()],
               color='#3333B2')
ax.set_xlabel('Parameters (millions)', fontsize=12)
ax.set_title('Parameter Distribution by Layer Type', fontsize=14, fontweight='bold')

for bar, val in zip(bars, layer_params.values()):
    ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height()/2,
            f'{val/1e6:.1f}M', va='center', fontsize=10)

plt.tight_layout()
plt.show()

## Part 2: Implementing LoRA from Scratch

LoRA (Low-Rank Adaptation) freezes the pre-trained weights and adds trainable low-rank matrices.
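
In symbols, and following the layout used by the `LoRALayer` class in the next cell, let $W_0$ and $b$ be the frozen weight and bias, $A \in \mathbb{R}^{r \times d_{in}}$ the down-projection, and $B \in \mathbb{R}^{d_{out} \times r}$ the up-projection. The adapted layer computes:

$$
h = W_0 x + b + \frac{\alpha}{r} B A x
$$

Only $A$ and $B$ are trained, so the trainable parameter count per adapted matrix is

$$
r \, (d_{in} + d_{out}) \quad \text{instead of} \quad d_{in} \, d_{out}.
$$

For $d_{in} = d_{out} = 768$ and $r = 8$ this is $12{,}288$ versus $589{,}824$, about 2.1% of the dense matrix. Because $B$ is initialized to zero, $BA = 0$ at the start and the adapted layer initially reproduces the pre-trained layer exactly.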

In [None]:
class LoRALayer(nn.Module):
    """LoRA layer that wraps a linear layer"""
    
    def __init__(self, original_layer, rank=4, alpha=1.0):
        super().__init__()
        self.original_layer = original_layer
        self.rank = rank
        self.alpha = alpha
        
        # Freeze original weights
        for param in self.original_layer.parameters():
            param.requires_grad = False
        
        # Get dimensions
        in_features = original_layer.in_features
        out_features = original_layer.out_features
        
        # Initialize low-rank matrices
        # A: down-projection (in_features -> rank)
        # B: up-projection (rank -> out_features)
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        
        # Initialize A with random values, B with zeros
        nn.init.kaiming_uniform_(self.lora_A, a=np.sqrt(5))
        nn.init.zeros_(self.lora_B)
        
        self.scaling = alpha / rank
    
    def forward(self, x):
        # Original forward pass
        original_output = self.original_layer(x)
        
        # LoRA forward pass: x @ A^T @ B^T * scaling
        lora_output = (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        
        return original_output + lora_output
    
    def get_lora_params(self):
        return self.lora_A.numel() + self.lora_B.numel()

In [None]:
# Demonstrate LoRA on a simple linear layer
original_linear = nn.Linear(768, 768)
lora_linear = LoRALayer(original_linear, rank=8, alpha=16)

# Count parameters
original_params = sum(p.numel() for p in original_linear.parameters())
lora_params = lora_linear.get_lora_params()

print("LoRA Parameter Comparison:")
print(f"  Original layer parameters: {original_params:,}")
print(f"  LoRA trainable parameters: {lora_params:,}")
print(f"  Parameter reduction: {100 * (1 - lora_params/original_params):.1f}%")

# Test forward pass
x = torch.randn(2, 10, 768)
with torch.no_grad():
    output = lora_linear(x)
print(f"\nInput shape: {x.shape}")
print(f"Output shape: {output.shape}")
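
One consequence of the additive form is that a trained LoRA update can be folded back into the base weight, so inference needs no extra matrix multiplications. The NumPy sketch below checks this on small random matrices; the dimensions and variable names are illustrative assumptions, the bias is omitted for brevity, and `B` is filled with random values to stand in for a trained `lora_B`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 6, 4, 2, 4.0
scaling = alpha / rank

W = rng.normal(size=(d_out, d_in))   # frozen base weight, same layout as nn.Linear.weight
A = rng.normal(size=(rank, d_in))    # down-projection (stands in for lora_A)
B = rng.normal(size=(d_out, rank))   # up-projection (stands in for a trained lora_B)
x = rng.normal(size=(3, d_in))       # batch of 3 inputs

# Unmerged: base output plus the scaled low-rank update, as in LoRALayer.forward
unmerged = x @ W.T + (x @ A.T @ B.T) * scaling

# Merged: fold the update into a single dense matrix of the original shape
W_merged = W + scaling * (B @ A)
merged = x @ W_merged.T

print("Outputs match:", np.allclose(unmerged, merged))
print("Rank of the update B @ A:", np.linalg.matrix_rank(B @ A))
```

The update `B @ A` has the same shape as `W` but rank at most `rank`, which is where the name low-rank adaptation comes from.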

In [None]:
# Visualize LoRA parameter efficiency
ranks = [1, 2, 4, 8, 16, 32, 64]
hidden_size = 768

original_params = hidden_size * hidden_size
lora_params = [2 * r * hidden_size for r in ranks]
reduction = [100 * (1 - lp/original_params) for lp in lora_params]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Parameters vs rank
ax1.plot(ranks, [original_params/1000]*len(ranks), 'r--', label='Full fine-tuning', linewidth=2)
ax1.plot(ranks, [lp/1000 for lp in lora_params], 'b-o', label='LoRA', linewidth=2)
ax1.set_xlabel('LoRA Rank', fontsize=12)
ax1.set_ylabel('Parameters (thousands)', fontsize=12)
ax1.set_title('Parameters vs LoRA Rank', fontsize=14, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Reduction percentage
ax2.bar(range(len(ranks)), reduction, color='#2CA02C')
ax2.set_xticks(range(len(ranks)))
ax2.set_xticklabels(ranks)
ax2.set_xlabel('LoRA Rank', fontsize=12)
ax2.set_ylabel('Parameter Reduction (%)', fontsize=12)
ax2.set_title('Parameter Reduction by Rank', fontsize=14, fontweight='bold')
ax2.set_ylim(80, 100)  # rank 64 gives ~83.3% reduction, so the axis must start below 90

plt.tight_layout()
plt.show()
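
The plot above covers a single 768x768 matrix. As a hedged sketch of how the count scales when LoRA is attached to several matrices, the helpers below (hypothetical names) apply the same formula to a model and also compute the rank at which LoRA stops saving parameters. Adapting 2 projection matrices in each of 6 layers is an assumption chosen for illustration.

```python
def lora_trainable_params(d_in, d_out, rank, num_adapted_matrices=1):
    """Trainable LoRA parameters: one (rank x d_in) and one (d_out x rank) matrix per adapted weight."""
    return num_adapted_matrices * rank * (d_in + d_out)


def break_even_rank(d_in, d_out):
    """Rank at which LoRA has as many parameters as the dense matrix it adapts."""
    return (d_in * d_out) / (d_in + d_out)


hidden = 768
adapted = 2 * 6  # assumption: 2 adapted matrices in each of 6 layers

for r in (4, 8, 16):
    count = lora_trainable_params(hidden, hidden, r, adapted)
    print(f"rank={r:>2}: {count:,} trainable parameters")

print(f"Break-even rank for a {hidden}x{hidden} matrix: {break_even_rank(hidden, hidden):.0f}")
```

Any rank below the break-even value trains fewer parameters than the dense matrix; the ranks used in this lab are far below it.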

## Part 3: Prompt Engineering

Prompt engineering is the art of designing effective prompts to guide model behavior.

In [None]:
# Prompt templates for different tasks
prompt_templates = {
    'zero_shot': """Classify the sentiment of the following text as positive or negative.

Text: {text}
Sentiment:""",
    
    'one_shot': """Classify the sentiment of texts as positive or negative.

Text: "I love this product!"
Sentiment: positive

Text: {text}
Sentiment:""",
    
    'few_shot': """Classify the sentiment of texts as positive or negative.

Text: "I love this product!"
Sentiment: positive

Text: "This is terrible, waste of money."
Sentiment: negative

Text: "Best purchase I've ever made!"
Sentiment: positive

Text: {text}
Sentiment:""",
    
    'chain_of_thought': """Analyze the sentiment of the following text step by step.

Text: {text}

Step 1: Identify key emotional words.
Step 2: Determine the overall tone.
Step 3: Classify as positive or negative.

Analysis:"""
}

test_text = "The movie was okay, but the ending was disappointing."

print("Prompt Engineering Examples")
print("=" * 60)
for name, template in prompt_templates.items():
    print(f"\n--- {name.upper()} ---")
    print(template.format(text=test_text))
    print()
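
The zero-, one-, and few-shot templates above differ only in how many labeled examples precede the query. A minimal sketch of generating them from data rather than writing each by hand is shown below; `build_few_shot_prompt` is a hypothetical helper, not part of any library.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt from (text, label) pairs followed by the unlabeled query."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f'Text: "{text}"')
        parts.append(f"Sentiment: {label}")
        parts.append("")
    parts.append(f"Text: {query}")
    parts.append("Sentiment:")
    return "\n".join(parts)


demo_examples = [
    ("I love this product!", "positive"),
    ("This is terrible, waste of money.", "negative"),
]
demo_prompt = build_few_shot_prompt(
    "Classify the sentiment of texts as positive or negative.",
    demo_examples,
    "The movie was okay, but the ending was disappointing.",
)
print(demo_prompt)
```

Passing an empty `examples` list yields a zero-shot prompt with the same layout, which makes it easy to vary the number of examples while holding everything else fixed.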

In [None]:
# Prompt template analysis
def analyze_prompt(template, text):
    """Analyze prompt characteristics"""
    prompt = template.format(text=text)
    tokens = tokenizer.tokenize(prompt)
    return {
        'char_length': len(prompt),
        'token_count': len(tokens),
        'word_count': len(prompt.split()),
        'example_count': max(prompt.count('Sentiment:') - 1, 0)  # Exclude final query; 0 if the template has no labeled examples
    }

print("Prompt Analysis")
print("-" * 60)
print(f"{'Template':<20} {'Chars':<10} {'Tokens':<10} {'Examples':<10}")
print("-" * 60)

for name, template in prompt_templates.items():
    analysis = analyze_prompt(template, test_text)
    print(f"{name:<20} {analysis['char_length']:<10} {analysis['token_count']:<10} {analysis['example_count']:<10}")
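
Token counts matter because every few-shot example uses part of the model's context window. The sketch below keeps examples in order until a token budget would be exceeded. Both function names are hypothetical, and the default counter splits on whitespace as a stand-in for a real subword tokenizer.

```python
def whitespace_token_count(text):
    """Stand-in token counter (assumption): counts whitespace-separated pieces."""
    return len(text.split())


def select_examples_within_budget(examples, budget, count_tokens=whitespace_token_count):
    """Keep examples in order until adding the next one would exceed the budget."""
    selected, used = [], 0
    for example in examples:
        cost = count_tokens(example)
        if used + cost > budget:
            break
        selected.append(example)
        used += cost
    return selected, used


candidate_examples = [
    'Text: "I love this product!"\nSentiment: positive',
    'Text: "This is terrible, waste of money."\nSentiment: negative',
    'Text: "Best purchase I\'ve ever made!"\nSentiment: positive',
]
chosen, used = select_examples_within_budget(candidate_examples, budget=16)
print(f"Kept {len(chosen)} of {len(candidate_examples)} examples using {used} tokens")
```

In the notebook, `count_tokens=lambda s: len(tokenizer.tokenize(s))` could be passed to count with the DistilBERT tokenizer loaded earlier instead of the whitespace stand-in.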

## Part 4: Fine-tuning Strategies Comparison

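Let's compare different approaches to adapting pre-trained models. The numbers in the next cell are illustrative placeholders, not measurements: parameters, memory, and training time are expressed relative to full fine-tuning (100%), and accuracy is a made-up absolute percentage.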

In [None]:
# Simulated comparison of fine-tuning strategies
strategies = {
    'Full Fine-tuning': {'params': 100, 'memory': 100, 'accuracy': 95, 'training_time': 100},
    'Freeze Embeddings': {'params': 85, 'memory': 85, 'accuracy': 93, 'training_time': 80},
    'Only Classifier': {'params': 1, 'memory': 50, 'accuracy': 85, 'training_time': 10},
    'LoRA (r=8)': {'params': 0.5, 'memory': 55, 'accuracy': 93, 'training_time': 30},
    'LoRA (r=16)': {'params': 1, 'memory': 58, 'accuracy': 94, 'training_time': 35},
    'Prefix Tuning': {'params': 0.1, 'memory': 52, 'accuracy': 90, 'training_time': 25},
}

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

metrics = ['params', 'memory', 'accuracy', 'training_time']
titles = ['Trainable Parameters (%)', 'Memory Usage (%)', 'Accuracy (%)', 'Training Time (%)']
colors = ['#3333B2', '#FF7F0E', '#2CA02C', '#D62728', '#9467BD', '#8C564B']

for ax, metric, title in zip(axes.flat, metrics, titles):
    values = [strategies[s][metric] for s in strategies]
    bars = ax.barh(list(strategies.keys()), values, color=colors)
    ax.set_xlabel(title.split('(')[0], fontsize=11)
    ax.set_title(title, fontsize=12, fontweight='bold')
    
    for bar, val in zip(bars, values):
        ax.text(bar.get_width() + 1, bar.get_y() + bar.get_height()/2,
                f'{val}', va='center', fontsize=9)

plt.suptitle('Fine-tuning Strategy Comparison', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
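
One way to read a comparison like this is as a constrained choice: pick the most accurate strategy that fits a resource budget. The sketch below does that for a memory budget, reusing the illustrative memory and accuracy numbers from the cell above (they are not measurements). The names `illustrative` and `best_strategy_under_budget` are hypothetical.

```python
# Same placeholder numbers as the comparison cell (not measured results)
illustrative = {
    'Full Fine-tuning': {'memory': 100, 'accuracy': 95},
    'Freeze Embeddings': {'memory': 85, 'accuracy': 93},
    'Only Classifier': {'memory': 50, 'accuracy': 85},
    'LoRA (r=8)': {'memory': 55, 'accuracy': 93},
    'LoRA (r=16)': {'memory': 58, 'accuracy': 94},
    'Prefix Tuning': {'memory': 52, 'accuracy': 90},
}


def best_strategy_under_budget(strategies, max_memory):
    """Return the most accurate strategy whose memory fits the budget, or None."""
    feasible = {name: s for name, s in strategies.items() if s['memory'] <= max_memory}
    if not feasible:
        return None
    return max(feasible, key=lambda name: feasible[name]['accuracy'])


for budget in (40, 50, 60, 100):
    print(f"memory <= {budget}%: {best_strategy_under_budget(illustrative, budget)}")
```

With real measurements in place of the placeholders, the same function could be extended to other constraints such as training time.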

## Part 5: Practical Fine-tuning Example

Let's fine-tune a small model for sentiment classification.

In [None]:
# Create a simple training example
train_texts = [
    "I love this product, it's amazing!",
    "Terrible experience, would not recommend.",
    "Best purchase ever, highly satisfied!",
    "Waste of money, very disappointed.",
    "Excellent quality and fast shipping!",
    "Poor customer service, never buying again.",
]
train_labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

# Tokenize
encodings = tokenizer(train_texts, padding=True, truncation=True, return_tensors='pt')
labels = torch.tensor(train_labels)

print("Training Data:")
for text, label in zip(train_texts, train_labels):
    sentiment = 'positive' if label == 1 else 'negative'
    preview = text if len(text) <= 50 else text[:50] + '...'  # add an ellipsis only when truncated
    print(f"  [{sentiment:8}] {preview}")

In [None]:
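# Simple training loop demonstration
# All six examples fit in one batch, so each "epoch" below is a single gradient step
model.to(device)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Keep the batch on the same device as the model
batch = {k: v.to(device) for k, v in encodings.items()}
batch_labels = labels.to(device)

losses = []
print("Training Progress:")
print("-" * 40)

for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(**batch, labels=batch_labels)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    
    losses.append(loss.item())
    print(f"Epoch {epoch+1}: Loss = {loss.item():.4f}")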

# Plot training loss
plt.figure(figsize=(8, 4))
plt.plot(range(1, len(losses)+1), losses, 'b-o', linewidth=2, markersize=8)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Training Loss Over Epochs', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()
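
To help interpret the loss values, the sketch below computes a mean cross-entropy from raw logits in plain Python. For integer class labels and `num_labels=2`, this is the kind of loss the classification head reports, as far as the library defaults go. The helper names are hypothetical and the logits are made-up inputs, not model outputs.

```python
import math


def cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example: -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average the per-example cross-entropy over a batch."""
    per_example = [cross_entropy_from_logits(l, y) for l, y in zip(batch_logits, labels)]
    return sum(per_example) / len(per_example)


# A head that scores both classes equally gives ln(2) per example
uninformed = mean_cross_entropy([[0.0, 0.0]] * 6, [1, 0, 1, 0, 1, 0])
# Confident and correct predictions give a loss close to zero
confident = mean_cross_entropy([[-3.0, 3.0], [3.0, -3.0]], [1, 0])

print(f"Equal logits:          {uninformed:.4f}")
print(f"Confident and correct: {confident:.4f}")
```

A freshly initialized two-class head usually starts near ln(2), about 0.693, so a first-step loss in that neighborhood is expected rather than a sign of a problem.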

## Exercises

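1. **LoRA Implementation**: Experiment with different `rank` and `alpha` values in the LoRALayer class (it already accepts both) and observe how `scaling = alpha / rank` changes the size of the LoRA update once `lora_B` is no longer zero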
2. **Prompt Optimization**: Design prompts for a Q&A task and compare effectiveness
3. **Fine-tuning**: Fine-tune a model on a custom dataset with different learning rates
4. **Comparison**: Compare LoRA vs full fine-tuning on a classification task

In [None]:
# Exercise starter: Implement a prompt evaluation function
def evaluate_prompt_quality(prompt, criteria):
    """
    Evaluate prompt quality based on criteria.
    
    Args:
        prompt: The prompt string
        criteria: Dict of criteria to check
    
    Returns:
        Dict of scores
    """
    scores = {}
    
    # Length appropriateness (not too short, not too long)
    tokens = tokenizer.tokenize(prompt)
    if 10 < len(tokens) < 100:
        scores['length'] = 1.0
    elif 5 < len(tokens) < 200:
        scores['length'] = 0.5
    else:
        scores['length'] = 0.0
    
    # Has clear instruction
    instruction_words = ['classify', 'determine', 'analyze', 'identify', 'extract']
    scores['instruction'] = 1.0 if any(w in prompt.lower() for w in instruction_words) else 0.0
    
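    # Has examples (for few-shot): count labeled demonstrations, excluding the final query
    # The templates in this lab mark examples with 'Sentiment:' rather than 'Example:'
    n_examples = max(prompt.count('Sentiment:') - 1, 0)
    scores['examples'] = min(n_examples / 3, 1.0)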
    
    return scores

# Test
test_prompt = prompt_templates['few_shot'].format(text="Test")
scores = evaluate_prompt_quality(test_prompt, {})
print("Prompt Quality Scores:")
for criterion, score in scores.items():
    print(f"  {criterion}: {score:.2f}")

## Summary

In this lab, we explored:

1. **Fine-tuning basics**: Understanding parameter counts and layer distributions
2. **LoRA implementation**: Built a low-rank adaptation layer from scratch
3. **Prompt engineering**: Designed and analyzed different prompt templates
4. **Strategy comparison**: Compared efficiency vs performance tradeoffs
5. **Practical training**: Demonstrated a simple fine-tuning workflow

**Key Takeaways**:
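- LoRA trains only a small fraction of the parameters (about 2% of a 768x768 layer at rank 8, and often under 1% of a whole model when only some matrices are adapted), typically with little accuracy loss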
- Prompt engineering is crucial for zero/few-shot performance
- Different strategies suit different resource constraints