# QLoRA Diagnostic Analysis - Part 1: Baseline LoRA (16-bit)

## Objective
Establish baseline performance using standard LoRA with 16-bit precision on GPT-2 Medium (355M parameters). This serves as the reference point for comparing against QLoRA's 4-bit quantization.

## Key Questions
1. What is the memory requirement for 16-bit LoRA fine-tuning?
2. How does performance scale with different ranks (r ∈ {2, 4, 8, 16})?
3. What is the training efficiency (time per step)?
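
For a first-order expectation on questions 1 and 2, the adapter size can be worked out from the architecture before running anything. The sketch below assumes LoRA is attached only to GPT-2's fused attention projection `c_attn` (1024 → 3072) in each of the 24 layers. The actual target modules depend on `setup_lora_16bit`, which is not shown here, so treat the numbers as illustrative.

```python
# Architecture constants for gpt2-medium (24 layers, hidden size 1024).
N_LAYERS = 24
HIDDEN = 1024
BASE_PARAMS = 355_000_000  # approximate parameter count quoted in the Objective


def lora_params(rank, d_in=HIDDEN, d_out=3 * HIDDEN, n_layers=N_LAYERS):
    """Trainable parameters if one (d_in x d_out) projection per layer gets an adapter.

    Each adapter adds A (rank x d_in) and B (d_out x rank).
    Assumption: only c_attn is adapted; other target modules would add more.
    """
    return n_layers * rank * (d_in + d_out)


def bytes_to_mb(n_bytes):
    return n_bytes / 1024**2


# Frozen base weights at 16-bit precision: 2 bytes per parameter.
base_weights_mb = bytes_to_mb(BASE_PARAMS * 2)

for r in (2, 4, 8, 16):
    n = lora_params(r)
    print(f"r={r:>2}: {n:>9,} trainable params ({100 * n / BASE_PARAMS:.3f}% of base)")
print(f"Base weights alone at 16-bit: ~{base_weights_mb:.0f} MB")
```

Under these assumptions the adapter stays well below 1% of the base model at every rank, so peak memory is likely to be dominated by the frozen weights and activations rather than by r.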

---

## 1. Environment Setup

In [None]:
# Install required packages
%pip install -q transformers datasets accelerate peft bitsandbytes matplotlib seaborn pandas numpy scikit-learn tqdm

In [None]:
# Import utilities
import sys
import os
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Add src to path (if running in Colab, upload src files first)
# sys.path.append('./src')

# Import custom modules
from model_utils import (
    load_base_model_16bit,
    setup_lora_16bit,
    get_model_memory_usage,
    print_model_architecture,
    clear_memory
)

from training import (
    prepare_alpaca_dataset,
    train_model,
    run_experiment
)

from visualization import (
    plot_memory_comparison,
    create_results_table,
    print_diagnostic_summary
)

print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")

## 2. Configuration

In [None]:
# Experimental configuration
MODEL_NAME = "gpt2-medium"  # 355M parameters
NUM_SAMPLES = 1000          # Small dataset for quick diagnostic experiments
MAX_STEPS = 200             # Training steps per experiment
BATCH_SIZE = 4
LEARNING_RATE = 2e-4

# Ranks to test
RANKS_TO_TEST = [2, 4, 8, 16]

# Output directory
OUTPUT_DIR = "./results_baseline_lora"
os.makedirs(OUTPUT_DIR, exist_ok=True)

print("Configuration:")
print(f"  Model: {MODEL_NAME}")
print(f"  Training samples: {NUM_SAMPLES}")
print(f"  Max steps: {MAX_STEPS}")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Learning rate: {LEARNING_RATE}")
print(f"  Ranks to test: {RANKS_TO_TEST}")
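
These settings imply that each run sees less than one full pass over the data. The short calculation below makes that explicit; it assumes no gradient accumulation (`GRAD_ACCUM_STEPS = 1`), which is a guess about `train_model` rather than something the notebook states.

```python
MAX_STEPS = 200
BATCH_SIZE = 4
NUM_SAMPLES = 1000
GRAD_ACCUM_STEPS = 1  # assumption: one optimizer step per batch

# Examples consumed over the whole run, and the fraction of the dataset that is.
samples_seen = MAX_STEPS * BATCH_SIZE * GRAD_ACCUM_STEPS
epochs = samples_seen / NUM_SAMPLES

print(f"Samples seen per run: {samples_seen}")
print(f"Epochs per run: {epochs:.2f}")
```

With these values a run covers 800 of the 1,000 samples (0.8 epochs), which is enough for a memory and speed diagnostic but too short to judge final model quality.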

## 3. Run Baseline LoRA Experiments

We train one LoRA adapter per rank in `RANKS_TO_TEST` and record peak GPU memory and time per step for each run.

In [None]:
# Store results
results_list = []

for rank in RANKS_TO_TEST:
    print(f"\n{'='*80}")
    print(f"Running LoRA (16-bit) with rank r={rank}")
    print(f"{'='*80}\n")
    
    try:
        result, model, tokenizer = run_experiment(
            model_name=MODEL_NAME,
            quantization="16bit",
            rank=rank,
            num_samples=NUM_SAMPLES,
            max_steps=MAX_STEPS,
            batch_size=BATCH_SIZE,
            learning_rate=LEARNING_RATE,
            output_dir=OUTPUT_DIR
        )
        
        results_list.append(result)
        
        # Clean up to free memory
        del model
        del tokenizer
        clear_memory()
        
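    except Exception as e:
        print(f"❌ Error with rank {rank}: {e}")
        clear_memory()  # Release any partially allocated memory before the next rank
        continue

# Report how many runs actually succeeded instead of assuming all did
print(f"\n✓ Experiments finished: {len(results_list)}/{len(RANKS_TO_TEST)} ranks succeeded")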
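
`run_experiment` is a project helper whose internals are not shown here. As a rough sketch of how a per-step time could be measured, the hypothetical `measure_time_per_step` function below averages wall-clock time over several calls after a short warm-up. On a GPU, kernels run asynchronously, so a real measurement would also call `torch.cuda.synchronize()` before reading the clock, and peak memory can be read with `torch.cuda.max_memory_allocated()` after `torch.cuda.reset_peak_memory_stats()`.

```python
import time


def measure_time_per_step(step_fn, n_steps, warmup=2):
    """Average wall-clock seconds per call of step_fn, excluding warm-up calls."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    for _ in range(warmup):
        step_fn()  # warm-up iterations are run but not timed
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return (time.perf_counter() - start) / n_steps


# Stand-in for a real training step (hypothetical; performs no training).
calls = {"count": 0}


def dummy_step():
    calls["count"] += 1
    sum(range(1000))


avg = measure_time_per_step(dummy_step, n_steps=5)
print(f"Average seconds per step: {avg:.6f}")
```

Excluding the first few steps matters in practice because they often include one-off costs such as memory allocation and kernel compilation.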

## 4. Results Analysis

### 4.1 Create Results Table

In [None]:
# Create comprehensive results table
results_df = create_results_table(
    results_list,
    save_path=f"{OUTPUT_DIR}/baseline_lora_results.csv"
)

print("\n📊 BASELINE LoRA RESULTS")
print("="*80)
display(results_df)

### 4.2 Memory Usage Analysis

In [None]:
# Plot memory usage by rank
plt.figure(figsize=(10, 6))
plt.bar(results_df['rank'], results_df['peak_memory_mb'])
plt.xlabel('LoRA Rank (r)', fontsize=12, fontweight='bold')
plt.ylabel('Peak GPU Memory (MB)', fontsize=12, fontweight='bold')
plt.title('Baseline LoRA (16-bit): Memory Usage by Rank', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(f"{OUTPUT_DIR}/baseline_memory_by_rank.png", dpi=300)
plt.show()

print(f"Average peak memory across ranks: {results_df['peak_memory_mb'].mean():.2f} MB")
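
When reading this chart, it helps to know how much of the peak can plausibly vary with rank. The estimate below is a back-of-the-envelope sketch under two assumptions that the notebook does not confirm: adapters sit only on `c_attn` in each of GPT-2 Medium's 24 layers, and each trainable parameter costs 16 bytes during training (fp32 weight, fp32 gradient, and two fp32 Adam moments).

```python
N_LAYERS, HIDDEN = 24, 1024  # gpt2-medium architecture
BYTES_PER_TRAINABLE = 16     # assumption: fp32 weight + gradient + 2 Adam moments


def adapter_state_mb(rank):
    """Estimated MB of adapter weights, gradients, and optimizer state."""
    # c_attn maps HIDDEN -> 3 * HIDDEN; A is (rank x d_in), B is (d_out x rank).
    n_params = N_LAYERS * rank * (HIDDEN + 3 * HIDDEN)
    return n_params * BYTES_PER_TRAINABLE / 1024**2


for r in (2, 4, 8, 16):
    print(f"r={r:>2}: ~{adapter_state_mb(r):.1f} MB of rank-dependent training state")
```

Under these assumptions the rank-dependent part grows from about 3 MB at r=2 to about 24 MB at r=16, so nearly flat bars would be the expected outcome. A large gap between ranks would point to something else, such as activation memory or allocator behavior.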

### 4.3 Training Efficiency

In [None]:
# Plot time per step
plt.figure(figsize=(10, 6))
plt.bar(results_df['rank'], results_df['time_per_step'])
plt.xlabel('LoRA Rank (r)', fontsize=12, fontweight='bold')
plt.ylabel('Time per Step (seconds)', fontsize=12, fontweight='bold')
plt.title('Baseline LoRA (16-bit): Training Speed by Rank', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(f"{OUTPUT_DIR}/baseline_speed_by_rank.png", dpi=300)
plt.show()

print(f"Average time per step: {results_df['time_per_step'].mean():.3f}s")
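
Time per step depends on batch size, so samples per second is often easier to compare across configurations. The helpers below are a minimal sketch; the 0.5 s used in the usage lines is a placeholder, not a measured value.

```python
def samples_per_second(time_per_step_s, batch_size):
    """Training throughput implied by a per-step time."""
    if time_per_step_s <= 0:
        raise ValueError("time_per_step_s must be positive")
    return batch_size / time_per_step_s


def total_minutes(time_per_step_s, max_steps):
    """Projected wall-clock duration of a run, in minutes."""
    return time_per_step_s * max_steps / 60


placeholder_time = 0.5  # hypothetical seconds per step, for illustration only
print(f"Throughput: {samples_per_second(placeholder_time, batch_size=4):.1f} samples/s")
print(f"Projected run time: {total_minutes(placeholder_time, max_steps=200):.1f} min")
```

Applied to `results_df`, the same conversion would be `BATCH_SIZE / results_df['time_per_step']`, assuming that column holds seconds per optimizer step.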

## 5. Key Findings

### TODO: Fill in after running experiments

**Memory Usage:**
- Rank 2: [TODO: FILL] MB  
- Rank 4: [TODO: FILL] MB  
- Rank 8: [TODO: FILL] MB  
- Rank 16: [TODO: FILL] MB  

**Training Speed:**
- Average time per step: [TODO: FILL]s  

**Observations:**
- [TODO: Document any trends observed]  
- [TODO: Note any unexpected behavior]  

---

**Next Steps:**
- Proceed to Part 2: Implement QLoRA (4-bit) and compare results  
- Use these baseline metrics as reference for quantization impact analysis  

## 6. Save Results for Next Notebook

In [None]:
# Save results for comparison in subsequent notebooks
import pickle

with open(f"{OUTPUT_DIR}/baseline_results.pkl", 'wb') as f:
    pickle.dump(results_list, f)

print(f"✓ Results saved to {OUTPUT_DIR}/baseline_results.pkl")
print("\n🎉 Baseline LoRA experiments complete!")
print("📝 Proceed to notebook 02_qlora_implementation.ipynb")
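
The next notebook can read the saved file back with `pickle.load`. The round trip below is a small sketch using a temporary directory and a placeholder record; `save_results` and `load_results` are hypothetical helper names, and the record's fields are illustrative rather than the actual output of `run_experiment`.

```python
import os
import pickle
import tempfile


def save_results(results, path):
    """Write a list of result records to disk."""
    with open(path, "wb") as f:
        pickle.dump(results, f)


def load_results(path):
    """Read result records back; only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder record standing in for one experiment's output.
results = [{"rank": 2}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "baseline_results.pkl")
    save_results(results, path)
    restored = load_results(path)

print(f"Restored {len(restored)} record(s)")
```

Because unpickling can execute arbitrary code, this format is best kept for files produced by your own runs; a CSV such as `baseline_lora_results.csv` is a safer choice for sharing.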