# QLoRA Diagnostic Analysis - Part 2: QLoRA (4-bit) Implementation

## Objective
Implement QLoRA with 4-bit NF4 quantization and compare against the 16-bit LoRA baseline from Part 1.

## Key Questions
1. How much memory does 4-bit quantization save compared to 16-bit?
2. Does QLoRA preserve performance (cosine similarity > 0.95, evaluated in Part 3)?
3. What is the optimal rank for QLoRA?
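
As a rough back-of-envelope for question 1, the sketch below estimates weight storage alone for a model of about 355M parameters (the size of the `gpt2-medium` model configured later in this notebook) at 16 and 4 bits per parameter. It is an idealized bound under stated assumptions: it ignores activations, optimizer state, the LoRA adapters, quantization constants, and any layers left unquantized, so the measured peak-memory reduction will likely be smaller. `weight_memory_mb` is a hypothetical helper and is not part of `model_utils`.

```python
def weight_memory_mb(num_params: int, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in MiB."""
    return num_params * bits_per_param / 8 / (1024 ** 2)


# Assumption: approximate parameter count of gpt2-medium
NUM_PARAMS = 355_000_000

fp16_mb = weight_memory_mb(NUM_PARAMS, 16)
nf4_mb = weight_memory_mb(NUM_PARAMS, 4)

# Same formula the comparison cell in section 5.2 applies to measured peak memory
reduction_pct = (fp16_mb - nf4_mb) / fp16_mb * 100

print(f"16-bit weights: {fp16_mb:.1f} MiB")
print(f"4-bit weights:  {nf4_mb:.1f} MiB")
print(f"Ideal weight-only reduction: {reduction_pct:.1f}%")
```

The ideal weight-only reduction is 75% regardless of model size, which gives a ceiling to compare the measured numbers against.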

---

## 1. Environment Setup

In [None]:
# Install required packages
%pip install -q transformers datasets accelerate peft bitsandbytes matplotlib seaborn pandas numpy scikit-learn tqdm

In [None]:
# Import utilities
import sys
import os
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle

# Add src to path
sys.path.append('../src')

from model_utils import (
    load_base_model_4bit,
    setup_lora_4bit,
    get_model_memory_usage,
    print_model_architecture,
    clear_memory
)

from training import (
    prepare_alpaca_dataset,
    train_model,
    run_experiment
)

from visualization import (
    plot_memory_comparison,
    create_results_table,
    print_diagnostic_summary
)

print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")

## 2. Configuration

In [None]:
# Experimental configuration
MODEL_NAME = "gpt2-medium"  # 355M parameters
NUM_SAMPLES = 1000  # Match baseline
MAX_STEPS = 200
BATCH_SIZE = 4
LEARNING_RATE = 2e-4

# Ranks to test (match baseline)
RANKS_TO_TEST = [2, 4, 8, 16]

# Output directory
OUTPUT_DIR = "./results_qlora"
os.makedirs(OUTPUT_DIR, exist_ok=True)

print("Configuration:")
print(f"  Model: {MODEL_NAME}")
print("  Quantization: 4-bit NF4")
print(f"  Training samples: {NUM_SAMPLES}")
print(f"  Max steps: {MAX_STEPS}")
print(f"  Ranks to test: {RANKS_TO_TEST}")

## 3. Load Baseline Results for Comparison

In [None]:
# Load baseline LoRA results
try:
    with open('../results_baseline_lora/baseline_results.pkl', 'rb') as f:
        baseline_results = pickle.load(f)
    print(f"✓ Loaded {len(baseline_results)} baseline results")
    baseline_df = pd.DataFrame(baseline_results)
    print("\nBaseline Summary:")
    display(baseline_df[['rank', 'peak_memory_mb', 'time_per_step', 'training_loss']])
except FileNotFoundError:
    print("⚠️  Baseline results not found. Run 01_baseline_lora.ipynb first.")
    baseline_results = None

## 4. Run QLoRA Experiments

Train QLoRA (a frozen, 4-bit NF4-quantized base model plus higher-precision LoRA adapters) once for each rank in `RANKS_TO_TEST`.
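
To make the "4-bit quantized base" concrete, here is a minimal pure-Python sketch of blockwise NF4-style quantization: each block of weights is scaled by its absolute maximum, then every value is snapped to the nearest of 16 fixed levels. This is only an illustration. The level table is copied from the published NF4 code values and should be treated as an assumption, the block of 64 weights is synthetic, and `quantize_block` / `dequantize_block` are hypothetical names; the actual quantization in this notebook happens inside `load_base_model_4bit`, whose internals are not shown here.

```python
import random

# Assumption: the 16 NF4 code values as published for QLoRA / bitsandbytes
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(block):
    """Scale a block by its absmax, then map each value to the nearest level index."""
    absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
    codes = [
        min(range(len(NF4_LEVELS)), key=lambda i: abs(w / absmax - NF4_LEVELS[i]))
        for w in block
    ]
    return codes, absmax


def dequantize_block(codes, absmax):
    """Recover approximate weights from 4-bit codes and the stored absmax."""
    return [NF4_LEVELS[c] * absmax for c in codes]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(64)]  # one synthetic block

codes, absmax = quantize_block(weights)
restored = dequantize_block(codes, absmax)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print(f"absmax = {absmax:.5f}, max reconstruction error = {max_err:.5f}")
```

Only the 4-bit codes and one scale per block need to be stored, which is where the memory saving comes from; the adapters trained on top stay in higher precision.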

In [None]:
# Store results
qlora_results_list = []

for rank in RANKS_TO_TEST:
    print(f"\n{'='*80}")
    print(f"Running QLoRA (4-bit) with rank r={rank}")
    print(f"{'='*80}\n")
    
    try:
        result, model, tokenizer = run_experiment(
            model_name=MODEL_NAME,
            quantization="4bit",
            rank=rank,
            num_samples=NUM_SAMPLES,
            max_steps=MAX_STEPS,
            batch_size=BATCH_SIZE,
            learning_rate=LEARNING_RATE,
            output_dir=OUTPUT_DIR
        )
        
        qlora_results_list.append(result)
        
        # Clean up
        del model
        del tokenizer
        clear_memory()
        
    except Exception as e:
        print(f"❌ Error with rank {rank}: {e}")
        continue

print(f"\n✓ QLoRA experiments finished: {len(qlora_results_list)}/{len(RANKS_TO_TEST)} ranks succeeded")

## 5. Results Analysis

### 5.1 Create Results Table

In [None]:
# Create QLoRA results table
qlora_df = create_results_table(
    qlora_results_list,
    save_path=f"{OUTPUT_DIR}/qlora_results.csv"
)

print("\n📊 QLORA RESULTS")
print("="*80)
display(qlora_df)

### 5.2 Compare LoRA vs QLoRA

In [None]:
if baseline_results:
    # Combine results
    combined_df = pd.concat([baseline_df, qlora_df], ignore_index=True)
    
    # Calculate memory reduction per rank, skipping ranks missing from either run
    rows = []
    for rank in RANKS_TO_TEST:
        lora_rows = baseline_df.loc[baseline_df['rank'] == rank, 'peak_memory_mb']
        qlora_rows = qlora_df.loc[qlora_df['rank'] == rank, 'peak_memory_mb']
        if lora_rows.empty or qlora_rows.empty:
            print(f"⚠️  Skipping rank {rank}: missing LoRA or QLoRA result")
            continue
        
        lora_mem = lora_rows.iloc[0]
        qlora_mem = qlora_rows.iloc[0]
        reduction = (lora_mem - qlora_mem) / lora_mem * 100
        
        rows.append({
            'rank': rank,
            'lora_memory_mb': lora_mem,
            'qlora_memory_mb': qlora_mem,
            'memory_reduction_%': reduction
        })
    
    comparison = pd.DataFrame(
        rows,
        columns=['rank', 'lora_memory_mb', 'qlora_memory_mb', 'memory_reduction_%']
    )
    
    print("\n🔋 MEMORY COMPARISON: LoRA vs QLoRA")
    print("="*80)
    display(comparison)
    
    print(f"\n✨ Average memory reduction: {comparison['memory_reduction_%'].mean():.2f}%")

### 5.3 Visualize Memory Comparison

In [None]:
if baseline_results:
    # Plot memory comparison (create the figures directory if it is missing)
    os.makedirs('../results/figures', exist_ok=True)
    plot_memory_comparison(
        combined_df,
        save_path="../results/figures/memory_comparison.png"
    )

### 5.4 Training Efficiency Comparison

In [None]:
if baseline_results:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Time per step
    ax1.bar(baseline_df['rank'] - 0.2, baseline_df['time_per_step'], 0.4, 
            label='LoRA (16-bit)', color='#3498db', alpha=0.8)
    ax1.bar(qlora_df['rank'] + 0.2, qlora_df['time_per_step'], 0.4,
            label='QLoRA (4-bit)', color='#e74c3c', alpha=0.8)
    ax1.set_xlabel('Rank', fontweight='bold')
    ax1.set_ylabel('Time per Step (s)', fontweight='bold')
    ax1.set_title('Training Speed Comparison')
    ax1.legend()
    ax1.grid(axis='y', alpha=0.3)
    
    # Training loss
    ax2.plot(baseline_df['rank'], baseline_df['training_loss'], 
             marker='o', linewidth=2, label='LoRA (16-bit)', color='#3498db')
    ax2.plot(qlora_df['rank'], qlora_df['training_loss'],
             marker='s', linewidth=2, label='QLoRA (4-bit)', color='#e74c3c')
    ax2.set_xlabel('Rank', fontweight='bold')
    ax2.set_ylabel('Training Loss', fontweight='bold')
    ax2.set_title('Training Loss Comparison')
    ax2.legend()
    ax2.grid(alpha=0.3)
    
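    plt.tight_layout()
    # Ensure the output directory exists before saving
    os.makedirs('../results/figures', exist_ok=True)
    plt.savefig('../results/figures/training_efficiency.png', dpi=300, bbox_inches='tight')
    plt.show()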

## 6. Key Findings

### TODO: Fill in after running experiments

**Memory Reduction:**
- Average reduction: [TODO: FILL]%
- Rank 8: LoRA [TODO] MB → QLoRA [TODO] MB

**Performance:**
- Training loss comparable: [YES/NO]
- Time per step: [FASTER/SLOWER/SIMILAR]

**Observations:**
- [TODO: Document trends]
- [TODO: Note any unexpected behavior]

---

**Next Steps:**
- Proceed to Part 3: Diagnostic analysis (weight similarity, hypothesis testing)

## 7. Save Results

In [None]:
# Save QLoRA results
with open(f"{OUTPUT_DIR}/qlora_results.pkl", 'wb') as f:
    pickle.dump(qlora_results_list, f)

# Save comparison (create the tables directory if it is missing)
if baseline_results:
    os.makedirs('../results/tables', exist_ok=True)
    comparison.to_csv('../results/tables/memory_comparison.csv', index=False)

print(f"✓ Results saved to {OUTPUT_DIR}/")
print("\n🎉 QLoRA experiments complete!")
print("📝 Proceed to notebook 03_diagnostic_analysis.ipynb")