# Temporal LoRA for Dynamic Sentence Embeddings - Complete Workflow

This notebook demonstrates the full pipeline including:
1. Environment setup
2. Data preparation
3. Model training (LoRA)
4. Benchmark comparison against baselines
5. Report generation with visualizations

**Expected improvements:**
- Within-period: +2-4% NDCG@10
- Cross-period: +8-15% NDCG@10
- Parameter efficiency: <2% trainable params

## 1. Setup Environment

**⚠️ IMPORTANT:** After running this cell, you MUST restart the runtime before proceeding!

In [None]:
# Clone repository
!git clone https://github.com/YOUR_USERNAME/DynamicEmbeddings.git
%cd DynamicEmbeddings

# Remove preinstalled packages that conflict with the pinned versions below
print("🔄 Removing all conflicting packages...")
!pip uninstall -y sentence-transformers transformers torch accelerate peft numpy typer click -q

# Install exact working versions in correct order
print("\n📦 Installing torch first...")
!pip install torch==2.2.1 --no-cache-dir -q

print("📦 Installing numpy...")
!pip install "numpy>=1.26.0,<2.0.0" --no-cache-dir -q

print("📦 Installing transformers...")
!pip install transformers==4.40.0 --no-cache-dir -q

print("📦 Installing sentence-transformers...")
!pip install sentence-transformers==3.0.1 --no-cache-dir -q

print("📦 Installing PEFT libraries...")
!pip install accelerate==0.29.0 peft==0.10.0 --no-cache-dir -q

print("📦 Installing CLI tools...")
!pip install "typer[all]==0.9.0" "click>=8.0.0,<8.2.0" --no-cache-dir -q

print("📦 Installing other dependencies...")
!pip install datasets faiss-cpu pyyaml umap-learn scikit-learn matplotlib seaborn pandas --no-cache-dir -q

print("📦 Installing project...")
!pip install -e . --no-cache-dir -q

print("\n" + "="*60)
print("✅ Installation complete!")
print("="*60)
print("\n⚠️  IMPORTANT: You MUST restart runtime now!")
print("   Go to: Runtime → Restart runtime")
print("="*60)

## 2. Verify Installation (Run After Restart)

In [None]:
# Verify imports
import torch
import transformers
from sentence_transformers import SentenceTransformer
from peft import LoraConfig

print(f"✅ PyTorch: {torch.__version__}")
print(f"✅ Transformers: {transformers.__version__}")
print(f"✅ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")

# Verify CLI
!python -m temporal_lora.cli --help

## 3. Prepare Data

Download and preprocess arXiv abstracts into time buckets.

In [None]:
# Prepare data (smaller sample for Colab)
!python -m temporal_lora.cli prepare-data \
  --max-per-bucket 3000 \
  --balance-per-bin

print("\n✅ Data preparation complete!")
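
To make the idea of time buckets concrete, here is a minimal sketch of grouping records by publication year and capping each bucket at a maximum size. The bucket boundaries, the record fields (`year`, `abstract`), and the helper names are hypothetical; the project's actual boundaries and the exact behavior of `--max-per-bucket` and `--balance-per-bin` are defined by the CLI and may differ.

```python
import random
from collections import defaultdict

# Hypothetical bucket boundaries (inclusive years), for illustration only.
BUCKETS = {
    "2010-2014": (2010, 2014),
    "2015-2018": (2015, 2018),
    "2019-2024": (2019, 2024),
}


def assign_bucket(year):
    """Return the name of the bucket containing `year`, or None if no bucket matches."""
    for name, (start, end) in BUCKETS.items():
        if start <= year <= end:
            return name
    return None


def bucket_and_cap(records, max_per_bucket, seed=0):
    """Group records by time bucket, then randomly downsample oversized buckets."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    grouped = defaultdict(list)
    for rec in records:
        name = assign_bucket(rec["year"])
        if name is not None:  # records outside every bucket are dropped
            grouped[name].append(rec)
    capped = {}
    for name, recs in grouped.items():
        if len(recs) > max_per_bucket:
            recs = rng.sample(recs, max_per_bucket)
        capped[name] = recs
    return capped


years = [2009, 2011, 2012, 2013, 2016, 2020, 2021]
records = [{"year": y, "abstract": f"paper from {y}"} for y in years]
buckets = bucket_and_cap(records, max_per_bucket=2)
print({name: len(recs) for name, recs in buckets.items()})
```

Capping every bucket at the same size keeps one well-populated period from dominating training and evaluation.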

## 4. Train Temporal LoRA Adapters

Train one LoRA adapter per time bucket with hard temporal negatives.

In [None]:
# Train LoRA adapters with hard negatives
!python -m temporal_lora.cli train-adapters \
  --mode lora \
  --hard-temporal-negatives \
  --neg-k 4 \
  --lora-r 16 \
  --epochs 2 \
  --batch-size 16

print("\n✅ Training complete!")
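
The "<2% trainable params" figure can be sanity-checked with a little arithmetic. A LoRA adapter of rank r on a weight matrix of shape (d_out, d_in) adds two low-rank factors with r * (d_in + d_out) parameters in total. The sketch below assumes the base encoder is all-MiniLM-L6-v2 (hidden size 384, 6 layers, roughly 22.7M parameters) and that only the query and value projections in each layer are adapted. The target modules and the approximate base size are assumptions, not values read from the project config.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed setup, for illustration only.
HIDDEN_SIZE = 384          # all-MiniLM-L6-v2 hidden dimension
NUM_LAYERS = 6             # transformer layers in the encoder
ADAPTED_PER_LAYER = 2      # assumption: query and value projections only
BASE_PARAMS = 22_700_000   # approximate size of the frozen encoder
RANK = 16                  # matches --lora-r 16 in the training command

per_matrix = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
trainable = per_matrix * ADAPTED_PER_LAYER * NUM_LAYERS
fraction = trainable / BASE_PARAMS

print(f"Per adapted matrix: {per_matrix:,} parameters")
print(f"Per time bucket:    {trainable:,} parameters")
print(f"Trainable fraction: {fraction:.2%}")
```

Under these assumptions each time bucket adds well under 2% of the base model's parameters. Adapting more projection matrices or raising the rank increases the count proportionally.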

## 5. Build FAISS Indexes

Create retrieval indexes for evaluation.

In [None]:
# Build indexes
!python -m temporal_lora.cli build-indexes

print("\n✅ Indexes built!")
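
For intuition about what a retrieval index returns, the following is a brute-force cosine-similarity search written with the standard library only. It is a stand-in for illustration, not the project's FAISS code: an exact inner-product index over L2-normalized vectors produces the same ranking, only much faster on large corpora. The function names and toy vectors are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search(query, corpus, k):
    """Return the top-k (doc_index, cosine_score) pairs, best first."""
    q = normalize(query)
    scored = []
    for idx, doc in enumerate(corpus):
        d = normalize(doc)
        score = sum(a * b for a, b in zip(q, d))
        scored.append((score, idx))
    # Sort by descending score; break ties by the lower document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:k]]


corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d "embeddings"
hits = search([2.0, 0.1], corpus, k=2)
for idx, score in hits:
    print(f"doc {idx}: cosine = {score:.4f}")
```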

## 6. Run Comprehensive Benchmark

Compare Temporal LoRA against multiple baselines:
- **Frozen SBERT** (all-MiniLM-L6-v2) - No training
- **All-MPNet-base-v2** - Larger model
- **Temporal LoRA** (ours) - Time-aware adapters

In [None]:
# Run benchmark with automatic report generation
!python -m temporal_lora.cli benchmark \
  --baseline-models "sentence-transformers/all-MiniLM-L6-v2,sentence-transformers/all-mpnet-base-v2" \
  --report

print("\n✅ Benchmark complete!")

## 7. View Results

Display benchmark results and improvements.

In [None]:
import pandas as pd
from IPython.display import display, Markdown, Image

# Load results
results_df = pd.read_csv("deliverables/results/benchmark/benchmark_comparison.csv")

print("📊 Benchmark Results:")
display(results_df)

# Calculate average scores
print("\n📈 Average Performance:")
avg_scores = results_df.groupby("model")[["ndcg@10", "recall@10", "mrr"]].mean()
display(avg_scores)
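
For readers unfamiliar with the three metric columns, here is a minimal sketch of how each can be computed for a single query with binary relevance labels. The benchmark's own implementation may differ in details such as graded relevance or truncating MRR at a cutoff. The function names and the toy ranking are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant document (0.0 if none is retrieved)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]   # system output, best first
relevant = {"d1", "d2"}             # ground-truth relevant documents

print(f"recall@10: {recall_at_k(ranked, relevant):.4f}")
print(f"mrr:       {mrr(ranked, relevant):.4f}")
print(f"ndcg@10:   {ndcg_at_k(ranked, relevant):.4f}")
```

Per-query values like these are typically averaged over all queries in a bucket to produce one row of the results table.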

## 8. Display Report

View the comprehensive benchmark report.

In [None]:
# Display markdown report
with open("deliverables/results/benchmark/BENCHMARK_REPORT.md", "r", encoding="utf-8") as f:
    report = f.read()

display(Markdown(report))

## 9. Display Visualizations

In [None]:
# Display comparison plot
print("📊 Performance Comparison:")
display(Image("deliverables/results/benchmark/figures/benchmark_comparison.png"))

print("\n🔥 Improvement Heatmap:")
display(Image("deliverables/results/benchmark/figures/improvement_heatmap.png"))

## 10. Key Improvements Summary

In [None]:
# Calculate improvements
baseline_name = "all-MiniLM-L6-v2"
lora_name = "Temporal-LoRA"

baseline_df = results_df[results_df["model"] == baseline_name]
lora_df = results_df[results_df["model"] == lora_name]

print("🎯 KEY IMPROVEMENTS:\n")
print("="*60)

for bucket in results_df["bucket"].unique():
    base_row = baseline_df[baseline_df["bucket"] == bucket]
    lora_row = lora_df[lora_df["bucket"] == bucket]
    
    if not base_row.empty and not lora_row.empty:
        print(f"\n{bucket}:")
        
        for metric in ["ndcg@10", "recall@10", "mrr"]:
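            base_val = base_row[metric].values[0]
            lora_val = lora_row[metric].values[0]
            # Relative change is undefined when the baseline score is zero
            if base_val == 0:
                print(f"  ⚠️ {metric}: baseline is 0, relative change undefined")
                continue
            # Relative improvement over the baseline, in percent
            improvement = ((lora_val - base_val) / base_val) * 100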
            
            icon = "🔥" if improvement > 5 else "✅" if improvement > 0 else "⚠️"
            print(f"  {icon} {metric}: {improvement:+.1f}% ({base_val:.4f} → {lora_val:.4f})")

# Overall improvement
print("\n" + "="*60)
baseline_avg = baseline_df["ndcg@10"].mean()
lora_avg = lora_df["ndcg@10"].mean()
overall_improvement = ((lora_avg - baseline_avg) / baseline_avg) * 100

print(f"\n🏆 OVERALL IMPROVEMENT: {overall_improvement:+.1f}%")
print(f"   Baseline Avg NDCG@10: {baseline_avg:.4f}")
print(f"   Temporal LoRA Avg NDCG@10: {lora_avg:.4f}")
# Note: this figure is the expected value stated above, not measured in this cell
print("\n💡 Parameter Efficiency: <2% trainable parameters (expected)")
print("="*60)

## 11. Export Results

Package all results for submission.

In [None]:
# Export all deliverables
!python -m temporal_lora.cli export-deliverables

print("\n✅ All results exported to deliverables/")
print("\nDownload the following:")
print("  📁 deliverables/results/benchmark/BENCHMARK_REPORT.md")
print("  📁 deliverables/results/benchmark/figures/")
print("  📁 deliverables/repro/environment.txt")

## Summary

### What We've Demonstrated:

1. **Problem:** Semantic drift over time (e.g., "transformer" in 2010 vs 2024)
2. **Solution:** Time-aware LoRA adapters on frozen encoder
3. **Results:** 
   - Improved retrieval performance across time periods
   - <2% trainable parameters (vs 100% for full fine-tuning)
   - Maintained base model while adapting to temporal shifts

### Key Improvements:
- **Within-period queries:** Better semantic matching
- **Cross-period queries:** Handles semantic drift effectively
- **Parameter efficiency:** Minimal overhead per time bucket

### Next Steps:
- Run ablation studies (LoRA rank, bucket count)
- Test on additional time periods
- Explore term drift trajectories