# 🔥 GPU-Accelerated OrdinalSustain Test

This notebook tests the GPU-accelerated OrdinalSustain implementation on Google Colab.

## 📋 What this notebook does:
1. ✅ Verifies GPU availability
2. ✅ Installs dependencies
3. ✅ Clones the repository with the GPU implementation
4. ✅ Validates correctness (GPU results match CPU)
5. ✅ Tests GPU vs CPU performance
6. ✅ Benchmarks across different dataset sizes

## ⚙️ Before running:
**IMPORTANT:** Enable GPU in Colab!
- Click `Runtime` → `Change runtime type`
- Select `T4 GPU` under Hardware accelerator
- Click `Save`

---

## 1️⃣ Check GPU Availability

In [None]:
import subprocess
import sys

print("="*70)
print("🔍 GPU DETECTION")
print("="*70)

# Check if nvidia-smi works
try:
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    if result.returncode == 0:
        print("\n✅ GPU detected!\n")
        print(result.stdout)
    else:
        print("\n⚠️  nvidia-smi failed. GPU may not be enabled.")
        print("Please enable GPU: Runtime → Change runtime type → GPU")
except FileNotFoundError:
    print("\n❌ nvidia-smi not found. GPU is not available.")
    print("Please enable GPU: Runtime → Change runtime type → GPU")
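
If you want a structured result instead of the full `nvidia-smi` table, the tool can print selected fields as CSV using `--query-gpu=name,memory.total --format=csv,noheader,nounits`, with memory reported in MiB. The block below is a minimal sketch. `parse_gpu_csv` and `query_gpus` are hypothetical helpers and are not part of the repository.

```python
import subprocess

def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # rsplit so a comma inside the GPU name cannot break the memory field
        name, mem_mib = line.rsplit(",", 1)
        gpus.append({"name": name.strip(), "memory_mib": int(mem_mib.strip())})
    return gpus

def query_gpus():
    """Return a list of GPU dicts, or [] if nvidia-smi is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return []  # nvidia-smi not installed: no GPU runtime
    if result.returncode != 0:
        return []  # driver present but query failed
    return parse_gpu_csv(result.stdout)

gpus = query_gpus()
print(gpus if gpus else "No GPU visible to nvidia-smi")
```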

## 2️⃣ Install Dependencies

In [None]:
print("="*70)
print("📦 INSTALLING DEPENDENCIES")
print("="*70)

# Install core dependencies
print("\n📦 Installing core packages...")
!pip install -q torch numpy scipy matplotlib tqdm scikit-learn pandas pathos dill

# Install awkde and kde_ebm (may take ~30 seconds)
print("📦 Installing awkde and kde_ebm (this may take ~30 seconds)...")
!pip install -q git+https://github.com/noxtoby/awkde.git
!pip install -q git+https://github.com/ucl-pond/kde_ebm.git

print("\n✅ All dependencies installed!")

# Verify PyTorch can see GPU
import torch
print(f"\n🔧 PyTorch GPU Info:")
print(f"   • PyTorch version: {torch.__version__}")
print(f"   • CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"   • CUDA version: {torch.version.cuda}")
    print(f"   • GPU device: {torch.cuda.get_device_name(0)}")
    print(f"   • GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
else:
    print("   ⚠️  CUDA not available. Please enable GPU runtime.")
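
The following sketch picks a device string and falls back cleanly when PyTorch or CUDA is unavailable. `pick_device` is a hypothetical helper. Its `device_id` argument mirrors the `device_id=0` that is passed to `TorchOrdinalSustain` later, but treating the two as the same thing is an assumption.

```python
import importlib.util

def pick_device(device_id=0):
    """Return 'cuda:<device_id>' if PyTorch sees that GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed
    import torch
    if torch.cuda.is_available() and device_id < torch.cuda.device_count():
        return f"cuda:{device_id}"
    return "cpu"  # no CUDA device, or the requested index does not exist

print(f"Selected device: {pick_device()}")
```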

## 3️⃣ Clone Repository

In [None]:
print("="*70)
print("📥 CLONING REPOSITORY")
print("="*70)

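# Start from /content so that re-running this cell does not clone into mphil/mphil
%cd /content

# Remove any existing copy of the repository
!rm -rf mphil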

# Clone the repository
!git clone https://github.com/Amelia3141/mphil.git

# Change to repository directory
%cd mphil

# Checkout the GPU optimization branch
!git checkout claude/optimize-sustain-speed-011CV4Lk8FuUjS6hZNj13WE3

# Add to Python path
import sys
sys.path.insert(0, '/content/mphil')

print("\n✅ Repository cloned and branch checked out!")
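
You can confirm that the checkout worked by asking git for the current branch name. This is a minimal sketch that assumes `git` is on the PATH, which is the case on Colab. `current_branch` is a hypothetical helper. It returns `None` rather than raising when git or the repository is missing.

```python
import subprocess

def current_branch(repo_dir):
    """Return the checked-out branch name of repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return None  # git is not installed
    if result.returncode != 0:
        return None  # not a git repository, or the directory does not exist
    return result.stdout.strip()

expected = "claude/optimize-sustain-speed-011CV4Lk8FuUjS6hZNj13WE3"
branch = current_branch("/content/mphil")
if branch == expected:
    print(f"✅ On expected branch: {branch}")
else:
    print(f"⚠️  Unexpected branch: {branch!r} (expected {expected!r})")
```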

## 4️⃣ Generate Test Data

In [None]:
import numpy as np

print("="*70)
print("🎲 GENERATING TEST DATA")
print("="*70)

def generate_test_data(n_subjects=1000, n_biomarkers=10, n_scores=3, seed=42):
    """Generate synthetic test data for OrdinalSustain."""
    np.random.seed(seed)
    
    # Set the proportion of individuals with correct scores to 0.9
    p_correct = 0.9
    p_nl_dist = np.full((n_scores + 1), (1 - p_correct) / n_scores)
    p_nl_dist[0] = p_correct
    
    p_score_dist = np.full((n_scores, n_scores + 1), (1 - p_correct) / n_scores)
    for score in range(n_scores):
        p_score_dist[score, score + 1] = p_correct
    
    # Generate data
    data = np.random.choice(range(n_scores + 1), n_subjects * n_biomarkers,
                          replace=True, p=p_nl_dist)
    data = data.reshape((n_subjects, n_biomarkers))
    
    # Turn the data into probabilities
    prob_nl = p_nl_dist[data]
    
    prob_score = np.zeros((n_subjects, n_biomarkers, n_scores))
    for n in range(n_biomarkers):
        for z in range(n_scores):
            for score in range(n_scores + 1):
                prob_score[data[:, n] == score, n, z] = p_score_dist[z, score]
    
    # Create score_vals matrix
    score_vals = np.tile(np.arange(1, n_scores + 1), (n_biomarkers, 1))
    
    # Create biomarker labels
    biomarker_labels = [f"Biomarker_{i}" for i in range(n_biomarkers)]
    
    return prob_nl, prob_score, score_vals, biomarker_labels

# Generate test data
prob_nl, prob_score, score_vals, biomarker_labels = generate_test_data(
    n_subjects=1000, n_biomarkers=10, n_scores=3
)

print(f"\n✅ Test data generated:")
print(f"   • Subjects: {prob_nl.shape[0]}")
print(f"   • Biomarkers: {prob_nl.shape[1]}")
print(f"   • Scores: {prob_score.shape[2]}")
print(f"   • prob_nl shape: {prob_nl.shape}")
print(f"   • prob_score shape: {prob_score.shape}")
print(f"   • score_vals shape: {score_vals.shape}")

## 5️⃣ Test GPU Implementation

In [None]:
import time
from pySuStaIn.TorchOrdinalSustain import TorchOrdinalSustain
from pySuStaIn.OrdinalSustain import OrdinalSustain

print("="*70)
print("🔥 TESTING GPU IMPLEMENTATION")
print("="*70)

try:
    # Create GPU instance
    gpu_sustain = TorchOrdinalSustain(
        prob_nl, prob_score, score_vals, biomarker_labels,
        N_startpoints=1,
        N_S_max=1,
        N_iterations_MCMC=100,
        output_folder="./temp",
        dataset_name="gpu_test",
        use_parallel_startpoints=False,
        seed=42,
        use_gpu=True,
        device_id=0
    )
    
    if gpu_sustain.use_gpu:
        print("\n✅ GPU implementation initialized successfully!")
        print(f"   • Using device: {gpu_sustain.torch_backend.device_manager.device}")
        print(f"   • Data type: {gpu_sustain.torch_backend.device_manager.torch_dtype}")
    else:
        print("\n⚠️  GPU not available, running on CPU")
        
except Exception as e:
    print(f"\n❌ Error initializing GPU implementation: {e}")
    import traceback
    traceback.print_exc()
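
If initialization raises an exception (for example, a CUDA or out-of-memory error), `gpu_sustain` is never defined, and the later cells fail with a `NameError`. One option is to fall back to the CPU class. The sketch below works under that assumption. `make_sustain` is a hypothetical helper, and the commented usage reuses the constructor arguments from the cell above.

```python
def make_sustain(gpu_factory, cpu_factory):
    """Try to build the GPU model; fall back to the CPU model on any error.

    Returns (model, used_gpu). used_gpu reads the model's `use_gpu` attribute,
    which the cell above also checks.
    """
    try:
        model = gpu_factory()
        return model, bool(getattr(model, "use_gpu", False))
    except Exception as exc:  # e.g. CUDA initialization or out-of-memory errors
        print(f"⚠️  GPU initialization failed ({exc}); falling back to CPU")
        return cpu_factory(), False

# Example usage in this notebook (not executed here):
# model, used_gpu = make_sustain(
#     lambda: TorchOrdinalSustain(prob_nl, prob_score, score_vals, biomarker_labels,
#                                 N_startpoints=1, N_S_max=1, N_iterations_MCMC=100,
#                                 output_folder="./temp", dataset_name="gpu_test",
#                                 use_parallel_startpoints=False, seed=42, use_gpu=True),
#     lambda: OrdinalSustain(prob_nl, prob_score, score_vals, biomarker_labels,
#                            N_startpoints=1, N_S_max=1, N_iterations_MCMC=100,
#                            output_folder="./temp", dataset_name="cpu_test",
#                            use_parallel_startpoints=False, seed=42),
# )
```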

## 6️⃣ Validate Correctness (GPU vs CPU)

In [None]:
print("="*70)
print("🔬 VALIDATION: GPU vs CPU Correctness")
print("="*70)

if not gpu_sustain.use_gpu:
    print("\n⚠️  Skipping validation - GPU not available")
else:
    # Create CPU instance for comparison
    cpu_sustain = OrdinalSustain(
        prob_nl, prob_score, score_vals, biomarker_labels,
        N_startpoints=1,
        N_S_max=1,
        N_iterations_MCMC=100,
        output_folder="./temp",
        dataset_name="cpu_test",
        use_parallel_startpoints=False,
        seed=42
    )
    
    # Access OrdinalSustain's name-mangled private `__sustainData` attribute
    cpu_data = getattr(cpu_sustain, '_OrdinalSustain__sustainData')
    gpu_data = getattr(gpu_sustain, '_OrdinalSustain__sustainData')
    
    # Test with random sequences
    # Get the actual number of stages from the sustain object
    N = cpu_data.getNumStages()
    n_tests = 5
    all_passed = True
    tolerance = 1e-5
    
    for test_idx in range(n_tests):
        # Generate random sequence
        np.random.seed(test_idx)
        S_test = np.random.permutation(N).astype(float)
        
        # Compute likelihoods
        cpu_result = cpu_sustain._calculate_likelihood_stage(cpu_data, S_test)
        gpu_result = gpu_sustain._calculate_likelihood_stage(gpu_data, S_test)
        
        # Compare results
        max_diff = np.max(np.abs(cpu_result - gpu_result))
        mean_diff = np.mean(np.abs(cpu_result - gpu_result))
        rel_diff = max_diff / (np.mean(np.abs(cpu_result)) + 1e-10)
        
        print(f"\nTest {test_idx + 1}/{n_tests}:")
        print(f"   • Max absolute diff: {max_diff:.2e}")
        print(f"   • Mean absolute diff: {mean_diff:.2e}")
        print(f"   • Relative diff: {rel_diff:.2e}")
        
        if max_diff > tolerance:
            print(f"   ❌ FAILED (exceeds tolerance {tolerance:.2e})")
            all_passed = False
        else:
            print(f"   ✅ PASSED")
    
    print("\n" + "="*70)
    if all_passed:
        print("✅ All validation tests PASSED!")
        print("   GPU results match CPU within numerical tolerance")
    else:
        print("❌ Some validation tests FAILED")
    print("="*70)
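
The check above uses a purely absolute threshold. Stage likelihoods are products of per-biomarker probabilities, so they can be very small. In that case an absolute tolerance of 1e-5 can pass even when the relative error is large. The following is a sketch of a combined relative/absolute comparison using `np.allclose`. `compare_likelihoods` is a hypothetical helper, and the default `rtol`/`atol` values are assumptions to tune. For example, you may need a looser `rtol` if the `torch_dtype` printed earlier is float32.

```python
import numpy as np

def compare_likelihoods(cpu_result, gpu_result, rtol=1e-5, atol=1e-12):
    """Compare CPU and GPU likelihood arrays with relative and absolute tolerance.

    Returns (passed, max_relative_error), using the CPU result as the reference.
    """
    cpu = np.asarray(cpu_result, dtype=np.float64)
    gpu = np.asarray(gpu_result, dtype=np.float64)
    if cpu.shape != gpu.shape:
        return False, float("inf")
    # Element-wise relative error; atol guards against division by zero
    rel_err = np.abs(cpu - gpu) / np.maximum(np.abs(cpu), atol)
    # allclose checks |gpu - cpu| <= atol + rtol * |cpu| element-wise
    passed = np.allclose(gpu, cpu, rtol=rtol, atol=atol)
    return bool(passed), (float(rel_err.max()) if rel_err.size else 0.0)

# Tiny likelihoods with a ~10% error: the max absolute diff (1e-9) is below 1e-5,
# yet the relative check flags the mismatch.
cpu_vals = np.array([1e-8, 2e-8])
gpu_vals = np.array([1.1e-8, 2e-8])
print(compare_likelihoods(cpu_vals, gpu_vals))
```

Inside the validation loop, you could call it as `passed, max_rel = compare_likelihoods(cpu_result, gpu_result)`.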

## 7️⃣ Performance Benchmark

In [None]:
print("="*70)
print("⚡ PERFORMANCE BENCHMARK")
print("="*70)

if not gpu_sustain.use_gpu:
    print("\n⚠️  Skipping benchmark - GPU not available")
else:
    # Prepare test sequence
    # Get the actual number of stages from the sustain object
    N = cpu_data.getNumStages()
    S_test = np.random.permutation(N).astype(float)
    
    n_iterations = 20
    
    # Benchmark CPU
    print(f"\n🐌 Benchmarking CPU ({n_iterations} iterations)...")
    cpu_times = []
    for i in range(n_iterations):
        # perf_counter is a monotonic, high-resolution clock intended for benchmarks
        start = time.perf_counter()
        _ = cpu_sustain._calculate_likelihood_stage(cpu_data, S_test)
        cpu_times.append(time.perf_counter() - start)
    
    cpu_mean = np.mean(cpu_times)
    cpu_std = np.std(cpu_times)
    
    print(f"   • Mean time: {cpu_mean*1000:.2f}ms ± {cpu_std*1000:.2f}ms")
    print(f"   • Min time: {np.min(cpu_times)*1000:.2f}ms")
    print(f"   • Max time: {np.max(cpu_times)*1000:.2f}ms")
    
    # Benchmark GPU (with warmup)
    print(f"\n🔥 Benchmarking GPU ({n_iterations} iterations)...")
    print("   • Warming up GPU...")
    for _ in range(5):
        _ = gpu_sustain._calculate_likelihood_stage(gpu_data, S_test)
    
    gpu_times = []
    for i in range(n_iterations):
        start = time.perf_counter()
        _ = gpu_sustain._calculate_likelihood_stage(gpu_data, S_test)
        gpu_times.append(time.perf_counter() - start)
    
    gpu_mean = np.mean(gpu_times)
    gpu_std = np.std(gpu_times)
    
    print(f"   • Mean time: {gpu_mean*1000:.2f}ms ± {gpu_std*1000:.2f}ms")
    print(f"   • Min time: {np.min(gpu_times)*1000:.2f}ms")
    print(f"   • Max time: {np.max(gpu_times)*1000:.2f}ms")
    
    # Calculate speedup
    speedup = cpu_mean / gpu_mean
    
    print("\n" + "="*70)
    print(f"🚀 SPEEDUP: {speedup:.2f}x")
    print("="*70)
    print(f"\n📊 Summary:")
    print(f"   • Dataset: {prob_nl.shape[0]} subjects, {prob_nl.shape[1]} biomarkers")
    print(f"   • CPU time: {cpu_mean*1000:.2f}ms")
    print(f"   • GPU time: {gpu_mean*1000:.2f}ms")
    print(f"   • Speedup: {speedup:.2f}x faster on GPU")
    
    # Get GPU performance stats
    perf_stats = gpu_sustain.get_performance_stats()
    if perf_stats['computation_times']:
        print("\n⏱️  Detailed GPU timing:")
        for op_name, op_time in perf_stats['computation_times'].items():
            print(f"   • {op_name}: {op_time*1000:.2f}ms")

## 8️⃣ Benchmark Across Dataset Sizes

In [None]:
print("="*70)
print("📈 BENCHMARK ACROSS DATASET SIZES")
print("="*70)

if not gpu_sustain.use_gpu:
    print("\n⚠️  Skipping - GPU not available")
else:
    configs = [
        {"n_subjects": 100, "n_biomarkers": 5, "n_scores": 3},
        {"n_subjects": 500, "n_biomarkers": 10, "n_scores": 3},
        {"n_subjects": 1000, "n_biomarkers": 10, "n_scores": 3},
        {"n_subjects": 2000, "n_biomarkers": 15, "n_scores": 3},
    ]
    
    results = []
    
    for config in configs:
        print(f"\n{'─'*70}")
        print(f"Testing: {config['n_subjects']} subjects, {config['n_biomarkers']} biomarkers")
        print('─'*70)
        
        # Generate data
        test_prob_nl, test_prob_score, test_score_vals, test_labels = generate_test_data(**config)
        
        # Create instances
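        test_cpu = OrdinalSustain(
            test_prob_nl, test_prob_score, test_score_vals, test_labels,
            N_startpoints=1,
            N_S_max=1,
            N_iterations_MCMC=100,
            output_folder="./temp",
            dataset_name="test",
            use_parallel_startpoints=False,
            seed=42
        )
        test_gpu = TorchOrdinalSustain(
            test_prob_nl, test_prob_score, test_score_vals, test_labels,
            N_startpoints=1,
            N_S_max=1,
            N_iterations_MCMC=100,
            output_folder="./temp",
            dataset_name="test",
            use_parallel_startpoints=False,
            seed=42,
            use_gpu=True
        )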
        
        # Get data
        test_cpu_data = getattr(test_cpu, '_OrdinalSustain__sustainData')
        test_gpu_data = getattr(test_gpu, '_OrdinalSustain__sustainData')
        
        # Prepare sequence
        # Get the actual number of stages from the sustain object
        test_N = test_cpu_data.getNumStages()
        test_S = np.random.permutation(test_N).astype(float)
        
        # Benchmark CPU
        cpu_times_test = []
        for _ in range(10):
            start = time.perf_counter()
            _ = test_cpu._calculate_likelihood_stage(test_cpu_data, test_S)
            cpu_times_test.append(time.perf_counter() - start)
        
        # Benchmark GPU (with warmup)
        for _ in range(3):
            _ = test_gpu._calculate_likelihood_stage(test_gpu_data, test_S)
        
        gpu_times_test = []
        for _ in range(10):
            start = time.perf_counter()
            _ = test_gpu._calculate_likelihood_stage(test_gpu_data, test_S)
            gpu_times_test.append(time.perf_counter() - start)
        
        cpu_mean_test = np.mean(cpu_times_test)
        gpu_mean_test = np.mean(gpu_times_test)
        speedup_test = cpu_mean_test / gpu_mean_test
        
        results.append({
            'subjects': config['n_subjects'],
            'biomarkers': config['n_biomarkers'],
            'cpu_time': cpu_mean_test,
            'gpu_time': gpu_mean_test,
            'speedup': speedup_test
        })
        
        print(f"   • CPU: {cpu_mean_test*1000:.2f}ms")
        print(f"   • GPU: {gpu_mean_test*1000:.2f}ms")
        print(f"   • Speedup: {speedup_test:.2f}x")
    
    # Summary table
    print("\n" + "="*70)
    print("📊 SUMMARY")
    print("="*70)
    print(f"\n{'Subjects':<12} {'Biomarkers':<12} {'CPU (ms)':<12} {'GPU (ms)':<12} {'Speedup':<10}")
    print("─"*70)
    for r in results:
        print(f"{r['subjects']:<12} {r['biomarkers']:<12} "
              f"{r['cpu_time']*1000:<12.2f} {r['gpu_time']*1000:<12.2f} "
              f"{r['speedup']:.2f}x")
    print("="*70)
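
To check the "speedup increases with dataset size" expectation directly, the `results` rows can be sorted by problem size and tested for a non-decreasing speedup. The sketch below uses a hypothetical helper, `speedup_trend`. It measures problem size as subjects × biomarkers, which is an assumption: the number of stages also grows with biomarkers × scores. The helper only runs on `results` if the benchmark cell above produced any.

```python
def speedup_trend(results):
    """Sort benchmark rows by problem size and report whether speedup grows with it.

    Each row needs 'subjects', 'biomarkers' and 'speedup' keys, as built above.
    Returns ([(size, speedup), ...] sorted by size, non_decreasing_flag).
    """
    rows = sorted(results, key=lambda r: r["subjects"] * r["biomarkers"])
    speedups = [r["speedup"] for r in rows]
    non_decreasing = all(a <= b for a, b in zip(speedups, speedups[1:]))
    return [(r["subjects"] * r["biomarkers"], r["speedup"]) for r in rows], non_decreasing

if "results" in globals() and results:
    trend, grows = speedup_trend(results)
    for size, s in trend:
        print(f"   • subjects × biomarkers = {size:>6}: {s:.2f}x")
    print("✅ Speedup grows with size" if grows else "⚠️  Speedup is not monotonic in size")
```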

---

## 🎉 Test Complete!

### What we tested:
1. ✅ GPU detection and initialization
2. ✅ Correctness validation (GPU matches CPU results)
3. ✅ Performance benchmarking (GPU vs CPU speedup)
4. ✅ Scalability across dataset sizes

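### Expected Results:
- **Speedup**: roughly 8-15x on a Google Colab T4 GPU (indicative only; actual figures depend on dataset size and current runtime load)
- **Correctness**: GPU results match CPU within an absolute tolerance of 1e-5
- **Scalability**: Speedup should increase with dataset size; on the smallest configurations, data-transfer and kernel-launch overheads can leave the GPU no faster than the CPU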

### Next Steps:
- Try with your own data
- Experiment with different dataset sizes
- Run the full SuStaIn algorithm with `run_sustain_algorithm()`

---

**Repository:** https://github.com/Amelia3141/mphil

**Branch:** `claude/optimize-sustain-speed-011CV4Lk8FuUjS6hZNj13WE3`

**Documentation:** See `GPU_ORDINAL_OPTIMIZATION.md` in the repository

---