# 🔥 GPU-Accelerated OrdinalSustain Analysis

**Google Colab GPU Setup**

This notebook runs your OrdinalSustain analysis on a GPU, which is expected to cut the runtime from about **30 days** on CPU to roughly **2-4 days**.

## ⚡ Before You Start:
1. **Enable GPU**: Runtime → Change runtime type → Select **T4 GPU** (free) or **A100 GPU** (Pro)
2. **Run cells in order**: Press `Shift + Enter` on each cell
3. **Test first**: Run the quick test (Cell 5) before the full analysis

---

## 1️⃣ GPU Detection

Check if GPU is available and working.
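
For a machine-readable version of the same check, `nvidia-smi` can report the GPU name and total memory as CSV. The sketch below is illustrative only: `parse_gpu_csv` and `query_gpus` are hypothetical helper names (not part of this notebook or pySuStaIn), and it assumes the driver's `nvidia-smi` supports the `--query-gpu` and `--format=csv` options.

```python
import subprocess

def parse_gpu_csv(text):
    """Parse 'name, memory_mib' CSV lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so GPU names containing commas survive.
        name, mem = line.rsplit(",", 1)
        try:
            mem_mib = int(mem.strip())
        except ValueError:
            mem_mib = None  # e.g. the driver reported a non-numeric value
        gpus.append((name.strip(), mem_mib))
    return gpus

def query_gpus():
    """Return [(name, memory_mib), ...]; empty if nvidia-smi is missing or fails."""
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)

print(query_gpus())
```

An empty list means no GPU was found, which usually points back to the runtime type setting.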

In [None]:
import subprocess

print("="*70)
print("🔍 GPU DETECTION")
print("="*70)

# Check GPU
try:
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    if result.returncode == 0:
        print("\n✅ GPU detected!\n")
        print(result.stdout)
    else:
        print("\n⚠️  GPU may not be enabled.")
        print("Please enable GPU: Runtime → Change runtime type → GPU")
except FileNotFoundError:
    print("\n❌ GPU is not available.")
    print("Please enable GPU: Runtime → Change runtime type → GPU")

## 2️⃣ Install Dependencies

Install all required packages (takes ~2-3 minutes).

In [None]:
print("="*70)
print("📦 INSTALLING DEPENDENCIES")
print("="*70)

# Install packages (GPU OrdinalSustain only - minimal dependencies)
print("\n📦 Installing packages for GPU OrdinalSustain...")
!pip install -q torch numpy scipy matplotlib tqdm scikit-learn pandas

print("\n✅ Core dependencies installed!")
print("\nℹ️  Note: kde_ebm and awkde are NOT installed (only needed for MixtureSustain)")
print("         OrdinalSustain only needs PyTorch + standard scientific packages")

# Verify PyTorch can see GPU
import torch
print(f"\n🔧 PyTorch GPU Info:")
print(f"   • PyTorch version: {torch.__version__}")
print(f"   • CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"   • CUDA version: {torch.version.cuda}")
    print(f"   • GPU device: {torch.cuda.get_device_name(0)}")
    print(f"   • GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
else:
    print("   ⚠️  CUDA not available. Please enable GPU runtime.")

print("\n✅ All dependencies installed!")

## 3️⃣ Clone Repository

Get the latest GPU-optimized code from GitHub.

In [None]:
print("="*70)
print("📥 CLONING REPOSITORY")
print("="*70)

# Remove existing directory if present
!rm -rf mphil

# Clone repository
!git clone https://github.com/Amelia3141/mphil.git
%cd mphil

# Checkout GPU branch with latest optimizations
!git checkout claude/optimize-sustain-speed-011CV4Lk8FuUjS6hZNj13WE3

# Add to Python path
import sys
sys.path.insert(0, '/content/mphil')

print("\n✅ Repository ready!")

## 4️⃣ Prepare Your Data

**Option A**: Load your real data (uncomment and edit paths below)
**Option B**: Use synthetic test data (runs as-is)
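
Before passing real data to the model, it can help to confirm that the array shapes agree with each other. The sketch below assumes the layout produced by the synthetic generator in this notebook: `prob_nl` is `(n_subjects, n_biomarkers)`, `prob_score` is `(n_subjects, n_biomarkers, n_scores)`, and `score_vals` is `(n_biomarkers, n_scores)`. `check_shapes` is a hypothetical helper, not a pySuStaIn function.

```python
def check_shapes(prob_nl_shape, prob_score_shape, score_vals_shape, n_labels):
    """Return (n_subjects, n_biomarkers, n_scores) or raise ValueError."""
    if len(prob_nl_shape) != 2:
        raise ValueError(f"prob_nl must be 2-D, got shape {prob_nl_shape}")
    n_subjects, n_biomarkers = prob_nl_shape

    if len(prob_score_shape) != 3:
        raise ValueError(f"prob_score must be 3-D, got shape {prob_score_shape}")
    if tuple(prob_score_shape[:2]) != (n_subjects, n_biomarkers):
        raise ValueError("prob_score does not match prob_nl in its first two axes")
    n_scores = prob_score_shape[2]

    if tuple(score_vals_shape) != (n_biomarkers, n_scores):
        raise ValueError(
            f"score_vals should be {(n_biomarkers, n_scores)}, got {score_vals_shape}"
        )
    if n_labels != n_biomarkers:
        raise ValueError(f"expected {n_biomarkers} labels, got {n_labels}")
    return n_subjects, n_biomarkers, n_scores

# Shapes of the default synthetic dataset in this notebook
print(check_shapes((8000, 13), (8000, 13, 3), (13, 3), 13))
```

With NumPy arrays loaded, the call would be `check_shapes(prob_nl.shape, prob_score.shape, score_vals.shape, len(biomarker_labels))`.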

In [None]:
import numpy as np

# ============================================================================
# OPTION A: Load Your Real Data (Uncomment and edit paths)
# ============================================================================
# prob_nl = np.load('/content/drive/MyDrive/your_data/prob_nl.npy')
# prob_score = np.load('/content/drive/MyDrive/your_data/prob_score.npy')
# score_vals = np.load('/content/drive/MyDrive/your_data/score_vals.npy')
# biomarker_labels = ['Domain1', 'Domain2', 'Domain3', ...]  # Your labels

# ============================================================================
# OPTION B: Generate Synthetic Test Data (Default)
# ============================================================================
def generate_test_data(n_subjects=8000, n_biomarkers=13, n_scores=3, seed=42):
    """Generate synthetic test data for OrdinalSustain."""
    np.random.seed(seed)
    
    # Probability distributions
    p_correct = 0.9
    p_nl_dist = np.full((n_scores + 1), (1 - p_correct) / n_scores)
    p_nl_dist[0] = p_correct
    
    p_score_dist = np.full((n_scores, n_scores + 1), (1 - p_correct) / n_scores)
    for score in range(n_scores):
        p_score_dist[score, score + 1] = p_correct
    
    # Generate data
    data = np.random.choice(range(n_scores + 1), n_subjects * n_biomarkers,
                          replace=True, p=p_nl_dist)
    data = data.reshape((n_subjects, n_biomarkers))
    
    # Calculate probabilities
    prob_nl = p_nl_dist[data]
    
    prob_score = np.zeros((n_subjects, n_biomarkers, n_scores))
    for n in range(n_biomarkers):
        for z in range(n_scores):
            for score in range(n_scores + 1):
                prob_score[data[:, n] == score, n, z] = p_score_dist[z, score]
    
    score_vals = np.tile(np.arange(1, n_scores + 1), (n_biomarkers, 1))
    biomarker_labels = [f"Biomarker_{i}" for i in range(n_biomarkers)]
    
    return prob_nl, prob_score, score_vals, biomarker_labels

# Generate test data (matches your dataset dimensions)
prob_nl, prob_score, score_vals, biomarker_labels = generate_test_data(
    n_subjects=8000,    # YOUR dataset size
    n_biomarkers=13,    # YOUR number of biomarkers  
    n_scores=3          # YOUR severity levels
)

print(f"✅ Data ready:")
print(f"   • Subjects: {prob_nl.shape[0]}")
print(f"   • Biomarkers: {prob_nl.shape[1]}")
print(f"   • Severity levels: {prob_score.shape[2]}")
print(f"\nData shapes:")
print(f"   • prob_nl: {prob_nl.shape}")
print(f"   • prob_score: {prob_score.shape}")
print(f"   • score_vals: {score_vals.shape}")

## 💾 (Optional) Mount Google Drive

Uncomment and run this cell if you want to:
- Load data from Google Drive
- Save results to Google Drive
- Preserve results after Colab session ends
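
One way to make the later cells pick a persistent location automatically is to check whether the Drive mount point exists. This is a minimal sketch, assuming Drive is mounted at `/content/drive` as in the cell below; `pick_output_folder` and the `sustain_output` subfolder name are hypothetical choices for illustration.

```python
import os

def pick_output_folder(drive_root="/content/drive/MyDrive",
                       subdir="sustain_output",
                       fallback="./gpu_sustain_output"):
    """Return a Drive subfolder if Drive is mounted, else a local folder."""
    if os.path.isdir(drive_root):
        folder = os.path.join(drive_root, subdir)
    else:
        folder = fallback
    # Create the folder so later cells can write to it directly.
    os.makedirs(folder, exist_ok=True)
    return folder

# Example (creates the folder as a side effect):
# output_folder = pick_output_folder()
```

Files written under the local fallback are lost when the Colab session ends, so the Drive path is the safer choice for multi-day runs.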

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')
# print("✅ Google Drive mounted at /content/drive")

## 5️⃣ Quick Test (IMPORTANT - Run This First!)

**⚠️ Run this before the full analysis!**

This test will:
- ✅ Verify GPU is working
- ✅ Measure actual speedup
- ✅ Estimate time for full run

Takes ~2-5 minutes.
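
The projection in the cell below scales the test time by the ratio of MCMC iterations only, while the full run also uses more startpoints and more subtypes. The sketch below shows one way to bracket the estimate. It is a rough illustration, not a validated cost model: the pessimistic figure assumes cost grows linearly with `N_startpoints` and `N_S_max`, `extrapolate_runtime` is a hypothetical helper, and the 120-second test time is made up.

```python
def extrapolate_runtime(test_seconds, test_cfg, full_cfg):
    """Return (optimistic, pessimistic) runtime estimates in seconds.

    optimistic:  scale by the MCMC iteration ratio only.
    pessimistic: additionally scale by the startpoint and subtype ratios
                 (assumes linear cost in each, which may not hold).
    """
    iter_ratio = full_cfg["N_iterations_MCMC"] / test_cfg["N_iterations_MCMC"]
    start_ratio = full_cfg["N_startpoints"] / test_cfg["N_startpoints"]
    subtype_ratio = full_cfg["N_S_max"] / test_cfg["N_S_max"]
    optimistic = test_seconds * iter_ratio
    pessimistic = optimistic * start_ratio * subtype_ratio
    return optimistic, pessimistic

# Settings taken from the quick-test and full-analysis cells of this notebook
test_cfg = {"N_iterations_MCMC": 1000, "N_startpoints": 5, "N_S_max": 1}
full_cfg = {"N_iterations_MCMC": 100000, "N_startpoints": 25, "N_S_max": 3}

low, high = extrapolate_runtime(120.0, test_cfg, full_cfg)  # 120 s is hypothetical
print(f"{low / 3600:.1f} to {high / 3600:.1f} hours")
```

A wide gap between the two figures is a sign that a slightly longer test run, closer to the full settings, would give a more trustworthy projection.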

In [None]:
from pySuStaIn.TorchOrdinalSustain import TorchOrdinalSustain
import time
import os
from datetime import datetime

print("="*70)
print("🧪 QUICK TEST - GPU Speedup Verification")
print("="*70)

# Create test output directory
test_output = "./test_output"
os.makedirs(test_output, exist_ok=True)

# Create GPU instance with small iteration count
test_sustain = TorchOrdinalSustain(
    prob_nl, 
    prob_score, 
    score_vals, 
    biomarker_labels,
    N_startpoints=5,               # Small for testing
    N_S_max=1,                     # Single subtype for testing
    N_iterations_MCMC=1000,        # Small for quick test
    output_folder=test_output,
    dataset_name="quicktest",
    use_parallel_startpoints=False,
    seed=42,
    use_gpu=True,                  # ENABLE GPU!
    device_id=0
)

# Check GPU status
if test_sustain.use_gpu:
    print("\n✅ GPU initialized successfully!")
    print(f"   • Device: {test_sustain.torch_backend.device_manager.device}")
    print(f"   • Expected speedup: 8-15x on T4, 15-25x on A100")
else:
    print("\n⚠️  GPU not available, running on CPU")
    print("   Check: Runtime → Change runtime type → GPU")

# Run test with progress tracking
print("\n🚀 Running quick test...")
print(f"⏰ Started: {datetime.now().strftime('%H:%M:%S')}")
print("\n" + "-"*70)

start_time = time.time()
test_sustain.run_sustain_algorithm()
test_time = time.time() - start_time

print("-"*70)
print(f"⏰ Finished: {datetime.now().strftime('%H:%M:%S')}")
print(f"✅ Test completed in {test_time:.1f} seconds")

# Estimate full run time
# NOTE: this scales by MCMC iterations only. The full run also uses more
# startpoints (25 vs 5) and subtypes (3 vs 1), so the true runtime may differ
# substantially from this projection.
full_iterations = 100000
test_iterations = 1000
estimated_time = test_time * (full_iterations / test_iterations)
estimated_hours = estimated_time / 3600
estimated_days = estimated_hours / 24

print("\n" + "="*70)
print("📊 PROJECTIONS FOR FULL RUN")
print("="*70)
print(f"Full run parameters: {full_iterations} MCMC iterations, 25 startpoints, 3 subtypes")
print(f"\nEstimated runtime (scaled by MCMC iterations only):")
print(f"   • Hours: {estimated_hours:.1f} hours")
print(f"   • Days: {estimated_days:.1f} days")

if estimated_days < 30:
    speedup = 30 / estimated_days
    time_saved = 30 - estimated_days
    print(f"\n⚡ GPU Speedup:")
    print(f"   • {speedup:.1f}x faster than CPU (30 days)")
    print(f"   • Time saved: {time_saved:.1f} days")
    print(f"\n✅ GPU is working! Ready for full analysis.")
else:
    print(f"\n⚠️  Warning: Estimated time seems slow. GPU may not be active.")

print("="*70)

## 🔄 Keep Colab Alive (For Multi-Day Runs)

**⚠️ Colab disconnects idle sessions, and free-tier sessions are capped at roughly 12 hours!**

### Option 1: Auto-Click Connect Button
1. Open browser console: Press `F12` (Chrome/Firefox) or `Cmd+Option+J` (Mac)
2. Paste this code and press Enter:
```javascript
function ClickConnect(){
  console.log("Clicking connect...");
  document.querySelector("colab-connect-button").click();
}
setInterval(ClickConnect, 60000); // Click every minute
```

### Option 2: Colab Pro/Pro+ (Recommended for Multi-Day)
- **Colab Pro** ($10/month): Longer sessions, better GPUs
- **Colab Pro+** ($50/month): Background execution, longest sessions

### Option 3: Run Cell Below (Keep Output Active)

In [None]:
# Enables Colab's custom widget manager so cell output stays interactive.
# This is not a guaranteed keep-alive, so do not rely on it alone.
from google.colab import output
output.enable_custom_widget_manager()

print("✅ Output manager enabled - may help keep output active")
print("💡 Still recommended: Use browser console auto-click (see above)")

## 6️⃣ Full GPU-Accelerated Analysis

**🚨 BEFORE RUNNING:**
1. ✅ Verify quick test (Cell 5) showed good speedup
2. ✅ Set up keep-alive (Cell above)
3. ✅ Consider Colab Pro/Pro+ for multi-day runs
4. ✅ (Optional) Change output folder to Google Drive for persistent storage

**⏰ This will take 2-4 days even on GPU!**

In [None]:
from pySuStaIn.TorchOrdinalSustain import TorchOrdinalSustain
import time
from datetime import timedelta, datetime
import os

print("="*70)
print("🔬 FULL GPU-ACCELERATED ANALYSIS")
print("="*70)

# Output folder (change to Google Drive if mounted)
output_folder = "./gpu_sustain_output"  
# output_folder = "/content/drive/MyDrive/sustain_output"  # Uncomment for Google Drive
os.makedirs(output_folder, exist_ok=True)

# Create GPU instance with full parameters
gpu_sustain = TorchOrdinalSustain(
    prob_nl, 
    prob_score, 
    score_vals, 
    biomarker_labels,
    N_startpoints=25,              # Full startpoints
    N_S_max=3,                     # 3 subtypes
    N_iterations_MCMC=100000,      # Full MCMC iterations
    output_folder=output_folder,
    dataset_name="ordinal_gpu_analysis",
    use_parallel_startpoints=False,
    seed=42,
    use_gpu=True,                  # GPU ENABLED
    device_id=0
)

# Verify GPU
if gpu_sustain.use_gpu:
    print("\n✅ GPU confirmed active!")
    print(f"   • Device: {gpu_sustain.torch_backend.device_manager.device}")
else:
    print("\n⚠️  WARNING: GPU not available, will use CPU (very slow!)")
    response = input("Continue anyway? (yes/no): ")
    if response.lower() != 'yes':
        raise RuntimeError("GPU not available. Please enable GPU runtime.")

# Show estimated runtime from quick test
try:
    print(f"\n⏰ Estimated runtime: ~{estimated_days:.1f} days")
except NameError:
    # estimated_days only exists if the quick test cell was run in this session
    print("\n⏰ Estimated runtime: ~2-4 days on T4 GPU, ~1.5-2 days on A100")

print("\n🚨 IMPORTANT:")
print("   • Keep this tab/window open")
print("   • Keep browser console auto-click running (if using)")
print("   • Consider Colab Pro/Pro+ for better reliability")
print("   • Results automatically saved to pickle files")

print("\n" + "="*70)
input("Press Enter to start full analysis...")
print("="*70)

# Progress tracking
def print_progress(stage, current=None, total=None):
    """Print formatted progress update"""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    if current is not None and total is not None:
        print(f"[{timestamp}] {stage} ({current}/{total})")
    else:
        print(f"[{timestamp}] {stage}")

# START THE ANALYSIS
start_time = time.time()
start_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

print("\n" + "="*70)
print_progress("🚀 ANALYSIS STARTED")
print("="*70)

# Pre-flight check: report which subtype models already have saved pickle files.
# Nothing is fitted in this loop; the actual run starts below.
n_s_max = 3
for n_s in range(1, n_s_max + 1):
    print(f"\n{'='*70}")
    print_progress(f"📊 Checking N={n_s} subtype model", n_s, n_s_max)
    print(f"{'='*70}")

    # Check if this subtype already exists (from pickle)
    pickle_file = os.path.join(output_folder, "pickle_files", 
                               f"ordinal_gpu_analysis_subtype{n_s-1}.pickle")

    if os.path.exists(pickle_file):
        print(f"   ✅ Found existing results for N={n_s}")
        print(f"   📂 Pickle file: {pickle_file}")
    else:
        print(f"   ⚙️  No saved results for N={n_s}; inference will run below")
        print(f"   ⏰ This may take several hours...")

# RUN!
print(f"\n{'='*70}")
print_progress("⚙️  Running SuStaIn algorithm (this will take days)...")
print(f"{'='*70}\n")

samples_sequence, samples_f, ml_subtype, prob_ml_subtype, \
ml_stage, prob_ml_stage, prob_subtype_stage = gpu_sustain.run_sustain_algorithm()

# Calculate runtime
end_time = time.time()
end_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
runtime = end_time - start_time
runtime_str = str(timedelta(seconds=int(runtime)))
runtime_hours = runtime / 3600
runtime_days = runtime_hours / 24

# Results summary
print("\n" + "="*70)
print("✅ ANALYSIS COMPLETE!")
print("="*70)
print(f"Started:  {start_timestamp}")
print(f"Finished: {end_timestamp}")
print(f"Runtime:  {runtime_str} ({runtime_hours:.1f} hours = {runtime_days:.1f} days)")

# Show speedup
if runtime_days < 30:
    speedup = 30 / runtime_days
    print(f"\n⚡ GPU Speedup Achieved:")
    print(f"   • {speedup:.1f}x faster than CPU estimate")
    print(f"   • Time saved: {30 - runtime_days:.1f} days")

print(f"\n📁 Results saved to: {output_folder}")
print("\n📊 Output files:")
!ls -lh {output_folder}

# Show pickle files
pickle_folder = os.path.join(output_folder, "pickle_files")
if os.path.exists(pickle_folder):
    print("\n📦 Pickle files (models for each N):")
    !ls -lh {pickle_folder}

print("="*70)

## 7️⃣ Download Results

After analysis completes, download the results to your computer.

In [None]:
import os
from google.colab import files
import shutil

print("📦 Preparing results for download...")

# Create zip file
output_folder = "./gpu_sustain_output"  # Match the folder from Cell 6
zip_filename = "sustain_results"

if os.path.exists(output_folder):
    shutil.make_archive(zip_filename, 'zip', output_folder)
    print(f"\n✅ Results packaged: {zip_filename}.zip")
    print(f"   Size: {os.path.getsize(zip_filename + '.zip') / 1024**2:.1f} MB")

    # Download
    print("\n📥 Starting download...")
    files.download(f"{zip_filename}.zip")
    print("✅ Download complete!")
else:
    print(f"❌ Output folder not found: {output_folder}")
    print("   Make sure analysis has completed successfully.")

---

## 📚 Additional Resources

- **SuStaIn Documentation**: [pySuStaIn GitHub](https://github.com/ucl-pond/pySuStaIn)
- **Google Colab Tips**: [Research Colab FAQ](https://research.google.com/colaboratory/faq.html)
- **GPU Optimization**: See `TorchOrdinalSustain.py` in the repository

## 🆘 Troubleshooting

**GPU not detected?**
- Runtime → Change runtime type → Select GPU → Save
- Restart runtime and re-run from Cell 1

**Session disconnected?**
- Results are saved in pickle files automatically
- Reload and check output folder for partial results
- Use browser console auto-click (see the Keep Colab Alive section)
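
To see which subtype models finished before a disconnect, the output folder can be scanned for pickle files. This sketch assumes the naming used by the pre-flight check in the full-analysis cell, `<dataset_name>_subtype<k>.pickle` under `pickle_files/` with `k` starting at 0; `completed_subtypes` is a hypothetical helper, not part of pySuStaIn.

```python
import os

def completed_subtypes(output_folder, dataset_name, n_s_max):
    """Return the subtype counts N (1-based) that already have a pickle file."""
    pickle_dir = os.path.join(output_folder, "pickle_files")
    done = []
    for n_s in range(1, n_s_max + 1):
        # Files are indexed from 0, so the N-subtype model is stored as subtype{N-1}.
        path = os.path.join(pickle_dir, f"{dataset_name}_subtype{n_s - 1}.pickle")
        if os.path.exists(path):
            done.append(n_s)
    return done

print(completed_subtypes("./gpu_sustain_output", "ordinal_gpu_analysis", 3))
```

An empty list means no model has been saved yet for that output folder and dataset name.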

**Out of memory?**
- Try reducing `N_startpoints` to 10-15
- Upgrade to Colab Pro for more RAM

**Too slow?**
- Verify GPU is active (check Cell 5 output)
- Try A100 GPU (Colab Pro)
- Check CUDA is being used in test output

---

**Created by**: GPU-optimized SuStaIn pipeline  
**Version**: TorchOrdinalSustain with CUDA acceleration  
**Last Updated**: 2025-11-17

In [None]:
import os

# Create scripts directory
scripts_dir = "./parallel_scripts"
os.makedirs(scripts_dir, exist_ok=True)

# Template for each N value script
script_template = """#!/usr/bin/env python3
'''
GPU-Accelerated OrdinalSustain for N={n_subtypes} subtypes
Device: GPU {device_id}
Generated: {timestamp}
'''

import numpy as np
import sys
import time
from datetime import datetime, timedelta

# Add pySuStaIn to path
sys.path.insert(0, '/path/to/mphil')  # UPDATE THIS PATH!

from pySuStaIn.TorchOrdinalSustain import TorchOrdinalSustain

def print_progress(msg):
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{{timestamp}}] N={n_subtypes} GPU{device_id}: {{msg}}")
    sys.stdout.flush()

print("="*70)
print_progress("STARTING")
print("="*70)

# Load your data here
print_progress("Loading data...")
# TODO: Replace with your actual data loading
# prob_nl = np.load('your_prob_nl.npy')
# prob_score = np.load('your_prob_score.npy')  
# score_vals = np.load('your_score_vals.npy')
# biomarker_labels = ['Domain1', 'Domain2', ...]

# For now, generate test data
def generate_test_data(n_subjects=7000, n_biomarkers=13, n_scores=3, seed=42):
    np.random.seed(seed)
    p_correct = 0.9
    p_nl_dist = np.full((n_scores + 1), (1 - p_correct) / n_scores)
    p_nl_dist[0] = p_correct
    p_score_dist = np.full((n_scores, n_scores + 1), (1 - p_correct) / n_scores)
    for score in range(n_scores):
        p_score_dist[score, score + 1] = p_correct
    data = np.random.choice(range(n_scores + 1), n_subjects * n_biomarkers,
                          replace=True, p=p_nl_dist)
    data = data.reshape((n_subjects, n_biomarkers))
    prob_nl = p_nl_dist[data]
    prob_score = np.zeros((n_subjects, n_biomarkers, n_scores))
    for n in range(n_biomarkers):
        for z in range(n_scores):
            for score in range(n_scores + 1):
                prob_score[data[:, n] == score, n, z] = p_score_dist[z, score]
    score_vals = np.tile(np.arange(1, n_scores + 1), (n_biomarkers, 1))
    biomarker_labels = [f"Biomarker_{{i}}" for i in range(n_biomarkers)]
    return prob_nl, prob_score, score_vals, biomarker_labels

prob_nl, prob_score, score_vals, biomarker_labels = generate_test_data()
print_progress(f"Data loaded: {{prob_nl.shape[0]}} subjects, {{prob_nl.shape[1]}} biomarkers")

# Initialize model
print_progress("Initializing TorchOrdinalSustain...")
output_folder = f"./output_N{n_subtypes}_GPU{device_id}"

sustain = TorchOrdinalSustain(
    prob_nl, 
    prob_score, 
    score_vals, 
    biomarker_labels,
    N_startpoints=25,
    N_S_max={n_subtypes},
    N_iterations_MCMC=10000,       # Adjust for your needs
    output_folder=output_folder,
    dataset_name=f"EDS_POTS_N{n_subtypes}",
    use_parallel_startpoints=False,
    seed=42,
    use_gpu=True,
    device_id={device_id}
)

if not sustain.use_gpu:
    print_progress("ERROR: GPU not available!")
    sys.exit(1)

print_progress(f"GPU {{sustain.torch_backend.device_manager.device}} confirmed")

# Run analysis
print_progress("Running SuStaIn algorithm...")
start_time = time.time()

samples_sequence, samples_f, ml_subtype, prob_ml_subtype, \\
ml_stage, prob_ml_stage, prob_subtype_stage = sustain.run_sustain_algorithm()

runtime = time.time() - start_time
runtime_str = str(timedelta(seconds=int(runtime)))

print("="*70)
print_progress(f"COMPLETE! Runtime: {{runtime_str}}")
print_progress(f"Results: {{output_folder}}")
print("="*70)
"""

# Generate script for each N value
from datetime import datetime
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

for n in range(1, 7):  # N=1 to N=6
    device_id = n - 1  # GPU 0-5
    script_content = script_template.format(
        n_subtypes=n,
        device_id=device_id,
        timestamp=timestamp
    )
    
    script_path = os.path.join(scripts_dir, f"run_N{n}_GPU{device_id}.py")
    with open(script_path, 'w') as f:
        f.write(script_content)
    
    # Make executable
    os.chmod(script_path, 0o755)
    
    print(f"✅ Created: {script_path}")

# Create master launch script
launch_script = """#!/bin/bash
# Launch all N values in parallel across 6 GPUs
# Generated: {timestamp}

echo "Launching parallel GPU jobs..."
echo "======================================================================"

# Launch each script in background
for N in 1 2 3 4 5 6; do
    GPU=$((N-1))
    echo "Starting N=$N on GPU $GPU..."
    nohup python3 run_N${{N}}_GPU${{GPU}}.py > log_N${{N}}_GPU${{GPU}}.txt 2>&1 &
    echo "  PID: $!"
done

echo "======================================================================"
echo "All jobs launched!"
echo "Monitor with: tail -f log_N*_GPU*.txt"
echo "Check status: ps aux | grep run_N"
""".format(timestamp=timestamp)

launch_path = os.path.join(scripts_dir, "launch_all.sh")
with open(launch_path, 'w') as f:
    f.write(launch_script)
os.chmod(launch_path, 0o755)

print(f"\n✅ Created master launcher: {launch_path}")
print(f"\n📂 All scripts in: {scripts_dir}/")
print("\n📋 To run on GBSH:")
print("   1. Copy these scripts to your GBSH server")
print("   2. Update data paths in each script")
print(f"   3. Run: cd {scripts_dir} && ./launch_all.sh")
print("   4. Monitor: tail -f log_N*.txt")

## 8️⃣ Generate Parallel Execution Scripts (For GBSH Multi-GPU)

If you want to run N=1-6 in parallel across 6 GPUs on your GBSH servers, run this cell to generate standalone Python scripts.
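
The generator cell builds each script with `str.format`, so braces that must survive into the generated file (for example inside its f-strings) are doubled as `{{` and `}}`. The toy template below is a hypothetical illustration of that escaping rule, not a copy of the real template.

```python
# Single braces are filled in by str.format now; doubled braces become
# single braces in the output and are evaluated later by the f-string.
template = '''def report(msg):
    print(f"N={n_subtypes} GPU{device_id}: {{msg}}")
'''

rendered = template.format(n_subtypes=3, device_id=2)
print(rendered)

# Execute the generated source to confirm it is valid Python.
namespace = {}
exec(rendered, namespace)
namespace["report"]("ready")
```

The same rule applies when editing the real template: any new f-string placeholder that should be resolved at script runtime needs doubled braces.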