# 🫀 ECG-LLM Training Runner for GitHub Sync

This notebook pulls your ECG-LLM code from GitHub and runs it on a Colab GPU.

**Workflow:**
1. Develop in Windsurf locally
2. Push to GitHub
3. Run this notebook in Colab for GPU training
4. Results auto-save to Google Drive

In [None]:
# Check GPU setup
!nvidia-smi

import torch
print(f"🔥 PyTorch: {torch.__version__}")
print(f"🖥️  CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name()}")

print("\n💾 Available storage:")
!df -h | head -2

In [None]:
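# Install dependencies
# Note: Colab runtimes already ship with a CUDA-enabled PyTorch build. The next line
# replaces it with the cu118 wheels, so you can likely skip it if the GPU check above passed.
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118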
!pip install wfdb neurokit2 pandas numpy matplotlib seaborn
!pip install opencv-python scikit-learn tqdm
!pip install transformers timm efficientnet-pytorch
!pip install Pillow

print("✅ All packages installed!")

In [None]:
# Clone your GitHub repository
# REPLACE with your actual GitHub repo URL
GITHUB_REPO = "https://github.com/yourusername/ecg-llm.git"  # CHANGE THIS!
# Absolute checkout path, so re-running this cell from inside the repo
# does not clone a nested copy
REPO_DIR = "/content/ecg-llm"

import os
if os.path.exists(REPO_DIR):
    print("📁 Repository already exists, pulling latest changes...")
    %cd {REPO_DIR}
    !git pull
else:
    print("📥 Cloning repository...")
    !git clone {GITHUB_REPO} {REPO_DIR}
    %cd {REPO_DIR}

print("\n📂 Repository contents:")
!ls -la
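
The clone cell assumes the checkout folder is called `ecg-llm`, which stops being true once `GITHUB_REPO` points at a repository with a different name. A minimal sketch of deriving the folder name from the URL instead is shown below. The helper `repo_dir_name` is a hypothetical name, not part of the repository, and it only handles HTTPS-style URLs.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def repo_dir_name(repo_url: str) -> str:
    """Return the folder name `git clone` creates by default for an HTTPS repo URL."""
    # Keep only the path part, e.g. "/yourusername/ecg-llm.git"
    path = urlparse(repo_url).path.rstrip("/")
    name = PurePosixPath(path).name
    # git drops a trailing ".git" when it picks the directory name
    return name[:-4] if name.endswith(".git") else name


print(repo_dir_name("https://github.com/yourusername/ecg-llm.git"))
```

The result can replace the hard-coded folder name in the `os.path.exists(...)` check and the `%cd` lines.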

In [None]:
# Set up Google Drive for results
import os
from google.colab import drive
drive.mount('/content/drive')

# Create results directory
results_dir = '/content/drive/MyDrive/ECG_Training_Results'
os.makedirs(results_dir, exist_ok=True)
os.makedirs(f'{results_dir}/models', exist_ok=True)
os.makedirs(f'{results_dir}/experiments', exist_ok=True)

print(f"✅ Results will be saved to: {results_dir}")

In [None]:
# Download PTB-XL dataset (skipped automatically if it is already on disk)
import os
import urllib.request
import zipfile

data_dir = "data"
dataset_name = "ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3"

if not os.path.exists(f"{data_dir}/{dataset_name}"):
    print("📥 Downloading PTB-XL dataset (this takes 5-10 minutes)...")
    
    url = "https://physionet.org/static/published-projects/ptb-xl/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3.zip"
    
    os.makedirs(data_dir, exist_ok=True)
    
    # Download
    urllib.request.urlretrieve(url, "ptb-xl.zip")
    print("✅ Download complete!")
    
    # Extract
    print("📦 Extracting dataset...")
    with zipfile.ZipFile("ptb-xl.zip", 'r') as zipf:
        zipf.extractall(data_dir)
    
    # Cleanup
    os.remove("ptb-xl.zip")
    print("✅ Dataset ready!")
else:
    print("✅ PTB-XL dataset already available!")

# Verify dataset
dataset_path = f"{data_dir}/{dataset_name}"
if os.path.exists(f"{dataset_path}/ptbxl_database.csv"):
    import pandas as pd
    db = pd.read_csv(f"{dataset_path}/ptbxl_database.csv")
    print(f"📊 Dataset verified: {len(db):,} ECG records")
else:
    print("❌ Dataset verification failed")
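
The download in the cell above runs for several minutes without printing anything. `urllib.request.urlretrieve` accepts a `reporthook` callback, so one possible way to show progress is sketched below. The helper names `percent_done` and `make_progress_hook` are illustrative and do not come from the repository.

```python
import urllib.request


def percent_done(block_num, block_size, total_size):
    """Percent downloaded so far, or None when the server sent no total size."""
    if total_size <= 0:
        return None
    return min(100, block_num * block_size * 100 // total_size)


def make_progress_hook(step=10):
    """Return a urlretrieve reporthook that prints roughly every `step` percent."""
    state = {"next": 0}  # next percentage at which to print

    def hook(block_num, block_size, total_size):
        pct = percent_done(block_num, block_size, total_size)
        if pct is not None and pct >= state["next"]:
            print(f"downloaded {pct}%")
            state["next"] = pct - pct % step + step

    return hook


# Usage, left commented out so the sketch does not start a large download:
# urllib.request.urlretrieve(url, "ptb-xl.zip", reporthook=make_progress_hook())
```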

In [None]:
# Run bootstrap training (Phase 1)
print("🚀 Starting Bootstrap R-Peak Training...")

# Your bootstrap trainer adapted for Colab
!python bootstrap_trainer.py --device cuda --batch-size 16 --epochs 25 --output-dir {results_dir}/bootstrap

print("✅ Bootstrap training completed!")

In [None]:
# Run advanced training (Phase 2)
print("🚀 Starting Advanced Multi-Model Training...")

# Your advanced trainer
!python training/advanced_trainer.py --model-type ensemble --device cuda --epochs 50 --batch-size 8 --output-dir {results_dir}/advanced

print("✅ Advanced training completed!")
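
A `!python ...` line hands control back to the notebook even when the script exits with an error, so the "completed" messages in the two training cells print either way. A hedged sketch of checking the exit code with `subprocess` follows. `run_step` is a hypothetical helper, and the command shown is a stand-in rather than the real trainer invocation.

```python
import subprocess
import sys


def run_step(name, command):
    """Run a command and report success only when it exits with code 0."""
    print(f"Starting {name}...")
    result = subprocess.run(command)  # child output goes to the same stdout/stderr
    if result.returncode != 0:
        print(f"{name} failed with exit code {result.returncode}")
        return False
    print(f"{name} completed")
    return True


# Stand-in command; in this notebook it would be the trainer script and its flags
ok = run_step("demo step", [sys.executable, "-c", "print('training...')"])
```

Returning a boolean lets a later cell skip the save or test steps when training did not finish.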

In [None]:
# Save all results to Google Drive
import glob
import os
import shutil
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
session_dir = f"{results_dir}/training_session_{timestamp}"
os.makedirs(session_dir, exist_ok=True)

print("💾 Saving results to Google Drive...")

# Save models
model_files = ["best_model.pth", "latest_checkpoint.pth"]
for model_file in model_files:
    if os.path.exists(model_file):
        shutil.copy2(model_file, f"{session_dir}/{model_file}")
        print(f"✅ Saved {model_file}")

# Save experiments
if os.path.exists("experiments"):
    shutil.copytree("experiments", f"{session_dir}/experiments", dirs_exist_ok=True)
    print("✅ Saved experiments")

# Save any plots/results found in the repository root (subfolders are not searched)
result_patterns = ["*.png", "*.jpg", "*.json", "training_*.csv"]
for pattern in result_patterns:
    for file in glob.glob(pattern):
        shutil.copy2(file, f"{session_dir}/{file}")
        print(f"✅ Saved {file}")

print(f"\n🎉 All results saved to: {session_dir}")
print("You can access them from Google Drive!")
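
The glob patterns in the save cell only match files in the current directory, so plots written into subfolders are not copied. If your trainers write results into nested folders, a recursive variant along the following lines may help. `copy_matching` is a hypothetical helper, and the destination is assumed to live outside the source tree (as the Drive folder does here).

```python
import shutil
import tempfile
from pathlib import Path


def copy_matching(src_root, dest_root, patterns):
    """Copy files under src_root that match any pattern, keeping relative paths."""
    src_root, dest_root = Path(src_root), Path(dest_root)
    copied = []
    for pattern in patterns:
        for src in sorted(src_root.rglob(pattern)):
            if not src.is_file():
                continue
            dest = dest_root / src.relative_to(src_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied


# Small demo on throwaway folders
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "run")
    (src / "plots").mkdir(parents=True)
    (src / "plots" / "loss.png").write_bytes(b"")
    (src / "metrics.json").write_text("{}")
    backup = Path(tmp, "backup")
    saved = copy_matching(src, backup, ["*.png", "*.json"])
    print([p.relative_to(backup).as_posix() for p in saved])
```

Note that a recursive search from the repository root also walks the `data` folder, so pointing `src_root` at a dedicated output folder is likely faster.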

In [None]:
# Quick model test
print("🧪 Testing trained model...")

if os.path.exists("test_real_data.py"):
    !python test_real_data.py --model-path best_model.pth --num-samples 5
    print("✅ Testing complete!")
else:
    print("⚠️  Test script not found, skipping model test")

## 🎯 Training Complete!

If the cells above ran without errors, your ECG-LLM model has been trained on a Google Colab GPU using the latest code you pushed from Windsurf.

**Next Steps:**
1. Check Google Drive for your training results
2. Download the best model for local testing
3. Continue development in Windsurf
4. Push updates and re-run this notebook as needed

**Perfect Workflow:**
- 💻 Develop in Windsurf (fast, local)
- 🚀 Train on Colab (GPU power)
- 🗂️ Results in Drive (persistent)
- 🔄 Repeat and iterate!
