# GPU-Accelerated Fire & Water Training (Google Colab)

This notebook trains Fire & Water agents with **1000+ parallel environments running on the GPU**, which raises training throughput well beyond what CPU-based training reaches.

## Key Features:
- **Pure GPU Physics**: All game logic runs on GPU
- **Massive Parallelism**: Train 1000+ environments simultaneously
- **~20-100x Speedup**: Estimated throughput gain over CPU-based training (multiprocessing and sequential baselines, respectively)
- **Compatible Checkpoints**: Works with visualize.py
- **Efficient Memory**: Pre-allocated GPU buffers

## Expected Performance:
- **CPU Training**: ~100-200 steps/sec (8-24 envs)
- **GPU Training**: ~10,000-50,000 steps/sec (1024+ envs)
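
Assuming these figures count steps summed over all parallel environments (so one batched call that advances 1024 environments counts as 1024 steps), throughput could be measured with a sketch like the one below. `step_fn` and `dummy_step` are hypothetical stand-ins for one batched environment step; they are not part of the repository.

```python
import time


def measure_steps_per_sec(step_fn, num_envs, num_iters=100):
    """Return aggregate environment steps per second.

    step_fn is a hypothetical zero-argument callable standing in for one
    batched step that advances all `num_envs` environments at once.
    """
    start = time.perf_counter()
    for _ in range(num_iters):
        step_fn()
    elapsed = time.perf_counter() - start
    # Each iteration advances every parallel environment by one step.
    return num_envs * num_iters / elapsed


def dummy_step():
    # Placeholder workload; swap in a real batched env step when measuring.
    sum(range(1000))


rate = measure_steps_per_sec(dummy_step, num_envs=1024)
print(f"{rate:,.0f} steps/sec")
```

When timing real GPU code, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels are launched asynchronously and the timer would otherwise stop before the work has finished.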

---

## 1. Clone Repository and Setup

In [None]:
# Clone the GPU training branch
!git clone -b gpu-training https://github.com/Ben-jpg-del/CalHacks.git
%cd CalHacks

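import os

print("\n✅ Repository cloned!")
# `!pwd` cannot be used inside an expression; ask Python for the working directory instead
print("Current directory:", os.getcwd())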

## 2. Install Dependencies

In [None]:
# Install required packages (PyTorch should already be on Colab)
!pip install wandb -q

print("✅ Dependencies installed!")

## 3. Check GPU

In [None]:
import torch
import numpy as np
import os

# Check GPU
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("\n✅ GPU DETECTED")
    print("GPU Device:", torch.cuda.get_device_name(0))
    total_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU Memory: {total_mem:.1f} GB")
    
    # Check if L4 or T4
    gpu_name = torch.cuda.get_device_name(0)
    if 'L4' in gpu_name:
        print("\n🚀 L4 GPU DETECTED - Excellent for training!")
        print("💡 Recommended: NUM_ENVS = 1024-2048")
    elif 'T4' in gpu_name:
        print("\n⚠️  T4 GPU - Good but limited memory")
        print("💡 Recommended: NUM_ENVS = 512-1024")
    else:
        print(f"\n💡 Recommended environments: {int(total_mem * 100)} (approx)")
else:
    print("\n❌ NO GPU DETECTED")
    print("Enable GPU: Runtime → Change runtime type → Hardware accelerator → GPU")
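
The recommendation logic in the cell above can be pulled into a small function, which makes it easy to try without a GPU attached. This is a sketch that mirrors the printed suggestions only; `recommend_num_envs` is a hypothetical helper name, and the fallback of about 100 environments per GB is the cell's own rough heuristic.

```python
def recommend_num_envs(gpu_name, total_mem_gb):
    """Mirror the notebook's printed recommendations as a (low, high) range."""
    if "L4" in gpu_name:
        return (1024, 2048)
    if "T4" in gpu_name:
        return (512, 1024)
    # Fallback heuristic from the cell above: about 100 envs per GB of memory.
    approx = int(total_mem_gb * 100)
    return (approx, approx)


print(recommend_num_envs("Tesla T4", 15.8))               # (512, 1024)
print(recommend_num_envs("NVIDIA A100-SXM4-40GB", 40.0))  # (4000, 4000)
```

Note that a plain substring check such as `'L4' in gpu_name` also matches names like "L40", so a stricter match may be worth adding if other GPU types are in use.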

## 4. Configuration

Adjust these parameters based on your GPU:

In [None]:
# ============================================
# GPU TRAINING CONFIGURATION
# ============================================

# Environment settings
NUM_ENVS = 1024          # Parallel environments (T4: use 512, L4: use 1024-2048)
NUM_EPISODES = 5000      # Total episodes
BATCH_SIZE = 1024        # Neural network batch size

# Learning parameters
LEARNING_RATE = 3e-4
GAMMA = 0.99
EPSILON_START = 1.0
EPSILON_END = 0.01
EPSILON_DECAY = 0.995

# Buffer and update frequency
BUFFER_CAPACITY = 1000000
TARGET_UPDATE_FREQ = 1000

# Checkpoint settings
SAVE_DIR = 'checkpoints_gpu'
SAVE_FREQ = 100          # Save every N episodes
LOG_FREQ = 10            # Log every N episodes

# Map distribution
MAP_DISTRIBUTION = {
    'tutorial': 0.5,
    'tower': 0.3,
    'map2': 0.2,
}

# Weights & Biases
USE_WANDB = False
WANDB_PROJECT = 'firewater-gpu'

# Device
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# ============================================

print("Configuration:")
print("=" * 60)
print(f"Parallel Environments: {NUM_ENVS}")
print(f"Episodes: {NUM_EPISODES}")
print(f"Batch Size: {BATCH_SIZE}")
print(f"Device: {DEVICE}")
print(f"Save Directory: {SAVE_DIR}")
print("=" * 60)

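# Estimate memory usage.
# Assumes 52 float32 values (4 bytes each) per observation; the factor of 2
# in the buffer estimate presumably covers obs and next_obs per transition.
if DEVICE == 'cuda':
    obs_mem = NUM_ENVS * 52 * 4 / 1e6  # MB
    buffer_mem = BUFFER_CAPACITY * 52 * 4 * 2 / 1e9  # GB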
    print(f"\nEstimated GPU Memory:")
    print(f"  Observations: {obs_mem:.1f} MB")
    print(f"  Replay Buffer: {buffer_mem:.2f} GB")
    print(f"  Total (approx): ~{buffer_mem + 0.5:.2f} GB")
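
To see how the estimate scales before committing to a buffer size, the same arithmetic can be wrapped in a function. This is a sketch under the cell's own assumptions: 52 values per observation and 4 bytes per value are taken from the formulas above, and `estimate_memory` is a hypothetical helper name.

```python
def estimate_memory(num_envs, buffer_capacity, obs_dim=52, bytes_per_value=4):
    """Rough GPU memory estimate following the formulas in the cell above.

    obs_dim=52 and 4-byte values come from that cell; the factor of 2 is
    assumed to cover obs and next_obs for each stored transition.
    """
    obs_mb = num_envs * obs_dim * bytes_per_value / 1e6
    buffer_gb = buffer_capacity * obs_dim * bytes_per_value * 2 / 1e9
    return obs_mb, buffer_gb


for capacity in (250_000, 1_000_000):
    obs_mb, buffer_gb = estimate_memory(1024, capacity)
    print(f"capacity={capacity:>9,}: obs={obs_mb:.2f} MB, buffer={buffer_gb:.3f} GB")
```

The estimate counts observations only. Actions, rewards, done flags, network weights, and a second buffer (if each agent keeps its own) add to the real footprint.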

## 5. Test GPU Environment

Quick test to ensure everything works:

In [None]:
import sys

# Add gpu_training to path
sys.path.insert(0, '/content/CalHacks/gpu_training')
# Add parent CalHacks directory to path (for map imports)
sys.path.insert(0, '/content/CalHacks')

print("Python path:")
for p in sys.path[:3]:
    print(f"  {p}")
print()

# Now imports should work
from torch_env import TorchFireWaterEnv
from map_config import LevelLibrary

print("✅ Imports successful!")
print("\nTesting GPU environment with 16 parallel instances...\n")

# Create small test environment
test_configs = [LevelLibrary.get_tutorial_level() for _ in range(16)]
test_env = TorchFireWaterEnv(
    num_envs=16,
    level_configs=test_configs,
    device=DEVICE
)

print("✅ Environment created")

# Test reset
fire_obs, water_obs = test_env.reset()
print(f"✅ Reset successful")
print(f"   Fire obs shape: {fire_obs.shape}")
print(f"   Water obs shape: {water_obs.shape}")
print(f"   Device: {fire_obs.device}")

# Test step
fire_actions = torch.randint(0, 6, (16,), device=DEVICE)
water_actions = torch.randint(0, 6, (16,), device=DEVICE)

(fire_obs, water_obs), (fire_rewards, water_rewards), \
(fire_dones, water_dones), infos = test_env.step(fire_actions, water_actions)

print(f"\n✅ Step successful")
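# .item() converts the 0-dim tensors to plain Python values for cleaner output
print(f"   Rewards: fire={fire_rewards[0].item():.2f}, water={water_rewards[0].item():.2f}")
print(f"   Dones: fire={fire_dones[0].item()}, water={water_dones[0].item()}")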
print(f"\n🚀 GPU environment is ready!")
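
The test above uses the tutorial level for every environment, while training takes a `MAP_DISTRIBUTION`. How `train_gpu` turns that distribution into per-environment levels is not shown in this notebook; one plausible approach is weighted sampling of a map name per environment, sketched below with the hypothetical helper `sample_map_names`.

```python
import random
from collections import Counter


def sample_map_names(map_distribution, num_envs, seed=None):
    """Draw one map name per environment, weighted by the given distribution."""
    rng = random.Random(seed)
    names = list(map_distribution)
    weights = [map_distribution[name] for name in names]
    return rng.choices(names, weights=weights, k=num_envs)


# Same values as the configuration cell, repeated so the sketch is self-contained.
MAP_DISTRIBUTION = {"tutorial": 0.5, "tower": 0.3, "map2": 0.2}
assignments = sample_map_names(MAP_DISTRIBUTION, num_envs=1024, seed=0)
print(Counter(assignments))
```

With 1024 environments the counts should land close to the 50/30/20 split, though sampling noise means they will rarely match it exactly.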

## 6. Setup Weights & Biases (Optional)

In [None]:
if USE_WANDB:
    import wandb
    wandb.login()
    print("✅ Logged in to W&B")
else:
    print("W&B disabled. Set USE_WANDB=True to enable.")

## 7. Train Agents

Start GPU-accelerated training:

In [None]:
# Drop any cached copy of train_gpu so the import below picks up the latest code
import sys

if 'train_gpu' in sys.modules:
    del sys.modules['train_gpu']

# Import fresh
from train_gpu import train_gpu

print("🚀 Starting GPU-Accelerated Training...")
print("=" * 80)

train_gpu(
    num_envs=NUM_ENVS,
    num_episodes=NUM_EPISODES,
    batch_size=BATCH_SIZE,
    learning_rate=LEARNING_RATE,
    gamma=GAMMA,
    epsilon_start=EPSILON_START,
    epsilon_end=EPSILON_END,
    epsilon_decay=EPSILON_DECAY,
    target_update_freq=TARGET_UPDATE_FREQ,
    buffer_capacity=BUFFER_CAPACITY,
    save_dir=SAVE_DIR,
    save_freq=SAVE_FREQ,
    log_freq=LOG_FREQ,
    device=DEVICE,
    use_wandb=USE_WANDB,
    wandb_project=WANDB_PROJECT,
    map_distribution=MAP_DISTRIBUTION
)

print("\n✅ Training complete!")
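
The exploration settings passed to `train_gpu` are worth a quick sanity check. Assuming epsilon is multiplied by `EPSILON_DECAY` once per episode and clamped at `EPSILON_END` (the notebook does not show whether the decay is applied per episode or per update), the sketch below computes when exploration reaches its floor. Both helper names are hypothetical.

```python
import math


def episodes_until_floor(start, end, decay):
    """Number of multiplicative decay steps until epsilon first reaches `end`."""
    return math.ceil(math.log(end / start) / math.log(decay))


def epsilon_at(step, start=1.0, end=0.01, decay=0.995):
    """Epsilon after `step` decay steps, clamped at the floor `end`."""
    return max(end, start * decay ** step)


print(episodes_until_floor(1.0, 0.01, 0.995))  # 919
print(round(epsilon_at(100), 3))
```

Under that assumption, exploration bottoms out after roughly 919 of the 5000 episodes, so most of the run would use near-greedy actions. A slower decay may be worth trying if agents stop exploring too early.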

## 8. List Checkpoints

In [None]:
if os.path.exists(SAVE_DIR):
    checkpoints = sorted([d for d in os.listdir(SAVE_DIR) if d.startswith('checkpoint_')])
    
    print(f"Checkpoints in {SAVE_DIR}:")
    print("=" * 60)
    
    for cp in checkpoints:
        cp_dir = os.path.join(SAVE_DIR, cp)
        if os.path.isdir(cp_dir):
            files = os.listdir(cp_dir)
            fire_exists = 'fire_agent.pth' in files
            water_exists = 'water_agent.pth' in files
            status = "✅" if (fire_exists and water_exists) else "⚠️"
            print(f"{status} {cp}")
    
    final_dir = os.path.join(SAVE_DIR, 'final')
    if os.path.exists(final_dir):
        print(f"\n✅ Final checkpoint saved")
    
    print("\n" + "=" * 60)
    print(f"Total checkpoints: {len(checkpoints)}")
else:
    print(f"No checkpoints found in {SAVE_DIR}")
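
Plain `sorted()` orders names lexicographically, so `checkpoint_1000` would be listed before `checkpoint_200`. If the checkpoint folders end in an episode number (an assumption; the cell above only relies on the `checkpoint_` prefix), a numeric sort key keeps them in training order. `checkpoint_sort_key` is a hypothetical helper.

```python
import re


def checkpoint_sort_key(name):
    """Sort key that orders names by their trailing integer, if any."""
    match = re.search(r"(\d+)$", name)
    # Names without a trailing number sort first, by name.
    return (1, int(match.group(1)), name) if match else (0, 0, name)


names = ["checkpoint_1000", "checkpoint_200", "checkpoint_50"]
print(sorted(names))                           # lexicographic order
print(sorted(names, key=checkpoint_sort_key))  # numeric order
```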

## 9. Download Checkpoints

In [None]:
import shutil
from google.colab import files

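# Zip checkpoints, keeping SAVE_DIR as the top-level folder inside the archive
# so that extracted paths match checkpoints_gpu/final/... used for visualization
shutil.make_archive('checkpoints_gpu', 'zip', root_dir='.', base_dir=SAVE_DIR)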

print("✅ Checkpoints zipped!")
print(f"\nFile: checkpoints_gpu.zip")
print(f"Size: {os.path.getsize('checkpoints_gpu.zip') / 1e6:.1f} MB")
print("\nDownloading...")

# Auto-download
files.download('checkpoints_gpu.zip')
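
Whether the extracted files end up under a `checkpoints_gpu/` folder depends on the `root_dir` and `base_dir` arguments of `shutil.make_archive`. The sketch below builds a throwaway directory tree and lists the archive members for both styles, so the layout can be checked before downloading a large file. `archive_members` is a hypothetical helper and the file it creates is an empty placeholder, not a real checkpoint.

```python
import os
import shutil
import tempfile
import zipfile


def archive_members(keep_top_level):
    """Zip a throwaway checkpoints_gpu/final/ tree and return the .pth member names."""
    with tempfile.TemporaryDirectory() as tmp:
        final_dir = os.path.join(tmp, "checkpoints_gpu", "final")
        os.makedirs(final_dir)
        open(os.path.join(final_dir, "fire_agent.pth"), "w").close()
        out_base = os.path.join(tmp, "out")
        if keep_top_level:
            # Keep checkpoints_gpu/ as the top-level folder inside the zip.
            path = shutil.make_archive(out_base, "zip", root_dir=tmp,
                                       base_dir="checkpoints_gpu")
        else:
            # Zip only the folder's contents; member paths start at final/.
            path = shutil.make_archive(out_base, "zip",
                                       root_dir=os.path.join(tmp, "checkpoints_gpu"))
        with zipfile.ZipFile(path) as zf:
            return [n for n in zf.namelist() if n.endswith(".pth")]


print(archive_members(True))   # ['checkpoints_gpu/final/fire_agent.pth']
print(archive_members(False))  # ['final/fire_agent.pth']
```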

## 10. Visualize Locally (Instructions)

To visualize your trained agents:

1. **Download** `checkpoints_gpu.zip` from Colab (the cell above starts the download)
2. **Extract** it into the root of your local CalHacks repository
3. **Run visualization**:

```bash
# From CalHacks directory
python visualize.py trained checkpoints_gpu/final/fire_agent.pth checkpoints_gpu/final/water_agent.pth --map tutorial

# Try different maps
python visualize.py trained checkpoints_gpu/final/fire_agent.pth checkpoints_gpu/final/water_agent.pth --map tower
python visualize.py trained checkpoints_gpu/final/fire_agent.pth checkpoints_gpu/final/water_agent.pth --map map2
```

The GPU-trained models are fully compatible with the existing visualization system!

## Performance Comparison

Estimated throughput and training time compared with CPU training:

| Method | Envs | Steps/sec | Time for 5000 eps |
|--------|------|-----------|-------------------|
| CPU Sequential | 24 | ~200 | ~100 hours |
| CPU Multiprocessing | 24 | ~1,000 | ~20 hours |
| **GPU (This)** | **1024** | **~20,000** | **~1-2 hours** |

With GPU training on Colab, a run that would take roughly **20 hours** with CPU multiprocessing, or about **100 hours** sequentially, is estimated to finish in **1-2 hours**.
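
The times in the table follow from dividing a fixed step budget by each throughput figure. The sketch below reproduces that arithmetic; the budget of about 72 million steps is an assumption back-derived from the first row (200 steps/sec for 100 hours), not a number reported by the training script.

```python
def hours_for_steps(total_steps, steps_per_sec):
    """Wall-clock hours needed to run `total_steps` at a given throughput."""
    return total_steps / steps_per_sec / 3600


# Assumption: ~72M total steps, back-derived from 200 steps/sec for 100 hours.
TOTAL_STEPS = 200 * 100 * 3600

for label, rate in [("CPU sequential", 200),
                    ("CPU multiprocessing", 1_000),
                    ("GPU", 20_000)]:
    print(f"{label:<20} {hours_for_steps(TOTAL_STEPS, rate):6.1f} h")
```

Plugging in a measured steps/sec figure from your own run gives a quick estimate of how long the remaining training should take.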