# Fire & Water RL Training on Google Colab

This notebook trains reinforcement learning agents for the Fire & Water cooperative game on Google Colab with GPU acceleration.

## Setup Instructions:
1. **Enable GPU**: Runtime → Change runtime type → Hardware accelerator → GPU
2. **Upload existing checkpoints** (optional - if resuming training)
3. **Run all cells in order**
4. **Training saves checkpoints** to the `checkpoints/` directory
5. **Download checkpoints** when complete for local visualization

---

## 1. Clone Repository and Install Dependencies

In [None]:
# Clone the repository
!git clone https://github.com/Ben-jpg-del/CalHacks.git
%cd CalHacks

In [None]:
# Install required packages
!pip install torch numpy pygame wandb -q

print("✅ Dependencies installed!")

## 2. Verify GPU Availability

In [None]:
import torch

# Check GPU
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("GPU Device:", torch.cuda.get_device_name(0))
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ WARNING: No GPU detected. Training will be slower on CPU.")
    print("Enable GPU: Runtime → Change runtime type → Hardware accelerator → GPU")

## 3. Upload Existing Checkpoints (OPTIONAL - For Resuming Training)

**Skip this section if you are starting fresh training.**

If you have existing checkpoints from a previous training session, upload them here to continue training.

### Option A: Upload checkpoints.zip

In [None]:
from google.colab import files
import os

# Upload checkpoints.zip file
print("📤 Upload your checkpoints.zip file:")
print("(If starting fresh training, skip this cell)\n")

uploaded = files.upload()

if 'checkpoints.zip' in uploaded:
    # Extract checkpoints
    !unzip -q checkpoints.zip
    print("✅ Checkpoints extracted!")
    
    # List uploaded checkpoints
    if os.path.exists('checkpoints'):
        checkpoints = sorted([f for f in os.listdir('checkpoints') if f.endswith('.pth')])
        print(f"\nFound {len(checkpoints)} checkpoint files")
        
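        # Find episodes (expects names like fire_staged_dqn_ep1500.pth)
        episodes = set()
        for cp in checkpoints:
            if 'ep' in cp:
                ep_num = cp.split('ep')[1].split('.')[0]
                if ep_num.isdigit():  # Skip names where the text after 'ep' is not a number
                    episodes.add(int(ep_num))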
        
        if episodes:
            latest_ep = max(episodes)
            print(f"\n📊 Available episodes: {sorted(episodes)}")
            print(f"💡 Latest checkpoint: Episode {latest_ep}")
            print(f"\n⚙️ Set RESUME_EPISODE = {latest_ep} in the configuration cell below")
else:
    print("No checkpoints uploaded. Starting fresh training.")
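
The `split('ep')` parsing in the cell above works for names such as `fire_staged_dqn_ep1500.pth`, but it can fail on an unrelated file whose name happens to contain `ep` (for example, `deep_model_ep5.pth` would raise a `ValueError` at `int(...)`). The following is a minimal sketch of a stricter parser. It assumes checkpoints follow the `<agent>_staged_dqn_ep<N>.pth` naming seen in this notebook; the helper name `parse_checkpoint_episode` is hypothetical and not part of the repository.

```python
import re

# Pattern assumed from the file names used in this notebook,
# e.g. fire_staged_dqn_ep1500.pth or water_staged_dqn_ep1500.pth.
_CHECKPOINT_RE = re.compile(r"^(fire|water)_staged_dqn_ep(\d+)\.pth$")


def parse_checkpoint_episode(filename):
    """Return (agent, episode) for a staged checkpoint name, else None.

    Hypothetical helper: not part of the CalHacks repository.
    """
    match = _CHECKPOINT_RE.match(filename)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_checkpoint_episode("fire_staged_dqn_ep1500.pth"))  # ('fire', 1500)
print(parse_checkpoint_episode("fire_final.pth"))              # None
```

Anchoring the pattern at both ends means files such as `fire_final.pth` are ignored rather than misread.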

### Option B: Upload Individual Checkpoint Files

In [None]:
# Alternative: Upload specific checkpoint files
from google.colab import files  # Imported here so this cell also works if Option A was skipped

print("📤 Upload checkpoint files (fire and water .pth files):")
print("Select BOTH fire_staged_dqn_epXXX.pth AND water_staged_dqn_epXXX.pth\n")

uploaded = files.upload()

if uploaded:
    # Create checkpoints directory
    !mkdir -p checkpoints
    
    # Move uploaded files to checkpoints directory
    for filename in uploaded.keys():
        !mv "{filename}" checkpoints/
        print(f"✅ Moved {filename} to checkpoints/")
    
    # Find episode number
    for filename in uploaded.keys():
        if 'ep' in filename:
            ep_num = filename.split('ep')[1].split('.')[0]
            print(f"\n💡 Detected episode {ep_num}")
            print(f"⚙️ Set RESUME_EPISODE = {ep_num} in the configuration cell below")
            break

## 4. Test Environment Loading

In [None]:
from game_environment import FireWaterEnv
from map_config import LevelLibrary
from map_1 import LevelLibrary as Map1Library

# Test tutorial map
tutorial_level = LevelLibrary.get_tutorial_level()
print(f"Tutorial Map: {tutorial_level.name}")
print(f"  Dimensions: {tutorial_level.width}x{tutorial_level.height}")
print(f"  Platforms: {len(tutorial_level.base_solids)}")

# Test custom tower map
tower_level = Map1Library.get_tower_level()
print(f"\nCustom Map: {tower_level.name}")
print(f"  Dimensions: {tower_level.width}x{tower_level.height}")
print(f"  Platforms: {len(tower_level.base_solids)}")
print(f"  Hazards: {list(tower_level.get_hazards().keys())}")

# Create environment
env = FireWaterEnv(level=tower_level)
fire_obs, water_obs = env.reset()

print(f"\n✅ Environment loaded successfully!")
print(f"Observation shape: {fire_obs.shape}")

## 5. Configure Training Parameters

**IMPORTANT**: If you uploaded checkpoints, set `RESUME_EPISODE` to the episode number you want to continue from.

In [None]:
# ============================================
# TRAINING CONFIGURATION
# ============================================

# Map selection
MAP_NAME = "tower"  # Options: "tutorial" or "tower"

# Training parameters
NUM_EPISODES = 2000      # Episode number to train up to (the end episode, not a count added to RESUME_EPISODE)
MAX_STEPS = 3000         # Max steps per episode
LEARNING_RATE = 3e-4     # Learning rate for optimizer
BATCH_SIZE = 64          # Batch size for training

# Logging
LOG_FREQUENCY = 10       # Log every N episodes
SAVE_FREQUENCY = 100     # Save checkpoint every N episodes

# Weights & Biases (optional)
USE_WANDB = False        # Set to True to enable W&B logging
WANDB_PROJECT = "firewater-colab"  # W&B project name

# ======================================
# RESUME FROM CHECKPOINT
# ======================================
# Set this to continue training from existing checkpoints
# Example: If you uploaded episode 1500 checkpoints, set RESUME_EPISODE = 1500

RESUME_EPISODE = None    # Set to episode number (e.g., 1500) to resume

# ======================================

print("Training Configuration:")
print(f"  Map: {MAP_NAME}")
print(f"  Total Episodes: {NUM_EPISODES}")
if RESUME_EPISODE is not None:
    print(f"  ⚠️  RESUMING from Episode {RESUME_EPISODE}")
    print(f"  Will train episodes {RESUME_EPISODE} → {NUM_EPISODES}")
else:
    print(f"  Starting fresh from Episode 0")
print(f"  Max Steps: {MAX_STEPS}")
print(f"  Learning Rate: {LEARNING_RATE}")
print(f"  Batch Size: {BATCH_SIZE}")
print(f"  W&B Logging: {USE_WANDB}")
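
Because `NUM_EPISODES` is treated as the end episode when resuming (see Section 13), a `RESUME_EPISODE` that is equal to or larger than `NUM_EPISODES` would leave nothing to train. The sketch below shows one way to catch that before starting a long run. The function name `validate_config` and the specific checks are illustrative assumptions, not part of the repository.

```python
def validate_config(num_episodes, resume_episode, save_frequency):
    """Return a list of problems found; an empty list means the values look sane.

    Hypothetical helper. Assumes NUM_EPISODES is the end episode, as in Section 13.
    """
    problems = []
    if num_episodes <= 0:
        problems.append("NUM_EPISODES must be positive")
    if save_frequency <= 0:
        problems.append("SAVE_FREQUENCY must be positive")
    if resume_episode is not None:
        if resume_episode < 0:
            problems.append("RESUME_EPISODE must not be negative")
        elif resume_episode >= num_episodes:
            problems.append("RESUME_EPISODE must be smaller than NUM_EPISODES")
    return problems


# Example with the defaults from the configuration cell
for problem in validate_config(num_episodes=2000, resume_episode=None, save_frequency=100):
    print("Config problem:", problem)
```

In the notebook this could be called as `validate_config(NUM_EPISODES, RESUME_EPISODE, SAVE_FREQUENCY)` right after the configuration cell.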

## 6. Verify Checkpoints (If Resuming)

In [None]:
import os

if RESUME_EPISODE is not None:
    fire_checkpoint = f'checkpoints/fire_staged_dqn_ep{RESUME_EPISODE}.pth'
    water_checkpoint = f'checkpoints/water_staged_dqn_ep{RESUME_EPISODE}.pth'
    
    fire_exists = os.path.exists(fire_checkpoint)
    water_exists = os.path.exists(water_checkpoint)
    
    print(f"Checking for episode {RESUME_EPISODE} checkpoints...")
    print(f"  Fire checkpoint: {'✅ Found' if fire_exists else '❌ NOT FOUND'}")
    print(f"  Water checkpoint: {'✅ Found' if water_exists else '❌ NOT FOUND'}")
    
    if fire_exists and water_exists:
        print(f"\n✅ Ready to resume from episode {RESUME_EPISODE}!")
    else:
        print(f"\n❌ ERROR: Checkpoint files not found!")
        print(f"Upload the checkpoint files or set RESUME_EPISODE = None")
        
        # Show available checkpoints
        if os.path.exists('checkpoints'):
            available = sorted([f for f in os.listdir('checkpoints') if 'ep' in f and f.endswith('.pth')])
            if available:
                print(f"\nAvailable checkpoints:")
                for cp in available[:10]:  # Show first 10
                    print(f"  - {cp}")
else:
    print("Starting fresh training (no resume checkpoint specified)")

## 7. Setup Weights & Biases (Optional)

To track your training with W&B, set `USE_WANDB = True` in Section 5, then run this cell and log in:

In [None]:
if USE_WANDB:
    import wandb
    wandb.login()
    print("✅ Logged in to Weights & Biases")
else:
    print("W&B logging disabled. Set USE_WANDB=True to enable.")

## 8. Train Agents

This cell trains both the fire and water agents using DQN with staged milestone rewards.

**Note**:
- If `RESUME_EPISODE` is set, training continues from that checkpoint
- Training can take several hours, depending on `NUM_EPISODES` and whether a GPU is available
- Monitor the output for progress

In [None]:
# Import training function
from train_stage_milestone_dqn import train_dqn_with_staged_rewards

# Start training
print("Starting training...")
print("=" * 60)

if RESUME_EPISODE is not None:
    print(f"📂 Loading checkpoints from episode {RESUME_EPISODE}")
    print(f"🔄 Will continue training to episode {NUM_EPISODES}")
else:
    print(f"🆕 Starting fresh training")

print("=" * 60 + "\n")

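# Only the four arguments below are passed explicitly. NUM_EPISODES, MAX_STEPS,
# LEARNING_RATE, BATCH_SIZE, LOG_FREQUENCY and SAVE_FREQUENCY are not forwarded here,
# so check train_stage_milestone_dqn.py to see how the training function obtains them.
train_dqn_with_staged_rewards(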
    use_wandb=USE_WANDB,
    wandb_project=WANDB_PROJECT,
    resume_episode=RESUME_EPISODE,
    map_name=MAP_NAME
)

print("\n✅ Training complete!")

## 9. List Available Checkpoints

In [None]:
import os

# List all checkpoint files
if os.path.exists('checkpoints'):
    checkpoints = sorted([f for f in os.listdir('checkpoints') if f.endswith('.pth')])
    
    print(f"Found {len(checkpoints)} checkpoint files:\n")
    
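    # Group by episode (expects names like fire_staged_dqn_ep1500.pth)
    episodes = set()
    for cp in checkpoints:
        if 'ep' in cp:
            ep_num = cp.split('ep')[1].split('.')[0]
            if ep_num.isdigit():  # Skip names where the text after 'ep' is not a number
                episodes.add(int(ep_num))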
    
    for ep in sorted(episodes):
        fire_exists = os.path.exists(f'checkpoints/fire_staged_dqn_ep{ep}.pth')
        water_exists = os.path.exists(f'checkpoints/water_staged_dqn_ep{ep}.pth')
        status = "✅" if (fire_exists and water_exists) else "⚠️"
        print(f"{status} Episode {ep}: Fire={fire_exists}, Water={water_exists}")
    
    # Show final checkpoints
    if os.path.exists('checkpoints/fire_final.pth') and os.path.exists('checkpoints/water_final.pth'):
        print("\n✅ Final checkpoint: fire_final.pth, water_final.pth")
else:
    print("No checkpoints directory found. Train agents first.")
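
Resuming needs both the fire and the water checkpoint for the same episode, so the useful value for `RESUME_EPISODE` is the latest episode for which both files exist. The sketch below picks that episode from a list of file names. It assumes the `fire_staged_dqn_ep<N>.pth` / `water_staged_dqn_ep<N>.pth` naming used in this notebook; `latest_complete_episode` is a hypothetical helper name.

```python
import re


def latest_complete_episode(filenames):
    """Return the highest episode that has both fire and water checkpoints, else None.

    Hypothetical helper; the naming pattern is assumed from this notebook.
    """
    episodes = {"fire": set(), "water": set()}
    for name in filenames:
        match = re.fullmatch(r"(fire|water)_staged_dqn_ep(\d+)\.pth", name)
        if match:
            episodes[match.group(1)].add(int(match.group(2)))
    complete = episodes["fire"] & episodes["water"]  # Episodes present for both agents
    return max(complete) if complete else None


example = [
    "fire_staged_dqn_ep100.pth",
    "water_staged_dqn_ep100.pth",
    "fire_staged_dqn_ep200.pth",  # Water file for episode 200 is missing
    "fire_final.pth",
]
print(latest_complete_episode(example))  # 100
```

In the notebook, the input could be `os.listdir('checkpoints')` once the directory exists.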

## 10. Download Checkpoints

Download trained models to use locally or to resume training in another session:

In [None]:
# Zip all checkpoints for easy download
!zip -r checkpoints.zip checkpoints/

print("✅ Checkpoints zipped!")
print("\nDownload 'checkpoints.zip' from the Files panel on the left.")
print("\n💡 TIP: Save this file to continue training in a future Colab session!")

# Alternative: Download specific checkpoint
from google.colab import files

# Uncomment to download final checkpoints directly:
# files.download('checkpoints/fire_final.pth')
# files.download('checkpoints/water_final.pth')

# Or download the complete zip:
# files.download('checkpoints.zip')
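
If the `zip` shell command is unavailable, for instance when running the same notebook outside Colab, the archive can also be built from Python. This is a minimal sketch using the standard library's `shutil.make_archive`; the function name `zip_checkpoints` and its default arguments are assumptions for illustration. The demo at the bottom uses a temporary directory so it does not touch real checkpoints.

```python
import os
import shutil
import tempfile
import zipfile


def zip_checkpoints(checkpoint_dir="checkpoints", archive_base="checkpoints"):
    """Create <archive_base>.zip containing checkpoint_dir and return the archive path.

    Hypothetical helper. Entries are stored under the folder name (e.g. checkpoints/...),
    so extracting the archive recreates the directory, like `zip -r` does.
    """
    abs_dir = os.path.abspath(checkpoint_dir)
    parent = os.path.dirname(abs_dir)
    folder = os.path.basename(abs_dir)
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=folder)


# Demo on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, "checkpoints")
    os.makedirs(demo_dir)
    with open(os.path.join(demo_dir, "fire_final.pth"), "wb") as handle:
        handle.write(b"placeholder")  # Stand-in for a real checkpoint file

    archive_path = zip_checkpoints(demo_dir, os.path.join(tmp, "checkpoints"))
    with zipfile.ZipFile(archive_path) as archive:
        demo_names = archive.namelist()

print(demo_names)
```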

## 11. Evaluate Trained Agents

Measure your agents' success rate, reward, and episode length. The evaluation runs without rendering, since Colab cannot display the game window:

In [None]:
import numpy as np
import torch

from example_dqn import DQNAgent
from game_environment import FireWaterEnv
from map_config import LevelLibrary
from map_1 import LevelLibrary as Map1Library

# Specify checkpoint to evaluate
EVAL_EPISODE = 1500  # Change this to the episode you want to evaluate

# Load level
if MAP_NAME == "tower":
    level = Map1Library.get_tower_level()
else:
    level = LevelLibrary.get_tutorial_level()

# Create environment
env = FireWaterEnv(level=level)

# Create and load agents
# state_dim and action_dim must match the values the checkpoints were trained with
device = 'cuda' if torch.cuda.is_available() else 'cpu'
fire_agent = DQNAgent(state_dim=52, action_dim=6, device=device)
water_agent = DQNAgent(state_dim=52, action_dim=6, device=device)

fire_agent.load(f'checkpoints/fire_staged_dqn_ep{EVAL_EPISODE}.pth')
water_agent.load(f'checkpoints/water_staged_dqn_ep{EVAL_EPISODE}.pth')

# Set to evaluation mode (no exploration)
fire_agent.epsilon = 0.0
water_agent.epsilon = 0.0

# Run evaluation episodes
num_eval_episodes = 100
successes = 0
total_rewards = []
episode_lengths = []

print(f"Evaluating episode {EVAL_EPISODE} agents over {num_eval_episodes} episodes...\n")

for ep in range(num_eval_episodes):
    fire_obs, water_obs = env.reset()
    done = False
    episode_reward = 0
    steps = 0
    
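    # Stop at MAX_STEPS (from Section 5) as a safeguard in case the environment never signals done
    while not done and steps < MAX_STEPS: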
        # Get actions (greedy)
        fire_action = fire_agent.select_action(fire_obs, training=False)
        water_action = water_agent.select_action(water_obs, training=False)
        
        # Step
        (fire_obs, water_obs), (fire_reward, water_reward), (fire_done, water_done), info = env.step(
            fire_action, water_action
        )
        
        episode_reward += fire_reward + water_reward
        steps += 1
        done = fire_done or water_done
    
    if info.get('both_won', False):
        successes += 1
    
    total_rewards.append(episode_reward)
    episode_lengths.append(steps)
    
    if (ep + 1) % 10 == 0:
        print(f"Episode {ep+1}/{num_eval_episodes}: Success rate = {successes/(ep+1)*100:.1f}%")

# Print results
print("\n" + "=" * 60)
print("EVALUATION RESULTS")
print("=" * 60)
print(f"Checkpoint: Episode {EVAL_EPISODE}")
print(f"Map: {level.name}")
print(f"Episodes: {num_eval_episodes}")
print(f"Success Rate: {successes/num_eval_episodes*100:.1f}% ({successes}/{num_eval_episodes})")
print(f"Average Reward: {np.mean(total_rewards):.2f} ± {np.std(total_rewards):.2f}")
print(f"Average Length: {np.mean(episode_lengths):.1f} ± {np.std(episode_lengths):.1f} steps")
print("=" * 60)
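
A success rate measured over 100 episodes is an estimate, so it helps to report how uncertain it is. The sketch below computes the rate together with a binomial standard error using the normal approximation. It assumes episodes are independent and actually vary; with a deterministic environment and greedy agents, repeated episodes may be identical, in which case the figure says little. The approximation is also unreliable when the rate is close to 0% or 100%. The function name `success_rate_with_error` is hypothetical.

```python
import math


def success_rate_with_error(successes, episodes):
    """Return (rate, standard_error) for a success count, using sqrt(p * (1 - p) / n).

    Hypothetical helper; normal approximation to the binomial distribution.
    """
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    rate = successes / episodes
    std_err = math.sqrt(rate * (1.0 - rate) / episodes)
    return rate, std_err


rate, std_err = success_rate_with_error(successes=50, episodes=100)
print(f"Success rate: {rate * 100:.1f}% ± {std_err * 100:.1f}%")
```

In the evaluation cell this could be called as `success_rate_with_error(successes, num_eval_episodes)`.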

## 12. Visualize Locally (Instructions)

To visualize your trained agents:

1. **Download checkpoints** from Colab (see Section 10)
2. **Extract** `checkpoints.zip` to your local repository
3. **Run visualization** locally:

```bash
# On your local machine:
python visualize.py trained checkpoints/fire_staged_dqn_ep1500.pth checkpoints/water_staged_dqn_ep1500.pth --map tower
```

**Note**: Colab has no display for the Pygame window, so visualization must be done locally.

## 13. Resume Training Workflow

If you need to continue training across multiple Colab sessions:

### Session 1:
1. Train for X episodes
2. Download `checkpoints.zip`

### Session 2:
1. Upload `checkpoints.zip` (Section 3)
2. Set `RESUME_EPISODE` to the last saved episode (Section 5)
3. Set `NUM_EPISODES` to the desired end episode
4. Run training (Section 8)
5. Download updated `checkpoints.zip`

### Example:
```python
# Session 1
NUM_EPISODES = 2000
RESUME_EPISODE = None
# Training: Episode 0 → 2000

# Session 2 (after uploading checkpoints)
NUM_EPISODES = 4000
RESUME_EPISODE = 2000
# Training: Episode 2000 → 4000

# Session 3 (after uploading checkpoints)
NUM_EPISODES = 6000
RESUME_EPISODE = 4000
# Training: Episode 4000 → 6000
```
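
The pattern in the example above can be generated for any number of sessions. This is a small sketch that lists the `(RESUME_EPISODE, NUM_EPISODES)` pair for each session, assuming every session trains the same number of episodes; `plan_sessions` is a hypothetical helper name.

```python
def plan_sessions(episodes_per_session, num_sessions):
    """Return one (resume_episode, num_episodes) pair per Colab session.

    Hypothetical helper. The first session starts fresh, so its resume value is None.
    """
    plan = []
    for index in range(num_sessions):
        resume = None if index == 0 else index * episodes_per_session
        end = (index + 1) * episodes_per_session
        plan.append((resume, end))
    return plan


for session, (resume, end) in enumerate(plan_sessions(2000, 3), start=1):
    print(f"Session {session}: RESUME_EPISODE = {resume}, NUM_EPISODES = {end}")
```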