# Chess RL Agent - Training

**Strategy:** Split into safe 5-6 hour sessions (no browser babysitting needed)

**Workflow:**
1. Run cells 1-6 → Train 5 iterations (~5-6 hours)
2. Checkpoints auto-backup to Drive
3. Close browser, take a break
4. Re-run cells 1-5, then cell 8 → Resume for remaining iterations

**Total time:** 2 sessions × 5-6 hours = 10-12 hours for proof of concept
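The time budget above is simple arithmetic and can be sketched as a small helper. `plan_sessions` and its parameters are hypothetical names introduced only for illustration. The 1.0-1.2 hours per iteration range is derived from the 5-6 hours per 5 iterations quoted above, not measured.

```python
import math

def plan_sessions(total_iterations, iterations_per_session,
                  hours_per_iter_low=1.0, hours_per_iter_high=1.2):
    """Return (sessions, low_hours, high_hours) for a split training run.

    The per-iteration hour range is an assumption taken from the
    5-6 hours per 5 iterations estimate, not a measured value.
    """
    sessions = math.ceil(total_iterations / iterations_per_session)
    low = total_iterations * hours_per_iter_low
    high = total_iterations * hours_per_iter_high
    return sessions, low, high

sessions, low, high = plan_sessions(total_iterations=10, iterations_per_session=5)
print(f"{sessions} sessions, {low:.0f}-{high:.0f} hours total")
```

Raising the iteration target or shortening the sessions changes the session count through the ceiling division, so a target of 12 iterations at 5 per session needs a third, shorter session.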

## 1. Verify GPU

In [None]:
import torch

print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
print(f"CUDA: {torch.cuda.is_available()}")

if not torch.cuda.is_available():
    print("\n⚠ Go to Runtime → Change runtime type → Select GPU (L4 recommended)")
else:
    print("\n✓ GPU ready")

## 2. Mount Google Drive

In [None]:
from google.colab import drive

drive.mount('/content/drive')
!mkdir -p /content/drive/MyDrive/chess_checkpoints

print("✓ Drive mounted")

## 3. Clone Repository

In [None]:
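# Start from /content so re-running this cell does not nest clones
%cd /content

# Remove existing repo (this also deletes any local checkpoints inside it)
!rm -rf rl_chess_agent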

# Clone latest version
!git clone https://github.com/Capacap/rl_chess_agent.git
%cd rl_chess_agent

# Verify latest commit
!git log --oneline -1
print("\n✓ Repository ready")

## 4. Install Dependencies

In [None]:
# Install chess library (Colab has torch, numpy, etc.)
!pip install -q -r requirements-colab.txt

# Verify imports (torch is re-imported so this cell also runs on its own)
import chess
import torch
from model.network import ChessNet

print("✓ Dependencies installed")
print(f"  chess: {chess.__version__}")
print(f"  torch: {torch.__version__}")

## 5. Configure Training

In [None]:
import datetime

# Training Strategy: Split into safe 5-6 hour sessions
# Session 1: Run 5 iterations, then resume for remaining iterations

# === Session Configuration ===
# Modify these values for each session:
SESSION_ITERATIONS = 5    # Iterations for THIS session (5-6 hours each)
TOTAL_TARGET = 10         # Total iterations you want overall

# Training parameters
GAMES_PER_ITER = 50       # Games per iteration
SIMULATIONS = 20          # MCTS simulations per move
ARENA_GAMES = 20          # Arena evaluation games

# Advanced (rarely need to change)
BATCH_SIZE = 256
EPOCHS = 5
LEARNING_RATE = 1e-3

# Auto-backup to Drive
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
CHECKPOINT_DIR = f"checkpoints/{timestamp}"
GDRIVE_BACKUP = "/content/drive/MyDrive/chess_checkpoints"

print("Split-Run Training Configuration:")
print(f"  This session: {SESSION_ITERATIONS} iterations (~{SESSION_ITERATIONS * 1.0:.0f}-{SESSION_ITERATIONS * 1.2:.0f} hours)")
print(f"  Target total: {TOTAL_TARGET} iterations")
print(f"  {GAMES_PER_ITER} games/iter, {SIMULATIONS} MCTS sims")
print(f"\nCheckpoints: {CHECKPOINT_DIR}")
print(f"Drive backup: {GDRIVE_BACKUP}")
print(f"\n💡 After this session, use Cell 8 to resume for remaining iterations")

## 6. Launch Training

In [None]:
# Start training session (5-6 hours for 5 iterations)
!python train.py \
  --iterations {SESSION_ITERATIONS} \
  --games-per-iter {GAMES_PER_ITER} \
  --simulations {SIMULATIONS} \
  --arena-games {ARENA_GAMES} \
  --batch-size {BATCH_SIZE} \
  --epochs {EPOCHS} \
  --lr {LEARNING_RATE} \
  --checkpoint-dir {CHECKPOINT_DIR} \
  --gdrive-backup-dir {GDRIVE_BACKUP}

print(f"\n✓ Session complete!")
print(f"Checkpoints auto-backed up to: {GDRIVE_BACKUP}/{timestamp}/")
print(f"\nNext: Run Cell 8 to continue training for remaining iterations")

## 7. Monitor Progress

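Run this cell to check training status. Cell 6 blocks the notebook while `train.py` runs, so this cell executes once cell 6 finishes or is interrupted.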

In [None]:
import os
import re
import glob

# Count completed iterations (challenger checkpoints are excluded)
checkpoints = glob.glob(f"{CHECKPOINT_DIR}/iteration_*.pt")
pkl_checkpoints = glob.glob(f"{CHECKPOINT_DIR}/iteration_*.pkl")

completed = len([c for c in checkpoints if "_challenger" not in c])
print(f"Progress: {completed}/{TOTAL_TARGET} iterations")
print(f"Tournament files (.pkl): {len(pkl_checkpoints)}")

# Show latest checkpoint. Compare by iteration number: a plain string sort
# would rank iteration_10.pt before iteration_2.pt.
if checkpoints:
    latest = max(
        checkpoints,
        key=lambda c: int(re.search(r"iteration_(\d+)", os.path.basename(c)).group(1)),
    )
    print(f"\nLatest: {os.path.basename(latest)}")

# View recent log entries
print(f"\n--- Recent log ---")
!tail -15 {CHECKPOINT_DIR}/training.log 2>/dev/null || echo "Log not yet created"

# GPU status
print(f"\n--- GPU utilization ---")
!nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv,noheader

## 8. Resume Training (if interrupted)

In [None]:
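# Resume training for remaining iterations
import os
import re
import glob

def iteration_number(path):
    """Return N for '.../iteration_N.pt'; None for other files (e.g. challengers)."""
    match = re.fullmatch(r"iteration_(\d+)\.pt", os.path.basename(path))
    return int(match.group(1)) if match else None

def find_checkpoints(pattern):
    """Return paths matching pattern that are regular iteration checkpoints."""
    return [c for c in glob.glob(pattern) if iteration_number(c) is not None]

# Find latest checkpoint. Re-running cell 5 creates a new, empty CHECKPOINT_DIR,
# so fall back to the Drive backups.
# Assumption: backups are stored as <GDRIVE_BACKUP>/<timestamp>/iteration_N.pt.
checkpoints = find_checkpoints(f"{CHECKPOINT_DIR}/iteration_*.pt")
if not checkpoints:
    checkpoints = find_checkpoints(f"{GDRIVE_BACKUP}/*/iteration_*.pt")

if checkpoints:
    # Prefer the newest run folder, then the highest iteration number
    # (a plain string sort would rank iteration_10.pt before iteration_2.pt).
    latest = max(checkpoints, key=lambda c: (os.path.dirname(c), iteration_number(c)))
    completed = iteration_number(latest)
    remaining = TOTAL_TARGET - completed

    print("Resume Training:")
    print(f"  Completed: {completed}/{TOTAL_TARGET} iterations")
    print(f"  Remaining: {remaining} iterations")
    print(f"  Estimated time: {remaining * 1.0:.0f}-{remaining * 1.2:.0f} hours")
    print(f"  Resuming from: {latest}\n")

    if remaining > 0:
        # Resume training with the same hyperparameters as cell 6
        !python train.py \
          --resume {latest} \
          --iterations {remaining} \
          --games-per-iter {GAMES_PER_ITER} \
          --simulations {SIMULATIONS} \
          --arena-games {ARENA_GAMES} \
          --batch-size {BATCH_SIZE} \
          --epochs {EPOCHS} \
          --lr {LEARNING_RATE} \
          --checkpoint-dir {CHECKPOINT_DIR} \
          --gdrive-backup-dir {GDRIVE_BACKUP}

        print("\n✓ Training complete!")
        print(f"Final model: {CHECKPOINT_DIR}/iteration_{TOTAL_TARGET}.pkl")
    else:
        print("✓ Target iterations already reached!")
else:
    print("❌ No checkpoint found. Run Cell 6 first to start training.")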

## Troubleshooting

**Out of memory:** Reduce `BATCH_SIZE` to 128 or `GAMES_PER_ITER` to 25

**Too slow:** Reduce `SIMULATIONS` to 15 or `ARENA_GAMES` to 10

**Download checkpoints:** Already in Google Drive at `/MyDrive/chess_checkpoints/{timestamp}/`
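
To locate a backed-up run without browsing Drive by hand, the timestamped folders can be listed and compared. This is a minimal sketch that assumes backups follow the `chess_checkpoints/{timestamp}/` layout described above, with folder names in the `%Y%m%d_%H%M%S` format from cell 5. `latest_run_dir` is a hypothetical helper, not part of the repository.

```python
import os
import re

def latest_run_dir(backup_root):
    """Return the newest timestamped run folder under backup_root, or None."""
    if not os.path.isdir(backup_root):
        return None
    # Names such as 20240131_235959 sort chronologically as plain strings.
    runs = [
        d for d in os.listdir(backup_root)
        if re.fullmatch(r"\d{8}_\d{6}", d)
        and os.path.isdir(os.path.join(backup_root, d))
    ]
    return os.path.join(backup_root, max(runs)) if runs else None

run = latest_run_dir("/content/drive/MyDrive/chess_checkpoints")
print(f"Latest backed-up run: {run}")
```

The helper returns `None` when Drive is not mounted or no run folder exists, so the caller can decide whether to start a fresh run instead.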