# Blazing Eights - Colab GPU Training

Clone repo → Train PPO agent (batched collection on CPU, PPO on GPU) → Download model & logs

**Game**: UNO variant with custom special cards (8=Wild, K=All draw, J=Skip, Swap=Swap hands).

## 1. Setup: Clone repo & install deps

In [None]:
# ====== CONFIG ======
GITHUB_USERNAME = "YurenHao0426"
REPO_NAME = "blazing8"
# ====================

!git clone https://github.com/{GITHUB_USERNAME}/{REPO_NAME}.git
%cd {REPO_NAME}
!pip install -q torch numpy tqdm

In [None]:
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("WARNING: No GPU detected. Go to Runtime → Change runtime type → GPU")

## 2. Train

Batched collection: runs many games in parallel with a single forward pass per step.
- `--collect_batch`: number of parallel games (higher = faster, more VRAM). Default = 64.
- Game simulation on CPU, PPO gradient updates on GPU (auto-detected).
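
A minimal sketch of the batched-collection idea described above, not the code in `train.py`. `ToyGame`, `constant_policy`, and `collect` are hypothetical names used only for illustration; the point is that every step gathers observations from all running games and makes one policy call for the whole batch.

```python
class ToyGame:
    """Hypothetical stand-in for one game: it ends after `length` steps."""

    def __init__(self, length):
        self.remaining = length

    def observe(self):
        # A real environment would return an encoded game state.
        return [float(self.remaining)]

    def step(self, action):
        # Advance one step and report whether the game has finished.
        self.remaining -= 1
        return self.remaining <= 0


def constant_policy(observations):
    """Stand-in for one network forward pass over the whole batch."""
    return [0] * len(observations)


def collect(games, policy):
    """Step all games in lockstep, calling the policy once per step."""
    active = list(games)
    transitions = 0
    forward_passes = 0
    while active:
        observations = [game.observe() for game in active]
        actions = policy(observations)  # one call covers every active game
        forward_passes += 1
        still_running = []
        for game, action in zip(active, actions):
            done = game.step(action)
            transitions += 1
            if not done:
                still_running.append(game)
        active = still_running
    return transitions, forward_passes


print(collect([ToyGame(3), ToyGame(5), ToyGame(2)], constant_policy))  # (10, 5)
```

In this toy run, 10 transitions are collected with 5 policy calls; calling the policy once per game per step would take 10 calls.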

In [None]:
# 2-player training (GPU PPO + batched collection)
!python train.py \
    --num_players 2 \
    --episodes 200000 \
    --collect_batch 128 \
    --save_path blazing_ppo_2p

In [None]:
# (Optional) 3-player training
# !python train.py --num_players 3 --episodes 300000 --collect_batch 128 --save_path blazing_ppo_3p

# (Optional) Larger batch for higher throughput
# !python train.py --num_players 2 --episodes 200000 --collect_batch 256 --save_path blazing_ppo_2p

# (Optional) Skip greedy warmup
# !python train.py --num_players 2 --episodes 200000 --greedy_warmup 0 --save_path blazing_ppo_2p_no_warmup

In [None]:
# Show training log
import pandas as pd
df = pd.read_csv("blazing_ppo_2p_log.csv")
print(df.to_string(index=False))

## 3. Download model & logs

In [ ]:
from google.colab import files
import glob

# Download final model(s)
for f in glob.glob("*_final.pt"):
    print(f"Downloading {f}...")
    files.download(f)

# Download training log(s)
for f in glob.glob("*_log.csv"):
    print(f"Downloading {f}...")
    files.download(f)

## 4. Push model to GitHub (alternative to downloading)

Push the trained `.pt` files to a `models/` directory in the repo.

You'll need a **GitHub Personal Access Token** (PAT).
Create one at: https://github.com/settings/tokens → Generate new token (classic) → check `repo` scope.

In [None]:
from getpass import getpass
import os

TOKEN = getpass("Enter your GitHub PAT: ")

# Configure git
!git config user.email "colab@training.ai"
!git config user.name "Colab Training"

# Create models dir, move .pt files there
os.makedirs("models", exist_ok=True)
!mv *_final.pt models/
!ls -lh models/

# The repo's .gitignore is assumed to ignore .pt files, so append
# negation rules that let models/*.pt be committed.
with open(".gitignore", "a") as f:
    f.write("\n# Allow models dir\n!models/\n!models/*.pt\n")

!git add models/ .gitignore
!git commit -m "Add trained models from Colab GPU"
!git push https://{TOKEN}@github.com/{GITHUB_USERNAME}/{REPO_NAME}.git main

## 5. Quick evaluation
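
The evaluation cell below compares the win rate against a `1/n` baseline over 2000 games. To judge whether a measured gap is larger than sampling noise, a rough interval can be computed from the win count. This is a sketch using the normal approximation to the binomial; `win_rate_interval` is a hypothetical helper, not part of `train.py`.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    """Return (win rate, lower bound, upper bound) for an approximate 95% interval.

    Uses the normal approximation to the binomial, which is reasonable
    for large `games` and win rates not too close to 0 or 1.
    """
    p = wins / games
    standard_error = math.sqrt(p * (1 - p) / games)
    low = max(0.0, p - z * standard_error)
    high = min(1.0, p + z * standard_error)
    return p, low, high


# Example: 1100 wins out of 2000 two-player games.
p, low, high = win_rate_interval(1100, 2000)
print(f"win rate = {p:.1%}, interval = [{low:.1%}, {high:.1%}]")
```

In this example the lower bound stays above the two-player baseline of 50%, so the gap is unlikely to be noise alone.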

In [ ]:
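import glob
import sys

import torch  # imported here so this cell also runs on its own

sys.path.insert(0, ".")
from train import PolicyValueNet, evaluate_vs_greedy_batch

device = "cpu"
model = PolicyValueNet().to(device)

# Look in the working directory and in models/ (where the push cell moves the files).
# Sorting makes the choice of final_models[0] deterministic.
final_models = sorted(glob.glob("*_final.pt") + glob.glob("models/*_final.pt"))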
if final_models:
    ckpt = torch.load(final_models[0], map_location=device, weights_only=True)
    model.load_state_dict(ckpt["model"])
    model.eval()
    print(f"Loaded: {final_models[0]}")
    print(f"Trained for {ckpt.get('episode', '?')} episodes")
    print()

    for n in [2, 3, 4]:
        wr = evaluate_vs_greedy_batch(model, num_players=n, num_games=2000, device=device)
        print(f"  {n} players: win rate = {wr:.1%} (random baseline: {1/n:.1%})")
else:
    print("No model found. Train first!")