# Bettafish - AI Catan Player

AlphaBeta search, MCTS, and AlphaZero on a fast bitboard engine.

**Runtime**: Use a GPU (T4) for neural net training, or a high-RAM CPU runtime for multi-core search benchmarks.

Go to **Runtime > Change runtime type** and select your preferred hardware.

## 1. Setup

In [None]:
# Install uv (fast Python package manager)
!curl -LsSf https://astral.sh/uv/install.sh | sh
import os
os.environ["PATH"] = f"{os.path.expanduser('~')}/.local/bin:{os.environ['PATH']}"

In [None]:
# Clone the repo
!git clone https://github.com/Samffprice/bettafish.git
os.chdir("bettafish")
!pwd

In [None]:
# Install all dependencies (including Cython for the fast bitboard engine)
# Uses the system Python (Colab's Python 3.11+)
!uv pip install --system -e ".[colab]" -e "./catanatron[gym]" 2>&1 | tail -5

In [None]:
# Build the Cython extension for the fast bitboard engine
!python robottler/bitboard/setup_cython.py build_ext --inplace

# Verify it built
import importlib
importlib.invalidate_caches()  # make the freshly built extension importable in this session
from robottler.bitboard import _fast
print(f"Cython module loaded: {_fast.__file__}")

In [None]:
# Check hardware
import torch
import multiprocessing

NCPU = multiprocessing.cpu_count()
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
print(f"CPU cores: {NCPU}")
print(f"\nRecommended --workers: {max(1, NCPU - 1)}")

## 2. Benchmark (Gauntlet)

Run the bitboard search player against baseline opponents.

| Flag | Description |
|------|-------------|
| `--bb-search` | Use the fast bitboard search player |
| `--search-depth N` | Search depth (2 = fast, 3 = strong) |
| `--blend-weight W` | Neural/heuristic blend (1e8 optimal) |
| `--dice-sample N` | Sample top-N dice rolls (5 = 3x speedup) |
| `--games N` | Games per opponent |
| `--workers N` | Parallel processes (the cells below use all cores but one) |
| `--baselines` | Run against all baseline opponents |
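
To make the `--dice-sample` row concrete, here is a minimal sketch of what "sample top-N dice rolls" could mean: keep only the N most likely two-dice sums and renormalize their probabilities, so the search expands fewer chance outcomes. This is an assumption about the idea, not Bettafish's actual implementation; `dice_distribution` and `top_n_rolls` are hypothetical names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_distribution():
    """Exact probability of each two-dice sum (2..12)."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in counts.items()}


def top_n_rolls(n):
    """Keep the n most likely sums and renormalize (hypothetical helper)."""
    dist = dice_distribution()
    # Sort by probability, descending; break ties by the sum for determinism.
    kept = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    covered = sum(p for _, p in kept)
    renormalized = {total: p / covered for total, p in kept}
    return renormalized, covered


sampled, covered = top_n_rolls(5)
print("kept sums:", sorted(sampled))
print("probability mass covered:", covered)
```

With N = 5 the kept sums are 5 through 9, which together cover two thirds of all rolls; the rare sums are the ones dropped.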

In [None]:
import multiprocessing

# Leave one core free for the notebook itself
W = max(1, multiprocessing.cpu_count() - 1)

# Quick benchmark: bitboard search depth 2 vs all baselines (50 games each)
!python -m robottler.benchmark \
    --bb-search \
    --search-depth 2 \
    --blend-weight 1e8 \
    --dice-sample 5 \
    --baselines \
    --games 50 \
    --workers {W}

In [None]:
# Strong benchmark: depth 3 (slower but ~72% vs AlphaBeta)
!python -m robottler.benchmark \
    --bb-search \
    --search-depth 3 \
    --blend-weight 1e8 \
    --dice-sample 5 \
    --baselines \
    --games 20 \
    --workers {W}

## 3. AlphaZero Self-Play Training

Generate self-play data with MCTS, then train the dual-head network.

This benefits from **GPU** for neural net forward passes during MCTS.
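
For intuition on why MCTS needs network forward passes, here is a minimal sketch of the PUCT child-selection rule used in AlphaZero-style search: each child is scored by its mean value plus an exploration bonus weighted by the policy head's prior. The `Node` class, `select_child`, and the `c_puct` value are illustrative assumptions; Bettafish's own selection rule and constants are not shown in this notebook.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float             # P(s, a) from the policy head
    visits: int = 0          # N(s, a)
    value_sum: float = 0.0   # sum of backed-up values
    children: dict = field(default_factory=dict)

    def q(self):
        """Mean backed-up value, 0 for an unvisited node."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """Return the (action, child) pair maximizing Q + U (illustrative c_puct)."""
    total = sum(ch.visits for ch in node.children.values())

    def score(item):
        _, ch = item
        # The +1 under the root keeps priors relevant before any child is visited,
        # a common implementation variant of the AlphaZero formula.
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q() + u

    return max(node.children.items(), key=score)


root = Node(prior=1.0)
root.children = {"a": Node(prior=0.6), "b": Node(prior=0.4)}
first_action, _ = select_child(root)

# After "a" has been visited often with poor results, the search prefers "b".
root.children["a"].visits = 10
root.children["a"].value_sum = -5.0
second_action, _ = select_child(root)
print(first_action, second_action)
```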

In [None]:
# Generate self-play games (adjust --games and --sims for speed vs quality)
!python -m robottler.az_selfplay generate \
    --checkpoint robottler/models/az_iter0.pt \
    --games 100 \
    --sims 200 \
    --output-dir datasets/az_selfplay/colab_gen1 \
    --workers {W}

In [None]:
# Train on the generated data
!python -m robottler.az_selfplay train \
    --checkpoint robottler/models/az_iter0.pt \
    --data-dir datasets/az_selfplay/colab_gen1 \
    --output robottler/models/az_colab_iter1.pt \
    --epochs 20 \
    --batch-size 256 \
    --lr 1e-3

In [None]:
# Evaluate new checkpoint vs old
!python -m robottler.az_selfplay evaluate \
    --new-checkpoint robottler/models/az_colab_iter1.pt \
    --old-checkpoint robottler/models/az_iter0.pt \
    --games 100 \
    --sims 200

## 4. Full AlphaZero Training Loop

An automated generate -> train -> evaluate cycle. This is a long-running job,
so run it on a GPU runtime.
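
As a rough mental model, the cycle can be pictured as the three Section 3 commands chained per iteration. The sketch below only builds and prints the command lines; it runs nothing. The per-iteration directory layout and checkpoint names are hypothetical, and it always promotes the new checkpoint, whereas the real `loop` subcommand may gate promotion on the evaluation result.

```python
def iteration_commands(i, checkpoint, out_root="datasets/az_selfplay/colab_loop",
                       games=200, sims=200, epochs=20, eval_games=100, workers=1):
    """Build the generate/train/evaluate commands for iteration i."""
    data_dir = f"{out_root}/iter{i}"         # hypothetical layout
    new_ckpt = f"{out_root}/az_iter{i}.pt"   # hypothetical checkpoint name
    base = ["python", "-m", "robottler.az_selfplay"]
    generate = base + ["generate", "--checkpoint", checkpoint,
                       "--games", str(games), "--sims", str(sims),
                       "--output-dir", data_dir, "--workers", str(workers)]
    train = base + ["train", "--checkpoint", checkpoint,
                    "--data-dir", data_dir, "--output", new_ckpt,
                    "--epochs", str(epochs)]
    evaluate = base + ["evaluate", "--new-checkpoint", new_ckpt,
                       "--old-checkpoint", checkpoint,
                       "--games", str(eval_games), "--sims", str(sims)]
    return new_ckpt, [generate, train, evaluate]


def plan(start_checkpoint, iterations, **kwargs):
    """Chain iterations so each one starts from the previous checkpoint."""
    checkpoint, steps = start_checkpoint, []
    for i in range(1, iterations + 1):
        checkpoint, cmds = iteration_commands(i, checkpoint, **kwargs)
        steps.extend(cmds)
    return steps


for cmd in plan("robottler/models/az_iter0.pt", iterations=2, workers=3):
    print(" ".join(cmd))
```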

In [None]:
# Full loop: 5 iterations of generate/train/evaluate
!python -m robottler.az_selfplay loop \
    --start-checkpoint robottler/models/az_iter0.pt \
    --iterations 5 \
    --games-per-iter 200 \
    --sims 200 \
    --output-dir datasets/az_selfplay/colab_loop \
    --epochs 20 \
    --eval-games 100 \
    --workers {W}

## 5. RL Training (MaskablePPO)

Train a policy network with reinforcement learning. Benefits from multiple CPU cores for
parallel environment rollouts.
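
The "maskable" part matters for Catan because most actions are illegal in any given state. A minimal sketch of the idea, assuming a boolean legality mask per action: illegal actions get zero probability and the rest are renormalized. `masked_softmax` is a hypothetical helper for illustration, not the code used by `robottler.train_rl`.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits where mask[i] is True for legal actions."""
    if not any(mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Actions 1 and 3 are illegal, even though action 3 has the highest logit.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], [True, False, True, False])
print([round(p, 3) for p in probs])
```

The policy can never sample an illegal move this way, so no reward shaping is needed to teach it the rules.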

In [None]:
!python -m robottler.train_rl \
    --opponent alphabeta \
    --total-steps 200000 \
    --n-envs 8 \
    --bc-model robottler/models/value_net_v2.pt \
    --vps 10

## 6. Save Results

Download trained models back to your local machine.

In [None]:
# List all model checkpoints
!ls -lh robottler/models/*.pt

In [None]:
# Zip models for download
!zip -j colab_models.zip robottler/models/az_colab_*.pt robottler/models/az_iter*.pt 2>/dev/null || echo "No new models yet"

try:
    # Import inside the try so the cell also degrades gracefully outside Colab
    from google.colab import files
    files.download("colab_models.zip")
except Exception:
    print("Download manually from the file browser (left panel)")