# Run Stochastic MuZero Harness Benchmarks (Othello + 2048)
This notebook:
- Clones the repo to **Colab's local storage** (fast, no Drive sync issues)
- Trains on **Othello** and **2048**
- Saves outputs (checkpoints + rollout PNGs) to **Google Drive** for persistence

**Key improvement**: The repo lives in `/content/tg_smn` (local storage) and only the outputs go to Drive, which avoids git/Drive sync conflicts.

## Setup

In [None]:
!nvidia-smi -L || true
import torch, sys
print('torch', torch.__version__, 'cuda available?', torch.cuda.is_available())

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os, pathlib

# Repo cloned to LOCAL Colab storage (fast, no Drive conflicts)
REPO_DIR = '/content/tg_smn'

# Outputs saved to Drive (persistent)
OUTROOT = '/content/drive/MyDrive/tg_smn_outputs'

# Branch to use
BRANCH = 'stoch-muzero-harness'

pathlib.Path(OUTROOT).mkdir(parents=True, exist_ok=True)
print('REPO_DIR (local):', REPO_DIR)
print('OUTROOT (Drive):', OUTROOT)
print('BRANCH:', BRANCH)

In [None]:
%%bash -s "$REPO_DIR" "$BRANCH"
REPO_DIR=$1
BRANCH=$2

# Clone on first run; if the repo already exists in this session, pull the latest
# commits instead (local storage resets on disconnect, so a new session re-clones)
if [ -d "$REPO_DIR" ]; then
  echo "✓ Repo already exists, pulling latest..."
  cd "$REPO_DIR"
  git fetch --all
  git checkout "$BRANCH"
  git pull origin "$BRANCH"
else
  echo "✓ Cloning repository..."
  git clone -b "$BRANCH" https://github.com/RespectableGlioma/tg_smn.git "$REPO_DIR"
fi

echo "✓ Repository ready at $REPO_DIR"
ls -F "$REPO_DIR/world_models/"

In [None]:
# Install dependencies
!pip -q install -U pip setuptools wheel
!pip -q install tqdm pillow numpy matplotlib

# Install the package from local clone
%cd $REPO_DIR
!pip install -e .

## Quick import test

In [None]:
%cd $REPO_DIR
import world_models.stoch_muzero_harness as smh
print('✓ Imported:', smh.__name__)
print('✓ Current directory:', os.getcwd())
print('✓ Output directory:', OUTROOT)

## Train: Othello

In [None]:
%cd $REPO_DIR
GAME='othello'
!python -u -m world_models.stoch_muzero_harness.train \
  --game $GAME \
  --collect_episodes 2000 \
  --train_steps 40000 \
  --batch 128 \
  --unroll 10 \
  --w_policy 0 \
  --w_value 0 \
  --w_reward 0 \
  --w_chance 0 \
  --w_aux 1 \
  --w_after_aux 3 \
  --w_style 0.2 \
  --w_inv 1 \
  --device cuda \
  --outdir "$OUTROOT"

## Eval: Othello (prediction-only)

In [None]:
%cd $REPO_DIR
!python -u -m world_models.stoch_muzero_harness.eval \
  --game othello \
  --ckpt "$OUTROOT/othello/ckpt_final.pt" \
  --episodes 50 \
  --device cuda

## Train: 2048

In [None]:
%cd $REPO_DIR
GAME='2048'
!python -u -m world_models.stoch_muzero_harness.train \
  --game $GAME \
  --num_styles 1 \
  --collect_episodes 2000 \
  --train_steps 30000 \
  --batch 128 \
  --unroll 5 \
  --w_policy 0 \
  --w_value 0 \
  --w_reward 0 \
  --w_aux 5.0 \
  --w_after_aux 5.0 \
  --w_chance 1 \
  --w_style 0 \
  --w_inv 0 \
  --device cuda \
  --outdir "$OUTROOT"

## Eval: 2048 (prediction-only)

In [None]:
%cd $REPO_DIR
!python -u -m world_models.stoch_muzero_harness.eval \
  --game 2048 \
  --ckpt "$OUTROOT/2048/ckpt_final.pt" \
  --episodes 50 \
  --device cuda

## Eval: 2048 with MCTS planning (optional)
This uses the learned latent model for search (`--mcts_sims`) and applies the entropy shortcut, controlled by `--entropy_thr`:
- low-entropy chance distribution → deterministic rollout
- high-entropy chance distribution → sample outcomes
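
The shortcut can be sketched in plain Python as follows. This is a minimal illustration, not the harness's own code: the function names `chance_entropy` and `pick_chance_outcome` are hypothetical, and it assumes entropy is measured in nats and that "deterministic" means taking the most likely outcome.

```python
import math
import random

def chance_entropy(probs):
    """Shannon entropy (in nats) of a discrete chance distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_chance_outcome(probs, entropy_thr, rng):
    """Return (outcome_index, was_sampled) for one chance node.

    Hypothetical helper: below the threshold the most likely outcome is
    taken deterministically; otherwise an outcome is sampled from probs.
    """
    if chance_entropy(probs) < entropy_thr:
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, False
    sampled = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return sampled, True

rng = random.Random(0)
peaked = [0.97, 0.01, 0.01, 0.01]   # entropy ~0.17 nats, below 0.5
flat = [0.25, 0.25, 0.25, 0.25]     # entropy = ln(4) ~1.39 nats, above 0.5
print(pick_chance_outcome(peaked, 0.5, rng))  # (0, False)
print(pick_chance_outcome(flat, 0.5, rng))    # sampled index, True
```

With a threshold of 0.5, a nearly deterministic chance node is expanded along a single branch, while a uniform one is still sampled.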

In [None]:
%cd $REPO_DIR
!python -u -m world_models.stoch_muzero_harness.eval \
  --game 2048 \
  --ckpt "$OUTROOT/2048/ckpt_final.pt" \
  --episodes 50 \
  --mcts_sims 64 \
  --entropy_thr 0.5 \
  --device cuda

## View latest rollout images

In [None]:
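import glob, os
from PIL import Image
import matplotlib.pyplot as plt

def show_latest(pattern, title):
    paths = glob.glob(pattern)
    if not paths:
        print('No images found for', pattern)
        return
    # Pick the newest file by modification time. A plain lexicographic sort
    # would rank step9000 after step10000 unless step numbers are zero-padded.
    p = max(paths, key=os.path.getmtime)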
    print(title, '->', p)
    img = Image.open(p)
    plt.figure(figsize=(14, 4))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    plt.show()

show_latest(f'{OUTROOT}/othello/rollout_gt_vs_pred_step*.png', 'Othello rollout')
show_latest(f'{OUTROOT}/2048/rollout_gt_vs_pred_step*.png', '2048 rollout')

## Summary

All outputs (checkpoints, rollouts) are saved to:
```
/content/drive/MyDrive/tg_smn_outputs/
├── othello/
│   ├── ckpt_final.pt
│   └── rollout_*.png
└── 2048/
    ├── ckpt_final.pt
    └── rollout_*.png
```
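
The layout above can be checked from a notebook cell with a short sketch like the one below. The helper name `summarize_outputs` is hypothetical, and the path assumes the same `OUTROOT` as in the Setup cell.

```python
from pathlib import Path

# Assumption: same output root as OUTROOT in the Setup cell.
OUTROOT = '/content/drive/MyDrive/tg_smn_outputs'

def summarize_outputs(outroot, games=('othello', '2048')):
    """Return {game: (has_final_checkpoint, number_of_rollout_pngs)}."""
    summary = {}
    for game in games:
        game_dir = Path(outroot) / game
        has_ckpt = (game_dir / 'ckpt_final.pt').is_file()
        n_rollouts = len(list(game_dir.glob('rollout_*.png')))
        summary[game] = (has_ckpt, n_rollouts)
    return summary

for game, (has_ckpt, n_rollouts) in summarize_outputs(OUTROOT).items():
    print(f'{game}: ckpt_final.pt present={has_ckpt}, rollout PNGs={n_rollouts}')
```

A game that reports no checkpoint has most likely not finished training, or Drive was not mounted when the run started.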

Access them anytime via Google Drive, even after Colab disconnects!