# HRM vs PoT on Maze 30×30 Hard (Colab)

This notebook benchmarks the original HRM (vendored under `vendor/hrm`) against our PoT-HRM on the Maze 30×30 Hard task.

- HRM repo: https://github.com/sapientinc/HRM/tree/main
- HRM paper: https://arxiv.org/pdf/2506.21734
- This repo: https://github.com/Eran-BA/PoT (PoT, Pointer-over-Heads, with HRM integration)

Steps:
1. Clone PoT and initialize submodules (includes HRM under `vendor/hrm`).
2. Install requirements. HRM training requires a CUDA GPU, and FlashAttention is recommended; a Colab A100 runtime works.
3. Build the HRM Maze 30×30 Hard dataset and run HRM pretraining (reduced to 500 epochs by default for Colab).
4. Run PoT-HRM benchmark at 30×30 with HRM-style normalization.
5. Summarize results and show paths to logs.
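
Steps 3 and 4 assume that step 1 left a usable checkout, but a failed submodule fetch can leave `vendor/hrm` as an empty directory. The sketch below is one way to check this before starting a long run. The helper names `is_populated` and `preflight` are hypothetical and not part of PoT or HRM; the paths are the ones used by the cells in this notebook.

```python
from pathlib import Path


def is_populated(path):
    """Return True if `path` is a directory containing at least one entry."""
    p = Path(path)
    return p.is_dir() and any(p.iterdir())


def preflight(repo_root="/content/PoT"):
    """Map each path the later cells rely on to whether it looks ready.

    Hypothetical helper: it only inspects the filesystem and does not
    validate the contents of the checkout.
    """
    root = Path(repo_root)
    return {
        "repo": is_populated(root),
        "hrm_submodule": is_populated(root / "vendor" / "hrm"),
        "pot_benchmark": (root / "experiments" / "maze_scaling_benchmark.py").is_file(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Running this in a Python cell after the setup cell prints one line per path. Any `MISSING` entry suggests rerunning the clone or submodule step before training.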



In [None]:
%%bash
set -e
# Setup: clone repo and init submodules
WORKDIR=/content/PoT
if [ ! -d "$WORKDIR" ]; then
  git clone https://github.com/Eran-BA/PoT.git "$WORKDIR"
fi
cd "$WORKDIR"
# Initialize HRM submodule
git submodule update --init --recursive

# Optional: checkout scaling branch if needed
# git checkout scaling_parameter_size
# git pull

python3 -V && pip -V



In [None]:
%%bash
set -e
# Install deps
pip install -q maze-dataset wandb
# Colab GPU runtimes (e.g. A100) already include CUDA, so heavy HRM extras such as FlashAttention are skipped here



In [None]:
%%bash
set -e
# Ensure repo and HRM submodule exist
if [ ! -d "/content/PoT" ]; then
  git clone https://github.com/Eran-BA/PoT.git /content/PoT
fi
cd /content/PoT
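git submodule update --init --recursive || true
# Fallback: clone HRM directly if the submodule didn't populate.
# An uninitialized submodule leaves an empty vendor/hrm directory, so test for contents, not just existence.
if [ -z "$(ls -A /content/PoT/vendor/hrm 2>/dev/null)" ]; then
  mkdir -p /content/PoT/vendor
  git clone https://github.com/sapientinc/HRM /content/PoT/vendor/hrm
fi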

# HRM: Build Maze 30x30 Hard dataset and run pretrain (reduced)
# Require CUDA: enforce Colab GPU runtime (Runtime → Change runtime type → GPU)
python - <<'PY'
import torch
assert torch.cuda.is_available(), "CUDA GPU not available. Enable GPU runtime and rerun."
print("[HRM] CUDA available:", torch.cuda.get_device_name(0))
PY

cd /content/PoT/vendor/hrm
# Install HRM Python requirements
pip install -q -r requirements.txt

# Download and process HuggingFace dataset (sapientinc/maze-30x30-hard-1k)
python dataset/build_maze_dataset.py --output-dir data/maze-30x30-hard-1k

# Run HRM pretrain (reduced epochs for Colab)
python pretrain.py data_path=data/maze-30x30-hard-1k epochs=500 eval_interval=50 lr=1e-4 puzzle_emb_lr=1e-4



In [None]:
%%bash
set -e
# PoT: Run 30x30 benchmark (reduced) with HRM-style normalization
cd /content/PoT
python -u experiments/maze_scaling_benchmark.py \
  --maze-sizes 30 \
  --train 300 --test 60 \
  --R 4 --T 4 --heads 8 \
  --epochs 60 --seed 42 \
  --output experiments/results/colab_maze30



In [None]:
# Summarize artifacts
import os, json
pot_dir = '/content/PoT/experiments/results/colab_maze30'
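# Note: the HRM cell above passes no output path, so this directory exists only if HRM
# results were copied here; otherwise HRM's outputs are likely under /content/PoT/vendor/hrm.
hrm_dir = '/content/PoT/experiments/results/hrm_original_maze30'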
print('PoT logs:', os.listdir(pot_dir) if os.path.isdir(pot_dir) else 'not found')
print('HRM logs:', os.listdir(hrm_dir) if os.path.isdir(hrm_dir) else 'not found')

# If PoT wrote JSON summaries, show them (best-effort)
for root, _, files in os.walk(pot_dir):
    for f in files:
        if f.endswith('.json'):
            p = os.path.join(root, f)
            try:
                # Close the file promptly; show only the first 1000 characters
                with open(p) as fh:
                    summary = json.load(fh)
                print('\n', p)
                print(json.dumps(summary, indent=2)[:1000])
            except Exception as e:
                print('Failed to read', p, e)

