# HRM Grid-to-Grid Maze Benchmark

This notebook implements the **correct** HRM task format:
- **Input**: 30×30 maze grid (900 tokens)
- **Output**: 30×30 solution grid with path marked (900 tokens)

This matches the task formulation in the HRM paper, which reports about 74% grid accuracy on 30×30 hard mazes.

Earlier experiments used autoregressive path generation instead. That task is easier, and accuracy on it reached 100%.
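The results cell at the end of this notebook reports both grid accuracy and token accuracy. The sketch below shows one plausible reading of the two metrics, assuming grid accuracy counts a maze as correct only when every token matches and token accuracy is the fraction of matching positions. The functions `token_accuracy` and `grid_accuracy` are hypothetical helpers for illustration; the definitions used in `maze_grid2grid_hrm.py` may differ.

```python
def token_accuracy(pred, target):
    """Percentage of individual positions predicted correctly, over all grids."""
    total = sum(len(ts) for ts in target)
    correct = sum(p == t for ps, ts in zip(pred, target) for p, t in zip(ps, ts))
    return 100.0 * correct / total


def grid_accuracy(pred, target):
    """Percentage of grids in which every token is predicted correctly."""
    exact = sum(list(ps) == list(ts) for ps, ts in zip(pred, target))
    return 100.0 * exact / len(target)


# Two toy "grids" of 4 tokens each; the second prediction has one wrong token.
target = [[0, 1, 1, 0], [0, 1, 0, 0]]
pred = [[0, 1, 1, 0], [0, 1, 1, 0]]

print(token_accuracy(pred, target))  # 87.5
print(grid_accuracy(pred, target))   # 50.0
```

Under this reading, a single wrong cell out of 900 fails the whole grid, which is why grid accuracy can sit far below token accuracy.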


In [None]:
%%bash
# Clone repo and install deps
if [ ! -d "/content/PoT" ]; then
  git clone https://github.com/Eran-BA/PoT.git /content/PoT
fi
cd /content/PoT
git checkout scaling_parameter_size
git pull
pip install -q tqdm numpy torch


In [None]:
%%bash
# Download HRM maze-30x30-hard dataset
cd /content/PoT
if [ ! -d "vendor/hrm" ]; then
  mkdir -p vendor
  git clone https://github.com/sapientinc/HRM vendor/hrm
fi
cd vendor/hrm
pip install -q -r requirements.txt
python dataset/build_maze_dataset.py --output-dir data/maze-30x30-hard-1k


In [None]:
%%bash
# Run Baseline Transformer
cd /content/PoT
python -u experiments/maze_grid2grid_hrm.py \
  --data-dir vendor/hrm/data/maze-30x30-hard-1k \
  --model baseline \
  --d-model 256 \
  --n-heads 8 \
  --n-layers 4 \
  --batch-size 32 \
  --epochs 100 \
  --lr 1e-3 \
  --output experiments/results/grid2grid_baseline \
  --seed 42


In [None]:
%%bash
# Run PoH-HRM
cd /content/PoT
python -u experiments/maze_grid2grid_hrm.py \
  --data-dir vendor/hrm/data/maze-30x30-hard-1k \
  --model poh \
  --d-model 256 \
  --n-heads 8 \
  --n-layers 1 \
  --R 4 \
  --T 4 \
  --batch-size 32 \
  --epochs 100 \
  --lr 1e-3 \
  --output experiments/results/grid2grid_poh \
  --seed 42


In [None]:
# Compare results
import json
from pathlib import Path

RESULTS_DIR = Path('/content/PoT/experiments/results')


def load_results(path):
    # Context manager closes the file handle once the JSON is parsed
    with open(path) as f:
        return json.load(f)


baseline_results = load_results(RESULTS_DIR / 'grid2grid_baseline' / 'baseline_results.json')
poh_results = load_results(RESULTS_DIR / 'grid2grid_poh' / 'poh_results.json')

print("\n" + "="*80)
print("GRID-TO-GRID MAZE BENCHMARK RESULTS (HRM Task Format)")
print("="*80)
print("\nHRM Paper (30x30 Hard): ~74% grid accuracy\n")
print("Baseline Transformer:")
print(f"  Parameters: {baseline_results['parameters']:,}")
print(f"  Best Grid Accuracy: {baseline_results['best_grid_acc']:.2f}%")
print(f"  Final Token Accuracy: {baseline_results['final_token_acc']:.2f}%")
print("\nPoH-HRM (R=4, T=4):")
print(f"  Parameters: {poh_results['parameters']:,}")
print(f"  Best Grid Accuracy: {poh_results['best_grid_acc']:.2f}%")
print(f"  Final Token Accuracy: {poh_results['final_token_acc']:.2f}%")
print("\n" + "="*80)
