# Sokoban PPO Benchmark - Pointer-Over-Heads Transformer

This notebook runs the Sokoban PPO benchmark with PoT iterative refinement.

**Training modes:**
- `heuristic`: Pretrain with heuristic pseudo-labels
- `ppo`: Pure PPO training
- `combined`: Pretrain + PPO fine-tuning

**Augmentations:** Geometric symmetries (flips and rotations), enabled by default; pass `--no-augment` to disable them.
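
The sketch below shows what geometric-symmetry augmentation means for Sokoban. Each of the 8 flips and rotations of a level grid is paired with a correspondingly remapped action, so the move label stays valid on the transformed board. The action encoding (`0=up, 1=down, 2=left, 3=right`) and the helper names `DIRS`, `rot90_dir`, `fliplr_dir` and `dihedral_augment` are hypothetical assumptions for illustration, not the benchmark script's actual implementation.

```python
import numpy as np

# Hypothetical action encoding: index -> (row delta, col delta)
# 0=up, 1=down, 2=left, 3=right (an assumption, not the benchmark's encoding)
DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def rot90_dir(d):
    # np.rot90(grid, 1) rotates counter-clockwise, mapping cell (r, c) to (W-1-c, r),
    # so a move (dr, dc) becomes (-dc, dr)
    dr, dc = d
    return (-dc, dr)

def fliplr_dir(d):
    # np.fliplr maps cell (r, c) to (r, W-1-c), so left and right swap
    dr, dc = d
    return (dr, -dc)

def dihedral_augment(grid: np.ndarray, action: int):
    """Return all 8 (grid, action) pairs under left-right flips and 90-degree rotations."""
    pairs = []
    for flip in (False, True):
        g = np.fliplr(grid) if flip else grid
        d = fliplr_dir(DIRS[action]) if flip else DIRS[action]
        for k in range(4):
            pairs.append((np.rot90(g, k).copy(), DIRS.index(d)))
            d = rot90_dir(d)  # direction for the next rotation step
    return pairs

# Toy 3x4 level: 2 = player, 3 = the cell the player moves into
grid = np.zeros((3, 4), dtype=int)
grid[1, 1], grid[1, 2] = 2, 3
print(len(dihedral_augment(grid, action=3)))  # original move: right
```

Remapping the action through its direction vector, rather than hard-coding a lookup table per symmetry, keeps the label consistent with the transformed board for every element of the group.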


In [None]:
# Clone repository and install dependencies
!git clone https://github.com/Eran-BA/PoT.git
%cd PoT
!pip install -q torch numpy tqdm


In [None]:
# Check GPU
import torch
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")


## 1. Heuristic Training WITH Augmentations


In [None]:
# Heuristic training WITH augmentations (geometric symmetries)
# --download is passed only in this cell; the later cells assume it has already run
!python experiments/sokoban_pot_benchmark.py \
    --mode heuristic \
    --download \
    --model-type pot \
    --R 4 \
    --d-model 256 \
    --n-heads 4 \
    --n-layers 2 \
    --controller-type transformer \
    --max-depth 32 \
    --heuristic-epochs 10 \
    --batch-size 64 \
    --learning-rate 1e-4 \
    --warmup-steps 100 \
    --eval-interval 2 \
    --output-dir experiments/results/sokoban_heuristic_aug
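
The `--learning-rate` and `--warmup-steps` flags suggest a warmup schedule. Below is a minimal sketch, assuming a linear warmup that then holds the base rate constant; the script's actual schedule is not shown here and may differ (for example, it may also decay afterwards). The function name `warmup_lr` is hypothetical.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Learning rate at a given optimizer step under an assumed linear warmup."""
    if warmup_steps <= 0:
        return base_lr
    # Ramp from base_lr / warmup_steps up to base_lr over the first warmup_steps steps
    return base_lr * min(1.0, (step + 1) / warmup_steps)

# Values for the heuristic run's flags: --learning-rate 1e-4 --warmup-steps 100
for step in (0, 49, 99, 500):
    print(step, warmup_lr(step, 1e-4, 100))
```

In PyTorch the same multiplier can be handed to `torch.optim.lr_scheduler.LambdaLR` as `lambda step: min(1.0, (step + 1) / warmup_steps)`.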


## 2. Heuristic Training WITHOUT Augmentations


In [None]:
# Heuristic training WITHOUT augmentations
!python experiments/sokoban_pot_benchmark.py \
    --mode heuristic \
    --model-type pot \
    --R 4 \
    --d-model 256 \
    --n-heads 4 \
    --n-layers 2 \
    --controller-type transformer \
    --max-depth 32 \
    --no-augment \
    --heuristic-epochs 10 \
    --batch-size 64 \
    --learning-rate 1e-4 \
    --warmup-steps 100 \
    --eval-interval 2 \
    --output-dir experiments/results/sokoban_heuristic_no_aug
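
Apart from `--download` and the output directory, the two heuristic cells differ only in `--no-augment`. A common `argparse` pattern for such an opt-out flag is sketched below. It is a hypothetical reconstruction of a small subset of the CLI, since the script's actual argument parser is not shown in this notebook.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the benchmark's CLI; flag names mirror the cells above
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["heuristic", "ppo", "combined"], required=True)
    # store_false: augmentation stays on unless --no-augment is passed
    parser.add_argument("--no-augment", dest="augment", action="store_false")
    parser.add_argument("--output-dir", default="experiments/results")
    return parser

args = build_parser().parse_args(["--mode", "heuristic", "--no-augment"])
print(args.mode, args.augment, args.output_dir)
```

With `dest="augment"` the rest of the code can test a positive `args.augment` flag instead of a double negative.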


## 3. PPO Training WITH Augmentations


In [None]:
# PPO training WITH augmentations
!python experiments/sokoban_pot_benchmark.py \
    --mode ppo \
    --model-type pot \
    --R 4 \
    --d-model 256 \
    --n-heads 4 \
    --n-layers 2 \
    --controller-type transformer \
    --max-depth 32 \
    --ppo-timesteps 100000 \
    --ppo-n-envs 8 \
    --batch-size 64 \
    --learning-rate 3e-4 \
    --output-dir experiments/results/sokoban_ppo_aug
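
PPO is usually trained with the clipped surrogate objective on advantages from generalized advantage estimation (GAE). Assuming the script follows that standard formulation, the NumPy sketch below shows both pieces; `clip_eps=0.2`, `gamma=0.99` and `lam=0.95` are common defaults assumed here, not values read from the script. If `--ppo-timesteps` counts total environment steps, then 100,000 timesteps spread over `--ppo-n-envs 8` parallel environments amounts to 12,500 steps per environment.

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout of length T.

    dones[t] = 1 means the episode ended after step t, so V(s_{t+1}) is not bootstrapped.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = advantages + np.asarray(values, dtype=np.float64)
    return advantages, returns

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective (a loss to minimize)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

adv, ret = compute_gae([1.0, 1.0], [0.0, 0.0], [0, 0], last_value=0.0)
print(adv, ret)
print(ppo_clip_loss(np.log([1.5, 0.5]), [0.0, 0.0], np.array([1.0, -1.0])))
```

Taking the minimum of the clipped and unclipped terms means a policy update gains nothing from pushing the probability ratio outside `[1 - clip_eps, 1 + clip_eps]`, which is what keeps each PPO step close to the data-collecting policy.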


## 4. Display Results


In [None]:
import json
from pathlib import Path

result_dirs = [
    ('Heuristic + Aug', 'experiments/results/sokoban_heuristic_aug'),
    ('Heuristic - No Aug', 'experiments/results/sokoban_heuristic_no_aug'),
    ('PPO + Aug', 'experiments/results/sokoban_ppo_aug'),
]

for name, d in result_dirs:
    results_file = Path(d) / 'results.json'
    print(f"\n=== {name} ===")
    if not results_file.exists():
        # Report skipped runs instead of silently omitting them
        print(f"No results found at {results_file}")
        continue
    with open(results_file) as f:
        results = json.load(f)
    e = results.get('evaluation')
    if e is None:
        print("No 'evaluation' section in results.json")
        continue
    print(f"Solve Rate @50:  {e.get('solve_rate@50', 0):.2%}")
    print(f"Solve Rate @100: {e.get('solve_rate@100', 0):.2%}")
    print(f"Solve Rate @200: {e.get('solve_rate@200', 0):.2%}")
    print(f"Median Steps:    {e.get('median_steps', 0):.1f}")
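
The keys read above (`solve_rate@50`, `solve_rate@100`, `solve_rate@200`, `median_steps`) suggest a step-budgeted evaluation. Below is a minimal sketch of how such metrics could be computed from per-level episode lengths, assuming `solve_rate@K` is the fraction of evaluation levels solved within K steps and `median_steps` is taken over solved levels only. Both definitions and the helper name `summarize` are assumptions; check the script for its exact definitions.

```python
from statistics import median
from typing import Optional, Sequence

def summarize(steps_to_solve: Sequence[Optional[int]], budgets=(50, 100, 200)) -> dict:
    """steps_to_solve[i] is the step at which level i was solved, or None if unsolved."""
    n = len(steps_to_solve) or 1  # avoid division by zero on an empty evaluation set
    solved = [s for s in steps_to_solve if s is not None]
    metrics = {f"solve_rate@{k}": sum(s <= k for s in solved) / n for k in budgets}
    # Median over solved levels only; 0.0 if nothing was solved
    metrics["median_steps"] = float(median(solved)) if solved else 0.0
    return metrics

print(summarize([10, 60, None, 150]))
```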
