# Blocksworld PoT (Pointer-Over-Heads Transformer) PPO Benchmark

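This notebook trains PoT models on the Blocksworld planning benchmark with PPO and compares SimplePoT and HybridPoT, each with and without sub-trajectory augmentation.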

**Features:**
- PPO training with good/bad trajectory discrimination
- Sub-trajectory augmentation using FastDownward planner
- SimplePoT and HybridPoT architectures
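PPO optimizes a clipped surrogate objective, and it can be sketched in plain Python. The +1/-1 reward for good/bad trajectories below is an assumption made for illustration. It is not taken from `blocksworld_ppo_benchmark.py`, whose actual reward and advantage computation may differ.

```python
import math
from typing import List


def normalize(xs: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize advantages to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / (std + eps) for x in xs]


def ppo_clip_loss(new_logp: List[float], old_logp: List[float],
                  advantages: List[float], clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate loss (to be minimized), averaged over samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        # Keep the pessimistic (smaller) objective, then negate it for a loss.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical reward scheme: +1 for a "good" (goal-reaching) trajectory,
# -1 for a "bad" one; one log-probability per trajectory for brevity.
labels = [True, False, True, False]
rewards = [1.0 if good else -1.0 for good in labels]
advantages = normalize(rewards)
old_logp = [-1.0, -1.2, -0.8, -1.5]
new_logp = [-0.7, -1.4, -0.8, -1.1]
print(round(ppo_clip_loss(new_logp, old_logp, advantages), 4))
```

Clipping caps how far a single update can push the probability ratio away from 1, which keeps updates stable even when good and bad trajectories produce large, opposite-signed advantages.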


## Setup


In [None]:
# Clone the repository (replace `yourusername` with the actual GitHub owner)
!git clone https://github.com/yourusername/PoT.git
%cd PoT

# Install dependencies
!pip install -q torch numpy tqdm datasets wandb


In [None]:
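# Install FastDownward (used to generate planner trajectories).
# Clone it to /content/downward so it matches the --fd-path used below;
# cloning from inside PoT/ would place it at /content/PoT/downward instead.
!apt-get update && apt-get install -y cmake g++ python3
!git clone https://github.com/aibasel/downward.git /content/downward
%cd /content/downward
!python build.py
%cd /content/PoT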
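After the build finishes, the planner can be invoked directly as a sanity check. The sketch below builds a `fast-downward.py` command line and parses the `sas_plan` file that Fast Downward writes when it finds a plan. The `astar(lmcut())` search configuration is a standard Fast Downward option but is an assumption here, not necessarily what `blocksworld_ppo_benchmark.py` uses; `domain.pddl` and `p01.pddl` are placeholder file names.

```python
import subprocess
from pathlib import Path
from typing import List


def fd_command(fd_path: str, domain: str, problem: str,
               search: str = "astar(lmcut())") -> List[str]:
    """Build a Fast Downward invocation: driver script, PDDL files, search config."""
    return ["python3", fd_path, domain, problem, "--search", search]


def parse_sas_plan(text: str) -> List[str]:
    """Extract actions from a sas_plan file; lines starting with ';' are comments (e.g. cost)."""
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.strip().startswith(";")]


def run_planner(fd_path: str, domain: str, problem: str, workdir: str = ".") -> List[str]:
    """Run Fast Downward in workdir and return the plan; raises if the planner fails."""
    subprocess.run(fd_command(fd_path, domain, problem), cwd=workdir,
                   check=True, capture_output=True, text=True)
    return parse_sas_plan((Path(workdir) / "sas_plan").read_text())


# Usage (placeholder PDDL file names):
# plan = run_planner("/content/downward/fast-downward.py", "domain.pddl", "p01.pddl")
```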


In [None]:
# W&B login (optional). The training cells below pass --wandb;
# if you skip this login, remove --wandb from those commands.
import wandb
wandb.login()


In [None]:
# Check GPU
import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Device: {device}")
if device == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")


## SimplePoT PPO WITH Augmentations

Baseline SimplePoT model trained with PPO on Blocksworld problems with up to 6 blocks, with sub-trajectory augmentation enabled (the default).
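One plausible reading of sub-trajectory augmentation, sketched here as an assumption rather than a description of the benchmark script: every intermediate state of a solved plan, paired with the original goal, is itself a training problem whose solution is the remaining suffix of the plan. If the original plan is optimal, each suffix is also optimal from its own start state, so no re-planning is needed. States use a simplified, hypothetical "move" encoding that maps each block to what it rests on.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]         # block -> what it rests on ("table" or another block)
Action = Tuple[str, str, str]  # (block, from_support, to_support): move a clear block


def apply(state: State, action: Action) -> State:
    """Apply a move action after checking its preconditions."""
    block, src, dst = action
    assert state[block] == src, f"{block} is not on {src}"
    assert block not in state.values(), f"{block} is not clear"
    assert dst == "table" or dst not in state.values(), f"{dst} is not clear"
    nxt = dict(state)
    nxt[block] = dst
    return nxt


def sub_trajectories(init: State, plan: List[Action], goal: State
                     ) -> List[Tuple[State, State, List[Action]]]:
    """Return (start_state, goal, remaining_plan) for every cut point of the plan."""
    examples = []
    state = init
    for i in range(len(plan)):
        examples.append((state, goal, plan[i:]))
        state = apply(state, plan[i])
    assert all(state[b] == goal[b] for b in goal), "plan does not reach the goal"
    return examples


# Tower with C on B on A (A on the table) -> tower with A on B on C.
init = {"A": "table", "B": "A", "C": "B"}
goal = {"C": "table", "B": "C", "A": "B"}
plan = [("C", "B", "table"), ("B", "A", "C"), ("A", "table", "B")]
for start, _, rest in sub_trajectories(init, plan, goal):
    print(start, "->", len(rest), "steps left")
```

Under this reading, one 3-step plan yields three training problems instead of one.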


In [None]:
# SimplePoT PPO WITH augmentations
!python experiments/blocksworld_ppo_benchmark.py \
    --download \
    --generate-trajectories \
    --fd-path /content/downward/fast-downward.py \
    --mode ppo \
    --epochs 5 \
    --batch-size 32 \
    --max-blocks 6 \
    --model-type simple \
    --R 4 \
    --d-model 256 \
    --n-heads 8 \
    --n-layers 6 \
    --good-bad-ratio 1.0 \
    --eval-interval 1 \
    --wandb \
    --project blocksworld-ppo \
    --run-name simple-with-aug \
    --output-dir experiments/results/blocksworld_simple_aug


## SimplePoT PPO WITHOUT Augmentations

For comparison: the same SimplePoT configuration with sub-trajectory augmentation disabled (`--no-augmentation`).


In [None]:
# SimplePoT PPO WITHOUT augmentations
!python experiments/blocksworld_ppo_benchmark.py \
    --download \
    --generate-trajectories \
    --fd-path /content/downward/fast-downward.py \
    --mode ppo \
    --epochs 5 \
    --batch-size 32 \
    --max-blocks 6 \
    --model-type simple \
    --R 4 \
    --d-model 256 \
    --n-heads 8 \
    --n-layers 6 \
    --no-augmentation \
    --good-bad-ratio 1.0 \
    --eval-interval 1 \
    --wandb \
    --project blocksworld-ppo \
    --run-name simple-no-aug \
    --output-dir experiments/results/blocksworld_simple_no_aug


## HybridPoT PPO WITH Augmentations (Aligned with Sudoku)

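Full HybridPoT model with hierarchical H/L reasoning cycles (`--H-cycles`, `--L-cycles`), adaptive computation time (ACT) halting (`--halt-max-steps`, `--halt-exploration-prob`), and broadcast input injection (`--injection-mode broadcast`), with hyperparameters aligned with the Sudoku benchmark.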
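The `H_cycles`/`L_cycles` names and the `--hrm-grad-style` flag suggest an HRM-style (Hierarchical Reasoning Model) loop. The sketch below shows that control flow under this assumption, using scalar toy states and hypothetical `f_low`/`f_high`/`q_head` functions in place of the real L-level, H-level, and halting networks. It is not code from the PoT repository, and gradient handling (what `--hrm-grad-style` presumably controls) is not modeled. Defaults mirror the flags in the cell below.

```python
import random
from typing import Tuple


def f_low(z_l: float, injected: float) -> float:
    """Hypothetical L-level update: relax z_l toward the injected signal."""
    return 0.5 * z_l + 0.5 * injected


def f_high(z_h: float, z_l: float) -> float:
    """Hypothetical H-level update: fold the settled L state into z_h."""
    return 0.5 * z_h + 0.5 * z_l


def segment(x: float, z_h: float, z_l: float,
            h_cycles: int, l_cycles: int) -> Tuple[float, float]:
    """One segment: l_cycles fast L updates per slow H update, repeated h_cycles times."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            z_l = f_low(z_l, z_h + x)  # input x re-injected on every L step (assumed)
        z_h = f_high(z_h, z_l)
    return z_h, z_l


def q_head(z_h: float) -> Tuple[float, float]:
    """Hypothetical halting head returning (q_halt, q_continue)."""
    return z_h, 1.0 - z_h


def act_loop(x: float, h_cycles: int = 2, l_cycles: int = 6,
             halt_max_steps: int = 2, explore_prob: float = 0.1,
             training: bool = True, seed: int = 0) -> Tuple[float, int]:
    """Run segments until the Q-head prefers halting or halt_max_steps is reached."""
    assert halt_max_steps >= 1
    rng = random.Random(seed)
    # During training, occasionally force a random minimum number of segments
    # so that longer computation paths are still explored.
    min_steps = 0
    if training and rng.random() < explore_prob:
        min_steps = rng.randint(min(2, halt_max_steps), halt_max_steps)
    z_h, z_l = 0.0, 0.0
    for step in range(1, halt_max_steps + 1):
        z_h, z_l = segment(x, z_h, z_l, h_cycles, l_cycles)
        q_halt, q_continue = q_head(z_h)
        if step == halt_max_steps or (q_halt > q_continue and step >= min_steps):
            break
    return z_h, step


print(act_loop(1.0, training=False))
```

With `--H-cycles 2 --L-cycles 6`, each segment performs 12 L updates and 2 H updates, and `--halt-max-steps 2` caps the number of segments at two.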


In [None]:
# HybridPoT PPO WITH augmentations (aligned with Sudoku)
!python experiments/blocksworld_ppo_benchmark.py \
    --download \
    --generate-trajectories \
    --fd-path /content/downward/fast-downward.py \
    --mode ppo \
    --epochs 5 \
    --batch-size 32 \
    --max-blocks 6 \
    --model-type hybrid \
    --controller-type transformer \
    --d-ctrl 128 \
    --max-depth 128 \
    --d-model 256 \
    --n-heads 8 \
    --H-cycles 2 \
    --L-cycles 6 \
    --H-layers 2 \
    --L-layers 2 \
    --halt-max-steps 2 \
    --hrm-grad-style \
    --halt-exploration-prob 0.1 \
    --injection-mode broadcast \
    --good-bad-ratio 1.0 \
    --eval-interval 1 \
    --wandb \
    --project blocksworld-ppo \
    --run-name hybrid-with-aug \
    --output-dir experiments/results/blocksworld_hybrid_aug


## HybridPoT PPO WITHOUT Augmentations

For comparison: the same HybridPoT configuration with sub-trajectory augmentation disabled (`--no-augmentation`).


In [None]:
# HybridPoT PPO WITHOUT augmentations
!python experiments/blocksworld_ppo_benchmark.py \
    --download \
    --generate-trajectories \
    --fd-path /content/downward/fast-downward.py \
    --mode ppo \
    --epochs 5 \
    --batch-size 32 \
    --max-blocks 6 \
    --model-type hybrid \
    --controller-type transformer \
    --d-ctrl 128 \
    --max-depth 128 \
    --d-model 256 \
    --n-heads 8 \
    --H-cycles 2 \
    --L-cycles 6 \
    --H-layers 2 \
    --L-layers 2 \
    --halt-max-steps 2 \
    --hrm-grad-style \
    --halt-exploration-prob 0.1 \
    --injection-mode broadcast \
    --no-augmentation \
    --good-bad-ratio 1.0 \
    --eval-interval 1 \
    --wandb \
    --project blocksworld-ppo \
    --run-name hybrid-no-aug \
    --output-dir experiments/results/blocksworld_hybrid_no_aug


## Display Results


In [None]:
import json
import os

results_dirs = [
    'experiments/results/blocksworld_simple_aug',
    'experiments/results/blocksworld_simple_no_aug',
    'experiments/results/blocksworld_hybrid_aug',
    'experiments/results/blocksworld_hybrid_no_aug',
]

for result_dir in results_dirs:
    result_file = os.path.join(result_dir, 'results.json')
    if os.path.exists(result_file):
        with open(result_file, 'r') as f:
            results = json.load(f)
        print(f"\n{'='*60}")
        print(f"Results: {result_dir}")
        print(f"{'='*60}")
        if 'test' in results:
            test = results['test']
            print(f"Slot Accuracy: {test.get('slot_acc', 0):.2%}")
            print(f"Exact Match:   {test.get('exact_match', 0):.2%}")
        if 'training' in results:
            print(f"Best Val Slot Acc: {results['training'].get('best_val_slot_acc', 0):.2%}")
    else:
        print(f"\nNo results found at {result_file}")
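To read the augmentation ablation at a glance, each with/without pair can be compared directly. This sketch assumes the same `results.json` layout that the cell above reads (`test.slot_acc`, `test.exact_match`, stored as fractions), and pairs the four output directories used in this notebook.

```python
import json
import os
from typing import Dict, Optional

PAIRS = {
    "SimplePoT": ("experiments/results/blocksworld_simple_aug",
                  "experiments/results/blocksworld_simple_no_aug"),
    "HybridPoT": ("experiments/results/blocksworld_hybrid_aug",
                  "experiments/results/blocksworld_hybrid_no_aug"),
}


def load_test_metrics(result_dir: str) -> Optional[Dict[str, float]]:
    """Return the 'test' section of results.json, or None if the file is missing."""
    path = os.path.join(result_dir, "results.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f).get("test", {})


def augmentation_delta(with_aug: Dict[str, float],
                       without_aug: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point gain from augmentation for each metric both runs report."""
    return {k: 100 * (with_aug[k] - without_aug[k])
            for k in ("slot_acc", "exact_match")
            if k in with_aug and k in without_aug}


for model, (aug_dir, no_aug_dir) in PAIRS.items():
    with_aug, without_aug = load_test_metrics(aug_dir), load_test_metrics(no_aug_dir)
    if with_aug is None or without_aug is None:
        print(f"{model}: missing results for one of the runs")
        continue
    deltas = augmentation_delta(with_aug, without_aug)
    print(model, ", ".join(f"{k}: {v:+.2f} pp" for k, v in deltas.items()))
```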
