# 🧩 Maze Solving: PoH-HRM vs Baseline Transformer

**A/B Test on A100 GPU**

This notebook compares:
- **Baseline**: Standard Transformer encoder-decoder
- **PoH-HRM**: Pointer-over-Heads with Hierarchical Reasoning Module (f_L + f_H with T=4)

**Task**: Find the shortest path through randomly generated mazes.

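**Why Mazes?**
- Solving a maze requires multi-step planning: each move constrains the moves that follow.
- The task is expected to benefit from hierarchical reasoning, which is the hypothesis this A/B test checks.
- It is a challenging sequential decision-making problem.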
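
The task depends on a reference shortest path for each maze. As a minimal sketch of how such a path can be computed, the block below runs breadth-first search on a small grid. The grid encoding (`#` for walls, 4-connected moves), the `shortest_path` name, and the toy maze are illustrative assumptions and are not taken from the PoT repository, whose generator and solver may differ.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid is a list of equal-length strings; '#' marks a wall (assumed encoding).
    start and goal are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != '#' and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal is unreachable


# Toy 5x5 maze, for illustration only.
maze = [
    "#####",
    "#S..#",
    "#.#.#",
    "#..G#",
    "#####",
]
path = shortest_path(maze, (1, 1), (3, 3))
print(f"Shortest path uses {len(path) - 1} moves")
```

Because every move has the same cost, the first time breadth-first search reaches the goal it has found a shortest path, so no weighted search such as Dijkstra's algorithm is needed here.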


## Setup


In [None]:
# Clone repository
!git clone https://github.com/Eran-BA/PoT.git
%cd PoT


In [None]:
# Install dependencies
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install transformers datasets scipy numpy tqdm matplotlib seaborn


In [None]:
# Verify GPU
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
print(f"GPU Name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'}")
print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB" if torch.cuda.is_available() else "GPU Memory: N/A")


## Run Maze A/B Test


In [None]:
# Small mazes (11x11, 15x15)
!python experiments/maze_ab_test.py \
  --sizes 11 15 \
  --epochs 100 \
  --train-samples 2000 \
  --test-samples 400 \
  --R 4 \
  --T 4 \
  --n-heads 4


In [None]:
# Larger mazes (19x19, 25x25), where PoH-HRM is expected to show its largest gains
!python experiments/maze_ab_test.py \
  --sizes 19 25 \
  --epochs 150 \
  --train-samples 3000 \
  --test-samples 500 \
  --R 4 \
  --T 4 \
  --n-heads 4


## Visualize Results


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load results
df = pd.read_csv('experiments/results/maze_ab/maze_ab_R4_T4_nheads4.csv')

# Display
print("\n📊 Maze Solving Results")
print("="*80)
print(df.to_string())

# Plot success rates
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
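# Sort both groups by maze size so the paired bars line up
baseline_data = df[df['model'] == 'Baseline'].sort_values('maze_size')
poh_data = df[df['model'] == 'PoH-HRM'].sort_values('maze_size')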

x = range(len(baseline_data))
width = 0.35

plt.bar([i - width/2 for i in x], baseline_data['final_success'], width, label='Baseline', alpha=0.8)
plt.bar([i + width/2 for i in x], poh_data['final_success'], width, label='PoH-HRM', alpha=0.8)

plt.xlabel('Maze Size')
plt.ylabel('Success Rate')
plt.title('Maze Solving Success Rate by Model')
plt.xticks(x, [f"{int(s)}x{int(s)}" for s in baseline_data['maze_size']])
plt.legend()
plt.grid(axis='y', alpha=0.3)

plt.subplot(1, 2, 2)
plt.bar([i - width/2 for i in x], baseline_data['final_overlap'], width, label='Baseline', alpha=0.8)
plt.bar([i + width/2 for i in x], poh_data['final_overlap'], width, label='PoH-HRM', alpha=0.8)

plt.xlabel('Maze Size')
plt.ylabel('Path Overlap')
plt.title('Path Quality (Overlap with Optimal)')
plt.xticks(x, [f"{int(s)}x{int(s)}" for s in baseline_data['maze_size']])
plt.legend()
plt.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('maze_results.png', dpi=300, bbox_inches='tight')
plt.show()

# Calculate the relative improvement of PoH-HRM over the baseline, per maze size
print("\n🏆 PoH-HRM Improvements:")
print("="*80)
for idx, row in poh_data.iterrows():
    maze_size = row['maze_size']
    baseline_success = baseline_data[baseline_data['maze_size'] == maze_size]['final_success'].values[0]
    poh_success = row['final_success']
    if baseline_success > 0:
        improvement = f"{(poh_success - baseline_success) / baseline_success * 100:+.1f}%"
    else:
        improvement = "n/a"  # relative change is undefined when the baseline never succeeds
    print(f"Maze {int(maze_size)}x{int(maze_size)}: {improvement} relative improvement (Baseline: {baseline_success:.1%}, PoH: {poh_success:.1%})")
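
The right-hand plot reports `final_overlap`, described as overlap with the optimal path. The exact definition lives in `experiments/maze_ab_test.py` and is not shown in this notebook. As a rough sketch of one plausible definition, assuming paths are lists of `(row, col)` cells, the hypothetical `path_overlap` helper below returns the fraction of optimal-path cells that the predicted path also visits. The repository's actual metric may differ.

```python
def path_overlap(predicted, optimal):
    """Fraction of optimal-path cells that also appear in the predicted path.

    Hypothetical helper: this is an assumed definition, not the repository's.
    """
    if not optimal:
        return 0.0
    predicted_cells = set(predicted)
    hits = sum(1 for cell in optimal if cell in predicted_cells)
    return hits / len(optimal)


# Made-up paths for illustration: they share only the start and end cells.
optimal = [(0, 0), (0, 1), (0, 2), (1, 2)]
predicted = [(0, 0), (1, 0), (1, 1), (1, 2)]
print(path_overlap(predicted, optimal))  # 0.5
```

Under this definition a prediction can score a high overlap without being a valid solution, which is one reason to read the overlap plot alongside the success rate.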


## Download Results


In [None]:
from google.colab import files  # available only inside Google Colab

# Download CSV results
files.download('experiments/results/maze_ab/maze_ab_R4_T4_nheads4.csv')

# Download plot
files.download('maze_results.png')
