# Pointer-over-Heads Transformer (PoT) - Full Experimental Pipeline

**Dynamic multi-head attention with adaptive routing for dependency parsing**

**Author:** Eran Ben Artzy  
**License:** Apache 2.0  
**Repository:** https://github.com/Eran-BA/PoT

---

## What's New in This Version

✅ **Core Features:**
- Baseline vs PoH A/B comparison with parameter matching
- Multi-head routing (soft mixture, hard top-k)
- Adaptive halting (fixed, entropy, ACT-style)
- UAS and LAS evaluation with punctuation masking
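UAS counts tokens whose predicted head is correct. LAS additionally requires the correct dependency label. The sketch below shows one common way to compute both with punctuation masking. It assumes two things:

- punctuation is identified by the UPOS tag `PUNCT`;
- labels are compared verbatim.

The repository's evaluation code may use a different masking rule, or may compare only the universal part of the label.

```python
def attachment_scores(gold, pred, upos):
    """Return (UAS, LAS) as fractions over non-punctuation tokens.

    gold, pred: lists of (head, deprel) pairs, one per token.
    upos: UPOS tags; tokens tagged "PUNCT" are excluded (assumed masking rule).
    """
    total = uas_hits = las_hits = 0
    for (g_head, g_rel), (p_head, p_rel), tag in zip(gold, pred, upos):
        if tag == "PUNCT":
            continue  # punctuation masking: token is not scored
        total += 1
        if g_head == p_head:
            uas_hits += 1          # correct head
            if g_rel == p_rel:
                las_hits += 1      # correct head and label
    if total == 0:
        return 0.0, 0.0
    return uas_hits / total, las_hits / total


# Toy sentence "It saw dogs ." (heads are 1-based, 0 = root)
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "obj"), (0, "root"), (1, "obj"), (3, "punct")]
upos = ["PRON", "VERB", "NOUN", "PUNCT"]
print(attachment_scores(gold, pred, upos))  # UAS 2/3, LAS 1/3
```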

✅ **New Advanced Features:**
- **Deep Supervision**: Auxiliary losses on intermediate iterations
- **ACT-style Differentiable Halting**: Adaptive computation with ponder cost
- **Gradient Modes**: Full BPTT vs HRM-style last-iterate gradients
- **Comprehensive Logging**: All metrics, hyperparameters, and diagnostics

---

## Quick Navigation

1. **Setup** (5 min) - GPU check, clone repo, install dependencies, download data
2. **Smoke Test** (2 min) - Verify installation with dummy data
3. **Quick A/B Test** (10 min) - Baseline vs PoH on real UD data
4. **Core Ablations** (30 min) - Test iterations, routing, halting modes
5. **Advanced: Deep Supervision & ACT** (20 min) - Iterative refinement experiments
6. **Advanced: Gradient Modes** (15 min) - Full BPTT vs last-iterate
7. **Visualization** (5 min) - Generate plots and analysis
8. **Multi-Seed Robustness** (45 min) - Run best config with 3 seeds
9. **Results Summary** - Aggregate and package all results

**Total Runtime:** ~2.5 hours on A100 GPU

---
## 1. Setup (5 minutes)

### Check GPU

```python
!nvidia-smi
```

### Clone Repository

```python
!git clone https://github.com/Eran-BA/PoT.git
%cd PoT
```

### Install Dependencies

```python
!pip install -q -r requirements.txt
print("✓ Installation complete!")
```

### Download Universal Dependencies English EWT Dataset

```python
!mkdir -p ud_data
!wget -q https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu -O ud_data/en_ewt-ud-train.conllu
!wget -q https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu -O ud_data/en_ewt-ud-dev.conllu
!wget -q https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu -O ud_data/en_ewt-ud-test.conllu

!ls -lh ud_data/
!wc -l ud_data/*.conllu
print("\n✓ Dataset downloaded successfully!")
```
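`wc -l` counts lines, not sentences. As an optional sanity check, the minimal sketch below counts sentences and syntactic words per split. It is plain Python, not part of the PoT repository. Following the CoNLL-U format, it skips multiword-token ranges (IDs like `3-4`) and empty nodes (IDs like `5.1`).

```python
from pathlib import Path


def conllu_stats(lines):
    """Count sentences and syntactic words in CoNLL-U lines."""
    sentences = words = 0
    in_sentence = False
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():             # blank line ends a sentence
            if in_sentence:
                sentences += 1
            in_sentence = False
            continue
        if line.startswith("#"):         # sentence-level comments (sent_id, text, ...)
            continue
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue                     # multiword-token range or empty node
        words += 1
        in_sentence = True
    if in_sentence:                      # file without a trailing blank line
        sentences += 1
    return sentences, words


for split in ("train", "dev", "test"):
    path = Path("ud_data") / f"en_ewt-ud-{split}.conllu"
    if path.exists():
        with open(path, encoding="utf-8") as fh:
            n_sent, n_words = conllu_stats(fh)
        print(f"{split:>5}: {n_sent:,} sentences, {n_words:,} words")
    else:
        print(f"{split:>5}: {path} not found")
```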

---
## 2. Smoke Test (2 minutes)

Quick verification with dummy data:

```python
!python ab_ud_pointer_vs_baseline.py \
  --data_source dummy \
  --epochs 2 \
  --batch_size 8 \
  --max_inner_iters 2 \
  --log_csv smoke_test.csv

print("\n" + "="*80)
print("✓ Smoke test complete!")
print("="*80)
```

---
## 3. Quick A/B Test on Real Data (10 minutes)

Run a parameter-matched comparison:

```python
!python ab_ud_pointer_vs_baseline.py \
  --data_source conllu \
  --conllu_dir ud_data \
  --epochs 3 \
  --batch_size 32 \
  --lr 3e-5 \
  --max_inner_iters 1 \
  --routing_topk 0 \
  --halting_mode fixed \
  --param_match baseline \
  --log_csv quick_ab.csv
```
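To see the outcome without waiting for Section 7, a minimal sketch like the following prints the final-epoch PoH-minus-baseline gap. The column names `epoch`, `baseline_dev_uas`, `poh_dev_uas`, `baseline_dev_las` and `poh_dev_las` are the ones the analysis cells later in this notebook read. The helper `final_epoch_delta` is hypothetical, and the x100 conversion assumes scores are logged as fractions.

```python
import csv
import os


def final_epoch_delta(rows, metric="uas"):
    """Mean PoH-minus-baseline dev score gap, in points, at the last logged epoch.

    Assumes scores are logged as fractions in [0, 1] (hence the x100).
    """
    rows = list(rows)
    last = max(int(float(r["epoch"])) for r in rows)
    final = [r for r in rows if int(float(r["epoch"])) == last]
    base = sum(float(r[f"baseline_dev_{metric}"]) for r in final) / len(final)
    poh = sum(float(r[f"poh_dev_{metric}"]) for r in final) / len(final)
    return (poh - base) * 100


if os.path.exists("quick_ab.csv"):
    with open("quick_ab.csv", newline="") as fh:
        rows = list(csv.DictReader(fh))
    for metric in ("uas", "las"):
        if rows and f"poh_dev_{metric}" in rows[0]:
            gap = final_epoch_delta(rows, metric)
            print(f"PoH - baseline dev {metric.upper()}: {gap:+.2f} points")
```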

---
## 4. Core Ablations (30 minutes)

### 4.1 Iterations Ablation

```python
for iters in [1, 2, 3, 5]:
    print(f"\nTesting {iters} inner iterations...")
    !python ab_ud_pointer_vs_baseline.py \
      --data_source conllu --conllu_dir ud_data \
      --epochs 3 --batch_size 32 --lr 3e-5 \
      --max_inner_iters {iters} --routing_topk 0 \
      --halting_mode fixed --param_match baseline \
      --log_csv ablation_iterations.csv

print("\n✓ Iterations ablation complete!")
```
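All four runs append to the same CSV. The hypothetical helper below averages final-epoch PoH dev UAS per iteration count. It assumes the script logs each run's `max_inner_iters` as a column; that column name is a guess, so check the CSV header first.

```python
import csv
import os
from collections import defaultdict


def summarize_by(rows, key, metric="poh_dev_uas"):
    """Average the final-epoch `metric` for each distinct value of column `key`."""
    rows = list(rows)
    last = max(int(float(r["epoch"])) for r in rows)
    groups = defaultdict(list)
    for r in rows:
        if int(float(r["epoch"])) == last:
            groups[r[key]].append(float(r[metric]))
    return {value: sum(vals) / len(vals) for value, vals in groups.items()}


KEY = "max_inner_iters"  # hypothetical column name: check the CSV header first
if os.path.exists("ablation_iterations.csv"):
    with open("ablation_iterations.csv", newline="") as fh:
        rows = list(csv.DictReader(fh))
    if rows and KEY in rows[0]:
        for value, uas in sorted(summarize_by(rows, KEY).items()):
            print(f"{KEY}={value}: PoH dev UAS {uas:.4f}")
    elif rows:
        print(f"No {KEY!r} column; available columns: {list(rows[0])}")
```

The same helper can summarize `ablation_routing.csv` and `ablation_halting.csv`. Swap `KEY` for whichever columns record the routing and halting settings.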

### 4.2 Routing Ablation

```python
for topk in [0, 2]:
    mode = "soft" if topk == 0 else f"top{topk}"
    print(f"\nTesting {mode} routing...")
    !python ab_ud_pointer_vs_baseline.py \
      --data_source conllu --conllu_dir ud_data \
      --epochs 3 --batch_size 32 --lr 3e-5 \
      --max_inner_iters 1 --routing_topk {topk} \
      --halting_mode fixed --param_match baseline \
      --log_csv ablation_routing.csv

print("\n✓ Routing ablation complete!")
```
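`--routing_topk 0` selects a soft mixture over heads, and `--routing_topk 2` keeps only the two highest-scoring heads. The sketch below illustrates the general idea in plain Python:

1. Compute softmax router weights over the heads.
2. Optionally keep only the top-k weights and renormalize.
3. Take a weighted sum of the per-head outputs.

This is a generic illustration, not the PoT router. It does not show how the router logits are produced or how gradients pass through the hard selection.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_heads(head_outputs, logits, topk=0):
    """Mix per-head output vectors with router weights.

    topk=0: soft mixture over all heads.
    topk=k: keep the k highest-weighted heads and renormalize (hard top-k).
    """
    weights = softmax(logits)
    if topk > 0:
        keep = set(sorted(range(len(weights)), key=lambda h: weights[h], reverse=True)[:topk])
        masked = [w if h in keep else 0.0 for h, w in enumerate(weights)]
        total = sum(masked)
        weights = [w / total for w in masked]
    dim = len(head_outputs[0])
    mixed = [sum(w * out[d] for w, out in zip(weights, head_outputs)) for d in range(dim)]
    return mixed, weights


# 4 heads with one-hot outputs, so the mixed vector equals the routing weights
heads = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
router_logits = [2.0, 1.0, 0.0, -1.0]
print("soft :", [round(w, 3) for w in route_heads(heads, router_logits, topk=0)[1]])
print("top-2:", [round(w, 3) for w in route_heads(heads, router_logits, topk=2)[1]])
```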

### 4.3 Halting Mode Ablation

```python
for mode in ["fixed", "entropy"]:
    print(f"\nTesting {mode} halting...")
    !python ab_ud_pointer_vs_baseline.py \
      --data_source conllu --conllu_dir ud_data \
      --epochs 3 --batch_size 32 --lr 3e-5 \
      --max_inner_iters 3 --routing_topk 0 \
      --halting_mode {mode} --ent_threshold 0.7 \
      --param_match baseline --log_csv ablation_halting.csv

print("\n✓ Halting mode ablation complete!")
```
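Entropy-based halting stops refining once the model is confident. The minimal sketch below assumes the criterion is the normalized Shannon entropy of a per-token distribution (for example the routing weights), compared against `--ent_threshold`. Both the choice of distribution and the normalization are assumptions here, not confirmed behavior of the script.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so it lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


def should_halt(probs, threshold=0.7):
    """Halt once the distribution is confident enough (low normalized entropy)."""
    return normalized_entropy(probs) < threshold


def iterations_used(per_iter_probs, threshold=0.7):
    """Number of inner iterations run before halting (capped at the list length)."""
    for step, probs in enumerate(per_iter_probs, start=1):
        if should_halt(probs, threshold):
            return step
    return len(per_iter_probs)


# Illustrative distributions after iterations 1, 2, 3 (max_inner_iters = 3)
trace = [[0.25, 0.25, 0.25, 0.25], [0.97, 0.01, 0.01, 0.01], [0.99, 0.0, 0.0, 0.01]]
print(iterations_used(trace))  # halts after iteration 2
```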

---
## 5. Advanced: Deep Supervision & ACT (20 minutes)

### 5.1 Deep Supervision

```python
print("Testing deep supervision...\n")
!python ab_ud_pointer_vs_baseline.py \
  --data_source conllu --conllu_dir ud_data \
  --epochs 3 --batch_size 32 --lr 3e-5 \
  --max_inner_iters 3 --routing_topk 0 \
  --halting_mode fixed --deep_supervision \
  --ramp_strength 1.0 --param_match baseline \
  --log_csv deep_supervision.csv
```
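Deep supervision attaches a loss to every inner iteration instead of only the last. The sketch below combines per-iteration losses with weights that grow linearly toward the final iterate. Reading `--ramp_strength` as the slope of that ramp is an assumption (0 gives a plain average). The helper names are hypothetical.

```python
def deep_supervision_weights(num_iters, ramp_strength=1.0):
    """Hypothetical loss weights rising linearly from the first to the last iterate.

    ramp_strength=0 gives a plain average; larger values favor later iterations.
    """
    if num_iters == 1:
        return [1.0]
    raw = [1.0 + ramp_strength * t / (num_iters - 1) for t in range(num_iters)]
    total = sum(raw)
    return [w / total for w in raw]


def deep_supervision_loss(per_iter_losses, ramp_strength=1.0):
    """Weighted sum of the parsing losses computed after each inner iteration."""
    weights = deep_supervision_weights(len(per_iter_losses), ramp_strength)
    return sum(w * loss for w, loss in zip(weights, per_iter_losses))


# Illustrative losses after iterations 1..3 (max_inner_iters = 3)
losses = [1.2, 0.9, 0.6]
print(deep_supervision_weights(3, 1.0))  # weights 2/9, 3/9, 4/9
print(deep_supervision_loss(losses, 1.0))
```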

### 5.2 ACT-style Differentiable Halting

```python
print("Testing ACT-style halting...\n")
!python ab_ud_pointer_vs_baseline.py \
  --data_source conllu --conllu_dir ud_data \
  --epochs 3 --batch_size 32 --lr 3e-5 \
  --max_inner_iters 5 --routing_topk 0 \
  --halting_mode fixed --act_halting \
  --ponder_coef 1e-3 --param_match baseline \
  --log_csv act_halting.csv
```
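In Adaptive Computation Time as described by Graves (2016), each iteration emits a halting probability. Computation stops at the first iteration N where the cumulative probability reaches 1 - eps, or at the last allowed iteration. That final iteration receives the remainder R, and the ponder cost N + R is added to the loss, scaled by a small coefficient.

The sketch below implements that rule for a single token in plain Python. `eps=0.01`, the example probabilities and the task loss are illustrative. The script's own ACT implementation may differ in its details.

```python
def act_halting(halt_probs, eps=0.01):
    """Graves-style ACT for a single token.

    halt_probs: halting probability emitted after each inner iteration
                (length = max_inner_iters).
    Returns (weights over the executed iterations, ponder cost N + R).
    """
    if not halt_probs:
        raise ValueError("halt_probs must be non-empty")
    cumulative = 0.0
    weights = []
    for n, h in enumerate(halt_probs, start=1):
        if cumulative + h >= 1.0 - eps or n == len(halt_probs):
            remainder = 1.0 - cumulative   # R: leftover probability mass
            weights.append(remainder)
            return weights, n + remainder  # ponder cost N + R
        cumulative += h
        weights.append(h)


weights, ponder = act_halting([0.2, 0.5, 0.4, 0.9, 0.9])
states = [0.10, 0.40, 0.45]  # illustrative per-iteration outputs for this token
output = sum(w * s for w, s in zip(weights, states))  # ACT mixes states with the weights
ponder_coef = 1e-3           # same value as --ponder_coef above
task_loss = 0.85             # illustrative
total_loss = task_loss + ponder_coef * ponder
print(len(weights), round(ponder, 3), round(total_loss, 5))
```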

### 5.3 Combined: Deep Supervision + ACT

```python
print("Testing combined deep supervision + ACT...\n")
!python ab_ud_pointer_vs_baseline.py \
  --data_source conllu --conllu_dir ud_data \
  --epochs 3 --batch_size 32 --lr 3e-5 \
  --max_inner_iters 5 --routing_topk 0 \
  --halting_mode fixed --deep_supervision \
  --act_halting --ramp_strength 1.0 \
  --ponder_coef 1e-3 --param_match baseline \
  --log_csv combined_refinement.csv
```

---
## 6. Advanced: Gradient Modes (15 minutes)

### 6.1 Full BPTT

```python
print("Testing full BPTT...\n")
!python ab_ud_pointer_vs_baseline.py \
  --data_source conllu --conllu_dir ud_data \
  --epochs 3 --batch_size 32 --lr 3e-5 \
  --max_inner_iters 3 --routing_topk 0 \
  --halting_mode fixed --grad_mode full \
  --param_match baseline --log_csv grad_full.csv
```

### 6.2 HRM-style Last-Iterate

```python
print("Testing HRM-style last-iterate gradients...\n")
!python ab_ud_pointer_vs_baseline.py \
  --data_source conllu --conllu_dir ud_data \
  --epochs 3 --batch_size 32 --lr 3e-5 \
  --max_inner_iters 3 --routing_topk 0 \
  --halting_mode fixed --grad_mode last \
  --param_match baseline --log_csv grad_last.csv
```
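The difference between the two gradient modes shows up even on a scalar toy recurrence `x_{t+1} = w * x_t + b` with loss `0.5 * (x_T - target)^2`:

- Full BPTT differentiates through every iterate.
- The last-iterate mode treats `x_{T-1}` as a constant, so only the final step contributes.

This hand-derived illustration assumes `--grad_mode last` detaches all earlier iterates. It is not the script's code.

```python
def unroll(w, b, x0, steps):
    """Iterate x_{t+1} = w * x_t + b and return [x_0, ..., x_T]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(w * xs[-1] + b)
    return xs


def grad_w(w, b, x0, steps, target, mode="full"):
    """dL/dw for L = 0.5 * (x_T - target)^2 under two gradient modes (steps >= 1)."""
    xs = unroll(w, b, x0, steps)
    dloss_dxT = xs[-1] - target
    if mode == "last":
        # Earlier iterates are treated as constants (detached): only the final
        # step x_T = w * x_{T-1} + b contributes, so dx_T/dw = x_{T-1}.
        return dloss_dxT * xs[-2]
    # Full BPTT: propagate dx_t/dw through every step,
    # dx_{t+1}/dw = x_t + w * dx_t/dw with dx_0/dw = 0.
    dx_dw = 0.0
    for x_prev in xs[:-1]:
        dx_dw = x_prev + w * dx_dw
    return dloss_dxT * dx_dw


print("full:", grad_w(0.5, 1.0, 0.0, steps=3, target=2.0, mode="full"))  # full: -0.5
print("last:", grad_w(0.5, 1.0, 0.0, steps=3, target=2.0, mode="last"))  # last: -0.375
```

With one inner iteration the two modes coincide. With more iterations, the last-iterate gradient is a cheaper approximation that ignores how `w` shaped the earlier iterates.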

---
## 7. Visualization (5 minutes)

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import glob

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

# Load all CSV files
csv_files = glob.glob("*.csv")
print(f"Found {len(csv_files)} result files")

for f in csv_files:
    print(f"  - {f}")
```

### Plot: Configuration Comparison

```python
# Compare all configurations (final-epoch dev UAS)
configs = []
uas_vals = []

files_to_check = [
    ("quick_ab.csv", "Baseline", "baseline_dev_uas"),
    ("quick_ab.csv", "PoH Basic", "poh_dev_uas"),
    ("deep_supervision.csv", "PoH + Deep Sup", "poh_dev_uas"),
    ("act_halting.csv", "PoH + ACT", "poh_dev_uas"),
    ("combined_refinement.csv", "PoH + Deep Sup + ACT", "poh_dev_uas"),
]

for fname, label, col in files_to_check:
    try:
        df = pd.read_csv(fname)
    except (FileNotFoundError, pd.errors.EmptyDataError):
        print(f"⚠ Skipping {label}: {fname} missing or empty")
        continue
    if 'epoch' not in df.columns or col not in df.columns:
        print(f"⚠ Skipping {label}: expected columns not found in {fname}")
        continue
    df_final = df[df['epoch'] == df['epoch'].max()]
    configs.append(label)
    uas_vals.append(df_final[col].mean())

if configs:
    plt.figure(figsize=(10, 6))
    # Highlight the baseline bar; all PoH variants share one color
    colors = ['coral' if c == 'Baseline' else 'skyblue' for c in configs]
    plt.bar(configs, uas_vals, color=colors, alpha=0.8, edgecolor='black')
    plt.ylabel('Dev UAS', fontsize=12)
    plt.title('Configuration Comparison', fontsize=14, fontweight='bold')
    plt.xticks(rotation=15, ha='right')
    plt.grid(axis='y', alpha=0.3)

    for i, v in enumerate(uas_vals):
        plt.text(i, v + 0.001, f'{v:.3f}', ha='center', fontweight='bold')

    plt.tight_layout()
    plt.savefig('plot_comparison.png', dpi=150)
    plt.show()
    print("✓ Saved plot_comparison.png")
else:
    print("⚠ No data to plot")
```

---
## 8. Multi-Seed Robustness (45 minutes)

```python
# Reuses the quick A/B settings (1 inner iteration, soft routing, fixed halting).
# Edit the flags to match the best configuration found in Sections 4-6.
print("Running multi-seed evaluation\n")

for seed in [42, 1337, 2023]:
    print(f"\n{'='*80}")
    print(f"Seed: {seed}")
    print(f"{'='*80}")
    !python ab_ud_pointer_vs_baseline.py \
      --data_source conllu --conllu_dir ud_data \
      --epochs 5 --batch_size 32 --lr 3e-5 \
      --max_inner_iters 1 --routing_topk 0 \
      --halting_mode fixed --param_match baseline \
      --seed {seed} --log_csv multiseed_results.csv

print("\n✓ Multi-seed evaluation complete!")
```

### Analyze Multi-Seed Results

```python
import pandas as pd  # repeated so this cell also runs on its own

df = pd.read_csv('multiseed_results.csv')
df_final = df[df['epoch'] == df['epoch'].max()]

print("\n" + "="*80)
print("MULTI-SEED ROBUSTNESS ANALYSIS")
print("="*80)

for model in ['baseline', 'poh']:
    uas_col = f'{model}_dev_uas'
    las_col = f'{model}_dev_las'

    uas_mean = df_final[uas_col].mean()
    uas_std = df_final[uas_col].std()
    las_mean = df_final[las_col].mean()
    las_std = df_final[las_col].std()

    print(f"\n{model.upper()}:")
    print(f"  UAS: {uas_mean:.4f} ± {uas_std:.4f}")
    print(f"  LAS: {las_mean:.4f} ± {las_std:.4f}")

# Assumes UAS is logged as a fraction in [0, 1]; x100 converts the gap to UAS points.
# The explicit sign (+/-) avoids printing "+-0.50" when PoH is worse.
uas_gap = (df_final['poh_dev_uas'].mean() - df_final['baseline_dev_uas'].mean()) * 100
print(f"\n📊 PoH vs baseline: {uas_gap:+.2f} UAS points (mean over seeds)")
print("="*80)
```
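With three seeds, the spread of the per-seed gap matters as much as its mean. Each CSV row holds both models' scores for one seed, so the gaps can be paired row by row. The sketch below uses only the standard library to compute the mean gap in UAS points and a 95% Student-t interval. The hard-coded critical values cover up to 10 seeds, and the example numbers are illustrative, not results.

```python
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n_seeds - 1)
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def paired_gap_ci(baseline, poh):
    """Mean per-seed PoH - baseline gap in points, with a 95% t-interval.

    baseline[i] and poh[i] must come from the same seed; scores are fractions in [0, 1].
    """
    if len(baseline) != len(poh) or len(baseline) < 2:
        raise ValueError("need at least two paired seeds")
    dof = len(baseline) - 1
    if dof not in T_CRIT_95:
        raise ValueError("extend T_CRIT_95 for more than 10 seeds")
    diffs = [(p - b) * 100 for b, p in zip(baseline, poh)]
    mean = statistics.mean(diffs)
    half_width = T_CRIT_95[dof] * statistics.stdev(diffs) / len(diffs) ** 0.5
    return mean, (mean - half_width, mean + half_width)


# Illustrative numbers only, not results
mean, (lo, hi) = paired_gap_ci([0.80, 0.82, 0.81], [0.82, 0.83, 0.83])
print(f"gap {mean:+.2f} points, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

In this notebook, the function could be called as `paired_gap_ci(df_final['baseline_dev_uas'].tolist(), df_final['poh_dev_uas'].tolist())`. With only three seeds the interval is wide, so a positive mean gap on its own is weak evidence.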

---
## 9. Results Summary

```python
print("\n" + "="*80)
print("COMPREHENSIVE RESULTS SUMMARY")
print("="*80)

print("\n📊 Generated CSV files:")
!ls -lh *.csv

print("\n📈 Generated plots:")
!ls -lh *.png 2>/dev/null || echo "No plots generated yet"

print("\n📦 Creating results package...")
!zip -q pot_colab_results.zip *.csv *.png 2>/dev/null || echo "⚠ zip reported a problem (for example, no PNG files); check the listing below"
!ls -lh pot_colab_results.zip 2>/dev/null || echo "Package not created"

print("\n✅ All experiments complete!")
print("\n📥 Download 'pot_colab_results.zip' for full results")
print("="*80)
```
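If `zip` is unavailable or the shell glob fails, the archive can also be built from Python with the standard `zipfile` module. Inside Colab, the `google.colab.files.download` helper then triggers a browser download. This is a minimal sketch, and the `package_results` helper is hypothetical.

```python
import glob
import zipfile


def package_results(archive="pot_colab_results.zip", patterns=("*.csv", "*.png")):
    """Zip every file matching the patterns; return the sorted list of archived names."""
    files = sorted({f for pattern in patterns for f in glob.glob(pattern)})
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f)
    return files


archived = package_results()
print(f"Archived {len(archived)} files into pot_colab_results.zip")

try:
    from google.colab import files as colab_files  # only available inside Colab
    colab_files.download("pot_colab_results.zip")
except ImportError:
    print("Not running in Colab; download pot_colab_results.zip manually.")
```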

---

## 🎉 Experiments Complete!

**What we tested:**
- ✅ Baseline vs PoH comparison with parameter matching
- ✅ Core ablations (iterations, routing, halting)
- ✅ Advanced features (deep supervision, ACT, gradient modes)
- ✅ Multi-seed robustness evaluation
- ✅ Comprehensive visualization and analysis

**Next Steps:**
1. Download `pot_colab_results.zip` for all results
2. Analyze CSV files for detailed metrics
3. Use the plots as a starting point for publication figures

**For more information:**
- See `DEEP_SUPERVISION_GUIDE.md` for implementation details
- See `GRADIENT_MODES_THEORY.md` for mathematical foundations
- Visit https://github.com/Eran-BA/PoT for documentation
