# 🧠 PoH vs BERT: Natural Language Inference Benchmark
**Eran Ben-Artzy — October 2025**

This notebook benchmarks **BERT** against the **PoH Transformer** on a Natural Language Inference (NLI) task.

---

## Task: Natural Language Inference

Given a **premise** and **hypothesis**, classify their relationship:
- ✅ **Entailment**: hypothesis follows from premise
- ⚖️ **Neutral**: hypothesis may or may not be true given the premise
- ❌ **Contradiction**: hypothesis contradicts premise

**Example:**
- Premise: "A man is playing guitar"
- Hypothesis: "A musician is performing" → **Entailment**
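As a minimal sketch of how one such example could be represented in code, the snippet below pairs a premise and hypothesis with one of the three labels. The `NLIExample` type and the label-to-id order are illustrative assumptions, not taken from the repository, so check the benchmark scripts for the mapping they actually use.

```python
from typing import NamedTuple

# Label order is an assumption made for illustration only.
LABELS = ("entailment", "neutral", "contradiction")
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


class NLIExample(NamedTuple):
    """One premise/hypothesis pair with its relationship label (hypothetical type)."""
    premise: str
    hypothesis: str
    label: str

    def label_id(self) -> int:
        # Integer class id, as a classifier head would typically expect.
        return LABEL_TO_ID[self.label]


example = NLIExample(
    premise="A man is playing guitar",
    hypothesis="A musician is performing",
    label="entailment",
)
print(example.label, example.label_id())
```

Keeping the string label alongside the integer id makes logged predictions easier to read than raw class indices.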


## 1️⃣ Setup


In [None]:
!pip install torch torchvision torchaudio pandas matplotlib seaborn pyyaml tqdm --quiet

# Clone repository
!git clone https://github.com/Eran-BA/PoT.git
%cd PoT

# Optional: Update to latest
# !git pull


## 2️⃣ Quick Benchmark (100 steps, ~3 minutes)


In [None]:
!PYTHONPATH=$PWD python experiments/quick_nli_test.py


## 3️⃣ Full Benchmark (10K steps, ~30 minutes) - Optional


In [None]:
# Uncomment to run full benchmark
# !PYTHONPATH=$PWD python experiments/fair_ab_nli.py


## ✅ Summary

This benchmark compares:

### Models
- **BERT-Base**: Standard transformer encoder (12 layers, 768 dim, 12 heads)
- **PoH**: Same architecture + adaptive head routing + iterative refinement

### Key Features
- ✅ Fair comparison (matched parameter counts and hyperparameters)
- ✅ Synthetic NLI data (no external dependencies)
- ✅ 3-way classification (entailment/neutral/contradiction)
- ✅ Automatic result logging
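To make "synthetic NLI data" concrete, here is one hypothetical way to generate 3-way labelled pairs from random token ids using only the standard library. The construction rules, `VOCAB_SIZE`, and `NEG_TOKEN` are all assumptions chosen for illustration; the generator used by the benchmark scripts may work quite differently.

```python
import random

LABELS = ("entailment", "neutral", "contradiction")
VOCAB_SIZE = 100  # hypothetical: content-token ids are 1..99
NEG_TOKEN = 0     # hypothetical id reserved as a "negation" marker


def make_example(rng, length=8):
    """Return (premise, hypothesis, label_id) built from random token ids."""
    premise = [rng.randrange(1, VOCAB_SIZE) for _ in range(length)]
    label_id = rng.randrange(len(LABELS))
    half = length // 2
    if label_id == 0:
        # Entailment: the hypothesis repeats the first half of the premise.
        hypothesis = premise[:half]
    elif label_id == 1:
        # Neutral: the hypothesis is drawn independently of the premise.
        hypothesis = [rng.randrange(1, VOCAB_SIZE) for _ in range(half)]
    else:
        # Contradiction: negation marker followed by the first half of the premise.
        hypothesis = [NEG_TOKEN] + premise[:half]
    return premise, hypothesis, label_id


def make_dataset(n, seed=0):
    """Build n examples from a seeded generator so runs are reproducible."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


dataset = make_dataset(300)
```

Because labels are drawn uniformly, a classifier that always predicts one class would score about one third on data like this, which is a useful floor when reading benchmark accuracies.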

### Expected Outcome
PoH is expected to achieve **higher accuracy** than the BERT baseline by leveraging:
- Adaptive head routing (focus on relevant attention patterns)
- Iterative refinement (multi-step reasoning)
- Outer residuals (stable gradient flow)
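The three mechanisms above can be illustrated with a toy loop in plain Python. This is a sketch under stated assumptions, not the PoH implementation: the function `route_and_refine`, the dot-product router, and the reading of "outer residual" as a skip connection around each refinement step are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_refine(x, heads, router, steps=3):
    """Toy loop: re-weight head outputs at each step and add them back to x.

    x      : state vector (list of floats)
    heads  : list of callables mapping a state vector to a same-length vector
    router : one weight row per head, scored against the state (hypothetical)
    """
    weights = [1.0 / len(heads)] * len(heads)
    for _ in range(steps):
        # Adaptive routing: score each head from the current state.
        scores = [sum(w * v for w, v in zip(row, x)) for row in router]
        weights = softmax(scores)
        outputs = [head(x) for head in heads]
        # Mix head outputs by their routing weights.
        mixed = [
            sum(wt * out[i] for wt, out in zip(weights, outputs))
            for i in range(len(x))
        ]
        # Residual around the whole step: the previous state is carried forward.
        x = [xi + mi for xi, mi in zip(x, mixed)]
    return x, weights


# Two toy heads that scale the state in opposite directions.
heads = [lambda v: [0.1 * a for a in v], lambda v: [-0.1 * a for a in v]]
router = [[1.0, 0.0], [0.0, 1.0]]
state, weights = route_and_refine([1.0, 2.0], heads, router, steps=3)
print(state, weights)
```

In this toy, the routing weights are recomputed from the updated state on every pass, so the mix of heads can change between refinement steps, which is the behaviour the bullet points describe at a high level.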

---

**Author:** Eran Ben-Artzy  
**License:** Apache 2.0  
**Year:** 2025
