# 🐉 BDH Interpretability Suite: Full Training & Merge Pipeline

This notebook runs the complete pipeline for the KRITI 2026 AI Interpretability Challenge:

1. Download Europarl data (English-French + English-Portuguese)
2. Train French specialist model
3. Train Portuguese specialist model (same architecture!)
4. Merge both into a polyglot model
5. Evaluate all three models on both languages
6. Generate frontend visualization data

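**Requirements:** Google Colab with a GPU (T4 or better; an A100 or L4 is preferable, because the configs below use `bfloat16`, which the T4 does not support natively)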

In [None]:
# Check GPU
!nvidia-smi
import torch
print(f'PyTorch: {torch.__version__}')
print(f'CUDA: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
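The training configs below set `dtype: "bfloat16"`. Native bfloat16 support on NVIDIA GPUs starts at compute capability 8.0 (Ampere), while a T4 is 7.5. The sketch below encodes only that threshold; how `train.py` itself behaves on older GPUs is not something it checks. `has_native_bf16` is a hypothetical helper.

```python
def has_native_bf16(capability: tuple[int, int]) -> bool:
    """True for compute capability 8.0+ (Ampere and newer), where bfloat16 is native."""
    return capability >= (8, 0)


try:
    import torch

    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)  # e.g. (7, 5) on a T4
        if has_native_bf16(cap):
            print(f"Compute capability {cap}: native bfloat16 available")
        else:
            print(f"Compute capability {cap}: bfloat16 may be emulated or unsupported; expect slower training")
    else:
        print("No CUDA device found; switch the Colab runtime to GPU")
except ImportError:
    print("PyTorch is not installed")
```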

In [None]:
# Clone repo (or upload your zip)
# !git clone https://github.com/YOUR_USERNAME/BDH_Pathway-monosemanticity-architecture.git
# %cd BDH_Pathway-monosemanticity-architecture

# Or if uploading zip:
# from google.colab import files
# uploaded = files.upload()  # Upload your zip
# !unzip BDH_Pathway-monosemanticity-architecture.zip
# %cd BDH_Pathway-monosemanticity-architecture

In [None]:
!pip install -q pyyaml numpy torch

## 1. Download Europarl Data

In [None]:
!python training/download_europarl.py --languages en-fr en-pt --output data/

In [None]:
# Verify data
import os
for lang in ['en-fr', 'en-pt']:
    for split in ['train.bin', 'val.bin']:
        path = f'data/{lang}/{split}'
        if os.path.exists(path):
            size_mb = os.path.getsize(path) / 1024 / 1024
            print(f'  ✓ {path}: {size_mb:.1f} MB')
        else:
            print(f'  ✗ {path}: NOT FOUND')
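Beyond checking that the files exist, it helps to eyeball their contents. This sketch assumes the `.bin` files store one byte per token (plausible given `vocab_size: 256`, but not confirmed by `download_europarl.py` here); `preview_bin` is a hypothetical helper.

```python
import os


def preview_bin(path: str, n_bytes: int = 300) -> str:
    """Decode the first n_bytes of a byte-level .bin file as UTF-8 (hypothetical helper).

    errors="replace" turns a multi-byte character cut off at the boundary into U+FFFD.
    """
    with open(path, "rb") as f:
        raw = f.read(n_bytes)
    return raw.decode("utf-8", errors="replace")


for lang in ["en-fr", "en-pt"]:
    path = f"data/{lang}/val.bin"
    if os.path.exists(path):
        print(f"--- {path} ---")
        print(preview_bin(path, 200))
```

If the preview looks like noise rather than parliamentary text, the files probably use a wider token dtype and the one-byte assumption does not hold.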

## 2. Train French Specialist

Architecture: 6 layers, 192 embedding dim, 4 heads, 64× MLP multiplier
→ 64 × 192 / 4 = 3,072 neurons per head
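The per-head neuron count follows from N = mlp_multiplier × n_embd / n_head, the formula used in the reference BDH implementation. Assuming this repo's model computes N the same way, the arithmetic is (`neurons_per_head` is a hypothetical helper):

```python
def neurons_per_head(n_embd: int, n_head: int, mlp_multiplier: int) -> int:
    """N = mlp_multiplier * n_embd / n_head (assumed to match this repo's model code)."""
    total = mlp_multiplier * n_embd
    if total % n_head != 0:
        raise ValueError(f"{total} neurons do not split evenly across {n_head} heads")
    return total // n_head


n = neurons_per_head(n_embd=192, n_head=4, mlp_multiplier=64)
print(f"Neurons per head: {n:,}")               # 3,072
print(f"Neurons across all heads: {4 * n:,}")   # 12,288
print(f"Per head after merge (2N): {2 * n:,}")  # 6,144
```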

In [None]:
# Write French config (matching the checked-in french.yaml)
french_config = """
train_data: "data/en-fr/train.bin"
val_data: "data/en-fr/val.bin"

# Model architecture
n_layer: 6
n_embd: 192
n_head: 4
mlp_multiplier: 64
dropout: 0.1
vocab_size: 256

# Training
batch_size: 16
block_size: 256
max_iters: 5000
learning_rate: 1.0e-3
min_lr: 1.0e-4
warmup_iters: 500
weight_decay: 0.1
grad_clip: 1.0
gradient_accumulation_steps: 8

log_interval: 100
eval_interval: 500
save_interval: 2500
eval_iters: 100

output_dir: "checkpoints"
run_name: "french_specialist"

device: "cuda"
dtype: "bfloat16"
compile_model: false
"""

os.makedirs('training/configs', exist_ok=True)
with open('training/configs/french_colab.yaml', 'w') as f:
    f.write(french_config)
print('French config saved!')
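With `gradient_accumulation_steps: 8`, each optimizer step sees more data than `batch_size` alone suggests. Assuming `batch_size` is counted per micro-step (the nanoGPT convention; check `train.py`) and one token per byte, the training budget works out as below. `training_budget` is a hypothetical helper.

```python
import os


def training_budget(batch_size, block_size, grad_accum, max_iters, train_bytes=None):
    """Return (tokens per optimizer step, total tokens, approx. epochs or None)."""
    per_step = batch_size * block_size * grad_accum
    total = per_step * max_iters
    epochs = total / train_bytes if train_bytes else None
    return per_step, total, epochs


train_path = "data/en-fr/train.bin"
size = os.path.getsize(train_path) if os.path.exists(train_path) else None
per_step, total, epochs = training_budget(16, 256, 8, 5000, train_bytes=size)
print(f"Tokens per step: {per_step:,}")  # 32,768
print(f"Total tokens:    {total:,}")     # 163,840,000
if epochs is not None:
    print(f"Approx. epochs over train.bin: {epochs:.2f}")
```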

In [None]:
# Train French model
!python training/train.py --config training/configs/french_colab.yaml
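The config sets `learning_rate`, `min_lr`, and `warmup_iters` but does not name the decay shape. A common choice (used by nanoGPT) is linear warmup followed by cosine decay to `min_lr`; the sketch below assumes that shape, which may not be what `train.py` actually implements. `lr_at` is a hypothetical helper.

```python
import math


def lr_at(it, learning_rate=1.0e-3, min_lr=1.0e-4, warmup_iters=500, max_iters=5000):
    """Linear warmup, then cosine decay to min_lr (assumed schedule, not read from train.py)."""
    if it < warmup_iters:
        return learning_rate * (it + 1) / warmup_iters
    if it >= max_iters:
        return min_lr
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    return min_lr + 0.5 * (1.0 + math.cos(math.pi * progress)) * (learning_rate - min_lr)


for it in (0, 250, 500, 2750, 4999):
    print(f"iter {it:>5}: lr = {lr_at(it):.2e}")
```

Under this assumption the rate peaks at 1.0e-3 when warmup ends and sits halfway between the two bounds (5.5e-4) at iteration 2750.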

## 3. Train Portuguese Specialist

⚠️ **Architecture MUST match French exactly:** same n_layer, n_embd, n_head, mlp_multiplier, and vocab_size.
The merge concatenates neuron spaces and averages embeddings and lm_head, which only works when the tensor shapes line up.

In [None]:
# Write Portuguese config: identical architecture to French!
portuguese_config = """
train_data: "data/en-pt/train.bin"
val_data: "data/en-pt/val.bin"

# MUST MATCH FRENCH MODEL EXACTLY!
n_layer: 6
n_embd: 192
n_head: 4
mlp_multiplier: 64
dropout: 0.1
vocab_size: 256

# Training (same schedule)
batch_size: 16
block_size: 256
max_iters: 5000
learning_rate: 1.0e-3
min_lr: 1.0e-4
warmup_iters: 500
weight_decay: 0.1
grad_clip: 1.0
gradient_accumulation_steps: 8

log_interval: 100
eval_interval: 500
save_interval: 2500
eval_iters: 100

output_dir: "checkpoints"
run_name: "portuguese_specialist"

device: "cuda"
dtype: "bfloat16"
compile_model: false
"""

with open('training/configs/portuguese_colab.yaml', 'w') as f:
    f.write(portuguese_config)
print('Portuguese config saved!')
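A shape mismatch between the specialists would only surface at merge time, after both training runs. A cheap pre-flight comparison of the two configs can save a wasted run. This sketch checks n_layer, n_embd, n_head, mlp_multiplier, and vocab_size (the last because embeddings and lm_head are averaged); `arch_mismatches` is a hypothetical helper, not part of the repo.

```python
ARCH_KEYS = ("n_layer", "n_embd", "n_head", "mlp_multiplier", "vocab_size")


def arch_mismatches(cfg_a: dict, cfg_b: dict, keys=ARCH_KEYS) -> dict:
    """Return {key: (value_a, value_b)} for every architecture key that differs."""
    return {k: (cfg_a.get(k), cfg_b.get(k)) for k in keys if cfg_a.get(k) != cfg_b.get(k)}


# Example using the architecture values from the two configs above
fr_arch = {"n_layer": 6, "n_embd": 192, "n_head": 4, "mlp_multiplier": 64, "vocab_size": 256}
pt_arch = dict(fr_arch)
print(arch_mismatches(fr_arch, pt_arch) or "Architectures match")
```

In the notebook, `arch_mismatches(yaml.safe_load(french_config), yaml.safe_load(portuguese_config))` should return `{}` before you start the Portuguese run.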

In [None]:
# Train Portuguese model
!python training/train.py --config training/configs/portuguese_colab.yaml

## 4. Merge Models + Evaluate + Generate Samples

This single command:
1. Loads both specialists
2. Verifies they're compatible
3. Concatenates neuron spaces (N → 2N)
4. Averages embeddings and lm_head
5. Validates the merged model
6. Evaluates on both language test sets
7. Generates sample text
8. Outputs merge_data.json for the frontend
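To make steps 3 and 4 concrete, here is a naive NumPy sketch of the merge. It assumes hypothetical BDH-style parameter shapes: `encoder` and `encoder_v` as `(n_head, n_embd, N)`, `decoder` as `(n_head * N, n_embd)` with rows grouped by head, `embed` as `(vocab, n_embd)`, and `lm_head` as `(n_embd, vocab)`. The parameter names and layout in `analysis/merge.py` may differ, and the real script may rescale weights, which this sketch does not.

```python
import numpy as np


def merge_bdh_weights(w1: dict, w2: dict) -> dict:
    """Naive merge sketch under the hypothetical shapes described above."""
    n_head, n_embd, n1 = w1["encoder"].shape
    assert w2["encoder"].shape[:2] == (n_head, n_embd), "specialists must share n_head and n_embd"
    n2 = w2["encoder"].shape[2]
    merged = {}
    # Neuron-space tensors: concatenate along the per-head neuron axis (N -> N1 + N2).
    for key in ("encoder", "encoder_v"):
        merged[key] = np.concatenate([w1[key], w2[key]], axis=2)
    # Decoder rows are head-major, so split per head before concatenating;
    # a flat concatenation would misalign heads with the encoder columns.
    d1 = w1["decoder"].reshape(n_head, n1, n_embd)
    d2 = w2["decoder"].reshape(n_head, n2, n_embd)
    merged["decoder"] = np.concatenate([d1, d2], axis=1).reshape(n_head * (n1 + n2), n_embd)
    # Shared token interface: element-wise average.
    for key in ("embed", "lm_head"):
        merged[key] = (w1[key] + w2[key]) / 2.0
    return merged


def toy_weights(n_head, n_embd, n, vocab, seed):
    """Random toy weights with the assumed shapes (for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "encoder": rng.normal(size=(n_head, n_embd, n)),
        "encoder_v": rng.normal(size=(n_head, n_embd, n)),
        "decoder": rng.normal(size=(n_head * n, n_embd)),
        "embed": rng.normal(size=(vocab, n_embd)),
        "lm_head": rng.normal(size=(n_embd, vocab)),
    }


toy_fr, toy_pt = toy_weights(4, 8, 16, 256, seed=0), toy_weights(4, 8, 16, 256, seed=1)
for name, tensor in merge_bdh_weights(toy_fr, toy_pt).items():
    print(f"{name:<10} {tuple(tensor.shape)}")
```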

In [None]:
!python analysis/merge.py \
    --model1 checkpoints/french_specialist/checkpoint_best.pt \
    --model2 checkpoints/portuguese_specialist/checkpoint_best.pt \
    --output checkpoints/merged_polyglot.pt \
    --name1 french \
    --name2 portuguese \
    --french-val data/en-fr/val.bin \
    --portuguese-val data/en-pt/val.bin \
    --frontend-json frontend/public/merge/merge_data.json \
    --device cuda

## 5. Verify Results

In [None]:
import json

# Load and display merge results
with open('frontend/public/merge/merge_data.json') as f:
    merge_data = json.load(f)

print('\nüìä Model Info:')
for name, info in merge_data['models'].items():
    print(f"  {info['flag']} {info['name']}: {info['n_neurons']} neurons/head, {info['params']:,} params")

print('\nüìä Evaluation (next-byte loss, lower = better):')
print(f"  {'Model':<20} {'French':>10} {'Portuguese':>12}")
print(f"  {'‚îÄ'*44}")
for name, ev in merge_data['evaluation'].items():
    fr = f"{ev['french_loss']:.4f}" if ev['french_loss'] else '‚Äî'
    pt = f"{ev['portuguese_loss']:.4f}" if ev['portuguese_loss'] else '‚Äî'
    print(f"  {name:<20} {fr:>10} {pt:>12}")

print('\nüìù Samples:')
for s in merge_data['samples']:
    print(f"  [{s['label']}]")
    print(f"  {s['generated'][:100]}...\n")
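Next-byte losses are easier to compare across runs when expressed as bits per byte. Assuming the reported losses are mean cross-entropy in nats (the default for PyTorch's `F.cross_entropy`), bits per byte is `loss / ln 2` and perplexity is `exp(loss)`. `loss_summary` is a hypothetical helper.

```python
import math


def loss_summary(loss_nats):
    """Convert a mean cross-entropy loss in nats to bits per byte and perplexity."""
    if loss_nats is None:
        return None
    return {
        "bits_per_byte": loss_nats / math.log(2),
        "perplexity": math.exp(loss_nats),
    }


# Only runs if the previous cell loaded merge_data
if "merge_data" in globals():
    for name, ev in merge_data["evaluation"].items():
        for lang in ("french", "portuguese"):
            summary = loss_summary(ev.get(f"{lang}_loss"))
            if summary is not None:
                print(f"  {name:<20} {lang:<11} {summary['bits_per_byte']:.3f} bpb, ppl {summary['perplexity']:.2f}")
```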

## 6. Generate Monosemanticity Data (for French model)

In [None]:
!python scripts/precompute_monosemanticity.py \
    --model checkpoints/french_specialist/checkpoint_best.pt \
    --output frontend/public/monosemanticity/precomputed.json

## 7. Download Everything

In [None]:
# Package results for download
!zip -r bdh_results.zip \
    checkpoints/french_specialist/checkpoint_best.pt \
    checkpoints/portuguese_specialist/checkpoint_best.pt \
    checkpoints/merged_polyglot.pt \
    checkpoints/merged_polyglot.heritage.json \
    frontend/public/merge/ \
    frontend/public/monosemanticity/ \
    2>/dev/null

import os
size = os.path.getsize('bdh_results.zip') / 1024 / 1024
print(f'\nüì¶ bdh_results.zip: {size:.1f} MB')

In [None]:
# Download (Colab)
from google.colab import files
files.download('bdh_results.zip')

## Done! üéâ

**Next steps:**
1. Download and extract `bdh_results.zip`
2. Copy `frontend/public/merge/` and `frontend/public/monosemanticity/` to your local project
3. Run the frontend: `cd frontend && npm install && npm run dev`
4. Open http://localhost:5173 and explore!

The **Merge** page will now show real data from your trained models.