Glassbox v4.2.2 — All Bugs Fixed

@designer-coderajay designer-coderajay released this 04 Apr 06:08
· 175 commits to main since this release

5 bugs fixed — found via end-to-end test (pip install → audit report)

Fixes

1. RMSNorm fold dimension mismatch (multi_arch.py)
TransformerLens stores W_Q as (n_heads, d_model, d_head), but the code assumed (n_heads, d_head, d_model), so gamma.unsqueeze(0) broadcast along the wrong axis. Fixed: gamma.unsqueeze(1).
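
The broadcast issue can be reproduced in isolation. The sketch below is not the code from multi_arch.py; it uses NumPy as a stand-in for torch (gamma[None, :] plays the role of gamma.unsqueeze(0), gamma[:, None] the role of gamma.unsqueeze(1)), and the tiny dimensions and the fold_gamma helper are made up for illustration.

```python
import numpy as np

# Toy dimensions (hypothetical); d_model != d_head so a wrong broadcast fails loudly.
n_heads, d_model, d_head = 2, 6, 3
rng = np.random.default_rng(0)

W_Q = rng.normal(size=(n_heads, d_model, d_head))  # TransformerLens layout
gamma = rng.normal(size=(d_model,))                # RMSNorm scale, one value per d_model

def fold_gamma(W, gamma):
    """Scale W along its d_model axis (axis 1), like gamma.unsqueeze(1) in torch."""
    return W * gamma[:, None]  # (d_model, 1) broadcasts to (n_heads, d_model, d_head)

folded = fold_gamma(W_Q, gamma)

# Folding is correct if scaling the input by gamma first gives the same projection.
x = rng.normal(size=(d_model,))
before = np.einsum("m,hmd->hd", x * gamma, W_Q)
after = np.einsum("m,hmd->hd", x, folded)
fold_matches = bool(np.allclose(before, after))

# The old pattern, gamma.unsqueeze(0), has shape (1, d_model) and lines up
# d_model against the trailing d_head axis.
try:
    W_Q * gamma[None, :]
    old_broadcast_ok = True
except ValueError:
    old_broadcast_ok = False
```

When d_model and d_head differ, the old pattern raises a shape error as above. If the two sizes happened to be equal, it would broadcast without error and scale the wrong axis.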

2. Comprehensiveness = 0 for all non-IOI prompts (core.py)
Name-swap fallback produced a corrupted prompt with identical prefix to clean, so corrupt-patching was a no-op. Added degenerate-corruption detection + _comp_zero_ablation() fallback. Factual recall now gives comp≈0.40, sentiment≈0.27.
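
A rough sketch of what a degenerate-corruption check could look like. This is an assumption, not the logic in core.py: is_degenerate_corruption and choose_ablation are hypothetical names, and the check simply treats a corrupted prompt whose token ids equal the clean ones as a no-op for patching.

```python
from typing import Sequence

def is_degenerate_corruption(clean_ids: Sequence[int], corrupt_ids: Sequence[int]) -> bool:
    """Return True when the corrupted prompt carries no information different from clean.

    Assumption for this sketch: identical token ids mean identical activations,
    so patching corrupted activations into the clean run changes nothing.
    """
    return list(clean_ids) == list(corrupt_ids)

def choose_ablation(clean_ids: Sequence[int], corrupt_ids: Sequence[int]) -> str:
    """Pick the ablation mode: fall back to zero ablation when corruption is degenerate."""
    if is_degenerate_corruption(clean_ids, corrupt_ids):
        return "zero"
    return "corrupt_patch"

# Made-up token ids: the name swap changed nothing in the first pair.
mode_same = choose_ablation([10, 11, 12], [10, 11, 12])
mode_diff = choose_ablation([10, 11, 12], [10, 99, 12])
```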

3. GlassboxV2 accepts model name string (core.py)
GlassboxV2("gpt2") now works — auto-loads via HookedTransformer.from_pretrained().
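
The name-or-model dispatch can be sketched as below. resolve_model and fake_loader are hypothetical and exist only for this illustration; in the release the loader is HookedTransformer.from_pretrained(), which is replaced here by a stub so the example runs without downloading a model.

```python
from typing import Any, Callable, Union

def resolve_model(model_or_name: Union[str, Any], loader: Callable[[str], Any]) -> Any:
    """Load the model when given a name string; otherwise return the object unchanged."""
    if isinstance(model_or_name, str):
        return loader(model_or_name)
    return model_or_name

def fake_loader(name: str) -> dict:
    """Stub standing in for a real model loader (assumption for this sketch)."""
    return {"loaded_from": name}

by_name = resolve_model("gpt2", fake_loader)        # string -> loaded via the loader
prebuilt = object()
passthrough = resolve_model(prebuilt, fake_loader)  # existing model -> used as-is
```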

4. Warning when clean_ld ≤ 0 (core.py)
Model prefers distractor over correct token → circuit results unreliable. Now emits logger.warning.
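
A minimal sketch of the check, assuming clean_ld is the correct-token logit minus the distractor-token logit on the clean prompt. The function name, logger name, and message text are illustrative, not taken from core.py.

```python
import logging

logger = logging.getLogger("glassbox_sketch")  # hypothetical logger name

def check_clean_logit_diff(correct_logit: float, distractor_logit: float) -> float:
    """Compute the clean logit difference and warn when the model prefers the distractor."""
    clean_ld = correct_logit - distractor_logit
    if clean_ld <= 0:
        logger.warning(
            "clean_ld=%.3f <= 0: model prefers the distractor; circuit results may be unreliable",
            clean_ld,
        )
    return clean_ld

ld_good = check_clean_logit_diff(3.5, 1.0)  # positive: no warning
ld_bad = check_clean_logit_diff(1.0, 2.0)   # non-positive: emits a warning
```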

5. CrossModelComparison Pearson r always 0 (cross_model.py)
Only circuit heads (1-10) were stored in attributions dict. Now stores all n_layers×n_heads attributions. Pearson r: 0.000 → 0.127 (distilgpt2 vs gpt2).
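
The sketch below shows one way to correlate dense per-head attributions. It makes several assumptions not confirmed by the release: attributions are keyed by (layer, head), models of different depth are compared on the keys they share, and 0.0 is returned when fewer than two heads overlap or one side has no variance. The helper names and random data are made up.

```python
import numpy as np

def dense_attributions(scores: np.ndarray) -> dict:
    """Turn an (n_layers, n_heads) array into a {(layer, head): score} dict for every head."""
    n_layers, n_heads = scores.shape
    return {(l, h): float(scores[l, h]) for l in range(n_layers) for h in range(n_heads)}

def pearson_on_shared_heads(attr_a: dict, attr_b: dict) -> float:
    """Pearson r over the (layer, head) keys present in both dicts."""
    shared = sorted(set(attr_a) & set(attr_b))
    if len(shared) < 2:
        return 0.0  # sketch convention: too little overlap to correlate
    a = np.array([attr_a[k] for k in shared])
    b = np.array([attr_b[k] for k in shared])
    if a.std() == 0 or b.std() == 0:
        return 0.0  # sketch convention: correlation undefined for constant input
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
small = rng.normal(size=(2, 4))  # toy 2-layer model
# Toy 4-layer model whose first two layers are a linear function of `small`.
large = np.vstack([2.0 * small + 1.0, rng.normal(size=(2, 4))])

r = pearson_on_shared_heads(dense_attributions(small), dense_attributions(large))
r_sparse = pearson_on_shared_heads({(0, 0): 1.0}, {(1, 1): 2.0})  # no overlap
```

With sparse dicts that hold only a handful of circuit heads, the two models may share few or no keys, which is one way a correlation can collapse to zero.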

pip install glassbox-mech-interp==4.2.2