v4.0.0 — FoldedLayerNorm + BH FDR + SAE Polysemanticity
What's New in v4.0.0
FoldedLayerNorm (glassbox/layernorm_correction.py)
Absorbs the LayerNorm scale γ into the W_Q, W_K, and W_V weight matrices (Elhage et al. 2021 §4.1), removing multiplicative scale bias from attribution scores. Reports the per-head bias ratio Δα(h)/|α_raw(h)| and flags heads whose ratio exceeds 0.15 as layernorm_biased.
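The folding identity can be checked numerically. The sketch below is illustrative and does not use the glassbox API: the helper names, the toy shapes, and the example attribution scores are all hypothetical. It assumes the row-vector convention q = LN(x) @ W_Q, that the LayerNorm shift β is carried into a bias term, and that Δα(h) means the absolute difference between corrected and raw scores.

```python
import numpy as np

def layernorm_normalize(x, eps=1e-5):
    """Center and scale each row to zero mean, unit variance (no learned parameters)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def fold_gamma(W, gamma):
    """Return diag(gamma) @ W, i.e. scale row i of W by gamma[i]."""
    return gamma[:, None] * W

rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 8, 4, 5           # toy sizes, chosen for illustration
x = rng.normal(size=(n_tokens, d_model))
gamma = rng.uniform(0.5, 2.0, size=d_model)   # LayerNorm scale
beta = rng.normal(size=d_model)               # LayerNorm shift
W_Q = rng.normal(size=(d_model, d_head))

x_hat = layernorm_normalize(x)

# Unfolded path: apply gamma and beta, then project.
q_unfolded = (x_hat * gamma + beta) @ W_Q

# Folded path: gamma lives inside the weights, beta becomes a bias.
W_Q_folded = fold_gamma(W_Q, gamma)
b_Q_folded = beta @ W_Q
q_folded = x_hat @ W_Q_folded + b_Q_folded

max_abs_diff = float(np.max(np.abs(q_unfolded - q_folded)))

# Bias ratio and flag, using made-up per-head scores (assumption: Δα is the
# absolute difference between corrected and raw attribution).
alpha_raw = np.array([0.40, -0.20, 0.05])
alpha_corrected = np.array([0.38, -0.26, 0.05])
bias_ratio = np.abs(alpha_corrected - alpha_raw) / np.abs(alpha_raw)
layernorm_biased = bias_ratio > 0.15
```

Both paths produce the same queries up to floating-point error, so folding changes only where the scale is stored, not the computation. The same construction applies to W_K and W_V.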
BenjaminiHochberg FDR (glassbox/fdr.py)
Multiple-testing correction for 144+ simultaneous head tests. Controls the false discovery rate at level α (FDR ≤ α) via the BH step-up procedure (Benjamini & Hochberg 1995). Reports BH and Bonferroni results side by side. Supports p-values computed from z-test standard errors, bootstrap standard errors, or permutation tests.
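For reference, the BH step-up rule sorts the m p-values, finds the largest rank k with p_(k) ≤ kα/m, and rejects the k smallest. The sketch below is a plain NumPy version of that textbook rule next to Bonferroni. It is not the glassbox/fdr.py implementation; the function names and the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean rejection mask from the BH step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m  # k * alpha / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])          # largest rank passing its threshold
        reject[order[: k + 1]] = True             # reject everything up to that rank
    return reject

def bonferroni(p_values, alpha=0.05):
    """Boolean rejection mask from the Bonferroni correction."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

# Made-up p-values for ten heads, for illustration only.
p_heads = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh_mask = benjamini_hochberg(p_heads, alpha=0.05)
bonf_mask = bonferroni(p_heads, alpha=0.05)
n_bh, n_bonf = int(bh_mask.sum()), int(bonf_mask.sum())
```

On these toy values BH rejects two hypotheses and Bonferroni rejects one, which illustrates why reporting the two side by side is informative: Bonferroni controls the family-wise error rate and is the more conservative of the two.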
PolysemanticityScorerSAE (glassbox/polysemanticity.py)
Quantifies whether attention heads are monosemantic or polysemantic via the entropy H(p(feature | head_h)). Uses the SAE-entropy method when sae-lens is installed and falls back to the PCA participation ratio otherwise. Reports monosemantic_fraction across circuit heads.
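The two scores can be illustrated without sae-lens. The sketch below assumes that p(feature | head_h) is obtained by normalizing non-negative per-feature activation mass, and that the participation ratio is (Σλ)² / Σλ² over the covariance eigenvalues λ. The function names, the toy inputs, and the entropy threshold used for monosemantic_fraction are hypothetical and are not taken from glassbox.

```python
import numpy as np

def feature_entropy(feature_mass):
    """Shannon entropy (nats) of p(feature | head) from non-negative feature mass."""
    mass = np.asarray(feature_mass, dtype=float)
    total = mass.sum()
    if total <= 0:
        raise ValueError("feature mass must have a positive sum")
    p = mass / total
    p = p[p > 0]                                  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

def participation_ratio(X):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) of the covariance of X."""
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # clip tiny negative round-off
    return float(eig.sum() ** 2 / (eig ** 2).sum())

# One feature carries all the mass vs. four features sharing it equally.
h_mono = feature_entropy([10.0, 0.0, 0.0, 0.0])
h_poly = feature_entropy([1.0, 1.0, 1.0, 1.0])

# Fallback score: rank-one activations vs. isotropic activations in 6 dimensions.
rng = np.random.default_rng(0)
rank_one = np.outer(rng.normal(size=200), rng.normal(size=6))
isotropic = rng.normal(size=(5000, 6))
pr_low = participation_ratio(rank_one)
pr_high = participation_ratio(isotropic)

# Fraction of heads under a hypothetical entropy threshold.
ENTROPY_THRESHOLD = 0.5                           # illustrative value only
entropies = np.array([h_mono, h_poly])
monosemantic_fraction = float(np.mean(entropies < ENTROPY_THRESHOLD))
```

Low entropy and a participation ratio near 1 both indicate that a head's activity concentrates on a single direction or feature. Entropy near log(number of features) or a participation ratio near the dimensionality indicates activity spread across many.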
EU AI Act Mapping
- Art. 13(1) Transparency: bias-corrected attribution scores
- Art. 9(1) Risk Management: FDR prevents false-positive circuit identifications
- Art. 15(1) Robustness: polysemanticity quantifies interpretability quality
Mathematical Completeness: 13/18 frameworks
pip install glassbox-mech-interp==4.0.0