Release v4.0.0 — FoldedLayerNorm + BH FDR + SAE Polysemanticity · designer-coderajay/glassbox-mech

What's New in v4.0.0

FoldedLayerNorm (`glassbox/layernorm_correction.py`)

Absorbs the LayerNorm scale γ into the W_Q, W_K, and W_V weight matrices (Elhage et al. 2021 §4.1), which removes multiplicative scale bias from attribution scores. Reports the per-head bias ratio Δα(h)/|α_raw(h)| and flags heads with a ratio above 0.15 as layernorm_biased.
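As a rough illustration of the folding idea, the sketch below shows in plain Python that scaling the rows of a projection matrix by γ gives the same output as applying γ inside LayerNorm. It is not the code in `glassbox/layernorm_correction.py`: the function names are hypothetical, folding β into the bias is an added assumption (the note above mentions only γ), and Δα is assumed to mean |α_corrected − α_raw|.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over one residual-stream vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    return [g * h + b for g, h, b in zip(gamma, x_hat, beta)]

def vec_mat(v, W):
    """Row vector v (length d) times matrix W (d x k)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]

def fold_layernorm(W, bias, gamma, beta):
    """Hypothetical helper: W' = diag(gamma) W, b' = b + beta W."""
    W_folded = [[gamma[i] * w for w in row] for i, row in enumerate(W)]
    b_folded = [b + s for b, s in zip(bias, vec_mat(beta, W))]
    return W_folded, b_folded

def bias_ratio(alpha_raw, alpha_corrected, threshold=0.15):
    """Assumed definition: |alpha_corrected - alpha_raw| / |alpha_raw| (alpha_raw != 0)."""
    ratio = abs(alpha_corrected - alpha_raw) / abs(alpha_raw)
    return ratio, ratio > threshold

# Toy values, chosen only for illustration.
x = [1.0, 2.0, 4.0]
gamma, beta = [0.5, 2.0, 1.5], [0.1, -0.2, 0.3]
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.7]]
bias = [0.05, -0.05]

# Original path: LayerNorm with gamma/beta, then the projection.
original = [o + b for o, b in zip(vec_mat(layer_norm(x, gamma, beta), W), bias)]

# Folded path: parameter-free normalization, then the folded projection.
W_f, b_f = fold_layernorm(W, bias, gamma, beta)
plain = layer_norm(x, [1.0] * 3, [0.0] * 3)
folded = [o + b for o, b in zip(vec_mat(plain, W_f), b_f)]
```

The two paths agree up to floating-point error, so any attribution computed from the folded weights no longer carries a separate γ factor. `bias_ratio(1.0, 1.2)` would flag a head under the 0.15 threshold, while `bias_ratio(1.0, 1.1)` would not.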

BenjaminiHochberg FDR (`glassbox/fdr.py`)

Multiple-testing correction for 144+ simultaneous head tests. Controls the false discovery rate at level α (FDR = E[FDP] ≤ α) via the BH step-up procedure (Benjamini & Hochberg 1995). Reports BH and Bonferroni results side by side. Supports p-values derived from z-test standard errors, bootstrap standard errors, or permutation tests.
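The step-up procedure itself is short enough to sketch. The version below is a minimal standalone illustration in plain Python, not the API of `glassbox/fdr.py`; the function names and the example p-values are made up for the demonstration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the BH step-up procedure rejects."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

def bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Five illustrative head-level p-values (not real results).
p_vals = [0.5, 0.035, 0.001, 0.026, 0.025]
bh = benjamini_hochberg(p_vals)
bonf = bonferroni(p_vals)
```

Here BH rejects four of the five tests while Bonferroni rejects only the smallest p-value. Note that 0.025 exceeds its own rank threshold (2 · 0.05 / 5 = 0.02) yet is still rejected, because a larger p-value further up the ranking passes; that is the "step-up" behavior.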

PolysemanticityScorerSAE (`glassbox/polysemanticity.py`)

Quantifies whether attention heads are monosemantic or polysemantic via the entropy H(p(feature|head_h)). Uses the SAE-entropy method when sae-lens is installed and falls back to the PCA participation ratio otherwise. Reports monosemantic_fraction across circuit heads.
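To make the two scores concrete, here is a minimal sketch of an entropy over a head's feature distribution and of a participation ratio over PCA eigenvalues. It does not call sae-lens and is not the code in `glassbox/polysemanticity.py`: the normalization of activations into p(feature|head), the function names, and the `max_entropy` cutoff are all assumptions made for illustration.

```python
import math

def feature_entropy(activations):
    """Shannon entropy (in nats) of p(feature | head).

    Assumes `activations` are nonnegative per-feature magnitudes for one head,
    normalized here to sum to 1.
    """
    total = sum(activations)
    probs = [a / total for a in activations if a > 0]
    return sum(-p * math.log(p) for p in probs)

def participation_ratio(eigenvalues):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Ranges from 1 (one dominant direction) to len(eigenvalues) (uniform spread).
    """
    s1 = sum(eigenvalues)
    s2 = sum(v * v for v in eigenvalues)
    return s1 * s1 / s2

def monosemantic_fraction(per_head_activations, max_entropy=0.5):
    """Fraction of heads whose entropy is at most a hypothetical cutoff."""
    flags = [feature_entropy(a) <= max_entropy for a in per_head_activations]
    return sum(flags) / len(flags)

# Two toy heads: one concentrated on a single feature, one spread over four.
heads = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
entropies = [feature_entropy(h) for h in heads]
fraction = monosemantic_fraction(heads)
```

A head that loads on a single feature has entropy 0, and a head spread evenly over four features has entropy ln 4, so lower values indicate more monosemantic heads. The participation ratio behaves analogously: one dominant eigenvalue gives 1, and a uniform spectrum over n components gives n.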

EU AI Act Mapping

Art. 13(1) Transparency: bias-corrected attribution scores
Art. 9(1) Risk Management: FDR prevents false-positive circuit identifications
Art. 15(1) Robustness: polysemanticity quantifies interpretability quality

Mathematical Completeness: 13/18 frameworks

pip install glassbox-mech-interp==4.0.0
