v4.1.0 — Hessian Bounds + Causal Scrubbing + DAS — 18/18 Math Frameworks Complete
What's New in v4.1.0
This release completes the ROADMAP_V4 mathematical framework: all 18 of 18 frameworks are now implemented. Glassbox now covers every Harvard, MIT, Anthropic, and DeepMind standard identified in our gap analysis.
HessianErrorBounds (glassbox/hessian.py)
Second-order Taylor error bounds via Pearlmutter (1994) Hessian-vector products:
ε(h) = ½·δz_hᵀ·H_h·δz_h
Computed via `torch.autograd.grad` double backpropagation. Sets `hessian_dominated` when |ε(h)|/|α(h)| > 0.20, and exposes an `approximation_reliable` flag for compliance reports.
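To make the error term and the 0.20 threshold concrete, here is a minimal pure-Python sketch on a toy quadratic metric. It is not the Glassbox implementation: it swaps in a central-difference Hessian-vector product in place of autograd double backpropagation, and it assumes α(h) is the first-order term ∇f·δz, which these notes do not define. The function `f`, the point `z`, and the perturbation `delta` are hypothetical.

```python
# Toy scalar metric f(z) = z0^2 + z0*z1 + 2*z1^2 (hypothetical stand-in for a model metric).
def f(z):
    return z[0] ** 2 + z[0] * z[1] + 2 * z[1] ** 2

def grad_f(z):
    # Analytic gradient of the toy metric.
    return [2 * z[0] + z[1], z[0] + 4 * z[1]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hvp(grad, z, v, eta=1e-3):
    # Central-difference Hessian-vector product:
    # H v ≈ (grad(z + eta*v) - grad(z - eta*v)) / (2*eta).
    # This replaces autograd double backprop for illustration only.
    plus = grad([zi + eta * vi for zi, vi in zip(z, v)])
    minus = grad([zi - eta * vi for zi, vi in zip(z, v)])
    return [(p - m) / (2 * eta) for p, m in zip(plus, minus)]

z = [1.0, 0.5]        # hypothetical activation
delta = [0.4, -0.2]   # hypothetical perturbation δz

alpha = dot(grad_f(z), delta)                      # first-order term (assumed meaning of α)
epsilon = 0.5 * dot(delta, hvp(grad_f, z, delta))  # ε = ½ · δzᵀ · H · δz
ratio = abs(epsilon) / abs(alpha)
hessian_dominated = ratio > 0.20                   # threshold quoted in the release notes

# For a quadratic metric the second-order expansion is exact,
# so alpha + epsilon should match the true change in f.
exact_change = f([zi + di for zi, di in zip(z, delta)]) - f(z)

print(alpha, epsilon, ratio, hessian_dominated, exact_change)
```

Because the toy metric is quadratic, `alpha + epsilon` reproduces the true change in `f`, which makes the sketch easy to check by hand. On a real model the same ratio would indicate how far the first-order attribution can be trusted.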
CausalScrubbing (glassbox/causal_scrubbing.py)
Anthropic-standard circuit hypothesis testing (Chan et al. 2022):
CS(H) = E[LD(x; do(acts ~ P_H))] / LD_clean
- `CircuitHypothesis` dataclass with `from_wang2022_ioi()` preset (13 heads, full role labels)
- Strong ≥ 0.80, partial ≥ 0.50, insufficient < 0.50
- Not just correlation — formal causal account of circuit behaviour
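The score and its thresholds can be sketched as follows. This is a hedged illustration of the CS(H) ratio only: the logit-difference values are made up, and `cs_score` and `classify` are hypothetical helper names, not part of the Glassbox API. In practice the scrubbed logit differences would come from running the model with activations resampled according to the hypothesis.

```python
from statistics import mean

def cs_score(scrubbed_lds, clean_ld):
    # CS(H) = E[LD under hypothesis-consistent resampling] / LD_clean
    return mean(scrubbed_lds) / clean_ld

def classify(score):
    # Thresholds as quoted in the release notes.
    if score >= 0.80:
        return "strong"
    if score >= 0.50:
        return "partial"
    return "insufficient"

# Hypothetical logit differences from four resampled runs and one clean run.
scrubbed_lds = [3.1, 2.9, 3.4, 3.0]
clean_ld = 3.5

score = cs_score(scrubbed_lds, clean_ld)
label = classify(score)
print(round(score, 3), label)
```

A score near 1 means the behaviour survives resampling of everything the hypothesis claims is irrelevant. A low score means the hypothesis leaves out something the model relies on.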
DistributedAlignmentSearch (glassbox/das.py)
Geiger et al. (2023) — finds the linear subspace encoding a concept:
- PCA on activation difference vectors Δz = z_clean − z_CF
- Interchange interventions to compute DAS score ∈ [0, 1]
- `search_all_layers()` for cross-layer concept localisation
- `rotation_matrix`, `concept_dims`, `explained_variance` in `DASResult`
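The first two steps above can be sketched in pure Python for a one-dimensional concept subspace. This is an illustrative toy, not the `glassbox/das.py` code, and it makes three assumptions: the leading direction is found by power iteration on the uncentered second-moment matrix of Δz, the subspace has a single dimension, and the activations are synthetic. Scoring the intervention against model outputs is omitted because it needs a real model.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_direction(diffs, iters=50):
    # Leading eigenvector of sum_i dz_i dz_i^T via power iteration
    # (uncentered PCA on the difference vectors; an assumption of this sketch).
    dim = len(diffs[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [0.0] * dim
        for dz in diffs:
            c = dot(dz, u)
            for j in range(dim):
                w[j] += c * dz[j]
        norm = math.sqrt(dot(w, w))
        u = [x / norm for x in w]
    return u

def interchange(z_base, z_source, u):
    # Replace the component of z_base along u with that of z_source;
    # everything orthogonal to u is left untouched.
    shift = dot(z_source, u) - dot(z_base, u)
    return [b + shift * ui for b, ui in zip(z_base, u)]

# Synthetic activations: the "concept" lives along a known direction.
direction = [0.6, 0.8, 0.0]
bases = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.2], [0.5, 0.5, 0.1]]
scales = [1.0, 2.0, 1.5]
z_clean = [[b + s * d for b, d in zip(base, direction)] for base, s in zip(bases, scales)]
z_cf = [[b - s * d for b, d in zip(base, direction)] for base, s in zip(bases, scales)]

# Δz = z_clean − z_CF for each pair.
diffs = [[c - k for c, k in zip(zc, zk)] for zc, zk in zip(z_clean, z_cf)]

u = top_direction(diffs)
patched = interchange(z_clean[0], z_cf[1], u)
print(u, patched)
```

After the intervention, `patched` carries the counterfactual value along `u` while keeping the clean activation's orthogonal component. That is the property an interchange intervention relies on when testing whether the subspace encodes the concept.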
Mathematical Completeness Scorecard: 18/18 ✓
| Version | Score |
|---|---|
| v3.6.0 (baseline) | 7/18 |
| v3.7.0 | 10/18 |
| v4.0.0 | 13/18 |
| v4.1.0 | 18/18 ✓ |
EU AI Act Mapping
- Art. 13(1) Transparency: Hessian bounds certify attribution reliability
- Art. 9(1) Risk Management: Causal scrubbing — formal hypothesis testing
- Art. 15(1) Robustness: DAS localises concept encoding for controlled interventions
```bash
pip install glassbox-mech-interp==4.1.0
```

```python
from glassbox import (
    HessianErrorBounds, CausalScrubbing, CircuitHypothesis,
    DistributedAlignmentSearch,
)

# model, attributions, the token tensors, prompt, corr, t, and d are assumed
# to be defined by the caller; their setup is not shown here.

# Certify attribution reliability
bounds = HessianErrorBounds(model).compute(attributions, clean_tokens, corr_tokens, t, d)
print(bounds.approximation_reliable)  # True

# Causal hypothesis test
scrubber = CausalScrubbing(model)
result = scrubber.evaluate(CircuitHypothesis.from_wang2022_ioi(), prompt, corr, t, d)
print(result.cs_score)  # 0.89 (strong)

# Find where the concept lives
das = DistributedAlignmentSearch(model, concept_dims=4)
result = das.search("IO_name", clean_toks, cf_toks, t, d, target_layer=9)
print(result.concept_encoded)  # True
```