Release v4.1.0 — Hessian Bounds + Causal Scrubbing + DAS — 18/18 Math Frameworks Complete · designer-coderajay/glassbox-mech

What's New in v4.1.0

This release completes the ROADMAP_V4 mathematical framework: all 18 frameworks are now implemented. With this, Glassbox covers every Harvard/MIT/Anthropic/DeepMind standard identified in our gap analysis.

HessianErrorBounds (`glassbox/hessian.py`)

Second-order Taylor error bounds via Pearlmutter (1994) Hessian-vector products:

ε(h) = ½·δz_hᵀ·H_h·δz_h

Computed by double backpropagation with torch.autograd.grad. A head is flagged hessian_dominated when |ε(h)|/|α(h)| > 0.20, and the result exposes an approximation_reliable flag for use in compliance reports.
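The double-backprop computation can be sketched roughly as follows. This is a minimal illustration, not the code in `glassbox/hessian.py`: the helper names `taylor_terms` and `is_hessian_dominated` are hypothetical, and it assumes that α(h) denotes the first-order term ∇fᵀ·δz, which the notes above do not define.

```python
import torch

def taylor_terms(f, z, delta):
    """Return (first_order, second_order) Taylor terms of scalar f at z along delta.

    second_order = 0.5 * delta^T H delta, obtained from a Hessian-vector product
    (Pearlmutter's trick) without ever materialising H.
    """
    z = z.detach().clone().requires_grad_(True)
    # First backward pass; keep the graph so the gradient can be differentiated again.
    (grad,) = torch.autograd.grad(f(z), z, create_graph=True)
    first_order = grad @ delta
    # Second backward pass: d/dz (grad . delta) = H @ delta.
    (hvp,) = torch.autograd.grad(first_order, z)
    second_order = 0.5 * (delta @ hvp)
    return first_order.item(), second_order.item()

def is_hessian_dominated(first_order, second_order, threshold=0.20):
    # Assumption: alpha(h) is the first-order term; threshold taken from the notes.
    return abs(second_order) / abs(first_order) > threshold

# Toy example: f(z) = sum(z^3), so H = diag(6 z).
f = lambda z: (z ** 3).sum()
z = torch.tensor([1.0, 2.0])
delta = torch.tensor([0.1, -0.2])
alpha, eps = taylor_terms(f, z, delta)
print(alpha, eps, is_hessian_dominated(alpha, eps))
```

For this toy function the second-order term is small relative to the first-order one, so the ratio test does not fire.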

CausalScrubbing (`glassbox/causal_scrubbing.py`)

Circuit hypothesis testing via causal scrubbing (Chan et al. 2022, Redwood Research):

CS(H) = E[LD(x; do(acts ~ P_H))] / LD_clean

CircuitHypothesis dataclass with from_wang2022_ioi() preset (13 heads, full role labels)
Score thresholds: strong ≥ 0.80, partial ≥ 0.50, insufficient < 0.50
Gives a formal causal account of circuit behaviour, not just a correlation
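To make the score and its thresholds concrete, here is a small sketch of the arithmetic only. It assumes the scrubbed logit differences have already been collected by resampling activations under the hypothesis; the function names `causal_scrubbing_score` and `classify_hypothesis` are illustrative and are not part of the Glassbox API.

```python
from statistics import mean

def causal_scrubbing_score(scrubbed_logit_diffs, clean_logit_diff):
    """CS(H): mean logit difference under scrubbing divided by the clean logit difference."""
    if clean_logit_diff == 0:
        raise ValueError("clean logit difference must be non-zero")
    return mean(scrubbed_logit_diffs) / clean_logit_diff

def classify_hypothesis(cs_score):
    """Map a CS score onto the labels used in the release notes."""
    if cs_score >= 0.80:
        return "strong"
    if cs_score >= 0.50:
        return "partial"
    return "insufficient"

# Hypothetical numbers: three scrubbed runs against a clean logit difference of 4.0.
score = causal_scrubbing_score([3.0, 2.5, 3.5], clean_logit_diff=4.0)
print(score, classify_hypothesis(score))
```

A score near 1 means the scrubbed model retains almost all of the clean behaviour, which is why the higher thresholds correspond to stronger support for the hypothesis.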

DistributedAlignmentSearch (`glassbox/das.py`)

Geiger et al. (2023) — finds the linear subspace encoding a concept:

PCA on activation difference vectors Δz = z_clean − z_CF
Interchange interventions to compute DAS score ∈ [0, 1]
search_all_layers() for cross-layer concept localisation
rotation_matrix, concept_dims, explained_variance in DASResult
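The first two steps above can be illustrated with NumPy. This is a sketch under stated assumptions rather than the contents of `glassbox/das.py`: it assumes the difference vectors are mean-centred before PCA, the helper names `concept_subspace` and `interchange` are hypothetical, and it does not reproduce how the DAS score itself is computed.

```python
import numpy as np

def concept_subspace(z_clean, z_cf, concept_dims):
    """PCA on difference vectors; returns (basis, explained_variance_ratio).

    basis has shape (concept_dims, d) with orthonormal rows.
    """
    diffs = z_clean - z_cf                       # (n_examples, d)
    centered = diffs - diffs.mean(axis=0)        # assumption: centred PCA
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2
    ratio = variances[:concept_dims].sum() / variances.sum()
    return vt[:concept_dims], ratio

def interchange(z_base, z_source, basis):
    """Swap only the subspace component of z_base for that of z_source."""
    projector = basis.T @ basis                  # (d, d) projection onto the subspace
    return z_base + (z_source - z_base) @ projector

# Toy data: every clean/counterfactual difference lies along one direction u.
u = np.array([0.6, 0.8, 0.0])
z_cf = np.zeros((4, 3))
z_clean = np.outer([1.0, 2.0, 3.0, 4.0], u)
basis, ratio = concept_subspace(z_clean, z_cf, concept_dims=1)

base = np.array([0.0, 0.0, 1.0])
patched = interchange(base, base + u, basis)            # difference inside the subspace
unchanged = interchange(base, base + np.array([0.0, 0.0, 2.0]), basis)  # outside it
print(ratio, patched, unchanged)
```

In the toy data a single component explains all the variance, so patching along it reproduces the source activation, while a difference orthogonal to the subspace leaves the base activation untouched.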

Mathematical Completeness Scorecard: 18/18 ✓

Version	Score
v3.6.0 (baseline)	7/18
v3.7.0	10/18
v4.0.0	13/18
v4.1.0	18/18 ✓

EU AI Act Mapping

Art. 13(1) Transparency: Hessian bounds certify attribution reliability
Art. 9(1) Risk Management: Causal scrubbing — formal hypothesis testing
Art. 15(1) Robustness: DAS localises concept encoding for controlled interventions

pip install glassbox-mech-interp==4.1.0

from glassbox import (
    HessianErrorBounds, CausalScrubbing, CircuitHypothesis,
    DistributedAlignmentSearch
)

# The snippet assumes model, attributions, the token tensors, t and d are already defined.

# Certify attribution reliability
bounds = HessianErrorBounds(model).compute(attributions, clean_tokens, corr_tokens, t, d)
print(bounds.approximation_reliable)  # e.g. True

# Causal hypothesis test
scrubber  = CausalScrubbing(model)
cs_result = scrubber.evaluate(CircuitHypothesis.from_wang2022_ioi(), prompt, corr, t, d)
print(cs_result.cs_score)  # e.g. 0.89 (strong, since ≥ 0.80)

# Find where the concept lives
das        = DistributedAlignmentSearch(model, concept_dims=4)
das_result = das.search("IO_name", clean_toks, cf_toks, t, d, target_layer=9)
print(das_result.concept_encoded)  # e.g. True
