v4.1.0 — Hessian Bounds + Causal Scrubbing + DAS — 18/18 Math Frameworks Complete

@designer-coderajay designer-coderajay released this 03 Apr 19:43
· 183 commits to main since this release

What's New in v4.1.0

This release completes the ROADMAP_V4 mathematical framework: all 18 of its frameworks are now implemented. With this, Glassbox covers every Harvard/MIT/Anthropic/DeepMind standard identified in our gap analysis.

HessianErrorBounds (glassbox/hessian.py)

Second-order Taylor error bounds via Pearlmutter (1994) Hessian-vector products:

ε(h) = ½·δz_hᵀ·H_h·δz_h

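Computed by double backpropagation with torch.autograd.grad. The hessian_dominated flag is set when |ε(h)|/|α(h)| > 0.20, that is, when the second-order term exceeds 20% of the attribution α(h). The approximation_reliable flag summarises the result for compliance reports.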
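The quantity being bounded can be illustrated without Glassbox or PyTorch. The sketch below is not the library's implementation: it swaps autograd double-backprop for a central finite difference of an analytic gradient so that it runs on the standard library alone. The toy function, the helper names, and the reading of α as the first-order term gᵀ·δz are all assumptions made for illustration.

```python
import math

def grad_f(z):
    """Analytic gradient of the toy scalar function f(z) = sin(z0) * z1 + z1**2."""
    z0, z1 = z
    return [math.cos(z0) * z1, math.sin(z0) + 2.0 * z1]

def hvp_central(grad, z, v, step=1e-5):
    """Approximate H(z) @ v as (grad(z + step*v) - grad(z - step*v)) / (2*step).

    Like Pearlmutter's trick, this never builds the full Hessian; unlike it,
    the result is a finite-difference approximation rather than exact.
    """
    plus = grad([zi + step * vi for zi, vi in zip(z, v)])
    minus = grad([zi - step * vi for zi, vi in zip(z, v)])
    return [(p - m) / (2.0 * step) for p, m in zip(plus, minus)]

def second_order_error(grad, z, dz):
    """epsilon = 0.5 * dz^T H dz, using a single Hessian-vector product."""
    hv = hvp_central(grad, z, dz)
    return 0.5 * sum(d * h for d, h in zip(dz, hv))

def reliability(grad, z, dz, threshold=0.20):
    """Compare the second-order term with the first-order term alpha = g^T dz."""
    alpha = sum(g * d for g, d in zip(grad(z), dz))  # assumed meaning of alpha
    eps = second_order_error(grad, z, dz)
    ratio = abs(eps) / abs(alpha) if alpha != 0.0 else math.inf
    return {
        "alpha": alpha,
        "epsilon": eps,
        "ratio": ratio,
        "hessian_dominated": ratio > threshold,
    }

report = reliability(grad_f, z=[0.3, 1.0], dz=[0.1, 0.05])
print(report)
```

For this perturbation the ratio stays well below 0.20, so the linear attribution would count as reliable under the threshold quoted above. A larger δz or a more curved function pushes the ratio up.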

CausalScrubbing (glassbox/causal_scrubbing.py)

Circuit hypothesis testing via causal scrubbing (Chan et al. 2022, Redwood Research):

CS(H) = E[LD(x; do(acts ~ P_H))] / LD_clean

  • CircuitHypothesis dataclass with from_wang2022_ioi() preset (13 heads, full role labels)
  • Verdict thresholds on CS(H): strong ≥ 0.80, partial ≥ 0.50, insufficient < 0.50
  • Gives a formal causal account of circuit behaviour, not just a correlational one
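The score and its thresholds reduce to a few lines of arithmetic once the scrubbed logit differences are available. The following is a minimal sketch under that assumption: the function names and sample numbers are hypothetical, and the resampling of activations that produces the scrubbed runs is not shown.

```python
from statistics import mean

def cs_score(scrubbed_logit_diffs, clean_logit_diff):
    """CS(H): mean logit difference under scrubbing, divided by the clean value."""
    if clean_logit_diff == 0:
        raise ValueError("clean logit difference must be non-zero")
    return mean(scrubbed_logit_diffs) / clean_logit_diff

def verdict(score):
    """Map a score onto the thresholds listed above."""
    if score >= 0.80:
        return "strong"
    if score >= 0.50:
        return "partial"
    return "insufficient"

# Made-up logit differences from four resampled (scrubbed) forward passes.
scrubbed = [3.1, 2.7, 3.4, 2.8]
score = cs_score(scrubbed, clean_logit_diff=3.5)
print(round(score, 3), verdict(score))
```

A score near 1 means the hypothesised circuit preserves almost all of the clean behaviour when everything the hypothesis treats as irrelevant is resampled.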

DistributedAlignmentSearch (glassbox/das.py)

Geiger et al. (2023) — finds the linear subspace encoding a concept:

  • PCA on activation difference vectors Δz = z_clean − z_CF
  • Interchange interventions to compute DAS score ∈ [0, 1]
  • search_all_layers() for cross-layer concept localisation
  • rotation_matrix, concept_dims, explained_variance in DASResult
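The first two steps above can be sketched on toy vectors with the standard library alone. This is an illustration, not the code in glassbox/das.py: it recovers a single direction by power iteration instead of a full PCA, uses the uncentred second-moment matrix of the difference vectors (an assumption), and all names and activations are invented. Computing an actual DAS score would also require re-running the model on the patched activations, which is omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_direction(diffs, iters=200):
    """Leading principal direction of the difference vectors via power iteration.

    Works on M = sum_i d_i d_i^T without forming M: M v = sum_i (d_i . v) d_i.
    """
    dim = len(diffs[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        mv = [0.0] * dim
        for d in diffs:
            coeff = dot(d, v)
            for j in range(dim):
                mv[j] += coeff * d[j]
        norm = math.sqrt(dot(mv, mv))
        if norm == 0.0:
            break
        v = [x / norm for x in mv]
    return v

def explained_variance(diffs, v):
    """Fraction of the total squared norm of the differences that lies along v."""
    return sum(dot(d, v) ** 2 for d in diffs) / sum(dot(d, d) for d in diffs)

def interchange(z_clean, z_cf, v):
    """Overwrite z_clean's component along unit vector v with z_cf's component."""
    shift = dot(z_cf, v) - dot(z_clean, v)
    return [zc + shift * vi for zc, vi in zip(z_clean, v)]

# Toy activations: the "concept" mostly flips the sign of the first coordinate.
clean = [[1.0, 0.2, 0.0], [1.2, -0.1, 0.3], [0.9, 0.0, -0.2]]
cf = [[-1.0, 0.25, 0.0], [-0.8, -0.1, 0.25], [-1.1, 0.05, -0.2]]
diffs = [[c - f for c, f in zip(zc, zf)] for zc, zf in zip(clean, cf)]

v = top_direction(diffs)
ratio = explained_variance(diffs, v)
patched = interchange(clean[0], cf[0], v)
print(v, ratio, patched)
```

After the interchange, the patched vector matches the counterfactual along v and matches the clean vector in every orthogonal direction, which is what lets the intervention isolate the concept subspace.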

Mathematical Completeness Scorecard: 18/18 ✓

Version              Score
v3.6.0 (baseline)    7/18
v3.7.0               10/18
v4.0.0               13/18
v4.1.0               18/18

EU AI Act Mapping

  • Art. 13(1) Transparency: Hessian bounds certify attribution reliability
  • Art. 9(1) Risk Management: Causal scrubbing — formal hypothesis testing
  • Art. 15(1) Robustness: DAS localises concept encoding for controlled interventions

pip install glassbox-mech-interp==4.1.0

from glassbox import (
    HessianErrorBounds, CausalScrubbing, CircuitHypothesis,
    DistributedAlignmentSearch
)

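# model, attributions, the token inputs, and t / d are not defined in this
# snippet; they are assumed to be prepared by the caller.

# Certify attribution reliability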
bounds = HessianErrorBounds(model).compute(attributions, clean_tokens, corr_tokens, t, d)
print(bounds.approximation_reliable)  # True

# Causal hypothesis test
scrubber = CausalScrubbing(model)
result   = scrubber.evaluate(CircuitHypothesis.from_wang2022_ioi(), prompt, corr, t, d)
print(result.cs_score)  # 0.89 — strong

# Find where concept lives
das    = DistributedAlignmentSearch(model, concept_dims=4)
result = das.search("IO_name", clean_toks, cf_toks, t, d, target_layer=9)
print(result.concept_encoded)  # True