v4.2.0 — ACDC + GQA/RMSNorm Multi-Arch + Cross-Model Comparison

designer-coderajay released this 03 Apr 22:40 · 178 commits to main since this release

Glassbox v4.2.0 — ACDC + GQA/RMSNorm Multi-Arch + Cross-Model Comparison

Extends Glassbox from 18 to 21 mathematical frameworks by adding three modules: ACDC circuit discovery, a GQA/RMSNorm multi-architecture adapter, and cross-model comparison.

New Modules

glassbox/acdc.py — AutomatedCircuitDiscovery (Conmy et al. NeurIPS 2023, arXiv:2304.14997)

Full ACDC algorithm with exact KL-divergence edge-level circuit pruning. For each directed edge (sender → receiver) in topological order, patches the sender's per-head residual-stream contribution with the corrupted activation and measures KL(p_patched ‖ p_clean). Edges with KL < τ=0.10 are pruned; the retained edges form the minimal faithful circuit.

from glassbox import AutomatedCircuitDiscovery

acd    = AutomatedCircuitDiscovery(model, threshold=0.10)
result = acd.discover(clean_tokens, corrupted_tokens)
print(result.summary())
# ACDC Circuit: 18/144 edges retained | KL=0.023 | Faithful=True | τ=0.100
print(result.faithfulness_grade())  # STRONG | PARTIAL | WEAK
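
The pruning rule described for glassbox/acdc.py can be illustrated with a small, self-contained sketch. It is not the Glassbox implementation: the helper names (`kl_divergence`, `prune_edges`), the edge labels, and the toy output distributions are hypothetical, and each edge is scored independently here, whereas the full algorithm walks edges in topological order on a real model. The argument order follows the KL(p_patched ‖ p_clean) convention stated above.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum p(x) * (log p(x) - log q(x)) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # eps guards against log(0); it is an assumption of this sketch.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def prune_edges(p_clean, patched_by_edge, tau=0.10):
    """Keep an edge only if patching it shifts the output by at least tau."""
    kept = {}
    for edge, p_patched in patched_by_edge.items():
        kl = kl_divergence(p_patched, p_clean)
        if kl >= tau:  # edges with KL < tau are pruned
            kept[edge] = kl
    return kept

# Toy next-token distributions over a 3-token vocabulary (made up for illustration).
p_clean = [0.7, 0.2, 0.1]
patched_by_edge = {
    ("L0H1", "L1H3"): [0.20, 0.50, 0.30],  # patching changes the output a lot
    ("L0H2", "L1H3"): [0.69, 0.21, 0.10],  # patching barely changes the output
}

kept = prune_edges(p_clean, patched_by_edge)
for edge, kl in kept.items():
    print(edge, round(kl, 3))
```

In this toy case only the first edge survives, because patching the second edge leaves the output distribution almost unchanged.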

glassbox/multi_arch.py — MultiArchAdapter (GQA + RMSNorm)

Architecture registry for 11 model families, auto-detected from the TransformerLens config. GQAAttentionMapper splits each KV head's attribution score equally across the G query heads that share it (1/G each). RMSNormFolding absorbs γ into W_Q/K/V; unlike LayerNorm, RMSNorm has no bias term to fold.

Supported: gpt2, llama-2, llama-3, llama-3-70b, mistral, phi-2, phi-3, gemma, pythia, gpt-j, qwen2.

from glassbox import MultiArchAdapter
from transformer_lens import HookedTransformer

model   = HookedTransformer.from_pretrained("meta-llama/Llama-3-8B")
adapter = MultiArchAdapter.from_model(model)
report  = adapter.architecture_report()
adjusted = adapter.adjust_attributions_for_gqa(raw_attributions)
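
The equal-split GQA redistribution can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the `GQAAttentionMapper` code: the function name `redistribute_kv_scores` is hypothetical, and it assumes query heads are grouped contiguously, so that query head i shares KV head i // G.

```python
import numpy as np

def redistribute_kv_scores(q_scores, kv_scores):
    """Add each KV head's score to its G sharing query heads, 1/G each."""
    q_scores = np.asarray(q_scores, dtype=float)
    kv_scores = np.asarray(kv_scores, dtype=float)
    n_q, n_kv = q_scores.shape[0], kv_scores.shape[0]
    if n_q % n_kv != 0:
        raise ValueError("n_q must be a multiple of n_kv")
    group = n_q // n_kv  # G = heads_per_kv_group
    # np.repeat maps KV head j onto query heads j*G .. j*G + G - 1 (assumed layout).
    return q_scores + np.repeat(kv_scores, group) / group

# 8 query heads sharing 2 KV heads, so G = 4.
q_scores = np.zeros(8)
kv_scores = np.array([4.0, 2.0])
adjusted_scores = redistribute_kv_scores(q_scores, kv_scores)
print(adjusted_scores)
```

Because each KV score is divided by G before being added G times, the total attribution mass is unchanged by the redistribution.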

glassbox/cross_model.py — CrossModelComparison

Analyses models one at a time and compares the results. Pairwise Jaccard similarity is computed on normalised (layer/n_layers, head/n_heads) circuit positions with 10×10 grid binning, and Pearson r on normalised attribution vectors. Consensus heads are those that appear in ≥50% of models. Memory-safe: explicit del model + gc.collect() between loads.

from glassbox import compare_models, ModelAnalysisConfig

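# configs is assumed to be a list of ModelAnalysisConfig objects, one per model (construction not shown)
report = compare_models(configs, top_k_circuit=10)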
print(report.attribution_table())
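
To make the binned Jaccard comparison concrete, here is a minimal sketch in plain Python. The helper names (`bin_positions`, `jaccard`) and the two example circuits are hypothetical and are not taken from glassbox/cross_model.py; the sketch only assumes that head positions are normalised by model depth and width and then dropped into a 10×10 grid, as described above.

```python
def bin_positions(heads, n_layers, n_heads, n_bins=10):
    """Map (layer, head) pairs to cells of an n_bins x n_bins grid."""
    cells = set()
    for layer, head in heads:
        # Normalise to [0, 1), then scale to a bin index; clamp as a safeguard.
        row = min(int(layer / n_layers * n_bins), n_bins - 1)
        col = min(int(head / n_heads * n_bins), n_bins - 1)
        cells.add((row, col))
    return cells

def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two made-up circuits from models of different sizes.
circuit_a = [(9, 6), (9, 9), (10, 0)]    # 12 layers x 12 heads
circuit_b = [(18, 8), (19, 12), (5, 1)]  # 24 layers x 16 heads

bins_a = bin_positions(circuit_a, n_layers=12, n_heads=12)
bins_b = bin_positions(circuit_b, n_layers=24, n_heads=16)
similarity = jaccard(bins_a, bins_b)
print(similarity)
```

Binning is what lets heads from models with different layer and head counts land in the same cell; a finer grid would make the comparison stricter.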

Mathematical Foundation

  • ACDC edge KL: KL(p ‖ q) = Σ p(x)·(log p(x) − log q(x)); τ = 0.10; faithful if KL < 0.80
  • GQA redistribution: score[q] += kv_score / heads_per_kv_group (equal split)
  • RMSNorm folding: W_Q^folded = diag(γ) · W_Q; bias_ratio = 0.0 (no β in RMSNorm)
  • Cross-model Jaccard: sim(C1, C2) = |bin(C1) ∩ bin(C2)| / |bin(C1) ∪ bin(C2)|; bin_size = 0.1
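
The RMSNorm folding identity in the list above can be checked numerically. The sketch below is an assumption-laden illustration rather than the `RMSNormFolding` code: it uses random data, a hypothetical `rms_normalize` helper, and the row-vector convention x @ W_Q with W_Q of shape (d_model, d_head).

```python
import numpy as np

def rms_normalize(x, eps=1e-6):
    """RMSNorm without the learned gain: x / sqrt(mean(x^2) + eps)."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
d_model, d_head = 8, 4
x = rng.normal(size=(3, d_model))        # three residual-stream vectors
gamma = rng.normal(size=d_model)         # RMSNorm gain; there is no bias β
W_Q = rng.normal(size=(d_model, d_head))

# Unfolded path: apply the gain, then project.
unfolded = (rms_normalize(x) * gamma) @ W_Q

# Folded path: absorb the gain into the weights, W_Q^folded = diag(γ) · W_Q.
W_Q_folded = np.diag(gamma) @ W_Q
folded = rms_normalize(x) @ W_Q_folded

print(np.allclose(unfolded, folded))
```

The same construction applies to W_K and W_V, and because RMSNorm has no β there is no additional bias to carry into the projection.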

EU AI Act

  • Art. 13(1): ACDC provides exact causal edge evidence for compliance reports
  • Art. 15(1): Cross-model comparison validates circuit stability across architectures
  • Art. 10: MultiArchAdapter enables consistent analysis across training distributions

Upgrade

pip install --upgrade glassbox-mech-interp
# or pin exact version
pip install glassbox-mech-interp==4.2.0

Framework Count

21/21 mathematical frameworks (18 from v4.1.0 + 3 from v4.2.0)