OOM fix — acdc.py: run_with_cache had no names_filter, caching all 50+
hooks per layer. Added a names_filter restricted to hook_z and hook_resid_pre,
cutting cache VRAM from several GB to just the activations ACDC actually reads.
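A minimal sketch of a callable filter for this fix. It assumes TransformerLens
hook names of the form blocks.{layer}.attn.hook_z and blocks.{layer}.hook_resid_pre.
acdc_names_filter is a hypothetical name, not necessarily what acdc.py uses.
run_with_cache calls the filter with each hook name and caches only the hooks
for which it returns True.

```python
# Hypothetical helper: suffixes match TransformerLens hook names such as
# "blocks.3.attn.hook_z" and "blocks.3.hook_resid_pre".
ACDC_HOOK_SUFFIXES = ("attn.hook_z", "hook_resid_pre")


def acdc_names_filter(name: str) -> bool:
    """Return True only for the hooks ACDC reads; all other hooks are skipped."""
    return name.endswith(ACDC_HOOK_SUFFIXES)


# Usage with a TransformerLens model (not executed here):
#   logits, cache = model.run_with_cache(tokens, names_filter=acdc_names_filter)

if __name__ == "__main__":
    hook_names = [
        "blocks.0.hook_resid_pre",
        "blocks.0.attn.hook_q",
        "blocks.0.attn.hook_pattern",
        "blocks.0.attn.hook_z",
        "blocks.0.mlp.hook_post",
        "blocks.0.hook_resid_post",
    ]
    kept = [n for n in hook_names if acdc_names_filter(n)]
    print(kept)  # ['blocks.0.hook_resid_pre', 'blocks.0.attn.hook_z']
```

Passing a callable rather than a fixed list of names keeps the filter independent
of the model's layer count.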
KL float64 — acdc.py: _test_edge_kl and _circuit_kl summed 128k float32
terms for large-vocab models (Llama-3). Both now upcast to float64 before
summing, recovering ~3 digits of KL precision.
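A stdlib-only sketch of why the upcast helps. It emulates a left-to-right float32
accumulator by rounding every partial sum to float32 with struct, then compares
it with a float64 accumulator, using math.fsum as the reference. The
distributions are synthetic and the helper names are hypothetical. PyTorch's
own reductions typically use tree-style (pairwise or blocked) summation, so this
sequential accumulator overstates the real float32 error. It shows the direction
of the effect, not the ~3 digits measured for this commit.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def kl_terms(p, q):
    """Per-token KL(p || q) terms p_i * log(p_i / q_i), computed in float64."""
    return [pi * math.log(pi / qi) for pi, qi in zip(p, q)]


def sequential_sum(terms, float32: bool) -> float:
    """Left-to-right sum; with float32=True every partial sum is rounded to float32."""
    acc = 0.0
    for t in terms:
        if float32:
            acc = to_f32(acc + to_f32(t))
        else:
            acc += t
    return acc


def normalize(weights):
    total = math.fsum(weights)
    return [w / total for w in weights]


VOCAB = 128_256  # Llama-3 vocabulary size

# Synthetic, deterministic distributions (illustrative only, not model outputs).
p = normalize([1.0 + (i % 7) / 10 for i in range(VOCAB)])
q = normalize([1.0 + (i % 11) / 10 for i in range(VOCAB)])

terms = kl_terms(p, q)
reference = math.fsum(terms)  # correctly rounded float64 sum
err32 = abs(sequential_sum(terms, float32=True) - reference)
err64 = abs(sequential_sum(terms, float32=False) - reference)
print(f"KL={reference:.10e}  float32 err={err32:.2e}  float64 err={err64:.2e}")

# In PyTorch the analogous change is calling .double() on the per-token
# terms (or on the logits) before .sum().
```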
Edge-count warning — acdc.py: Llama-3-8B has ~540k candidate edges and
Llama-3-70B ~13.4M. Added an upfront logger.warning when the edge count exceeds
100k, pointing users to the MFC algorithm for large models.
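A sketch of the upfront check, assuming a standard logging logger. The function
name, logger name, and message text are hypothetical rather than copied from
acdc.py. Only the 100k threshold and the MFC recommendation come from this
commit.

```python
import logging

logger = logging.getLogger("acdc_sketch")  # hypothetical logger name

EDGE_WARNING_THRESHOLD = 100_000  # threshold stated in this commit


def warn_if_too_many_edges(n_edges: int, threshold: int = EDGE_WARNING_THRESHOLD) -> bool:
    """Log a warning before ACDC starts if the candidate-edge count is large.

    Returns True when the warning was emitted.
    """
    if n_edges > threshold:
        logger.warning(
            "ACDC: %d candidate edges (> %d). Edge-by-edge testing will be slow "
            "at this scale; consider the MFC algorithm for large models.",
            n_edges,
            threshold,
        )
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    warn_if_too_many_edges(540_000)     # ~540k edges: warns
    warn_if_too_many_edges(13_400_000)  # ~13.4M edges: warns
    warn_if_too_many_edges(32_000)      # hypothetical small model: no warning
```

Checking before the search starts means the user sees the warning immediately
instead of after a long, partially completed run.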
enable_grad — core.py: IG attribution silently returned zero gradients when
analyze() was called inside torch.no_grad(). Each IG step is now wrapped in
torch.enable_grad(), so gradient tracking is on regardless of the caller's grad mode.
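A minimal sketch of the pattern, using a midpoint-rule Integrated Gradients loop
over a toy function rather than the code in core.py. The function and argument
names are hypothetical. Without the inner torch.enable_grad(), a forward pass
under an outer torch.no_grad() records no graph, so there is nothing to
differentiate.

```python
from typing import Callable

import torch


def integrated_gradients(
    f: Callable[[torch.Tensor], torch.Tensor],
    x: torch.Tensor,
    baseline: torch.Tensor,
    steps: int = 32,
) -> torch.Tensor:
    """Midpoint-rule Integrated Gradients of a scalar-valued f from baseline to x.

    Each step runs under torch.enable_grad(), so the function also works when
    the caller has disabled grad mode with torch.no_grad().
    """
    grad_sum = torch.zeros_like(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        with torch.enable_grad():
            # Fresh leaf on the straight-line path; f(point) records a graph here.
            point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
            (grad,) = torch.autograd.grad(f(point), point)
        grad_sum += grad  # plain accumulation; no graph needed
    return (x - baseline) * grad_sum / steps


def toy_f(t: torch.Tensor) -> torch.Tensor:
    """Toy scalar function; its exact IG attribution from a zero baseline is t**2."""
    return (t ** 2).sum()


if __name__ == "__main__":
    x = torch.tensor([1.0, 2.0, 3.0])
    with torch.no_grad():  # mimics analyze() being called inside no_grad
        attr = integrated_gradients(toy_f, x, torch.zeros_like(x))
    print(attr)
```

The midpoint rule is used so that, for this quadratic toy function, the
attributions equal x**2 up to float rounding and sum to f(x) - f(baseline).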
dtype kwarg — core.py: GlassboxV2 string constructor now accepts dtype= for
bf16/fp16 loading of large models, e.g. GlassboxV2("llama-3-8b", device="cuda",
dtype=torch.bfloat16). The value is passed through to HookedTransformer.from_pretrained().
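A back-of-the-envelope sketch of why the dtype kwarg matters: weights-only
memory at 4 bytes per parameter (float32) versus 2 (bf16/fp16). The 8e9
parameter count is a round figure, not Llama-3-8B's exact size. Activations,
caches, and framework overhead are ignored.

```python
# Bytes per parameter for the dtypes mentioned in this commit.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Weights-only memory in GB (1e9 bytes); excludes activations and overhead."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 8e9  # round figure for an 8B-parameter model
    for dt in ("float32", "bfloat16"):
        print(dt, weight_memory_gb(n, dt), "GB")
    # float32 32.0 GB
    # bfloat16 16.0 GB
```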
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>