
v4.2.4

@designer-coderajay tagged this 04 Apr 06:40
OOM fix (acdc.py): run_with_cache was called without a names_filter, so it cached all
50+ hooks per layer. Added a names_filter restricted to hook_z and hook_resid_pre,
cutting VRAM usage from several GB to only the activations ACDC actually reads.
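
A minimal sketch of such a filter, assuming TransformerLens hook names of the form blocks.{layer}.attn.hook_z and blocks.{layer}.hook_resid_pre. The helper acdc_names_filter is hypothetical, not the actual function in acdc.py. run_with_cache accepts a callable mapping a hook name to a bool as its names_filter, so only matching activations are stored.

```python
# Hypothetical helper, not the actual acdc.py code: keep only the hooks ACDC reads.
ACDC_HOOK_SUFFIXES = ("hook_z", "hook_resid_pre")


def acdc_names_filter(name: str) -> bool:
    """Return True for TransformerLens hook names that ACDC needs cached."""
    return name.endswith(ACDC_HOOK_SUFFIXES)


# Example hook names from one layer of a HookedTransformer.
layer0_hooks = [
    "blocks.0.hook_resid_pre",
    "blocks.0.ln1.hook_scale",
    "blocks.0.attn.hook_q",
    "blocks.0.attn.hook_pattern",
    "blocks.0.attn.hook_z",
    "blocks.0.hook_mlp_out",
    "blocks.0.hook_resid_post",
]
kept = [name for name in layer0_hooks if acdc_names_filter(name)]
print(kept)  # ['blocks.0.hook_resid_pre', 'blocks.0.attn.hook_z']

# With a loaded model (not run here):
# _, cache = model.run_with_cache(tokens, names_filter=acdc_names_filter)
```

Everything else in the layer (attention patterns, MLP outputs, LayerNorm scales) is simply never written to the cache.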

KL float64 (acdc.py): _test_edge_kl and _circuit_kl summed ~128k float32 terms (one
per vocabulary entry) for large-vocabulary models such as Llama-3. Both now cast the
terms to float64 before summation, recovering ~3 digits of KL precision.
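
A minimal PyTorch sketch of the idea, assuming KL(P_clean || P_corrupt) is computed from two logit tensors over the last (vocabulary) dimension. The function name kl_with_float64_sum is hypothetical; it mirrors the note by computing per-entry terms in the input dtype and upcasting only for the reduction.

```python
import torch


def kl_with_float64_sum(clean_logits: torch.Tensor, corrupt_logits: torch.Tensor) -> torch.Tensor:
    """KL(P_clean || P_corrupt) over the last (vocab) dim, summed in float64."""
    log_p = torch.log_softmax(clean_logits, dim=-1)
    log_q = torch.log_softmax(corrupt_logits, dim=-1)
    # Per-vocab-entry terms are computed in the input dtype (e.g. float32) ...
    terms = log_p.exp() * (log_p - log_q)
    # ... but upcast before the ~128k-term reduction so rounding error
    # does not accumulate in float32.
    return terms.to(torch.float64).sum(dim=-1)


torch.manual_seed(0)
vocab = 128_256  # Llama-3 vocabulary size
clean = torch.randn(2, vocab)
corrupt = torch.randn(2, vocab)
kl = kl_with_float64_sum(clean, corrupt)
print(kl.dtype, kl)
```

Upcasting only the terms keeps the log_softmax in the original dtype; upcasting the logits first would also make the softmax float64, at twice the memory.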

Edge-count warning (acdc.py): Llama-3-8B has ~540k candidate edges and Llama-3-70B
~13.4M. ACDC now logs an upfront logger.warning when the candidate-edge count exceeds
100k, directing users to the MFC algorithm for large models.

enable_grad (core.py): integrated-gradients (IG) attribution silently returned zero
gradients when analyze() was called inside torch.no_grad(). Each IG step is now wrapped
in torch.enable_grad(), so gradient tracking is forced regardless of the outer context.
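
A minimal sketch of the pattern, assuming a right-Riemann-sum IG with a zero baseline; integrated_gradients is a hypothetical helper, not the core.py implementation. Because torch.enable_grad() overrides an enclosing torch.no_grad(), each step still builds a graph and torch.autograd.grad returns real gradients.

```python
import torch


def integrated_gradients(f, x, baseline=None, steps=32):
    """Right-Riemann-sum integrated gradients of scalar f at input x.

    Each step runs under torch.enable_grad(), so this works even when the
    caller wrapped the whole call in torch.no_grad().
    """
    if baseline is None:
        baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for k in range(1, steps + 1):
        alpha = k / steps
        with torch.enable_grad():  # overrides an enclosing no_grad()
            point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
            out = f(point).sum()
            (grad,) = torch.autograd.grad(out, point)
        total_grad += grad
    # Attribution = (x - baseline) * average gradient along the path.
    return (x - baseline) * total_grad / steps


w = torch.tensor([1.0, 2.0, 3.0])
x = torch.tensor([1.0, 1.0, 2.0])
with torch.no_grad():  # the outer context that previously broke IG
    attr = integrated_gradients(lambda t: (w * t).sum(), x)
print(attr)  # tensor([1., 2., 6.])
```

For a linear function the gradient is w at every step, so the attribution is exactly w * x, which makes the no_grad case easy to verify.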

dtype kwarg (core.py): the GlassboxV2 string constructor now accepts dtype= for loading
large models in bf16/fp16, e.g. GlassboxV2("llama-3-8b", device="cuda",
dtype=torch.bfloat16). The value is passed through to HookedTransformer.from_pretrained().
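
A minimal sketch of the pass-through pattern. GlassboxLike is a hypothetical stand-in, not the real GlassboxV2 class. HookedTransformer.from_pretrained does accept device and dtype keyword arguments; the loader parameter exists here only so the sketch runs without downloading weights.

```python
from typing import Any, Callable, Optional

import torch


class GlassboxLike:
    """Hypothetical stand-in showing how a string constructor can forward dtype."""

    def __init__(
        self,
        model_name: str,
        device: str = "cpu",
        dtype: Optional[torch.dtype] = None,
        loader: Optional[Callable[..., Any]] = None,
    ) -> None:
        if loader is None:
            # Default: the real TransformerLens loader.
            from transformer_lens import HookedTransformer
            loader = HookedTransformer.from_pretrained
        kwargs: dict = {"device": device}
        # Only forward dtype when given, so the loader's own default applies otherwise.
        if dtype is not None:
            kwargs["dtype"] = dtype
        self.model = loader(model_name, **kwargs)


def fake_loader(name: str, **kwargs: Any) -> tuple:
    """Records its arguments instead of loading weights."""
    return name, kwargs


g = GlassboxLike("llama-3-8b", device="cuda", dtype=torch.bfloat16, loader=fake_loader)
print(g.model)  # ('llama-3-8b', {'device': 'cuda', 'dtype': torch.bfloat16})
```

Omitting dtype when it is None leaves from_pretrained's own default (float32) in effect, so existing callers see no change.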

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>