Add `return_cache` option to `TransformerBridge.generate` by RecreationalMath · Pull Request #1337 · TransformerLensOrg/TransformerLens

RecreationalMath · 2026-05-27T21:30:47Z

Description

Adds an opt-in `return_cache` flag to `TransformerBridge.generate()`. When `return_cache=True`, `generate` returns `(output, cache)`, where `cache` is a standard `ActivationCache` over the full prompt + generated sequence, identical to `run_with_cache(output)`. This closes the gap described in #697, where `run_with_cache` only covers the prompt and `generate` returns no activations. A `names_filter` argument lets callers scope the cache, and a `device` argument offloads the returned cache to another device (e.g. CPU). A cache over prompt + `max_new_tokens` positions can be large, so the docstring notes the memory cost.

The semantics are "recompute one clean forward pass over the generated sequence", so the cache is consistent with the rest of TransformerLens, includes attention patterns and all hook points, and avoids the cached-eager-attention path behind #1322. For a causal LM this is numerically identical to capturing during generation (verified), without the ragged per-step shapes. This PR covers single-sequence, decoder-only text generation; encoder-decoder, SSM, multimodal, batched, and `inputs_embeds` inputs raise a clear error pointing to the `run_with_cache`-on-output workaround. Capturing during generation (for active-hook/steering scenarios) remains available by wrapping `generate` in `with model.hooks(...)` and can be added later as an explicit opt-in.
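The equivalence claim above rests on causality: the activation at position i depends only on tokens 0..i, so appending tokens never changes earlier positions. The toy sketch below illustrates that argument in plain Python. The `forward` and `next_token` functions are hypothetical stand-ins, not TransformerLens code, and integer arithmetic sidesteps the floating-point details a real model would involve.

```python
from typing import List, Tuple


def forward(tokens: List[int]) -> List[int]:
    """Toy causal 'model': the activation at position i depends only on tokens[0..i]."""
    acts, running = [], 0
    for i, tok in enumerate(tokens):
        running = (running * 31 + tok + i) % 1009
        acts.append(running)
    return acts


def next_token(last_act: int) -> int:
    """Toy greedy decoding from the last position's activation."""
    return last_act % 50


def generate_capturing_per_step(prompt: List[int], max_new_tokens: int) -> Tuple[List[int], List[List[int]]]:
    """Capture during generation: one activation list per step, with growing lengths."""
    tokens, per_step = list(prompt), []
    for _ in range(max_new_tokens):
        acts = forward(tokens)
        per_step.append(acts)  # ragged: each step is one position longer
        tokens.append(next_token(acts[-1]))
    return tokens, per_step


def recompute_cache(tokens: List[int]) -> List[int]:
    """Recompute semantics: one clean forward over prompt + generated tokens."""
    return forward(tokens)


tokens, per_step = generate_capturing_per_step([3, 14, 15], max_new_tokens=4)
cache = recompute_cache(tokens)
# Every per-step capture is a prefix of the single recomputed cache.
prefixes_match = all(step == cache[: len(step)] for step in per_step)
```

The single recomputed cache has one uniform shape over the whole sequence, whereas the per-step captures would need stitching together before they could be indexed the same way.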

Fixes #697

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

…ge.generate

generate(return_cache=True) now also returns an ActivationCache for the full prompt + generated sequence, identical to run_with_cache(output), via one clean recompute forward over the output. Adds names_filter and device passthroughs to scope and offload the cache. Supported for single-sequence decoder-only text generation; encoder-decoder, SSM, multimodal, batched, and inputs_embeds inputs raise a clear NotImplementedError pointing to run_with_cache. Device offload moves cache_dict directly to avoid ActivationCache.to's spurious move_model DeprecationWarning.
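The unsupported-input behaviour in this commit message could be expressed as a guard along the following lines. This is a sketch only: `check_return_cache_supported`, its parameters, and the error wording are hypothetical and are not taken from the PR's diff; only the list of unsupported cases and the use of `NotImplementedError` come from the message above.

```python
def check_return_cache_supported(*, is_encoder_decoder=False, is_ssm=False,
                                 is_multimodal=False, batch_size=1,
                                 uses_inputs_embeds=False) -> None:
    """Raise NotImplementedError for configurations the commit lists as unsupported."""
    unsupported = {
        "encoder-decoder": is_encoder_decoder,
        "SSM": is_ssm,
        "multimodal": is_multimodal,
        "batched": batch_size > 1,
        "inputs_embeds": uses_inputs_embeds,
    }
    reasons = [name for name, hit in unsupported.items() if hit]
    if reasons:
        raise NotImplementedError(
            "return_cache=True is not supported for: " + ", ".join(reasons)
            + ". Call run_with_cache on the generate() output instead."
        )


# Single-sequence decoder-only text generation passes silently.
check_return_cache_supported()
```

Failing early with a pointer to the workaround keeps unsupported callers from receiving a cache whose shape or contents would be misleading.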

RecreationalMath · 2026-05-27T21:36:58Z

Heads-up on a small follow-up, not a blocker for this PR.

The `device=` offload here moves the cache tensors directly (`cache.cache_dict = {k: v.to(device) ...}`) rather than passing `device=` into `run_with_cache`. That is deliberate, because `run_with_cache(device=)` currently moves the whole model to the cache device and never restores it (filed as #1336). The direct move is correct and side-effect-free, but it offloads the cache only after it is built, so it does not reduce peak memory.
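The direct move described here can be sketched as follows. `FakeTensor` and `FakeCache` are hypothetical stand-ins that model only a `.to(device)` method and a `cache_dict` attribute, so the snippet runs without PyTorch; with real tensors the dictionary comprehension would have the same form.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor: only .to(device) is modelled."""

    def __init__(self, data, device="cuda"):
        self.data, self.device = data, device

    def to(self, device):
        # Returns a copy on the target device, leaving the original untouched.
        return FakeTensor(self.data, device)


class FakeCache:
    """Hypothetical stand-in for a cache object exposing a cache_dict mapping."""

    def __init__(self, cache_dict):
        self.cache_dict = cache_dict


def offload_cache(cache, device):
    """Move only the cached tensors; the model itself is never touched."""
    cache.cache_dict = {k: v.to(device) for k, v in cache.cache_dict.items()}
    return cache


cache = FakeCache({"hook_a": FakeTensor([1.0]), "hook_b": FakeTensor([2.0])})
offload_cache(cache, "cpu")
devices = sorted({t.device for t in cache.cache_dict.values()})
```

Because the move happens after the full cache exists, every entry was resident on the original device at once, which is why this approach leaves peak memory unchanged.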

Once #1336 is fixed, this can be simplified to a `run_with_cache(output_tokens, names_filter=names_filter, device=device)` passthrough, which offloads at capture time and therefore lowers peak memory for large caches.
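A rough accounting sketch of why offloading at capture time lowers the peak. The two functions and the size units are hypothetical, and the model deliberately ignores weights and transient activations; it only counts cached entries resident on the model device.

```python
from typing import List


def peak_offload_after_build(sizes: List[int]) -> int:
    """Build the whole cache on the model device, then move it afterwards."""
    on_device, peak = 0, 0
    for size in sizes:
        on_device += size  # each captured activation stays on the device
        peak = max(peak, on_device)
    # Moving entries afterwards frees memory, but the peak has already occurred.
    return peak


def peak_offload_at_capture(sizes: List[int]) -> int:
    """Move each activation to the cache device as soon as it is captured."""
    peak = 0
    for size in sizes:
        peak = max(peak, size)  # only the current activation is resident
    return peak


sizes = [4, 4, 8, 4]  # arbitrary units per cached activation
after_build = peak_offload_after_build(sizes)
at_capture = peak_offload_at_capture(sizes)
```

Under this simplified accounting the post-hoc move peaks at the sum of all entries, while capture-time offload peaks at the largest single entry.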

I'd recommend keeping this as a self-contained version and switching to the passthrough in a small follow-up PR once #1336 lands. If you would rather hold this PR until #1336 is in and use the passthrough here directly, let me know, @jlarson4.

jlarson4 · 2026-05-28T13:55:25Z

I agree with your assessment here and think it is fine to merge this as-is with the temporary solution. Typically, I'd ask you to add a note to #1336 saying that its fix should also update this code, but since you're the one handling that issue, I'll trust that you'll take care of it.

Thank you for the thorough investigation of both this issue and the new one you discovered! Great work.

RecreationalMath added 3 commits May 28, 2026 01:35

Add tests for TransformerBridge.generate return_cache

1ba61ae

Make return_cache tests macOS-safe (use_past_kv_cache=False)

b62664f

RecreationalMath mentioned this pull request May 28, 2026

[Bug Report] ActivationCache.to() always warns "move_model is deprecated", even when not passed #1342

Open

1 task

jlarson4 merged commit f676d8a into TransformerLensOrg:dev May 28, 2026
24 checks passed

jlarson4 merged 3 commits into TransformerLensOrg:dev from RecreationalMath:return-cache-in-generate
