Fixing memory leak in Joint QKV Attention Bridge by jlarson4 · Pull Request #1229 · TransformerLensOrg/TransformerLens

jlarson4 · 2026-04-02T21:48:26Z

Description

#1043 introduced a flaw in the deepcopy call originally added in #960.

Before #1043, deepcopy(blocks_template) was cheap because the template's submodules held no bound methods referencing the adapter. Once split_qkv_matrix became a bound adapter method stored on the attention submodule, each deepcopy followed that method back to the adapter and transitively copied the entire adapter (and everything it references), once per attention layer.
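The mechanism can be reproduced in plain Python, without TransformerLens or PyTorch. The sketch below is illustrative only: `Adapter`, `Attention`, and the `weights` list are hypothetical stand-ins, not the actual bridge classes. It relies on the standard library behaviour that `copy.deepcopy` of a bound method also deep-copies the object the method is bound to.

```python
import copy


class Adapter:
    """Hypothetical stand-in for the architecture adapter."""

    def __init__(self):
        # Stand-in for the large state the real adapter references.
        self.weights = list(range(1000))

    def split_qkv_matrix(self, x):
        return x, x, x


class Attention:
    """Hypothetical stand-in for an attention submodule."""

    def __init__(self, split_fn):
        # Storing a bound method keeps a reference to the adapter.
        self.split_qkv_matrix = split_fn


adapter = Adapter()
template = Attention(adapter.split_qkv_matrix)

# One deepcopy per layer, as with deepcopy(blocks_template).
layers = [copy.deepcopy(template) for _ in range(4)]

# Each copy's bound method now points at its own private Adapter copy.
owners = {id(layer.split_qkv_matrix.__self__) for layer in layers}
print(len(owners))  # 4
```

Because every `deepcopy` call starts with a fresh memo, nothing is shared between layers: four layers produce four full copies of the adapter in addition to the original.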

This affected all architectures that used the JointQKVAttentionBridge class: bloom, neox, pythia, and qwen.

The fix overrides __deepcopy__ on JointQKVAttentionBridge so that each copy shares a reference to the original split_qkv_matrix and config rather than creating its own copy in memory.
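A minimal sketch of that kind of override is shown below. It is not the code merged in this PR: `JointAttentionSketch`, the `hooks` attribute, and the plain-Python `Adapter` are hypothetical, and the real class is a module with more state to handle. The idea is only that `config` and `split_qkv_matrix` are assigned by reference while everything else is still deep-copied.

```python
import copy


class Adapter:
    """Hypothetical stand-in for the architecture adapter."""

    def __init__(self):
        self.weights = list(range(1000))

    def split_qkv_matrix(self, x):
        return x, x, x


class JointAttentionSketch:
    """Hypothetical stand-in for JointQKVAttentionBridge."""

    # Attributes shared by reference instead of being deep-copied.
    _SHARED = ("config", "split_qkv_matrix")

    def __init__(self, config, split_qkv_matrix):
        self.config = config
        self.split_qkv_matrix = split_qkv_matrix
        self.hooks = ["hook_a"]  # per-layer state that should still be copied

    def __deepcopy__(self, memo):
        cls = self.__class__
        clone = cls.__new__(cls)
        memo[id(self)] = clone  # register early to handle reference cycles
        for name, value in self.__dict__.items():
            if name in self._SHARED:
                setattr(clone, name, value)  # share the original object
            else:
                setattr(clone, name, copy.deepcopy(value, memo))
        return clone


adapter = Adapter()
template = JointAttentionSketch({"n_heads": 8}, adapter.split_qkv_matrix)
clones = [copy.deepcopy(template) for _ in range(4)]

# Every clone still points at the single original adapter.
print(all(c.split_qkv_matrix.__self__ is adapter for c in clones))  # True
```

Sharing by reference is only safe if the shared objects are treated as read-only after construction, which is the assumption this sketch makes about `config` and `split_qkv_matrix`.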

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

* Fixing bug with deepcopy that was unintentionally copying all weights in every layer of Joint QKV Attention Bridge
* Additional model testing to confirm fix was not negatively impacting models

jlarson4 added 2 commits April 2, 2026 16:30

Fixing bug with deepcopy that was unintentionally copying all weights in every layer of Joint QKV Attention Bridge

f5b59f4

Additional model testing to confirm fix was not negatively impacting models

007d57c

jlarson4 merged commit 407501e into dev-3.x-canary Apr 2, 2026
18 checks passed

jlarson4 merged 2 commits into dev-3.x-canary from bug/joint-qkb-attention-memory-leak
