Fixing memory leak in Joint QKV Attention Bridge#1229

Merged
jlarson4 merged 2 commits into dev-3.x-canary from bug/joint-qkb-attention-memory-leak
Apr 2, 2026

Conversation

@jlarson4
Collaborator

@jlarson4 jlarson4 commented Apr 2, 2026

Description

#1043 introduced a flaw in the deepcopy initially added in #960.

Before #1043, deepcopy(blocks_template) was cheap because the template's submodules did not hold bound methods referencing the adapter. Once split_qkv_matrix became a bound adapter method stored on the attention submodule, each deepcopy transitively copied the entire adapter (and everything it references) once per attention layer.
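The mechanism can be demonstrated in isolation: deep-copying a bound method also deep-copies the object it is bound to (its `__self__`). The `Adapter` class below is a hypothetical stand-in for the real adapter, not the project's actual code; it is a minimal sketch of why storing a bound method on a submodule makes `deepcopy` expensive.

```python
import copy

class Adapter:
    """Hypothetical stand-in for the adapter; holds large state."""

    def __init__(self):
        self.weights = [0.0] * 1_000_000  # stand-in for model weights

    def split_qkv_matrix(self, qkv):
        return qkv  # placeholder for the real splitting logic

adapter = Adapter()
bound = adapter.split_qkv_matrix  # a bound method references the adapter

copied = copy.deepcopy(bound)
# deepcopy of a bound method deep-copies __self__, i.e. the whole adapter,
# so every attention layer holding such a method duplicates all that state:
assert copied.__self__ is not adapter
assert len(copied.__self__.weights) == len(adapter.weights)
```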

This affected all architectures that used the JointQKVAttentionBridge class: bloom, neox, pythia, and qwen.

The solution was to override __deepcopy__ on JointQKVAttentionBridge so that each copy shares a reference to the original split_qkv_matrix and config, rather than creating its own copies in memory.
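The override described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the attribute names `split_qkv_matrix` and `config` come from the PR description, while `layer_state` and the plain-class structure are assumptions for the sake of a self-contained example.

```python
import copy

class JointQKVAttentionBridge:
    """Minimal sketch of the fix described in the PR."""

    def __init__(self, split_qkv_matrix, config, layer_state):
        self.split_qkv_matrix = split_qkv_matrix  # bound adapter method: share, don't copy
        self.config = config                      # config object: share, don't copy
        self.layer_state = layer_state            # per-layer state: still deep-copied

    def __deepcopy__(self, memo):
        cls = self.__class__
        new = cls.__new__(cls)
        memo[id(self)] = new  # register early to handle reference cycles
        for name, value in self.__dict__.items():
            if name in ("split_qkv_matrix", "config"):
                # Share by reference so the copy does not transitively pull
                # in the whole adapter once per attention layer.
                setattr(new, name, value)
            else:
                setattr(new, name, copy.deepcopy(value, memo))
        return new
```

With this in place, copies made per layer share the adapter-bound method and config while still getting independent per-layer state:

```python
bridge = JointQKVAttentionBridge(print, {"n_heads": 8}, {"w": [1, 2, 3]})
clone = copy.deepcopy(bridge)
assert clone.split_qkv_matrix is bridge.split_qkv_matrix  # shared
assert clone.config is bridge.config                      # shared
assert clone.layer_state is not bridge.layer_state        # independent copy
```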

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4 jlarson4 merged commit 407501e into dev-3.x-canary Apr 2, 2026
18 checks passed
jlarson4 added a commit that referenced this pull request Apr 2, 2026
* Fixing bug with deepcopy that was unintentionally copying all weights in every layer of Joint QKV Attention Bridge

* Additional model testing to confirm fix was not negatively impacting models
