
refactor: optimized covariance transform in ExpectedAttentionPress #111

Merged
alessiodevoto merged 1 commit into NVIDIA:main from neuralsorcerer:patch-1
Aug 7, 2025
Conversation

@neuralsorcerer
Contributor

Changes:

  • Compute the per-head query covariance directly in the projected query space, avoiding the intermediate $O((n \cdot d)^2)$ hidden-state covariance tensor.

Why?

  • A quick benchmark with n=32, d=128, seq_len=64 showed the old method taking ~1.87 s and storing 33,554,432 elements, vs. ~0.04 s and 1,048,576 elements for this approach: about 32x less memory and ~50x faster.
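The idea above can be sketched as follows. This is a minimal NumPy illustration of the technique, not the KVPress implementation: the function name, shapes, and the query projection `W_q` are hypothetical stand-ins. Instead of forming a covariance over the full hidden dimension (an (n*d) x (n*d) tensor), the hidden states are projected to queries first, split into heads, and a small d x d covariance is computed per head.

```python
import numpy as np

def per_head_query_cov(h, W_q, n_heads, head_dim):
    """Per-head covariance of projected queries.

    h:   (seq_len, n_heads * head_dim) hidden states
    W_q: (n_heads * head_dim, n_heads * head_dim) query projection
    Returns an (n_heads, head_dim, head_dim) tensor -- n * d^2 elements,
    instead of the (n*d)^2 elements of a hidden-state-space covariance.
    """
    q = h @ W_q                           # project: (seq, n*d)
    q = q.reshape(-1, n_heads, head_dim)  # split heads: (seq, n, d)
    q = q - q.mean(axis=0, keepdims=True) # center each head's queries
    # Sum outer products over the sequence axis, one d x d block per head.
    return np.einsum("snd,sne->nde", q, q) / q.shape[0]

# Toy shapes (the PR's benchmark used n=32, d=128, seq_len=64):
rng = np.random.default_rng(0)
n_heads, head_dim, seq_len = 4, 8, 16
hidden = n_heads * head_dim
h = rng.standard_normal((seq_len, hidden))
W_q = rng.standard_normal((hidden, hidden))
cov = per_head_query_cov(h, W_q, n_heads, head_dim)
print(cov.shape)  # (4, 8, 8)
```

Each per-head covariance is symmetric positive semi-definite by construction, and the memory footprint scales as n * d^2 rather than (n * d)^2, which matches the ~32x reduction reported in the benchmark.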

Signed-off-by: Soumyadip Sarkar <soumya.papanvk18@gmail.com>
Collaborator

@alessiodevoto left a comment


Hi @neuralsorcerer! I just reviewed and tested the code and I believe we can merge. Thanks for opening this PR and contributing to KVPress!

alessiodevoto merged commit e079b22 into NVIDIA:main on Aug 7, 2025
3 checks passed
neuralsorcerer deleted the patch-1 branch on August 7, 2025 09:04
maxjeblick pushed a commit that referenced this pull request Aug 12, 2025
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>

3 participants