Feat/Add Compactor press#143

Merged
maxjeblick merged 6 commits into NVIDIA:main from vnchari:feat/compactor-press
Nov 4, 2025

Conversation

@vnchari
Contributor

@vnchari vnchari commented Oct 17, 2025

PR description

Adds Compactor Press, from https://arxiv.org/abs/2507.08143. It does quite well, and ends up being reasonably lightweight (about the same inference runtime as SnapKV). So far the code only supports prefill (otherwise we would need to unrotate keys, and keep a long buffer of queries, which slows down decoding considerably), so I modified the test-runner to skip CompactorPress when testing the decoder press.

Please let me know if there are any issues! Thanks!

| Compression Ratio | 0.25 | 0.5 | 0.75 | 0.95 |
|---|---|---|---|---|
| Llama 3.1-8B | 94.0 | 87.7 | 77.8 | 63.5 |
| Qwen3-8B | 92.6 | 85.0 | 72.3 | 51.7 |
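
For readers unfamiliar with the compression ratio used in the results, here is a minimal sketch of how it could translate into a number of retained KV pairs. It assumes the ratio is the fraction of KV pairs removed during prefill, so a ratio of 0.95 keeps roughly 5% of the cache; the helper name `n_kept` and the 1000-token prompt are hypothetical and chosen only for illustration.

```python
def n_kept(seq_len: int, compression_ratio: float) -> int:
    """Number of KV pairs retained, assuming the ratio is the fraction removed."""
    if not 0.0 <= compression_ratio < 1.0:
        raise ValueError("compression_ratio must be in [0, 1)")
    # Truncate toward zero so the kept count never exceeds the budget.
    return int(seq_len * (1 - compression_ratio))


# Hypothetical 1000-token prompt at each ratio from the results above.
for ratio in (0.25, 0.5, 0.75, 0.95):
    print(f"ratio={ratio}: keep {n_kept(1000, ratio)} of 1000 KV pairs")
```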

Checklist

Before submitting a PR, please make sure:

  • [✅] Tests are working (make test)

  • [✅] Code is formatted correctly (make style, on errors try fix with make format)

  • [✅] Copyright header is included

  • [✅] All commits are signed-off using git commit -s (it says I didn't, but the commits show that I did. not sure what's wrong here)

  • [✅] (new press) mypress_press.py is in the presses directory

  • [✅] (new press) MyPress is in __init__.py

  • [✅] (new press) README.md is updated with a 1 liner about the new press in the Available presses section

  • [✅] (new press) New press is in the default_presses list in tests/default_presses.py

  • [✅] (new press) A docstring is provided that follows the same structure as the existing ones

@copy-pr-bot

copy-pr-bot Bot commented Oct 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@maxjeblick
Collaborator

Thanks a lot for this PR, the method looks quite interesting!
We will go through your paper and code this week and report back.

As for the DCO error, all Nvidia OS repositories require commits to be signed off.

Collaborator

@maxjeblick maxjeblick left a comment

Thanks a lot for the PR, it was a real pleasure reading the paper and code!

Regarding the code, it looks good to me overall; I left a few comments.

There's one suggestion to allow for additional experiments:
Since you are blending the z-normalized scores of two methods, it would make sense to introduce two independent scorer presses (one for l_scores, one for attn_scores) and define your press as a wrapper press that blends both base presses.
This way, it would be possible to combine the outlier detection with other presses defined in the library. We (the maintainers) can also work on this potential feature in a subsequent PR if you prefer to keep the press as is.
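
The blending suggested above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the PR: the helper names (`z_normalize`, `blend_scores`, `keep_top`), the equal blending weight of 0.5, and the toy score values are all hypothetical, and a real press would operate on tensors rather than lists.

```python
from statistics import mean, pstdev


def z_normalize(scores, eps=1e-6):
    """Shift scores to zero mean and scale them to (roughly) unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


def blend_scores(scores_a, scores_b, weight=0.5):
    """Weighted sum of two z-normalized score lists (weight is an assumption)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both scorers must score the same tokens")
    za = z_normalize(scores_a)
    zb = z_normalize(scores_b)
    return [weight * a + (1 - weight) * b for a, b in zip(za, zb)]


def keep_top(scores, n_kept):
    """Indices of the n_kept highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_kept])


# Toy per-token scores from two independent scorers (made-up values).
l_scores = [0.1, 0.9, 0.2, 0.8]
attn_scores = [10.0, 30.0, 50.0, 20.0]

blended = blend_scores(l_scores, attn_scores)
kept = keep_top(blended, n_kept=2)
print(kept)
```

Normalizing each scorer's output before summing keeps one scorer from dominating merely because its raw scores live on a larger scale, which is what makes it reasonable to swap either base scorer for another one.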

Comment thread kvpress/presses/compactor_press.py Outdated
Comment thread kvpress/presses/compactor_press.py Outdated
Comment thread kvpress/presses/compactor_press.py Outdated
@vnchari
Contributor Author

vnchari commented Oct 24, 2025

I'm glad you enjoyed the paper :) and thanks for the feedback. The paper should indeed say that we center the key states, and I will update it to do so. As you suggested, I split the press into two presses and added some clarifying comments + tests. Also, I have some code ready to go for "calibrated" compression as described in the paper (for any ScorerPress). Do you think it would be best to open a new PR for that, or add it here?

Please let me know if there are any issues! Thanks!

@maxjeblick
Collaborator

Also, I have some code ready to go for "calibrated" compression as described in the paper (for any ScorerPress). Do you think it would be best to open a new PR for that, or add it here?

This sounds great! IMO, it is best to open another PR implementing "calibrated" compression.

@maxjeblick
Collaborator

/ok to test a391627

Collaborator

@maxjeblick maxjeblick left a comment

LGTM, thanks a lot!

Regarding DCO:

  • You should be able to sign off the missing commits.
  • Otherwise, feel free to open a new PR with the same code (squashed into a single, signed commit).

Vivek Chari and others added 6 commits November 3, 2025 15:44
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
… generator to sketching matrix generation

Signed-off-by: Vivek Chari <viveknchari@gmail.com>
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
@vnchari vnchari force-pushed the feat/compactor-press branch from a391627 to 6552dff Compare November 3, 2025 20:44
@vnchari
Contributor Author

vnchari commented Nov 3, 2025

Done!

@maxjeblick
Collaborator

/ok to test 6552dff

@maxjeblick maxjeblick merged commit 9405be0 into NVIDIA:main Nov 4, 2025
3 checks passed