Feat/Add Compactor press#143
Conversation
|
Thanks a lot for this PR, the method looks quite interesting! As for the DCO error, all Nvidia OS repositories require commits to be signed off. |
maxjeblick
left a comment
There was a problem hiding this comment.
Thanks a lot for the PR, it was a real pleasure reading the paper and code!
Regarding the code, it looks good to me overall; I left a few comments.
There's one suggestion to allow for additional experiments:
Since you are blending the z-normalized scores of two methods, it would make sense to introduce two independent scorer presses (one for l_scores, one for attn_scores) and define your press as a wrapper press that blends both base presses.
By this, it would be possible to combine the outlier detection with other presses defined in the library. We (the maintainers) can also work on this potential feautre in a subseuqent PR if you prefer to keep the press as is.
|
I'm glad you enjoyed the paper :) and thanks for the feedback. The paper should indeed say that we center the key states, and I will update it to do so. As you suggested, I split the press into two presses and added some clarifying comments + tests. Also, I have some code ready to go for "calibrated" compression as described in the paper (for any ScorerPress). Do you think it would be best open a new PR for that, or add it here? Please let me know if there are any issues! Thanks! |
This sounds great! IMO, it is best to open another PR implementing "calibrated" compression. |
|
/ok to test a391627 |
maxjeblick
left a comment
There was a problem hiding this comment.
LGTM, thanks a lot!
Regarding DCO:
- You should be able to sign off the missing commits.
- Otherwise, feel free to open a new PR with the same code (stashed to a single, signed commit)
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
… generator to sketching matrix generation Signed-off-by: Vivek Chari <viveknchari@gmail.com>
Signed-off-by: Vivek Chari <viveknchari@gmail.com>
a391627 to
6552dff
Compare
|
Done! |
|
/ok to test 6552dff |
PR description
Adds Compactor Press, from https://arxiv.org/abs/2507.08143. It does quite well, and ends up being reasonably lightweight (about the same inference runtime as SnapKV). So far the code only supports prefill (otherwise we would need to unrotate keys, and keep a long buffer of queries, which slows down decoding considerably), so I modified the test-runner to skip CompactorPress when testing the decoder press.
Please let me know if there are any issues! Thanks!
Checklist
Before submitting a PR, please make sure:
[✅] Tests are working (
make test)[✅] Code is formatted correctly (
make style, on errors try fix withmake format)[✅] Copyright header is included
[✅] All commits are signed-off using
git commit -s(it says I didn't, but the commits show that I did. not sure what's wrong here)[✅] (new press)
mypress_press.pyis in thepressesdirectory[✅] (new press)
MyPressis in__init__.py[✅] (new press)
README.mdis updated with a 1 liner about the new press in the Available presses section[✅] (new press) New press is in the
default_presseslist intests/default_presses.py[✅] (new press) A docstring is provided that follows the same structure as the existing ones