Add Expected Attention with Stats #120
Conversation
maxjeblick
left a comment
Thanks a lot for opening the PR!
I've left some comments; the most important one is probably the one about how to compute mean query and mean covariance.
/ok to test
@maxjeblick, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test 24ef932
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
I have refactored the code significantly (sorry 😅). Before computing and uploading the stats to the hub, and before adding the press to the tests and the press list, I would like to hear your opinion :) Major changes:
Let me know wdyt @maxjeblick 🙂
Hi!
Thanks Max!
I had one comment (it is collapsed above) about the if-else logic returning
The folder structure looks fine; it is nice that it is self-contained.
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Got it, I simplified the if-else logic. Also, there was a bug in the computation (I was computing the stats but forgot to save them); it is fixed now.
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
/ok to test e4157b2
Update:
/ok to test 7f93961
Had a dependency error because of "fire"; fixed. All tests pass now 👌
Forgot to clean up some comments; now it should be /ok to test bc27dd8
/ok to test bc27dd8
PR description
So far, Expected Attention only supports models without QK norm (no support for Gemma and Qwen), and it requires computing the mean and covariance at inference time. This PR adds the option to load precomputed stats if they are available, or to compute them on the fly if not. There are two main contributions:
Checklist
make test
make style (on errors, try to fix with make format)
git commit -s
mypress_press.py is in the presses directory
MyPress is in __init__.py
README.md is updated with a 1 liner about the new press in the Available presses section
default_presses list in tests/default_presses.py