Bug
Hi there,
Thanks for the wonderful kvpress.
I'm reading the "Expected Attention" paper, which reports good accuracy on AIME2025.
However, evaluate_registry.py does not list any AIME dataset.
- First, to confirm: is running ExpectedAttn on AIME2025 supported, e.g. with DeepSeek-R1-Qwen-7b? Is there a script to reproduce Fig. 4?
- Or should I simply run the ExpectedAttn pipeline with the AIME25 questions as input?
- Since AIME25 has short prompts and long decoded outputs, does KV eviction happen on demand during the decode stage?
To Reproduce
No AIME dataset listed in evaluation/evaluate_registry.py
Repository version
master