Add AIME25 and Math500 benchmark dataset #142

Merged
Jack-Yu-815 merged 5 commits into main from max/aime25_eval
Oct 16, 2025

Collaborator

@maxjeblick maxjeblick commented Oct 14, 2025

PR description

This PR adds the AIME25 and Math500 datasets, which are used to evaluate the decoding press.
These datasets were used in https://arxiv.org/abs/2510.00636v1

This PR is based upon https://github.com/NVIDIA/kvpress/tree/aledev/decoding_eval
Fixes #141

Checklist

Before submitting a PR, please make sure:

  • Tests are working (make test)
  • Code is formatted correctly (make style; on errors, try fixing them with make format)
  • Copyright header is included
  • All commits are signed-off using git commit -s

Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>

copy-pr-bot Bot commented Oct 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Collaborator Author

/ok to test 96a15f5

Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>
Collaborator Author

/ok to test bbc026c

Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>
Collaborator Author

/ok to test 67e2856

Collaborator

@Jack-Yu-815 Jack-Yu-815 left a comment

Thanks Max. I don't see any issue.

@Jack-Yu-815 Jack-Yu-815 merged commit 6a9c828 into main Oct 16, 2025
3 checks passed
@Jack-Yu-815 Jack-Yu-815 deleted the max/aime25_eval branch October 16, 2025 22:15

Development

Successfully merging this pull request may close these issues.

AIME dataset support?

2 participants