BeyondAIME is an environment for evaluating advanced mathematical reasoning on 100 competition-level problems with difficulty at or above AIME problems #11-15. All problems have been manually revised to be unique and contamination-resistant, focusing on reasoning rather than domain-specific knowledge. Answers are integers, enabling unambiguous automated evaluation.
- Advanced competition-level mathematical reasoning
- Integer answer validation with exact match
- Problems spanning algebra, number theory, combinatorics, and geometry
Agents are given a standard environment with no sandbox or file system access.
CC0 1.0 Universal (Public Domain).
There is one split in this environment:
- test: 100 tasks
Each problem requires a single integer answer. Ground-truth answers range from 3 to 33,124,147.
Single-turn evaluation with deterministic grading. The agent submits an integer answer via the submit_answer tool. The submitted answer is compared via exact integer match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
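The grading rule described above can be sketched in a few lines. This is a minimal illustration, not the environment's actual implementation: the function names `parse_integer` and `grade` are hypothetical, and accepting a string with surrounding whitespace or thousands separators is an assumption made here for robustness, since the source only states that the submitted integer is compared by exact match.

```python
from typing import Optional


def parse_integer(raw: str) -> Optional[int]:
    """Parse a submitted answer as an integer, or return None if it is not one.

    Assumption: surrounding whitespace and comma thousands separators are
    tolerated. The real submit_answer tool may be stricter.
    """
    cleaned = raw.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        # Non-integer input such as "3.0" or "abc" is treated as unparseable.
        return None


def grade(submitted: str, ground_truth: int) -> float:
    """Return 1.0 on an exact integer match with the ground truth, else 0.0."""
    value = parse_integer(submitted)
    return 1.0 if value is not None and value == ground_truth else 0.0
```

Because the comparison is on integers rather than strings, there is no tolerance or partial credit: any unparseable or differing value scores 0.0.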
The data file, beyondaime_test.parquet (100 problems), is sourced from the Hugging Face dataset ByteDance-Seed/BeyondAIME and is stored on the OpenReward platform.
| Tool | Description |
|---|---|
| submit_answer | Submit an integer answer. Deterministic evaluation via exact match. Ends the episode. |
Single-turn. The agent reads the problem and submits one integer answer.
BeyondAIME is designed to be significantly harder than standard AIME problems, with difficulty at or above AIME #11-15. Evaluation results from ByteDance-Seed:
| Model | Accuracy |
|---|---|
| OpenAI o3-mini | 63.6% |
| Gemini 2.5 Pro | 58.8% |
| Seed-Thinking-v1.5 | 48.0% |
| DeepSeek R1 | 42.4% |
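Since each episode yields a reward of 1.0 or 0.0, an accuracy figure like those above can be computed as a mean of rewards. The sketch below assumes that several samples may be drawn per problem and that per-problem means are then averaged. The source does not state how the reported numbers were aggregated, so this procedure, the function name `accuracy_percent`, and the input layout are all illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List


def accuracy_percent(rewards_by_problem: Dict[str, List[float]]) -> float:
    """Average per-problem mean reward, expressed as a percentage.

    Assumption: keys are hypothetical problem identifiers, and each value is
    a non-empty list of 0.0/1.0 rewards, one per sampled attempt.
    """
    if not rewards_by_problem:
        raise ValueError("no rewards to aggregate")
    per_problem = [mean(rewards) for rewards in rewards_by_problem.values()]
    return 100.0 * mean(per_problem)


# Example: one problem solved in 1 of 2 attempts, another in 2 of 2.
example = {"p1": [1.0, 0.0], "p2": [1.0, 1.0]}
score = accuracy_percent(example)
```

With a single attempt per problem, this reduces to the fraction of the 100 test tasks answered correctly.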
There are no further environment requirements; BeyondAIME works out of the box with the OpenReward endpoint without any external API keys.
Agents in BeyondAIME solve advanced mathematics problems in a standard environment. The environment does not present direct safety risks.
@misc{beyondaime2025,
title={BeyondAIME: Advancing Math Reasoning Evaluation Beyond High School Olympiads},
author={ByteDance-Seed},
year={2025},
url={https://huggingface.co/datasets/ByteDance-Seed/BeyondAIME}
}