EnvCommons/BeyondAIME

BeyondAIME

Description

BeyondAIME is an environment for evaluating advanced mathematical reasoning on 100 competition-level problems with difficulty at or above AIME problems #11-15. All problems have been manually revised to be unique and contamination-resistant, focusing on reasoning rather than domain-specific knowledge. Answers are integers, enabling unambiguous automated evaluation.

Capabilities

  • Advanced competition-level mathematical reasoning
  • Integer answer validation with exact match
  • Problems spanning algebra, number theory, combinatorics, and geometry

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC0 1.0 Universal (Public Domain).

Tasks

There is one split in this environment:

  • test: 100 tasks

Each problem requires an integer answer (range: 3 to 33,124,147).

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits an integer answer via the submit_answer tool. The submitted answer is compared via exact integer match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
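The grading rule above can be sketched in a few lines. This is an illustrative approximation, not the environment's actual grader: the function name `grade` is hypothetical, and the tolerance for surrounding whitespace and thousands separators is an assumption not stated in the source.

```python
def grade(submitted, ground_truth: int) -> float:
    """Return 1.0 on an exact integer match with the ground truth, else 0.0.

    Hypothetical helper: the real submit_answer tool may parse input differently.
    """
    try:
        # Assumption: tolerate whitespace and thousands separators such as "33,124,147".
        value = int(str(submitted).strip().replace(",", ""))
    except ValueError:
        # Anything that does not parse as an integer counts as incorrect.
        return 0.0
    return 1.0 if value == ground_truth else 0.0
```

Because the comparison is on integers rather than strings, equivalent spellings of the same number score identically and there is no partial credit.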

Data

The data file beyondaime_test.parquet (100 problems) is sourced from the Hugging Face dataset ByteDance-Seed/BeyondAIME and stored on the OpenReward platform.

Tools

| Tool | Description |
|---|---|
| submit_answer | Submit an integer answer. Deterministic evaluation via exact match. Ends the episode. |

Time Horizon

Single-turn. The agent reads the problem and submits one integer answer.

Environment Difficulty

BeyondAIME is designed to be harder than a typical AIME problem: every problem is at or above the difficulty of AIME problems #11-15. Evaluation results reported by ByteDance-Seed:

| Model | Accuracy |
|---|---|
| OpenAI o3-mini | 63.6% |
| Gemini 2.5 Pro | 58.8% |
| Seed-Thinking-v1.5 | 48.0% |
| DeepSeek R1 | 42.4% |
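Accuracy here is the mean reward over the 100 tasks, expressed as a percentage. With binary rewards a single pass over 100 tasks gives a whole-number percentage, so fractional figures such as 63.6% presumably come from averaging several runs. That is an inference, not something the source states. A minimal sketch under that assumption, with hypothetical helper names:

```python
from statistics import mean


def accuracy(rewards):
    """Percentage of tasks solved in one run, given per-task rewards of 0.0 or 1.0."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    return 100.0 * sum(rewards) / len(rewards)


def averaged_accuracy(runs):
    """Mean of per-run accuracies (assumed protocol, not confirmed by the source)."""
    return mean(accuracy(run) for run in runs)


# Two illustrative runs over 100 tasks: 64 correct, then 63 correct.
runs = [[1.0] * 64 + [0.0] * 36, [1.0] * 63 + [0.0] * 37]
print(averaged_accuracy(runs))  # 63.5
```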

Other Environment Requirements

There are no further environment requirements; BeyondAIME works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in BeyondAIME solve advanced mathematics problems in a standard environment. The environment does not present direct safety risks.

Citation

@misc{beyondaime2025,
  title={BeyondAIME: Advancing Math Reasoning Evaluation Beyond High School Olympiads},
  author={ByteDance-Seed},
  year={2025},
  url={https://huggingface.co/datasets/ByteDance-Seed/BeyondAIME}
}
