AceReason-Math

Description

AceReason-Math is an environment for evaluating mathematical reasoning on a curated set of 49,585 math problems. The dataset was created by NVIDIA as part of the AceReason-Nemotron project, which uses reinforcement learning to advance math and code reasoning capabilities. Problems are sourced from the NuminaMath and DeepScaler-Preview datasets and filtered for quality, excluding problems with multiple sub-questions, multiple-choice and true/false questions, proofs, and figure-based problems. Each remaining problem has a short, verified answer.

Capabilities

Mathematical reasoning across diverse topics and difficulty levels
Multi-step problem solving
Algebraic manipulation and computation
Numerical answer extraction

Compute Requirements

AceReason-Math is a lightweight, single-turn environment. No sandbox or significant compute resources are required beyond the agent's own inference.

License

CC BY 4.0.

Tasks

There is one split in this environment:

train: 49,585 math problems sourced from NuminaMath and DeepScaler-Preview, filtered for quality by the NVIDIA team.

Each task presents the agent with a math problem statement. The agent must solve the problem and submit its answer using the `answer` tool.

Reward Structure

AceReason-Math uses a binary, deterministic reward:

1.0 if the submitted answer is correct
0.0 if the submitted answer is incorrect

Grading is performed using the math-verify library, which parses and verifies mathematical expressions for equivalence. No LLM grader is used.
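The binary reward can be sketched as a small function. This is a minimal sketch, not the environment's code: the real grader uses math-verify, while the hypothetical `simple_equivalent` helper below is a stdlib stand-in that only recognizes equal rational numbers (such as `0.5` and `1/2`) or identical strings after whitespace and `$` delimiters are removed.

```python
from fractions import Fraction
from typing import Callable, Optional


def _normalize(text: str) -> str:
    # Drop all whitespace and surrounding LaTeX `$` delimiters.
    # (math-verify does far more normalization than this.)
    return "".join(ch for ch in text if not ch.isspace()).strip("$")


def _as_fraction(text: str) -> Optional[Fraction]:
    """Parse integers, decimals, and simple a/b fractions; return None otherwise."""
    try:
        return Fraction(text)
    except (ValueError, ZeroDivisionError):
        return None


def simple_equivalent(submitted: str, ground_truth: str) -> bool:
    """Hypothetical stand-in for math-verify: numeric equality, else exact string match."""
    a, b = _normalize(submitted), _normalize(ground_truth)
    fa, fb = _as_fraction(a), _as_fraction(b)
    if fa is not None and fb is not None:
        return fa == fb
    return a == b


def compute_reward(
    submitted: str,
    ground_truth: str,
    equivalent: Callable[[str, str], bool] = simple_equivalent,
) -> float:
    """Binary, deterministic reward: 1.0 if the answers are equivalent, else 0.0."""
    return 1.0 if equivalent(submitted, ground_truth) else 0.0
```

For example, `compute_reward("0.5", "1/2")` returns `1.0`, while the stand-in returns `0.0` for `$\frac{1}{2}$` against `1/2`, a case math-verify is designed to handle. With the math-verify package installed, a closer approximation would pass something like `lambda s, g: verify(parse(g), parse(s))` as `equivalent`, using that library's `parse` and `verify` functions; consult its documentation for parsing options.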

Data

The 49,585 math problems are sourced from the nvidia/AceReason-Math dataset on Hugging Face. Each record contains a problem field (the problem statement) and an answer field (the ground-truth short answer). Data files are stored on the OpenReward platform.
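A record can be modeled as a two-field structure. The sketch below assumes, hypothetically, that the data files are JSON Lines with one `{"problem": ..., "answer": ...}` object per line; the actual storage format on the platform is not specified here, and `MathRecord` and `load_records` are illustrative names.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MathRecord:
    problem: str  # the problem statement shown to the agent
    answer: str   # the ground-truth short answer used for grading


def load_records(lines: Iterable[str]) -> List[MathRecord]:
    """Parse JSON Lines text into records, skipping blank lines.

    Raises KeyError if a row lacks the `problem` or `answer` field.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        # Coerce to str so numeric answers such as 5 become "5".
        records.append(MathRecord(problem=str(row["problem"]), answer=str(row["answer"])))
    return records
```

For instance, feeding it the made-up lines `'{"problem": "What is 2 + 3?", "answer": "5"}'` and `'{"problem": "Simplify 6/8.", "answer": "3/4"}'` (illustrations, not dataset rows) yields two `MathRecord` objects. Keeping the answer as a string leaves all equivalence checking to the grader.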

Tools

AceReason-Math exposes a single tool:

| Tool | Parameters | Description |
|---|---|---|
| `answer` | `answer: str` | Submits the agent's final answer. The answer is parsed and verified against the ground truth using math-verify. This call ends the episode. |

Time Horizon

AceReason-Math is a single-turn environment. The agent receives the problem in the prompt and is expected to make exactly one tool call, `answer`, to submit its solution.
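The single-turn protocol can be pictured as a tiny state machine: the episode holds one problem, the `answer` tool grades the submission once, and any further call is rejected. This is a hypothetical sketch, not the environment's server code; `SingleTurnEpisode` and `ToolResult` are illustrative names, and grading here is a whitespace-insensitive string comparison standing in for math-verify.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    reward: float   # 1.0 for a correct answer, 0.0 otherwise
    finished: bool  # True once the episode has ended


class SingleTurnEpisode:
    def __init__(self, problem: str, ground_truth: str) -> None:
        self.problem = problem
        self._ground_truth = ground_truth
        self.finished = False

    def prompt(self) -> str:
        # The agent sees the problem statement in its initial prompt.
        return self.problem

    def answer(self, answer: str) -> ToolResult:
        """The `answer` tool: grade the submission and end the episode."""
        if self.finished:
            raise RuntimeError("episode already ended; only one answer call is allowed")
        self.finished = True
        # Stand-in check; the real environment verifies equivalence with math-verify.
        correct = "".join(answer.split()) == "".join(self._ground_truth.split())
        return ToolResult(reward=1.0 if correct else 0.0, finished=True)
```

A call such as `SingleTurnEpisode("What is 2 + 3?", "5").answer("5")` returns `ToolResult(reward=1.0, finished=True)`; a second `answer` call on the same episode raises, mirroring the one-call time horizon.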

Environment Difficulty

The dataset spans a wide range of difficulty levels, from straightforward arithmetic to competition-level mathematics.

Other Environment Requirements

There are no further environment requirements; AceReason-Math works out of the box with the OpenReward endpoint without any external API keys.

Safety

AceReason-Math is a purely mathematical evaluation environment. The agent solves well-defined math problems with known correct answers and does not interact with external systems, APIs, or other agents. The environment presents no direct or indirect safety risks.

Citation

@article{chen2025acereason,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2505.16400},
  year={2025}
}
