AceReason-Math is an environment for evaluating mathematical reasoning on a curated set of 49,585 math problems. The dataset was created by NVIDIA as part of the AceReason-Nemotron project, which uses reinforcement learning to advance math and code reasoning capabilities. Problems are sourced from the NuminaMath and DeepScaler-Preview datasets and filtered for quality: problems with multiple sub-questions, multiple-choice problems, true/false questions, proofs, and figure-based problems are excluded. Each problem has a short verified answer.
- Mathematical reasoning across diverse topics and difficulty levels
- Multi-step problem solving
- Algebraic manipulation and computation
- Numerical answer extraction
AceReason-Math is a lightweight, single-turn environment. No sandbox or significant compute resources are required beyond the agent's own inference.
There is one split in this environment:
- train: 49,585 math problems sourced from NuminaMath and DeepScaler-Preview, filtered for quality by the NVIDIA team.
Each task presents the agent with a math problem statement. The agent must solve the problem and submit its answer using the answer tool.
AceReason-Math uses a binary, deterministic reward:
- 1.0 if the submitted answer is correct
- 0.0 if the submitted answer is incorrect
Grading is performed using the math-verify library, which parses and verifies mathematical expressions for equivalence. No LLM grader is used.
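The binary reward described above can be sketched as follows. This is a minimal illustration, not the environment's actual grader: the function name `binary_reward` is hypothetical, and the exact-match fallback exists only so the sketch runs when math-verify is not installed. The `parse` and `verify` calls are the math-verify library's public entry points, used here with their default settings.

```python
def binary_reward(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the two answers are judged equivalent, else 0.0.

    Hypothetical helper for illustration; the environment's own grading
    code may configure math-verify differently.
    """
    try:
        from math_verify import parse, verify
    except ImportError:
        # Assumption: fall back to whitespace-trimmed exact match so the
        # sketch still runs without the library. This is NOT equivalence
        # checking and is only a stand-in.
        return 1.0 if submitted.strip() == ground_truth.strip() else 0.0

    gold = parse(ground_truth)  # parsed ground-truth expression(s)
    pred = parse(submitted)     # parsed submitted expression(s)
    return 1.0 if verify(gold, pred) else 0.0


print(binary_reward("42", "42"))
print(binary_reward("41", "42"))
```

Because the reward depends only on the two strings, the same submission always receives the same score.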
The 49,585 math problems are sourced from the nvidia/AceReason-Math dataset on Hugging Face. Each record contains a problem field (the problem statement) and an answer field (the ground-truth short answer). Data files are stored on the OpenReward platform.
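As a rough sketch of the record shape, the two fields can be modeled with a small dataclass. The `MathRecord` and `parse_record` names are hypothetical, the JSON Lines encoding is an assumption about storage (the source does not state the file format), and the example record is made up for illustration rather than taken from the dataset.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MathRecord:
    """One dataset record: a problem statement and its short answer."""
    problem: str
    answer: str


def parse_record(line: str) -> MathRecord:
    """Parse one JSON object (assumed encoding) into a MathRecord."""
    raw = json.loads(line)
    # Both fields are required; a missing key raises KeyError.
    return MathRecord(problem=str(raw["problem"]), answer=str(raw["answer"]))


# Made-up example record, not an actual dataset entry.
example = parse_record('{"problem": "What is 6 * 7?", "answer": "42"}')
print(example.problem, "->", example.answer)
```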
AceReason-Math exposes a single tool:
| Tool | Parameters | Description |
|---|---|---|
| answer | answer: str | Submits the agent's final answer. The answer is parsed and verified against the ground truth using math-verify. This call ends the episode. |
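For illustration, the tool's interface might be described with a JSON-Schema-style definition like the one below. The dictionary layout and the `validate_call` helper are assumptions made for this sketch; the source does not specify the wire format the platform uses for tool definitions.

```python
# Hypothetical JSON-Schema-style description of the single tool.
ANSWER_TOOL = {
    "name": "answer",
    "description": "Submit the final answer. This call ends the episode.",
    "parameters": {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
}


def validate_call(arguments: dict) -> str:
    """Check a tool call's arguments and return the answer string."""
    if set(arguments) != {"answer"}:
        raise ValueError("expected exactly one argument: 'answer'")
    value = arguments["answer"]
    if not isinstance(value, str):
        raise TypeError("'answer' must be a string")
    return value


print(validate_call({"answer": "42"}))
```

Passing the answer as a string keeps the tool agnostic to notation, leaving parsing to the grader.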
AceReason-Math is a single-turn environment. The agent receives the problem in the prompt and is expected to make exactly one tool call (answer) to submit its solution.
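The single-turn flow can be sketched as a tiny state machine: the prompt is read once, one `answer` call is graded, and the episode is then closed. The `SingleTurnEpisode` class and the `exact_match` grader are hypothetical stand-ins for this sketch; the real environment grades with math-verify rather than string comparison.

```python
from typing import Callable, Optional


class SingleTurnEpisode:
    """Hypothetical model of one episode: one prompt, one answer call."""

    def __init__(self, problem: str, ground_truth: str,
                 grader: Callable[[str, str], bool]) -> None:
        self.problem = problem
        self._ground_truth = ground_truth
        self._grader = grader
        self.done = False
        self.reward: Optional[float] = None

    def answer(self, answer: str) -> float:
        """Grade the submission, record the reward, and end the episode."""
        if self.done:
            raise RuntimeError("episode already ended")
        self.reward = 1.0 if self._grader(answer, self._ground_truth) else 0.0
        self.done = True
        return self.reward


def exact_match(submitted: str, truth: str) -> bool:
    # Placeholder grader for the sketch only.
    return submitted.strip() == truth.strip()


episode = SingleTurnEpisode("What is 6 * 7?", "42", exact_match)
print(episode.answer("42"), episode.done)
```

Rejecting a second call makes the "exactly one tool call" expectation explicit in the sketch.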
The dataset spans a wide range of difficulty levels, from straightforward arithmetic to competition-level mathematics.
There are no further environment requirements; AceReason-Math works out of the box with the OpenReward endpoint without any external API keys.
AceReason-Math is a purely mathematical evaluation environment. The agent solves well-defined math problems with known correct answers and does not interact with external systems, APIs, or other agents. The environment presents no direct or indirect safety risks.
@article{chen2025acereason,
title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
journal={arXiv preprint arXiv:2505.16400},
year={2025}
}