Skywork-OR1-RL-Data is an environment for evaluating agents on mathematical reasoning and code generation tasks. It is based on the Skywork-OR1 RL training dataset from Skywork AI, consisting of 105,000 math problems and 14,112 code problems curated from diverse open-source datasets including NuminaMath, DeepScaler, and competitive programming collections.
The environment has two variants:
- skyworkmath: Single-turn math problem solving with rule-based answer verification via math_verify
- skyworkcode: Multi-step code generation with sandbox test execution
Together, the variants cover:
- Mathematical reasoning across competition-level problems (olympiads, AMC/AIME, Chinese contests)
- Code generation with stdin/stdout test case verification
- Rule-based verifiable rewards (no LLM grader needed for either variant)
The math variant does not require a sandbox and has minimal compute requirements. The code variant provides agents with a sandbox (0.5 CPU, 1GB RAM) for code development and execution.
License: Apache 2.0 (matching the original dataset license).
| Variant | Split | Tasks |
|---|---|---|
| skyworkmath | train | ~99,750 |
| skyworkmath | test | ~5,250 |
| skyworkcode | train | ~13,400 |
| skyworkcode | test | ~700 |
Math tasks: Each task presents a competition-level math problem. The agent submits an answer via the answer tool, which is verified using the math_verify library against one or more acceptable ground truth answers.
Code tasks: Each task presents a programming problem. The agent uses CLI tools to develop a Python solution that reads from stdin and writes to stdout, then submits it via the submit tool which runs it against hidden test cases.
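A submitted solution is just a Python script that reads stdin and writes stdout. A minimal sketch, for a hypothetical problem ("read N, then N integers; print their sum") of our own invention:

```python
import sys

def solve(raw: str) -> str:
    """Hypothetical problem: first token is N, then N integers; output their sum."""
    tokens = raw.split()
    n = int(tokens[0])
    return str(sum(int(t) for t in tokens[1:1 + n]))

def main() -> None:
    raw = sys.stdin.read()
    if raw.strip():  # guard against empty input
        sys.stdout.write(solve(raw) + "\n")

if __name__ == "__main__":
    main()
```

Keeping the logic in a pure solve() function makes it easy to test locally in the sandbox before calling submit.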
Math variant: Binary reward (0 or 1). The answer is verified using rule-based math_verify comparison against ground truth.
Code variant: Proportional reward (0.0 to 1.0) based on fraction of test cases passed: passed / total.
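The two reward rules above can be sketched in a few lines (helper names are ours, for illustration):

```python
def code_reward(results: list[bool]) -> float:
    """Proportional reward: results[i] is True iff hidden test case i passed."""
    if not results:
        return 0.0
    return sum(results) / len(results)

def math_reward(correct: bool) -> float:
    """Binary reward for the math variant: 1 if verified correct, else 0."""
    return 1.0 if correct else 0.0
```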
We do not use LLM graders for this environment.
Problems are sourced from the Skywork/Skywork-OR1-RL-Data HuggingFace dataset. Math problems originate from NuminaMath-1.5, DeepScaler, and STILL collections. Code problems include competitive programming challenges. The dataset includes model-aware difficulty scores from DeepSeek-R1-Distill variants (1.5B, 7B, 32B).
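Model-aware difficulty scores make curriculum-style filtering straightforward. A minimal sketch, assuming a hypothetical per-task "difficulty" mapping from model name to measured pass rate (the real dataset's field names may differ):

```python
def filter_by_difficulty(tasks: list[dict], model: str,
                         lo: float = 0.1, hi: float = 0.9) -> list[dict]:
    """Keep tasks the given model solves sometimes but not always.

    Assumes each task record carries a hypothetical 'difficulty' dict
    mapping a model name to its pass rate in [0, 1].
    """
    return [t for t in tasks if lo <= t["difficulty"].get(model, 0.0) <= hi]
```

Dropping problems that are trivially solved or never solved keeps the training signal informative, which is the stated purpose of the model-aware scores.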
Math variant (1 tool):
- answer: Submit a final answer for rule-based verification

Code variant (10 tools):
- CLI tools: bash, glob, grep, ls, read, write, edit, multi_edit, todo_write
- submit: Submit a solution file for execution against hidden test cases
The math variant is single-turn (one tool call per task). The code variant is multi-step, allowing iterative development and testing before final submission.
The code variant requires an OpenReward API key for sandbox provisioning. The math variant has no additional requirements.
Agents only solve mathematical problems or run code inside a sandbox; no real-world systems are affected. Code execution is sandboxed with resource limits (0.5 CPU, 1 GB RAM) and per-test-case timeouts (10 seconds).
@article{he2025skywork,
title={Skywork Open Reasoner 1 Technical Report},
author={Jujie He and Jiacai Liu and Chris Yuhao Liu and Rui Yan and Chaojie Wang and Peng Cheng and Xiaoyu Zhang and Fuxiang Zhang and Jiacheng Xu and Wei Shen and Siyuan Li and Liang Zeng and Tianwen Wei and Cheng Cheng and Bo An and Yang Liu and Yahui Zhou},
journal={arXiv preprint arXiv:2505.22312},
year={2025}
}