
Skywork-OR1-RL-Data


Description

Skywork-OR1-RL-Data is an environment for evaluating agents on mathematical reasoning and code generation tasks. It is built on the Skywork-OR1 RL training dataset from Skywork AI, which comprises 105,000 math problems and 14,112 code problems curated from diverse open-source datasets, including NuminaMath, DeepScaler, and competitive programming collections.

The environment has two variants:

  • skyworkmath: Single-turn math problem solving with rule-based answer verification via math_verify
  • skyworkcode: Multi-step code generation with sandbox test execution

Capabilities

  • Mathematical reasoning across competition-level problems (olympiads, AMC/AIME, Chinese contests)
  • Code generation with stdin/stdout test case verification
  • Rule-based verifiable rewards (no LLM grader needed for either variant)

Compute Requirements

The math variant does not require a sandbox and has minimal compute requirements. The code variant provides agents with a sandbox (0.5 CPU, 1GB RAM) for code development and execution.

License

Apache 2.0 (matching the original dataset license).

Tasks

Variant       Split   Tasks
skyworkmath   train   ~99,750
skyworkmath   test    ~5,250
skyworkcode   train   ~13,400
skyworkcode   test    ~700

Math tasks: Each task presents a competition-level math problem. The agent submits an answer via the answer tool, which is verified using the math_verify library against one or more acceptable ground truth answers.

Code tasks: Each task presents a programming problem. The agent uses CLI tools to develop a Python solution that reads from stdin and writes to stdout, then submits it via the submit tool which runs it against hidden test cases.
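As a minimal sketch of the expected solution shape, here is a submission for a hypothetical problem (read n and then n integers, output their sum); actual problems and formats vary:

```python
def solve(text: str) -> str:
    """Hypothetical problem: first token is n, followed by n integers.
    Output their sum. Real skyworkcode problems differ."""
    data = text.split()
    n = int(data[0])
    return str(sum(map(int, data[1:1 + n])))

# In the submitted file, the entry point would read stdin and write stdout:
# import sys; print(solve(sys.stdin.read()))
```

The agent can iterate on such a file with the CLI tools and test it in the sandbox before calling submit.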

Reward Structure

Math variant: Binary reward (0 or 1). The answer is verified using rule-based math_verify comparison against ground truth.
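The real check is performed by the math_verify library. As a rough illustration of what rule-based binary grading looks like, here is a heavily simplified stand-in (not math_verify's actual logic) that strips basic LaTeX wrappers and compares answers as exact rationals where possible:

```python
from fractions import Fraction

def normalize(ans: str):
    """Simplified stand-in for math_verify normalization: strip $...$
    wrappers, handle \\frac{a}{b}, and parse exact rationals."""
    s = ans.strip().strip("$").replace("\\frac", "").strip()
    # After the replace above, \frac{a}{b} appears as {a}{b}.
    if s.startswith("{") and "}{" in s and s.endswith("}"):
        num, den = s[1:-1].split("}{", 1)
        return Fraction(int(num), int(den))
    try:
        return Fraction(s)
    except ValueError:
        return s  # fall back to literal string comparison

def reward(submitted: str, ground_truths: list[str]) -> int:
    # Binary reward: 1 if the answer matches any acceptable ground truth.
    return int(any(normalize(submitted) == normalize(g) for g in ground_truths))
```

For example, a submission of $\frac{1}{2}$ would match a ground truth of 0.5, since both normalize to the same rational; the real library handles far more notation (sets, intervals, symbolic expressions).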

Code variant: Proportional reward (0.0 to 1.0) based on fraction of test cases passed: passed / total.

We do not use LLM graders for this environment.

Data

Problems are sourced from the Skywork/Skywork-OR1-RL-Data HuggingFace dataset. Math problems originate from NuminaMath-1.5, DeepScaler, and STILL collections. Code problems include competitive programming challenges. The dataset includes model-aware difficulty scores from DeepSeek-R1-Distill variants (1.5B, 7B, 32B).

Tools

Math variant (1 tool):

  • answer: Submit a final answer for rule-based verification

Code variant (10 tools):

  • CLI tools: bash, glob, grep, ls, read, write, edit, multi_edit, todo_write
  • submit: Submit a solution file for execution against hidden test cases

Time Horizon

The math variant is single-turn (one tool call per task). The code variant is multi-step, allowing iterative development and testing before final submission.

Other Environment Requirements

The code variant requires an OpenReward API key for sandbox provisioning. The math variant has no additional requirements.

Safety

Agents interact only with mathematical problems or write code in a sandboxed environment. No real-world systems are affected. Code execution is sandboxed with resource limits (0.5 CPU, 1GB RAM) and per-test-case timeouts (10 seconds).

Citations

@article{he2025skywork,
  title={Skywork Open Reasoner 1 Technical Report},
  author={Jujie He and Jiacai Liu and Chris Yuhao Liu and Rui Yan and Chaojie Wang and Peng Cheng and Xiaoyu Zhang and Fuxiang Zhang and Jiacheng Xu and Wei Shen and Siyuan Li and Liang Zeng and Tianwen Wei and Cheng Cheng and Bo An and Yang Liu and Yahui Zhou},
  journal={arXiv preprint arXiv:2505.22312},
  year={2025}
}
