Add top-level enable_thinking and reasoning_effort to RL config #526

Merged
JannikSt merged 2 commits into main from feature/rl-reasoning-controls on Apr 23, 2026

JannikSt commented on Apr 16, 2026

Adds the two hosted-RL reasoning controls that platform#1390 introduced in the backend as top-level fields on RLConfig, so users no longer have to guess the nested sampling.extra_body.chat_template_kwargs path.

  • enable_thinking: bool (Qwen3.5 / Nemotron)
  • reasoning_effort: "low" | "medium" | "high" (GPT-OSS)
  • Mutual exclusion is validated client-side, mirroring the backend schema
  • Both fields are forwarded 1:1 in create_run; the server resolves the path for the model family and trainer image version
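A minimal sketch of the field shapes and the mutual-exclusion rule listed above. RLReasoningControls is a hypothetical class, not the actual RLConfig, and it assumes that "set" means "not None", so an explicit enable_thinking=False still counts as set.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# Allowed reasoning_effort values, taken from the PR description.
ReasoningEffort = Literal["low", "medium", "high"]


@dataclass
class RLReasoningControls:
    """Hypothetical stand-in for the two new top-level RLConfig fields.

    None means "not set" (assumption: the real config treats None the same way).
    """

    enable_thinking: Optional[bool] = None  # Qwen3.5 / Nemotron
    reasoning_effort: Optional[ReasoningEffort] = None  # GPT-OSS

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal types, so check the value by hand.
        allowed = get_args(ReasoningEffort)
        if self.reasoning_effort is not None and self.reasoning_effort not in allowed:
            raise ValueError(
                f"reasoning_effort must be one of {allowed}, got {self.reasoning_effort!r}"
            )
        # Client-side mirror of the backend rule: at most one control may be set.
        if self.enable_thinking is not None and self.reasoning_effort is not None:
            raise ValueError(
                "enable_thinking and reasoning_effort are mutually exclusive; set only one"
            )


# Usage: each control alone is accepted; setting both raises ValueError.
qwen_style = RLReasoningControls(enable_thinking=False)
gpt_oss_style = RLReasoningControls(reasoning_effort="high")
```

Defaulting both fields to None keeps "unset" distinct from enable_thinking=False, which matters because explicitly disabling thinking is a meaningful setting in its own right.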

Note

Low Risk
Low risk: adds optional config fields and forwards them to the existing run-creation API, with a small new validation rule that only errors when both options are set.

Overview
Adds two optional, top-level hosted RL reasoning controls to RLConfig (enable_thinking and reasoning_effort), including client-side mutual-exclusion validation and an updated rl init template and docs.

Updates RLClient.create_run and the prime rl run flow to display these settings and forward them 1:1 in the /rft/runs payload, and adds tests covering config loading and rejection when both controls are provided.
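A sketch of the 1:1 forwarding into the /rft/runs payload. build_reasoning_fields and describe_reasoning are hypothetical helpers, not RLClient.create_run itself. The sketch assumes unset controls are omitted from the request body rather than sent as null; the real client may differ. Because the server resolves the model-family and trainer-image-version path, the client only copies the values under their own names.

```python
from typing import Any, Dict, Optional


def build_reasoning_fields(
    enable_thinking: Optional[bool] = None,
    reasoning_effort: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: reasoning fields to merge into a /rft/runs request body.

    Values are copied 1:1 under their own names; None means "not set" and the
    key is left out (assumption).
    """
    fields: Dict[str, Any] = {}
    if enable_thinking is not None:
        # Keep an explicit False: disabling thinking is a real setting.
        fields["enable_thinking"] = enable_thinking
    if reasoning_effort is not None:
        fields["reasoning_effort"] = reasoning_effort
    return fields


def describe_reasoning(fields: Dict[str, Any]) -> str:
    """Hypothetical one-line summary a run flow might print before submitting."""
    if not fields:
        return "Reasoning: not set"
    return "Reasoning: " + ", ".join(f"{k}={v}" for k, v in fields.items())
```

A caller would merge the result into the rest of the request body, for example `{**other_run_fields, **build_reasoning_fields(enable_thinking=False)}`, where other_run_fields is a placeholder for whatever else the run request carries.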

Reviewed by Cursor Bugbot for commit 2dec5c6. Bugbot is set up for automated code reviews on this repo.

Cursor Bugbot left a review comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 43cbb1a.

Comment thread: packages/prime/src/prime_cli/commands/rl.py
JannikSt merged commit ed3bc8f into main on Apr 23, 2026
19 of 20 checks passed
JannikSt deleted the feature/rl-reasoning-controls branch on April 23, 2026 at 18:00