[Docs] Small update on docs by CharlieFRuan · Pull Request #1348 · NovaSky-AI/SkyRL

CharlieFRuan · 2026-03-19T06:27:51Z

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

gemini-code-assist

Code Review

This pull request adds a helpful clarification to the step-wise-training.mdx documentation, explaining how the structure of GeneratorOutput changes for step-wise training. The change correctly states that each element in the output lists corresponds to a single step rather than a full trajectory. I've included one suggestion to make the wording slightly more precise for improved clarity.

gemini-code-assist · 2026-03-19T06:29:09Z


 ## GeneratorOutput Format

+Normally, each element in `GeneratorOutput` (i.e. `response_ids[i]`, `prompt_token_ids[i]`, `rewards[i]`, etc.) represents a single trajectory. With step-wise training, each element instead represents a single **step** (one LLM turn within a trajectory). A trajectory with 3 turns produces 3 elements rather than 1.


This explanation is very helpful. To make it even more precise and avoid potential confusion, you could clarify that this per-step/per-trajectory structure applies specifically to the list-based fields in GeneratorOutput. The GeneratorOutput TypedDict also contains non-list fields like rollout_metrics, which are aggregated for the entire batch and don't follow this pattern. Specifying this distinction will make the documentation more robust.

Normally, for the list-based fields in `GeneratorOutput` (e.g., `response_ids`, `prompt_token_ids`, `rewards`), each element represents a single trajectory. With step-wise training, each element instead represents a single **step** (one LLM turn within a trajectory). A trajectory with 3 turns produces 3 elements rather than 1.
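To make the per-trajectory vs. per-step distinction concrete, here is a minimal sketch. It is not SkyRL's actual `GeneratorOutput` definition or generator code: `SketchGeneratorOutput`, `Turn`, `build_output`, and the `num_trajectories` metric are hypothetical names for illustration, and the way turns are merged in the per-trajectory case (concatenating responses, summing rewards) is an arbitrary assumption. Only the field names `prompt_token_ids`, `response_ids`, `rewards`, and `rollout_metrics` come from the discussion above.

```python
from typing import Any, Dict, List, Tuple, TypedDict

# One LLM turn: (prompt token ids, response token ids, reward). Hypothetical shape.
Turn = Tuple[List[int], List[int], float]


class SketchGeneratorOutput(TypedDict):
    """Simplified stand-in for illustration; not SkyRL's real TypedDict."""
    prompt_token_ids: List[List[int]]
    response_ids: List[List[int]]
    rewards: List[float]
    rollout_metrics: Dict[str, Any]  # batch-level, not indexed per element


def build_output(trajectories: List[List[Turn]], step_wise: bool) -> SketchGeneratorOutput:
    out: SketchGeneratorOutput = {
        "prompt_token_ids": [],
        "response_ids": [],
        "rewards": [],
        # Aggregated once for the whole batch in both modes.
        "rollout_metrics": {"num_trajectories": len(trajectories)},
    }
    for turns in trajectories:
        if step_wise:
            # One list element per step (one LLM turn).
            for prompt, response, reward in turns:
                out["prompt_token_ids"].append(prompt)
                out["response_ids"].append(response)
                out["rewards"].append(reward)
        else:
            # One list element per trajectory (merge rule assumed for this sketch).
            out["prompt_token_ids"].append(turns[0][0])
            out["response_ids"].append([t for _, resp, _ in turns for t in resp])
            out["rewards"].append(sum(r for _, _, r in turns))
    return out


three_turns: List[Turn] = [
    ([1], [10, 11], 0.0),
    ([1, 10, 11, 2], [12], 0.0),
    ([1, 10, 11, 2, 12, 3], [13], 1.0),
]
batch = [three_turns]
per_trajectory = build_output(batch, step_wise=False)
per_step = build_output(batch, step_wise=True)
print(len(per_trajectory["response_ids"]), len(per_step["response_ids"]))  # prints "1 3"
```

In this sketch the list-based fields grow from 1 element to 3 for the same 3-turn trajectory, while `rollout_metrics` stays a single batch-level dict in both modes.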

[Docs] Small update on docs

4f92b22

CharlieFRuan merged commit 72f8d86 into main Mar 19, 2026
1 check was pending

CharlieFRuan deleted the trivial branch March 19, 2026 06:27

devin-ai-integration bot reviewed Mar 19, 2026


gemini-code-assist bot reviewed Mar 19, 2026


devpatelio pushed a commit that referenced this pull request Mar 20, 2026

[Docs] Small update on docs (#1348)

493387c

CharlieFRuan merged 1 commit into main from trivial
