Revert "[skyrl-train][step-wise] 1/N - Support step-wise training with `step_wise_training` flag" #706

CharlieFRuan · 2025-11-25T20:10:13Z

Reverts #694

The PR expects `trajectory_ids` to always be present in the generator output, but this is not currently enforced, so existing runs break. `run_gsm8k.sh` fails with https://gist.github.com/CharlieFRuan/cbbef69fde60a20d483d03efb13d60bb
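
For illustration only, here is a minimal sketch of the kind of up-front check that would turn the missing-key failure into an explicit error. The helper name `require_trajectory_ids` is hypothetical, and treating the generator output as a plain dict with a `trajectory_ids` key is an assumption based on the description above, not code from #694.

```python
from typing import Any, Dict, List


def require_trajectory_ids(generator_output: Dict[str, Any]) -> List[Any]:
    """Return trajectory_ids, or fail loudly if the generator did not emit them.

    Hypothetical helper: the dict layout is an assumption, not the library's API.
    """
    trajectory_ids = generator_output.get("trajectory_ids")
    if trajectory_ids is None:
        raise ValueError(
            "step-wise training needs 'trajectory_ids' in the generator output, "
            "but the generator did not provide it"
        )
    return trajectory_ids


# A generator output without trajectory_ids is rejected with a clear message.
try:
    require_trajectory_ids({"response_ids": [[1, 2, 3]]})
    rejected = False
except ValueError:
    rejected = True
```

A check like this would only be needed on the step-wise code path, so generators that never emit `trajectory_ids` keep working when the feature is off.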

…h `step_…" This reverts commit a30405f.

gemini-code-assist

Code Review

This pull request reverts the integration of step-wise training from the core library and refactors it into a self-contained example. This is a good approach to isolate the experimental feature and fix the bug mentioned in the description. The changes correctly decouple the step-wise logic from the main trainer and generator.

My review focuses on the newly added/moved example code. I've identified a few areas for improvement regarding code clarity, maintainability, and performance. Specifically, I've suggested renaming a method and a dataclass in the StepWiseGenerator to avoid confusion and Liskov Substitution Principle violations, pointed out a potentially incorrect check that restricts the use of custom chat templates, and highlighted a potential performance bottleneck in the StepWiseTrainer due to a GPU-to-CPU data transfer.

gemini-code-assist · 2025-11-25T20:13:20Z

skyrl-train/examples/step_wise/step_wise_generator.py

+@dataclass
+class AgentLoopOutput:
+    """Output from a single agent_loop execution."""
+
+    response_ids: List[int]
+    reward: Union[List[float], float]
+    stop_reason: str
+    loss_mask: List[int]
+    prompt_ids: List[int]
+    rollout_logprobs: Optional[List[float]]


This AgentLoopOutput dataclass is a near-duplicate of the one in skyrl_train.generators.skyrl_gym_generator. However, they represent different concepts: here it's the output of a single step, while in the base class it's the output of a whole trajectory. This name collision is confusing.

To improve clarity and avoid duplication, consider renaming this dataclass to something more specific, like StepOutput. This would make the code easier to understand and maintain.
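
A sketch of the suggested rename follows. The field list is copied from the diff above; the name `StepOutput` comes from the suggestion, while the docstring wording and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class StepOutput:
    """Output of a single step of an agent loop, not of a whole trajectory."""

    response_ids: List[int]
    reward: Union[List[float], float]
    stop_reason: str
    loss_mask: List[int]
    prompt_ids: List[int]
    rollout_logprobs: Optional[List[float]]


# Illustrative values only: one step with two response tokens and no logprobs.
step = StepOutput(
    response_ids=[11, 12],
    reward=1.0,
    stop_reason="stop",
    loss_mask=[1, 1],
    prompt_ids=[1, 2, 3],
    rollout_logprobs=None,
)
```

With a distinct name, the per-step type and the per-trajectory `AgentLoopOutput` in the base generator can be imported side by side without shadowing.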

gemini-code-assist · 2025-11-25T20:13:20Z

skyrl-train/examples/step_wise/step_wise_trainer.py

+        last_step_advantages, last_step_returns = ppo_utils.compute_advantages_and_returns(
+            token_level_rewards=last_step_rewards,
+            response_mask=response_mask[is_last_step],
+            index=index[is_last_step.cpu().numpy()],


The expression is_last_step.cpu().numpy() performs a GPU-to-CPU data transfer inside the training loop. If is_last_step lives on the GPU, this forces a device synchronization and can become a performance bottleneck, especially with large batch sizes. The transfer is needed here because index is a NumPy array of strings, and NumPy boolean indexing requires a host-side mask.

To optimize this, consider keeping is_last_step on the CPU if possible, or explore ways to perform the grouping logic on the GPU before this step to avoid the synchronization overhead.
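
One possible way to keep the mask on the CPU, sketched under assumptions: build the last-step mask once as a host-side NumPy array and use it directly to index `index`. The helper `last_step_mask` is hypothetical, and the assumption that the steps of one trajectory are contiguous and ordered is not confirmed by the PR. The sample ids are made up.

```python
import numpy as np


def last_step_mask(trajectory_ids):
    """Return a host-side boolean mask that is True at the last step of each trajectory.

    Assumes steps belonging to the same trajectory are contiguous and in order.
    """
    ids = np.asarray(trajectory_ids)
    if ids.size == 0:
        return np.zeros(0, dtype=bool)
    mask = np.ones(ids.shape[0], dtype=bool)
    # A step is the last of its trajectory when the next step has a different id.
    mask[:-1] = ids[:-1] != ids[1:]
    return mask


# Made-up example: three trajectories with 3, 1, and 2 steps.
trajectory_ids = ["a", "a", "a", "b", "c", "c"]
index = np.array(["p0", "p0", "p0", "p0", "p1", "p1"])

mask = last_step_mask(trajectory_ids)
last_step_index = index[mask]  # boolean indexing with no device round trip
```

If a tensor mask is also needed for `response_mask`, it can be created from the same array with `torch.from_numpy(mask)` and moved to the device, so the copy goes host-to-device once instead of device-to-host on every call.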

…h `step_wise_training` flag" (NovaSky-AI#706) Reverts NovaSky-AI#694 See NovaSky-AI#694 (comment) The PR expects `trajectory_ids` to always be in the generator output, which currently is not enforced and is breaking. `run_gsm8k.sh` fails with https://gist.github.com/CharlieFRuan/cbbef69fde60a20d483d03efb13d60bb

Revert "[skyrl-train][step-wise] 1/N - Support step-wise training wit…

83ef434

gemini-code-assist bot reviewed Nov 25, 2025

CharlieFRuan merged commit cc5e8fe into main Nov 25, 2025
3 checks passed

CharlieFRuan deleted the revert-694-step-wise-native branch November 25, 2025 20:14
