[train][TBench][MiniSwe] Fix custom generator loss masking #710
Before this PR, `TerminalBenchGenerator` and `MiniSweAgentGenerator` naively generated the loss mask for a chat history: all zeros for user messages and all ones for assistant messages. However, this is incorrect. For instance, the assistant generation prompt should be masked with zero, and the `\n` token that comes after the EOS for Qwen models should also be masked with zero.
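For concreteness, here is what a single exchange looks like once a Qwen-style chat template is applied (an illustration only; the model name is an example, and exact tokens depend on the tokenizer version):

```python
# Illustration: render one user/assistant exchange to see which pieces a naive
# "all ones for assistant messages" mask gets wrong.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
msgs = [{"role": "user", "content": "List the files."},
        {"role": "assistant", "content": "ls -la"}]
print(tok.apply_chat_template(msgs, tokenize=False))
# The assistant part renders as "<|im_start|>assistant\nls -la<|im_end|>\n":
# the "<|im_start|>assistant\n" generation prompt and the final "\n" are emitted
# by the template, not generated by the model, so their loss-mask entries must
# be 0, not 1.
```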
This PR fixes this by implementing `get_response_ids_and_loss_mask_from_messages()`, which uses the fixed-base helper `encode_messages_subset()` to convert a chat history (excluding the initial prompt, which may differ across models) into response IDs, a loss mask, and optionally rollout logprobs.

Besides, `TerminalBenchGenerator` had a bug where it added the generation prompt redundantly, which is incorrect because the messages below will add the generation prompt.
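A minimal sketch of the underlying idea, assuming a Hugging Face tokenizer: `build_response_and_mask()` below is a hypothetical stand-in, not the actual `get_response_ids_and_loss_mask_from_messages()` / `encode_messages_subset()` implementation, it omits rollout logprobs, and it assumes the chat template appends each rendered message to a fixed prefix.

```python
def build_response_and_mask(tokenizer, prompt_messages, response_messages):
    """Convert the messages after the initial prompt into (response_ids, loss_mask)."""
    response_ids, loss_mask = [], []
    history = list(prompt_messages)
    for msg in response_messages:
        # Render the conversation against a fixed base and take the string delta,
        # so each message is tokenized independently of later messages.
        base = tokenizer.apply_chat_template(history, tokenize=False)
        full = tokenizer.apply_chat_template(history + [msg], tokenize=False)
        if msg["role"] == "assistant":
            # The generation prompt (e.g. "<|im_start|>assistant\n") is emitted by
            # the template, not by the model, so it gets mask 0.
            with_gen = tokenizer.apply_chat_template(
                history, tokenize=False, add_generation_prompt=True
            )
            gen_prompt_ids = tokenizer(with_gen[len(base):], add_special_tokens=False).input_ids
            generated_ids = tokenizer(full[len(with_gen):], add_special_tokens=False).input_ids
            response_ids += gen_prompt_ids
            loss_mask += [0] * len(gen_prompt_ids)
            # Tokens up to and including EOS were generated by the model (mask 1);
            # anything after EOS (e.g. the "\n" Qwen templates append) gets mask 0.
            if tokenizer.eos_token_id in generated_ids:
                keep = generated_ids.index(tokenizer.eos_token_id) + 1
            else:
                keep = len(generated_ids)
            response_ids += generated_ids
            loss_mask += [1] * keep + [0] * (len(generated_ids) - keep)
        else:
            # User / tool messages are environment observations: mask 0 everywhere.
            delta_ids = tokenizer(full[len(base):], add_special_tokens=False).input_ids
            response_ids += delta_ids
            loss_mask += [0] * len(delta_ids)
        history.append(msg)
    return response_ids, loss_mask
```

The point of the fixed base is that every message is rendered and tokenized against the same prefix, so token boundaries do not shift when later messages are appended; the chunk-wise tokenization above is an assumption of this sketch, not necessarily how the helper is implemented.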
We also add extensive unit tests. The unit tests were written by Opus 4.5 and reviewed and iterated on by me.
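As an illustration of the kind of property such tests can check (the helper, model name, and exact assertions below are hypothetical, carried over from the sketch above, not the tests added in this PR):

```python
from transformers import AutoTokenizer

def test_generation_prompt_and_trailing_newline_masked():
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
    prompt = [{"role": "user", "content": "List the files."}]
    response = [{"role": "assistant", "content": "ls -la"}]
    ids, mask = build_response_and_mask(tok, prompt, response)
    # The tokens kept by the mask should be exactly the assistant text plus EOS.
    kept = tok.decode([t for t, m in zip(ids, mask) if m == 1])
    assert kept == "ls -la" + tok.eos_token
    # The masked-out tokens should cover the generation prompt and the trailing "\n".
    dropped = tok.decode([t for t, m in zip(ids, mask) if m == 0])
    assert "assistant" in dropped and dropped.endswith("\n")
```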