
Conversation

@CharlieFRuan
Collaborator

Before this PR, `TerminalBenchGenerator` and `MiniSweAgentGenerator` naively generated the loss mask for a chat history: all zeros for user messages and all ones for assistant messages.

However, this is incorrect. For instance, the assistant generation prompt should be masked with zeros, and the `\n` token that Qwen chat templates append after the EOS should also be masked with zero.
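
As a concrete illustration (this assumes a Qwen2-style template and is not code from this PR), an assistant turn renders roughly as `<|im_start|>assistant\n` + content + `<|im_end|>` + `\n`, and only the model-generated spans should keep a mask of 1; the naive scheme assigns 1 to all of them:

```python
# Illustrative token spans of one Qwen-style assistant turn and the loss mask
# each span should receive (sketch only, not the PR's implementation).
assistant_turn = [
    ("<|im_start|>assistant\n", 0),  # generation prompt added by the template
    ("ls /tmp",                 1),  # content generated by the model
    ("<|im_end|>",              1),  # EOS emitted by the model
    ("\n",                      0),  # trailing newline added by the template
]

loss_mask = [m for _, m in assistant_turn]
print(loss_mask)  # [0, 1, 1, 0]
```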

This PR fixes this by implementing `get_response_ids_and_loss_mask_from_messages()`, which uses the fixed-base helper `encode_messages_subset()` to convert a chat history (excluding the initial prompt, which may differ across models) into response IDs, a loss mask, and optionally rollout logprobs.
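
For intuition, here is a minimal sketch of the fixed-base idea (my own illustration, not the SkyRL implementation and not necessarily how `encode_messages_subset()` works): render the conversation with and without each new message, diff the token IDs, and zero out template-added tokens. It assumes each rendering is a token-level prefix of its extension, which holds for common templates such as Qwen's but is not guaranteed in general; the model name and the helper name below are only examples.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

def response_ids_and_loss_mask(prompt_msgs, response_msgs):
    history = list(prompt_msgs)
    prev_ids = tokenizer.apply_chat_template(history, tokenize=True)
    response_ids, loss_mask = [], []
    for msg in response_msgs:
        is_assistant = msg["role"] == "assistant"
        # Tokens the template adds before the new content (e.g. the assistant
        # generation prompt "<|im_start|>assistant\n") are never trained on.
        with_prefix = tokenizer.apply_chat_template(
            history, tokenize=True, add_generation_prompt=is_assistant
        )
        prefix_ids = with_prefix[len(prev_ids):]
        history.append(msg)
        cur_ids = tokenizer.apply_chat_template(history, tokenize=True)
        content_ids = cur_ids[len(with_prefix):]
        content_mask = [1 if is_assistant else 0] * len(content_ids)
        if is_assistant and tokenizer.eos_token_id in content_ids:
            # Zero out template tokens after the EOS, e.g. the "\n" that Qwen
            # templates append after "<|im_end|>".
            eos_pos = content_ids.index(tokenizer.eos_token_id)
            for i in range(eos_pos + 1, len(content_ids)):
                content_mask[i] = 0
        response_ids += prefix_ids + content_ids
        loss_mask += [0] * len(prefix_ids) + content_mask
        prev_ids = cur_ids
    return response_ids, loss_mask
```

A diff-based sketch like this makes the two corrections explicit: generation-prompt tokens land in `prefix_ids` with mask 0, and anything the template appends after the EOS is zeroed out separately.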

In addition, `TerminalBenchGenerator` had a bug:

```python
prompt_ids = self.tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
...
```

which is incorrect because the subsequent messages already add the generation prompt.
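
Presumably the corrected call looks roughly like the sketch below (the exact fix in this PR may differ): the initial prompt is encoded without the generation prompt, since the response messages encoded afterwards account for it.

```python
# Sketch only; the exact code in this PR may differ.
prompt_ids = self.tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=False,  # the subsequent messages supply the generation prompt
    tokenize=True,
)
```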

We also add extensive unit tests. The unit tests were implemented by Opus 4.5 and reviewed and iterated on by me.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request effectively refactors the loss mask generation by introducing a new utility function, get_response_ids_and_loss_mask_from_messages. This new function correctly handles the complexities of masking for different parts of a conversation, such as generation prompts and tokens following an EOS token, which is a significant improvement over the previous naive implementation. The associated bug fix in TerminalBenchGenerator regarding add_generation_prompt is also correctly addressed. Furthermore, the inclusion of extensive and thorough unit tests for the new utility function is commendable and greatly enhances the reliability of this change. I have one piece of feedback regarding a misleading error message that could hinder future debugging efforts.

@CharlieFRuan merged commit 7397fed into NovaSky-AI:main on Nov 26, 2025
@CharlieFRuan deleted the pr-112525-fix-customGen-mask branch on November 26, 2025 02:35
CharlieFRuan added a commit to mlfoundations/SkyRL that referenced this pull request Nov 26, 2025
…I#710)

devpatelio pushed a commit to devpatelio/SkyRL that referenced this pull request Nov 29, 2025
…I#710)
