[GPT-OSS 1/N] Initial GPT-OSS support for single turn training #390
SumanthRH merged 22 commits into NovaSky-AI:main
Conversation
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Code Review
This pull request introduces initial support for single-turn training with GPT-OSS models. This is achieved by adding a chat_template_kwargs configuration to pass specific arguments to the tokenizer's chat template, which is necessary for GPT-OSS. The changes include a new example script for running GSM8K with GPT-OSS, updates to the base configuration, and modifications to SkyRLGymGenerator to utilize the new kwargs.
My review focuses on improving script robustness and code maintainability. I've suggested making the new shell script more robust by adding error handling flags. I've also pointed out an opportunity to refactor the repeated use of the new configuration in SkyRLGymGenerator to improve code clarity and make future changes easier.
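As a rough illustration of the mechanism the review describes, the sketch below shows how a `chat_template_kwargs` dict could be forwarded from config into the tokenizer's chat template. The `apply_chat_template` signature mirrors the Hugging Face tokenizer method (which forwards extra keyword arguments to the Jinja template); the stub tokenizer, the `build_prompt` helper, and the rendered prompt format are all hypothetical stand-ins, not the actual SkyRLGymGenerator code.

```python
# Hypothetical sketch: thread `chat_template_kwargs` from config into
# prompt construction instead of hard-coding template variables per call.

class StubTokenizer:
    """Minimal stand-in for a tokenizer whose chat template reads a
    `reasoning_effort` variable (as the GPT-OSS template does)."""

    def apply_chat_template(self, messages, add_generation_prompt=True,
                            tokenize=False, **template_kwargs):
        # A real tokenizer would render its Jinja template here; the stub
        # just prepends a system line derived from the template kwarg.
        effort = template_kwargs.get("reasoning_effort", "medium")
        turns = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        return f"Reasoning: {effort}\n{turns}"


def build_prompt(tokenizer, messages, chat_template_kwargs=None):
    # Forward configured kwargs (e.g. {"reasoning_effort": "low"})
    # straight through to the chat template.
    return tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False,
        **(chat_template_kwargs or {}),
    )


prompt = build_prompt(
    StubTokenizer(),
    [{"role": "user", "content": "What is 2 + 2?"}],
    chat_template_kwargs={"reasoning_effort": "low"},
)
print(prompt.splitlines()[0])  # → Reasoning: low
```

With this shape, the generator only has to pass the configured dict along; no per-model branching is needed at the call site.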
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…into sumanthrh/gptoss
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
trainer.run_name="gsm8k_test_gptoss_low" \
trainer.resume_mode=latest \
trainer.ckpt_path="$HOME/ckpts/gsm8k_1.5B_ckpt_gptoss" \
+generator.chat_template_kwargs={reasoning_effort:'low'} \
Do you need the `+` here, since this config param already exists?
Yeah, without the plus I get:
Could not override 'generator.chat_template_kwargs'.
To append to your config use +generator.chat_template_kwargs={reasoning_effort:low}
Key 'reasoning_effort' is not in struct
full_key: generator.chat_template_kwargs.reasoning_effort
object_type=dict
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
- ``generator.max_turns``: Maximum number of turns for generation with multi-turn RL.
- ``generator.use_conversation_multi_turn``: Whether to use conversation format for multi-turn generation. If set to ``true``, observations are appended to the chat history as a new turn. If set to ``false``, observations are appended as-is to the assistant response in token space and generation is continued (after removing any EOS token in the response). We've observed some cases where the model can be sensitive to chat history format (ex: in SkyRL-SQL), and thus ``false`` can be used for full control over the exact tokens added after environment interaction.
- ``generator.engine_init_kwargs``: Inference engine arguments passed directly to the vLLM or SGLang engine. To specify an engine arg in a CLI override, use the format ``+generator.engine_init_kwargs.[arg_name]=value``. If duplicate kwargs are passed or kwargs clash with existing generator arguments (e.g., ``tensor_parallel_size``), an error is raised.
- ``generator.chat_template``: Custom chat template configuration if needed.
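To make the override formats above concrete, here is an illustrative Hydra-style invocation. The entrypoint module name and the specific values are placeholders for this sketch, not the project's actual CLI; note the ``+`` prefix on keys that do not yet exist in the config struct.

```bash
# Illustrative only: the entrypoint path is a placeholder.
# Existing keys are overridden directly; new keys (engine_init_kwargs
# entries, chat_template_kwargs entries) need the `+` append prefix.
python -m skyrl_train.entrypoint \
  generator.max_turns=1 \
  +generator.engine_init_kwargs.enable_prefix_caching=true \
  '+generator.chat_template_kwargs={reasoning_effort:low}'
```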
The config parameter `reasoning_effort` needs a `+` prefix, otherwise you'll see:
```bash
Could not override 'generator.chat_template_kwargs'.
To append to your config use +generator.chat_template_kwargs={reasoning_effort:low}
Key 'reasoning_effort' is not in struct
full_key: generator.chat_template_kwargs.reasoning_effort
object_type=dict
```
This was added by mistake in #390 while I was testing the script.
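The error above comes from Hydra/OmegaConf's "struct" mode: an existing key can be overridden, but a brand-new key is rejected unless you explicitly append it, which is what the `+` prefix requests. The snippet below is a toy pure-Python analogy of that behavior (not OmegaConf itself); the `StructConfig` class and its `override` method are invented for illustration.

```python
# Toy analogy of OmegaConf struct mode (not the real library):
# overriding an existing key works, introducing a new key does not,
# unless an explicit append is requested (Hydra's `+` prefix).

class StructConfig(dict):
    def override(self, key, value, append=False):
        if key not in self and not append:
            raise KeyError(f"Key '{key}' is not in struct")
        self[key] = value


cfg = StructConfig()  # chat_template_kwargs starts out empty in the config

try:
    # Like: generator.chat_template_kwargs={reasoning_effort:low}
    cfg.override("reasoning_effort", "low")
except KeyError:
    print("override rejected: key not in struct")

# Like: +generator.chat_template_kwargs={reasoning_effort:low}
cfg.override("reasoning_effort", "low", append=True)
print(cfg)  # → {'reasoning_effort': 'low'}
```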
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…ky-AI#390)

# What does this PR do?

Initial GPT-OSS support for single turn training. Supports training in mixed precision (BF16) and inference in half precision only at the moment. Given some quirks in chat templating, the current SkyRLGymGenerator is not compatible with GPT-OSS for multi-turn tasks. This PR further adds overrides for chat templating with `chat_template_kwargs` to be used in the agent loop in SkyRLGymGenerator. For GPT-OSS, we can provide the reasoning effort in the system prompt, so this feature is important.

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>



What does this PR do?
Initial GPT-OSS support for single turn training.
At the moment, training is supported in mixed precision (BF16) and inference in half precision only.
Given some quirks in chat templating, the current SkyRLGymGenerator is not compatible with GPT-OSS for multi-turn tasks.
This PR further adds overrides for chat templating with `chat_template_kwargs` to be used in the agent loop in SkyRLGymGenerator. For GPT-OSS, we can provide the reasoning effort in the system prompt, so this feature is important.