Closed
Conversation
…nfo (#27) Signed-off-by: Brian Yu <bxyu@nvidia.com>
updated the following logging print when running ng_prepare_data from, for example: "Found 0 agent server instance configs withOUT datasets:" to "Found 0 agent server instance configs WITHOUT datasets:" to match the format of the subsequent logs, for example: "Found 1 agent server instance configs WITH datasets:" Signed-off-by: chrismun <cmunley@nvidia.com>
update readme for resources servers for updated cli Signed-off-by: chrismun <cmunley@nvidia.com>
…item to be present (#19) Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
This change adds a new dataset for the library judge math resources server. The new dataset contains math problems from Stack Overflow. This dataset corresponds to the StackOverflow Dump Data entry in the RL verifier data tracking [spreadsheet](https://docs.google.com/spreadsheets/d/1VK4-ZonMSR-4Ulk161Au1f-nGhs4r9V9-beScboaR3I/edit?gid=0#gid=0&range=7:7). The prompts in the dataset have been formatted in the same way as the OpenMathReasoning dataset that was previously added. All the Stack Overflow problems with expected answers were included, so it may be necessary in the future to filter the problems according to difficulty based on the responses of a model. Approval for the use of this dataset is covered in this ticket: [DGPTT-96](https://jirasw.nvidia.com/browse/DGPTT-96). Signed-off-by: Damon Mosk-Aoyama <dmoskaoyama@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Address #43 and #45. Thank you to @xinyu-dev for the raises! --------- Signed-off-by: Brian Yu <bxyu@nvidia.com>
Add LLM-as-judge resources server
Introduce a resources server that uses an LLM as a judge to compare a
model’s generated answer against an expected (gold) answer.
### Details
- Only the last assistant message is graded;
- Optional regex extraction for both question and response:
- Uses the last regex match; returns the first non-empty capture group,
else the whole match.
- Optional second-pass verification to mitigate positional bias (swap
expected/generated).
### Configuration:
- judge_system_message (optional): system prompt for the judge.
- judge_prompt_template (required): must include {question},
{expected_answer}, {generated_answer}.
- judge_equal_label / judge_not_equal_label: defaults to [[A=B]] /
[[A!=B]].
- check_twice_swap (bool, default false): if true, on an initial equal
verdict, perform a second judge pass with expected/prediction swapped;
reward remains 1 only if the second pass is also equal.
- reward_if_swap_fails (float, default 0.0): reward to assign if the
second pass fails.
- Here the training framework will need to handle how to discard this
sample based on this reward.
- question_extract_regex (optional): extract from the last user message
(last match semantics).
- response_extract_regex (optional): extract from the last assistant
message (last match semantics).
The regex part can be helpful if for some reason the user decided to
train with in-context examples.
For the responses, this can be useful if the training specifically
expects the final response to be inside \boxed or \text, for instance.
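The last-match semantics described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the helper name `extract_with_last_match` is hypothetical, and falling back to the full text when the pattern does not match is an assumption.

```python
import re
from typing import Optional


def extract_with_last_match(text: str, pattern: Optional[str]) -> str:
    """Hypothetical helper illustrating the last-match extraction rule."""
    if not pattern:
        # No regex configured: use the message text unchanged.
        return text
    matches = list(re.finditer(pattern, text))
    if not matches:
        # Assumption: fall back to the full text when nothing matches.
        return text
    last = matches[-1]
    for group in last.groups():
        if group:
            # First non-empty capture group of the last match.
            return group
    # No (non-empty) capture group: return the whole match.
    return last.group(0)


boxed = r"\\boxed\{([^}]*)\}"
print(extract_with_last_match(r"try \boxed{3}, final \boxed{42}", boxed))  # 42
```

With a pattern that has no capture group, such as `x=\d`, the whole last match is returned instead.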
### Steps
- Extracts question from the last user message; applies
question_extract_regex if provided.
- Extracts generated answer from the last assistant message; applies
response_extract_regex if provided.
- Calls the judge model with a configurable prompt and optional system
message.
- Parses the judge’s last message for judge_equal_label /
judge_not_equal_label to set reward and verdict.
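The verdict parsing and the optional swap check can be sketched like this. It is an illustration under stated assumptions: `parse_verdict` and `compute_reward` are hypothetical names, the judge is modelled as a plain callable, a not-equal verdict is assumed to give reward 0.0, and when both labels appear the earlier one is assumed to win.

```python
from typing import Callable

EQUAL_LABEL = "[[A=B]]"       # default judge_equal_label
NOT_EQUAL_LABEL = "[[A!=B]]"  # default judge_not_equal_label


def parse_verdict(judge_text: str,
                  equal_label: str = EQUAL_LABEL,
                  not_equal_label: str = NOT_EQUAL_LABEL) -> bool:
    """Return True when the judge's message signals an equal verdict."""
    eq_pos = judge_text.find(equal_label)
    neq_pos = judge_text.find(not_equal_label)
    if eq_pos == -1:
        return False  # assumption: a missing equal label counts as not equal
    if neq_pos == -1:
        return True
    return eq_pos < neq_pos  # assumption: the earlier label wins


def compute_reward(judge: Callable[[str, str], str],
                   expected: str,
                   generated: str,
                   check_twice_swap: bool = False,
                   reward_if_swap_fails: float = 0.0) -> float:
    """judge(a, b) is a stand-in for one call to the judge model."""
    if not parse_verdict(judge(expected, generated)):
        return 0.0  # assumption: not equal maps to reward 0
    if not check_twice_swap:
        return 1.0
    # Second pass with the two answers swapped to reduce positional bias.
    if parse_verdict(judge(generated, expected)):
        return 1.0
    return reward_if_swap_fails


def always_equal(a: str, b: str) -> str:
    return "Both answers agree. [[A=B]]"


print(compute_reward(always_equal, "4", "four", check_twice_swap=True))  # 1.0
```

Setting `reward_if_swap_fails` to a distinct value (for example a negative number) is one way for a training framework to recognise and discard samples where the two passes disagree.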
### Running the server
```bash
config_paths="responses_api_models/openai_model/configs/openai_model.yaml, \
resources_servers/equivalence_llm_judge/configs/equivalence_llm_judge.yaml"
ng_run "+config_paths=[$config_paths]" \
+equivalence_llm_judge.resources_servers.equivalence_llm_judge.judge_model_server.name=openai_model
```
### Collecting rollouts
```bash
ng_collect_rollouts +agent_name=equivalence_llm_judge_simple_agent \
+input_jsonl_fpath=resources_servers/equivalence_llm_judge/data/example.jsonl \
+output_jsonl_fpath=results/example_rollouts.jsonl \
+limit=5
```
### Licensing
Code: Apache 2.0
Data: CC-BY-NC-3.0 (Examples from
https://huggingface.co/datasets/allenai/sciq)
---------
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: soares-f <soarescmsa@gmail.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: fsiino-nvidia <fsiino@nvidia.com>
Signed-off-by: Peter Jin <pjin@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
…44)
### Switch python_math_exec to use session cookies and fix tool usage
### Changes
1. Updated `python_math_exec` server to use built-in session cookies:
- Removed manual `session_id` handling from request/response models
- Now using `request.session[SESSION_ID_KEY]` for state tracking
2. Removed `simple_agent_stateful/*`
### To test, follow these steps from the README.md of `python_math_exec`
1. Download dataset
```
ng_download_dataset_from_gitlab \
+dataset_name=open_math_reasoning_problems_tool \
+version=0.0.1 \
+artifact_fpath=open_math_reasoning_problems_tool.jsonl \
+output_fpath=data/open_math_reasoning_problems_tool.jsonl
```
2. Start server
```
ng_run "+config_paths=[responses_api_agents/simple_agent/configs/simple_agent.yaml,responses_api_models/openai_model/configs/openai_model.yaml,resources_servers/python_math_exec/configs/python_math_exec.yaml]" +simple_agent.responses_api_agents.simple_agent.resources_server.name=python_math_exec
```
3. Collect trajectories
```
ng_collect_rollouts +agent_name=simple_agent +input_jsonl_fpath=data/open_math_reasoning_problems_tool.jsonl +output_jsonl_fpath=results/open_math_reasoning_problems_tool_output_new.jsonl +limit=1
```
---------
Signed-off-by: Rahul Chand <rchand@nvidia.com>
Co-authored-by: Rahul Chand <rchand@cw-dfw-cs-001-vscode-02.cm.cluster>
## Details
This is a resource server for a text-based Sudoku game. Correctness is
checked programmatically. A reward is returned at each step (each call
of the `make_move` function), and the final reward is the sum of the
step rewards.
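A programmatic move check and the reward sum might look roughly like the sketch below. It is an illustration only: `violates_rules` is a hypothetical helper, and checking against row, column and subgrid constraints (rather than against the stored solution) is an assumption about how the server validates moves. Cells use 0 for empty and 1-indexed coordinates, as in the example rollout.

```python
import math
from typing import List


def violates_rules(board: List[List[int]], row: int, col: int, number: int) -> bool:
    """Return True if placing `number` at (row, col) breaks a Sudoku rule."""
    n = len(board)
    box = math.isqrt(n)  # 2 for a 4x4 board, 3 for a 9x9 board
    r, c = row - 1, col - 1
    if board[r][c] != 0:
        return True  # cell is already filled
    if number in board[r]:
        return True  # duplicate in the row
    if any(board[i][c] == number for i in range(n)):
        return True  # duplicate in the column
    top, left = (r // box) * box, (c // box) * box
    return any(
        board[i][j] == number
        for i in range(top, top + box)
        for j in range(left, left + box)
    )


board = [[0, 4, 2, 0], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]]
print(violates_rules(board, 2, 3, 3))  # True: column 3 already holds a 3
print(violates_rules(board, 1, 1, 1))  # False

# The final reward is the sum of the per-step rewards.
move_rewards = [0.0] * 8 + [1.0]
print(sum(move_rewards))  # 1.0
```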
**Dataset**: The dataset is procedurally generated. To stay compatible
with the current trajectory generation framework, we provide a
`simple_sudoku.jsonl` and a script (steps below) that can be used to
create a jsonl for trajectory generation. This will be changed
later.
## Steps to run
1. Download reference dataset
```
ng_download_dataset_from_gitlab \
+dataset_name=simple_sudoku \
+version=0.0.2 \
+artifact_fpath=simple_sudoku.jsonl \
+output_fpath=data/simple_sudoku.jsonl
```
2. Create a larger jsonl (the reference dataset has just 1 data point, as
a reference for how the system prompt and tool definition should look).
The command below generates 5 examples. Currently the game parameters
(board size and number of clues) are selected randomly within a
reasonable range: the board size is 4 or 9, and the number of clues is
between 6 and 12 for a board of size 4 and between 16 and 48 for a board
of size 9. In the future we could replace this with a parameter for how
tough we want the game to be.
Run the command below from the `simple_sudoku/` folder:
```
python generate_sudoku_jsonl.py ../../data/simple_sudoku.jsonl 5 ../../data/sudoku_batch.jsonl
```
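The random parameter selection described in this step could be sketched as below. This is not the contents of `generate_sudoku_jsonl.py`: the function name, the `clues` key and the inclusive range bounds are assumptions made for illustration (`scale` is the key used in the example rollout's game state).

```python
import random

# Clue ranges per board size, taken from the description above.
CLUE_RANGES = {4: (6, 12), 9: (16, 48)}


def sample_game_params(rng: random.Random) -> dict:
    """Pick a board size, then a clue count within that size's range."""
    scale = rng.choice(sorted(CLUE_RANGES))
    low, high = CLUE_RANGES[scale]
    clues = rng.randint(low, high)  # assumption: both ends inclusive
    return {"scale": scale, "clues": clues}


rng = random.Random(0)  # seeded so a batch can be reproduced
batch = [sample_game_params(rng) for _ in range(5)]
print(batch)
```

Replacing the random choice with a single difficulty parameter, as suggested above, would only need a different mapping from difficulty to `(scale, clues)`.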
3. Start the servers (this starts the simple_game_agent, the model
server & the Sudoku environment server)
```
ng_run "+config_paths=[responses_api_agents/simple_game_agent/configs/simple_game_agent.yaml,responses_api_models/openai_model/configs/openai_model.yaml,resources_servers/simple_sudoku/configs/simple_sudoku.yaml]" +simple_game_agent.responses_api_agents.simple_game_agent.resources_server.name=simple_sudoku
```
4. Start trajectory collection
```
ng_collect_traj +agent_name=simple_game_agent +input_jsonl_fpath=data/sudoku_batch.jsonl +output_jsonl_fpath=results/sudoku_output_NEWLOGIC.jsonl
```
---
**Example Rollouts**:
```
{"responses_create_params": {"input": [], "tools": [{"name": "make_move", "parameters": {"type": "object", "properties": {"row": {"type": "integer", "description": "Row number (1-4)"}, "col": {"type": "integer", "description": "Column number (1-4)"}, "number": {"type": "integer", "description": "Digit to place (1-4)"}}, "required": ["row", "col", "number"], "additionalProperties": false}, "strict": true, "type": "function", "description": "Place a digit in the Sudoku grid"}]}, "response": {"id": "resp_68bd93aa0e9081948ee1fb98147b354701b662383935ba9f", "created_at": 1757254570.0, "error": null, "incomplete_details": null, "instructions": null, "metadata": {}, "model": "gpt-4.1-2025-04-14", "object": "response", "output": [{"arguments": "{\"row\":2,\"col\":3,\"number\":3}", "call_id": "call_HRb8CVBJzjVXbqiYvDNf63yS", "name": "make_move", "type": "function_call", "id": "fc_68bd93a1faec8194b1669db531b87b8a039ed859b3babbe5", "status": "completed"}, {"call_id": "call_HRb8CVBJzjVXbqiYvDNf63yS", "output": "{\"success\": false, \"message\": \"Incorrect move: R2 C3 = 3 violates Sudoku rules\", \"game_state\": {\"current_board\": [[0, 4, 2, 0], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 1, \"correct_moves\": 0}, \"board_text\": \" C1 C2 C3 C4 \\nR1 . 4 | 2 .\\nR2 . . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":1,\"col\":1,\"number\":1}", "call_id": "call_hM2UMng5iSWnuET4aMpeKIoJ", "name": "make_move", "type": "function_call", "id": "fc_68bd93a31d648193b06f9923513b16640b983aa011463905", "status": "completed"}, {"call_id": "call_hM2UMng5iSWnuET4aMpeKIoJ", "output": "{\"success\": true, \"message\": \"Correct move! 
R1 C1 = 1\", \"game_state\": {\"current_board\": [[1, 4, 2, 0], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 2, \"correct_moves\": 1}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 .\\nR2 . . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":1,\"col\":4,\"number\":3}", "call_id": "call_Q866NsiyJeD807NjpnQxJk0V", "name": "make_move", "type": "function_call", "id": "fc_68bd93a44ac081978ee4664c6853523e0e28ca231f6afdf1", "status": "completed"}, {"call_id": "call_Q866NsiyJeD807NjpnQxJk0V", "output": "{\"success\": true, \"message\": \"Correct move! R1 C4 = 3\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 3, \"correct_moves\": 2}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 . . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":2,\"col\":1,\"number\":2}", "call_id": "call_giYxUnyIL7BLZCp7hKHPxHn9", "name": "make_move", "type": "function_call", "id": "fc_68bd93a5838c8196b7ac194f34837bfb039cf8ecc424db03", "status": "completed"}, {"call_id": "call_giYxUnyIL7BLZCp7hKHPxHn9", "output": "{\"success\": true, \"message\": \"Correct move! R2 C1 = 2\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 4, \"correct_moves\": 3}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . 
.\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":2,\"col\":2,\"number\":3}", "call_id": "call_y9bTBhRd2FvRDadiap5nNujC", "name": "make_move", "type": "function_call", "id": "fc_68bd93a6908c8194962f9d783ef9c08405b4c0e16f7cb6be", "status": "completed"}, {"call_id": "call_y9bTBhRd2FvRDadiap5nNujC", "output": "{\"success\": true, \"message\": \"Correct move! R2 C2 = 3\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 5, \"correct_moves\": 4}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":2,\"col\":3,\"number\":4}", "call_id": "call_YV4txJW0g9sgS5UZpS0cnzTC", "name": "make_move", "type": "function_call", "id": "fc_68bd93a780408197a8f194c0c7253d3e07d834853ce8c529", "status": "completed"}, {"call_id": "call_YV4txJW0g9sgS5UZpS0cnzTC", "output": "{\"success\": true, \"message\": \"Correct move! R2 C3 = 4\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 6, \"correct_moves\": 5}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . 
.\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":3,\"col\":4,\"number\":2}", "call_id": "call_tOUxWELZGfigplBp5lEqQLvs", "name": "make_move", "type": "function_call", "id": "fc_68bd93a895148197bfb733d4c5af173f0db6c67fa5f6bb70", "status": "completed"}, {"call_id": "call_tOUxWELZGfigplBp5lEqQLvs", "output": "{\"success\": true, \"message\": \"Correct move! R3 C4 = 2\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 7, \"correct_moves\": 6}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 2\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":4,\"col\":3,\"number\":1}", "call_id": "call_JxdEvztCmWNb0oxvgDzqFJFV", "name": "make_move", "type": "function_call", "id": "fc_68bd93a992a88193a5a0bf428d294140050e29fd449a66b6", "status": "completed"}, {"call_id": "call_JxdEvztCmWNb0oxvgDzqFJFV", "output": "{\"success\": true, \"message\": \"Correct move! 
R4 C3 = 1\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 8, \"correct_moves\": 7}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 2\\nR4 3 2 | 1 .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":4,\"col\":4,\"number\":4}", "call_id": "call_FAcCh4KDrE657ryKXNgZL695", "name": "make_move", "type": "function_call", "id": "fc_68bd93ab49f481949fa445b5d6ad069501b662383935ba9f", "status": "completed"}, {"call_id": "call_FAcCh4KDrE657ryKXNgZL695", "output": "{\"success\": true, \"message\": \"Correct move! R4 C4 = 4. Congratulations! Puzzle completed!\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 9, \"correct_moves\": 8}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 2\\nR4 3 2 | 1 4\", \"is_complete\": true, \"move_reward\": 1.0}", "type": "function_call_output", "id": null, "status": null}], "parallel_tool_calls": false, "temperature": 1.0, "tool_choice": "auto", "tools": [{"name": "make_move", "parameters": {"type": "object", "properties": {"row": {"type": "integer", "description": "Row number (1-4)"}, "col": {"type": "integer", "description": "Column number (1-4)"}, "number": {"type": "integer", "description": "Digit to place (1-4)"}}, "required": ["row", "col", "number"], "additionalProperties": false}, "strict": true, "type": "function", "description": "Place a digit in the Sudoku grid"}], "top_p": 1.0, "background": false, "max_output_tokens": null, "max_tool_calls": 1, "previous_response_id": null, "prompt": null, "reasoning": {"effort": 
null, "generate_summary": null, "summary": null}, "service_tier": "default", "status": "completed", "text": {"format": {"type": "text"}, "verbosity": "medium"}, "top_logprobs": 0, "truncation": "disabled", "usage": {"input_tokens": 1103, "input_tokens_details": {"cached_tokens": 0}, "output_tokens": 23, "output_tokens_details": {"reasoning_tokens": 0}, "total_tokens": 1126}, "user": null, "prompt_cache_key": null, "safety_identifier": null, "store": true, "input": [{"role": "user", "type": "message", "content": "You are playing a simple version of Sudoku.\nEach row is numbered from 1 to 4, and each column is also numbered from 1 to 4.\nEmpty cells are represented by '.', and pre-filled cells contain digits from 1 to 4.\n\nYour objective is to fill the empty cells in the 4x4 grid with digits from 1 to 4 such that:\n1. Each row contains all digits from 1 to 4 without repetition.\n2. Each column contains all digits from 1 to 4 without repetition.\n3. Each 2x2 subgrid contains all digits from 1 to 4 without repetition.\n\nRules and Instructions:\n1. **Do not overwrite** the initial numbers provided in the grid.\n2. **Only fill** empty cells represented by '.'.\n3. You must respond with the format '\\boxed{row column number}', e.g. \\boxed{1 1 5}.\n4. **Ensure** that your move does not violate Sudoku rules. Invalid moves will result in penalties.\nUse the make_move function to submit your moves. Good luck!\n\n\n\n C1 C2 C3 C4 \nR1 . 4 | 2 .\nR2 . . | . 1\n - - - - - - \nR3 4 1 | 3 .\nR4 3 2 | . ."}]}, "reward": 1.0, "total_moves": 9, "is_complete": true}
```
---
---------
Signed-off-by: Rahul Chand <rchand@nvidia.com>
Reverts #30 Signed-off-by: Brian Yu <bxyu@nvidia.com>
This change updates the train_data_utils via `ng_prepare_data` to apply data aggregations to the other keys within an `example.jsonl` file. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
…mprovements (#77) Signed-off-by: Brian Yu <bxyu@nvidia.com>
…ning (#66) Remove unnecessary Github Action CI and add uv config to enable dependency scanning * This project's current CI doesn't need to build and test through a docker image, so this deletes the unnecessary CI Dockerfile and Github Actions template * Adding `managed = true` under `[tool.uv]` to allow for repo dependency scanning --------- Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
This implements a simple rounding rule for `AvgMinMax` floats in order to keep example_metrics consistent. For background, the addition of median and std dev did not assign a ceiling for decimal places, so trivial value differences such as `1.2 != 1.200002` caused ValueErrors. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
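A rounding rule of this kind can be sketched as follows. The helper name `round_floats` and the choice of 4 decimal places are assumptions for illustration; the actual precision used for `AvgMinMax` is not stated here.

```python
from typing import Any, Dict


def round_floats(metrics: Dict[str, Any], ndigits: int = 4) -> Dict[str, Any]:
    """Round every float value so trivially different metrics compare equal."""
    return {
        key: round(value, ndigits) if isinstance(value, float) else value
        for key, value in metrics.items()
    }


stored = {"mean": 1.2, "count": 3}
recomputed = {"mean": 1.200002, "count": 3}

print(stored == recomputed)                              # False
print(round_floats(stored) == round_floats(recomputed))  # True
```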
…dd uvicorn logging filtering (#79) Signed-off-by: Brian Yu <bxyu@nvidia.com>
From now on, any Github repo under the NVIDIA-NeMo org will use the default Issues system / behavior unless it is overridden: a repo that has its own .github/ISSUE_TEMPLATE folder overrides the default behavior coming from the NVIDIA-NeMo/.github repo (which establishes the default system). The default behavior for issue creation is now:
- There is a Bug Report issue
- There is a Feature Request issue
- Blank issues are NOT allowed to be created
# Add `num_repeats` hyperparameter for dataset repetition
## Summary
Adds optional `num_repeats` parameter to `DatasetConfig` that allows
repeating each dataset sample during training data processing and
preparation.
## Changes
- **Config**: Added `num_repeats: Optional[int] = Field(default=None,
ge=1)` to `DatasetConfig`
- **Processing**: Modified `_iter_dataset_lines()` to repeat each line
`num_repeats` times (defaults to 1)
- **Integration**: Updated data validation, metrics aggregation, and
preparation workflows to handle repeated samples
- **Documentation**: Updated README with usage examples
## Usage
```yaml
datasets:
- name: train
  type: train
  jsonl_fpath: data/train.jsonl
  num_repeats: 3  # Each sample appears 3 times during processing
```
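The repetition behavior can be sketched in a few lines. This is not the real `_iter_dataset_lines()`: the function name `iter_repeated_lines` is hypothetical, and emitting the copies of each sample back to back (rather than repeating the whole dataset) is an assumption based on the "repeat each line" wording.

```python
from typing import Iterable, Iterator, Optional


def iter_repeated_lines(lines: Iterable[str],
                        num_repeats: Optional[int] = None) -> Iterator[str]:
    """Yield each line `num_repeats` times; None behaves like 1."""
    repeats = 1 if num_repeats is None else num_repeats
    if repeats < 1:
        # Mirrors the ge=1 constraint on the config field.
        raise ValueError("num_repeats must be >= 1")
    for line in lines:
        for _ in range(repeats):
            yield line


print(list(iter_repeated_lines(["a", "b"], num_repeats=3)))
# ['a', 'a', 'a', 'b', 'b', 'b']
```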
## Testing
Added comprehensive unit tests covering:
- Configuration validation (accepts positive integers, rejects invalid
values)
- Data iteration with different repeat values
- Metrics aggregation with repeated samples
- Data preparation workflow integration
All existing functionality preserved with backward compatibility
(defaults to 1 repetition).
---------
Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
will add back later with Nano 3 recipe --------- Signed-off-by: Chris Wing <cwing@nvidia.com>
adds docs for trl integration see #371 --------- Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
- Replace intro with clearer scope statement
- Add tip linking to README for existing environments
- Add verification method as fifth environment property with link to concepts page
- Improve explanatory text for rollout structure and core capabilities tables
- Add section headers for tables
- Remove duplicate Verification Methods table (covered in concepts)
- Remove Reference Implementations table (covered in README)
- Remove Learning Path section
- Update environment tutorial card descriptions and add multi-environment card to docs home
---------
Signed-off-by: Chris Wing <cwing@nvidia.com>
Fixes #670
- Fix #-available-resource-servers anchor to #-available-environments across 5 files
- Remove Next Steps section from unsloth tutorial
- Improve environment card descriptions
Signed-off-by: Chris Wing <cwing@nvidia.com>
no content changes just style guide application Signed-off-by: Lawrence Lane <llane@nvidia.com>
…ting Started (#721)
- Remove first-training-run.md (duplicated unsloth tutorial content)
- Add "Start Training" as recommended next step in quickstart, rollout collection
- Add training bullet to README next steps
- Remove first-training-run from toctree and landing page
Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
bxyu-nvidia
previously approved these changes
Feb 17, 2026
Signed-off-by: Lawrence Lane <llane@nvidia.com>
auto-merge was automatically disabled
March 10, 2026 23:08
Pull request was closed
9d4ed48 to
aa52c7e
Compare
Contributor
Sorry folks, this PR was mistakenly closed when one of our folks mistakenly force-pushed diverging refs to Github. We are looking to remedy this and re-open the PR
Contributor
Replacement PR opened here: #880
bxyu-nvidia
pushed a commit
that referenced
this pull request
Mar 20, 2026
MahanFathi
pushed a commit
that referenced
this pull request
Mar 24, 2026
jsw-zorro
pushed a commit
to niletron/Gym
that referenced
this pull request
Apr 7, 2026
Replacement PR for: NVIDIA-NeMo#722 Contributors: @lbliii --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>
No description provided.