docs: version bump, CTA link changes by lbliii · Pull Request #722 · NVIDIA-NeMo/Gym

lbliii · 2026-02-17T19:23:13Z

No description provided.

…nfo (#27) Signed-off-by: Brian Yu <bxyu@nvidia.com>

Updated the log message printed when running `ng_prepare_data`, for example from "Found 0 agent server instance configs withOUT datasets:" to "Found 0 agent server instance configs WITHOUT datasets:", to match the format of the subsequent logs, for example "Found 1 agent server instance configs WITH datasets:". Signed-off-by: chrismun <cmunley@nvidia.com>

update readme for resources servers for updated cli Signed-off-by: chrismun <cmunley@nvidia.com>

…item to be present (#19) Signed-off-by: Brian Yu <bxyu@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

This change adds a new dataset for the library judge math resources server. The new dataset contains math problems from Stack Overflow. This dataset corresponds to the StackOverflow Dump Data entry in the RL verifier data tracking [spreadsheet](https://docs.google.com/spreadsheets/d/1VK4-ZonMSR-4Ulk161Au1f-nGhs4r9V9-beScboaR3I/edit?gid=0#gid=0&range=7:7). The prompts in the dataset have been formatted in the same way as the OpenMathReasoning dataset that was previously added. All the Stack Overflow problems with expected answers were included, so it may be necessary in the future to filter the problems according to difficulty based on the responses of a model. Approval for the use of this dataset is covered in this ticket: [DGPTT-96](https://jirasw.nvidia.com/browse/DGPTT-96). Signed-off-by: Damon Mosk-Aoyama <dmoskaoyama@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

@xinyu-dev

Address #43 and #45. Thank you to @xinyu-dev for the raises! --------- Signed-off-by: Brian Yu <bxyu@nvidia.com>

Add LLM-as-judge resources server

Introduce a resources server that uses an LLM as a judge to compare a model’s generated answer against an expected (gold) answer.

### Details

- Only the last assistant message is graded.
- Optional regex extraction for both question and response:
  - Uses the last regex match; returns the first non-empty capture group, else the whole match.
- Optional second-pass verification to mitigate positional bias (swap expected/generated).

### Configuration:

- judge_system_message (optional): system prompt for the judge.
- judge_prompt_template (required): must include {question}, {expected_answer}, {generated_answer}.
- judge_equal_label / judge_not_equal_label: defaults to [[A=B]] / [[A!=B]].
- check_twice_swap (bool, default false): if true, on an initial equal verdict, perform a second judge pass with expected/prediction swapped; reward remains 1 only if the second pass is also equal.
- reward_if_swap_fails (float, default 0.0): reward to assign if the second pass fails.
  - Here the training framework will need to handle how to discard this sample based on this reward.
- question_extract_regex (optional): extract from the last user message (last match semantics).
- response_extract_regex (optional): extract from the last assistant message (last match semantics).

The regex part can be helpful if for some reason the user decided to train with in-context examples. For the responses, this can be useful if the training specifically expects the final response to be inside \boxed or \text, for instance.

### Steps

- Extracts question from the last user message; applies question_extract_regex if provided.
- Extracts generated answer from the last assistant message; applies response_extract_regex if provided.
- Calls the judge model with a configurable prompt and optional system message.
- Parses the judge’s last message for judge_equal_label / judge_not_equal_label to set reward and verdict.
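The extraction and second-pass rules described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the function names (`extract_last_match`, `parse_verdict`, `grade`), the `judge` callable, the fallback to the full text when the regex does not match, and the "last label wins" tie-break are all assumptions made for the example.

```python
import re
from typing import Callable, Optional

EQUAL_LABEL = "[[A=B]]"       # default judge_equal_label from the description
NOT_EQUAL_LABEL = "[[A!=B]]"  # default judge_not_equal_label from the description


def extract_last_match(text: str, pattern: Optional[str]) -> str:
    """Use the last regex match; return its first non-empty capture group,
    else the whole match. Falls back to the full text (an assumption)."""
    if not pattern:
        return text
    matches = list(re.finditer(pattern, text))
    if not matches:
        return text
    last = matches[-1]
    for group in last.groups():
        if group:
            return group
    return last.group(0)


def parse_verdict(judge_output: str) -> Optional[bool]:
    """True for equal, False for not equal, None if no label is present.
    If both labels appear, the one appearing last wins (an assumption)."""
    eq = judge_output.rfind(EQUAL_LABEL)
    ne = judge_output.rfind(NOT_EQUAL_LABEL)
    if eq == -1 and ne == -1:
        return None
    return eq > ne


def grade(
    question: str,
    expected: str,
    generated: str,
    judge: Callable[[str, str, str], str],  # hypothetical: (question, A, B) -> judge text
    check_twice_swap: bool = False,
    reward_if_swap_fails: float = 0.0,
) -> float:
    """Reward is 1.0 only if the judge says equal (and, optionally, still
    says equal after expected/generated are swapped)."""
    if parse_verdict(judge(question, expected, generated)) is not True:
        return 0.0
    if not check_twice_swap:
        return 1.0
    second = parse_verdict(judge(question, generated, expected))
    return 1.0 if second is True else reward_if_swap_fails
```

Setting `reward_if_swap_fails` to a sentinel value (for example a negative number) would let a training framework recognize and discard samples where the two passes disagree, as the configuration notes suggest.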
### Running the server

```bash
config_paths="responses_api_models/openai_model/configs/openai_model.yaml, \
resources_servers/equivalence_llm_judge/configs/equivalence_llm_judge.yaml"
ng_run "+config_paths=[$config_paths]" \
    +equivalence_llm_judge.resources_servers.equivalence_llm_judge.judge_model_server.name=openai_model
```

### Collecting rollouts

```bash
ng_collect_rollouts +agent_name=equivalence_llm_judge_simple_agent \
    +input_jsonl_fpath=resources_servers/equivalence_llm_judge/data/example.jsonl \
    +output_jsonl_fpath=results/example_rollouts.jsonl \
    +limit=5
```

### Licensing

Code: Apache 2.0
Data: CC-BY-NC-3.0 (Examples from https://huggingface.co/datasets/allenai/sciq)

---------

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: soares-f <soarescmsa@gmail.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: fsiino-nvidia <fsiino@nvidia.com>

Signed-off-by: Peter Jin <pjin@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

…44)

### Switch python_math_exec to use session cookies and fix tool usage

### Changes

1. Updated `python_math_exec` server to use built-in session cookies:
   - Removed manual `session_id` handling from request/response models
   - Now using `request.session[SESSION_ID_KEY]` for state tracking
2. Removed `simple_agent_stateful/*`

### To test, follow these steps from the README.md of `python_math_exec`

1. Download dataset

```
ng_download_dataset_from_gitlab \
    +dataset_name=open_math_reasoning_problems_tool \
    +version=0.0.1 \
    +artifact_fpath=open_math_reasoning_problems_tool.jsonl \
    +output_fpath=data/open_math_reasoning_problems_tool.jsonl
```

2. Start server

```
ng_run "+config_paths=[responses_api_agents/simple_agent/configs/simple_agent.yaml,responses_api_models/openai_model/configs/openai_model.yaml,resources_servers/python_math_exec/configs/python_math_exec.yaml]" +simple_agent.responses_api_agents.simple_agent.resources_server.name=python_math_exec
```

3. Collect trajectories

```
ng_collect_rollouts +agent_name=simple_agent +input_jsonl_fpath=data/open_math_reasoning_problems_tool.jsonl +output_jsonl_fpath=results/open_math_reasoning_problems_tool_output_new.jsonl +limit=1
```

---------

Signed-off-by: Rahul Chand <rchand@nvidia.com>
Co-authored-by: Rahul Chand <rchand@cw-dfw-cs-001-vscode-02.cm.cluster>
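To illustrate the session-cookie approach mentioned in the changes above, here is a rough, framework-free sketch of keeping per-session state keyed by an ID stored in `request.session`. The `FakeRequest` class, the `get_session_state` helper, the in-memory `_state` store, and the value of `SESSION_ID_KEY` are all hypothetical stand-ins; the real server presumably relies on its web framework's session middleware.

```python
import uuid
from typing import Any, Dict

SESSION_ID_KEY = "session_id"  # name from the PR text; the value is an assumption


class FakeRequest:
    """Hypothetical stand-in for a request whose `session` dict is cookie-backed."""

    def __init__(self, session: Dict[str, Any]) -> None:
        self.session = session


# In-memory per-session state, keyed by session ID (illustrative only).
_state: Dict[str, Dict[str, Any]] = {}


def get_session_state(request: FakeRequest) -> Dict[str, Any]:
    """Look up (or create) the state that belongs to this request's session."""
    session_id = request.session.setdefault(SESSION_ID_KEY, uuid.uuid4().hex)
    return _state.setdefault(session_id, {})
```

With this shape, request and response models no longer need to carry a `session_id` field: two requests that present the same session cookie resolve to the same state dictionary.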

## Details

This is a resource server for a text-based Sudoku game. Correctness is checked programmatically. A reward is returned at each step (each call of the `make_move` function), and the final reward is the sum of the step rewards.

**Dataset**: The dataset is procedurally generated. To stay compatible with the current trajectory generation framework, we provide a `simple_sudoku.jsonl` and a script (steps below) that can be used to create a JSONL file for trajectory generation. This will be changed later.

## Steps to run

1. Download reference dataset

```
ng_download_dataset_to_gitlab \
    +dataset_name=simple_sudoku \
    +version=0.0.2 \
    +artifact_fpath=simple_sudoku.jsonl \
    +output_fpath=data/simple_sudoku.jsonl
```

2. Create a larger JSONL (the reference dataset has just 1 data point, as a reference for how the system prompt and tool definition should look). The command below generates 5 examples. Currently the game parameters (board size and number of clues) are selected randomly within a reasonable range: the board size is 4 or 9, and the number of clues is in the range (6, 12) for board size 4 and (16, 48) for board size 9. In the future we could replace this with a parameter for how difficult we want the game to be. Run the command below from the `simple_sudoku/` folder:

```
python generate_sudoku_jsonl.py ../../data/simple_sudoku.jsonl 5 ../../data/sudoku_batch.jsonl
```

3. Start the servers (this starts the simple_game_agent, the model server, and the Sudoku environment server)

```
ng_run "+config_paths=[responses_api_agents/simple_game_agent/configs/simple_game_agent.yaml,responses_api_models/openai_model/configs/openai_model.yaml,resources_servers/simple_sudoku/configs/simple_sudoku.yaml]" +simple_game_agent.responses_api_agents.simple_game_agent.resources_server.name=simple_sudoku
```

4.
Start trajectory collection ``` ng_collect_traj +agent_name=simple_game_agent +input_jsonl_fpath=data/sudoku_batch.jsonl +output_jsonl_fpath=results/sudoku_output_NEWLOGIC.jsonl ``` --- **Example Rollouts**: ``` {"responses_create_params": {"input": [], "tools": [{"name": "make_move", "parameters": {"type": "object", "properties": {"row": {"type": "integer", "description": "Row number (1-4)"}, "col": {"type": "integer", "description": "Column number (1-4)"}, "number": {"type": "integer", "description": "Digit to place (1-4)"}}, "required": ["row", "col", "number"], "additionalProperties": false}, "strict": true, "type": "function", "description": "Place a digit in the Sudoku grid"}]}, "response": {"id": "resp_68bd93aa0e9081948ee1fb98147b354701b662383935ba9f", "created_at": 1757254570.0, "error": null, "incomplete_details": null, "instructions": null, "metadata": {}, "model": "gpt-4.1-2025-04-14", "object": "response", "output": [{"arguments": "{\"row\":2,\"col\":3,\"number\":3}", "call_id": "call_HRb8CVBJzjVXbqiYvDNf63yS", "name": "make_move", "type": "function_call", "id": "fc_68bd93a1faec8194b1669db531b87b8a039ed859b3babbe5", "status": "completed"}, {"call_id": "call_HRb8CVBJzjVXbqiYvDNf63yS", "output": "{\"success\": false, \"message\": \"Incorrect move: R2 C3 = 3 violates Sudoku rules\", \"game_state\": {\"current_board\": [[0, 4, 2, 0], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 1, \"correct_moves\": 0}, \"board_text\": \" C1 C2 C3 C4 \\nR1 . 4 | 2 .\\nR2 . . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . 
.\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":1,\"col\":1,\"number\":1}", "call_id": "call_hM2UMng5iSWnuET4aMpeKIoJ", "name": "make_move", "type": "function_call", "id": "fc_68bd93a31d648193b06f9923513b16640b983aa011463905", "status": "completed"}, {"call_id": "call_hM2UMng5iSWnuET4aMpeKIoJ", "output": "{\"success\": true, \"message\": \"Correct move! R1 C1 = 1\", \"game_state\": {\"current_board\": [[1, 4, 2, 0], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 2, \"correct_moves\": 1}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 .\\nR2 . . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":1,\"col\":4,\"number\":3}", "call_id": "call_Q866NsiyJeD807NjpnQxJk0V", "name": "make_move", "type": "function_call", "id": "fc_68bd93a44ac081978ee4664c6853523e0e28ca231f6afdf1", "status": "completed"}, {"call_id": "call_Q866NsiyJeD807NjpnQxJk0V", "output": "{\"success\": true, \"message\": \"Correct move! R1 C4 = 3\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 3, \"correct_moves\": 2}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 . . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . 
.\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":2,\"col\":1,\"number\":2}", "call_id": "call_giYxUnyIL7BLZCp7hKHPxHn9", "name": "make_move", "type": "function_call", "id": "fc_68bd93a5838c8196b7ac194f34837bfb039cf8ecc424db03", "status": "completed"}, {"call_id": "call_giYxUnyIL7BLZCp7hKHPxHn9", "output": "{\"success\": true, \"message\": \"Correct move! R2 C1 = 2\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 4, \"correct_moves\": 3}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 . | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":2,\"col\":2,\"number\":3}", "call_id": "call_y9bTBhRd2FvRDadiap5nNujC", "name": "make_move", "type": "function_call", "id": "fc_68bd93a6908c8194962f9d783ef9c08405b4c0e16f7cb6be", "status": "completed"}, {"call_id": "call_y9bTBhRd2FvRDadiap5nNujC", "output": "{\"success\": true, \"message\": \"Correct move! R2 C2 = 3\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 5, \"correct_moves\": 4}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | . 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . 
.\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":2,\"col\":3,\"number\":4}", "call_id": "call_YV4txJW0g9sgS5UZpS0cnzTC", "name": "make_move", "type": "function_call", "id": "fc_68bd93a780408197a8f194c0c7253d3e07d834853ce8c529", "status": "completed"}, {"call_id": "call_YV4txJW0g9sgS5UZpS0cnzTC", "output": "{\"success\": true, \"message\": \"Correct move! R2 C3 = 4\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 0], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 6, \"correct_moves\": 5}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 .\\nR4 3 2 | . .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":3,\"col\":4,\"number\":2}", "call_id": "call_tOUxWELZGfigplBp5lEqQLvs", "name": "make_move", "type": "function_call", "id": "fc_68bd93a895148197bfb733d4c5af173f0db6c67fa5f6bb70", "status": "completed"}, {"call_id": "call_tOUxWELZGfigplBp5lEqQLvs", "output": "{\"success\": true, \"message\": \"Correct move! R3 C4 = 2\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 0, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 7, \"correct_moves\": 6}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 2\\nR4 3 2 | . 
.\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":4,\"col\":3,\"number\":1}", "call_id": "call_JxdEvztCmWNb0oxvgDzqFJFV", "name": "make_move", "type": "function_call", "id": "fc_68bd93a992a88193a5a0bf428d294140050e29fd449a66b6", "status": "completed"}, {"call_id": "call_JxdEvztCmWNb0oxvgDzqFJFV", "output": "{\"success\": true, \"message\": \"Correct move! R4 C3 = 1\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 0]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 8, \"correct_moves\": 7}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 2\\nR4 3 2 | 1 .\", \"is_complete\": false, \"move_reward\": 0.0}", "type": "function_call_output", "id": null, "status": null}, {"arguments": "{\"row\":4,\"col\":4,\"number\":4}", "call_id": "call_FAcCh4KDrE657ryKXNgZL695", "name": "make_move", "type": "function_call", "id": "fc_68bd93ab49f481949fa445b5d6ad069501b662383935ba9f", "status": "completed"}, {"call_id": "call_FAcCh4KDrE657ryKXNgZL695", "output": "{\"success\": true, \"message\": \"Correct move! R4 C4 = 4. Congratulations! 
Puzzle completed!\", \"game_state\": {\"current_board\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"solution\": [[1, 4, 2, 3], [2, 3, 4, 1], [4, 1, 3, 2], [3, 2, 1, 4]], \"scale\": 4, \"initial_empty_count\": 8, \"moves_made\": 9, \"correct_moves\": 8}, \"board_text\": \" C1 C2 C3 C4 \\nR1 1 4 | 2 3\\nR2 2 3 | 4 1\\n - - - - - - \\nR3 4 1 | 3 2\\nR4 3 2 | 1 4\", \"is_complete\": true, \"move_reward\": 1.0}", "type": "function_call_output", "id": null, "status": null}], "parallel_tool_calls": false, "temperature": 1.0, "tool_choice": "auto", "tools": [{"name": "make_move", "parameters": {"type": "object", "properties": {"row": {"type": "integer", "description": "Row number (1-4)"}, "col": {"type": "integer", "description": "Column number (1-4)"}, "number": {"type": "integer", "description": "Digit to place (1-4)"}}, "required": ["row", "col", "number"], "additionalProperties": false}, "strict": true, "type": "function", "description": "Place a digit in the Sudoku grid"}], "top_p": 1.0, "background": false, "max_output_tokens": null, "max_tool_calls": 1, "previous_response_id": null, "prompt": null, "reasoning": {"effort": null, "generate_summary": null, "summary": null}, "service_tier": "default", "status": "completed", "text": {"format": {"type": "text"}, "verbosity": "medium"}, "top_logprobs": 0, "truncation": "disabled", "usage": {"input_tokens": 1103, "input_tokens_details": {"cached_tokens": 0}, "output_tokens": 23, "output_tokens_details": {"reasoning_tokens": 0}, "total_tokens": 1126}, "user": null, "prompt_cache_key": null, "safety_identifier": null, "store": true, "input": [{"role": "user", "type": "message", "content": "You are playing a simple version of Sudoku.\nEach row is numbered from 1 to 4, and each column is also numbered from 1 to 4.\nEmpty cells are represented by '.', and pre-filled cells contain digits from 1 to 4.\n\nYour objective is to fill the empty cells in the 4x4 grid with digits from 1 to 4 such that:\n1. 
Each row contains all digits from 1 to 4 without repetition.\n2. Each column contains all digits from 1 to 4 without repetition.\n3. Each 2x2 subgrid contains all digits from 1 to 4 without repetition.\n\nRules and Instructions:\n1. **Do not overwrite** the initial numbers provided in the grid.\n2. **Only fill** empty cells represented by '.'.\n3. You must respond with the format '\\boxed{row column number}', e.g. \\boxed{1 1 5}.\n4. **Ensure** that your move does not violate Sudoku rules. Invalid moves will result in penalties.\nUse the make_move function to submit your moves. Good luck!\n\n\n\n C1 C2 C3 C4 \nR1 . 4 | 2 .\nR2 . . | . 1\n - - - - - - \nR3 4 1 | 3 .\nR4 3 2 | . ."}]}, "reward": 1.0, "total_moves": 9, "is_complete": true} ``` --- --------- Signed-off-by: Rahul Chand <rchand@nvidia.com>
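As a rough illustration of the programmatic check and the summed reward described for this server, the sketch below validates a move against the row, column, and subgrid rules and adds up per-move rewards. The function names `is_valid_move` and `episode_reward` are hypothetical, and the real server may instead compare moves against the stored solution; the board and moves are taken from the example rollout above.

```python
import math
from typing import List

Board = List[List[int]]  # 0 marks an empty cell


def is_valid_move(board: Board, row: int, col: int, number: int) -> bool:
    """Check a 1-indexed move against Sudoku row, column, and subgrid rules."""
    size = len(board)
    box = math.isqrt(size)  # 2 for a 4x4 board, 3 for a 9x9 board
    r, c = row - 1, col - 1
    if not (0 <= r < size and 0 <= c < size and 1 <= number <= size):
        return False
    if board[r][c] != 0:  # never overwrite a filled cell
        return False
    if number in board[r]:  # row conflict
        return False
    if any(board[i][c] == number for i in range(size)):  # column conflict
        return False
    r0, c0 = (r // box) * box, (c // box) * box
    for i in range(r0, r0 + box):  # subgrid conflict
        for j in range(c0, c0 + box):
            if board[i][j] == number:
                return False
    return True


def episode_reward(move_rewards: List[float]) -> float:
    """Final reward as the sum of the per-move rewards."""
    return sum(move_rewards)


# Starting board from the example rollout.
start = [[0, 4, 2, 0], [0, 0, 0, 1], [4, 1, 3, 0], [3, 2, 0, 0]]
print(is_valid_move(start, 2, 3, 3))  # the rollout's first (rejected) move
print(is_valid_move(start, 1, 1, 1))  # the rollout's second (accepted) move
```

In the example rollout every `move_reward` is 0.0 except the final, puzzle-completing move at 1.0, so the summed reward matches the reported `"reward": 1.0`.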

Reverts #30 Signed-off-by: Brian Yu <bxyu@nvidia.com>

This change updates the train_data_utils via `ng_prepare_data` to apply data aggregations to the other keys within an `example.jsonl` file. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>

) Signed-off-by: Brian Yu <bxyu@nvidia.com>

…mprovements (#77) Signed-off-by: Brian Yu <bxyu@nvidia.com>

…ning (#66) Remove unnecessary GitHub Actions CI and add uv config to enable dependency scanning

* This project's current CI doesn't need to build and test through a Docker image, so this deletes the unnecessary CI Dockerfile and GitHub Actions template
* Adding `managed = true` under `[uv.tool]` to allow for repo dependency scanning

---------

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

This implements a simple rounding rule for `AvgMinMax` floats in order to keep example_metrics consistent. For background, the addition of median and std dev did not assign a ceiling for decimal places, so trivial value differences such as `1.2 != 1.200002` caused ValueErrors. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
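The effect of such a rounding rule can be sketched as below. This is only an illustration under assumptions: the helper name `round_metrics`, the dictionary shape, and the choice of 4 decimal places are hypothetical, since the actual precision used for `AvgMinMax` is not stated here.

```python
from typing import Any, Dict

NDIGITS = 4  # hypothetical precision; the value used in the PR is not stated


def round_metrics(metrics: Dict[str, Any], ndigits: int = NDIGITS) -> Dict[str, Any]:
    """Round every float value so trivially different values compare equal."""
    return {
        key: round(value, ndigits) if isinstance(value, float) else value
        for key, value in metrics.items()
    }


stored = {"average": 1.2, "min": 1, "max": 2}
recomputed = {"average": 1.200002, "min": 1, "max": 2}

# Without rounding, 1.2 != 1.200002 would be reported as a mismatch.
print(stored == recomputed)
print(round_metrics(stored) == round_metrics(recomputed))
```

Rounding both sides before comparison keeps the equality check strict for meaningful differences while ignoring floating-point noise below the chosen precision.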

…dd uvicorn logging filtering (#79) Signed-off-by: Brian Yu <bxyu@nvidia.com>

From now on, any GitHub repo under the NVIDIA-NeMo org will use the default Issues behavior unless it is overridden, i.e. unless the individual repo has its own .github/ISSUE_TEMPLATE folder, which overrides the default behavior coming from the NVIDIA-NeMo/.github repo (which establishes the default system). The default behavior for issue creation is now:

- There is a Bug Report issue
- There is a Feature Request issue
- Blank issues are NOT allowed to be created

# Add `num_repeats` hyperparameter for dataset repetition

## Summary

Adds optional `num_repeats` parameter to `DatasetConfig` that allows repeating each dataset sample during training data processing and preparation.

## Changes

- **Config**: Added `num_repeats: Optional[int] = Field(default=None, ge=1)` to `DatasetConfig`
- **Processing**: Modified `_iter_dataset_lines()` to repeat each line `num_repeats` times (defaults to 1)
- **Integration**: Updated data validation, metrics aggregation, and preparation workflows to handle repeated samples
- **Documentation**: Updated README with usage examples

## Usage

```yaml
datasets:
  - name: train
    type: train
    jsonl_fpath: data/train.jsonl
    num_repeats: 3  # Each sample appears 3 times during processing
```

## Testing

Added comprehensive unit tests covering:

- Configuration validation (accepts positive integers, rejects invalid values)
- Data iteration with different repeat values
- Metrics aggregation with repeated samples
- Data preparation workflow integration

All existing functionality preserved with backward compatibility (defaults to 1 repetition).

---------

Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
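The repetition behavior described above might look roughly like the following. This is a sketch, not the repository's `_iter_dataset_lines()`: the function name `iter_repeated_lines` is hypothetical, and emitting the copies of each line consecutively is an assumption about ordering.

```python
from typing import Iterable, Iterator, Optional


def iter_repeated_lines(
    lines: Iterable[str], num_repeats: Optional[int] = None
) -> Iterator[str]:
    """Yield each line `num_repeats` times; None means a single pass."""
    repeats = 1 if num_repeats is None else num_repeats
    if repeats < 1:
        # Mirrors the `ge=1` constraint on the config field.
        raise ValueError("num_repeats must be a positive integer")
    for line in lines:
        for _ in range(repeats):
            yield line


print(list(iter_repeated_lines(["a", "b"], num_repeats=3)))
```

Treating `None` as one repetition keeps existing configs working unchanged, which matches the backward-compatibility note in the description.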

Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>

will add back later with Nano 3 recipe --------- Signed-off-by: Chris Wing <cwing@nvidia.com>

adds docs for trl integration see #371 --------- Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

- Replace intro with clearer scope statement
- Add tip linking to README for existing environments
- Add verification method as fifth environment property with link to concepts page
- Improve explanatory text for rollout structure and core capabilities tables
- Add section headers for tables
- Remove duplicate Verification Methods table (covered in concepts)
- Remove Reference Implementations table (covered in README)
- Remove Learning Path section
- Update environment tutorial card descriptions and add multi-environment card to docs home

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>

Fixes #670

- Fix #-available-resource-servers anchor to #-available-environments across 5 files
- Remove Next Steps section from unsloth tutorial
- Improve environment card descriptions

Signed-off-by: Chris Wing <cwing@nvidia.com>

no content changes just style guide application Signed-off-by: Lawrence Lane <llane@nvidia.com>

…ting Started (#721)

- Remove first-training-run.md (duplicated unsloth tutorial content)
- Add "Start Training" as recommended next step in quickstart, rollout collection
- Add training bullet to README next steps
- Remove first-training-run from toctree and landing page

Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>

Signed-off-by: Lawrence Lane <llane@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Signed-off-by: Lawrence Lane <llane@nvidia.com>

copy-pr-bot · 2026-03-10T23:08:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

bxyu-nvidia · 2026-03-11T00:26:46Z

Sorry folks, this PR was closed by mistake when one of our folks accidentally force-pushed diverging refs to GitHub. We are looking to remedy this and re-open the PR.

vadam5 · 2026-03-11T03:09:52Z

Replacement PR opened here: #880

@lbliii

Replacement PR for: #722 Contributors: @lbliii --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>


kbhardwaj-nvidia and others added 30 commits September 8, 2025 20:13

Update Comp-Coding README.md (#26)

0ebd762

Docs improvements - remove Why NeMo Gym section and add CI/CD tests i…

b590d40

…nfo (#27) Signed-off-by: Brian Yu <bxyu@nvidia.com>

update readmes from ng_collect_traj to ng_collect_rollouts (#25)

323b66c

update readme for resources servers for updated cli Signed-off-by: chrismun <cmunley@nvidia.com>

Simple agent stop criteria requires no tool calls AND output message …

b9e2d02

…item to be present (#19) Signed-off-by: Brian Yu <bxyu@nvidia.com>

Server spinup polling (#31)

fd7a93a

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Rename top-level config key 'openai_model' => 'policy_model' (#33)

685498c

Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com>

Simple agent allows non-json tool responses (#35)

a00ea78

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Multi-verifier docs (#36)

654b782

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Servers have easy hooks into individual instances via session (#24)

49e7e14

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Add Workbench validation dataset (#46)

a7b72bc

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Docs update (#47)

0bbe84a

Address #43 and #45. Thank you to @xinyu-dev for the raises! --------- Signed-off-by: Brian Yu <bxyu@nvidia.com>

Configure global httpx client (#50)

2d996b8

Signed-off-by: Peter Jin <pjin@nvidia.com>

Fix OpenAI ResponseReasoningItem.status property (#54)

df2772d

Signed-off-by: Brian Yu <bxyu@nvidia.com>

VLLMModel data parallel; explicit RunHelper shutdown handle (#52)

8c0eed5

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Revert "Migrate text_based_game: sudoku and game agent features" (#65)

7945f3f

Reverts #30 Signed-off-by: Brian Yu <bxyu@nvidia.com>

Instantiate one httpx async client per unique connection / base url (#75

f66c220

) Signed-off-by: Brian Yu <bxyu@nvidia.com>

Swap async http backend from httpx to aiohttp; various server infra i…

77e93f5

…mprovements (#77) Signed-off-by: Brian Yu <bxyu@nvidia.com>

VLLMModel fix whitespace stripping and unwarranted spaces (#70)

722e7fc

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Add profiling; improve rollout collection usability and efficiency; a…

6c2943b

…dd uvicorn logging filtering (#79) Signed-off-by: Brian Yu <bxyu@nvidia.com>

Comp coding fixes; lots of misc infra items (#90)

0a504d5

Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>

cwing-nvidia and others added 9 commits February 15, 2026 21:40

Remove model-recipes section from docs (#710)

16d0205

will add back later with Nano 3 recipe --------- Signed-off-by: Chris Wing <cwing@nvidia.com>

docs: TRL integration (#602)

ddf388e

adds docs for trl integration see #371 --------- Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>

docs: Environment properties (#713)

560e329

Signed-off-by: Brian Yu <bxyu@nvidia.com>

docs: Clean environment properties (#714)

b45f7cb

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Style Guide Sweep (#717)

93e1282

no content changes just style guide application Signed-off-by: Lawrence Lane <llane@nvidia.com>

version bump

82482d9

Signed-off-by: Lawrence Lane <llane@nvidia.com>

lbliii self-assigned this Feb 17, 2026

remove setting

fb71a7a

Signed-off-by: Lawrence Lane <llane@nvidia.com>

lbliii changed the title from "version bump" to "docs: version bump, CTA link changes" Feb 17, 2026

lbliii requested review from bxyu-nvidia and cwing-nvidia February 17, 2026 19:25

docs: FAQ: inference.nvidia.com has no response diversity (#718)

00439ee

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia previously approved these changes Feb 17, 2026


lbliii added 2 commits February 17, 2026 14:55

Merge branch 'main' into llane/docs-remove-link

83ca37e

remove 0.2.0 from version dropdown

aa52c7e

Signed-off-by: Lawrence Lane <llane@nvidia.com>

lbliii dismissed bxyu-nvidia’s stale review via 9d4ed48 February 17, 2026 20:14

lbliii enabled auto-merge (squash) February 17, 2026 20:45

vadam5 closed this Mar 10, 2026

auto-merge was automatically disabled March 10, 2026 23:08
Pull request was closed

vadam5 force-pushed the llane/docs-remove-link branch from 9d4ed48 to aa52c7e Compare March 10, 2026 23:08

vadam5 mentioned this pull request Mar 11, 2026

docs: version bump, CTA link changes #880

Merged

bxyu-nvidia pushed a commit that referenced this pull request Mar 20, 2026

docs: version bump, CTA link changes (#880)

4206a26

Replacement PR for: #722 Contributors: @lbliii --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>

MahanFathi pushed a commit that referenced this pull request Mar 24, 2026

docs: version bump, CTA link changes (#880)

61e22ae

Replacement PR for: #722 Contributors: @lbliii --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>

lbliii wants to merge 276 commits into main from llane/docs-remove-link


20 participants
