fix: reward profiling not require usage #824

Merged: bxyu-nvidia merged 1 commit into main from cmunley1/reward-profile-fail-gracefully on Mar 5, 2026

@cmunley1 cmunley1 commented Mar 5, 2026

```
(Gym) phtran@eos0065:~/lustre_phtran/Gym$ ng_collect_rollouts \
    +agent_name=verifiers_agent \
    +input_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl \
    +output_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example-rollouts.jsonl \
    +limit=5
Limiting the number of rows to 5
Using `verifiers_agent` for rows that do not already have an agent ref
Repeating rows 1 times (in a pattern of abc to aabbcc)!
Reading rows: 4it [00:00, 12291.00it/s]
Clearing output fpath since `resume_from_cache=False`!
Collecting rollouts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:50<00:00, 10.01s/it]
Traceback (most recent call last):
  File "/lustre/fsw/coreai_dlalgo_genai/phtran/Gym/.venv/bin/ng_collect_rollouts", line 10, in <module>
    sys.exit(collect_rollouts())
             ^^^^^^^^^^^^^^^^^^
  File "/lustre/fsw/coreai_dlalgo_genai/phtran/Gym/nemo_gym/rollout_collection.py", line 357, in collect_rollouts
    asyncio.run(rch.run_from_config(config))
  File "/home/phtran/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/phtran/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/phtran/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/lustre/fsw/coreai_dlalgo_genai/phtran/Gym/nemo_gym/rollout_collection.py", line 279, in run_from_config
    group_level_metrics, agent_level_metrics = rp.profile_from_data(rows, results)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/fsw/coreai_dlalgo_genai/phtran/Gym/nemo_gym/reward_profile.py", line 91, in profile_from_data
    result = result | result["response"].get("usage", dict())
             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for |: 'dict' and 'NoneType'
```

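The verifiers agent is crashing because of a recent change to `ng_collect_rollouts` that requires a `usage` field from agents. In the traceback above, `result["response"].get("usage", dict())` returns `None`: the key is present but null, so the `dict()` default is never used, and the dict union with `None` raises. This change makes the fallback handle that case without erroring.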
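
For readers unfamiliar with the failure mode, here is a minimal sketch of it and of one way to guard against it. This is not the actual diff from this PR; the helper name `merge_usage`, the sample rows, and the token field names are hypothetical and only illustrate the `dict.get` behavior seen in the traceback.

```python
def merge_usage(result: dict) -> dict:
    """Merge usage stats into a result row, tolerating a missing or null usage field.

    Hypothetical helper for illustration; not the function from reward_profile.py.
    """
    # `.get("usage", {})` only falls back when the key is absent. A key that is
    # present with value None is returned as-is, so coalesce with `or` instead.
    usage = result["response"].get("usage") or {}
    return result | usage


# Sample rows with made-up field names.
row_with_null_usage = {"reward": 1.0, "response": {"usage": None}}
row_with_usage = {
    "reward": 0.0,
    "response": {"usage": {"input_tokens": 12, "output_tokens": 34}},
}

# The expression from the traceback raises on the null-usage row.
try:
    row_with_null_usage | row_with_null_usage["response"].get("usage", dict())
    original_raised = False
except TypeError:
    original_raised = True

merged_null = merge_usage(row_with_null_usage)  # row is returned unchanged
merged_full = merge_usage(row_with_usage)       # usage keys are merged into the row
```

Coalescing with `or {}` treats an absent key and an explicit `None` the same way, which is the behavior the description asks for; other guards (for example an explicit `is None` check) would work equally well.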

Signed-off-by: cmunley1 <cmunley@nvidia.com>

copy-pr-bot bot commented Mar 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cmunley1 changed the title from "fail gracefully" to "fix: reward profiling not require usage" on Mar 5, 2026
bxyu-nvidia merged commit 545fb5f into main on Mar 5, 2026
5 checks passed
bxyu-nvidia deleted the cmunley1/reward-profile-fail-gracefully branch on March 5, 2026 03:41
MahanFathi pushed a commit that referenced this pull request Mar 24, 2026
jsw-zorro pushed a commit to niletron/Gym that referenced this pull request Apr 7, 2026