fix(rm): raise clear error when CP is used with DTensor RM training#2483
Merged
Conversation
Collaborator (Author): /ok to test 01993fb
yuki-97 (Contributor) reviewed May 13, 2026:
One issue: `test_context_parallel_allowed_when_one` uses a broken negative-lookahead regex that will not catch regressions.
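The reviewed test's actual pattern is not shown in this thread, but a minimal illustration (with a hypothetical pattern) shows why a bare negative lookahead is dangerous in a `pytest.raises(match=...)` style assertion: `re.search` tries the zero-width lookahead at every position, so it trivially succeeds somewhere in almost any string.

```python
import re

# Hypothetical pattern meant to assert a message does NOT mention context parallelism.
# It consumes no characters and is not anchored to anything.
pattern = r"(?!context_parallel)"

# re.search slides the lookahead across the string; at the first position where
# the next characters are not literally "context_parallel", it matches (zero-width).
# So the pattern matches ANY string -- including the exact regression it was
# written to catch:
assert re.search(pattern, "context_parallel is supported") is not None
assert re.search(pattern, "anything at all") is not None

# A direct positive assertion on the expected error text is more robust:
msg = "context_parallel_size > 1 is not supported for DTensor RM training"
assert re.search(r"not supported", msg) is not None
```

In short: prefer asserting on text that must be present in the error over asserting on text that must be absent.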
…sor RM training

Context parallelism (`context_parallel_size > 1`) is not supported for reward model training on the DTensor backend because the `log_sigmoid` operator lacks a DTensor sharding strategy for CP meshes. Instead of letting users hit cryptic runtime errors, raise a clear `ValueError` during setup with a link to the tracking issue.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Force-pushed 4157ff0 to d318709 (Compare)
Collaborator (Author): /ok to test d318709
The NeMo Gym docs URL returns 404, causing sphinx-build CI to fail. Add the URL pattern to `linkcheck_ignore` since the external docs site is not under our control.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Collaborator (Author): /ok to test 9c93d8d
Replace the blanket `linkcheck_ignore` for all NeMo Gym docs with pinned v0.2.1 URLs so linkcheck still validates them.

Signed-off-by: Terry Kong <terryk@nvidia.com>
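The idea behind this follow-up commit can be sketched as follows. This is an illustrative reconstruction, not the actual diff: the real NeMo Gym URLs and the exact `conf.py` contents are not shown in this thread, so the URLs below are placeholders.

```python
import re

# Hypothetical URLs standing in for the real NeMo Gym docs links.
unversioned = "https://example.com/nemo-gym/docs/latest/index.html"  # the page that 404'd
pinned = "https://example.com/nemo-gym/docs/v0.2.1/index.html"       # a stable, versioned page

# Before: a blanket linkcheck_ignore entry hid every NeMo Gym link from Sphinx's
# linkcheck builder, so dead links there would go unnoticed.
blanket_ignore = [r"https://example\.com/nemo-gym/docs/.*"]

# After: the docs link to pinned v0.2.1 pages and the blanket entry is dropped,
# so linkcheck actually validates those URLs again.
linkcheck_ignore = []

def is_checked(url: str, ignore_patterns: list[str]) -> bool:
    """True if Sphinx linkcheck would validate this URL (i.e. it is not ignored)."""
    return not any(re.match(p, url) for p in ignore_patterns)

assert not is_checked(pinned, blanket_ignore)   # blanket: never validated
assert is_checked(pinned, linkcheck_ignore)     # pinned + no ignore: validated
```

Sphinx's `linkcheck_ignore` takes a list of regex patterns matched against link URLs; narrowing or removing entries is the standard way to re-enable validation for a site.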
Collaborator (Author): /ok to test 4c24d9e
yuki-97 approved these changes May 15, 2026
Summary

- `rm.setup()` now raises a clear `ValueError` when `context_parallel_size > 1` is set for reward model training on the DTensor backend
- The check is marked `TODO(https://github.com/NVIDIA-NeMo/RL/issues/2482)` for easy cleanup when CP support is implemented

Context

Related to #2482 — this PR does not implement CP support for RM training, it only surfaces a clear error message instead of letting users hit cryptic runtime failures (`Unknown parallel style: local_rowwise` or `NotImplementedError: Operator aten.log_sigmoid_forward.default does not have a sharding strategy registered`).

Test plan

- `uv run --group test pytest tests/unit/algorithms/test_rm.py -v` passes
- `test_context_parallel_rejected_for_dtensor_rm` verifies the `ValueError` fires when CP > 1
- `test_context_parallel_allowed_when_one` verifies CP=1 passes the validation
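The validation described in the summary can be sketched roughly as below. This is a hypothetical reconstruction: the actual config-key names (`dtensor_cfg`, `context_parallel_size`, `enabled`) and where the check lives in NeMo RL's `rm.setup()` are assumptions based on the PR description, not the real code.

```python
def validate_rm_parallelism(policy_cfg: dict) -> None:
    """Reject unsupported context parallelism for DTensor RM training.

    Hypothetical helper illustrating the check this PR adds to rm.setup().
    """
    dtensor_cfg = policy_cfg.get("dtensor_cfg", {})
    cp_size = dtensor_cfg.get("context_parallel_size", 1)
    if dtensor_cfg.get("enabled", False) and cp_size > 1:
        # TODO(https://github.com/NVIDIA-NeMo/RL/issues/2482): remove this guard
        # once log_sigmoid has a DTensor sharding strategy for CP meshes.
        raise ValueError(
            "context_parallel_size > 1 is not supported for reward model "
            "training on the DTensor backend; see "
            "https://github.com/NVIDIA-NeMo/RL/issues/2482"
        )
```

Failing fast here at setup time is preferable to the cryptic mid-training failures quoted in the Context section, and the issue URL in the TODO makes the guard easy to find and delete once CP support lands.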