
Megatron refactor POC #1592

Draft
ashors1 wants to merge 19 commits into main from ashors/policy-refactor

Conversation

ashors1 (Contributor) commented Dec 2, 2025

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Signed-off-by: ashors1 <ashors@nvidia.com>
ashors1 mentioned this pull request Dec 2, 2025
)


def broadcast_object_across_pp_ranks(obj: Any) -> Any:
Contributor:
We might want to move these utils to bridge for common testing; I think some of the utils here are also used in bridge. But we can change that later.

Contributor Author:
Good point. I removed broadcast_object_across_pp_ranks in favor of mbridge's broadcast_obj_from_pp_rank.

Contributor Author:
Oops, I didn't look at the mbridge code closely enough and missed that broadcast_obj_from_pp_rank is a class method. I've added broadcast_obj_from_pp_rank back to nemo-rl for now, but eventually we should plan to merge the two.
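
For context, a minimal sketch of what such a helper typically does, assuming Megatron-Core's parallel_state API and that the object originates on the last pipeline stage (both assumptions, not this PR's actual code):

from typing import Any

import torch
from megatron.core import parallel_state


def broadcast_object_across_pp_ranks(obj: Any) -> Any:
    """Broadcast a picklable object from the last pipeline stage to all PP ranks."""
    pp_group = parallel_state.get_pipeline_model_parallel_group()
    # Global rank of the last stage within this pipeline-parallel group.
    src_rank = torch.distributed.get_process_group_ranks(pp_group)[-1]
    # broadcast_object_list fills the list in place on non-source ranks.
    container = [obj if parallel_state.is_pipeline_last_stage() else None]
    torch.distributed.broadcast_object_list(container, src=src_rank, group=pp_group)
    return container[0]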

)


def validate_model_paths(config: PolicyConfig) -> tuple[str, str, bool]:
Contributor:
This seems like a general util.

Contributor Author:
Do you think we should move it elsewhere? The function itself is pretty specific to Megatron and is only used during init.
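
If it stays Megatron-specific, the rough shape would be something like the sketch below; the parameter and return-field names here are invented for illustration, not taken from PolicyConfig:

import os


def validate_model_paths(
    model_name_or_path: str, megatron_ckpt_path: str | None
) -> tuple[str, str, bool]:
    """Return (hf_path, ckpt_path, use_megatron_ckpt) after basic init-time checks."""
    use_megatron_ckpt = megatron_ckpt_path is not None
    if use_megatron_ckpt and not os.path.isdir(megatron_ckpt_path):
        raise FileNotFoundError(f"Megatron checkpoint not found: {megatron_ckpt_path}")
    return model_name_or_path, megatron_ckpt_path or "", use_megatron_ckpt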

model_cfg = cfg_from_pretrained.model
cfg_from_pretrained.logger = LoggerConfig()

# Apply parallelism settings
Contributor:
This overriding part needs some work; right now it seems hacky and hard to maintain.

Contributor Author:
Do you think the refactor should be improved, or the actual logic for overriding hyperparameters? If the latter, could we address it in a separate PR, since this PR focuses on refactoring the existing logic?
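
For concreteness, the override in question amounts to something like the sketch below (the Megatron field names are real; the policy-config key layout is illustrative):

def apply_parallelism_overrides(model_cfg, policy_cfg: dict) -> None:
    """Copy user-specified parallelism settings onto the pretrained model config."""
    megatron_cfg = policy_cfg["megatron_cfg"]  # illustrative key layout
    for field in (
        "tensor_model_parallel_size",
        "pipeline_model_parallel_size",
        "context_parallel_size",
    ):
        # Last write wins; there is no cross-field validation, which is what
        # makes this style of override hard to maintain as settings accumulate.
        setattr(model_cfg, field, megatron_cfg[field])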




def setup_reference_model_state(
Contributor:
How is the reference model setup different from the actor's? Can they share some components? Right now the flows seem very different.

Contributor Author:
Yeah, the flows for the two are quite different. Personally, I feel that attempting to share components between them would be somewhat contrived, because there isn't much obvious code duplication right now, but happy to discuss this further if you'd like.
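
For what it's worth, the one piece the two flows could plausibly share is the generic "frozen copy" pattern, sketched below; this is a textbook RLHF illustration, not the PR's setup_reference_model_state:

import copy

import torch


def make_reference_model(actor_model: torch.nn.Module) -> torch.nn.Module:
    """Create a frozen copy of the actor to serve as the reference policy."""
    ref_model = copy.deepcopy(actor_model)
    ref_model.eval()  # disable dropout and other train-time behavior
    for p in ref_model.parameters():
        p.requires_grad_(False)  # reference weights never update
    return ref_model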


return output_tensor, processor_fn_wrapped

def forward_maybe_backward(
Contributor:
This needs a docstring; it's confusing to have different levels of forward stuff.

Contributor Author:
Added some docstrings and renamed forward_maybe_backward to megatron_forward_backward for clarity. Please take another look and let me know if things are clearer.
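
Roughly, the renamed function is a thin wrapper over Megatron-Core's pipeline schedule; a sketch of the shape, assuming the standard get_forward_backward_func API (the argument plumbing here is illustrative):

from megatron.core.pipeline_parallel import get_forward_backward_func


def megatron_forward_backward(
    forward_step_fn, data_iterator, model, num_microbatches, seq_length, mbs, forward_only
):
    """Run Megatron's pipeline schedule over one global batch.

    With forward_only=True this is a pure inference pass (e.g. logprob
    computation); otherwise gradients are accumulated across microbatches.
    """
    forward_backward_func = get_forward_backward_func()
    return forward_backward_func(
        forward_step_func=forward_step_fn,
        data_iterator=data_iterator,
        model=model,
        num_microbatches=num_microbatches,
        seq_length=seq_length,
        micro_batch_size=mbs,
        forward_only=forward_only,
    )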

if hasattr(module, "_inference_key_value_memory"):
module._inference_key_value_memory = None

if gbs is None:
Contributor:
This fallback seems unsafe. Are there any cases where gbs here will differ from the config value?

Contributor Author:
Yes. We call train when running validation for DPO, for example, and we want to allow a validation batch size that differs from the training batch size; similarly for the microbatch size. We could make things explicit and require that users pass gbs and mbs to train. WDYT?

Contributor:
Yep, that sounds better.
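
The explicit version would look roughly like this (parameter names are illustrative):

class MegatronPolicyWorker:
    def train(self, data, gbs: int, mbs: int) -> dict:
        """Run one pass over `data` with the given global/micro batch sizes.

        Requiring gbs and mbs lets callers such as DPO validation use batch
        sizes that differ from the training config, with no implicit fallback.
        """
        if gbs % mbs != 0:
            raise ValueError(f"gbs={gbs} must be divisible by mbs={mbs}")
        ...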

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
terrykong linked an issue Dec 4, 2025 that may be closed by this pull request
Signed-off-by: ashors1 <ashors@nvidia.com>
github-actions bot commented Dec 5, 2025

⚠️ File Consistency Check

Check based on commit: 71dea6b (PR #1592 from ashors/policy-refactor)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

Signed-off-by: ashors1 <ashors@nvidia.com>

Development

Successfully merging this pull request may close these issues:

MegatronPolicyWorker refactor