feat: improve research template by yuki-97 · Pull Request #2094 · NVIDIA-NeMo/RL

yuki-97 · 2026-03-10T10:28:07Z

Support research projects that need to add custom functions to policy workers, by means of an extension class.

Summary by CodeRabbit

Release Notes

New Features
- Added support for custom worker extensions via an optional `worker_extension_cls` parameter in Policy initialization
- Introduced a new method to execute operations across all distributed workers with the same input data
- Added a worker extension example demonstrating rank and device information retrieval

Documentation
- Enhanced the template README with detailed setup instructions and a checklist of required artifacts
- Updated the example workflow to demonstrate custom worker extension integration
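
The "execute operations across all distributed workers" item can be pictured with a small Ray-free stand-in. This is only a sketch: `ToyWorker`, `ToyWorkerGroup`, and the `tag` method are hypothetical names invented for illustration, and the real method added in this PR (named `run_all_workers_single_data` in the review walkthrough) may have a different signature and return Ray futures rather than plain values.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyWorker:
    """Hypothetical stand-in for one policy worker (no Ray involved)."""

    rank: int

    def tag(self, message: str) -> str:
        # Every worker receives the same message and adds its own rank.
        return f"{message} from rank {self.rank}"


class ToyWorkerGroup:
    """Hypothetical stand-in for a group of workers."""

    def __init__(self, world_size: int) -> None:
        self.workers = [ToyWorker(rank=r) for r in range(world_size)]

    def run_all_workers_single_data(
        self, method_name: str, *args: Any, **kwargs: Any
    ) -> list[Any]:
        # Look the method up by name on each worker and pass identical
        # arguments to all of them, collecting one result per worker.
        return [getattr(w, method_name)(*args, **kwargs) for w in self.workers]


group = ToyWorkerGroup(world_size=2)
results = group.run_all_workers_single_data("tag", message="hello")
print(results)
```

Dispatching by method name is what lets a caller reach methods that exist only on an extension class, since the group does not need to know about them in advance.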

Signed-off-by: Yuki Huang <yukih@nvidia.com>

copy-pr-bot · 2026-03-10T10:28:15Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuki-97 · 2026-03-10T10:28:47Z

/ok to test c8cf4c1

github-actions · 2026-03-10T10:28:48Z

ℹ️ File Consistency Check

Check based on commit: c8cf4c1 (PR #2094 from yukih/research-template)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

coderabbitai · 2026-03-10T10:38:12Z

📝 Walkthrough

Walkthrough

The changes introduce environment-specific actor class identifier support in worker group management, add a new method for executing named methods across all workers with shared input data, refactor worker class implementations to separate Ray remote constraints from concrete logic, and expand documentation and examples for research templates with new worker extension patterns.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Distributed Worker Management `nemo_rl/distributed/worker_groups.py` | Added support for an environment-specific actor class identifier (`ray_actor_class_fqn_for_env`) in `RayWorkerBuilder`, used for deriving Python environments and creating virtual environments while preserving the fallback to `ray_actor_class_fqn`. |
| Policy Core & Worker Method Execution `nemo_rl/models/policy/lm_policy.py` | Added an optional `worker_extension_cls` parameter to override the default worker builder, introduced the `run_all_workers_single_data` method to execute named methods across all workers with shared input, and added `worker_builder_cls_for_env` for environment-specific wiring. |
| Worker Implementation Restructuring `nemo_rl/models/policy/workers/dtensor_policy_worker.py`, `nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py`, `nemo_rl/models/policy/workers/megatron_policy_worker.py` | Refactored worker classes to separate the concrete implementation (Impl classes) from Ray remote constraints; Ray remote classes now inherit from Impl classes while maintaining public API compatibility. |
| Research Documentation `research/README.md`, `research/template_project/README.md` | Expanded the research templates README with itemized descriptions of required artifacts, examples, and test locations; updated the template project flow to integrate the extension-driven policy pattern with worker extension class initialization. |
| Template Implementation & Example `research/template_project/single_update.py`, `research/template_project/template_project/worker_extension.py` | Added the worker extension parameter to Policy initialization and introduced the `DTensorPolicyWorkerV2Extension` class with a `get_worker_info` method supporting parallel-data execution across workers. |
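
The Impl/remote split described in the "Worker Implementation Restructuring" row can be sketched without Ray as follows. All class names here are hypothetical stand-ins, and the choice of base class for the extension is an assumption; the thread does not show which class `DTensorPolicyWorkerV2Extension` actually inherits from. A plausible motivation, also an assumption, is that a class decorated with `@ray.remote` cannot be subclassed directly, so extensions need a plain Python base.

```python
class ToyPolicyWorkerImpl:
    """Hypothetical stand-in for an Impl class holding the concrete logic."""

    def __init__(self, rank: int, device: str) -> None:
        self.rank = rank
        self.device = device

    def train_step(self) -> str:
        return f"step on rank {self.rank}"


# In the real code this class would carry a @ray.remote(...) decorator.
# It is omitted here so the sketch runs without Ray installed.
class ToyPolicyWorker(ToyPolicyWorkerImpl):
    pass


class ToyPolicyWorkerExtension(ToyPolicyWorkerImpl):
    """Research-side extension: inherits existing behavior, adds one method."""

    def get_worker_info(self) -> dict:
        # Mirrors the idea of reporting rank and device information.
        return {"rank": self.rank, "device": self.device}


worker = ToyPolicyWorkerExtension(rank=0, device="cuda:0")
info = worker.get_worker_info()
print(info, worker.train_step())
```

Keeping the decorated class as an empty subclass means the public class name stays the same for existing callers, while research code extends the undecorated implementation.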

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | PR contains major architectural changes and new worker extension features but lacks any test results, performance metrics, or verification information in the description. | Add test results, performance benchmarks, or verification data to the PR description demonstrating that refactored worker classes maintain compatibility and the new extension capability functions correctly. |
| Title check | ❓ Inconclusive | The PR title 'feat: improve research template' is vague and does not clearly convey the specific technical changes. The main focus is adding worker extension support for policy workers, not just improving the research template. | Revise the title to be more specific, e.g., 'feat: add worker extension support to policy workers' or 'feat: support custom worker extensions in research template'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.25%, which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

research/template_project/single_update.py (1)
86-92: ⚠️ Potential issue | 🟠 Major

Use an extension FQN that is importable under the documented invocation.

Line 91 hard-codes research.template_project.template_project.worker_extension.DTensorPolicyWorkerV2Extension, but this is inconsistent with the script's own imports (line 34 uses from template_project.data_utils import ...) and the documented invocation method (uv run single_update.py from the research/template_project directory). When the script is run as documented, template_project.* is directly importable, but research.* is not in the module search path. Ray worker processes will fail to load the extension with ModuleNotFoundError: No module named 'research' unless the repo root is manually added to PYTHONPATH.
Suggested fix
-        worker_extension_cls="research.template_project.template_project.worker_extension.DTensorPolicyWorkerV2Extension",
+        worker_extension_cls="template_project.worker_extension.DTensorPolicyWorkerV2Extension",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@research/template_project/single_update.py` around lines 86 - 92, The
hard-coded extension FQN in the Policy constructor is not importable when
running from the documented working directory; change the worker_extension_cls
value passed to Policy (the Policy(...) instantiation) from
"research.template_project.template_project.worker_extension.DTensorPolicyWorkerV2Extension"
to the package-relative import used elsewhere, e.g.
"template_project.worker_extension.DTensorPolicyWorkerV2Extension", so the
worker extension can be imported when running `uv run single_update.py`.
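
The failure mode described in this comment comes down to whether the first component of the dotted path resolves on the module search path. A minimal sketch of such a check is below; `top_level_package_is_importable` is a hypothetical helper, not part of NeMo RL, and it only inspects the current process, so a Ray worker with a different `sys.path` could still behave differently.

```python
import importlib.util


def top_level_package_is_importable(fqn: str) -> bool:
    """Return True if the first component of a dotted path can be located."""
    top_level = fqn.split(".", 1)[0]
    # find_spec returns None when no finder on sys.path knows the name.
    return importlib.util.find_spec(top_level) is not None


# A standard-library path resolves; a made-up package name does not.
print(top_level_package_is_importable("json.decoder.JSONDecoder"))
print(top_level_package_is_importable("no_such_pkg_for_this_sketch.mod.Cls"))
```

Run from `research/template_project`, a check like this would be expected to accept a path starting with `template_project` and reject one starting with `research`, unless the repository root has been added to `PYTHONPATH`.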

🧹 Nitpick comments (1)

nemo_rl/models/policy/lm_policy.py (1)
136-142: Fail fast on invalid extension classes.

This currently prints an advisory but defers all validation to Ray startup. Importing the extension class up front and checking issubclass(...) against the resolved base worker would turn typos and incompatible extensions into a much clearer local error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/models/policy/lm_policy.py` around lines 136 - 142, When a
worker_extension_cls is provided, validate it immediately by checking
issubclass(worker_extension_cls, worker_builder_cls) and raise a clear TypeError
if the check fails instead of merely printing; on success set
worker_builder_cls_for_env = worker_builder_cls and then override
worker_builder_cls = worker_extension_cls so downstream code uses the validated
extension. Ensure the error message names the offending worker_extension_cls and
the expected base worker_builder_cls to aid debugging.
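
One way to read this suggestion, given that the extension is passed as a dotted string, is to import it eagerly and check the class hierarchy before any workers start. The sketch below is an illustration under assumptions: `load_extension_cls` is a hypothetical helper, and it is demonstrated with standard-library classes rather than the real worker classes. If the base worker is itself a Ray actor class, the check would presumably have to target the undecorated Impl class instead.

```python
import importlib


def load_extension_cls(extension_fqn: str, base_cls: type) -> type:
    """Import a class from a dotted path and verify it subclasses base_cls."""
    module_name, _, class_name = extension_fqn.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {extension_fqn!r}")
    try:
        module = importlib.import_module(module_name)
        extension_cls = getattr(module, class_name)
    except (ImportError, AttributeError) as err:
        # Surface typos locally instead of inside a remote worker process.
        raise ImportError(f"Cannot import worker extension {extension_fqn!r}") from err
    if not (isinstance(extension_cls, type) and issubclass(extension_cls, base_cls)):
        raise TypeError(
            f"{extension_fqn!r} must be a subclass of {base_cls.__name__}"
        )
    return extension_cls


# Demonstration with standard-library classes standing in for workers.
ok_cls = load_extension_cls("collections.OrderedDict", dict)
print(ok_cls.__name__)

try:
    load_extension_cls("collections.deque", dict)
except TypeError as err:
    print(f"rejected: {err}")
```

The trade-off is that the driver process must be able to import the extension module, which may pull in heavy dependencies that would otherwise only load inside the workers.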

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py`:
- Around line 1129-1133: The coverage pragma is placed on the closing line of
the `@ray.remote` decorator instead of on the class definition; move the
`# pragma: no cover` comment to the class line for DTensorPolicyWorkerV2 so
that the coverage comment is attached to the
`class DTensorPolicyWorkerV2(DTensorPolicyWorkerV2Impl):` line, leaving the
`@ray.remote(runtime_env=get_runtime_env_for_policy_worker("dtensor_policy_worker_v2"))`
decorator unchanged.

In `@research/README.md`:
- Around line 28-30: The documentation currently swaps the repository references
for example config paths; update the README sentence that mentions
`research/template_project/configs` and `examples/configs` so they point to the
correct repository examples: `research/template_project/configs` is the research
template example and `examples/configs` is the core repository example—edit the
text near the "Configuration" section to swap the descriptions so each path is
labeled with its correct source.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 973e0925-c2cd-4c67-9924-08830faf7698

📥 Commits

Reviewing files that changed from the base of the PR and between 280d3aa and c8cf4c1.

📒 Files selected for processing (9)

nemo_rl/distributed/worker_groups.py
nemo_rl/models/policy/lm_policy.py
nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
nemo_rl/models/policy/workers/megatron_policy_worker.py
research/README.md
research/template_project/README.md
research/template_project/single_update.py
research/template_project/template_project/worker_extension.py

github-actions · 2026-03-10T10:57:12Z

ℹ️ File Consistency Check

Check based on commit: cd05258 (PR #2094 from yukih/research-template)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

yuki-97 · 2026-03-10T10:57:36Z

/ok to test cd05258

Signed-off-by: Yuki Huang <yukih@nvidia.com>

github-actions · 2026-03-10T11:17:27Z

ℹ️ File Consistency Check

Check based on commit: 4361031 (PR #2094 from yukih/research-template)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

yuki-97 · 2026-03-10T11:18:05Z

/ok to test 4361031

Signed-off-by: Yuki Huang <yukih@nvidia.com>

github-actions · 2026-03-10T15:39:52Z

ℹ️ File Consistency Check

Check based on commit: ade0c15 (PR #2094 from yukih/research-template)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

support worker extension and add example

8cee10d

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 requested review from a team and terrykong as code owners March 10, 2026 10:28

yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Mar 10, 2026

copy-pr-bot bot had a problem deploying to nemo-ci March 10, 2026 10:29 Error

coderabbitai bot reviewed Mar 10, 2026

copy-pr-bot bot had a problem deploying to nemo-ci March 10, 2026 10:58 Error

yuki-97 added 3 commits March 10, 2026 04:16

update doc

1d6ca41

Signed-off-by: Yuki Huang <yukih@nvidia.com>

add ray_actor_class_fqn_for_env and fix ray class

a23e660

Signed-off-by: Yuki Huang <yukih@nvidia.com>

add run_all_workers_multiple_data

4361031

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/research-template branch from cd05258 to 4361031 Compare March 10, 2026 11:16

copy-pr-bot bot temporarily deployed to nemo-ci March 10, 2026 11:18 Inactive

copy-pr-bot bot deployed to nemo-ci March 10, 2026 12:30 Active

copy-pr-bot bot had a problem deploying to nemo-ci March 10, 2026 14:02 Failure

update some comments

ade0c15

Signed-off-by: Yuki Huang <yukih@nvidia.com>
