feat: improve research template #2094

Open
yuki-97 wants to merge 5 commits into main from yukih/research-template

yuki-97 commented Mar 10, 2026

Allow research projects to add custom functions to policy workers by using an extension class.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for custom worker extensions via optional worker_extension_cls parameter in Policy initialization
    • Introduced new method to execute operations across all distributed workers with unified input data
    • Added worker extension example demonstrating rank and device information retrieval
  • Documentation

    • Enhanced template README with detailed setup instructions and required artifacts checklist
    • Updated example workflow to demonstrate custom worker extension integration
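
To make the extension idea above concrete, here is a minimal sketch in plain Python. It does not use the NeMo RL classes: `BaseWorker`, `WorkerExtension`, and `WorkerGroup` are hypothetical stand-ins, and the signature given to `run_all_workers_single_data` is an assumption based only on the summary (a method name plus one shared set of arguments).

```python
from typing import Any


class BaseWorker:
    """Hypothetical stand-in for a policy worker; not the NeMo RL class."""

    def __init__(self, rank: int, device: str) -> None:
        self.rank = rank
        self.device = device


class WorkerExtension(BaseWorker):
    """Research-side subclass that adds one custom method."""

    def get_worker_info(self, tag: str) -> dict[str, Any]:
        return {"tag": tag, "rank": self.rank, "device": self.device}


class WorkerGroup:
    """Hypothetical dispatcher that calls one named method on every worker."""

    def __init__(self, workers: list[BaseWorker]) -> None:
        self.workers = workers

    def run_all_workers_single_data(
        self, method_name: str, *args: Any, **kwargs: Any
    ) -> list[Any]:
        # Every worker receives the same arguments; results follow worker order.
        return [getattr(w, method_name)(*args, **kwargs) for w in self.workers]


group = WorkerGroup([WorkerExtension(rank=r, device=f"cuda:{r}") for r in range(2)])
infos = group.run_all_workers_single_data("get_worker_info", tag="demo")
print(infos)
```

The dispatcher looks the method up by name, so a method that exists only on the extension class can be called without changing the dispatcher itself.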

Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 requested review from a team and terrykong as code owners March 10, 2026 10:28

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuki-97 added the "CI:L1 Run doctests, unit tests, and functional tests" label Mar 10, 2026

yuki-97 commented Mar 10, 2026

/ok to test c8cf4c1

@github-actions

ℹ️ File Consistency Check

Check based on commit: c8cf4c1 (PR #2094 from yukih/research-template)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.


coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

The changes introduce environment-specific actor class identifier support in worker group management, add a new method for executing named methods across all workers with shared input data, refactor worker class implementations to separate Ray remote constraints from concrete logic, and expand documentation and examples for research templates with new worker extension patterns.
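
One way to read the environment-specific identifier is sketched below. This is an illustration under assumptions, not the code in `worker_groups.py`: `ENV_REGISTRY`, `WorkerBuilder`, and its field names are hypothetical, and the idea that an extension class reuses the environment registered for its base worker is inferred from the walkthrough rather than confirmed by it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: maps a known worker class path to a Python environment name.
ENV_REGISTRY = {"pkg.workers.BaseWorker": "env_base"}


@dataclass
class WorkerBuilder:
    """Hypothetical builder holding the actor class path and an optional env key."""

    actor_class_fqn: str
    actor_class_fqn_for_env: Optional[str] = None

    def env_name(self) -> str:
        # Prefer the environment-specific identifier; fall back to the actor class path.
        key = self.actor_class_fqn_for_env or self.actor_class_fqn
        return ENV_REGISTRY[key]


plain = WorkerBuilder("pkg.workers.BaseWorker")
extended = WorkerBuilder(
    "my_project.ext.WorkerExtension",
    actor_class_fqn_for_env="pkg.workers.BaseWorker",
)
print(plain.env_name(), extended.env_name())
```

With this shape, an unregistered extension class can still resolve an environment, because the lookup key is the base worker rather than the extension.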

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Distributed Worker Management | nemo_rl/distributed/worker_groups.py | Added support for environment-specific actor class identifier (ray_actor_class_fqn_for_env) in RayWorkerBuilder, used for deriving Python environments and creating virtual environments while preserving fallback to ray_actor_class_fqn. |
| Policy Core & Worker Method Execution | nemo_rl/models/policy/lm_policy.py | Added optional worker_extension_cls parameter to override default worker builder, introduced run_all_workers_single_data method to execute named methods across all workers with shared input, and added worker_builder_cls_for_env for environment-specific wiring. |
| Worker Implementation Restructuring | nemo_rl/models/policy/workers/dtensor_policy_worker.py, nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py, nemo_rl/models/policy/workers/megatron_policy_worker.py | Refactored worker classes to separate concrete implementation (Impl classes) from Ray remote constraints; Ray remote classes now inherit from Impl classes while maintaining public API compatibility. |
| Research Documentation | research/README.md, research/template_project/README.md | Expanded research templates README with itemized descriptions of required artifacts, examples, and test locations; updated template project flow to integrate extension-driven policy pattern with worker extension class initialization. |
| Template Implementation & Example | research/template_project/single_update.py, research/template_project/template_project/worker_extension.py | Added worker extension parameter to Policy initialization and introduced DTensorPolicyWorkerV2Extension class with get_worker_info method supporting parallel-data execution across workers. |
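
The Impl/remote split described in the "Worker Implementation Restructuring" row can be sketched without Ray. The decorator below is a hypothetical stand-in (`fake_remote`), used on the assumption that a remote decorator returns a handle object rather than a plain class, which is what makes direct subclassing of the decorated name awkward. All class names here are illustrative.

```python
class RemoteHandle:
    """Stand-in for what a remote decorator returns: a handle, not a class."""

    def __init__(self, cls: type) -> None:
        self._cls = cls

    def remote(self, *args, **kwargs):
        # A real framework would start a worker process; here we construct locally.
        return self._cls(*args, **kwargs)


def fake_remote(cls: type) -> RemoteHandle:
    """Hypothetical decorator mimicking the shape of an actor decorator."""
    return RemoteHandle(cls)


class PolicyWorkerImpl:
    """All concrete logic lives here, so it stays an ordinary subclassable class."""

    def __init__(self, rank: int) -> None:
        self.rank = rank

    def describe(self) -> str:
        return f"worker rank={self.rank}"


@fake_remote
class PolicyWorker(PolicyWorkerImpl):
    """Thin remote wrapper; keeps the public name callers already use."""


class MyExtension(PolicyWorkerImpl):
    """An extension subclasses the Impl class, not the decorated wrapper."""

    def describe(self) -> str:
        return super().describe() + " (extended)"


print(PolicyWorker.remote(3).describe())
print(MyExtension(1).describe())
```

Keeping the decorated class as an empty subclass means existing callers see the same name, while extensions get a normal base class to inherit from.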

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR contains major architectural changes and new worker extension features but lacks any test results, performance metrics, or verification information in the description. | Add test results, performance benchmarks, or verification data to the PR description demonstrating that refactored worker classes maintain compatibility and new extension capability functions correctly. |
| Title check | ❓ Inconclusive | The PR title 'feat: improve research template' is vague and does not clearly convey the specific technical changes. The main focus is adding worker extension support for policy workers, not just improving the research template. | Revise the title to be more specific, e.g., 'feat: add worker extension support to policy workers' or 'feat: support custom worker extensions in research template'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.25% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
research/template_project/single_update.py (1)

86-92: ⚠️ Potential issue | 🟠 Major

Use an extension FQN that is importable under the documented invocation.

Line 91 hard-codes research.template_project.template_project.worker_extension.DTensorPolicyWorkerV2Extension, but this is inconsistent with the script's own imports (line 34 uses from template_project.data_utils import ...) and the documented invocation method (uv run single_update.py from the research/template_project directory). When the script is run as documented, template_project.* is directly importable, but research.* is not in the module search path. Ray worker processes will fail to load the extension with ModuleNotFoundError: No module named 'research' unless the repo root is manually added to PYTHONPATH.

Suggested fix
-        worker_extension_cls="research.template_project.template_project.worker_extension.DTensorPolicyWorkerV2Extension",
+        worker_extension_cls="template_project.worker_extension.DTensorPolicyWorkerV2Extension",
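
The import problem described above can be reproduced in isolation. The sketch below builds a throwaway directory tree (the names `demo_research`, `demo_project`, and `demo_pkg` are hypothetical stand-ins for `research`, `template_project`, and the inner package) and puts only the project directory on `sys.path`, which approximates launching a script from that directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_class(fqn: str) -> type:
    """Resolve 'package.module.ClassName' to the class object."""
    module_path, _, class_name = fqn.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/demo_research/demo_project/demo_pkg/ext.py
    pkg_dir = Path(root) / "demo_research" / "demo_project" / "demo_pkg"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "ext.py").write_text("class DemoExtension:\n    pass\n")

    # Simulate running from the project directory: only it is on sys.path.
    project_dir = str(pkg_dir.parent)
    sys.path.insert(0, project_dir)
    try:
        # The short, package-relative path resolves.
        short_ok = load_class("demo_pkg.ext.DemoExtension").__name__
        try:
            # The repo-rooted path fails because its top-level package is not importable.
            load_class("demo_research.demo_project.demo_pkg.ext.DemoExtension")
            long_error = None
        except ModuleNotFoundError as exc:
            long_error = exc.name
    finally:
        sys.path.remove(project_dir)

print(short_ok, long_error)
```

The failing name reported by `ModuleNotFoundError` is the top-level package, which matches the `No module named 'research'` message quoted in the comment.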
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@research/template_project/single_update.py` around lines 86 - 92, The
hard-coded extension FQN in the Policy constructor is not importable when
running from the documented working directory; change the worker_extension_cls
value passed to Policy (the Policy(...) instantiation) from
"research.template_project.template_project.worker_extension.DTensorPolicyWorkerV2Extension"
to the package-relative import used elsewhere, e.g.
"template_project.worker_extension.DTensorPolicyWorkerV2Extension", so the
worker extension can be imported when running `uv run single_update.py`.
🧹 Nitpick comments (1)
nemo_rl/models/policy/lm_policy.py (1)

136-142: Fail fast on invalid extension classes.

This currently prints an advisory but defers all validation to Ray startup. Importing the extension class up front and checking issubclass(...) against the resolved base worker would turn typos and incompatible extensions into a much clearer local error.
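
A fail-fast check along these lines might look like the sketch below. It assumes the extension is passed as a dotted string, as in the `single_update.py` call; `resolve_extension` is a hypothetical helper, not a function in `lm_policy.py`, and standard-library classes stand in for the worker and its extension.

```python
import importlib
from collections import OrderedDict


def resolve_extension(extension_fqn: str, base_cls: type) -> type:
    """Import the extension eagerly and confirm it subclasses the base worker."""
    module_path, _, class_name = extension_fqn.rpartition(".")
    try:
        ext_cls = getattr(importlib.import_module(module_path), class_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # Typos in the module or class name surface here, before any worker starts.
        raise ImportError(f"Cannot import worker extension '{extension_fqn}'") from exc
    if not (isinstance(ext_cls, type) and issubclass(ext_cls, base_cls)):
        raise TypeError(
            f"'{extension_fqn}' must be a subclass of "
            f"{base_cls.__module__}.{base_cls.__qualname__}"
        )
    return ext_cls


# OrderedDict subclasses dict, so it plays the role of a valid extension.
ok = resolve_extension("collections.OrderedDict", dict)
print(ok)
```

Raising locally with both names in the message gives the caller the offending path and the expected base class in one place, instead of a failure inside a remote worker.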

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/models/policy/lm_policy.py` around lines 136 - 142, When a
worker_extension_cls is provided, validate it immediately by checking
issubclass(worker_extension_cls, worker_builder_cls) and raise a clear TypeError
if the check fails instead of merely printing; on success set
worker_builder_cls_for_env = worker_builder_cls and then override
worker_builder_cls = worker_extension_cls so downstream code uses the validated
extension. Ensure the error message names the offending worker_extension_cls and
the expected base worker_builder_cls to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py`:
- Around line 1129-1133: The coverage pragma is placed on the closing line of
the `@ray.remote` decorator instead of the class definition; move the
"# pragma: no cover" comment to the class line for DTensorPolicyWorkerV2 so the
class declaration carries the pragma (i.e., ensure the coverage comment is
attached to the "class DTensorPolicyWorkerV2(DTensorPolicyWorkerV2Impl):" line),
leaving the
`@ray.remote(runtime_env=get_runtime_env_for_policy_worker("dtensor_policy_worker_v2"))`
decorator unchanged.

In `@research/README.md`:
- Around line 28-30: The documentation currently swaps the repository references
for example config paths; update the README sentence that mentions
`research/template_project/configs` and `examples/configs` so they point to the
correct repository examples: `research/template_project/configs` is the research
template example and `examples/configs` is the core repository example—edit the
text near the "Configuration" section to swap the descriptions so each path is
labeled with its correct source.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 973e0925-c2cd-4c67-9924-08830faf7698

📥 Commits

Reviewing files that changed from the base of the PR and between 280d3aa and c8cf4c1.

📒 Files selected for processing (9)
  • nemo_rl/distributed/worker_groups.py
  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • research/README.md
  • research/template_project/README.md
  • research/template_project/single_update.py
  • research/template_project/template_project/worker_extension.py

@github-actions

ℹ️ File Consistency Check

Check based on commit: cd05258 (PR #2094 from yukih/research-template)


yuki-97 commented Mar 10, 2026

/ok to test cd05258

yuki-97 added 3 commits March 10, 2026 04:16
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 force-pushed the yukih/research-template branch from cd05258 to 4361031 March 10, 2026 11:16
@github-actions

ℹ️ File Consistency Check

Check based on commit: 4361031 (PR #2094 from yukih/research-template)


yuki-97 commented Mar 10, 2026

/ok to test 4361031

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@github-actions

ℹ️ File Consistency Check

Check based on commit: ade0c15 (PR #2094 from yukih/research-template)
