feat: add lora config for dpo dtensor backend by RayenTian · Pull Request #1826 · NVIDIA-NeMo/RL

RayenTian · 2026-01-26T02:43:32Z

closes #1680

Result

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Summary by CodeRabbit

New Features
- Introduced Low-Rank Adaptation (LoRA) configuration options with customizable parameters for model optimization
Tests
- Expanded functional test coverage with new DPO workflow tests including LoRA-based scenarios to ensure reliability

coderabbitai · 2026-01-26T06:42:36Z

Walkthrough

This PR adds LoRA configuration options to the DPO example configuration file and introduces new functional GPU tests, including a test script for LoRA-based automodel DPO training that orchestrates the run end to end and validates the resulting metrics.
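To make the `dim` and `alpha` options named in the Changes table more concrete, here is a minimal pure-Python sketch of the conventional LoRA update, y = W x + (alpha / dim) · B (A x). It assumes the standard formulation from the LoRA literature; how the dtensor backend actually applies these settings is not shown on this page, and every function and variable name below is hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: Vector, w: Matrix, a: Matrix, b: Matrix,
                 alpha: float, dim: int) -> Vector:
    """Frozen base projection plus a scaled low-rank correction.

    Shapes: w is (out, in) and stays frozen; a is (dim, in); b is (out, dim).
    """
    scale = alpha / dim  # assumption: the conventional alpha / rank scaling
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]


# Hypothetical toy sizes: in=3, out=2, dim=1.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a = [[1.0, 1.0, 1.0]]
b_zero = [[0.0], [0.0]]       # B conventionally starts at zero, so the adapter is a no-op
b_trained = [[0.5], [-0.5]]   # stand-in for B after some training
x = [1.0, 2.0, 3.0]

untrained = lora_forward(x, w, a, b_zero, alpha=2.0, dim=1)
trained = lora_forward(x, w, a, b_trained, alpha=2.0, dim=1)
print(untrained, trained)
```

With B at zero the output equals the base projection, which is why enabling an untrained adapter should not change model outputs under this formulation.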

Changes

| Cohort / File(s) | Summary |
|---|---|
| LoRA Configuration `examples/configs/dpo.yaml` | Adds a new LoRA settings block under `policy.dtensor_cfg` with parameters: `enabled`, `target_modules`, `exclude_modules`, `match_all_linear`, `dim`, `alpha`, `dropout`, `dropout_position`, `lora_A_init`, and `use_triton` |
| Test Registration `tests/functional/L1_Functional_Tests_GPU.sh` | Registers two new functional tests, `dpo_automodel_lora.sh` and `dpo_megatron.sh`, in the GPU test suite execution sequence |
| New Test Script `tests/functional/dpo_automodel_lora.sh` | Implements end-to-end DPO LoRA test orchestration: sets up the environment, runs DPO automodel training with LoRA enabled, captures metrics to JSON, and validates that training loss at step 3 is below 0.8 |
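The loss check described for the new test script can be sketched as follows. This is an illustrative stand-in, not the script's actual validation code: the JSON layout (metric name mapped to step-indexed values), the `train/loss` key, and the helper name `loss_below` are all assumptions. Only the step (3) and the threshold (0.8) come from the summary above.

```python
import json


def loss_below(metrics_json: str, metric: str, step: int, threshold: float) -> bool:
    """Return True if `metric` at `step` is strictly below `threshold`.

    Assumes a layout like {"train/loss": {"1": 0.69, ...}} with string step keys.
    """
    metrics = json.loads(metrics_json)
    value = float(metrics[metric][str(step)])
    return value < threshold


# Made-up numbers purely for illustration; these are not results from this PR.
sample = json.dumps({"train/loss": {"1": 0.69, "2": 0.71, "3": 0.65}})
passed = loss_below(sample, "train/loss", step=3, threshold=0.8)
print(passed)
```

A check like this fails the test run when it returns False, which makes the functional test sensitive to convergence regressions rather than only to crashes.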

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

feat: Megatron SFT LoRA #1629 — Adds and expands LoRA configuration under policy.dtensor_cfg and introduces LoRA-focused functional tests

Suggested labels

CI:L1

Suggested reviewers

terrykong
yuki-97
joyang-nv

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR introduces major changes but PR description lacks documented test results, convergence validation, or performance information despite adding functional tests. | Update PR description with: (1) Results from running `dpo_automodel_lora.sh` and `dpo_megatron.sh` tests including training loss and convergence, (2) Confirmation of no regression in numerics/convergence, (3) Performance measurements. Fix bash syntax error in line 11: change `set -eou pipefail` to `set -euo pipefail`. |
| Title check | ❓ Inconclusive | The PR title mentions adding a LoRA config for DPO dtensor backend, which aligns with the primary change (adding LoRA configuration to `dpo.yaml`), but conflicts with PR objectives that describe SFT (supervised fine-tuning) configuration. | Clarify whether the PR is for LoRA or SFT configuration. The title says 'lora config' but the PR objectives mention 'SFT config.' Update the title and objectives to be consistent. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@tests/functional/dpo_automodel_lora.sh`:
- Line 11: The shell script uses an incorrectly ordered set invocation, `set -eou pipefail`, where the `-o` flag's argument must immediately follow, causing an invalid option; update the command to `set -euo pipefail` so that `-e` and `-u` are enabled and `-o pipefail` is applied (replace the existing `set -eou pipefail` invocation).
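
For readers unfamiliar with the flags in question, the sketch below shows what the conventional `set -euo pipefail` form changes, driven from Python so the exit codes can be asserted on. It assumes `bash` is available on the PATH; the helper name `bash_exit_code` and the variable `UNDEFINED_VAR_FOR_DEMO` are hypothetical. It does not test how any shell parses the `-eou` ordering.

```python
import subprocess


def bash_exit_code(script: str) -> int:
    """Run a snippet with `bash -c` and return its exit status."""
    return subprocess.run(["bash", "-c", script], capture_output=True).returncode


# Without pipefail, a pipeline's status is that of its last command,
# so the failure of `false` is hidden behind the success of `true`.
plain = bash_exit_code("false | true")

# With -o pipefail the pipeline reports the failure, and -e makes the
# shell exit right there, so the echo is never reached.
strict = bash_exit_code("set -euo pipefail; false | true; echo unreachable")

# -u turns a reference to an unset variable into an error.
unset_var = bash_exit_code('set -euo pipefail; echo "$UNDEFINED_VAR_FOR_DEMO"')

print(plain, strict, unset_var != 0)
```

In a test script this combination means a failed training command or a typo in a variable name stops the run immediately instead of letting later steps validate stale output.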

tests/functional/dpo_automodel_lora.sh

Signed-off-by: ruit <ruit@nvidia.com>

yuki-97 · 2026-02-03T13:57:45Z

Thanks a lot for helping fix the run_grpo_math.py issues from #1841 in the newly added scripts!

RayenTian marked this pull request as ready for review January 26, 2026 06:39

RayenTian requested review from a team as code owners January 26, 2026 06:39

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Jan 26, 2026

RayenTian requested a review from yuki-97 January 26, 2026 06:39

RayenTian had a problem deploying to nemo-ci January 26, 2026 06:39 — with GitHub Actions Error

RayenTian requested a review from terrykong January 26, 2026 06:39

RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Jan 26, 2026

RayenTian force-pushed the ruit/dpo_lora branch from 991d1d1 to 13788ac Compare January 26, 2026 06:41

RayenTian changed the title ~~feat: add sft config for dpo dtensor backend~~ feat: add lora config for dpo dtensor backend Jan 26, 2026

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Jan 26, 2026

RayenTian temporarily deployed to nemo-ci January 26, 2026 06:47 — with GitHub Actions Inactive

coderabbitai bot reviewed Jan 26, 2026

tests/functional/dpo_automodel_lora.sh (resolved)

yuki-97 previously approved these changes Jan 26, 2026

RayenTian temporarily deployed to nemo-ci January 26, 2026 09:02 — with GitHub Actions Inactive

RayenTian had a problem deploying to nemo-ci January 26, 2026 16:23 — with GitHub Actions Failure

RayenTian dismissed yuki-97’s stale review via 5294b63 January 28, 2026 05:59

RayenTian force-pushed the ruit/dpo_lora branch from 13788ac to 5294b63 Compare January 28, 2026 05:59

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 28, 2026

RayenTian temporarily deployed to nemo-ci January 28, 2026 06:00 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci January 28, 2026 06:05 — with GitHub Actions Inactive

yuki-97 previously approved these changes Jan 28, 2026

RayenTian temporarily deployed to nemo-ci January 28, 2026 12:55 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 29, 2026

RayenTian temporarily deployed to nemo-ci January 29, 2026 03:39 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci January 29, 2026 05:10 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci January 29, 2026 08:42 — with GitHub Actions Inactive

terrykong previously approved these changes Feb 2, 2026

terrykong enabled auto-merge (squash) February 2, 2026 22:00

terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 2, 2026

RayenTian added 2 commits February 2, 2026 18:42

add sft config for dpo dtensor backend

8451470

Signed-off-by: ruit <ruit@nvidia.com>

add functional test

d5dd46c

Signed-off-by: ruit <ruit@nvidia.com>

RayenTian dismissed stale reviews from terrykong and yuki-97 via d5dd46c February 3, 2026 02:44

RayenTian force-pushed the ruit/dpo_lora branch from 52e0de3 to d5dd46c Compare February 3, 2026 02:44

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 3, 2026

RayenTian requested review from terrykong and yuki-97 February 3, 2026 02:45

RayenTian temporarily deployed to nemo-ci February 3, 2026 02:45 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 3, 2026 02:56 — with GitHub Actions Inactive

terrykong previously approved these changes Feb 3, 2026

RayenTian had a problem deploying to nemo-ci February 3, 2026 06:31 — with GitHub Actions Failure

update entrypoint

0514855

Signed-off-by: ruit <ruit@nvidia.com>

RayenTian dismissed terrykong’s stale review via 0514855 February 3, 2026 08:12

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 3, 2026

RayenTian temporarily deployed to nemo-ci February 3, 2026 08:12 — with GitHub Actions Inactive

yuki-97 approved these changes Feb 3, 2026

RayenTian temporarily deployed to nemo-ci February 3, 2026 08:46 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 3, 2026 11:14 — with GitHub Actions Inactive

terrykong merged commit c9db946 into main Feb 3, 2026
41 of 42 checks passed

terrykong deleted the ruit/dpo_lora branch February 3, 2026 13:52

terrykong merged 3 commits into main from ruit/dpo_lora

3 participants
