@RayenTian RayenTian (Contributor) commented Jan 9, 2026

What does this PR do?

Adds async config support (async vLLM) to the DTensor LoRA GRPO workflow.

TODOs

  • unit test
  • functional test
  • convergence test

Issues

[3/3] of #1597
closes #1597

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Result

Async

Qwen/Qwen3-0.6B

image

Llama-3.2-3B-Instruct

image

Llama-3.1-8B

image image

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@RayenTian RayenTian changed the title from "feat: Support lora in dtensor grpo workflow[2/3]: async vllm" to "feat: Support lora in dtensor grpo workflow[3/3]: async vllm" Jan 9, 2026
@RayenTian RayenTian force-pushed the ruit/lora_grpo_sync_non_colocated branch from 3f5b2b5 to ee34dcb Compare January 9, 2026 09:08
@RayenTian RayenTian force-pushed the ruit/lora_grpo_async branch 2 times, most recently from 9bc5186 to f92d968 Compare January 9, 2026 09:23
@RayenTian RayenTian force-pushed the ruit/lora_grpo_sync_non_colocated branch 3 times, most recently from e37c9d9 to ee92a4d Compare January 13, 2026 09:23
@RayenTian RayenTian force-pushed the ruit/lora_grpo_async branch from 62eaaaf to ab6f639 Compare January 13, 2026 09:41
@github-actions

⚠️ File Consistency Check

Check based on commit: ab6f639 (PR #1752 from ruit/lora_grpo_async)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
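The rule described in this warning (one file of a related pair was modified, the other was not) can be sketched as follows. This is only an illustration, not the actual CI check: the function name `find_unsynced` and the `SYNCED_PAIRS` table are hypothetical, and the sketch assumes the list of changed file paths is already available from the PR diff.

```python
from typing import Iterable

# Hypothetical table of file pairs that are expected to change together.
# The single pair below is taken from the warning text above.
SYNCED_PAIRS = [
    (
        "nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py",
        "nemo_rl/models/policy/workers/dtensor_policy_worker.py",
    ),
]


def find_unsynced(changed_files: Iterable[str]) -> list[tuple[str, str]]:
    """Return (modified, not_modified) for each pair where only one file changed."""
    changed = set(changed_files)
    warnings = []
    for first, second in SYNCED_PAIRS:
        if first in changed and second not in changed:
            warnings.append((first, second))
        elif second in changed and first not in changed:
            warnings.append((second, first))
    return warnings


if __name__ == "__main__":
    # Example: only the v2 worker was touched, as in this PR.
    for modified, untouched in find_unsynced(
        ["nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py"]
    ):
        print(f"Modified: {modified}")
        print(f"Not modified: {untouched}")
```

A check of this shape only detects that the pair diverged in the diff; whether the divergence is intentional still has to be explained in a PR comment, as the warning requests.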

@RayenTian RayenTian force-pushed the ruit/lora_grpo_async branch from ab6f639 to b600e5f Compare January 13, 2026 09:47
@github-actions

⚠️ File Consistency Check (same DTensor Policy Worker Synchronization Warning as above)

Check based on commit: b600e5f (PR #1752 from ruit/lora_grpo_async)

@RayenTian RayenTian force-pushed the ruit/lora_grpo_sync_non_colocated branch 2 times, most recently from 9a1b189 to 2cb73ba Compare January 13, 2026 11:49
@RayenTian RayenTian force-pushed the ruit/lora_grpo_async branch from b600e5f to 3aba604 Compare January 13, 2026 11:56
@github-actions

⚠️ File Consistency Check (same DTensor Policy Worker Synchronization Warning as above)

Check based on commit: 3aba604 (PR #1752 from ruit/lora_grpo_async)

@RayenTian RayenTian force-pushed the ruit/lora_grpo_async branch from 3aba604 to 8e1312c Compare January 13, 2026 11:59
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Jan 13, 2026
@RayenTian RayenTian force-pushed the ruit/lora_grpo_sync_non_colocated branch 2 times, most recently from 517ab01 to 0bf11eb Compare January 14, 2026 02:40
@RayenTian RayenTian force-pushed the ruit/lora_grpo_async branch from 8e1312c to 32d76b9 Compare January 14, 2026 02:41
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 14, 2026
@RayenTian RayenTian force-pushed the ruit/lora_grpo_sync_non_colocated branch from 0bf11eb to 2436d92 Compare January 14, 2026 09:54
@RayenTian RayenTian force-pushed the ruit/lora_grpo_async branch from 32d76b9 to 3fdb505 Compare January 14, 2026 09:56
@RayenTian RayenTian force-pushed the ruit/lora_grpo_sync_non_colocated branch from 2436d92 to cfb4f10 Compare January 15, 2026 08:37
@RayenTian RayenTian force-pushed the ruit/lora_grpo_sync_non_colocated branch from cfb4f10 to b880394 Compare January 15, 2026 09:25
Signed-off-by: ruit <ruit@nvidia.com>
…mode' across multiple interfaces

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian marked this pull request as ready for review January 15, 2026 09:58
@RayenTian RayenTian requested review from a team as code owners January 15, 2026 09:58
Labels

CI:L1 Run doctests, unit tests, and functional tests

2 participants