feat: single controller (w/o sync_weight) by yuki-97 · Pull Request #2819 · NVIDIA-NeMo/RL

yuki-97 · 2026-06-15T06:53:06Z

Summary

Adds the SC (Single Controller) entrypoint for async GRPO on the TransferQueue data plane.

SingleControllerActor (nemo_rl/algorithms/single_controller.py): CPU-only Ray actor with three asyncio pumps.
- _rollout_pump: iterates the dataloader → RolloutManager.generate_and_push → TQReplayBuffer.add writes N training rows per group. Gated by _buffer_capacity / max_inflight_prompts / _rollout_permitted.
- _train_pump: sampler.evict → sampler.select → _advantage_pump → TQPolicy split API (begin/train_microbatch/finish_train_step) → dp_client.clear_samples.
- _sync_weights: drain gate + (stubbed) WeightSynchronizer.sync_weights.
single_controller_utils/:
- SC MasterConfig + AsyncRLConfig;
- One driver-side setup() builds the full SingleControllerBundle via inlined _build_clusters / _build_generation / _build_trainer.
async_utils/: TQReplayBuffer (group-granular, owns producer-side tensorization) + StalenessSampler (filter-only).
Launcher + exemplar: examples/run_grpo_single_controller.py, examples/configs/grpo_math_1B_single_controller.yaml.
Megatron worker fix: begin_train_step also nulls config.no_sync_func so the outer model.no_sync() covers the last MB — otherwise register_grad_ready leaks counts past the explicit sync and asserts on step 2.

Known limits (TODOs in code)

max_train_steps not bounded by max_num_epochs * len(dataloader).
compute_prev_logprobs / compute_reference_logprobs gating is provisional.
sampler.select only enforces min_prompt_groups_per_batch; no max_* yet.
_sync_weights stubbed.
Validation dataset wiring not in.
Log stuffs.
Integrate nemogym setup.

Current state

Runnable end-to-end via the launcher; functional test (grpo_dp_single_controller.sh, Qwen3-0.6B / 2 GPU / 2 steps).
Accuracy not validated — sync_weights not wired, generator never refreshes.
Perf not measured — no profiling vs grpo_train_sync yet.

copy-pr-bot · 2026-06-15T06:53:10Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Two bugs around the split-API state machine fixed in one pass: 1. **no_sync_func leak (latent assertion on step 2 with PP=1).** `forward_backward_no_pipelining` (the PP=1 path, which is the common one) wraps inner microbatches in `model.config.no_sync_func` but runs the *last* microbatch OUTSIDE of it. Our outer `with self.model.no_sync():` in `train_microbatch` was therefore bypassed for the trailing MB. `register_grad_ready` leaked per-param counts past the explicit DP sync at `finish_train_step`, and the next `begin_train_step` then asserted on the stale counts (typically step 2). Fix: in `begin_train_step` also save and null `no_sync_func` (set to `contextlib.nullcontext`). Restore both hooks at finish/abort via the new `_restore_saved_grad_sync_func` helper. Spotted in yuki-97's PR NVIDIA-NeMo#2819 (`begin_train_step` also nulls `config.no_sync_func`); same fix landed here. 2. **No exception safety around the open step (terrykong, NVIDIA-NeMo#2683:640).** If `train_microbatch` or `finish_train_step` raised mid-body, both `grad_sync_func` and (now also) `no_sync_func` would stay nulled and future steps would run with the PP scheduler bypass disabled silently. Fix: extract `_train_microbatch_body` and `_finish_train_step_body`, wrap each entry method in try/except that calls `_restore_saved_grad_sync_func` before re-raising. Caller is still expected to invoke `abort_train_step` (idempotent on the saved values) to drop `_train_step_state`. 3. **Cleanup (terrykong, NVIDIA-NeMo#2683:582).** Drop dead `no_sync_active` field from `_split_step_state_init` — never read or written; `no_sync` is applied via the `with self.model.no_sync():` context manager. Also: add module-level `log = logging.getLogger(__name__)` so the try/except handlers can `log.exception(...)` if the restore itself fails. Signed-off-by: Akash Mehra <akamehra@nvidia.com>

Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Akash Mehra <akamehra@nvidia.com>

Signed-off-by: Yuki Huang <yukih@nvidia.com>

Signed-off-by: Yuki Huang <yukih@nvidia.com> update functional test Signed-off-by: Yuki Huang <yukih@nvidia.com>

Signed-off-by: Yuki Huang <yukih@nvidia.com>

… reserve/commit slots Signed-off-by: Yuki Huang <yukih@nvidia.com> update unit test Signed-off-by: Yuki Huang <yukih@nvidia.com>

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/sc-entrypoint branch 3 times, most recently from c1fc94f to 3a55904 Compare June 19, 2026 07:18

yuki-97 and others added 7 commits June 19, 2026 00:54

squash #2700

1e9ce78

Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Akash Mehra <akamehra@nvidia.com>

squash rollout pump (e48aa9f -> 8b5d01f)

8e03827

Signed-off-by: Yuki Huang <yukih@nvidia.com>

squash staleness sampler + tq replay buffer (eadf626 -> ac5571a)

18abd45

Signed-off-by: Yuki Huang <yukih@nvidia.com>

[rollout pump] fix batch

2904f05

Signed-off-by: Yuki Huang <yukih@nvidia.com>

squash entrypoint + fix (f335c92 -> 57f8eb2)

7fdf762

Signed-off-by: Yuki Huang <yukih@nvidia.com> update functional test Signed-off-by: Yuki Huang <yukih@nvidia.com>

refactor(single-controller): rename setup entrypoint; drop dryrun test

a9894ef

Signed-off-by: Yuki Huang <yukih@nvidia.com>

feat(single-controller): add over_sampling=false batch-quota mode via…

22b9e48

… reserve/commit slots Signed-off-by: Yuki Huang <yukih@nvidia.com> update unit test Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/sc-entrypoint branch from 3a55904 to 22b9e48 Compare June 19, 2026 07:58

yuki-97 added 7 commits June 20, 2026 23:48

fix config

e27f8b6

Signed-off-by: Yuki Huang <yukih@nvidia.com>

limit max_train_steps by max_num_epochs

61f193c

Signed-off-by: Yuki Huang <yukih@nvidia.com>

elegant shutdown

e7d75ed

Signed-off-by: Yuki Huang <yukih@nvidia.com>

lint

79d4a61

Signed-off-by: Yuki Huang <yukih@nvidia.com>

support setup weight_synchronizer and uncomment

ebf346c

Signed-off-by: Yuki Huang <yukih@nvidia.com>

feat(single-controller): log per-step train metrics

617e585

Signed-off-by: Yuki Huang <yukih@nvidia.com>

add prepare_for_lp/training

3a44de6

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/sc-entrypoint branch from d2b5182 to 98daa14 Compare June 21, 2026 13:30

yuki-97 added 2 commits June 21, 2026 06:46

add nightly

33018f0

Signed-off-by: Yuki Huang <yukih@nvidia.com>

copyright

7976488

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/sc-entrypoint branch from 98daa14 to 7976488 Compare June 21, 2026 13:47

add timing

437579b

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/sc-entrypoint branch 3 times, most recently from a049782 to 36c7e50 Compare June 21, 2026 15:00

add nightly sync

44fc9a4

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/sc-entrypoint branch from 36c7e50 to 44fc9a4 Compare June 21, 2026 15:25

yuki-97 added 4 commits June 22, 2026 02:10

fix(sc): unblock train_pump asyncio, add set_seed, fix LR log ordering

e480418

Signed-off-by: Yuki Huang <yukih@nvidia.com>

feat(sc): add force_in_order target-step matching to async_rl sampler

94ae945

Signed-off-by: Yuki Huang <yukih@nvidia.com>

feat(sc): assert num_prompts * num_gen == train_global_batch_size

fbb1ccd

Signed-off-by: Yuki Huang <yukih@nvidia.com>

[tmp] upload yaml/script for debug

dfc4977

Signed-off-by: Yuki Huang <yukih@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: single controller (w/o sync_weight)#2819

feat: single controller (w/o sync_weight)#2819
yuki-97 wants to merge 22 commits into
mainfrom
yukih/sc-entrypoint

yuki-97 commented Jun 15, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yuki-97 commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Known limits (TODOs in code)

Current state

Uh oh!

copy-pr-bot Bot commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yuki-97 commented Jun 15, 2026 •

edited

Loading