Allow RL to run inference-only via skip-train #3744

Merged
tdene merged 5 commits into NVIDIA:main from tdene:tde/rl_add_inference_script on Mar 12, 2026

Conversation

@tdene tdene (Contributor) commented Mar 8, 2026

What does this PR do?

⚠️ For major changes (either in lines of code or in impact), please make sure to first share a design doc with the team. If you're unsure of the best way to do so, contact the @mcore-oncall.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message @mcore-oncall or mention them in a comment to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch: the proposed review process is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.


copy-pr-bot bot commented Mar 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@tdene tdene force-pushed the tde/rl_add_inference_script branch 4 times, most recently from 45346da to f7c1f4d, on March 9, 2026 at 17:07
@tdene tdene changed the title from "Add inference-only RL script" to "Allow RL to run inference-only via skip-train" on Mar 9, 2026
@tdene tdene force-pushed the tde/rl_add_inference_script branch 7 times, most recently from 1f6b239 to d5d9894, on March 10, 2026 at 14:38
@tdene tdene marked this pull request as ready for review March 10, 2026 15:33
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 10, 2026 15:34
@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Mar 10, 2026
@tdene tdene requested review from deepakn94 and jaredcasper March 10, 2026 15:42
@tdene tdene added the Expert Review [deprecated] label Mar 10, 2026
@tdene tdene force-pushed the tde/rl_add_inference_script branch from d5d9894 to 29b7ad3 on March 11, 2026 at 00:51
```python
                       help="Default top-k for model inference.")
    group.add_argument('--rl-offload-optimizer-during-inference', action='store_true',
                       help='Offload optimizer state to CPU during inference/rollout to save GPU memory')
    group.add_argument('--rl-skip-optimizer', action=argparse.BooleanOptionalAction, default=False,
```
Contributor
When would I want to create an optimizer when in inference only mode?

@tdene tdene (Contributor, Author) commented Mar 11, 2026

  • Gauge memory pressure whilst finetuning inference params
  • Test optimizer offload / onload
  • Simulate the time pressure of all the onload / offloads together (model swap, optimizer, inference)

None of these are strong reasons, but it's easy enough to offer the option. And it was requested by a reviewer.
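As an aside, `argparse.BooleanOptionalAction` (Python 3.9+) is what makes the option cheap to offer: it auto-registers a `--no-` negation of the flag. A minimal standalone sketch (hypothetical parser for illustration, not the actual Megatron-LM argument group):

```python
import argparse

# Hypothetical parser for illustration; only the flag itself mirrors the PR's diff.
parser = argparse.ArgumentParser()
parser.add_argument('--rl-skip-optimizer', action=argparse.BooleanOptionalAction,
                    default=False,
                    help='Skip optimizer construction when running inference-only.')

# BooleanOptionalAction registers both forms of the flag automatically.
print(parser.parse_args(['--rl-skip-optimizer']).rl_skip_optimizer)     # True
print(parser.parse_args(['--no-rl-skip-optimizer']).rl_skip_optimizer)  # False
print(parser.parse_args([]).rl_skip_optimizer)                          # False
```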

@tdene tdene (Contributor, Author) commented Mar 12, 2026

/claude review

```python
            forward_step_func, train_data_iterator, model, optimizer, opt_param_scheduler, config, forward_backward_func, iteration=iteration
        )
        ft_integration.on_training_step_end()
        if args.skip_train:
```
Contributor

Bug: This condition checks args.skip_train without also checking args.perform_rl_step. If --skip-train is used without --perform-rl-step, this code path is not reached (the outer if at the pretrain level gates on not args.skip_train or args.perform_rl_step). However, inside the train() function itself there is no such gate — train() is called when args.do_train and (args.train_iters or 0) > 0, so if a future caller or configuration ends up invoking train() with skip_train=True but without perform_rl_step, this would silently skip all training steps every iteration.

Consider tightening this to if args.skip_train and args.perform_rl_step: to make the intent explicit and guard against misuse.

Contributor Author

I thank you for your detailed feedback, Claude, but please consider that this is the intended outcome of this code:

if a future caller or configuration ends up invoking train() with skip_train=True but without perform_rl_step, this would silently skip all training steps every iteration.
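To make the intended behavior concrete, here is a hedged, self-contained sketch of the two gates discussed above (the attribute and step names follow the thread; the real Megatron-LM loop is elided):

```python
from types import SimpleNamespace

def should_enter_loop(args):
    # Outer gate at the pretrain level, as quoted in the review comment:
    # run the loop unless training is skipped AND there is no RL step.
    return not args.skip_train or args.perform_rl_step

def iteration_steps(args):
    # Inner gate inside train(): with --skip-train the gradient update is
    # skipped every iteration, while RL rollouts (inference) still run.
    steps = []
    if args.perform_rl_step:
        steps.append("rl_rollout")
    if not args.skip_train:
        steps.append("train_step")
    return steps

inference_only = SimpleNamespace(skip_train=True, perform_rl_step=True)
print(should_enter_loop(inference_only))  # True
print(iteration_steps(inference_only))    # ['rl_rollout']
```

With `skip_train=True` and `perform_rl_step=False`, `should_enter_loop` is already `False`, so under the current wiring the "silently skip everything" path is unreachable — which is why the author treats it as intended rather than a bug.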

Comment on lines +3518 to +3525
```diff
 if args.perform_rl_step:
     # we don't need to build any dataloaders for RL training
     train_dataloader = None
     valid_dataloaders = None
     test_dataloader = None
-    do_train = args.train_iters > 0
+    do_train = (args.train_iters or 0) > 0
     do_valid = (args.full_validation or args.eval_iters > 0)
-    do_test = (args.full_validation or args.eval_iters > 0)
+    do_test = False
```
Contributor

Bug (pre-existing, surfaced by this PR): The old code had if getattr(args, 'perform_rl_step', True) which was always true — so this branch was taken for all runs, not just RL. This PR correctly fixes it to args.perform_rl_step, but since do_test changed from (args.full_validation or args.eval_iters > 0) to False, this is now a behavioral change for RL runs that previously relied on the test set evaluation.

Was do_test = False intentional here? If so, it might be worth a brief comment explaining why RL skips test evaluation (e.g. "RL does not use a test set").

Contributor Author

You're right Claude; there's no point in making this diff in this PR. Addressed.
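One note on the retained `do_train` change: `(args.train_iters or 0) > 0` guards against `train_iters` being `None`, since a bare `None > 0` comparison raises `TypeError` in Python 3. A tiny illustration, with a hypothetical helper name:

```python
def do_train_flag(train_iters):
    # `or 0` coalesces None (and 0) to 0 before the comparison,
    # so an unset train_iters simply disables training instead of crashing.
    return (train_iters or 0) > 0

print(do_train_flag(None), do_train_flag(0), do_train_flag(10))  # False False True
```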

@tdene tdene force-pushed the tde/rl_add_inference_script branch from ad526b1 to 191c11c on March 12, 2026 at 16:11
@tdene tdene added this pull request to the merge queue Mar 12, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23015040178

Merged via the queue into NVIDIA:main with commit 29e798a Mar 12, 2026
53 of 54 checks passed
@tdene tdene deleted the tde/rl_add_inference_script branch March 12, 2026 18:14

Labels

Expert Review [deprecated] Apply this label to indicate that your PR is ready for expert review.

3 participants