Allow RL to run inference-only via skip-train #3744

Merged
tdene merged 5 commits into NVIDIA:main from tdene:tde/rl_add_inference_script on Mar 12, 2026

Conversation

@tdene tdene (Contributor) commented Mar 8, 2026

What does this PR do?

⚠️ For major changes (either in lines of code or in impact), please make sure to first share a design doc with the team. If you're unsure of the best way to do so, contact the @mcore-oncall.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message @mcore-oncall or mention them in a comment to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch: the proposed review process is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.


copy-pr-bot bot commented Mar 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@tdene tdene force-pushed the tde/rl_add_inference_script branch 4 times, most recently from 45346da to f7c1f4d, on March 9, 2026 at 17:07
@tdene tdene changed the title from "Add inference-only RL script" to "Allow RL to run inference-only via skip-train" on Mar 9, 2026
@tdene tdene force-pushed the tde/rl_add_inference_script branch 7 times, most recently from 1f6b239 to d5d9894, on March 10, 2026 at 14:38
@tdene tdene marked this pull request as ready for review March 10, 2026 15:33
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 10, 2026 15:34
@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Mar 10, 2026
@tdene tdene requested review from deepakn94 and jaredcasper March 10, 2026 15:42
@tdene tdene added the Expert Review [deprecated] label Mar 10, 2026
@tdene tdene force-pushed the tde/rl_add_inference_script branch from d5d9894 to 29b7ad3 on March 11, 2026 at 00:51
```python
                       help="Default top-k for model inference.")
    group.add_argument('--rl-offload-optimizer-during-inference', action='store_true',
                       help='Offload optimizer state to CPU during inference/rollout to save GPU memory')
    group.add_argument('--rl-skip-optimizer', action=argparse.BooleanOptionalAction, default=False,
```
Contributor
When would I want to create an optimizer when in inference only mode?

@tdene tdene (Contributor, Author) commented Mar 11, 2026

  • Gauge memory pressure whilst finetuning inference params
  • Test optimizer offload / onload
  • Simulate the time pressure of all the onload / offloads together (model swap, optimizer, inference)

None of these are strong reasons, but it's easy enough to offer the option. And it was requested by a reviewer.
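As an aside, `argparse.BooleanOptionalAction` (Python 3.9+) is what makes the option cheap to offer: it auto-registers a `--no-` negation of the flag. A minimal standalone sketch (hypothetical parser for illustration, not the actual Megatron-LM argument group):

```python
import argparse

# Hypothetical parser for illustration; only the flag itself mirrors the PR's diff.
parser = argparse.ArgumentParser()
parser.add_argument('--rl-skip-optimizer', action=argparse.BooleanOptionalAction,
                    default=False,
                    help='Skip optimizer construction when running inference-only.')

# BooleanOptionalAction registers both forms of the flag automatically.
print(parser.parse_args(['--rl-skip-optimizer']).rl_skip_optimizer)     # True
print(parser.parse_args(['--no-rl-skip-optimizer']).rl_skip_optimizer)  # False
print(parser.parse_args([]).rl_skip_optimizer)                          # False
```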

@tdene tdene (Contributor, Author) commented Mar 12, 2026

/claude review

```python
            forward_step_func, train_data_iterator, model, optimizer, opt_param_scheduler, config, forward_backward_func, iteration=iteration
        )
        ft_integration.on_training_step_end()
        if args.skip_train:
```
Contributor

Bug: This condition checks args.skip_train without also checking args.perform_rl_step. If --skip-train is used without --perform-rl-step, this code path is not reached (the outer if at the pretrain level gates on not args.skip_train or args.perform_rl_step). However, inside the train() function itself there is no such gate — train() is called when args.do_train and (args.train_iters or 0) > 0, so if a future caller or configuration ends up invoking train() with skip_train=True but without perform_rl_step, this would silently skip all training steps every iteration.

Consider tightening this to if args.skip_train and args.perform_rl_step: to make the intent explicit and guard against misuse.

Contributor Author

I thank you for your detailed feedback, Claude, but please consider that this is the intended outcome of this code:

if a future caller or configuration ends up invoking train() with skip_train=True but without perform_rl_step, this would silently skip all training steps every iteration.
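To make the intended behavior concrete, here is a hedged, self-contained sketch of the two gates discussed above (the attribute and step names follow the thread; the real Megatron-LM loop is elided):

```python
from types import SimpleNamespace

def should_enter_loop(args):
    # Outer gate at the pretrain level, as quoted in the review comment:
    # run the loop unless training is skipped AND there is no RL step.
    return not args.skip_train or args.perform_rl_step

def iteration_steps(args):
    # Inner gate inside train(): with --skip-train the gradient update is
    # skipped every iteration, while RL rollouts (inference) still run.
    steps = []
    if args.perform_rl_step:
        steps.append("rl_rollout")
    if not args.skip_train:
        steps.append("train_step")
    return steps

inference_only = SimpleNamespace(skip_train=True, perform_rl_step=True)
print(should_enter_loop(inference_only))  # True
print(iteration_steps(inference_only))    # ['rl_rollout']
```

With `skip_train=True` and `perform_rl_step=False`, `should_enter_loop` is already `False`, so under the current wiring the "silently skip everything" path is unreachable — which is why the author treats it as intended rather than a bug.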

Comment on lines +3518 to +3525
```diff
 if args.perform_rl_step:
     # we don't need to build any dataloaders for RL training
     train_dataloader = None
     valid_dataloaders = None
     test_dataloader = None
-    do_train = args.train_iters > 0
+    do_train = (args.train_iters or 0) > 0
     do_valid = (args.full_validation or args.eval_iters > 0)
-    do_test = (args.full_validation or args.eval_iters > 0)
+    do_test = False
```
Contributor

Bug (pre-existing, surfaced by this PR): The old code had if getattr(args, 'perform_rl_step', True) which was always true — so this branch was taken for all runs, not just RL. This PR correctly fixes it to args.perform_rl_step, but since do_test changed from (args.full_validation or args.eval_iters > 0) to False, this is now a behavioral change for RL runs that previously relied on the test set evaluation.

Was do_test = False intentional here? If so, it might be worth a brief comment explaining why RL skips test evaluation (e.g. "RL does not use a test set").

Contributor Author

You're right Claude; there's no point in making this diff in this PR. Addressed.
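One note on the retained `do_train` change: `(args.train_iters or 0) > 0` guards against `train_iters` being `None`, since a bare `None > 0` comparison raises `TypeError` in Python 3. A tiny illustration, with a hypothetical helper name:

```python
def do_train_flag(train_iters):
    # `or 0` coalesces None (and 0) to 0 before the comparison,
    # so an unset train_iters simply disables training instead of crashing.
    return (train_iters or 0) > 0

print(do_train_flag(None), do_train_flag(0), do_train_flag(10))  # False False True
```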

@tdene tdene force-pushed the tde/rl_add_inference_script branch from ad526b1 to 191c11c on March 12, 2026 at 16:11
@tdene tdene added this pull request to the merge queue Mar 12, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23015040178

Merged via the queue into NVIDIA:main with commit 29e798a Mar 12, 2026
53 of 54 checks passed
@tdene tdene deleted the tde/rl_add_inference_script branch March 12, 2026 18:14

Labels

Expert Review [deprecated] Apply this label to indicate that your PR is ready for expert review.

3 participants