Allow RL to run inference-only via skip-train #3744
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
megatron/training/arguments.py (outdated)
                   help="Default top-k for model inference.")
group.add_argument('--rl-offload-optimizer-during-inference', action='store_true',
                   help='Offload optimizer state to CPU during inference/rollout to save GPU memory')
group.add_argument('--rl-skip-optimizer', action=argparse.BooleanOptionalAction, default=False,
When would I want to create an optimizer when in inference only mode?
- Gauge memory pressure whilst finetuning inference params
- Test optimizer offload / onload
- Simulate the time pressure of all the onload / offloads together (model swap, optimizer, inference)
None of these are strong reasons, but it's easy enough to offer the option. And it was requested by a reviewer.
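For context on the flag style in the hunk above: `argparse.BooleanOptionalAction` (Python 3.9+) generates a paired `--flag` / `--no-flag` option, so the reviewer-requested opt-out comes for free. A minimal, self-contained sketch (the parser here is hypothetical, not Megatron's real argument builder):

```python
import argparse

# Standalone parser illustrating argparse.BooleanOptionalAction, which
# auto-generates both --rl-skip-optimizer and --no-rl-skip-optimizer.
parser = argparse.ArgumentParser()
parser.add_argument('--rl-skip-optimizer', action=argparse.BooleanOptionalAction,
                    default=False,
                    help='Skip optimizer construction for inference-only RL runs')

assert parser.parse_args([]).rl_skip_optimizer is False
assert parser.parse_args(['--rl-skip-optimizer']).rl_skip_optimizer is True
assert parser.parse_args(['--no-rl-skip-optimizer']).rl_skip_optimizer is False
```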
/claude review
    forward_step_func, train_data_iterator, model, optimizer, opt_param_scheduler, config, forward_backward_func, iteration=iteration
)
ft_integration.on_training_step_end()
if args.skip_train:
Bug: This condition checks args.skip_train without also checking args.perform_rl_step. If --skip-train is used without --perform-rl-step, this code path is not reached (the outer if at the pretrain level gates on not args.skip_train or args.perform_rl_step). However, inside the train() function itself there is no such gate — train() is called when args.do_train and (args.train_iters or 0) > 0, so if a future caller or configuration ends up invoking train() with skip_train=True but without perform_rl_step, this would silently skip all training steps every iteration.
Consider tightening this to if args.skip_train and args.perform_rl_step: to make the intent explicit and guard against misuse.
I thank you for your detailed feedback, Claude, but please consider that this is the intended outcome of this code:
if a future caller or configuration ends up invoking train() with skip_train=True but without perform_rl_step, this would silently skip all training steps every iteration.
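The two-level gating being debated can be sketched as follows. The helper names are hypothetical (the real logic lives inline in megatron/training/training.py); the sketch only mirrors the conditions quoted in the review:

```python
from types import SimpleNamespace

def pretrain_enters_train(args):
    # At the pretrain level, train() is gated on
    # `not args.skip_train or args.perform_rl_step`.
    return not args.skip_train or args.perform_rl_step

def train_runs_optimizer_step(args):
    # Inside train(), a bare `if args.skip_train:` skips the training step.
    # Per the author's reply, this intentionally also covers a hypothetical
    # caller that sets skip_train=True without perform_rl_step.
    return not args.skip_train

# Inference-only RL run: train() is entered, but no optimizer steps run.
args = SimpleNamespace(skip_train=True, perform_rl_step=True)
print(pretrain_enters_train(args), train_runs_optimizer_step(args))  # True False
```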
megatron/training/training.py (outdated)
  if args.perform_rl_step:
      # we don't need to build any dataloaders for RL training
      train_dataloader = None
      valid_dataloaders = None
      test_dataloader = None
-     do_train = args.train_iters > 0
+     do_train = (args.train_iters or 0) > 0
      do_valid = (args.full_validation or args.eval_iters > 0)
-     do_test = (args.full_validation or args.eval_iters > 0)
+     do_test = False
Bug (pre-existing, surfaced by this PR): The old code had if getattr(args, 'perform_rl_step', True) which was always true — so this branch was taken for all runs, not just RL. This PR correctly fixes it to args.perform_rl_step, but since do_test changed from (args.full_validation or args.eval_iters > 0) to False, this is now a behavioral change for RL runs that previously relied on the test set evaluation.
Was do_test = False intentional here? If so, it might be worth a brief comment explaining why RL skips test evaluation (e.g. "RL does not use a test set").
You're right Claude; there's no point in making this diff in this PR. Addressed.
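One detail of the hunk worth noting: `do_train = (args.train_iters or 0) > 0` is not cosmetic. In an inference-only RL run `train_iters` can be `None`, and `None > 0` raises a `TypeError` in Python 3, so the `or 0` fallback is needed before the comparison. A minimal sketch (the `do_train` helper is illustrative, not Megatron code):

```python
def do_train(train_iters):
    # `train_iters or 0` maps None (and 0) to 0, making the
    # comparison safe when no training iterations are configured.
    return (train_iters or 0) > 0

print(do_train(None), do_train(0), do_train(5))  # False False True
```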
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23015040178
What does this PR do?
Contribution process
Pre-checks
Code review
Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS. Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned. For PRs outside megatron/core, this step is skipped.
Step 3: Approved
Once all required reviewers have approved, the Approved label is applied automatically.
Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.