megatron/rl/agent/api.py
Outdated
```python
group_index = yielded_groups
yielded_groups += 1
for rollout in group:
    rollout.submission_index = group_index
```

nit: remove `group_index` and do `rollout.submission_index = yielded_groups - 1`?
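The nit above can be sketched like this (illustrative stand-ins, not the repo's actual types):

```python
# Hypothetical sketch of the suggested simplification: bump the counter
# first, then tag each rollout with `yielded_groups - 1`, dropping the
# temporary `group_index` entirely. The Rollout class here is a toy stand-in.
class Rollout:
    def __init__(self):
        self.submission_index = None

groups = [[Rollout(), Rollout()], [Rollout()]]
yielded_groups = 0
for group in groups:
    yielded_groups += 1
    for rollout in group:
        rollout.submission_index = yielded_groups - 1

print([r.submission_index for g in groups for r in g])  # → [0, 0, 1]
```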
megatron/rl/rl_utils.py
Outdated
```python
def get_rollout_generator(args, inference_interface, n_prompts, samples_per_group):
    global _ROLLOUT_GENERATOR
    if not args.rl_partial_rollouts or _ROLLOUT_GENERATOR is None:
        oversubscribed = args.rl_partial_rollouts or args.rl_forced_lag > 0
```
Could you explain why you are calling this 'oversubscribed'?
I could not think of a better word at the time. It's regarding the concept of submitting more requests than you actually need.
Yeah, that's why I asked! partial rollouts or forced lag are not the only cases when we sample more than we consume. Sending a request when batch_size < group_size * prompts will also do this.
It's now called streaming.
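The concept being renamed here, submitting more requests than are immediately consumed, can be sketched with a plain generator (a toy illustration; the real rollout generator is asynchronous and far more involved):

```python
from collections import deque

# Illustrative only: keep `inflight` requests outstanding while the consumer
# pulls results one at a time, so generation runs ahead of consumption
# ("streaming" in the renamed flag's sense).
def streaming_results(prompts, inflight=4):
    pending = deque()
    it = iter(prompts)
    # Prime the pipeline with `inflight` submitted requests.
    for _ in range(inflight):
        try:
            pending.append(next(it))
        except StopIteration:
            break
    while pending:
        yield pending.popleft()       # consume the oldest outstanding request
        try:
            pending.append(next(it))  # immediately submit a replacement
        except StopIteration:
            pass

print(list(streaming_results(range(6), inflight=2)))  # → [0, 1, 2, 3, 4, 5]
```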
megatron/training/arguments.py
Outdated
```python
assert args.micro_batch_size == 1, \
    "micro_batch_size must be 1 when using sequence packing. To increase compute per micro batch increase the sequence length."
assert rl.forced_lag > 0 or not args.rl_partial_rollouts, (
    "--rl-forced-lag and --rl-partial-rollouts are incompatible."
```
megatron/rl/rl_utils.py
Outdated
```diff
- buffered_rollouts is None or
- iteration == runtime_state.last_collection_iteration +
+ runtime_state.data_iterator is None or
+ iteration >= runtime_state.last_collection_iteration +
```

I think this is now entirely stale code.
megatron/rl/rl_utils.py
Outdated
```python
    RerunDataIterator for the current training step
    """
    runtime_state = get_rl_runtime_state()
    args = get_args()
```

Please pass the argument as a function arg; do not call `get_args()` here.
megatron/rl/rl_utils.py
Outdated
```python
):
    if forced_lag > 0:
        runtime_state.lag_buffer.append(
            get_environment_rollouts(
```

This can be above the `if forced_lag` branch.

I think this is now entirely stale code.
megatron/rl/rl_utils.py
Outdated
```python
            model, inference_model, optimizer, grpo_prompts_per_step, grpo_group_size,
        )
    )
rollouts = runtime_state.lag_buffer.popleft()
```

It will be useful/interesting to track the length of the lag buffer. Is it going to increase over time?

Also, related to the staleness tracking: without the plotting fix, the staleness will be reported even more incorrectly.
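One lightweight way to do the tracking suggested above (the `RuntimeState` class and field names here are hypothetical stand-ins, not the repo's actual runtime state object):

```python
from collections import deque

class RuntimeState:
    """Toy stand-in for the RL runtime state; only the lag buffer is modeled."""
    def __init__(self):
        self.lag_buffer = deque()
        self.lag_buffer_depths = []  # one sample per training step

    def record_lag_depth(self):
        # In real code this would be logged alongside other per-step metrics
        # (e.g. to TensorBoard or W&B) so buffer growth is visible over time.
        self.lag_buffer_depths.append(len(self.lag_buffer))

state = RuntimeState()
state.lag_buffer.append("rollouts_step_0")
state.record_lag_depth()
state.lag_buffer.append("rollouts_step_1")
state.lag_buffer.popleft()
state.record_lag_depth()
print(state.lag_buffer_depths)  # → [1, 1]
```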
megatron/rl/rl_utils.py
Outdated
```python
runtime_state.reset_iteration_counters(iteration)
return runtime_state.data_iterator
```
With those changes, we need to run tests for these scenarios.
- Normal, GRPO_ITERATIONS=1: sample X, consume X (batch size = prompts * group_size)
- Sample 2X, consume X (batch size = prompts * group_size / 2)
- Your newly added lag thing. Not sure if we need it in both scenarios above.
megatron/rl/rl_utils.py
Outdated
```python
if runtime_state.start_iteration is None:
    runtime_state.start_iteration = iteration
    if forced_lag > 0:
        runtime_state.lag_buffer = deque()
```

For me, this is the kind of thing `__init__()` is for. Why do we need all this logic here?

I think this is now entirely stale code.
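A sketch of what moving this logic into `__init__` might look like (class and attribute names are assumptions based on the snippet above, not the actual implementation):

```python
from collections import deque

class RLRuntimeState:
    """Sketch of the reviewer's suggestion: initialize state eagerly in
    __init__ instead of lazily inside the training loop. Field names are
    assumptions, not the repo's actual attributes."""
    def __init__(self, forced_lag: int = 0):
        self.start_iteration = None
        # The lag buffer exists from construction when forced lag is on,
        # so the training loop never has to create it on the fly.
        self.lag_buffer = deque() if forced_lag > 0 else None

    def on_iteration(self, iteration: int):
        # The only per-step bookkeeping left is recording the first iteration.
        if self.start_iteration is None:
            self.start_iteration = iteration

state = RLRuntimeState(forced_lag=2)
state.on_iteration(10)
print(state.start_iteration, len(state.lag_buffer))  # → 10 0
```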
Force-pushed from 241f1bf to 95815f4.
```python
inference_interface: InferenceInterface
validation: bool = False
filter_groups_with_same_reward: bool = False
streaming: bool = False
```

New parameter that matches what `num_groups = -1` meant in the old code. It's bad practice to have a parameter that means one thing when it's positive and an entirely different thing when it's set to -1.
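A schematic of the change being described, with the sentinel replaced by an explicit flag (this is not the actual `RolloutRequest` definition, just an illustration of the pattern):

```python
from dataclasses import dataclass

# Before: num_groups == -1 secretly meant "stream groups as they finish".
# After: an explicit boolean makes the two behaviors independent, so a
# positive num_groups and streaming can coexist without a magic value.
@dataclass
class RolloutRequest:
    num_groups: int
    streaming: bool = False  # replaces the num_groups == -1 sentinel

req = RolloutRequest(num_groups=8, streaming=True)
assert req.num_groups > 0  # no magic value needed to enable streaming
```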
Those will break the nemo_gym integration. Please adjust those scripts accordingly; ./nemo_gym_agent.py is the file you need.
Will do this later today.
Let's get back to this after we deal with the completed_at_step parameter.
megatron/rl/agent/api.py
Outdated
```python
validation: bool = False
filter_groups_with_same_reward: bool = False
streaming: bool = False
batch_results: bool = False
```

When False, this returns groups one at a time like in the old flow. When True, this waits until `num_groups` are ready and then returns them all at once.
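The two delivery modes can be illustrated with a toy generator (a hypothetical function, not the PR's actual code):

```python
# Schematic of the two delivery modes described above:
# batch_results=False yields each group as soon as it completes;
# batch_results=True buffers until `num_groups` are ready, then yields once.
def deliver(groups_done, num_groups, batch_results):
    buffer = []
    for group in groups_done:
        if not batch_results:
            yield [group]            # old flow: one group at a time
        else:
            buffer.append(group)
            if len(buffer) == num_groups:
                yield buffer         # new flow: the whole batch at once
                buffer = []

print(list(deliver(["g0", "g1"], num_groups=2, batch_results=False)))  # → [['g0'], ['g1']]
print(list(deliver(["g0", "g1"], num_groups=2, batch_results=True)))   # → [['g0', 'g1']]
```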
megatron/rl/agent/api.py
Outdated
```python
# The semaphore ensures that each batch only starts after the previous is consumed.
groups_per_worker = request.num_groups
num_workers = self.parallel_generation_tasks // groups_per_worker
submission_gate = asyncio.Semaphore(num_workers)
```

The whole point of this submission_gate is the following.

If we are doing forced lag, we can force the RL setup to only "generate" a certain number of groups per step. This is the flow without submission_gate. It is more performant, but does not guarantee consistent lag.

The flow with submission_gate forces the RL setup to only "consume" a certain number of groups per step. This is less performant, but guarantees consistent lag.

The difference between "generate N per step" and "consume N per step" shows up in the edge cases.
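A minimal sketch of the gate being described, assuming the consumer releases a permit after consuming each batch (names and structure are illustrative; the real code is organized differently):

```python
import asyncio

# Each worker must acquire a permit before submitting, and the consumer
# releases a permit only after consuming a batch, so at most `num_workers`
# batches are ever in flight: "consume N per step" rather than "generate N".
async def worker(gate, results, batch_id):
    await gate.acquire()  # blocks until a previously submitted batch is consumed
    await results.put(f"batch-{batch_id}")

async def main(num_workers=2, num_batches=4):
    gate = asyncio.Semaphore(num_workers)
    results = asyncio.Queue()
    tasks = [asyncio.create_task(worker(gate, results, i)) for i in range(num_batches)]
    consumed = []
    for _ in range(num_batches):
        consumed.append(await results.get())
        gate.release()  # consuming a batch frees a slot for the next submission
    await asyncio.gather(*tasks)
    return consumed

print(sorted(asyncio.run(main())))  # → ['batch-0', 'batch-1', 'batch-2', 'batch-3']
```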
```diff
      request.inference_interface, InferenceServer
  ), "Rollout requests to remote server must contain an InferenceServer object"
- assert request.num_groups != -1, "FastAPIEnvServer does not support group rollout streaming"
+ assert not request.streaming, "FastAPIEnvServer does not support group rollout streaming"
```

This is what I meant above. `num_groups != -1` was secretly hidden behavior. Now it's out in the open.
megatron/rl/rl_utils.py
Outdated
```python
streaming = args.rl_partial_rollouts or args.rl_forced_lag > 0
if not streaming or _ROLLOUT_GENERATOR is None:
    if args.rl_forced_lag > 0:
        pgt = (args.rl_forced_lag + 1) * n_prompts
```

We must control the inference batch size in forced-lag mode.
/claude review
megatron/training/arguments.py
Outdated
```python
group.add_argument('--rl-forced-lag', type=int, default=0,
                   help='Forced rollout lag of L steps. After an initial warm-up of L steps, '
                        'All steps N+L use only rollouts that were started on step N. '
                        '0 (default) disabled this behavior.'
```

Typo: "disabled" → "disables"

```suggestion
                   help='Forced rollout lag of L steps. After an initial warm-up of L steps, '
                        'All steps N+L use only rollouts that were started on step N. '
                        '0 (default) disables this behavior.'
```
Resolved. Thank you Claude!
```python
if submission_gate is not None:
    await submission_gate.acquire()
batch_id = submitted_groups // groups_per_worker
submitted_groups += groups_per_worker
```

The assert only guards the groups_per_worker > 1 path. When groups_per_worker == 1 (i.e. n_prompts == 1) with batch_results=True and filter_groups_with_same_reward=True, a filtered group will leave a semaphore permit unreleased. The consumer is then waiting on grouped_rollouts.get() while every worker is stuck waiting on submission_gate.acquire(): deadlock.

Since filtering + forced lag is unsupported, it should be caught in validate_args rather than at runtime inside the generator (where the assert kills the training run with no cleanup). Consider adding to validate_args:

```python
assert not (args.rl_forced_lag > 0 and args.rl_filter_groups_with_same_reward), \
    "--rl-forced-lag and --rl-filter-groups-with-same-reward are incompatible."
```

(Adjust the flag name to whatever the actual arg is.)
Resolved. Thank you Claude!
I actually have pretty strong feelings about the arguments here. So. My proposal is:

They would be mutually exclusive. Both would require

The first would imply no batching and would represent the number of parallel rollouts. This is what we really want to control. We would internally adjust for multiple tasks and divide it by grpo_group_size, to ensure that changing the grpo_group_size or number of tasks doesn't require an associated change in this argument to keep the same amount of "active" rollouts.

The second would imply batching and would represent the number of batches active at a time. In this case we would set our controls so we have that many batches in flight. This would be our "forced lag" or "N-step off-policy" setup.
megatron/rl/agent/api.py
Outdated
```python
@dataclass(slots=True)
class RolloutGroup:
```

Should it inherit from the BaseModel?
megatron/rl/agent/api.py
Outdated
```diff
- while grouped_rollouts.qsize() > 0 or not all(task.done() for task in tasks):
-     yield await grouped_rollouts.get()
+ next_batch_id = 0
+ pending: dict[int, list[RolloutGroup]] = {}
```
megatron/rl/agent/api.py
Outdated
```python
class RolloutGroup:
    """A group of rollouts (e.g. multiple completions for one prompt) with batch metadata."""

    rollouts: list[Rollout]
```

Rollout, or Rollout | TokenRollout?
megatron/rl/agent/api.py
Outdated
```python
if request.enforce_order:
    # Accumulate groups and enforce submission order across batches.
    pending.setdefault(group.batch_id, []).append(group)
    while len(pending.get(next_batch_id, [])) >= groups_per_worker:
```

Trying to understand this logic. Is there a chance we never get out of this loop, not because we have exhausted pending, but for some other reason?

Certainly. The only way we can get stuck here is if we filter out rollouts (which we assert that we do not do), or if the worker tasks die. But if the worker tasks die, we shut down the asyncio Queues (because I back-ported Python 3.13 functionality into this repo, since asyncio Queues are broken before Python 3.13), and that breaks us out of a potentially infinite loop.
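The enforce_order bookkeeping under discussion can be sketched as a small synchronous generator (illustrative only; the real code is async and reads from a queue). It relies on the no-filtering invariant the assert guards: if a group were dropped, `next_batch_id` would never advance.

```python
# Out-of-order completed groups are parked in `pending` and a batch is
# released only once every one of its `groups_per_batch` groups has arrived,
# so batches are always yielded in submission order.
def ordered_batches(completions, groups_per_batch):
    pending = {}
    next_batch_id = 0
    for batch_id, group in completions:
        pending.setdefault(batch_id, []).append(group)
        while len(pending.get(next_batch_id, [])) >= groups_per_batch:
            yield pending.pop(next_batch_id)
            next_batch_id += 1

# Batch 1 groups finish before batch 0 is complete; batch 0 is still yielded first.
out = list(ordered_batches([(1, "b"), (0, "a"), (1, "c"), (0, "d")], groups_per_batch=2))
print(out)  # → [['a', 'd'], ['b', 'c']]
```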
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23271910667