Conversation
def broadcast_object_across_pp_ranks(obj: Any) -> Any:
We might want to move these utils to bridge for common testing. I think some of the utils here are also used in bridge, but we can change this later.
Good point. I removed broadcast_object_across_pp_ranks in favor of mbridge's broadcast_obj_from_pp_rank.
Oops, I didn't look at the mbridge code closely enough and missed that broadcast_obj_from_pp_rank is a class method. I added broadcast_obj_from_pp_rank back to nemo-rl for now, but eventually we should plan to merge the two.
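For context, here is a minimal sketch of what a standalone broadcast helper along these lines could look like. It assumes Megatron-Core's parallel_state API and broadcasts from the last pipeline stage; this is an illustration, not the actual nemo-rl or mbridge implementation.

```python
from typing import Any

import torch
from megatron.core import parallel_state


def broadcast_object_across_pp_ranks(obj: Any) -> Any:
    """Broadcast a picklable object from the last pipeline stage to all PP ranks.

    Illustrative sketch only; the real helper may choose a different source rank.
    """
    # Nothing to do if there is no pipeline parallelism.
    if parallel_state.get_pipeline_model_parallel_world_size() == 1:
        return obj

    # Global rank of the last pipeline stage and the PP process group.
    src = parallel_state.get_pipeline_model_parallel_last_rank()
    group = parallel_state.get_pipeline_model_parallel_group()

    # broadcast_object_list fills the list in place on non-source ranks.
    obj_list = [obj if torch.distributed.get_rank() == src else None]
    torch.distributed.broadcast_object_list(obj_list, src=src, group=group)
    return obj_list[0]
```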
def validate_model_paths(config: PolicyConfig) -> tuple[str, str, bool]:
This seems to be a general util.
Do you think we should move it elsewhere? I think the function itself is pretty specific to Megatron and is only used during init.
model_cfg = cfg_from_pretrained.model
cfg_from_pretrained.logger = LoggerConfig()

# Apply parallelism settings
This overriding part needs some work; right now it seems hacky and hard to maintain.
Do you think the refactor should be improved, or the actual logic for overriding hyperparameters? If the latter, do you think we can address it in a separate PR, since this PR just focuses on refactoring the existing logic?
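To make the override pattern under discussion concrete, here is a hedged sketch. The helper name and the config field names (tensor_model_parallel_size, etc.) are assumptions for illustration, not the actual nemo-rl code.

```python
def apply_parallelism_overrides(model_cfg, policy_cfg: dict):
    """Copy parallelism settings from the policy config onto a loaded pretrained config.

    Illustrative only: field names and config structure are assumptions.
    """
    overrides = {
        "tensor_model_parallel_size": policy_cfg.get("tensor_model_parallel_size", 1),
        "pipeline_model_parallel_size": policy_cfg.get("pipeline_model_parallel_size", 1),
        "context_parallel_size": policy_cfg.get("context_parallel_size", 1),
    }
    for name, value in overrides.items():
        # Only override fields that actually exist on the loaded config.
        if hasattr(model_cfg, name):
            setattr(model_cfg, name, value)
    return model_cfg
```

Centralizing the overrides in one place like this (rather than scattering setattr calls) is one way to make the logic easier to maintain, if that is the direction we want to go.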
def setup_reference_model_state(
How is the reference model setup different from the actor's? Can they share some components? Right now the flows seem very different.
Yeah, the flow for the two is quite different. Personally, I feel like attempting to share components between them would be somewhat contrived, because right now there isn't much obvious code duplication, but I'm happy to discuss this further if you'd like.
nemo_rl/models/megatron/train.py (Outdated)
return output_tensor, processor_fn_wrapped

def forward_maybe_backward(
These need some docstrings; it's confusing to have several different levels of forward functions.
Added some docstrings and renamed forward_maybe_backward to megatron_forward_backward for clarity. Please take another look and let me know if things are more clear.
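To illustrate the naming, here is a hedged sketch of what a megatron_forward_backward wrapper could look like. It follows Megatron-Core's get_forward_backward_func contract, but the argument list of the actual nemo-rl function may differ.

```python
from megatron.core.pipeline_parallel import get_forward_backward_func


def megatron_forward_backward(
    forward_step_func,
    data_iterator,
    model,
    num_microbatches: int,
    seq_length: int,
    micro_batch_size: int,
    forward_only: bool = False,
):
    """Run Megatron's pipeline-parallel forward (and optionally backward) pass.

    Sketch only. Per the Megatron-Core contract,
    forward_step_func(data_iterator, model) must return (output_tensor, loss_func).
    """
    forward_backward_func = get_forward_backward_func()
    # Returns the per-microbatch loss_func outputs on the last pipeline stage
    # and an empty list on intermediate stages.
    return forward_backward_func(
        forward_step_func=forward_step_func,
        data_iterator=data_iterator,
        model=model,
        num_microbatches=num_microbatches,
        seq_length=seq_length,
        micro_batch_size=micro_batch_size,
        forward_only=forward_only,
    )
```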
if hasattr(module, "_inference_key_value_memory"):
    module._inference_key_value_memory = None

if gbs is None:
This fallback seems unsafe. Are there any cases where gbs here will be different from the config value?
Yeah, we call train when running validation for DPO, for example, and we want to allow for the possibility of using a val batch size that's different from the training batch size. Similarly for microbatch size. We could make things explicit and require that users pass gbs and mbs to train. WDYT?
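As a concrete version of the "make it explicit" option, here is a small sketch of resolving gbs/mbs with an explicit override and a config fallback. The config keys (train_global_batch_size, train_micro_batch_size) are illustrative assumptions, not the actual nemo-rl config names.

```python
from typing import Optional


def resolve_batch_sizes(
    cfg: dict,
    gbs: Optional[int] = None,
    mbs: Optional[int] = None,
) -> tuple[int, int]:
    """Resolve global/micro batch sizes, preferring explicit caller overrides.

    Falls back to the training values from the config only when the caller
    did not pass an override (e.g. DPO validation passing a val batch size).
    """
    gbs = gbs if gbs is not None else cfg["train_global_batch_size"]
    mbs = mbs if mbs is not None else cfg["train_micro_batch_size"]
    return gbs, mbs
```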
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information