[train] LoRA-only weight syncing for Megatron #1360
hao-aaron wants to merge 9 commits into NovaSky-AI:main
Conversation
Code Review
This pull request introduces support for LoRA (Low-Rank Adaptation) adapter export and synchronization within the Megatron training framework. Key changes include: updating `pyproject.toml` to use a newer megatron-bridge commit and adding the transformers library; implementing a new `_save_lora_adapters_and_sync` method in `megatron_worker.py` to export LoRA adapter weights and notify the inference engine; and modifying the `broadcast_to_inference_engines` method to use this new LoRA-specific synchronization. Additionally, `test_lora.py` has been updated with new test cases for the Megatron strategy, while a redundant LoRA test case was removed from `test_megatron_worker.py`.
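A rough, hedged sketch of what such a LoRA-only export step might look like. The helper names `extract_lora_state` and `build_export_payload` are invented for illustration; the actual PR uses `safetensors.torch.save_file` plus the `megatron.bridge` peft_bridge helpers (`build_adapter_config_dict`, `infer_target_modules_from_adapter_weights`):

```python
import json

# Illustrative sketch only: select adapter-only tensors and build a PEFT-style
# adapter config. Helper names are invented; they are not the SkyRL API.
def extract_lora_state(state_dict):
    """Keep only LoRA adapter tensors; base weights are not synced."""
    return {k: v for k, v in state_dict.items()
            if ".lora_A" in k or ".lora_B" in k}

def build_export_payload(state_dict, rank, alpha):
    adapters = extract_lora_state(state_dict)
    # Infer target module names from adapter weight keys (mirrors the spirit
    # of infer_target_modules_from_adapter_weights).
    targets = sorted({k.split(".lora_")[0].rsplit(".", 1)[-1] for k in adapters})
    config = {"peft_type": "LORA", "r": rank, "lora_alpha": alpha,
              "target_modules": targets}
    # The real implementation would save `adapters` with safetensors and write
    # the config as adapter_config.json for vLLM to hot-load.
    return adapters, json.dumps(config)
```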
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Code Review
This pull request introduces support for LoRA-only weight syncing for Megatron models, which is a valuable feature. The changes include updating dependencies, implementing the core logic for saving and syncing LoRA adapters, and adding corresponding tests. The implementation appears correct and robust. I have one suggestion regarding import statements to improve code maintainability.
```python
import json

from megatron.bridge.models.conversion.peft_bridge import (
    build_adapter_config_dict,
    infer_target_modules_from_adapter_weights,
)
from safetensors.torch import save_file
```
Imports should generally be at the top of the file as per PEP 8 style guidelines. This improves readability and makes dependencies clear. The imports for json, megatron.bridge, and safetensors.torch here, as well as for RemoteInferenceClient on line 846, are local to this method. Unless there's a specific reason for lazy loading (like avoiding circular dependencies, which doesn't seem to be the case here), please move these imports to the top of the file.
```python
    self.weight_extractor.extract_weights(generator_dtype),
    weight_metadata=weight_metadata,
)
if self._is_lora:
```
Hmm, I don't think we want to get rid of the option to merge LoRAs back into the base model for syncing (since LoRA serving throughput is slower).
Can we add a flag to gate this feature, something like `trainer.policy.megatron_config.lora_config.merge_lora`, and have the default still be `True`?
Then when we get to in-memory LoRA weight sync we can replace the disk syncing.
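The gating the reviewers describe could look roughly like this. The function and mode names are invented for illustration; the real flag would live at `trainer.policy.megatron_config.lora_config.merge_lora`:

```python
# Hedged sketch of the suggested gate: merged sync stays the default, and the
# LoRA-only disk-sync path is opt-in. Names here are illustrative only.
def choose_sync_mode(is_lora: bool, merge_lora: bool = True) -> str:
    if is_lora and not merge_lora:
        # Opt-in path: export adapters to disk and hot-load them in vLLM.
        return "lora_adapter_sync"
    # Default path: merge LoRA into the base weights and sync everything.
    return "merged_full_sync"
```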
+1.
We definitely want to support both modes. Default to merge for now.
Also, will this LoRA-only weight sync mode only support …
@hao-aaron can you add this to the megatron CI here and ensure that tests pass with _SKYRL_USE_NEW_INFERENCE=1 ?
```python
pytest.param(False, "nccl", "megatron", 2, marks=pytest.mark.megatron),
pytest.param(True, "nccl", "megatron", 2, marks=pytest.mark.megatron),
```
Let's also parameterize saving to disk vs in-memory weight sync here for LoRA (after addressing @erictang000 's comment)
Colocated and non-colocated both work; I have tests for both in test_lora.py.
SumanthRH left a comment
Only 2 minor comments left!
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
312a02a to 0bf7283
```python
# Override build_vllm_cli_args which auto-enables LoRA based
# on lora rank in the config. For merged weight sync the
# inference engine must NOT have LoRA wrapping enabled.
cli_args.enable_lora = False
```
@hao-aaron actually, isn't the same fix needed in build_vllm_cli_args itself? Otherwise this will fail outside of the tests.
We need to unset enable_lora in the vLLM CLI args if it's the Megatron backend and merge_lora is true.
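A minimal sketch of the fix being requested. The function name `adjust_vllm_cli_args` and its signature are assumptions for illustration, not the actual SkyRL API:

```python
from types import SimpleNamespace

# Sketch of the suggested fix (names assumed): when training with the Megatron
# backend and merge_lora is on, the vLLM engine receives fully merged weights,
# so its LoRA wrapping must stay disabled regardless of the configured rank.
def adjust_vllm_cli_args(cli_args, backend: str, merge_lora: bool):
    if backend == "megatron" and merge_lora:
        # Merged weight sync: the inference engine must NOT enable LoRA.
        cli_args.enable_lora = False
    return cli_args
```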
Wait, can we hold off on merging this? The megatron-bridge bump might cause issues for deepseek-v3 models:
`megatron-bridge = {git = "https://github.com/NVIDIA-NeMo/Megatron-Bridge", rev = "f78c65f9", marker = "sys_platform == 'linux'"}`
The megatron-bridge commit bump is a little scary for deepseek-v3 models; can we test that initializing glm-flash-4.7 still works at all?
I've previously seen this script stop running after bumping megatron-bridge to a newer commit.
let me test this out
How does it work for multi-node + non-colocated if it's writing to disk? I guess you can write to shared storage, but that's a little non-obvious; we should at least put up a warning for non-colocated + not merging LoRA?
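The warning suggested here could be sketched as follows. The function name and message are assumptions for illustration, not the PR's actual code:

```python
import warnings

# Sketch of the suggested guard (names assumed): disk-based LoRA-only sync in
# a non-colocated setup silently requires shared storage, so warn loudly.
def warn_if_disk_sync_unshared(colocated: bool, merge_lora: bool) -> bool:
    """Return True (and emit a warning) when the risky combination is hit."""
    if not colocated and not merge_lora:
        warnings.warn(
            "LoRA-only weight sync writes adapters to disk; with non-colocated "
            "inference the adapter path must live on storage shared with the "
            "inference engine."
        )
        return True
    return False
```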
closes #1336
Uses Megatron's new `export_adapter_weights` to save LoRAs to disk and load them in vLLM. Bumped the megatron-bridge version for the new API.
Added new Megatron tests to `test_lora.py`; run test_lora with both `_USE_SKYRL_NEW_INFERENCE=0` and `1`.
Added more robust shutdown handling to fix issues the test hit when using the new inference path.