refactor: extract initialize_model_weights from load_base_model #1356
Merged
hemildesai merged 1 commit into main on Feb 24, 2026
Conversation
Decouple weight initialization from checkpoint loading so that from_config (which skips checkpoint loading) can still materialize meta-device parameters and call initialize_weights. Move _init_peft_adapters into initialize_model_weights as well, and make from_config accept load_base_model as a kwarg for flexibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
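For readers following along, here is a minimal sketch of what the extracted helper could look like. Only the names `initialize_model_weights` and `peft_init_method` are taken from this PR; the body and signature below are assumptions, not the actual NeMo code:

```python
# Hedged sketch only; the real Checkpointer in this repo differs. Only the
# names initialize_model_weights and peft_init_method come from this PR.
import torch
import torch.nn as nn


class Checkpointer:
    @staticmethod
    def initialize_model_weights(model: nn.Module, peft_init_method=None) -> None:
        # Materialize parameters still on the meta device: to_empty() allocates
        # real (uninitialized) storage, since meta tensors carry no data.
        if any(p.is_meta for p in model.parameters()):
            model.to_empty(device="cuda" if torch.cuda.is_available() else "cpu")

        # Run the model's own init now that storage exists.
        if hasattr(model, "initialize_weights"):
            model.initialize_weights()

        # PEFT adapter init lives here too, so it runs on both the
        # checkpoint-loading path and the from_config path.
        if peft_init_method is not None:
            peft_init_method(model)
```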
hemildesai (Contributor, Author) commented:

/ok to test cf6773f
adil-a reviewed on Feb 23, 2026
  fp8_config=fp8_config,
  compile_config=compile_config,
- load_base_model=False,
+ load_base_model=kwargs.pop("load_base_model", False),
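To make the effect of the `kwargs.pop` pattern in the diff above concrete, here is a self-contained toy version; the function body is a placeholder, not the real from_config:

```python
# Toy illustration of the kwargs.pop pattern from the diff above; not the
# real from_config, just the default-plus-override behavior it relies on.
def from_config(config, **kwargs):
    load_base_model = kwargs.pop("load_base_model", False)  # default stays False
    if kwargs:
        raise TypeError(f"unexpected kwargs: {sorted(kwargs)}")
    return config, load_base_model


assert from_config({})[1] is False                       # old behavior preserved
assert from_config({}, load_base_model=True)[1] is True  # new opt-in path
```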
adil-a (Collaborator) commented:

Why do we need this? If we want to load a base model, what stops us from going the .from_pretrained route?
hemildesai (Contributor, Author) replied:

For models like Deepseek v3.2, the config class isn't available in transformers, so it errors out if you use .from_pretrained.
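A hedged illustration of the failure mode being discussed: when a checkpoint's config class is not registered in the installed transformers version, from_pretrained cannot resolve a model class. The repo id below is a placeholder, not a verified checkpoint name:

```python
# Placeholder repo id; shown only to illustrate the unregistered-config failure.
from transformers import AutoModelForCausalLM

try:
    model = AutoModelForCausalLM.from_pretrained("org/model-with-unreleased-config")
except (ValueError, KeyError, OSError) as err:
    # transformers raises when the checkpoint's model_type has no registered
    # config/model class (or, for this placeholder id, when the repo is missing).
    print(f"from_pretrained failed: {err}")
```

In that situation, constructing the model via from_config and opting into load_base_model=True sidesteps the from_pretrained class lookup.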
adil-a approved these changes on Feb 24, 2026
svcnvidia-nemo-ci pushed a commit that referenced this pull request on Feb 24, 2026
thomasdhc pushed a commit that referenced this pull request on Feb 24, 2026: `…(1356)` into `r0.3.0` (#1360)
akoumpa pushed a commit that referenced this pull request on Feb 25, 2026
linnanwang pushed a commit that referenced this pull request on Apr 24, 2026
Summary
- Extract weight initialization from Checkpointer.load_base_model into a new Checkpointer.initialize_model_weights static method, so from_config (which skips checkpoint loading) can still materialize meta-device parameters and call initialize_weights
- Move _init_peft_adapters into initialize_model_weights via a new peft_init_method parameter, keeping PEFT adapter init alongside weight init regardless of the checkpoint loading path
- Add a need_post_shard_init block in apply_model_infrastructure that calls initialize_model_weights after parallelisms are applied, decoupled from the checkpoint loading condition (see the sketch after this list)
- Make from_config accept load_base_model as a kwarg (defaults to False) for flexibility
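As referenced in the third bullet above, a minimal sketch of the post-shard init flow. Only apply_model_infrastructure, initialize_model_weights, load_base_model, and peft_init_method come from this PR; the sharding helper and the exact need_post_shard_init condition are stand-ins:

```python
# Minimal sketch, not the actual implementation.
import torch.nn as nn


def apply_parallelisms(model: nn.Module) -> nn.Module:
    """Hypothetical stand-in for FSDP/TP sharding."""
    return model


def apply_model_infrastructure(model, checkpointer, load_base_model: bool,
                               peft_init_method=None):
    model = apply_parallelisms(model)

    if load_base_model:
        # Checkpoint path: weights come from disk.
        checkpointer.load_base_model(model)

    # Decoupled from checkpoint loading: the from_config path (no checkpoint)
    # still materializes meta params and runs init after sharding. The real
    # condition is richer than this placeholder.
    need_post_shard_init = not load_base_model
    if need_post_shard_init:
        checkpointer.initialize_model_weights(model, peft_init_method=peft_init_method)
    return model
```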
Test plan
- Unit tests for initialize_model_weights (materialization, HF init reset, Nemotron skip, peft_init_method forwarding, signature check)
- Unit tests for the load_before_shard path (call ordering, single init call, no peft_init_method in load_base_model)
- Unit tests for apply_model_infrastructure
- Unit tests for from_config load_base_model kwarg forwarding
- Run: pytest tests/unit_tests/checkpoint/test_checkpointing.py tests/unit_tests/_transformers/test_infrastructure.py (a hedged test sketch follows below)

🤖 Generated with Claude Code
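For concreteness, a hedged pytest-style sketch of the materialization check named in the test plan; it uses plain torch stand-ins rather than the real Checkpointer.initialize_model_weights:

```python
# Hedged test sketch: exercises the meta-device materialization behavior the
# test plan describes, with nn.Linear standing in for the model under test.
import torch
import torch.nn as nn


def test_meta_params_are_materialized():
    with torch.device("meta"):
        model = nn.Linear(4, 4)
    assert all(p.is_meta for p in model.parameters())

    # What initialize_model_weights is expected to do on this path:
    model.to_empty(device="cpu")   # allocate real storage
    model.reset_parameters()       # stand-in for the model's initialize_weights

    assert not any(p.is_meta for p in model.parameters())
    assert torch.isfinite(model.weight).all()
```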