
refactor: extract initialize_model_weights from load_base_model #1356

Merged
hemildesai merged 1 commit into main from claude/eloquent-pare on Feb 24, 2026
Conversation

@hemildesai (Contributor)

Summary

  • Extract weight initialization logic from Checkpointer.load_base_model into a new Checkpointer.initialize_model_weights static method, so from_config (which skips checkpoint loading) can still materialize meta-device parameters and call initialize_weights (see the sketch after this list)
  • Move _init_peft_adapters into initialize_model_weights via a new peft_init_method parameter, keeping PEFT adapter init alongside weight init regardless of checkpoint loading path
  • Add need_post_shard_init block in apply_model_infrastructure that calls initialize_model_weights after parallelisms are applied, decoupled from the checkpoint loading condition
  • Make from_config accept load_base_model as a kwarg (defaults to False) for flexibility
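A minimal sketch of the resulting shape, assuming hypothetical helper names (the `_init_peft_adapters` stub, the elided checkpoint load) rather than the merged code:

```python
# Hedged sketch of the refactor above, NOT the merged implementation:
# the _init_peft_adapters stub, the checkpoint-loading elision, and the
# exact signatures are assumptions.
from typing import Optional

import torch.nn as nn


def _init_peft_adapters(model: nn.Module, peft_init_method: str) -> None:
    """Stand-in for the repo's PEFT adapter initializer."""


class Checkpointer:
    @staticmethod
    def initialize_model_weights(
        model: nn.Module, peft_init_method: Optional[str] = None
    ) -> None:
        # Materialize any parameters still on the meta device.
        if any(p.is_meta for p in model.parameters()):
            model.to_empty(device="cpu")
        # Re-run the model's own initializer if it defines one.
        init_fn = getattr(model, "initialize_weights", None)
        if callable(init_fn):
            init_fn()
        # PEFT adapter init now lives alongside weight init, gated by the
        # new peft_init_method parameter.
        if peft_init_method is not None:
            _init_peft_adapters(model, peft_init_method)

    def load_base_model(self, model: nn.Module) -> None:
        ...  # load checkpoint weights into `model` (elided)
        # The checkpoint path delegates to the shared initializer; per the
        # test plan, no peft_init_method is forwarded here.
        Checkpointer.initialize_model_weights(model)
```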

Test plan

  • Added 11 new unit tests for initialize_model_weights (materialization, HF init reset, Nemotron skip, peft_init_method forwarding, signature check); a sketch of this style of test follows this list
  • Added 3 tests for the load_before_shard path (call ordering, single init call, no peft_init_method in load_base_model)
  • Added 1 test for peft_init_method forwarding through apply_model_infrastructure
  • Added 5 tests for post-shard init (from_config, from_pretrained, non-meta, MegatronFSDP skip)
  • Added 2 tests for from_config load_base_model kwarg forwarding
  • All 62 tests pass (pytest tests/unit_tests/checkpoint/test_checkpointing.py tests/unit_tests/_transformers/test_infrastructure.py)
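A stand-alone analogue of the materialization test from the first bullet (the real tests live in tests/unit_tests/checkpoint/test_checkpointing.py; this version exercises the hedged Checkpointer sketch above, not the repo's class):

```python
import torch
import torch.nn as nn


def test_initialize_model_weights_materializes_meta_params():
    # Build a model whose parameters live on the meta device.
    with torch.device("meta"):
        model = nn.Linear(4, 4)
    assert all(p.is_meta for p in model.parameters())

    # Assumes the hedged Checkpointer sketch above is in scope; the repo's
    # real class is what the actual tests import.
    Checkpointer.initialize_model_weights(model)

    # Every parameter should now be materialized off the meta device.
    assert not any(p.is_meta for p in model.parameters())
```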

🤖 Generated with Claude Code

Decouple weight initialization from checkpoint loading so that
from_config (which skips checkpoint loading) can still materialize
meta-device parameters and call initialize_weights. Move
_init_peft_adapters into initialize_model_weights as well, and
make from_config accept load_base_model as a kwarg for flexibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
copy-pr-bot Bot commented Feb 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hemildesai (Contributor, Author)

/ok to test cf6773f

```diff
  fp8_config=fp8_config,
  compile_config=compile_config,
- load_base_model=False,
+ load_base_model=kwargs.pop("load_base_model", False),
```
Collaborator:
why do we need this? if we want to load a base model then what stops us from going the .from_pretrained route?

@hemildesai (Contributor, Author):

For models like DeepSeek V3.2, the config class isn't available in transformers, so it errors out if you use .from_pretrained.
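
To make the escape hatch concrete, here is a hedged sketch of how the kwarg could thread through to the post-shard init block from the summary; `_apply_parallelisms`, the signatures, and the exact `need_post_shard_init` condition are illustrative assumptions, not the merged code:

```python
# Illustrative only: _apply_parallelisms, the signatures, and the exact
# need_post_shard_init condition are assumptions.
from typing import Optional

import torch.nn as nn


def _apply_parallelisms(model: nn.Module) -> None:
    """Stand-in for the FSDP/TP sharding applied before init."""


def apply_model_infrastructure(
    model: nn.Module,
    *,
    loaded_checkpoint: bool,
    peft_init_method: Optional[str] = None,
    uses_megatron_fsdp: bool = False,
) -> nn.Module:
    _apply_parallelisms(model)
    # Post-shard init is decoupled from checkpoint loading: from_config
    # (which never loaded a checkpoint) still gets its meta parameters
    # materialized, while MegatronFSDP and already-materialized models
    # skip it, per the test plan.
    need_post_shard_init = (
        not loaded_checkpoint
        and not uses_megatron_fsdp
        and any(p.is_meta for p in model.parameters())
    )
    if need_post_shard_init:
        # Assumes the Checkpointer sketch from the summary is in scope.
        Checkpointer.initialize_model_weights(model, peft_init_method=peft_init_method)
    return model
```

Under this sketch, from_config(cfg, load_base_model=True) becomes the escape hatch for checkpoints whose config class transformers cannot resolve, since the weights are loaded by the Checkpointer rather than via .from_pretrained.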

@hemildesai added the r0.3.0 label (Add for cherry-pick into release branch r0.3.0) on Feb 24, 2026
@hemildesai merged commit 310dfb9 into main on Feb 24, 2026
51 checks passed
@hemildesai deleted the claude/eloquent-pare branch on February 24, 2026 01:25
svcnvidia-nemo-ci pushed a commit that referenced this pull request Feb 24, 2026
thomasdhc pushed a commit that referenced this pull request Feb 24, 2026
…(1356)` into `r0.3.0` (#1360)

akoumpa pushed a commit that referenced this pull request Feb 25, 2026
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
