
refactor: extract initialize_model_weights from load_base_model #1356

Merged
hemildesai merged 1 commit into main from claude/eloquent-pare on Feb 24, 2026
Conversation

@hemildesai (Contributor)

Summary

  • Extract weight initialization logic from Checkpointer.load_base_model into a new Checkpointer.initialize_model_weights static method, so from_config (which skips checkpoint loading) can still materialize meta-device parameters and call initialize_weights (see the sketch after this list)
  • Move _init_peft_adapters into initialize_model_weights via a new peft_init_method parameter, keeping PEFT adapter init alongside weight init regardless of checkpoint loading path
  • Add need_post_shard_init block in apply_model_infrastructure that calls initialize_model_weights after parallelisms are applied, decoupled from the checkpoint loading condition
  • Make from_config accept load_base_model as a kwarg (defaults to False) for flexibility
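A minimal sketch of the resulting shape, assuming hypothetical helper names (the `_init_peft_adapters` stub, the elided checkpoint load) rather than the merged code:

```python
# Hedged sketch of the refactor above, NOT the merged implementation:
# the _init_peft_adapters stub, the checkpoint-loading elision, and the
# exact signatures are assumptions.
from typing import Optional

import torch.nn as nn


def _init_peft_adapters(model: nn.Module, peft_init_method: str) -> None:
    """Stand-in for the repo's PEFT adapter initializer."""


class Checkpointer:
    @staticmethod
    def initialize_model_weights(
        model: nn.Module, peft_init_method: Optional[str] = None
    ) -> None:
        # Materialize any parameters still on the meta device.
        if any(p.is_meta for p in model.parameters()):
            model.to_empty(device="cpu")
        # Re-run the model's own initializer if it defines one.
        init_fn = getattr(model, "initialize_weights", None)
        if callable(init_fn):
            init_fn()
        # PEFT adapter init now lives alongside weight init, gated by the
        # new peft_init_method parameter.
        if peft_init_method is not None:
            _init_peft_adapters(model, peft_init_method)

    def load_base_model(self, model: nn.Module) -> None:
        ...  # load checkpoint weights into `model` (elided)
        # The checkpoint path delegates to the shared initializer; per the
        # test plan, no peft_init_method is forwarded here.
        Checkpointer.initialize_model_weights(model)
```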

Test plan

  • Added 11 new unit tests for initialize_model_weights (materialization, HF init reset, Nemotron skip, peft_init_method forwarding, signature check); a sketch of this style of test follows this list
  • Added 3 tests for the load_before_shard path (call ordering, single init call, no peft_init_method in load_base_model)
  • Added 1 test for peft_init_method forwarding through apply_model_infrastructure
  • Added 5 tests for post-shard init (from_config, from_pretrained, non-meta, MegatronFSDP skip)
  • Added 2 tests for from_config load_base_model kwarg forwarding
  • All 62 tests pass (pytest tests/unit_tests/checkpoint/test_checkpointing.py tests/unit_tests/_transformers/test_infrastructure.py)
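A stand-alone analogue of the materialization test from the first bullet (the real tests live in tests/unit_tests/checkpoint/test_checkpointing.py; this version exercises the hedged Checkpointer sketch above, not the repo's class):

```python
import torch
import torch.nn as nn


def test_initialize_model_weights_materializes_meta_params():
    # Build a model whose parameters live on the meta device.
    with torch.device("meta"):
        model = nn.Linear(4, 4)
    assert all(p.is_meta for p in model.parameters())

    # Assumes the hedged Checkpointer sketch above is in scope; the repo's
    # real class is what the actual tests import.
    Checkpointer.initialize_model_weights(model)

    # Every parameter should now be materialized off the meta device.
    assert not any(p.is_meta for p in model.parameters())
```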

🤖 Generated with Claude Code

Decouple weight initialization from checkpoint loading so that
from_config (which skips checkpoint loading) can still materialize
meta-device parameters and call initialize_weights. Move
_init_peft_adapters into initialize_model_weights as well, and
make from_config accept load_base_model as a kwarg for flexibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
copy-pr-bot Bot commented Feb 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hemildesai (Contributor, Author)

/ok to test cf6773f

```diff
  fp8_config=fp8_config,
  compile_config=compile_config,
- load_base_model=False,
+ load_base_model=kwargs.pop("load_base_model", False),
```
Collaborator:
why do we need this? if we want to load a base model then what stops us from going the .from_pretrained route?

@hemildesai (Contributor, Author):

For models like DeepSeek V3.2, the config class isn't available in transformers, so it errors out if you use .from_pretrained.
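
To make the escape hatch concrete, here is a hedged sketch of how the kwarg could thread through to the post-shard init block from the summary; `_apply_parallelisms`, the signatures, and the exact `need_post_shard_init` condition are illustrative assumptions, not the merged code:

```python
# Illustrative only: _apply_parallelisms, the signatures, and the exact
# need_post_shard_init condition are assumptions.
from typing import Optional

import torch.nn as nn


def _apply_parallelisms(model: nn.Module) -> None:
    """Stand-in for the FSDP/TP sharding applied before init."""


def apply_model_infrastructure(
    model: nn.Module,
    *,
    loaded_checkpoint: bool,
    peft_init_method: Optional[str] = None,
    uses_megatron_fsdp: bool = False,
) -> nn.Module:
    _apply_parallelisms(model)
    # Post-shard init is decoupled from checkpoint loading: from_config
    # (which never loaded a checkpoint) still gets its meta parameters
    # materialized, while MegatronFSDP and already-materialized models
    # skip it, per the test plan.
    need_post_shard_init = (
        not loaded_checkpoint
        and not uses_megatron_fsdp
        and any(p.is_meta for p in model.parameters())
    )
    if need_post_shard_init:
        # Assumes the Checkpointer sketch from the summary is in scope.
        Checkpointer.initialize_model_weights(model, peft_init_method=peft_init_method)
    return model
```

Under this sketch, from_config(cfg, load_base_model=True) becomes the escape hatch for checkpoints whose config class transformers cannot resolve, since the weights are loaded by the Checkpointer rather than via .from_pretrained.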

@hemildesai added the r0.3.0 label (Add for cherry-pick into release branch r0.3.0) on Feb 24, 2026
@hemildesai merged commit 310dfb9 into main on Feb 24, 2026
51 checks passed
@hemildesai deleted the claude/eloquent-pare branch on February 24, 2026 01:25
svcnvidia-nemo-ci pushed a commit that referenced this pull request Feb 24, 2026
thomasdhc pushed a commit that referenced this pull request Feb 24, 2026
…(1356)` into `r0.3.0` (#1360)

akoumpa pushed a commit that referenced this pull request Feb 25, 2026
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
