
ci: Update to transformers v5.5#1734

Merged
akoumpa merged 20 commits into main from transformers_bump_5.5.0
Apr 15, 2026

Conversation

@athitten
Contributor

@athitten athitten commented Apr 8, 2026

What does this PR do ?

Upgrades to transformers v5.5 and updates mistral_common; both are required for gemma4.

Changelog

  • tests/unit_tests/models/deepseek_v3/test_dsv3_layers.py — Updated mock configs to set rope_parameters = rope_scaling to match v5.5's DeepseekV3Config behavior.
  • nemo_automodel/components/models/gemma4_moe/model.py — Added bidirectional compat between expert_intermediate_size (pre-v5.5) and moe_intermediate_size (v5.5 rename) in Gemma4ForConditionalGeneration.__init__ and Gemma4MoETextModelBackend.__init__.
  • tests/unit_tests/models/gemma4/test_gemma4_model.py — Updated test configs to use moe_intermediate_size instead of expert_intermediate_size.
  • tests/unit_tests/models/gemma4/test_gemma4_state_dict_adapter.py — Updated mock state dict to use v5.5 key format (experts.gate_up_proj, router.per_expert_scale).
  • nemo_automodel/components/models/nemotron_parse/model.py — Handled MBartDecoderLayer.forward returning a single tensor in v5.5 (it returned a tuple in v5.3).
  • nemo_automodel/components/models/nemotron_v3/model.py — Replaced the pre-created DynamicCache from GenerationMixin with NemotronHybridCache in prepare_inputs_for_generation.
  • nemo_automodel/components/checkpoint/checkpointing.py — Added _reinit_non_persistent_buffers, since in v5.5 Gemma3RotaryEmbedding's per-layer-type inv_freq buffers, embed_scale, and SigLIP position_ids buffers are non-persistent (not saved in the checkpoint) and have to be initialized explicitly. This generalizes the previous method, which was specific to rope, and adds a corresponding unit test.
  • nemo_automodel/_transformers/auto_model.py — Added the error-message pattern introduced in transformers v5.4, "attempted to run this operator with Meta tensors", to the retry logic in _build_model so that the build is retried without the meta device in this case. Transformers v5.5 added a torch.equal() call inside tie_weights() (line 2539 of modeling_utils.py) to compare tied parameter values; when _build_model wraps from_pretrained inside init_empty_weights() (a meta-device context), the model parameters are meta tensors, and torch.equal does not support them, producing a NotImplementedError. (Addresses NotImplementedError: aten::equal on meta tensors during multi-GPU model init with transformers >= 5.4.0 #1765.)
  • nemo_automodel/components/models/qwen3_5_moe/cp_linear_attn.py — Removed the cache_position arg from Qwen3_5MoeGatedDeltaNet.forward(), as it is no longer supported in transformers v5.5.
  • Fixed Gemma3 VLM checkpoint loading with transformers v5.5: the generic llava key mapping (aliased by transformers for gemma3) produced wrong FQNs (fully qualified names) for Gemma3's model.language_model.* hierarchy, causing all base weights to be silently dropped by set_model_state_dict(strict=False). Replaced the fallback mechanism in get_combined_key_mapping with an explicit priority override in _VLM_KEY_MAPPINGS that uses the correct ^-anchored regex patterns for Gemma3's module structure.
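The expert_intermediate_size/moe_intermediate_size compatibility described above could be sketched roughly as a small helper; the attribute names come from the changelog, but the helper name and the bare config object are illustrative, not the PR's actual code:

```python
from types import SimpleNamespace


def sync_moe_intermediate_size(config):
    """Mirror the renamed MoE width attribute in both directions.

    transformers v5.5 renamed ``expert_intermediate_size`` to
    ``moe_intermediate_size``; after this call, configs written against
    either version expose both names. (Hypothetical helper for
    illustration.)
    """
    old = getattr(config, "expert_intermediate_size", None)
    new = getattr(config, "moe_intermediate_size", None)
    if new is None and old is not None:
        config.moe_intermediate_size = old
    elif old is None and new is not None:
        config.expert_intermediate_size = new
    return config


# A pre-v5.5 config gains the new attribute name, and vice versa.
legacy_cfg = sync_moe_intermediate_size(SimpleNamespace(expert_intermediate_size=1024))
new_cfg = sync_moe_intermediate_size(SimpleNamespace(moe_intermediate_size=2048))
```

Doing the sync in both `__init__` methods keeps older checkpoints and configs loadable without pinning transformers.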
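The MBartDecoderLayer return-type change can be absorbed with a small normalization wrapper; the function below is an illustrative version-agnostic pattern (plain values stand in for tensors), not the PR's exact code:

```python
def call_decoder_layer(layer, hidden_states, **kwargs):
    """Call a decoder layer and normalize its return value.

    transformers <= v5.3 returned a tuple whose first element is the
    hidden states; v5.5 returns the hidden-states tensor directly.
    """
    out = layer(hidden_states, **kwargs)
    return out[0] if isinstance(out, tuple) else out
```

Normalizing at the call site keeps the surrounding forward pass identical across both transformers versions.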
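The retry-on-known-error behaviour in _build_model amounts to matching error-message substrings and rebuilding without the meta device. In this sketch, only the quoted v5.4 message is from the PR; the wrapper, the fake builder, and the other list entry are illustrative assumptions:

```python
# Substrings of errors that mean a meta-device build cannot proceed.
# "attempted to run this operator with Meta tensors" is the message this
# PR adds; the other entry is illustrative.
_RETRYABLE_META_ERRORS = (
    "Cannot copy out of meta tensor",
    "attempted to run this operator with Meta tensors",
)


def build_model(build_fn, **kwargs):
    """Build under a meta-device context first; on a known meta-tensor
    failure, retry with real weights. (Hypothetical wrapper.)"""
    try:
        return build_fn(use_meta_device=True, **kwargs)
    except (NotImplementedError, RuntimeError) as err:
        if any(msg in str(err) for msg in _RETRYABLE_META_ERRORS):
            return build_fn(use_meta_device=False, **kwargs)
        raise


def fake_from_pretrained(use_meta_device):
    # Simulates transformers v5.5's tie_weights() hitting torch.equal()
    # on meta tensors under init_empty_weights().
    if use_meta_device:
        raise NotImplementedError(
            "aten::equal: attempted to run this operator with Meta tensors"
        )
    return "materialized model"
```

Matching on substrings rather than exception type avoids retrying unrelated failures while still catching the new v5.4+ message.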
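The point of the ^-anchored override can be illustrated with a minimal key remapper; the pattern strings below are hypothetical stand-ins, not the actual entries in _VLM_KEY_MAPPINGS:

```python
import re

# Hypothetical anchored patterns mapping llava-style checkpoint keys onto
# Gemma3's model.language_model.* hierarchy. Anchoring with ^ keeps a
# pattern from matching mid-key and producing wrong FQNs.
_GEMMA3_KEY_MAPPING = {
    r"^language_model\.model\.": "model.language_model.",
    r"^vision_tower\.": "model.vision_tower.",
}


def remap_key(key):
    """Rewrite a checkpoint key using the first matching anchored pattern."""
    for pattern, repl in _GEMMA3_KEY_MAPPING.items():
        if re.match(pattern, key):
            return re.sub(pattern, repl, key, count=1)
    return key
```

Keys that match no pattern pass through unchanged, so with strict=False a single wrong pattern silently drops weights, which is why the override has to be exact.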

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

@athitten athitten requested a review from a team as a code owner April 8, 2026 17:23
@copy-pr-bot

copy-pr-bot bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@athitten
Contributor Author

athitten commented Apr 8, 2026

/ok to test e4c7b30

@athitten athitten changed the title from "Update to transformers v5.5" to "ci: Update to transformers v5.5" Apr 8, 2026
@akoumpa
Contributor

akoumpa commented Apr 8, 2026

/ok to test 30ca0e9

@athitten
Contributor Author

athitten commented Apr 9, 2026

/ok to test 933ff7c

@athitten
Contributor Author

/ok to test 17c9a03

thomasdhc previously approved these changes Apr 14, 2026


- def _reinit_rope_buffers(model: nn.Module, device: torch.device) -> None:
+ def _reinit_non_persistent_buffers(model: nn.Module, device: torch.device) -> None:
Contributor


can we make this model specific?

i.e. run the function only on select models where it applies. I think it's well intended (catch as many cases as possible), but I also want to be very risk averse here, given that there's a wild west on the HF Hub :)

Contributor Author


Addressed in 07f775a

# that override the generic transformers conversion (e.g. transformers 5.5.0
# aliases gemma3→llava, but the llava mapping produces wrong FQNs for
# Gemma3's model.language_model.* hierarchy).
if model_type in _VLM_KEY_MAPPINGS:
Contributor


Note-to-self: I think this is ok for the time being, but long term, it might be a good idea to move all the model-specific patching under nemo_automodel/components/models/, so that each model has its own patches. No action is needed.

@athitten
Contributor Author

/ok to test 07f775a

@athitten
Contributor Author

/ok to test 3ea3ff2

akoumpa previously approved these changes Apr 14, 2026
Contributor

@akoumpa akoumpa left a comment


Thanks a lot @athitten !

Abhishree Thittenamane and others added 19 commits April 14, 2026 15:42
Signed-off-by: Abhishree Thittenamane <athittenaman@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
…ion_ids non-persistent buffer

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree Thittenamane <athittenaman@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@athitten
Contributor Author

/ok to test e1bf007

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa
Contributor

akoumpa commented Apr 15, 2026

/ok to test 2cd5ee6


Labels

r0.4.0 Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.


Development

Successfully merging this pull request may close these issues.

  • Upgrade to transformers v5.5
  • TypeError: Qwen3_5MoeGatedDeltaNet.forward() got an unexpected keyword argument 'cache_position'

3 participants