
ci: Update to transformers v5.5#1734

Merged
akoumpa merged 20 commits into main from transformers_bump_5.5.0
Apr 15, 2026

Conversation

@athitten
Contributor

@athitten athitten commented Apr 8, 2026

What does this PR do ?

Upgrades to transformers v5.5 and updates mistral_common; both are required for gemma4.

Changelog

  • tests/unit_tests/models/deepseek_v3/test_dsv3_layers.py — Updated mock configs to set rope_parameters = rope_scaling to match v5.5's DeepseekV3Config behavior.
  • nemo_automodel/components/models/gemma4_moe/model.py — Added bidirectional compat between expert_intermediate_size (pre-v5.5) and moe_intermediate_size (v5.5 rename) in Gemma4ForConditionalGeneration.__init__ and Gemma4MoETextModelBackend.__init__.
  • tests/unit_tests/models/gemma4/test_gemma4_model.py — Updated test configs to use moe_intermediate_size instead of expert_intermediate_size.
  • tests/unit_tests/models/gemma4/test_gemma4_state_dict_adapter.py — Updated mock state dict to use v5.5 key format (experts.gate_up_proj, router.per_expert_scale).
  • nemo_automodel/components/models/nemotron_parse/model.py — Handled MBartDecoderLayer.forward returning a single tensor in v5.5 (it returned a tuple in v5.3).
  • nemo_automodel/components/models/nemotron_v3/model.py — Replaced the pre-created DynamicCache from GenerationMixin with NemotronHybridCache in prepare_inputs_for_generation.
  • nemo_automodel/components/checkpoint/checkpointing.py — Added _reinit_non_persistent_buffers, since in v5.5 Gemma3RotaryEmbedding's per-layer-type inv_freq buffers, embed_scale, and SigLIP position_ids buffers are non-persistent (not saved in the checkpoint) and have to be initialized explicitly. This generalizes the previous method, which was specific to rope, and adds a corresponding unit test.
  • nemo_automodel/_transformers/auto_model.py — Added the error-message pattern introduced in transformers v5.4, "attempted to run this operator with Meta tensors", to the retry logic in _build_model so that the build is retried without the meta device in this case. Transformers v5.5 added a torch.equal() call inside tie_weights() (line 2539 of modeling_utils.py) to compare tied parameter values; when _build_model wraps from_pretrained inside init_empty_weights() (a meta-device context), the model parameters are meta tensors, and torch.equal does not support them, producing a NotImplementedError. (Addresses NotImplementedError: aten::equal on meta tensors during multi-GPU model init with transformers >= 5.4.0 #1765.)
  • nemo_automodel/components/models/qwen3_5_moe/cp_linear_attn.py — Removed the cache_position arg from Qwen3_5MoeGatedDeltaNet.forward(), as it is no longer supported in transformers v5.5.
  • Fixed Gemma3 VLM checkpoint loading with transformers v5.5: the generic llava key mapping (aliased by transformers for gemma3) produced wrong FQNs (fully qualified names) for Gemma3's model.language_model.* hierarchy, causing all base weights to be silently dropped by set_model_state_dict(strict=False). Replaced the fallback mechanism in get_combined_key_mapping with an explicit priority override in _VLM_KEY_MAPPINGS that uses the correct ^-anchored regex patterns for Gemma3's module structure.
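The expert_intermediate_size/moe_intermediate_size compatibility described above could be sketched roughly as a small helper; the attribute names come from the changelog, but the helper name and the bare config object are illustrative, not the PR's actual code:

```python
from types import SimpleNamespace


def sync_moe_intermediate_size(config):
    """Mirror the renamed MoE width attribute in both directions.

    transformers v5.5 renamed ``expert_intermediate_size`` to
    ``moe_intermediate_size``; after this call, configs written against
    either version expose both names. (Hypothetical helper for
    illustration.)
    """
    old = getattr(config, "expert_intermediate_size", None)
    new = getattr(config, "moe_intermediate_size", None)
    if new is None and old is not None:
        config.moe_intermediate_size = old
    elif old is None and new is not None:
        config.expert_intermediate_size = new
    return config


# A pre-v5.5 config gains the new attribute name, and vice versa.
legacy_cfg = sync_moe_intermediate_size(SimpleNamespace(expert_intermediate_size=1024))
new_cfg = sync_moe_intermediate_size(SimpleNamespace(moe_intermediate_size=2048))
```

Doing the sync in both `__init__` methods keeps older checkpoints and configs loadable without pinning transformers.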
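The MBartDecoderLayer return-type change can be absorbed with a small normalization wrapper; the function below is an illustrative version-agnostic pattern (plain values stand in for tensors), not the PR's exact code:

```python
def call_decoder_layer(layer, hidden_states, **kwargs):
    """Call a decoder layer and normalize its return value.

    transformers <= v5.3 returned a tuple whose first element is the
    hidden states; v5.5 returns the hidden-states tensor directly.
    """
    out = layer(hidden_states, **kwargs)
    return out[0] if isinstance(out, tuple) else out
```

Normalizing at the call site keeps the surrounding forward pass identical across both transformers versions.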
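The retry-on-known-error behaviour in _build_model amounts to matching error-message substrings and rebuilding without the meta device. In this sketch, only the quoted v5.4 message is from the PR; the wrapper, the fake builder, and the other list entry are illustrative assumptions:

```python
# Substrings of errors that mean a meta-device build cannot proceed.
# "attempted to run this operator with Meta tensors" is the message this
# PR adds; the other entry is illustrative.
_RETRYABLE_META_ERRORS = (
    "Cannot copy out of meta tensor",
    "attempted to run this operator with Meta tensors",
)


def build_model(build_fn, **kwargs):
    """Build under a meta-device context first; on a known meta-tensor
    failure, retry with real weights. (Hypothetical wrapper.)"""
    try:
        return build_fn(use_meta_device=True, **kwargs)
    except (NotImplementedError, RuntimeError) as err:
        if any(msg in str(err) for msg in _RETRYABLE_META_ERRORS):
            return build_fn(use_meta_device=False, **kwargs)
        raise


def fake_from_pretrained(use_meta_device):
    # Simulates transformers v5.5's tie_weights() hitting torch.equal()
    # on meta tensors under init_empty_weights().
    if use_meta_device:
        raise NotImplementedError(
            "aten::equal: attempted to run this operator with Meta tensors"
        )
    return "materialized model"
```

Matching on substrings rather than exception type avoids retrying unrelated failures while still catching the new v5.4+ message.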
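The point of the ^-anchored override can be illustrated with a minimal key remapper; the pattern strings below are hypothetical stand-ins, not the actual entries in _VLM_KEY_MAPPINGS:

```python
import re

# Hypothetical anchored patterns mapping llava-style checkpoint keys onto
# Gemma3's model.language_model.* hierarchy. Anchoring with ^ keeps a
# pattern from matching mid-key and producing wrong FQNs.
_GEMMA3_KEY_MAPPING = {
    r"^language_model\.model\.": "model.language_model.",
    r"^vision_tower\.": "model.vision_tower.",
}


def remap_key(key):
    """Rewrite a checkpoint key using the first matching anchored pattern."""
    for pattern, repl in _GEMMA3_KEY_MAPPING.items():
        if re.match(pattern, key):
            return re.sub(pattern, repl, key, count=1)
    return key
```

Keys that match no pattern pass through unchanged, so with strict=False a single wrong pattern silently drops weights, which is why the override has to be exact.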

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

@athitten athitten requested a review from a team as a code owner April 8, 2026 17:23
@copy-pr-bot

copy-pr-bot bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@athitten
Contributor Author

athitten commented Apr 8, 2026

/ok to test e4c7b30

@athitten athitten changed the title from "Update to transformers v5.5" to "ci: Update to transformers v5.5" Apr 8, 2026
@akoumpa
Contributor

akoumpa commented Apr 8, 2026

/ok to test 30ca0e9

@athitten
Contributor Author

athitten commented Apr 9, 2026

/ok to test 933ff7c

@athitten
Contributor Author

/ok to test 17c9a03

thomasdhc previously approved these changes Apr 14, 2026


- def _reinit_rope_buffers(model: nn.Module, device: torch.device) -> None:
+ def _reinit_non_persistent_buffers(model: nn.Module, device: torch.device) -> None:
Contributor


can we make this model specific?

i.e. run the function only on select models where it applies. I think it's well intended (catch as many cases as possible), but I also want to be very risk averse here, given that there's a wild west on the HF Hub :)

Contributor Author


Addressed in 07f775a

# that override the generic transformers conversion (e.g. transformers 5.5.0
# aliases gemma3→llava, but the llava mapping produces wrong FQNs for
# Gemma3's model.language_model.* hierarchy).
if model_type in _VLM_KEY_MAPPINGS:
Contributor


Note-to-self: I think this is ok for the time being, but long term, it might be a good idea to move all the model-specific patching under nemo_automodel/components/models/, so that each model has its own patches. No action is needed.

@athitten
Contributor Author

/ok to test 07f775a

@athitten
Contributor Author

/ok to test 3ea3ff2

akoumpa previously approved these changes Apr 14, 2026
Contributor

@akoumpa akoumpa left a comment


Thanks a lot @athitten !

Abhishree Thittenamane and others added 19 commits April 14, 2026 15:42
Signed-off-by: Abhishree Thittenamane <athittenaman@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
…ion_ids non-persistent buffer

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree Thittenamane <athittenaman@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@athitten
Contributor Author

/ok to test e1bf007

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa
Contributor

akoumpa commented Apr 15, 2026

/ok to test 2cd5ee6


Labels

r0.4.0 Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.


Development

Successfully merging this pull request may close these issues.

  • Upgrade to transformers v5.5
  • TypeError: Qwen3_5MoeGatedDeltaNet.forward() got an unexpected keyword argument 'cache_position'

3 participants