Conversation
Contributor
Author
/ok to test e4c7b30
Contributor
/ok to test 30ca0e9
Contributor
Author
/ok to test 933ff7c
Contributor
Author
/ok to test 17c9a03
thomasdhc
previously approved these changes
Apr 14, 2026
akoumpa
reviewed
Apr 14, 2026
def _reinit_rope_buffers(model: nn.Module, device: torch.device) -> None:
def _reinit_non_persistent_buffers(model: nn.Module, device: torch.device) -> None:
Contributor
Can we make this model-specific? I.e., run the function only on select models where it applies. I think it's well intended (catch as many cases as possible), but I also want to be very risk-averse here, given that there's a wild west on the HF Hub :)
akoumpa
reviewed
Apr 14, 2026
# that override the generic transformers conversion (e.g. transformers 5.5.0
# aliases gemma3→llava, but the llava mapping produces wrong FQNs for
# Gemma3's model.language_model.* hierarchy).
if model_type in _VLM_KEY_MAPPINGS:
Contributor
Note-to-self: I think this is ok for the time being, but long term, it might be a good idea to move all the model-specific patching under nemo_automodel/components/models/, so that each model has its own patches. No action is needed.
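The priority-override shape discussed here can be sketched as a small table of `^`-anchored patterns checked before any generic conversion. The patterns and the table below are illustrative stand-ins, not the real contents of `_VLM_KEY_MAPPINGS`; only the mechanism (anchored regexes, first match wins, unknown model types pass through) reflects the change described above.

```python
# Sketch of an explicit, ^-anchored key mapping that takes priority over
# the generic transformers conversion. Patterns are hypothetical.
import re

# model_type -> ordered list of (anchored pattern, replacement).
VLM_KEY_MAPPINGS = {
    "gemma3": [
        (re.compile(r"^model\.language_model\."), "language_model.model."),
        (re.compile(r"^model\.vision_tower\."), "vision_tower."),
    ],
}

def remap_keys(model_type, state_dict_keys):
    """Rewrite checkpoint FQNs; unknown model types pass through unchanged."""
    rules = VLM_KEY_MAPPINGS.get(model_type)
    if rules is None:
        return list(state_dict_keys)
    out = []
    for key in state_dict_keys:
        for pattern, replacement in rules:
            new_key, n = pattern.subn(replacement, key)
            if n:
                key = new_key
                break  # first matching rule wins
        out.append(key)
    return out
```

Anchoring with `^` is what prevents a generic alias from rewriting an unrelated substring deeper in the FQN, which is the failure mode the comment above describes.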
Contributor
Author
/ok to test 07f775a
Contributor
Author
/ok to test 3ea3ff2
Signed-off-by: Abhishree Thittenamane <athittenaman@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
…ion_ids non-persistent buffer
Contributor
Author
/ok to test e1bf007
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Contributor
/ok to test 2cd5ee6
akoumpa
approved these changes
Apr 15, 2026
What does this PR do ?
Upgrades to transformers v5.5 and updates mistral_common; both are required for Gemma4.
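One of the changes in this PR is extending the retry in `_build_model` so that a meta-device build failure falls back to a normal build. The pattern can be sketched as below; `build_with_retry`, `RETRYABLE_MESSAGES`, and the error text are hypothetical stand-ins, not the actual code in `auto_model.py`.

```python
# Sketch of the retry behavior: if building on the meta device raises a
# NotImplementedError whose message matches a known-retryable failure
# (e.g. torch.equal inside tie_weights() rejecting meta tensors),
# rebuild without the meta device instead of failing.

RETRYABLE_MESSAGES = ("aten::equal",)  # substrings that trigger the retry

def build_with_retry(build_fn):
    try:
        return build_fn(use_meta_device=True)
    except NotImplementedError as err:
        if not any(m in str(err) for m in RETRYABLE_MESSAGES):
            raise  # unrelated failure: propagate as-is
        return build_fn(use_meta_device=False)  # retry without meta device
```

Matching on the error message keeps the retry narrow: unrelated `NotImplementedError`s still propagate instead of being silently retried.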
Changelog
- `tests/unit_tests/models/deepseek_v3/test_dsv3_layers.py` — Updated mock configs to set `rope_parameters = rope_scaling` to match v5.5's `DeepseekV3Config` behavior.
- `nemo_automodel/components/models/gemma4_moe/model.py` — Added bidirectional compat between `expert_intermediate_size` (pre-v5.5) and `moe_intermediate_size` (v5.5 rename) in `Gemma4ForConditionalGeneration.__init__` and `Gemma4MoETextModelBackend.__init__`.
- `tests/unit_tests/models/gemma4/test_gemma4_model.py` — Updated test configs to use `moe_intermediate_size` instead of `expert_intermediate_size`.
- `tests/unit_tests/models/gemma4/test_gemma4_state_dict_adapter.py` — Updated the mock state dict to use the v5.5 key format (`experts.gate_up_proj`, `router.per_expert_scale`).
- `nemo_automodel/components/models/nemotron_parse/model.py` — Handle `MBartDecoderLayer.forward` returning a single tensor in v5.5 (it was a tuple in v5.3).
- `nemo_automodel/components/models/nemotron_v3/model.py` — Replace the pre-created `DynamicCache` from `GenerationMixin` with `NemotronHybridCache` in `prepare_inputs_for_generation`.
- `_reinit_non_persistent_buffers` in `nemo_automodel/components/checkpoint/checkpointing.py` — In v5.5, `Gemma3RotaryEmbedding`'s per-layer-type `inv_freq` buffers, `embed_scale`, and SigLIP `position_ids` buffers are non-persistent (not saved in the checkpoint), so they have to be initialized explicitly. This generalizes the previous method, which was specific to RoPE, and adds a corresponding unit test.
- `_build_model` in `nemo_automodel/_transformers/auto_model.py` — Retry building without the meta device. Transformers v5.5 added a `torch.equal()` call inside `tie_weights()` (line 2539 of `modeling_utils.py`) to compare tied parameter values. When `_build_model` wraps `from_pretrained` inside `init_empty_weights()` (meta-device context), the model parameters are meta tensors, and `torch.equal` doesn't support them, producing a `NotImplementedError`. Retry logic already exists in `_build_model`; this PR adds the error message so the retry also covers this case. (This is done in `NotImplementedError: aten::equal on meta tensors during multi-GPU model init with transformers >= 5.4.0` #1765.)
- `nemo_automodel/components/models/qwen3_5_moe/cp_linear_attn.py` — Removed the `cache_position` arg from `Qwen3_5MoeGatedDeltaNet.forward()`, as it's no longer supported in transformers v5.5.
- Gemma3 key mapping — In v5.5 the llava mapping produces wrong FQNs for Gemma3's `model.language_model.*` hierarchy, causing all base weights to be silently dropped by `set_model_state_dict(strict=False)`. Replaced the fallback mechanism in `get_combined_key_mapping` with an explicit priority override in `_VLM_KEY_MAPPINGS` that uses the correct `^`-anchored regex patterns for Gemma3's module structure.

Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information