### Describe the bug
Issue discovered while working on NVIDIA-NeMo/RL#2212
When loading an HF model with `tie_word_embeddings=True` (e.g., `Qwen/Qwen3-0.6B`) on multi-GPU, model initialization crashes with:

```
NotImplementedError: aten::equal: attempted to run this operator with Meta tensors,
but there was no fake impl or Meta kernel registered.
```
The crash occurs because `_build_model` wraps the entire `_init_model` call (including HF's `from_pretrained`) inside an `init_empty_weights()` context, i.e., on the meta device. This means that by the time HF's `_finalize_model_loading` calls `tie_weights(missing_keys=...)`, the model parameters are still meta tensors. Transformers v5.4.0 added a `torch.equal()` call inside `tie_weights` to compare tied parameter values (HF PR #44497), and `torch.equal` does not support meta tensors.
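The failing operation can be reproduced in isolation. A minimal sketch (torch only, independent of Transformers/NeMo) showing that `torch.equal` has no meta kernel:

```python
import torch

# Two data-less meta tensors, as produced inside init_empty_weights().
a = torch.empty(2, 3, device="meta")
b = torch.empty(2, 3, device="meta")

# torch.equal must read tensor values, but meta tensors carry only
# shape/dtype metadata, so the dispatcher has no kernel to run.
try:
    torch.equal(a, b)
    crashed = False
except NotImplementedError:
    crashed = True

print(crashed)  # → True
```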
### Call chain

```
_build_model (auto_model.py:359)
  with [no_init_weights(), init_empty_weights()]:   ← meta device context wraps everything
    _init_model (model_init.py:396)
      _from_pretrained_parent_class (auto_model.py:205)
        HF AutoModelForCausalLM.from_pretrained
          model.__init__()              ← meta tensors created here
          _load_pretrained_model()      ← weights loaded, but STILL META (inside init_empty_weights)
          _finalize_model_loading (modeling_utils.py:4290)
            tie_weights(missing_keys=...)
              torch.equal(source_param, target_param)  ← CRASH: meta tensors don't support this
```
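The chain above can be simulated without loading a real checkpoint. This sketch uses `torch.device("meta")` as an analogue of `init_empty_weights()`, with a simplified stand-in for HF's embedding/lm_head tying (not the actual `tie_weights` code):

```python
import torch
import torch.nn as nn

# Construct modules under a meta device context, mimicking how
# _build_model wraps from_pretrained inside init_empty_weights().
with torch.device("meta"):
    embed_tokens = nn.Embedding(8, 4)      # stand-in for input embeddings
    lm_head = nn.Linear(4, 8, bias=False)  # stand-in for the output head

# Tie the weights, as tie_weights(missing_keys=...) would.
lm_head.weight = embed_tokens.weight
assert lm_head.weight.device.type == "meta"  # parameters are STILL meta here

# The value comparison added in tie_weights then fails,
# because equality needs tensor data that meta tensors do not have.
try:
    torch.equal(embed_tokens.weight, lm_head.weight)
    crashed = False
except NotImplementedError:
    crashed = True
print(crashed)
```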
### Steps/Code to reproduce bug

Run the existing `qwen3_0p6b_hellaswag.yaml` SFT recipe on multiple GPUs:

```
automodel examples/llm_finetune/qwen/qwen3_0p6b_hellaswag.yaml --nproc-per-node 2
```
The impact appears to be broad: any model with `tie_word_embeddings=True` in its `config.json` will trigger this crash when loaded via the HF fallback path (i.e., not a custom-registered model) on multi-GPU.
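Whether a given checkpoint is affected can be checked from its `config.json` before loading. A small sketch (the inline dict is an illustrative config fragment, not copied from the real Qwen3 file):

```python
import json

# Illustrative config.json fragment (assumption: simplified example).
config = json.loads('{"model_type": "qwen3", "tie_word_embeddings": true}')

# Models that tie input embeddings to lm_head hit the tie_weights path.
affected = config.get("tie_word_embeddings", False)
print(affected)  # → True
```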
### Additional context

Error log from my repro with automodel SFT on dfw:
`/lustre/fsw/portfolios/coreai/users/shuangy/src/NeMo-RL/nemo-rl/meta_tensor_issue_reproduce.log`