Rename MambaModel/MambaStack to HybridModel/HybridStack #4099
Phlip79 merged 20 commits into NVIDIA:main from
Conversation
This PR has been automatically converted to draft because all PRs must start as drafts. When you are ready for review, click Ready for Review to begin the review process.

See the contribution guide for more details.
/ok to test 360b582

/ok to test c7dec8a

/claude review

/claude review
Clean rename with good backward-compatible re-exports at the old import paths. One issue: the MambaModel = HybridModel alias doesn't cover the renamed mamba_stack_spec → hybrid_stack_spec keyword parameter in __init__, so existing callers using MambaModel(mamba_stack_spec=...) will break with a TypeError. See inline comment for a suggested fix.
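The gap flagged above can be closed with a thin subclass that maps the old keyword onto the new one instead of a bare alias. A minimal sketch (class bodies simplified; the real constructors take many more arguments, and raising TypeError when both kwargs are passed is an assumption, not the PR's exact behavior):

```python
import warnings


class HybridModel:
    """Stand-in for the renamed class; the real __init__ takes many more args."""

    def __init__(self, hybrid_stack_spec=None, **kwargs):
        self.hybrid_stack_spec = hybrid_stack_spec


class MambaModel(HybridModel):
    """Backward-compat shim: accepts the deprecated mamba_stack_spec kwarg."""

    def __init__(self, *args, mamba_stack_spec=None, **kwargs):
        if mamba_stack_spec is not None:
            if kwargs.get("hybrid_stack_spec") is not None:
                raise TypeError(
                    "pass either mamba_stack_spec or hybrid_stack_spec, not both"
                )
            warnings.warn(
                "mamba_stack_spec is deprecated; use hybrid_stack_spec",
                DeprecationWarning,
                stacklevel=2,
            )
            # Forward the old kwarg to the new parameter name.
            kwargs["hybrid_stack_spec"] = mamba_stack_spec
        super().__init__(*args, **kwargs)
```

With this shape, `MambaModel(mamba_stack_spec=spec)` keeps working (with a DeprecationWarning) while new code uses `HybridModel(hybrid_stack_spec=spec)`.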
/ok to test f115ba7
ko3n1g left a comment
Has this been cross-tested with MBridge?
I just re-kicked off MBridge testing (previously failing due to a linting error on the MBridge side). All changes in this PR are backwards compatible.
ko3n1g left a comment
OK, I see you've got an eye on testing & backwards compat. I'll lift my block.
…brid

# Conflicts:
#	megatron/core/inference/contexts/dynamic_context.py
#	megatron/core/models/mamba/mamba_layer_specs.py
#	megatron/core/models/mamba/mamba_model.py
#	megatron/core/ssm/mamba_block.py
#	megatron/core/ssm/mamba_hybrid_layer_allocation.py
Test classes named after MambaModel/MambaStack/MambaStackSubmodules are renamed to match the new Hybrid class names:

- TestMambaModel -> TestHybridModel
- TestMambaQKLayernorm -> TestHybridQKLayernorm
- TestMambaWithDynamicInference -> TestHybridWithDynamicInference
- TestMambaMoEModel -> TestHybridMoEModel
- TestMambaBlock -> TestHybridBlock
- TestModelOptMambaModel -> TestModelOptHybridModel
- TestMultiTokenPredictionMamba -> TestMultiTokenPredictionHybrid
- TestParallelMambaBlockCudagraphs -> TestParallelHybridBlockCudagraphs

Also updates the TestHybridQKLayernorm class (added by an upstream merge) to use HybridModel/hybrid_stack_spec consistently. Test classes for Mamba-specific SSM components (MambaLayer, MambaMixer, MambaContextParallel, MambaMetadata, MambaSlotAllocator, etc.) are unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ok to test b88d93b
This stale file was missed in the original directory rename. The canonical location is now modelopt/hybrid/model_specs.py. This stub re-exports from the new canonical location to preserve backward compatibility with any external imports from the old path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
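The re-export stub pattern this commit describes can be demonstrated self-containedly with synthetic modules. A sketch under illustrative names (the module paths below other than modelopt/hybrid/model_specs.py, and the spec function's return value, are assumptions for the demo; the real modules live in the Megatron-LM tree):

```python
import sys
import types

# Simulate the canonical module at its new location.
canonical = types.ModuleType("modelopt.hybrid.model_specs")
canonical.get_hybrid_stack_modelopt_spec = lambda: "spec"

# Register parent packages so dotted imports resolve.
for name in ("modelopt", "modelopt.hybrid", "modelopt.mamba"):
    sys.modules.setdefault(name, types.ModuleType(name))
sys.modules["modelopt.hybrid.model_specs"] = canonical

# The stub at the old path simply re-exports the canonical names, so stale
# `from <old path> import ...` statements keep resolving to the same objects.
stub = types.ModuleType("modelopt.mamba.model_specs")
stub.get_hybrid_stack_modelopt_spec = canonical.get_hybrid_stack_modelopt_spec
sys.modules["modelopt.mamba.model_specs"] = stub

# An import from the old path still works and yields the canonical function.
from modelopt.mamba.model_specs import get_hybrid_stack_modelopt_spec
```

In the actual repository the stub is just a file at the old path containing `from <new path> import *`; the `sys.modules` juggling here only exists to make the example runnable without the repo.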
/ok to test 3375b10
MultiTokenPredictionLayer and MultiTokenPredictionBlock now accept mamba_submodules as a deprecated alias for hybrid_submodules, emitting a DeprecationWarning and forwarding the value (raises if both are set). Also drops the redundant explicit HybridModel import in mamba_model.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
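The deprecation shim this commit describes can be sketched as follows. The constructor is heavily simplified (the real class takes a layer spec, config, and more), and the choice of ValueError for the both-set case is an assumption since the commit message only says it "raises":

```python
import warnings


class MultiTokenPredictionLayer:
    """Sketch of the kwarg deprecation shim; real signature has more params."""

    def __init__(self, hybrid_submodules=None, mamba_submodules=None):
        if mamba_submodules is not None:
            if hybrid_submodules is not None:
                raise ValueError(
                    "pass hybrid_submodules or mamba_submodules, not both"
                )
            warnings.warn(
                "mamba_submodules is deprecated; use hybrid_submodules",
                DeprecationWarning,
                stacklevel=2,
            )
            # Forward the deprecated alias to the new parameter.
            hybrid_submodules = mamba_submodules
        self.hybrid_submodules = hybrid_submodules
```

Old call sites like `MultiTokenPredictionLayer(mamba_submodules=subs)` keep working with a warning, while passing both names fails loudly instead of silently picking one.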
/ok to test 556eaa5
The test_dsa_layer_types test was added to test_hybrid_block.py by an upstream merge and used bare references to mamba_stack_spec and MambaStack (not via the backward-compat import). The file imports hybrid_stack_spec and HybridStack, so those bare references raised NameError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test 5202abc
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/24598437932
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/24637957751
* origin/main: (286 commits)
  Rename MambaModel/MambaStack to HybridModel/HybridStack (NVIDIA#4099)
  Fix Megatron initialization with extra_args_provider (NVIDIA#4327)
  Fix RL to once again work with --skip-train (NVIDIA#4249)
  Add activation logging and tokens per expert logging (NVIDIA#3842)
  Make param_index_map always use unpacked (full numel) offsets (NVIDIA#4328)
  FA4 Inference (NVIDIA#4186)
  Fix RL reward due to stop token (NVIDIA#4096)
  cp: Fix UT timeout (NVIDIA#4310) (NVIDIA#4373)
  feat(ckpt): add --async-ckpt-use-cpu-shm argument (NVIDIA#4355)
  Update copy-pr-bot.yaml [skip ci]
  Docs: improve docstrings and comments in example training loop (NVIDIA#4041)
  Add QK layernorm support for dot-product attention in MambaModel (NVIDIA#4067)
  Fix bug with non-partial rollouts (NVIDIA#3964)
  [docs] ci: use parent-relative json_url for version picker (NVIDIA#4367)
  Add tables and histogram for RL staleness (NVIDIA#4097)
  Port DeepSeek Sparse Attention to `MambaModel` (NVIDIA#3553)
  docs: bump versions1.json to 0.17.0 (latest) (NVIDIA#4360)
  Fix potential coredump issue that occurs when saving a checkpoint (NVIDIA#1871)
  ci(gb200): add 1-node mr-github functional test variants (NVIDIA#4334)
  fix: wait for async P2P send before deallocating output tensor (NVIDIA#4047)
  ...

# Conflicts:
#	megatron/core/transformer/cuda_graphs.py
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Rename the generic model classes in `megatron/core/` that support multiple layer types (Mamba SSM, Attention, MoE, GDN, MLP) via `hybrid_layer_pattern`:

- `MambaModel` → `HybridModel`, `MambaStack` → `HybridStack`, `MambaStackSubmodules` → `HybridStackSubmodules`
- `mamba_stack_spec` → `hybrid_stack_spec`, `mamba_inference_stack_spec` → `hybrid_inference_stack_spec`
- `get_mamba_stack_modelopt_spec` → `get_hybrid_stack_modelopt_spec`
- Files moved to `megatron/core/models/hybrid/` (`hybrid_model.py`, `hybrid_block.py`, `hybrid_layer_specs.py`, `hybrid_layer_allocation.py`)
- Backward-compatible re-exports remain at the old import paths (`megatron.core.models.mamba`, `megatron.core.ssm.mamba_block`, etc.)
- `MambaModel` is a thin subclass of `HybridModel` that accepts the deprecated `mamba_stack_spec` kwarg
- Mamba-specific SSM components (`MambaLayer`, `MambaMixer`, `MambaContextParallel`, etc.) unchanged
- `megatron/core/models/hybrid/__init__.py` is intentionally empty to avoid a circular import with `megatron.core`

This PR only touches `megatron/core/`. Non-core renames (scripts, tools, tests, examples) are in #4159.

Testing
Functional tests