
Rename MambaModel/MambaStack to HybridModel/HybridStack #4099

Merged
Phlip79 merged 20 commits into NVIDIA:main from Phlip79:philip/rename-to-hybrid
Apr 19, 2026

Conversation

Phlip79 (Member) commented Apr 1, 2026

Summary

Rename the generic model classes in megatron/core/ that support multiple layer types (Mamba SSM, Attention, MoE, GDN, MLP) via hybrid_layer_pattern:

  • MambaModel → HybridModel, MambaStack → HybridStack, MambaStackSubmodules → HybridStackSubmodules
  • mamba_stack_spec → hybrid_stack_spec, mamba_inference_stack_spec → hybrid_inference_stack_spec
  • get_mamba_stack_modelopt_spec → get_hybrid_stack_modelopt_spec
  • Move canonical files to megatron/core/models/hybrid/ (hybrid_model.py, hybrid_block.py, hybrid_layer_specs.py, hybrid_layer_allocation.py)
  • Backward-compatible re-export stubs at old import paths (megatron.core.models.mamba, megatron.core.ssm.mamba_block, etc.)
  • MambaModel is a thin subclass of HybridModel that accepts the deprecated mamba_stack_spec kwarg
  • Mamba-specific SSM classes (MambaLayer, MambaMixer, MambaContextParallel, etc.) unchanged
  • megatron/core/models/hybrid/__init__.py is intentionally empty to avoid circular import with megatron.core

This PR only touches megatron/core/. Non-core renames (scripts, tools, tests, examples) are in #4159.
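The backward-compatible re-export stubs described above can be sketched roughly as follows. This is a toy, self-contained illustration of the pattern, not the PR's actual code: the `megatron_demo` package name and the stand-in classes are hypothetical.

```python
import sys
import types

# Stand-in for the canonical module after the rename. In the PR the real
# canonical location is megatron/core/models/hybrid/hybrid_model.py.
canonical = types.ModuleType("megatron_demo.hybrid.hybrid_model")

class HybridModel:
    """Toy stand-in for the renamed model class."""

canonical.HybridModel = HybridModel
sys.modules[canonical.__name__] = canonical

# The stub left at the old path does nothing but re-export the new name,
# so existing imports keep resolving to the very same class object.
stub = types.ModuleType("megatron_demo.mamba.mamba_model")
stub.MambaModel = canonical.HybridModel
sys.modules[stub.__name__] = stub

# Old-style import still works (sys.modules is consulted before the
# import machinery searches the filesystem):
from megatron_demo.mamba.mamba_model import MambaModel
```

In the real tree the stub file would simply contain a `from megatron.core.models.hybrid.hybrid_model import ...` line rather than this in-memory module registration, but the effect for callers is the same.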

Testing

Functional tests

@Phlip79 Phlip79 requested review from a team as code owners April 1, 2026 21:30
copy-pr-bot (Bot) commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions (Bot, Contributor) commented Apr 1, 2026

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft April 1, 2026 21:30
Phlip79 (Member, Author) commented Apr 1, 2026

/ok to test 360b582

@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Apr 1, 2026
Phlip79 (Member, Author) commented Apr 1, 2026

/ok to test c7dec8a

Phlip79 (Member, Author) commented Apr 2, 2026

/claude review

Review threads: tests/unit_tests/inference/engines/test_hybrid_prefix_caching_e2e.py (outdated), pretrain_hybrid.py, megatron/core/models/hybrid/hybrid_model.py (outdated), megatron/core/models/hybrid/hybrid_block.py
claude (Bot, Contributor) left a comment

Clean mechanical rename with proper backward-compatible aliases and re-exports. Left a few minor nits on stale references to 'MambaBlock' in comments and a '2026-2026' copyright typo, but nothing blocking.

Phlip79 (Member, Author) commented Apr 2, 2026

/claude review

Review thread: megatron/core/models/hybrid/hybrid_model.py (outdated)

claude (Bot, Contributor) left a comment

Clean rename with good backward-compatible re-exports at the old import paths. One issue: the MambaModel = HybridModel alias doesn't cover the renamed mamba_stack_spec → hybrid_stack_spec keyword parameter in __init__, so existing callers using MambaModel(mamba_stack_spec=...) will break with a TypeError. See inline comment for a suggested fix.
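The kind of fix this review is asking for, and the "thin subclass" the PR summary describes, amount to the following pattern. This is a hedged, self-contained sketch: the real `__init__` signatures in Megatron-LM take many more arguments, and the class bodies here are toy stand-ins.

```python
import warnings

class HybridModel:
    """Toy stand-in for the renamed class (hypothetical simplified signature)."""
    def __init__(self, hybrid_stack_spec=None):
        self.hybrid_stack_spec = hybrid_stack_spec

class MambaModel(HybridModel):
    """Thin backward-compat subclass. Unlike a bare `MambaModel = HybridModel`
    alias, it accepts the deprecated mamba_stack_spec kwarg and forwards it,
    so old callers no longer hit a TypeError."""
    def __init__(self, *args, mamba_stack_spec=None, **kwargs):
        if mamba_stack_spec is not None:
            warnings.warn(
                "mamba_stack_spec is deprecated; use hybrid_stack_spec",
                DeprecationWarning,
                stacklevel=2,
            )
            kwargs.setdefault("hybrid_stack_spec", mamba_stack_spec)
        super().__init__(*args, **kwargs)
```

With this shim, `MambaModel(mamba_stack_spec=spec)` keeps working while emitting a DeprecationWarning, and `isinstance(model, HybridModel)` checks still pass.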

Review threads: megatron/core/models/hybrid/hybrid_layer_specs.py, megatron/post_training/arguments.py, megatron/inference/utils.py (outdated), megatron/core/ssm/mamba_hybrid_layer_allocation.py
Phlip79 (Member, Author) commented Apr 2, 2026

/ok to test f115ba7

@Phlip79 Phlip79 marked this pull request as ready for review April 3, 2026 02:07
@Phlip79 Phlip79 requested a review from a team as a code owner April 3, 2026 02:07
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the "Approved" ("All necessary approvals have been made") label Apr 16, 2026
ko3n1g (Contributor) left a comment

Has this been cross-tested with MBridge?

Phlip79 (Member, Author) commented Apr 16, 2026

> Has this been cross-tested with MBridge?

I just re-kicked off MBridge testing (previously failing due to linting error on MBridge side). All changes in this PR are backwards compatible.

ko3n1g (Contributor) left a comment

OK, I see you've got an eye on testing & backwards compatibility. I'll lift my block.

Phlip79 and others added 2 commits April 18, 2026 00:39
…brid

# Conflicts:
#	megatron/core/inference/contexts/dynamic_context.py
#	megatron/core/models/mamba/mamba_layer_specs.py
#	megatron/core/models/mamba/mamba_model.py
#	megatron/core/ssm/mamba_block.py
#	megatron/core/ssm/mamba_hybrid_layer_allocation.py
Test classes named after MambaModel/MambaStack/MambaStackSubmodules
are renamed to match the new Hybrid class names:
- TestMambaModel -> TestHybridModel
- TestMambaQKLayernorm -> TestHybridQKLayernorm
- TestMambaWithDynamicInference -> TestHybridWithDynamicInference
- TestMambaMoEModel -> TestHybridMoEModel
- TestMambaBlock -> TestHybridBlock
- TestModelOptMambaModel -> TestModelOptHybridModel
- TestMultiTokenPredictionMamba -> TestMultiTokenPredictionHybrid
- TestParallelMambaBlockCudagraphs -> TestParallelHybridBlockCudagraphs

Also updates the TestHybridQKLayernorm class (added by upstream merge)
to use HybridModel/hybrid_stack_spec consistently.

Test classes for Mamba-specific SSM components (MambaLayer, MambaMixer,
MambaContextParallel, MambaMetadata, MambaSlotAllocator, etc.) are
unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 requested a review from a team as a code owner April 18, 2026 00:50
Phlip79 (Member, Author) commented Apr 18, 2026

/ok to test b88d93b

This stale file was missed in the original directory rename. The
canonical location is now modelopt/hybrid/model_specs.py. This stub
re-exports from the new canonical location to preserve backward
compatibility with any external imports from the old path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phlip79 (Member, Author) commented Apr 18, 2026

/ok to test 3375b10

MultiTokenPredictionLayer and MultiTokenPredictionBlock now accept
mamba_submodules as a deprecated alias for hybrid_submodules, emitting
DeprecationWarning and forwarding the value (raises if both are set).
Also drops the redundant explicit HybridModel import in mamba_model.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
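The deprecated-alias behavior this commit describes can be sketched as follows. The class name and signature below are toy stand-ins (the real MultiTokenPredictionLayer takes many more arguments); only the kwarg-forwarding pattern is the point.

```python
import warnings

class MTPLayerDemo:
    """Toy stand-in for MultiTokenPredictionLayer's kwarg handling
    (hypothetical simplified signature)."""
    def __init__(self, hybrid_submodules=None, mamba_submodules=None):
        if mamba_submodules is not None:
            if hybrid_submodules is not None:
                # Refuse ambiguous calls that set both the old and new name.
                raise ValueError(
                    "pass hybrid_submodules or mamba_submodules, not both"
                )
            warnings.warn(
                "mamba_submodules is deprecated; use hybrid_submodules",
                DeprecationWarning,
                stacklevel=2,
            )
            hybrid_submodules = mamba_submodules  # forward the old value
        self.hybrid_submodules = hybrid_submodules
```

Raising when both names are supplied avoids silently preferring one value over the other, which is the usual choice for this kind of rename shim.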
Phlip79 (Member, Author) commented Apr 18, 2026

/ok to test 556eaa5

The test_dsa_layer_types test was added to test_hybrid_block.py by an
upstream merge and used bare references to mamba_stack_spec and
MambaStack (not via the backward-compat import). The file imports
hybrid_stack_spec and HybridStack, so those bare references failed with NameError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phlip79 (Member, Author) commented Apr 18, 2026

/ok to test 5202abc

@Phlip79 Phlip79 enabled auto-merge April 18, 2026 05:23
@Phlip79 Phlip79 added this pull request to the merge queue Apr 18, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/24598437932

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 18, 2026
@Phlip79 Phlip79 added this pull request to the merge queue Apr 19, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/24637957751

Merged via the queue into NVIDIA:main with commit 15e07a2 Apr 19, 2026
67 of 68 checks passed
@Phlip79 Phlip79 deleted the philip/rename-to-hybrid branch April 19, 2026 20:37
Victarry added a commit to yanring/Megatron-LM that referenced this pull request Apr 20, 2026
* origin/main: (286 commits)
  Rename MambaModel/MambaStack to HybridModel/HybridStack (NVIDIA#4099)
  Fix Megatron initialization with extra_args_provider (NVIDIA#4327)
  Fix RL to once again work with --skip-train (NVIDIA#4249)
  Add activation logging and tokens per expert logging (NVIDIA#3842)
  Make param_index_map always use unpacked (full numel) offsets (NVIDIA#4328)
  FA4 Inference (NVIDIA#4186)
  Fix RL reward due to stop token (NVIDIA#4096)
  cp: Fix UT timeout (NVIDIA#4310) (NVIDIA#4373)
  feat(ckpt): add --async-ckpt-use-cpu-shm argument (NVIDIA#4355)
  Update copy-pr-bot.yaml [skip ci]
  Docs: improve docstrings and comments in example training loop (NVIDIA#4041)
  Add QK layernorm support for dot-product attention in MambaModel (NVIDIA#4067)
  Fix bug with non-partial rollouts (NVIDIA#3964)
  [docs] ci: use parent-relative json_url for version picker (NVIDIA#4367)
  Add tables and histogram for RL staleness (NVIDIA#4097)
  Port DeepSeek Sparse Attention to `MambaModel` (NVIDIA#3553)
  docs: bump versions1.json to 0.17.0 (latest) (NVIDIA#4360)
  Fix potential coredump issue that occurs when saving a checkpoint (NVIDIA#1871)
  ci(gb200): add 1-node mr-github functional test variants (NVIDIA#4334)
  fix: wait for async P2P send before deallocating output tensor (NVIDIA#4047)
  ...

# Conflicts:
#	megatron/core/transformer/cuda_graphs.py
santhnm2 pushed a commit to santhnm2/Megatron-LM that referenced this pull request Apr 20, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 mentioned this pull request Apr 22, 2026

Labels

Approved (All necessary approvals have been made), complexity: high
