
Rename Mamba to Hybrid outside megatron/core #4159

Open
Phlip79 wants to merge 37 commits into NVIDIA:main from Phlip79:philip/rename-non-core

Conversation

@Phlip79
Member

@Phlip79 Phlip79 commented Apr 6, 2026

Summary

Follow-up to #4099 (now merged). This is the second and final rename PR.

Updates files outside megatron/core/ to use the new HybridModel / HybridStack / HybridStackSubmodules names introduced in #4099. Ran functional tests.

Renames

  • pretrain_mamba.py -> pretrain_hybrid.py (backward-compat wrapper kept at old path with deprecation warning)
  • mamba_builders.py -> hybrid_builders.py (backward-compat stub kept at old path with deprecation warning)
  • mamba_builder() -> hybrid_builder()
  • modelopt_gpt_mamba_builder() -> modelopt_gpt_hybrid_builder()
  • tools/run_mamba_text_generation_server*.py -> tools/run_hybrid_text_generation_server*.py
  • Test files: test_mamba_model.py -> test_hybrid_model.py, test_mamba_block.py -> test_hybrid_block.py, test_mamba_moe_model.py -> test_hybrid_moe_model.py, test_mamba_model_expert_parallel_inference.py -> test_hybrid_*, test_mamba_prefix_caching_e2e.py -> test_hybrid_*
  • Test classes: TestMambaModel -> TestHybridModel, TestMambaQKLayernorm -> TestHybridQKLayernorm, TestMambaWithDynamicInference -> TestHybridWithDynamicInference, TestMambaMoEModel -> TestHybridMoEModel, TestMambaBlock -> TestHybridBlock, TestModelOptMambaModel -> TestModelOptHybridModel, TestMultiTokenPredictionMamba -> TestMultiTokenPredictionHybrid, TestParallelMambaBlockCudagraphs -> TestParallelHybridBlockCudagraphs
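The backward-compat shims described above follow a common deprecation pattern. The sketch below is a hypothetical illustration of how the hybrid_builders stub might keep the old names working; the builder bodies and return values are stand-ins, not the actual repo code.

```python
import warnings

# Stand-ins for the renamed builders; in the real repo these live in
# hybrid_builders.py, and their bodies here are hypothetical.
def hybrid_builder():
    return "HybridModel"

def modelopt_gpt_hybrid_builder():
    return "ModelOptHybridModel"

def _deprecated_alias(new_fn, old_name):
    """Wrap a renamed function so calls via the old name emit a warning."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_fn.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_fn(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper

# Old names keep working, but warn on use.
mamba_builder = _deprecated_alias(hybrid_builder, "mamba_builder")
modelopt_gpt_mamba_builder = _deprecated_alias(
    modelopt_gpt_hybrid_builder, "modelopt_gpt_mamba_builder")
```

Routing the aliases through a wrapper (rather than a bare `old = new` assignment) means the warning fires at call time, which points external callers at the exact deprecated use.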

CLI updates

  • --spec paths updated to megatron.core.models.hybrid.hybrid_layer_specs hybrid_stack_spec in shell scripts, YAML configs, and recipes
  • --export-model-type and --model-provider use HybridModel/hybrid with backward-compat acceptance of old MambaModel/mamba values (with deprecation warnings)
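Accepting the old CLI values with a deprecation warning could be handled by normalizing in the argparse `type` callback, which runs before the `choices` check. This is a sketch: the flag name matches the PR, but the choices list and helper name are assumptions, not the actual arguments.py code.

```python
import argparse
import warnings

# Old CLI value -> renamed value, per the PR description.
_RENAMED = {"MambaModel": "HybridModel", "mamba": "hybrid"}

def normalize_model_type(value):
    """Accept deprecated Mamba values, warning and mapping them forward."""
    if value in _RENAMED:
        warnings.warn(
            f"value {value!r} is deprecated; use {_RENAMED[value]!r} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return _RENAMED[value]
    return value

parser = argparse.ArgumentParser()
# argparse validates `choices` *after* applying `type`, so old values
# normalize before validation; this choices list is an assumption.
parser.add_argument(
    "--export-model-type",
    type=normalize_model_type,
    choices=["GPTModel", "HybridModel"],
)
```

With this shape, `--export-model-type MambaModel` parses to `HybridModel` and only the new names need to appear in `choices`.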

Backward compatibility

All old import paths, class names, function names, and CLI values continue to work via backward-compat stubs and aliases. External libraries depending on this repo are unaffected.

Not renamed (intentional)

  • examples/mamba/ directory (architecture-named, like examples/llama/, examples/gpt3/)
  • tests/test_utils/recipes/h100/mamba*.yaml (architecture-named)
  • tests/unit_tests/dist_checkpointing/models/test_mamba.py (tests MambaMixer specifically)
  • All Mamba SSM component names (MambaLayer, MambaMixer, MambaContextParallel, MambaInferenceStateConfig, MambaMetadata, MambaSlotAllocator, MambaTokenizer, --mamba-* CLI args, SSM algorithm config params)

Rename the generic model classes that support multiple layer types
(Mamba SSM, Attention, MoE, GDN, MLP) via hybrid_layer_pattern:
- MambaModel -> HybridModel
- MambaStack -> HybridStack
- MambaStackSubmodules -> HybridStackSubmodules
- mamba_stack_spec -> hybrid_stack_spec
- mamba_inference_stack_spec -> hybrid_inference_stack_spec
- get_mamba_stack_modelopt_spec -> get_hybrid_stack_modelopt_spec

Move canonical files to megatron/core/models/hybrid/:
- hybrid_model.py, hybrid_block.py, hybrid_layer_specs.py,
  hybrid_layer_allocation.py

Backward-compatible re-export stubs at old import paths and class
aliases (MambaModel is a thin subclass accepting mamba_stack_spec kwarg).
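The thin-subclass alias could look roughly like this (a sketch only: the real HybridModel constructor takes many more arguments, and the kwarg translation shown is the essential point):

```python
import warnings

class HybridModel:
    """Canonical class (constructor heavily simplified for this sketch)."""
    def __init__(self, hybrid_stack_spec=None, **kwargs):
        self.hybrid_stack_spec = hybrid_stack_spec

class MambaModel(HybridModel):
    """Deprecated alias: forwards the old mamba_stack_spec kwarg."""
    def __init__(self, *args, mamba_stack_spec=None, **kwargs):
        warnings.warn(
            "MambaModel is deprecated; use HybridModel instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if mamba_stack_spec is not None:
            # Translate the old kwarg to its renamed equivalent.
            kwargs.setdefault("hybrid_stack_spec", mamba_stack_spec)
        super().__init__(*args, **kwargs)
```

Because the alias is a real subclass, existing `isinstance(model, MambaModel)` checks and old-style construction both keep working while new code sees only HybridModel.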

Mamba-specific SSM classes (MambaLayer, MambaMixer, etc.) unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Phlip79 and others added 3 commits April 6, 2026 17:29
Remove redundant explicit named imports that duplicate the wildcard
import and add pylint disable comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Include the test files that exercise the renamed megatron/core classes
(HybridModel, HybridStack, hybrid_layer_allocation, etc.) and update
their imports to use the new canonical paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates all files outside megatron/core/ to use the new Hybrid names
introduced in the core rename PR. Depends on the core changes which
provide backward-compat stubs at old import paths.

File renames:
- mamba_builders.py -> hybrid_builders.py
- pretrain_mamba.py -> pretrain_hybrid.py
- tools/run_mamba_text_generation_server*.py -> run_hybrid_*
- test_mamba_model.py -> test_hybrid_model.py (and similar test renames)

Function renames:
- mamba_builder() -> hybrid_builder()
- modelopt_gpt_mamba_builder() -> modelopt_gpt_hybrid_builder()

Also updates --spec paths, --export-model-type, --model-provider,
deprecation warnings, and documentation.

Model-named files stay as-is: examples/mamba/, recipes/h100/mamba*.yaml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 force-pushed the philip/rename-non-core branch from 3c8b1c9 to d6a3854 Compare April 6, 2026 18:34
Phlip79 and others added 18 commits April 6, 2026 18:37
Tests under tests/unit_tests/inference/ and tests/unit_tests/resharding/
test the inference engine and resharding infrastructure, not megatron/core
classes. They belong in the companion non-core PR (NVIDIA#4159).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.claude/skills/respond-to-issue/SKILL.md is not part of this rename.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These test files were accidentally dropped during the core/non-core
split. They test inference engines and resharding (not megatron/core
classes), so they belong in this non-core PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backward-compat MambaModel wrapper belongs in the legacy mamba
module, not in the canonical hybrid_model.py. This keeps hybrid_model.py
clean with only HybridModel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…brid

# Conflicts:
#	megatron/core/models/mamba/mamba_model.py
This parameter passes HybridStackSubmodules (formerly
MambaStackSubmodules), so its name should match the renamed class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	tests/unit_tests/inference/engines/test_dynamic_engine.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@ChenhanYu ChenhanYu left a comment


Verified all post_training and modelopt references are updated: model_builder.py (import path, function rename, model type check with deprecation warning, backward alias), arguments.py (HybridModel added to choices), train.sh (matches both HybridModel and MambaModel), all 7 Nemotron-H config scripts, and all 9 example Python scripts. Backward compat maintained via aliases and deprecation warnings throughout. LGTM — depends on #4099.

Phlip79 and others added 4 commits April 18, 2026 00:39
…brid

# Conflicts:
#	megatron/core/inference/contexts/dynamic_context.py
#	megatron/core/models/mamba/mamba_layer_specs.py
#	megatron/core/models/mamba/mamba_model.py
#	megatron/core/ssm/mamba_block.py
#	megatron/core/ssm/mamba_hybrid_layer_allocation.py
Test classes named after MambaModel/MambaStack/MambaStackSubmodules
are renamed to match the new Hybrid class names:
- TestMambaModel -> TestHybridModel
- TestMambaQKLayernorm -> TestHybridQKLayernorm
- TestMambaWithDynamicInference -> TestHybridWithDynamicInference
- TestMambaMoEModel -> TestHybridMoEModel
- TestMambaBlock -> TestHybridBlock
- TestModelOptMambaModel -> TestModelOptHybridModel
- TestMultiTokenPredictionMamba -> TestMultiTokenPredictionHybrid
- TestParallelMambaBlockCudagraphs -> TestParallelHybridBlockCudagraphs

Also updates the TestHybridQKLayernorm class (added by upstream merge)
to use HybridModel/hybrid_stack_spec consistently.

Test classes for Mamba-specific SSM components (MambaLayer, MambaMixer,
MambaContextParallel, MambaMetadata, MambaSlotAllocator, etc.) are
unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This stale file was missed in the original directory rename. The
canonical location is now modelopt/hybrid/model_specs.py. This stub
re-exports from the new canonical location to preserve backward
compatibility with any external imports from the old path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 marked this pull request as ready for review April 19, 2026 23:16
@Phlip79 Phlip79 requested review from a team as code owners April 19, 2026 23:16
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team April 19, 2026 23:16
@Phlip79 Phlip79 removed the request for review from a team April 19, 2026 23:17
Phlip79 and others added 5 commits April 20, 2026 23:02
The rename commit (d6a3854) and subsequent merges accidentally reverted
unrelated upstream changes. Restore upstream versions of:

- megatron/training/arguments.py: restore 'adaptive_muon' optimizer choice
  and --optimizer-cuda-graph argument; remove stray
  --no-scatter-gather-tensors-in-pipeline
- megatron/training/training.py: restore OptimizerCudaGraphWrapper import,
  save_checkpoint_and_time() usage, and timer logic from upstream
- megatron/rl/sequence_packing_utils.py: restore removal of unused
  packed_attention_mask parameter (upstream NVIDIA#3859)
- .github/actions/action.yml: restore retry loop around uv install
  (upstream NVIDIA#4387)

Re-applied only the intended rename-related import path changes to
arguments.py (2 lines) and training.py (2 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- tests/test_utils/python_scripts/notify.py: restore WEBHOOK_URL check
  and copyright header from upstream
- tests/unit_tests/rl/test_sequence_packing_utils.py: restore removal
  of packed_attention_mask parameter (companion to the
  sequence_packing_utils.py revert)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test config

Upstream removed these flags from
hybrid_mr_mcore_te_tp2_pp1_cp1_dgx_a100_1N8G/model_config.yaml in commit
2697b82 ("base strategy simplification NVIDIA#4001"), but they were
accidentally re-introduced by a merge conflict resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test files don't need to accept the old "mamba" CLI value since the
backward-compat shim is only for external library consumers. Simplify
the defensive ("hybrid", "mamba") checks to just "hybrid".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Phlip79
Member Author

Phlip79 commented Apr 21, 2026

/ok to test 30606da

@Phlip79
Member Author

Phlip79 commented Apr 21, 2026

Thank you @jaredcasper for catching those errors. All should be resolved now.

@Phlip79 Phlip79 requested a review from jaredcasper April 21, 2026 00:33
Phlip79 added a commit to Phlip79/Megatron-LM that referenced this pull request Apr 22, 2026
Tighten the nightly sync workflow prompt and skill so the bot cannot
mark the PR ready while non-exempt checks are pending or failing.

- Replace the loose Phase 4 gate with a strict "all terminal, all
  green" rule: every non-exempt required check in statusCheckRollup
  must be COMPLETED + {SUCCESS, SKIPPED, NEUTRAL}. Queued or
  in_progress is never acceptable.
- Explicitly ban background tasks (Bash/Agent run_in_background,
  ScheduleWakeup, &, nohup, disown, setsid, tail -f on background
  output) and explain why: the GitHub Actions step process owns the
  shell and is destroyed on exit, so backgrounded work cannot resume.
- Anchor CI polling to `gh pr view --json statusCheckRollup` — the
  actions/runs/.../jobs endpoint alone misses external status
  contexts (GitLab CI, copy-pr-bot, etc.), which was the failure mode
  on run 24800621116.
- Distinguish outer loop (agent's sequence of tool calls) from inner
  loop (single blocking Bash call); provide a validated bash template
  that normalizes CheckRun and StatusContext entries, uses
  Nemo_CICD_Test as a sentinel to close the empty-rollup edge case,
  and classifies against the exempt regex.
- Align exempt list with actual check names seen on PR NVIDIA#4159:
  approval (codeowners-approval, check-approval,
  multi-approval-bot-summary, is-not-external-contributor),
  coverage (Coverage (unit-test), Coverage_Fake), docs
  (build-docs / Build docs, build-docs-summary).
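The "all terminal, all green" gate described in this commit can be illustrated in Python over the entries that `gh pr view --json statusCheckRollup` returns. Field names below follow GitHub's GraphQL CheckRun/StatusContext shapes; the exempt regex and helper names are illustrative, not the actual skill template.

```python
import re

# Checks exempt from the readiness gate (approval/coverage/docs), per the
# commit message; the pattern itself is illustrative.
EXEMPT = re.compile(r"approval|coverage|build-docs", re.IGNORECASE)
GREEN = {"SUCCESS", "SKIPPED", "NEUTRAL"}

def normalize(entry):
    """Collapse a CheckRun or StatusContext entry to (name, terminal, green)."""
    if entry["__typename"] == "CheckRun":
        return (entry["name"],
                entry["status"] == "COMPLETED",
                entry.get("conclusion") in GREEN)
    # StatusContext (external CI such as GitLab) has a single state field.
    return (entry["context"],
            entry["state"] not in {"PENDING", "EXPECTED"},
            entry["state"] == "SUCCESS")

def ready_to_mark(rollup):
    """True only if every non-exempt check is terminal and green."""
    names = set()
    for entry in rollup:
        name, terminal, green = normalize(entry)
        names.add(name)
        if EXEMPT.search(name):
            continue
        if not (terminal and green):
            return False
    # Sentinel check closes the empty-rollup edge case: an empty or
    # partial rollup must not count as "all green".
    return "Nemo_CICD_Test" in names
```

The normalization step is the key fix: polling only the actions/runs jobs endpoint misses StatusContext entries entirely, whereas statusCheckRollup carries both kinds.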
@Phlip79 Phlip79 enabled auto-merge April 23, 2026 00:54
@maanug-nv
Contributor

Code review

Found 1 issue:

  1. pretrain_mamba.py calls runpy.run_path at module level with no if __name__ == "__main__": guard, so importing the module launches training as a side effect

The runpy.run_path(...) call on line 18 executes at import time. Every other pretrain script in this repo (e.g. pretrain_gpt.py, pretrain_hybrid.py) wraps its launch logic in if __name__ == "__main__":. Tools and examples in this codebase import from pretrain scripts to extract functions like model_provider and get_batch — if anyone does the same with pretrain_mamba, it will immediately attempt to launch distributed training as a side-effect of the import.

# Execute pretrain_hybrid.py as if it were invoked directly.
_this_dir = os.path.dirname(os.path.abspath(__file__))
runpy.run_path(os.path.join(_this_dir, "pretrain_hybrid.py"), run_name="__main__")

🤖 Generated with Claude Code


Contributor

@maanug-nv maanug-nv left a comment


I think it makes sense to wrap runpy.run_path in pretrain_mamba.py with an if __name__ == "__main__" guard to be safe. Otherwise LGTM.

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved All necessary approvals have been made label Apr 23, 2026
Wrap the runpy.run_path call in `if __name__ == "__main__":` so that
importing pretrain_mamba (e.g. to reuse model_provider / get_batch) does
not launch distributed training as an import side-effect. Matches the
pattern used in pretrain_gpt.py and pretrain_hybrid.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
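A runnable sketch of why the guard matters: the wrapper below mirrors the guarded runpy pattern this commit describes. The file contents and the marker-file stand-in for pretrain_hybrid.py are hypothetical; the point is that importing the wrapper no longer triggers the launch.

```python
import importlib.util
import os
import tempfile
import textwrap

# Hypothetical post-fix contents of pretrain_mamba.py: the runpy call
# now sits under an `if __name__ == "__main__":` guard.
WRAPPER = textwrap.dedent("""\
    import os, runpy
    if __name__ == "__main__":
        _this_dir = os.path.dirname(os.path.abspath(__file__))
        runpy.run_path(os.path.join(_this_dir, "pretrain_hybrid.py"),
                       run_name="__main__")
""")

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "pretrain_mamba.py"), "w") as f:
    f.write(WRAPPER)
# Stand-in pretrain_hybrid.py that records when it is actually launched.
with open(os.path.join(workdir, "pretrain_hybrid.py"), "w") as f:
    f.write("open(__file__ + '.launched', 'w').close()\n")

# Importing the wrapper (e.g. to reuse model_provider) no longer launches
# training: __name__ is 'pretrain_mamba', so the guard stays false.
spec = importlib.util.spec_from_file_location(
    "pretrain_mamba", os.path.join(workdir, "pretrain_mamba.py"))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
launched_on_import = os.path.exists(
    os.path.join(workdir, "pretrain_hybrid.py.launched"))
```

Executing the wrapper with `runpy.run_path(..., run_name="__main__")`, by contrast, does fire the launch, which is exactly the command-line behavior the old script relied on.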
@Phlip79
Member Author

Phlip79 commented Apr 23, 2026

/ok to test 018fa59


Labels

Approved (All necessary approvals have been made), complexity: high


7 participants