Rename Mamba to Hybrid outside megatron/core #4159
Phlip79 wants to merge 37 commits into NVIDIA:main from philip/rename-to-hybrid
Conversation
Rename the generic model classes that support multiple layer types (Mamba SSM, Attention, MoE, GDN, MLP) via hybrid_layer_pattern:
- MambaModel -> HybridModel
- MambaStack -> HybridStack
- MambaStackSubmodules -> HybridStackSubmodules
- mamba_stack_spec -> hybrid_stack_spec
- mamba_inference_stack_spec -> hybrid_inference_stack_spec
- get_mamba_stack_modelopt_spec -> get_hybrid_stack_modelopt_spec

Move canonical files to megatron/core/models/hybrid/:
- hybrid_model.py, hybrid_block.py, hybrid_layer_specs.py, hybrid_layer_allocation.py

Backward-compatible re-export stubs are kept at the old import paths, along with class aliases (MambaModel is a thin subclass accepting the mamba_stack_spec kwarg). Mamba-specific SSM classes (MambaLayer, MambaMixer, etc.) are unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
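The alias-with-kwarg-forwarding pattern described in this commit can be sketched as follows. This is a hedged illustration, not the actual megatron/core code: HybridModel here is a stand-in stub, and the constructor signatures are simplified.

```python
# Sketch of the thin backward-compat alias described above. HybridModel
# is a stand-in stub, not the real megatron/core class; only the
# alias/kwarg-forwarding pattern is the point.
import warnings


class HybridModel:
    """Stand-in for megatron.core.models.hybrid.HybridModel."""

    def __init__(self, hybrid_stack_spec=None, **kwargs):
        self.hybrid_stack_spec = hybrid_stack_spec


class MambaModel(HybridModel):
    """Thin subclass kept at the old import path for compatibility."""

    def __init__(self, mamba_stack_spec=None, **kwargs):
        warnings.warn(
            "MambaModel is deprecated; use HybridModel instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Forward the old kwarg name onto the renamed parameter.
        super().__init__(hybrid_stack_spec=mamba_stack_spec, **kwargs)
```

Old call sites like `MambaModel(mamba_stack_spec=spec)` keep working (with a DeprecationWarning), and `isinstance` checks against HybridModel also succeed.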
Remove redundant explicit named imports that duplicate the wildcard import and add pylint disable comments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Include the test files that exercise the renamed megatron/core classes (HybridModel, HybridStack, hybrid_layer_allocation, etc.) and update their imports to use the new canonical paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates all files outside megatron/core/ to use the new Hybrid names introduced in the core rename PR. Depends on the core changes, which provide backward-compat stubs at the old import paths.

File renames:
- mamba_builders.py -> hybrid_builders.py
- pretrain_mamba.py -> pretrain_hybrid.py
- tools/run_mamba_text_generation_server*.py -> run_hybrid_*
- test_mamba_model.py -> test_hybrid_model.py (and similar test renames)

Function renames:
- mamba_builder() -> hybrid_builder()
- modelopt_gpt_mamba_builder() -> modelopt_gpt_hybrid_builder()

Also updates --spec paths, --export-model-type, --model-provider, deprecation warnings, and documentation. Model-named files stay as-is: examples/mamba/, recipes/h100/mamba*.yaml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 3c8b1c9 to d6a3854
Tests under tests/unit_tests/inference/ and tests/unit_tests/resharding/ test the inference engine and resharding infrastructure, not megatron/core classes. They belong in the companion non-core PR (NVIDIA#4159). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.claude/skills/respond-to-issue/SKILL.md is not part of this rename. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These test files were accidentally dropped during the core/non-core split. They test inference engines and resharding (not megatron/core classes), so they belong in this non-core PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 834c84d.
…lip/rename-to-hybrid
The backward-compat MambaModel wrapper belongs in the legacy mamba module, not in the canonical hybrid_model.py. This keeps hybrid_model.py clean with only HybridModel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…brid

# Conflicts:
# megatron/core/models/mamba/mamba_model.py
This parameter passes HybridStackSubmodules (formerly MambaStackSubmodules), so its name should match the renamed class. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Megatron-LM into philip/rename-to-hybrid
# Conflicts:
# tests/unit_tests/inference/engines/test_dynamic_engine.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChenhanYu left a comment
Verified all post_training and modelopt references are updated: model_builder.py (import path, function rename, model type check with deprecation warning, backward alias), arguments.py (HybridModel added to choices), train.sh (matches both HybridModel and MambaModel), all 7 Nemotron-H config scripts, and all 9 example Python scripts. Backward compat maintained via aliases and deprecation warnings throughout. LGTM — depends on #4099.
…brid

# Conflicts:
# megatron/core/inference/contexts/dynamic_context.py
# megatron/core/models/mamba/mamba_layer_specs.py
# megatron/core/models/mamba/mamba_model.py
# megatron/core/ssm/mamba_block.py
# megatron/core/ssm/mamba_hybrid_layer_allocation.py
Test classes named after MambaModel/MambaStack/MambaStackSubmodules are renamed to match the new Hybrid class names:
- TestMambaModel -> TestHybridModel
- TestMambaQKLayernorm -> TestHybridQKLayernorm
- TestMambaWithDynamicInference -> TestHybridWithDynamicInference
- TestMambaMoEModel -> TestHybridMoEModel
- TestMambaBlock -> TestHybridBlock
- TestModelOptMambaModel -> TestModelOptHybridModel
- TestMultiTokenPredictionMamba -> TestMultiTokenPredictionHybrid
- TestParallelMambaBlockCudagraphs -> TestParallelHybridBlockCudagraphs

Also updates the TestHybridQKLayernorm class (added by an upstream merge) to use HybridModel/hybrid_stack_spec consistently. Test classes for Mamba-specific SSM components (MambaLayer, MambaMixer, MambaContextParallel, MambaMetadata, MambaSlotAllocator, etc.) are unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This stale file was missed in the original directory rename. The canonical location is now modelopt/hybrid/model_specs.py. This stub re-exports from the new canonical location to preserve backward compatibility with any external imports from the old path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rename commit (d6a3854) and subsequent merges accidentally reverted unrelated upstream changes. Restore upstream versions of:
- megatron/training/arguments.py: restore the 'adaptive_muon' optimizer choice and the --optimizer-cuda-graph argument; remove the stray --no-scatter-gather-tensors-in-pipeline
- megatron/training/training.py: restore the OptimizerCudaGraphWrapper import, save_checkpoint_and_time() usage, and timer logic from upstream
- megatron/rl/sequence_packing_utils.py: restore the removal of the unused packed_attention_mask parameter (upstream NVIDIA#3859)
- .github/actions/action.yml: restore the retry loop around uv install (upstream NVIDIA#4387)

Re-applied only the intended rename-related import path changes to arguments.py (2 lines) and training.py (2 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- tests/test_utils/python_scripts/notify.py: restore the WEBHOOK_URL check and copyright header from upstream
- tests/unit_tests/rl/test_sequence_packing_utils.py: restore the removal of the packed_attention_mask parameter (companion to the sequence_packing_utils.py revert)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test config

Upstream removed these flags from hybrid_mr_mcore_te_tp2_pp1_cp1_dgx_a100_1N8G/model_config.yaml in commit 2697b82 ("base strategy simplification NVIDIA#4001"), but they were accidentally re-introduced by a merge conflict resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test files don't need to accept the old "mamba" CLI value since the
backward-compat shim is only for external library consumers. Simplify
the defensive ("hybrid", "mamba") checks to just "hybrid".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test 30606da
Thank you @jaredcasper for catching those errors. All should be resolved now.
Tighten the nightly sync workflow prompt and skill so the bot cannot
mark the PR ready while non-exempt checks are pending or failing.
- Replace the loose Phase 4 gate with a strict "all terminal, all
green" rule: every non-exempt required check in statusCheckRollup
must be COMPLETED + {SUCCESS, SKIPPED, NEUTRAL}. Queued or
in_progress is never acceptable.
- Explicitly ban background tasks (Bash/Agent run_in_background,
ScheduleWakeup, &, nohup, disown, setsid, tail -f on background
output) and explain why: the GitHub Actions step process owns the
shell and is destroyed on exit, so backgrounded work cannot resume.
- Anchor CI polling to `gh pr view --json statusCheckRollup` — the
actions/runs/.../jobs endpoint alone misses external status
contexts (GitLab CI, copy-pr-bot, etc.), which was the failure mode
on run 24800621116.
- Distinguish outer loop (agent's sequence of tool calls) from inner
loop (single blocking Bash call); provide a validated bash template
that normalizes CheckRun and StatusContext entries, uses
Nemo_CICD_Test as a sentinel to close the empty-rollup edge case,
and classifies against the exempt regex.
- Align exempt list with actual check names seen on PR NVIDIA#4159:
approval (codeowners-approval, check-approval,
multi-approval-bot-summary, is-not-external-contributor),
coverage (Coverage (unit-test), Coverage_Fake), docs
(build-docs / Build docs, build-docs-summary).
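The workflow itself specifies this gate as a bash template; a Python sketch of the same classification logic over entries from `gh pr view --json statusCheckRollup` might look like the following. The exempt regex and the exact field handling are assumptions for illustration, though the field names follow GitHub's GraphQL CheckRun/StatusContext shapes.

```python
# Sketch of the "all terminal, all green" gate over statusCheckRollup
# entries. CheckRun entries carry status/conclusion; StatusContext
# entries (GitLab CI, copy-pr-bot, etc.) carry only state.
import re

EXEMPT = re.compile(r"approval|coverage|build-docs", re.IGNORECASE)
GREEN = {"SUCCESS", "SKIPPED", "NEUTRAL"}


def normalize(entry):
    """Map a CheckRun or StatusContext entry to (name, is_terminal, outcome)."""
    if entry.get("__typename") == "StatusContext":
        state = entry.get("state", "")
        return entry["context"], state not in ("PENDING", "EXPECTED"), state
    return (entry["name"],
            entry.get("status") == "COMPLETED",
            entry.get("conclusion") or "")


def ready(rollup):
    """True only if every non-exempt check is terminal and green."""
    if not rollup:
        # Empty-rollup edge case: no checks reported yet (the real
        # workflow closes this with a sentinel check name).
        return False
    for entry in rollup:
        name, terminal, outcome = normalize(entry)
        if EXEMPT.search(name):
            continue
        if not terminal or outcome not in GREEN:
            return False
    return True
```

A queued or in-progress non-exempt check makes `ready` return False, matching the "never acceptable" rule above.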
Code review found 1 issue:

Lines 15 to 18 in 30606da

🤖 Generated with Claude Code
maanug-nv left a comment
I think it makes sense to wrap runpy.run_path in pretrain_mamba.py with an `if __name__ == "__main__"` guard to be safe. Otherwise LGTM.
Wrap the runpy.run_path call in `if __name__ == "__main__":` so that importing pretrain_mamba (e.g. to reuse model_provider / get_batch) does not launch distributed training as an import side-effect. Matches the pattern used in pretrain_gpt.py and pretrain_hybrid.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
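The guard matters because runpy.run_path executes the target under whatever run_name it is given. A small self-contained demonstration, using a temp script as a stand-in for pretrain_hybrid.py:

```python
# Demonstrates why the __main__ guard is needed: run_path with
# run_name="__main__" fires the target's own `if __name__ == "__main__"`
# block, while an import-style run name does not.
import os
import runpy
import tempfile
import textwrap

# Stand-in for pretrain_hybrid.py.
script = textwrap.dedent("""
    launched = False
    if __name__ == "__main__":
        launched = True  # would start distributed training in the real script
""")
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name

# Import-style execution: the __main__ block does not run.
imported = runpy.run_path(path, run_name="pretrain_hybrid")
# Direct-style execution, as the guarded shim forwards: the block runs.
direct = runpy.run_path(path, run_name="__main__")
os.unlink(path)
```

So the deprecation shim only forwards to pretrain_hybrid.py when executed directly, and `import pretrain_mamba` stays side-effect free.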
/ok to test 018fa59
Summary
Follow-up to #4099 (now merged). This is the second and final rename PR.
Updates files outside megatron/core/ to use the new HybridModel / HybridStack / HybridStackSubmodules names introduced in #4099. Ran functional tests.

Renames

- pretrain_mamba.py → pretrain_hybrid.py (backward-compat wrapper kept at old path with deprecation warning)
- mamba_builders.py → hybrid_builders.py (backward-compat stub kept at old path with deprecation warning)
- mamba_builder() → hybrid_builder()
- modelopt_gpt_mamba_builder() → modelopt_gpt_hybrid_builder()
- tools/run_mamba_text_generation_server*.py → tools/run_hybrid_text_generation_server*.py
- test_mamba_model.py → test_hybrid_model.py, test_mamba_block.py → test_hybrid_block.py, test_mamba_moe_model.py → test_hybrid_moe_model.py, test_mamba_model_expert_parallel_inference.py → test_hybrid_*, test_mamba_prefix_caching_e2e.py → test_hybrid_*
- TestMambaModel → TestHybridModel, TestMambaQKLayernorm → TestHybridQKLayernorm, TestMambaWithDynamicInference → TestHybridWithDynamicInference, TestMambaMoEModel → TestHybridMoEModel, TestMambaBlock → TestHybridBlock, TestModelOptMambaModel → TestModelOptHybridModel, TestMultiTokenPredictionMamba → TestMultiTokenPredictionHybrid, TestParallelMambaBlockCudagraphs → TestParallelHybridBlockCudagraphs

CLI updates
- --spec paths updated to megatron.core.models.hybrid.hybrid_layer_specs hybrid_stack_spec in shell scripts, YAML configs, and recipes
- --export-model-type and --model-provider use HybridModel/hybrid, with backward-compat acceptance of the old MambaModel/mamba values (with deprecation warnings)

Backward compatibility
All old import paths, class names, function names, and CLI values continue to work via backward-compat stubs and aliases. External libraries depending on this repo are unaffected.
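One way the deprecated CLI values can keep working is an argparse `type` hook that maps the old value before the `choices` check runs. A minimal sketch; the flag name comes from the summary, but the choices list is simplified and the real parser lives in the training arguments:

```python
# Hedged sketch of accepting the old --export-model-type value with a
# deprecation warning; the real parser has more choices than shown here.
import argparse
import warnings


def _model_type(value):
    """Map the deprecated value onto the new one."""
    if value == "MambaModel":
        warnings.warn(
            "--export-model-type MambaModel is deprecated; use HybridModel.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "HybridModel"
    return value


parser = argparse.ArgumentParser()
# argparse applies `type` before validating `choices`, so the mapped
# value passes even though "MambaModel" is no longer listed.
parser.add_argument("--export-model-type", type=_model_type,
                    choices=["HybridModel", "GPTModel"])
```

Passing `--export-model-type MambaModel` therefore parses as `HybridModel` and warns, while new-style invocations are unaffected.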
Not renamed (intentional)
- examples/mamba/ directory (architecture-named, like examples/llama/, examples/gpt3/)
- tests/test_utils/recipes/h100/mamba*.yaml (architecture-named)
- tests/unit_tests/dist_checkpointing/models/test_mamba.py (tests MambaMixer specifically)
- Mamba-specific SSM classes, configs, and CLI args (MambaLayer, MambaMixer, MambaContextParallel, MambaInferenceStateConfig, MambaMetadata, MambaSlotAllocator, MambaTokenizer, --mamba-* CLI args, SSM algorithm config params)