
Rename Mamba to Hybrid outside megatron/core #4159

Open
Phlip79 wants to merge 37 commits into NVIDIA:main from Phlip79:philip/rename-non-core

Conversation

@Phlip79
Member

@Phlip79 Phlip79 commented Apr 6, 2026

Summary

Follow-up to #4099 (now merged). This is the second and final rename PR.

Updates files outside megatron/core/ to use the new HybridModel / HybridStack / HybridStackSubmodules names introduced in #4099. Ran functional tests.

Renames

  • pretrain_mamba.py -> pretrain_hybrid.py (backward-compat wrapper kept at old path with deprecation warning)
  • mamba_builders.py -> hybrid_builders.py (backward-compat stub kept at old path with deprecation warning)
  • mamba_builder() -> hybrid_builder()
  • modelopt_gpt_mamba_builder() -> modelopt_gpt_hybrid_builder()
  • tools/run_mamba_text_generation_server*.py -> tools/run_hybrid_text_generation_server*.py
  • Test files: test_mamba_model.py -> test_hybrid_model.py, test_mamba_block.py -> test_hybrid_block.py, test_mamba_moe_model.py -> test_hybrid_moe_model.py, test_mamba_model_expert_parallel_inference.py -> test_hybrid_*, test_mamba_prefix_caching_e2e.py -> test_hybrid_*
  • Test classes: TestMambaModel -> TestHybridModel, TestMambaQKLayernorm -> TestHybridQKLayernorm, TestMambaWithDynamicInference -> TestHybridWithDynamicInference, TestMambaMoEModel -> TestHybridMoEModel, TestMambaBlock -> TestHybridBlock, TestModelOptMambaModel -> TestModelOptHybridModel, TestMultiTokenPredictionMamba -> TestMultiTokenPredictionHybrid, TestParallelMambaBlockCudagraphs -> TestParallelHybridBlockCudagraphs
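The backward-compat shims described above follow a common deprecation pattern. The sketch below is a hypothetical illustration of how the hybrid_builders stub might keep the old names working; the builder bodies and return values are stand-ins, not the actual repo code.

```python
import warnings

# Stand-ins for the renamed builders; in the real repo these live in
# hybrid_builders.py, and their bodies here are hypothetical.
def hybrid_builder():
    return "HybridModel"

def modelopt_gpt_hybrid_builder():
    return "ModelOptHybridModel"

def _deprecated_alias(new_fn, old_name):
    """Wrap a renamed function so calls via the old name emit a warning."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_fn.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_fn(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper

# Old names keep working, but warn on use.
mamba_builder = _deprecated_alias(hybrid_builder, "mamba_builder")
modelopt_gpt_mamba_builder = _deprecated_alias(
    modelopt_gpt_hybrid_builder, "modelopt_gpt_mamba_builder")
```

Routing the aliases through a wrapper (rather than a bare `old = new` assignment) means the warning fires at call time, which points external callers at the exact deprecated use.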

CLI updates

  • --spec paths updated to megatron.core.models.hybrid.hybrid_layer_specs hybrid_stack_spec in shell scripts, YAML configs, and recipes
  • --export-model-type and --model-provider use HybridModel/hybrid with backward-compat acceptance of old MambaModel/mamba values (with deprecation warnings)
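Accepting the old CLI values with a deprecation warning could be handled by normalizing in the argparse `type` callback, which runs before the `choices` check. This is a sketch: the flag name matches the PR, but the choices list and helper name are assumptions, not the actual arguments.py code.

```python
import argparse
import warnings

# Old CLI value -> renamed value, per the PR description.
_RENAMED = {"MambaModel": "HybridModel", "mamba": "hybrid"}

def normalize_model_type(value):
    """Accept deprecated Mamba values, warning and mapping them forward."""
    if value in _RENAMED:
        warnings.warn(
            f"value {value!r} is deprecated; use {_RENAMED[value]!r} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return _RENAMED[value]
    return value

parser = argparse.ArgumentParser()
# argparse validates `choices` *after* applying `type`, so old values
# normalize before validation; this choices list is an assumption.
parser.add_argument(
    "--export-model-type",
    type=normalize_model_type,
    choices=["GPTModel", "HybridModel"],
)
```

With this shape, `--export-model-type MambaModel` parses to `HybridModel` and only the new names need to appear in `choices`.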

Backward compatibility

All old import paths, class names, function names, and CLI values continue to work via backward-compat stubs and aliases. External libraries depending on this repo are unaffected.

Not renamed (intentional)

  • examples/mamba/ directory (architecture-named, like examples/llama/, examples/gpt3/)
  • tests/test_utils/recipes/h100/mamba*.yaml (architecture-named)
  • tests/unit_tests/dist_checkpointing/models/test_mamba.py (tests MambaMixer specifically)
  • All Mamba SSM component names (MambaLayer, MambaMixer, MambaContextParallel, MambaInferenceStateConfig, MambaMetadata, MambaSlotAllocator, MambaTokenizer, --mamba-* CLI args, SSM algorithm config params)

Rename the generic model classes that support multiple layer types
(Mamba SSM, Attention, MoE, GDN, MLP) via hybrid_layer_pattern:
- MambaModel -> HybridModel
- MambaStack -> HybridStack
- MambaStackSubmodules -> HybridStackSubmodules
- mamba_stack_spec -> hybrid_stack_spec
- mamba_inference_stack_spec -> hybrid_inference_stack_spec
- get_mamba_stack_modelopt_spec -> get_hybrid_stack_modelopt_spec

Move canonical files to megatron/core/models/hybrid/:
- hybrid_model.py, hybrid_block.py, hybrid_layer_specs.py,
  hybrid_layer_allocation.py

Backward-compatible re-export stubs at old import paths and class
aliases (MambaModel is a thin subclass accepting mamba_stack_spec kwarg).
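The thin-subclass alias could look roughly like this (a sketch only: the real HybridModel constructor takes many more arguments, and the kwarg translation shown is the essential point):

```python
import warnings

class HybridModel:
    """Canonical class (constructor heavily simplified for this sketch)."""
    def __init__(self, hybrid_stack_spec=None, **kwargs):
        self.hybrid_stack_spec = hybrid_stack_spec

class MambaModel(HybridModel):
    """Deprecated alias: forwards the old mamba_stack_spec kwarg."""
    def __init__(self, *args, mamba_stack_spec=None, **kwargs):
        warnings.warn(
            "MambaModel is deprecated; use HybridModel instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if mamba_stack_spec is not None:
            # Translate the old kwarg to its renamed equivalent.
            kwargs.setdefault("hybrid_stack_spec", mamba_stack_spec)
        super().__init__(*args, **kwargs)
```

Because the alias is a real subclass, existing `isinstance(model, MambaModel)` checks and old-style construction both keep working while new code sees only HybridModel.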

Mamba-specific SSM classes (MambaLayer, MambaMixer, etc.) unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Phlip79 and others added 3 commits April 6, 2026 17:29
Remove redundant explicit named imports that duplicate the wildcard
import and add pylint disable comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Include the test files that exercise the renamed megatron/core classes
(HybridModel, HybridStack, hybrid_layer_allocation, etc.) and update
their imports to use the new canonical paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates all files outside megatron/core/ to use the new Hybrid names
introduced in the core rename PR. Depends on the core changes which
provide backward-compat stubs at old import paths.

File renames:
- mamba_builders.py -> hybrid_builders.py
- pretrain_mamba.py -> pretrain_hybrid.py
- tools/run_mamba_text_generation_server*.py -> run_hybrid_*
- test_mamba_model.py -> test_hybrid_model.py (and similar test renames)

Function renames:
- mamba_builder() -> hybrid_builder()
- modelopt_gpt_mamba_builder() -> modelopt_gpt_hybrid_builder()

Also updates --spec paths, --export-model-type, --model-provider,
deprecation warnings, and documentation.

Model-named files stay as-is: examples/mamba/, recipes/h100/mamba*.yaml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 force-pushed the philip/rename-non-core branch from 3c8b1c9 to d6a3854 Compare April 6, 2026 18:34
Phlip79 and others added 18 commits April 6, 2026 18:37
Tests under tests/unit_tests/inference/ and tests/unit_tests/resharding/
test the inference engine and resharding infrastructure, not megatron/core
classes. They belong in the companion non-core PR (NVIDIA#4159).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.claude/skills/respond-to-issue/SKILL.md is not part of this rename.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These test files were accidentally dropped during the core/non-core
split. They test inference engines and resharding (not megatron/core
classes), so they belong in this non-core PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backward-compat MambaModel wrapper belongs in the legacy mamba
module, not in the canonical hybrid_model.py. This keeps hybrid_model.py
clean with only HybridModel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…brid

# Conflicts:
#	megatron/core/models/mamba/mamba_model.py
This parameter passes HybridStackSubmodules (formerly
MambaStackSubmodules), so its name should match the renamed class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	tests/unit_tests/inference/engines/test_dynamic_engine.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@ChenhanYu ChenhanYu left a comment


Verified all post_training and modelopt references are updated: model_builder.py (import path, function rename, model type check with deprecation warning, backward alias), arguments.py (HybridModel added to choices), train.sh (matches both HybridModel and MambaModel), all 7 Nemotron-H config scripts, and all 9 example Python scripts. Backward compat maintained via aliases and deprecation warnings throughout. LGTM — depends on #4099.

Phlip79 and others added 4 commits April 18, 2026 00:39
…brid

# Conflicts:
#	megatron/core/inference/contexts/dynamic_context.py
#	megatron/core/models/mamba/mamba_layer_specs.py
#	megatron/core/models/mamba/mamba_model.py
#	megatron/core/ssm/mamba_block.py
#	megatron/core/ssm/mamba_hybrid_layer_allocation.py
Test classes named after MambaModel/MambaStack/MambaStackSubmodules
are renamed to match the new Hybrid class names:
- TestMambaModel -> TestHybridModel
- TestMambaQKLayernorm -> TestHybridQKLayernorm
- TestMambaWithDynamicInference -> TestHybridWithDynamicInference
- TestMambaMoEModel -> TestHybridMoEModel
- TestMambaBlock -> TestHybridBlock
- TestModelOptMambaModel -> TestModelOptHybridModel
- TestMultiTokenPredictionMamba -> TestMultiTokenPredictionHybrid
- TestParallelMambaBlockCudagraphs -> TestParallelHybridBlockCudagraphs

Also updates the TestHybridQKLayernorm class (added by upstream merge)
to use HybridModel/hybrid_stack_spec consistently.

Test classes for Mamba-specific SSM components (MambaLayer, MambaMixer,
MambaContextParallel, MambaMetadata, MambaSlotAllocator, etc.) are
unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This stale file was missed in the original directory rename. The
canonical location is now modelopt/hybrid/model_specs.py. This stub
re-exports from the new canonical location to preserve backward
compatibility with any external imports from the old path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 marked this pull request as ready for review April 19, 2026 23:16
@Phlip79 Phlip79 requested review from a team as code owners April 19, 2026 23:16
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team April 19, 2026 23:16
@Phlip79 Phlip79 removed the request for review from a team April 19, 2026 23:17
Phlip79 and others added 5 commits April 20, 2026 23:02
The rename commit (d6a3854) and subsequent merges accidentally reverted
unrelated upstream changes. Restore upstream versions of:

- megatron/training/arguments.py: restore 'adaptive_muon' optimizer choice
  and --optimizer-cuda-graph argument; remove stray
  --no-scatter-gather-tensors-in-pipeline
- megatron/training/training.py: restore OptimizerCudaGraphWrapper import,
  save_checkpoint_and_time() usage, and timer logic from upstream
- megatron/rl/sequence_packing_utils.py: restore removal of unused
  packed_attention_mask parameter (upstream NVIDIA#3859)
- .github/actions/action.yml: restore retry loop around uv install
  (upstream NVIDIA#4387)

Re-applied only the intended rename-related import path changes to
arguments.py (2 lines) and training.py (2 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- tests/test_utils/python_scripts/notify.py: restore WEBHOOK_URL check
  and copyright header from upstream
- tests/unit_tests/rl/test_sequence_packing_utils.py: restore removal
  of packed_attention_mask parameter (companion to the
  sequence_packing_utils.py revert)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test config

Upstream removed these flags from
hybrid_mr_mcore_te_tp2_pp1_cp1_dgx_a100_1N8G/model_config.yaml in commit
2697b82 ("base strategy simplification NVIDIA#4001"), but they were
accidentally re-introduced by a merge conflict resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test files don't need to accept the old "mamba" CLI value since the
backward-compat shim is only for external library consumers. Simplify
the defensive ("hybrid", "mamba") checks to just "hybrid".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Phlip79
Member Author

Phlip79 commented Apr 21, 2026

/ok to test 30606da

@Phlip79
Member Author

Phlip79 commented Apr 21, 2026

Thank you @jaredcasper for catching those errors. All should be resolved now.

@Phlip79 Phlip79 requested a review from jaredcasper April 21, 2026 00:33
Phlip79 added a commit to Phlip79/Megatron-LM that referenced this pull request Apr 22, 2026
Tighten the nightly sync workflow prompt and skill so the bot cannot
mark the PR ready while non-exempt checks are pending or failing.

- Replace the loose Phase 4 gate with a strict "all terminal, all
  green" rule: every non-exempt required check in statusCheckRollup
  must be COMPLETED + {SUCCESS, SKIPPED, NEUTRAL}. Queued or
  in_progress is never acceptable.
- Explicitly ban background tasks (Bash/Agent run_in_background,
  ScheduleWakeup, &, nohup, disown, setsid, tail -f on background
  output) and explain why: the GitHub Actions step process owns the
  shell and is destroyed on exit, so backgrounded work cannot resume.
- Anchor CI polling to `gh pr view --json statusCheckRollup` — the
  actions/runs/.../jobs endpoint alone misses external status
  contexts (GitLab CI, copy-pr-bot, etc.), which was the failure mode
  on run 24800621116.
- Distinguish outer loop (agent's sequence of tool calls) from inner
  loop (single blocking Bash call); provide a validated bash template
  that normalizes CheckRun and StatusContext entries, uses
  Nemo_CICD_Test as a sentinel to close the empty-rollup edge case,
  and classifies against the exempt regex.
- Align exempt list with actual check names seen on PR NVIDIA#4159:
  approval (codeowners-approval, check-approval,
  multi-approval-bot-summary, is-not-external-contributor),
  coverage (Coverage (unit-test), Coverage_Fake), docs
  (build-docs / Build docs, build-docs-summary).
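The "all terminal, all green" gate described in this commit can be illustrated in Python over the entries that `gh pr view --json statusCheckRollup` returns. Field names below follow GitHub's GraphQL CheckRun/StatusContext shapes; the exempt regex and helper names are illustrative, not the actual skill template.

```python
import re

# Checks exempt from the readiness gate (approval/coverage/docs), per the
# commit message; the pattern itself is illustrative.
EXEMPT = re.compile(r"approval|coverage|build-docs", re.IGNORECASE)
GREEN = {"SUCCESS", "SKIPPED", "NEUTRAL"}

def normalize(entry):
    """Collapse a CheckRun or StatusContext entry to (name, terminal, green)."""
    if entry["__typename"] == "CheckRun":
        return (entry["name"],
                entry["status"] == "COMPLETED",
                entry.get("conclusion") in GREEN)
    # StatusContext (external CI such as GitLab) has a single state field.
    return (entry["context"],
            entry["state"] not in {"PENDING", "EXPECTED"},
            entry["state"] == "SUCCESS")

def ready_to_mark(rollup):
    """True only if every non-exempt check is terminal and green."""
    names = set()
    for entry in rollup:
        name, terminal, green = normalize(entry)
        names.add(name)
        if EXEMPT.search(name):
            continue
        if not (terminal and green):
            return False
    # Sentinel check closes the empty-rollup edge case: an empty or
    # partial rollup must not count as "all green".
    return "Nemo_CICD_Test" in names
```

The normalization step is the key fix: polling only the actions/runs jobs endpoint misses StatusContext entries entirely, whereas statusCheckRollup carries both kinds.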
@Phlip79 Phlip79 enabled auto-merge April 23, 2026 00:54
@maanug-nv
Contributor

Code review

Found 1 issue:

  1. pretrain_mamba.py calls runpy.run_path at module level with no if __name__ == "__main__": guard, so importing the module launches training as a side effect

The runpy.run_path(...) call on line 18 executes at import time. Every other pretrain script in this repo (e.g. pretrain_gpt.py, pretrain_hybrid.py) wraps its launch logic in if __name__ == "__main__":. Tools and examples in this codebase import from pretrain scripts to extract functions like model_provider and get_batch — if anyone does the same with pretrain_mamba, it will immediately attempt to launch distributed training as a side-effect of the import.

# Execute pretrain_hybrid.py as if it were invoked directly.
_this_dir = os.path.dirname(os.path.abspath(__file__))
runpy.run_path(os.path.join(_this_dir, "pretrain_hybrid.py"), run_name="__main__")

🤖 Generated with Claude Code


Contributor

@maanug-nv maanug-nv left a comment


I think it makes sense to wrap runpy.run_path in pretrain_mamba.py with an if __name__ == "__main__" guard to be safe. Otherwise LGTM.

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved All necessary approvals have been made label Apr 23, 2026
Wrap the runpy.run_path call in `if __name__ == "__main__":` so that
importing pretrain_mamba (e.g. to reuse model_provider / get_batch) does
not launch distributed training as an import side-effect. Matches the
pattern used in pretrain_gpt.py and pretrain_hybrid.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
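A runnable sketch of why the guard matters: the wrapper below mirrors the guarded runpy pattern this commit describes. The file contents and the marker-file stand-in for pretrain_hybrid.py are hypothetical; the point is that importing the wrapper no longer triggers the launch.

```python
import importlib.util
import os
import tempfile
import textwrap

# Hypothetical post-fix contents of pretrain_mamba.py: the runpy call
# now sits under an `if __name__ == "__main__":` guard.
WRAPPER = textwrap.dedent("""\
    import os, runpy
    if __name__ == "__main__":
        _this_dir = os.path.dirname(os.path.abspath(__file__))
        runpy.run_path(os.path.join(_this_dir, "pretrain_hybrid.py"),
                       run_name="__main__")
""")

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "pretrain_mamba.py"), "w") as f:
    f.write(WRAPPER)
# Stand-in pretrain_hybrid.py that records when it is actually launched.
with open(os.path.join(workdir, "pretrain_hybrid.py"), "w") as f:
    f.write("open(__file__ + '.launched', 'w').close()\n")

# Importing the wrapper (e.g. to reuse model_provider) no longer launches
# training: __name__ is 'pretrain_mamba', so the guard stays false.
spec = importlib.util.spec_from_file_location(
    "pretrain_mamba", os.path.join(workdir, "pretrain_mamba.py"))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
launched_on_import = os.path.exists(
    os.path.join(workdir, "pretrain_hybrid.py.launched"))
```

Executing the wrapper with `runpy.run_path(..., run_name="__main__")`, by contrast, does fire the launch, which is exactly the command-line behavior the old script relied on.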
@Phlip79
Member Author

Phlip79 commented Apr 23, 2026

/ok to test 018fa59


Labels

Approved (All necessary approvals have been made), complexity: high


7 participants