NVIDIA Megatron-Bridge 0.5.0

@nemo-automation-bot nemo-automation-bot released this 22 Jun 23:23
fcbb603
Changelog Details

Model Collection Support

LLM / VLM

Multimodal

  • Nemotron-3 Nano Omni support, including model, recipe, and examples (PR#3760)
  • Qwen3-Omni-MoE training support (PR#3317, community @hbhflw2000)
  • Qwen3-ASR support (PR#2836, PR#3273)
  • Nemotron Diffusion (Nemotron-Labs-Diffusion) model support (PR#3105)

Training & Functionality

  • MegatronMIMO (Multimodel-In-Multimodel-Out) is a new feature for training multimodal models with heterogeneous parallelism (e.g. different model parallelism for the image encoder and the text decoder). NeMo 26.06 supports non-colocated training, where the encoder and decoder are placed on different ranks (PR#2004, PR#2007, PR#2869, PR#2870), and MegatronMIMO model conversion (PR#3905), with a focus on dense models. Colocated training (encoder and decoder on the same rank) and MoE models will be supported in the next release.
  • Energon v7 support, including metadata and stateless cookers (PR#4090)
  • Energon updates for video and multi-image (PR#3691)
  • Eval-time context parallelism via decentralized process-group rebinding (PR#3755)
  • Deterministic training support for performance recipes (PR#3543)
  • Evaluator backend integration (SFT + inference + evaluation, demonstrated on GPT-OSS) (PR#2990)
  • LoRA option to not share adapters across experts (PR#3408)
  • Configurable async checkpoint strategy (PR#3153); MSC support for FSDP DTensors (PR#3300)
  • Fast dataloading configs and documentation (PR#3351)
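
The non-colocated layout described in the MegatronMIMO item above can be pictured as splitting the job's ranks into two disjoint sets, each with its own tensor-parallel size. The following is only a minimal sketch of that idea in plain Python; `partition_ranks` and its arguments are hypothetical names chosen for illustration and are not part of the Megatron-Bridge or Megatron-LM API.

```python
def partition_ranks(world_size, encoder_ranks, encoder_tp, decoder_tp):
    """Split ranks into disjoint encoder/decoder sets and form TP groups in each.

    Hypothetical helper for illustration only; not a Megatron-Bridge function.
    """
    decoder_ranks = world_size - encoder_ranks
    if encoder_ranks <= 0 or decoder_ranks <= 0:
        raise ValueError("both encoder and decoder need at least one rank")
    if encoder_ranks % encoder_tp or decoder_ranks % decoder_tp:
        raise ValueError("rank counts must be divisible by their TP sizes")

    # Non-colocated: the encoder takes the first block of ranks, the decoder the rest.
    enc = list(range(encoder_ranks))
    dec = list(range(encoder_ranks, world_size))

    def groups(ranks, tp):
        # Consecutive ranks form one tensor-parallel group.
        return [ranks[i:i + tp] for i in range(0, len(ranks), tp)]

    return {"encoder": groups(enc, encoder_tp), "decoder": groups(dec, decoder_tp)}


# 8 ranks: a small encoder without tensor parallelism, a decoder with TP=2.
layout = partition_ranks(world_size=8, encoder_ranks=2, encoder_tp=1, decoder_tp=2)
print(layout)
```

In a colocated layout, by contrast, both modules would be built over the same set of ranks, which is the case the notes defer to the next release.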

Low-Precision Bridge & Checkpoint Conversion

  • Quantize-then-gather weight export (FP8 / MXFP4) for faster RL trainer→rollout weight sync (PR#2737, community @hy2826)
  • DeepSeek V4 quantization-scale emission during HF export (PR#3969)
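
To illustrate why quantizing before gathering can speed up weight sync, here is a rough sketch under stated assumptions: each rank quantizes its own shard first, so the gather moves low-precision values plus a scale instead of 16-bit weights. The sketch uses symmetric int8 with one scale per shard as a stand-in for FP8 / MXFP4, assumes 2 bytes per element for the unquantized baseline and 4 bytes per scale, and all function names are hypothetical rather than taken from Megatron-Bridge.

```python
def quantize_int8(shard):
    """Symmetric per-shard quantization (int8 stand-in for FP8 / MXFP4)."""
    amax = max(abs(x) for x in shard) or 1.0  # avoid a zero scale for all-zero shards
    scale = amax / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in shard]
    return q, scale


def quantize_then_gather(shards):
    """Quantize each shard locally, then 'gather' the quantized pieces."""
    gathered = [quantize_int8(s) for s in shards]
    # Assumed payload: 1 byte per quantized value plus a 4-byte scale per shard.
    payload_bytes = sum(len(q) + 4 for q, _ in gathered)
    return gathered, payload_bytes


def gather_then_quantize_bytes(shards, bytes_per_elem=2):
    """Baseline payload when full-precision (assumed 16-bit) shards are gathered first."""
    return sum(len(s) for s in shards) * bytes_per_elem


def dequantize(gathered):
    """Reconstruct approximate weights on the receiving side."""
    return [v * scale for q, scale in gathered for v in q]


# Two simulated ranks, 64 weights each.
shards = [[(i - 32) / 32 for i in range(64)], [(i - 32) / 8 for i in range(64)]]
gathered, q_bytes = quantize_then_gather(shards)
baseline_bytes = gather_then_quantize_bytes(shards)
restored = dequantize(gathered)
print(q_bytes, baseline_bytes)
```

Under these assumptions the gathered payload is roughly half the baseline, at the cost of a bounded rounding error per shard.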

Performance

  • fp4_param_gather enabled in MixedPrecisionConfig (PR#3364)
  • Qwen3-Next 80B GB200/GB300 parallel mappings (PR#3168)
  • CUDA graph support for Qwen3-VL LLM and vision-encoder submodules (PR#2334); full-iteration CUDA graph for GPT-OSS recipes (PR#4140)

Megatron-LM ↔ Megatron-Bridge Unification

  • Megatron Inference integrated into Bridge — MCore Inference Engine examples, model wrappers, pure-LLM inference CLI, and inference_optimized path (PR#3897)
  • Tokenizer unification — MCore tokenizer config promoted as the shared surface (Bridge side: PR#3451; MCore side: MCore PR#4406)
  • Training-loop upstreaming (in progress) — Bridge's config + builder patterns moving into Megatron-LM: ConfigContainer (MCore PR#4227), serialization base (MCore PR#4309), Mamba config + builder (MCore PR#4550), GPT config + builder (MCore PR#4741), supporting utils (MCore PR#4872)

Developer Experience & Compatibility

  • RL API refactoring — model creation, config override, training loop, export, and LoRA for RL (PR#3813)
  • AGENTS.md and AI-coding-agent skills updated (recipe-recommender, NeMo-RL & verl E2E testing) (PR#3256, PR#3277, PR#3831)

Examples & Tutorials

  • MegatronMIMO Qwen3.5-VL non-colocated SFT tutorial + LLaVA tutorial (PR#4239)
  • Qwen3-0.6B 128K long-context SFT recipe with YaRN RoPE scaling (PR#3316)
  • HuggingFace ↔ Megatron-FSDP weight conversion (PR#3512); online HF load/save for Megatron-FSDP (PR#1910)
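
For readers unfamiliar with the YaRN RoPE scaling mentioned in the long-context recipe item above, the sketch below computes YaRN-style rotary frequencies in plain Python: high-frequency dimensions keep their original frequency, low-frequency dimensions are divided by the scaling factor, and a linear ramp blends the two in between. The head dimension, RoPE base, original context length, and factor used in the example call are illustrative assumptions, not values read from the recipe, and `yarn_inv_freq` is a hypothetical helper rather than a Megatron-Bridge function.

```python
import math


def yarn_inv_freq(head_dim, base, factor, original_max_pos,
                  beta_fast=32.0, beta_slow=1.0):
    """Return YaRN-blended inverse frequencies and the attention scale."""

    def correction_dim(num_rotations):
        # Dimension index at which a RoPE pair completes `num_rotations`
        # full rotations over the original context length.
        return (head_dim * math.log(original_max_pos / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), head_dim - 1)
    if high == low:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(head_dim // 2):
        extrapolated = base ** (-2.0 * i / head_dim)  # original RoPE frequency
        interpolated = extrapolated / factor          # position-interpolated frequency
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(interpolated * ramp + extrapolated * (1.0 - ramp))

    attention_scale = 0.1 * math.log(factor) + 1.0
    return inv_freq, attention_scale


# Assumed example values: factor 4 would stretch a 32K context to 128K.
inv_freq, attn_scale = yarn_inv_freq(
    head_dim=128, base=1_000_000.0, factor=4.0, original_max_pos=32768
)
print(inv_freq[0], inv_freq[-1], attn_scale)
```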

ModelOpt

  • LoRA × ModelOpt × DeepSeek architecture support (PR#3612)

Community Contributions

A big thank you to our community contributors for their valuable support!

Known issues:

  • Moonlight and Nemotron v3 Nano training recipes show performance degradation with TP > 1. As a workaround, set TP=1 and use HybridEP. We have traced this regression to the base PyTorch image upgrade from 26.02 to 26.04 and are actively working on a fix.
  • Step-3.7-Flash forward-pass outputs have not been fully verified.
  • Some examples/ scripts have known minor issues: MiniMax M2 (conversion/export saving), GLM-4.5V (exported tokenizer artifacts), FLUX (tokenizer setup), and WAN (inference setup/dependencies).