NVIDIA Megatron Core 0.5.0

@ericharper released this 22 Mar 16:44
· 366 commits to main since this release

Key Features and Enhancements

Megatron Core documentation is now live!

Model Features

  • MoE (Mixture of Experts)
    • Support for Z-loss, load balancing, and Sinkhorn routing
    • Layer and communications refactor
    • Richer parallelism mappings: expert parallelism (EP) can be combined with the other model-parallel techniques for larger MoE variants, e.g. EP + TP + DP + SP + PP
    • Token dropless architecture with Top-K routing
    • Performance optimization with GroupedGEMM when the number of local experts is > 1
    • Distributed checkpointing
  • Interleaved rotary embedding
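
To illustrate the MoE routing features above, here is a minimal NumPy sketch of Top-K token routing together with the Switch-Transformer-style load-balancing auxiliary loss and a router z-loss. All names and shapes are illustrative assumptions; this is not Megatron Core's actual router code.

```python
import numpy as np

def top_k_route(logits, k=2):
    """Pick the top-k experts per token; return (expert indices, softmax probs).
    logits: [num_tokens, num_experts] router scores (illustrative shapes)."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    topk_idx = np.argsort(-logits, axis=-1)[:, :k]
    return topk_idx, probs

def load_balancing_loss(probs, topk_idx, num_experts):
    """Auxiliary loss that pushes the token->expert assignment toward uniform
    (a sketch of the standard formulation, not Megatron's exact code)."""
    counts = np.bincount(topk_idx.ravel(), minlength=num_experts)
    f = counts / topk_idx.size          # fraction of tokens sent to each expert
    p = probs.mean(axis=0)              # mean router probability per expert
    return num_experts * float(np.dot(f, p))

def z_loss(logits):
    """Penalizes large router logits to keep the softmax well-conditioned."""
    m = logits.max(axis=-1, keepdims=True)
    lse = np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)) + m
    return float((lse ** 2).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))        # 8 tokens, 4 experts
idx, probs = top_k_route(logits, k=2)
aux = load_balancing_loss(probs, idx, num_experts=4)
zl = z_loss(logits)
```

In a token-dropless design, every token is dispatched to all k of its selected experts; the auxiliary losses only shape the routing distribution rather than enforce a hard capacity.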

Datasets

  • Masked WordPiece datasets for BERT and T5
  • Raw and mock datasets
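
Mock datasets make it possible to smoke-test a training loop without real data on disk. The following is a hypothetical minimal example of that idea; the class name and sample layout are assumptions, not Megatron Core's actual mock dataset API.

```python
import numpy as np

class MockTokenDataset:
    """Deterministic fake token sequences for smoke-testing a training loop
    (an illustrative sketch, not Megatron Core's implementation)."""

    def __init__(self, num_samples, seq_length, vocab_size, seed=0):
        self.num_samples = num_samples
        self.seq_length = seq_length
        self.vocab_size = vocab_size
        self.seed = seed

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # Per-sample RNG so any worker can fetch any index reproducibly.
        rng = np.random.default_rng(self.seed + idx)
        tokens = rng.integers(0, self.vocab_size, size=self.seq_length + 1)
        return {"tokens": tokens[:-1], "labels": tokens[1:]}

ds = MockTokenDataset(num_samples=4, seq_length=8, vocab_size=100)
sample = ds[0]
```

Seeding per index (rather than holding one shared RNG) keeps samples reproducible regardless of access order or how the dataset is sharded across data-parallel ranks.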

Parallelism

Performance

  • Activation offloading to CPU
  • RoPE and SwiGLU fusion
  • Sliding window attention (via Transformer Engine)
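
Sliding window attention restricts each query to the most recent W keys instead of the full causal prefix. A NumPy sketch of the corresponding mask (the window size and shapes are illustrative; the fused kernel itself lives in Transformer Engine):

```python
import numpy as np

def sliding_window_causal_mask(seq_len, window):
    """Boolean mask where True marks an allowed (query, key) pair:
    causal (k <= q) and within the last `window` positions."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    return (k <= q) & (q - k < window)

mask = sliding_window_causal_mask(seq_len=6, window=3)
```

With a fixed window, attention cost grows linearly in sequence length rather than quadratically, which is what makes the Transformer Engine path attractive for long contexts.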

General Improvements

  • Timers
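
The timers accumulate wall-clock time for named regions of the training step. A minimal sketch of that pattern (a hypothetical API for illustration; Megatron Core's `Timers` class differs in detail):

```python
import time
from collections import defaultdict

class Timers:
    """Named cumulative wall-clock timers (an illustrative sketch)."""

    def __init__(self):
        self.elapsed = defaultdict(float)
        self._start = {}

    def start(self, name):
        self._start[name] = time.perf_counter()

    def stop(self, name):
        self.elapsed[name] += time.perf_counter() - self._start.pop(name)

    def log(self):
        for name, secs in sorted(self.elapsed.items()):
            print(f"{name}: {secs * 1000:.2f} ms")

timers = Timers()
timers.start("forward")
time.sleep(0.01)
timers.stop("forward")
```

Accumulating across calls (rather than overwriting) lets the same timer cover every microbatch in a step, so a single `log()` at iteration end reports totals per region.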