v26.02

sudostock released this 24 Mar 16:23

1b172ed

Added

B300 support:
- Pretrain recipes: Llama 3.1, DeepSeek V3, Nemotron-H, Qwen3
- NCCL benchmark
- CPU overhead microbenchmark
GPT-OSS pretrain recipe.
DeepSeek V3 Torchtitan FP8 support for GB300 and GB200.
DeepSeek V3 proxy models for 64 GB300/GB200 GPUs.
System info script for IB, container, and enroot diagnostics.
llmb-run archive command to package experiment logs into a tarball.
Exemplar program documentation and tooling.

Changed

Updated recipes to NeMo 26.02.00 where applicable.
Llama3 LoRA finetuning ported to Megatron Bridge.
Torchtitan optimizations for DeepSeek V3.
Centralized peak throughput (TFLOP/GPU) as primary performance metric in READMEs.
Removed FP8 support for Qwen3 235B on GB200.

Removed

Run:ai support.

Known Issues

Recipes using the NeMo 26.02.00 container will not work with EFA; see the Known Issues section of the README for a workaround.
DeepSeek V3 on EFA clusters may encounter connectivity issues.
