v26.02

@sudostock released this 24 Mar 16:23
· 4 commits to main since this release
1b172ed

Added

  • B300 support
    • Pretrain recipes: Llama 3.1, DeepSeek V3, Nemotron-H, Qwen3
    • NCCL benchmark
    • CPU overhead microbenchmark
  • GPT-OSS pretrain recipe.
  • DeepSeek V3 Torchtitan FP8 support for GB300 and GB200.
  • DeepSeek V3 proxy models for 64 GB300/GB200 GPUs.
  • System info script for IB, container, and enroot diagnostics.
  • llmb-run archive command to package experiment logs into a tarball.
  • Exemplar program documentation and tooling.

Changed

  • Updated recipes to NeMo 26.02.00 where applicable.
  • Llama3 LoRA finetuning ported to Megatron Bridge.
  • Torchtitan optimizations for DeepSeek V3.
  • Centralized peak throughput (TFLOP/GPU) as primary performance metric in READMEs.
  • Removed FP8 support for Qwen3 235B on GB200.

Removed

  • Run:ai support.

Known Issues

  • Recipes using the NeMo 26.02.00 container will not work with EFA; see the Known Issues section of the README for a workaround.
  • DeepSeek V3 on EFA clusters may encounter connectivity issues.