Skip to content

Dynamo v1.3.0-minimax-m3-dev.1

Pre-release
Pre-release

Choose a tag to compare

@dagil-nvidia dagil-nvidia released this 12 Jun 16:01
· 1 commit to release/1.3.0-minimax-m3-dev.1 since this release
406bcc6

Release Notes

Dynamo v1.3.0-minimax-m3-dev.1 is an experimental dev build giving an early look at MiniMax-M3 support on Dynamo's vLLM, SGLang, and TensorRT-LLM runtimes. It is not recommended for production — features may be incomplete and APIs, behaviors, and defaults may change before the stable release. Use it for evaluation, testing, and early feedback only.

Summary

Dynamo v1.3.0-minimax-m3-dev.1 adds day-0 serving for MiniMax-M3 (MiniMaxAI/MiniMax-M3) across the vLLM, SGLang, and TensorRT-LLM runtimes — bringing MiniMax-M3's reasoning and tool-call format to Dynamo's OpenAI-compatible surface and per-backend deployment recipes. Backend model support comes from vLLM (vllm-project/vllm#45381), SGLang (sgl-project/sglang#27944), and TensorRT-LLM (NVIDIA/TensorRT-LLM#15292).

Branch: release/1.3.0-minimax-m3-dev.1, cut from main commit 824458ce (#10497, 2026-06-09)

Container Images

Backend Arch Image
vLLM (CUDA 13.0) multi-arch (amd64 + arm64) nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.3.0-minimax-m3-dev.1
SGLang (CUDA 13.0) multi-arch (amd64 + arm64) nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.3.0-minimax-m3-dev.1
TensorRT-LLM (CUDA 13.1) multi-arch (amd64 + arm64) nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-minimax-m3-dev.1

Backend Versions

Backend Source CUDA Python Notes
vLLM Dynamo-built runtime + MiniMax-M3 support (vllm#45381) 13.0 3.12
SGLang Dynamo-built runtime + MiniMax-M3 support (sglang#27944) 13.0 3.12
TensorRT-LLM Dynamo-built runtime 13.1 3.12

Models

Model HuggingFace ID Modalities Notes
MiniMax-M3 MiniMaxAI/MiniMax-M3 text + vision MoE, ~428B total / ~23B activated; context up to 1M tokens (MiniMax Sparse Attention)

About MiniMax-M3

MiniMax-M3 is MiniMax's open-weight frontier model — a Mixture-of-Experts model with roughly 428B total and 23B activated parameters, natively multimodal across text and vision, with context windows up to 1M tokens via MiniMax Sparse Attention (MSA). It targets coding and agentic workloads: autonomous task decomposition, tool invocation, and multi-step reasoning.

This v1.3.0-minimax-m3-dev.1 build serves MiniMax-M3 through Dynamo's OpenAI-compatible surface on three runtimes — vLLM, SGLang, and TensorRT-LLM — with the model's reasoning and tool-call format wired into Dynamo's chat-completions path.

Full Changelog

Reasoning & Tool Calling

  • Tool-Call Parser: Added a MiniMax-M3 XML tool-call parser and wired the model's reasoning and tool-call format into Dynamo's OpenAI-compatible chat-completions surface, with validation and jail handling updates (25e7a52).

Model Enablement

  • Generation Config: Fixed missing generation_config.json handling so MiniMax-M3 loads its default generation settings (25e7a52).

Recipes

  • MiniMax-M3 Deployment Recipes: Added per-backend aggregated deployment recipes for vLLM, SGLang, and TensorRT-LLM (BF16 and MXFP8 variants) under recipes/minimax-m3/.

Upstream Model Support