Dynamo v1.3.0-minimax-m3-dev.1
Pre-releaseRelease Notes
Dynamo v1.3.0-minimax-m3-dev.1 is an experimental dev build giving an early look at MiniMax-M3 support on Dynamo's vLLM, SGLang, and TensorRT-LLM runtimes. It is not recommended for production — features may be incomplete and APIs, behaviors, and defaults may change before the stable release. Use it for evaluation, testing, and early feedback only.
Summary
Dynamo v1.3.0-minimax-m3-dev.1 adds day-0 serving for MiniMax-M3 (MiniMaxAI/MiniMax-M3) across the vLLM, SGLang, and TensorRT-LLM runtimes — bringing MiniMax-M3's reasoning and tool-call format to Dynamo's OpenAI-compatible surface and per-backend deployment recipes. Backend model support comes from vLLM (vllm-project/vllm#45381), SGLang (sgl-project/sglang#27944), and TensorRT-LLM (NVIDIA/TensorRT-LLM#15292).
Branch: release/1.3.0-minimax-m3-dev.1, cut from main commit 824458ce (#10497, 2026-06-09)
Container Images
| Backend | Arch | Image |
|---|---|---|
| vLLM (CUDA 13.0) | multi-arch (amd64 + arm64) |
nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.3.0-minimax-m3-dev.1 |
| SGLang (CUDA 13.0) | multi-arch (amd64 + arm64) |
nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.3.0-minimax-m3-dev.1 |
| TensorRT-LLM (CUDA 13.1) | multi-arch (amd64 + arm64) |
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-minimax-m3-dev.1 |
Backend Versions
| Backend | Source | CUDA | Python | Notes |
|---|---|---|---|---|
| vLLM | Dynamo-built runtime + MiniMax-M3 support (vllm#45381) | 13.0 | 3.12 | — |
| SGLang | Dynamo-built runtime + MiniMax-M3 support (sglang#27944) | 13.0 | 3.12 | — |
| TensorRT-LLM | Dynamo-built runtime | 13.1 | 3.12 | — |
Models
| Model | HuggingFace ID | Modalities | Notes |
|---|---|---|---|
| MiniMax-M3 | MiniMaxAI/MiniMax-M3 |
text + vision | MoE, ~428B total / ~23B activated; context up to 1M tokens (MiniMax Sparse Attention) |
About MiniMax-M3
MiniMax-M3 is MiniMax's open-weight frontier model — a Mixture-of-Experts model with roughly 428B total and 23B activated parameters, natively multimodal across text and vision, with context windows up to 1M tokens via MiniMax Sparse Attention (MSA). It targets coding and agentic workloads: autonomous task decomposition, tool invocation, and multi-step reasoning.
This v1.3.0-minimax-m3-dev.1 build serves MiniMax-M3 through Dynamo's OpenAI-compatible surface on three runtimes — vLLM, SGLang, and TensorRT-LLM — with the model's reasoning and tool-call format wired into Dynamo's chat-completions path.
Full Changelog
Reasoning & Tool Calling
- Tool-Call Parser: Added a MiniMax-M3 XML tool-call parser and wired the model's reasoning and tool-call format into Dynamo's OpenAI-compatible chat-completions surface, with validation and jail handling updates (
25e7a52).
Model Enablement
- Generation Config: Fixed missing
generation_config.jsonhandling so MiniMax-M3 loads its default generation settings (25e7a52).
Recipes
- MiniMax-M3 Deployment Recipes: Added per-backend aggregated deployment recipes for vLLM, SGLang, and TensorRT-LLM (BF16 and MXFP8 variants) under
recipes/minimax-m3/.
Upstream Model Support
- vLLM: MiniMax-M3 model support in vllm-project/vllm#45381.
- SGLang: MiniMax-M3 model support in sgl-project/sglang#27944; warm-steady-state benchmark and docs in sgl-project/sglang#28062.
- TensorRT-LLM: MiniMax-M3 model support in NVIDIA/TensorRT-LLM#15292.