Release Notes

Dynamo v1.3.0-minimax-m3-dev.1 is an experimental dev build giving an early look at MiniMax-M3 support on Dynamo's vLLM, SGLang, and TensorRT-LLM runtimes. It is not recommended for production — features may be incomplete and APIs, behaviors, and defaults may change before the stable release. Use it for evaluation, testing, and early feedback only.

Summary

Dynamo v1.3.0-minimax-m3-dev.1 adds day-0 serving for MiniMax-M3 (MiniMaxAI/MiniMax-M3) across the vLLM, SGLang, and TensorRT-LLM runtimes — bringing MiniMax-M3's reasoning and tool-call format to Dynamo's OpenAI-compatible surface and per-backend deployment recipes. Backend model support comes from vLLM (vllm-project/vllm#45381), SGLang (sgl-project/sglang#27944), and TensorRT-LLM (NVIDIA/TensorRT-LLM#15292).

Branch: release/1.3.0-minimax-m3-dev.1, cut from main commit 824458ce (#10497, 2026-06-09)

Container Images

Backend	Arch	Image
vLLM (CUDA 13.0)	multi-arch (`amd64` + `arm64`)	`nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.3.0-minimax-m3-dev.1`
SGLang (CUDA 13.0)	multi-arch (`amd64` + `arm64`)	`nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.3.0-minimax-m3-dev.1`
TensorRT-LLM (CUDA 13.1)	multi-arch (`amd64` + `arm64`)	`nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-minimax-m3-dev.1`

Backend Versions

Backend	Source	CUDA	Python	Notes
vLLM	Dynamo-built runtime + MiniMax-M3 support (vllm#45381)	13.0	3.12	—
SGLang	Dynamo-built runtime + MiniMax-M3 support (sglang#27944)	13.0	3.12	—
TensorRT-LLM	Dynamo-built runtime	13.1	3.12	—

Models

Model	HuggingFace ID	Modalities	Notes
MiniMax-M3	`MiniMaxAI/MiniMax-M3`	text + vision	MoE, ~428B total / ~23B activated; context up to 1M tokens (MiniMax Sparse Attention)

About MiniMax-M3

MiniMax-M3 is MiniMax's open-weight frontier model — a Mixture-of-Experts model with roughly 428B total and 23B activated parameters, natively multimodal across text and vision, with context windows up to 1M tokens via MiniMax Sparse Attention (MSA). It targets coding and agentic workloads: autonomous task decomposition, tool invocation, and multi-step reasoning.

This v1.3.0-minimax-m3-dev.1 build serves MiniMax-M3 through Dynamo's OpenAI-compatible surface on three runtimes — vLLM, SGLang, and TensorRT-LLM — with the model's reasoning and tool-call format wired into Dynamo's chat-completions path.

Full Changelog

Reasoning & Tool Calling

Tool-Call Parser: Added a MiniMax-M3 XML tool-call parser and wired the model's reasoning and tool-call format into Dynamo's OpenAI-compatible chat-completions surface, with validation and jail handling updates (25e7a52).

Model Enablement

Generation Config: Fixed missing generation_config.json handling so MiniMax-M3 loads its default generation settings (25e7a52).

Recipes

MiniMax-M3 Deployment Recipes: Added per-backend aggregated deployment recipes for vLLM, SGLang, and TensorRT-LLM (BF16 and MXFP8 variants) under recipes/minimax-m3/.

Upstream Model Support

vLLM: MiniMax-M3 model support in vllm-project/vllm#45381.
SGLang: MiniMax-M3 model support in sgl-project/sglang#27944; warm-steady-state benchmark and docs in sgl-project/sglang#28062.
TensorRT-LLM: MiniMax-M3 model support in NVIDIA/TensorRT-LLM#15292.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamo v1.3.0-minimax-m3-dev.1

Choose a tag to compare

Sorry, something went wrong.