AIConfigurator v0.3.0
AIConfigurator helps users find optimal configurations for deploying LLM inference workloads with the Dynamo backend in distributed, multi-GPU environments, such as those built on NVIDIA H100, H200, GB200, B200, or A100 GPUs, or on future hardware.
Currently, AIConfigurator supports NVIDIA TensorRT-LLM as the primary inference engine, with limited support for SGLang.
Release Highlights
AIConfigurator 0.3.0 significantly expands hardware support, framework compatibility, and model coverage. This release adds support for three new GPUs (GB200, B200, and A100), introduces experimental SGLang framework integration, and expands the model library with new Qwen3 variants and GPT-OSS models.
Features and Improvements
1. New Hardware Support
- Added GB200 GPU support (by @YijiaZhao in #32)
- Added B200 GPU support with TensorRT-LLM 1.0.0rc6 data (by @tianhaox in #36)
- Added A100 GPU support (by @simone-chen in #55)
2. New Framework Support: SGLang and Wide-EP
Note: SGLang support is currently limited and experimental.
- Added SGLang GEMM collector and performance data (by @Atream in #28)
- Added SGLang MLA-BMM collector and performance data (by @Atream in #29)
- Added SGLang MLA collector and performance data (by @Atream in #31)
- Added SGLang fused MoE Triton collector (by @Atream in #39)
- Added support for disaggregated DeepSeek in SGLang (by @AichenF in #54)
3. Expanded Model Support
- Added several Qwen3 models (by @tianhaox in #30)
- Added GPT-OSS support in AIConfigurator SDK (by @Arsene12358 in #56)
4. Configuration Generation and Evaluation
- Refactored generator as a standalone module for improved modularity (by @Ethan-ES in #40)
- Added new CLI and SDK support for presets in search space configuration (by @tianhaox in #44)
- Added AIPerf integration for performance evaluation (by @Ethan-ES in #57)
- Improved aggregated and disaggregated modeling and performance (by @tianhaox in #45)
5. Collector Improvements
- Enhanced collector to support data collection for windowed attention and additional MoE configurations (by @Arsene12358 in #33)
Bug Fixes
- Fixed LICENSE file (by @saturley-hall in #21)
- Added allowed path workspace configuration (by @tianhaox in #23)
- Updated MoE tuning logic (by @YijiaZhao in #19)
- Updated Gradio version for compatibility (by @saturley-hall in #35)
- Improved error handling for database loading failures (by @tianhaox in #37, #38)
- Enhanced Kubernetes support with corresponding documentation (by @Ethan-ES in #50)
- Changed the nvidia-smi clock-locking command from -lgc (lock GPU clocks) to -ac (set application clocks) (by @LyleLuo in #49)
- Excluded FP8 from MLA generation post-processing test cases for Ampere architecture (by @simone-chen in #52)
- Fixed TensorRT-LLM 1.0.0 collector compatibility (by @tianhaox in #48)
- Improved tensor initialization so that tensors are created directly on the device (by @ilyasher in #51)
- Enabled SDK tests in CI pipeline (by @ilyasher in #46)
Documentation
- Added guidance for adding new models (by @tianhaox in #26)
- Added nvidia-smi clock-locking script to README (by @jasonqinzhou in #47)
- Added a Git LFS pull step to the installation instructions for downloading the full data files (by @saturley-hall in #71)
- Enhanced A100 documentation (by @saturley-hall in #70)
New Contributors
- @Arsene12358 made their first contribution in #33
- @ilyasher made their first contribution in #41
- @biswapanda made their first contribution in #42
- @LyleLuo made their first contribution in #49
- @AichenF made their first contribution in #54
For the complete list of changes, see the full changelog.