AIConfigurator Release v0.4.0
AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments.
AIConfigurator 0.4.0 adds extensive support for the SGLang backend, covering both the DeepSeek WideEP path and the regular path, with support for dense and MoE models. It also adds dense model support for the vLLM backend. With this release, AIConfigurator supports all three major backends: TensorRT-LLM, SGLang, and vLLM.
Release Highlights
AIConfigurator 0.4.0 significantly expands backend support and now covers all three major backends. This release introduces support for L40S GPUs, Qwen3 30B A3B MoE models, and direct HuggingFace model loading via --hf_id.
It also adds prefix cache modeling, which simulates workloads with system prompts or prefix cache hits, and unifies the SGLang WideEP and regular paths for better maintainability.
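To illustrate what prefix cache modeling accounts for, here is a minimal sketch of the arithmetic, not AIConfigurator's actual implementation. The names `PrefillRequest`, `tokens_to_compute`, and `attention_pairs` are hypothetical and do not come from the AIConfigurator codebase. The sketch assumes that tokens covered by a cached prefix skip the prefill forward pass, while the remaining tokens still attend to the cached keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefillRequest:
    """Hypothetical request description; not an AIConfigurator type."""

    input_len: int   # total prompt tokens, including any shared prefix
    prefix_len: int  # tokens assumed to be served from the prefix cache

    def __post_init__(self) -> None:
        if not 0 <= self.prefix_len <= self.input_len:
            raise ValueError("prefix_len must be between 0 and input_len")


def tokens_to_compute(req: PrefillRequest) -> int:
    """Prompt tokens that still need a prefill forward pass."""
    return req.input_len - req.prefix_len


def attention_pairs(req: PrefillRequest) -> int:
    """Causal (query, key) pairs evaluated during prefill.

    A new token at 0-based position p attends to p + 1 keys, including
    the cached prefix keys, so a cache hit removes the prefix queries
    but not the reads of the prefix keys.
    """
    new = tokens_to_compute(req)
    # Sum of (p + 1) for p in [prefix_len, input_len).
    return new * req.prefix_len + new * (new + 1) // 2


if __name__ == "__main__":
    cold = PrefillRequest(input_len=4000, prefix_len=0)
    warm = PrefillRequest(input_len=4000, prefix_len=3000)
    for name, req in (("cold", cold), ("warm", warm)):
        print(name, tokens_to_compute(req), attention_pairs(req))
```

Under these assumptions, a 3000-token system prompt on a 4000-token input leaves 1000 tokens to prefill, but attention work shrinks by less than 4x because the new tokens still read the cached keys.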
Features and Improvements
1. New Hardware Support
- Added L40S GPU support
2. Framework Support
- Added SGLang attention collector (by @Atream in #73)
- Enhanced allreduce data collector to enable data collection for vLLM backend (by @Arsene12358 in #87)
- Added SGLang disagg support (by @jasonqinzhou in #84)
- Added SGLang agg support (by @jasonqinzhou in #93)
- Added vLLM disagg support (by @ilyasher in #89)
- Added vLLM agg support (by @ilyasher in #98)
- Unified SGLang WideEP and regular paths (by @tianhaox in #99)
3. Expanded Model Support
- Supported using --hf_id as an alternative to --model (by @simone-chen in #86)
- Added Qwen3 30B A3B MoE model support (by @jasonqinzhou in #58)
4. Modeling and Improvements
- Added prefix length modeling support (by @tianhaox in #77)
- Added version subcommand (by @jasonqinzhou in #72)
5. Build, CI and Test
- Added linting and formatting with Ruff, created a developer guide (by @anish-shanbhag in #65)
- Added A100 to e2e test (by @simone-chen in #64)
Bug Fixes
- Added supported systems to CLI help (by @jasonqinzhou in #63)
- Fixed MLP context state (by @AichenF in #78)
- Moved Gradio to optional dependencies (by @Arsene12358 in #90)
- Fixed LLAMA2_7B and LLAMA2_13B errors (by @ilyasher in #97)
- Fixed webapp compatibility with SGLang and vLLM (by @tianhaox in #100)
- Fixed minor collector problems (by @tianhaox in #101)
- Enhanced log file collection with Path and error handling (by @xutizhou in #92)
Documentation
- Updated README to include A100 SXM in support matrix (by @simone-chen in #62)
- Added a git lfs pull step before installing from source so that the full data files are downloaded (by @cr7258 in #69)
- Added more A100 docs (by @jasonqinzhou in #67)