AIConfigurator 0.4.0

AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments. AIConfigurator 0.4.0 adds extensive support for the SGLang backend, covering both the DeepSeek WideEP path and the regular path, with support for dense and MoE models. It also adds dense model support for the vLLM backend. With this release, AIConfigurator supports all three major backends: TensorRT-LLM, SGLang, and vLLM.

Release Highlights

AIConfigurator 0.4.0 significantly expands backend support and now covers all three major backends. This release also introduces support for L40S GPUs, the Qwen3 30B A3B MoE model, and direct HuggingFace model loading via --hf_id.

It also adds prefix cache modeling to simulate workloads with system prompts or prefix cache hits, and unifies the SGLang WideEP and regular paths for better maintainability.
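To illustrate what prefix cache modeling captures, here is a minimal Python sketch of the basic accounting. It is not AIConfigurator's implementation: the `Request` class, its field names, and both helper functions are hypothetical and exist only for illustration. The sketch assumes that tokens covered by a cached prefix (for example, a shared system prompt) do not need to be recomputed during prefill, so only the remaining suffix counts as new prefill work. The new tokens still attend to the cached keys and values, so real prefill cost does not fall purely in proportion to this count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request shape; field names are illustrative only."""

    input_len: int   # total prompt tokens, including any shared prefix
    prefix_len: int  # tokens assumed to be served from the prefix cache


def prefill_tokens_to_compute(req: Request) -> int:
    """Return the prompt tokens that still need prefill after a cache hit."""
    if not 0 <= req.prefix_len <= req.input_len:
        raise ValueError("prefix_len must be between 0 and input_len")
    return req.input_len - req.prefix_len


def cache_hit_ratio(req: Request) -> float:
    """Return the fraction of the prompt covered by the cached prefix."""
    return req.prefix_len / req.input_len if req.input_len else 0.0


if __name__ == "__main__":
    # A 4000-token prompt whose first 1000 tokens are a cached system prompt.
    req = Request(input_len=4000, prefix_len=1000)
    print(prefill_tokens_to_compute(req), cache_hit_ratio(req))
```

Under these assumptions, a longer shared prefix shrinks the prefill work per request while leaving the decode phase unchanged, which is why workloads with system prompts can call for a different deployment configuration than workloads without them.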

Features and Improvements

1. New Hardware Support

Added L40S support for TensorRT-LLM (by @ilyasher in #91)

2. Framework Support

Added SGLang attention collector (by @Atream in #73)
Enhanced allreduce data collector to enable data collection for vLLM backend (by @Arsene12358 in #87)
Added SGLang disagg support (by @jasonqinzhou in #84)
Added SGLang agg support (by @jasonqinzhou in #93)
Added vLLM disagg support (by @ilyasher in #89)
Added vLLM agg support (by @ilyasher in #98)
Unified SGLang WideEP and regular paths (by @tianhaox in #99)

3. Expanded Model Support

Added support for --hf_id as an alternative to --model (by @simone-chen in #86)
Added Qwen3 30B A3B MoE model support (by @jasonqinzhou in #58)

4. Modeling and Improvements

Added prefix length modeling support (by @tianhaox in #77)
Added version subcommand (by @jasonqinzhou in #72)

5. Build, CI and Test

Added linting and formatting with Ruff, created a developer guide (by @anish-shanbhag in #65)
Added A100 to e2e test (by @simone-chen in #64)

Bug Fixes

Added supported systems to CLI help (by @jasonqinzhou in #63)
Fixed MLP context state (by @AichenF in #78)
Moved Gradio to optional dependencies (by @Arsene12358 in #90)
Fixed LLAMA2_7B and LLAMA2_13B errors (by @ilyasher in #97)
Fixed webapp compatibility with SGLang and vLLM (by @tianhaox in #100)
Fixed minor collector problems (by @tianhaox in #101)
Enhanced log file collection with Path and error handling (by @xutizhou in #92)

Documentation

Updated README to include A100 SXM in support matrix (by @simone-chen in #62)
Added a git lfs pull step before installing from source so that the full data files are downloaded (by @cr7258 in #69)
Added more A100 docs (by @jasonqinzhou in #67)

New Contributors

@cr7258 made their first contribution in #69
@anish-shanbhag made their first contribution in #65
@xutizhou made their first contribution in #92
