AIConfigurator Release v0.4.0

@saturley-hall saturley-hall released this 24 Nov 17:01
3a4f56d

AIConfigurator 0.4.0

AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments. AIConfigurator 0.4.0 adds extensive support for the SGLang backend, covering both the DeepSeek WideEP path and the regular path, with support for dense and MoE models. It also adds dense model support for the vLLM backend. With this release, AIConfigurator supports all three major backends: TensorRT-LLM, SGLang, and vLLM.

Release Highlights

AIConfigurator 0.4.0 significantly expands backend support, achieving coverage for all three major backends. This release introduces support for L40S GPUs, Qwen3 30B A3B MoE models, and direct HuggingFace model loading via --hf_id.

Additionally, it adds prefix cache modeling support to simulate workloads with system prompts or prefix cache hits, and unifies SGLang paths for better maintainability.
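As a rough illustration of what prefix cache modeling means for a workload, the sketch below computes how many prompt tokens the prefill phase still has to process when a leading portion of the prompt (for example, a shared system prompt) is served from the prefix cache. This is not AIConfigurator's API or its internal model: the `Workload` class, its field names, and both helper functions are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical per-request workload description (not an AIConfigurator type)."""
    input_tokens: int   # total prompt length per request
    prefix_tokens: int  # leading tokens assumed to hit the prefix cache


def prefill_tokens_to_compute(w: Workload) -> int:
    """Return the prompt tokens that prefill must still process after cache hits."""
    if not 0 <= w.prefix_tokens <= w.input_tokens:
        raise ValueError("prefix_tokens must be between 0 and input_tokens")
    return w.input_tokens - w.prefix_tokens


def prefix_hit_rate(w: Workload) -> float:
    """Return the fraction of the prompt served from the prefix cache."""
    if w.input_tokens == 0:
        return 0.0
    return w.prefix_tokens / w.input_tokens


# Example: a 4000-token prompt whose first 1000 tokens are a cached system prompt.
workload = Workload(input_tokens=4000, prefix_tokens=1000)
remaining = prefill_tokens_to_compute(workload)
hit_rate = prefix_hit_rate(workload)
```

Under this simplified assumption, only the uncached tokens contribute to prefill compute, while the full prompt length still determines the KV cache footprint during decoding.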

Features and Improvements

1. New Hardware Support

2. Framework Support

3. Expanded Model Support

4. Modeling and Improvements

5. Build, CI and Test

Bug Fixes

Documentation

  • Updated README to include A100 SXM in support matrix (by @simone-chen in #62)
  • Added a git lfs pull step before installing from source so that the full data files are downloaded (by @cr7258 in #69)
  • Added more A100 docs (by @jasonqinzhou in #67)

New Contributors