AIConfigurator v0.3.0

@saturley-hall saturley-hall released this 24 Oct 18:03
8025c3b

AIConfigurator 0.3.0

AIConfigurator helps users find optimal configurations for deploying LLM inference workloads with the Dynamo backend in distributed, multi-GPU environments, such as those built on NVIDIA H100, H200, GB200, B200, or A100 GPUs, or on future hardware.

Currently AIConfigurator supports NVIDIA TensorRT-LLM as the primary inference engine, with limited support for SGLang.

Release Highlights

AIConfigurator 0.3.0 significantly expands hardware support, framework compatibility, and model coverage. This release adds support for multiple new GPU architectures, introduces SGLang framework integration, and expands the model library with new Qwen3 variants and GPT-OSS models.

Features and Improvements

1. New Hardware Support

2. New Framework Support: SGLang and Wide-EP

Note: SGLang support is currently limited and experimental.

  • Added SGLang GEMM collector and performance data (by @Atream in #28)
  • Added SGLang MLA-BMM collector and performance data (by @Atream in #29)
  • Added SGLang MLA collector and performance data (by @Atream in #31)
  • Added SGLang fused MoE Triton collector (by @Atream in #39)
  • Added support for disaggregated DeepSeek in SGLang (by @AichenF in #54)

3. Expanded Model Support

4. Configuration Generation and Evaluation

  • Refactored generator as a standalone module for improved modularity (by @Ethan-ES in #40)
  • Added CLI and SDK support for presets in the search space configuration (by @tianhaox in #44)
  • Added AIPerf integration for performance evaluation (by @Ethan-ES in #57)
  • Improved aggregated and disaggregated modeling and performance (by @tianhaox in #45)

5. Collector Improvements

  • Enhanced collector to support data collection for windowed attention and additional MoE configurations (by @Arsene12358 in #33)

Bug Fixes

  • Fixed LICENSE file (by @saturley-hall in #21)
  • Added allowed path workspace configuration (by @tianhaox in #23)
  • Updated MoE tuning logic (by @YijiaZhao in #19)
  • Updated Gradio version for compatibility (by @saturley-hall in #35)
  • Improved error handling for database loading failures (by @tianhaox in #37, #38)
  • Enhanced Kubernetes support with corresponding documentation (by @Ethan-ES in #50)
  • Changed the nvidia-smi clock option from -lgc (lock GPU clocks) to -ac (application clocks) (by @LyleLuo in #49)
  • Excluded FP8 from MLA generation post-processing test cases for Ampere architecture (by @simone-chen in #52)
  • Fixed TensorRT-LLM 1.0.0 collector compatibility (by @tianhaox in #48)
  • Improved tensor initialization to occur directly on device (by @ilyasher in #51)
  • Enabled SDK tests in CI pipeline (by @ilyasher in #46)
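
For readers unfamiliar with the two nvidia-smi options named in the clock fix above, the sketch below builds both command lines side by side. It is only an illustration and is not taken from the AIConfigurator collector: the helper names and the clock values are hypothetical, and it constructs the argument lists without running them. Supported clock pairs differ per GPU and can be listed with `nvidia-smi -q -d SUPPORTED_CLOCKS`.

```python
from typing import List


def lock_gpu_clocks_cmd(min_mhz: int, max_mhz: int, gpu_index: int = 0) -> List[str]:
    """Build `nvidia-smi -lgc <min,max>`: lock the GPU core clock to a MHz range."""
    return ["nvidia-smi", "-i", str(gpu_index), "-lgc", f"{min_mhz},{max_mhz}"]


def application_clocks_cmd(mem_mhz: int, graphics_mhz: int, gpu_index: int = 0) -> List[str]:
    """Build `nvidia-smi -ac <memory,graphics>`: set application clocks in MHz."""
    return ["nvidia-smi", "-i", str(gpu_index), "-ac", f"{mem_mhz},{graphics_mhz}"]


if __name__ == "__main__":
    # Placeholder clock values (assumptions); substitute a pair your GPU reports as supported.
    print(" ".join(lock_gpu_clocks_cmd(1980, 1980)))
    print(" ".join(application_clocks_cmd(2619, 1980)))
```

Note the difference in arguments: -lgc takes a minimum and maximum core clock, while -ac takes a memory clock followed by a graphics clock. Both typically require administrator privileges to apply.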

Documentation

New Contributors

For the complete list of changes, see the full changelog.