AIConfigurator 0.2.0

AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments such as those using NVIDIA H100, H200, or future hardware with the Dynamo backend.

Currently, AIConfigurator supports NVIDIA TensorRT-LLM as its inference engine.

Release Highlights

AIConfigurator 0.2.0 brings several new features, improvements, and important fixes to enhance configuration workflows and automation.

Features and Improvements

1. Automation

Added automation evaluation support (by @tianhaox in #5)

2. Collector improvement

Mixture-of-Experts (MoE) collector now supports autotuning for improved efficiency (by @YijiaZhao in #11)

3. Dynamo upgrade

Upgraded to Dynamo 0.5.0 (by @Ethan-ES in #13)

Bug Fixes

Switched to the torch flow collector and added more default memory configuration options (by @tianhaox in #7)
Improved performance alignment logic and reliability (by @tianhaox in #10)
Enhanced mixture-of-experts (MoE) support: added power law handling and improved solver calculation for generative attention (by @tianhaox in #15)
Added safe directory creation to mitigate security risk and clarified error handling (by @tianhaox in #16)
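As an illustration of what "safe directory creation" can mean in general, here is a minimal Python sketch. It is not the code from #16: the function name `safe_mkdir`, its parameters, and the choice of checks (path-traversal rejection, owner-only permissions) are assumptions made for this example only.

```python
from pathlib import Path


def safe_mkdir(base_dir: str, sub_path: str, mode: int = 0o700) -> Path:
    """Create base_dir/sub_path, refusing paths that escape base_dir.

    Hypothetical helper for illustration; not part of AIConfigurator's API.
    """
    base = Path(base_dir).resolve()
    target = (base / sub_path).resolve()
    # Reject traversal such as "../escape" or absolute paths outside base.
    if target != base and base not in target.parents:
        raise ValueError(f"refusing to create directory outside {base}: {target}")
    # exist_ok=True avoids a separate check-then-create step.
    # mode applies to the final directory only (subject to the umask);
    # intermediate parents get default permissions.
    target.mkdir(mode=mode, parents=True, exist_ok=True)
    return target
```

Calling `safe_mkdir("/tmp/out", "results/run1")` would create the nested directory, while `safe_mkdir("/tmp/out", "../escape")` raises `ValueError` instead of writing outside the base directory.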

Documentation

Improved README (https://github.com/ai-dynamo/aiconfigurator/blob/main/README.md) for clarity and precision (by @nealvaidya in #9)

New Contributors

@nealvaidya made their first contribution in #9
@Ethan-ES made their first contribution in #13

For the complete list of changes, see the full changelog.
