AIConfigurator Release v0.2.0
AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads with the Dynamo backend in distributed, multi-GPU environments, such as those using NVIDIA H100, H200, or future hardware.
AIConfigurator currently supports NVIDIA TensorRT-LLM as its inference engine.
Release Highlights
AIConfigurator 0.2.0 brings several new features, improvements, and important fixes to enhance configuration workflows and automation.
Features and Improvements
1. Automation
2. Collector improvements
- Mixture-of-Experts (MoE) collector now supports autotuning for improved efficiency (by @YijiaZhao in #11)
3. Dynamo upgrade
Bug Fixes
- Switched to the torch flow collector and added more default memory configuration options (by @tianhaox in #7)
- Improved performance alignment logic and reliability (by @tianhaox in #10)
- Enhanced mixture-of-experts (MoE) support: added power-law handling and improved the solver calculation for generative attention (by @tianhaox in #15)
- Added safe directory creation to mitigate a security risk, and clarified error handling (by @tianhaox in #16)
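The release notes do not describe how safe directory creation was implemented in #16. As a general illustration only, the sketch below shows one common pattern for creating an output directory defensively: create it idempotently, refuse to reuse a symlink planted at the target path, and raise a clear error when a regular file already occupies the path. The helper name `safe_makedirs` and its behavior are hypothetical and are not AIConfigurator's actual API.

```python
import os
import stat
import tempfile


def safe_makedirs(path: str, mode: int = 0o700) -> str:
    """Create `path` (and any missing parents), refusing symlinks and non-directories.

    Hypothetical helper for illustration; not AIConfigurator's implementation.
    """
    abs_path = os.path.abspath(path)
    try:
        # exist_ok=True avoids a race between a separate existence check and mkdir.
        # Since Python 3.7, `mode` applies only to the leaf directory, and the
        # effective permissions are still filtered by the process umask.
        os.makedirs(abs_path, mode=mode, exist_ok=True)
    except FileExistsError as exc:
        # Raised when something other than a directory (e.g. a file) occupies the path.
        raise NotADirectoryError(f"path exists and is not a directory: {abs_path}") from exc

    # lstat does not follow symlinks, so a symlink pointing at a directory is caught here.
    st = os.lstat(abs_path)
    if stat.S_ISLNK(st.st_mode):
        raise PermissionError(f"refusing to use a symlink as a directory: {abs_path}")
    if not stat.S_ISDIR(st.st_mode):
        raise NotADirectoryError(f"path exists and is not a directory: {abs_path}")
    return abs_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = safe_makedirs(os.path.join(tmp, "results", "run1"))
        print(os.path.isdir(out_dir))
```

Checking with `os.lstat` after creation, rather than `os.path.isdir` (which follows symlinks), is what lets this sketch distinguish a real directory from a symlink to one.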
Documentation
- Improved README (https://github.com/ai-dynamo/aiconfigurator/blob/main/README.md) for clarity and precision (by @nealvaidya in #9)
New Contributors
- @nealvaidya made the first contribution in #9
- @Ethan-ES made the first contribution in #13
For the complete list of changes, see the full changelog.