AIConfigurator Release v0.2.0

@saturley-hall saturley-hall released this 18 Sep 19:20
f3d7bba

AIConfigurator 0.2.0

AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads with the Dynamo backend in distributed, multi-GPU environments, such as those built on NVIDIA H100, H200, or future hardware.

Currently, AIConfigurator supports NVIDIA TensorRT-LLM as the inference engine.

Release Highlights

AIConfigurator 0.2.0 brings several new features, improvements, and important fixes to enhance configuration workflows and automation.

Features and Improvements

1. Automation

  • Added automation evaluation support (by @tianhaox in #5)

2. Collector improvement

  • Mixture-of-experts (MoE) collector now supports autotuning for improved efficiency (by @YijiaZhao in #11)

3. Dynamo upgrade

Bug Fixes

  • Switched to using torch flow collector and added more default memory configuration options (by @tianhaox in #7)
  • Improved performance alignment logic and reliability (by @tianhaox in #10)
  • Enhanced mixture-of-experts (MoE) support: added power law handling and improved solver calculation for generative attention (by @tianhaox in #15)
  • Added safe directory creation to mitigate security risk and clarified error handling (by @tianhaox in #16)
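
The release notes do not describe how #16 implements safe directory creation. As a general illustration of the idea only, the sketch below creates a directory under a fixed base path, rejects paths that resolve outside it, and reports a clear error when the target exists but is not a directory. The function name `safe_mkdir` and its parameters are hypothetical and are not taken from the AIConfigurator code base.

```python
import tempfile
from pathlib import Path


def safe_mkdir(base_dir: str, relative: str, mode: int = 0o700) -> Path:
    """Create `relative` under `base_dir`, refusing paths that escape it.

    Hypothetical helper for illustration; not AIConfigurator's actual API.
    """
    base = Path(base_dir).resolve()
    target = (base / relative).resolve()

    # Reject anything that resolves outside the base directory,
    # e.g. "../x" or an absolute path such as "/etc".
    if target != base and base not in target.parents:
        raise ValueError(f"refusing to create directory outside {base}: {target}")

    try:
        # exist_ok=True tolerates an existing directory, but still raises
        # FileExistsError if the final component is a regular file.
        target.mkdir(mode=mode, parents=True, exist_ok=True)
    except FileExistsError as exc:
        raise NotADirectoryError(f"{target} exists and is not a directory") from exc
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(safe_mkdir(tmp, "results/run1"))
```

Resolving both paths before comparing them means symbolic links and `..` segments are normalized first, so the containment check operates on the real location rather than on the string the caller supplied.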

Documentation

New Contributors

For the complete list of changes, see the full changelog.