
v0.1.0: Initial release of AIConfigurator


saturley-hall released this on 12 Aug 19:10 (commit efcae12)

AIConfigurator is a tool for Dynamo that optimizes disaggregated serving of generative AI models. It automatically finds an optimal deployment configuration by searching thousands of candidates in tens of seconds, helping you achieve better throughput and latency.

Major Features

  • Automated Configuration Search: Searches thousands of deployment configurations to find the optimal one for both disaggregated and aggregated systems, then intelligently chooses between disaggregated and aggregated deployment
  • SLA-based Optimization: Optimizes under TTFT (Time-To-First-Token) and TPOT (Time-Per-Output-Token) constraints to address the throughput@latency problem
  • Dynamo Integration: Integrates seamlessly with Dynamo by automatically generating deployment configurations
  • Multi-framework Support: Compatible with the NVIDIA TensorRT-LLM backend, with an extensible architecture for other frameworks (coming soon)
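These notes do not describe the search internals. As a rough illustration of the throughput@latency idea only, here is a minimal Python sketch that assumes every candidate configuration already carries estimated TTFT, TPOT, and per-GPU throughput. It drops candidates that violate either SLA bound, keeps the highest-throughput survivor for each of the aggregated and disaggregated modes, and then picks the better of the two. `Candidate`, `pick_best`, the mode labels, and all numbers are hypothetical; this is not AIConfigurator's actual API or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record for one evaluated deployment configuration.
    name: str
    mode: str                    # "agg" or "disagg" (hypothetical labels)
    ttft_ms: float               # estimated Time-To-First-Token
    tpot_ms: float               # estimated Time-Per-Output-Token
    tokens_per_s_per_gpu: float  # estimated system throughput per GPU

    @property
    def tokens_per_s_per_user(self) -> float:
        # A TPOT bound is equivalently a minimum per-user generation speed.
        return 1000.0 / self.tpot_ms


def meets_sla(c: Candidate, max_ttft_ms: float, max_tpot_ms: float) -> bool:
    return c.ttft_ms <= max_ttft_ms and c.tpot_ms <= max_tpot_ms


def pick_best(candidates, max_ttft_ms, max_tpot_ms):
    """Return the best SLA-feasible candidate per mode and the overall winner."""
    feasible = [c for c in candidates if meets_sla(c, max_ttft_ms, max_tpot_ms)]
    best_by_mode = {}
    for c in feasible:
        current = best_by_mode.get(c.mode)
        if current is None or c.tokens_per_s_per_gpu > current.tokens_per_s_per_gpu:
            best_by_mode[c.mode] = c
    # Choose aggregated vs. disaggregated by comparing their best feasible options.
    overall = max(best_by_mode.values(),
                  key=lambda c: c.tokens_per_s_per_gpu, default=None)
    return best_by_mode, overall


# Toy numbers for illustration, not measurements.
candidates = [
    Candidate("agg-tp4", "agg", 250.0, 40.0, 900.0),
    Candidate("agg-tp8", "agg", 180.0, 25.0, 700.0),
    Candidate("disagg-1p2d", "disagg", 300.0, 20.0, 1100.0),
    Candidate("disagg-2p1d", "disagg", 150.0, 45.0, 1300.0),
]

best, winner = pick_best(candidates, max_ttft_ms=300.0, max_tpot_ms=30.0)
print(best["agg"].name, best["disagg"].name, winner.name)
```

With a 300 ms TTFT and 30 ms TPOT budget, the highest raw-throughput candidate (`disagg-2p1d`) is rejected for its TPOT, and the sketch settles on `disagg-1p2d`. This mirrors the point of SLA-based optimization: the goal is the best throughput that still meets the latency targets, not the best throughput overall.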

Model and System Support

  • Comprehensive Model Support (model families):
    • GPT
    • LLAMA (2, 3)
    • MoE
    • QWEN
    • DEEPSEEK_V3
    • NEMOTRON
  • System Support: H200 SXM and H100 SXM

User Interfaces

  • Command Line Interface (recommended): A simple CLI that needs only 3 basic arguments for a quick start and configuration generation
  • Web Application: Interactive web interface for advanced configuration tuning and visualization