
AIConfigurator Release v0.5.0


@nv-anants released this 15 Jan 23:05
f178c8a

AIConfigurator 0.5.0

AIConfigurator 0.5.0 improves performance, expands backend support for vLLM and SGLang, and introduces new modeling capabilities, including Power Estimation and Power Law workload distribution. This release also adds a support matrix testing framework.

Release Highlights

This version focuses on performance, with optimizations to the generation engine and database lookups. New hardware data adds L40S support for SGLang, and MoE (Mixture of Experts) support now extends to the vLLM backend. Users can also target End-to-End (E2E) latency and estimate power consumption.

Features and Improvements

1. Performance Optimizations

  • Engine Optimization: Optimized the implementation of run_generation and num_gpu lookups for faster execution (by @anish-shanbhag in #113, #114).
  • Efficient Data Handling: Replaced dataframes with dictionaries for batch operations in InferenceSummary generation and added caching for repeated queries to improve speed (by @anish-shanbhag in #115, #128).
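
The two techniques named above (dictionary lookups in place of dataframes, plus caching of repeated queries) can be illustrated with a minimal sketch. This is not the AIConfigurator implementation: the record layout and the names RECORDS, BY_BATCH, CALLS, and throughput are hypothetical and exist only for this example.

```python
from functools import lru_cache

# Hypothetical per-batch records kept as plain dictionaries
# instead of DataFrame rows (illustrative data, not real measurements).
RECORDS = [
    {"batch_size": 1, "latency_ms": 10.0},
    {"batch_size": 2, "latency_ms": 14.0},
    {"batch_size": 4, "latency_ms": 22.0},
]

# Build the index once so each lookup is a constant-time dictionary access.
BY_BATCH = {rec["batch_size"]: rec for rec in RECORDS}

# Counts how often the function body actually runs (cache misses only).
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def throughput(batch_size: int) -> float:
    """Return requests per second for a batch size; repeated queries hit the cache."""
    CALLS["count"] += 1
    rec = BY_BATCH[batch_size]
    return batch_size / (rec["latency_ms"] / 1000.0)
```

Calling throughput with the same batch size a second time returns the cached value without re-running the body, which is the kind of saving that caching repeated queries is meant to provide.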

2. New Modeling Capabilities

  • Power Estimation: Added support for estimating power consumption of configurations (by @kaim-eng in #153).
  • Workload Distribution: Introduced a 'power_law' option for workload distribution in the CLI and prefill modeling (by @xutizhou in #147, #134).
  • Hybrid Modeling: Added support for hybrid modeling scenarios (by @tianhaox in #125).
  • Latency Targets: Users can now set E2E latency as a target metric (by @tianhaox in #145).
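
The release note does not spell out how the 'power_law' option is defined. As a rough illustration of the general idea, the sketch below splits a workload across buckets (for example, experts in an MoE layer) so that each bucket's share falls off as a power of its rank. The function name power_law_shares and the alpha parameter are assumptions made for this example, not the AIConfigurator API.

```python
from typing import List


def power_law_shares(num_buckets: int, alpha: float = 1.2) -> List[float]:
    """Return normalized shares where bucket k (1-based) gets weight k ** -alpha.

    Hypothetical helper for illustration: alpha = 0 gives a uniform split,
    and larger alpha concentrates more of the work in the first buckets.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    raw = [rank ** -alpha for rank in range(1, num_buckets + 1)]
    total = sum(raw)
    return [weight / total for weight in raw]
```

Compared with a uniform split, a skewed split like this puts more load on a few buckets, which is why the choice of distribution can change modeled latency.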

3. Framework and Hardware Support

4. User Interface

  • Profiler UI: Introduced a new Profiler UI for better visualization and analysis (by @Harrilee in #117).
  • UI Updates: Relocated GPU cost references and updated profiling components (by @Harrilee in #167).

5. Build, CI and Test

  • Testing Framework: Added a comprehensive support matrix testing framework (by @Harrilee in #126).
  • Maintenance: Added a CODEOWNERS file for better repository management (by @Arsene12358 in #109).

Bug Fixes

  • SGLang Fixes: Addressed vulnerabilities in the collector (#108), aligned GEMM quantization methods (#122), and fixed attention collection for the regular path (#123).
  • MoE & Model Fixes: Fixed MoE memory issues and NVFP4 GEMM for TRT-LLM 1.x (#131), removed generation repeat attention (#148), and updated workload distribution logic for MoE/DeepSeek models (#146).
  • CLI & Compatibility: Fixed CLI for GB200 with TP > 4 (#137), improved compatibility with older Python versions by using typing.Union instead of the | operator in type annotations, which requires Python 3.10 or later (#158), and relaxed Pydantic requirements (#161, #162).
  • General Fixes: Fixed team name parsing (#130), updated custom_allreduce file locations (#156, #160), and removed PII from error stack traces (#166).
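
To illustrate the Union change in the compatibility fix above: the X | Y annotation syntax was added in Python 3.10 (PEP 604), and evaluating it on earlier interpreters raises a TypeError. The sketch below shows the portable spelling. The function parse_tp_size is a hypothetical example and is not taken from the AIConfigurator codebase.

```python
from typing import Optional, Union


# Portable form: typing.Union and typing.Optional work on Python versions
# before 3.10, where "int | str | None" would fail when evaluated.
def parse_tp_size(value: Union[int, str, None]) -> Optional[int]:
    """Accept an int, a numeric string, or None; return an int or None."""
    if value is None:
        return None
    return int(value)
```

This matters most for libraries that evaluate annotations at runtime, such as Pydantic models, where deferring evaluation is not enough to avoid the error.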

Documentation

New Contributors

Full Changelog: v0.4.0...v0.5.0