AITune Release v0.3.0

@saturley-hall saturley-hall released this 15 Mar 16:28
· 64 commits to main since this release

Summary

AITune is an open-source (Apache 2.0) inference toolkit, hosted under the ai-dynamo GitHub organization and distributed via PyPI. It is designed for tuning and deploying deep learning models on NVIDIA GPUs, with the aim of improving inference speed and efficiency across AI workloads.

Major Features & Improvements

Tuning Modes

  • Just-in-Time (JIT) Tuning: Zero-code model tuning and inspection controlled through a single import or environment flag. Tunes on the first model call using a single sample, with automatic fallback to TorchInductor when a backend cannot compile a module.
  • Ahead-of-Time (AOT) Tuning: Low-code API for explicit model inspection, backend selection, and module-level tuning. Supports forward hooks for custom pre/post-processing logic around tuned modules.
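
The tune-on-first-call behavior with backend fallback described above can be pictured with a small, framework-free sketch. This is not AITune's API: `TuneOnFirstCall`, the `Backend` signature, and both example backends are hypothetical names invented for illustration, and a real backend would compile the module rather than return it unchanged.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical backend signature: takes a callable plus one sample of
# positional arguments and returns a tuned callable, or raises on failure.
Backend = Callable[[Callable[..., Any], tuple], Callable[..., Any]]


class TuneOnFirstCall:
    """Wrap a callable and tune it lazily, using the first call as the sample."""

    def __init__(self, fn: Callable[..., Any], backends: List[Tuple[str, Backend]]):
        self._fn = fn
        self._backends = backends
        self._tuned: Optional[Callable[..., Any]] = None
        self.chosen: Optional[str] = None  # name of the backend that succeeded

    def __call__(self, *args: Any) -> Any:
        if self._tuned is None:
            # The first call supplies the single sample used for tuning.
            self._tuned = self._tune(args)
        return self._tuned(*args)

    def _tune(self, sample_args: tuple) -> Callable[..., Any]:
        for name, backend in self._backends:
            try:
                tuned = backend(self._fn, sample_args)
            except Exception:
                continue  # this backend cannot handle the callable; try the next
            self.chosen = name
            return tuned
        self.chosen = "eager"  # nothing compiled; run the original callable
        return self._fn


def failing_backend(fn, sample_args):
    # Stand-in for a backend that cannot compile the module.
    raise RuntimeError("unsupported operator")


def passthrough_backend(fn, sample_args):
    fn(*sample_args)  # a real backend would trace or compile here
    return fn


double = TuneOnFirstCall(
    lambda x: x * 2,
    [("primary", failing_backend), ("fallback", passthrough_backend)],
)
result = double(21)
```

After the first call, `double.chosen` records which backend was selected, and later calls reuse the tuned callable without re-tuning.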

Backend Support

  • TensorRT: Multi-profile engines with auto-generated and user-provided profiles, CUDA graph capture, FP16/FP8/INT8 mixed precision via TensorRT Model Optimizer, and Dynamo-based ONNX export (torch.onnx.export(dynamo=True)) for improved graph fidelity.
  • TorchInductor: Added support for static and dynamic HuggingFace models, broadening model compatibility beyond TensorRT workflows.

Model Compatibility

  • Complex Inputs: Support for dataclasses, user-defined objects in module.forward() arguments, and lists/dicts within Torch module containers for more complete model analysis.
  • LLM Support: Added KV cache support to enable tuning of autoregressive large language models.
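
To illustrate what handling complex inputs involves, the sketch below walks a nested argument made of dataclasses, dicts, and lists and yields each leaf value with a path describing where it was found. It is a generic traversal under stated assumptions, not AITune's implementation: `iter_leaves` and `Batch` are hypothetical names, and plain user-defined objects (non-dataclasses) would need an additional branch.

```python
import dataclasses
from typing import Any, Iterator, Tuple


def iter_leaves(obj: Any, path: str = "input") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) pairs for every leaf in a nested input structure."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # Dataclass instance: descend into each declared field.
        for field in dataclasses.fields(obj):
            yield from iter_leaves(getattr(obj, field.name), f"{path}.{field.name}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_leaves(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from iter_leaves(value, f"{path}[{index}]")
    else:
        # Anything else (tensor, number, string, ...) is treated as a leaf.
        yield path, obj


@dataclasses.dataclass
class Batch:
    # Hypothetical forward() argument mixing container types.
    tokens: list
    options: dict


leaves = dict(iter_leaves(Batch(tokens=[1, 2], options={"mask": (0, 1)})))
```

In a tuning tool, the leaves would typically be tensors whose shapes and dtypes are recorded; integers are used here only to keep the sketch dependency-free.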

Performance & Observability

  • Memory Optimization: Reduced CPU/GPU memory usage during tuning by offloading inactive modules to the meta device, with optimized input/output metadata handling.
  • Profiling: Extended metrics collection through NVTX annotations for Nsight Systems integration. Added configurable console output suppression with automatic log-to-file.
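
Console suppression with log-to-file can be approximated in plain Python as follows. This is a minimal sketch using only the standard library, not AITune's configuration interface: `quiet_console` and the log file name are hypothetical, and the approach captures only output written through Python's `sys.stdout` and `sys.stderr`, not output written directly to the file descriptors by native libraries.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def quiet_console(log_path: str):
    """Send Python-level stdout and stderr to log_path inside the block."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        with contextlib.redirect_stdout(log_file), contextlib.redirect_stderr(log_file):
            yield log_path
    # Leaving the outer `with` closes the file, which flushes buffered output.


log_path = os.path.join(tempfile.mkdtemp(), "tuning.log")  # hypothetical file name
with quiet_console(log_path):
    print("building engine: verbose backend output")

with open(log_path, encoding="utf-8") as handle:
    captured = handle.read()
```

The console stays clean while the block runs, and the suppressed messages remain available in the log file for later inspection.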

Documentation & Examples

  • Added comprehensive documentation and extended end-to-end examples across Computer Vision, Generative AI, Speech Recognition, and NLP workloads.

Bug Fixes

  • Fixed dynamic shapes handling in the TorchTensorRT AOT and TensorRT ONNX Dynamo export paths.
  • Fixed calibration data creation for ModelOpt PTQ.
  • Fixed bfloat16 precision in TensorRT.
  • Fixed JIT cache directory collisions.
  • Fixed profiling for models without batching support.

Known Issues

  • AITune currently supports only single-GPU configurations.
  • Just-in-Time tuning does not support transformers>=5 due to the @capture_outputs decorator.
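
Until the transformers limitation is lifted, a project could check the installed version before enabling JIT tuning. The sketch below uses only `importlib.metadata` from the standard library; the helper names (`major_version`, `installed_major`, `jit_tuning_supported`) are hypothetical and not part of AITune, and the version parsing assumes a plain `MAJOR.MINOR...` string.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version such as '4.57.1' or '5.0.0rc1'."""
    return int(version.split(".", 1)[0])


def installed_major(package: str) -> Optional[int]:
    """Return the installed major version of a package, or None if absent."""
    try:
        return major_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def jit_tuning_supported(transformers_major: Optional[int]) -> bool:
    # Per the known issue above, transformers>=5 is not supported by JIT tuning.
    # A missing transformers install (None) poses no conflict.
    return transformers_major is None or transformers_major < 5


supported = jit_tuning_supported(installed_major("transformers"))
```

When `supported` is false, one option is to fall back to Ahead-of-Time tuning or pin transformers below 5.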

Full Changelog

aitune/CHANGELOG.md