AITune Release v0.3.0 · ai-dynamo/aitune

Summary

AITune is an open-source (Apache 2.0) inference toolkit hosted under the ai-dynamo GitHub organization and distributed via PyPI. It tunes and deploys deep learning models on NVIDIA GPUs, with the goal of improving inference speed and efficiency across AI workloads.

Major Features & Improvements

Tuning Modes

Just-in-Time (JIT) Tuning: Zero-code model tuning and inspection, controlled through a single import or environment flag. Tuning runs on the first model call using a single sample, with automatic fallback to TorchInductor when a backend cannot compile a module.
Ahead-of-Time (AOT) Tuning: Low-code API for explicit model inspection, backend selection, and module-level tuning. Supports forward hooks for custom pre/post-processing logic around tuned modules.
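
The JIT fallback behavior described above can be pictured as trying a preferred backend first and falling through to a default when compilation fails. The sketch below is plain Python and does not use the AITune API; `BackendError`, `compile_with_fallback`, and the two stand-in backends are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple


class BackendError(RuntimeError):
    """Hypothetical error raised when a backend cannot compile a module."""


# A backend is modeled as a (name, compile function) pair.
Backend = Tuple[str, Callable[[Any], Any]]


def compile_with_fallback(module: Any, backends: Sequence[Backend]) -> Tuple[str, Any]:
    """Try each backend in order and return the first successful result."""
    errors: List[str] = []
    for name, compile_fn in backends:
        try:
            return name, compile_fn(module)
        except BackendError as exc:
            # Remember why this backend failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise BackendError("no backend could compile the module: " + "; ".join(errors))


def fake_tensorrt(module: Any) -> Any:
    # Stand-in for a backend that rejects this module.
    raise BackendError("unsupported operator")


def fake_inductor(module: Any) -> Any:
    # Stand-in for a fallback backend that accepts any module.
    return ("compiled", module)


chosen, compiled = compile_with_fallback(
    "my_module", [("tensorrt", fake_tensorrt), ("inductor", fake_inductor)]
)
print(chosen)
```

Ordering the list from most to least specialized keeps the fallback explicit: the last entry acts as the default that is expected to accept any module.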

Backend Support

TensorRT: Multi-profile engines with auto-generated and user-provided profiles, CUDA graph capture, FP16/FP8/INT8 mixed precision via TensorRT Model Optimizer, and Dynamo-based ONNX export (torch.onnx.export(dynamo=True)) for improved graph fidelity.
TorchInductor: Added support for static and dynamic HuggingFace models, broadening model compatibility beyond TensorRT workflows.
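
To illustrate what multi-profile engines imply at run time, the sketch below models each profile as a min/opt/max shape range and picks the first profile that covers an incoming input shape. It is a conceptual illustration in plain Python, not the TensorRT or AITune API; `Profile`, `select_profile`, and the example shapes are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for one optimization profile (min/opt/max shapes)."""

    name: str
    min_shape: Shape
    opt_shape: Shape
    max_shape: Shape

    def covers(self, shape: Shape) -> bool:
        # A shape fits when the rank matches and every dim lies in [min, max].
        if len(shape) != len(self.min_shape):
            return False
        return all(
            lo <= dim <= hi
            for lo, dim, hi in zip(self.min_shape, shape, self.max_shape)
        )


def select_profile(profiles: Sequence[Profile], shape: Shape) -> Optional[Profile]:
    """Return the first profile whose range contains the shape, else None."""
    for profile in profiles:
        if profile.covers(shape):
            return profile
    return None


profiles = [
    Profile("small_batch", (1, 3, 224, 224), (4, 3, 224, 224), (8, 3, 224, 224)),
    Profile("large_batch", (9, 3, 224, 224), (16, 3, 224, 224), (32, 3, 224, 224)),
]

match = select_profile(profiles, (16, 3, 224, 224))
print(match.name if match else "no profile covers this shape")
```

A shape outside every range returns `None`, which is the case where a caller would need another profile or a different execution path.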

Model Compatibility

Complex Inputs: Support for dataclasses, user-defined objects in module.forward() arguments, and lists/dicts within Torch module containers for more complete model analysis.
LLM Support: Added KV cache support to enable tuning of autoregressive large language models.
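
Handling complex inputs generally means walking nested containers down to their leaf values. The following is a minimal sketch of that idea, assuming a recursive traversal over dataclasses, dicts, lists, and tuples; `Batch` and `flatten_inputs` are hypothetical names, plain integers stand in for tensors, and none of this reflects AITune's internal implementation.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict


@dataclass
class Batch:
    """Hypothetical dataclass passed to module.forward()."""

    tokens: list
    extras: dict


def flatten_inputs(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Walk nested containers and map a dotted path to each leaf value."""
    if is_dataclass(value) and not isinstance(value, type):
        items = [(f.name, getattr(value, f.name)) for f in fields(value)]
    elif isinstance(value, dict):
        items = [(str(key), child) for key, child in value.items()]
    elif isinstance(value, (list, tuple)):
        items = [(str(index), child) for index, child in enumerate(value)]
    else:
        # Anything else is treated as a leaf (a tensor, in real use).
        return {prefix: value}

    flat: Dict[str, Any] = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten_inputs(child, path))
    return flat


leaves = flatten_inputs(Batch(tokens=[1, 2], extras={"mask": 0}))
print(leaves)
```

The dotted paths give each leaf a stable name, so the nesting can be reconstructed after the leaves have been analyzed.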

Performance & Observability

Memory Optimization: Reduced CPU/GPU memory usage during tuning by offloading inactive modules to the meta device, with optimized input/output metadata handling.
Profiling: Extended metrics collection through NVTX annotations for Nsight Systems integration. Added configurable console output suppression with automatic log-to-file.
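
Console suppression with log-to-file can be approximated with the standard library alone. The sketch below is an assumption about the general technique, not AITune's implementation or configuration interface; `suppress_console` and the `tuning.log` file name are hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def suppress_console(log_path: str):
    """Redirect Python-level stdout and stderr to a log file.

    Output written directly to the OS file descriptors by native
    libraries is not captured by this approach.
    """
    with open(log_path, "a", encoding="utf-8") as log_file:
        with contextlib.redirect_stdout(log_file), contextlib.redirect_stderr(log_file):
            yield


log_path = os.path.join(tempfile.mkdtemp(), "tuning.log")

with suppress_console(log_path):
    print("verbose backend output")  # written to the file, not the console

# The file is closed when the context exits, so the text is flushed by now.
with open(log_path, encoding="utf-8") as handle:
    captured = handle.read()
```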

Documentation & Examples

Added comprehensive documentation and extended the end-to-end examples across Computer Vision, Generative AI, Speech Recognition, and NLP workloads.

Bug Fixes

Fixed dynamic shapes handling in the TorchTensorRT AOT and TensorRT ONNX Dynamo export paths.
Fixed calibration data creation for ModelOpt PTQ.
Fixed bfloat16 precision in TensorRT.
Fixed JIT cache directory collisions.
Fixed profiling for models without batching support.

Known Issues

AITune currently supports only single-GPU configurations.
Just-in-Time tuning does not support transformers>=5 because of the @capture_outputs decorator.
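
Given the transformers constraint above, a project might guard JIT tuning with a version check before enabling it. This is a minimal sketch using only the standard library; `jit_tuning_supported` and `installed_transformers_version` are hypothetical helpers, not part of AITune.

```python
from importlib import metadata
from typing import Optional


def jit_tuning_supported(transformers_version: str) -> bool:
    """Return False for transformers 5 or later, per the known issue."""
    major = int(transformers_version.split(".")[0])
    return major < 5


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed")
elif jit_tuning_supported(version):
    print(f"transformers {version}: JIT tuning is not blocked by this known issue")
else:
    print(f"transformers {version}: JIT tuning is not supported in this release")
```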

Full Changelog

aitune/CHANGELOG.md
