AITune Release v0.4.0
AITune v0.4.0 expands backend coverage, improves JIT tuning for complex pipelines, and adds observability needed to understand tuning decisions.
Summary
This release adds ONNXRuntime and Torch Inductor AOT backends, extends quantization support for TorchAO and TensorRT, introduces Dynamo serving support, and improves JIT tuning control for complex pipelines such as diffusion workloads.
It also adds performance validation against Torch eager baselines, richer tuning telemetry, optional hardware metrics, and refreshed documentation and examples.
Key Dependencies
- Python >=3.10,<3.13
- PyTorch >=2.7,<2.11
- Torch-TensorRT >2,<2.11
- TorchAO >=0.13,<0.17
- TensorRT >=10.5
- NumPy >=2.0.0
- Optional Dynamo integration via aitune[dynamo]
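To check an environment against these ranges without extra dependencies, the sketch below implements only the >=, >, <=, and < operators on the numeric release segment of a version string. It ignores the pre-release and local-version rules of PEP 440, so packaging.specifiers.SpecifierSet is the more robust choice when the packaging library is available. The dictionary keys and function names are illustrative, not part of AITune.

```python
import re

# Ranges copied from the Key Dependencies list; keys are labels for this example.
SUPPORTED_RANGES = {
    "python": ">=3.10,<3.13",
    "torch": ">=2.7,<2.11",
    "torch-tensorrt": ">2,<2.11",
    "torchao": ">=0.13,<0.17",
    "tensorrt": ">=10.5",
    "numpy": ">=2.0.0",
}

# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = (">=", "<=", ">", "<")


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _cmp(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Compare release tuples after zero-padding, so 2 == 2.0 == 2.0.0."""
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return (a > b) - (a < b)


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    current = release_tuple(version)
    for clause in spec.split(","):
        clause = clause.strip()
        op = next((o for o in _OPS if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        c = _cmp(current, release_tuple(clause[len(op):]))
        ok = {">=": c >= 0, "<=": c <= 0, ">": c > 0, "<": c < 0}[op]
        if not ok:
            return False
    return True
```

For example, satisfies(torch.__version__, SUPPORTED_RANGES["torch"]) returns False on PyTorch 2.11, which matches the Known Issues entry below.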
Breaking Changes
- TorchInductorBackend was renamed to TorchInductorJitBackend.
- HighestThroughputStrategy was renamed to MaxThroughputStrategy.
- NVTX_ENABLE was renamed to AITUNE_NVTX_EVENTS.
- aitune.torch.jit_config.backends was removed. Configure JIT backends through aitune.torch.jit_config.strategy, for example FirstWinsStrategy(backends=[...]).
- The previous public system-monitoring APIs were removed: SystemMonitor, system_resource_monitor, and enable_gpu_memory_logging.
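The three renames are mechanical. The hypothetical helper below (not shipped with AITune) applies them textually and reports removed names that still need a manual change, such as moving a backend list into a strategy like FirstWinsStrategy(backends=[...]). Because NVTX_ENABLE is an environment variable, shell scripts and launch configs should be scanned as well as Python sources.

```python
import re

# Renames listed under Breaking Changes in AITune v0.4.0.
RENAMES = {
    "TorchInductorBackend": "TorchInductorJitBackend",
    "HighestThroughputStrategy": "MaxThroughputStrategy",
    "NVTX_ENABLE": "AITUNE_NVTX_EVENTS",
}

# Removed outright in v0.4.0; these need a manual migration, not a rename.
REMOVED = (
    "aitune.torch.jit_config.backends",
    "SystemMonitor",
    "system_resource_monitor",
    "enable_gpu_memory_logging",
)


def migrate_source(text: str) -> tuple[str, list[str]]:
    """Apply the v0.4.0 renames and list removed identifiers still present."""
    for old, new in RENAMES.items():
        # \b stops a name from matching inside a longer identifier.
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    leftovers = [name for name in REMOVED if re.search(rf"\b{re.escape(name)}\b", text)]
    return text, leftovers
```

This is a plain text scan, so it will not catch aliased imports such as accessing jit_config.backends through a local name; review the resulting diff before committing it.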
Features & Improvements
Backend Coverage
- Added ONNXRuntime backend support with CUDA and TensorRT execution providers.
- Added Torch Inductor AOT backend support for saved compiled artifacts.
- Added TorchAO NVFP4DQ and MXFP8DQ quantization options, including filter support and hardware validation.
- Added TensorRT NVFP4 quantization support.
- Improved dynamic-shape handling across Torch Inductor, Torch-TensorRT, and ONNX export paths.
JIT Tuning
- Added deferred JIT tuning mode for pipelines where modules run a variable number of times before the best tuning point is known.
- Added JIT tune strategy selection through aitune.torch.jit_config.strategy.
- Added package and module-class exclusions for JIT patching.
- Added Diffusers integration hooks for pipeline compatibility.
- Improved wrapped-descendant handling so JIT tuning restores patched modules correctly.
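Assuming JIT patching decides per submodule whether to wrap it, a package and module-class exclusion check can be as small as the sketch below. Packages are given as dotted prefixes and classes as type objects; should_patch and its parameters are hypothetical and are not AITune's configuration API.

```python
def _in_package(module_path: str, package: str) -> bool:
    """True if module_path is the package itself or one of its submodules."""
    return module_path == package or module_path.startswith(package + ".")


def should_patch(
    module: object,
    excluded_packages: tuple[str, ...] = (),
    excluded_classes: tuple[type, ...] = (),
) -> bool:
    """Decide whether a submodule is eligible for JIT patching."""
    cls = type(module)
    # Exclude by the package that defines the module's class.
    if any(_in_package(cls.__module__, pkg) for pkg in excluded_packages):
        return False
    # isinstance also excludes subclasses of an excluded class.
    if isinstance(module, excluded_classes):
        return False
    return True
```

Matching on a "." boundary keeps an exclusion for one package from hitting an unrelated package that merely shares its prefix. Excluding subclasses is a design choice in this sketch; an exact-class match is an equally reasonable alternative.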
Tune Strategies
- Backends that are correct but slower than Torch eager can now be rejected automatically.
- Added per-module speedup reporting during tuning.
- Speedup summaries are visible at the default warning log level.
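The idea behind eager-baseline rejection is a ratio test: a backend's speedup is its measured throughput divided by Torch eager throughput for the same module, and a correct backend whose speedup does not exceed 1.0 offers no benefit. The sketch below illustrates that rule together with a throughput-maximizing choice. It is a generic illustration, not AITune's implementation; the Candidate type, the min_speedup threshold, and the logger name are hypothetical.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("tuning_sketch")


@dataclass
class Candidate:
    backend: str       # backend label, e.g. "TorchInductorJitBackend"
    correct: bool      # outputs matched the eager reference within tolerance
    throughput: float  # measured samples/s for this module


def select_backend(module_name, eager_throughput, candidates, min_speedup=1.0):
    """Pick the fastest correct backend, rejecting any not faster than eager.

    Returns (backend name, or None to keep eager; {backend: speedup vs eager}).
    """
    speedups = {c.backend: c.throughput / eager_throughput for c in candidates}
    for name, speedup in speedups.items():
        # Logged at WARNING so the summary appears under default logging settings.
        logger.warning("%s: %s %.2fx vs eager", module_name, name, speedup)
    eligible = [c for c in candidates if c.correct and speedups[c.backend] > min_speedup]
    if not eligible:
        return None, speedups
    best = max(eligible, key=lambda c: c.throughput)
    return best.backend, speedups
```

Returning None signals "keep eager", so a module is never replaced by a backend that would slow it down.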
Observability
- Added tuning telemetry reports covering runs, modules, graphs, backend builds, throughput, selected backends, and failures.
- Added AITUNE_TUNING_DATA_PATH for choosing the telemetry output path.
- Added snapshot_tuning_data() for long-running processes.
- Added optional hardware metrics collection with AITUNE_HARDWARE_METRICS=1.
- Added AITUNE_HARDWARE_METRICS_PATH for hardware metrics output.
- Added NVTX annotation support through AITUNE_NVTX_EVENTS=1.
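A minimal sketch of enabling these outputs from Python, assuming AITune reads the variables when tuning starts, so they must be set before the tuned code runs. The variable names come from this release; the helper name and the file names under out_dir are hypothetical, and these notes do not say whether each path should be a file or a directory. The notes also do not give an import path for snapshot_tuning_data(), so it is left out.

```python
import os
from pathlib import Path


def configure_aitune_observability(
    out_dir: str, hardware_metrics: bool = True, nvtx: bool = False
) -> dict[str, str]:
    """Set AITune observability variables unless the environment already defines them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hypothetical output names; only the variable names come from the release notes.
    wanted = {"AITUNE_TUNING_DATA_PATH": str(out / "tuning_data")}
    if hardware_metrics:
        wanted["AITUNE_HARDWARE_METRICS"] = "1"
        wanted["AITUNE_HARDWARE_METRICS_PATH"] = str(out / "hardware_metrics")
    if nvtx:
        wanted["AITUNE_NVTX_EVENTS"] = "1"
    for key, value in wanted.items():
        os.environ.setdefault(key, value)  # keep any value the user already exported
    return {key: os.environ[key] for key in wanted}
```

Exporting the same variables in the shell before launching the process is equivalent; setdefault makes the helper defer to those exported values.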
Deployment & Runtime
- Added Dynamo worker support for serving AITune-tuned models as Dynamo endpoints.
- Fixed portability of checkpoint backend artifacts by storing relative artifact paths.
- Added disk-space checks before build and save operations.
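Both deployment fixes follow familiar patterns: check free space before writing a large artifact, and record artifact paths relative to the checkpoint directory so the directory can be moved or copied to another machine. The generic sketch below uses a hypothetical manifest.json layout and is not AITune's checkpoint format.

```python
import json
import os
import shutil
from pathlib import Path


def ensure_free_space(directory: Path, required_bytes: int) -> None:
    """Raise before writing if the filesystem holding `directory` lacks space."""
    free = shutil.disk_usage(directory).free
    if free < required_bytes:
        raise OSError(f"need {required_bytes} bytes in {directory}, only {free} free")


def write_manifest(checkpoint_dir: Path, artifacts: dict[str, Path]) -> Path:
    """Store artifact paths relative to the checkpoint directory."""
    rel = {name: os.path.relpath(path, checkpoint_dir) for name, path in artifacts.items()}
    manifest = checkpoint_dir / "manifest.json"
    manifest.write_text(json.dumps(rel, indent=2))
    return manifest


def load_manifest(checkpoint_dir: Path) -> dict[str, Path]:
    """Resolve stored relative paths against wherever the checkpoint now lives."""
    rel = json.loads((checkpoint_dir / "manifest.json").read_text())
    return {name: checkpoint_dir / p for name, p in rel.items()}
```

An absolute path recorded at save time breaks as soon as the checkpoint is copied elsewhere; resolving relative paths at load time keeps the checkpoint self-contained.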
Bug Fixes
- Fixed Torch-TensorRT AOT save failures for wrapt-decorated forwards.
- Fixed shared dynamic dimensions in Torch-TensorRT AOT.
- Fixed bounded dimension handling for Torch-TensorRT AOT export.
- Fixed Torch Inductor AOT dynamic-shape handling.
- Fixed JIT tuning state handling when wrappers are still in initial state.
- Preserved externally registered forward hooks across save and restore.
- Fixed UserDict traversal for inputs such as transformers.BatchEncoding.
- Fixed Blackwell-specific quantization configuration handling.
- Dropped the Torch-TensorRT enabled_precisions={float16} default so engines follow the model dtype.
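The UserDict fix reflects a common Python pitfall: isinstance(x, dict) is False for collections.UserDict, which transformers.BatchEncoding subclasses, while isinstance(x, collections.abc.Mapping) is True. A minimal traversal sketch that relies on the abstract base classes instead; iter_leaves is hypothetical and not AITune's internal function.

```python
from collections import UserDict
from collections.abc import Mapping, Sequence


def iter_leaves(obj, path=()):
    """Yield (path, leaf) pairs from nested mappings and sequences.

    Checking Mapping rather than dict also covers UserDict subclasses
    such as transformers.BatchEncoding.
    """
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            yield from iter_leaves(value, path + (key,))
    elif isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        for index, value in enumerate(obj):
            yield from iter_leaves(value, path + (index,))
    else:
        # Anything else (tensors, numbers, None) is treated as a leaf.
        yield path, obj
```

Strings and bytes are excluded from the Sequence branch so they are treated as leaves rather than split into characters.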
Documentation
- Migrated the documentation preview to Fern.
- Added backend guides for ONNXRuntime and Torch Inductor AOT.
- Refreshed backend, tune strategy, observability, deployment, and JIT guides.
- Updated notebooks and examples for current imports and backend defaults.
- Added agent workflow guidance, contribution guidance, code of conduct, and security policy files.
Known Issues
- Hardware-specific quantization paths such as NVFP4 require compatible NVIDIA hardware and installed backend support.
- GPU functional tests require the relevant backend dependencies and hardware; unit tests remain the default local validation path.
- PyTorch 2.11 is outside the supported range for this release.