AITune Release v0.4.0

@saturley-hall saturley-hall released this 03 Jun 04:13

AITune v0.4.0 expands backend coverage, improves JIT tuning for complex pipelines, and adds the observability needed to understand tuning decisions.

Summary

This release adds ONNXRuntime and Torch Inductor AOT backends, extends quantization support for TorchAO and TensorRT, introduces Dynamo serving support, and improves JIT tuning control for complex pipelines such as diffusion workloads.

It also adds performance validation against Torch eager baselines, richer tuning telemetry, and optional hardware metrics, and it refreshes the documentation and examples.

Key Dependencies

  • Python >=3.10,<3.13
  • PyTorch >=2.7,<2.11
  • Torch-TensorRT >2,<2.11
  • TorchAO >=0.13,<0.17
  • TensorRT >=10.5
  • NumPy >=2.0.0
  • Optional Dynamo integration via aitune[dynamo]
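
The ranges above are easy to misread when compared as strings ("2.10" sorts before "2.7" lexically). The following is a minimal sketch of a numeric range check, not part of AITune: the function names and the SUPPORTED table are hypothetical, and only the leading numeric part of a version is compared, so local tags such as "+cu128" and pre-release suffixes are ignored.

```python
import re
import sys

# Ranges copied from the "Key Dependencies" list above.
# Each entry is (lower bound, lower bound inclusive?, exclusive upper bound or None).
# The keys are labels for this sketch only.
SUPPORTED = {
    "torch": ("2.7", True, "2.11"),
    "torch-tensorrt": ("2", False, "2.11"),
    "torchao": ("0.13", True, "0.17"),
    "tensorrt": ("10.5", True, None),
    "numpy": ("2.0.0", True, None),
}


def parse_version(text, width=3):
    """Return the leading numeric release as a fixed-width tuple.

    '2.10.0+cu128' -> (2, 10, 0); '2' -> (2, 0, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    parts = [int(part) for part in match.group(0).split(".")][:width]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version, lower, upper=None, lower_inclusive=True):
    """Check lower <= version < upper (or lower < version when not inclusive)."""
    current = parse_version(version)
    low = parse_version(lower)
    if current < low or (current == low and not lower_inclusive):
        return False
    return upper is None or current < parse_version(upper)


def is_supported(name, version):
    """Look up a package label in SUPPORTED and test the given version string."""
    lower, inclusive, upper = SUPPORTED[name]
    return in_range(version, lower, upper, lower_inclusive=inclusive)


def python_supported(info=sys.version_info):
    """Python >=3.10,<3.13, compared on (major, minor) only."""
    return (3, 10) <= tuple(info[:2]) < (3, 13)
```

For example, is_supported("torch", "2.10.0+cu128") is true while is_supported("torch", "2.11.0") is false. For full PEP 440 semantics, the packaging library's SpecifierSet is the more robust choice.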

Breaking Changes

  • TorchInductorBackend was renamed to TorchInductorJitBackend.
  • HighestThroughputStrategy was renamed to MaxThroughputStrategy.
  • NVTX_ENABLE was renamed to AITUNE_NVTX_EVENTS.
  • aitune.torch.jit_config.backends was removed. Configure JIT backends through aitune.torch.jit_config.strategy, for example FirstWinsStrategy(backends=[...]).
  • The previous public system-monitoring APIs were removed:
      • SystemMonitor
      • system_resource_monitor
      • enable_gpu_memory_logging
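
Launch scripts that still export NVTX_ENABLE will silently stop enabling NVTX events after the rename. One way to bridge the gap during migration is a small shim like the sketch below. The helper name is hypothetical, and it assumes the new variable accepts the same values as the old one, which these notes do not state (they only show AITUNE_NVTX_EVENTS=1).

```python
def migrate_nvtx_env(env):
    """Return a copy of env with the legacy NVTX_ENABLE name replaced.

    The legacy value is carried over to AITUNE_NVTX_EVENTS only when the
    new name is not already set, so an explicit new-style setting wins.
    The input mapping is left unmodified.
    """
    updated = dict(env)
    legacy = updated.pop("NVTX_ENABLE", None)
    if legacy is not None and "AITUNE_NVTX_EVENTS" not in updated:
        updated["AITUNE_NVTX_EVENTS"] = legacy
    return updated
```

A wrapper script could pass migrate_nvtx_env(os.environ) as the env argument of subprocess.run when starting the tuning process. The renamed classes and the removed jit_config.backends attribute still need to be updated in code by hand.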

Features & Improvements

Backend Coverage

  • Added ONNXRuntime backend support with CUDA and TensorRT execution providers.
  • Added Torch Inductor AOT backend support for saved compiled artifacts.
  • Added TorchAO NVFP4DQ and MXFP8DQ quantization options, including filter support and hardware validation.
  • Added TensorRT NVFP4 quantization support.
  • Improved dynamic-shape handling across Torch Inductor, Torch-TensorRT, and ONNX export paths.

JIT Tuning

  • Added deferred JIT tuning mode for pipelines where modules run variable numbers of times before the best tuning point is known.
  • Added JIT tune strategy selection through aitune.torch.jit_config.strategy.
  • Added package and module-class exclusions for JIT patching.
  • Added Diffusers integration hooks for pipeline compatibility.
  • Improved wrapped-descendant handling so JIT tuning restores patched modules correctly.

Tune Strategies

  • Backends that are correct but slower than Torch eager can now be rejected automatically.
  • Added per-module speedup reporting during tuning.
  • Speedup summaries are visible at the default warning log level.
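
To illustrate the idea behind rejecting slower-than-eager backends, here is a generic sketch that is independent of AITune's implementation. It assumes speedup is defined as eager latency divided by backend latency and that a candidate must reach a minimum speedup of 1.0 to be accepted; the function names, the latency-based metric, and the threshold are all assumptions for illustration.

```python
def speedup(eager_latency_ms, backend_latency_ms):
    """Speedup of a backend relative to eager; values below 1.0 mean slower."""
    if eager_latency_ms <= 0 or backend_latency_ms <= 0:
        raise ValueError("latencies must be positive")
    return eager_latency_ms / backend_latency_ms


def select_backend(eager_latency_ms, candidates, min_speedup=1.0):
    """Pick the fastest candidate that is at least min_speedup times eager.

    candidates maps a backend name to its measured latency in milliseconds.
    Returns (name, speedup), or None when every candidate is rejected and
    the module should stay on eager execution.
    """
    best = None
    for name, latency_ms in candidates.items():
        ratio = speedup(eager_latency_ms, latency_ms)
        if ratio < min_speedup:
            continue  # correct but slower than the baseline: reject
        if best is None or ratio > best[1]:
            best = (name, ratio)
    return best
```

With an eager latency of 10 ms, a candidate at 12.5 ms has a speedup of 0.8 and is rejected, while a candidate at 5 ms has a speedup of 2.0 and is selected.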

Observability

  • Added tuning telemetry reports covering runs, modules, graphs, backend builds, throughput, selected backends, and failures.
  • Added AITUNE_TUNING_DATA_PATH for choosing the telemetry output path.
  • Added snapshot_tuning_data() for long-running processes.
  • Added optional hardware metrics collection with AITUNE_HARDWARE_METRICS=1.
  • Added AITUNE_HARDWARE_METRICS_PATH for hardware metrics output.
  • Added NVTX annotation support through AITUNE_NVTX_EVENTS=1.
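
The environment variables above can be collected in one place before launching a tuning process. The sketch below builds such an environment. The helper name and the output file names are hypothetical, and the notes do not say whether the two path variables expect a file or a directory, so treat the values as placeholders.

```python
import os
from pathlib import Path


def observability_env(output_dir, base_env=None):
    """Return an environment mapping with the v0.4.0 observability switches set.

    base_env defaults to a copy of the current process environment.
    File names under output_dir are placeholders chosen for this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    out = Path(output_dir)
    env["AITUNE_TUNING_DATA_PATH"] = str(out / "tuning_data.json")
    env["AITUNE_HARDWARE_METRICS"] = "1"
    env["AITUNE_HARDWARE_METRICS_PATH"] = str(out / "hardware_metrics.json")
    env["AITUNE_NVTX_EVENTS"] = "1"
    return env
```

The returned mapping can be passed as the env argument of subprocess.run so the settings apply to the child process only. For long-running processes, the notes point to snapshot_tuning_data() instead of waiting for process exit.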

Deployment & Runtime

  • Added Dynamo worker support for serving AITune-tuned models as Dynamo endpoints.
  • Fixed portability of checkpoint backend artifacts by storing relative artifact paths.
  • Added disk-space checks before build and save operations.
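
Why relative paths make a checkpoint portable, and what a disk-space check before a build can look like, can be shown with the standard library alone. This is an illustrative sketch rather than AITune's code: the function names and the example file name are hypothetical.

```python
import shutil
from pathlib import Path


def to_relative(artifact_path, checkpoint_dir):
    """Express an artifact path relative to the checkpoint directory.

    Raises ValueError if the artifact is not inside the checkpoint directory.
    """
    root = Path(checkpoint_dir).resolve()
    return Path(artifact_path).resolve().relative_to(root).as_posix()


def resolve_artifact(relative_path, checkpoint_dir):
    """Rebuild a usable path after the checkpoint directory has been moved."""
    return Path(checkpoint_dir) / relative_path


def has_free_space(directory, required_bytes):
    """Return True if the filesystem holding directory has enough free bytes."""
    return shutil.disk_usage(directory).free >= required_bytes
```

Storing "engines/model.plan" instead of an absolute path means the same record resolves correctly whether the checkpoint lives in the build directory or in a container mount. Calling has_free_space before a build or save lets the operation fail early instead of leaving a partial artifact.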

Bug Fixes

  • Fixed Torch-TensorRT AOT save failures for wrapt-decorated forwards.
  • Fixed shared dynamic dimensions in Torch-TensorRT AOT.
  • Fixed bounded dimension handling for Torch-TensorRT AOT export.
  • Fixed Torch Inductor AOT dynamic-shape handling.
  • Fixed JIT tuning state handling when wrappers are still in initial state.
  • Preserved externally registered forward hooks across save and restore.
  • Fixed UserDict traversal for inputs such as transformers.BatchEncoding.
  • Fixed Blackwell-specific quantization configuration handling.
  • Dropped the Torch-TensorRT enabled_precisions={float16} default so engines follow the model dtype.
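
The UserDict fix is worth a short illustration, because collections.UserDict (the base class of transformers.BatchEncoding) is not a dict subclass, so a traversal that checks isinstance(obj, dict) treats it as an opaque leaf. The sketch below shows the general pattern of matching on collections.abc.Mapping instead. It is not AITune's traversal code, the helper name is hypothetical, and for brevity it returns plain dicts rather than preserving the mapping type.

```python
from collections import UserDict
from collections.abc import Mapping


def map_leaves(fn, obj):
    """Apply fn to every leaf of a nested structure of mappings, lists, and tuples.

    Matching on Mapping (rather than dict) also covers UserDict subclasses.
    Mappings are rebuilt as plain dicts; lists and tuples keep their type.
    """
    if isinstance(obj, Mapping):
        return {key: map_leaves(fn, value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [map_leaves(fn, item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(map_leaves(fn, item) for item in obj)
    return fn(obj)


# A UserDict stands in for an input container such as BatchEncoding.
batch = UserDict({"input_ids": [1, 2], "attention_mask": (1, 1)})
doubled = map_leaves(lambda value: value * 2, batch)
```

Here isinstance(batch, dict) is False, yet the traversal still reaches the values inside the container.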

Documentation

  • Migrated the documentation preview to Fern.
  • Added backend guides for ONNXRuntime and Torch Inductor AOT.
  • Refreshed backend, tune strategy, observability, deployment, and JIT guides.
  • Updated notebooks and examples for current imports and backend defaults.
  • Added agent workflow guidance, contribution guidance, code of conduct, and security policy files.

Known Issues

  • Hardware-specific quantization paths such as NVFP4 require compatible NVIDIA hardware and installed backend support.
  • GPU functional tests require the relevant backend dependencies and hardware; unit tests remain the default local validation path.
  • PyTorch 2.11 is outside the supported range for this release.