ONNX Runtime v1.14.0

@rui-ren rui-ren released this 11 Feb 01:03
6ccaedd

Announcements

  • Building ORT from source now requires CMake version >=3.24 (previously >=3.18).

General

  • ONNX 1.13 support (opset 18)
  • Threading
    • ORT Threadpool is now NUMA aware (details)
    • New API to set thread affinity (details)
  • New custom operator APIs
    • Enables a custom operator to wrap an entire model that is meant to be run with an external API or runtime.
    • Details and example
  • Multi-stream Execution Provider refactoring
    • Improves GPU utilization by putting parallel inference requests on different GPU streams. Updated for the CUDA, TensorRT, and ROCm execution providers.
    • Improves memory efficiency by enabling GPU memory reuse across different streams.
    • Enables Execution Provider developers to customize their stream implementation through a "Stream" interface in the ExecutionProvider API.
  • [Preview] Rust API for ORT - not part of release branch but available to build in main.
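
The thread affinity item above can be illustrated with a short Python sketch. It assumes the `session.intra_op_thread_affinities` session configuration key and a string format in which `;` separates threads and `,` separates logical processor IDs, with IDs starting at 1 and one entry fewer than the intra-op thread count; check the linked details before relying on these assumptions. The helper names `format_affinities` and `make_pinned_session` are hypothetical.

```python
from typing import List, Sequence


def format_affinities(per_thread_cpus: Sequence[Sequence[int]]) -> str:
    """Join per-thread logical processor IDs into one affinity string.

    Assumed format: threads separated by ';', processor IDs for one
    thread separated by ','.
    """
    groups: List[str] = []
    for cpus in per_thread_cpus:
        if not cpus:
            raise ValueError("each thread needs at least one processor ID")
        if any(cpu < 1 for cpu in cpus):
            raise ValueError("processor IDs are assumed to start at 1")
        groups.append(",".join(str(cpu) for cpu in cpus))
    return ";".join(groups)


def make_pinned_session(model_path: str, per_thread_cpus: Sequence[Sequence[int]]):
    """Create a session with pinned intra-op threads (requires onnxruntime)."""
    import onnxruntime as ort  # lazy import keeps the helper above stdlib-only

    opts = ort.SessionOptions()
    # Assumption: the calling thread is not pinned, so the pool size is
    # one more than the number of affinity entries.
    opts.intra_op_num_threads = len(per_thread_cpus) + 1
    opts.add_session_config_entry(
        "session.intra_op_thread_affinities", format_affinities(per_thread_cpus)
    )
    return ort.InferenceSession(model_path, sess_options=opts)
```

For example, `make_pinned_session("model.onnx", [[1, 2], [3, 4]])` would request three intra-op threads and pin the two worker threads to processors 1-2 and 3-4; the model path is a placeholder.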

Performance

  • Support of quantization with AMX on Sapphire Rapids processors
  • CUDA EP performance improvements:
    • Improved performance of transformer models and decoding methods: beam search, greedy search, and top-p sampling.
    • Stable Diffusion model optimizations
    • Changed the default value of cudnn_conv_use_max_workspace to 1
  • Performance improvements to GRU and Slice operators
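
Because the default for cudnn_conv_use_max_workspace changed, applications that depended on the earlier behavior may want to set the option explicitly. The following is a minimal sketch, assuming the Python API's provider-options form (a list of `(name, options)` tuples passed to `InferenceSession`); the helper names `cuda_providers` and `make_cuda_session` are hypothetical.

```python
from typing import Any, Dict, List, Tuple, Union

Provider = Union[str, Tuple[str, Dict[str, Any]]]


def cuda_providers(use_max_workspace: bool = True, device_id: int = 0) -> List[Provider]:
    """Build a providers list with an explicit cuDNN workspace setting."""
    cuda_options: Dict[str, Any] = {
        "device_id": device_id,
        # "1" matches the new default in this release; "0" opts out of it.
        "cudnn_conv_use_max_workspace": "1" if use_max_workspace else "0",
    }
    # CPU is listed last as a fallback for nodes the CUDA EP does not take.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


def make_cuda_session(model_path: str, use_max_workspace: bool = True):
    """Create a session using the providers above (requires onnxruntime-gpu)."""
    import onnxruntime as ort  # lazy import keeps the helper above stdlib-only

    return ort.InferenceSession(model_path, providers=cuda_providers(use_max_workspace))
```

Setting the option explicitly keeps behavior stable across upgrades: a larger workspace lets cuDNN pick faster convolution algorithms at the cost of higher peak GPU memory, so memory-constrained deployments may prefer `use_max_workspace=False`.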

Execution Providers

Mobile

  • Pre/Post processing
    • Added support for updating MobileNet and super-resolution models to move pre- and post-processing into the model, including the use of custom ops for conversion to/from JPG/PNG
    • [Coming soon] onnxruntime-extensions packages for Android and iOS with DecodeImage and EncodeImage custom ops
    • Updated the onnxruntime inference examples to demonstrate end-to-end usage with onnxruntime-extensions package
  • XNNPACK
    • Added support for additional commonly used operators
    • Added iOS build support
      • XNNPACK EP is now included in the onnxruntime-c iOS package
    • Added support for using the ORT allocator in XNNPACK kernels to minimize memory usage

Web

  • onnxruntime-extensions included in default ort-web build (NLP centric)
  • XNNPACK Gemm
  • Improved exception handling
  • New utility functions (experimental) to help with exchanging data between images and tensors.

Training

  • Performance optimizations and bug fixes for Hugging Face models (e.g., XLNet and BLOOM)
  • Stable diffusion optimizations for training, including support for Resize and InstanceNorm gradients and addition of ORT-enabled examples to the diffusers library
  • FP16 optimizer exposed in torch-ort (details)
  • Bug fixes for Hugging Face models

Known Issues

  • The Microsoft.ML.OnnxRuntime.DirectML package name includes a -dev-* suffix. The package is functionally equivalent to the release branch build, and a patch is in progress.

Contributions

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
snnn, skottmckay, edgchen1, hariharans29, tianleiwu, yufenglee, guoyu-wang, yuslepukhin, fs-eire, pranavsharma, iK1D, baijumeswani, tracysh, thiagocrepaldi, askhade, RyanUnderhill, wangyems, fdwr, RandySheriffH, jywu-msft, zhanghuanrong, smk2007, pengwa, liqunfu, shahasad, mszhanyi, SherlockNoMad, xadupre, jignparm, HectorSVC, ytaous, weixingzhang, stevenlix, tiagoshibata, faxu, wschin, souptc, ashbhandare, RandyShuai, chilo-ms, PeixuanZuo, cloudhan, dependabot[bot], jeffbloo, chenfucn, linkerzhang, duli2012, codemzs, oliviajain, natke, YUNQIUGUO, Craigacp, sumitsays, orilevari, BowenBao, yangchen-MS, hanbitmyths, satyajandhyala, MaajidKhan, smkarlap, sfatimar, jchen351, georgen117, wejoncy, PatriceVignola, adrianlizarraga, justinchuby, zhangxiang1993, gineshidalgo99, tlh20, xzhu1900, jeffdaily, suryasidd, yihonglyu, liuziyue, chentaMS, jcwchen, ybrnathan, ajindal1, zhijxu-MS, gramalingam, WilBrady, garymm, kkaranasos, ashari4, martinb35, AdamLouly, zhangyaobit, vvchernov, jingyanwangms, wenbingl, daquexian, sreekanth-yalachigere, NonStatic2014, mayavijx, mindest, jstoecker, manashgoswami, Andrews548, baowenlei, kunal-vaishnavi