
TorchServe v0.11.0 Release Notes

Released by @lxning on 17 May 03:59 · 5 commits to master since this release · 34bc370

This is the release of TorchServe v0.11.0.

Highlights Include

  • GenAI inference optimizations showcasing
    • torch.compile with OpenVINO backend for Stable Diffusion
    • Intel IPEX for Llama
  • Experimental support for Apple MPS and linux-aarch64
  • Security bug fixes

GenAI

  • Upgraded Llama2 examples to Llama3
    • Supported Llama3 in HuggingFace Accelerate Example #3108 @mreso
    • Supported Llama3 in chat bot #3131 @mreso
    • Supported Llama3 on inf2 Neuronx transformer using continuous batching or micro batching #3133 #3035 @lxning
  • Examples for LoRA and Mistral #3077 @lxning
  • IPEX LLM serving example with Intel AMX #3068 @bbhattar
  • Integration of Intel OpenVINO with TorchServe using torch.compile; an example showcases the OpenVINO torch.compile backend with Stable Diffusion #3116 @suryasidd
  • Enabled retrieval of a guaranteed sequential order of input sequences with low latency for stateful inference via HTTP, extending this previously gRPC-only feature #3142 @lxning
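OpenVINO plugs into PyTorch's compile API (`torch.compile(model, backend="openvino")`, available once `openvino.torch` is importable), so TorchServe can select it per model. As a rough sketch, the backend choice can be expressed in a model's `model-config.yaml`; the exact schema below is an assumption based on TorchServe's `pt2` configuration and may differ from the shipped example:

```yaml
# Hypothetical model-config.yaml fragment: pick the torch.compile
# backend TorchServe should use when loading this model.
pt2:
  compile:
    enable: true
    backend: openvino
```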

Linux aarch64 Support

TorchServe adds support for linux-aarch64 and shows an example working on AWS Graviton. This provides users with a new platform alternative for serving models on CPU.

Apple Silicon Support

XGBoost Support

With the XGBoost Classifier example, we show how to deploy any pickled model with TorchServe.
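The core mechanism is plain Python pickling: the archiver packages a serialized model artifact, and a custom handler unpickles it and calls `predict()` at inference time. A minimal stdlib-only sketch of that round trip (`DummyClassifier` is a hypothetical stand-in for the real XGBoost classifier):

```python
import io
import pickle

# Hypothetical stand-in for a trained XGBoost classifier: any object
# exposing predict() can be pickled and served the same way.
class DummyClassifier:
    def predict(self, rows):
        return [sum(r) > 1.0 for r in rows]

# Serialize the model, as the archiver would package it (a .pkl
# artifact inside the .mar archive).
buf = io.BytesIO()
pickle.dump(DummyClassifier(), buf)

# Inside a custom handler's initialize(), the pickled artifact is
# loaded back; inference() then simply calls predict().
buf.seek(0)
model = pickle.load(buf)
print(model.predict([[0.7, 0.6], [0.1, 0.2]]))  # → [True, False]
```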

Security

A vulnerability that allowed allowed_urls to be bypassed using relative paths has been fixed by checking for relative paths before the model archive is copied to the model store directory. In addition, the default gRPC inference and management addresses are now set to localhost (127.0.0.1) to reduce the default exposure of the gRPC endpoints.
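The preemptive check amounts to rejecting any archive name that resolves outside the model store before anything is copied. An illustrative stdlib-only sketch (the actual fix lives in TorchServe's Java frontend; the function name and store path here are hypothetical):

```python
import os

MODEL_STORE = "/var/model-store"  # hypothetical model store path

def is_safe_archive_path(filename: str) -> bool:
    """Reject names that escape the model store via relative components."""
    candidate = os.path.normpath(os.path.join(MODEL_STORE, filename))
    return candidate.startswith(MODEL_STORE + os.sep)

print(is_safe_archive_path("resnet18.mar"))      # True
print(is_safe_archive_path("../../etc/passwd"))  # False
```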

C++ Backend

Documentation

Improvements and Bug Fixing

Platform Support

Ubuntu 20.04; macOS 10.14+; Windows 10 Pro; Windows Server 2019; Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 or above, and JDK 17.

GPU Support Matrix

| TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
|---|---|---|---|---|
| 0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
| 0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |

Inferentia2 Support Matrix

| TorchServe version | PyTorch version | Python | Neuron SDK |
|---|---|---|---|
| 0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
| 0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |