A high-throughput and memory-efficient inference and serving engine for LLMs
Large Language Model Text Generation Inference
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Making large AI models cheaper, faster and more accessible
A universal scalable machine learning model deployment solution
The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
Cross-platform, customizable ML solutions for live and streaming media.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Swift native on-device speech recognition with Whisper for Apple Silicon
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
Utilities to use the Hugging Face Hub API
📚 Jupyter notebook tutorials for OpenVINO™
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Scalable Tool for Gene Network Reverse Engineering