serving

Here are 104 public repositories matching this topic...

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

performance gpu model production cuda efficiency inference transformer llama speculative serving llm llm-inference llama3

Updated Jun 8, 2024
C++

vespa-engine / vespa

AI + Data, online. https://vespa.ai

java search-engine machine-learning big-data ai server cpp tensorflow vespa serving serving-recommendation vector-search

Updated Jun 8, 2024
Java

ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated Jun 8, 2024
Python

deepjavalibrary / djl-serving

A universal, scalable machine learning model deployment solution

deep-learning deployment inference pytorch serving djl

Updated Jun 8, 2024
Java

pytorch / serve

Serve, optimize and scale PyTorch models in production

docker kubernetes machine-learning cpu deep-learning metrics gpu optimization pytorch serving mlops

Updated Jun 8, 2024
Java

openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™

kubernetes machine-learning cloud ai deep-learning inference edge dag model-serving serving openvino

Updated Jun 7, 2024
C++

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

structured-data serving unstructured-data unified-sql vector-database mysql-compatibility embedding-search embedding-store key-value-distributed-store vector-ocean real-time-semantic-search

Updated Jun 8, 2024
Java

SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

kubernetes machine-learning deployment serving aiops production-machine-learning mlops machine-learning-operations

Updated Jun 6, 2024
HTML

Lightning-AI / LitServe

Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.

api ai serving

Updated Jun 7, 2024
Python

tensorflow / serving

A flexible, high-performance serving system for machine learning models

python machine-learning deep-neural-networks deep-learning neural-network cpp tensorflow ml serving

Updated Jun 6, 2024
C++

torchpipe / torchpipe

Boosts deep learning service throughput 1.5-4x through ensemble pipeline serving with concurrent CUDA streams, for PyTorch/LibTorch frontends and TensorRT, CVCUDA, and other backends