

The easiest way to run AI Inference in the cloud

Welcome to BentoML 👋


What is BentoML? 👩‍🍳

BentoML is an open-source model serving library for building model inference APIs and multi-model serving systems with any open-source or custom AI model. It includes everything you need for serving optimization and model packaging, and it simplifies production deployment via ☁️ BentoCloud.

Get in touch 💬

👉 Join our Slack community!

👀 Follow us on X @bentomlai and LinkedIn

📖 Read our blog


  1. BentoML (Public)

    The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

    Python · 6.6k stars · 746 forks

  2. OpenLLM (Public)

    Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

    Python · 9k stars · 570 forks

