
The most flexible way to serve AI models in production

Welcome to BentoML πŸ‘‹


What is BentoML? πŸ‘©β€πŸ³

BentoML is an open-source model serving library for building performant and scalable AI applications with Python. It comes with everything you need for serving optimization, model packaging, and production deployment.
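For a sense of what "model packaging" looks like in practice, a BentoML project typically declares its service entry point and dependencies in a `bentofile.yaml` so it can be built into a deployable Bento. The sketch below is illustrative only: the module/service name `service:Summarizer` and the listed packages are assumptions, not part of this page.

```yaml
# Hypothetical bentofile.yaml (a sketch, not taken from this page).
# Assumes a service.py module defining a Summarizer service.
service: "service:Summarizer"
include:
  - "*.py"            # source files to package into the Bento
python:
  packages:           # example Python dependencies
    - torch
    - transformers
```

From there, `bentoml build` packages the project and `bentoml serve` runs it locally before deploying to production.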

πŸ”¨ Build Anywhere with Open Source:

🚒 Efficient scaling on your cloud or ours:

  • ☁️ BentoCloud: an inference platform for enterprise AI teams to build fast, secure, and scalable AI applications.

Get in touch πŸ’¬

πŸ‘‰ Join our Slack community!

πŸ‘€ Follow us on X @bentomlai and LinkedIn

πŸ“– Read our blog


  1. BentoML

    The most flexible way to serve AI/ML models in production: build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multi-modal apps, RAG as a service, and more.

    Python Β· 6.5k stars Β· 737 forks

  2. OpenLLM

    Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint, locally and in the cloud.

    Python Β· 8.7k stars Β· 547 forks

