

An open source platform for machine learning in production

Welcome to BentoML 👋


What we are building 👩‍🍳

BentoML is an open source platform for machine learning in production. It simplifies model packaging and model management, optimizes model serving workloads to run at production scale, and accelerates the creation, deployment, and monitoring of prediction services.

Get in touch 💬

BentoML has a thriving open source community where hundreds of ML practitioners contribute to the project, help other users, and discuss all things MLOps. 👉 Join us on Slack today!

Open Source Projects 👩🏽‍💻

🍱 BentoML

Source | Tutorial | Documentation | Examples

BentoML makes it easy to turn your ML models into prediction services that are easy to deploy. You can use it with any ML framework, bundle business logic and pre/post-processing code with your model, serve predictions in real time via a REST API endpoint or offline via a batch inference job, and automatically generate a Docker container image for production deployment.

Learn More

Key Features:

  • Supports multiple ML frameworks, including PyTorch, TensorFlow, Scikit-Learn, XGBoost, and many more
  • Supports adaptive batching, which dynamically groups inference requests into small batches in real time to improve throughput
  • Builds inference graphs composed of multiple models or functions and executes them in parallel
  • Automatically generates Docker images for production deployment
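Adaptive batching can be pictured as a micro-batching loop. The sketch below is a generic asyncio illustration, not BentoML's actual implementation: `MicroBatcher`, `max_batch_size`, and `max_latency_s` are hypothetical names chosen for this example. Requests wait in a queue until either the batch is full or the first queued request has waited `max_latency_s`, and then the whole batch goes through a single call to the model function.

```python
import asyncio
from typing import Any, Callable, List, Optional, Sequence


class MicroBatcher:
    """Hypothetical adaptive batcher (illustration only, not BentoML's code).

    Groups concurrent requests into one call to `batch_fn`, flushing when
    `max_batch_size` items are collected or `max_latency_s` has elapsed
    since the first item of the batch was taken from the queue.
    """

    def __init__(self, batch_fn: Callable[[Sequence[Any]], List[Any]],
                 max_batch_size: int = 8, max_latency_s: float = 0.01) -> None:
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_latency_s = max_latency_s
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, item: Any) -> Any:
        # Start the background batching loop lazily, inside the running loop.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        return await fut  # resolved once this item's batch has been processed

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until a request arrives
            deadline = loop.time() + self.max_latency_s
            # Keep collecting until the batch is full or the latency budget is spent.
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            try:
                results = self.batch_fn([item for item, _ in batch])
            except Exception as exc:  # propagate a failure to every caller in the batch
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)


# Demo: a stand-in "model" that records the batches it receives.
calls: List[List[int]] = []


def double_all(xs: Sequence[int]) -> List[int]:
    calls.append(list(xs))
    return [x * 2 for x in xs]


async def main() -> List[int]:
    batcher = MicroBatcher(double_all, max_batch_size=4, max_latency_s=0.05)
    return await asyncio.gather(*(batcher.submit(i) for i in range(6)))


results = asyncio.run(main())
```

Here six concurrent requests with `max_batch_size=4` produce two model calls, one with four items and one with two, while each caller still receives its own result. A production batcher would also adapt the batch size and latency budget to observed traffic, which this sketch does not attempt.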

How it works:

  1. Use BentoML to save your trained model:
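```python
import bentoml

# `trained_model` is your trained PyTorch model (a torch.nn.Module)
bentoml.pytorch.save("mnist", trained_model)
```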
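  2. Create an ML service:
```python
import typing as t

import numpy as np
from PIL.Image import Image as PILImage

import bentoml
from bentoml.io import Image, NumpyNdarray

# Load the model saved as "mnist" in step 1 as a runner
mnist_runner = bentoml.pytorch.load_runner("mnist")

svc = bentoml.Service("pytorch_mnist_demo", runners=[mnist_runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="int64"))
async def predict_image(f: PILImage) -> "np.ndarray[t.Any, np.dtype[t.Any]]":
    # Scale pixel values to [0, 1], then add a leading channel dimension
    # (28x28 -> 1x28x28 for a greyscale MNIST digit)
    arr = np.array(f) / 255.0
    arr = np.expand_dims(arr, 0).astype("float32")
    output_tensor = await mnist_runner.async_run(arr)
    return output_tensor.numpy()
```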
  3. Run a model server locally to test the API endpoint:
```bash
bentoml serve --reload
```
  4. Check out the Quickstart Guide to learn more!
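To call the endpoint from another program, a minimal standard-library client might look like the sketch below. It assumes the development server listens on BentoML's default port 3000, that the `predict_image` API is exposed at the path `/predict_image`, and that the `Image()` input accepts a raw image body sent with an `image/png` Content-Type; check the IO descriptor documentation for your BentoML version. `build_request`, `predict`, and `digit.png` are hypothetical names used only for illustration.

```python
import urllib.request

# Assumption: default dev-server address plus a route named after the API function.
DEFAULT_URL = "http://127.0.0.1:3000/predict_image"


def build_request(image_bytes: bytes, url: str = DEFAULT_URL,
                  content_type: str = "image/png") -> urllib.request.Request:
    """Build a POST request whose body is the raw image bytes."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


def predict(image_path: str, url: str = DEFAULT_URL) -> str:
    """Send an image file to the running service and return the raw response body."""
    with open(image_path, "rb") as f:
        req = build_request(f.read(), url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Usage, with `bentoml serve` running and a hypothetical "digit.png" on disk:
#   print(predict("digit.png"))
```

Separating request construction from sending keeps the client easy to inspect without a live server; the response body is whatever the `NumpyNdarray` output serializes to, typically JSON.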

🦄️ Yatai

Source | Tutorial | Installation | Helm Chart

Yatai is a production-first platform for your machine learning needs. It brings collaborative BentoML workflows to Kubernetes, helps ML teams run model serving at scale, and simplifies model management and deployment across teams.

Learn More

Core features:

  • Bento Registry - manage all your team's ML models via simple Web UI and API, and store ML assets on cloud blob storage
  • Deployment Automation - deploy Bentos as auto-scaling API endpoints on Kubernetes and easily roll out new versions
  • Observability - monitoring dashboards and logging integration help users identify model performance issues
  • CI/CD - flexible APIs for integrating with your training and CI pipelines
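The auto-scaling policy Yatai uses is not spelled out here. Assuming it delegates to the standard Kubernetes Horizontal Pod Autoscaler, the replica count for a deployment roughly follows the HPA formula sketched below: scale proportionally to how far the observed metric is from its target, skip changes within a tolerance band, and clamp to the configured bounds. `desired_replicas` is a hypothetical helper, and the real HPA adds stabilization windows and per-pod metric handling that this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Hypothetical helper following the Kubernetes HPA formula:
    ceil(current_replicas * current_metric / target_metric),
    left unchanged when the ratio is within `tolerance` of 1.0,
    then clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: avoid flapping
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, two replicas running at 90% CPU against a 50% target give `ceil(2 * 1.8) = 4` replicas, while a reading of 52% stays within the default 10% tolerance and leaves the count unchanged.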

🚀 bentoctl

Source | Quickstart | Documentation

bentoctl is a CLI tool for deploying your machine-learning models to any cloud platform and serving predictions via REST APIs. It is built on top of BentoML and makes it easy to bring any BentoML packaged model to production.

Learn More
