

Simplify Model Deployment

Deploy your ML models to production today!

BentoML is an open platform that simplifies ML model deployment and enables you to serve your models at production scale in minutes.

👉 Pop into our Slack community! We're happy to help with any issue you face or even just to meet you and hear what you're working on.

What we are building 👩‍🍳

BentoML - The Unified Model Serving Framework

🍱 BentoML repo | 🎨 Gallery Projects | 📖 Documentation

BentoML makes it easy to turn your ML models into prediction services that are ready to deploy. It works with any ML framework, lets you incorporate business logic and pre/post-processing code alongside your model, serves predictions in real time via REST API endpoints or offline via batch inference jobs, and automatically generates Docker container images for production deployment.


Key Features:

  • Support for multiple ML frameworks, including PyTorch, TensorFlow, Scikit-Learn, XGBoost, and many more
  • Adaptive batching, which dynamically groups inference requests into small batches in real time for better performance
  • Inference graphs composed of multiple models or functions, executed in parallel (see the sketch after this list)
  • Automatic Docker image generation for production deployment
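
As a rough illustration of the inference-graph feature, the sketch below composes two runners and awaits them concurrently. The model names (model_a, model_b) and the averaging step are hypothetical; the runner API shown matches the MNIST example in the next section.

import asyncio

import bentoml
from bentoml.io import NumpyNdarray

# Hypothetical models; both must already be saved to the local model store.
model_a_runner = bentoml.pytorch.load_runner("model_a")
model_b_runner = bentoml.pytorch.load_runner("model_b")

svc = bentoml.Service("inference_graph_demo", runners=[model_a_runner, model_b_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(arr):
    # asyncio.gather awaits both runner calls concurrently, so the two
    # models execute in parallel rather than back to back.
    res_a, res_b = await asyncio.gather(
        model_a_runner.async_run(arr),
        model_b_runner.async_run(arr),
    )
    # Hypothetical ensembling step: average the two outputs.
    return (res_a + res_b) / 2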

How it works:

  1. Use BentoML to save your trained model:
import bentoml
bentoml.pytorch.save('mnist', trained_model)
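The saved model is versioned in BentoML's local model store. Depending on your BentoML version, you can verify the save from the command line, for example:

bentoml models list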
  2. Create an ML service:
import bentoml
import numpy as np
import typing as t
from bentoml.io import Image, NumpyNdarray
from PIL.Image import Image as PILImage

mnist_runner = bentoml.pytorch.load_runner("mnist")

svc = bentoml.Service("pytorch_mnist_demo", runners=[mnist_runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="int64"))
async def predict_image(f: PILImage) -> "np.ndarray[t.Any, np.dtype[t.Any]]":
    # Normalize pixel values to [0, 1] and add a batch dimension.
    arr = np.array(f) / 255.0
    arr = np.expand_dims(arr, 0).astype("float32")
    output_tensor = await mnist_runner.async_run(arr)
    return output_tensor.numpy()
  3. Run a model server locally to test out the API endpoint:
bentoml serve --reload
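Once the server is up (recent BentoML releases listen on port 3000 by default), you can exercise the endpoint by POSTing an image to it; digit.png below is a placeholder file name:

curl -H "Content-Type: image/png" \
     --data-binary @digit.png \
     http://127.0.0.1:3000/predict_image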
  4. Check out the Quickstart Guide to learn more!

Yatai - Model Deployment at scale on Kubernetes

🦄️ Yatai repo | 👩‍🚀 Administrator's Guide | ⎈ Helm Chart

Yatai helps ML teams deploy large-scale model serving workloads on Kubernetes. It standardizes BentoML deployment on Kubernetes, provides a UI for managing all your ML models and deployments in one place, and enables advanced GitOps and CI/CD workflows.


Key Features:

  • Deployment Automation - deploy Bentos as auto-scaling API endpoints on Kubernetes and easily roll out new versions
  • Bento Registry - manage all your team's Bentos and Models, backed by cloud blob storage (S3, MinIO)
  • Observability - a monitoring dashboard that helps users identify model performance issues
  • CI/CD - flexible APIs for integrating with your training and CI/CD pipelines
See more product screenshots in the Yatai repo: deployment creation, Bento repositories, model details, cluster components, deployment details, and activities.
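
As a rough sketch of the GitOps-style flow (assuming a Yatai server is already running in your cluster and your local BentoML client is logged in to it), a built Bento is pushed to Yatai's registry:

bentoml push pytorch_mnist_demo:latest

From there, deployments can be created and rolled out through the Yatai UI or its APIs as part of a CI/CD pipeline.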

bentoctl - Fast model deployment on any cloud platform

🚀 bentoctl repo | 📖 Documentation

bentoctl is a CLI tool for deploying your machine learning models to any cloud platform and serving predictions via REST APIs. It is built on top of BentoML and makes it easy to bring any BentoML-packaged model to production.


Supported platforms: AWS Lambda, AWS SageMaker, AWS EC2, Google Cloud Run, Google Compute Engine, Azure Functions, Heroku, and more.

How it works:

(Demo: bentoctl deploying to AWS EC2.)
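
A minimal sketch of that flow, matching the AWS EC2 demo above; the exact subcommands and flags vary across bentoctl versions, so treat this as illustrative and consult the documentation linked above:

# Install the operator for the target platform
bentoctl operator install aws-ec2

# Interactively generate a deployment config for your Bento
bentoctl init

# Build and deploy using the generated config (flags are illustrative)
bentoctl build -b pytorch_mnist_demo:latest -f deployment_config.yaml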

Get in touch!

Come join our community, the veterans alongside the newbs, all trying to figure out what the hell this thing called MLOps is.


Pinned repositories:

  1. BentoML - The Unified Model Serving Framework 🍱 (Python)
  2. Yatai - Model Deployment at scale on Kubernetes 🦄️ (TypeScript)
  3. bentoctl - Fast model deployment on any cloud platform 🚀 (Python)
  4. gallery - BentoML Sample Projects Gallery 🎨 (Jupyter Notebook)