BentoML is an open platform that simplifies ML model deployment and enables you to serve your models at production scale in minutes.
🍱 BentoML: The Unified Model Serving Framework
🦄️ Yatai: Model Deployment at scale on Kubernetes
🚀 bentoctl: Fast model deployment on any cloud platform
👩‍🍳 What we are building

BentoML - The Unified Model Serving Framework
BentoML makes it easy to turn your ML models into prediction services that are easy to deploy. You can use it with any ML framework, incorporate business logic and pre/post-processing code with your model, serve predictions in real time via a REST API endpoint or offline via batch inference jobs, and automatically generate a Docker container image for production deployment.
Key Features:
- Support multiple ML frameworks, including PyTorch, TensorFlow, Scikit-Learn, XGBoost, and many more
- Support Adaptive Batching, which dynamically groups inference requests into small batches in real time for better performance
- Build inference graphs composed of multiple models or functions and execute them in parallel (see the sketch after this list)
- Automatically generate Docker images for production deployment
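
As a rough sketch of the inference-graph idea, the service below loads two runners and awaits them in parallel with asyncio.gather, using the same load_runner/async_run API shown further down. The model names ("model_a", "model_b"), the service name, and the averaging of the two outputs are illustrative assumptions, not part of the official examples:

# inference_graph_service.py (illustrative sketch)
import asyncio

import numpy as np
import bentoml
from bentoml.io import NumpyNdarray

# assumes two PyTorch models were previously saved under these (hypothetical) names
model_a_runner = bentoml.pytorch.load_runner("model_a")
model_b_runner = bentoml.pytorch.load_runner("model_b")

svc = bentoml.Service("inference_graph_demo", runners=[model_a_runner, model_b_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(arr: np.ndarray) -> np.ndarray:
    # run both models concurrently and combine their outputs
    out_a, out_b = await asyncio.gather(
        model_a_runner.async_run(arr),
        model_b_runner.async_run(arr),
    )
    # assumes both runners return torch tensors; averaging is just one way to combine them
    return (out_a.numpy() + out_b.numpy()) / 2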
How it works:
- Use BentoML to save your trained model:
import bentoml

# save the trained PyTorch model to the local BentoML model store under the name "mnist"
bentoml.pytorch.save("mnist", trained_model)
- Create an ML Service:
# mnist_service.py
import typing as t

import numpy as np
import bentoml
from bentoml.io import Image, NumpyNdarray
from PIL.Image import Image as PILImage

mnist_runner = bentoml.pytorch.load_runner("mnist")

svc = bentoml.Service("pytorch_mnist_demo", runners=[mnist_runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="int64"))
async def predict_image(f: PILImage) -> "np.ndarray[t.Any, np.dtype[t.Any]]":
    # normalize pixel values and add a batch dimension before inference
    arr = np.array(f) / 255.0
    arr = np.expand_dims(arr, 0).astype("float32")
    output_tensor = await mnist_runner.async_run(arr)
    return output_tensor.numpy()
- Run a model server locally to test out the API endpoint:
bentoml serve mnist_service.py:svc --reload
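
Once the server is running, you can send a test request to the endpoint, for example with curl. The default port (3000), the route name derived from the API function, and the sample file name sample.png are assumptions for illustration:

curl -H "Content-Type: image/png" --data-binary @sample.png http://127.0.0.1:3000/predict_image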
- Check out the Quickstart Guide to learn more!
Yatai - Model Deployment at scale on Kubernetes
Yatai helps ML teams deploy large-scale model serving workloads on Kubernetes. It standardizes BentoML deployment on Kubernetes, provides a UI for managing all your ML models and deployments in one place, and enables advanced GitOps and CI/CD workflows.
Key Features:
- Deployment Automation - deploy Bentos as auto-scaling API endpoints on Kubernetes and easily roll out new versions
- Bento Registry - manage all your team's Bentos and Models, backed by cloud blob storage (S3, MinIO)
- Observability - monitoring dashboards that help users identify model performance issues
- CI/CD - flexible APIs for integrating with your training and CI/CD pipelines

bentoctl - Fast model deployment on any cloud platform
bentoctl is a CLI tool for deploying your machine learning models to any cloud platform and serving predictions via REST APIs. It is built on top of BentoML and makes it easy to bring any BentoML-packaged model to production.
Supported platforms:
- AWS EC2
- AWS Lambda
- AWS SageMaker
- Azure Functions
- Azure Container Instances
- Google Cloud Run
- Google Compute Engine
- Heroku
- Knative (WIP)
- Looking for Kubernetes? Try out Yatai: Model deployment at scale on Kubernetes.
- Customize deployment targets by creating a bentoctl plugin from the deployment operator template.
How it works:
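Below is a rough sketch of a typical bentoctl workflow. The operator name (aws-lambda) is only an example, and exact command names and flags can differ between bentoctl versions, so treat this as an outline and check the bentoctl documentation for the authoritative steps:

# install the deployment operator for your target platform (aws-lambda is just an example)
bentoctl operator install aws-lambda

# interactively generate a deployment configuration for your Bento
bentoctl init

# build the deployable artifacts for the chosen target
# (recent versions typically take the Bento tag and the generated config file as arguments)
bentoctl build

# apply the deployment to the target platform
bentoctl apply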
Get in touch!
Come join our community, the veterans alongside the newbs, all trying to figure out what the hell this thing called MLOps is.