Model Serving Made Easy


BentoML is an open-source platform for high-performance ML model serving.

What does BentoML do?

  • Turn trained ML models into production API endpoints with a few lines of code
  • Support all major machine learning training frameworks
  • End-to-end model serving solution with DevOps best practices baked-in
  • Micro-batching support, bringing the advantage of batch processing to online serving
  • Model management for teams, providing CLI access and Web UI dashboard
  • Flexible model deployment orchestration supporting Docker, Kubernetes, AWS Lambda, SageMaker, Azure ML and more

👉 Join BentoML Slack to follow the latest development updates and roadmap discussions.

Why BentoML

Getting Machine Learning models into production is hard. Data Scientists are typically not experts in building production services or applying DevOps best practices. The trained models produced by a Data Science team are hard to test and hard to deploy. This often leads to a time-consuming and error-prone workflow, where a pickled model or weights file is handed over to a software engineering team.

BentoML is an end-to-end solution for model serving, making it possible for Data Science teams to build production-ready model serving endpoints, with common DevOps best practices and performance optimizations baked in.

Check out the Frequently Asked Questions page on how BentoML compares to TensorFlow Serving, Clipper, AWS SageMaker, MLflow, etc.

Getting Started

Before starting, make sure your Python version is 3.6 or above, and install BentoML with pip:

pip install bentoml

A minimal prediction service in BentoML looks something like this:

from bentoml import env, artifacts, api, BentoService
from bentoml.handlers import DataframeHandler
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

This code defines a prediction service that bundles a scikit-learn model and provides an API. The API here is the entry point for accessing this prediction service, and an API with DataframeHandler will convert an HTTP JSON request into a pandas.DataFrame object before passing it to the user-defined API function for inference.
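As a rough sketch of that conversion (this is not BentoML's actual implementation), a JSON list-of-lists request body maps onto a DataFrame with one row per inner list:

```python
import json

import pandas as pd

# Illustrative sketch of the conversion DataframeHandler performs:
# the JSON request body becomes a DataFrame, one row per inner list.
payload = '[[5.1, 3.5, 1.4, 0.2]]'
df = pd.DataFrame(json.loads(payload))
print(df.shape)  # one row, four feature columns
```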

The following code trains a scikit-learn model and bundles the trained model with an IrisClassifier instance. The IrisClassifier instance is then saved to disk in the BentoML SavedBundle format, which is a versioned file archive that is ready for production model serving deployments.

from sklearn import svm
from sklearn import datasets

from iris_classifier import IrisClassifier

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y =,

    # Model Training
    clf = svm.SVC(gamma='scale'), y)

    # Create an iris classifier service instance
    iris_classifier_service = IrisClassifier()

    # Pack the newly trained model artifact
    iris_classifier_service.pack('model', clf)

    # Save the prediction service to disk for model serving
    saved_path =

By default, BentoML stores SavedBundle files under the ~/bentoml directory. Users can also customize BentoML to use a different directory or cloud storage like AWS S3. BentoML also comes with a model management component YataiService, which provides advanced model management features including a dashboard web UI:
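For illustration, assuming the default local store layout of `~/bentoml/repository/{service_name}/{version}`, a saved bundle's location can be reconstructed from its name and version (the version string here is only a hypothetical example; BentoML generates it at save time):

```python
import os

# Illustrative only: where a SavedBundle lands, assuming the default
# ~/bentoml/repository/{service_name}/{version} layout.
service_name = "IrisClassifier"
version = "20200121114004_360ECB"  # example; generated at save time
bundle_dir = os.path.join(
    os.path.expanduser("~"), "bentoml", "repository", service_name, version
)
print(bundle_dir)
```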

BentoML YataiService Bento Repository Page

BentoML YataiService Bento Details Page

To start a REST API server with the saved IrisClassifier service, use the bentoml serve command:

bentoml serve IrisClassifier:latest

The IrisClassifier service is now served at localhost:5000. Use the curl command to send a prediction request:

curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
  http://localhost:5000/predict
The BentoML API server also provides a web UI for accessing predictions and debugging the server. Visit http://localhost:5000 in the browser and use the Web UI to send a prediction request:
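Equivalently, a prediction request can be sent from Python. This is a sketch using only the standard library, assuming the server started by `bentoml serve` is running locally and the endpoint path matches the API function name, `predict`:

```python
import json
import urllib.request

# Build a POST request mirroring the curl example above.
payload = json.dumps([[5.1, 3.5, 1.4, 0.2]]).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:5000/predict",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With the server running, uncomment to get the prediction:
# print(urllib.request.urlopen(req).read())
```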

BentoML provides a convenient way to containerize the model API server with Docker:

  1. Find the SavedBundle directory with bentoml get command

  2. Run docker build with the SavedBundle directory which contains a generated Dockerfile

  3. Run the generated docker image to start a docker container serving the model

# If the jq command is not found, install jq, the command-line JSON processor
saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")

docker build -t {docker_username}/iris-classifier $saved_path

docker run -p 5000:5000 -e BENTOML_ENABLE_MICROBATCH=True {docker_username}/iris-classifier

This makes it possible to deploy BentoML-bundled ML models to platforms such as Kubeflow, Knative, and Kubernetes, which provide advanced model deployment features such as auto-scaling, A/B testing, scale-to-zero, canary rollout, and multi-armed bandit.

BentoML can also deploy a SavedBundle directly to cloud services such as AWS Lambda or AWS SageMaker with the bentoml CLI:

$ bentoml get IrisClassifier
BENTO_SERVICE                         CREATED_AT        APIS                       ARTIFACTS
IrisClassifier:20200121114004_360ECB  2020-01-21 19:40  predict<DataframeHandler>  model<SklearnModelArtifact>
IrisClassifier:20200120082658_4169CF  2020-01-20 16:27  predict<DataframeHandler>  clf<PickleArtifact>

$ bentoml lambda deploy test-deploy -b IrisClassifier:20200121114004_360ECB

$ bentoml deployment list
NAME           NAMESPACE    PLATFORM    BENTO_SERVICE                         STATUS    AGE
test-deploy    dev          aws-lambda  IrisClassifier:20200121114004_360ECB  running   2 days and 11 hours

Check out the deployment guides and other deployment options with BentoML here.


BentoML full documentation:


Visit bentoml/gallery repository for more examples and tutorials.




Have questions or feedback? Post a new GitHub issue or discuss in our Slack channel: join BentoML Slack

Want to help build BentoML? Check out our contributing guide and the development guide.


BentoML is under active development and evolving rapidly. It is currently a beta release, and we may change APIs in future releases.

Read more about the latest features and changes in BentoML from the releases page.

Usage Tracking

BentoML collects anonymous usage data by default, using Amplitude. It only collects the BentoML library's own actions and parameters; no user or model data is collected.

This helps the BentoML team understand how the community is using this tool and what to build next. You can easily opt out of usage tracking by running the following command:

# From terminal:
bentoml config set usage_tracking=false
# From python:
import bentoml
bentoml.config().set('core', 'usage_tracking', 'False')


Apache License 2.0

