# Getting Started with BentoML

[BentoML](http://bentoml.ai) is an open-source framework for high-performance machine learning model serving. It makes it easy to build production API endpoints for trained ML models and supports all major machine learning frameworks, including TensorFlow, Keras, PyTorch, XGBoost, scikit-learn, and fastai.

BentoML comes with a high-performance API model server with adaptive micro-batching support, bringing the advantage of batch processing to online serving workloads. It also provides batch serving, model management and model deployment functionality, which gives ML teams an end-to-end model serving solution with baked-in DevOps best practices.

This is a quick tutorial on how to use BentoML to serve a scikit-learn model via a REST API server, containerize the API model server with Docker, and deploy it to [AWS Lambda](https://aws.amazon.com/lambda/) as a serverless endpoint.


BentoML requires Python 3.6 or above. Install the dependencies via `pip`:

In [1]:
# Install PyPI packages required in this guide, including BentoML
!pip install -q bentoml 'scikit-learn>=0.23.2' 'pandas>=1.1.1'
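The version requirement can be checked from inside the notebook before installing anything. This is a minimal sketch using only the standard library; `MIN_PYTHON` and `python_is_supported` are illustrative names and are not part of BentoML.

```python
import sys

# Minimum interpreter version stated in this guide (illustrative constant name).
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_is_supported():
    raise RuntimeError("This guide needs Python %d.%d or above" % MIN_PYTHON)
```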

Train a classifier model with the Iris flower data set:

In [2]:
from sklearn import svm
from sklearn import datasets

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

SVC()

## Create a Prediction Service with BentoML


A minimal prediction service in BentoML looks something like this:

In [3]:
%%writefile iris_classifier.py
import pandas as pd

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(input=DataframeInput())
    def predict(self, df: pd.DataFrame):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

Overwriting iris_classifier.py


This code defines a prediction service that bundles a scikit-learn model and provides an
API that expects input data in the form of `pandas.DataFrame`. The user-defined API
function `predict` defines how the input dataframe data will be processed and used for 
inference with the bundled scikit-learn model. BentoML also supports other API input 
types such as `ImageInput`, `JsonInput` and 
[more](https://docs.bentoml.org/en/latest/api/adapters.html).
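The comment inside `predict` marks where pre-processing would go. As one illustration, the sketch below validates a JSON request body before it would be turned into a dataframe. It uses only the standard library; `parse_rows` and `N_FEATURES` are hypothetical names, and the four-value check is an assumption based on the Iris data set used in this guide rather than anything BentoML enforces.

```python
import json

# The Iris data set has four numeric features per sample.
N_FEATURES = 4


def parse_rows(body, n_features=N_FEATURES):
    """Parse a JSON array of rows and check that every row is well formed."""
    rows = json.loads(body)
    if not isinstance(rows, list) or not rows:
        raise ValueError("expected a non-empty JSON array of rows")
    for i, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != n_features:
            raise ValueError("row %d must have %d values" % (i, n_features))
        for value in row:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("row %d contains a non-numeric value" % i)
    return rows


rows = parse_rows('[[5.1, 3.5, 1.4, 0.2]]')
print(rows)
```

Rejecting malformed rows early gives the caller a clear error instead of a failure inside the model.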

The following code packages the trained model with the
`IrisClassifier` class defined above. It then saves the IrisClassifier instance to disk 
in the BentoML SavedBundle format:

In [4]:
# import the custom BentoService defined above
from iris_classifier import IrisClassifier

# Create an iris classifier service instance
iris_classifier_service = IrisClassifier()

# Pack the newly trained model artifact
iris_classifier_service.pack('model', clf)

# Save the prediction service to disk for model serving
saved_path = iris_classifier_service.save()

[2020-08-27 22:25:52,733] INFO - BentoService bundle 'IrisClassifier:20200827222552_151AF3' saved to: /Users/chaoyu/bentoml/repository/IrisClassifier/20200827222552_151AF3


By default, BentoML stores SavedBundle files under the `~/bentoml` directory. Users 
can also customize BentoML to use a different directory or cloud storage like
[AWS S3](https://aws.amazon.com/s3/) and [MinIO](https://min.io/), via BentoML's
model management component [YataiService](https://docs.bentoml.org/en/latest/concepts.html#customizing-model-repository),
which provides advanced model management features including a dashboard web UI:

![BentoML YataiService Bento Repository Page](https://raw.githubusercontent.com/bentoml/BentoML/master/docs/source/_static/img/yatai-service-web-ui-repository.png)

![BentoML YataiService Bento Details Page](https://raw.githubusercontent.com/bentoml/BentoML/master/docs/source/_static/img/yatai-service-web-ui-repository-detail.png)

Start the YataiService web server on your local development machine with the CLI command `bentoml yatai-service-start` and visit http://127.0.0.1:3000 to view the web UI. More documentation about model management can be found [here](https://docs.bentoml.org/en/latest/concepts.html#model-management).

In [5]:
# Where the SavedBundle directory is saved to
print("saved_path:", saved_path)

# Print the auto-generated service version
print("version:", iris_classifier_service.version)

saved_path: /Users/chaoyu/bentoml/repository/IrisClassifier/20200827222552_151AF3
version: 20200827222552_151AF3
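The auto-generated version printed above appears to start with a timestamp that matches the save time in the log. The sketch below splits a `Name:version` tag and reads that timestamp. The helper names are hypothetical, and the `YYYYMMDDHHMMSS_suffix` layout is inferred from the output in this guide; it is not a documented guarantee.

```python
from datetime import datetime


def split_bento_tag(tag):
    """Split a 'Name:version' tag into its two parts."""
    name, sep, version = tag.partition(":")
    if not sep or not name or not version:
        raise ValueError("expected a tag of the form 'Name:version'")
    return name, version


def version_timestamp(version):
    """Read the leading YYYYMMDDHHMMSS digits of an auto-generated version.

    Assumption: this layout is inferred from the output shown in this guide.
    """
    stamp = version.partition("_")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


name, version = split_bento_tag("IrisClassifier:20200827222552_151AF3")
print(name, version_timestamp(version).isoformat())
```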


In [6]:
# Find the saved path from CLI:
!bentoml get IrisClassifier:latest --print-location --quiet

/Users/chaoyu/bentoml/repository/IrisClassifier/20200827222552_151AF3


## REST API Model Serving



The BentoML SavedBundle directory contains all the code, data and configs required to 
deploy the model. 

To start a local development REST API model server with the `IrisClassifier` SavedBundle, use the `bentoml serve` command:

In [7]:
!bentoml serve IrisClassifier:latest

[2020-08-27 22:27:00,936] INFO - Getting latest version IrisClassifier:20200827222552_151AF3
[2020-08-27 22:27:00,936] INFO - Starting BentoML API server in development mode..
 * Serving Flask app "IrisClassifier" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [27/Aug/2020 22:27:05] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/Aug/2020 22:27:05] "GET /swagger_static/swagger-ui.css HTTP/1.1" 200 -
127.0.0.1 - - [27/Aug/2020 22:27:05] "GET /swagger_static/swagger-ui-bundle.js HTTP/1.1" 200 -
127.0.0.1 - - [27/Aug/2020 22:27:05] "GET /docs.json HTTP/1.1" 200 -
127.0.0.1 - - [27/Aug/2020 22:27:05] "GET /favicon.ico HTTP/1.1" 404 -
[2020-08-27 22:27:15,393] INFO - {'request_id': 'ad4692bd-0dd0-450f-89d8-88cf8026a687', 'service_name': 'IrisClassifier', 'service_version': '20200827222552_151AF3', 'api': 'predict', 're

If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to gain access to the API endpoint via a public endpoint managed by [ngrok](https://ngrok.com/):

In [None]:
!bentoml serve IrisClassifier:latest --run-with-ngrok

The `IrisClassifier` model is now served at `localhost:5000`. Use the `curl` command to send
a prediction request:

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
localhost:5000/predict
```

Or with `python` and the [requests library](https://requests.readthedocs.io/):
```python
import requests
response = requests.post("http://127.0.0.1:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)
```
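If the `requests` package is not installed, the same call can be sketched with the standard library alone. The helper names `build_predict_request` and `predict` are illustrative, and decoding the response as JSON is an assumption based on the `[0]` response shown in the server log above.

```python
import json
import urllib.request


def build_predict_request(rows, base_url="http://127.0.0.1:5000"):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, base_url="http://127.0.0.1:5000", timeout=10):
    """Send the request to a running server and decode the JSON response."""
    request = build_predict_request(rows, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not need a running server; calling predict() does.
request = build_predict_request([[5.1, 3.5, 1.4, 0.2]])
print(request.get_method(), request.full_url)
```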

The BentoML API server also provides a web UI for accessing predictions and debugging 
the server. Visit http://localhost:5000 in the browser and use the web UI to send
a prediction request:

![BentoML API Server Web UI Screenshot](https://raw.githubusercontent.com/bentoml/BentoML/master/guides/quick-start/bento-api-server-web-ui.png)

## Containerize model server with Docker


BentoML provides a convenient way to containerize the model API server with Docker. Simply run `docker build` with the SavedBundle directory which contains a generated Dockerfile:

In [8]:
# Each `!` line runs in its own subshell, so resolve the path and build in one command
!docker build -q -t iris-classifier $(bentoml get IrisClassifier:latest --print-location --quiet)

sha256:8be009f63cdfd31231461885ad47aa871d90f0e44296d85d35c09a9818bd54c2


BentoML also provides an equivalent CLI command for building a Docker image via the Docker daemon configured in the current environment:

In [14]:
!bentoml containerize IrisClassifier:latest -t iris-classifier

[2020-08-27 22:36:28,926] INFO - Getting latest version IrisClassifier:20200827222552_151AF3
Found Bento: /Users/chaoyu/bentoml/repository/IrisClassifier/20200827222552_151AF3
Image version not specified, using version parsed from BentoService: '20200827222552_151AF3'
Building Docker image iris-classifier:20200827222552_151AF3 from IrisClassifier:latest
Step 1/15 : FROM bentoml/model-server:0.8.6
 ---> 71644b758bed
Step 2/15 : COPY . /bento
 ---> Using cache
 ---> 9b8fed7107d2
Step 3/15 : WORKDIR /bento
 ---> Using cache
 ---> 4095858ad689
Step 4/15 : ARG PIP_INDEX_URL=https://pypi.python.org/simple/
 ---> Using cache
 ---> 92b03d63dd6c
Step 5/15 : ARG PIP_TRUSTED_HOST=pypi.python.org
 ---> Using cache
 ---> 648df1c702d4
Step 6/15 : ENV PIP_INDEX_URL $PIP_INDEX_URL
 ---> Using cache
 ---> 8916e89508bd
Step 7/15

Note that `docker` is __not available in Google Colab__. Download the notebook, ensure Docker is installed, and try it locally.

Run the generated docker image to start a docker container serving the model:

In [15]:
!docker run -p 5000:5000 iris-classifier:latest --workers=1

[2020-08-28 05:36:33,644] INFO - Starting BentoML API server in production mode..
[2020-08-28 05:36:34 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-08-28 05:36:34 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2020-08-28 05:36:34 +0000] [1] [INFO] Using worker: sync
[2020-08-28 05:36:34 +0000] [11] [INFO] Booting worker with pid: 11
[2020-08-28 05:36:38,315] INFO - {'request_id': '9b85d80c-365a-4529-952a-746cfa2359ea', 'service_name': 'IrisClassifier', 'service_version': '20200827222552_151AF3', 'api': 'predict', 'request': [[5.1, 3.5, 1.4, 0.2]], 'response_code': 200, 'response': [b'[0]']}
^C
[2020-08-28 05:36:41 +0000] [1] [INFO] Handling signal: int
[2020-08-28 05:36:41 +0000] [11] [INFO] Worker exiting (pid: 11)


This makes it possible to deploy BentoML bundled ML models with platforms such as
[Kubeflow](https://www.kubeflow.org/docs/components/serving/bentoml/),
[Knative](https://knative.dev/community/samples/serving/machinelearning-python-bentoml/), and
[Kubernetes](https://docs.bentoml.org/en/latest/deployment/kubernetes.html), which
provide advanced model deployment features such as auto-scaling, A/B testing,
scale-to-zero, canary rollout and multi-armed bandit.


## Load saved BentoService

`bentoml.load` is the essential API for loading a Bento into your
Python application:

In [16]:
import bentoml
import pandas as pd

bento_svc = bentoml.load(saved_path)

# Test loaded bentoml service:
bento_svc.predict([X[0]])



memmap([0])

This can be useful for building a test pipeline for your prediction service or for using the same prediction service for offline batch serving.
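A test pipeline built on a loaded service could look like the sketch below. `check_predictions` and `StubService` are hypothetical names; the stub only stands in for a loaded BentoService so the example runs without a saved bundle, and the allowed labels 0 to 2 are the three Iris classes used in this guide.

```python
def check_predictions(service, rows, allowed_labels=(0, 1, 2)):
    """Run `service.predict` and verify the count and label range of the result."""
    predictions = [int(label) for label in service.predict(rows)]
    assert len(predictions) == len(rows), "expected one prediction per input row"
    for label in predictions:
        assert label in allowed_labels, "unexpected label %r" % (label,)
    return predictions


class StubService:
    """Stand-in exposing the same `predict` interface; always predicts class 0."""

    def predict(self, rows):
        return [0 for _ in rows]


print(check_predictions(StubService(), [[5.1, 3.5, 1.4, 0.2]]))
```

In the notebook, the object returned by `bentoml.load(saved_path)` would be passed in place of the stub.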


## Distribute BentoML SavedBundle as PyPI package


The BentoML SavedBundle is pip-installable and can be directly distributed as a
PyPI package for use in python applications:

In [17]:
!pip install -q {saved_path}

In [18]:
# The BentoService class name becomes the package name
import IrisClassifier

installed_svc = IrisClassifier.load()
installed_svc.predict([X[0]])

memmap([0])

This also allows users to upload their BentoService to pypi.org as a public Python package
or to their organization's private PyPI index to share with other developers.

`cd {saved_path} && python setup.py sdist upload`

*You will have to configure a ".pypirc" file before uploading to a PyPI index.
    You can find more information about distributing Python packages at:
    https://docs.python.org/3.7/distributing/index.html#distributing-index*


## Batch Offline Serving via CLI

`pip install {saved_path}` also installs a CLI tool for accessing the BentoML service. Print the CLI help document with `--help`:


In [19]:
!IrisClassifier --help

Usage: IrisClassifier [OPTIONS] COMMAND [ARGS]...

  BentoML CLI tool

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  containerize        Containerizes given Bento into a ready-to-use Docker
                      image

  info                List APIs
  install-completion  Install shell command completion
  open-api-spec       Display OpenAPI/Swagger JSON specs
  run                 Run API function
  serve               Start local dev API server
  serve-gunicorn      Start production API server


View the help manual for the `run` command:

In [None]:
!IrisClassifier run predict --help

Run prediction job from CLI:

In [20]:
!IrisClassifier run predict --input='[[5.1, 3.5, 1.4, 0.2]]'

[0]


The BentoML CLI also supports reading input data from `csv` or `json` files, either on the local machine or at a remote HTTP/S3 location:

In [21]:
!IrisClassifier run predict --input="https://raw.githubusercontent.com/bentoml/BentoML/master/guides/quick-start/iris_data.csv"

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
 2 2]
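When a batch job like this is driven from a script, the printed labels can be turned back into Python integers for further processing. The sketch below assumes the space-separated bracketed layout shown in the output above; `parse_label_output` is a hypothetical helper, and the sample string is a shortened stand-in rather than real CLI output.

```python
from collections import Counter


def parse_label_output(text):
    """Convert bracketed, whitespace-separated labels into a list of ints."""
    cleaned = text.replace("[", " ").replace("]", " ")
    return [int(token) for token in cleaned.split()]


# Shortened stand-in for the multi-line output printed by the CLI.
sample = "[0 0 0 1 1\n 2 2]"
labels = parse_label_output(sample)
print(Counter(labels))
```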


The same command is also available via the `bentoml` CLI, by specifying the BentoService name and version:

In [22]:
!bentoml run IrisClassifier:latest predict --input='[[5.1, 3.5, 1.4, 0.2]]'

[2020-08-27 22:37:09,964] INFO - Getting latest version IrisClassifier:20200827222552_151AF3
[0]


## Deploy API model server to cloud services


BentoML can deploy a SavedBundle directly to cloud services such as AWS Lambda or
AWS SageMaker with the `bentoml` CLI command. Check out the deployment guides and
other deployment options with BentoML [here](https://docs.bentoml.org/en/latest/deployment/index.html).


The following part of the notebook demonstrates how to deploy the IrisClassifier
model server built in the previous steps to [AWS Lambda](https://aws.amazon.com/lambda/)
as a serverless endpoint.

Before getting started, install the `aws-sam-cli` package, which BentoML requires
to create AWS Lambda deployments:

In [23]:
!pip install -q -U aws-sam-cli==0.33.1

Make sure an AWS account and credentials are configured, either via
[environment variables](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html)
or the `aws configure` command. (Install the `aws` CLI via `pip install awscli` and follow the
[instructions here](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html#cli-quick-configuration).)
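When the environment-variable route is used, a quick check from the notebook can confirm that the two standard AWS credential variables are set. This is only a sketch: `missing_aws_variables` is a hypothetical helper, and an empty environment does not mean credentials are missing, since `aws configure` stores them in a file instead.

```python
import os

# Standard AWS credential variable names read by the AWS CLI and SDKs.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_variables(environ=None):
    """Return the credential variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_variables()
if missing:
    print("Not set in the environment:", ", ".join(missing))
```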

To create a BentoML deployment on AWS Lambda, use the `bentoml lambda deploy` command:

In [24]:
!bentoml lambda deploy quick-start-guide-deployment -b IrisClassifier:{iris_classifier_service.version} 

Deploying "IrisClassifier:20200827222552_151AF3" to AWS Lambda
[2020-08-27 22:37:51,915] INFO - Building lambda project
[2020-08-27 22:39:07,324] INFO - Packaging AWS Lambda project at /private/var/folders/7p/y_934t3s4yg8fx595vr28gym0000gn/T/bentoml-temp-8iv7dufc ...
[2020-08-27 22:40:50,010] INFO - Deploying lambda project
[2020-08-27 22:41:41,781] INFO - ApplyDeployment (quick-start-guide-deployment, namespace dev) succeeded
Successfully created AWS Lambda deployment quick-start-guide-deployment
{
  "namespace": "dev",
  "name": "quick-start-guide-deployment",
  "spec": {
    "bentoName": "IrisClassifier",
    "bentoVersion": "20200827222552_151AF3",
    "operator": "AWS_LAMBDA",
    "awsLambdaOperatorConfig": {
      "region": "us-west-1",
      "memorySize": 1024,
      "timeout": 3
    }
  },
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://qcc6weu1u3.execute-api.us-west-1.amazonaws.com/Prod/predict"
      ],
      "

The `quick-start-guide-deployment` here is the deployment name, which can be used to query the current deployment status:

In [25]:
!bentoml lambda get quick-start-guide-deployment

{
  "namespace": "dev",
  "name": "quick-start-guide-deployment",
  "spec": {
    "bentoName": "IrisClassifier",
    "bentoVersion": "20200827222552_151AF3",
    "operator": "AWS_LAMBDA",
    "awsLambdaOperatorConfig": {
      "region": "us-west-1",
      "memorySize": 1024,
      "timeout": 3
    }
  },
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://qcc6weu1u3.execute-api.us-west-1.amazonaws.com/Prod/predict"
      ],
      "s3_bucket": "btml-dev-quick-start-guide-deployment-ea76c7"
    },
    "timestamp": "2020-08-28T05:48:28.356241Z"
  },
  "createdAt": "2020-08-28T05:37:45.380026Z",
  "lastUpdatedAt": "2020-08-28T05:37:45.380053Z"
}


In [31]:
# Grab the endpoint URL from the command result above, this requires `jq` to be installed
!endpoint=$(bentoml lambda get quick-start-guide-deployment | jq -r ".state.infoJson.endpoints[0]") && \
    echo $endpoint

https://qcc6weu1u3.execute-api.us-west-1.amazonaws.com/Prod/predict
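If `jq` is not installed, the same lookup can be sketched in Python with the standard `json` module. `first_endpoint` is a hypothetical helper, and the sample document is a shortened copy of the deployment description printed above, not the full CLI output.

```python
import json


def first_endpoint(deployment_json):
    """Return the first endpoint URL from a deployment description."""
    description = json.loads(deployment_json)
    info = description.get("state", {}).get("infoJson", {})
    endpoints = info.get("endpoints", [])
    if not endpoints:
        raise LookupError("the deployment description lists no endpoints")
    return endpoints[0]


# Shortened copy of the `bentoml lambda get` output shown earlier.
sample = """
{
  "name": "quick-start-guide-deployment",
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://qcc6weu1u3.execute-api.us-west-1.amazonaws.com/Prod/predict"
      ]
    }
  }
}
"""
print(first_endpoint(sample))
```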


To send a request to your AWS Lambda deployment, use the endpoint URL from the JSON output above:

In [34]:
! curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
$(bentoml lambda get quick-start-guide-deployment | jq -r ".state.infoJson.endpoints[0]")

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 3
Connection: keep-alive
Date: Fri, 28 Aug 2020 05:51:40 GMT
x-amzn-RequestId: aa9bed06-81e0-45c5-a482-b1282d6b994e
Access-Control-Allow-Origin: *
x-amz-apigw-id: R904-HL4yK4Fnsw=
X-Amzn-Trace-Id: Root=1-5f489b6c-1d5a18b950f9e017fbc54850;Sampled=0
X-Cache: Miss from cloudfront
Via: 1.1 2de9b6504a97ad8423645370927ef0cf.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SFO20-C1
X-Amz-Cf-Id: yB8HL9n-MrF_CzW3HDEzoU9z82Q-mWjc4M9WQQMAzARKksOtFmiNFg==

[0]

To list all the deployments you've created:

In [35]:
!bentoml deployment list

NAME                          NAMESPACE    PLATFORM    BENTO_SERVICE                         STATUS    AGE
quick-start-guide-deployment  dev          aws-lambda  IrisClassifier:20200827222552_151AF3  running   13 minutes and 57.26 seconds


And to delete an active deployment:

In [None]:
!bentoml deployment delete quick-start-guide-deployment

By default, BentoML stores the deployment metadata on the local machine. For team settings, we recommend hosting a shared BentoML YataiService so that a data science team can track all of its BentoML SavedBundles and model serving deployments. See related documentation [here](https://docs.bentoml.org/en/latest/concepts.html#customizing-model-repository).

## Summary

This is what it looks like when using BentoML to serve and deploy a model in the cloud. BentoML also supports [many other Machine Learning frameworks](https://docs.bentoml.org/en/latest/examples.html), as well as [many other deployment platforms](https://docs.bentoml.org/en/latest/deployment/index.html). The [BentoML core concepts](https://docs.bentoml.org/en/latest/concepts.html) doc is also recommended for anyone looking to get a deeper understanding of BentoML.

Join the [BentoML Slack](https://join.slack.com/t/bentoml/shared_invite/enQtNjcyMTY3MjE4NTgzLTU3ZDc1MWM5MzQxMWQxMzJiNTc1MTJmMzYzMTYwMjQ0OGEwNDFmZDkzYWQxNzgxYWNhNjAxZjk4MzI4OGY1Yjg) to follow the latest development updates and roadmap discussions.