# Getting Started with BentoML

[BentoML](http://bentoml.ai) is an open-source platform for __machine learning model serving__.

What does BentoML do?

* Turn your ML model into a production API endpoint with just a few lines of code
* Support all major machine learning training frameworks
* High-performance API serving system with adaptive micro-batching support
* DevOps best practices baked in, simplifying the transition from model development to production
* Model management for teams, providing a CLI and a Web UI dashboard
* Flexible model deployment orchestration with support for AWS Lambda, SageMaker, EC2, Docker, Kubernetes, Knative and more

This is a quick tutorial on how to use BentoML to serve a scikit-learn model via a REST API server and deploy it to [AWS Lambda](https://aws.amazon.com/lambda/) as a serverless endpoint.

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

BentoML requires Python 3.6 or above. Install it via `pip`:

In [None]:
# Install BentoML
!pip install bentoml

# Also install pandas and scikit-learn; we will use a scikit-learn model as an example
!pip install pandas scikit-learn
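
To confirm that the running interpreter meets the Python 3.6 requirement stated above, a quick check along these lines can be run in a notebook cell. The helper name `python_is_supported` is made up for this sketch and is not part of BentoML:

```python
import sys

# Minimum version taken from the requirement stated in this guide.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0], "supported:", python_is_supported())
```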

## Creating a Prediction Service with BentoML


A minimal prediction service in BentoML looks something like this:

In [3]:
%%writefile iris_classifier.py
from bentoml import BentoService, api, env, artifacts
from bentoml.artifact import SklearnModelArtifact
from bentoml.handlers import DataframeHandler

@artifacts([SklearnModelArtifact('model')])
@env(auto_pip_dependencies=True)
class IrisClassifier(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        return self.artifacts.model.predict(df)

Overwriting iris_classifier.py


The `bentoml.api` decorator defines a service API, which is the entry point for sending prediction requests. The decorated function is user-defined code for processing those requests. The `DataframeHandler` tells BentoML that this service API expects a `pandas.DataFrame` object as its input.

The `bentoml.env` decorator specifies the dependencies and environment settings for this prediction service. Here we use BentoML's `auto_pip_dependencies` feature, which automatically extracts and bundles all pip packages required by the prediction service and pins their versions.


Lastly, the `bentoml.artifacts` decorator declares the trained models to be
bundled with this prediction service. Here it uses the built-in `SklearnModelArtifact` and simply names it 'model'. BentoML also provides model artifact classes for other frameworks, such as `PytorchModelArtifact`, `KerasModelArtifact`, `FastaiModelArtifact`, and `XgboostModelArtifact`.
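
To make the `pandas.DataFrame` input described above more concrete, the sketch below shows how a JSON request body of the kind used later in this guide maps onto a DataFrame. It is an illustration built only on `json` and `pandas`, under the assumption that the handler does something broadly similar before calling `predict`; it is not BentoML's internal code:

```python
import json

import pandas as pd

# A JSON request body with one row of four iris features.
body = "[[5.1, 3.5, 1.4, 0.2]]"

# Assumption: the handler parses the JSON and builds a DataFrame roughly
# like this before handing it to the predict() function.
df = pd.DataFrame(json.loads(body))

print(df.shape)  # (1, 4)
```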


## Creating a BentoService saved bundle

Nothing needs to be changed in your regular model training and evaluation code:

In [4]:
from sklearn import svm
from sklearn import datasets

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

After the model training code, use the IrisClassifier BentoService class defined above to package this model for serving:

In [5]:
# import the custom BentoService defined above
from iris_classifier import IrisClassifier

# Create an iris classifier service instance
svc = IrisClassifier()

# Pack the newly trained model artifact
svc.pack('model', clf)

# Save the BentoService to a BentoML bundle
saved_path = svc.save()
print("saved_path:", saved_path)

# Check the auto-generated service version
# Which can also be set manually with svc.set_version() before `save`
print("version:", svc.version)

[2020-04-03 01:53:17,971] INFO - BentoService bundle 'IrisClassifier:20200403015304_3FC8C9' saved to: /Users/chaoyu/bentoml/repository/IrisClassifier/20200403015304_3FC8C9
saved_path: /Users/chaoyu/bentoml/repository/IrisClassifier/20200403015304_3FC8C9
version: 20200403015304_3FC8C9


_That's it._ You've just created a BentoService SavedBundle: a versioned file archive that is
ready for production deployment. It contains the BentoService class you defined, all its
Python code dependencies and PyPI dependencies, and the trained scikit-learn model. By
default, BentoML saves those files and related metadata under the `~/bentoml` directory, but
this can be changed to a different directory or to cloud storage like
[Amazon S3](https://aws.amazon.com/s3/).
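
To see what actually ended up in the saved bundle, the directory can be walked with the standard library. This is a minimal sketch: the helper `list_bundle_files` is hypothetical, and the exact files it prints depend on the BentoML version that produced the bundle:

```python
from pathlib import Path


def list_bundle_files(bundle_dir, max_depth=2):
    """Return relative paths of files in `bundle_dir`, at most `max_depth` levels deep."""
    root = Path(bundle_dir)
    files = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if path.is_file() and len(rel.parts) <= max_depth:
            files.append(rel.as_posix())
    return files


# Usage in the notebook, where `saved_path` is the value returned by svc.save():
# for name in list_bundle_files(saved_path):
#     print(name)
```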

## Model Serving via REST API

From a BentoService SavedBundle, you can start a REST API server by providing either its `name:version` tag or the file path to the saved bundle:

In [6]:
# Note that REST API serving **does not work in Google Colab**, because Colab's VM cannot be accessed from outside
!bentoml serve IrisClassifier:latest

# Alternatively:
#!bentoml serve {saved_path}

[2020-04-03 01:54:21,534] INFO - Getting latest version IrisClassifier:20200403015304_3FC8C9
 * Serving Flask app "IrisClassifier" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [03/Apr/2020 01:54:33] "POST /predict HTTP/1.1" 200 -
^C


#### View documentation for REST APIs

The REST API server provides a simple web UI for you to test and debug. If you are running this command on your local machine, visit http://127.0.0.1:5000 in your browser and try sending API requests to the server.

![BentoML API Server Web UI Screenshot](https://raw.githubusercontent.com/bentoml/BentoML/master/guides/quick-start/bento-api-server-web-ui.png)

#### Send prediction requests to the REST API server

You can also send prediction requests with `curl` from the command line:

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
localhost:5000/predict
```

Or with Python and the `requests` library:
```python
import requests
response = requests.post("http://127.0.0.1:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)
```
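
If the `requests` package is not installed, the same call can be sketched with the standard library alone. The helper names `build_predict_request` and `predict` are made up for this example, the second feature row is an arbitrary illustration, and the code assumes the server replies with a JSON body (as the AWS Lambda response later in this guide suggests):

```python
import json
import urllib.request


def build_predict_request(rows, url="http://127.0.0.1:5000/predict"):
    """Build a POST request carrying `rows` (a list of feature lists) as JSON."""
    data = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, url="http://127.0.0.1:5000/predict", timeout=10):
    """Send the rows to the running API server and return the decoded JSON reply."""
    request = build_predict_request(rows, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Requires `bentoml serve` to be running locally:
# print(predict([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]))
```

Sending several rows in one request keeps the payload in the same list-of-lists shape as the single-row `curl` example.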



## Containerize REST API server with Docker


The BentoService SavedBundle is structured to work as a Docker build context, so you can
build a Docker image for the API server directly from the saved bundle directory.

Note that `{saved_path}` in the following commands refers to the value returned by `svc.save()`, which is the file path where the BentoService saved bundle is stored. You can also find it with the `bentoml get IrisClassifier -o wide` command.

In [9]:
!cd {saved_path} && docker build -t iris-classifier .

Sending build context to Docker daemon  25.09kB
Step 1/15 : FROM continuumio/miniconda3:4.7.12
 ---> 406f2b43ea59
Step 2/15 : ENTRYPOINT [ "/bin/bash", "-c" ]
 ---> Using cache
 ---> a0c8b09d3f8f
Step 3/15 : EXPOSE 5000
 ---> Using cache
 ---> 063ef48adef5
Step 4/15 : RUN set -x      && apt-get update      && apt-get install --no-install-recommends --no-install-suggests -y libpq-dev build-essential      && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 70e7d2f54f64
Step 5/15 : RUN conda install pip numpy scipy       && pip install gunicorn
 ---> Using cache
 ---> 00d7e233a814
Step 6/15 : COPY . /bento
 ---> 24ddf7248e41
Step 7/15 : WORKDIR /bento
 ---> Running in b72ffab19803
Removing intermediate container b72ffab19803
 ---> 1edf2ed7458b
Step 8/15 : RUN if [ -f /bento/setup.sh ]; then /bin/bash -c /bento/setup.sh; fi
 ---> Running in d6f002b7748a
Removing intermediate container d6f002b7748a
 ---> 6c6b7a80183f
Step 9/15 : RUN conda env update -n base -f /bento/environment.yml
 ---

  Downloading ply-3.11-py2.py3-none-any.whl (49 kB)
Collecting MarkupSafe>=0.9.2
  Downloading MarkupSafe-1.1.1-cp37-cp37m-manylinux1_x86_64.whl (27 kB)
Collecting docutils<0.16,>=0.10
  Downloading docutils-0.15.2-py3-none-any.whl (547 kB)
Building wheels for collected packages: sqlalchemy, prometheus-client, alembic, cerberus, sqlalchemy-utils, python-json-logger, thriftpy2
  Building wheel for sqlalchemy (PEP 517): started
  Building wheel for sqlalchemy (PEP 517): finished with status 'done'
  Created wheel for sqlalchemy: filename=SQLAlchemy-1.3.15-cp37-cp37m-linux_x86_64.whl size=1234858 sha256=e54e2f0838948013f73730ae598e065e5279fd17ea623b4dacffc47d6ff62839
  Stored in directory: /root/.cache/pip/wheels/27/96/77/0695ac3b6ad6c91d607f9a19cfb45cdf416e5b564b77a64a9b
  Building wheel for prometheus-client (setup.py): started
  Building wheel for prometheus-client (setup.py): finished with status 'done'
  Created wheel for prometheus-client: filename=prometheus_client-0.7.1-py3-none-a

Note that `docker` is __not available in Google Colab__. Download the notebook, make sure Docker is installed, and try it locally.

Next, you can `docker push` the image to your choice of registry for deployment,
or run it locally for development and testing:

In [10]:
!docker run -p 5000:5000 -e BENTOML_ENABLE_MICROBATCH=True iris-classifier:latest

[2020-04-03 09:01:37,442] INFO - get_gunicorn_num_of_workers: 3, calculated by cpu count
[2020-04-03 09:01:37 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-04-03 09:01:37 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2020-04-03 09:01:37 +0000] [1] [INFO] Using worker: sync
[2020-04-03 09:01:37 +0000] [8] [INFO] Booting worker with pid: 8
[2020-04-03 09:01:37 +0000] [9] [INFO] Booting worker with pid: 9
[2020-04-03 09:01:37 +0000] [10] [INFO] Booting worker with pid: 10
^C
[2020-04-03 09:01:44 +0000] [1] [INFO] Handling signal: int
[2020-04-03 09:01:44 +0000] [10] [INFO] Worker exiting (pid: 10)
[2020-04-03 09:01:44 +0000] [9] [INFO] Worker exiting (pid: 9)
[2020-04-03 09:01:44 +0000] [8] [INFO] Worker exiting (pid: 8)


## Load saved BentoService

`bentoml.load` is the essential API for loading a Bento into your
Python application:

In [11]:
import bentoml
import pandas as pd

bento_svc = bentoml.load(saved_path)

# Test loaded bentoml service:
bento_svc.predict([X[0]])



memmap([0])

This is useful for building a test pipeline for your prediction service, or for using the same prediction service as a building block in other applications.
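
As one way to picture such a test pipeline, the sketch below compares a loaded service's predictions against known labels. The helper `check_predictions` is hypothetical and works with any object exposing a `predict` method; it assumes the predictions can be converted to `int`, as the class labels in this guide can:

```python
def check_predictions(service, samples, expected):
    """Return (index, expected, predicted) for every sample that does not match."""
    predicted = [int(p) for p in service.predict(samples)]
    if len(predicted) != len(expected):
        raise AssertionError(
            f"expected {len(expected)} predictions, got {len(predicted)}"
        )
    return [
        (i, want, got)
        for i, (want, got) in enumerate(zip(expected, predicted))
        if want != got
    ]


# Usage in the notebook, with names from the cells above:
# assert check_predictions(bento_svc, [X[0]], [0]) == []
```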


## Distribute BentoML SavedBundle as PyPI package


The BentoService SavedBundle is pip-installable and can be distributed directly as a
PyPI package if you plan to use the model in your Python applications. You can install
it as a system-wide Python package with `pip`:

In [12]:
!pip install {saved_path}

Processing /Users/chaoyu/bentoml/repository/IrisClassifier/20200403015304_3FC8C9


Building wheels for collected packages: IrisClassifier
  Building wheel for IrisClassifier (setup.py) ... done
  Created wheel for IrisClassifier: filename=IrisClassifier-20200403015304_3FC8C9-py3-none-any.whl size=5341 sha256=c646f6f880fd3d1351b20b7700f7b1ecbe56235a5510f41c52d77c2e8c20474e
  Stored in directory: /private/var/folders/7p/y_934t3s4yg8fx595vr28gym0000gn/T/pip-ephem-wheel-cache-vl2inqam/wheels/1b/e2/18/d0bc4976297041a88a736fbddeea114851c9cb2575d26ece88
Successfully built IrisClassifier
Installing collected packages: IrisClassifier
Successfully installed IrisClassifier-20200403015304-3FC8C9


In [13]:
# The BentoService class name becomes the package name
import IrisClassifier

installed_svc = IrisClassifier.load()
installed_svc.predict([X[0]])

memmap([0])

This also allows users to upload their BentoService to pypi.org as a public Python package
or to their organization's private PyPI index to share with other developers:

`cd {saved_path} && python setup.py sdist upload`

*You will have to configure a ".pypirc" file before uploading to a PyPI index.
    You can find more information about distributing Python packages at:
    https://docs.python.org/3.7/distributing/index.html#distributing-index*


## Batch Offline Serving via CLI

`pip install {saved_path}` also installs a CLI tool for accessing the BentoML service. Print its help text with `--help`:


In [14]:
!IrisClassifier --help

Usage: IrisClassifier [OPTIONS] COMMAND [ARGS]...

  BentoML CLI tool

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  info                List APIs
  install-completion  Install shell command completion
  open-api-spec       Display OpenAPI/Swagger JSON specs
  run                 Run API function
  serve               Start local rest server
  serve-gunicorn      Start local gunicorn server


Print more information about this ML service with the `info` command:

In [None]:
!IrisClassifier info

You can also print help and docs on individual commands:

In [None]:
!IrisClassifier run predict --help

Each service API you defined in the BentoService will be exposed as a CLI command with the same name as the API function:

In [15]:
!IrisClassifier run predict --input='[[5.1, 3.5, 1.4, 0.2]]'

[0]


The BentoML CLI also supports reading input data from `csv` or `json` files, either on the local machine or at a remote HTTP/S3 location:

In [16]:
# Writing test data to a csv file
pd.DataFrame(iris.data).to_csv('iris_data.csv', index=False)

# Invoke predict from the command line
!IrisClassifier run predict --input='./iris_data.csv'

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
 2 2]
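
The CLI prints the predictions as a space-separated array, which is convenient to read but not directly usable in Python. A small sketch like the following can turn that text back into a list of integer labels and tally them. The helper `parse_cli_predictions` is made up for this example, and the `output` string is a shortened stand-in for the real CLI output above:

```python
from collections import Counter


def parse_cli_predictions(text):
    """Parse output such as '[0 0 1\\n 2]' into a list of integer labels."""
    tokens = text.replace("[", " ").replace("]", " ").split()
    return [int(token) for token in tokens]


# Shortened stand-in for the array printed by the CLI.
output = "[0 0 0 1 1\n 2 2]"
labels = parse_cli_predictions(output)

# Count how many samples were assigned to each class.
print(Counter(labels))
```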


Alternatively, you can use the `bentoml` CLI to load and run a BentoService saved bundle without installing it:

In [17]:
!bentoml info IrisClassifier:latest

[2020-04-03 02:02:23,408] INFO - Getting latest version IrisClassifier:20200403015304_3FC8C9
{
  "name": "IrisClassifier",
  "version": "20200403015304_3FC8C9",
  "created_at": "2020-04-03T08:53:17.930268Z",
  "env": {
    "conda_env": "name: bentoml-IrisClassifier\nchannels:\n- defaults\ndependencies:\n- python=3.7.5\n- pip\n",
    "pip_dependencies": "bentoml==0.6.3\nscikit-learn",
    "python_version": "3.7.5"
  },
  "artifacts": [
    {
      "name": "model",
      "artifact_type": "SklearnModelArtifact"
    }
  ],
  "apis": [
    {
      "name": "predict",
      "handler_type": "DataframeHandler",
      "docs": "BentoService API",
      "handler_config": {
        "output_orient": "records",
        "orient": "records",
        "typ": "frame",
        "is_batch_input": true,
        "input_dtypes": null
      }
    }
  ]
}


In [18]:
!bentoml run IrisClassifier:latest predict --input='[[5.1, 3.5, 1.4, 0.2]]'

[2020-04-03 02:02:27,431] INFO - Getting latest version IrisClassifier:20200403015304_3FC8C9
[0]


## Deploy REST API server to the cloud


BentoML has a built-in deployment management tool called YataiService. YataiService can
be deployed separately to manage all your teams' trained models, BentoService bundles,
and active deployments in the cloud or in your own Kubernetes cluster. You can also
create simple model serving deployments with just the BentoML CLI, which launches a
local YataiService backed by a SQLite database on your machine.

Now let's deploy the IrisClassifier to [AWS Lambda](https://aws.amazon.com/lambda/) as
a serverless endpoint.

First you need to install the `aws-sam-cli` package, which is required by BentoML
to work with AWS Lambda deployment:

```
    pip install -U aws-sam-cli==0.31.1
```


You will also need to configure your AWS account and credentials if they are not already
configured on your machine. You can do this either
[via environment variables](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html)
or through the `aws configure` command: install the `aws` CLI via
`pip install awscli` and follow the
[detailed instructions here](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html#cli-quick-configuration).
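
If you go the environment-variable route, a quick check like the one below can tell you whether the two standard AWS credential variables are set in the current process. The helper `missing_aws_env` is made up for this sketch. An empty result does not prove the credentials are valid, and a non-empty result does not mean AWS is unconfigured, since credentials set up through `aws configure` live in a file rather than in the environment:

```python
import os

# Standard variable names read by the AWS CLI and SDKs.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env(environ=os.environ):
    """Return the required AWS variable names that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not environ.get(name)]


missing = missing_aws_env()
if missing:
    print("Not set in the environment:", ", ".join(missing))
else:
    print("AWS credential variables are set.")
```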

Now you can run the `bentoml lambda deploy` command to create an AWS Lambda deployment
hosting the BentoService you've created:

In [19]:
!bentoml lambda deploy quick-start-guide-deployment -b IrisClassifier:{svc.version} 

Deploying "IrisClassifier:20200403015304_3FC8C9" to AWS Lambda
[2020-04-03 02:02:35,976] INFO - Building lambda project
[2020-04-03 02:08:26,060] INFO - Packaging AWS Lambda project at /private/var/folders/7p/y_934t3s4yg8fx595vr28gym0000gn/T/bentoml-temp-oushjs1w ...
[2020-04-03 02:10:18,047] INFO - Deploying lambda project
[2020-04-03 02:11:06,731] INFO - ApplyDeployment (quick-start-guide-deployment, namespace dev) succeeded
Successfully created AWS Lambda deployment quick-start-guide-deployment
{
  "namespace": "dev",
  "name": "quick-start-guide-deployment",
  "spec": {
    "bentoName": "IrisClassifier",
    "bentoVersion": "20200403015304_3FC8C9",
    "operator": "AWS_LAMBDA",
    "awsLambdaOperatorConfig": {
      "region": "us-west-2",
      "memorySize": 1024,
      "timeout": 3
    }
  },
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://qgvp7z4e68.execute-api.us-west-2.amazonaws.com/Prod/predict"
      ],
      "

Here 'quick-start-guide-deployment' is the deployment name. You can reference the deployment by this name and query its status. For example, to get the current deployment status:

In [20]:
!bentoml lambda get quick-start-guide-deployment

{
  "namespace": "dev",
  "name": "quick-start-guide-deployment",
  "spec": {
    "bentoName": "IrisClassifier",
    "bentoVersion": "20200403015304_3FC8C9",
    "operator": "AWS_LAMBDA",
    "awsLambdaOperatorConfig": {
      "region": "us-west-2",
      "memorySize": 1024,
      "timeout": 3
    }
  },
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://qgvp7z4e68.execute-api.us-west-2.amazonaws.com/Prod/predict"
      ],
      "s3_bucket": "btml-dev-quick-start-guide-deployment-5591e9"
    },
    "timestamp": "2020-04-03T09:11:58.035477Z"
  },
  "createdAt": "2020-04-03T09:02:30.717929Z",
  "lastUpdatedAt": "2020-04-03T09:02:30.717959Z"
}


In [21]:
!bentoml lambda get quick-start-guide-deployment | jq ".state.infoJson.endpoints[0]"

"https://qgvp7z4e68.execute-api.us-west-2.amazonaws.com/Prod/predict"


To send a request to your AWS Lambda deployment, grab the endpoint URL from the JSON output above:

In [22]:
!curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
https://qgvp7z4e68.execute-api.us-west-2.amazonaws.com/Prod/predict

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 3
Connection: keep-alive
Date: Fri, 03 Apr 2020 09:12:27 GMT
x-amzn-RequestId: 2f0b7dc4-881d-4be9-bf0c-75b67ba10158
x-amz-apigw-id: KZyecEq7vHcF_AA=
X-Amzn-Trace-Id: Root=1-5e86fdf5-ed4ad9ac92963d10e215fd83;Sampled=0
X-Cache: Miss from cloudfront
Via: 1.1 c1caaceb6655a57ae014aef7bc8ec389.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SFO20-C1
X-Amz-Cf-Id: xtyVUYVhoPkENM0npKoJyxck75JhA6eeLWS5uLPnF9aenx2OVBzNDg==

[0]
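
The endpoint lookup done with `jq` earlier can also be sketched in Python, which is handy when the deployment description is already available as a string. The helper `first_endpoint` is made up for this example, and `sample` is a trimmed copy of the deployment JSON shown in this guide. The sketch assumes the text is plain JSON with no terminal color codes around it:

```python
import json


def first_endpoint(deployment_json):
    """Return state.infoJson.endpoints[0] from a deployment description."""
    info = json.loads(deployment_json)
    return info["state"]["infoJson"]["endpoints"][0]


# Trimmed copy of the `bentoml lambda get` output shown above.
sample = """
{
  "name": "quick-start-guide-deployment",
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://qgvp7z4e68.execute-api.us-west-2.amazonaws.com/Prod/predict"
      ]
    }
  }
}
"""

print(first_endpoint(sample))
```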

To list all the deployments you've created:

In [24]:
!bentoml deployment list

NAME                          NAMESPACE    PLATFORM    BENTO_SERVICE                         STATUS    AGE
quick-start-guide-deployment  dev          aws-lambda  IrisClassifier:20200403015304_3FC8C9  running   10 minutes and 26.93 seconds
chaoyu-dev-ii                 dev          aws-lambda  IrisClassifier:20200323212422_A1D30D  running   5 days and 12 hours
chaoyu-test                   dev          aws-lambda  IrisClassifier:20200228172148_6A26E5  running   4 weeks and 2 days


And to delete an active deployment:

In [25]:
!bentoml lambda delete quick-start-guide-deployment

Successfully deleted AWS Lambda deployment "quick-start-guide-deployment"


By default, BentoML stores deployment metadata on the local machine. For team settings, we recommend hosting a shared BentoML Yatai server so your entire team can track all the BentoService saved bundles and deployments they've created in one central place.

## Summary

This is what it looks like to use BentoML to serve and deploy a model as a prediction service running in the cloud. BentoML also supports many other machine learning frameworks and deployment platforms. You can find more BentoML example notebooks [here](https://github.com/bentoml/BentoML#examples).