# Getting Started with BentoML

[BentoML](http://bentoml.ai) is an open source framework for serving and deploying machine learning models. It provides high-level APIs for defining a prediction service and for packaging trained models, source code, dependencies, and configurations into a format that is ready for production deployment.

This quick tutorial shows how to use BentoML to create a prediction service with a trained scikit-learn model, serve the model via a REST API server, and deploy it to [AWS Lambda](https://aws.amazon.com/lambda/) as a serverless endpoint.


In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

BentoML requires Python 3.6 or above. Install it via `pip`:

In [None]:
# Install BentoML
!pip install bentoml

# Also install pandas and scikit-learn; we will use a scikit-learn model as an example
!pip install pandas scikit-learn

Let's get started with a simple scikit-learn model as an example:

In [3]:
from sklearn import svm
from sklearn import datasets

clf = svm.SVC(gamma='scale')
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

## Creating a Prediction Service with BentoML


The first step in creating a prediction service with BentoML is to write a prediction service class that inherits from `bentoml.BentoService`, declares its dependencies and model artifacts, and defines the service API callback function. Here is what a simple prediction service looks like:

In [4]:
%%writefile iris_classifier.py
from bentoml import BentoService, api, env, artifacts
from bentoml.artifact import SklearnModelArtifact
from bentoml.handlers import DataframeHandler

@artifacts([SklearnModelArtifact('model')])
@env(auto_pip_dependencies=True)
class IrisClassifier(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        return self.artifacts.model.predict(df)

Overwriting iris_classifier.py


The `bentoml.api` decorator and `DataframeHandler` tell BentoML that the function that
follows is the service API callback function, and that `pandas.DataFrame` is its expected input format.

The `bentoml.env` decorator lets you specify the dependencies and environment
settings for this prediction service. Here we are using BentoML's
`auto_pip_dependencies` feature, which automatically extracts and bundles all pip
packages that are required for your prediction service and pins their versions.


Last but not least, the `bentoml.artifacts` decorator declares the trained model that must be bundled
with this prediction service. Here it uses the built-in `SklearnModelArtifact` and
simply names it 'model'. BentoML also provides model artifacts for other frameworks, such
as `PytorchModelArtifact`, `KerasModelArtifact`, `FastaiModelArtifact`, and
`XgboostModelArtifact`.


## Saving a versioned BentoService bundle

In [5]:
# 1) import the custom BentoService defined above
from iris_classifier import IrisClassifier

# 2) `pack` it with required artifacts
svc = IrisClassifier()
svc.pack('model', clf)

# 3) save BentoService to a BentoML bundle
saved_path = svc.save()

[2020-03-04 14:34:22,117] INFO - BentoService bundle 'IrisClassifier:20200304143410_CD5F13' created at: /private/var/folders/7p/y_934t3s4yg8fx595vr28gym0000gn/T/bentoml-temp-zl7q9oqc
[2020-03-04 14:34:22,168] INFO - BentoService bundle 'IrisClassifier:20200304143410_CD5F13' saved to: /Users/chaoyu/bentoml/repository/IrisClassifier/20200304143410_CD5F13


_That's it._ You've just created a BentoService SavedBundle: a versioned file archive that is ready for production deployment. It contains the BentoService you defined, as well as the packed trained model artifacts, pre-processing code, dependencies, and other configurations, all in a single file directory.
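To see what the bundle actually contains, you can walk the saved directory. The following is a minimal sketch that uses only the Python standard library; `list_bundle_files` is a helper name made up for this guide, not a BentoML API, and the files it reports will depend on your BentoML version.

```python
from pathlib import Path


def list_bundle_files(bundle_dir):
    """Return the relative paths of all files under bundle_dir, sorted."""
    root = Path(bundle_dir)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# In the notebook, pass the path returned by svc.save():
# for name in list_bundle_files(saved_path):
#     print(name)
```
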

## Model Serving via REST API

From a BentoService SavedBundle, you can start a REST API server by providing the file path to the saved bundle or, as in the command below, its `name:version` tag:

In [6]:
# Note that REST API serving **does not work in Google Colab** because Colab's VM is not accessible
!bentoml serve IrisClassifier:latest

[2020-03-04 14:34:23,800] INFO - Getting latest version IrisClassifier:20200304143410_CD5F13
 * Serving Flask app "IrisClassifier" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [04/Mar/2020 14:34:45] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [04/Mar/2020 14:34:46] "GET /docs.json HTTP/1.1" 200 -
127.0.0.1 - - [04/Mar/2020 14:35:02] "POST /predict HTTP/1.1" 200 -
^C


#### View documentation for REST APIs

The REST API server provides a simple web UI for testing and debugging. If you are running this command on your local machine, visit http://127.0.0.1:5000 in your browser and try sending API requests to the server.

![BentoML API Server Web UI Screenshot](https://raw.githubusercontent.com/bentoml/BentoML/master/guides/quick-start/bento-api-server-web-ui.png)

#### Send prediction requests to the REST API server

You can also send a prediction request with `curl` from the command line:

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
localhost:5000/predict
```

Or with Python and the `requests` library:
```python
import requests
response = requests.post("http://127.0.0.1:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)
```
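If you prefer not to add a third-party dependency, the same request can be built with the standard library. This is a sketch under the assumption that the API server from the previous step is listening on `127.0.0.1:5000`; `build_predict_request` and `predict` are illustrative helper names, not part of BentoML.

```python
import json
import urllib.request


def build_predict_request(rows, url="http://127.0.0.1:5000/predict"):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, url="http://127.0.0.1:5000/predict", timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_predict_request(rows, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `predict([[5.1, 3.5, 1.4, 0.2]])` only succeeds while the API server is running; building the request separately makes it easy to inspect the payload first.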



## Containerize REST API server with Docker


The BentoService SavedBundle is structured to work as a Docker build context, so it can
be used directly to build a Docker image for the API server. Simply use it as the Docker
build context directory:

Note that `{saved_path}` in the following commands refers to the value returned by `svc.save()`. It is the file path where the BentoService saved bundle is stored. You can also find it with the `bentoml get IrisClassifier -o wide` command.

In [9]:
!cd {saved_path} && docker build -t iris-classifier .

Sending build context to Docker daemon  25.09kB
Step 1/15 : FROM continuumio/miniconda3:4.7.12
 ---> 406f2b43ea59
Step 2/15 : ENTRYPOINT [ "/bin/bash", "-c" ]
 ---> Using cache
 ---> a0c8b09d3f8f
Step 3/15 : EXPOSE 5000
 ---> Using cache
 ---> 063ef48adef5
Step 4/15 : RUN set -x      && apt-get update      && apt-get install --no-install-recommends --no-install-suggests -y libpq-dev build-essential      && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 70e7d2f54f64
Step 5/15 : RUN conda install pip numpy scipy       && pip install gunicorn
 ---> Using cache
 ---> 00d7e233a814
Step 6/15 : COPY . /bento
 ---> 7c1fde4f46e8
Step 7/15 : WORKDIR /bento
 ---> Running in edb077d6e9c0
Removing intermediate container edb077d6e9c0
 ---> f678b421fd97
Step 8/15 : RUN if [ -f /bento/setup.sh ]; then /bin/bash -c /bento/setup.sh; fi
 ---> Running in a788f948226e
Removing intermediate container a788f948226e
 ---> 8a5f199ea354
Step 9/15 : RUN conda env update -n base -f /bento/environment.yml
 ---

  Building wheel for tabulate (setup.py): finished with status 'done'
  Created wheel for tabulate: filename=tabulate-0.8.6-py3-none-any.whl size=23273 sha256=13b63e3a10264088dad2a612b67f1553aa52f415da8775fe4d9ac746cc467bbb
  Stored in directory: /root/.cache/pip/wheels/09/b6/7e/08b4ee715a1239453e89a59081f0ac369a9036f232e013ecd8
  Building wheel for prometheus-client (setup.py): started
  Building wheel for prometheus-client (setup.py): finished with status 'done'
  Created wheel for prometheus-client: filename=prometheus_client-0.7.1-py3-none-any.whl size=41402 sha256=125eec2540966c2d1333ed39ddd02ed20a41efd5e1a6bb69c57b693c725627cf
  Stored in directory: /root/.cache/pip/wheels/30/0c/26/59ba285bf65dc79d195e9b25e2ddde4c61070422729b0cd914
  Building wheel for cerberus (setup.py): started
  Building wheel for cerberus (setup.py): finished with status 'done'
  Created wheel for cerberus: filename=Cerberus-1.3.2-py3-none-any.whl size=54335 sha256=79d57670972b9611bf0a1c43ddfe3b01e90cc61a477

Note that `docker` is __not available in Google Colab__. To try this step, download the notebook, make sure Docker is installed, and run it locally.

Next, you can `docker push` the image to your choice of registry for deployment,
or run it locally for development and testing:

In [None]:
!docker run -p 5000:5000 iris-classifier:latest

## Load saved BentoService

`bentoml.load` is the essential API for loading a Bento into your
Python application:

In [11]:
import bentoml
import pandas as pd

bento_svc = bentoml.load(saved_path)

# Test loaded bentoml service:
bento_svc.predict([X[0]])



memmap([0])
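The service returns class indices rather than species names. Here is a small sketch for translating them, assuming the label order of scikit-learn's Iris dataset (0 = setosa, 1 = versicolor, 2 = virginica); `IRIS_SPECIES` and `to_species` are illustrative names, and inside the notebook `iris.target_names` provides the same mapping.

```python
# Class names in the label order used by scikit-learn's Iris dataset.
IRIS_SPECIES = ("setosa", "versicolor", "virginica")


def to_species(predictions):
    """Map an iterable of class indices to Iris species names."""
    return [IRIS_SPECIES[int(index)] for index in predictions]


# Translate the single prediction returned by the loaded service above.
print(to_species([0]))
```
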

## Distribute BentoML SavedBundle as PyPI package


The BentoService SavedBundle is pip-installable and can be directly distributed as a
PyPI package if you plan to use the model in your Python applications. You can install
it as a system-wide Python package with `pip`:

In [12]:
!pip install {saved_path}

Processing /Users/chaoyu/bentoml/repository/IrisClassifier/20200304143410_CD5F13


Building wheels for collected packages: IrisClassifier
  Building wheel for IrisClassifier (setup.py) ... done
  Created wheel for IrisClassifier: filename=IrisClassifier-20200304143410_CD5F13-py3-none-any.whl size=5322 sha256=80c7fc3eba4318b5c61255983c9b9aaef96689ad478f27c26151640e5e9b0c42
  Stored in directory: /private/var/folders/7p/y_934t3s4yg8fx595vr28gym0000gn/T/pip-ephem-wheel-cache-iqll0sxu/wheels/25/86/f1/26ba94f7c1b4ed71b127db597b2571fbcb134e98708025af9d
Successfully built IrisClassifier
Installing collected packages: IrisClassifier
  Attempting uninstall: IrisClassifier
    Found existing installation: IrisClassifier 20200213112239-7F9D47
    Uninstalling IrisClassifier-20200213112239-7F9D47:
      Successfully uninstalled IrisClassifier-20200213112239-7F9D47
Successfully installed IrisClassifier-20200304143410-CD5F13


In [13]:
# Your BentoService class name becomes the package name
import IrisClassifier

installed_svc = IrisClassifier.load()
installed_svc.predict([X[0]])

memmap([0])

This also allows users to upload their BentoService to pypi.org as a public Python package,
or to their organization's private PyPI index to share with other developers.

`cd {saved_path} && python setup.py sdist upload`

*You will have to configure a ".pypirc" file before uploading to a PyPI index.
You can find more information about distributing Python packages at:
https://docs.python.org/3.7/distributing/index.html#distributing-index*


## Model Serving via CLI

`pip install {saved_path}` also installs a CLI tool for accessing the BentoML service. Print the CLI help document with `--help`:


In [14]:
!IrisClassifier --help

Usage: IrisClassifier [OPTIONS] COMMAND [ARGS]...

  BentoML CLI tool

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  info                List APIs
  install-completion  Install shell command completion
  open-api-spec       Display OpenAPI/Swagger JSON specs
  run                 Run API function
  serve               Start local rest server
  serve-gunicorn      Start local gunicorn server


Print more information about this ML service with the `info` command:

In [18]:
!IrisClassifier info

{
  "name": "IrisClassifier",
  "version": "20200304143410_CD5F13",
  "created_at": "2020-03-04T22:34:22.106650Z",
  "env": {
    "conda_env": "name: bentoml-IrisClassifier\nchannels:\n- defaults\ndependencies:\n- python=3.7.5\n- pip\n",
    "pip_dependencies": "bentoml==0.6.2+32.g1ee00b6.dirty\nscikit-learn",
    "python_version": "3.7.5"
  },
  "artifacts": [
    {
      "name": "model",
      "artifact_type": "SklearnModelArtifact"
    }
  ],
  "apis": [
    {
      "name": "predict",
      "handler_type": "DataframeHandler",
      "docs": "BentoService API",
      "handler_config": {
        "input_dtypes": null,
        "output_orient": "records",
        "orient": "records",
        "typ": "frame"
      }
    }
  ]
}


You can also print help and docs on individual commands:

In [19]:
!IrisClassifier run predict --help

Usage: IrisClassifier run [OPTIONS] API_NAME [RUN_ARGS]...

  Run a API defined in saved BentoService bundle from command line

Options:
  --with-conda        Run API server in a BentoML managed Conda environment
  -q, --quiet         Hide process logs and errors
  --verbose, --debug  Show additional details when running command
  --help              Show this message and exit.


Each service API you defined in the BentoService will be exposed as a CLI command with the same name as the API function:

In [20]:
!IrisClassifier run predict --input='[[5.1, 3.5, 1.4, 0.2]]'

[0]


The BentoML CLI also supports reading input data from `csv` or `json` files, stored either on the local machine or at a remote HTTP/S3 location:

In [22]:
# Writing test data to a csv file
pd.DataFrame(iris.data).to_csv('iris_data.csv', index=False)

# Invoke predict from the command line
!IrisClassifier run predict --input='./iris_data.csv'

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
 2 2]


Alternatively, you can use the `bentoml` CLI to load and run a BentoService saved bundle without installing it:

In [23]:
!bentoml info IrisClassifier:latest

[2020-03-04 14:48:56,058] INFO - Getting latest version IrisClassifier:20200304143410_CD5F13
{
  "name": "IrisClassifier",
  "version": "20200304143410_CD5F13",
  "created_at": "2020-03-04T22:34:22.106650Z",
  "env": {
    "conda_env": "name: bentoml-IrisClassifier\nchannels:\n- defaults\ndependencies:\n- python=3.7.5\n- pip\n",
    "pip_dependencies": "bentoml==0.6.2+32.g1ee00b6.dirty\nscikit-learn",
    "python_version": "3.7.5"
  },
  "artifacts": [
    {
      "name": "model",
      "artifact_type": "SklearnModelArtifact"
    }
  ],
  "apis": [
    {
      "name": "predict",
      "handler_type": "DataframeHandler",
      "docs": "BentoService API",
      "handler_config": {
        "orient": "records",
        "typ": "frame",
        "input_dtypes": null,
        "output_orient": "records"
      }
    }
  ]
}


In [24]:
!bentoml run IrisClassifier:latest predict --input='[[5.1, 3.5, 1.4, 0.2]]'

[2020-03-04 14:48:58,771] INFO - Getting latest version IrisClassifier:20200304143410_CD5F13
[0]


## Deploy REST API server to the cloud


BentoML has a built-in deployment management tool called YataiService. YataiService can
be deployed separately to manage all of your team's trained models, BentoService bundles,
and active deployments in the cloud or in your own Kubernetes cluster. You can also
create simple model serving deployments with just the BentoML CLI, which launches a
local YataiService backed by a SQLite database on your machine.

Now let's deploy the IrisClassifier to [AWS Lambda](https://aws.amazon.com/lambda/) as
a serverless endpoint.

First you need to install the `aws-sam-cli` package, which is required by BentoML
to work with AWS Lambda deployment:

```
    pip install -U aws-sam-cli==0.31.1
```


You will also need to configure your AWS account and credentials if you don't already have
them configured on your machine. You can do this either
[via environment variables](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html)
or through the `aws configure` command: install the `aws` CLI via
`pip install awscli` and follow the
[detailed instructions here](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html#cli-quick-configuration).
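If you go the environment-variable route, a quick check before deploying can save a failed run. This is a minimal sketch; `missing_aws_env_vars` is a hypothetical helper, and it only inspects the two standard credential variables. Credentials set up through `aws configure` live in a file instead, so an empty environment does not necessarily mean the deployment will fail.

```python
import os

# Standard environment variable names read by the AWS CLI and SDKs.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=None):
    """Return the required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in the environment:", ", ".join(missing))
else:
    print("AWS credentials found in environment variables.")
```
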

Now you can run the `bentoml lambda deploy` command to create an AWS Lambda deployment
hosting the BentoService you've created:

In [25]:
!bentoml lambda deploy quick-start-guide-deployment -b IrisClassifier:{svc.version} 

Deploying "IrisClassifier:20200304143410_CD5F13" to AWS Lambda |[2020-03-04 14:49:24,018] INFO - Building lambda project
|[2020-03-04 14:50:25,323] INFO - Packaging AWS Lambda project at /private/var/folders/7p/y_934t3s4yg8fx595vr28gym0000gn/T/bentoml-temp-g412e5yk ...
|[2020-03-04 14:52:12,929] INFO - Deploying lambda project
\[2020-03-04 14:53:05,370] INFO - ApplyDeployment (quick-start-guide-deployment, namespace dev) succeeded
Successfully created AWS Lambda deployment quick-start-guide-deployment
{
  "namespace": "dev",
  "name": "quick-start-guide-deployment",
  "spec": {
    "bentoName": "IrisClassifier",
    "bentoVersion": "20200304143410_CD5F13",
    "operator": "AWS_LAMBDA",
    "awsLambdaOperatorConfig": {
      "region": "us-west-2",
      "memorySize": 1024,
      "timeout": 3
    }
  },
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://rz1dov0qik.execute-api.us-west-2.amazonaws.com/Prod/predict"
      ],
      "

Here 'quick-start-guide-deployment' is the deployment name. You can reference the deployment by this name and query its status. For example, to get the current deployment status:

In [26]:
!bentoml lambda get quick-start-guide-deployment

{
  "namespace": "dev",
  "name": "quick-start-guide-deployment",
  "spec": {
    "bentoName": "IrisClassifier",
    "bentoVersion": "20200304143410_CD5F13",
    "operator": "AWS_LAMBDA",
    "awsLambdaOperatorConfig": {
      "region": "us-west-2",
      "memorySize": 1024,
      "timeout": 3
    }
  },
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://rz1dov0qik.execute-api.us-west-2.amazonaws.com/Prod/predict"
      ],
      "s3_bucket": "btml-dev-quick-start-guide-deployment-aa5291"
    },
    "timestamp": "2020-03-04T22:53:14.603449Z"
  },
  "createdAt": "2020-03-04T22:49:18.450258Z",
  "lastUpdatedAt": "2020-03-04T22:49:18.450289Z"
}


In [27]:
!bentoml lambda get quick-start-guide-deployment | jq ".state.infoJson.endpoints[0]"

"https://rz1dov0qik.execute-api.us-west-2.amazonaws.com/Prod/predict"


To send a request to your AWS Lambda deployment, grab the endpoint URL from the JSON output above:

In [None]:
!curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
https://rz1dov0qik.execute-api.us-west-2.amazonaws.com/Prod/predict
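Rather than copying the endpoint URL by hand, you can extract it in Python from the deployment description. The sketch below assumes the text passed in is the JSON document printed by `bentoml lambda get`, optionally wrapped in terminal color codes; `first_endpoint` is an illustrative helper, not a BentoML function.

```python
import json
import re

# Matches ANSI color sequences such as "\x1b[39m" that the CLI may emit.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def first_endpoint(cli_output):
    """Return the first endpoint URL from `bentoml lambda get` JSON output."""
    deployment = json.loads(ANSI_ESCAPE.sub("", cli_output))
    return deployment["state"]["infoJson"]["endpoints"][0]


# Trimmed copy of the deployment description shown earlier, with color codes.
sample_output = """\x1b[39m{
  "name": "quick-start-guide-deployment",
  "state": {
    "state": "RUNNING",
    "infoJson": {
      "endpoints": [
        "https://rz1dov0qik.execute-api.us-west-2.amazonaws.com/Prod/predict"
      ]
    }
  }
}\x1b[0m"""

print(first_endpoint(sample_output))
```
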

And to delete an active deployment:

In [None]:
!bentoml lambda delete quick-start-guide-deployment

By default, BentoML stores the deployment metadata on the local machine. For team settings, we recommend hosting a shared BentoML Yatai server so your entire team can track all the BentoService saved bundles and deployments they've created in a central place.

## Summary

This is what it looks like to use BentoML to serve a model and deploy it as a prediction service running in the cloud. BentoML also supports many other machine learning frameworks, as well as many other deployment platforms. You can find more BentoML example notebooks [here](https://github.com/bentoml/BentoML#examples).