# Getting Started with BentoML

[BentoML](http://bentoml.ai) is an open-source framework for machine learning **model serving**, aiming to **bridge the gap between Data Science and DevOps**.

Data scientists can easily package models trained with any ML framework using BentoML and reproduce them for serving in production. BentoML helps manage packaged models in the BentoML format and allows DevOps to deploy them as online API serving endpoints or offline batch inference jobs on any cloud platform.

This getting started guide demonstrates how to use BentoML to serve a scikit-learn model via a REST API server, and then containerize the model server for production deployment.

BentoML requires Python 3.6 or above. Install the dependencies via `pip`:

In [1]:
# Install PyPI packages required in this guide, including BentoML
!pip install -q --pre bentoml  # install preview version of BentoML for this guide
!pip install -q 'scikit-learn>=0.23.2' 'pandas>=1.1.1'

Before starting, let's prepare a trained model for serving with BentoML. Train a classifier model on the [Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set):

In [2]:
from sklearn import svm
from sklearn import datasets

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

SVC()

## Create a Prediction Service with BentoML

Model serving with BentoML comes after a model is trained. The first step is creating a
prediction service class, which defines the models required and the inference APIs that
contain the serving logic. Here is a minimal prediction service created for serving
the iris classifier model trained above:

In [3]:
%%writefile iris_classifier.py
import pandas as pd

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    """
    A minimum prediction service exposing a Scikit-learn model
    """

    @api(input=DataframeInput(), batch=True)
    def predict(self, df: pd.DataFrame):
        """
        An inference API named `predict` with Dataframe input adapter, which codifies
        how HTTP requests or CSV files are converted to a pandas Dataframe object as the
        inference API function input
        """
        return self.artifacts.model.predict(df)

Overwriting iris_classifier.py


This code defines a prediction service that packages a scikit-learn model and provides
an inference API that expects a `pandas.DataFrame` object as its input. BentoML also supports other API input
data types including `JsonInput`, `ImageInput`, `FileInput` and 
[more](https://docs.bentoml.org/en/latest/api/adapters.html).


In BentoML, **all inference APIs are supposed to accept a list of inputs and return a
list of results**. In the case of `DataframeInput`, each row of the dataframe maps
to one prediction request received from the client. BentoML converts HTTP JSON
requests into a `pandas.DataFrame` object before passing it to the user-defined
inference API function.
 
This design allows BentoML to group API requests into small batches while serving online
traffic. Compared to a regular Flask or FastAPI based model server, this can increase
the overall throughput of the API server by 10-100x depending on the workload.
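
To make the batching idea concrete, here is a minimal, framework-free sketch of what grouping requests into one batch looks like. It is not BentoML's implementation: `fake_model_predict`, `serve_batch`, and the threshold rule are hypothetical stand-ins, used only to show that the inference function receives a list of rows and returns a list of results in the same order.

```python
import json
from typing import List


def fake_model_predict(rows: List[List[float]]) -> List[int]:
    """Hypothetical stand-in for `self.artifacts.model.predict(df)`.

    Labels a row 0 when the third feature (petal length) is below 2.5, else 1.
    """
    return [0 if row[2] < 2.5 else 1 for row in rows]


def serve_batch(request_bodies: List[str]) -> List[str]:
    """Merge several JSON request bodies into one batch, then split the results."""
    batch: List[List[float]] = []
    sizes: List[int] = []
    for body in request_bodies:
        rows = json.loads(body)  # each body is a JSON list of rows
        batch.extend(rows)
        sizes.append(len(rows))

    predictions = fake_model_predict(batch)  # one model call for the whole batch

    # Hand each request back only the predictions for its own rows.
    responses: List[str] = []
    start = 0
    for size in sizes:
        responses.append(json.dumps(predictions[start:start + size]))
        start += size
    return responses


responses = serve_batch([
    "[[5.1, 3.5, 1.4, 0.2]]",
    "[[6.7, 3.0, 5.2, 2.3], [4.9, 3.0, 1.4, 0.2]]",
])
print(responses)  # ['[0]', '[1, 0]']
```

The model is called once for three rows that arrived in two separate requests, which is where the throughput gain comes from when a single model call is much cheaper than several small ones.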

The following code packages the trained model with the prediction service class
`IrisClassifier` defined above, and then saves the IrisClassifier instance to disk 
in the BentoML format for distribution and deployment:

In [4]:
# import the IrisClassifier class defined above
from iris_classifier import IrisClassifier

# Create an iris classifier service instance
iris_classifier_service = IrisClassifier()

# Pack the newly trained model artifact
iris_classifier_service.pack('model', clf)

# Save the prediction service to disk for model serving
saved_path = iris_classifier_service.save()

[2020-09-23 09:09:55,449] INFO - BentoService bundle 'IrisClassifier:20200923090955_A2644D' saved to: /Users/chaoyu/bentoml/repository/IrisClassifier/20200923090955_A2644D


BentoML stores all packaged model files under the
`~/bentoml/repository/{service_name}/{service_version}` directory by default,
as the log output above shows. The BentoML file format contains all the code,
files, and configs required to deploy the model for serving.
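
As a small illustration of this naming scheme, the sketch below rebuilds the bundle directory from a `name:version` tag. It assumes the `~/bentoml/repository` root seen in the log output above; `bundle_dir` is a hypothetical helper, not part of the BentoML API, and the root may differ if BentoML is configured otherwise.

```python
from pathlib import Path


def bundle_dir(tag: str, root: Path = Path.home() / "bentoml" / "repository") -> Path:
    """Return the directory for a saved bundle tag of the form 'name:version'."""
    name, sep, version = tag.partition(":")
    if not (name and sep and version):
        raise ValueError(f"expected 'name:version', got {tag!r}")
    return root / name / version


path = bundle_dir("IrisClassifier:20200923090955_A2644D")
print(path)

# List the bundle contents only if the directory exists on this machine.
if path.is_dir():
    for entry in sorted(path.iterdir()):
        print(" ", entry.name)
```

The helper expects a concrete version string. The `latest` alias used elsewhere in this guide is resolved by BentoML itself and is not a directory name.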


## REST API Model Serving



To start a REST API model server with the `IrisClassifier` saved above, use 
the `bentoml serve` command:

In [5]:
!bentoml serve IrisClassifier:latest

[2020-09-23 09:10:01,479] INFO - Getting latest version IrisClassifier:20200923090955_A2644D
[2020-09-23 09:10:01,480] INFO - Starting BentoML API server in development mode..
 * Serving Flask app "IrisClassifier" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [23/Sep/2020 09:10:04] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Sep/2020 09:10:04] "GET /swagger_static/swagger-ui.css HTTP/1.1" 200 -
127.0.0.1 - - [23/Sep/2020 09:10:04] "GET /swagger_static/swagger-ui-bundle.js HTTP/1.1" 200 -
127.0.0.1 - - [23/Sep/2020 09:10:05] "GET /docs.json HTTP/1.1" 200 -
127.0.0.1 - - [23/Sep/2020 09:10:06] "GET /favicon.ico HTTP/1.1" 404 -
^C


If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to access the API through a public endpoint managed by [ngrok](https://ngrok.com/):

In [None]:
!bentoml serve IrisClassifier:latest --run-with-ngrok

The `IrisClassifier` model is now served at `localhost:5000`. Use the `curl` command to send
a prediction request:

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
localhost:5000/predict
```

Or with Python and the [requests library](https://requests.readthedocs.io/):
```python
import requests
response = requests.post("http://127.0.0.1:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)
```

Note that the BentoML API server automatically converts the JSON payload into a
`pandas.DataFrame` object before passing it to the user-defined inference API function.
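
If `requests` is not installed, roughly the same call can be made with the standard library. The sketch below is an illustration rather than an official client: `build_predict_request` and `decode_labels` are hypothetical helpers, the URL is the local dev server started above, and the label names rely on the Iris target order (0 = setosa, 1 = versicolor, 2 = virginica).

```python
import json
import urllib.request

IRIS_LABELS = ["setosa", "versicolor", "virginica"]


def build_predict_request(rows, url="http://127.0.0.1:5000/predict"):
    """Build a POST request carrying the rows as a JSON list of lists."""
    payload = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_labels(body: str):
    """Map a JSON list of class indices, e.g. '[0]', to species names."""
    return [IRIS_LABELS[int(index)] for index in json.loads(body)]


request = build_predict_request([[5.1, 3.5, 1.4, 0.2]])
print(request.get_method(), request.full_url)
print(decode_labels("[0]"))

# With the server running, send the request like this (left commented out so
# the sketch does not fail when nothing is listening on port 5000):
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(decode_labels(response.read().decode("utf-8")))
```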

The BentoML API server also provides a simple web UI dashboard.
Go to http://localhost:5000 in the browser and use the web UI to send
a prediction request:

![BentoML API Server Web UI Screenshot](https://raw.githubusercontent.com/bentoml/BentoML/master/guides/quick-start/bento-api-server-web-ui.png)

## Containerize model server with Docker



One common way of distributing this model API server for production deployment is via
Docker containers, and BentoML provides a convenient way to do that.

Note that `docker` is __not available in Google Colab__. You will need to download and run this notebook locally to try out the Docker containerization feature.

If you already have Docker configured, simply run the following command to build a
Docker image serving the `IrisClassifier` prediction service created above:

In [6]:
!bentoml containerize IrisClassifier:latest -t iris-classifier

[2020-09-23 09:10:19,870] INFO - Getting latest version IrisClassifier:20200923090955_A2644D
Found Bento: /Users/chaoyu/bentoml/repository/IrisClassifier/20200923090955_A2644D
Image version not specified, using version parsed from BentoService: '20200923090955_A2644D'
Building Docker image iris-classifier:20200923090955_A2644D from IrisClassifier:latest 
Step 1/15 : FROM bentoml/model-server:0.9.0.pre-py37
 ---> a25066aa8b0e
Step 2/15 : ARG EXTRA_PIP_INSTALL_ARGS=
 ---> Running in 5b7819eb78f1
 ---> 759c154a95d2
Step 3/15 : ENV EXTRA_PIP_INSTALL_ARGS $EXTRA_PIP_INSTALL_ARGS
 ---> Running in 74541945e22f
 ---> 8ab67fe36f33
Step 4/15 : COPY environment.yml requirements.txt setup.sh* bentoml-init.sh python_version* /bento/
 ---> 90e73bf43da4
Step 5/15 : WORKDIR /bento
 ---> Running in 421657afd40b
 ---> 83f62f6d6c1a
Step 6/15 : RUN chmod 

Collecting pytz>=2017.2
  Downloading pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-2.1.0-py3-none-any.whl (12 kB)
Collecting joblib>=0.11
  Downloading joblib-0.16.0-py3-none-any.whl (300 kB)
Collecting scipy>=0.19.1
  Downloading scipy-1.5.2-cp37-cp37m-manylinux1_x86_64.whl (25.9 MB)
Installing collected packages: pytz, pandas, threadpoolctl, joblib, scipy, scikit-learn
Successfully installed joblib-0.16.0 pandas-1.1.1 pytz-2020.1 scikit-learn-0.23.2 scipy-1.5.2 threadpoolctl-2.1.0
 ---> 9d3ee7188b6d
Step 8/15 : COPY . /bento
 ---> a81f72c3024c
Step 9/15 : RUN if [ -d /bento/bundled_pip_dependencies ]; then pip install -U bundled_pip_dependencies/* ;fi
 ---> Running in d51c2b4242de
 ---> 806b77c66db6
Step 10/15 : ENV PORT 5000
 ---> Running in de95f1ec

Start a container with the docker image built in the previous step:

In [7]:
!docker run -p 5000:5000 iris-classifier:latest --workers=1 --enable-microbatch

[2020-09-23 16:22:33,549] INFO - Starting BentoML API server in production mode..
[2020-09-23 16:22:34,369] INFO - Running micro batch service on :5000
[2020-09-23 16:22:34 +0000] [12] [INFO] Starting gunicorn 20.0.4
[2020-09-23 16:22:34 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-09-23 16:22:34 +0000] [12] [INFO] Listening at: http://0.0.0.0:5000 (12)
[2020-09-23 16:22:34 +0000] [1] [INFO] Listening at: http://0.0.0.0:56697 (1)
[2020-09-23 16:22:34 +0000] [1] [INFO] Using worker: sync
[2020-09-23 16:22:34 +0000] [12] [INFO] Using worker: aiohttp.worker.GunicornWebWorker
[2020-09-23 16:22:34 +0000] [14] [INFO] Booting worker with pid: 14
[2020-09-23 16:22:34 +0000] [13] [INFO] Booting worker with pid: 13
[2020-09-23 16:22:34,461] INFO - Micro batch enabled for API `predict`
[2020-09-23 16:22:34,462] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file

Packaging the model server as a Docker image makes it possible to deploy BentoML
bundled ML models with platforms such as
[Kubeflow](https://www.kubeflow.org/docs/components/serving/bentoml/),
[Knative](https://knative.dev/community/samples/serving/machinelearning-python-bentoml/), and
[Kubernetes](https://docs.bentoml.org/en/latest/deployment/kubernetes.html), which
provide advanced model deployment features such as auto-scaling, A/B testing,
scale-to-zero, canary rollout and multi-armed bandit.
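
When scripting against the container started above, it can help to wait until the port accepts connections before sending requests. The helper below is a hypothetical convenience, not something BentoML provides. It only checks that a TCP connection can be opened and says nothing about whether the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 5000 matches the `-p 5000:5000` mapping used in the `docker run` cell.
    ready = wait_for_port("127.0.0.1", 5000, timeout=2.0)
    print("server reachable" if ready else "server not reachable yet")
```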


## Load saved BentoService

`bentoml.load` is the API for loading a BentoML packaged model in Python:

In [8]:
import bentoml
import pandas as pd

bento_svc = bentoml.load(saved_path)

# Test loaded bentoml service:
bento_svc.predict([X[0]])



memmap([0])

The BentoML format is pip-installable and can be directly distributed as a
PyPI package for use in Python applications:

In [9]:
!pip install -q {saved_path}

In [10]:
# The BentoService class name becomes the package name
import IrisClassifier

installed_svc = IrisClassifier.load()
installed_svc.predict([X[0]])

memmap([0])

This also allows users to upload their BentoService to pypi.org as a public Python package
or to their organization's private PyPI index to share with other developers.

`cd {saved_path} && python setup.py sdist upload`

*You will have to configure a ".pypirc" file before uploading to a PyPI index.
    You can find more information about distributing Python packages at:
    https://docs.python.org/3.7/distributing/index.html#distributing-index*


# Launch inference job from CLI

The BentoML CLI supports loading and running a packaged model from the command line. With the `DataframeInput` adapter, the CLI command can read input DataFrame data from a CLI argument or from local `csv` or `json` files:

In [11]:
!bentoml run IrisClassifier:latest predict --input='[[5.1, 3.5, 1.4, 0.2]]'

[2020-09-23 09:23:00,640] INFO - Getting latest version IrisClassifier:20200923090955_A2644D
[2020-09-23 09:23:06,415] INFO - {'service_name': 'IrisClassifier', 'service_version': '20200923090955_A2644D', 'api': 'predict', 'task': {'data': {}, 'task_id': 'f1982177-45bb-4c3f-8348-01d3c06b7cae', 'batch': 1, 'cli_args': ('--input=[[5.1, 3.5, 1.4, 0.2]]',)}, 'result': {'data': '[0]', 'http_status': 200, 'http_headers': (('Content-Type', 'application/json'),)}, 'request_id': 'f1982177-45bb-4c3f-8348-01d3c06b7cae'}
[0]


In [12]:
!bentoml run IrisClassifier:latest predict \
    --input-file="./iris_data.csv"

[2020-09-23 09:23:08,352] INFO - Getting latest version IrisClassifier:20200923090955_A2644D
[2020-09-23 09:23:11,535] INFO - {'service_name': 'IrisClassifier', 'service_version': '20200923090955_A2644D', 'api': 'predict', 'task': {'data': {'uri': 'file:///Users/chaoyu/workspace/BentoML/guides/quick-start/iris_data.csv', 'name': 'iris_data.csv'}, 'task_id': 'a02d0a95-09a6-4011-aa8d-3aa86681b11e', 'batch': 150, 'cli_args': ('--input-file=./iris_data.csv',)}, 'result': {'data': '[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]', 'http_status': 200, 'http_headers': (('Content-Type', 'applicatio
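
For batch jobs launched from a script, the same commands can be assembled programmatically. This is a sketch under assumptions: `build_run_command` is a hypothetical helper, and it only uses the `--input` and `--input-file` options shown in the cells above.

```python
import json
import shlex
from typing import List, Optional


def build_run_command(
    bento_tag: str,
    api_name: str,
    rows: Optional[list] = None,
    input_file: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `bentoml run` from inline rows or an input file."""
    if (rows is None) == (input_file is None):
        raise ValueError("pass exactly one of `rows` or `input_file`")
    command = ["bentoml", "run", bento_tag, api_name]
    if rows is not None:
        command.append("--input=" + json.dumps(rows))
    else:
        command.append("--input-file=" + input_file)
    return command


command = build_run_command("IrisClassifier:latest", "predict", rows=[[5.1, 3.5, 1.4, 0.2]])
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))

# To execute it (requires the BentoML CLI and the saved service):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with the JSON payload.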

# Deployment Options

If you are on a small team with limited engineering or DevOps resources, try out automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:
  - [AWS Lambda Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_lambda.html)
  - [AWS SageMaker Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_sagemaker.html)
  - [Azure Functions Deployment Guide](https://docs.bentoml.org/en/latest/deployment/azure_functions.html)

If the cloud platform you are working with is not on the list above, try out these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms:
  - [AWS ECS Deployment](https://docs.bentoml.org/en/latest/deployment/aws_ecs.html)
  - [Google Cloud Run Deployment](https://docs.bentoml.org/en/latest/deployment/google_cloud_run.html)
  - [Azure container instance Deployment](https://docs.bentoml.org/en/latest/deployment/azure_container_instance.html)
  - [Heroku Deployment](https://docs.bentoml.org/en/latest/deployment/heroku.html)

Lastly, if you have a DevOps or ML Engineering team that operates a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy:
  - [Kubernetes Deployment](https://docs.bentoml.org/en/latest/deployment/kubernetes.html)
  - [Knative Deployment](https://docs.bentoml.org/en/latest/deployment/knative.html)
  - [Kubeflow Deployment](https://docs.bentoml.org/en/latest/deployment/kubeflow.html)
  - [KFServing Deployment](https://docs.bentoml.org/en/latest/deployment/kfserving.html)
  - [Clipper.ai Deployment Guide](https://docs.bentoml.org/en/latest/deployment/clipper.html)



# Summary

This is what it looks like when using BentoML to serve and deploy a model in the cloud. BentoML also supports [many other Machine Learning frameworks](https://docs.bentoml.org/en/latest/examples.html) besides Scikit-learn. The [BentoML core concepts](https://docs.bentoml.org/en/latest/concepts.html) doc is recommended for anyone looking to get a deeper understanding of BentoML.

Join the [BentoML Slack](https://join.slack.com/t/bentoml/shared_invite/enQtNjcyMTY3MjE4NTgzLTU3ZDc1MWM5MzQxMWQxMzJiNTc1MTJmMzYzMTYwMjQ0OGEwNDFmZDkzYWQxNzgxYWNhNjAxZjk4MzI4OGY1Yjg) to follow the latest development updates and roadmap discussions.