# BentoML Example: Sentiment Analysis with Scikit-learn


[BentoML](http://bentoml.ai) is an open source framework for building, shipping and running machine learning services. It provides high-level APIs for defining an ML service and packaging its artifacts, source code, dependencies, and configurations into a production-system-friendly format that is ready for deployment.

This notebook demonstrates how to use BentoML to turn a scikit-learn model into a docker image containing a REST API server that serves the model, how to use an ML service built with BentoML as a CLI tool, and how to distribute it as a PyPI package.


*The example is based on [this notebook](https://github.com/crawles/sentiment_analysis_twitter_model/blob/master/build-sentiment-classifier.ipynb), using the dataset from [Sentiment140](http://help.sentiment140.com/for-students/).*


In [57]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline 

In [2]:
!pip install bentoml
!pip install scikit-learn pandas numpy

You are using pip version 18.1, however version 20.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [58]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
from sklearn.pipeline import Pipeline

import bentoml

## Prepare Dataset

In [59]:
%%bash

if [ ! -f ./trainingandtestdata.zip ]; then
    wget -q http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
    unzip -n trainingandtestdata.zip
fi

In [60]:
columns = ['polarity', 'tweetid', 'date', 'query_name', 'user', 'text']
dftrain = pd.read_csv('training.1600000.processed.noemoticon.csv',
                      header = None,
                      encoding ='ISO-8859-1')
dftest = pd.read_csv('testdata.manual.2009.06.14.csv',
                     header = None,
                     encoding ='ISO-8859-1')
dftrain.columns = columns
dftest.columns = columns

## Model Training

In [61]:
sentiment_lr = Pipeline([
                         ('count_vect', CountVectorizer(min_df = 100,
                                                        ngram_range = (1,2),
                                                        stop_words = 'english')), 
                         ('lr', LogisticRegression())])
sentiment_lr.fit(dftrain.text, dftrain.polarity)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html.
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Pipeline(memory=None,
         steps=[('count_vect',
                 CountVectorizer(analyzer='word', binary=False,
                                 decode_error='strict',
                                 dtype=<class 'numpy.int64'>, encoding='utf-8',
                                 input='content', lowercase=True, max_df=1.0,
                                 max_features=None, min_df=100,
                                 ngram_range=(1, 2), preprocessor=None,
                                 stop_words='english', strip_accents=None,
                                 token_pattern='(?u)\\b\\w\\w+\\b',
                                 tokenizer=None, vocabulary=None)),
                ('lr',
                 LogisticRegression(C=1.0, class_weight=None, dual=False,
                                    fit_intercept=True, intercept_scaling=1,
                                    l1_ratio=None, max_iter=100,
                                    multi_class='auto', n_jobs=None,
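
To make the `ngram_range = (1,2)` setting more concrete, here is a simplified, standard-library sketch of the word n-grams that such a vectorizer counts. It reuses the `token_pattern` shown in the repr above and lowercases the text, but it is not scikit-learn's implementation: stop-word removal and the `min_df` filter are deliberately left out, and `word_ngrams` is a hypothetical helper name.

```python
import re

# Same token pattern that appears in the CountVectorizer repr above:
# it keeps words of two or more characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def word_ngrams(text, ngram_range=(1, 2)):
    """Return the word n-grams of `text` for every n in `ngram_range`."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    min_n, max_n = ngram_range
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the token list.
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


print(word_ngrams("Happy time with sweet noodles"))
```

Each distinct n-gram becomes one column of the count matrix that the logistic regression is trained on.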
              

In [62]:
Xtest, ytest = dftest.text[dftest.polarity!=2], dftest.polarity[dftest.polarity!=2]
print(classification_report(ytest,sentiment_lr.predict(Xtest)))

              precision    recall  f1-score   support

           0       0.87      0.82      0.84       177
           4       0.83      0.88      0.86       182

    accuracy                           0.85       359
   macro avg       0.85      0.85      0.85       359
weighted avg       0.85      0.85      0.85       359
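
To see how the summary rows relate to the per-class rows, the averages can be recomputed by hand. This is a small arithmetic sketch that uses the rounded values printed in the report; scikit-learn works from unrounded scores, so in general the last digit can differ.

```python
# Per-class rows copied from the classification report above:
# label -> (f1-score, support)
per_class = {0: (0.84, 177), 4: (0.86, 182)}

total_support = sum(support for _, support in per_class.values())

# Macro average: plain mean over classes, ignoring class sizes.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: mean over classes, weighted by each class's support.
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(total_support, round(macro_f1, 2), round(weighted_f1, 2))
```

Because the two classes are almost the same size (177 vs. 182 rows), the macro and weighted averages nearly coincide here.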



In [63]:
sentiment_lr.predict([Xtest[0]])

array([4])
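
The model returns the numeric polarity codes used by Sentiment140, where 0 means negative, 2 neutral, and 4 positive (the evaluation above drops the rows with polarity 2). A minimal sketch for turning predictions into readable labels follows; `decode_polarity` is a hypothetical helper, not part of BentoML or scikit-learn.

```python
# Sentiment140 polarity codes, as described in the dataset documentation.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def decode_polarity(predictions):
    """Map numeric polarity codes to readable labels."""
    # int() also accepts NumPy integer scalars from a prediction array.
    return [POLARITY_LABELS[int(code)] for code in predictions]


print(decode_polarity([4, 4, 0]))
```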

## Create BentoService for model serving

In [74]:
%%writefile sentiment_analysis_service.py
import pandas as pd
import bentoml
# PickleArtifact stores the trained Pipeline as a pickled Python object
from bentoml.artifact import PickleArtifact
# DataframeInput parses the request body into a pandas object
from bentoml.adapters import DataframeInput

@bentoml.artifacts([PickleArtifact('model')])
@bentoml.env(pip_dependencies=["scikit-learn", "pandas"])
class SKSentimentAnalysis(bentoml.BentoService):

    @bentoml.api(input=DataframeInput(typ='series'))
    def predict(self, series):
        """
        predict expects pandas.Series as input
        """
        return self.artifacts.model.predict(series)

Overwriting sentiment_analysis_service.py


## Save BentoService to file archive

In [75]:
# 1) import the custom BentoService defined above
from sentiment_analysis_service import SKSentimentAnalysis

# 2) `pack` it with required artifacts
bento_service = SKSentimentAnalysis()
bento_service.pack('model', sentiment_lr)

# 3) save your BentoService to file archive
saved_path = bento_service.save()

running sdist
running egg_info
writing BentoML.egg-info/PKG-INFO
writing dependency_links to BentoML.egg-info/dependency_links.txt
writing entry points to BentoML.egg-info/entry_points.txt
writing requirements to BentoML.egg-info/requires.txt
writing top-level names to BentoML.egg-info/top_level.txt
reading manifest file 'BentoML.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'


no previously-included directories found matching 'examples'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'docs'


writing manifest file 'BentoML.egg-info/SOURCES.txt'
running check





creating BentoML-0.6.1+2.gefb0204.dirty
creating BentoML-0.6.1+2.gefb0204.dirty/BentoML.egg-info
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/artifact
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/bundler
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/cli
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/clipper
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/configuration
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/deployment
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/deployment/aws_lambda
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/deployment/sagemaker
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/handlers
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/migrations
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/migrations/versions
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/proto
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/repository
creating BentoML-0.6.1+2.gefb0204.dirty/bentoml/server
creating BentoML-0.6.1

copying bentoml/proto/__init__.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/proto
copying bentoml/proto/deployment_pb2.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/proto
copying bentoml/proto/repository_pb2.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/proto
copying bentoml/proto/status_pb2.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/proto
copying bentoml/proto/yatai_service_pb2.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/proto
copying bentoml/proto/yatai_service_pb2_grpc.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/proto
copying bentoml/repository/__init__.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/repository
copying bentoml/repository/metadata_store.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/repository
copying bentoml/server/__init__.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/server
copying bentoml/server/bento_api_server.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/server
copying bentoml/server/bento_sagemaker_server.py -> BentoML-0.6.1+2.gefb0204.dirty/bentoml/server
copying

## Load BentoService archive from saved path

In [66]:
import bentoml

# Load the exported BentoML model archive from its saved path
loaded_bento_service = bentoml.load(saved_path)

# Call predict on the restored sklearn model
loaded_bento_service.predict(pd.Series(["good", "great"]))



array([4, 4])

In [80]:
!bentoml get SKSentimentAnalysis

BENTO_SERVICE                              CREATED_AT        APIS                       ARTIFACTS
SKSentimentAnalysis:20200129210903_E48487  2020-01-30 05:09  predict::DataframeHandler  model::PickleArtifact
SKSentimentAnalysis:20200129205654_0D29B1  2020-01-30 04:57  predict::DataframeHandler  model::SklearnModelArtifact
SKSentimentAnalysis:20200129115524_801942  2020-01-29 19:56  predict::DataframeHandler  model::SklearnModelArtifact
SKSentimentAnalysis:20200128134748_BC079E  2020-01-28 21:48  predict::DataframeHandler  model::SklearnModelArtifact
SKSentimentAnalysis:20200128134526_FCDE3C  2020-01-28 21:45  predict::DataframeHandler  model::SklearnModelArtifact
SKSentimentAnalysis:20200128132708_0E9AD8  2020-01-28 21:28  predict::DataframeHandler  model::SklearnModelArtifact
SKSentimentAnalysis:20200128131725_1A2FD7  2020-01-28 21:18  predict::DataframeHandler  model::SklearnModelArtifact
SKSentimentAnalysis:20200128121823_F1A035  2020-01-28 20:19  predict::DataframeHandler  mod
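
Each entry in the listing is identified by a `Name:version` tag, and the versions appear to consist of a timestamp followed by a short suffix. The sketch below splits such a tag into its parts. The layout is inferred from the listing above and is an assumption rather than a documented BentoML contract; `BentoTag` and `parse_bento_tag` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class BentoTag(NamedTuple):
    name: str
    created: datetime  # naive datetime; the time zone is not encoded in the tag
    suffix: str


def parse_bento_tag(tag):
    """Split 'Name:YYYYMMDDHHMMSS_SUFFIX' into name, timestamp and suffix."""
    name, version = tag.split(":", 1)
    timestamp, suffix = version.split("_", 1)
    return BentoTag(name, datetime.strptime(timestamp, "%Y%m%d%H%M%S"), suffix)


tag = parse_bento_tag("SKSentimentAnalysis:20200129205654_0D29B1")
print(tag.name, tag.created.isoformat(), tag.suffix)
```

Sorting tags by `created` is one way to pick the most recent saved version when scripting around the repository.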

In [81]:
!bentoml get SKSentimentAnalysis:20200129205654_0D29B1

{
  "name": "SKSentimentAnalysis",
  "version": "20200129205654_0D29B1",
  "uri": {
    "type": "LOCAL",
    "uri": "/Users/bozhaoyu/bentoml/repository/SKSentimentAnalysis/20200129205654_0D29B1"
  },
  "bentoServiceMetadata": {
    "name": "SKSentimentAnalysis",
    "version": "20200129205654_0D29B1",
    "createdAt": "2020-01-30T04:57:42.457829Z",
    "env": {
      "condaEnv": "name: bentoml-SKSentimentAnalysis\nchannels:\n- defaults\ndependencies:\n- python=3.7.3\n- pip\n",
      "pipDependencies": "bentoml==0.6.1\nscikit-learn\npandas",
      "pythonVersion": "3.7.3"
    },
    "artifacts": [
      {
        "name": "model",
        "artifactType": "SklearnModelArtifact"
      }
    ],
    "apis": [
      {
        "name": "predict",
        "handlerType": "DataframeHandler",
        "docs": "predict expects pandas.Series as input"
      }
    ]
  }
}


In [82]:
!bentoml run SKSentimentAnalysis:20200129205654_0D29B1 predict \
--input '["some new text, sweet noodles", "happy time", "sad day"]'

[4 4 0]
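
The `--input` argument is a JSON list, which is easy to get wrong when quoting by hand. As a sketch, the same command line can be assembled from Python so that `json.dumps` produces the payload. Only the command shape shown in the cell above is used; `build_run_command` is a hypothetical helper.

```python
import json
import shlex


def build_run_command(bento_tag, api_name, texts):
    """Build the argv list for `bentoml run <tag> <api> --input <json>`."""
    return ["bentoml", "run", bento_tag, api_name, "--input", json.dumps(texts)]


argv = build_run_command(
    "SKSentimentAnalysis:20200129205654_0D29B1",
    "predict",
    ["some new text, sweet noodles", "happy time", "sad day"],
)
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` would execute the command, provided the `bentoml` CLI is installed and the tag exists in the local repository.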


# Model Serving via REST API

#### Run REST API server locally

In [36]:
!bentoml serve {saved_path}

 * Serving Flask app "SKSentimentAnalysis" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
^C


#### Send prediction request to REST API server

Run the following command in a terminal to make an HTTP request to the API server:
```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '["some new text, sweet noodles", "happy time", "sad day"]' \
localhost:5000/predict
```
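
The same request can be made from Python. The sketch below uses only the standard library and mirrors the `curl` call above: a POST to `/predict` with a JSON list body and a JSON content type. It assumes the server is listening on `localhost:5000` as in the previous step; `build_predict_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_predict_request(texts, base_url="http://localhost:5000"):
    """Build a POST request for the /predict endpoint with a JSON list body."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(texts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    ["some new text, sweet noodles", "happy time", "sad day"]
)
print(request.get_method(), request.full_url)

# Sending it requires the API server from the previous step to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```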

You can also view all available API endpoints at [localhost:5000](http://localhost:5000), or look at Prometheus metrics at [localhost:5000/metrics](http://localhost:5000/metrics) in a browser.

# "pip install" a saved BentoService archive

BentoML users can install a saved BentoML archive directly with `pip install {saved_path}` and use it as a regular Python package.

In [18]:
!pip install {saved_path}

Processing /Users/bozhaoyu/bentoml/repository/SKSentimentAnalysis/20200127144037_AE0EAC
Building wheels for collected packages: SKSentimentAnalysis


  Building wheel for SKSentimentAnalysis (setup.py) ... done
  Created wheel for SKSentimentAnalysis: filename=SKSentimentAnalysis-20200127144037_AE0EAC-py3-none-any.whl size=57940664 sha256=7a7dbda641507991ac49b8bb404ffc868f404b06fde29f942d564986fe531ec0
  Stored in directory: /private/var/folders/kn/xnc9k74x03567n1mx2tfqnpr0000gn/T/pip-ephem-wheel-cache-wg_ahs5y/wheels/17/30/b7/8fb7d7f796f86f6c25b988c47cf5173f3be837cace665d0ced
Successfully built SKSentimentAnalysis
Installing collected packages: SKSentimentAnalysis
Successfully installed SKSentimentAnalysis-20200127144037-AE0EAC


In [20]:
# The BentoService class name becomes the name of the installed package
import SKSentimentAnalysis

svc = SKSentimentAnalysis.load() # call load to ensure all artifacts are loaded
svc.predict(pd.Series(["bad", "awesome"]))

array([0, 4])

## PyPI package Command Line Access

`pip install {saved_path}` also installs a CLI tool for accessing the BentoML service.

In [22]:
!SKSentimentAnalysis info

{
  "name": "SKSentimentAnalysis",
  "version": "20200127144037_AE0EAC",
  "created_at": "2020-01-27T22:41:25.663929Z",
  "env": {
    "conda_env": "name: bentoml-SKSentimentAnalysis\nchannels:\n- defaults\ndependencies:\n- python=3.7.3\n- pip\n",
    "pip_dependencies": "bentoml==0.6.1\nscikit-learn\npandas",
    "python_version": "3.7.3"
  },
  "artifacts": [
    {
      "name": "model",
      "artifact_type": "SklearnModelArtifact"
    }
  ],
  "apis": [
    {
      "name": "predict",
      "handler_type": "DataframeHandler",
      "docs": "predict expects pandas.Series as input"
    }
  ]
}


In [23]:
!SKSentimentAnalysis --help

Usage: SKSentimentAnalysis [OPTIONS] COMMAND [ARGS]...

  BentoML CLI tool

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  info            List APIs
  open-api-spec   Display OpenAPI/Swagger JSON specs
  run             Run API function
  serve           Start local rest server
  serve-gunicorn  Start local gunicorn server
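
The `open-api-spec` command listed in the help output prints an OpenAPI 3.0 document as JSON, which makes it straightforward to inspect the service's endpoints programmatically. The sketch below walks the `paths` object of such a document. The embedded spec is an abridged stand-in containing only the `/healthz` and `/metrics` infrastructure paths, and `list_endpoints` is a hypothetical helper.

```python
import json

# Abridged sample spec for illustration; a real spec contains more paths
# and fuller descriptions.
SPEC_JSON = """
{
  "openapi": "3.0.0",
  "paths": {
    "/healthz": {"get": {"tags": ["infra"], "description": "Health check endpoint."}},
    "/metrics": {"get": {"tags": ["infra"], "description": "Prometheus metrics endpoint"}}
  }
}
"""


def list_endpoints(spec):
    """Return (METHOD, path, first tag) for every operation in the spec."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            tags = operation.get("tags") or [""]
            endpoints.append((method.upper(), path, tags[0]))
    return endpoints


for method, path, tag in list_endpoints(json.loads(SPEC_JSON)):
    print(method, path, tag)
```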


In [24]:
# Run prediction with sample input
!SKSentimentAnalysis run predict --input='["some new text, sweet noodles", "happy time", "sad day"]'

[4 4 0]


In [25]:
# OpenAPI docs for generating API Client
!SKSentimentAnalysis open-api-spec

{
  "openapi": "3.0.0",
  "info": {
    "version": "20200127144037_AE0EAC",
    "title": "SKSentimentAnalysis",
    "description": "To get a client SDK, copy all content from <a href=\"/docs.json\">docs</a> and paste into <a href=\"https://editor.swagger.io\">editor.swagger.io</a> then click the tab <strong>Generate Client</strong> and choose the language."
  },
  "tags": [
    {
      "name": "infra"
    },
    {
      "name": "app"
    }
  ],
  "paths": {
    "/healthz": {
      "get": {
        "tags": [
          "infra"
        ],
        "description": "Health check endpoint. Expecting an empty response with status code 200 when the service is in health state",
        "responses": {
          "200": {
            "description": "success"
          }
        }
      }
    },
    "/metrics": {
      "get": {
        "tags": [
          "infra"
        ],
        "description": "Prometheus metrics endpoint",
        "responses": {
          

## Run REST API server with Docker

_Note: `docker` is not available when running in Google Colaboratory._

### 1) Build a docker image from the saved Bento and tag it (e.g. sk-sentiment-analysis)

In [31]:
!bentoml containerize SKSentimentAnalysis:latest

Sending build context to Docker daemon  123.4MB
Step 1/12 : FROM continuumio/miniconda3:4.7.12
 ---> 406f2b43ea59
Step 2/12 : ENTRYPOINT [ "/bin/bash", "-c" ]
 ---> Using cache
 ---> 28172be83c07
Step 3/12 : EXPOSE 5000
 ---> Using cache
 ---> 840844d191d4
Step 4/12 : RUN set -x      && apt-get update      && apt-get install --no-install-recommends --no-install-suggests -y libpq-dev build-essential      && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 243c05e712f3
Step 5/12 : RUN conda install pip numpy scipy       && pip install gunicorn
 ---> Using cache
 ---> 8fab95ab34fc
Step 6/12 : COPY . /bento
 ---> 67d9e50d566b
Step 7/12 : WORKDIR /bento
 ---> Running in b3dcad063ce1
Removing intermediate container b3dcad063ce1
 ---> f14962580e56
Step 8/12 : RUN if [ -f /bento/setup.sh ]; then /bin/bash -c /bento/setup.sh; fi
 ---> Running in 93e35cb1765b
Removing intermediate container 93e35cb1765b
 ---> b274d4877198
Step 9/12 : RUN conda env update -n base -f /bento/environment.yml
 ---

  Building wheel for tabulate (setup.py): finished with status 'done'
  Created wheel for tabulate: filename=tabulate-0.8.6-py3-none-any.whl size=23273 sha256=1e2ac06f60ddd211e05f0f23937233a0f1e6d42ff7d553069f710098fd730b50
  Stored in directory: /root/.cache/pip/wheels/09/b6/7e/08b4ee715a1239453e89a59081f0ac369a9036f232e013ecd8
  Building wheel for sqlalchemy (setup.py): started
  Building wheel for sqlalchemy (setup.py): finished with status 'done'
  Created wheel for sqlalchemy: filename=SQLAlchemy-1.3.13-cp37-cp37m-linux_x86_64.whl size=1223709 sha256=a3589c89331982830cf1bba0187eec69217d2be87bebdda137be736aca59b877
  Stored in directory: /root/.cache/pip/wheels/b9/ba/77/163f10f14bd489351530603e750c195b0ceceed2f3be2b32f1
  Building wheel for Mako (setup.py): started
  Building wheel for Mako (setup.py): finished with status 'done'
  Created wheel for Mako: filename=Mako-1.1.1-py3-none-any.whl size=75409 sha256=ac1c12f7fb3cd5c184dd788db6805a3a09b890477305c63b50a5bc0ac162fe36
  Stored

  Building wheel for BentoML (PEP 517): finished with status 'done'
  Created wheel for BentoML: filename=BentoML-0.6.1-py3-none-any.whl size=505667 sha256=c1f6b20b28250256f356e879575619a1171470fb1727f3b3635238ed322cd6fd
  Stored in directory: /root/.cache/pip/wheels/a2/b5/f6/4b37cd2a90c23d57718be64cb02a49396cc1f8014ebe1612b2
Successfully built BentoML
Installing collected packages: BentoML
  Attempting uninstall: BentoML
    Found existing installation: BentoML 0.6.1
    Uninstalling BentoML-0.6.1:
      Successfully uninstalled BentoML-0.6.1
Successfully installed BentoML-0.6.1
Removing intermediate container 26fac21e185c
 ---> 08ce88940dae
Step 12/12 : CMD ["bentoml serve-gunicorn /bento"]
 ---> Running in 509e19e67e04
Removing intermediate container 509e19e67e04
 ---> 1597af914b10
Successfully built 1597af914b10
Successfully tagged sk-sentiment-analysis:latest


### 2) Run the docker image and expose port 5000

In [32]:
!docker run -p 5000:5000 sk-sentiment-analysis

[2020-01-28 00:12:41,498] INFO - get_gunicorn_num_of_workers: 3, calculated by cpu count
[2020-01-28 00:12:41 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-01-28 00:12:41 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2020-01-28 00:12:41 +0000] [1] [INFO] Using worker: sync
[2020-01-28 00:12:41 +0000] [9] [INFO] Booting worker with pid: 9
[2020-01-28 00:12:41 +0000] [10] [INFO] Booting worker with pid: 10
[2020-01-28 00:12:41 +0000] [11] [INFO] Booting worker with pid: 11
^C
[2020-01-28 00:12:53 +0000] [1] [INFO] Handling signal: int


### 3) Similarly, use the following command to query the REST server running in Docker

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '["some new text, sweet noodles", "happy time", "sad day"]' \
localhost:5000/predict
```
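
The container takes a moment to boot its gunicorn workers, so a script that starts it and queries it immediately can fail. One possible approach is to poll the `/healthz` endpoint, which the service's OpenAPI spec describes as returning status 200 when healthy. The following is a standard-library sketch under that assumption; the function names, retry count, and delay are illustrative choices, not BentoML settings.

```python
import time
import urllib.error
import urllib.request


def probe_healthz(base_url):
    """Return True if GET <base_url>/healthz answers with status 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(base_url="http://localhost:5000", attempts=10,
                       delay=1.0, probe=probe_healthz):
    """Poll the health check until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example: call wait_until_healthy() after `docker run`, then send the request.
```

The `probe` parameter exists so that the polling logic can be exercised without a running server.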

## Deployments

BentoML provides a set of APIs and CLI commands for automating the cloud deployment workflow: they get your BentoService API server up and running in the cloud and let you update and monitor the service. BentoML currently implements this workflow for AWS Lambda, AWS SageMaker, and Azure Functions. More platforms, such as AWS EC2, Kubernetes clusters, and Azure Virtual Machines, are on our roadmap.

You can also manually deploy the BentoService API server or its docker image to cloud platforms. We’ve created a few step-by-step tutorials for doing that, available on the [BentoML documentation website](https://docs.bentoml.org/en/latest/deployment/index.html).