# BentoML Example: Sentiment Analysis with Scikit-learn


[BentoML](http://bentoml.ai) is an open source framework for building, shipping and running machine learning services. It provides high-level APIs for defining an ML service and packaging its artifacts, source code, dependencies, and configurations into a production-system-friendly format that is ready for deployment.

This notebook demonstrates how to use BentoML to turn a scikit-learn model into a Docker image containing a REST API server that serves the model, how to use an ML service built with BentoML as a CLI tool, and how to distribute it as a PyPI package.


*The example is based on [this notebook](https://github.com/crawles/sentiment_analysis_twitter_model/blob/master/build-sentiment-classifier.ipynb) and uses the dataset from [Sentiment140](http://help.sentiment140.com/for-students/).*


In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline 

In [None]:
!pip install bentoml
!pip install scikit-learn pandas numpy

In [3]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
from sklearn.pipeline import Pipeline

import bentoml

## Prepare Dataset

In [4]:
%%bash

if [ ! -f ./trainingandtestdata.zip ]; then
    wget -q http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
    unzip -n trainingandtestdata.zip
fi

In [5]:
columns = ['polarity', 'tweetid', 'date', 'query_name', 'user', 'text']
dftrain = pd.read_csv('training.1600000.processed.noemoticon.csv',
                      header = None,
                      encoding ='ISO-8859-1')
dftest = pd.read_csv('testdata.manual.2009.06.14.csv',
                     header = None,
                     encoding ='ISO-8859-1')
dftrain.columns = columns
dftest.columns = columns

## Model Training

In [6]:
sentiment_lr = Pipeline([
                         ('count_vect', CountVectorizer(min_df = 100,
                                                        ngram_range = (1,1),
                                                        stop_words = 'english')), 
                         ('lr', LogisticRegression())])
sentiment_lr.fit(dftrain.text, dftrain.polarity)



Pipeline(memory=None,
         steps=[('count_vect',
                 CountVectorizer(analyzer='word', binary=False,
                                 decode_error='strict',
                                 dtype=<class 'numpy.int64'>, encoding='utf-8',
                                 input='content', lowercase=True, max_df=1.0,
                                 max_features=None, min_df=100,
                                 ngram_range=(1, 1), preprocessor=None,
                                 stop_words='english', strip_accents=None,
                                 token_pattern='(?u)\\b\\w\\w+\\b',
                                 tokenizer=None, vocabulary=None)),
                ('lr',
                 LogisticRegression(C=1.0, class_weight=None, dual=False,
                                    fit_intercept=True, intercept_scaling=1,
                                    l1_ratio=None, max_iter=100,
                                    multi_class='warn', n_jobs=None,
              

In [7]:
Xtest, ytest = dftest.text[dftest.polarity!=2], dftest.polarity[dftest.polarity!=2]
print(classification_report(ytest,sentiment_lr.predict(Xtest)))

              precision    recall  f1-score   support

           0       0.85      0.80      0.83       177
           4       0.82      0.86      0.84       182

    accuracy                           0.83       359
   macro avg       0.83      0.83      0.83       359
weighted avg       0.83      0.83      0.83       359



In [8]:
sentiment_lr.predict([Xtest[0]])

array([4])
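The numeric classes are the Sentiment140 polarity codes: 0 marks a negative tweet and 4 a positive one, while 2 (neutral) is the class the evaluation cell above filters out. A minimal sketch for turning predictions into readable labels follows; `POLARITY_LABELS` and `decode_polarity` are hypothetical names, not part of the original notebook or of BentoML.

```python
# Sentiment140 polarity codes mapped to readable labels.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def decode_polarity(predictions):
    """Map numeric polarity codes (ints or numpy integers) to label strings."""
    return [POLARITY_LABELS[int(code)] for code in predictions]


print(decode_polarity([4, 4, 0]))  # ['positive', 'positive', 'negative']
```

The same helper could be applied to the output of `sentiment_lr.predict(...)`, since iterating over the returned array yields one code per input text.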

## Create BentoService for model serving

In [9]:
%%writefile sentiment_analysis_service.py
import pandas as pd
import bentoml
from bentoml.artifact import SklearnModelArtifact
from bentoml.handlers import DataframeHandler

@bentoml.artifacts([SklearnModelArtifact('model')])
@bentoml.env(pip_dependencies=["scikit-learn", "pandas"])
class SentimentAnalysisService(bentoml.BentoService):

    @bentoml.api(DataframeHandler, typ='series')
    def predict(self, series):
        """
        predict expects pandas.Series as input
        """        
        return self.artifacts.model.predict(series)

Overwriting sentiment_analysis_service.py


## Save BentoService to file archive

In [10]:
# 1) import the custom BentoService defined above
from sentiment_analysis_service import SentimentAnalysisService

# 2) `pack` it with required artifacts
bento_service = SentimentAnalysisService.pack(
    model=sentiment_lr
)

# 3) save your BentoService to file archive
saved_path = bento_service.save()

[2019-09-25 15:24:33,149] INFO - Successfully saved Bento 'SentimentAnalysisService:2019_09_25_41a744f8' to path: /Users/chaoyuyang/bentoml/repository/SentimentAnalysisService/2019_09_25_41a744f8
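The log line shows the saved path ending in `<service name>/<version>`, which matches the `name:version` tag printed in the same message. Assuming that layout holds for your BentoML version, the tag can be recovered from `saved_path` as sketched below; `bento_tag_from_path` is a hypothetical helper, not a BentoML function.

```python
from pathlib import PurePosixPath


def bento_tag_from_path(saved_path):
    """Return 'name:version', assuming a <repository>/<name>/<version> layout."""
    path = PurePosixPath(saved_path)
    return f"{path.parent.name}:{path.name}"


# Path copied from the log output above.
example = "/Users/chaoyuyang/bentoml/repository/SentimentAnalysisService/2019_09_25_41a744f8"
print(bento_tag_from_path(example))  # SentimentAnalysisService:2019_09_25_41a744f8
```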


## Load BentoService archive from saved path

In [15]:
import bentoml

# Load the exported BentoML archive from the saved path
loaded_bento_service = bentoml.load(saved_path)

# Call predict on the restored sklearn model
loaded_bento_service.predict(pd.Series(["good", "great"]))



array([4, 4])

## "pip install" a saved BentoService archive

BentoML users can install a saved BentoML archive directly with `pip install {saved_path}` and then use it as a regular Python package.

In [16]:
!pip install {saved_path}

Processing /Users/chaoyuyang/bentoml/repository/SentimentAnalysisService/2019_09_25_41a744f8
Building wheels for collected packages: SentimentAnalysisService
  Building wheel for SentimentAnalysisService (setup.py) ... done
  Stored in directory: /private/var/folders/ns/vc9qhmqx5dx_9fws7d869lqh0000gn/T/pip-ephem-wheel-cache-kqkhmej6/wheels/c6/c7/8a/7c77a55c0110a15d7bc547700159838f3350d500e463b16dee
Successfully built SentimentAnalysisService
Installing collected packages: SentimentAnalysisService
  Found existing installation: SentimentAnalysisService 2019-09-25-41a744f8
    Uninstalling SentimentAnalysisService-2019-09-25-41a744f8:
      Successfully uninstalled SentimentAnalysisService-2019-09-25-41a744f8
Successfully installed SentimentAnalysisService-2019-09-25-41a744f8


In [17]:
# The BentoService class name becomes the name of the installed package
import SentimentAnalysisService

svc = SentimentAnalysisService.load() # call load to ensure all artifacts are loaded
svc.predict(pd.Series(["bad", "awesome"]))



array([0, 4])

## BentoService Command Line Access

`pip install {saved_path}` also installs a CLI tool for accessing the BentoML service.

In [18]:
!SentimentAnalysisService info

{
  "name": "SentimentAnalysisService",
  "version": "2019_09_25_41a744f8",
  "created_at": "2019-09-25T22:24:33.138534Z",
  "env": {
    "conda_env": "name: bentoml-custom-conda-env\nchannels:\n- defaults\ndependencies:\n- python=3.7.3\n- pip\n- pip:\n  - bentoml[api_server]==0.4.2\n",
    "pip_dependencies": "bentoml==0.4.2\nscikit-learn\npandas"
  },
  "artifacts": [
    {
      "name": "model",
      "artifact_type": "SklearnModelArtifact"
    }
  ],
  "apis": [
    {
      "name": "predict",
      "handler_type": "DataframeHandler",
      "docs": "predict expects pandas.Series as input"
    }
  ]
}


In [None]:
!SentimentAnalysisService --help

In [None]:
!SentimentAnalysisService predict --help

In [19]:
# Run prediction with sample input
!SentimentAnalysisService predict --input='["some new text, sweet noodles", "happy time", "sad day"]'

[4 4 0]
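The `--input` value is a JSON array passed as a single shell argument, so texts containing apostrophes or quotes need escaping. A minimal sketch that builds the command line with `json.dumps` and `shlex.quote` follows; `build_predict_command` is a hypothetical helper, and only the `predict --input=` form is taken from the cell above.

```python
import json
import shlex


def build_predict_command(texts, cli_name="SentimentAnalysisService"):
    """Build a shell-safe `predict` command for a list of input texts."""
    payload = json.dumps(texts)  # serialize the texts as a JSON array
    # shlex.quote escapes the payload so the shell passes it as one argument
    return f"{cli_name} predict --input={shlex.quote(payload)}"


print(build_predict_command(["it's a happy time", "sad day"]))
```

The resulting string can be pasted into a terminal or passed to `subprocess.run(..., shell=True)`.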


In [None]:
# OpenAPI docs for generating API Client
!SentimentAnalysisService open-api-spec

## Model Serving via REST API

#### Run REST API server locally

In [21]:
!bentoml serve {saved_path}

 * Serving Flask app "SentimentAnalysisService" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [25/Sep/2019 15:25:31] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [25/Sep/2019 15:25:31] "GET /docs.json HTTP/1.1" 200 -
127.0.0.1 - - [25/Sep/2019 15:25:42] "POST /predict HTTP/1.1" 200 -
127.0.0.1 - - [25/Sep/2019 15:25:44] "POST /predict HTTP/1.1" 200 -
127.0.0.1 - - [25/Sep/2019 15:25:44] "POST /predict HTTP/1.1" 200 -
^C


#### Send prediction request to REST API server

Run the following command in a terminal to make an HTTP request to the API server:
```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '["some new text, sweet noodles", "happy time", "sad day"]' \
localhost:5000/predict
```
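The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call: a POST to `/predict` with a JSON array body and a JSON content type. `build_predict_request` and `send` are hypothetical helper names, and the response is returned as raw text because its exact format is not shown in this notebook.

```python
import json
import urllib.request


def build_predict_request(texts, url="http://localhost:5000/predict"):
    """Build a POST request whose body is a JSON array of texts."""
    body = json.dumps(texts).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the response body as text.

    Requires the API server to be running.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_predict_request(
    ["some new text, sweet noodles", "happy time", "sad day"]
)
print(request.get_method(), request.full_url)
```

With the server from the previous cell running, `send(request)` performs the actual call.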

You can also view all available API endpoints at [localhost:5000](http://localhost:5000), or look at Prometheus metrics at [localhost:5000/metrics](http://localhost:5000/metrics) in a browser.

## Run REST API server with Docker

_Note: `docker` is not available when running in Google Colaboratory._

### 1) Build a Docker image from the saved Bento and tag it (e.g. sentiment-analysis-service)

In [22]:
!cd {saved_path} && docker build -t sentiment-analysis-service .

Sending build context to Docker daemon  13.19MB
Step 1/11 : FROM continuumio/miniconda3
 ---> ae46c364060f
Step 2/11 : ENTRYPOINT [ "/bin/bash", "-c" ]
 ---> Using cache
 ---> 2f135ada8e2d
Step 3/11 : EXPOSE 5000
 ---> Using cache
 ---> 738f652d09ae
Step 4/11 : RUN set -x      && apt-get update      && apt-get install --no-install-recommends --no-install-suggests -y libpq-dev build-essential      && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 70c62a45013a
Step 5/11 : RUN conda update conda -y       && conda install pip numpy scipy       && pip install gunicorn six
 ---> Using cache
 ---> fe5d966ecc35
Step 6/11 : COPY . /bento
 ---> 73f29983b1e5
Step 7/11 : WORKDIR /bento
 ---> Running in c733e2b363cf
Removing intermediate container c733e2b363cf
 ---> 8ab3c0658d40
Step 8/11 : RUN conda env update -n base -f /bento/environment.yml
 ---> Running in 46ad4dcabd79
Collecting package metadata: ...working... done
Solving environment: ...working... 
The environment is inconsistent, plea

Removing intermediate container 46ad4dcabd79
 ---> 0d3c6d9e1972
Step 9/11 : RUN pip install -r /bento/requirements.txt
 ---> Running in cdba5db9fa17
Collecting scikit-learn (from -r /bento/requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/9f/c5/e5267eb84994e9a92a2c6a6ee768514f255d036f3c8378acfa694e9f2c99/scikit_learn-0.21.3-cp37-cp37m-manylinux1_x86_64.whl (6.7MB)
Collecting joblib>=0.11 (from scikit-learn->-r /bento/requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/cd/c1/50a758e8247561e58cb87305b1e90b171b8c767b15b12a1734001f41d356/joblib-0.13.2-py2.py3-none-any.whl (278kB)
Installing collected packages: joblib, scikit-learn
Successfully installed joblib-0.13.2 scikit-learn-0.21.3
Removing intermediate container cdba5db9fa17
 ---> 074a6de9feda
Step 10/11 : RUN if [ -f /bento/setup.sh ]; then /bin/bash -c /bento/setup.sh; fi
 ---> Running in cf7f244d65ff
Removing intermediate container cf7f244d65ff
 ---> 13a8614e8bca
Step 11/

### 2) Run the Docker image and expose port 5000

In [None]:
!docker run -p 5000:5000 sentiment-analysis-service

[2019-09-25 22:30:26 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2019-09-25 22:30:26 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2019-09-25 22:30:26 +0000] [1] [INFO] Using worker: sync
[2019-09-25 22:30:26 +0000] [10] [INFO] Booting worker with pid: 10
[2019-09-25 22:30:26 +0000] [11] [INFO] Booting worker with pid: 11
[2019-09-25 22:30:26 +0000] [12] [INFO] Booting worker with pid: 12
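
The container needs a moment to start gunicorn and boot its workers, so a script that queries it immediately after `docker run` may see a refused connection. One way to handle this, sketched under the assumption that the service is published on `localhost:5000` as above, is to poll the port until it accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of BentoML or Docker.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5000)` could be called before sending the first prediction request. An open port only shows that the server is listening; it does not guarantee that the model has finished loading.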


### 3) Similarly, use the following command to query the REST server running in Docker

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '["some new text, sweet noodles", "happy time", "sad day"]' \
localhost:5000/predict
```