<a href="https://colab.research.google.com/github/charithcherry/Internship-and-College-Repo-/blob/master/ML%20Ground/MLOPS/BentoML/scikit-learn/sentiment-analysis/sklearn-sentiment-analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BentoML Example: Sentiment Analysis with Scikit-learn

**BentoML makes moving trained ML models to production easy:**

* Package models trained with **any ML framework** and reproduce them for model serving in production
* **Deploy anywhere** for online API serving or offline batch serving
* High-Performance API model server with *adaptive micro-batching* support
* Central hub for managing models and deployment process via Web UI and APIs
* Modular and flexible design making it *adaptable to your infrastrcuture*

BentoML is a framework for serving, managing, and deploying machine learning models. It is aiming to bridge the gap between Data Science and DevOps, and enable teams to deliver prediction services in a fast, repeatable, and scalable way.

Before reading this example project, be sure to check out the [Getting started guide](https://github.com/bentoml/BentoML/blob/master/guides/quick-start/bentoml-quick-start-guide.ipynb) to learn about the basic concepts in BentoML.

This notebook demonstrates how to use BentoML to turn a scikit-learn model into a docker image containing a REST API server serving this model, how to use your ML service built with BentoML as a CLI tool, and how to distribute it a pypi package.


*The example is based on [this notebook](https://github.com/crawles/sentiment_analysis_twitter_model/blob/master/build-sentiment-classifier.ipynb), using dataset from [Sentiment140](http://help.sentiment140.com/for-students/)*

![Impression](https://www.google-analytics.com/collect?v=1&tid=UA-112879361-3&cid=555&t=event&ec=scikit-learn&ea=scikit-learn-sentiment-analysis&dt=scikit-learn-sentiment-analysis)

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline 

In [2]:
!pip install -q bentoml 'scikit-learn>=0.23.2' 'pandas>=1.1.1' 'numpy>=1.8.2'

[K     |████████████████████████████████| 4.0 MB 13.0 MB/s 
[K     |████████████████████████████████| 22.3 MB 36 kB/s 
[K     |████████████████████████████████| 108 kB 49.5 MB/s 
[K     |████████████████████████████████| 79 kB 7.5 MB/s 
[K     |████████████████████████████████| 1.3 MB 39.0 MB/s 
[K     |████████████████████████████████| 1.3 MB 38.0 MB/s 
[K     |████████████████████████████████| 164 kB 49.5 MB/s 
[K     |████████████████████████████████| 86 kB 6.0 MB/s 
[K     |████████████████████████████████| 131 kB 56.3 MB/s 
[K     |████████████████████████████████| 63 kB 2.7 MB/s 
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
    Preparing wheel metadata ... [?25l[?25hdone
[K     |████████████████████████████████| 131 kB 50.1 MB/s 
[K     |████████████████████████████████| 146 kB 50.9 MB/s 
[K     |████████████████████████████████| 142 kB 49.2 MB/s 
[K     |████████████████████████████████| 294

In [3]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
from sklearn.pipeline import Pipeline

import bentoml

## Prepare Dataset

In [4]:
%%bash

if [ ! -f ./trainingandtestdata.zip ]; then
    wget -q http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
    unzip -n trainingandtestdata.zip
fi

Archive:  trainingandtestdata.zip
  inflating: testdata.manual.2009.06.14.csv  
  inflating: training.1600000.processed.noemoticon.csv  


In [5]:
columns = ['polarity', 'tweetid', 'date', 'query_name', 'user', 'text']
dftrain = pd.read_csv('training.1600000.processed.noemoticon.csv',
                      header = None,
                      encoding ='ISO-8859-1')
dftest = pd.read_csv('testdata.manual.2009.06.14.csv',
                     header = None,
                     encoding ='ISO-8859-1')
dftrain.columns = columns
dftest.columns = columns

In [23]:
dftrain[dftrain.polarity==0].text

0         @switchfoot http://twitpic.com/2y1zl - Awww, t...
1         is upset that he can't update his Facebook by ...
2         @Kenichan I dived many times for the ball. Man...
3           my whole body feels itchy and like its on fire 
4         @nationwideclass no, it's not behaving at all....
                                ...                        
799995    Sick  Spending my day laying in bed listening ...
799996                                      Gmail is down? 
799997                        rest in peace Farrah! So sad 
799998    @Eric_Urbane Sounds like a rival is flagging y...
799999    has to resit exams over summer...  wishes he w...
Name: text, Length: 800000, dtype: object

## Model Training

In [6]:
sentiment_lr = Pipeline([
                         ('count_vect', CountVectorizer(min_df = 100,
                                                        ngram_range = (1,2),
                                                        stop_words = 'english')), 
                         ('lr', LogisticRegression())])
sentiment_lr.fit(dftrain.text, dftrain.polarity)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Pipeline(steps=[('count_vect',
                 CountVectorizer(min_df=100, ngram_range=(1, 2),
                                 stop_words='english')),
                ('lr', LogisticRegression())])

In [7]:
Xtest, ytest = dftest.text[dftest.polarity!=2], dftest.polarity[dftest.polarity!=2]
print(classification_report(ytest,sentiment_lr.predict(Xtest)))

              precision    recall  f1-score   support

           0       0.87      0.82      0.84       177
           4       0.83      0.88      0.86       182

    accuracy                           0.85       359
   macro avg       0.85      0.85      0.85       359
weighted avg       0.85      0.85      0.85       359



In [31]:
sentiment_lr.predict([Xtest[11]])

array([0])

In [32]:
Xtest[11]

"@Karoli I firmly believe that Obama/Pelosi have ZERO desire to be civil.  It's a charade and a slogan, but they want to destroy conservatism"

## Create BentoService for model serving

In [9]:
%%writefile sentiment_analysis_service.py
import pandas as pd
import bentoml
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.service.artifacts.common import PickleArtifact
from bentoml.handlers import DataframeHandler
from bentoml.adapters import DataframeInput

@bentoml.artifacts([PickleArtifact('model')])
@bentoml.env(pip_packages=["scikit-learn", "pandas"])
class SKSentimentAnalysis(bentoml.BentoService):

    @bentoml.api(input=DataframeInput(), batch=True)
    def predict(self, df):
        """
        predict expects pandas.Series as input
        """        
        series = df.iloc[0,:]
        return self.artifacts.model.predict(series)

Writing sentiment_analysis_service.py


## Save BentoService to file archive

In [10]:
# 1) import the custom BentoService defined above
from sentiment_analysis_service import SKSentimentAnalysis

# 2) `pack` it with required artifacts
bento_service = SKSentimentAnalysis()
bento_service.pack('model', sentiment_lr)

# 3) save your BentoSerivce to file archive
saved_path = bento_service.save()



  """)


[2021-08-26 09:37:36,474] INFO - BentoService bundle 'SKSentimentAnalysis:20210826093710_6B5E44' saved to: /root/bentoml/repository/SKSentimentAnalysis/20210826093710_6B5E44


## REST API Model Serving


To start a REST API model server with the BentoService saved above, use the bentoml serve command:

In [13]:
!bentoml serve SKSentimentAnalysis:latest

  """)
[2021-08-26 09:40:03,318] INFO - Getting latest version SKSentimentAnalysis:20210826093710_6B5E44
[2021-08-26 09:40:03,331] INFO - Starting BentoML API proxy in development mode..
[2021-08-26 09:40:03,333] INFO - Starting BentoML API server in development mode..
[2021-08-26 09:40:03,564] INFO - Micro batch enabled for API `predict` max-latency: 20000 max-batch-size 4000
[2021-08-26 09:40:03,564] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
(Press CTRL+C to quit)
 * Serving Flask app "SKSentimentAnalysis" (lazy loading)
 * Environment: production
[2m   Use a production WSGI server instead.[0m
 * Debug mode: off
 * Running on http://127.0.0.1:33651/ (Press CTRL+C to quit)

Aborted!


If you are running this notebook from Google Colab, you can start the dev server with `--run-with-ngrok` option, to gain acccess to the API endpoint via a public endpoint managed by [ngrok](https://ngrok.com/):

In [33]:
!bentoml serve SKSentimentAnalysis:latest --run-with-ngrok

  """)
[2021-08-26 09:56:53,153] INFO - Getting latest version SKSentimentAnalysis:20210826093710_6B5E44
[2021-08-26 09:56:53,169] INFO - Starting BentoML API proxy in development mode..
[2021-08-26 09:56:53,171] INFO - Starting BentoML API server in development mode..
[2021-08-26 09:56:53,413] INFO - Micro batch enabled for API `predict` max-latency: 20000 max-batch-size 4000
[2021-08-26 09:56:53,413] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
(Press CTRL+C to quit)
[2021-08-26 09:56:55,181] INFO -  * Running on http://70a5-35-187-174-139.ngrok.io
[2021-08-26 09:56:55,182] INFO -  * Traffic stats available on http://127.0.0.1:4040
 * Serving Flask app "SKSentimentAnalysis" (lazy loading)
 * Environment: production
[2m   Use a production

In [15]:
!curl -i  --header "Content-Type: application/json"  --request POST \ --data '["some new text, sweet noodles", "happy time", "sad day"]' \ localhost:5000/predict

curl: (3) Host name ' --header' contains bad letter
curl: (3) Port number ended with ' '
curl: (3) Host name ' --request' contains bad letter
curl: (6) Could not resolve host: POST
curl: (3) Host name ' --data' contains bad letter
curl: (3) [globbing] bad range specification in column 2
curl: (3) Host name ' localhost' contains bad letter


#### Send prediction request to REST API server

Run the following command in terminal to make a HTTP request to the API server:
```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '["some new text, sweet noodles", "happy time", "sad day"]' \
localhost:5000/predict
```

You can also view all availabl API endpoints at [localhost:5000](localhost:5000), or look at prometheus metrics at [localhost:5000/metrics](localhost:5000/metrics) in browser.

## Containerize model server with Docker


One common way of distributing this model API server for production deployment, is via Docker containers. And BentoML provides a convenient way to do that.

Note that docker is **not available in Google Colab**. You will need to download and run this notebook locally to try out this containerization with docker feature.

If you already have docker configured, simply run the follow command to product a docker container serving the IrisClassifier prediction service created above:

In [None]:
!bentoml containerize SKSentimentAnalysis:latest

[2020-09-22 15:19:51,428] INFO - Getting latest version SKSentimentAnalysis:20200922150740_665E0F
[39mFound Bento: /Users/bozhaoyu/bentoml/repository/SKSentimentAnalysis/20200922150740_665E0F[0m
[39mTag not specified, using tag parsed from BentoService: 'sksentimentanalysis:20200922150740_665E0F'[0m
Building Docker image sksentimentanalysis:20200922150740_665E0F from SKSentimentAnalysis:latest 
-we in here
processed docker file
(None, None)
root in create archive /Users/bozhaoyu/bentoml/repository/SKSentimentAnalysis/20200922150740_665E0F ['Dockerfile', 'MANIFEST.in', 'README.md', 'SKSentimentAnalysis', 'SKSentimentAnalysis/__init__.py', 'SKSentimentAnalysis/__pycache__', 'SKSentimentAnalysis/__pycache__/sentiment_analysis_service.cpython-37.pyc', 'SKSentimentAnalysis/artifacts', 'SKSentimentAnalysis/artifacts/__init__.py', 'SKSentimentAnalysis/artifacts/model.pkl', 'SKSentimentAnalysis/bentoml.yml', 'SKSentimentAnalysis/sentiment_analysis_service.py', 'bentoml-init.sh', 'bentoml.y

\[39mCollecting pandas==0.24.2[0m
[39m  Downloading pandas-0.24.2-cp37-cp37m-manylinux1_x86_64.whl (10.1 MB)[0m
[39mCollecting threadpoolctl>=2.0.0[0m
/[39m  Downloading threadpoolctl-2.1.0-py3-none-any.whl (12 kB)[0m
|[39mCollecting joblib>=0.11[0m
[39m  Downloading joblib-0.16.0-py3-none-any.whl (300 kB)[0m
|[39mCollecting scipy>=0.19.1[0m
[39m  Downloading scipy-1.5.2-cp37-cp37m-manylinux1_x86_64.whl (25.9 MB)[0m
|[39mCollecting pytz>=2011k[0m
\[39m  Downloading pytz-2020.1-py2.py3-none-any.whl (510 kB)[0m


|[39mInstalling collected packages: threadpoolctl, joblib, scipy, scikit-learn, pytz, pandas[0m
/[39mSuccessfully installed joblib-0.16.0 pandas-0.24.2 pytz-2020.1 scikit-learn-0.23.0 scipy-1.5.2 threadpoolctl-2.1.0[0m
-[39m ---> 46ea3f6e2f1a[0m
[39mStep 8/15 : COPY . /bento[0m
\[39m ---> a21c7871ef61[0m
[39mStep 9/15 : RUN if [ -d /bento/bundled_pip_dependencies ]; then pip install -U bundled_pip_dependencies/* ;fi[0m
-[39m ---> Running in 9e7480b0a9a0[0m
|[39mProcessing ./bundled_pip_dependencies/BentoML-0.9.0rc0+3.gcebf2015.tar.gz[0m
|[39m  Installing build dependencies: started[0m
\[39m  Installing build dependencies: finished with status 'done'[0m
[39m  Getting requirements to build wheel: started[0m
/[39m  Getting requirements to build wheel: finished with status 'done'[0m
[39m    Preparing wheel metadata: started[0m
\[39m    Preparing wheel metadata: finished with status 'done'[0m


[39mBuilding wheels for collected packages: BentoML[0m
[39m  Building wheel for BentoML (PEP 517): started[0m
\[39m  Building wheel for BentoML (PEP 517): finished with status 'done'[0m
[39m  Created wheel for BentoML: filename=BentoML-0.9.0rc0+3.gcebf2015-py3-none-any.whl size=3064091 sha256=6ecc0cd97b1040685993d1442b121c6673cf956bb30836265b35a79aae78a9d3[0m
[39m  Stored in directory: /root/.cache/pip/wheels/a0/45/41/62152db705af4ff47e7a3d6abf6247986eef4aa1b94a58d3b9[0m
[39mSuccessfully built BentoML[0m
\[39mInstalling collected packages: BentoML
  Attempting uninstall: BentoML[0m
[39m    Found existing installation: BentoML 0.9.0rc0[0m
|[39m    Uninstalling BentoML-0.9.0rc0:[0m
-[39m      Successfully uninstalled BentoML-0.9.0rc0[0m
|[39mSuccessfully installed BentoML-0.9.0rc0+3.gcebf2015[0m
/[39m ---> bb136663f2f0[0m
[39mStep 10/15 : ENV PORT 5000[0m
\[39m ---> Running in c66e6adb4b02[0m
-[39m ---> a24979b816f6[0m
[39mStep 11/15 : EXPOSE $PORT

In [None]:
!docker run -p 5000:5000 sksentimentanalysis:20200922150740_665E0F

[2020-09-22 22:24:57,127] INFO - Starting BentoML API server in production mode..
[2020-09-22 22:24:57,567] INFO - get_gunicorn_num_of_workers: 3, calculated by cpu count
[2020-09-22 22:24:57 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-09-22 22:24:57 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2020-09-22 22:24:57 +0000] [1] [INFO] Using worker: sync
[2020-09-22 22:24:57 +0000] [11] [INFO] Booting worker with pid: 11
[2020-09-22 22:24:57 +0000] [12] [INFO] Booting worker with pid: 12
[2020-09-22 22:24:57 +0000] [13] [INFO] Booting worker with pid: 13
^C
[2020-09-22 22:25:07 +0000] [23] [INFO] Booting worker with pid: 23


## Load saved BentoService

bentoml.load is the API for loading a BentoML packaged model in python:

In [None]:
import bentoml
import pandas as pd

# Load exported bentoML model archive from path
loaded_bento_service = bentoml.load(saved_path)

# Call predict on the restored sklearn model
loaded_bento_service.predict(pd.DataFrame(data=["good", "great"]))



array([4])

## Launch inference job from CLI

BentoML cli supports loading and running a packaged model from CLI. With the DataframeInput adapter, the CLI command supports reading input Dataframe data from CLI argument or local csv or json files:

In [None]:
!bentoml run SKSentimentAnalysis:latest predict \
--input '["some new text, sweet noodles", "happy time", "sad day"]'

[2020-09-22 15:25:33,640] INFO - Getting latest version SKSentimentAnalysis:20200922150740_665E0F
[2020-09-22 15:25:34,544] INFO - Using default docker base image: `None` specified inBentoML config file or env var. User must make sure that the docker base image either has Python 3.7 or conda installed.
[2020-09-22 15:25:44,431] INFO - {'service_name': 'SKSentimentAnalysis', 'service_version': '20200922150740_665E0F', 'api': 'predict', 'task': {'data': {}, 'task_id': 'd5aebb09-b388-4e19-8a9f-2a364da90a54', 'batch': 3, 'cli_args': ('--input', '["some new text, sweet noodles", "happy time", "sad day"]')}, 'result': {'data': '[4, 4, 4]', 'http_status': 200, 'http_headers': (('Content-Type', 'application/json'),)}, 'request_id': 'd5aebb09-b388-4e19-8a9f-2a364da90a54'}
[4, 4, 4]


# Deployment Options

If you are at a small team with limited engineering or DevOps resources, try out automated deployment with BentoML CLI, currently supporting AWS Lambda, AWS SageMaker, and Azure Functions:
- [AWS Lambda Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_lambda.html)
- [AWS SageMaker Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_sagemaker.html)
- [Azure Functions Deployment Guide](https://docs.bentoml.org/en/latest/deployment/azure_functions.html)

If the cloud platform you are working with is not on the list above, try out these step-by-step guide on manually deploying BentoML packaged model to cloud platforms:
- [AWS ECS Deployment](https://docs.bentoml.org/en/latest/deployment/aws_ecs.html)
- [Google Cloud Run Deployment](https://docs.bentoml.org/en/latest/deployment/google_cloud_run.html)
- [Azure container instance Deployment](https://docs.bentoml.org/en/latest/deployment/azure_container_instance.html)
- [Heroku Deployment](https://docs.bentoml.org/en/latest/deployment/heroku.html)

Lastly, if you have a DevOps or ML Engineering team who's operating a Kubernetes or OpenShift cluster, use the following guides as references for implementating your deployment strategy:
- [Kubernetes Deployment](https://docs.bentoml.org/en/latest/deployment/kubernetes.html)
- [Knative Deployment](https://docs.bentoml.org/en/latest/deployment/knative.html)
- [Kubeflow Deployment](https://docs.bentoml.org/en/latest/deployment/kubeflow.html)
- [KFServing Deployment](https://docs.bentoml.org/en/latest/deployment/kfserving.html)
- [Clipper.ai Deployment Guide](https://docs.bentoml.org/en/latest/deployment/clipper.html)