*Python Machine Learning 2nd Edition* by [Sebastian Raschka](https://sebastianraschka.com), Packt Publishing Ltd. 2017

Code Repository: https://github.com/trungngv/python-machine-learning-book-2nd-edition

Code License: [MIT License](https://github.com/rasbt/python-machine-learning-book-2nd-edition/blob/master/LICENSE.txt)

# Python Machine Learning - Code Examples

# Week 6 - Deploying a Machine Learning Model

Slides: [https://](https://)

### Overview

- [Week 5 recap - Training a model for movie review classification](#Week-5-recap---Training-a-model-for-movie-review-classification)

- [Serializing fitted scikit-learn estimators](#Serializing-fitted-scikit-learn-estimators)
- [Deploying model as a Flask web application](#Deploying-model-as-a-Flask-web-application)
    - [Local deployment]()
    - [Web server deployment]()
- [Deploying Jupyter notebook using Amazon SageMaker]()
- Docker?
- [Cognitive services API]()
    - [APIs made easy with Postman]()
    
- [Setting up a SQLite database for data storage Developing a web application with Flask](#Setting-up-a-SQLite-database-for-data-storage-Developing-a-web-application-with-Flask)
- [Our first Flask web application](#Our-first-Flask-web-application)
  - [Form validation and rendering](#Form-validation-and-rendering)
  - [Turning the movie classifier into a web application](#Turning-the-movie-classifier-into-a-web-application)
- [Deploying the web application to a public server](#Deploying-the-web-application-to-a-public-server)


# Week 5 recap - Training a model for movie review classification

This section recaps the logistic regression model that was trained in the last section of Chapter 5. Execute the following code blocks to train a model that we will serialize in the next section.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('movie_data.csv.gz')
df.head()

Unnamed: 0,review,sentiment
0,"In 1974, the teenager Martha Moxley (Maggie Gr...",1
1,OK... so... I really like Kris Kristofferson a...,0
2,"***SPOILER*** Do not read this, if you think a...",0
3,hi for all the people who have seen this wonde...,1
4,"I recently bought the DVD, forgetting just how...",0


In [2]:
X_train, X_test, y_train, y_test = train_test_split(df.review, df.sentiment, test_size=0.3)

In [9]:
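import re

def tokenizer(text):
    # Strip HTML markup
    text = re.sub(r'<[^>]*>', '', text)
    # Collect emoticons such as :) or ;-( before punctuation is removed
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower())
    # Replace non-word characters with spaces, then append the emoticons (noses removed)
    text = re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')
    return text.split()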
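
To see what the tokenizer produces, it can be called on a short made-up string. The snippet below repeats the function so that it runs on its own; the sample strings are invented for illustration.

```python
import re

def tokenizer(text):
    text = re.sub(r'<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower())
    text = re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')
    return text.split()

# HTML is stripped, words are lowercased, emoticons are moved to the end
print(tokenizer('</a>This :) is :( a test :-)!'))
# ['this', 'is', 'a', 'test', ':)', ':(', ':)']

# The text is lowercased before matching, so the uppercase D and P
# alternatives in the pattern never match
print(tokenizer('Great movie :D'))
# ['great', 'movie', 'd']
```

The second call shows that `:D` is not kept as an emoticon with the pattern as written.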

In [10]:
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

vect = HashingVectorizer(decode_error='ignore', 
                         n_features=2**21,
                         preprocessor=None,
                         tokenizer=tokenizer,
                         stop_words='english',
                         ngram_range=(1,2))
clf = SGDClassifier(loss='log', random_state=1, max_iter=1)

pipeline = Pipeline([
    ('feat', vect),
    ('clf', clf)
])

pipeline.fit(X_train, y_train)

Pipeline(memory=None,
     steps=[('feat', HashingVectorizer(alternate_sign=True, analyzer='word', binary=False,
         decode_error='ignore', dtype=<class 'numpy.float64'>,
         encoding='utf-8', input='content', lowercase=True,
         n_features=2097152, ngram_range=(1, 2), non_negative=False,
         norm='l2', pr...lty='l2', power_t=0.5, random_state=1, shuffle=True,
       tol=None, verbose=0, warm_start=False))])

**Note**

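If you are using scikit-learn < 0.19, please replace `max_iter` with `n_iter` in the code example above. In scikit-learn 1.1 and later, the logistic loss is named `loss='log_loss'`; the `loss='log'` spelling was removed in 1.3.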

In [None]:
print('Accuracy: %.3f' % pipeline.score(X_test, y_test))

# Serializing fitted scikit-learn estimators

Having trained the logistic regression model as shown above, we now save the fitted pipeline to local disk as a serialized object so that we can use the classifier in our web application later.

In [11]:
from sklearn.externals import joblib  # removed in scikit-learn 0.23; on newer versions use: import joblib
import os

joblib.dump(pipeline, os.path.join('movieclassifier', 'classifier.pkl'))

['movieclassifier/classifier.pkl']

In [12]:
!ls -thal movieclassifier/

total 32792
-rw-r--r--  1 trung  staff    16M Sep  3 21:59 classifier.pkl
drwxr-xr-x  8 trung  staff   272B Sep  3 21:59 ..
drwxr-xr-x  4 trung  staff   136B Sep  3 21:53 __pycache__
drwxr-xr-x  5 trung  staff   170B Sep  3 21:53 .ipynb_checkpoints
drwxr-xr-x  9 trung  staff   306B Sep  3 21:53 .
-rw-r--r--  1 trung  staff     0B Sep  3 21:53 __init__.py
-rwxr-xr-x  1 trung  staff   1.0K Sep  3 21:51 app.py
-rw-r--r--  1 trung  staff   253B Sep  3 21:36 preprocess.py
-rw-r--r--  1 trung  staff     0B Sep  3 21:28 movie_classifier.log


Try loading the classifier and making a prediction.

In [13]:
clf = joblib.load(os.path.join('movieclassifier', 'classifier.pkl'))

In [14]:
import numpy as np
label = {0:'negative', 1:'positive'}

example = ['I love this movie']
print('Prediction: %s\nProbability: %.2f%%' %\
      (label[clf.predict(example)[0]], 
       np.max(clf.predict_proba(example))*100))

Prediction: positive
Probability: 91.91%
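
A web application needs this load-and-predict step as a reusable function. The sketch below shows one way to wrap it. The name `classify` and the `StubModel` stand-in are hypothetical and exist only so the snippet runs without `classifier.pkl`; any object with `predict` and `predict_proba` methods, such as the pipeline loaded above, could be passed instead.

```python
LABELS = {0: 'negative', 1: 'positive'}

def classify(model, document):
    """Return (label, probability) for a single review text.

    `model` is assumed to expose scikit-learn style `predict` and
    `predict_proba` methods that accept a list of documents.
    """
    X = [document]
    y = model.predict(X)[0]
    proba = max(model.predict_proba(X)[0])  # probability of the predicted class
    return LABELS[int(y)], float(proba)

class StubModel:
    """Hypothetical stand-in that always predicts 'positive' with p=0.9."""
    def predict(self, X):
        return [1 for _ in X]

    def predict_proba(self, X):
        return [[0.1, 0.9] for _ in X]

label, proba = classify(StubModel(), 'I love this movie')
print('Prediction: %s\nProbability: %.2f%%' % (label, proba * 100))
```

With the real pipeline, the call would be `classify(clf, 'I love this movie')`.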


# Deploying model as a Flask web application

Flask: http://flask.pocoo.org/

Install it with

    pip install Flask

To run the web application locally, `cd` into its directory (`movieclassifier`, as listed above) and execute the main application script, for example,

    cd ./movieclassifier
    python3 app.py
    
Now, you should see something like
    
     * Running on http://127.0.0.1:5000/
     * Restarting with reloader
     
in your terminal.
Next, open a web browser and enter the address displayed in your terminal (typically http://127.0.0.1:5000/) to view the web application.
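
The contents of `app.py` are not shown in this section. As a rough idea of what such a script could look like, here is a minimal sketch of a Flask app that wraps a fitted model. The `create_app` factory, the `/predict` route, and the JSON field names are all assumptions made for illustration, not the actual application.

```python
from flask import Flask, jsonify, request

LABELS = {0: 'negative', 1: 'positive'}

def create_app(model):
    """Build a Flask app around any object with predict / predict_proba."""
    app = Flask(__name__)

    @app.route('/predict', methods=['POST'])
    def predict():
        # Expect a JSON body such as {"review": "I love this movie"}
        payload = request.get_json(silent=True) or {}
        review = payload.get('review', '')
        if not review:
            return jsonify(error='missing "review" field'), 400
        label = LABELS[int(model.predict([review])[0])]
        probability = float(max(model.predict_proba([review])[0]))
        return jsonify(prediction=label, probability=probability)

    return app

# To serve the pickled pipeline from the previous section (not run here):
#   import os, joblib
#   model = joblib.load(os.path.join('movieclassifier', 'classifier.pkl'))
#   create_app(model).run(debug=True)
```

Passing the model into a factory keeps the route logic testable with a stand-in object. Note that unpickling the pipeline also requires the `tokenizer` function to be importable under the same module path it had when the pipeline was saved.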