# ML Model in Production

Objective: To teach how Machine Learning is used in practice

#### We will start with a common approach: building a Machine Learning model in a Jupyter Notebook using scikit-learn

In [1]:
# Dataframe
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load Data

In [6]:
trainFile = "dataset/iris.data"

In [7]:
train = pd.read_csv(trainFile,delimiter=',',names=["sepal_length", "sepal_width", "petal_length", "petal_width","class"])

In [8]:
train.head()

| | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


# Train Test Split

In [28]:
# The target column must not be used as an input feature
do_not_use_for_training = ['class']

feature_names = [f for f in train.columns if f not in do_not_use_for_training]

print('Total features : {}'.format(len(feature_names)))

# Target labels
y = train['class']

Total features : 4


In [29]:
Xtr, Xv, ytr, yv = train_test_split(train[feature_names], y, test_size=0.3, random_state=1987)

# Build Model

In [30]:
RF = RandomForestClassifier(random_state=1987)

In [31]:
model_rf = RF.fit(Xtr,ytr)

# Evaluation

## Train Score

In [32]:
model_rf.score(Xtr, ytr)

1.0

## Test Score

In [33]:
model_rf.score(Xv, yv)

0.9333333333333333

## Predict Test Data

In [34]:
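# Work on a copy so that adding columns does not trigger pandas' SettingWithCopyWarning
Xv = Xv.copy()
Xv['prediction'] = model_rf.predict(Xv)
Xv['real_class'] = yv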

In [35]:
Xv.head()

| | sepal_length | sepal_width | petal_length | petal_width | prediction | real_class |
|---|---|---|---|---|---|---|
| 61 | 5.9 | 3.0 | 4.2 | 1.5 | Iris-versicolor | Iris-versicolor |
| 115 | 6.4 | 3.2 | 5.3 | 2.3 | Iris-virginica | Iris-virginica |
| 24 | 4.8 | 3.4 | 1.9 | 0.2 | Iris-setosa | Iris-setosa |
| 124 | 6.7 | 3.3 | 5.7 | 2.1 | Iris-virginica | Iris-virginica |
| 79 | 5.7 | 2.6 | 3.5 | 1.0 | Iris-versicolor | Iris-versicolor |


# Now What?

Many online courses teach machine learning only up to the evaluation step, and then move on to optimizing accuracy.

This leads new Data Scientists to believe that Machine Learning is "the art of optimizing the score on a test dataset in a notebook environment".

In fact, a machine learning model should be part of a system, a codebase or an application, so that it can affect the customer experience and increase business value.

# Save the model as a file

In [36]:
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib

In [37]:
joblib.dump(model_rf, 'ml_model/model_rf_iris.pkl')

['model_rf_iris.pkl']

Now we have a file named "model_rf_iris.pkl" in the "ml_model" directory.
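A quick way to check that a saved file is usable is to load it back and compare predictions. The sketch below is only an illustration: it assumes scikit-learn's bundled copy of the Iris data (`load_iris`) instead of `dataset/iris.data`, and it writes to a temporary directory rather than `ml_model/`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumption: the bundled Iris data stands in for dataset/iris.data.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=1987).fit(X, y)

# Save the fitted model to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model_rf_iris.pkl")
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Round trip OK:", same_predictions)
```

Files written this way are pickles, so they should only be loaded from a trusted source, and preferably with the same scikit-learn version that produced them.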

### What is this file for?
Usually this file is copied into another codebase outside this notebook. For this tutorial we stay in the same notebook, but pretend that everything below this line runs in a different environment.


Note:
To make sure that everything below is independent of the variables above, you can restart the kernel and run the code below without running the code above.

## Types of Production ML
1. Batch
2. Serve as API

# 1. Batch

In [9]:
import pandas as pd
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib

## "E"xtract

In [9]:
file = "dataset/iris_new_data.data"
new_data = pd.read_csv(file,delimiter=',',names=["sepal_length", "sepal_width", "petal_length", "petal_width"])

In [10]:
new_data.head()

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 6.7 | 3.1 | 5.6 | 2.4 |
| 1 | 6.6 | 2.9 | 4.6 | 1.3 |
| 2 | 4.5 | 2.3 | 1.3 | 0.3 |


## "T"ransform

In [12]:
rf_model_load = joblib.load('ml_model/model_rf_iris.pkl')

In [13]:
new_data['prediction'] = rf_model_load.predict(new_data)

In [14]:
new_data.head()

| | sepal_length | sepal_width | petal_length | petal_width | prediction |
|---|---|---|---|---|---|
| 0 | 6.7 | 3.1 | 5.6 | 2.4 | Iris-virginica |
| 1 | 6.6 | 2.9 | 4.6 | 1.3 | Iris-versicolor |
| 2 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa |


## "L"oad

In [54]:
new_data.to_csv("predicted_iris_data.csv", sep=',',index=False)
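The Load step does not have to write a CSV file; it could write to a database instead. The sketch below is one possible illustration using Python's built-in `sqlite3` module. The in-memory database and the table name `iris_predictions` are assumptions made for the example; the rows are copied from the prediction output shown above.

```python
import sqlite3

# Rows copied from the prediction output shown above.
predictions = [
    (6.7, 3.1, 5.6, 2.4, "Iris-virginica"),
    (6.6, 2.9, 4.6, 1.3, "Iris-versicolor"),
    (4.5, 2.3, 1.3, 0.3, "Iris-setosa"),
]

# Assumption: an in-memory SQLite database; a real job would connect to its own target.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE iris_predictions (
        sepal_length REAL,
        sepal_width REAL,
        petal_length REAL,
        petal_width REAL,
        prediction TEXT
    )
    """
)

# Using the connection as a context manager commits on success and rolls back on error.
with conn:
    conn.executemany(
        "INSERT INTO iris_predictions VALUES (?, ?, ?, ?, ?)", predictions
    )

row_count = conn.execute("SELECT COUNT(*) FROM iris_predictions").fetchone()[0]
print("Rows written:", row_count)
```

With the DataFrame from the cells above, `new_data.itertuples(index=False, name=None)` would supply the same kind of tuples.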

### Most commonly this code is not kept in Notebook format, so in a .py file it would look like this

In [57]:
import pandas as pd
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib

# Extract
# These lines could instead extract or load data directly from a database
file = "dataset/iris_new_data.data"
new_data = pd.read_csv(file,delimiter=',',names=["sepal_length", "sepal_width", "petal_length", "petal_width"])

# Transform
rf_model_load = joblib.load('ml_model/model_rf_iris.pkl')
new_data['prediction'] = rf_model_load.predict(new_data)

# Load
# These lines could instead export data directly to a database
new_data.to_csv("predicted_iris_data.csv", sep=',',index=False)

print("Job Success")

Job Success


### Summary Batch
The code above can be scheduled with a server scheduler, or triggered from a user interface that starts the ETL job.

The Extract and Load parts can read from and write to any data source.
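To make the batch job easy for a scheduler to call, it helps to wrap it in a function with a command-line entry point that reports success or failure through its exit code. The following is a minimal sketch, not a definitive layout: the names `run_batch` and `main`, the `job` parameter (there only so the wrapper can be tried without a model file), and the command-line flags are all assumptions made for this example.

```python
import argparse
import logging

COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def run_batch(input_path, output_path, model_path):
    """Extract new rows, predict with the saved model, and write the result."""
    # Imported here so the command-line wrapper itself stays lightweight.
    import joblib
    import pandas as pd

    new_data = pd.read_csv(input_path, names=COLUMNS)
    model = joblib.load(model_path)
    new_data["prediction"] = model.predict(new_data[COLUMNS])
    new_data.to_csv(output_path, index=False)
    return len(new_data)


def main(argv=None, job=run_batch):
    """Run the job and return a process exit code (0 = success, 1 = failure)."""
    parser = argparse.ArgumentParser(description="Batch Iris prediction job")
    parser.add_argument("--input", default="dataset/iris_new_data.data")
    parser.add_argument("--output", default="predicted_iris_data.csv")
    parser.add_argument("--model", default="ml_model/model_rf_iris.pkl")
    args = parser.parse_args(argv)

    logging.basicConfig(level=logging.INFO)
    try:
        row_count = job(args.input, args.output, args.model)
    except Exception:
        # Log the traceback and signal failure to the caller.
        logging.exception("Batch job failed")
        return 1
    logging.info("Batch job scored %d rows", row_count)
    return 0

# In a script file, finish with:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

A scheduler such as cron could then run the script (for example a hypothetical `batch_predict.py`) at a fixed time, and the non-zero exit code lets it detect a failed run.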

# 2. Serve as API

In [4]:
from flask import Flask, jsonify, request
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib
import pandas as pd
import json

## Web Apps using Flask

In [5]:
app = Flask(__name__)

Create a function that gets the data from the POST input, predicts on it, and returns the prediction in JSON format

In [6]:
@app.route('/predict_ml', methods=['POST'])
def predict_ml():
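    # The route only accepts POST, so no extra method check is needed
    data = request.data
    dataDict = json.loads(data)

    # Build a one-row DataFrame from the JSON body
    pandas_df = pd.DataFrame([dataDict])
    # The model is loaded on every request to keep the example simple
    rf_model_load = joblib.load('ml_model/model_rf_iris.pkl')
    prediction = rf_model_load.predict(pandas_df)[0]
    print(prediction)

    # Convert the NumPy string to a plain str before serializing
    return jsonify({'prediction': str(prediction)})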
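The handler above trusts whatever arrives in the request body. As a hedged sketch of how the input could be checked first, the helper below parses the JSON body and returns the four features in the column order used for training. The name `parse_features` and the error messages are hypothetical and not part of the original code.

```python
import json

# Column order used when the model was trained.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def parse_features(raw_body):
    """Return the four feature values as floats, in training order.

    Raises ValueError if the body is not a JSON object, a feature is
    missing, or a value is not numeric.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")

    missing = [name for name in FEATURES if name not in payload]
    if missing:
        raise ValueError("missing features: " + ", ".join(missing))

    values = []
    for name in FEATURES:
        value = payload[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(name + " must be a number")
        values.append(float(value))
    return values


# Keys may arrive in any order; the result follows FEATURES.
body = '{"petal_width": 1.3, "sepal_length": 6.6, "sepal_width": 2.9, "petal_length": 4.6}'
print(parse_features(body))  # prints [6.6, 2.9, 4.6, 1.3]
```

Inside the Flask handler, a `ValueError` could be caught and turned into a 400 response, and the returned list passed to the model as `pd.DataFrame([values], columns=FEATURES)`.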

## Run the lines below to start the web application

Note that the code below is a running service, meaning the process never finishes on its own. It keeps running until you decide to stop the service.

In [4]:
if __name__ == '__main__':
    app.run(host='localhost',port=8080)

 * Running on http://localhost:8080/ (Press CTRL+C to quit)
127.0.0.1 - - [24/Jul/2019 13:41:36] "POST /predict_ml HTTP/1.1" 200 -

Iris-versicolor

127.0.0.1 - - [24/Jul/2019 13:41:46] "POST /predict_ml HTTP/1.1" 200 -

Iris-versicolor

127.0.0.1 - - [24/Jul/2019 13:41:50] "POST /predict_ml HTTP/1.1" 200 -

Iris-versicolor

127.0.0.1 - - [24/Jul/2019 13:41:55] "POST /predict_ml HTTP/1.1" 200 -

Iris-setosa


# In Practice

Again, it is not common practice to start a serving application in a Jupyter Notebook!

Usually the code above runs on a web server, written as a .py file like the one below.

In [1]:
from flask import Flask, jsonify, request
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib
import pandas as pd
import json

app = Flask(__name__)

@app.route('/predict_ml', methods=['POST'])
def predict_ml():
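    # The route only accepts POST, so no extra method check is needed
    data = request.data
    dataDict = json.loads(data)

    # Build a one-row DataFrame from the JSON body
    pandas_df = pd.DataFrame([dataDict])
    # The model is loaded on every request to keep the example simple
    rf_model_load = joblib.load('ml_model/model_rf_iris.pkl')
    prediction = rf_model_load.predict(pandas_df)[0]
    print(prediction)

    # Convert the NumPy string to a plain str before serializing
    return jsonify({'prediction': str(prediction)})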
    
if __name__ == '__main__':
    app.run(host='localhost',port=8080)

 * Running on http://localhost:8080/ (Press CTRL+C to quit)
127.0.0.1 - - [24/Jul/2019 13:49:33] "POST /predict_ml HTTP/1.1" 200 -

Iris-versicolor


# What is that really?

Up to this point, we have been in Software Engineering territory. It is common for a Software Engineer to work with a REST API like the one above.

For example, you can call the URL from a REST API client, as in the screenshot below. It shows the model's URL being called with 4 input variables. When the request is made, the response contains the prediction result.


![Screenshot of a REST API client calling the prediction API](<ML Model in Production/screenshot call API.jpg>)
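The same call can also be made from code instead of a REST API client. The sketch below uses only Python's standard library and assumes the Flask service from this tutorial is running at `http://localhost:8080/predict_ml`. The helper names `build_predict_request` and `call_predict` are hypothetical, and the sample values are one of the new-data rows used in the Batch section.

```python
import json
import urllib.request

# Assumption: the Flask service from this tutorial is running locally.
API_URL = "http://localhost:8080/predict_ml"


def build_predict_request(features, url=API_URL):
    """Build a POST request whose body is the JSON-encoded feature dict."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(features, url=API_URL, timeout=5):
    """Send the request and return the decoded JSON response."""
    request_to_send = build_predict_request(features, url)
    with urllib.request.urlopen(request_to_send, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


sample = {
    "sepal_length": 6.6,
    "sepal_width": 2.9,
    "petal_length": 4.6,
    "petal_width": 1.3,
}

# Inspect the request without sending it.
request_obj = build_predict_request(sample)
print(request_obj.get_method(), request_obj.full_url)
print(request_obj.data.decode("utf-8"))

# With the service running, this would send the request:
# print(call_predict(sample))
```

Keeping the request construction separate from the network call makes the client easy to check without a running server.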

# Summary

1. Machine Learning is not only "the art of increasing accuracy in a notebook"; it is a part of software engineering.

2. There are 2 common ways to use a Machine Learning model in production: Batch and REST API.

3. I hope this tutorial helps you.