## Machine Learning in Production - Part II

### Preparing the ML Models

Note: This exercise is not meant to be a perfect machine learning approach; it is a framework for `Deploying Machine Learning models`.

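We'll be working with the `boston` dataset and creating an __ML Model__ to be deployed. Note that `load_boston` was removed in scikit-learn 1.2, so the cells below need an older release.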

- Loading the dataset:

In [1]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
boston = load_boston()

X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=0)

In [2]:
X_train.shape

(379, 13)

In [3]:
X_test.shape

(127, 13)

In [4]:
y_train.shape

(379,)

In [5]:
y_test.shape

(127,)

- Creating the ML model (a `Ridge` regression with default settings and no preprocessing):

In [13]:
from sklearn.linear_model import Ridge

In [14]:
ridge = Ridge()

In [15]:
ridge.fit(X_train,y_train)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [16]:
print("Train set score: {:.2f}".format(ridge.score(X_train, y_train)))
print("Test set score: {:.2f}".format(ridge.score(X_test, y_test)))

Train set score: 0.77
Test set score: 0.63
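For a regressor such as `Ridge`, `score` returns the coefficient of determination R². The sketch below recomputes that number by hand to show what the two scores measure. It runs on synthetic data from `make_regression`, which is an assumption made for illustration and not the Boston split used above, so the printed values will differ from 0.77 and 0.63.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the Boston dataset)
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = Ridge().fit(X_tr, y_tr)

# R^2 = 1 - SS_res / SS_tot, computed on the held-out split
pred = model.predict(X_te)
ss_res = np.sum((y_te - pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print("score(): {:.4f}".format(model.score(X_te, y_te)))
print("manual:  {:.4f}".format(r2_manual))
```

The two printed values should agree, since `score` is defined as this same ratio.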


In [None]:
import numpy as np
np.savetxt("./data/X_test.csv", X_test, delimiter=",")
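`np.savetxt` writes the array without a header row. When such a file is read back with `pd.read_csv`, the default `header='infer'` treats the first data row as column names, so one sample is lost unless `header=None` is passed. A minimal sketch of the difference, using a small stand-in array and an in-memory buffer (both assumptions for illustration) instead of `./data/X_test.csv`:

```python
import io

import numpy as np
import pandas as pd

# Small stand-in array (an assumption; the notebook writes X_test instead)
arr = np.arange(12, dtype=float).reshape(4, 3)

# Same savetxt call as above, but writing to memory instead of disk
buf = io.StringIO()
np.savetxt(buf, arr, delimiter=",")
csv_text = buf.getvalue()

# Default behaviour: the first data row is consumed as column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every row as data
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)  # (3, 3)
print(no_header.shape)    # (4, 3)
```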

- Serializing the model with `pickle`:

In [28]:
import pickle
pickled_model = "model1.pk"

In [29]:
import os

In [30]:
os.getcwd()

'/home/pratos/greyatom_final/day30_machine_learning_in_production_two'

In [31]:
# Use a context manager so the file handle is closed after writing
with open(os.path.join(os.getcwd(), 'models', pickled_model), 'wb') as f:
    pickle.dump(ridge, f)

- Just to check that pickling works and that we can still get predictions from the reloaded model:

In [35]:
with open(os.path.join(os.getcwd(), 'models', pickled_model), 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_test, y_test)
print("Test set score: {:.2f}".format(result))

Test set score: 0.63


Comparing it with the test score reported by the original `ridge` model, we can confirm that it is the same!
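Matching scores rounded to two decimals is a fairly coarse check. A stricter variant compares the predictions themselves. The sketch below does that on a tiny synthetic problem and writes to a temporary directory; the data, the temporary path and the variable names are assumptions for illustration, not the notebook's own `models/` folder.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Ridge

# Tiny synthetic problem (an assumption; stands in for the Boston split)
rng = np.random.RandomState(0)
X = rng.rand(50, 13)
y = X.dot(rng.rand(13))
model = Ridge().fit(X, y)

# Hypothetical location; a temporary directory keeps the sketch self-contained
model_path = os.path.join(tempfile.mkdtemp(), "model1.pk")

# Context managers close the file handles even if an error occurs
with open(model_path, "wb") as f:
    pickle.dump(model, f)

with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should predict exactly what the original predicts
same = np.allclose(model.predict(X), restored.predict(X))
print(same)  # True
```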

- Serializing model using `joblib`

In [25]:
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

In [32]:
pickled_model2 = 'model2.pk'
joblib.dump(ridge, os.getcwd()+'/models/'+str(pickled_model2))

['/home/pratos/greyatom_final/day30_machine_learning_in_production_two/models/model2.pk']

As before, we'll try loading the model back, this time with `joblib`.

In [38]:
loaded_model = joblib.load(os.getcwd()+'/models/'+str(pickled_model2))
result = loaded_model.score(X_test, y_test)
print("Test set score: {:.2f}".format(result))

Test set score: 0.63
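The list shown after the `joblib.dump` call earlier is its return value: the file names it wrote. The sketch below illustrates that, along with the optional `compress` argument, assuming the standalone `joblib` package is installed. The synthetic data and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge

# Tiny synthetic problem (an assumption; stands in for the Boston split)
rng = np.random.RandomState(0)
X = rng.rand(50, 13)
y = X.dot(rng.rand(13))
model = Ridge().fit(X, y)

# Hypothetical path inside a temporary directory
model_path = os.path.join(tempfile.mkdtemp(), "model2.pk")

# compress=3 trades a little CPU time for a smaller file on disk
written = joblib.dump(model, model_path, compress=3)

restored = joblib.load(model_path)
print(written == [model_path])
print(np.allclose(model.predict(X), restored.predict(X)))
```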


#### Writing functions for Model Deployment

We'll create a function that works as a base to be implemented as an API.

In [43]:
import os
import pickle

import pandas as pd
from flask import request, jsonify

def apicall():
    """API Call

    Pandas dataframe (sent as a payload) from API Call
    """
    # Parse the JSON payload into a dataframe; parsing errors propagate to the caller
    test_json = request.get_json()
    test = pd.read_json(test_json, orient='split')

    clf = 'model1.pk'

    # Load the saved model
    with open(os.path.join(os.getcwd(), 'models', clf), 'rb') as f:
        loaded_model = pickle.load(f)
    predictions = loaded_model.predict(test)

    # Add the predictions as a Series to a new pandas dataframe
    #                         OR
    # depending on the use-case, return the entire test data with the predictions appended
    prediction_series = pd.Series(predictions)

    # We can be as creative as we like in sending the responses,
    # but we need to send the response codes as well.
    responses = jsonify(predictions=prediction_series.to_json())
    responses.status_code = 200

    return responses

The internal specifics might change when implementing this in `Flask` or `Hug`, but the skeleton above is what we should stick to.
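One way the skeleton might be wired into Flask is sketched below. It is an illustration under stated assumptions, not the course's actual app file: `create_app` and `SumModel` are hypothetical names, the model is passed in already loaded instead of being unpickled on every request, and a malformed payload gets a 400 response. `SumModel` stands in for the pickled `Ridge` so that the route can be exercised in-process with Flask's test client.

```python
import io
import json

import pandas as pd
from flask import Flask, jsonify, request


def create_app(model):
    """Return a Flask app serving predictions from an already-loaded model.

    `create_app` is a hypothetical helper, not part of Flask itself.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def apicall():
        # Assumed payload: a JSON string wrapping df.to_json(orient='split')
        test_json = request.get_json(silent=True)
        if not isinstance(test_json, str):
            resp = jsonify(error="expected a JSON-encoded dataframe string")
            resp.status_code = 400
            return resp

        test = pd.read_json(io.StringIO(test_json), orient="split")
        predictions = model.predict(test)

        resp = jsonify(predictions=pd.Series(predictions).to_json())
        resp.status_code = 200
        return resp

    return app


class SumModel:
    """Stand-in for the pickled Ridge model: 'predicts' the row sums."""

    def predict(self, frame):
        return frame.sum(axis=1).to_numpy()


# Exercise the route in-process with Flask's test client
client = create_app(SumModel()).test_client()
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
ok = client.post("/predict", data=json.dumps(df.to_json(orient="split")),
                 content_type="application/json")
bad = client.post("/predict", data="not json", content_type="text/plain")
print(ok.status_code, bad.status_code)
```

In a real deployment the model would be unpickled once at start-up and handed to `create_app`, with the resulting `app` object exposed at module level so that a server such as gunicorn can import it.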

***
***

### Flask API

There are a few Python modules you need to install before creating Flask APIs.

- flask: to create the Web API
- gunicorn: to serve the Web API

We'll be deploying the API locally as well as on AWS. Deploying locally is pretty easy, while on AWS we have some setup to do and a few installations to perform.

As a rule of thumb, from here on each application/API that we create will have its own virtual environment, so that the local and deployment environments stay the same.

1. Create a virtual environment using the `conda` distribution.
    - `conda create --name flask_api python=2.7`
    - __NOTE:__ An environment file has already been provided; run `conda env create -f flask_api.yml` to set up the environment.
    
    
2. Below is the directory structure for the basic API:

![Struct](./images/flask1.png)

### Local deployment:

Run the command: `gunicorn -w <number of workers> --bind 0.0.0.0:8000 <python file name>:app`. Here `0.0.0.0:8000` is the IP address and port that the server binds to.

![local](./images/flasklocal.png)
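The same options can also be kept in a gunicorn configuration file, which is plain Python. The sketch below assumes a hypothetical file named `gunicorn.conf.py`; the worker count follows the `(2 x cores) + 1` rule of thumb from the gunicorn documentation and is only a starting point.

```python
# gunicorn.conf.py (hypothetical file name, passed with `gunicorn -c`)
import multiprocessing

# Address and port to listen on, equivalent to --bind on the command line
bind = "0.0.0.0:8000"

# Number of worker processes, equivalent to -w on the command line
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is restarted (gunicorn's default)
timeout = 30
```

It would then be used as `gunicorn -c gunicorn.conf.py <python file name>:app`.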

- Let's query the API using the `requests` module:

In [1]:
import json
import requests
import pandas as pd

In [25]:
header = {'Content-Type': 'application/json',
          'Accept': 'application/json'}

df = pd.read_csv('./data/X_test.csv')
#df = pd.DataFrame()
data = df.to_json(orient='split')
resp = requests.post("http://192.168.99.100:5000/predict",
                     data=json.dumps(data),
                     headers=header)

In [26]:
resp.status_code

200

In [27]:
resp.content

'{\n  "predictions": "{\\"0\\":23.1970246692,\\"1\\":28.7831062489,\\"2\\":12.0389356585,\\"3\\":20.7348465382,\\"4\\":19.9706031476,\\"5\\":20.1416154764,\\"6\\":21.8972532839,\\"7\\":19.1367463674,\\"8\\":19.6964533114,\\"9\\":4.9026668722,\\"10\\":15.224867093,\\"11\\":17.3341126078,\\"12\\":5.3119671669,\\"13\\":39.4253347312,\\"14\\":32.2486492312,\\"15\\":21.6364944381,\\"16\\":36.226486366,\\"17\\":31.1512965416,\\"18\\":23.4883343239,\\"19\\":25.044188137,\\"20\\":23.8016226456,\\"21\\":20.5146124596,\\"22\\":30.2995961069,\\"23\\":22.3245598983,\\"24\\":9.286593858,\\"25\\":17.8754575572,\\"26\\":19.3292449293,\\"27\\":35.3485825098,\\"28\\":20.4004817178,\\"29\\":17.6117918717,\\"30\\":18.0904605832,\\"31\\":19.4512519069,\\"32\\":23.3022112769,\\"33\\":28.7336101301,\\"34\\":19.82816689,\\"35\\":10.9718782534,\\"36\\":24.3825460699,\\"37\\":16.5364883863,\\"38\\":14.3757529879,\\"39\\":25.6562015566,\\"40\\":20.7218223048,\\"41\\":22.2365756404,\\"42\\":14.5979761327,\\"43\\
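The `predictions` field in this response is itself a JSON-encoded string, because the API serializes the `Series` with `to_json()` before handing it to `jsonify`. Getting the numbers back therefore takes two decoding passes. A minimal sketch, using a shortened stand-in for `resp.content` that contains only the first three values from the output above:

```python
import json

# Shortened stand-in for resp.content; values copied from the output above
content = '{\n  "predictions": "{\\"0\\":23.1970246692,\\"1\\":28.7831062489,\\"2\\":12.0389356585}"\n}'

outer = json.loads(content)               # first pass: the response envelope
inner = json.loads(outer["predictions"])  # second pass: the Series JSON string

# Keys are row positions stored as strings; sort numerically to restore row order
predictions = [inner[k] for k in sorted(inner, key=int)]
print(predictions)  # [23.1970246692, 28.7831062489, 12.0389356585]
```

With `requests`, calling `resp.json()` performs the first pass, leaving only the inner `json.loads` to do.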

Push the code to a GitHub repository and prepare for the next deployment, on AWS.

### AWS Deployment

- As mentioned in the slides, follow the steps and deploy the API to AWS
- To confirm whether the API works or not, run it with the code below:

In [43]:
import pandas as pd
import json
import requests

header = {'Content-Type': 'application/json',
          'Accept': 'application/json'}

df = pd.read_csv('./data/X_test.csv')
#df = pd.DataFrame()
data = df.to_json(orient='split')
resp = requests.post("http://54.85.55.6:5000/predict",
                     data=json.dumps(data),
                     headers=header)

In [44]:
resp.status_code

200

In [45]:
resp.content

'"{\\"predictions\\": \\"{\\\\\\"0\\\\\\":23.1970246692,\\\\\\"1\\\\\\":28.7831062489,\\\\\\"2\\\\\\":12.0389356585,\\\\\\"3\\\\\\":20.7348465382,\\\\\\"4\\\\\\":19.9706031476,\\\\\\"5\\\\\\":20.1416154764,\\\\\\"6\\\\\\":21.8972532839,\\\\\\"7\\\\\\":19.1367463674,\\\\\\"8\\\\\\":19.6964533114,\\\\\\"9\\\\\\":4.9026668722,\\\\\\"10\\\\\\":15.224867093,\\\\\\"11\\\\\\":17.3341126078,\\\\\\"12\\\\\\":5.3119671669,\\\\\\"13\\\\\\":39.4253347312,\\\\\\"14\\\\\\":32.2486492312,\\\\\\"15\\\\\\":21.6364944381,\\\\\\"16\\\\\\":36.226486366,\\\\\\"17\\\\\\":31.1512965416,\\\\\\"18\\\\\\":23.4883343239,\\\\\\"19\\\\\\":25.044188137,\\\\\\"20\\\\\\":23.8016226456,\\\\\\"21\\\\\\":20.5146124596,\\\\\\"22\\\\\\":30.2995961069,\\\\\\"23\\\\\\":22.3245598983,\\\\\\"24\\\\\\":9.286593858,\\\\\\"25\\\\\\":17.8754575572,\\\\\\"26\\\\\\":19.3292449293,\\\\\\"27\\\\\\":35.3485825098,\\\\\\"28\\\\\\":20.4004817178,\\\\\\"29\\\\\\":17.6117918717,\\\\\\"30\\\\\\":18.0904605832,\\\\\\"31\\\\\\":19.4512519069