# Deploying ML with Flask

In this walkthrough, we are going to deploy our first machine learning model using `Flask`. For now, we will work locally in our own environments. This activity will also give us practice creating pipelines.

## Part 1: Model Creation

Import packages

```python
# common
import pandas as pd
from sklearn.datasets import load_wine

# pipeline and hyperparameter tuning
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, FeatureUnion

# preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

# model
from sklearn.ensemble import RandomForestClassifier

# model persistence
import pickle
```

Load dataset

```python
data = load_wine()
x = pd.DataFrame(data.data)
x.columns = data.feature_names
y = data.target
```

```python
x.head()
```

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
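
As an alternative, scikit-learn (0.23 and later) can return the features as a DataFrame directly with `as_frame=True`, which makes the manual column assignment above unnecessary. A minimal sketch:

```python
from sklearn.datasets import load_wine

# as_frame=True returns a pandas DataFrame (features) and a Series (target)
X, y = load_wine(return_X_y=True, as_frame=True)

print(X.shape)                        # (178, 13)
print(y.value_counts().sort_index())  # number of samples in each class
```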


During model creation, we will work on the following:
- filter columns for `PCA`
- scaling
- `PCA`
- `SelectKBest`
- `RandomForestClassifier`
- pipeline

### Filter Columns

First, create a class that keeps only the features we want, since we don't want to run PCA on all of them. A class can be used as a pipeline step as long as it implements the following methods:

- `.fit()`
- `.transform()`
- `.fit_transform()`
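
In practice, the simplest way to satisfy this contract is to inherit from scikit-learn's `BaseEstimator` and `TransformerMixin`: the mixin derives `fit_transform()` from `fit()` and `transform()`, and `BaseEstimator` supplies the `get_params()`/`set_params()` methods that `GridSearchCV` relies on when it clones the pipeline for each candidate. The sketch below uses a hypothetical `ColumnSelector` on a toy DataFrame to show both behaviours.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical example: keep only the listed DataFrame columns."""

    def __init__(self, feats):
        self.feats = feats  # store constructor args unchanged so get_params() works

    def fit(self, x, y=None):
        return self  # nothing to learn; returning self is the scikit-learn convention

    def transform(self, x):
        return x[self.feats]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
selector = ColumnSelector(["a", "c"])

out = selector.fit_transform(df)  # fit_transform() is provided by TransformerMixin
cloned = clone(selector)          # GridSearchCV clones estimators like this
print(list(out.columns))          # ['a', 'c']
print(cloned.get_params())        # {'feats': ['a', 'c']}
```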

#### `RowFeats()`

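```python
from sklearn.base import BaseEstimator, TransformerMixin

# class must be pipeline ready: BaseEstimator adds get_params()/set_params(),
# which GridSearchCV uses when it clones the pipeline
class RowFeats(BaseEstimator, TransformerMixin):
    def __init__(self, feats):
        self.feats = feats  # list of column names to keep

    def fit(self, x, y=None):
        # nothing to learn; scikit-learn expects fit() to return self
        return self

    def transform(self, x, y=None):
        # keep only the selected columns
        return x[self.feats]

    def fit_transform(self, x, y=None):
        return self.fit(x).transform(x)
```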

```python
# features we want to keep for PCA
feats = ['alcohol',
         'malic_acid',
         'ash',
         'alcalinity_of_ash',
         'magnesium',
         'total_phenols',
         'flavanoids',
         'nonflavanoid_phenols']

# creating class object with features to keep
row_feats = RowFeats(feats)
```

### Scaling and PCA

```python
scaler = StandardScaler()
pca = PCA(n_components=2)
```

### SelectKBest

```python
kbest = SelectKBest(k=4)
```
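
`SelectKBest` scores every feature with a univariate test and keeps the `k` highest-scoring ones; when no `score_func` is passed, it uses `f_classif` (the ANOVA F-value). To see which wine features survive for a given `k`, you can fit it on its own and inspect `get_support()`. A small sketch:

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest

data = load_wine()
x = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

kbest = SelectKBest(k=4).fit(x, y)  # default score_func is f_classif

# get_support() returns a boolean mask over the input columns
selected = x.columns[kbest.get_support()].tolist()
print(selected)
print(kbest.scores_.round(1))  # F-value of each of the 13 features
```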

### Random Forest

```python
rf = RandomForestClassifier()
```

### Build the Pipeline

We will apply two different feature extraction/reduction techniques...

- `PCA`
- `SelectKBest`

... and combine them with `FeatureUnion`. The one difference between the two branches is that `PCA` only sees a subset of the features (those listed in `feats`), while `SelectKBest` chooses from all of them.

```python
pca_pipeline = Pipeline([
    ('rowFeats', row_feats),
    ('scaler', scaler),
    ('pca', pca)
])

kbest_pipeline = Pipeline([
    ('kBest', kbest)
])
```

```python
all_features = FeatureUnion([
    ('pcaPipeline', pca_pipeline),
    ('kBestPipeline', kbest_pipeline)
])
```
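
`FeatureUnion` fits each branch on the same input and concatenates their outputs column-wise, so with `n_components=2` and `k=4` the model receives 2 + 4 = 6 features. The sketch below checks that width. To keep it self-contained, it swaps the custom `RowFeats` step for scikit-learn's built-in `ColumnTransformer`, which restricts the scaler and PCA to the listed columns; this is a substitute technique, not the walkthrough's own code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

data = load_wine()
x = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

feats = ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash',
         'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols']

# PCA branch: ColumnTransformer applies the inner pipeline to `feats` only
# (remainder='drop' is the default, so the other columns are discarded)
pca_branch = ColumnTransformer([
    ('pca', Pipeline([('scaler', StandardScaler()),
                      ('pca', PCA(n_components=2))]), feats)
])

all_features = FeatureUnion([
    ('pcaPipeline', pca_branch),
    ('kBestPipeline', SelectKBest(k=4)),
])

transformed = all_features.fit_transform(x, y)
print(transformed.shape)  # (178, 6): 2 PCA components + 4 selected features
```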

Now create the **main** pipeline, which ends with the model.

```python
main_pipeline = Pipeline([
    ('features', all_features),
    ('rf', rf)
])
```

### Hyperparameter Tuning with `GridSearchCV`

```python
# define parameter grid
param_grid = {'features__pcaPipeline__pca__n_components': [1, 2, 3],
              'features__kBestPipeline__kBest__k': [1, 2, 3],
              'rf__n_estimators': [2, 5, 10],
              'rf__max_depth': [2, 4, 6]}

# create a Grid Search object
grid_search = GridSearchCV(
    main_pipeline,
    param_grid,
    n_jobs=-1,
    verbose=10,
    refit=True
)
```
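
Each key in `param_grid` is a path through the nested estimators: step names joined by double underscores, ending in the parameter name. For example, `features__kBestPipeline__kBest__k` means step `features`, then transformer `kBestPipeline`, then step `kBest`, then parameter `k`. One way to check a key before running a long search is to look it up in `get_params()` or apply it with `set_params()`. A sketch on a cut-down pipeline built only for illustration:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline

# Reduced version of main_pipeline (PCA branch omitted for brevity)
pipe = Pipeline([
    ('features', FeatureUnion([
        ('kBestPipeline', Pipeline([('kBest', SelectKBest(k=4))])),
    ])),
    ('rf', RandomForestClassifier()),
])

key = 'features__kBestPipeline__kBest__k'
print(key in pipe.get_params())  # True: the path resolves

# GridSearchCV applies each candidate to a clone of the pipeline in this way
pipe.set_params(**{key: 3, 'rf__max_depth': 6})
print(pipe.get_params()[key])    # 3
```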

```python
# fit the model and tune hyperparameters
grid_search.fit(x, y)
```

```
Fitting 5 folds for each of 81 candidates, totalling 405 fits
```
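
The numbers in that log follow from the grid: four parameters with three values each give 3 × 3 × 3 × 3 = 81 candidates, and since no `cv` argument was passed, `GridSearchCV` uses its default 5-fold cross-validation (the default since scikit-learn 0.22), so 81 × 5 = 405 fits. `ParameterGrid` enumerates the same combinations:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'features__pcaPipeline__pca__n_components': [1, 2, 3],
              'features__kBestPipeline__kBest__k': [1, 2, 3],
              'rf__n_estimators': [2, 5, 10],
              'rf__max_depth': [2, 4, 6]}

n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # GridSearchCV default when cv is not given
print(n_candidates, n_candidates * n_folds)  # 81 405
```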



```python
print(grid_search.best_params_)
```

```
{'features__kBestPipeline__kBest__k': 3, 'features__pcaPipeline__pca__n_components': 2, 'rf__max_depth': 6, 'rf__n_estimators': 5}
```
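
Because the search was created with `refit=True`, it retrains the best candidate on the full dataset and exposes it as `best_estimator_`; calling `predict()` or `predict_proba()` on the search object delegates to that model, which is why the whole `GridSearchCV` object can be pickled and served. A self-contained sketch, using a small illustrative grid over a plain random forest so it runs quickly:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

x, y = load_wine(return_X_y=True)

# Reduced grid for illustration only; the walkthrough searches the full pipeline
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {'n_estimators': [5, 10], 'max_depth': [2, 6]},
                      refit=True)
search.fit(x, y)

print(search.best_params_)
print(round(search.best_score_, 3))  # mean cross-validated accuracy of the best candidate
proba = search.predict_proba(x[:1])  # delegates to search.best_estimator_
print(proba.shape)                   # (1, 3): one row, one probability per class
```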


### Save the Model using `pickle`

```python
# use a context manager so the file is closed after writing
with open('model.pk', 'wb') as f:
    pickle.dump(grid_search, f)
```
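
Before relying on `model.pk` elsewhere, it is worth checking the round trip: load the file back and confirm it predicts the same thing. Keep in mind that pickle stores custom classes by reference (module and name), so any process that loads this model, including the Flask API, must define or import `RowFeats` first. A sketch, assuming a simple stand-in model and a temporary file path instead of the real `model.pk`:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

x, y = load_wine(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(x, y)

path = os.path.join(tempfile.mkdtemp(), 'model.pk')

# Context managers make sure both file handles are closed
with open(path, 'wb') as f:
    pickle.dump(model, f)
with open(path, 'rb') as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict_proba(x), restored.predict_proba(x))
print(same)  # True
```

scikit-learn warns when a pickled estimator is loaded with a different version than it was saved with, so it is safest to install the same version wherever the model is loaded.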

## Part 2: API Creation

Our goal is to build an API that receives a wine's measurements and classifies the wine into one of the dataset's three classes. The API is written in `api.py`.

> We don't need to retrain the model in the cloud. Instead, the API loads the pickle file of the model we trained on our local machine.

## Part 3: POST Requests

```python
# JSON values are from the first set of observations from the data
json_data = {
    'alcohol': 14.23,
    'malic_acid': 1.71,
    'ash': 2.43,
    'alcalinity_of_ash': 15.6,
    'magnesium': 127.0,
    'total_phenols': 2.8,
    'flavanoids': 3.06,
    'nonflavanoid_phenols': 0.28,
    'proanthocyanins': 2.29,
    'color_intensity': 5.64,
    'hue': 1.04,
    'od280/od315_of_diluted_wines': 3.92,
    'proline': 1065.0}
```

```python
# one-row DataFrame: each key becomes a column
pd.DataFrame([json_data])
```

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |


```python
import requests
url = 'http://ec2-34-224-81-164.compute-1.amazonaws.com:5555/scoring'

# send a POST request with the observation as the JSON body and save the response object
r = requests.post(url=url,
                  json=json_data)
```

```python
print(r.json())
```

```
[[0.9889824561403507, 0.011017543859649124, 0.0]]
```
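
The response holds one row of class probabilities, ordered by the classes the model was trained on (0, 1, 2 for `load_wine`). Taking the index of the largest probability recovers the predicted class; here that is class 0, which matches the label of the first observation in the dataset. A short sketch using the response printed above:

```python
import numpy as np

# Response body printed above
response = [[0.9889824561403507, 0.011017543859649124, 0.0]]

proba = np.array(response)
predicted_class = int(proba.argmax(axis=1)[0])  # index of the highest probability
print(predicted_class)  # 0
```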
