## Implementing a predictive model as a web application

Here I install Keras with pip and import the packages needed for modelling. Since I'm using Google Colab, I use files.upload() from google.colab to upload the diamonds CSV, then read it into a dataframe, diamonds.

In [None]:
pip install keras



In [None]:
## Imports
import numpy as np
import pandas as pd
import os
from keras.models import Sequential
from keras.layers import Dense
import joblib

In [None]:
from google.colab import files
uploaded = files.upload()
diamonds = pd.read_csv("diamonds.csv")

Saving diamonds.csv to diamonds.csv


Here I follow the data cleaning and feature engineering described in the textbook. The main feature engineering steps are one-hot encoding the categorical features (cut, color, clarity) with get_dummies() and reducing the three highly collinear dimension features (x, y, z) to a single component, dim_index, with PCA.

In [None]:
## Preparing the dataset
diamonds = diamonds.loc[(diamonds['x']>0) | (diamonds['y']>0)]
diamonds.loc[11182, 'x'] = diamonds['x'].median()
diamonds.loc[11182, 'z'] = diamonds['z'].median()
diamonds = diamonds.loc[~((diamonds['y'] > 30) | (diamonds['z'] \
> 30))]
diamonds = pd.concat([diamonds, pd.get_dummies(diamonds['cut'], \
prefix='cut', drop_first=True)], axis=1)
diamonds = pd.concat([diamonds,
pd.get_dummies(diamonds['color'], prefix='color', \
drop_first=True)], axis=1)
diamonds = pd.concat([diamonds, \
pd.get_dummies(diamonds['clarity'], prefix='clarity',
drop_first=True)], axis=1)

#Dimensionality reduction
from sklearn.decomposition import PCA
pca = PCA(n_components=1, random_state=123)
diamonds['dim_index'] = \
pca.fit_transform(diamonds[['x','y','z']])
diamonds.drop(['x','y','z'], axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(loc, value)
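To make the two feature engineering steps above more concrete, here is a minimal sketch on made-up toy data (not rows from diamonds.csv). It shows what drop_first=True does to the dummy columns, and it uses a plain NumPy SVD as a stand-in for sklearn's PCA to show how three nearly collinear columns collapse into one component. The toy values and variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration, not taken from diamonds.csv)
toy = pd.DataFrame({
    "cut": ["Fair", "Good", "Ideal", "Good"],
    "x": [4.0, 5.0, 6.0, 7.0],
    "y": [4.1, 5.0, 6.1, 7.0],
    "z": [2.5, 3.1, 3.7, 4.3],
})

# drop_first=True drops the alphabetically first level ("Fair"),
# so a Fair diamond is encoded as all zeros in the remaining columns.
dummies = pd.get_dummies(toy["cut"], prefix="cut", drop_first=True)
print(list(dummies.columns))  # ['cut_Good', 'cut_Ideal']

# First principal component via SVD of the centered data
# (NumPy stand-in for PCA(n_components=1), used here to avoid extra dependencies).
dims = toy[["x", "y", "z"]].to_numpy()
centered = dims - dims.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
dim_index = centered @ components[0]

# Share of the total variance captured by the first component
explained = singular_values[0] ** 2 / np.sum(singular_values ** 2)
print(dim_index.shape, round(float(explained), 3))
```

Because "Fair" is the dropped baseline, the prediction function later in this notebook has to treat it as the all-zeros case.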


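I create my predictors and target. X contains all the features in the dataset except price and the non-encoded versions of the categorical features. y is the target: the natural log of price. The numerical features are then standardized with StandardScaler, which centers each one at mean 0 and scales it to unit variance (the results are not bounded to a fixed range such as -1 to 1).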

In [None]:
## Creating x and y
X = diamonds.drop(['cut', 'color', 'clarity', 'price'], axis=1)
y = np.log(diamonds['price'])

## Standardization: centering and scaling
numerical_features = ['carat', 'depth', 'table', 'dim_index']
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X.loc[:, numerical_features] = \
scaler.fit_transform(X[numerical_features])
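As a quick check of what standardization does, the sketch below applies the same formula that StandardScaler uses by default, (x - mean) / std with the population standard deviation, to a few made-up carat values. The numbers are assumptions chosen only to include one outlier.

```python
import numpy as np

# Made-up carat values, including one large outlier
carat = np.array([0.3, 0.5, 0.7, 1.0, 4.0])

# Same formula StandardScaler applies by default: (x - mean) / std,
# where std is the population standard deviation (ddof=0)
standardized = (carat - carat.mean()) / carat.std()

# Mean is (numerically) 0 and standard deviation is 1 after the transform
print(round(float(standardized.mean()), 6), round(float(standardized.std()), 6))

# The outlier ends up well above 1, so the values are not confined to [-1, 1]
print(standardized.max() > 1)  # True
```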



Below, the neural network model is constructed: three hidden layers with 32, 16, and 8 units and ReLU activations, followed by a single linear output unit. This is a straightforward model with no hyperparameter tuning. Tuning options are described in chapter 6, including early stopping as a callback, dropout at each layer, and changing the number of units, layers, and epochs.

In [None]:
## Building the model
n_input = X.shape[1]
n_hidden1 = 32
n_hidden2 = 16
n_hidden3 = 8

nn_reg = Sequential()
nn_reg.add(Dense(units=n_hidden1, activation='relu',
input_shape=(n_input,)))
nn_reg.add(Dense(units=n_hidden2, activation='relu'))
nn_reg.add(Dense(units=n_hidden3, activation='relu'))

# output layer
nn_reg.add(Dense(units=1, activation=None))
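To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes n_input = 21, which matches the 21 feature columns listed in the get_prediction function later in this notebook; if your X has a different number of columns, the first layer's count changes accordingly.

```python
# Layer widths: input features, three hidden layers, one output unit.
# n_input = 21 is an assumption based on the 21 feature columns used in this notebook.
layer_sizes = [21, 32, 16, 8, 1]

# A Dense layer with n_in inputs and n_out units has
# n_in * n_out weights plus n_out biases.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [704, 528, 136, 9]
print(total_params)      # 1377
```

Under that assumption, these are the figures nn_reg.summary() should report for the model.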

Next, the neural network is compiled with mean absolute error as the loss and Adam as the optimizer. It is trained on the predictors, X, and the target, y (the log of price), for 40 epochs with a batch size of 32.

In [None]:
## Training the neural network
batch_size = 32
n_epochs = 40
nn_reg.compile(loss='mean_absolute_error', optimizer='adam')
nn_reg.fit(X, y, epochs=n_epochs, batch_size=batch_size)

Epoch 1/40
...
Epoch 40/40


<keras.callbacks.History at 0x7f11d538fd10>
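Because the target is the log of price, the mean absolute error reported during training is measured in log units rather than in currency. The sketch below, using made-up prices and predictions, shows one way to read such a number: exponentiating the log-scale MAE gives the geometric-mean ratio between predictions and true prices. All values here are assumptions for illustration.

```python
import numpy as np

# Made-up prices and predictions, for illustration only
true_price = np.array([500.0, 1500.0, 4000.0, 12000.0])
pred_price = np.array([550.0, 1400.0, 4400.0, 11000.0])

# The network is trained on log-prices, so the loss lives in log space
log_mae = np.mean(np.abs(np.log(true_price) - np.log(pred_price)))

# exp(log_mae) is the geometric-mean factor by which predictions miss the truth
typical_factor = np.exp(log_mae)
print(round(float(log_mae), 4), round(float(typical_factor), 4))
```

A factor of about 1.09, for example, would mean predictions are typically off by roughly 9% in either direction.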

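While trying to get joblib to work in Google Colab, I installed the albumentations package following a suggestion from Stack Overflow. Note that albumentations is an image augmentation library and is unrelated to joblib or to saving files in Colab, so this step is most likely unnecessary.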

In [None]:
!pip install -q -U albumentations
!echo "$(pip freeze | grep albumentations) is successfully installed"

albumentations==1.1.0 is successfully installed


In [None]:
import albumentations as A

Below, joblib is used to save the fitted PCA and scaler objects to files so they can be reloaded later. The trained model is saved to an HDF5 file with the Keras save() method.

In [None]:
## Serializing :
# PCA
joblib.dump(pca, 'pca.joblib')

# Scaler
joblib.dump(scaler, 'scaler.joblib')

#Trained model
nn_reg.save("diamonds-prices-model.h5")
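The dump/load pattern above can be checked with a small round trip. The sketch below saves a stand-in object (a dictionary of made-up scaling parameters, not the real fitted scaler) to a temporary directory and loads it back; the file name and values are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np

# Stand-in for a fitted preprocessing object (hypothetical values)
fitted_params = {"mean": np.array([0.8, 61.7]), "scale": np.array([0.47, 1.43])}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.joblib")
    joblib.dump(fitted_params, path)   # write the object to disk
    restored = joblib.load(path)       # read it back, as the web app would

# The restored object holds exactly the same arrays
same = all(np.array_equal(fitted_params[k], restored[k]) for k in fitted_params)
print(same)  # True
```

The point of persisting the fitted objects is that the web application applies exactly the same transformation that was learned from the training data.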


I found that a regular 'pip install dash' didn't work properly in Google Colab, but installing jupyter-dash got Dash to work.

In [None]:
pip install jupyter-dash -q

  Building wheel for dash-core-components (setup.py) ... done
  Building wheel for dash-html-components (setup.py) ... done
  Building wheel for dash-table (setup.py) ... done


Below, more packages are imported, including the Dash core components and HTML components, which are used to build the page displayed by the Dash application. Keras load_model will be used later to load the model file saved earlier with nn_reg.save().

In [None]:
import dash
from jupyter_dash import JupyterDash

# dcc and html now ship with dash itself; the standalone
# dash_core_components / dash_html_components packages are deprecated
from dash import dcc, html
from dash.dependencies import Input, Output

from keras.models import load_model
import joblib

import numpy as np
import pandas as pd


There is a slight difference from the textbook here: instead of "app = Dash(__name__)", the app is created with the JupyterDash class from the jupyter-dash package. I think it works the same as Dash, but is compatible with Google Colab.

In [None]:
# Pass the stylesheet at construction time; app.css.append_css is deprecated
app = JupyterDash(
    __name__,
    external_stylesheets=['https://codepen.io/chriddyp/pen/bWLwgP.css']
)

Here I load the model file and the fitted PCA and scaler objects that were saved earlier.

In [None]:
model = load_model('diamonds-prices-model.h5')
pca = joblib.load('pca.joblib')
scaler = joblib.load('scaler.joblib')
## We have to do this due to some keras' issue
#model._make_predict_function()

Here I create some Content Division Elements as described in the textbook. Each feature gets a dropdown or a number entry box where the features of a diamond can be entered and a predicted price will be displayed using the model. The following code cell is for the number entry boxes.

In [None]:
## Div for carat
input_carat = dcc.Input(
    id='carat',
    type='number',
    value=0.7)

div_carat = html.Div(
    children=[html.H3('Carat:'), input_carat],
    className="four columns"
    )

## Div for depth
input_depth = dcc.Input(
    id='depth',
    placeholder='',
    type='number',
    value=60)

div_depth = html.Div(
    children=[html.H3('Depth:'), input_depth],
    className="four columns"
    )

## Div for table
input_table = dcc.Input(
    id='table',
    placeholder='',
    type='number',
    value=60)

div_table = html.Div(
    children=[html.H3('Table:'), input_table],
    className="four columns"
    )

## Div for x
input_x = dcc.Input(
    id='x',
    placeholder='',
    type='number',
    value=5)

div_x = html.Div(
    children=[html.H3('x value:'), input_x],
    className="four columns"
)

## Div for y
input_y = dcc.Input(
    id='y',
    placeholder='',
    type='number',
    value=5)

div_y = html.Div(
    children=[html.H3('y value:'), input_y],
    className="four columns"
    )

## Div for Z
input_z = dcc.Input(
    id='z',
    placeholder='',
    type='number',
    value=3)

div_z = html.Div(
    children=[html.H3('z value:'), input_z],
    className="four columns"
    )

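Next, Content Division Elements are created for the categorical feature dropdowns. A final cell then groups all the elements into three rows: numerical characteristics, dimensions, and categorical features.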

In [None]:
## Div for cut
cut_values = ['Fair', 'Good', 'Ideal', 'Premium', 'Very Good']
cut_options = [{'label': x, 'value': x} for x in cut_values]
input_cut = dcc.Dropdown(
    id='cut',
    options = cut_options,
    value = 'Ideal'
    )

div_cut = html.Div(
    children=[html.H3('Cut:'), input_cut],
    className="four columns"
    )

#Div for color
color_values = ['D', 'E', 'F', 'G', 'H', 'I', 'J']
color_options = [{'label': x, 'value': x} for x in color_values]
input_color = dcc.Dropdown(
    id='color',
    options = color_options,
    value = 'G'
)

div_color = html.Div(
    children=[html.H3('Color:'), input_color],
    className="four columns"
)

#Div for clarity
clarity_values = ['I1', 'IF', 'SI1', 'SI2', 'VS1', 'VS2', 'VVS1', 'VVS2']
clarity_options = [{'label': x, 'value': x} for x in clarity_values]
input_clarity = dcc.Dropdown(
    id='clarity',
    options = clarity_options,
    value = 'SI1'
    )

div_clarity = html.Div(
    children=[html.H3('Clarity:'), input_clarity],
    className="four columns"
    )

In [None]:
## Div for numerical characteristics
div_numerical = html.Div(
    children = [div_carat, div_depth, div_table],
    className="row"
    )

## Div for dimensions
div_dimensions = html.Div(
    children = [div_x, div_y, div_z],
    className="row"
    ) 

## Div for categorical features
div_categorical = html.Div(
    children = [div_cut, div_color, div_clarity],
    className = "row"
    )

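Below, a function, get_prediction, is created. It takes the feature values of a diamond, applies the same PCA, one-hot encoding, and scaling used in training, and returns the price predicted by the neural network (converted back from log-price).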

In [None]:
def get_prediction(carat, depth, table, x, y, z, cut, color, clarity):
    '''takes the inputs from the user and produces the price prediction'''
    
    cols = ['carat', 'depth', 'table',
            'cut_Good', 'cut_Ideal', 'cut_Premium', 'cut_Very Good',
            'color_E', 'color_F', 'color_G', 'color_H', 'color_I', 'color_J',
            'clarity_IF','clarity_SI1', 'clarity_SI2', 'clarity_VS1', 'clarity_VS2','clarity_VVS1', 'clarity_VVS2',
            'dim_index']

    cut_dict = {x: 'cut_' + x for x in cut_values[1:]}
    color_dict = {x: 'color_' + x for x in color_values[1:]}
    clarity_dict = {x: 'clarity_' + x for x in clarity_values[1:]}
    
    ## produce a dataframe with a single row of zeros
    df = pd.DataFrame(data = np.zeros((1,len(cols))), columns = cols)
    
    ## get the numeric characteristics
    df.loc[0,'carat'] = carat
    df.loc[0,'depth'] = depth
    df.loc[0,'table'] = table
    
    ## transform dimensions into a single dim_index using PCA
    dims_df = pd.DataFrame(data=[[x, y, z]], columns=['x','y','z'])
    df.loc[0,'dim_index'] = pca.transform(dims_df).flatten()[0]
    
    ## Use the one-hot encoding for the categorical features
    if cut!='Fair':
        df.loc[0, cut_dict[cut]] = 1
    
    if color!='D':
        df.loc[0, color_dict[color]] = 1
    
    if clarity != 'I1':
        df.loc[0, clarity_dict[clarity]] = 1
    
    ## Scale the numerical features using the trained scaler
    numerical_features = ['carat', 'depth', 'table', 'dim_index']
    df.loc[:,numerical_features] = scaler.transform(df.loc[:,numerical_features])
    
    ## Get the predictions using our trained neural network
    prediction = model.predict(df.values).flatten()[0]
    
    ## Transform the log-prices to prices
    prediction = np.exp(prediction)
   
    return int(prediction)
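The one-hot step inside get_prediction relies on the baseline level of each categorical feature being represented by all zeros, since drop_first=True removed that column during training. The sketch below isolates that logic for the cut feature; encode_cut is a hypothetical helper written only for this illustration and is not part of the application code.

```python
import numpy as np
import pandas as pd

# Levels in alphabetical order; the first one is the dropped baseline
cut_values = ['Fair', 'Good', 'Ideal', 'Premium', 'Very Good']

def encode_cut(cut):
    """Hypothetical helper: one-row frame of cut dummies for a single diamond."""
    cols = ['cut_' + level for level in cut_values[1:]]
    row = pd.DataFrame(np.zeros((1, len(cols))), columns=cols)
    if cut != cut_values[0]:      # the baseline level stays all zeros
        row.loc[0, 'cut_' + cut] = 1
    return row

print(encode_cut('Ideal').values.tolist())  # [[0.0, 1.0, 0.0, 0.0]]
print(encode_cut('Fair').values.tolist())   # [[0.0, 0.0, 0.0, 0.0]]
```

The column order matters as well: the frame passed to the model has to list its columns in the same order as the X used for training.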

Here the layout of the Dash application is set up.

In [None]:
## App layout
app.layout = html.Div([
        html.H1('IDR Predict diamond prices'),
        
        html.H2('Enter the diamond characteristics to get the predicted price'),
        
        html.Div(
                children=[div_numerical, div_dimensions, div_categorical]
                ),
        html.H1(id='output',
                style={'margin-top': '50px', 'text-align': 'center'})
        ])

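Here the callback function show_prediction is defined. Whenever one of the inputs changes, Dash calls it with the current values; it passes them to get_prediction, stores the result in pred, and returns it as a formatted string that is displayed in the 'output' element.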

In [None]:
predictors = ['carat', 'depth', 'table', 'x', 'y', 'z', 'cut', 'color', 'clarity']
@app.callback(
        Output('output', 'children'),
        [Input(x, 'value') for x in predictors])
def show_prediction(carat, depth, table, x, y, z, cut, color, clarity): 
    pred = get_prediction(carat, depth, table, x, y, z, cut, color, clarity)
    return str("Predicted Price: {:,}".format(pred))
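The string formatting used in the callback can be tried on its own. The sketch below also adds a guard for empty fields, on the assumption that Dash passes None to the callback when a number input is cleared; format_prediction and all_inputs_present are hypothetical helpers, not part of the textbook code.

```python
def format_prediction(pred):
    """Hypothetical helper: build the text shown in the output element."""
    if pred is None:
        return "Enter all diamond characteristics to get a prediction"
    # {:,} inserts thousands separators
    return "Predicted Price: {:,}".format(pred)

def all_inputs_present(*values):
    """Return True only if no input field is empty (None)."""
    return all(v is not None for v in values)

print(format_prediction(2757))             # Predicted Price: 2,757
print(format_prediction(None))             # Enter all diamond characteristics to get a prediction
print(all_inputs_present(0.7, 60, None))   # False
```

A callback could call all_inputs_present first and skip get_prediction when it returns False, which avoids an error while the user is still typing.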


The code below starts the server and the Dash app.

In [None]:
if __name__ == '__main__':
  app.run_server(debug=True)

Dash app running on:


If you added this file with `app.scripts.append_script` or `app.css.append_css`, use `external_scripts` or `external_stylesheets` instead.
See https://dash.plotly.com/external-resources
  ).format(s["external_url"])


<IPython.core.display.Javascript object>