# Regression Deep Learning Model for [Project Name] Using Keras Version 1
### David Lowe
### October 25, 2019
Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery. [https://machinelearningmastery.com/]

SUMMARY: [Sample Paragraph - The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Boston Housing Dataset poses a regression problem in which we are trying to predict the value of a continuous variable.]

INTRODUCTION: [Sample Paragraph - The purpose of the analysis is to predict the housing values in the suburbs of Boston by using the home sale transaction history.]

ANALYSIS: [Sample Paragraph - The baseline performance of the model achieved an average MSE score of 16.91 under 10-fold cross-validation. Using the same training parameters, the model processed the test dataset with an MSE of 15.93, which was even better than the cross-validated results from the training data.]

CONCLUSION: [Sample Paragraph - For this dataset, the model built using Keras and TensorFlow achieved a satisfactory result and should be considered for future modeling activities.]

Dataset Used: [Boston Housing Price Data Set]

Dataset ML Model: Regression with numerical attributes

Dataset Reference: [https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.data]

One potential source of performance benchmarks: [https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/]

Any deep-learning modeling project can generally be broken down into six major tasks:
0. Prepare Environment
1. Load Data
2. Define Model
3. Fit and Evaluate Model
4. Optimize Model
5. Finalize Model

# Section 0. Prepare Environment

In [1]:
# Create the random seed numbers for reproducible results
seedNum = 88

In [2]:
# Load libraries and packages
import random
random.seed(seedNum)
import numpy as np
np.random.seed(seedNum)
import pandas as pd
import os
import sys
import smtplib
from datetime import datetime
from email.message import EmailMessage
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
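
The seeding calls above exist so that repeated runs draw the same pseudo-random numbers. As a minimal sketch of that idea using only the standard library (the helper name `draw_samples` is hypothetical and not part of the template):

```python
import random

def draw_samples(seed, count=5):
    """Return `count` pseudo-random floats drawn from a freshly seeded generator."""
    rng = random.Random(seed)  # independent generator, leaves the global state untouched
    return [rng.random() for _ in range(count)]

first_run = draw_samples(88)
second_run = draw_samples(88)
other_seed = draw_samples(89)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # a different seed gives a different sequence
```

Seeding alone may not make a deep-learning run fully deterministic, which is likely why the template also restricts TensorFlow to a single thread in the next cell.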

In [3]:
# Configure a new global `tensorflow` session
import keras as K
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
import tensorflow as tf
tf.set_random_seed(seedNum)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.backend.set_session(sess)

Using TensorFlow backend.


In [4]:
# Begin the timer for the script processing
startTimeScript = datetime.now()

# Set up the flag to stop sending progress emails (setting to True will send status emails!)
notifyStatus = False

# Set the flag for splitting the dataset
splitDataset = True
splitPercentage = 0.25

# Set various default Keras modeling parameters
default_kernel_init = K.initializers.glorot_uniform(seed=seedNum)
default_loss = 'mean_squared_error'
default_optimizer = 'adam'
default_epochs = 100
default_batches = 5

# Set the number of folds for cross validation
folds = 10

In [5]:
# Set up the email notification function
def email_notify(msg_text):
    sender = os.environ.get('MAIL_SENDER')
    receiver = os.environ.get('MAIL_RECEIVER')
    gateway = os.environ.get('SMTP_GATEWAY')
    smtpuser = os.environ.get('SMTP_USERNAME')
    password = os.environ.get('SMTP_PASSWORD')
    # Abort when any of the required mail settings is missing
    if any(value is None for value in (sender, receiver, gateway, smtpuser, password)):
        sys.exit("Incomplete email setup info. Script Processing Aborted!!!")
    msg = EmailMessage()
    msg.set_content(msg_text)
    msg['Subject'] = 'Notification from Keras Regression Script'
    msg['From'] = sender
    msg['To'] = receiver
    server = smtplib.SMTP(gateway, 587)
    server.starttls()
    server.login(smtpuser, password)
    server.send_message(msg)
    server.quit()
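
Before turning on `notifyStatus`, it can help to check which mail settings are missing without sending anything. The sketch below is one possible way to do that; the helper `missing_mail_settings` and the sample values are hypothetical, and unlike the function above it also treats empty strings as missing.

```python
import os

# Environment variable names read by email_notify
REQUIRED_MAIL_VARS = ('MAIL_SENDER', 'MAIL_RECEIVER', 'SMTP_GATEWAY',
                      'SMTP_USERNAME', 'SMTP_PASSWORD')

def missing_mail_settings(environ=os.environ):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_MAIL_VARS if not environ.get(name)]

# Example with a hypothetical, partially filled configuration
sample_env = {'MAIL_SENDER': 'sender@example.com', 'SMTP_GATEWAY': 'smtp.example.com'}
print(missing_mail_settings(sample_env))
```

Calling `missing_mail_settings()` with no argument inspects the real process environment.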

In [6]:
if (notifyStatus): email_notify("Phase 0 Prepare Environment completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 1. Load Data

In [7]:
if (notifyStatus): email_notify("Phase 1 Load Data has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [8]:
# Load the dataset
df_original = pd.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = df_original.values
print(dataset)

[[6.3200e-03 1.8000e+01 2.3100e+00 ... 3.9690e+02 4.9800e+00 2.4000e+01]
 [2.7310e-02 0.0000e+00 7.0700e+00 ... 3.9690e+02 9.1400e+00 2.1600e+01]
 [2.7290e-02 0.0000e+00 7.0700e+00 ... 3.9283e+02 4.0300e+00 3.4700e+01]
 ...
 [6.0760e-02 0.0000e+00 1.1930e+01 ... 3.9690e+02 5.6400e+00 2.3900e+01]
 [1.0959e-01 0.0000e+00 1.1930e+01 ... 3.9345e+02 6.4800e+00 2.2000e+01]
 [4.7410e-02 0.0000e+00 1.1930e+01 ... 3.9690e+02 7.8800e+00 1.1900e+01]]


In [9]:
# Split the original dataset into input (X) and output (y) variables
X_original = dataset[:,0:13]
y_original = dataset[:,13]
print('Shape of X_original:', X_original.shape, '| Shape of y_original:', y_original.shape)

Shape of X_original: (506, 13) | Shape of y_original: (506,)


In [10]:
# Split the data further into training and test datasets
if (splitDataset):
    X_train, X_test, y_train, y_test = train_test_split(X_original, y_original, test_size=splitPercentage, random_state=seedNum)
else:
    X_train, y_train = X_original, y_original
    X_test, y_test = X_original, y_original
print('Shape of X_train:', X_train.shape, '| Shape of y_train:', y_train.shape)
print('Shape of X_test:', X_test.shape, '| Shape of y_test:', y_test.shape)

Shape of X_train: (379, 13) | Shape of y_train: (379,)
Shape of X_test: (127, 13) | Shape of y_test: (127,)
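
The shapes above follow from the 25% split of 506 rows. A minimal sketch of the arithmetic, assuming the test count is rounded up and the training set takes the remainder (which is consistent with the printed shapes); `split_sizes` is a hypothetical helper:

```python
import math

def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test) when the test share is rounded up."""
    n_test = math.ceil(n_samples * test_fraction)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(506, 0.25)
print('train rows:', n_train, '| test rows:', n_test)
```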


In [11]:
if (notifyStatus): email_notify("Phase 1 Load Data completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 2. Define Model

In [12]:
if (notifyStatus): email_notify("Phase 2 Define Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [13]:
# Define the Keras model required for KerasRegressor
def create_baseline_model():
    model = K.models.Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer=default_kernel_init, activation='relu'))
    model.add(Dense(1, kernel_initializer=default_kernel_init))
    model.compile(loss=default_loss, optimizer=default_optimizer)
    return model

In [14]:
# Initialize the Keras model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_baseline_model, epochs=default_epochs, batch_size=default_batches, verbose=0)))
cv_model = Pipeline(estimators)
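
The `standardize` step rescales each input column to zero mean and unit variance before the network sees it. The sketch below reproduces that calculation for a single column in plain Python; the function name and the five sample values are illustrative only, and it assumes the column is not constant.

```python
import math

def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) variance."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)  # population variance
    std = math.sqrt(variance)
    return [(x - mean) / std for x in column]

# Illustrative values on a scale of several hundred units
sample_column = [307.0, 403.0, 300.0, 666.0, 304.0]
scaled = standardize(sample_column)
print([round(z, 3) for z in scaled])
```

Placing the scaler inside the pipeline means its mean and standard deviation are re-estimated on the training portion of each cross-validation fold, so no information leaks from the held-out fold.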

In [15]:
if (notifyStatus): email_notify("Phase 2 Define Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 3. Fit and Evaluate Model

In [16]:
if (notifyStatus): email_notify("Phase 3 Fit and Evaluate Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [17]:
# Fit and evaluate the Keras model using 10-fold cross validation
kfold = KFold(n_splits=folds, shuffle=True, random_state=seedNum)
results = cross_val_score(cv_model, X_train, y_train, cv=kfold)
print('All cross-validated results:', results)
print('Baseline: %.2f (%.2f) MSE' % (results.mean(), results.std()))







All cross-validated results: [-10.62944558 -13.62060684 -10.09922258  -9.91811327 -30.31830477
 -17.00155454  -8.80626255 -23.08936269 -32.2089179  -13.40236817]
Baseline: -16.91 (8.22) MSE
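
The scores are negative because the Keras regressor wrapper reports the negated loss, so that a larger score means a better model in scikit-learn's convention. A minimal sketch that flips the sign and also derives an RMSE, using the fold scores copied from the output above:

```python
import math
import statistics

# Fold scores copied from the cross-validation output (negated MSE per fold)
fold_scores = [-10.62944558, -13.62060684, -10.09922258, -9.91811327, -30.31830477,
               -17.00155454, -8.80626255, -23.08936269, -32.2089179, -13.40236817]

fold_mse = [-score for score in fold_scores]   # flip the sign to recover the MSE
mean_mse = sum(fold_mse) / len(fold_mse)
std_mse = statistics.pstdev(fold_mse)          # population std, matching numpy's default
rmse = math.sqrt(mean_mse)                     # same units as the target variable

print('MSE: %.2f (%.2f) | RMSE: %.2f' % (mean_mse, std_mse, rmse))
```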


In [18]:
if (notifyStatus): email_notify("Phase 3 Fit and Evaluate Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 4. Optimize Model

In [19]:
if (notifyStatus): email_notify("Phase 4 Optimize Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [20]:
# Define the Keras model required for KerasRegressor
def create_grid_model(optimizer, kernel_init):
    model = K.models.Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer=kernel_init, activation='relu'))
    model.add(Dense(1, kernel_initializer=kernel_init))
    model.compile(loss=default_loss, optimizer=optimizer)
    return model

In [21]:
# Create model for grid search
grid_model = KerasRegressor(build_fn=create_grid_model, verbose=0)

# Perform grid search using different optimizers, kernel initializers, epochs, and batch sizes
optimizer_grid = ['rmsprop', 'adam']
init_grid = ['Constant', 'RandomNormal', 'RandomUniform']
epoch_grid = [100, 150, 200]
batch_grid = [5, 10, 15]
param_grid = dict(optimizer=optimizer_grid, kernel_init=init_grid, epochs=epoch_grid, batch_size=batch_grid)
grid = GridSearchCV(estimator=grid_model, param_grid=param_grid, cv=5, n_jobs=-1)
grid_result = grid.fit(X_train, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Best: -22.899395 using {'batch_size': 5, 'epochs': 200, 'kernel_init': 'RandomNormal', 'optimizer': 'adam'}
-358.194775 (43.843678) with: {'batch_size': 5, 'epochs': 100, 'kernel_init': 'Constant', 'optimizer': 'rmsprop'}
-364.779522 (44.313025) with: {'batch_size': 5, 'epochs': 100, 'kernel_init': 'Constant', 'optimizer': 'adam'}
-31.839012 (2.140355) with: {'batch_size': 5, 'epochs': 100, 'kernel_init': 'RandomNormal', 'optimizer': 'rmsprop'}
-27.291999 (5.668850) with: {'batch_size': 5, 'epochs': 100, 'kernel_init': 'RandomNormal', 'optimizer': 'adam'}
-29.331149 (3.769105) with: {'batch_size': 5, 'epochs': 100, 'kernel_init': 'RandomUniform', 'optimizer': 'rmsprop'}
-30.306892 (8.362598) with: {'batch_size': 5, 'epochs': 100, 'kernel_init': 'RandomUniform', 'optimizer': 'adam'}
-270.327445 (37.770698) with: {'batch_size': 5, 'epochs': 150, 'kernel_init': 'Constant', 'optimizer': 'rmsprop'}
-279.734062 (38.629240) with: {'batch_size': 5, 'epochs': 150, 'kernel_init': 'Constant', 'op
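
The length of this listing and much of the run time follow from the size of the grid. The sketch below enumerates the combinations defined in the cell above; the variable `cv_folds` mirrors the `cv=5` argument, and the ordering of the tuple is chosen to match the alphabetical parameter order seen in the output.

```python
from itertools import product

optimizer_grid = ['rmsprop', 'adam']
init_grid = ['Constant', 'RandomNormal', 'RandomUniform']
epoch_grid = [100, 150, 200]
batch_grid = [5, 10, 15]
cv_folds = 5

# Every (batch_size, epochs, kernel_init, optimizer) combination in the grid
combinations = list(product(batch_grid, epoch_grid, init_grid, optimizer_grid))
total_fits = len(combinations) * cv_folds

print(len(combinations), 'combinations,', total_fits, 'cross-validation fits')
print('First combination:', combinations[0])
```

With its default settings, `GridSearchCV` also refits the best combination once on the full training set after the search.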

In [22]:
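# Parameters for the final model
# (the grid search above reported 'RandomNormal' as the best kernel initializer)
best_optimizer = 'adam'
best_kernel_init = 'RandomUniform'
best_epochs = 200
best_batches = 5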

In [23]:
# Create the final model for evaluating the test dataset
final_model = create_grid_model(best_optimizer, best_kernel_init)
final_model.fit(X_train, y_train, epochs=best_epochs, batch_size=best_batches, verbose=1)

Epoch 1/200
Epoch 2/200
...
Epoch 77/200
Epoch 78

<keras.callbacks.History at 0x7f01604890b8>

In [24]:
# Display a summary of the final model
print(final_model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_23 (Dense)             (None, 13)                182       
_________________________________________________________________
dense_24 (Dense)             (None, 1)                 14        
=================================================================
Total params: 196
Trainable params: 196
Non-trainable params: 0
_________________________________________________________________
None
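
The parameter counts in the summary can be checked by hand: a dense layer holds one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic (`dense_params` is a hypothetical helper):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_layer = dense_params(13, 13)   # 13 input attributes feeding 13 hidden units
output_layer = dense_params(13, 1)    # 13 hidden units feeding 1 output unit

print(hidden_layer, output_layer, hidden_layer + output_layer)
```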


In [25]:
# Evaluate the Keras model on previously unseen data
scores = final_model.evaluate(X_test, y_test)
print("Final MSE of the model: %.2f" % (scores))

Final MSE of the model: 15.93
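
An MSE is expressed in squared target units, so taking the square root gives an error on the same scale as the house values (recorded in thousands of dollars in this dataset). A minimal sketch using the score printed above:

```python
import math

final_mse = 15.93                    # value copied from the output above
final_rmse = math.sqrt(final_mse)    # back to the units of the target variable

print('Final RMSE of the model: %.2f' % final_rmse)
```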


In [26]:
if (notifyStatus): email_notify("Phase 4 Optimize Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 5. Finalize Model

In [27]:
if (notifyStatus): email_notify("Phase 5 Finalize Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [28]:
# Make regression predictions with the model
predictions = final_model.predict(X_test)

# Summarize the first 20 cases
for i in range(20):
    # predictions has shape (n_samples, 1), so take the single value in each row
    print('%s => %d (expected %d)' % (X_test[i].tolist(), predictions[i][0], y_test[i]))

[0.6147, 0.0, 6.2, 0.0, 0.507, 6.618, 80.8, 3.2721, 8.0, 307.0, 17.4, 396.9, 7.6] => 28 (expected 30)
[3.5350099999999998, 0.0, 19.58, 1.0, 0.871, 6.152, 82.6, 1.7455, 5.0, 403.0, 14.7, 88.01, 15.02] => 11 (expected 15)
[0.10612, 30.0, 4.93, 0.0, 0.428, 6.095, 65.1, 6.3361, 6.0, 300.0, 16.6, 394.62, 12.4] => 21 (expected 20)
[16.8118, 0.0, 18.1, 0.0, 0.7, 5.277, 98.1, 1.4261, 24.0, 666.0, 20.2, 396.9, 30.81] => 8 (expected 7)
[0.31827, 0.0, 9.9, 0.0, 0.544, 5.914, 83.2, 3.9986, 4.0, 304.0, 18.4, 390.7, 18.33] => 17 (expected 17)
[0.09744, 0.0, 5.96, 0.0, 0.499, 5.841, 61.4, 3.3779, 5.0, 279.0, 19.2, 377.56, 11.41] => 20 (expected 20)
[0.03466, 35.0, 6.06, 0.0, 0.4379, 6.031, 23.3, 6.6407, 1.0, 304.0, 16.9, 362.25, 7.83] => 21 (expected 19)
[0.28955, 0.0, 10.59, 0.0, 0.489, 5.412, 9.8, 3.5875, 4.0, 277.0, 18.6, 348.93, 29.55] => 12 (expected 23)
[0.76162, 20.0, 3.97, 0.0, 0.647, 5.56, 62.8, 1.9865, 5.0, 264.0, 13.0, 392.4, 10.45] => 22 (expected 22)
[0.02009, 95.0, 2.68, 0.0, 0.4161, 8.
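
Note that the `%d` format drops the fractional part, so both the predicted and the expected values above are shown as truncated integers. As a rough sketch, the nine complete rows visible in the output can be summarized with a mean absolute error; the two lists are copied from those rows, and the result only describes the truncated values as displayed.

```python
# Values copied from the nine complete rows printed above
shown_predictions = [28, 11, 21, 8, 17, 20, 21, 12, 22]
shown_expected = [30, 15, 20, 7, 17, 20, 19, 23, 22]

abs_errors = [abs(p - e) for p, e in zip(shown_predictions, shown_expected)]
mean_abs_error = sum(abs_errors) / len(abs_errors)

print('Absolute errors:', abs_errors)
print('Mean absolute error: %.2f | largest miss: %d' % (mean_abs_error, max(abs_errors)))

# %d truncates rather than rounds
truncated = '%d' % 28.7
print(truncated)
```

Switching the format to `%.2f` in the loop above would show the unrounded predictions.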

In [29]:
if (notifyStatus): email_notify("Phase 5 Finalize Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [30]:
print('Total time for the script:', (datetime.now() - startTimeScript))

Total time for the script: 0:36:13.989221
