# Regression Deep Learning Model for [PROJECT NAME] Using Keras Version 3
### David Lowe
### November 11, 2019

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery. [https://machinelearningmastery.com/]

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The [PROJECT NAME] dataset presents a regression problem in which we try to predict the value of a continuous variable.

INTRODUCTION: [Sample Paragraph - The purpose of the analysis is to predict the housing values in the suburbs of Boston by using the home sale transaction history.]

ANALYSIS: [Sample Paragraph - The baseline performance of the model achieved an average mean squared error of 70.08. After tuning the hyperparameters, the final model processed the test dataset with a mean squared error of 13.08, which was much better than the baseline result from the training dataset.]

CONCLUSION: For this dataset, the model built using Keras and TensorFlow achieved a satisfactory result and should be considered for future modeling activities.

Dataset Used: [PROJECT NAME] Dataset

Dataset ML Model: Regression with numerical attributes

Dataset Reference: [https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.data]

One potential source of performance benchmarks: [https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/]

Any deep-learning modeling project generally can be broken down into about six major tasks:
0. Prepare Environment
1. Load Data
2. Define Model
3. Fit and Evaluate Model
4. Optimize Model
5. Finalize Model

# Section 0. Prepare Environment

In [31]:
# Set the warning message filter
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [32]:
# Set the random seed number for reproducible results
seedNum = 888

In [33]:
# Load libraries and packages
import random
random.seed(seedNum)
import numpy as np
np.random.seed(seedNum)
import tensorflow as tf
tf.random.set_seed(seedNum)
import keras as K
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.wrappers.scikit_learn import KerasRegressor
from keras.utils import np_utils
import pandas as pd
import os
import sys
import shutil
import urllib.request
import zipfile
import smtplib
import matplotlib.pyplot as plt
from datetime import datetime
from email.message import EmailMessage
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn import preprocessing
from sklearn.pipeline import Pipeline
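
The seeding calls above aim at repeatable runs. The following is a minimal sketch of the idea using only the standard-library generator; the helper name `draw` is hypothetical. Seeding NumPy and TensorFlow follows the same principle, although results from multi-threaded or parallel training may still vary between runs.

```python
import random

SEED = 888  # same value as seedNum in the notebook


def draw(seed, n=5):
    """Return n pseudo-random floats after seeding the stdlib generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first_run = draw(SEED)
second_run = draw(SEED)
other_seed = draw(SEED + 1)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # different seed, different sequence
```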

In [34]:
# Begin the timer for the script processing
startTimeScript = datetime.now()

# Set up the verbose flag to print detailed messages for debugging (setting to True will activate)
verbose = True
tf.debugging.set_log_device_placement(verbose)

# Set up the number of CPU cores available for multi-thread processing
n_jobs = -1
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

# Set up the flag for sending progress emails (setting to True will send status emails!)
notifyStatus = False

# Set the number of folds for cross validation
n_folds = 5

# Set the flag for splitting the dataset
splitDataset = True
splitPercentage = 0.25

# Set various default Keras modeling parameters
default_kernel_init = K.initializers.RandomNormal(seed=seedNum)
default_loss = 'mean_squared_error'
default_optimizer = 'adam'
default_epochs = 150
default_batches = 8

Num GPUs Available:  0


In [35]:
# Set up the email notification function
def email_notify(msg_text):
    sender = os.environ.get('MAIL_SENDER')
    receiver = os.environ.get('MAIL_RECEIVER')
    gateway = os.environ.get('SMTP_GATEWAY')
    smtpuser = os.environ.get('SMTP_USERNAME')
    password = os.environ.get('SMTP_PASSWORD')
    if any(value is None for value in (sender, receiver, gateway, smtpuser, password)):
        sys.exit("Incomplete email setup info. Script Processing Aborted!!!")
    msg = EmailMessage()
    msg.set_content(msg_text)
    msg['Subject'] = 'Notification from Keras Regression Script'
    msg['From'] = sender
    msg['To'] = receiver
    server = smtplib.SMTP(gateway, 587)
    server.starttls()
    server.login(smtpuser, password)
    server.send_message(msg)
    server.quit()

In [36]:
if (notifyStatus): email_notify("Phase 0 Prepare Environment completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))
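
Every phase in this notebook builds its status message the same way: a phase label followed by a timestamp. As a sketch only, the string construction could be collected in one helper. The function name `phase_message` and its parameters are hypothetical and are not part of the original template; the format string is the one used in the notebook.

```python
from datetime import datetime

STAMP_FORMAT = '%a %B %d, %Y %I:%M:%S %p'  # format string used in the notebook


def phase_message(number, name, event, now=None):
    """Build a status line such as 'Phase 0 Prepare Environment completed! <timestamp>'."""
    now = now or datetime.now()
    return "Phase %d %s %s! %s" % (number, name, event, now.strftime(STAMP_FORMAT))


# A fixed timestamp keeps the example repeatable
fixed = datetime(2019, 11, 11, 14, 5, 9)
msg = phase_message(0, "Prepare Environment", "completed", fixed)
print(msg)
```

With such a helper, each notification cell would reduce to something like `if notifyStatus: email_notify(phase_message(1, "Load Data", "has begun"))`.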

# Section 1. Load Data

In [37]:
if (notifyStatus): email_notify("Phase 1 Load Data has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [38]:
# Load the dataset
df_original = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data", delim_whitespace=True, header=None)
dataset = df_original.values
print(dataset)

[[6.3200e-03 1.8000e+01 2.3100e+00 ... 3.9690e+02 4.9800e+00 2.4000e+01]
 [2.7310e-02 0.0000e+00 7.0700e+00 ... 3.9690e+02 9.1400e+00 2.1600e+01]
 [2.7290e-02 0.0000e+00 7.0700e+00 ... 3.9283e+02 4.0300e+00 3.4700e+01]
 ...
 [6.0760e-02 0.0000e+00 1.1930e+01 ... 3.9690e+02 5.6400e+00 2.3900e+01]
 [1.0959e-01 0.0000e+00 1.1930e+01 ... 3.9345e+02 6.4800e+00 2.2000e+01]
 [4.7410e-02 0.0000e+00 1.1930e+01 ... 3.9690e+02 7.8800e+00 1.1900e+01]]


In [39]:
# Split the original dataset into input (X) and output (y) variables
X_original = dataset[:,0:13].astype(float)
y_original = dataset[:,13]
print('Shape of X_original:', X_original.shape, '| Shape of y_original:', y_original.shape)

Shape of X_original: (506, 13) | Shape of y_original: (506,)


In [40]:
X_encoded = X_original
y_encoded = y_original
if (splitDataset):
    X_train, X_test, y_train, y_test = train_test_split(X_encoded, y_encoded, test_size=splitPercentage, random_state=seedNum)
else:
    X_train, y_train = X_encoded, y_encoded
    X_test, y_test = X_encoded, y_encoded
print("X_train.shape: {} X_train.type: {}".format(X_train.shape, type(X_train)))
print("y_train.shape: {} y_train.type: {}".format(y_train.shape, type(y_train)))
print("X_test.shape: {} X_test.type: {}".format(X_test.shape, type(X_test)))
print("y_test.shape: {} y_test.type: {}".format(y_test.shape, type(y_test)))

X_train.shape: (379, 13) X_train.type: <class 'numpy.ndarray'>
y_train.shape: (379,) y_train.type: <class 'numpy.ndarray'>
X_test.shape: (127, 13) X_test.type: <class 'numpy.ndarray'>
y_test.shape: (127,) y_test.type: <class 'numpy.ndarray'>
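
The shapes printed above follow from the split percentage. The sketch below reproduces the arithmetic, assuming the test share is rounded up and the training set takes the remainder (which is how scikit-learn appears to handle a fractional `test_size`). The helper `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test) when the test share is rounded up."""
    n_test = math.ceil(test_fraction * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 506 rows and splitPercentage = 0.25, as in the notebook
n_train, n_test = split_sizes(506, 0.25)
print(n_train, n_test)  # 379 127
```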


In [41]:
if (notifyStatus): email_notify("Phase 1 Load Data completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 2. Define Model

In [42]:
if (notifyStatus): email_notify("Phase 2 Define Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [43]:
# Define the Keras model required for KerasRegressor
def create_default_model():
    default_model = K.models.Sequential()
    default_model.add(Dense(13, input_dim=13, kernel_initializer=default_kernel_init, activation='relu'))
    default_model.add(Dense(1, kernel_initializer=default_kernel_init))
    default_model.compile(loss=default_loss, optimizer=default_optimizer)
    return default_model

In [44]:
# Initialize the Keras model
estimators = []
estimators.append(('standardize', preprocessing.Normalizer()))
estimators.append(('mlp', KerasRegressor(build_fn=create_default_model, epochs=default_epochs, batch_size=default_batches, verbose=0)))
cv_model = Pipeline(estimators)
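
Note that the pipeline step is labelled 'standardize' but uses `preprocessing.Normalizer()`, which rescales each sample (row) to unit norm. Standardization in the usual sense rescales each feature (column) to zero mean and unit variance instead, which is what `preprocessing.StandardScaler()` does. The plain-Python sketch below contrasts the two on toy data that is not taken from the housing set; both helper functions are illustrative stand-ins, not the scikit-learn implementations.

```python
import math

rows = [[3.0, 4.0], [6.0, 8.0], [1.0, 0.0]]  # toy data, not from the housing set


def l2_normalize_rows(data):
    """Scale every row to unit Euclidean length (row-wise, like a normalizer)."""
    out = []
    for row in data:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm if norm else 0.0 for v in row])
    return out


def standardize_columns(data):
    """Shift every column to mean 0 and scale it to unit variance (column-wise)."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
            for row in data]


normalized = l2_normalize_rows(rows)
standardized = standardize_columns(rows)

# Rows [3, 4] and [6, 8] point in the same direction, so they normalize to the same vector
print(normalized[0], normalized[1])
# After standardization each column sums to (approximately) zero
print(sum(r[0] for r in standardized), sum(r[1] for r in standardized))
```

Which of the two suits the model better is an empirical question; the point is only that the step name and the transformer do different things.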

In [45]:
if (notifyStatus): email_notify("Phase 2 Define Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 3. Fit and Evaluate Model

In [46]:
if (notifyStatus): email_notify("Phase 3 Fit and Evaluate Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [47]:
startTimeModule = datetime.now()

# Fit and evaluate the Keras model using k-fold cross validation (k = n_folds, set to 5 above)
kfold = KFold(n_splits=n_folds, shuffle=True, random_state=seedNum)
results = cross_val_score(cv_model, X_train, y_train, cv=kfold)
print('Generating results using the metrics of', default_loss)
print('All cross-Validate results:', results)
print('Baseline results [mean (std)]: %.2f (%.2f)' % (results.mean(), results.std()))

print('Total time for performing cross-validation of the default model:', (datetime.now() - startTimeModule))

Executing op RandomStandardNormal in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Mul in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Add in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op VarIsInitializedOp in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op LogicalNot in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Assert in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op RandomStandardNormal in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:loc
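
With `n_folds = 5` and 379 training rows, the folds cannot all be the same size. The sketch below shows the fold sizes, assuming the usual convention that the first `n_samples % n_splits` folds receive one extra sample. The helper `kfold_sizes` is hypothetical.

```python
def kfold_sizes(n_samples, n_splits):
    """Return the validation-fold sizes for a k-fold split."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


sizes = kfold_sizes(379, 5)
print(sizes)       # [76, 76, 76, 76, 75]
print(sum(sizes))  # 379
```

Each model in the cross-validation is therefore trained on roughly 303 rows and scored on the remaining 75 or 76.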

In [48]:
if (notifyStatus): email_notify("Phase 3 Fit and Evaluate Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 4. Optimize Model

In [49]:
if (notifyStatus): email_notify("Phase 4 Optimize Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [50]:
# Define the Keras model required for KerasRegressor
def create_customized_model(optimizer, kernel_init):
    customized_model = K.models.Sequential()
    customized_model.add(Dense(13, input_dim=13, kernel_initializer=kernel_init, activation='relu'))
    customized_model.add(Dense(1, kernel_initializer=kernel_init))
    customized_model.compile(loss=default_loss, optimizer=optimizer)
    return customized_model

In [51]:
startTimeModule = datetime.now()

# Create model for grid search
grid_model = KerasRegressor(build_fn=create_customized_model, verbose=0)

# Perform grid search using different optimizers, kernel initializers, epochs, and batch sizes
optz_1 = K.optimizers.RMSprop()
optz_2 = K.optimizers.Adam()
optimizer_grid = [optz_1, optz_2]
init_1 = K.initializers.RandomNormal(seed=seedNum)
init_2 = K.initializers.glorot_normal(seed=seedNum)
init_3 = K.initializers.Orthogonal(seed=seedNum)
init_grid = [init_1, init_2, init_3]
epoch_grid = [100, 150, 200]
batch_grid = [8, 16, 32]
param_grid = dict(optimizer=optimizer_grid, kernel_init=init_grid, epochs=epoch_grid, batch_size=batch_grid)
grid = GridSearchCV(estimator=grid_model, param_grid=param_grid, cv=n_folds, n_jobs=n_jobs, verbose=3)
# n_iter = int(len(optimizer_grid) * len(init_grid) * len(epoch_grid) * len(batch_grid) * 0.30)
# grid = RandomizedSearchCV(estimator=grid_model, param_distributions=param_grid, n_iter=n_iter, cv=n_folds, n_jobs=n_jobs, verbose=3)
grid_result = grid.fit(X_train, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
	print("%f (%f) with: %r" % (mean, stdev, param))

print('Total time for performing grid-search of the best parameters:', (datetime.now() - startTimeModule))

Fitting 5 folds for each of 54 candidates, totalling 270 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:CPU:0


[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:   29.5s
[Parallel(n_jobs=-1)]: Done 104 tasks      | elapsed:  4.0min
[Parallel(n_jobs=-1)]: Done 270 out of 270 | elapsed:  6.8min finished


Executing op __inference_keras_scratch_graph_368763 in device /job:localhost/replica:0/task:0/device:CPU:0
Best: -25.185132 using {'batch_size': 8, 'epochs': 200, 'kernel_init': <keras.initializers.RandomNormal object at 0x7f5a6f532d10>, 'optimizer': <keras.optimizers.Adam object at 0x7f5a6f532950>}
-27.133658 (12.105422) with: {'batch_size': 8, 'epochs': 100, 'kernel_init': <keras.initializers.RandomNormal object at 0x7f5a6f532d10>, 'optimizer': <keras.optimizers.RMSprop object at 0x7f5a6f532fd0>}
-29.097372 (11.330128) with: {'batch_size': 8, 'epochs': 100, 'kernel_init': <keras.initializers.RandomNormal object at 0x7f5a6f532d10>, 'optimizer': <keras.optimizers.Adam object at 0x7f5a6f532950>}
-42.767426 (17.694832) with: {'batch_size': 8, 'epochs': 100, 'kernel_init': <keras.initializers.VarianceScaling object at 0x7f5a6f532790>, 'optimizer': <keras.optimizers.RMSprop object at 0x7f5a6f532fd0>}
-41.662408 (16.236490) with: {'batch_size': 8, 'epochs': 100, 'kernel_init': <keras.initia
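
The "54 candidates, totalling 270 fits" line in the log can be checked by enumerating the grid. The sketch below uses plain strings as stand-ins for the optimizer and initializer objects. It also flips the sign of the best score, on the assumption that the wrapper reports the negated loss so that larger values are better for the search.

```python
from itertools import product

# String stand-ins for the objects used in the notebook's param_grid
optimizers = ["RMSprop", "Adam"]
initializers = ["RandomNormal", "glorot_normal", "Orthogonal"]
epochs = [100, 150, 200]
batch_sizes = [8, 16, 32]
n_folds = 5

candidates = list(product(optimizers, initializers, epochs, batch_sizes))
total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)  # 54 270

# Best score as printed in the log; negating it gives a mean squared error
best_score = -25.185132
best_mse = -best_score
print(best_mse)
```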

In [52]:
best_optimizer = grid_result.best_params_["optimizer"]
best_kernel_init = grid_result.best_params_["kernel_init"]
best_epoch = grid_result.best_params_["epochs"]
best_batch = grid_result.best_params_["batch_size"]

In [53]:
# Create the final model for evaluating the test dataset
print('Forming the final model using: optimizer=%s, kernel=%s, epochs=%d, batch_size=%d'
      % (best_optimizer, best_kernel_init, best_epoch, best_batch))
final_model = create_customized_model(best_optimizer, best_kernel_init)
final_model.fit(X_train, y_train, epochs=best_epoch, batch_size=best_batch, verbose=1)

Forming the final model using: optimizer=<keras.optimizers.Adam object at 0x7f5a6f532950>, kernel=<keras.initializers.RandomNormal object at 0x7f5a6f532d10>, epochs=200, batch_size=8
Epoch 1/200
Executing op __inference_keras_scratch_graph_407816 in device /job:localhost/replica:0/task:0/device:CPU:0
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoc

<keras.callbacks.callbacks.History at 0x7f5a6f640310>

In [54]:
# Display a summary of the final model
print(final_model.summary())

Model: "sequential_14"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_27 (Dense)             (None, 13)                182       
_________________________________________________________________
dense_28 (Dense)             (None, 1)                 14        
Total params: 196
Trainable params: 196
Non-trainable params: 0
_________________________________________________________________
None
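
The parameter counts in the summary can be verified by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch, with the hypothetical helper `dense_params`:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a dense layer."""
    return n_inputs * n_units + n_units


hidden = dense_params(13, 13)  # 13 input features -> 13 hidden units
output = dense_params(13, 1)   # 13 hidden units -> 1 output
print(hidden, output, hidden + output)  # 182 14 196
```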


In [55]:
# Evaluate the Keras model on previously unseen data
scores = final_model.evaluate(X_test, y_test)
print("Final MSE of the model: %.2f" % (scores))

Executing op __inference_keras_scratch_graph_468649 in device /job:localhost/replica:0/task:0/device:CPU:0
Final MSE of the model: 12.78


In [56]:
if (notifyStatus): email_notify("Phase 4 Optimize Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Section 5. Finalize Model

In [57]:
if (notifyStatus): email_notify("Phase 5 Finalize Model has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [58]:
# Make predictions with the model
predictions = final_model.predict(X_test)

# Summarize the first 20 cases
for i in range(20):
	print('Data item #%d predicted to be %.2f (expected %.2f)' % (i, predictions[i][0], y_test[i]))

Executing op __inference_keras_scratch_graph_468682 in device /job:localhost/replica:0/task:0/device:CPU:0
Data item #0 predicted to be 20.63 (expected 22.40)
Data item #1 predicted to be 37.62 (expected 32.40)
Data item #2 predicted to be 20.12 (expected 21.70)
Data item #3 predicted to be 19.91 (expected 24.50)
Data item #4 predicted to be 20.99 (expected 16.80)
Data item #5 predicted to be 22.65 (expected 21.10)
Data item #6 predicted to be 31.88 (expected 29.40)
Data item #7 predicted to be 21.84 (expected 28.70)
Data item #8 predicted to be 20.23 (expected 21.50)
Data item #9 predicted to be 15.63 (expected 13.60)
Data item #10 predicted to be 20.48 (expected 21.40)
Data item #11 predicted to be 23.20 (expected 24.80)
Data item #12 predicted to be 18.90 (expected 16.80)
Data item #13 predicted to be 19.99 (expected 19.40)
Data item #14 predicted to be 19.66 (expected 21.70)
Data item #15 predicted to be 12.48 (expected 17.20)
Data item #16 predicted to be 14.24 (expected 17.10)
Da
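
To make the error metric concrete, the sketch below computes the mean squared error by hand over the first five (predicted, expected) pairs printed above. It covers only five rows, so the result is not expected to match the full test-set MSE of 12.78.

```python
# (predicted, expected) pairs copied from data items #0 to #4 in the output above
pairs = [
    (20.63, 22.40),
    (37.62, 32.40),
    (20.12, 21.70),
    (19.91, 24.50),
    (20.99, 16.80),
]

squared_errors = [(predicted - expected) ** 2 for predicted, expected in pairs]
mse = sum(squared_errors) / len(squared_errors)
print("MSE over the first five items: %.2f" % mse)  # roughly 14.30
```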

In [59]:
if (notifyStatus): email_notify("Phase 5 Finalize Model completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

In [60]:
print ('Total time for the script:',(datetime.now() - startTimeScript))

Total time for the script: 0:08:51.590135
