In [1]:
from pathlib import Path

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.neural_network import MLPClassifier, MLPRegressor 
from sklearn.preprocessing import StandardScaler

from mord import LogisticIT

from dmba import classificationSummary, regressionSummary

%matplotlib inline
import matplotlib.pylab as plt

##### 1. Upload, explore, clean, and preprocess data for neural network modeling. (This part of the case
will not be graded as all questions below have already been done in case study #1).

a. Create a boston_df data frame by uploading the original data set into Python. Determine
and present in this report the data frame dimensions, i.e., number of rows and columns.

In [2]:
try:
    boston_df = pd.read_csv('BostonHousing.csv')
except FileNotFoundError:
    print("BostonHousing.csv is not in the present working directory.")

In [3]:
print(f"The dimensions of the Boston Housing dataset are {boston_df.shape},", f"where there are {boston_df.shape[0]} rows and {boston_df.shape[1]} columns.")

The dimensions of the Boston Housing dataset are (506, 14), where there are 506 rows and 14 columns.


b. Display in Python the column titles. If some of them contain two (or more) words, convert
them into one-word titles, and present the modified titles in your report.

In [4]:
boston_df.columns = boston_df.columns.str.replace(' ', '_')
boston_df.columns

Index(['CRIME', 'ZONE', 'INDUST', 'CHAR_RIV', 'NIT_OXIDE', 'ROOMS', 'AGE',
       'DISTANCE', 'RADIAL', 'TAX', 'ST_RATIO', 'LOW_STAT', 'MVALUE',
       'C_MVALUE'],
      dtype='object')

c. Display in Python column data types. If some of them are listed as "object", convert them
into dummy variables, and provide in your report the modified list of column titles with
dummy variables.

In [5]:
boston_df.dtypes

CRIME        float64
ZONE         float64
INDUST       float64
CHAR_RIV      object
NIT_OXIDE    float64
ROOMS        float64
AGE          float64
DISTANCE     float64
RADIAL         int64
TAX            int64
ST_RATIO     float64
LOW_STAT     float64
MVALUE       float64
C_MVALUE      object
dtype: object

In [6]:
boston_df.CHAR_RIV = boston_df.CHAR_RIV.astype('category')
boston_df.C_MVALUE = boston_df.C_MVALUE.astype('category')

In [7]:
boston_df = pd.get_dummies(boston_df, prefix_sep='_', 
                            drop_first=True)
boston_df.columns

Index(['CRIME', 'ZONE', 'INDUST', 'NIT_OXIDE', 'ROOMS', 'AGE', 'DISTANCE',
       'RADIAL', 'TAX', 'ST_RATIO', 'LOW_STAT', 'MVALUE', 'CHAR_RIV_Y',
       'C_MVALUE_Yes'],
      dtype='object')

##### 2. Develop a neural network model for Boston Housing and use it for predictions.

a. Develop in Python the outcome and predictor variables, partition the data set (70% for
training and 30% for validation partitions), display in Python and present in your report
the first five records of the training partition. Then, using the StandardScaler() function,
develop the scaled predictors for training and validation partitions. Display in Python and
provide in your report the first five records of the scaled training partition. Present a brief
explanation of what the scaled values mean and how they are calculated.

In [8]:
boston_df.columns

Index(['CRIME', 'ZONE', 'INDUST', 'NIT_OXIDE', 'ROOMS', 'AGE', 'DISTANCE',
       'RADIAL', 'TAX', 'ST_RATIO', 'LOW_STAT', 'MVALUE', 'CHAR_RIV_Y',
       'C_MVALUE_Yes'],
      dtype='object')

In [9]:
predictors = ['CRIME', 'ZONE', 'INDUST', 'NIT_OXIDE', 'ROOMS', 'AGE', 'DISTANCE',
       'RADIAL', 'TAX', 'ST_RATIO', 'LOW_STAT', 'CHAR_RIV_Y',
       'C_MVALUE_Yes']
outcome = 'MVALUE'

X = boston_df[predictors]
y = boston_df[outcome]

In [10]:
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.3, random_state=1)

In [11]:
# present this 
train_X.head(5)

       CRIME  ZONE  INDUST  NIT_OXIDE  ROOMS   AGE  DISTANCE  RADIAL  TAX  ST_RATIO  LOW_STAT  CHAR_RIV_Y  C_MVALUE_Yes
 13  0.62976   0.0    8.14      0.538  5.949  61.8    4.7075       4  307      21.0      8.26           0             0
 61  0.17171  25.0    5.13      0.453  5.966  93.4    6.8185       8  284      19.7     14.44           0             0
377  9.82349   0.0    18.1      0.671  6.794  98.8     1.358      24  666      20.2     21.24           0             0
 39  0.02763  75.0    2.95      0.428  6.595  21.8    5.4011       3  252      18.3      4.32           0             1
365  4.55587   0.0    18.1      0.718  3.561  87.9    1.6132      24  666      20.2      7.12           0             0


In [12]:
# Scale here
sc_X = StandardScaler()
train_X_sc = sc_X.fit_transform(train_X)
valid_X_sc = sc_X.transform(valid_X)

train_X_sc_df = np.round(pd.DataFrame(train_X_sc), decimals=3)                            
train_X_sc_df.columns=predictors

In [13]:
# present this
train_X_sc_df.head()

    CRIME    ZONE  INDUST  NIT_OXIDE   ROOMS     AGE  DISTANCE  RADIAL     TAX  ST_RATIO  LOW_STAT  CHAR_RIV_Y  C_MVALUE_Yes
0  -0.366  -0.484  -0.462     -0.147   -0.44  -0.251     0.412  -0.646    -0.6     1.189    -0.647      -0.304        -0.452
1   -0.42    0.58  -0.902     -0.868  -0.416   0.868     1.401  -0.191  -0.736     0.582     0.203      -0.304        -0.452
2   0.714  -0.484   0.992      0.982   0.782    1.06    -1.157   1.629   1.512     0.816     1.139      -0.304        -0.452
3  -0.436   2.708   -1.22      -1.08   0.494  -1.668     0.737   -0.76  -0.924     -0.07    -1.189      -0.304         2.214
4   0.095  -0.484   0.992      1.381  -3.893   0.673    -1.037   1.629   1.512     0.816    -0.804      -0.304        -0.452


"""
Explanation of what the scaled values mean and how they are calculated:

Many machine learning estimators, neural networks included, can behave badly when the individual features are on
very different scales, so standardizing the predictors is a common preprocessing step.
For each column, the standard scaler calculates:
z = (x - u) / s
where u is the mean and s is the standard deviation of that column in the training partition.
A scaled value is therefore the number of standard deviations by which a record lies above (positive) or below
(negative) the training mean of that predictor. Each scaled training column has a mean of 0 and a standard deviation of 1;
the shape of the column's distribution is not changed.

For the training partition, the fit_transform function first learns u and s from the training data (fit) and then applies
the calculation stated above (transform). Since the scaler is already fitted, only the transform function is applied to the
validation partition, so the validation records are scaled with the training u and s.
"""

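As a quick check on the z = (x - u) / s calculation, the sketch below standardizes a small column by hand using only the Python standard library. The five training values are the ROOMS entries of the first five training records; the two validation values and the helper name `standardize` are made up for this illustration. Because u and s are taken here from five rows rather than the full training partition, the results will not match the scaled table in the notebook. The sketch assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
from statistics import fmean, pstdev

def standardize(train_values, values):
    """Scale `values` with the mean and population std of `train_values`."""
    u = fmean(train_values)   # mean learned from the training data only
    s = pstdev(train_values)  # population standard deviation (ddof=0)
    return [(x - u) / s for x in values]

# ROOMS values of the first five training records (illustrative subset).
rooms_train = [5.949, 5.966, 6.794, 6.595, 3.561]
# Hypothetical validation values, scaled with the *training* u and s.
rooms_valid = [6.2, 7.1]

train_scaled = standardize(rooms_train, rooms_train)
valid_scaled = standardize(rooms_train, rooms_valid)

print([round(z, 3) for z in train_scaled])
print([round(z, 3) for z in valid_scaled])
```

The scaled training values have mean 0 and standard deviation 1 by construction; the validation values generally do not, because they are scaled with statistics they did not contribute to.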
b. Train a neural network model using MLPRegressor() with the scaled training data set and
the following parameters: hidden_layer_sizes=10, solver='lbfgs', max_iter=10000, and
random_state=1. Identify and display in Python the final intercepts and network weights
of this model. Provide these intercepts and weights in your report and briefly explain what
the values of intercepts in the first and second arrays mean. Also, briefly explain what the
values of weights in the first and second arrays mean.

In [14]:
boston_reg = MLPRegressor(hidden_layer_sizes=(10,),
                solver='lbfgs', max_iter=10000, random_state=1)
boston_reg.fit(train_X_sc, train_y)

# PRESENT IN REPORT
# Display network structure with the final values of 
# intercepts (Theta) and weights (W).
print('Final Intercepts for Boston Housing Neural Network Model')
print(boston_reg.intercepts_)

print()
print('Network Weights for Boston Housing Neural Network Model')
print(boston_reg.coefs_)

"""
Briefly explain what the values of intercepts in the first and second arrays mean:
The first array holds the intercepts (bias terms) of the hidden layer, one for each of the 10 hidden nodes.
The second array holds the single intercept (bias term) of the output node.

see slide 17 of the neural net slides

Briefly explain what the values of weights in the first and second arrays mean:
The first array holds the weights on the connections from the input nodes to the hidden nodes. It has 13 rows
(13 features = 13 lists of weights), and each row lists the 10 weights from that feature to the 10 hidden nodes.
The second array holds the 10 weights on the connections from the hidden nodes to the output node.
"""

Final Intercepts for Boston Housing Neural Network Model
[array([ 0.03419315, -5.17494472, -3.19741419,  0.09979904, -1.52105762,
       -1.35938186, -0.98147659,  0.20502791,  3.90980501,  0.05306107]), array([1.89670365])]

Network Weights for Boston Housing Neural Network Model
[array([[ 6.78417419e-01,  1.40009172e+00, -2.90503954e-01,
         7.00785971e-01, -2.33128946e+00, -1.88707724e-01,
        -1.10882033e+00, -3.48212572e-01, -4.05340615e-02,
         2.81750728e-01],
       [ 9.13742181e-01,  1.17283170e-01,  4.63067759e-01,
         1.60207856e+00, -2.70619848e+00, -4.00493501e-02,
        -4.56463641e-01, -9.63440430e-01,  3.37150294e-01,
         1.85719544e+00],
       [ 1.15795214e+00,  8.58982703e-01, -1.15996602e-02,
        -4.76552881e-01,  2.67668766e-01,  8.13217035e-01,
        -4.95596540e-01, -2.31666290e-01,  1.33624317e-01,
        -1.54542128e+00],
       [ 1.32957065e+00,  6.32224008e-01, -5.34757482e-01,
        -3.11797925e+00,  1.98344263e+00,  1.9851
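To make the roles of the intercepts and weights concrete, the following sketch computes one prediction by hand for a tiny network with 2 inputs, 2 hidden nodes, and 1 output. All numbers are made up for illustration, and the function names `relu` and `forward` are hypothetical. It assumes the MLPRegressor defaults: ReLU activation in the hidden layer and an identity (linear) output node.

```python
def relu(v):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, v)

def forward(x, coefs, intercepts):
    """One forward pass through a network with a single hidden layer.

    coefs[0][i][j]   : weight from input i to hidden node j
    intercepts[0][j] : intercept (bias) of hidden node j
    coefs[1][j][0]   : weight from hidden node j to the output node
    intercepts[1][0] : intercept (bias) of the output node
    """
    w_hidden, w_out = coefs
    b_hidden, b_out = intercepts
    hidden = [
        relu(b_hidden[j] + sum(x[i] * w_hidden[i][j] for i in range(len(x))))
        for j in range(len(b_hidden))
    ]
    return b_out[0] + sum(hidden[j] * w_out[j][0] for j in range(len(hidden)))

# Made-up weights and intercepts, laid out like coefs_ and intercepts_.
coefs = [[[0.5, -1.0], [0.25, 2.0]], [[2.0], [1.5]]]
intercepts = [[0.1, -0.2], [1.0]]

x = [1.0, 2.0]  # one already-scaled record with two predictors
prediction = forward(x, coefs, intercepts)
print(prediction)
```

The same layout applies to a fitted model: `coefs_[0]` has one row per input and one column per hidden node, and `coefs_[1]` has one row per hidden node.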


c. Using the developed neural network model, make in Python predictions for the outcome
variable (MVALUE) using the scaled validation predictors. Based on these predictions,
develop and display in Python a table for the first five validation records that contain
actual and predicted median prices (MVALUE), and their residuals. Present this table in
your report.

In [15]:
# Make 'MVALUE' predictions for validation set using Boston housing 
# neural network model. 

# Use boston_reg model to predict 'MVALUE' outcome
# for validation set.
mvalue_pred = np.round(boston_reg.predict(valid_X_sc), decimals=2)

# Create data frame to display prediction results for
# validation set. 
mvalue_pred_result = pd.DataFrame({'Actual': valid_y, 
                'Prediction': mvalue_pred, 'Residual': valid_y-mvalue_pred})

print('Predictions for House Price for Validation Partition')
print(mvalue_pred_result.head(5))

Predictions for House Price for Validation Partition
     Actual  Prediction  Residual
307    28.2       29.63     -1.43
343    23.9       25.81     -1.91
47     16.6       20.65     -4.05
67     22.0       20.63      1.37
362    20.8       24.47     -3.67


d. Identify and display in Python the common accuracy measures for training and validation
partitions. Provide and compare these accuracy measures in your report based on RMSE
and MAPE, and assess the possibility of overfitting. Would you recommend applying this
neural network model for predictions? Briefly explain.

In [16]:
# Neural network model accuracy measures for training and
# validation partitions. 

# Identify and display neural network model accuracy measures 
# for training partition.
print('Accuracy Measures for Training Partition for Neural Network')
regressionSummary(train_y, boston_reg.predict(train_X_sc))

# Identify and display neural network accuracy measures 
# for validation partition.
print()
print('Accuracy Measures for Validation Partition for Neural Network')
regressionSummary(valid_y, boston_reg.predict(valid_X_sc))

"""
TODO
"""

Accuracy Measures for Training Partition for Neural Network

Regression statistics

                      Mean Error (ME) : -0.0033
       Root Mean Squared Error (RMSE) : 1.5851
            Mean Absolute Error (MAE) : 1.1342
          Mean Percentage Error (MPE) : -0.9031
Mean Absolute Percentage Error (MAPE) : 6.1132

Accuracy Measures for Validation Partition for Neural Network

Regression statistics

                      Mean Error (ME) : -0.5680
       Root Mean Squared Error (RMSE) : 3.9407
            Mean Absolute Error (MAE) : 2.7470
          Mean Percentage Error (MPE) : -5.4903
Mean Absolute Percentage Error (MAPE) : 14.5074
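For reference, RMSE and MAPE can be reproduced by hand. The sketch below applies the usual definitions to the five validation records shown in the prediction table in 2c, so its results describe only those five rows and will differ from the full-partition values reported above. It assumes the residual is actual minus predicted and that MAPE is expressed in percent.

```python
from math import sqrt

# First five validation records from the prediction table in 2c.
actual = [28.2, 23.9, 16.6, 22.0, 20.8]
predicted = [29.63, 25.81, 20.65, 20.63, 24.47]

residuals = [a - p for a, p in zip(actual, predicted)]
n = len(residuals)

# RMSE: square root of the mean squared residual (same units as MVALUE).
rmse = sqrt(sum(r ** 2 for r in residuals) / n)

# MAPE: mean of |residual / actual|, expressed as a percentage.
mape = 100 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n

print(f"RMSE: {rmse:.4f}")
print(f"MAPE: {mape:.4f}")
```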



##### 3. Develop an improved neural network model with grid search.

a. Use GridSearchCV() function to identify the best number of nodes for the hidden layer in
the Boston Housing neural network model. For that, consider the hidden_layer_sizes
parameter in a range from 2 to 20. Provide in your report the best score and best
parameter value.

In [17]:
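# Identify grid search parameters: hidden-layer sizes 2 through 19
# (range(2, 20) excludes its upper bound, so 20 itself is not searched).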
param_grid = {
    'hidden_layer_sizes': list(range(2, 20))
}

# Utilize GridSearchCV() to identify the best number 
# of nodes in the hidden layer. 
gridSearch = GridSearchCV(MLPRegressor(solver='lbfgs', max_iter=10000, random_state=1), 
                          param_grid, cv=5, n_jobs=-1, return_train_score=True)
gridSearch.fit(train_X_sc, train_y)

# Display the best score and best parameter value.
print(f'Best score:{gridSearch.best_score_:.4f}')
print('Best parameter: ', gridSearch.best_params_)

Best score:0.8877
Best parameter:  {'hidden_layer_sizes': 2}
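Note that the best score is not an error measure. When no scoring argument is given, GridSearchCV uses the estimator's own score method, which for scikit-learn regressors is the coefficient of determination R², and best_score_ is its mean over the cross-validation folds. The sketch below computes R² by hand on made-up numbers; the function name `r_squared` is only for this illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [10.0, 20.0, 30.0, 40.0]   # made-up outcome values
perfect = list(actual)              # predictions that match exactly
mean_only = [25.0] * len(actual)    # always predict the mean of actual

print(r_squared(actual, perfect))
print(r_squared(actual, mean_only))
```

A value of 1 means perfect predictions, and a value of 0 means the model does no better than always predicting the mean, so higher is better.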


b. Train an improved neural network model using MLPRegressor() with the scaled training
data set and the best identified value of the parameter from the previous question. The
rest of the parameters remain the same as in model developed in 2b. Present in your
report the final intercepts and network weights of the improved neural network model.

In [18]:
boston_reg_opt_layers = MLPRegressor(hidden_layer_sizes=(2,),
                solver='lbfgs', max_iter=10000, random_state=1)
boston_reg_opt_layers.fit(train_X_sc, train_y)

# PRESENT IN REPORT
# Display network structure with the final values of 
# intercepts (Theta) and weights (W).
print('Final Intercepts for Boston Housing Neural Network Model')
print(boston_reg_opt_layers.intercepts_)

print()
print('Network Weights for Boston Housing Neural Network Model')
print(boston_reg_opt_layers.coefs_)

Final Intercepts for Boston Housing Neural Network Model
[array([-5.8049252 ,  9.24593197]), array([6.40051439])]

Network Weights for Boston Housing Neural Network Model
[array([[-0.30389718, -1.13481572],
       [-0.82423068,  0.01522733],
       [ 3.41753368, -0.19618274],
       [-0.8772186 , -0.35023142],
       [-1.38317186,  2.43543214],
       [ 0.02280384, -0.99890554],
       [-0.33525388, -1.02588688],
       [ 3.41626164, -0.42845199],
       [ 1.70241347, -1.51513226],
       [-1.32750925, -0.71321435],
       [-1.27314847, -0.4747823 ],
       [ 0.23793571,  0.12117713],
       [ 3.18966496,  1.48465918]]), array([[2.28575643],
       [1.50263471]])]
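The improved model is much smaller than the 10-node model from 2b. A rough count of trainable parameters (weights plus intercepts) for a network with one hidden layer can be sketched as follows; the function name `count_parameters` is hypothetical.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Weights plus intercepts for a network with one hidden layer."""
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    intercepts = n_hidden + n_outputs
    return weights + intercepts

# 13 predictors, one output node (MVALUE).
params_10_nodes = count_parameters(13, 10)
params_2_nodes = count_parameters(13, 2)

print(params_10_nodes)
print(params_2_nodes)
```

With 13 predictors this gives 151 parameters for 10 hidden nodes and 31 for 2 hidden nodes, which matches the shapes of the printed arrays: 13 rows of 2 weights, 2 hidden intercepts, 2 output weights, and 1 output intercept.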


c. Identify and display in Python the common accuracy measures for the training and
validation partitions with the improved neural network model. Provide and compare
these accuracy measures, specifically RMSE and MAPE, in your report and assess the
possibility of overfitting. Would you recommend applying this neural network model for
predictions? Briefly explain.

In [19]:
# Neural network model accuracy measures for training and
# validation partitions. 

# Identify and display neural network model accuracy measures 
# for training partition.
print('Accuracy Measures for Training Partition for Neural Network')
regressionSummary(train_y, boston_reg_opt_layers.predict(train_X_sc))

# Identify and display neural network accuracy measures 
# for validation partition.
print()
print('Accuracy Measures for Validation Partition for Neural Network')
regressionSummary(valid_y, boston_reg_opt_layers.predict(valid_X_sc))

"""
TODO
"""

Accuracy Measures for Training Partition for Neural Network

Regression statistics

                      Mean Error (ME) : -0.0001
       Root Mean Squared Error (RMSE) : 2.6987
            Mean Absolute Error (MAE) : 2.0674
          Mean Percentage Error (MPE) : -1.8526
Mean Absolute Percentage Error (MAPE) : 10.6337

Accuracy Measures for Validation Partition for Neural Network

Regression statistics

                      Mean Error (ME) : 0.1367
       Root Mean Squared Error (RMSE) : 3.0185
            Mean Absolute Error (MAE) : 2.2846
          Mean Percentage Error (MPE) : -2.7484
Mean Absolute Percentage Error (MAPE) : 12.1011
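One way to make the overfitting comparison concrete is to put the training-to-validation gap of both models side by side. The sketch below uses the RMSE and MAPE values copied from the regressionSummary outputs in 2d and 3c; the dictionary layout and the names `results` and `gaps` are illustrative choices, not part of the assignment.

```python
# Accuracy measures copied from the regressionSummary outputs in 2d and 3c.
results = {
    "10 hidden nodes (2b)": {"train_rmse": 1.5851, "valid_rmse": 3.9407,
                             "train_mape": 6.1132, "valid_mape": 14.5074},
    "2 hidden nodes (3b)": {"train_rmse": 2.6987, "valid_rmse": 3.0185,
                            "train_mape": 10.6337, "valid_mape": 12.1011},
}

gaps = {}
for name, m in results.items():
    gaps[name] = {
        # Ratio near 1: similar error on training and validation records.
        "rmse_ratio": m["valid_rmse"] / m["train_rmse"],
        # Difference in percentage points between validation and training MAPE.
        "mape_diff": m["valid_mape"] - m["train_mape"],
    }
    print(f"{name}: validation/training RMSE = {gaps[name]['rmse_ratio']:.2f}, "
          f"MAPE difference = {gaps[name]['mape_diff']:.2f} points")
```

A ratio close to 1 means the model performs about as well on unseen records as on the training records, while a much larger validation error points toward overfitting. On these figures the gap is noticeably wider for the 10-node model than for the 2-node model.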



d. Present and compare the accuracy measures for the validation partition from the
Backward Elimination model for the multiple linear regression in case study #1 and the
validation partition for the improved neural network model in this case. Which of the
models would you recommend for predictions? Briefly explain.

In [20]:
# refer to case study #1