<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Boston-Housing-Prices-Regression-Modeling-with-Keras" data-toc-modified-id="Boston-Housing-Prices-Regression-Modeling-with-Keras-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Boston Housing Prices Regression Modeling with Keras</a></span></li><li><span><a href="#Purpose" data-toc-modified-id="Purpose-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Purpose</a></span></li><li><span><a href="#Load-libraries-and-data" data-toc-modified-id="Load-libraries-and-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Load libraries and data</a></span></li><li><span><a href="#Helper-functions" data-toc-modified-id="Helper-functions-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Helper functions</a></span></li><li><span><a href="#Inspect-and-visualize-the-data" data-toc-modified-id="Inspect-and-visualize-the-data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Inspect and visualize the data</a></span></li><li><span><a href="#Model-the-data" data-toc-modified-id="Model-the-data-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Model the data</a></span><ul class="toc-item"><li><span><a href="#Create-validation-data-set" data-toc-modified-id="Create-validation-data-set-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Create validation data set</a></span></li><li><span><a href="#Build-models" data-toc-modified-id="Build-models-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Build models</a></span><ul class="toc-item"><li><span><a href="#Build-model-function" data-toc-modified-id="Build-model-function-6.2.1"><span class="toc-item-num">6.2.1&nbsp;&nbsp;</span>Build model function</a></span></li><li><span><a href="#Initial-pass" data-toc-modified-id="Initial-pass-6.2.2"><span class="toc-item-num">6.2.2&nbsp;&nbsp;</span>Initial pass</a></span></li><li><span><a href="#Grid-search-hyperparameter-tuning" data-toc-modified-id="Grid-search-hyperparameter-tuning-6.2.3"><span 
class="toc-item-num">6.2.3&nbsp;&nbsp;</span>Grid search hyperparameter tuning</a></span><ul class="toc-item"><li><span><a href="#Optimizers" data-toc-modified-id="Optimizers-6.2.3.1"><span class="toc-item-num">6.2.3.1&nbsp;&nbsp;</span>Optimizers</a></span></li><li><span><a href="#Epochs-and-Batch-Size" data-toc-modified-id="Epochs-and-Batch-Size-6.2.3.2"><span class="toc-item-num">6.2.3.2&nbsp;&nbsp;</span>Epochs and Batch Size</a></span></li><li><span><a href="#Learning-rate" data-toc-modified-id="Learning-rate-6.2.3.3"><span class="toc-item-num">6.2.3.3&nbsp;&nbsp;</span>Learning rate</a></span></li><li><span><a href="#Decay" data-toc-modified-id="Decay-6.2.3.4"><span class="toc-item-num">6.2.3.4&nbsp;&nbsp;</span>Decay</a></span></li><li><span><a href="#Epsilon" data-toc-modified-id="Epsilon-6.2.3.5"><span class="toc-item-num">6.2.3.5&nbsp;&nbsp;</span>Epsilon</a></span></li><li><span><a href="#Comments" data-toc-modified-id="Comments-6.2.3.6"><span class="toc-item-num">6.2.3.6&nbsp;&nbsp;</span>Comments</a></span></li></ul></li><li><span><a href="#Predictions" data-toc-modified-id="Predictions-6.2.4"><span class="toc-item-num">6.2.4&nbsp;&nbsp;</span>Predictions</a></span></li></ul></li></ul></li><li><span><a href="#Final-comments" data-toc-modified-id="Final-comments-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Final comments</a></span></li></ul></div>

<h1>Boston Housing Prices Regression Modeling with Keras</h1>

<img style="float: left; margin-right: 15px; width: 40%; height: 40%; " src="images/boston.jpg" />

# Purpose

The purpose of this write-up is to build upon the [first](https://nbviewer.jupyter.org/github/nrasch/Portfolio/blob/master/Machine-Learning/Python/04-Classic-Datasets/Model-02.ipynb) write-up involving the Boston housing prices dataset.  

Goals include:
* Build a predictive regression model via neural networks
* Utilize GridSearchCV for hyperparameter tuning

Dataset source:  [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)

# Load libraries and data

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Load libraries
import os

import numpy as np
from numpy import arange

from math import sqrt

from matplotlib import pyplot

from pandas import read_csv
from pandas import set_option
from pandas.plotting import scatter_matrix
from pandas import DataFrame

from sklearn.preprocessing import StandardScaler

from sklearn.decomposition import PCA

from keras.wrappers.scikit_learn import KerasRegressor
from keras.models import Sequential
from keras.layers import Dense

from keras.optimizers import Adam
from keras.optimizers import SGD

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.feature_selection import f_regression
from sklearn.feature_selection import RFE

from sklearn.pipeline import Pipeline
from sklearn.pipeline import FeatureUnion

from sklearn.metrics import mean_squared_error

Using TensorFlow backend.


In [3]:
dataFile = os.path.join(".", "datasets", "housing.csv")
data = read_csv(dataFile, header = 0, delim_whitespace = True)

# Helper functions

In [4]:
def corrTableColors(value):
    color = 'black'

    if value == 1:
        color = 'white'
    elif value < -0.7:
        color = 'red'
    elif value > 0.7:
        color = 'green'

    return 'color: %s' % color

In [5]:
def makeRange(start, stop, step, multi, dec):
    vals = []
    for i in range(start, stop, step):
        vals.append(np.round(multi * i, decimals = dec))
        
    return vals
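
For reference, the grids that `makeRange()` expands to later in this write-up can be checked with a small sketch. The version below is a dependency-free stand-in: it uses the built-in `round()` instead of `np.round`, and the name `make_range` is only for illustration. The element types differ slightly from the NumPy original, but the values should be the same.

```python
def make_range(start, stop, step, multi, dec):
    # Multiply each integer in range(start, stop, step) by `multi`
    # and round the result to `dec` decimal places
    return [round(multi * i, dec) for i in range(start, stop, step)]

lr_grid = make_range(1, 6, 1, 0.001, 3)        # learning rate grid
epsilon_grid = make_range(2, 8, 1, 0.5, 1)     # epsilon grid
epoch_grid = make_range(100, 500, 100, 1, 1)   # epochs grid

print(lr_grid)       # [0.001, 0.002, 0.003, 0.004, 0.005]
print(epsilon_grid)  # [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(epoch_grid)    # [100, 200, 300, 400]
```

Note that, as with `range()`, the `stop` value itself is never included.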

# Inspect and visualize the data

Please see the [first Boston housing data write-up](https://nbviewer.jupyter.org/github/nrasch/Portfolio/blob/master/Machine-Learning/Python/04-Classic-Datasets/Model-02.ipynb#Inspect-and-visualize-the-data) for details on this topic.

# Model the data

## Create validation data set

In [16]:
# Separate X and Y values
x = data.values[:, 0:len(data.columns) - 1]
y = data.values[:, len(data.columns) - 1]

print("x.shape = ", x.shape)
print("y.shape = ", y.shape)

# Split out validation set -- 80/20 split
seed = 10
valSize = 0.2

xTrain, xVal, yTrain, yVal = train_test_split(x, y, test_size = valSize, random_state = seed)

print("--------")
print("xTrain.shape = ", xTrain.shape)
print("yTrain.shape = ", yTrain.shape)
print("xVal.shape = ", xVal.shape)
print("yVal.shape = ", yVal.shape)

x.shape =  (506, 13)
y.shape =  (506,)
--------
xTrain.shape =  (404, 13)
yTrain.shape =  (404,)
xVal.shape =  (102, 13)
yVal.shape =  (102,)
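
The shapes above can be reproduced by hand. The sketch below assumes that `train_test_split` rounds the validation share up to a whole number of rows and assigns the remainder to training, which is consistent with the 404/102 split reported above.

```python
from math import ceil

n_samples = 506   # rows in the housing data
val_size = 0.2    # same fraction as valSize above

# 0.2 * 506 = 101.2, which is rounded up to 102 validation rows
n_val = ceil(val_size * n_samples)
n_train = n_samples - n_val

print("train rows =", n_train)  # 404
print("val rows   =", n_val)    # 102
```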


## Build models

### Build model function

More info on the `kernel_initializer`:  https://keras.io/initializers/

In [29]:
def buildModel(optimizer = 'Adam', lr = 0.001, decay = 0.0, epsilon = None):
    opt = None
    
    model = Sequential()
    
    # kernel_initializer='normal' -> Initial weights are drawn from a normal distribution (Keras RandomNormal)
    # bias_initializer -> 'zeros' (default per the docs)
    
    model.add(Dense(20, input_dim = xTrain.shape[1], kernel_initializer='normal', activation = 'relu'))
    model.add(Dense(10, kernel_initializer='normal', activation = 'relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    if optimizer.lower() == 'adam':
        opt = Adam(lr = lr, decay = decay, epsilon = epsilon)
    else:
        # Please don't ever use eval where you're receiving input from non-trusted sources!
        # A Jupyter notebook is OK; a public facing service is certainly not
        opt = eval(optimizer)()
    
    model.compile(loss = 'mean_squared_error', optimizer = opt)
    
    return model   
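
As the comment in `buildModel()` warns, `eval` is only acceptable in a trusted notebook. One possible alternative is an explicit lookup table of allowed optimizers. The sketch below uses stand-in classes so that it runs without Keras installed; in the notebook the dictionary values would be the `Adam` and `SGD` classes imported from `keras.optimizers`. The names `OPTIMIZERS` and `make_optimizer` are hypothetical helpers, not part of Keras.

```python
# Stand-in classes so the sketch is self-contained; in the notebook these
# would be the Adam and SGD classes imported from keras.optimizers
class SGD:
    def __init__(self, **kwargs):
        self.config = kwargs

class Adam:
    def __init__(self, **kwargs):
        self.config = kwargs

# Explicit whitelist of optimizer names (hypothetical helper, not a Keras API)
OPTIMIZERS = {'sgd': SGD, 'adam': Adam}

def make_optimizer(name, **kwargs):
    """Look up an optimizer class by name and instantiate it."""
    try:
        optimizer_cls = OPTIMIZERS[name.lower()]
    except KeyError:
        raise ValueError("Unknown optimizer: %r" % name) from None
    return optimizer_cls(**kwargs)

opt = make_optimizer('Adam', lr = 0.003)
print(type(opt).__name__, opt.config)
```

Anything that is not in the table raises a `ValueError` instead of being executed as code.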

### Initial pass

For this first pass an educated guess is taken for what might work well on the dataset.  This provides an initial baseline, and then hyperparameter tuning can occur to refine the model.

In [31]:
# Define vars and init
folds = 10
seed = 10

np.random.seed(seed)

model = KerasRegressor(build_fn = buildModel, epochs = 200, batch_size = 5, verbose = 0)
kFold = KFold(n_splits = folds, random_state = seed)
results = cross_val_score(model, xTrain, yTrain, cv = kFold)

print("MSE: %.2f (%.2f)" % (results.mean(), results.std()))

MSE: -18.20 (7.33)


This is better than what the previous write-up's models accomplished, with no tuning as of yet:

<pre>
         Model    MSE  StdDev
3    scaledKNN -20.35   11.87
0     scaledLR -21.26    7.11
4   scaledCART -22.66    9.31
1  scaledLASSO -26.94   10.38
5    scaledSVR -28.52   13.98
2     scaledEN -28.60   11.65
</pre>

It does not, however, compare to the results achieved via the ensemble methods:

<pre>
       Model     MSE  StdDev
1  scaledGBM -9.700   5.342 
3  scaledET  -10.339  5.399 
2  scaledRF  -13.695  7.276 
0  scaledAB  -14.176  8.917
</pre>
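
The scores in these tables are negated MSE values, so a score closer to zero is better. To read a score in the same units as the target, it can be converted to an RMSE. This is a minimal sketch; the helper name `score_to_rmse` is only for illustration, and the square root of a mean MSE is not identical to the mean of the per-fold RMSEs.

```python
from math import sqrt

def score_to_rmse(neg_mse):
    # Flip the sign back to a positive MSE, then take the square root
    return sqrt(-neg_mse)

# Scores copied from the results above
scores = {
    'Neural net (untuned)': -18.20,
    'scaledKNN': -20.35,
    'scaledGBM': -9.700,
}

for name, score in scores.items():
    print("%-22s MSE = %6.2f  RMSE = %.2f" % (name, -score, score_to_rmse(score)))
```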

### Grid search hyperparameter tuning

Below we'll be using grid search to perform hyperparameter tuning.  We'll consider looking at hyperparameters such as:

* Epochs
* Batch size
* Optimization algorithm
* Learning rate
* Decay
* Epsilon

For a production quality model we'd want to test each of the hyperparameters above in combination with the others to find an optimal combination.  We'd likely perform the initial search using a randomized parameter optimization procedure.  The results from that initial tuning effort would then lead to a series of progressively smaller grids that narrow in on whatever parameter permutations showed the most promise.

You can see an example of this type of process I worked on previously [here](https://nbviewer.jupyter.org/github/nrasch/Portfolio/blob/master/Machine-Learning/Python/03-ComputerVision-Classification/Classification-03.ipynb).

The tuning process could take hours, days, or even weeks depending on the data and model.  For the purpose of this write-up, however, we'll consider the hyperparameter options above singly or perhaps in pairs over a small range.  This will allow us to observe how each hyperparameter influences the fit of the model, and yet still be able to run this in a reasonable amount of time (i.e. minutes, not hours).
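
To get a rough feel for the cost difference, the sketch below counts model fits for one exhaustive grid versus the staged searches used in the sections that follow. The value counts are taken from the grids in this write-up; the one extra refit that each search performs on the best parameters is ignored here.

```python
from math import prod

folds = 10

# Number of values tried per hyperparameter in the sections below
grid_sizes = {
    'epochs': 4,       # 100, 200, 300, 400
    'batch_size': 3,   # 4, 16, 32
    'optimizer': 2,    # SGD, Adam
    'lr': 5,           # 0.001 through 0.005
    'decay': 3,        # 1e-4, 1e-5, 1e-6
    'epsilon': 6,      # 1.0 through 3.5
}

# One exhaustive grid tries every combination of every value
full_grid = prod(grid_sizes.values())

# Candidates per staged search: optimizers, epochs x batch size, lr, decay, epsilon
staged = [2, 4 * 3, 5, 3, 6]

print("Full grid:   %d candidates, %d fits" % (full_grid, full_grid * folds))
print("Staged runs: %d candidates, %d fits" % (sum(staged), sum(staged) * folds))
```

The staged approach needs 280 fits instead of 21,600, at the price of never seeing how the hyperparameters interact.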

We're going to start by importing a function written in a previous write-up:

In [30]:
def tuneModel(modelName, modelObj, params, returnModel = False, showSummary = True):
    # Init vars and params
    featureResults = {}
    featureFolds = 10
    featureSeed = 10
    
    np.random.seed(featureSeed)
    
    # Use MSE since this is a regression problem
    score = 'neg_mean_squared_error'

    # Create a Pandas DF to hold all our spiffy results
    featureDF = DataFrame(columns = ['Model', 'Accuracy', 'Best Params'])

    # Create feature union
    features = []
    features.append(('Scaler', StandardScaler()))
    featureUnion = FeatureUnion(features)

    # Search for the best combination of parameters
    featureResults = GridSearchCV(
        Pipeline(
            steps = [
                ('FeatureUnion', featureUnion),
                (modelName, modelObj)
        ]),
        param_grid = params,
        scoring = score,
        cv = KFold(n_splits = featureFolds, random_state = featureSeed)      
    ).fit(xTrain, yTrain)

    featureDF.loc[len(featureDF)] = list([
        modelName, 
        featureResults.best_score_,
        featureResults.best_params_,
    ])

    if showSummary:
        set_option('display.max_colwidth', -1)
        display(featureDF)
    
    if returnModel:
        return featureResults

Sanity check to ensure everything works as expected:

In [33]:
modelName = "housingModel"
modelObj =  KerasRegressor(build_fn = buildModel, verbose = 0)
params = {
    'housingModel__epochs' : [ 200 ],
    'housingModel__batch_size' : [ 5 ],
}

m = tuneModel(modelName, modelObj, params, True, True)

Unnamed: 0,Model,Accuracy,Best Params
0,housingModel,-12.91,"{'housingModel__batch_size': 5, 'housingModel__epochs': 200}"


This appears to be a reasonable outcome considering that the means of the various features differed quite a bit.  The `StandardScaler` clearly improved the model's performance.  We can now engage in further tuning.
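
A note on the parameter names: because the estimator sits inside a `Pipeline`, each grid key is the pipeline step name, a double underscore, and then the parameter name. The sketch below is a plain-Python illustration of that naming convention only, not scikit-learn's actual routing code, and `split_param` is a hypothetical helper.

```python
params = {
    'housingModel__epochs': [200],
    'housingModel__batch_size': [5],
}

def split_param(key):
    # Split on the first double underscore: step name on the left,
    # parameter name on the right
    step, _, name = key.partition('__')
    return step, name

for key, values in params.items():
    step, name = split_param(key)
    print("step = %s, parameter = %s, values = %s" % (step, name, values))
```

This is why the step name passed to `tuneModel()` has to match the prefix used in the `params` dictionary.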

#### Optimizers

The following link provides more information on available optimizers and options for each:  https://keras.io/optimizers/

We'll try Adam and SGD for a set number of epochs, and depending on which provides the best fit we'll likely be able to examine some additional optimizer specific options.

In [34]:
# Modify buildModel() signature:
# def buildModel(optimizer = 'adam'):

modelName = "housingModel"
modelObj =  KerasRegressor(build_fn = buildModel, verbose = 0)
params = {
    'housingModel__epochs' : [200],
    'housingModel__optimizer' : ['SGD', 'Adam']
}

m = tuneModel(modelName, modelObj, params, True, True)

Unnamed: 0,Model,Accuracy,Best Params
0,housingModel,-15.33,"{'housingModel__epochs': 200, 'housingModel__optimizer': 'Adam'}"


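The score appears to be worse than the sanity check, but this is most likely because we removed the `batch_size` parameter from the grid, so Keras fell back to its default batch size of 32 rather than the 5 used in the sanity check.  (Note that the Keras `SGD` optimizer is mini-batch based and uses whatever `batch_size` is passed to `fit()`; it does not imply a batch size of one.)  Since Adam outperformed SGD we can look for an optimal `batch_size` value next.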
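
To put numbers on the effect of `batch_size`: Keras' `fit()` defaults to a batch size of 32 when none is given, and the batch size determines how many weight updates happen per epoch. The sketch below assumes 10-fold cross-validation on the 404 training rows, which leaves roughly 363 rows for training in each fold; `updates_per_epoch` is a hypothetical helper.

```python
from math import ceil

n_train = 404
folds = 10
epochs = 200

# The largest held-out fold has ceil(404 / 10) = 41 rows, leaving 363 to train on
fold_train = n_train - ceil(n_train / folds)

def updates_per_epoch(n_samples, batch_size):
    # One weight update per mini-batch; the last batch may be smaller
    return ceil(n_samples / batch_size)

for batch_size in (5, 32):
    per_epoch = updates_per_epoch(fold_train, batch_size)
    print("batch_size = %2d -> %2d updates per epoch, %5d over %d epochs"
          % (batch_size, per_epoch, per_epoch * epochs, epochs))
```

With a batch size of 5 the model receives about six times as many weight updates in the same number of epochs, which may account for much of the difference in score.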

#### Epochs and Batch Size

We'll combine epochs and batch size, since the number of training iterations may have some synergy with the number of training dataset items the model is exposed to at a time.

In [35]:
modelName = "housingModel"
modelObj =  KerasRegressor(build_fn = buildModel, verbose = 0)
params = {
    'housingModel__epochs' : makeRange(100, 500, 100, 1, 1),
    'housingModel__batch_size' : [4, 16, 32],
    'housingModel__optimizer' : ['Adam']
}

tuneModel(modelName, modelObj, params)

Unnamed: 0,Model,Accuracy,Best Params
0,housingModel,-13.0,"{'housingModel__batch_size': 32, 'housingModel__epochs': 300, 'housingModel__optimizer': 'Adam'}"


#### Learning rate

We can examine the estimator defaults to give us an idea of what sort of range to test the learning rate over.  According to [this](https://keras.io/optimizers/) link Adam has the following defaults:

```python
keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
```

As such we'll start with the default learning rate of 0.001 and move towards 0.005.  We'll also need to change the `buildModel()` function's signature to accept the additional `lr` parameter.

In [36]:
# Modify buildModel() signature:
# def buildModel(optimizer = 'adam', lr = 0.001):

modelName = "housingModel"
modelObj =  KerasRegressor(build_fn = buildModel, verbose = 0)
params = {
    'housingModel__epochs' : [300],
    'housingModel__batch_size' : [32],
    'housingModel__optimizer' : ['Adam'],
    'housingModel__lr' : makeRange(1, 6, 1, .001, 3),
}

tuneModel(modelName, modelObj, params)

Unnamed: 0,Model,Accuracy,Best Params
0,housingModel,-12.8,"{'housingModel__batch_size': 32, 'housingModel__epochs': 300, 'housingModel__lr': 0.003, 'housingModel__optimizer': 'Adam'}"


So we see some progress, but nothing earth-shattering, which is perhaps to be expected since we're tuning the parameters one at a time rather than together as a composite.

#### Decay

In [37]:
# Modify buildModel() signature:
# def buildModel(optimizer = 'adam', lr = 0.001, decay = 0.0):

modelName = "housingModel"
modelObj =  KerasRegressor(build_fn = buildModel, verbose = 0)
params = {
    'housingModel__epochs' : [300],
    'housingModel__batch_size' : [32],
    'housingModel__optimizer' : ['Adam'],
    'housingModel__lr' : [0.003],
    'housingModel__decay' : [0.0001, 0.00001, 0.000001],
}

tuneModel(modelName, modelObj, params)

Unnamed: 0,Model,Accuracy,Best Params
0,housingModel,-13.17,"{'housingModel__batch_size': 32, 'housingModel__decay': 0.0001, 'housingModel__epochs': 300, 'housingModel__lr': 0.003, 'housingModel__optimizer': 'Adam'}"


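None of the decay values tested improved on the -12.8 achieved without decay, so leaving `decay` at its default of 0.0 and relying on Adam's own adaptive step sizes is probably the best option.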
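
For intuition about what the `decay` values do, the sketch below applies the time-based schedule used by the Keras 2.x optimizers, where the learning rate is divided by `1 + decay * iterations`. The iteration count is an assumption: roughly 12 updates per epoch (about 363 training rows per fold at a batch size of 32) over 300 epochs. `decayed_lr` is a hypothetical helper.

```python
def decayed_lr(lr, decay, iterations):
    # Time-based decay: the effective learning rate shrinks as updates accumulate
    return lr / (1.0 + decay * iterations)

lr = 0.003
iterations = 300 * 12   # epochs x updates per epoch (assumed, see above)

for decay in (0.0001, 0.00001, 0.000001):
    final_lr = decayed_lr(lr, decay, iterations)
    print("decay = %.6f -> final lr = %.6f" % (decay, final_lr))
```

Under these assumptions only the largest decay value changes the learning rate noticeably by the end of training; the two smaller values leave it within a few percent of 0.003.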

#### Epsilon

In [26]:
# Modify buildModel() signature:
# def buildModel(optimizer = 'Adam', lr = 0.001, decay = 0.0, epsilon = None):

modelName = "housingModel"
modelObj =  KerasRegressor(build_fn = buildModel, verbose = 0)
params = {
    'housingModel__epochs' : [300],
    'housingModel__batch_size' : [32],
    'housingModel__optimizer' : ['Adam'],
    'housingModel__lr' : [0.003],
    'housingModel__epsilon' : makeRange(2, 8, 1, .5, 1),
}

m = tuneModel(modelName, modelObj, params, True, True)

Unnamed: 0,Model,Accuracy,Best Params
0,housingModel,-12.84,"{'housingModel__batch_size': 32, 'housingModel__epochs': 300, 'housingModel__epsilon': 1.0, 'housingModel__lr': 0.003, 'housingModel__optimizer': 'Adam'}"


An `epsilon` value of 1.0 scored best within this grid (-12.84), which is roughly on par with the -12.8 from the learning rate search rather than a clear improvement.  The value does match the findings in various literature and online postings I reviewed when considering what range to experiment with for this write-up.
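
To see why an `epsilon` as large as 1.0 changes Adam's behavior, recall that Adam divides the smoothed gradient by the square root of the smoothed squared gradient plus `epsilon`. The sketch below is a simplified steady-state approximation (a constant gradient, so the two moving averages equal the gradient and its square), not the full algorithm, and `adam_step` is a hypothetical helper. When `epsilon` is `None`, Keras falls back to its global fuzz factor of 1e-7.

```python
def adam_step(lr, grad, epsilon):
    # Steady-state approximation: m_hat ~ grad and v_hat ~ grad ** 2,
    # so sqrt(v_hat) is simply abs(grad)
    return lr * grad / (abs(grad) + epsilon)

lr = 0.003
for grad in (1.0, 0.01):
    for epsilon in (1e-7, 1.0):
        step = adam_step(lr, grad, epsilon)
        print("grad = %4.2f, epsilon = %.0e -> step = %.7f" % (grad, epsilon, step))
```

With the tiny default, the step size is close to `lr` regardless of the gradient's magnitude. With `epsilon = 1.0` the step shrinks along with the gradient, which makes the update behave more like plain gradient descent with momentum.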

#### Comments

Again, if we were creating a production quality model we would have started with randomized parameter optimization process.  The results from that process would then lead to a set of smaller grids focusing more and more on whatever parameter option permutations showed the most promise.

You can see an example of this type of process I worked on previously [here](https://nbviewer.jupyter.org/github/nrasch/Portfolio/blob/master/Machine-Learning/Python/03-ComputerVision-Classification/Classification-03.ipynb).

Also, unless the randomized parameter optimization process were to lead to significant improvements over what we've seen so far, we'd be better off utilizing the gradient boosting algorithm we used in the [previous write-up](https://nbviewer.jupyter.org/github/nrasch/Portfolio/blob/master/Machine-Learning/Python/04-Classic-Datasets/Model-02.ipynb#Initial-pass---Ensemble-methods).

### Predictions

**NOTE**

Hopefully this will save someone else some pain down the road:

I kept getting the following error when working on this prediction section, which frankly was driving me nuts:
    
```
TypeError: call() missing 1 required positional argument: 'inputs'
```

After researching the error message I came upon this comment, which led me to the resolution:

_The thing here is that KerasRegressor expects a callable that builds a model, rather than the model itself. By wrapping your function in this way you can return the build function (without calling it)._  [Source](https://stackoverflow.com/questions/47944463/specify-input-argument-with-kerasregressor)

Solution:  I needed to **wrap** the `buildModel()` function!  :(

Once I 'wrapped' the `buildModel()` function the prediction code blocks finally started working, and that's why we have the `wrapper()` function implemented below...

**END NOTE**

And now that that's out of the way we'll take a look at some predictions using the validation data set based on the tuning results from above.

In [120]:
# See NOTE above on why we have this new function
def wrapper(optimizer = 'Adam', lr = 0.001, decay = 0.0, epsilon = None):
    
    def buildModel():
        opt = None

        model = Sequential()

        # kernel_initializer='normal' -> Initial weights are drawn from a normal distribution (Keras RandomNormal)
        # bias_initializer -> 'zeros' (default per the docs)

        model.add(Dense(20, input_dim = xTrain.shape[1], kernel_initializer='normal', activation = 'relu'))
        model.add(Dense(10, kernel_initializer='normal', activation = 'relu'))
        model.add(Dense(1, kernel_initializer='normal'))

        if optimizer.lower() == 'adam':
            opt = Adam(lr = lr, decay = decay, epsilon = epsilon)
        else:
            # Please don't ever use eval where you're receiving input from non-trusted sources!
            # A Jupyter notebook is OK; a public facing service is certainly not
            opt = eval(optimizer)()

        model.compile(loss = 'mean_squared_error', optimizer = opt)

        return model

    return buildModel
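
The pattern in `wrapper()` is an ordinary closure: the outer function captures the arguments and returns the inner function without calling it. Stripped of the Keras details it looks like the sketch below, where a dictionary stands in for the compiled model and the names `build` and `units` are purely illustrative.

```python
def wrapper(units = 20):
    # The inner function remembers `units` from the enclosing call
    def build():
        return {'units': units}   # stand-in for a compiled Keras model

    # Return the function itself, not the result of calling it
    return build

build_fn = wrapper(units = 10)

print(callable(build_fn))   # True
print(build_fn())           # {'units': 10}
```

Each call to `build_fn()` produces a fresh object, which is what an estimator needs when it builds a new model for every fit.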

In [121]:
# Build the model, and pass the KerasRegressor a callable function to the 'build_fn' argument
# Use the parameters we found were most effective during the hyperparameter tuning
m =  KerasRegressor(
    build_fn = wrapper(optimizer = 'Adam', lr = 0.003, epsilon = 1), 
    epochs = 300, 
    batch_size = 32, 
    verbose = 0
)

# Now fit the model to the training data ensuring we perform the same sort of pipeline transformations
# that occurred during the hyperparameter tuning (i.e. feature scaling)
scaler = StandardScaler().fit(xTrain)
m.fit(scaler.transform(xTrain), yTrain)

# Now we can finally make some predictions using our trained model on unseen data
# The scaler was fit on the training data only, so no validation set statistics leak in
xScaled = scaler.transform(xVal)
preds = m.predict(xScaled)
mse = mean_squared_error(yVal, preds)
rmse = sqrt(mse)

print("MSE = ", mse)
print("RMSE = ", rmse)

MSE =  11.93922294223878
RMSE =  3.4553180667253747
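
Note that the scaler above is fit on `xTrain` and then applied to `xVal`. The sketch below illustrates the idea with toy numbers and the standard library only: the mean and (population) standard deviation come from the training values, and the same two numbers are reused for the validation values. All names and data here are made up for illustration.

```python
from statistics import mean, pstdev

train = [2.0, 4.0, 6.0, 8.0]   # toy training feature
val = [5.0, 11.0]              # toy validation feature

# Statistics are learned from the training data only
mu = mean(train)
sigma = pstdev(train)   # population standard deviation

def standardize(values):
    # Apply the training statistics to any data set
    return [(v - mu) / sigma for v in values]

train_scaled = standardize(train)
val_scaled = standardize(val)

print("train:", [round(v, 3) for v in train_scaled])
print("val:  ", [round(v, 3) for v in val_scaled])
```

The scaled training values have a mean of zero and unit variance by construction, while the validation values generally do not. That is expected, and it avoids leaking information about the validation set into the model.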


# Final comments

We have some improvement over the [first write-up](https://nbviewer.jupyter.org/github/nrasch/Portfolio/blob/master/Machine-Learning/Python/04-Classic-Datasets/Model-02.ipynb), but nothing earth shattering.  In fact, the difference is small enough that it could likely be explained away as a statistical blip.  In the [next write-up](https://nbviewer.jupyter.org/github/nrasch/Portfolio/blob/master/Machine-Learning/Python/04-Classic-Datasets/Model-02.Keras.2.ipynb) we'll see what kind of results we can achieve using RandomizedSearchCV.  

The results so far for reference:

|Model     |Write-up              |Prediction MSE|
|----------|------------------------|--------------|
|GB        | Model-02.Keras         | 12.24        |
|Neural Net| Model-02.Keras.1.ipynb | 11.94        |