## Neural Networks Regression - Boston Housing Price prediction

In [None]:
# Install libraries:
# pip install keras
# pip install tensorflow

### Importing Libraries

In [1]:
import pandas
from keras.models import Sequential
from keras.layers import Dense
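# Note: keras.wrappers.scikit_learn ships only with older Keras releases;
# newer environments may need the separate scikeras package instead
from keras.wrappers.scikit_learn import KerasRegressor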
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

### Reading Dataset and splitting into features and target variable

In [2]:
dataframe = pandas.read_csv("./housing.data", delim_whitespace=True, header=None)
dataset = dataframe.values

# splitting the dataset into features (X) and the output target variable, price (Y)
X = dataset[:,0:13]
Y = dataset[:,13]

In [3]:
X

array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
        4.9800e+00],
       [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
        9.1400e+00],
       [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
        4.0300e+00],
       ...,
       [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
        5.6400e+00],
       [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
        6.4800e+00],
       [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
        7.8800e+00]])

Below we define the function that creates the baseline model to be evaluated. It is a simple model with a single fully connected hidden layer that has the same number of neurons as input attributes (13). The network uses good practices such as the rectifier (ReLU) activation function for the hidden layer. No activation function is used for the output layer because this is a regression problem and we want to predict numerical values directly, without any transform.
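To make the description above concrete, here is a minimal pure-Python sketch of the forward pass such a network computes: a ReLU hidden layer followed by a single linear output unit. The helper names (`relu`, `dense`, `forward`) are hypothetical, and the random weights with standard deviation 0.05 are an assumption for illustration only; Keras handles all of this internally.

```python
import random

def relu(values):
    # Rectifier activation: negative values become zero
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # weights[j][i] is the weight from input i to unit j
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer with ReLU, then a linear output (no activation)
    hidden = relu(dense(x, w_hidden, b_hidden))
    return dense(hidden, w_out, b_out)[0]

random.seed(0)
n_inputs, n_hidden = 13, 13
# Illustrative small random weights (assumed std of 0.05), zero biases
w_hidden = [[random.gauss(0.0, 0.05) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[random.gauss(0.0, 0.05) for _ in range(n_hidden)]]
b_out = [0.0]

x = [1.0] * n_inputs  # a made-up input row, not taken from the dataset
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
print("Untrained prediction:", y_hat)
```

Because the output unit is linear, the prediction can take any real value, which is what a regression target needs.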

The efficient Adam optimization algorithm is used, and a mean squared error (MSE) loss function is minimized. The same metric is used to evaluate the performance of the model. It is a convenient metric because taking its square root gives an error value in the units of the target, which we can interpret directly in the context of the problem (thousands of dollars).

In [4]:
# define baseline model to evaluate Neural network
def baseline_model():
    # Creating a sequential object for adding sequences of layers
    model = Sequential()
    # Adding a single hidden layer with the same number of neurons as input features (13);
    # input_dim=13 declares the input size. The activation function is 'relu'.
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    
    # Being a regression problem, the output layer has a single neuron and no activation function
    model.add(Dense(1, kernel_initializer='normal'))
    
    # Compile the model with the 'adam' optimizer and mean squared error as the loss function
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [6]:
# Training settings shared by all models
epochs = 100
batch_size = 5

# Performs 10-fold cross validation on the dataset and returns the score of each fold
def validate_model(estimator):
    # 10-fold cross validation; the caller averages the per-fold scores
    kfold = KFold(n_splits=10)
    results = cross_val_score(estimator, X, Y, cv=kfold)
    
    return results
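As a rough sketch of what `KFold(n_splits=10)` does without shuffling, the code below cuts the row indices into 10 contiguous test folds and uses the remaining rows for training. The helper names are hypothetical, and the row count of 506 is an assumption based on the standard copy of this dataset; the real splitting is done by scikit-learn.

```python
def kfold_sizes(n_samples, n_splits):
    # The first (n_samples % n_splits) folds get one extra sample
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

def kfold_indices(n_samples, n_splits):
    # Yield (train_indices, test_indices) for contiguous, unshuffled folds
    start = 0
    for size in kfold_sizes(n_samples, n_splits):
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

n_rows = 506  # assumed number of rows in housing.data
sizes = kfold_sizes(n_rows, 10)
folds = list(kfold_indices(n_rows, 10))
print("Test fold sizes:", sizes)
```

Each row appears in exactly one test fold, so every sample is used for evaluation once and for training nine times.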


In [7]:
# evaluate Baseline model 
estimator = KerasRegressor(build_fn=baseline_model, epochs=epochs, batch_size=batch_size, verbose=0)
results = validate_model(estimator)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Results: -32.77 (31.02) MSE


In [None]:
results

### Observation:

The result reports the mean squared error as its average and standard deviation across all 10 folds of the cross validation evaluation.

Note: The mean squared error is negative because the score follows scikit-learn's convention that higher values are better, so the error is reported with its sign inverted. You can ignore the sign of the result.
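As a rough illustration of reading these scores, the sketch below drops the sign of the mean score printed above and takes its square root to get an error in thousands of dollars. The helper name `rmse_from_score` is hypothetical. Note that the square root of an averaged MSE is not the same as the average of per-fold RMSE values, so treat this as an approximation.

```python
import math

def rmse_from_score(neg_mse):
    # Drop the sign convention, then take the square root of the MSE
    return math.sqrt(abs(neg_mse))

reported_mean = -32.77  # mean score printed for the baseline model
rmse = rmse_from_score(reported_mean)
print("Approximate RMSE: %.2f thousand dollars" % rmse)
```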

### Standardisation of dataset and re-evaluating the baseline model

We can use scikit-learn’s Pipeline framework to perform the standardization during the model evaluation process, within each fold of the cross validation. This ensures that no information leaks from the test portion of a fold into the data used for training.
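The idea can be sketched in plain Python: the scaling statistics are computed from the training portion only and then reused unchanged on the held-out portion. The helper names and the toy values are hypothetical; in the notebook, `Pipeline` with `StandardScaler` does this for every fold.

```python
import statistics

def fit_scaler(column):
    # Learn mean and population standard deviation from training data only
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        std = 1.0  # avoid dividing by zero for constant columns
    return mean, std

def transform(column, mean, std):
    # Apply previously learned statistics without refitting
    return [(v - mean) / std for v in column]

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up feature column
train, test = values[:4], values[4:]

mean, std = fit_scaler(train)          # the test value never influences these
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
print("Train mean/std:", mean, std)
print("Scaled test value:", test_scaled[0])
```

Had the scaler been fitted on all five values, the outlier in the test portion would have shifted the mean and standard deviation used for training, which is exactly the leakage the pipeline avoids.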

In [9]:
estimators = []
# Standardizing the features
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=epochs, batch_size=batch_size, verbose=0)))
pipeline = Pipeline(estimators)

results = validate_model(pipeline)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Standardized: -22.19 (22.55) MSE


### Observation:

Running the example shows improved performance over the baseline model without standardized data: the mean error drops from 32.77 to 22.19.

A further extension of this section would be to apply a similar rescaling to the output variable, such as normalizing it to the range 0-1, and to use a sigmoid or similar activation function on the output layer to restrict predictions to the same range.
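A minimal sketch of that extension, under stated assumptions: the target is min-max scaled to 0-1 using training values, a sigmoid keeps the raw network output in that range, and the prediction is mapped back to the original units. All helper names and the sample prices are illustrative and not part of the notebook.

```python
import math

def fit_min_max(values):
    # Learn the range of the target from training data
    return min(values), max(values)

def scale(value, low, high):
    return (value - low) / (high - low)

def unscale(scaled, low, high):
    # Inverse transform back to the original units
    return scaled * (high - low) + low

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

prices = [24.0, 21.6, 34.7, 33.4, 36.2]  # illustrative target values
low, high = fit_min_max(prices)
scaled = [scale(p, low, high) for p in prices]

raw_output = 0.0  # hypothetical pre-activation output of the network
prediction = unscale(sigmoid(raw_output), low, high)
print("Prediction in original units:", prediction)
```

One design consequence is that such a model can never predict outside the range seen during training, which may or may not be acceptable for a given problem.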

## Tune The Neural Network Topology

### Evaluate a Deeper Network Topology

One way to improve the performance of a neural network is to add more layers. This might allow the model to extract and recombine higher order features embedded in the data.

In this section we will evaluate the effect of adding one more hidden layer to the model. This is as easy as defining a new function that creates the deeper model, copied from our baseline model above, and inserting a new line after the first hidden layer, in this case with about half the number of neurons (6).

In [10]:
# define the model
def deeper_model():
    # Creating a sequential object for adding sequences of layers
    model = Sequential()
    # Adding the first hidden layer with the same number of neurons as input features (13);
    # input_dim=13 declares the input size. The activation function is 'relu'.
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    
    # Adding a second hidden layer with 6 neurons and 'relu' as the activation function
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    
    # Being a regression problem, the output layer has a single neuron and no activation function
    model.add(Dense(1, kernel_initializer='normal'))
    
    # Compile the model with the 'adam' optimizer and mean squared error as the loss function
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [11]:
estimators = []
# Standardizing the features
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=deeper_model, epochs=epochs, batch_size=batch_size, verbose=0)))
pipeline = Pipeline(estimators)

results = validate_model(pipeline)
print("Deeper Model: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Deeper Model: -21.83 (23.03) MSE


### Observation:

Running this model shows a slight further improvement in performance, from 22.19 down to 21.83 (in thousands of dollars squared), although the change is small compared with the spread across folds.

### Evaluate a Wider Network Topology

Another approach to increasing the representational capability of the model is to create a wider network.

In this section we evaluate the effect of keeping a shallow network architecture and increasing the number of neurons in the one hidden layer by about half.

Again, all we need to do is define a new function that creates our neural network model. Here, we have increased the number of neurons in the hidden layer compared to the baseline model from 13 to 20.
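To get a feel for how the topologies differ in capacity, the sketch below counts trainable parameters (weights plus biases) for fully connected networks described by their layer sizes. The helper names are hypothetical; the layer sizes are the ones used in this notebook.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit
    return n_inputs * n_units + n_units

def count_params(layer_sizes):
    # layer_sizes lists the input size followed by each layer's unit count
    return sum(dense_params(n_in, n_out)
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

baseline = count_params([13, 13, 1])    # 13 inputs -> 13 hidden -> 1 output
deeper = count_params([13, 13, 6, 1])   # adds a 6-neuron hidden layer
wider = count_params([13, 20, 1])       # widens the hidden layer to 20

print("baseline:", baseline, "deeper:", deeper, "wider:", wider)
```

All three networks are small, so any differences in cross-validation error come from how the capacity is arranged as much as from how much of it there is.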

In [19]:
def wider_model():
    # Creating a sequential object for adding sequences of layers
    model = Sequential()
    # Adding a single hidden layer with an increased number of neurons (20);
    # input_dim=13 is the number of features. The activation function is 'relu'.
    model.add(Dense(20, input_dim=13, kernel_initializer='normal', activation='relu'))
    
    # Being a regression problem, the output layer has a single neuron and no activation function
    model.add(Dense(1, kernel_initializer='normal'))
    
    # Compile the model with the 'adam' optimizer and mean squared error as the loss function
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [23]:
estimators = []
# Standardizing the features
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=wider_model, epochs=epochs, batch_size=batch_size, verbose=0)))
pipeline = Pipeline(estimators)
results = validate_model(pipeline)
print("Wider Model: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Wider Model: -21.98 (22.94) MSE


### Observation:

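Building the model gives an error of about 21.98 (in thousands of dollars squared), a small improvement over the standardized baseline (22.19) and essentially on par with the deeper network (21.83). This is not a bad result for this problem.

With differences this small relative to the standard deviation across folds (about 23), it would have been hard to guess in advance which topology would perform better, and these runs do not clearly separate them. The results demonstrate the importance of empirical testing when it comes to developing neural network models.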