<a href="https://colab.research.google.com/github/Deep-Learning-Challenge/challenge-notebooks/blob/master/1.Multilayer%20Perceptrons/2.Guided%20Projects/3.Regression%20Of%20Boston%20House%20Prices.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>

# Regression Of Boston House Prices

In this project tutorial, you will discover how to develop and evaluate neural network models using Keras for a regression problem. After completing this step-by-step tutorial, you will know:

* How to load a CSV dataset and make it available to Keras.
* How to create a neural network model with Keras for a regression problem.
* How to use scikit-learn with Keras to evaluate models using cross-validation.
* How to perform data preparation to improve the skill of Keras models.
* How to tune the network topology of models with Keras.

Let's get started.

## Boston House Price Dataset

This tutorial uses the Boston house price dataset. The dataset describes properties of houses in Boston suburbs and is concerned with modeling house prices in those suburbs, in thousands of dollars. This is a regression predictive modeling problem. Thirteen input variables describe a given Boston suburb, and a fourteenth attribute (MEDV) is the output variable to predict. The full list of attributes in this dataset is as follows:

1. **CRIM**: per capita crime rate by town.
2. **ZN**: the proportion of residential land zoned for lots over 25,000 sq. ft.
3. **INDUS**: the proportion of non-retail business acres per town.
4. **CHAS**: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. **NOX**: nitric oxide concentration (parts per 10 million).
6. **RM**: average number of rooms per dwelling.
7. **AGE**: the proportion of owner-occupied units built before 1940.
8. **DIS**: weighted distances to five Boston employment centers.
9. **RAD**: index of accessibility to radial highways.
10. **TAX**: full-value property-tax rate per \$10,000.
11. **PTRATIO**: pupil-teacher ratio by town.
12. **B**: 1000(Bk - 0.63)<sup>2</sup> where Bk is the proportion of blacks by town.
13. **LSTAT**: \% lower status of the population.
14. **MEDV**: median value of owner-occupied homes in \$1000s (the output variable).

This is a well-studied problem in machine learning. It is convenient to work with because all
of the input and output attributes are numerical, and there are 506 instances to work with. A
sample of the first 5 rows of the 506 in the dataset is provided below:

```
0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00
0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60
0.02729 0.00 7.070 0 0.4690 7.1850 61.10 4.9671 2 242.0 17.80 392.83 4.03 34.70
0.03237 0.00 2.180 0 0.4580 6.9980 45.80 6.0622 3 222.0 18.70 394.63 2.94 33.40
0.06905 0.00 2.180 0 0.4580 7.1470 54.20 6.0622 3 222.0 18.70 396.90 5.33 36.20
```

Reasonable performance for models evaluated using Mean Squared Error (MSE) is around 20 in squared thousands of dollars (or about \$4,500 if you take the square root). This is a nice target to aim for with our neural network model. You can learn more about the [Boston house price dataset](http://lib.stat.cmu.edu/datasets/boston).
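To make the units concrete, here is a minimal sketch of how MSE and its square root (RMSE) are computed. The target values are the first three MEDV entries from the sample rows above; the predictions are hypothetical numbers invented for illustration, and `mean_squared_error` is a helper defined here, not a library call.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# First three MEDV values from the sample rows above (in $1000s).
y_true = [24.0, 21.6, 34.7]
# Hypothetical predictions, invented purely for illustration.
y_pred = [26.0, 20.6, 30.7]

mse = mean_squared_error(y_true, y_pred)   # units: squared thousands of dollars
rmse = math.sqrt(mse)                      # units: thousands of dollars
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f} (about ${rmse * 1000:,.0f})")

# The target quoted above: an MSE of 20 expressed in dollars.
target_rmse_dollars = math.sqrt(20) * 1000
print(f"MSE of 20 is an RMSE of about ${target_rmse_dollars:,.0f}")
```

Because the errors are squared before averaging, a few large misses dominate the score, which is why the square root is the easier number to interpret.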

## Runtime Setup

In [1]:
import sys

dataset_name = "housing.csv"
if 'google.colab' in sys.modules:
    DATASET = f"https://github.com/Deep-Learning-Challenge/challenge-notebooks/raw/master/datasets/{dataset_name}"
else:
    DATASET = f"../../datasets/{dataset_name}"
    
DATASET

'../../datasets/housing.csv'

## Develop a Baseline Neural Network Model

In this section, we will create a baseline neural network model for the regression problem. Let's start by importing all of the functions and objects we will need for this tutorial.

In [2]:
import tensorflow as tf

#import logging
#tf.get_logger().setLevel(logging.ERROR)

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from tensorflow.keras import utils

from sklearn.model_selection import cross_val_score, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

import numpy
from pandas import read_csv

2021-10-16 20:43:19.133686: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-10-16 20:43:19.133786: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


We can now load our dataset from a file in the local directory. The dataset is not in CSV format on the UCI Machine Learning Repository; the attributes are instead separated by whitespace. We can load this easily using the Pandas library. We can then split the data into input (X) and output (Y) attributes, which makes it easier to model with Keras and scikit-learn.

In [3]:
# load dataset
dataframe = read_csv(DATASET, sep=r"\s+", header=None)  # columns are separated by runs of whitespace
dataset = dataframe.values

# split into input and output variables
X = dataset[:,0:13]
Y = dataset[:,13]

dataframe.head()

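|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |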
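The same load-and-split step can be tried without the data file. The sketch below swaps in `numpy.loadtxt` (not the Pandas call used above) and reads two rows copied from the sample into an in-memory buffer; the names `sample`, `X_demo`, and `Y_demo` are introduced only for this illustration.

```python
import io
import numpy as np

# Two rows copied from the sample above; each has 13 inputs followed by MEDV.
sample = """\
0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00
0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60
"""

# loadtxt splits on any run of whitespace by default.
data = np.loadtxt(io.StringIO(sample))

X_demo = data[:, 0:13]   # columns 0..12: input attributes
Y_demo = data[:, 13]     # column 13: MEDV target

print(X_demo.shape, Y_demo.shape)
```

The slice `0:13` excludes its upper bound, so it selects exactly the thirteen input columns and leaves column 13 for the target.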


We can create Keras models and evaluate them with scikit-learn by using handy wrapper objects provided by the Keras library. This is desirable because scikit-learn excels at evaluating models and will allow us to use powerful data preparation and model evaluation schemes with very few lines of code. The Keras wrapper class requires a function as an argument. This function that we must define is responsible for creating the neural network model to be evaluated.

Below we define the function to create the baseline model to be evaluated. It is a simple model with a single, fully connected hidden layer with the same number of neurons as input attributes (13). The network uses good practices such as the rectifier activation function for the hidden layer. No activation function is used for the output layer because this is a regression problem, and we are interested in predicting numerical values directly, without a transform.

The efficient ADAM optimization algorithm is used, and a mean squared error loss function is optimized. This is the same metric that we will use to evaluate the performance of the model. It is a desirable metric because taking the square root of an error value gives a result that we can understand directly in the context of the problem, in units of thousands of dollars.

In [4]:
# define base model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
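To see what this topology computes, here is a rough NumPy sketch of the forward pass: a ReLU hidden layer followed by a linear output. The weights are hypothetical small random values standing in for an untrained network, and `forward` is a helper written for this illustration, not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small random weights, standing in for an untrained network.
W1 = rng.normal(scale=0.05, size=(13, 13))  # input -> hidden
b1 = np.zeros(13)
W2 = rng.normal(scale=0.05, size=(13, 1))   # hidden -> output
b2 = np.zeros(1)

def forward(x):
    """Forward pass: ReLU hidden layer, then a linear (identity) output."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # rectifier activation
    return hidden @ W2 + b2                 # no activation: raw numeric prediction

batch = rng.normal(size=(5, 13))            # five made-up input rows
predictions = forward(batch)
print(predictions.shape)

# Trainable parameters: weights plus biases of both layers.
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)
```

Leaving the output layer linear lets the prediction take any real value, which is what a price in thousands of dollars needs.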

The Keras wrapper object for use in scikit-learn as a regression estimator is called `KerasRegressor`. We create an instance and pass it both the name of the function to create the neural network model and some parameters to pass along to the model's `fit()` function later, such as the number of epochs and batch size. Both of these are set to sensible defaults. We also initialize the random number generator with a constant random seed, which we will repeat for each model evaluated in this tutorial. This is to ensure we compare models consistently and that the results are reproducible.

In [6]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
tf.random.set_seed(seed)  # also seed TensorFlow, which draws the initial weights

# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, 
                           epochs=100, 
                           batch_size=5, 
                           verbose=0)
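The wrapper pattern described above can be sketched in plain Python. This is a simplified illustration of the idea, not the actual `KerasRegressor` source: `FunctionRegressor` and `MeanModel` are hypothetical classes, and the stand-in model simply predicts the training mean.

```python
class MeanModel:
    """Hypothetical stand-in for a compiled Keras model: predicts the training mean."""

    def fit(self, X, y, **fit_params):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class FunctionRegressor:
    """Hypothetical wrapper: builds a fresh model on each fit() call."""

    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn
        self.fit_params = fit_params      # e.g. epochs, batch_size, verbose

    def fit(self, X, y):
        self.model_ = self.build_fn()     # new, untrained model every time
        self.model_.fit(X, y, **self.fit_params)
        return self

    def predict(self, X):
        return self.model_.predict(X)

    def score(self, X, y):
        # Negated MSE, so that a larger score means a better model.
        preds = self.predict(X)
        mse = sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)
        return -mse


wrapper = FunctionRegressor(build_fn=MeanModel, epochs=100, batch_size=5)
wrapper.fit([[0.0], [1.0]], [10.0, 20.0])
demo_score = wrapper.score([[2.0]], [18.0])
print(demo_score)
```

Passing a function rather than a built model matters for cross-validation: each fold needs a freshly initialized network, so the wrapper has to be able to construct one on demand.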

The final step is to evaluate this baseline model. We will use 10-fold cross-validation to evaluate the model.

In [7]:
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold, n_jobs=-1)
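As a rough picture of what 10-fold splitting does with 506 rows, the sketch below partitions shuffled indices in pure Python. `kfold_indices` is a hypothetical helper; it uses Python's own shuffle, so the specific indices will differ from those scikit-learn's `KFold` produces, although the fold sizes follow the same rule.

```python
import random

def kfold_indices(n_samples, n_splits, seed):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(kfold_indices(n_samples=506, n_splits=10, seed=7))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)
```

Every row lands in exactly one test fold, so each of the ten scores is computed on data the corresponding model never saw during training.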


Running this code gives us an estimate of the model's performance on the problem for unseen data. The result reports the mean squared error as two numbers: the average across all ten folds of the cross-validation evaluation, and the standard deviation, which shows how much the score varies from fold to fold.

***Note***: the mean squared error is reported as a negative number because scikit-learn's convention is that scores are maximized, so the error is negated. You can ignore the sign of the result.

In [8]:
print("Baseline: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Baseline: -22.29 (6.67) MSE


## Lift Performance By Standardizing The Dataset

An important concern with the Boston house price dataset is that the input attributes vary in their scales because they measure different quantities. It is almost always good practice to prepare your data before modeling it using a neural network model. Continuing from the above baseline model, we can re-evaluate the same model using a standardized version of the input dataset.

We can use scikit-learn's `Pipeline` framework to perform the standardization during the model evaluation process, within each fold of the cross-validation. This ensures that there is no data leakage from each test set cross-validation fold into the training data. The code below creates a scikit-learn `Pipeline` that first standardizes the dataset then creates and evaluates the baseline neural network model.

In [9]:
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, 
                                         epochs=50, 
                                         batch_size=5,
                                         verbose=0)))
pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold, n_jobs=-1)
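What the pipeline does inside each fold can be sketched with NumPy: the mean and standard deviation are computed on the training fold only and then reused for the test fold. The arrays below are made-up values, and this is a simplified picture of the standardization step, not scikit-learn's implementation.

```python
import numpy as np

# Made-up two-feature data with very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[4.0, 400.0]])

# Fit: statistics come from the training fold only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)          # population standard deviation (ddof=0)

# Transform: the same statistics are applied to both folds.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

If the statistics were computed on the full dataset before splitting, information about the test rows would influence the training inputs, which is the leakage the pipeline avoids.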

Running the example shows improved performance over the baseline model without standardized data: the mean error drops from 22.29 to 19.85.

In [10]:
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Standardized: -19.85 (7.30) MSE


A further extension of this section would be to rescale the output variable as well, for example by normalizing it to the range 0 to 1, and to use a sigmoid or similar activation function on the output layer so that predictions are constrained to the same range.
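A minimal sketch of that extension, assuming simple min-max scaling of the target: the values are the MEDV entries from the five sample rows shown earlier, and `scale_target` and `unscale_target` are hypothetical helpers, not library functions.

```python
import numpy as np

# MEDV values from the five sample rows shown earlier (in $1000s).
y = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

y_min, y_max = y.min(), y.max()

def scale_target(values):
    """Map targets into [0, 1] using the training minimum and maximum."""
    return (values - y_min) / (y_max - y_min)

def unscale_target(scaled):
    """Map network outputs in [0, 1] back to thousands of dollars."""
    return scaled * (y_max - y_min) + y_min

y_scaled = scale_target(y)
y_restored = unscale_target(y_scaled)
print(y_scaled.round(3))
```

With this approach, predictions would need to be mapped back through `unscale_target` before computing MSE, otherwise the error is not in squared thousands of dollars and cannot be compared with the results in this tutorial.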

## Tune The Neural Network Topology

Many aspects of a neural network model can be optimized. Perhaps the point of biggest leverage is the network's structure, including the number of layers and the number of neurons in each layer. In this section, we will evaluate two additional network topologies to improve the model's performance further: a deeper network and a wider network.

### Evaluate a Deeper Network Topology

One way to improve the performance of a neural network is to add more layers. This might allow the model to extract and recombine higher-order features embedded in the data. In this section, we will evaluate the effect of adding one more hidden layer to the model. This is as easy as defining a new function that creates the deeper model, copied from our baseline model above, and inserting one new line after the first hidden layer: a second hidden layer with about half the number of neurons. Our network topology now looks like this:

`13 inputs -> [13 -> 6] -> 1 output`

We can evaluate this network topology in the same way as above, again standardizing the dataset inside the pipeline, since that was shown to improve performance.

In [11]:
# define the model
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [12]:
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, 
                                         epochs=50, 
                                         batch_size=5,
                                         verbose=0)))
pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold, n_jobs=-1)

Running this model shows a further improvement, with the mean MSE falling from 19.85 to 15.33.

In [13]:
print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Larger: -15.33 (7.34) MSE


### Evaluate a Wider Network Topology

Another approach to increasing the representational capacity of the model is to create a wider network. In this section, we evaluate the effect of keeping a shallow network architecture and increasing the number of neurons in the one hidden layer by about half. Again, we need to define a new function that creates our neural network model. Here, we have increased the number of neurons in the hidden layer from 13 in the baseline model to 20. The topology for our wider network can be summarized as follows:

`13 inputs -> [20] -> 1 output`

In [14]:
# define wider model
def wider_model():
    # create model
    model = Sequential()
    model.add(Dense(20, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
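One way to compare the three topologies is by their number of trainable parameters. The sketch below counts weights and biases for fully connected layers; `count_dense_params` is a helper written for this illustration, and the layer sizes are taken from the model definitions above.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the input width followed by each layer's width.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus one bias per neuron
    return total

topologies = {
    "baseline": [13, 13, 1],
    "deeper": [13, 13, 6, 1],
    "wider": [13, 20, 1],
}
param_counts = {name: count_dense_params(sizes) for name, sizes in topologies.items()}
print(param_counts)
```

By this count, the wider network has slightly more parameters than the deeper one, so the two variants add a broadly similar amount of capacity in different shapes.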

In [15]:
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=wider_model, 
                                         epochs=100, 
                                         batch_size=5,
                                         verbose=0)))
pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold, n_jobs=-1)

Evaluating this model shows a further drop in error, to about 13 in squared thousands of dollars (roughly \$3,700 after taking the square root). This is not a bad result for this problem.

In [16]:
print("Wider: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Wider: -13.37 (6.99) MSE


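It would have been hard to guess that a wider network would outperform a deeper network on this problem. Note that the wider network was trained for 100 epochs and the deeper network for 50, so the comparison is not strictly like-for-like. The results demonstrate the importance of empirical testing when it comes to developing neural network models.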
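To put the four results on a more intuitive scale, the sketch below converts each reported mean MSE into an approximate error in dollars. The numbers are the means printed by the runs above with the sign dropped; note that the square root of a mean MSE is not identical to the mean of per-fold RMSE values, so treat these as rough figures.

```python
import math

# Mean cross-validated MSE reported by each run above (sign dropped).
reported_mse = {
    "Baseline": 22.29,
    "Standardized": 19.85,
    "Larger": 15.33,
    "Wider": 13.37,
}

# Square root converts squared thousands of dollars back to thousands of dollars.
rmse_dollars = {name: math.sqrt(mse) * 1000 for name, mse in reported_mse.items()}

for name, value in rmse_dollars.items():
    print(f"{name}: about ${value:,.0f}")
```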

## Summary

In this tutorial, you discovered how to use the Keras deep learning library to model regression problems. You learned how to develop and evaluate neural network models, including:

* How to load data and create a baseline model.
* How to lift performance using data preparation techniques like standardization.
* How to design and evaluate networks with different topologies on a problem.