## Keras Regression of House Prices

Use Keras to fit a regression model on the standard Boston house price dataset.


- http://www.kaggle.com/vikrishnan/boston-house-prices

The dataset describes properties of houses in Boston suburbs and is concerned with modeling the price of
houses in those suburbs in thousands of dollars. It is a regression predictive modeling
problem. There are 13 input variables that describe the properties of a given Boston suburb, followed by the output variable (MEDV).
1. CRIM: per capita crime rate by town.
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS: proportion of non-retail business acres per town.
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. NOX: nitric oxides concentration (parts per 10 million).
6. RM: average number of rooms per dwelling
7. AGE: proportion of owner-occupied units built prior to 1940.
8. DIS: weighted distances to five Boston employment centers.
9. RAD: index of accessibility to radial highways.
10. TAX: full-value property-tax rate per 10,000 dollars
11. PTRATIO: pupil-teacher ratio by town.
12. B: 1000(Bk - 0.63)**2 where Bk is the proportion of blacks by town.
13. LSTAT: percentage lower status of the population.
14. MEDV: median value of owner-occupied homes in thousands of dollars (the output variable).

All input and output variables are numerical. References suggest that well-performing models reach an MSE of around 20 (in squared thousands of dollars).

### Develop baseline model

In [1]:
import numpy as np
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


In [3]:
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input and output variables
X = dataset[:,0:13]
Y = dataset[:,13]
dataframe.head()

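|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |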


Try a simple model that has a single fully connected hidden layer with the same number of neurons as input
attributes (13). The hidden layer uses the standard rectifier (ReLU) activation function.
The output layer has no activation function because this is a regression
problem and we want to predict numerical values directly, without a transform.
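
To make the topology concrete, here is a minimal NumPy sketch of the forward pass such a network computes. The weights are random placeholders and the names `relu`, `forward`, and `X_demo` are made up for illustration; this is not how Keras implements the layers internally, only the arithmetic they correspond to.

```python
import numpy as np

def relu(z):
    # Rectifier activation: negative values are clipped to 0
    return np.maximum(z, 0.0)

def forward(X, W1, b1, W2, b2):
    # Hidden layer: 13 inputs -> 13 units with ReLU
    hidden = relu(X @ W1 + b1)
    # Output layer: 13 units -> 1 value, no activation (linear output)
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 13))           # 4 hypothetical rows, 13 attributes
W1 = rng.normal(scale=0.05, size=(13, 13))  # placeholder hidden-layer weights
b1 = np.zeros(13)
W2 = rng.normal(scale=0.05, size=(13, 1))   # placeholder output-layer weights
b2 = np.zeros(1)

preds = forward(X_demo, W1, b1, W2, b2)
print(preds.shape)  # one unbounded prediction per row
```

Because the output is linear, predictions are not restricted to any range, which is what a regression target such as a price needs.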

The efficient Adam optimization algorithm is used and a mean squared error (MSE) loss function
is optimized. This is the same metric that we will use to evaluate the performance of the
model. It is a desirable metric because taking the square root of the MSE gives a
result that we can directly understand in the context of the problem, in units of thousands
of dollars.
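
As a small worked example of that relationship, the sketch below computes the MSE and its square root (RMSE) in plain Python. The true values are the first three MEDV entries of the dataset; the predictions are invented for illustration.

```python
import math

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [24.0, 21.6, 34.7]   # MEDV values, in thousands of dollars
y_pred = [26.0, 20.6, 30.7]   # hypothetical predictions

mse = mean_squared_error(y_true, y_pred)   # units: squared thousands of dollars
rmse = math.sqrt(mse)                      # units: thousands of dollars
print(round(mse, 2), round(rmse, 2))
```

Here the errors are -2, 1, and 4, so the MSE is 21 / 3 = 7 and the RMSE is about 2.65, i.e. a typical error of roughly 2,650 dollars.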

In [6]:
# define base model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [8]:
#Set the seed and use KerasRegressor
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)

In [9]:
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Baseline: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Baseline: -31.28 (23.61) MSE


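The mean MSE is about 31 (in squared thousands of dollars), which corresponds to an RMSE of roughly 5.6 thousand dollars. The reported score is negative because the wrapper returns the negated loss, so that larger scores are better under the scikit-learn convention.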

### Improve performance by preprocessing the data
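Input attributes vary in scale because they measure different quantities, so try standardizing the data. A scikit-learn Pipeline handles this for us: within each pass of the cross-validation, the scaler is fit on the training folds only and then applied to the held-out fold.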
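
The following sketch shows by hand what standardization does to one training/test split: the mean and standard deviation come from the training rows only and are reused on the held-out rows. The helper names and the tiny two-column arrays are made up for illustration; in the notebook `StandardScaler` does this work.

```python
import numpy as np

def fit_standardizer(X_train):
    # Column-wise statistics, computed from the training rows only
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0] = 1.0         # leave constant columns unscaled
    return mean, std

def apply_standardizer(X, mean, std):
    # Subtract the training mean and divide by the training std
    return (X - mean) / std

# Two columns on very different scales (hypothetical values)
X_train = np.array([[0.006, 296.0], [0.027, 242.0], [0.069, 222.0]])
X_test = np.array([[0.032, 222.0]])

mean, std = fit_standardizer(X_train)
X_train_std = apply_standardizer(X_train, mean, std)
X_test_std = apply_standardizer(X_test, mean, std)   # reuses training statistics
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

After the transform each training column has mean 0 and standard deviation 1, so no single attribute dominates the weighted sums in the first layer purely because of its units.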


In [10]:
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5,verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))


Standardized: -23.39 (28.00) MSE


Standardizing the data improves performance: the mean MSE drops from 31.28 to 23.39. A possible further step is to normalize the output variable to the range 0 to 1 and use a sigmoid activation on the output layer to constrain predictions to the same range.
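
The idea of rescaling the output variable can be sketched as a min-max transform with its inverse. This was not run in the notebook; the helper names are hypothetical and the values are the first five MEDV entries.

```python
def fit_min_max(values):
    # Range of the training targets
    return min(values), max(values)

def scale(values, lo, hi):
    # Map [lo, hi] onto [0, 1]
    return [(v - lo) / (hi - lo) for v in values]

def unscale(values, lo, hi):
    # Inverse transform: map [0, 1] back to the original units
    return [v * (hi - lo) + lo for v in values]

y = [24.0, 21.6, 34.7, 33.4, 36.2]   # MEDV, in thousands of dollars
lo, hi = fit_min_max(y)
y_scaled = scale(y, lo, hi)
y_back = unscale(y_scaled, lo, hi)
print(y_scaled)
```

Two caveats apply if this is tried: an MSE computed on the scaled target is not comparable with the figures above until predictions are mapped back with `unscale`, and a sigmoid output cannot predict values outside the range seen in the training targets.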

### Tune the Network Topology

#### Evaluate a Deeper Network
Add a second hidden layer, so the network topology looks like this: 13 inputs -> [13 -> 6] -> 1 output

In [11]:
# create model
def larger_model():
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5,verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Larger: -21.88 (23.40) MSE


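The deeper network shows a small improvement: the mean MSE drops from 23.39 to 21.88, although the spread across folds remains large. Note that this run uses 50 epochs rather than 100.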

#### Evaluate a wider network topology
Another approach to increasing the representational capacity of the model is to create a wider
network. Evaluate the effect of keeping a shallow network architecture and
increasing the number of neurons in the one hidden layer by about half,
from 13 in the baseline model to 20.

13 inputs -> [20] -> 1 output
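
One rough way to compare the capacity of the three topologies is to count their trainable parameters. The helper below is a sketch that assumes fully connected layers with one bias per unit, which is how the `Dense` layers in this notebook are configured by default; `dense_param_count` is a name made up here.

```python
def dense_param_count(layer_sizes):
    # Each dense layer has n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

topologies = {
    "baseline": [13, 13, 1],     # 13 inputs -> [13] -> 1 output
    "deeper":   [13, 13, 6, 1],  # 13 inputs -> [13 -> 6] -> 1 output
    "wider":    [13, 20, 1],     # 13 inputs -> [20] -> 1 output
}

counts = {name: dense_param_count(sizes) for name, sizes in topologies.items()}
for name, n_params in counts.items():
    print(name, n_params)
```

This gives 196 parameters for the baseline, 273 for the deeper network, and 301 for the wider one, so the wider model is slightly the largest of the three despite having a single hidden layer.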

In [12]:
def wider_model():
    # create model
    model = Sequential()
    model.add(Dense(20, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=wider_model, epochs=100, batch_size=5,verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Wider: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Wider: -21.79 (24.83) MSE


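The wider network (mean MSE 21.79) improves on the standardized baseline (23.39) but is essentially tied with the deeper network (21.88); the difference is far smaller than the spread across folds.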
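
To put the four results side by side, the sketch below converts each mean MSE reported above to an RMSE and computes a rough standard error of the mean across the 10 folds. Treat the standard error as indicative only: it assumes the fold scores are independent, which cross-validation folds are not strictly.

```python
import math

# (mean MSE, std of MSE) as printed in the cells above, sign dropped
results = {
    "baseline (raw inputs)": (31.28, 23.61),
    "standardized":          (23.39, 28.00),
    "deeper":                (21.88, 23.40),
    "wider":                 (21.79, 24.83),
}
n_folds = 10

summary = {}
for name, (mean_mse, std_mse) in results.items():
    rmse = math.sqrt(mean_mse)            # thousands of dollars
    sem = std_mse / math.sqrt(n_folds)    # rough standard error of the mean MSE
    summary[name] = (rmse, sem)
    print(f"{name}: MSE {mean_mse:.2f} +/- {sem:.2f}, RMSE {rmse:.2f}")
```

The standard errors come out at around 7 to 9 MSE units, so the gap of 0.09 between the deeper and wider networks is well within the noise; repeated runs or more folds would be needed to separate them.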