# Example 2 - Boston Housing Price Regression

The dataset is taken from the StatLib library, which is maintained at Carnegie Mellon University.
Each sample contains 13 attributes of houses at one location in the Boston suburbs in the late 1970s.
The target is the median value of the houses at that location, in thousands of dollars (k$).

This notebook is based on http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/

First, we import the dependencies:

In [None]:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

We load the dataset (adjust the path if necessary) and split it into input (X) and output (Y) variables. The dataset is available online: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data Make sure to rename the downloaded file to "housing.csv".

In [None]:
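# The file is whitespace-delimited and has no header row.
# sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas versions.
dataframe = pandas.read_csv("housing.csv", sep=r"\s+", header=None)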
dataset = dataframe.values
X = dataset[:,0:13]
Y = dataset[:,13]
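
As a quick sanity check on the slicing above, the following sketch applies the same column split to a synthetic array with 14 columns and confirms the resulting shapes. The array values are placeholders and not real housing data.

```python
import numpy

# Synthetic stand-in for dataframe.values: 5 rows, 14 columns (placeholder values).
dataset = numpy.arange(5 * 14, dtype=float).reshape(5, 14)

X = dataset[:, 0:13]  # first 13 columns: input attributes
Y = dataset[:, 13]    # last column: target values

print(X.shape)  # (5, 13)
print(Y.shape)  # (5,)
```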

The next function defines the neural network's architecture. The commented-out lines provide an alternative first layer for a wider network and an additional layer for a deeper network. Feel free to experiment with different network architectures.

In [None]:
def baseline_model():
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    # alternative first layer for a wider network (use instead of the layer above):
    #model.add(Dense(20, input_dim=13, kernel_initializer='normal', activation='relu'))
    # optional additional layer for a deeper network:
    #model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
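
To get a feeling for how the architecture variants differ in size, the sketch below counts the trainable parameters of each variant by hand. It assumes every dense layer has one weight per input-unit pair plus one bias per unit; `dense_params` and `count_params` are hypothetical helper names introduced for illustration and are not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


def count_params(layer_sizes, n_inputs=13):
    """Total trainable parameters for a stack of dense layers."""
    total = 0
    for n_units in layer_sizes:
        total += dense_params(n_inputs, n_units)
        n_inputs = n_units  # the next layer's inputs are this layer's outputs
    return total


baseline = count_params([13, 1])   # 13 hidden units, 1 output unit
wider = count_params([20, 1])      # 20 hidden units, 1 output unit
deeper = count_params([13, 6, 1])  # extra hidden layer with 6 units

print(baseline, wider, deeper)  # 196 301 273
```

With only a few hundred parameters, all three variants are small networks, which fits the size of the dataset.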

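Our model is defined. Next, we evaluate its performance with 10-fold cross-validation. Since this is a regression problem, the score is based on the mean squared error (MSE) rather than on accuracy. Depending on the Keras version, the wrapper may report the MSE with a negative sign so that larger scores are better.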

In [None]:
seed = 7
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=20, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
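# random_state only has an effect when shuffle=True; recent scikit-learn versions raise an error otherwise
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)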
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))
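
The scaler is part of the pipeline so that, within each fold, the standardization statistics are computed on the training part only and then reused for the held-out part. The following sketch spells out that procedure with plain NumPy. It is an illustration under stated assumptions: the data are synthetic, the folds are contiguous and unshuffled, a linear least-squares fit stands in for the neural network, and `kfold_indices` is a hypothetical helper, not a scikit-learn function.

```python
import numpy

rng = numpy.random.default_rng(7)

# Synthetic stand-in data: 50 samples, 13 features, linear target plus noise.
X_demo = rng.normal(loc=5.0, scale=3.0, size=(50, 13))
y_demo = X_demo @ rng.normal(size=13) + rng.normal(scale=0.1, size=50)


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    folds = numpy.array_split(numpy.arange(n_samples), n_splits)
    for i, test_idx in enumerate(folds):
        train_idx = numpy.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


fold_mse = []
for train_idx, test_idx in kfold_indices(len(X_demo), n_splits=10):
    # Compute the scaling statistics on the training fold only.
    mean = X_demo[train_idx].mean(axis=0)
    std = X_demo[train_idx].std(axis=0)
    X_train = (X_demo[train_idx] - mean) / std
    X_test = (X_demo[test_idx] - mean) / std  # reuse the training statistics

    # Linear least squares with an intercept column, as a stand-in for the network.
    A_train = numpy.column_stack([X_train, numpy.ones(len(X_train))])
    A_test = numpy.column_stack([X_test, numpy.ones(len(X_test))])
    coef, *_ = numpy.linalg.lstsq(A_train, y_demo[train_idx], rcond=None)

    fold_mse.append(numpy.mean((A_test @ coef - y_demo[test_idx]) ** 2))

fold_mse = numpy.array(fold_mse)
print("MSE: %.4f (%.4f)" % (fold_mse.mean(), fold_mse.std()))
```

Computing the mean and standard deviation on the full dataset before splitting would leak information from the held-out samples into training, which is what wrapping the scaler in the pipeline avoids.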

Independent of the cross-validation of our pipeline model, we can fit the pipeline to the entire dataset...

In [None]:
pipeline.fit(X,Y)

...and use the fitted model for predictions:

In [None]:
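sample = X[0].reshape(1, -1)  # predict expects a 2D array with one row per sample
print(pipeline.predict(sample))  # predicted median value (k$)
print(Y[0])  # actual median value (k$)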