# C12: Project: Regression of Boston House Prices

## 1. Download Dataset

From the `data_set/` directory, run the download script in a terminal:

```
./get_housing_data.sh
```

The data file, named `housing.data`, is downloaded into that directory.

## 2. Preparation

Import the libraries we will need

In [4]:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Load the data and split it into the 13 input features (`X`) and the target column (`Y`).

In [5]:
dataframe = pd.read_csv('./data_set/housing.data', sep=r'\s+', header=None)
dataset = dataframe.values
X = dataset[:, 0:13]
Y = dataset[:, 13]
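
To make the column split above easier to follow, here is a minimal sketch that parses two rows in the same whitespace-delimited layout and separates the features from the target. The column names are the ones commonly listed in the dataset description, the helper `split_features_target` is hypothetical, and the two sample rows are made-up placeholder values, not real records.

```python
import io
import numpy as np

# Column names as commonly listed in the housing dataset description.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

def split_features_target(table):
    """Split a (n_samples, 14) array into 13 feature columns and the MEDV target."""
    table = np.asarray(table, dtype=float)
    if table.ndim != 2 or table.shape[1] != len(COLUMNS):
        raise ValueError("expected %d columns, got shape %s" % (len(COLUMNS), table.shape))
    target_col = COLUMNS.index("MEDV")  # last column, index 13
    return table[:, :target_col], table[:, target_col]

# Placeholder rows (illustrative values only), whitespace-delimited like housing.data.
sample_text = "\n".join([
    " 0.1 18.0 2.3 0 0.5 6.5 65.0 4.0 1 296.0 15.3 396.9 5.0 24.0",
    " 0.2  0.0 7.1 0 0.4 6.4 78.9 4.9 2 242.0 17.8 396.9 9.1 21.6",
])
sample = np.loadtxt(io.StringIO(sample_text))
X_sample, Y_sample = split_features_target(sample)
print(X_sample.shape, Y_sample)
```

The shape check is there so that a file with a missing or extra column fails loudly instead of silently shifting the target.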

## 2. Baseline Neural Network Model

Define a baseline neural network with one hidden layer of 13 units and evaluate it with 10-fold cross-validation.

In [6]:
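# define the baseline network: 13 inputs -> 13 hidden units (ReLU) -> 1 linear output
def create_baseline_nn():
    model = Sequential()
    # Keras 2 argument name: kernel_initializer (called init in Keras 1)
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

# fix the random seed for reproducibility
seed = 7
np.random.seed(seed)

# wrap the Keras model as a scikit-learn regressor (epochs was nb_epoch in Keras 1)
estimator = KerasRegressor(build_fn=create_baseline_nn, epochs=100, batch_size=5, verbose=0)

# evaluate the model with 10-fold cross-validation
# (random_state is omitted: it has no effect unless shuffle=True)
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Baseline: %.2f (%.2f) MSE" % (results.mean(), results.std()))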

Baseline: 38.04 (28.15) MSE
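
The evaluation above reports the mean and standard deviation of the per-fold error. As a rough illustration of what unshuffled k-fold cross-validation computes, the sketch below splits the row indices into contiguous folds and scores each held-out fold. It assumes a least-squares linear model as a cheap stand-in for the neural network, and the data is synthetic; `kfold_indices` and `cross_val_mse` are hypothetical helper names.

```python
import numpy as np

def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_splits):
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

def cross_val_mse(X, y, n_splits=10):
    """Per-fold MSE of a least-squares linear model (stand-in for the network)."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(X), n_splits):
        # Append a bias column and fit on the training folds only.
        A_train = np.c_[X[train_idx], np.ones(len(train_idx))]
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score on the held-out fold.
        A_test = np.c_[X[test_idx], np.ones(len(test_idx))]
        residual = A_test @ coef - y[test_idx]
        scores.append(np.mean(residual ** 2))
    return np.array(scores)

# Synthetic regression data (assumption: not the housing data).
rng = np.random.RandomState(7)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

scores = cross_val_mse(X_demo, y_demo)
print("Linear stand-in: %.4f (%.4f) MSE" % (scores.mean(), scores.std()))
```

Each row is used for testing exactly once, so the reported mean summarizes error on data the model did not see during fitting, and the standard deviation shows how much that error varies between folds.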


## 3. Lift Performance by Standardizing The Dataset

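Use scikit-learn's `Pipeline` to standardize the features during model evaluation. Within each cross-validation fold, the scaler is fitted on the training data only, so no information from the held-out fold leaks into training.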

In [7]:
# use pipeline to perform standardization
pipeline = [
    ('standardize', StandardScaler()),
    ('mlp', estimator),
]
estimator_sta = Pipeline(pipeline)

# evaluate model
results = cross_val_score(estimator_sta, X, Y, cv=kfold)
print("Baseline model with Standardization: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Baseline model with Standardization: 22.42 (26.58) MSE
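
For intuition about the standardization step, here is a minimal NumPy sketch of the transformation: subtract the per-column mean and divide by the per-column standard deviation, both computed from the training rows only. The helper names `fit_standardizer` and `standardize` are hypothetical, and the two-column data is synthetic, chosen only to show features on very different scales.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-column mean and scale computed from the training rows only."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0)   # population standard deviation (ddof=0)
    scale[scale == 0.0] = 1.0      # leave constant columns unscaled
    return mean, scale

def standardize(X, mean, scale):
    """Apply z = (x - mean) / scale column by column."""
    return (X - mean) / scale

# Synthetic features on very different scales (assumption: not the housing data).
rng = np.random.RandomState(7)
X_demo = np.c_[rng.uniform(0, 1, 50), rng.uniform(100, 700, 50)]
X_train, X_test = X_demo[:40], X_demo[40:]

# Fit on the training rows, then apply the same statistics to both splits.
mean, scale = fit_standardizer(X_train)
X_train_std = standardize(X_train, mean, scale)
X_test_std = standardize(X_test, mean, scale)
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

The held-out rows are transformed with the training statistics, so their mean and standard deviation are generally close to, but not exactly, 0 and 1.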
