## Regression

An example of building a multilayer perceptron (MLP) for a regression problem.

The task is to model the Boston Housing price dataset from the UCI repository.

Based on other studies of the same problem, a reasonable target is an RMSE of around $4,500. Since prices in this dataset are expressed in thousands of dollars, this corresponds to an RMSE of about 4.5, which we will use as our evaluation target.
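For reference, RMSE is the square root of the mean squared error over the $n$ evaluation samples. Because the model below is trained and scored with MSE, squaring the target gives the equivalent MSE threshold (assuming prices are in thousands of dollars, so $4,500 corresponds to 4.5):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{RMSE} \le 4.5 \iff \mathrm{MSE} \le 4.5^2 = 20.25
$$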

## 1 - Import required libs

```python
import numpy as np
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
# Keras 2.x scikit-learn wrapper; later Keras/TensorFlow releases removed this module
# (the separate scikeras package provides a replacement KerasRegressor).
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import os
```

## 2 - Load data

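```python
seed = 7
np.random.seed(seed)

# A notebook has no __file__, so resolve the data path against the working directory.
absdir = os.getcwd()
datapath = "../data/housing.csv"

# The file is whitespace-delimited with no header row.
dataframe = read_csv(os.path.join(absdir, datapath), header=None, sep=r"\s+")
dataset = dataframe.values

# Columns 0-12 are the input features; column 13 is the target house price.
X = dataset[:, 0:13]
y = dataset[:, 13]
print(X.shape)
print(y.shape)
print(X[0])
print(y[0])
```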

```
(506, 13)
(506,)
[6.320e-03 1.800e+01 2.310e+00 0.000e+00 5.380e-01 6.575e+00 6.520e+01
 4.090e+00 1.000e+00 2.960e+02 1.530e+01 3.969e+02 4.980e+00]
24.0
```
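A minimal sketch of the parsing and slicing step above, run on a single in-memory line rebuilt from the `X[0]` and `y[0]` values printed above rather than on the real file. The `StringIO` stand-in is an assumption for illustration only; it shows that `sep=r"\s+"` splits on runs of whitespace and that the slice `0:13` stops before column 13, which is left for the target.

```python
from io import StringIO

from pandas import read_csv

# One whitespace-delimited row, rebuilt from the X[0] and y[0] values printed above
# (illustrative stand-in for ../data/housing.csv).
raw = "0.00632 18.0 2.31 0 0.538 6.575 65.2 4.09 1 296.0 15.3 396.9 4.98 24.0\n"

dataframe = read_csv(StringIO(raw), header=None, sep=r"\s+")
dataset = dataframe.values  # mixed int/float columns become one float64 array

X = dataset[:, 0:13]  # columns 0..12: the 13 features (slice end is exclusive)
y = dataset[:, 13]    # column 13: the target price
print(X.shape, y.shape)
```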


## 3 - Building a base model

We start by building a baseline model with a single hidden layer of 13 ReLU neurons and one linear output neuron:

```
Input (13) -> Hidden (13) -> Output (1)
```
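As a quick size check, the sketch below counts the trainable parameters of this topology, assuming both `Dense` layers use biases (the Keras default). Each fully connected layer has one weight per input/output pair plus one bias per output neuron.

```python
def dense_params(n_in, n_out):
    # Weights for every input/output pair, plus one bias per output neuron.
    return n_in * n_out + n_out

hidden = dense_params(13, 13)  # Input (13) -> Hidden (13)
output = dense_params(13, 1)   # Hidden (13) -> Output (1)
print(hidden, output, hidden + output)
```

With 196 parameters and 506 samples, the network is small enough that 100 epochs of 10-fold cross-validation remain cheap.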

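```python
# define base model: 13 inputs -> 13 ReLU hidden units -> 1 linear output
def base_model():
    model = Sequential()
    model.add(Dense(13, input_dim=13, activation="relu", kernel_initializer="normal"))
    model.add(Dense(1, kernel_initializer="normal"))
    # MSE loss; KerasRegressor.score() returns the negated loss
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model

estimator = KerasRegressor(build_fn=base_model, epochs=100, batch_size=5, verbose=0)
# Unshuffled 10-fold CV. Without shuffle=True, random_state has no effect,
# and newer scikit-learn versions raise a ValueError if it is passed.
kfold = KFold(n_splits=10)
# One score per fold: the negative MSE on that fold's held-out data
results = cross_val_score(estimator, X, y, cv=kfold)
print(results)
# Flip the sign so the reported mean is a positive MSE
print("Baseline MSE: {:.2f} ({:.2f})".format(-results.mean(), results.std()))
```

```
[-11.06880946 -17.94759935  -7.71836767 -35.42759048 -37.26230825
 -27.64446202  -7.82347067 -96.4998024  -23.00892148 -23.81627668]
Baseline MSE: 28.82 (24.65)
```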
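The per-fold scores printed above are negative because `KerasRegressor.score()` returns the negated loss, so `cross_val_score` treats a higher (less negative) score as better. The sketch below converts those printed scores into an RMSE in dollars so the baseline can be compared with the $4,500 target; it assumes prices are in thousands of dollars and takes the square root of the mean MSE, which is not the same as averaging per-fold RMSEs.

```python
import numpy as np

# Per-fold scores copied from the cross-validation output above (negative MSE).
results = np.array([-11.06880946, -17.94759935, -7.71836767, -35.42759048, -37.26230825,
                    -27.64446202, -7.82347067, -96.4998024, -23.00892148, -23.81627668])

mse = -results.mean()  # flip the sign to get a positive mean MSE
rmse = np.sqrt(mse)    # back in the target's units (thousands of dollars)
print("MSE: {:.2f}, RMSE: {:.2f} (about ${:,.0f})".format(mse, rmse, rmse * 1000))
```

By this measure the baseline lands at roughly $5,400, above the $4,500 target, and the single fold at about -96.5 accounts for much of the spread.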


## 4 - Optimization

We now try to improve performance by standardizing the dataset, since the attributes, although all numeric, are on very different scales (the first row printed above ranges from 0 to 396.9).
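A minimal sketch of what `StandardScaler` does, using toy values rather than the housing data: each column is shifted to zero mean and scaled to unit variance, `z = (x - mean) / std`, with the population standard deviation (`ddof=0`). Two columns whose raw scales differ by a factor of 200 end up identical after scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two columns on very different scales (toy values, not from the dataset).
X = np.array([[0.5, 200.0],
              [1.5, 400.0],
              [1.0, 300.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# The same transformation written out by hand: per-column mean and population std.
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_std, manual))
print(X_std.mean(axis=0), X_std.std(axis=0))
```

In the cross-validated setting, placing the scaler inside the `Pipeline` imported in step 1, for example `Pipeline([("standardize", StandardScaler()), ("mlp", estimator)])`, means each fold fits its scaling statistics on that fold's training data only, so the held-out fold does not leak into the preprocessing.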