# California housing dataset regression with MLPs

In this notebook, we'll train a multi-layer perceptron (MLP) model to estimate median house values for California housing districts using [TensorFlow](https://www.tensorflow.org/) (version $\ge$ 2.0 required) with the [Keras API](https://www.tensorflow.org/guide/keras/overview).

First, the needed imports.

In [None]:
%matplotlib inline

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import plot_model, to_categorical

from packaging.version import Version as LV  # distutils was removed in Python 3.12

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

print('Using Tensorflow version: {}, and Keras version: {}.'.format(tf.__version__, tf.keras.__version__))
assert(LV(tf.__version__) >= LV("2.0.0"))

## Data

Then we load the California housing data. The first time this cell is run, the data has to be downloaded, which can take a while.

In [None]:
chd = datasets.fetch_california_housing()

The data consists of 20640 housing districts, each characterized by 8 attributes: *MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude*. There is also a target value *MedHouseVal* (median house value) for each housing district.
 
Let's plot all attributes against the target value:

In [None]:
plt.figure(figsize=(15,10))
for i in range(8):
    plt.subplot(4,2,i+1)
    plt.scatter(chd.data[:,i], chd.target, s=2, label=chd.feature_names[i])
    plt.legend(loc='best')
    plt.ylabel('MedHouseVal')

We'll split the data into a training and a test set.

In [None]:
test_size = 5000

X_train, X_test, y_train, y_test = train_test_split(
    chd.data, chd.target, test_size=test_size, shuffle=True)

print()
print('California housing data: train:',len(X_train),'test:',len(X_test))
print()
print('X_train:', X_train.shape)
print('y_train:', y_train.shape)
print()
print('X_test', X_test.shape)
print('y_test', y_test.shape)

The training data matrix `X_train` is a matrix of size (20640-`test_size`, 8), and the vector `y_train` contains the target value (median house value) for each housing district in the training set.
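The shape arithmetic above can be checked with a small NumPy sketch. It does not show how `train_test_split` is implemented; it only mimics a shuffled split on random stand-in data. The arrays `X` and `y`, the `_demo` names and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n_samples, n_features = 20640, 8
test_size = 5000

# Random stand-in data with the same shape as the California housing data.
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Shuffle the row indices, then hold out the last `test_size` rows for testing.
perm = rng.permutation(n_samples)
train_idx, test_idx = perm[:-test_size], perm[-test_size:]

X_train_demo, X_test_demo = X[train_idx], X[test_idx]
y_train_demo, y_test_demo = y[train_idx], y[test_idx]

print(X_train_demo.shape, y_train_demo.shape)
print(X_test_demo.shape, y_test_demo.shape)
```

With `test_size = 5000`, the training part has 20640 - 5000 = 15640 rows.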

## One hidden layer

### Initialization

Let's begin with a simple model that has a single hidden layer.  

We first create the `Input` with `X_train.shape[1]` inputs (one for each of the eight attributes in the training data).

First, we scale the input data feature-wise to zero mean and unit variance. This is done using a special `Normalization` layer, whose parameters are not learned while training the network. Instead, we compute the mean and variance of the training data and store them as the layer's weights with a separate call to the `adapt()` function.
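As a rough illustration of feature-wise scaling, the following NumPy sketch computes the mean and variance from stand-in training data and applies them as fixed statistics. It is not the Keras implementation; the `normalize` helper, the `eps` guard and the random data are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training data: 1000 rows, 8 features far from zero mean / unit variance.
X_demo = rng.normal(loc=50.0, scale=10.0, size=(1000, 8))

# The statistics are computed once from the training data and then kept fixed.
mean = X_demo.mean(axis=0)
var = X_demo.var(axis=0)

def normalize(X, mean, var, eps=1e-7):
    """Scale each feature (column) to zero mean and unit variance."""
    return (X - mean) / np.sqrt(var + eps)  # eps avoids division by zero

X_scaled = normalize(X_demo, mean, var)
print(X_scaled.mean(axis=0).round(3))
print(X_scaled.std(axis=0).round(3))
```

The same `mean` and `var` would be reused unchanged for the test data, so that no information from the test set leaks into the preprocessing.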

Next we add a `Dense` layer that has 10 output nodes. The `Dense` layer connects each input to each output with some weight parameter and then passes the result through a ReLU non-linear activation function.

Then we have the output layer that has only one unit with a linear activation function.
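To make the two layers concrete, here is a minimal NumPy sketch of the forward pass: a 10-unit dense layer with ReLU followed by a single linear output unit. The weights are random stand-ins for the values Keras would learn, and the names `W1`, `b1`, `W2`, `b2` and `forward` are chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 8, 10

# Randomly initialized parameters (stand-ins for the learned weights).
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    hidden = np.maximum(0.0, X @ W1 + b1)  # dense layer with 10 units, then ReLU
    return hidden @ W2 + b2                # output layer: 1 unit, linear activation

X_batch = rng.normal(size=(32, n_features))  # one batch of 32 scaled inputs
y_hat = forward(X_batch)
print(y_hat.shape)
```

Each row of `X_batch` yields exactly one predicted value, which is why the output has shape `(32, 1)`.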

After all layers are created, we create the `Model` by specifying its inputs and outputs.

Finally, we select *mean squared error* as the loss function and [*stochastic gradient descent*](https://keras.io/optimizers/#sgd) as the optimizer, and then `compile()` the model. Note that Keras offers [several different options](https://keras.io/optimizers/) for the optimizer that we could use instead of *sgd*.
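The idea behind minimizing mean squared error with stochastic gradient descent can be sketched in plain NumPy. This is a simplified illustration on a linear model with synthetic data, not what Keras does internally; the learning rate, batch size, epoch count and all variable names are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: y = X @ w_true + noise.
X = rng.normal(size=(1000, 8))
w_true = rng.normal(size=8)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(8)        # parameters to learn
learning_rate = 0.05   # assumed value for this sketch
batch_size = 32

def mse(w):
    return np.mean((X @ w - y) ** 2)

loss_before = mse(w)
for epoch in range(20):
    order = rng.permutation(len(X))            # reshuffle the data every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = X[idx] @ w - y[idx]
        grad = 2.0 * X[idx].T @ error / len(idx)  # gradient of the batch MSE
        w -= learning_rate * grad                 # plain SGD update
loss_after = mse(w)
print(loss_before, loss_after)
```

Each update uses only one small batch, so individual steps are noisy, but on average they move the parameters toward lower loss.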

In [None]:
slinputs = keras.Input(shape=(X_train.shape[1],))

normlayer = layers.experimental.preprocessing.Normalization()
normlayer.adapt(X_train)
x = normlayer(slinputs)

x = layers.Dense(units=10, activation="relu")(x)
sloutputs = layers.Dense(units=1, activation='linear')(x)

slmodel = keras.Model(inputs=slinputs, outputs=sloutputs,
                    name="single_layer_mlp_model")
slmodel.compile(loss='mean_squared_error', 
                optimizer='sgd',
                metrics=[keras.metrics.MeanAbsoluteError()])
slmodel.summary()  # summary() prints the table itself and returns None
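The trainable parameter counts of the two `Dense` layers in the summary can be verified by hand. The helper `dense_params` below is a hypothetical name used only for this check; the statistics stored in the `Normalization` layer are not trained and are left out here.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-output pair, plus one bias per output unit.
    return n_inputs * n_units + n_units

hidden = dense_params(8, 10)   # 8 attributes -> 10 hidden units
output = dense_params(10, 1)   # 10 hidden units -> 1 output unit
print(hidden, output, hidden + output)
```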

We can also draw a fancier graph of our model.

In [None]:
plot_model(slmodel, show_shapes=True)

### Learning

Now we are ready to train our first model.  An *epoch* means one pass through the whole training data. 

You can run the code below multiple times and it will continue the training process from where it left off. If you want to start from scratch, re-initialize the model by re-running the initialization cell above.
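To get a feeling for how much work one epoch involves, the number of weight updates can be computed from the training set size and the batch size. The values below match the ones used in this notebook (`test_size = 5000`, `batch_size=32`, `epochs = 10`); the variable names are chosen only for this sketch.

```python
import math

n_train = 20640 - 5000   # training rows left after the split
batch_size = 32
epochs = 10

# The last batch of an epoch may be smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```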

In [None]:
%%time
epochs = 10 

slhistory = slmodel.fit(X_train, 
                        y_train, 
                        epochs=epochs, 
                        batch_size=32,
                        verbose=2)

Let's now see how the training progressed. *Loss* is a function of the difference between the network output and the target values. We are minimizing the loss function during training, so it should decrease over time. We also monitor *mean absolute error* during training.
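The two quantities being plotted can be illustrated on a tiny hand-made example. The arrays `y_true` and `y_pred` are made-up values, not outputs of the model.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 6.0])

errors = y_pred - y_true          # [0.5, 0.0, -1.0, 2.0]
mse = np.mean(errors ** 2)        # (0.25 + 0 + 1 + 4) / 4
mae = np.mean(np.abs(errors))     # (0.5 + 0 + 1 + 2) / 4
print(mse, mae)
```

Because the errors are squared, the single large error of 2.0 dominates the mean squared error, while the mean absolute error weights all errors linearly.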

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,3))
ax1.plot(slhistory.epoch,slhistory.history['loss'])
ax1.set_title('loss (mean squared error)')
ax1.set_xlabel('epoch');
ax2.plot(slhistory.epoch,slhistory.history['mean_absolute_error'])
ax2.set_title('mean absolute error')
ax2.set_xlabel('epoch');

### Inference

For a better measure of the quality of the model, let's see the mean squared error for the test data. 

In [None]:
%%time

slpred = slmodel.predict(X_test)
print("Mean squared error: %.3f"
      % mean_squared_error(y_test, slpred))
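A mean squared error is easier to judge against a reference point. One simple option, sketched here on random stand-in targets, is a baseline that always predicts the mean of the training targets. The `_demo` arrays and the baseline itself are assumptions added for illustration; in the notebook, `y_train` and `y_test` would take their place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in targets; in the notebook these would be y_train and y_test.
y_train_demo = rng.uniform(0.0, 5.0, size=1000)
y_test_demo = rng.uniform(0.0, 5.0, size=200)

# Baseline: always predict the mean of the training targets.
baseline_pred = np.full_like(y_test_demo, y_train_demo.mean())
baseline_mse = np.mean((y_test_demo - baseline_pred) ** 2)

# The root of the MSE is in the same units as the target itself.
baseline_rmse = np.sqrt(baseline_mse)
print(baseline_mse, baseline_rmse)
```

A trained model should reach a clearly lower test error than such a constant predictor. For this dataset the target is expressed in units of $100,000, so the square root of the MSE can be read directly on that scale.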

We can also take a look at the individual errors made by the model when making the predictions. The true values of the target variable (*MedHouseVal*) are on the horizontal axis and the corresponding predictions are on the vertical axis.  

In [None]:
plt.figure(figsize=(6,6))
plt.axes(aspect='equal')
plt.scatter(y_test, slpred, s=1)
plt.xlabel('true values')
plt.ylabel('predictions')
lims = [-0.1, 5.1]
plt.xlim(lims)
plt.ylim(lims)
plt.plot(lims, lims, color=sns.color_palette()[1]);

## Multiple hidden layers

### Initialization

Let's now create a more complex MLP model that has multiple dense layers and dropout layers.  `Dropout()` randomly sets a fraction of inputs to zero during training, which is one approach to regularization and can sometimes help to prevent overfitting.
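The effect of dropout can be sketched in NumPy as follows. This is an illustrative version of so-called inverted dropout, where the inputs that are kept are scaled up by 1/(1 - rate) so that the expected sum stays the same; the function name `dropout` and its `training` flag are assumptions for this example, not the Keras code.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training):
    """Inverted dropout: zero a random fraction of x and rescale the rest."""
    if not training or rate == 0.0:
        return x  # at inference time the inputs pass through unchanged
    keep_mask = rng.random(x.shape) >= rate  # True for the inputs that are kept
    return x * keep_mask / (1.0 - rate)

x = np.ones((1000, 20))
out_train = dropout(x, rate=0.5, training=True)
out_infer = dropout(x, rate=0.5, training=False)
print((out_train == 0).mean(), out_train.mean(), out_infer.mean())
```

With `rate=0.5`, about half of the values are zeroed during training and the remaining ones are doubled, so the mean stays close to 1.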

The output layer again needs to have a single unit with linear activation.

Finally, we again `compile()` the model, this time using [*Adam*](https://keras.io/optimizers/#adam) as the optimizer.
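For intuition, the Adam update rule can be written out in a few lines of NumPy. This is a sketch of the published update equations applied to a toy quadratic problem, not the Keras optimizer; the helper `adam_step`, the toy `target` and the large learning rate of 0.1 (picked so that the toy problem converges quickly) are assumptions for this example.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([3.0, -2.0])
w = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 501):
    grad = 2.0 * (w - target)
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```

Unlike plain SGD, the step size is adapted per parameter using the running gradient statistics.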

In [None]:
mlinputs = keras.Input(shape=(X_train.shape[1],))

normlayer = layers.experimental.preprocessing.Normalization()
normlayer.adapt(X_train)
x = normlayer(mlinputs)

x = layers.Dense(units=20, activation="relu")(x)
x = layers.Dense(units=20, activation="relu")(x)
x = layers.Dropout(0.5)(x)
mloutputs = layers.Dense(units=1, activation='linear')(x)

mlmodel = keras.Model(inputs=mlinputs, outputs=mloutputs,
                    name="multi_layer_mlp_model")
mlmodel.compile(loss='mean_squared_error', 
                optimizer='adam',
                metrics=[keras.metrics.MeanAbsoluteError()])
mlmodel.summary()  # summary() prints the table itself and returns None

In [None]:
plot_model(mlmodel, show_shapes=True)

### Learning

In [None]:
%%time
epochs = 10 

mlhistory = mlmodel.fit(X_train, 
                        y_train, 
                        epochs=epochs, 
                        batch_size=32,
                        verbose=2)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,3))
ax1.plot(mlhistory.epoch,mlhistory.history['loss'])
ax1.set_title('loss (mean squared error)')
ax1.set_xlabel('epoch');
ax2.plot(mlhistory.epoch,mlhistory.history['mean_absolute_error'])
ax2.set_title('mean absolute error')
ax2.set_xlabel('epoch');

### Inference

In [None]:
%%time

mlpred = mlmodel.predict(X_test)
print("Mean squared error: %.3f"
      % mean_squared_error(y_test, mlpred))

In [None]:
plt.figure(figsize=(6,6))
plt.axes(aspect='equal')
plt.scatter(y_test, mlpred, s=1)
plt.xlabel('true values')
plt.ylabel('predictions')
lims = [-0.1, 5.1]
plt.xlim(lims)
plt.ylim(lims)
plt.plot(lims, lims, color=sns.color_palette()[1]);

## Model tuning

Try to reduce the mean squared error of the regression. Modify the network architectures and see if the results improve. See the documentation of [Keras](https://keras.io/) for further options.