# California housing dataset regression with MLPs

In this notebook, we'll train a multi-layer perceptron model to estimate median house values in California housing districts using **TensorFlow** (version $\ge$ 2.0 required) with the **Keras API**.

First, the needed imports.

In [None]:
%matplotlib inline

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Input
from tensorflow.keras.utils import plot_model

from IPython.display import SVG

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

print('Using Tensorflow version: {}, and Keras version: {}.'.format(tf.__version__, tf.keras.__version__))

## Data

Then we load the California housing data. The first time this cell is run, the data has to be downloaded, which can take a while.

In [None]:
chd = datasets.fetch_california_housing()

The data consists of 20640 housing districts, each characterized by 8 attributes: *MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude*. There is also a target value (median house value, expressed in units of 100,000 US dollars) for each housing district.
 
Let's plot all attributes against the target value:

In [None]:
plt.figure(figsize=(15,10))
for i in range(8):
    plt.subplot(4,2,i+1)
    plt.scatter(chd.data[:,i], chd.target, s=2, label=chd.feature_names[i])
    plt.legend(loc='best')

We'll split the data into a training and a test set.

Let's also select a single attribute to start the analysis with, say *MedInc*.

In [None]:
test_size = 5000
single_attribute = 'MedInc'

X_train_all, X_test_all, y_train, y_test = train_test_split(
    chd.data, chd.target, test_size=test_size, shuffle=True)

attribute_index = chd.feature_names.index(single_attribute)
X_train_single = X_train_all[:, attribute_index].reshape(-1, 1)
X_test_single = X_test_all[:, attribute_index].reshape(-1, 1)
     
print()
print('California housing data: train:',len(X_train_all),'test:',len(X_test_all))
print()
print('X_train_all:', X_train_all.shape)
print('X_train_single:', X_train_single.shape)
print('y_train:', y_train.shape)
print()
print('X_test_all', X_test_all.shape)
print('X_test_single', X_test_single.shape)
print('y_test', y_test.shape)
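
To make the idea of a shuffled split concrete, here is a minimal NumPy sketch that permutes the row indices and holds out a fixed number of rows. The helper name `shuffled_split` and the toy arrays are hypothetical; this is only an illustration of the principle, not how `train_test_split` is actually implemented.

```python
import numpy as np

def shuffled_split(X, y, test_size, seed=0):
    """Hypothetical helper: shuffle row indices, then hold out `test_size` rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    test_idx, train_idx = idx[:test_size], idx[test_size:]
    # Index X and y with the same indices so rows and targets stay aligned
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data standing in for chd.data and chd.target
X_toy = np.arange(20, dtype=float).reshape(10, 2)
y_toy = np.arange(10, dtype=float)

X_tr, X_te, y_tr, y_te = shuffled_split(X_toy, y_toy, test_size=3)
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
```

The important detail is that inputs and targets are indexed with the same shuffled indices, so each district keeps its own target value.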

The training data matrix `X_train_all` has shape (`n_train`, 8), and `X_train_single` contains only the selected attribute (*MedInc* by default). The vector `y_train` contains the target value (median house value) for each housing district in the training set.

Let's start our analysis with the single attribute. Later, you can set `only_single_attribute = False` to use all eight attributes in the regression.

As the final step, let's scale the input data to zero mean and unit variance: 

In [None]:
only_single_attribute = True

if only_single_attribute:
    X_train = X_train_single
    X_test = X_test_single
else:
    X_train = X_train_all
    X_test = X_test_all

scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
print('X_train: shape:', X_train.shape, 'mean:', X_train.mean(axis=0), 'std:', X_train.std(axis=0))
print('X_test: shape:', X_test.shape, 'mean:', X_test.mean(axis=0), 'std:', X_test.std(axis=0))
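
As a rough sketch of what the scaling step computes, the same transformation can be written with plain NumPy. The toy arrays below are synthetic stand-ins for the real inputs (an assumption for illustration); the point is that the mean and standard deviation are estimated on the training set only and then reused for the test set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic feature columns standing in for the training and test inputs
X_tr = rng.normal(loc=4.0, scale=2.0, size=(1000, 1))
X_te = rng.normal(loc=4.0, scale=2.0, size=(200, 1))

# Statistics are estimated on the training set only
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # population standard deviation (ddof=0)

X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std  # reuse the training statistics

print('train mean/std:', X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
print('test mean/std: ', X_te_scaled.mean(axis=0), X_te_scaled.std(axis=0))
```

The scaled training data has exactly zero mean and unit variance (up to floating point error), while the test data is only approximately standardized. That is expected, since the test set must not influence the preprocessing.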

## One hidden layer

### Initialization

Let's begin with a simple model that has a single hidden layer. We first initialize the model with `Sequential()`. Then we add a `Dense` layer with 10 units and ReLU activation that takes `X_train.shape[1]` inputs (one for each attribute in the training data). A `Dense` layer connects every input to every output, each connection having its own weight parameter.
Then we have an output layer that has only one unit with a linear activation function.

Finally, we select *mean squared error* as the loss function, select [*stochastic gradient descent*](https://keras.io/optimizers/#sgd) as the optimizer, and `compile()` the model. Note that there are [several different options](https://keras.io/optimizers/) for the optimizer in Keras that we could use instead of *sgd*.

In [None]:
slmodel = Sequential()
slmodel.add(Dense(units=10, input_dim=X_train.shape[1], activation='relu'))
slmodel.add(Dense(units=1, activation='linear'))

slmodel.compile(loss='mean_squared_error', 
                 optimizer='sgd')
slmodel.summary()  # summary() prints the table itself and returns None
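
To see what the two `Dense` layers compute, here is a minimal NumPy sketch of the forward pass of a network with the same shape (one input, ten hidden units, one output). The weights are drawn at random purely for illustration and do not follow the Keras initializers; the names `W1`, `b1`, `W2`, `b2` and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 1, 10  # one attribute, ten hidden units

# Randomly initialized parameters (illustrative values, not the Keras initializers)
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)  # dense layer followed by ReLU
    return hidden @ W2 + b2                # dense layer with linear activation

x = rng.normal(size=(5, n_inputs))  # a batch of five scaled inputs
print('output shape:', forward(x).shape)

n_params = W1.size + b1.size + W2.size + b2.size
print('trainable parameters:', n_params)
```

With a single input attribute the count is (1 × 10 + 10) + (10 × 1 + 1) = 31, which should agree with the total reported by the model summary.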

We can also draw a fancier graph of our model.

In [None]:
plot_model(slmodel, show_shapes=True)

### Learning

Now we are ready to train our first model.  An *epoch* means one pass through the whole training data. 

You can run the code below multiple times and it will continue the training process from where it left off. If you want to start from scratch, re-initialize the model by re-running the initialization cell above.

In [None]:
%%time
epochs = 10 

slhistory = slmodel.fit(X_train, 
                        y_train, 
                        epochs=epochs, 
                        batch_size=32,
                        verbose=2)
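
As a small worked example of what an epoch means in terms of weight updates, the following sketch counts the mini-batches for the settings used here: 20640 − 5000 training samples and a batch size of 32. The batching loop is a hypothetical illustration of one pass through the data, not the Keras implementation.

```python
import math
import numpy as np

n_train = 20640 - 5000  # districts left after holding out the test set
batch_size = 32
epochs = 10

steps_per_epoch = math.ceil(n_train / batch_size)
print('training samples:', n_train)
print('updates per epoch:', steps_per_epoch)
print('updates in total:', steps_per_epoch * epochs)

# Hypothetical illustration: one epoch visits every sample exactly once
rng = np.random.default_rng(0)
order = rng.permutation(n_train)
batches = [order[i:i + batch_size] for i in range(0, n_train, batch_size)]
print('number of batches:', len(batches), 'last batch size:', len(batches[-1]))
```

Because 15640 is not a multiple of 32, the last batch of each epoch is smaller than the others.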

Let's now see how the training progressed. *Loss* is a function of the difference between the network output and the target values. We are minimizing the loss function during training, so it should decrease over time.

In [None]:
plt.figure(figsize=(5,3))
plt.plot(slhistory.epoch,slhistory.history['loss'])
plt.title('loss');

In [None]:
if X_train.shape[1] == 1:
    plt.figure(figsize=(10, 10))
    plt.scatter(X_train, y_train, s=5)
    reg_x = np.arange(np.min(X_train), np.max(X_train), 0.01).reshape(-1, 1)
    plt.scatter(reg_x, slmodel.predict(reg_x), s=8, label='one hidden layer')
    plt.legend(loc='best');

### Inference

For a better measure of the quality of the model, let's compute the mean squared error on the test data, which the model has not seen during training.

In [None]:
%%time

slpred = slmodel.predict(X_test)
print("Mean squared error: %.3f"
      % mean_squared_error(y_test, slpred))
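
To make the metric itself concrete, here is a minimal NumPy sketch of the mean squared error, together with a simple reference point: the error of always predicting the mean target. The helper `mse` and the toy values are hypothetical. The predictions are given shape (n, 1) on purpose, as a single-unit output layer produces, because subtracting an (n, 1) array from an (n,) array without flattening would silently broadcast to an (n, n) matrix.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared differences; ravel() avoids (n,) vs (n, 1) broadcasting."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return np.mean((y_true - y_pred) ** 2)

# Toy values standing in for y_test and the model predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([[1.5], [2.0], [2.0], [5.0]])  # shape (n, 1)

error = mse(y_true, y_pred)
rmse = np.sqrt(error)  # same units as the target
baseline = mse(y_true, np.full_like(y_true, y_true.mean()))  # always predict the mean

print('MSE:', error, 'RMSE:', rmse, 'baseline MSE:', baseline)
```

Taking the square root brings the error back to the units of the target. Since the target is given in units of 100,000 US dollars, an RMSE of 0.75 would correspond to a typical error of roughly 75,000 dollars. A model is only useful if its error is clearly below that of the mean-prediction baseline.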

## Multiple hidden layers

### Initialization

Let's now create a more complex MLP model that has multiple dense layers and dropout layers.  `Dropout()` randomly sets a fraction of inputs to zero during training, which is one approach to regularization and can sometimes help to prevent overfitting.

The last layer needs to have a single unit with linear activation to match the ground truth (`y_train`).

Finally, we again `compile()` the model, this time using [*Adam*](https://keras.io/optimizers/#adam) as the optimizer.

In [None]:
mlmodel = Sequential()

mlmodel.add(Input([X_train.shape[1]]))
mlmodel.add(Dense(units=20, activation='relu'))
mlmodel.add(Dense(units=20, activation='relu'))
mlmodel.add(Dropout(0.5))

mlmodel.add(Dense(units=1, activation='linear'))

mlmodel.compile(loss='mean_squared_error', 
                optimizer='adam')
mlmodel.summary()  # summary() prints the table itself and returns None
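
To illustrate what a dropout layer does, here is a minimal NumPy sketch of so-called inverted dropout: during training a random fraction of the values is set to zero and the remaining ones are scaled up by 1 / (1 − rate), while at inference time the input passes through unchanged. The function `dropout` below is a hypothetical stand-in written for illustration, not the Keras layer itself.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a random fraction `rate` of entries, rescale the rest."""
    if not training or rate == 0.0:
        return x  # at inference time the inputs pass through unchanged
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 20))  # toy stand-in for the output of a 20-unit layer

dropped = dropout(activations, rate=0.5, training=True, rng=rng)
same = dropout(activations, rate=0.5, training=False, rng=rng)

print('fraction zeroed:', np.mean(dropped == 0.0))
print('mean after dropout:', dropped.mean())
print('unchanged at inference:', np.array_equal(same, activations))
```

The rescaling keeps the expected value of each activation the same in training and inference. It also explains why the training loss of a model with dropout can look worse than its test error: dropout is only active during training.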

In [None]:
plot_model(mlmodel, show_shapes=True)

### Learning

In [None]:
%%time
epochs = 10 

mlhistory = mlmodel.fit(X_train, 
                        y_train, 
                        epochs=epochs, 
                        batch_size=32,
                        verbose=2)

In [None]:
plt.figure(figsize=(5,3))
plt.plot(mlhistory.epoch,mlhistory.history['loss'])
plt.title('loss');

In [None]:
if X_train.shape[1] == 1:
    plt.figure(figsize=(10, 10))
    plt.scatter(X_train, y_train, s=5)
    reg_x = np.arange(np.min(X_train), np.max(X_train), 0.01).reshape(-1, 1)
    plt.scatter(reg_x, slmodel.predict(reg_x), s=8, label='one hidden layer')
    plt.scatter(reg_x, mlmodel.predict(reg_x), s=8, label='multiple hidden layers')
    plt.legend(loc='best');

### Inference

In [None]:
%%time

mlpred = mlmodel.predict(X_test)
print("Mean squared error: %.3f"
      % mean_squared_error(y_test, mlpred))

## Model tuning

Try to reduce the mean squared error of the regression. Modify the network architectures and see if the results improve. See the documentation of [Keras](https://keras.io/) for further options.

To further improve the results, it is possible to replace [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), which scales the input data to zero mean and unit variance, with more advanced preprocessing.
See [Preprocessing data](https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-data) for more information.
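
As one possible direction, here is a small sketch of a log transform applied before standardization. Standardization only shifts and rescales a feature, so a long right tail stays in place; taking the logarithm first can compress it. The data below is synthetic (a log-normal sample, which is an assumption and not the real housing data), and the helpers `standardize` and `skewness` are hypothetical. Whether this helps for a given attribute is best checked against the scatter plots and the test error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic positive feature with a long right tail (not the real data)
x = rng.lognormal(mean=7.0, sigma=1.0, size=(5000, 1))

def standardize(a):
    """Scale to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def skewness(a):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    return float(np.mean(standardize(a) ** 3))

x_std = standardize(x)                # standardization alone keeps the long tail
x_log_std = standardize(np.log1p(x))  # log first, then standardize

print('skewness, standardized only:   %.2f' % skewness(x_std))
print('skewness, log + standardized:  %.2f' % skewness(x_log_std))
```

scikit-learn also provides ready-made transformers for this purpose, such as `PowerTransformer` and `QuantileTransformer`, which are described in the preprocessing documentation linked above.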