# Chapter 10 - Regression on Housing Dataset

In [1]:
import tensorflow as tf
from tensorflow import keras 
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [2]:
housing = fetch_california_housing()

In [4]:
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full
)

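Create an instance of ```StandardScaler``` and call ```fit_transform()``` on the training set. This fits the scaler to the data (computing each feature's mean and standard deviation) and returns the transformed data in one step.

This is the same as calling ```.fit()``` and then ```.transform()```. The validation and test sets must be scaled with the statistics learned from the training set, so for them we call only ```transform()```. Refitting the scaler on each set would scale the sets inconsistently and leak information from the held-out data into preprocessing.

In [5]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the training set only
X_valid = scaler.transform(X_valid)      # reuse the training statistics
X_test = scaler.transform(X_test)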
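As a minimal NumPy sketch of what standardization does, here is `MiniScaler`, a hypothetical, simplified stand-in (not scikit-learn's class). `fit` learns each column's mean and standard deviation, `transform` applies the stored values, and `fit_transform` chains the two. The validation data is transformed with the statistics learned from the training data.

```python
import numpy as np

class MiniScaler:
    """Hypothetical, simplified stand-in for StandardScaler (not the sklearn class)."""

    def fit(self, X):
        # Learn per-feature statistics from the data passed in (the training set).
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        return self

    def transform(self, X):
        # Apply the *stored* statistics; nothing is re-learned here.
        return (X - self.mean_) / self.scale_

    def fit_transform(self, X):
        # Equivalent to calling fit(X) and then transform(X).
        return self.fit(X).transform(X)

rng = np.random.default_rng(42)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_va = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

scaler = MiniScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # fit on training data only
X_va_scaled = scaler.transform(X_va)      # reuse the training mean/std

print(X_tr_scaled.mean(axis=0))  # approximately 0 for every feature
print(X_tr_scaled.std(axis=0))   # approximately 1 for every feature
```

Note that the validation set's scaled mean will not be exactly 0. That is expected, because it is measured against the training statistics.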

Building the model is very similar to the classification example. The differences are that the output layer has a single neuron with no activation function, because we want to predict one unbounded value, and that the loss function is the mean squared error (MSE). Since the dataset is quite noisy, we use just one hidden layer with fewer neurons to reduce the risk of overfitting.

In [6]:
model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=X_train.shape[1:]),
    keras.layers.Dense(1)
])
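To make this architecture concrete, here is a NumPy sketch of the forward pass the model computes. The weights are random stand-ins for trained ones, an assumption made only for illustration. The California housing data has 8 input features, and the model has a 30-unit ReLU hidden layer and a single linear output unit, so each row produces one real number.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_hidden = 4, 8, 30

X = rng.normal(size=(n_samples, n_features))   # stand-in for scaled housing rows

# Hidden layer: Dense(30, activation='relu')
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
hidden = np.maximum(X @ W1 + b1, 0.0)          # ReLU

# Output layer: Dense(1) with no activation, so the output is not squashed
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)
y_hat = hidden @ W2 + b2                       # one prediction per row

n_params = W1.size + b1.size + W2.size + b2.size
print(y_hat.shape)  # (4, 1)
print(n_params)     # 301
```

The parameter count (8×30 + 30 for the hidden layer, 30×1 + 1 for the output) should match what `model.summary()` reports for the Keras model above.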

In [9]:
model.compile(
    loss='mean_squared_error', optimizer='sgd'
)
history = model.fit(
    X_train, y_train, epochs=20, validation_data=(X_valid, y_valid)
)

In [10]:
mse_test = model.evaluate(X_test, y_test)



In [11]:
X_new = X_test[:3]

In [12]:
y_pred = model.predict(X_new)

In [14]:
y_pred, y_test[:3]

(array([[2.8503492 ],
        [0.97012275],
        [2.0389943 ]], dtype=float32),
 array([2.063, 1.25 , 1.66 ]))
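The MSE loss is the mean of the squared differences between predictions and targets. As a quick check, here it is computed for the three rows above. The values are copied from the output and will differ between runs, because the data splits and initial weights are random.

```python
import numpy as np

y_pred = np.array([[2.8503492], [0.97012275], [2.0389943]])  # model output, shape (3, 1)
y_true = np.array([2.063, 1.25, 1.66])                        # targets, shape (3,)

# Flatten the predictions first: subtracting a (3,) array from a (3, 1) array
# would broadcast to a (3, 3) matrix of all pairwise differences.
errors = y_pred.ravel() - y_true
mse = np.mean(errors ** 2)
print(round(mse, 4))  # 0.2806
```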

This works, but sometimes you need networks with more complex topologies, or with multiple inputs or outputs. For these cases, Keras offers the Functional API.

One example of a nonsequential network is a wide & deep neural network. It connects all or part of the inputs directly to the output layer, which lets the network learn both deep patterns (using the deep path) and simple rules (using the short, wide path). In contrast, a regular MLP forces all the data to flow through the full stack of layers, so simple patterns in the data may be distorted by this sequence of transformations.
                        
Deep path:

Input Layer --> Hidden 1 --> Hidden 2 --> Concat --> Output Layer

Wide path:

Input Layer --------------------------> Concat --> Output Layer

The wide path bypasses the hidden layers, so the inputs go straight into the concatenation layer.
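Without Keras, the wide & deep forward pass is a few matrix products plus a concatenation. This NumPy sketch uses random weights, an assumption made only to trace shapes. It uses the same layer sizes as the Keras cell that follows: 8 input features, two 30-unit ReLU hidden layers, and one linear output. As a result, the output layer sees 8 + 30 = 38 values per row: the raw inputs from the wide path and the learned features from the deep path.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out, relu=False):
    """Random-weight dense layer used only to trace shapes (not a trained layer)."""
    W = rng.normal(scale=0.1, size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    z = x @ W + b
    return np.maximum(z, 0.0) if relu else z

X = rng.normal(size=(4, 8))                      # 4 rows, 8 features

hidden1 = dense(X, 30, relu=True)                # deep path
hidden2 = dense(hidden1, 30, relu=True)
concat = np.concatenate([X, hidden2], axis=1)    # wide path joins here
output = dense(concat, 1)                        # single linear unit

print(concat.shape, output.shape)  # (4, 38) (4, 1)
```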

In [15]:
input_ = keras.layers.Input(shape=X_train.shape[1:])
hidden1 = keras.layers.Dense(30, activation='relu')(input_)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.Concatenate()([input_, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_], outputs=[output])

1. First, an ```Input``` object is created. It specifies the shape and data type of the input the model will receive. A model can have several inputs.
2. Next, we create a ```Dense``` hidden layer with 30 neurons and ReLU activation. As soon as it is created, we call it like a function and pass it the input. This is why it is called the Functional API. At this point we are only telling Keras how the layers should be connected; no data is being processed yet.
3. Then we create a second hidden layer and call it on the output of the first hidden layer.
4. Next, we create a ```Concatenate``` layer and, again, immediately call it like a function to concatenate the input and the output of the second hidden layer.
5. Then we create the output layer, with a single neuron and no activation function (since this is regression), and call it on the result of the concatenation.
6. Finally, we create a Keras ```Model```, specifying which inputs and outputs to use.
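The idea that calling a layer only records a connection can be illustrated with a toy NumPy sketch. `Node`, `ToyDense` and `run` are hypothetical names invented here. This is not how Keras is implemented internally. It only models the behaviour described in the steps above: building the graph moves no data, and values are computed only when the graph is later run on concrete inputs.

```python
import numpy as np

class Node:
    """Toy symbolic tensor: records the layer that produced it and its input nodes."""
    def __init__(self, layer=None, parents=()):
        self.layer = layer
        self.parents = parents

class ToyDense:
    """Hypothetical dense layer; weights are created on first use, once the input width is known."""
    def __init__(self, units, relu=False, seed=0):
        self.units, self.relu = units, relu
        self.rng = np.random.default_rng(seed)
        self.W = self.b = None

    def __call__(self, node):
        # Calling the layer on a node only wires up the graph; no numbers are computed.
        return Node(layer=self, parents=(node,))

    def forward(self, x):
        if self.W is None:
            self.W = self.rng.normal(scale=0.1, size=(x.shape[1], self.units))
            self.b = np.zeros(self.units)
        z = x @ self.W + self.b
        return np.maximum(z, 0.0) if self.relu else z

def run(node, inputs):
    """Evaluate the graph by walking back from `node`; `inputs` maps input nodes to arrays."""
    if node.layer is None:                     # an input node: look up the fed data
        return inputs[node]
    args = [run(p, inputs) for p in node.parents]
    return node.layer.forward(*args)

input_ = Node()                                # plays the role of keras.layers.Input
hidden1 = ToyDense(30, relu=True)(input_)      # graph wiring only
output = ToyDense(1, seed=1)(hidden1)

y = run(output, {input_: np.ones((5, 8))})     # data flows only now
print(y.shape)  # (5, 1)
```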

Once the model is created, the rest of the process is the same.

In [16]:
model.compile(
    loss='mean_squared_error', optimizer='sgd'
)
history = model.fit(
    X_train, y_train, epochs=20, validation_data=(X_valid, y_valid)
)

In [17]:
mse_test = model.evaluate(X_test, y_test)



But what if you want to send a subset of features through the wide path, and another subset (possibly overlapping) through the deep path? 

Input B --> Hidden 1 --> Hidden 2 --> Concat --> Output

Input A -----------> Concat --> Output

Suppose we want to send 5 features (indices 0 through 4) through the wide path and 6 features (indices 2 through 7) through the deep path.
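In NumPy terms, the two subsets are column slices. Because the slices overlap, features 2, 3 and 4 go through both paths. Here is a quick check on a dummy 8-column array whose values are its own column indices:

```python
import numpy as np

X = np.arange(8).reshape(1, 8)   # one row whose values are the column indices 0..7

X_A = X[:, :5]                   # wide path: columns 0-4
X_B = X[:, 2:]                   # deep path: columns 2-7

print(X_A)                       # [[0 1 2 3 4]]
print(X_B)                       # [[2 3 4 5 6 7]]
print(np.intersect1d(X_A, X_B))  # [2 3 4]
```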

In [21]:
input_A = keras.layers.Input(shape=[5], name='wide_input')
input_B = keras.layers.Input(shape=[6], name='deep_input')
hidden1 = keras.layers.Dense(30, activation='relu')(input_B)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name='output')(concat)
model = keras.Model(inputs=[input_A, input_B], outputs=[output])

It's good practice to name at least the most important layers once a model gets more complex. We can now compile the model as usual, but when we call ```fit()``` we must pass a pair of matrices ```(X_train_A, X_train_B)```, one per input. The same applies to the validation data and to ```evaluate()```.

In [27]:
model.compile(
    loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3)
)

# columns 0-4 feed the wide input, columns 2-7 feed the deep input
X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]

In [28]:
history = model.fit(
    (X_train_A, X_train_B), y_train, epochs=20, validation_data=((X_valid_A, X_valid_B), y_valid)
)

In [29]:
mse_test = model.evaluate((X_test_A, X_test_B), y_test)



Use cases for having multiple outputs:

- The task may demand it. For instance, you may want to locate and classify the main object in a picture. This is both a regression task (finding the coordinates of the object's center, as well as its width and height) and a classification task (identifying what the object is).
- You may have multiple independent tasks based on the same data. You can often get better results on all of them by training a single network with one output per task, because the network can learn features in the data that are useful across tasks. For example, you could perform multitask classification on pictures of faces, classifying both the person's facial expression and whether they are wearing glasses.
- Extra outputs can also serve as a regularization technique (i.e., a training constraint whose objective is to reduce overfitting). For example, you may want to add an auxiliary output to ensure that the underlying part of the network learns something useful on its own, without relying on the rest of the network.

Adding extra outputs to the model:

In [30]:
# same architecture as above, up to the main output layer
input_A = keras.layers.Input(shape=[5], name='wide_input')
input_B = keras.layers.Input(shape=[6], name='deep_input')
hidden1 = keras.layers.Dense(30, activation='relu')(input_B)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
# two heads: the main output on top of the concatenation,
# and an auxiliary output on top of the deep path only
output = keras.layers.Dense(1, name='main_output')(concat)
aux_output = keras.layers.Dense(1, name='aux_output')(hidden2)
model = keras.Model(inputs=[input_A, input_B], outputs=[output, aux_output])
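The two heads read from different places. The main output sees the 35-value concatenation (5 wide features plus 30 deep features), while the auxiliary output sees only the 30 values from `hidden2`. Here is a NumPy sketch with random weights, used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out, relu=False):
    """Random-weight dense layer (bias omitted, i.e. zero), used only to trace shapes."""
    W = rng.normal(scale=0.1, size=(x.shape[1], n_out))
    z = x @ W
    return np.maximum(z, 0.0) if relu else z

X_A = rng.normal(size=(4, 5))    # wide input
X_B = rng.normal(size=(4, 6))    # deep input

hidden1 = dense(X_B, 30, relu=True)
hidden2 = dense(hidden1, 30, relu=True)
concat = np.concatenate([X_A, hidden2], axis=1)

main_output = dense(concat, 1)   # reads wide + deep features
aux_output = dense(hidden2, 1)   # reads only the deep path

print(concat.shape, main_output.shape, aux_output.shape)  # (4, 35) (4, 1) (4, 1)
```

The auxiliary head has no access to the wide path, so its loss pushes the deep layers to be useful on their own. This is the regularization effect described in the list of use cases.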

Each output needs its own loss function, so when we compile the model we pass a list of losses. If we pass a single loss, Keras assumes it must be used for all outputs. By default, Keras simply adds up all these losses to get the final loss used for training. We care much more about the main output than about the auxiliary output, which is only there for regularization, so we give the main output's loss a much greater weight via ```loss_weights```.

In [31]:
model.compile(
    loss=['mse', 'mse'], loss_weights=[0.9, 0.1], optimizer='sgd'
)
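With `loss_weights=[0.9, 0.1]`, the quantity being minimized is the weighted sum of the per-output losses. This model has no other regularization terms. The following NumPy sketch shows that arithmetic on made-up predictions:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])             # the same labels for both outputs
main_pred = np.array([1.1, 1.9, 3.2])     # made-up main-output predictions
aux_pred = np.array([0.5, 2.5, 2.0])      # made-up auxiliary-output predictions

def mse(pred, target):
    """Mean squared error between two 1-D arrays."""
    return np.mean((pred - target) ** 2)

main_loss = mse(main_pred, y)
aux_loss = mse(aux_pred, y)
total_loss = 0.9 * main_loss + 0.1 * aux_loss

print(round(main_loss, 4), round(aux_loss, 4), round(total_loss, 4))  # 0.02 0.5 0.068
```

The three numbers returned by `model.evaluate` in the last cell should relate to each other in roughly the same way. The first is the weighted total, and the other two are the main and auxiliary losses.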

We also need to provide labels for each output during training. Since in this example both outputs are trying to predict the same thing, we should use the same labels. 

In [33]:
history = model.fit(
    [X_train_A, X_train_B], [y_train, y_train], epochs=20, validation_data=([X_valid_A, X_valid_B], [y_valid, y_valid])
)

In [34]:
total_loss, main_loss, aux_loss = model.evaluate(
    [X_test_A, X_test_B], [y_test, y_test]
)

