# Model building with Keras

In this notebook, you will get to know Keras' **functional API** by creating a simple model and by implementing and adding your own layer.

We are not going to train the model; the purpose of this exercise is to learn how to use Keras for building flexible models.

**This exercise is organized in 4 steps**

1. Build a simple, fully-connected network for a classification task with the functional API.
2. Create your own layer that performs input feature scaling.
3. Add the layer to your model.
4. Extend your network into a multi-purpose network that also performs a regression on some quantity.

## Step 1: Simple model

In [None]:
# imports
import tensorflow as tf
import numpy as np

Let's build a model with the following specs:

- 32 input features
- 5 hidden layers with 128 units each
- "elu" activation function
- a final "softmax" layer with 3 units (3-class classification)

In [None]:
x = tf.keras.layers.Input(shape=(32,))

# stack hidden layers
a = x
for _ in range(5):
    # dense layer
    a = tf.keras.layers.Dense(128)(a)

    # activation
    a = tf.keras.layers.Activation("elu")(a)

# output layer
y = tf.keras.layers.Dense(3, activation="softmax")(a)

# construct the model
model = tf.keras.Model(inputs=x, outputs=y)

Now we can check whether the model produces outputs.

**Remember**, the weights are already initialized (can you find out how?) but they have not yet been optimized by a training process. However, even with the initial weights, you should be able to perform a prediction.
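As a hint for the question above: to my knowledge, `tf.keras.layers.Dense` defaults to `kernel_initializer="glorot_uniform"` and `bias_initializer="zeros"`. The following NumPy-only sketch mimics that scheme for the first hidden layer (32 inputs, 128 units). It illustrates the idea and is not the Keras implementation itself; the names `fan_in`, `fan_out`, `limit`, `kernel` and `bias` are chosen here for illustration.

```python
import numpy as np

# illustrative sizes: first hidden layer of the model above
fan_in, fan_out = 32, 128

# Glorot (Xavier) uniform draws weights from U(-limit, limit)
# with limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))

rng = np.random.default_rng(seed=0)
kernel = rng.uniform(-limit, limit, size=(fan_in, fan_out)).astype(np.float32)

# biases start at zero
bias = np.zeros(fan_out, dtype=np.float32)

print(f"limit = {limit:.4f}")
print(f"kernel range: [{kernel.min():.4f}, {kernel.max():.4f}]")
```

In the notebook itself, inspecting `model.layers[1].kernel_initializer` should reveal which initializer the first dense layer actually uses.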

In [None]:
# create a batch of 10 input samples
random_inputs = np.random.random((10, 32))
pred = model.predict(random_inputs)
pred



array([[0.19352934, 0.43056267, 0.375908  ],
       [0.31703332, 0.33084533, 0.35212138],
       [0.32126305, 0.34349218, 0.3352447 ],
       [0.25064832, 0.405579  , 0.34377265],
       [0.23130643, 0.39652428, 0.37216923],
       [0.20979956, 0.45050025, 0.33970016],
       [0.2732788 , 0.39144984, 0.33527136],
       [0.24330816, 0.39655355, 0.36013833],
       [0.29108942, 0.39195138, 0.3169592 ],
       [0.35611117, 0.31104642, 0.33284244]], dtype=float32)

Do you understand the shape of the output?

Verify that the softmax activation in the last layer worked, i.e., check that the three outputs of each sample add up to 1.
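One possible way to do this check is sketched below. To keep the snippet runnable on its own, it builds a stand-in array `fake_pred` with a hand-written NumPy `softmax` helper instead of using the model output; both names are assumptions made for this illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, stabilized by subtracting the row maximum."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# stand-in for `pred`: 10 samples (rows), 3 class probabilities (columns)
rng = np.random.default_rng(seed=0)
fake_pred = softmax(rng.normal(size=(10, 3))).astype(np.float32)

# one row per input sample, one column per class
print(fake_pred.shape)

# sum over the class axis; every entry should be 1 up to float32 rounding
row_sums = fake_pred.sum(axis=1)
print(np.allclose(row_sums, 1.0, atol=1e-6))
```

With the real model output, the same check would read `np.allclose(pred.sum(axis=1), 1.0, atol=1e-6)`.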

## Step 2: Custom layer

Now we are going to write our own custom layer that is supposed to apply input feature scaling. For this, we subclass the `tf.keras.layers.Layer` base class and implement the minimal set of methods to integrate it into our model.



In [None]:
# define the feature scaling procedure as a custom keras layer
# that has, of course, no weights as it is not trainable
# see https://keras.io/guides/making_new_layers_and_models_via_subclassing for more info

class FeatureScaling(tf.keras.layers.Layer):

    def __init__(self, means, stddevs):
        """
        Constructor. Stores arguments as instance members.
        """
        super(FeatureScaling, self).__init__(trainable=False)

        self.means = means
        self.stddevs = stddevs

    def get_config(self):
        """
        Method that is required for model cloning and saving. It should return a
        mapping of instance member names to the actual members.
        """
        return {"means": self.means, "stddevs": self.stddevs}

    def compute_output_shape(self, input_shape):
        """
        Method that, given an input shape, defines the shape of the output tensor.
        This way, the entire model can be built without actually calling it.
        """
        return (input_shape[0], input_shape[1])

    def build(self, input_shape):
        """
        Any variables defined by this layer should be created inside this method.
        This helps Keras to defer variable registration to the point where it is
        needed the first time, and in particular not at definition time.
        """
        # nothing to do here as our feature scaling has no trainable parameters

    def call(self, x):
        """
        Payload of the layer that takes inputs and computes the requested output
        whose shape should match what is defined in compute_output_shape.
        """
        # scale each feature such that it is distributed around 0 with a standard deviation of 1
        return (x - self.means) / self.stddevs

## Step 3: Add your layer

Now we add the custom layer to our model. To obtain the means and standard deviations it needs, we first create some random data and measure both quantities per input feature.



In [None]:
# create the random data, which is a superposition of gaussians
data1 = np.random.normal(loc=3.0, scale=1.0, size=(100, 32))
data2 = np.random.normal(loc=5.0, scale=2.0, size=(100, 32))

composed_data = np.concatenate([data1, data2], axis=0)
np.random.shuffle(composed_data)
composed_data

array([[2.05504154, 2.79018988, 1.72821344, ..., 2.54600696, 4.04501132,
        3.95596906],
       [5.17977536, 5.06266596, 2.31131453, ..., 6.66998048, 1.63773041,
        4.1835336 ],
       [4.67971617, 4.5988274 , 3.92547357, ..., 6.64876942, 3.85227336,
        6.71117932],
       ...,
       [2.31826299, 4.59579986, 5.17654838, ..., 3.04805325, 2.96284934,
        2.6495997 ],
       [3.63967808, 4.61357328, 9.30025715, ..., 2.49433114, 4.25265174,
        6.50498244],
       [5.63330117, 3.60716777, 6.27400824, ..., 3.23925531, 8.33567288,
        6.0906125 ]])

In [None]:
# measure mean and standard deviation of the composed data
# this is done across axis 0, i.e., measured per input feature across the batch axis
means = np.mean(composed_data, axis=0)
stddevs = np.std(composed_data, axis=0)
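As a quick sanity check of the scaling formula used in `FeatureScaling.call`, the sketch below applies `(x - means) / stddevs` directly in NumPy and confirms that every feature ends up with mean 0 and standard deviation 1. It regenerates its own toy data with a seeded generator, so the `toy_*` names and the seed are assumptions for illustration and do not reuse the notebook's variables.

```python
import numpy as np

# toy data with the same recipe as above: two gaussians, 32 features each
rng = np.random.default_rng(seed=0)
toy_data = np.concatenate(
    [
        rng.normal(loc=3.0, scale=1.0, size=(100, 32)),
        rng.normal(loc=5.0, scale=2.0, size=(100, 32)),
    ],
    axis=0,
)

# per-feature statistics, measured across the batch axis
toy_means = toy_data.mean(axis=0)    # shape (32,)
toy_stddevs = toy_data.std(axis=0)   # shape (32,)

# broadcasting applies the (32,) statistics to every row of the (200, 32) batch
scaled = (toy_data - toy_means) / toy_stddevs

print(np.allclose(scaled.mean(axis=0), 0.0))
print(np.allclose(scaled.std(axis=0), 1.0))
```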

In [None]:
x = tf.keras.layers.Input(shape=(32,))

# add the feature scaling layer, passing the previously measured means and stddevs
fs = FeatureScaling(means, stddevs)(x)

# stack hidden layers
a = fs
for _ in range(5):
    # dense layer
    a = tf.keras.layers.Dense(128)(a)

    # activation
    a = tf.keras.layers.Activation("elu")(a)

# output layer
y = tf.keras.layers.Dense(3, activation="softmax")(a)

# construct the model
model = tf.keras.Model(inputs=x, outputs=y)

Again, we can verify that our model actually predicts something 👾

In [None]:
# use the batch we created and show the first ten predictions
pred = model.predict(composed_data)
pred[:10]

array([[0.29628858, 0.3303082 , 0.37340322],
       [0.39025533, 0.31435713, 0.29538757],
       [0.29114312, 0.38839746, 0.3204594 ],
       [0.3950846 , 0.23325928, 0.3716562 ],
       [0.3094837 , 0.37747917, 0.31303716],
       [0.44303316, 0.27193996, 0.2850269 ],
       [0.298977  , 0.25975996, 0.44126314],
       [0.1750406 , 0.51542217, 0.3095372 ],
       [0.3190634 , 0.4872547 , 0.19368191],
       [0.50432557, 0.21591039, 0.27976406]], dtype=float32)

Nice!

## Step 4: Extend your network

The functional API is very flexible. You are not limited to a simple, single-purpose network; you can extend it into a network with **multiple purposes**.

Take your network from before, and change it to have the following architecture:

#### Common network

- 32 input features
- 3 hidden layers with 128 units each
- "elu" activation function

#### Classification "head"

- 3 hidden layers with 128 units each
- "elu" activation function
- a final "softmax" layer with 3 units (3-class classification)

#### Regression "head"

- 5 hidden layers with 64 units each
- "selu" activation function
- a final "linear" layer with a single unit

In [None]:
x = tf.keras.layers.Input(shape=(32,))

# add the feature scaling layer, passing the previously measured means and stddevs
fs = FeatureScaling(means, stddevs)(x)

# common layers
a = fs
for _ in range(3):
    # dense layer
    a = tf.keras.layers.Dense(128)(a)

    # activation
    a = tf.keras.layers.Activation("elu")(a)

# classification head
b = a
for _ in range(3):
    # dense layer
    b = tf.keras.layers.Dense(128)(b)

    # activation
    b = tf.keras.layers.Activation("elu")(b)

# classification output
y1 = tf.keras.layers.Dense(3, activation="softmax")(b)

# regression head
c = a
for _ in range(5):
    # dense layer
    c = tf.keras.layers.Dense(64)(c)

    # activation
    c = tf.keras.layers.Activation("selu")(c)

# regression output
y2 = tf.keras.layers.Dense(1, activation="linear")(c)

# construct the model
model = tf.keras.Model(inputs=x, outputs=[y1, y2])

And again, run the model prediction, which now returns **2** outputs!

In [None]:
# use the batch we created and show the first ten predictions
class_pred, reg_pred = model.predict(composed_data)

print(f"first 10 class predictions:\n{class_pred[:10]}\n")
print(f"first 10 regression predictions:\n{reg_pred[:10]}\n")

first 10 class predictions:
[[0.37955564 0.33074188 0.28970245]
 [0.3159389  0.28614768 0.39791343]
 [0.37199733 0.4678949  0.16010778]
 [0.3496212  0.40378    0.24659882]
 [0.27101213 0.504284   0.22470388]
 [0.36760885 0.34206164 0.29032958]
 [0.34607583 0.46843258 0.18549164]
 [0.22013687 0.59021086 0.18965226]
 [0.47388566 0.2144965  0.31161797]
 [0.2741796  0.17448226 0.55133814]]

first 10 regression predictions:
[[ 1.1370442 ]
 [-2.4196193 ]
 [-0.1467264 ]
 [-0.14404503]
 [ 1.5875952 ]
 [-2.3881588 ]
 [ 0.9758812 ]
 [ 0.9619354 ]
 [-1.7457979 ]
 [-0.16360575]]
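To make the data flow of the two-headed architecture explicit, here is a deliberately simplified NumPy sketch, not Keras code. It uses a single layer per stage instead of the 3/3/5 layers specified above, random weights without biases, and hypothetical helpers `dense`, `elu`, `selu` and `softmax`. The point is only that both heads consume the same shared tensor and produce outputs of shape `(batch, 3)` and `(batch, 1)`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dense(inputs, units):
    """Hypothetical helper: linear map with freshly drawn random weights, no bias."""
    weights = rng.normal(scale=0.1, size=(inputs.shape[1], units))
    return inputs @ weights

def elu(z):
    return np.where(z > 0, z, np.exp(np.minimum(z, 0.0)) - 1.0)

def selu(z):
    # standard SELU constants, rounded
    alpha, scale = 1.67326324, 1.05070098
    return scale * np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0.0)) - 1.0))

def softmax(z):
    exp = np.exp(z - z.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

batch = rng.normal(size=(200, 32))

# common part: computed once, then reused by both heads
shared = elu(dense(batch, 128))

# classification head: hidden layer, then 3-unit softmax
class_out = softmax(dense(elu(dense(shared, 128)), 3))

# regression head: hidden layer, then a single linear unit
reg_out = dense(selu(dense(shared, 64)), 1)

print(class_out.shape, reg_out.shape)
```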

