# Building custom models with TensorFlow

In this chapter we are going to get into the details of creating our own models using TensorFlow. TensorFlow is a large library mainly used for machine
learning, but it contains many more tools: a numerical computation API similar to _NumPy_, computation graphs that let us export our models
to other devices and run them there efficiently, data loading and preprocessing utilities, etc.

## TensorFlow as a numerical computation library

The TensorFlow API revolves around _tensors_, which are both the inputs and the outputs of every operation. A tensor is very similar to a NumPy _ndarray_: it is
a data structure representing a multidimensional array. For example, let's create a matrix to see how they work:

In [1]:
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])  # Create a 2x3 matrix
print(t)

tf.Tensor(
[[1. 2. 3.]
 [4. 5. 6.]], shape=(2, 3), dtype=float32)


Note that we can apply all sorts of operations to tensors (much like regular matrix operations such as addition, matrix multiplication and others). We
can also create tensors from NumPy arrays (and vice versa), and apply TensorFlow operations directly to NumPy arrays.

In [None]:
import numpy as np

a = np.array([2., 4., 5.])
tf.constant(a)
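A short sketch of the operations mentioned above, using the same 2x3 matrix: arithmetic operators work element-wise, `@` is matrix multiplication, and `.numpy()` converts a tensor back to a NumPy array. One detail worth knowing: a tensor built from a NumPy array keeps NumPy's default `float64` dtype, while tensors built from Python floats default to `float32`, and TensorFlow does not convert dtypes automatically, so mixing them requires an explicit `tf.cast()`.

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])   # 2x3 float32 matrix

added = t + 10                  # element-wise addition (same as tf.add(t, 10))
squared = tf.square(t)          # element-wise square
gram = t @ tf.transpose(t)      # (2x3) @ (3x2) -> 2x2 matrix: [[14, 32], [32, 77]]
as_numpy = t.numpy()            # tensor -> NumPy ndarray

a = np.array([2., 4., 5.])
from_numpy = tf.constant(a)     # NumPy -> tensor; the dtype stays float64
mixed = tf.cast(from_numpy, tf.float32) + t[0]   # explicit cast before mixing dtypes: [3, 6, 8]
on_numpy = tf.square(a)         # TF operations also accept NumPy arrays directly
```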

The problem with tensors created this way is that they are immutable, which means we cannot use them as weights (weights need to be tweaked by
backpropagation). To work around that, we use the _tf.Variable_ class instead when creating tensors.

In [None]:
t1 = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
# We can modify the whole variable, individual cells, or slices with assign()
t1.assign(2 * t1)  # Multiply all the values by 2
t1[0, 1].assign(1.)  # Set the cell at row 0, column 1 to 1.
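Besides `assign()`, a `tf.Variable` offers a few other in-place update methods. The sketch below chains them on a fresh copy of the same matrix (named `v` here) and shows that plain Python item assignment is rejected, which is why cells and slices are updated through `assign()`.

```python
import tensorflow as tf

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
v.assign(2 * v)                  # whole variable:   [[2, 4, 6], [8, 10, 12]]
v[0, 1].assign(1.)               # row 0, column 1:  [[2, 1, 6], [8, 10, 12]]
v.assign_add(tf.ones_like(v))    # add 1 everywhere: [[3, 2, 7], [9, 11, 13]]
# Update the scattered cells (0, 0) and (1, 2) in one call: [[100, 2, 7], [9, 11, 200]]
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])

try:
    v[0, 0] = 5.                 # plain item assignment is not supported
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True
```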

## Custom functions or layers

Now let's use TensorFlow to create a custom loss function. For example, let's reimplement the _Huber loss_ discussed in the last chapter.

In [None]:
def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)
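To check that `huber_fn` behaves as intended, we can evaluate it on a few hand-picked errors and compare it with the built-in `tf.keras.losses.Huber` (whose `delta` plays the role of our threshold and defaults to 1.0). The last lines show that the function can be passed directly to `compile()`; the toy dataset `X`, `y` is random data made up for the illustration.

```python
import numpy as np
import tensorflow as tf

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

# Errors of magnitude 0.5, 2 and 3: the first is in the quadratic zone, the others in the linear one
y_true = tf.constant([[0.], [0.], [0.]])
y_pred = tf.constant([[0.5], [2.0], [-3.0]])
per_sample = huber_fn(y_true, y_pred)            # [[0.125], [1.5], [2.5]]

# The built-in loss also averages over the batch: (0.125 + 1.5 + 2.5) / 3 = 1.375
builtin = tf.keras.losses.Huber()(y_true, y_pred)

# The custom function can be used like any built-in loss (hypothetical toy data)
X = np.random.rand(64, 3).astype(np.float32)
y = X.sum(axis=1, keepdims=True)
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(1)])
model.compile(loss=huber_fn, optimizer="sgd")
model.fit(X, y, epochs=2, verbose=0)
```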

The main difference appears when loading a saved model that uses a custom loss function: we have to map the function's name to the function itself with the _custom\_objects_ argument:

In [None]:
model = tf.keras.models.load_model("custom_model", custom_objects={"huber_fn": huber_fn})

Note that we could instead decorate the _huber\_fn_ function with _@tf.keras.utils.register\_keras\_serializable()_ to get the same result without passing it through the
_custom\_objects_ argument. But if we want to be able to change the threshold below which an error is considered small, we have to wrap the loss function in a factory function like this:

In [None]:
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold ** 2 / 2  # Keeps the loss continuous at the threshold
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

# Now loading the model requires us to specify the threshold
model = tf.keras.models.load_model("custom_model", custom_objects={"huber_fn": create_huber(2.0)})

# We can get rid of this inconvenience by creating a custom loss class that implements the get_config() method
class HuberLoss(tf.keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        # Saving the threshold lets Keras rebuild the loss with the same value when reloading
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}
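A sketch of what `get_config()` buys us, with the linear part written as `threshold * |error| - threshold**2 / 2` so the loss stays continuous at the threshold: the configuration contains the threshold, and `from_config()`, which Keras calls when it rebuilds the object, recreates an identical loss. The input values below are arbitrary.

```python
import tensorflow as tf

class HuberLoss(tf.keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

loss = HuberLoss(threshold=2.0)
config = loss.get_config()                   # contains "threshold": 2.0
restored = HuberLoss.from_config(config)     # what Keras does when reloading

# Errors of 1 (quadratic zone: 0.5) and 3 (linear zone: 2 * 3 - 2 = 4), averaged: 2.25
y_true = tf.constant([[0.], [0.]])
y_pred = tf.constant([[1.], [3.]])
value = restored(y_true, y_pred)
```

With this class, a model compiled with `loss=HuberLoss(2.0)` can be reloaded with `custom_objects={"HuberLoss": HuberLoss}`, and the threshold is restored from the saved configuration instead of being passed by hand.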

In the same vein, we can create custom initializers, activation functions, or regularizers.
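As a sketch, here are a custom activation function, kernel initializer, and regularizer written as plain functions (the names `my_softplus`, `my_glorot_initializer` and `my_l1_regularizer` are illustrative), all plugged into a single `Dense` layer. They mirror the softplus activation, Glorot normal initialization (approximately: Keras's own `glorot_normal` draws from a truncated normal distribution), and L1 regularization. Plain functions are enough when there is no hyperparameter to save; otherwise, subclass the corresponding base class (for example `tf.keras.regularizers.Regularizer`) and implement `get_config()`, as with the Huber loss class.

```python
import tensorflow as tf

def my_softplus(z):                                   # custom activation: log(1 + e^z)
    return tf.math.log(1.0 + tf.exp(z))

def my_glorot_initializer(shape, dtype=tf.float32):   # custom kernel initializer
    stddev = tf.sqrt(2.0 / (shape[0] + shape[1]))     # sqrt(2 / (fan_in + fan_out))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

def my_l1_regularizer(weights):                       # custom L1 regularizer (factor 0.01)
    return tf.reduce_sum(tf.abs(0.01 * weights))

layer = tf.keras.layers.Dense(
    1,
    activation=my_softplus,
    kernel_initializer=my_glorot_initializer,
    kernel_regularizer=my_l1_regularizer,
)
outputs = layer(tf.ones((4, 3)))   # builds the layer: kernel of shape (3, 1)
```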

### Creating custom layers and models

The simplest way to create a new layer without any weights is to use the _Lambda_ layer, like this:

In [None]:
exponential_layer = tf.keras.layers.Lambda(lambda x: tf.exp(x)) # The exponential layer
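Once created, the Lambda layer can be called directly or used like any other layer. The model below is a hypothetical example (the layer sizes are arbitrary) that uses it as the last layer so that the predictions are always positive.

```python
import numpy as np
import tensorflow as tf

exponential_layer = tf.keras.layers.Lambda(lambda x: tf.exp(x))
values = exponential_layer(tf.constant([-1., 0., 1.]))   # approximately [0.368, 1.0, 2.718]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1),
    exponential_layer,             # exp() maps any real number to a positive one
])
predictions = model(np.ones((2, 8), dtype=np.float32))
```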

We can also subclass _tf.keras.layers.Layer_, which is what layers with weights require. Let's build a simplified replica of the _Dense_ layer:

In [None]:
class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)  # Accepts a name such as "relu", a function, or None

    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel",
            shape=[batch_input_shape[-1], self.units], initializer="glorot_normal"
        )
        self.bias = self.add_weight(name="bias", shape=[self.units], initializer="zeros")

    def call(self, X):
        return self.activation(X @ self.kernel + self.bias)
    
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units, "activation": tf.keras.activations.serialize(self.activation)}
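Here is a sketch of the custom layer in use, trained on random toy data. The class is repeated so the snippet is self-contained, with the activation passed through `tf.keras.activations.get()` so that a name like `"relu"` or `None` works too.

```python
import numpy as np
import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        # Weights are created here, once the input size is known
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(name="bias", shape=[self.units], initializer="zeros")

    def call(self, X):
        return self.activation(X @ self.kernel + self.bias)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,
                "activation": tf.keras.activations.serialize(self.activation)}

X = np.random.rand(100, 8).astype(np.float32)    # hypothetical toy data
y = X.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    MyDenseLayer(30, activation="relu"),
    MyDenseLayer(1),
])
model.compile(loss="mse", optimizer="sgd")
model.fit(X, y, epochs=2, verbose=0)
predictions = model.predict(X[:3], verbose=0)    # shape (3, 1)

standalone_layer = MyDenseLayer(3, activation="relu")
```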

Now let's create custom models. To do that, subclass the _tf.keras.Model_ class, create the layers and variables in the constructor, and implement the
call() method to do whatever you want the model to do. To illustrate this, let's build a model composed of a first dense layer, a residual block applied four times in a row
(reusing the same weights each time), a second residual block, and a dense output layer. A residual block adds its inputs to its outputs.

In [None]:
class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="relu", kernel_initializer="he_normal") for _ in range(n_layers)]
        # Note that this layer contains other layers which is fine

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return Z + inputs

# The custom model is defined below
class ResidualRegressor(tf.keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = tf.keras.layers.Dense(30, activation="relu", kernel_initializer="he_normal")
        self.block1 = ResidualBlock(2, 30)
        self.block2 = ResidualBlock(2, 30)
        self.out = tf.keras.layers.Dense(output_dim)
        
    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(1 + 3):
            Z = self.block1(Z)  # The same block (same weights) is applied 4 times
        Z = self.block2(Z)
        return self.out(Z)

# Note that to be able to save this model and load it, we have to implement the get_config() method

Since _tf.keras.Model_ is a subclass of _tf.keras.layers.Layer_, models can be used exactly like ordinary layers, including inside other models.
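A sketch of training the residual model on random toy data, then nesting it inside a `Sequential` model to show that a model can be used as a layer. The class definitions are repeated, with the completed `call()` method, so the snippet runs on its own. Because `block1` is applied four times, its weights are shared, so the model only has 12 trainable variables: a kernel and a bias for each of its six `Dense` layers.

```python
import numpy as np
import tensorflow as tf

class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="relu",
                                             kernel_initializer="he_normal")
                       for _ in range(n_layers)]

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return Z + inputs          # the residual (skip) connection

class ResidualRegressor(tf.keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = tf.keras.layers.Dense(30, activation="relu",
                                             kernel_initializer="he_normal")
        self.block1 = ResidualBlock(2, 30)
        self.block2 = ResidualBlock(2, 30)
        self.out = tf.keras.layers.Dense(output_dim)

    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(1 + 3):
            Z = self.block1(Z)
        Z = self.block2(Z)
        return self.out(Z)

X = np.random.rand(200, 8).astype(np.float32)   # hypothetical toy data
y = X.sum(axis=1, keepdims=True)

regressor = ResidualRegressor(1)
regressor.compile(loss="mse", optimizer="adam")
regressor.fit(X, y, epochs=2, verbose=0)

# The trained model used as a layer inside another model
wrapper = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    regressor,
    tf.keras.layers.Activation("relu"),
])
nested_out = wrapper(X[:4])                      # shape (4, 1)
```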

## Losses and metrics based on model internals

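To understand how to create custom losses or metrics based on model internals (such as weights or activations), we are going to implement an MLP regressor composed
of a stack of five hidden layers plus an output layer. It also has an auxiliary dense layer on top of the last hidden layer that tries to reconstruct the inputs: the mean
squared difference between this reconstruction and the inputs is the _reconstruction loss_, which is added (scaled by 0.05) to the main loss.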

In [None]:
class MLPRegressor(tf.keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [tf.keras.layers.Dense(30, activation="relu", kernel_initializer="he_normal") for _ in range(5)]
        self.out = tf.keras.layers.Dense(output_dim)
        self.reconstruction_mean = tf.keras.metrics.Mean(name="reconstruction_error")

    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = tf.keras.layers.Dense(n_inputs)

    def call(self, inputs, training=False):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)  # Added to the main loss during training
        if training:
            result = self.reconstruction_mean(recon_loss)
            self.add_metric(result)  # add_metric() belongs to the tf.keras 2.x API
        return self.out(Z)
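The `add_metric()` call above belongs to the tf.keras 2.x API, but `add_loss()` is the core mechanism, and it can be called from any layer. As a minimal sketch, the hypothetical `ActivationPenalty` layer below adds a penalty computed from the activations flowing through it, and `fit()` then adds it to the main loss automatically. (Keras's built-in `ActivityRegularization` layer covers a similar case.)

```python
import numpy as np
import tensorflow as tf

class ActivationPenalty(tf.keras.layers.Layer):
    """Hypothetical pass-through layer that penalizes large activations."""

    def __init__(self, factor=0.01, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        # The extra loss depends on internal activations, not on the targets
        self.add_loss(self.factor * tf.reduce_mean(tf.square(inputs)))
        return inputs  # the activations themselves are passed through unchanged

# Called on its own: the mean of the squared inputs is 1.0, so the penalty is 0.01
penalty = ActivationPenalty(0.01)
_ = penalty(tf.ones((2, 3)))

# Inside a model, fit() adds the penalty to the main MSE loss automatically
X = np.random.rand(32, 3).astype(np.float32)   # hypothetical toy data
y = X.sum(axis=1, keepdims=True)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4, activation="relu"),
    ActivationPenalty(0.01),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="sgd")
history = model.fit(X, y, epochs=1, verbose=0)
```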


### Computing gradients using autodiff

To understand how to compute gradients, let's first define an example function:

In [None]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

# Now let's approximate the partial derivatives of f with respect to w1 and w2 (finite differences)
w1, w2 = 5, 3
eps = 1e-6
part_derivative1 = (f(w1 + eps, w2) - f(w1, w2)) / eps
part_derivative2 = (f(w1, w2 + eps) - f(w1, w2)) / eps

# We can do the same thing with autodiff, which scales much better to a large number of parameters
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:  # The tape automatically records every operation executed inside this block
    z = f(w1, w2)

gradients = tape.gradient(z, [w1, w2])

# Let's compare the results of both and demonstrate that they do the same thing
print(f"r1 = {part_derivative1}, r2 = {part_derivative2}")
print(gradients)
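Since f is a simple polynomial, its partial derivatives can also be computed by hand. At (w1, w2) = (5, 3) they give the exact values that the finite differences approximate and that autodiff reproduces:

$$
f(w_1, w_2) = 3w_1^2 + 2w_1w_2, \qquad
\frac{\partial f}{\partial w_1} = 6w_1 + 2w_2 = 36, \qquad
\frac{\partial f}{\partial w_2} = 2w_1 = 10
$$

The forward difference for w1 is off by about 3ε (the second-order term coming from 3w1²), plus floating-point rounding, and the method needs one extra call to f per parameter. Reverse-mode autodiff obtains all the partial derivatives, exact up to floating-point precision, from a single forward and backward pass.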

Note that the tape is automatically erased as soon as its _gradient()_ method is called, so calling it a second time raises an error. To call it several times, we have to create a
_persistent_ tape (and delete it once we no longer need it, to free resources).

In [None]:
with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

g1 = tape.gradient(z, w1)
g2 = tape.gradient(z, w2)

del tape # To delete it
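To see the difference concretely, the sketch below reuses `f`, calls `gradient()` twice on a regular tape (the second call raises a `RuntimeError`), then does the same with a persistent tape. The expected values follow from the derivatives 6 * w1 + 2 * w2 = 36 and 2 * w1 = 10 at (5, 3).

```python
import tensorflow as tf

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = tf.Variable(5.), tf.Variable(3.)

# A regular tape can compute gradients only once
with tf.GradientTape() as tape:
    z = f(w1, w2)
dz_dw1 = tape.gradient(z, w1)          # 36.0
try:
    tape.gradient(z, w2)               # second call on a non-persistent tape
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# A persistent tape can be queried several times, then released explicitly
with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)
g1 = tape.gradient(z, w1)              # 36.0
g2 = tape.gradient(z, w2)              # 10.0
del tape
```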

### Custom training loop

Let's now write our own training loop to replace the _fit()_ method.

In [None]:
import numpy as np
import tensorflow as tf

# Let's build a simple model to use as an example
l2_reg = tf.keras.regularizers.l2(0.05)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", kernel_initializer="he_normal", kernel_regularizer=l2_reg),
    tf.keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

def random_batch(X, y, batch_size=32):
    # Sample batch_size random instances (with replacement)
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

def print_status_bar(step, total, loss, metrics=None):
    # Show the step count and the current value of every metric on one refreshed line
    metrics = " - ".join([f"{m.name}: {m.result():.4f}" for m in [loss] + (metrics or [])])
    end = "" if step < total else "\n"
    print(f"\r{step}/{total} - " + metrics, end=end)

# Placeholder data: replace with your own (scaled) training set
X_train_scaled = np.random.rand(1000, 8).astype(np.float32)
y_train = X_train_scaled.sum(axis=1, keepdims=True)

n_epochs = 5
batch_size = 32
n_steps = len(X_train_scaled) // batch_size
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.mean_squared_error
mean_loss = tf.keras.metrics.Mean(name="mean_loss")
metrics = [tf.keras.metrics.MeanAbsoluteError()]

# And here is the custom training loop
for epoch in range(1, n_epochs + 1):
    print(f"Epoch {epoch}/{n_epochs}")
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train_scaled, y_train, batch_size)
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)  # Main loss + regularization losses

        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        mean_loss(loss)
        for metric in metrics:
            metric(y_batch, y_pred)

        print_status_bar(step, n_steps, mean_loss, metrics)

    # Reset the metrics at the end of each epoch
    for metric in [mean_loss] + metrics:
        metric.reset_state()
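The loop above runs eagerly, one Python operation at a time. A common refinement, sketched below with random placeholder data and arbitrary shapes, wraps the training step in `@tf.function` so that TensorFlow traces it into a graph and reuses that graph on later calls (a new trace happens when an input shape changes, for example for the last, smaller batch), and it feeds batches with `tf.data` instead of `random_batch()`.

```python
import numpy as np
import tensorflow as tf

# Hypothetical placeholder data standing in for a real (scaled) training set
rng = np.random.default_rng(42)
X_train_scaled = rng.normal(size=(1000, 8)).astype(np.float32)
y_train = X_train_scaled.sum(axis=1, keepdims=True)

l2_reg = tf.keras.regularizers.l2(0.05)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(30, activation="relu", kernel_initializer="he_normal",
                          kernel_regularizer=l2_reg),
    tf.keras.layers.Dense(1, kernel_regularizer=l2_reg),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()
mean_loss = tf.keras.metrics.Mean(name="mean_loss")
mae = tf.keras.metrics.MeanAbsoluteError()

@tf.function  # traced into a graph, then reused on later calls
def train_step(X_batch, y_batch):
    with tf.GradientTape() as tape:
        y_pred = model(X_batch, training=True)
        loss = loss_fn(y_batch, y_pred) + tf.add_n(model.losses)  # + regularization
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    mean_loss.update_state(loss)
    mae.update_state(y_batch, y_pred)

dataset = (tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
           .shuffle(1000, seed=42)
           .batch(32))

epoch_losses = []
for epoch in range(1, 4):
    for X_batch, y_batch in dataset:
        train_step(X_batch, y_batch)
    epoch_losses.append(float(mean_loss.result()))
    print(f"Epoch {epoch}: loss={epoch_losses[-1]:.4f} - mae={float(mae.result()):.4f}")
    for metric in (mean_loss, mae):
        metric.reset_state()  # fresh metrics for the next epoch
```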