## **Making New Layers and Models via Subclassing**

#### **Setup**

In [1]:
import tensorflow as tf
from tensorflow import keras

#### **The `Layer` Class: a combination of state (weights) and computation**
###### **A layer encapsulates both a state (the layer's weights) and a transformation from inputs to outputs (the computation). Let's have a look at a densely-connected layer. It has a state: the variables `w` and `b`.**

In [2]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value = w_init(shape=(input_dim, units), dtype="float32"),
            trainable = True
        )
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value = b_init(shape=(units,), dtype="float32"),
            trainable = True
        )
    def call(self, inputs):
        return(tf.matmul(inputs, self.w) + self.b)

###### **We could use the layer by calling it on some tensor input(s):**

In [3]:
x = tf.ones((2,2))
linear_layer = Linear(4, 2)     # units=4, input_dim=2; input_dim must equal the last dimension (number of columns) of x
y = linear_layer(x)
print(y)

tf.Tensor(
[[0.0422582  0.08816054 0.14377819 0.01861975]
 [0.0422582  0.08816054 0.14377819 0.01861975]], shape=(2, 4), dtype=float32)


###### **[NB] The weights `w` and `b` are automatically tracked by the layer as soon as they are set as layer attributes:**

In [4]:
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

###### **We can also add weights to a layer using the `add_weight()` method.**

In [5]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(
            shape=(input_dim, units),
            initializer="random_normal",
            trainable=True
        )
        self.b = self.add_weight(shape=(units,),
            initializer="zeros",
            trainable=True
        )
    def call(self, inputs):
        return(tf.matmul(inputs, self.w) + self.b)

x = tf.ones((2,2))
linear_layer = Linear(4, 2)     # units=4, input_dim=2; input_dim must equal the last dimension (number of columns) of x
y = linear_layer(x)
print(y)

tf.Tensor(
[[-0.03163691  0.03956166  0.02934867 -0.03036006]
 [-0.03163691  0.03956166  0.02934867 -0.03036006]], shape=(2, 4), dtype=float32)
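
###### **To make the "state plus computation" idea concrete, the sketch below redefines the same `Linear` layer (so that the block is standalone), inspects the shapes of the tracked weights, and recomputes the output by hand as `x @ w + b`. The variable names `weight_shapes` and `expected` are introduced here only for illustration.**

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


class Linear(keras.layers.Layer):
    """Same layer as above: y = x @ w + b."""

    def __init__(self, units=32, input_dim=32):
        super().__init__()
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


x = tf.ones((2, 2))
linear_layer = Linear(units=4, input_dim=2)
y = linear_layer(x)

# Both variables are tracked, with shapes (input_dim, units) and (units,).
weight_shapes = [tuple(w.shape) for w in linear_layer.weights]
print(weight_shapes)

# Recompute the transformation by hand from the layer's state.
w, b = [v.numpy() for v in linear_layer.weights]
expected = x.numpy() @ w + b
print(np.allclose(y.numpy(), expected, atol=1e-6))
```

###### **Because `x` is all ones, every output row is simply the column sums of `w` plus `b`, which is why the two rows printed above are identical.**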


#### **Layers can have Non-Trainable Weights**
###### **Besides trainable weights, we can add non-trainable weights to a layer as well. Such weights are not taken into account during backpropagation when we are training the layer.<br>Here's how we can add and use a non-trainable weight:**

In [6]:
class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False)
    
    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return(self.total)

x = tf.ones((2,2))
my_sum = ComputeSum(2)
y = my_sum(x)
print(y.numpy())
y = my_sum(x)
print(y.numpy())

[2. 2.]
[4. 4.]


###### **It's part of `layer.weights`, but it gets categorized as a non-trainable weight:**

In [7]:
print("weights:", len(my_sum.weights))
print("non-trainable weights:", len(my_sum.non_trainable_weights))
print("trainable weights:", len(my_sum.trainable_weights))

weights: 1
non-trainable weights: 1
trainable weights: 0
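
###### **The practical effect of this split can be sketched as follows: only `trainable_weights` are passed to the gradient computation, so a non-trainable weight changes only when the layer updates it explicitly. `ScaleAndCount` is a hypothetical layer written for this illustration; it is not part of Keras.**

```python
import tensorflow as tf
from tensorflow import keras


class ScaleAndCount(keras.layers.Layer):
    """Hypothetical layer: a trainable scale plus a non-trainable call counter."""

    def __init__(self):
        super().__init__()
        self.scale = self.add_weight(shape=(), initializer="ones", trainable=True)
        self.calls = self.add_weight(shape=(), initializer="zeros", trainable=False)

    def call(self, inputs):
        # Explicit state update; this weight is never touched by backpropagation.
        self.calls.assign_add(1.0)
        return inputs * self.scale


layer = ScaleAndCount()
x = tf.ones((2, 2))

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(layer(x))

# Gradients are requested for the trainable weights only.
grads = tape.gradient(loss, layer.trainable_weights)

print(len(layer.trainable_weights), len(layer.non_trainable_weights))
print(float(grads[0]))              # d(sum(x * scale)) / d(scale) = sum(x)
print(float(layer.calls.numpy()))   # number of forward passes so far
```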


#### **Unknown Input: Deferring (Postponing) Weight Creation until the Shape of the Inputs is Known**
###### **Our `Linear` layer above took an `input_dim` argument that was used to compute the shape of the weights `w` and `b` in `__init__()`.<br>In many cases, we may not know the size of the inputs in advance, and we would like to create the weights lazily when that value becomes known, some time after instantiating the layer.<br>The Keras API recommends creating layer weights in the `build(self, input_shape)` method of the layer:**

In [8]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units
    
    def build(self, input_shape):
        self.w = self.add_weight(
            shape = (input_shape[-1], self.units),      # input_shape[-1] is the last input dimension (number of input features)
            initializer = "random_normal",
            trainable = True
        )
        self.b = self.add_weight(
            shape = (self.units,),
            initializer = "random_normal",
            trainable = True
        )
    
    def call(self, inputs):
        return(tf.matmul(inputs, self.w) + self.b)

###### **The `__call__()` method of our layer will automatically run `build()` the first time it is called. We now have a layer that is lazy and thus easier to use.**

In [9]:
x = tf.ones((2,2))
# At instantiation, we don't know on what inputs this is going to be called
linear_layer = Linear(32)
# The Layer's weights are created dynamically the first time the layer is called
y = linear_layer(x)
print(y)

tf.Tensor(
[[-2.1580750e-01 -2.6077479e-02  1.0627230e-01  4.0894076e-03
   7.1341395e-03 -3.3395611e-02 -2.9900081e-02  7.6262727e-02
   6.8185702e-02 -3.1623989e-05  2.1464849e-01  7.7859588e-02
  -2.9782683e-02 -9.2187814e-02 -1.3099876e-01 -7.7789560e-02
  -8.0633752e-02 -4.1292630e-02 -8.5670874e-04  6.6575110e-02
  -2.6221806e-02 -3.0079255e-02  5.0033852e-03  1.9486625e-01
   1.8109690e-01 -1.1464104e-01 -3.3258304e-02  9.4418898e-03
  -2.6513293e-02 -3.2187860e-02  7.2351530e-02 -3.7395194e-02]
 [-2.1580750e-01 -2.6077479e-02  1.0627230e-01  4.0894076e-03
   7.1341395e-03 -3.3395611e-02 -2.9900081e-02  7.6262727e-02
   6.8185702e-02 -3.1623989e-05  2.1464849e-01  7.7859588e-02
  -2.9782683e-02 -9.2187814e-02 -1.3099876e-01 -7.7789560e-02
  -8.0633752e-02 -4.1292630e-02 -8.5670874e-04  6.6575110e-02
  -2.6221806e-02 -3.0079255e-02  5.0033852e-03  1.9486625e-01
   1.8109690e-01 -1.1464104e-01 -3.3258304e-02  9.4418898e-03
  -2.6513293e-02 -3.2187860e-02  7.2351530e-02 -3.7395194e

###### **Implementing `build()` separately as shown above cleanly separates creating weights (done only once) from using them (done in every call). Layer implementers are allowed to defer weight creation to the first `__call__()`, but they need to take care that later calls use the same weights. In addition, since `__call__()` is likely to be executed for the first time inside a `tf.function`, any variable creation that takes place in `__call__()` should be wrapped in a `tf.init_scope`.**
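
###### **The "create once, use in every call" behaviour can be observed through the layer's `built` flag. The sketch below uses a standalone copy of the lazy layer, renamed `LazyLinear` here purely to keep the example self-contained.**

```python
import tensorflow as tf
from tensorflow import keras


class LazyLinear(keras.layers.Layer):
    """Standalone copy of the lazy `Linear` layer, renamed for this sketch."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Runs once, on the first call, when the input shape is known.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


layer = LazyLinear(8)
print(layer.built, len(layer.weights))    # state before the first call

y1 = layer(tf.ones((2, 5)))               # first call triggers build()
print(layer.built, tuple(layer.w.shape))

y2 = layer(tf.ones((4, 5)))               # later calls reuse the same weights
print(len(layer.weights))
```

###### **Note that the batch size may change between calls, but the last input dimension must stay the same, since the shape of `w` is fixed once `build()` has run.**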

#### **Layers are Recursively Composable**
###### **If we assign a layer instance as an attribute of another layer, the outer layer will start tracking the weights created by the inner layer. Keras recommends creating such sublayers in the `__init__()` method and leaving it to the first `__call__()` to trigger building their weights.**

In [10]:
class MLPBlock(keras.layers.Layer):
    def __init__(self):
        super(MLPBlock, self).__init__()
        self.linear1 = Linear(32)
        self.linear2 = Linear(32)
        self.linear3 = Linear(1)
    def call(self, inputs):
        x = self.linear1(inputs)
        x = tf.nn.relu(x)
        x = self.linear2(x)
        x = tf.nn.relu(x)
        return(self.linear3(x))
    
mlp = MLPBlock()
y = mlp(tf.ones(shape=(3,64)))      # The first call to the `mlp` will create the weights
print("weights:", len(mlp.weights))
print("trainable_weights:", len(mlp.trainable_weights))

weights: 6
trainable_weights: 6


#### **The `add_loss()` Method**
###### **While writing the `call()` method of a layer, we can create loss tensors that we will want to use later, when writing our training loop. This is doable by calling `self.add_loss(value)`.**

In [11]:
# A layer that creates an activity regularization loss
class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate
    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return(inputs)

###### **These losses (including those created by any inner layer) can be retrieved via `layer.losses`. This property is reset at the start of every `__call__()` to the top-level layer, so that `layer.losses` always contains the loss values created during the last forward pass.**

In [12]:
class OuterLayer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.activity_reg = ActivityRegularizationLayer(1e-2)
    def call(self, inputs):
        return(self.activity_reg(inputs))

layer = OuterLayer()
assert len(layer.losses) == 0       # No losses yet since the layer has never been called

_ = layer(tf.zeros((1, 1)))         # The shape is passed as a tuple
assert len(layer.losses) == 1       # We created one loss value

# `layer.losses` gets reset at the start of each __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1       # This is the loss created during the call above
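
###### **Since the cell above feeds zeros, the recorded loss value is itself zero. As a small worked example with a non-zero input, the sketch below calls a standalone copy of `ActivityRegularizationLayer` once on a tensor of ones, so the expected loss is `rate * sum(inputs) = 0.01 * 6`. The names `reg_layer` and `loss_values` are introduced only for this illustration.**

```python
import tensorflow as tf
from tensorflow import keras


class ActivityRegularizationLayer(keras.layers.Layer):
    """Standalone copy of the layer defined above."""

    def __init__(self, rate=1e-2):
        super().__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs


reg_layer = ActivityRegularizationLayer(rate=1e-2)
_ = reg_layer(tf.ones((2, 3)))

# One loss value was recorded during this forward pass: rate * sum(inputs).
loss_values = [float(v) for v in reg_layer.losses]
print(loss_values)
```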

###### **In addition, the `losses` property also contains regularization losses created for the weights of any inner layer:**

In [13]:
class OuterLayerWithKernelRegularizer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayerWithKernelRegularizer, self).__init__()
        self.dense = keras.layers.Dense(32, kernel_regularizer = tf.keras.regularizers.L2(1e-3))
    def call(self, inputs):
        return(self.dense(inputs))

layer = OuterLayerWithKernelRegularizer()
_ = layer(tf.zeros((1,1)))

# kernel_regularizer uses this formula "1e-3 * sum(layer.dense.kernel ** 2)"
print(layer.losses)

[<tf.Tensor: shape=(), dtype=float32, numpy=0.0016965924>]
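
###### **The formula quoted in the comment above can be checked by hand. The sketch below builds a single `Dense` layer with the same `L2(1e-3)` regularizer and compares the reported loss with `1e-3 * sum(kernel ** 2)`; the exact number differs from run to run because the kernel is randomly initialized.**

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

dense = keras.layers.Dense(32, kernel_regularizer=keras.regularizers.L2(1e-3))
_ = dense(tf.zeros((1, 1)))  # build the layer so that the kernel exists

# Value reported by Keras versus the value recomputed from the kernel.
reported = float(dense.losses[0])
by_hand = 1e-3 * float(np.sum(np.square(dense.kernel.numpy())))
print(reported, by_hand)
```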


###### **These losses are meant to be taken into account when writing training loops:**

In [None]:
# Instantiate an optimizer and a loss function.
# NOTE: `model` and `train_dataset` are assumed to be defined elsewhere.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Iterate over the batches of a dataset
for x_batch_train, y_batch_train in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch_train)       # Logits for this minibatch
        loss_value = loss_fn(y_batch_train, logits)     # Loss value for this minibatch
        # Add the extra losses created during this forward pass:
        loss_value += sum(model.losses)
    # Compute the gradients and update the weights once per batch
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
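
###### **For a version of this loop that runs end to end, here is a minimal sketch under stated assumptions: the two-layer `Sequential` model, the random 10-class data and the batch size are all hypothetical stand-ins, chosen only so that `model.losses` contains an L2 penalty to add to the main loss.**

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy setup: 10-class classification on random data.
model = keras.Sequential(
    [
        keras.layers.Dense(
            16, activation="relu", kernel_regularizer=keras.regularizers.L2(1e-3)
        ),
        keras.layers.Dense(10),  # outputs logits
    ]
)
x = tf.random.normal((64, 8))
y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)
train_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

step_losses = []
for x_batch_train, y_batch_train in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch_train, training=True)
        loss_value = loss_fn(y_batch_train, logits)
        # Extra losses (here: the L2 penalty) created during this forward pass
        loss_value += sum(model.losses)
    # Gradient computation and the weight update happen once per batch
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    step_losses.append(float(loss_value))

print(len(step_losses), "training steps")
```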

###### **These losses also work seamlessly with `fit()` (they get automatically summed and added to the main loss, if any):**

In [14]:
import numpy as np

inputs = keras.Input(shape=(3,))
outputs = ActivityRegularizationLayer()(inputs)
model = keras.Model(inputs, outputs)

# If there is a loss passed in `compile`, the regularization losses get added to it
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.random((2,3)), np.random.random((2,3)))

# It's also possible not to pass any loss in `compile`, since the model already has a loss to minimize, via the `add_loss` call during the forward pass.
model.compile(optimizer="adam")
model.fit(np.random.random((2, 3)), np.random.random((2, 3)))



<keras.callbacks.History at 0x255dd32ef10>

#### **The `add_metric()` Method**
###### **Similar to `add_loss()`, layers also have an `add_metric()` method for tracking the moving average of a quantity during training.<br>Consider the following layer, a "logistic endpoint" layer: it takes predictions and targets as inputs, computes a loss which it tracks via `add_loss()`, and computes an accuracy scalar, which it tracks via `add_metric()`.**

In [16]:
class LogisticEndpoint(keras.layers.Layer):
    def __init__(self, name=None):
        super(LogisticEndpoint, self).__init__(name=name)
        self.loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
        self.accuracy_fn = keras.metrics.BinaryAccuracy()
    
    def call(self, targets, logits, sample_weights=None):
        # Compute the training-time loss value and add it to the layer using `self.add_loss()`
        loss = self.loss_fn(targets, logits, sample_weights)
        self.add_loss(loss)
        # Compute and log the accuracy as a metric, adding it to the layer using `self.add_metric()`
        acc = self.accuracy_fn(targets, logits, sample_weights)
        self.add_metric(acc, name="accuracy")
    
        # Return the inference-time prediction tensor (for `.predict()`)
        return(tf.nn.softmax(logits))

###### **Metrics tracked in this way are accessible via `layer.metrics`:**

In [20]:
layer = LogisticEndpoint()

targets = tf.ones((2,2))
logits = tf.ones((2,2))
y = layer(targets, logits)

print("layer_metrics:", layer.metrics)
print("current_accuracy_value", float(layer.metrics[0].result()))

layer_metrics: [<keras.metrics.BinaryAccuracy object at 0x00000255DD32E880>]
current_accuracy_value 1.0


###### **Just like `add_loss()`, these metrics are tracked by `fit()`:**

In [25]:
inputs = keras.Input(shape=(3,), name="inputs")
targets = keras.Input(shape=(10,), name="targets")
logits = keras.layers.Dense(10)(inputs)
predictions = LogisticEndpoint(name="predictions")(targets, logits)     # Argument order matches `call(self, targets, logits, ...)`

model = keras.Model(inputs=[inputs, targets], outputs=predictions)
model.compile(optimizer="adam")

data = {
    "inputs": np.random.random((3,3)),
    "targets": np.random.random((3,10)),
}
model.fit(data)



<keras.callbacks.History at 0x255dfafc7c0>

#### **We can Optionally Enable Serialization on our Layers**
###### **If we need our custom layers to be serializable as part of a `Functional Model`, we can optionally implement a `get_config()` method:**