## **Making New Layers and Models via Subclassing**

#### **Setup**

In [1]:
import tensorflow as tf
from tensorflow import keras

#### **The `Layer` Class: combination of state (weights) and computation**
###### **A layer encapsulates both a state (the layer's weights) and a transformation from inputs to outputs. Let's have a look at a densely-connected layer. It has a state: the variables `w` and `b`.**

In [2]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value = w_init(shape=(input_dim, units), dtype="float32"),
            trainable = True
        )
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value = b_init(shape=(units,), dtype="float32"),
            trainable = True
        )
    def call(self, inputs):
        return(tf.matmul(inputs, self.w) + self.b)

###### **We could use the layer by calling it on some tensor input(s):**

In [4]:
x = tf.ones((2,2))
linear_layer = Linear(4, 2)     # input_dim (here 2) must equal the size of the last axis of x, i.e. its number of columns
y = linear_layer(x)
print(y)

tf.Tensor(
[[-0.04570365 -0.03962987 -0.03884789  0.09662069]
 [-0.04570365 -0.03962987 -0.03884789  0.09662069]], shape=(2, 4), dtype=float32)


###### **[NB] The weights `w` and `b` are automatically tracked by the layer once they are set as layer attributes:**

In [5]:
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

###### **We can also add weights to a layer using the `add_weight()` method.**

In [6]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(
            shape=(input_dim, units),
            initializer="random_normal",
            trainable=True
        )
        self.b = self.add_weight(shape=(units,),
            initializer="zeros",
            trainable=True
        )
    def call(self, inputs):
        return(tf.matmul(inputs, self.w) + self.b)

x = tf.ones((2,2))
linear_layer = Linear(4, 2)     # input_dim (here 2) must equal the size of the last axis of x, i.e. its number of columns
y = linear_layer(x)
print(y)

tf.Tensor(
[[0.08698852 0.05305072 0.04511479 0.02427255]
 [0.08698852 0.05305072 0.04511479 0.02427255]], shape=(2, 4), dtype=float32)


#### **Layers can have Non-Trainable Weights**
###### **Besides trainable weights, we can add non-trainable weights to a layer as well. These weights are not taken into account during backpropagation when we train the layer.<br>Here's how we can add and use a non-trainable weight:**

In [12]:
class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False)
    
    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return(self.total)

x = tf.ones((2,2))
my_sum = ComputeSum(2)
y = my_sum(x)
print(y.numpy())
y = my_sum(x)
print(y.numpy())

[2. 2.]
[4. 4.]


###### **The `total` variable is part of `layer.weights`, but it gets categorized as a non-trainable weight:**

In [13]:
print("weights:", len(my_sum.weights))
print("non-trainable weights:", len(my_sum.non_trainable_weights))
print("trainable weights:", len(my_sum.trainable_weights))

weights: 1
non-trainable weights: 1
trainable weights: 0
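
###### **As a minimal sketch of what "not taken into account during backpropagation" means in practice, the example below mixes one trainable and one non-trainable weight in a single layer. The `ScaleAndCount` layer and its attribute names are hypothetical and only serve this illustration. Gradients are requested for `layer.trainable_weights` only, so the counter never receives one.**

```python
import tensorflow as tf
from tensorflow import keras


class ScaleAndCount(keras.layers.Layer):
    """Hypothetical layer: a trainable scale plus a non-trainable call counter."""

    def __init__(self, dim):
        super().__init__()
        # Trainable weight: receives gradients during training.
        self.scale = self.add_weight(shape=(dim,), initializer="ones", trainable=True)
        # Non-trainable weight: updated manually, ignored by backpropagation.
        self.calls = self.add_weight(shape=(), initializer="zeros", trainable=False)

    def call(self, inputs):
        self.calls.assign_add(1.0)
        return inputs * self.scale


layer = ScaleAndCount(2)
x = tf.ones((3, 2))

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(layer(x))

# Only the trainable weights are passed to the tape.
grads = tape.gradient(loss, layer.trainable_weights)

print("trainable weights:", len(layer.trainable_weights))
print("non-trainable weights:", len(layer.non_trainable_weights))
print("gradients computed:", len(grads))
```

###### **An optimizer that is handed `layer.trainable_weights` would therefore update `scale` and leave `calls` untouched.**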


#### **Unknown Input: Deferring (delaying/postponing) weight creation until the shape of the inputs is known**
###### **Our `Linear` layer above took an `input_dim` argument that was used to compute the shape of the weight `w` in `__init__()`.<br>In many cases, we may not know the size of the inputs in advance, and we would like to create the weights lazily once that value becomes known, some time after instantiating the layer.<br>The Keras API recommends creating layer weights in the `build(self, input_shape)` method of the layer:**

In [3]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units
    
    def build(self, input_shape):
        self.w = self.add_weight(
            shape = (input_shape[-1], self.units),      # input_shape[-1] is the size of the last input axis (number of input features)
            initializer = "random_normal",
            trainable = True
        )
        self.b = self.add_weight(
            shape = (self.units,),
            initializer = "random_normal",
            trainable = True
        )
    
    def call(self, inputs):
        return(tf.matmul(inputs, self.w) + self.b)

###### **The `__call__()` method of our layer will automatically run `build()` the first time it is called. We now have a layer that is lazy and thus easier to use.**

In [5]:
x = tf.ones((2,2))
# At instantiation, we don't know on what inputs this is going to be called
linear_layer = Linear(32)
# The Layer's weights are created dynamically the first time the layer is called
y = linear_layer(x)
print(y)

tf.Tensor(
[[-0.04915261 -0.01473014 -0.06646204 -0.03587055 -0.11498079  0.0668017
   0.07341449 -0.04534969  0.23225236 -0.0580742   0.11638497  0.15088987
   0.08120204  0.0711768   0.05008749  0.10539781  0.03635679  0.16741621
   0.01297842 -0.1055408   0.00357105  0.06374896 -0.0660508  -0.0512604
  -0.00384466 -0.14281851  0.01928046  0.02773319 -0.01837889 -0.03710185
   0.06472108  0.04360595]
 [-0.04915261 -0.01473014 -0.06646204 -0.03587055 -0.11498079  0.0668017
   0.07341449 -0.04534969  0.23225236 -0.0580742   0.11638497  0.15088987
   0.08120204  0.0711768   0.05008749  0.10539781  0.03635679  0.16741621
   0.01297842 -0.1055408   0.00357105  0.06374896 -0.0660508  -0.0512604
  -0.00384466 -0.14281851  0.01928046  0.02773319 -0.01837889 -0.03710185
   0.06472108  0.04360595]], shape=(2, 32), dtype=float32)


###### **Implementing `build()` separately, as shown above, nicely separates creating the weights once from using them in every call. Layer implementers are allowed to defer weight creation to the first `__call__()`, but they need to take care that later calls use the same weights. In addition, since `__call__()` is likely to be executed for the first time inside a `tf.function`, any variable creation that takes place in `__call__()` should be wrapped in a `tf.init_scope`.**
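
###### **The "create once, use in every call" behaviour can be checked with a short sketch. The class name `LazyLinear` is hypothetical and simply mirrors the `Linear` layer above. The layer's `built` attribute and its `weights` list are inspected before and after the first call, and a second call with a different batch size confirms that the same weight objects are reused.**

```python
import tensorflow as tf
from tensorflow import keras


class LazyLinear(keras.layers.Layer):
    """Hypothetical copy of the lazy `Linear` layer, used only for this check."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Runs once, on the first call, when the input shape is known.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


layer = LazyLinear(4)
built_before = layer.built            # no weights exist yet
weights_before = len(layer.weights)

layer(tf.ones((2, 3)))                # first call triggers build()
built_after = layer.built
weights_after = len(layer.weights)
w_first = layer.w

layer(tf.ones((5, 3)))                # later calls reuse the existing weights
same_weights = layer.w is w_first
w_shape = tuple(layer.w.shape)

print(built_before, weights_before)
print(built_after, weights_after)
print(same_weights, w_shape)
```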

#### **Layers are Recursively Composable (Nestable)**
###### **If we assign a layer instance as an attribute of another layer, the outer layer will start tracking the weights created by the inner layer. Keras recommends creating such sublayers in the `__init__()` method and leaving it to the first `__call__()` to trigger building their weights.**

In [6]:
class MLPBlock(keras.layers.Layer):
    def __init__(self):
        super(MLPBlock, self).__init__()
        self.linear1 = Linear(32)
        self.linear2 = Linear(32)
        self.linear3 = Linear(1)
    def call(self, inputs):
        x = self.linear1(inputs)
        x = tf.nn.relu(x)
        x = self.linear2(x)
        x = tf.nn.relu(x)
        return(self.linear3(x))
    
mlp = MLPBlock()
y = mlp(tf.ones(shape=(3,64)))      # The first call to the `mlp` will create the weights
print("weights:", len(mlp.weights))
print("trainable_weights:", len(mlp.trainable_weights))

weights: 6
trainable_weights: 6


#### **The `add_loss()` Method**
###### **While writing the `call()` method of a layer, we can create loss tensors that we will want to use later, when writing our training loop. This is done by calling `self.add_loss(value)`.**

In [7]:
# A layer that creates an activity regularization loss
class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate
    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return(inputs)
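
###### **A minimal usage sketch for the layer above follows. The class is repeated so that the snippet runs on its own, and the input tensor is an arbitrary choice for illustration. The recorded loss is read back through the layer's `losses` property, once after each of two calls.**

```python
import tensorflow as tf
from tensorflow import keras


class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super().__init__()
        self.rate = rate

    def call(self, inputs):
        # Record a loss proportional to the sum of the activations.
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs


layer = ActivityRegularizationLayer()
n_before = len(layer.losses)          # nothing recorded before any call

_ = layer(tf.ones((2, 3)))            # sum of inputs is 6, so the loss is rate * 6
n_first = len(layer.losses)
first_value = float(layer.losses[0])

_ = layer(tf.ones((2, 3)))            # only the latest call's loss is kept
n_second = len(layer.losses)

print(n_before, n_first, n_second)
print(first_value)
```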

###### **These losses (including those created by any inner layer) can be retrieved via `layer.losses`. This property is reset at the start of every `__call__()` to the top-level layer,**