# Making New Layers and Models via Subclassing

## Learning Objectives

* Use the `Layer` class as the combination of state (weights) and computation.
* Defer weight creation until the shape of the inputs is known.
* Build recursively composable layers.
* Compute losses using the `add_loss()` method.
* Track metrics using the `add_metric()` method.
* Enable serialization on layers.

## Introduction

This tutorial shows how to build new layers and models via [subclassing](https://towardsdatascience.com/model-sub-classing-and-custom-training-loop-from-scratch-in-tensorflow-2-cc1d4f10fb4e).
__Subclassing__ means defining a new class that inherits properties from a base class (superclass).

Each learning objective will correspond to a __#TODO__ in the [student lab notebook](../labs/custom_layers_and_models.ipynb) -- try to complete that notebook first before reviewing this solution notebook. 


## Setup

In [4]:
# Import necessary libraries
import tensorflow as tf
from tensorflow import keras

## The `Layer` class: the combination of state (weights) and some computation

One of the central abstractions in Keras is the `Layer` class. A layer
encapsulates both a state (the layer's "weights") and a transformation from
inputs to outputs (a "call", the layer's forward pass).

Here's a densely-connected layer. It has a state: the variables `w` and `b`.

In [5]:
# Define a Linear class
# A class called 'Linear' is defined which inherits from 'keras.layers.Layer'.
# This means that 'Linear' is a custom layer type that can be used in a Keras model.
class Linear(keras.layers.Layer):
    # The '__init__' method is the constructor of the class and is called when an instance of 'Linear' is created.
    # 'units' specifies the number of units (neurons) in the layer.
    # 'input_dim' specifies the input dimension.
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer() # 'w_init' is a weight initializer that initializes weights with random values according to a normal distribution.
        # 'self.w' is a TensorFlow variable representing the layer weights.
        # It is initialized with an array of size '(input_dim, units)' and type 'float32'.
        # It is a trainable variable, which means that it will be updated during model training.
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype="float32"),
            trainable=True,
        )
        # b_init is a bias initializer that initializes the biases with zeros.
        # self.b is a TensorFlow variable that represents the layer biases.
        # It is initialized with a vector of size (units,) and type float32. It is also a trainable variable.
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype="float32"), trainable=True
        )

    # The call method is defined to specify the calculation logic of the layer.
    # inputs are the inputs to the layer.
    # tf.matmul(inputs, self.w) performs a matrix multiplication between inputs and weights.
    # + self.b adds the bias to each output.
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


You would use a layer by calling it on some tensor input(s), much like a Python
function.

In [6]:
# A TensorFlow tensor 'x' is created with shape (2, 2), i.e. a matrix of 2 rows and 2 columns.
x = tf.ones((2, 2))

# An instance of the 'Linear' class is created with 'units=4' and 'input_dim=2'.
# This means that the layer will have 4 neurons and expects inputs of dimension 2.
linear_layer = Linear(4, 2)

# 'y' will be the result of the linear layer operation applied on 'x'.
y = linear_layer(x)
print(y)

tf.Tensor(
[[ 0.0819068  -0.12334651 -0.05264889  0.00625704]
 [ 0.0819068  -0.12334651 -0.05264889  0.00625704]], shape=(2, 4), dtype=float32)


Note that the weights `w` and `b` are automatically tracked by the layer upon
being set as layer attributes:

In [7]:
# Use the 'assert' statement to check if 'linear_layer.weights' is equal to '[linear_layer.w, linear_layer.b]'.
# This ensures that the 'weights' variables of the 'linear_layer' layer exactly match the 'w' (weights) and 'b' (biases) variables
# we have defined within the 'Linear' class.
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

Note that you also have access to a quicker shortcut for adding weights to a layer:
the `add_weight()` method:

In [8]:
# Use `add_weight()` method for adding weight to a layer
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__() # calls the constructor of the base class 'keras.layers.Layer'
        # 'self.add_weight()' adds weight variables ('w' and 'b') to the layer.
        # 'self.w' holds the weights, randomly initialized from a normal distribution by the 'random_normal' initializer.
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        # 'self.b' holds the biases, initialized with zeros by the 'zeros' initializer.
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    # Defines the operation of the layer during forward propagation.
    def call(self, inputs):
        # inputs are the inputs to the layer.
        # 'tf.matmul(inputs, self.w)' performs the matrix multiplication of the inputs by the weights.
        # '+ self.b' adds the biases to the output of the matrix multiplication.
        return tf.matmul(inputs, self.w) + self.b

# Create an input tensor 'x' of shape '(2, 2)' with all elements set to one.
x = tf.ones((2, 2))
# Instantiate a 'linear_layer' object of class 'Linear' with 'units=4' and 'input_dim=2'
linear_layer = Linear(4, 2)
# The 'call' method of the 'linear_layer' object with 'x' as input, which applies the linear layer on 'x'.
y = linear_layer(x)
# The result 'y' is the output of the linear layer after applying it on 'x'.
print(y)

tf.Tensor(
[[-0.04942621  0.02780763 -0.10388044 -0.02482288]
 [-0.04942621  0.02780763 -0.10388044 -0.02482288]], shape=(2, 4), dtype=float32)


## Layers can have non-trainable weights

Besides trainable weights, you can add non-trainable weights to a layer as
well. Such weights are meant not to be taken into account during
backpropagation, when you are training the layer.

Here's how to add and use a non-trainable weight:

In [9]:
# Add and use a non-trainable weight
class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__() # calls the constructor of the base class 'keras.layers.Layer'

        # 'tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False)' initializes 'self.total' with a tensor of zeros of shape '(input_dim,)'
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False) # TensorFlow variable representing the cumulative sum.

    def call(self, inputs):
        # 'tf.reduce_sum(inputs, axis=0)' calculates the sum along axis 0 (rows) of the inputs.
        # 'self.total.assign_add(tf.reduce_sum(inputs, axis=0))' adds the calculated sum to self.total.
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total

# An input tensor 'x' of shape '(2, 2)' is created with all elements set to one.
x = tf.ones((2, 2))

# Instantiate an object 'my_sum' of class 'ComputeSum' with 'input_dim=2'
my_sum = ComputeSum(2)

# Call the 'call' method of the 'my_sum' object with 'x' as input, which adds the sum of 'x' to 'my_sum.total' and returns the accumulated value
y = my_sum(x)

# 'print(y.numpy())' prints the accumulated value after the first call
print(y.numpy())

# A second call is made to 'my_sum(x)', which adds the sum of 'x' to 'my_sum.total' again
y = my_sum(x)

# 'print(y.numpy())' prints the accumulated value after the second call
print(y.numpy())

[2. 2.]
[4. 4.]


It's part of `layer.weights`, but it gets categorized as a non-trainable weight:

In [10]:
print("weights:", len(my_sum.weights)) # returns a list of all weight variables within the layer 'my_sum'.

# 'my_sum.non_trainable_weights' returns a list of all weight variables that are not trainable within the 'my_sum' layer.
print("non-trainable weights:", len(my_sum.non_trainable_weights)) # prints the number of non-trainable weight variables in 'my_sum'.

# It's not included in the trainable weights:
print("trainable_weights:", my_sum.trainable_weights) # returns a list of all weight variables that are trainable within the layer 'my_sum'.

weights: 1
non-trainable weights: 1
trainable_weights: []


## Best practice: deferring weight creation until the shape of the inputs is known

Our `Linear` layer above took an `input_dim` argument that was used to compute
the shape of the weights `w` and `b` in `__init__()`:

In [11]:
# This code defines a custom linear layer 'Linear' by subclassing 'keras.layers.Layer'.
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__() # 'super(Linear, self).__init__()' calls the constructor of the base class 'keras.layers.Layer'.
        
        # 'self.w' represents the weights and is initialized randomly using the 'random_normal' initializer.
        self.w = self.add_weight( # 'self.add_weight()' is used to add weight variables ('w' and 'b') to the layer.
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        # 'self.b' represents the biases and is initialized with zeros using the 'zeros' initializer.
        # Both variables ('self.w' and 'self.b') are marked as trainable ('trainable=True')
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    # Defines the operation of the layer during forward propagation.
    def call(self, inputs): # 'inputs' are the inputs to the layer.
        
        # 'tf.matmul(inputs, self.w)' performs matrix multiplication of the inputs by the weights.
        return tf.matmul(inputs, self.w) + self.b # '+ self.b' adds the biases to the output of the matrix multiplication.


In many cases, you may not know in advance the size of your inputs, and you
would like to lazily create weights when that value becomes known, some time
after instantiating the layer.

In the Keras API, we recommend creating layer weights in the `build(self, input_shape)` method of your layer. Like this:

In [12]:
# TODO
class Linear(keras.layers.Layer):
    # Initializes the Linear class with a parameter units, which specifies the number of output units (neurons) for the layer
    def __init__(self, units=32):
        super(Linear, self).__init__() # calls the constructor of the base class 'keras.layers.Layer'
        self.units = units # stores the number of output units for the layer

    # This method is called when the layer is built, typically when it is first used or when 'model.build()' is called
    def build(self, input_shape): # 'input_shape' is the shape of the input tensor that the layer will receive during execution
        self.w = self.add_weight( # 'self.add_weight()' creates the trainable variables ('w' for weights and 'b' for biases) for the layer.
            shape=(input_shape[-1], self.units), # specifies the shape of 'self.w' based on the last dimension of the input shape and 'self.units'
            initializer="random_normal", # initializes 'self.w' with random values from a normal distribution ('self.b' below uses the same initializer).
            trainable=True, # marks 'self.w' as a trainable variable ('self.b' below is trainable as well)
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    # Defines the computation performed by the layer during forward pass (inference)
    def call(self, inputs):
        # 'tf.matmul(inputs, self.w)' computes the matrix multiplication of the inputs and 'self.w'
        return tf.matmul(inputs, self.w) + self.b # '+ self.b' adds the biases 'self.b' to the result of the matrix multiplication


The `__call__()` method of your layer will automatically run `build()` the first time
it is called. You now have a layer that's lazy and thus easier to use:

In [13]:
# At instantiation, we don't know on what inputs this is going to get called
linear_layer = Linear(32)

# The layer's weights are created dynamically the first time the layer is called
y = linear_layer(x)
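
To make the lazy behavior visible, the sketch below inspects a layer before and after its first call. `LazyLinear` is a hypothetical stand-in that mirrors the `Linear` class with a `build()` method; the zero-initialized bias and the input shape `(2, 3)` are assumptions chosen only for illustration.

```python
import tensorflow as tf
from tensorflow import keras


class LazyLinear(keras.layers.Layer):
    """Hypothetical stand-in for the `Linear` layer with a `build()` method."""

    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Weight shapes depend on the last input dimension, known only at call time.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


lazy_layer = LazyLinear(4)
weights_before = len(lazy_layer.weights)  # No weights exist yet

outputs = lazy_layer(tf.ones((2, 3)))  # The first call triggers build()
weights_after = len(lazy_layer.weights)
kernel_shape = tuple(lazy_layer.w.shape)

print("weights before first call:", weights_before)  # 0
print("weights after first call:", weights_after)  # 2
print("kernel shape:", kernel_shape)  # (3, 4)
```

The kernel shape `(3, 4)` is derived from the last dimension of the first input, which is why the layer no longer needs an `input_dim` argument.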

## Layers are recursively composable

If you assign a Layer instance as an attribute of another Layer, the outer layer
will start tracking the weights of the inner layer.

We recommend creating such sublayers in the `__init__()` method (since the
sublayers will typically have a build method, they will be built when the
outer layer gets built).

In [14]:
# TODO
# Let's assume we are reusing the Linear class
# with a `build` method that we defined above.


class MLPBlock(keras.layers.Layer):
    # Initializes the 'MLPBlock' class by calling the constructor of the base class 'keras.layers.Layer'
    def __init__(self):
        super(MLPBlock, self).__init__()
        self.linear_1 = Linear(32) # self.linear_1 with 32 output units.
        self.linear_2 = Linear(32) # self.linear_2 with 32 output units.
        self.linear_3 = Linear(1) # self.linear_3 with 1 output unit.

    def call(self, inputs):
        x = self.linear_1(inputs) # Passes the inputs through self.linear_1, followed by applying the ReLU activation function
        x = tf.nn.relu(x)
        x = self.linear_2(x) # Passes the result through self.linear_2, followed by applying the ReLU activation function
        x = tf.nn.relu(x)
        return self.linear_3(x) # Passes the result through self.linear_3 and returns the output


mlp = MLPBlock()
# Calls the instance with an input tensor of shape (3, 64), which triggers the forward pass and initializes the weights for the first time
y = mlp(tf.ones(shape=(3, 64)))  # The first call to the `mlp` will create the weights

# 'mlp.weights' returns a list of all weights in the 'mlp' block.
# 'len(mlp.weights)' prints the number of weights in the 'mlp' block.
print("weights:", len(mlp.weights))

# 'mlp.trainable_weights' returns a list of all trainable weights in the 'mlp' block.
# 'len(mlp.trainable_weights)' prints the number of trainable weights in the 'mlp' block
print("trainable weights:", len(mlp.trainable_weights))

weights: 6
trainable weights: 6


## The `add_loss()` method

When writing the `call()` method of a layer, you can create loss tensors that
you will want to use later, when writing your training loop. This is doable by
calling `self.add_loss(value)`:

In [15]:
# A layer that creates an activity regularization loss
class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2): # Initializes the 'ActivityRegularizationLayer' class with a parameter 'rate', which specifies the regularization rate
        # 'super(ActivityRegularizationLayer, self).__init__()' calls the constructor of the base class 'keras.layers.Layer'
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate # 'self.rate' stores the regularization rate, defaulting to '1e-2'

    # Defines the computation performed by the layer during forward pass (inference)
    def call(self, inputs):
        
        self.add_loss(self.rate * tf.reduce_sum(inputs)) # adds the computed regularization loss term to the overall loss of the model
        return inputs


These losses (including those created by any inner layer) can be retrieved via
`layer.losses`. This property is reset at the start of every `__call__()` to
the top-level layer, so that `layer.losses` always contains the loss values
created during the last forward pass.

In [16]:
# TODO
class OuterLayer(keras.layers.Layer): # Initializes the OuterLayer class
    def __init__(self):
        super(OuterLayer, self).__init__() # Calls the constructor of the base class 'keras.layers.Layer'
        self.activity_reg = ActivityRegularizationLayer(1e-2) # Creates an instance of 'ActivityRegularizationLayer' with a regularization rate of '1e-2'

    def call(self, inputs):
        # The method simply forwards the inputs to 'self.activity_reg' and returns the output of the 'ActivityRegularizationLayer'
        return self.activity_reg(inputs)


layer = OuterLayer()

# Asserts that 'layer.losses' is empty because the layer has not been called yet, so no losses have been added
assert len(layer.losses) == 0  # No losses yet since the layer has never been called

_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # We created one loss value

# `layer.losses` gets reset at the start of each __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # This is the loss created during the call above

In addition, the `losses` property also contains regularization losses created
for the weights of any inner layer:

In [17]:
class OuterLayerWithKernelRegularizer(keras.layers.Layer):
    def __init__(self):
        # 'super(OuterLayerWithKernelRegularizer, self).__init__()' calls the constructor of the base class 'keras.layers.Layer'
        super(OuterLayerWithKernelRegularizer, self).__init__()
        self.dense = keras.layers.Dense(
            # self.dense creates an instance of a Dense layer with 32 output units and an L2 kernel regularizer with a regularization rate of 1e-3
            32, kernel_regularizer=tf.keras.regularizers.l2(1e-3)
        )

    # Defines the computation performed by the layer during the forward pass
    def call(self, inputs): # 'inputs' are the input tensors to the layer
        return self.dense(inputs) # The method simply forwards the inputs to 'self.dense' and returns the output of the 'Dense' layer


layer = OuterLayerWithKernelRegularizer() # Creates an instance of 'OuterLayerWithKernelRegularizer'
_ = layer(tf.zeros((1, 1))) # Calls the 'layer' with a tensor of zeros with shape '(1, 1)', which triggers the forward pass

# This is `1e-3 * sum(layer.dense.kernel ** 2)`,
# created by the `kernel_regularizer` above.
print(layer.losses)

[<tf.Tensor: shape=(), dtype=float32, numpy=0.00266289>]
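
The claim that the reported loss equals `1e-3 * sum(kernel ** 2)` can be checked numerically. The following is a minimal sketch, assuming a standalone `Dense` layer with an L2 kernel regularizer; the input shape `(1, 3)` and the variable names are chosen only for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

l2_rate = 1e-3
dense = keras.layers.Dense(32, kernel_regularizer=keras.regularizers.L2(l2_rate))
_ = dense(tf.ones((1, 3)))  # Build the layer so that the kernel exists

# Loss value reported by the layer for its kernel regularizer.
reported = float(dense.losses[0])

# The same quantity computed by hand from the kernel values.
kernel_values = dense.kernel.numpy()
expected = float(l2_rate * np.sum(np.square(kernel_values)))

print("reported:", reported)
print("expected:", expected)
```

The two printed values should agree up to floating-point rounding.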


These losses are meant to be taken into account when writing training loops,
like this:

```python
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Iterate over the batches of a dataset.
for x_batch_train, y_batch_train in train_dataset:
  with tf.GradientTape() as tape:
    logits = model(x_batch_train)  # Logits for this minibatch
    # Loss value for this minibatch
    loss_value = loss_fn(y_batch_train, logits)
    # Add extra losses created during this forward pass:
    loss_value += sum(model.losses)

  grads = tape.gradient(loss_value, model.trainable_weights)
  optimizer.apply_gradients(zip(grads, model.trainable_weights))
```

For a detailed guide about writing training loops, see the
[guide to writing a training loop from scratch](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch/).
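
For a version of this loop that runs end to end, here is a minimal sketch on synthetic data. The dataset, the single-layer `Sequential` model, and the hyperparameters are assumptions made for illustration and are not part of the lab.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Synthetic data (an assumption for illustration): 64 samples, 8 features, 3 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 8)).astype("float32")
labels = rng.integers(0, 3, size=(64,)).astype("int64")
train_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(16)

# One Dense layer whose kernel regularizer contributes an entry to `model.losses`.
model = keras.Sequential(
    [keras.layers.Dense(3, kernel_regularizer=keras.regularizers.L2(1e-3))]
)
optimizer = keras.optimizers.SGD(learning_rate=1e-2)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

step_losses = []
for x_batch, y_batch in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch, training=True)
        loss_value = loss_fn(y_batch, logits)
        # Add the regularization loss created during this forward pass.
        loss_value += sum(model.losses)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    step_losses.append(float(loss_value))

print("per-step losses:", step_losses)
```

Reading `model.losses` inside the `GradientTape` context matters: the regularization term then takes part in the gradient computation along with the main loss.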

These losses also work seamlessly with `fit()` (they get automatically summed
and added to the main loss, if any):

In [18]:
import numpy as np

inputs = keras.Input(shape=(3,)) # Define the input layer with a shape of '(3,)', meaning each input sample has 3 features
outputs = ActivityRegularizationLayer()(inputs) # Pass the inputs through the 'ActivityRegularizationLayer'
model = keras.Model(inputs, outputs) # Create a Keras model that maps the 'inputs' to the outputs

# If there is a loss passed in `compile`, the regularization
# losses get added to it
model.compile(optimizer="adam", loss="mse") # The model is compiled with the Adam optimizer and mean squared error (MSE) loss function
model.fit(np.random.random((2, 3)), np.random.random((2, 3)))

# It's also possible not to pass any loss in `compile`,
# since the model already has a loss to minimize, via the `add_loss`
# call during the forward pass!
model.compile(optimizer="adam") # The model is compiled again with the Adam optimizer. No loss function is passed this time, so only the loss added via 'add_loss()' is minimized
model.fit(np.random.random((2, 3)), np.random.random((2, 3)))

<keras.callbacks.History at 0x7fe7b4b16070>

## The `add_metric()` method

Similarly to `add_loss()`, layers also have an `add_metric()` method
for tracking the moving average of a quantity during training.

Consider the following layer: a "logistic endpoint" layer.
It takes as inputs predictions & targets, it computes a loss which it tracks
via `add_loss()`, and it computes an accuracy scalar, which it tracks via
`add_metric()`.

In [19]:
# This code defines a custom Keras layer LogisticEndpoint that computes both the loss and accuracy for a binary classification problem.
# This layer can be integrated into a model to handle the loss and accuracy calculations.
class LogisticEndpoint(keras.layers.Layer):
    def __init__(self, name=None):
        super(LogisticEndpoint, self).__init__(name=name)

        # Initializes 'self.loss_fn' as an instance of 'BinaryCrossentropy' with 'from_logits=True',
        # indicating that the logits (raw predictions) will be provided directly
        self.loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)

        # Initializes 'self.accuracy_fn' as an instance of 'BinaryAccuracy', a metric to compute the accuracy of binary classification predictions
        self.accuracy_fn = keras.metrics.BinaryAccuracy()

    # 'targets': The true labels for the data,
    # 'logits': The raw predictions (logits) from the model,
    # 'sample_weights': Optional weights for each sample, default is 'None'
    def call(self, targets, logits, sample_weights=None):
        # Compute the training-time loss value and add it
        # to the layer using `self.add_loss()`.
        loss = self.loss_fn(targets, logits, sample_weights) # Computes the binary cross-entropy loss using 'self.loss_fn'
        self.add_loss(loss) # Adds this loss to the model's loss using 'self.add_loss(loss)'

        # Log accuracy as a metric and add it
        # to the layer using `self.add_metric()`.
        acc = self.accuracy_fn(targets, logits, sample_weights)
        self.add_metric(acc, name="accuracy")

        # Return the inference-time prediction tensor (for `.predict()`).
        return tf.nn.softmax(logits)


Metrics tracked in this way are accessible via `layer.metrics`:

In [20]:
# This instance will be used to compute the binary cross-entropy loss and accuracy for given targets and logits
layer = LogisticEndpoint()

targets = tf.ones((2, 2)) # These represent the true labels for two samples, each with two target values.
logits = tf.ones((2, 2)) # These represent the raw predictions from the model

#  The call method of LogisticEndpoint computes the loss and accuracy, adds them to the model's losses and metrics, and returns the softmax of the logits
y = layer(targets, logits)

print("layer.metrics:", layer.metrics)
print("current accuracy value:", float(layer.metrics[0].result()))

layer.metrics: [<keras.metrics.BinaryAccuracy object at 0x7fe7b47826d0>]
current accuracy value: 1.0


Just like for `add_loss()`, these metrics are tracked by `fit()`:

In [21]:
# This model takes two inputs: the feature inputs and the targets

inputs = keras.Input(shape=(3,), name="inputs") # Define an input layer with a shape of '(3,)' for the feature inputs
targets = keras.Input(shape=(10,), name="targets") # Define an input layer with a shape of '(10,)' for the target labels
logits = keras.layers.Dense(10)(inputs) # Add a dense layer with 10 units. This layer processes the feature inputs
# Use the LogisticEndpoint layer to compute predictions and track the loss and accuracy
predictions = LogisticEndpoint(name="predictions")(targets, logits)

model = keras.Model(inputs=[inputs, targets], outputs=predictions) # Create a Keras model with the specified inputs and outputs
model.compile(optimizer="adam") # Compile the model with the Adam optimizer

data = { # Generate random data for the inputs and targets. The inputs have a shape of '(3, 3)' and the targets have a shape of '(3, 10)'
    "inputs": np.random.random((3, 3)),
    "targets": np.random.random((3, 10)),
}
model.fit(data)



<keras.callbacks.History at 0x7fe7b4afdc40>

## You can optionally enable serialization on your layers

If you need your custom layers to be serializable as part of a
[Functional model](https://www.tensorflow.org/guide/keras/functional/), you can optionally implement a `get_config()`
method:

In [22]:

# This code defines a custom Keras layer called 'Linear' and demonstrates how to use the 'get_config' and
# 'from_config' methods to serialize and deserialize the layer
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    # This method is called once the shape of the input is known
    def build(self, input_shape):
        # Defines and initializes the weights 'w' and biases 'b' of the layer using 'add_weight'
        # Both are initialized from a random normal distribution
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs): # Performs the forward pass of the layer
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self): # Returns a dictionary containing the configuration of the layer
        return {"units": self.units}


# You can enable serialization on your layers using `get_config()` method
# Now you can recreate the layer from its config:
layer = Linear(64) #  Creates an instance of the 'Linear' layer with 64 units
config = layer.get_config() # Calls the 'get_config' method to retrieve the configuration of the layer, which is a dictionary '{"units": 64}'
print(config)

# Creates a new instance of the 'Linear' layer from the configuration dictionary using the 'from_config' method
new_layer = Linear.from_config(config)

{'units': 64}


Note that the `__init__()` method of the base `Layer` class takes some keyword
arguments, in particular a `name` and a `dtype`. It's good practice to pass
these arguments to the parent class in `__init__()` and to include them in the
layer config:

In [23]:
# This code defines a custom Keras layer called 'Linear' and demonstrates how to use the 'get_config' and 'from_config'
# methods to serialize and deserialize the layer, including any additional keyword arguments ('kwargs')
class Linear(keras.layers.Layer):
    def __init__(self, units=32, **kwargs): # Initializes the 'Linear' layer with a specified number of units (default is 32)
        # Calls the constructor of the base class 'keras.layers.Layer' with any additional keyword arguments ('kwargs')
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Defines and initializes the weights 'w' and biases 'b' of the layer using 'add_weight'
        # Both are initialized from a random normal distribution
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b # Performs the forward pass of the layer

    def get_config(self):
        # Calls the 'get_config' method of the base class and updates the returned dictionary with the 'units' parameter
        config = super(Linear, self).get_config()
        config.update({"units": self.units})
        return config


layer = Linear(64) # Creates an instance of the Linear layer with 64 units

# Calls the 'get_config' method to retrieve the configuration of the layer,
# which is a dictionary including 'units' and any other configurations from the base class
config = layer.get_config() # This method is used to return a dictionary containing the configuration of the layer.
print(config)
new_layer = Linear.from_config(config) # This class method is used to create a new instance of the layer from the configuration dictionary

{'name': 'linear_9', 'trainable': True, 'dtype': 'float32', 'units': 64}


If you need more flexibility when deserializing the layer from its config, you
can also override the `from_config()` class method. This is the base
implementation of `from_config()`:

```python
@classmethod
def from_config(cls, config):
  return cls(**config)
```

To learn more about serialization and saving, see the complete
[guide to saving and serializing models](https://www.tensorflow.org/guide/keras/save_and_serialize/).
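
As an illustration of overriding `from_config()`, the sketch below accepts a renamed config key before delegating to the base implementation. The `Scale` layer and the legacy key `"scale"` are hypothetical and exist only for this example.

```python
from tensorflow import keras


class Scale(keras.layers.Layer):
    """Hypothetical layer that multiplies its inputs by a constant factor."""

    def __init__(self, factor=1.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        config = super().get_config()
        config.update({"factor": self.factor})
        return config

    @classmethod
    def from_config(cls, config):
        config = dict(config)  # Copy so the caller's dict is not mutated
        # Accept a hypothetical legacy key name, "scale", as an alias for "factor".
        if "scale" in config:
            config["factor"] = config.pop("scale")
        return super().from_config(config)


legacy_layer = Scale.from_config({"scale": 3.0})
roundtrip_layer = Scale.from_config(Scale(factor=2.0).get_config())
print(legacy_layer.factor, roundtrip_layer.factor)  # 3.0 2.0
```

Delegating to `super().from_config()` after adjusting the dictionary keeps the base class's handling of `name`, `dtype`, and the other standard arguments.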

## Privileged `training` argument in the `call()` method

Some layers, in particular the `BatchNormalization` layer and the `Dropout`
layer, have different behaviors during training and inference. For such
layers, it is standard practice to expose a `training` (boolean) argument in
the `call()` method.

By exposing this argument in `call()`, you enable the built-in training and
evaluation loops (e.g. `fit()`) to correctly use the layer in training and
inference.

In [25]:
class CustomDropout(keras.layers.Layer):
    # Calls the constructor of the base class 'keras.layers.Layer' with any additional keyword arguments ('kwargs')
    def __init__(self, rate, **kwargs):
        super(CustomDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        
        # The 'training' argument indicates whether the layer is in training mode. If 'training' is 'True', 
        # it applies dropout to the inputs using 'tf.nn.dropout' with the specified rate
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs
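
To see the effect of the `training` argument, the sketch below calls a dropout layer in both modes. `SketchDropout` is a hypothetical copy of the layer above, redefined so that the example is self-contained; the rate of `0.5` and the all-ones input are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


class SketchDropout(keras.layers.Layer):
    """Hypothetical copy of the custom dropout layer, for a standalone demo."""

    def __init__(self, rate, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        # Dropout is applied only when the layer is called in training mode.
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs


dropout = SketchDropout(rate=0.5)
x = tf.ones((2, 4))

inference_out = np.array(dropout(x, training=False))
training_out = np.array(dropout(x, training=True))

print("inference:", inference_out)  # Inputs pass through unchanged
print("training:", training_out)  # Each entry is either 0 or 1 / (1 - rate) = 2
```

In training mode `tf.nn.dropout` rescales the surviving entries by `1 / (1 - rate)`, so the expected sum of the output matches that of the input.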


## Privileged `mask` argument in the `call()` method

The other privileged argument supported by `call()` is the `mask` argument.

You will find it in all Keras RNN layers. A mask is a boolean tensor (one
boolean value per timestep in the input) used to skip certain input timesteps
when processing timeseries data.

Keras will automatically pass the correct `mask` argument to `__call__()` for
layers that support it, when a mask is generated by a prior layer.
Mask-generating layers are the `Embedding`
layer configured with `mask_zero=True`, and the `Masking` layer.

To learn more about masking and how to write masking-enabled layers, please
check out the guide
["understanding padding and masking"](https://www.tensorflow.org/guide/keras/masking_and_padding/).
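
As a small illustration of a mask-generating layer, the sketch below asks an `Embedding` layer configured with `mask_zero=True` for the mask it would produce. The vocabulary size, embedding width, and token IDs are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Index 0 is treated as padding because of `mask_zero=True`.
embedding = keras.layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)

# Two padded sequences of token IDs (0 marks padding).
token_ids = tf.constant([[3, 5, 0, 0], [7, 0, 0, 0]])

# One boolean per timestep: True where a real token is present.
mask = np.array(embedding.compute_mask(token_ids))
print(mask)
# [[ True  True False False]
#  [ True False False False]]
```

Downstream layers that support masking, such as the Keras RNN layers, can use this mask to skip the padded timesteps.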

## The `Model` class

In general, you will use the `Layer` class to define inner computation blocks,
and will use the `Model` class to define the outer model -- the object you
will train.

For instance, in a ResNet50 model, you would have several ResNet blocks
subclassing `Layer`, and a single `Model` encompassing the entire ResNet50
network.

The `Model` class has the same API as `Layer`, with the following differences:

- It exposes built-in training, evaluation, and prediction loops
(`model.fit()`, `model.evaluate()`, `model.predict()`).
- It exposes the list of its inner layers, via the `model.layers` property.
- It exposes saving and serialization APIs (`save()`, `save_weights()`...)

Effectively, the `Layer` class corresponds to what we refer to in the
literature as a "layer" (as in "convolution layer" or "recurrent layer") or as
a "block" (as in "ResNet block" or "Inception block").

Meanwhile, the `Model` class corresponds to what is referred to in the
literature as a "model" (as in "deep learning model") or as a "network" (as in
"deep neural network").

So if you're wondering, "should I use the `Layer` class or the `Model` class?",
ask yourself: will I need to call `fit()` on it? Will I need to call `save()`
on it? If so, go with `Model`. If not (either because your class is just a block
in a bigger system, or because you are writing training & saving code yourself),
use `Layer`.

For instance, we could take our mini-resnet example above, and use it to build
a `Model` that we could train with `fit()`, and that we could save with
`save_weights()`:

```python
class ResNet(tf.keras.Model):

    def __init__(self, num_classes=1000):
        super(ResNet, self).__init__()
        self.block_1 = ResNetBlock()
        self.block_2 = ResNetBlock()
        self.global_pool = layers.GlobalAveragePooling2D()
        self.classifier = layers.Dense(num_classes)

    def call(self, inputs):
        x = self.block_1(inputs)
        x = self.block_2(x)
        x = self.global_pool(x)
        return self.classifier(x)


resnet = ResNet()
dataset = ...
resnet.fit(dataset, epochs=10)
resnet.save_weights(filepath)
```
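
The snippet above leaves `ResNetBlock`, `dataset`, and `filepath` undefined. A runnable variant might look like the sketch below, where the residual block (two 3x3 convolutions with a skip connection), the random input data, and the temporary file path are all hypothetical stand-ins rather than the notebook's own definitions.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


class ResNetBlock(layers.Layer):
    """Hypothetical minimal residual block: two convolutions plus a skip connection."""

    def __init__(self, filters=16):
        super().__init__()
        self.conv_1 = layers.Conv2D(filters, 3, padding="same", activation="relu")
        self.conv_2 = layers.Conv2D(filters, 3, padding="same")

    def call(self, inputs):
        x = self.conv_1(inputs)
        x = self.conv_2(x)
        # The skip connection assumes the inputs already have `filters` channels.
        return tf.nn.relu(x + inputs)


class ResNet(tf.keras.Model):
    def __init__(self, num_classes=10):
        super().__init__()
        self.block_1 = ResNetBlock()
        self.block_2 = ResNetBlock()
        self.global_pool = layers.GlobalAveragePooling2D()
        self.classifier = layers.Dense(num_classes)

    def call(self, inputs):
        x = self.block_1(inputs)
        x = self.block_2(x)
        x = self.global_pool(x)
        return self.classifier(x)


# Random stand-in data: 32 "images" of size 8x8 with 16 channels, 10 classes.
images = np.random.rand(32, 8, 8, 16).astype("float32")
labels = np.random.randint(0, 10, size=(32,))

resnet = ResNet()
resnet.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
resnet.fit(images, labels, epochs=1, batch_size=8, verbose=0)
preds = resnet.predict(images, verbose=0)

# Save the trained weights to a temporary location.
weights_path = os.path.join(tempfile.mkdtemp(), "resnet.weights.h5")
resnet.save_weights(weights_path)
saved = os.path.exists(weights_path)
```

The `.weights.h5` suffix is chosen because newer Keras versions require it for `save_weights()`, and older TensorFlow 2 releases also accept it.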

## Putting it all together: an end-to-end example

Here's what you've learned so far:

- A `Layer` encapsulates a state (created in `__init__()` or `build()`) and some
computation (defined in `call()`).
- Layers can be recursively nested to create new, bigger computation blocks.
- Layers can create and track losses (typically regularization losses) as well
as metrics, via `add_loss()` and `add_metric()`.
- The outer container, the thing you want to train, is a `Model`. A `Model` is
just like a `Layer`, but with added training and serialization utilities.

Let's put all of these things together into an end-to-end example: we're going
to implement a Variational AutoEncoder (VAE). We'll train it on MNIST digits.

Our VAE will be a subclass of `Model`, built as a nested composition of layers
that subclass `Layer`. It will feature a regularization loss (KL divergence).
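
For reference, the two quantities the code relies on can be written out as follows. This assumes the usual VAE setup (a diagonal Gaussian posterior with mean $\mu$ and log-variance $\log\sigma^2$, and a standard normal prior), which is what the names `z_mean` and `z_log_var` suggest. The sampling step uses the reparameterization trick:

$$
z = \mu + \exp\left(\tfrac{1}{2}\log\sigma^2\right) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

and the KL divergence to the prior has the closed form:

$$
D_{\mathrm{KL}}\left(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\right) = -\tfrac{1}{2} \sum_{j} \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$

Note that the implementation takes a mean over the batch and latent dimensions with `tf.reduce_mean` instead of the sum over $j$, which rescales the term by a constant factor.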

In [26]:
from tensorflow.keras import layers


class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon


class Encoder(layers.Layer):
    """Maps MNIST digits to a triplet (z_mean, z_log_var, z)."""

    def __init__(self, latent_dim=32, intermediate_dim=64, name="encoder", **kwargs):
        super(Encoder, self).__init__(name=name, **kwargs)
        self.dense_proj = layers.Dense(intermediate_dim, activation="relu")
        self.dense_mean = layers.Dense(latent_dim)
        self.dense_log_var = layers.Dense(latent_dim)
        self.sampling = Sampling()

    def call(self, inputs):
        x = self.dense_proj(inputs)
        z_mean = self.dense_mean(x)
        z_log_var = self.dense_log_var(x)
        z = self.sampling((z_mean, z_log_var))
        return z_mean, z_log_var, z


class Decoder(layers.Layer):
    """Converts z, the encoded digit vector, back into a readable digit."""

    def __init__(self, original_dim, intermediate_dim=64, name="decoder", **kwargs):
        super(Decoder, self).__init__(name=name, **kwargs)
        self.dense_proj = layers.Dense(intermediate_dim, activation="relu")
        self.dense_output = layers.Dense(original_dim, activation="sigmoid")

    def call(self, inputs):
        x = self.dense_proj(inputs)
        return self.dense_output(x)


class VariationalAutoEncoder(keras.Model):
    """Combines the encoder and decoder into an end-to-end model for training."""

    def __init__(
        self,
        original_dim,
        intermediate_dim=64,
        latent_dim=32,
        name="autoencoder",
        **kwargs
    ):
        super(VariationalAutoEncoder, self).__init__(name=name, **kwargs)
        self.original_dim = original_dim
        self.encoder = Encoder(latent_dim=latent_dim, intermediate_dim=intermediate_dim)
        self.decoder = Decoder(original_dim, intermediate_dim=intermediate_dim)

    def call(self, inputs):
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)
        # Add KL divergence regularization loss.
        kl_loss = -0.5 * tf.reduce_mean(
            z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1
        )
        self.add_loss(kl_loss)
        return reconstructed
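
As a quick sanity check of the KL expression used in `call()` above, the sketch below evaluates it on hand-picked inputs. The helper name `kl_term` and the tensor shapes are hypothetical, chosen only for this illustration.

```python
import tensorflow as tf


def kl_term(z_mean, z_log_var):
    """Same expression as in VariationalAutoEncoder.call(), wrapped in a helper."""
    return -0.5 * tf.reduce_mean(
        z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1
    )


# When the posterior equals the prior (mean 0, log-variance 0), the penalty vanishes.
at_prior = float(kl_term(tf.zeros((4, 2)), tf.zeros((4, 2))))

# Shifting the mean away from 0 produces a positive penalty.
shifted = float(kl_term(tf.ones((4, 2)), tf.zeros((4, 2))))

print(at_prior, shifted)
```

The term is zero exactly when the encoder outputs match the standard normal prior and grows as they move away from it, which is why it acts as a regularizer on the latent space.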


Let's write a simple training loop on MNIST:

In [27]:
original_dim = 784
vae = VariationalAutoEncoder(original_dim, 64, 32)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
mse_loss_fn = tf.keras.losses.MeanSquaredError()

loss_metric = tf.keras.metrics.Mean()

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype("float32") / 255

train_dataset = tf.data.Dataset.from_tensor_slices(x_train)
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

epochs = 2

# Iterate over epochs.
for epoch in range(epochs):
    print("Start of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, x_batch_train in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            reconstructed = vae(x_batch_train)
            # Compute reconstruction loss
            loss = mse_loss_fn(x_batch_train, reconstructed)
            loss += sum(vae.losses)  # Add KLD regularization loss

        grads = tape.gradient(loss, vae.trainable_weights)
        optimizer.apply_gradients(zip(grads, vae.trainable_weights))

        loss_metric(loss)

        if step % 100 == 0:
            print("step %d: mean loss = %.4f" % (step, loss_metric.result()))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Start of epoch 0
step 0: mean loss = 0.3371
step 100: mean loss = 0.1245
step 200: mean loss = 0.0986
step 300: mean loss = 0.0888
step 400: mean loss = 0.0839
step 500: mean loss = 0.0807
step 600: mean loss = 0.0786
step 700: mean loss = 0.0770
step 800: mean loss = 0.0758
step 900: mean loss = 0.0748
Start of epoch 1
step 0: mean loss = 0.0745
step 100: mean loss = 0.0739
step 200: mean loss = 0.0734
step 300: mean loss = 0.0729
step 400: mean loss = 0.0726
step 500: mean loss = 0.0722
step 600: mean loss = 0.0719
step 700: mean loss = 0.0716
step 800: mean loss = 0.0714
step 900: mean loss = 0.0711
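
In the log above, the mean loss at the start of epoch 1 continues from where epoch 0 ended, because `loss_metric` is never reset and therefore reports a running average over every step seen so far. If a per-epoch average is preferred, one option (a sketch, not part of the original notebook) is to reset the metric at the start of each epoch. The snippet assumes a TensorFlow version where metrics expose `reset_state()`; older releases name it `reset_states()`.

```python
import tensorflow as tf

loss_metric = tf.keras.metrics.Mean()

# Simulate two recorded losses from a first "epoch".
loss_metric(2.0)
loss_metric(4.0)
first_epoch_mean = float(loss_metric.result())

# Clear the accumulated state before the next epoch starts.
loss_metric.reset_state()

# A loss recorded afterwards is no longer averaged with the earlier ones.
loss_metric(10.0)
second_epoch_mean = float(loss_metric.result())

print(first_epoch_mean, second_epoch_mean)
```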


Note that since the VAE is subclassing `Model`, it features built-in training
loops. So you could also have trained it like this:

In [24]:
vae = VariationalAutoEncoder(784, 64, 32)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

vae.compile(optimizer, loss=tf.keras.losses.MeanSquaredError())
vae.fit(x_train, x_train, epochs=2, batch_size=64)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7eff08c70150>

## Beyond object-oriented development: the Functional API

Was this example too much object-oriented development for you? You can also
build models using the [Functional API](https://www.tensorflow.org/guide/keras/functional/). Importantly,
choosing one style or another does not prevent you from leveraging components
written in the other style: you can always mix-and-match.

For instance, the Functional API example below reuses the same `Sampling` layer
we defined in the example above:

In [25]:
original_dim = 784
intermediate_dim = 64
latent_dim = 32

# Define encoder model.
original_inputs = tf.keras.Input(shape=(original_dim,), name="encoder_input")
x = layers.Dense(intermediate_dim, activation="relu")(original_inputs)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()((z_mean, z_log_var))
encoder = tf.keras.Model(inputs=original_inputs, outputs=z, name="encoder")

# Define decoder model.
latent_inputs = tf.keras.Input(shape=(latent_dim,), name="z_sampling")
x = layers.Dense(intermediate_dim, activation="relu")(latent_inputs)
outputs = layers.Dense(original_dim, activation="sigmoid")(x)
decoder = tf.keras.Model(inputs=latent_inputs, outputs=outputs, name="decoder")

# Define VAE model.
outputs = decoder(z)
vae = tf.keras.Model(inputs=original_inputs, outputs=outputs, name="vae")

# Add KL divergence regularization loss.
kl_loss = -0.5 * tf.reduce_mean(z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1)
vae.add_loss(kl_loss)

# Train.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
vae.compile(optimizer, loss=tf.keras.losses.MeanSquaredError())
vae.fit(x_train, x_train, epochs=3, batch_size=64)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7eff08c1f4d0>
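
Mixing also works in the other direction. The sketch below wraps a Functional API model inside a subclassed `Model`. The names `backbone` and `Classifier`, as well as the layer sizes, are hypothetical and chosen only for illustration.

```python
import numpy as np
import tensorflow as tf

# A small feature extractor built with the Functional API.
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
backbone = tf.keras.Model(inputs, hidden, name="backbone")


class Classifier(tf.keras.Model):
    """Subclassed model that reuses a functional model as an inner block."""

    def __init__(self, backbone, num_classes=3):
        super().__init__()
        self.backbone = backbone
        self.head = tf.keras.layers.Dense(num_classes)

    def call(self, inputs):
        return self.head(self.backbone(inputs))


clf = Classifier(backbone)
logits = clf(np.zeros((2, 4), dtype="float32"))
print(logits.shape)
```

Because a functional `Model` is itself a `Layer`, it can be assigned as an attribute and called like any other layer.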

For more information, make sure to read the [Functional API guide](https://www.tensorflow.org/guide/keras/functional/).