# Layers and Blocks



Neurons, layers, and models give us useful abstractions for building deep neural networks.

However, it turns out that we often need to define components that are larger than an individual layer but smaller than the entire model.


Such a component often consists of repeating patterns of *groups of layers*.
These design patterns are common in practice when implementing complex networks.

To implement these complex networks, we introduce the concept of a neural network *block*.

A block could include a single layer, multiple layers, or the entire model itself!

One benefit of working with the block abstraction is that blocks can be combined into larger blocks, often recursively.

From a programming standpoint, a block is represented by a *class*.

- Any subclass of it must define a forward propagation function that transforms its input into output, and it must store any necessary parameters.
- Some blocks do not require any parameters at all.
- A block must possess a backpropagation function to calculate gradients.

Fortunately, thanks to some behind-the-scenes magic
supplied by automatic differentiation (`autograd`),
when defining our own block we only need to worry about the block parameters and the forward propagation function.
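
To make this concrete, the following minimal sketch records a forward pass with `tf.GradientTape` and asks TensorFlow for the gradients of a scalar loss with respect to a layer's parameters. The layer size, the input shape, and the summed-output "loss" are assumptions chosen only for illustration.

```python
import tensorflow as tf

# A layer for which we only ever specify the forward computation
layer = tf.keras.layers.Dense(3)
X = tf.random.uniform((2, 4))

# Record the forward pass so that TensorFlow can differentiate it
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(layer(X))  # illustrative scalar "loss"

# Gradients with respect to the kernel and the bias; no backward pass
# was written by hand
grads = tape.gradient(loss, layer.trainable_variables)
grad_shapes = [tuple(g.shape) for g in grads]
print(grad_shapes)
```

Each gradient has the same shape as the parameter it belongs to, which is all an optimizer needs in order to update the block.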

[**To begin, we revisit the code
that we used to implement MLPs.**]

The following code generates a network with one fully-connected hidden layer
with 256 units and ReLU activation, followed by a fully-connected output layer
with 10 units (no activation function).


In [None]:
import tensorflow as tf

net = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(10),
])

X = tf.random.uniform((2, 20))
net(X)

## [**A Custom Block**]


Each block must provide the following basic functionality:

1. Ingest input data as arguments to its forward propagation function.
1. Generate an output by having the forward propagation function return a value. Note that the output may have a different shape from the input. For example, the first fully-connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.
1. Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation function. Typically this happens automatically.
1. Store and provide access to those parameters necessary
   to execute the forward propagation computation.
1. Initialize model parameters as needed.
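
As a rough illustration of this list, the sketch below checks several of the items on a single built-in `Dense` layer. The input shape `(2, 20)` mirrors the `X` used earlier; everything else is an assumption made for the example.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(256, activation=tf.nn.relu)
X = tf.random.uniform((2, 20))

with tf.GradientTape() as tape:
    # `X` is a plain tensor rather than a variable, so we watch it explicitly
    tape.watch(X)
    # Items 1 and 2: ingest the input and return an output
    Y = layer(X)

# Item 3: gradient of the (summed) output with respect to the input
dY_dX = tape.gradient(Y, X)

# Items 4 and 5: parameters were created on the first call and are accessible
print(Y.shape, dY_dX.shape, layer.kernel.shape, layer.bias.shape)
```

The output has a different trailing dimension from the input, while the gradient with respect to the input has the input's shape.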

In the following snippet,
we code up a block from scratch
corresponding to an MLP
with one hidden layer with 256 hidden units,
and a 10-dimensional output layer.

In [None]:
class MLP(tf.keras.Model):
    # Declare a layer with model parameters. Here, we declare two fully
    # connected layers
    def __init__(self):
        # Call the constructor of the `MLP` parent class `Model` to perform
        # the necessary initialization. In this way, other function arguments
        # can also be specified during class instantiation, such as the model
        # parameters, `params` (to be described later)
        super().__init__()
        # Hidden layer
        self.hidden = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)
        self.out = tf.keras.layers.Dense(units=10)  # Output layer

    # Define the forward propagation of the model, that is, how to return the
    # required model output based on the input `X`
    def call(self, X):
        return self.out(self.hidden(X))

Note that the `MLP` class above inherits from the class that represents a block (here, `tf.keras.Model`).
We rely heavily on the parent class's functions,
supplying only our own constructor (the `__init__` function in Python) and the forward propagation function.

The forward propagation function takes `X` as the input,
calculates the hidden representation
with the activation function applied,
and outputs its logits.

In this `MLP` implementation,
both layers are instance variables.

To see why this is reasonable, imagine
instantiating two MLPs, `net1` and `net2`,
and training them on different data.
Naturally, we would expect them
to represent two different learned models.
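
A small sketch of this point: two instances of the same class hold separate layer objects and separately initialized weights. The class is repeated here only so that the snippet runs on its own.

```python
import tensorflow as tf


class MLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)
        self.out = tf.keras.layers.Dense(units=10)

    def call(self, X):
        return self.out(self.hidden(X))


X = tf.random.uniform((2, 20))
net1, net2 = MLP(), MLP()
Y1, Y2 = net1(X), net2(X)  # the first call creates each model's parameters

# Each instance owns its own layers ...
shares_layer = net1.hidden is net2.hidden
# ... and its own randomly initialized weights
k1, k2 = net1.hidden.kernel.numpy(), net2.hidden.kernel.numpy()
same_values = bool((k1 == k2).all())
print(shares_layer, same_values)
```

Because nothing is shared, training `net1` leaves the parameters of `net2` untouched.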

We [**instantiate the MLP's layers**]
in the constructor
(**and subsequently invoke these layers**)
on each call to the forward propagation function.


Note that the customized `__init__` function
invokes the parent class's `__init__` function
via `super().__init__()`,
sparing us the pain of restating
boilerplate code applicable to most blocks.


We then instantiate our two fully-connected layers,
assigning them to `self.hidden` and `self.out`.

Note that we need not worry about the backpropagation function
or parameter initialization.

TensorFlow will generate these functions automatically.
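
To see what automatic initialization produces, we can inspect the parameters after the first call. This is only a sketch: it relies on the `Dense` defaults (a Glorot-uniform kernel and a zero bias), and the class is repeated so that the snippet is self-contained.

```python
import tensorflow as tf


class MLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)
        self.out = tf.keras.layers.Dense(units=10)

    def call(self, X):
        return self.out(self.hidden(X))


net = MLP()
_ = net(tf.random.uniform((2, 20)))  # parameters are created on first use

hidden_kernel = net.hidden.kernel.numpy()
hidden_bias = net.hidden.bias.numpy()
# The kernel holds small random values; the bias starts at zero
print(hidden_kernel.shape, abs(hidden_kernel).max(), hidden_bias[:5])
```

Note that the kernel's first dimension (20) was never specified by us; it was inferred from the input at the first call.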


In [None]:
net = MLP()
net(X)


## [**The Sequential Block**]

Recall that `Sequential` was designed to chain layers together.

To build our own simplified `MySequential`, we just need to define two key functions:
1. A function to append blocks one by one to a list.
2. A forward propagation function to pass an input through the chain of blocks, in the same order as they were appended.

The following `MySequential` class delivers the same
functionality as the default `Sequential` class.


In [None]:
class MySequential(tf.keras.Model):
    def __init__(self, *args):
        super().__init__()
        self.modules = []
        for block in args:
            # Here, `block` is an instance of a `tf.keras.layers.Layer`
            # subclass
            self.modules.append(block)

    def call(self, X):
        for module in self.modules:
            X = module(X)
        return X

When our `MySequential`'s forward propagation function is invoked,
each added block is executed
in the order in which it was added.
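
The effect of ordering is easy to check with two parameter-free blocks. In the sketch below, `add_one` and `double` are hypothetical helper layers built with `tf.keras.layers.Lambda`, and the class is repeated so that the snippet runs on its own.

```python
import tensorflow as tf


class MySequential(tf.keras.Model):
    def __init__(self, *args):
        super().__init__()
        self.modules = []
        for block in args:
            self.modules.append(block)

    def call(self, X):
        for module in self.modules:
            X = module(X)
        return X


# Hypothetical parameter-free blocks, used only to make the order visible
add_one = tf.keras.layers.Lambda(lambda x: x + 1.0)
double = tf.keras.layers.Lambda(lambda x: x * 2.0)

X = tf.constant([[1.0, 2.0]])
first = MySequential(add_one, double)(X)   # computes (x + 1) * 2
second = MySequential(double, add_one)(X)  # computes x * 2 + 1
print(first.numpy(), second.numpy())
```

Swapping the two blocks changes the result, confirming that the blocks run in the order in which they were passed to the constructor.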

We can now reimplement an MLP using our `MySequential` class.


In [None]:
net = MySequential(tf.keras.layers.Dense(units=256, activation=tf.nn.relu),
                   tf.keras.layers.Dense(10))
net(X)


## [**Executing Code in the Forward Propagation Function**]

The `Sequential` class makes model construction easy,
allowing us to build new models without having to define our own class.

However, not all architectures are simple chains of layers.

When greater flexibility is required, we may want to execute
Python's control flow within the forward propagation function or perform
arbitrary mathematical operations without relying on predefined neural network layers.

You might have noticed that until now,
all of the operations in our networks
have acted upon our network's activations
and its parameters.

Sometimes, however, we might want to
incorporate terms
that are neither the result of previous layers
nor updatable parameters.

We call these *constant parameters*.
Say, for example, that we want a layer
that calculates the function
$f(\mathbf{x},\mathbf{w}) = c \cdot \mathbf{w}^\top \mathbf{x}$,
where $\mathbf{x}$ is the input, $\mathbf{w}$ is our parameter,
and $c$ is some specified constant
that is not updated during optimization.
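
A minimal sketch of this function as a custom layer follows. The class name `ScaledLinear`, the all-ones initialization of $\mathbf{w}$, and the value $c = 2$ are assumptions made purely for illustration.

```python
import tensorflow as tf


class ScaledLinear(tf.keras.layers.Layer):  # hypothetical name
    def __init__(self, c):
        super().__init__()
        # A plain constant tensor: not a variable, so never updated
        self.c = tf.constant(c, dtype=tf.float32)

    def build(self, input_shape):
        # The trainable parameter w, initialized to ones for readability
        self.w = self.add_weight(
            name="w", shape=(input_shape[-1], 1),
            initializer="ones", trainable=True)

    def call(self, x):
        # f(x, w) = c * w^T x, computed for every row of the batch
        return self.c * tf.matmul(x, self.w)


layer = ScaledLinear(c=2.0)
x = tf.constant([[1.0, 2.0, 3.0]])
y = layer(x)  # 2 * (1 + 2 + 3)
print(y.numpy(), len(layer.trainable_variables))
```

Only $\mathbf{w}$ shows up among the trainable variables; the constant $c$ scales the output but is invisible to the optimizer.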

So we implement a `FixedHiddenMLP` class as follows.


In [None]:
class FixedHiddenMLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()
        # Random weight parameters created with `tf.constant` are not updated
        # during training (i.e., constant parameters)
        self.rand_weight = tf.constant(tf.random.uniform((20, 20)))
        self.dense = tf.keras.layers.Dense(20, activation=tf.nn.relu)

    def call(self, inputs):
        X = self.flatten(inputs)
        # Use the created constant parameters, as well as the `relu` and
        # `matmul` functions
        X = tf.nn.relu(tf.matmul(X, self.rand_weight) + 1)
        # Pass the result through the trainable fully-connected layer
        X = self.dense(X)
        # Control flow
        while tf.reduce_sum(tf.math.abs(X)) > 1:
            X /= 2
        return tf.reduce_sum(X)

The `FixedHiddenMLP` model implements a hidden layer whose weights
(`self.rand_weight`) are initialized randomly
at instantiation and are thereafter constant.

Note that this weight is not a model parameter and thus it is never updated by backpropagation. The network then passes the output of this "fixed" layer
through a fully-connected layer.

Note also that before returning the output,
our model runs a while-loop that tests
whether the $L_1$ norm of the output is larger than $1$
and divides the output vector by $2$
until the norm is no larger than $1$.
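
The loop can be traced in isolation on a small, hand-picked vector (the values below are an assumption chosen so that every step is exact in floating point):

```python
import tensorflow as tf

X = tf.constant([4.0, -2.0, 2.0])  # the L1 norm starts at 8
halvings = 0
# Same control flow as in `FixedHiddenMLP.call`, run eagerly
while tf.reduce_sum(tf.math.abs(X)) > 1:
    X /= 2
    halvings += 1

l1_norm = float(tf.reduce_sum(tf.math.abs(X)))
print(halvings, l1_norm, X.numpy())
```

The norm goes 8, 4, 2, 1, so the loop stops after three halvings, as soon as the norm is no longer strictly greater than $1$.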


Finally, the model returns the sum of the entries in `X`.

To our knowledge, no standard neural network
performs this operation.

This particular operation may not be useful
in any real-world task, but it shows you how to integrate
arbitrary code (if needed) into the flow of your
neural network computations.


In [None]:
net = FixedHiddenMLP()
net(X)

We can also [**mix and match various
ways of assembling blocks together.**]

In the following example, we nest blocks
in some creative ways.


In [None]:
class NestMLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.net = tf.keras.Sequential()
        self.net.add(tf.keras.layers.Dense(64, activation=tf.nn.relu))
        self.net.add(tf.keras.layers.Dense(32, activation=tf.nn.relu))
        self.dense = tf.keras.layers.Dense(16, activation=tf.nn.relu)

    def call(self, inputs):
        return self.dense(self.net(inputs))

chimera = tf.keras.Sequential()
chimera.add(NestMLP())
chimera.add(tf.keras.layers.Dense(20))
chimera.add(FixedHiddenMLP())
chimera(X)

## Summary

* Layers are blocks.
* Many layers can comprise a block.
* Many blocks can comprise a block.
* A block can contain code.
* Blocks take care of lots of housekeeping, including parameter initialization and backpropagation.
* Sequential concatenations of layers and blocks are handled by the `Sequential` block.


## Exercises (Optional)

1. What kinds of problems will occur if you change `MySequential` to store blocks in a Python list?
1. Implement a block that takes two blocks as arguments, say `net1` and `net2`, and returns the concatenated output of both networks in the forward propagation. This is also called a parallel block.
1. Assume that you want to concatenate multiple instances of the same network. Implement a factory function that generates multiple instances of the same block and build a larger network from it.
