## Layers and Blocks


In the following example, we construct
our model by instantiating an `nn.Sequential`, passing the layers as
arguments in the order in which they should be executed.

In [None]:
import torch
from torch import nn
from torch.nn import functional as F

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

X = torch.rand(5, 20)
net(X)
#print(X)

tensor([[ 1.0975e-01,  1.3458e-01,  2.0602e-01,  6.3120e-01,  1.3348e-01,
         -5.5983e-02, -1.5761e-01,  1.5106e-01, -5.5152e-02, -1.2457e-01],
        [ 2.0242e-01, -6.1417e-04,  9.2329e-02,  6.2101e-01,  8.6609e-02,
          1.0189e-01, -1.8166e-01,  1.9436e-01, -1.6335e-01, -5.3707e-02],
        [ 1.7886e-01,  6.1261e-02,  3.3504e-03,  6.8099e-01,  1.3874e-01,
          1.0615e-01, -1.4284e-01,  1.1757e-01, -1.0429e-01, -4.9472e-02],
        [ 8.9643e-02,  9.5878e-02,  7.2870e-02,  5.5678e-01,  7.1848e-02,
          1.0350e-01, -1.3745e-01,  9.2253e-02, -9.1977e-02,  6.4456e-02],
        [ 1.0983e-01,  9.0974e-02,  7.4630e-02,  5.6777e-01,  1.8679e-01,
          8.4405e-02, -1.6439e-01,  1.6304e-01,  1.5852e-03, -9.7497e-02]],
       grad_fn=<AddmmBackward>)
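
To make concrete what `nn.Sequential` does with its arguments, the following minimal sketch applies the same three layers by hand and compares the result with the container's output. The names `layers` and `manual` are illustrative and do not come from the text above.

```python
import torch
from torch import nn

layers = [nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)]
net = nn.Sequential(*layers)

X = torch.rand(5, 20)

# Apply the layers one at a time, in the order they were passed in
manual = X
for layer in layers:
    manual = layer(manual)

print(net(X).shape)                    # torch.Size([5, 10])
print(torch.allclose(net(X), manual))  # True
```

Because the container holds the very same layer objects, both paths use identical weights and produce the same output.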

Next, we implement an MLP as a custom block,
with one hidden layer of 256 hidden units
and a 10-dimensional output layer.

In [None]:
class MLP(nn.Module):
    # Declare a layer with model parameters. Here, we declare two fully
    # connected layers
    def __init__(self):
        # Call the constructor of the `MLP` parent class `Module` to perform
        # the necessary initialization. In this way, other function arguments
        # can also be specified during class instantiation, such as the model
        # parameters, `params` (to be described later)
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # Hidden layer
        self.out = nn.Linear(256, 10)  # Output layer

    # Define the forward propagation of the model, that is, how to return the
    # required model output based on the input `X`
    def forward(self, X):
        # Note that here we use the functional version of ReLU defined in the
        # `nn.functional` module.
        return self.out(F.relu(self.hidden(X)))

Next, we instantiate the MLP. 
Note a few key details.
First, our customized `__init__` function
invokes the parent class's `__init__` function
via `super().__init__()`.
We then instantiate our two fully-connected layers,
assigning them to `self.hidden` and `self.out`.


In [None]:
net = MLP()
net(X)

tensor([[-0.0031, -0.0590, -0.1337, -0.2045,  0.1432,  0.0219,  0.0728, -0.1828,
         -0.1712,  0.2511],
        [ 0.0045, -0.0982, -0.3168, -0.0333,  0.0853,  0.1641,  0.1434, -0.2105,
         -0.3333,  0.2882],
        [-0.0043, -0.0175, -0.1754, -0.1551,  0.0887,  0.0915,  0.3139, -0.1394,
         -0.2129,  0.2498],
        [ 0.0504, -0.0309, -0.0880,  0.0584,  0.2259,  0.0953,  0.1509, -0.1902,
         -0.2197,  0.1234],
        [ 0.0022, -0.0082, -0.0974, -0.0845,  0.1144,  0.1230,  0.2013, -0.2171,
         -0.2020,  0.1784]], grad_fn=<AddmmBackward>)
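
One consequence of assigning layers to attributes such as `self.hidden` and `self.out` is that `nn.Module` registers them, so their parameters become visible to the rest of the framework. The sketch below repeats the `MLP` definition so that it runs on its own and lists the registered parameters; the dictionary `param_shapes` is an illustrative name.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # Hidden layer
        self.out = nn.Linear(256, 10)  # Output layer

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()

# Each attribute that is a `Module` contributes its parameters, prefixed
# with the attribute name
param_shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)
# hidden.weight (256, 20)
# hidden.bias (256,)
# out.weight (10, 256)
# out.bias (10,)
```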

A key virtue of the block abstraction is its versatility.
We can subclass a block to create layers
(such as the fully-connected layer class),
entire models (such as the `MLP` class above),
or various components of intermediate complexity.
We exploit this versatility
throughout the following chapters,
such as when addressing
convolutional neural networks.


## [**The Sequential Block**]

We can now take a closer look
at how the `Sequential` class works.
To build our own simplified `MySequential`,
we just need to define two key functions:
1. A function to append blocks one by one to a list.
2. A forward propagation function to pass an input through the chain of blocks, in the same order as they were appended.

The following `MySequential` class delivers the same
functionality as the default `Sequential` class.


In [None]:
class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # Here, `module` is an instance of a `Module` subclass. We save it
            # in the member variable `_modules` of the `Module` class, and its
            # type is OrderedDict
            self._modules[str(idx)] = module

    def forward(self, X):
        # OrderedDict guarantees that members will be traversed in the order
        # they were added
        for block in self._modules.values():
            X = block(X)
        return X

When our `MySequential`'s forward propagation function is invoked,
the added blocks are executed
in the order in which they were added.
We can now reimplement an MLP
using our `MySequential` class.


In [None]:
net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net(X)

tensor([[ 0.0213,  0.0902,  0.0247,  0.0203, -0.0198, -0.0725, -0.1215,  0.2789,
          0.0965,  0.1723],
        [-0.0065,  0.1543, -0.0502,  0.0522,  0.0452, -0.0484, -0.0917,  0.3645,
          0.1556,  0.2705],
        [ 0.0310,  0.1359, -0.0770, -0.0006,  0.0813,  0.0310, -0.1085,  0.3499,
          0.1763,  0.1680],
        [ 0.0686,  0.1035,  0.0220,  0.0797, -0.0012, -0.1189, -0.2067,  0.2226,
          0.0359,  0.2509],
        [ 0.0863,  0.0633,  0.0308,  0.0397, -0.0027,  0.0202, -0.1145,  0.3047,
          0.1579,  0.1940]], grad_fn=<AddmmBackward>)
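
It may help to see why `MySequential` stores its blocks in `_modules` rather than in an ordinary Python list. The sketch below compares it with a hypothetical `ListSequential` (not part of PyTorch or of the text above) that keeps the blocks in a plain list. The forward pass still works, but the blocks are not registered with `nn.Module`, so their parameters are not found.

```python
import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module

    def forward(self, X):
        for block in self._modules.values():
            X = block(X)
        return X

class ListSequential(nn.Module):
    """Hypothetical variant that keeps blocks in a plain Python list."""
    def __init__(self, *args):
        super().__init__()
        self.blocks = list(args)  # A plain list is not registered

    def forward(self, X):
        for block in self.blocks:
            X = block(X)
        return X

def make_blocks():
    return nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)

registered = len(list(MySequential(*make_blocks()).parameters()))
unregistered = len(list(ListSequential(*make_blocks()).parameters()))
print(registered, unregistered)  # 4 0
```

With nothing returned by `parameters()`, an optimizer built from the list-based variant would have nothing to update.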

You might have noticed that until now,
all of the operations in our networks
have acted upon our network's activations
and its parameters.
Sometimes, however, we might want to
incorporate terms
that are neither the result of previous layers
nor updatable parameters.
Say, for example, that we want a layer
that calculates the function
$f(\mathbf{x},\mathbf{w}) = c \cdot \mathbf{w}^\top \mathbf{x}$,
where $\mathbf{x}$ is the input, $\mathbf{w}$ is our parameter,
and $c$ is some specified constant
that is not updated during optimization.
So we implement a `FixedHiddenMLP` class as follows.


In [None]:
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weights that do not receive gradients and therefore stay
        # constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        # Use the created constant parameters, as well as the `relu` and `mm`
        # functions
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the fully-connected layer. This is equivalent to sharing
        # parameters with two fully-connected layers
        X = self.linear(X)
        # Control flow
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

In this `FixedHiddenMLP` model,
we implement a hidden layer whose weights
(`self.rand_weight`) are initialized randomly
at instantiation and are thereafter constant.
This weight is not a model parameter
and thus it is never updated by backpropagation.
The network then passes the output of this "fixed" layer
through a fully-connected layer.
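
The sketch below checks that a tensor stored as a plain attribute is indeed not a model parameter. It also shows one possible alternative, which the text above does not use: registering the constant with `nn.Module.register_buffer`. The classes `PlainFixed` and `BufferedFixed` are hypothetical stand-ins that reproduce only the constructor, with the forward pass omitted.

```python
import torch
from torch import nn

class PlainFixed(nn.Module):
    """Hypothetical stand-in mirroring `FixedHiddenMLP.__init__`."""
    def __init__(self):
        super().__init__()
        # Plain tensor attribute, as in the text: not a parameter
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

class BufferedFixed(nn.Module):
    """Hypothetical variant that registers the constant as a buffer."""
    def __init__(self):
        super().__init__()
        self.register_buffer('rand_weight', torch.rand((20, 20)))
        self.linear = nn.Linear(20, 20)

plain, buffered = PlainFixed(), BufferedFixed()

plain_params = [name for name, _ in plain.named_parameters()]
buffered_params = [name for name, _ in buffered.named_parameters()]
print(plain_params)     # ['linear.weight', 'linear.bias']
print(buffered_params)  # ['linear.weight', 'linear.bias']
print('rand_weight' in plain.state_dict())     # False
print('rand_weight' in buffered.state_dict())  # True
```

In both cases the optimizer never sees `rand_weight`. The buffer variant additionally saves the constant in `state_dict()`, which may matter if the model is to be reloaded later with the same fixed weights.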

Note that before returning the output,
our model did something unusual.
We ran a while-loop that tests
whether the $L_1$ norm of the output is larger than $1$
and divides the output vector by $2$
until the norm is no larger than $1$.
Finally, we returned the sum of the entries in `X`.
To our knowledge, no standard neural network
performs this operation.
Note that this particular operation may not be useful
in any real-world task.
Our point is only to show you how to integrate
arbitrary code into the flow of your
neural network computations.
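
As a small worked example of this loop in isolation, the sketch below starts from an arbitrarily chosen vector with $L_1$ norm $8$ and halves it until the norm is no longer greater than $1$. The counter `halvings` is added purely for illustration.

```python
import torch

X = torch.tensor([3.0, -5.0])  # L1 norm is 8
halvings = 0

# Halve until the L1 norm is no longer greater than 1
while X.abs().sum() > 1:
    X /= 2
    halvings += 1

print(halvings)       # 3
print(X)              # tensor([ 0.3750, -0.6250])
print(X.abs().sum())  # tensor(1.)
```

The norm goes $8 \to 4 \to 2 \to 1$, and the loop stops as soon as the test `> 1` fails.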


In [None]:
net = FixedHiddenMLP()
net(X)

tensor(0.2891, grad_fn=<SumBackward0>)

We can [**mix and match various
ways of assembling blocks together.**]
In the following example, we nest blocks
in some creative ways.


In [None]:
class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
chimera(X)

tensor(-0.2119, grad_fn=<SumBackward0>)
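
To see how the nested blocks fit together, the sketch below rebuilds `chimera` so that it runs on its own, indexes into the nested structure, and counts the learnable parameters. The variable names `out` and `n_params` are illustrative.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
out = chimera(torch.rand(5, 20))
print(out.shape)  # torch.Size([]), a scalar from the final `sum`

# Nested blocks can be reached by index and attribute name
print(chimera[0].net[0])  # Linear(in_features=20, out_features=64, bias=True)

# The shared `linear` layer is counted once; `rand_weight` is not counted
n_params = sum(p.numel() for p in chimera.parameters())
print(n_params)  # 4712
```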

## Summary

* Layers are blocks.
* Many layers can comprise a block.
* Many blocks can comprise a block.
* A block can contain code.
* Blocks take care of lots of housekeeping, including parameter initialization and backpropagation.
* Sequential concatenations of layers and blocks are handled by the `Sequential` block.

[Discussions](https://discuss.d2l.ai/t/55)
