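Every neural network layer, architecture, or model in PyTorch is built as a subclass of `torch.nn.Module`. This section implements and explores several related topics: parameters, the Sequential API, custom modules, control flow in `forward()`, and the `ModuleList` and `ModuleDict` containers.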

In [16]:
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F
import matplotlib.pyplot as plt
from collections import OrderedDict

# from torchsummary import summary
from torchinfo import summary

%load_ext autoreload
%autoreload 2



Parameters are `Tensor` subclasses with a special property when used with `Module`s: when assigned as `Module` attributes, they are automatically added to the module's list of parameters and appear, for example, in the `parameters()` iterator.

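How does an `nn.Parameter` differ from a plain tensor created with `torch.tensor()`? A plain tensor assigned as a module attribute is not registered, so it is not returned by `parameters()` and an optimizer built from `parameters()` never sees it. An `nn.Parameter` also has `requires_grad=True` by default, whereas a plain tensor does not.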
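As a minimal sketch of the registration behaviour described above, the following hypothetical `Demo` module (not part of the original notebook) holds one `nn.Parameter`, one plain tensor attribute, and one buffer, then inspects what the module actually tracks.

```python
import torch
import torch.nn as nn

class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # nn.Parameter: registered automatically, returned by parameters()
        self.weight = nn.Parameter(torch.randn(3, 3))
        # Plain tensor attribute: not registered, invisible to optimizers
        self.plain = torch.randn(3, 3)
        # Buffer: stored in state_dict() and moved by .to(), but not trained
        self.register_buffer("running_stat", torch.zeros(3))

demo = Demo()
param_names = [name for name, _ in demo.named_parameters()]
state_keys = list(demo.state_dict().keys())

print(param_names)  # ['weight']
print(state_keys)   # ['weight', 'running_stat']
print(demo.weight.requires_grad, demo.plain.requires_grad)  # True False
```

The plain tensor is missing from both lists, so it would be neither optimized nor saved with the model.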

#### Composing Multiple Layers using Sequential API

`nn.Sequential` is a container module that stores other `nn.Module` layers in the order they are passed and executes them in that order in the `forward()` pass.

In [2]:
block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU()
)
print(block)

# we can nest nn.Sequential blocks inside other modules or inside other Sequentials
model = nn.Sequential(
    block,
    nn.MaxPool2d(2)
)
print(model)
print("Number of layers: " + str(len(model)))          # Number of layers

# by default, layers are named by their integer index
# print(model[1])

# for name, module in model.named_children():
#     print(f"Layer Name: {name}, Module: {module}")

Sequential(
  (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
)
Sequential(
  (0): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
Number of layers: 2
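To make the sequential execution concrete, here is a small sketch that pushes a dummy batch through the same nested model and checks the output shape. The input size (4 images of 32×32) is an assumption chosen for illustration.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
model = nn.Sequential(block, nn.MaxPool2d(2))

x = torch.randn(4, 3, 32, 32)  # assumed dummy batch: (N, C, H, W)
out = model(x)                 # runs block, then MaxPool2d, in order
print(out.shape)               # torch.Size([4, 16, 16, 16])

# Integer indexing reaches nested layers; slicing returns a new Sequential
first_conv = model[0][0]
sub = model[0][:2]
print(type(first_conv).__name__, len(sub))  # Conv2d 2
```

The convolution keeps the spatial size because of `padding=1`, and the pooling layer halves it.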


We can also name the layers in a `Sequential` container by passing an `OrderedDict`:

In [3]:
model = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(3, 8, 3, padding=1)),
    ("relu", nn.ReLU()),
    ("pool", nn.MaxPool2d(2))
]))

print(f"Accessing the whole named model: \n{model}")
print(f"\nAccessing a single model layer: \n{model.conv}\n")

Accessing the whole named model: 
Sequential(
  (conv): Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Accessing a single model layer: 
Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
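A short sketch of other ways to reach the named layers: iterating with `named_children()`, indexing by position, and appending a layer with `add_module()`. The extra `"flatten"` layer is a hypothetical addition, used only to show the call.

```python
from collections import OrderedDict
import torch.nn as nn

model = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(3, 8, 3, padding=1)),
    ("relu", nn.ReLU()),
    ("pool", nn.MaxPool2d(2)),
]))

# Names come from the OrderedDict keys instead of integer indices
names = [name for name, _ in model.named_children()]
print(names)  # ['conv', 'relu', 'pool']

# Positional indexing still works on a named Sequential
same_layer = model[0] is model.conv
print(same_layer)  # True

# Append another named layer after construction (hypothetical extra layer)
model.add_module("flatten", nn.Flatten())
print(len(model))  # 4
```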



#### Custom Modules (Decomposing Models into Blocks)

We can create reusable blocks by subclassing `nn.Module`.

In [4]:
class MLPBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(in_features, out_features),
            nn.BatchNorm1d(out_features),
            nn.ReLU()
        )

    def forward(self, x):
        return self.block(x)

class FullModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp1 = MLPBlock(10, 64)
        self.mlp2 = MLPBlock(64, 32)
        self.output = nn.Linear(32, 1)

    def forward(self, x):
        x = self.mlp1(x)
        x = self.mlp2(x)
        return self.output(x)

In [18]:
model = FullModel()
summary(model, input_size=(1, 10)) # b = 1, N features = 10

Layer (type:depth-idx)                   Output Shape              Param #
FullModel                                [1, 1]                    --
├─MLPBlock: 1-1                          [1, 64]                   --
│    └─Sequential: 2-1                   [1, 64]                   --
│    │    └─Linear: 3-1                  [1, 64]                   704
│    │    └─BatchNorm1d: 3-2             [1, 64]                   128
│    │    └─ReLU: 3-3                    [1, 64]                   --
├─MLPBlock: 1-2                          [1, 32]                   --
│    └─Sequential: 2-2                   [1, 32]                   --
│    │    └─Linear: 3-4                  [1, 32]                   2,080
│    │    └─BatchNorm1d: 3-5             [1, 32]                   64
│    │    └─ReLU: 3-6                    [1, 32]                   --
├─Linear: 1-3                            [1, 1]                    33
Total params: 3,009
Trainable params: 3,009
Non-trainable params: 0
Total mult-a
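The parameter counts in the summary can be checked by hand. The sketch below uses a hypothetical helper, `count_trainable`, on the first `Linear` and `BatchNorm1d` layers. It also lists the batch-norm buffers, which is consistent with the running statistics being stored as buffers rather than as parameters.

```python
import torch.nn as nn

def count_trainable(module):
    """Hypothetical helper: number of trainable scalar parameters."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

linear = nn.Linear(10, 64)
bn = nn.BatchNorm1d(64)

n_linear = count_trainable(linear)  # weights 10*64 plus 64 biases
n_bn = count_trainable(bn)          # 64 scale (weight) plus 64 shift (bias)
print(n_linear, n_bn)               # 704 128

# Running statistics are buffers, so parameters() does not return them
buffer_names = [name for name, _ in bn.named_buffers()]
print(buffer_names)
```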

#### Modules with Control Flow

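Because `forward()` is ordinary Python code, it can contain control flow such as `if` statements and loops, and the branch taken can depend on the input data. This flexibility is essential for RNNs, dynamic routing, etc.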

In [19]:
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        if x.mean() > 0:
            return x + self.linear(x)
        else:
            return x
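A quick way to see both branches is to feed inputs whose mean has a known sign. This is a sketch with assumed inputs (all ones and all minus ones); the class is repeated so the snippet runs on its own.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):  # same logic as the cell above
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        if x.mean() > 0:  # data-dependent branch
            return x + self.linear(x)
        return x

block = ResidualBlock(4)
x_pos = torch.ones(2, 4)    # mean > 0: residual branch is taken
x_neg = -torch.ones(2, 4)   # mean <= 0: input is returned unchanged

out_pos = block(x_pos)
out_neg = block(x_neg)

print(out_neg is x_neg)                      # True
print(out_pos.requires_grad, out_neg.requires_grad)  # True False
```

Note that only the positive-mean output depends on `self.linear`, so only that call contributes gradients to the layer. Tools that record a single execution, such as `torch.jit.trace`, would capture just one of the two branches.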

#### Nested Modules (Using `ModuleList` and `ModuleDict`)

We can create a list of layers using `ModuleList`. `ModuleList` is a container that stores layers and registers them as submodules. Unlike `nn.Sequential`, it has no `forward()` of its own, so we must define the forward logic manually.

In [23]:
class DeepMLP(nn.Module):
    def __init__(self, num_layers=5, in_dim=10, hidden_dim=64):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(in_dim if i == 0 else hidden_dim, hidden_dim) for i in range(num_layers)
        ])
        self.output = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return self.output(x)

model = DeepMLP()
print(summary(model, input_size=(1, 10))) # b = 1, N features = 10
print(model)

Layer (type:depth-idx)                   Output Shape              Param #
DeepMLP                                  [1, 1]                    --
├─ModuleList: 1-1                        --                        --
│    └─Linear: 2-1                       [1, 64]                   704
│    └─Linear: 2-2                       [1, 64]                   4,160
│    └─Linear: 2-3                       [1, 64]                   4,160
│    └─Linear: 2-4                       [1, 64]                   4,160
│    └─Linear: 2-5                       [1, 64]                   4,160
├─Linear: 1-2                            [1, 1]                    65
Total params: 17,409
Trainable params: 17,409
Non-trainable params: 0
Total mult-adds (M): 0.02
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.07
Estimated Total Size (MB): 0.07
DeepMLP(
  (layers): ModuleList(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1-4): 4 x Linear(in_features=64, out_featur
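To illustrate why `ModuleList` is used rather than a plain Python list, the sketch below compares two hypothetical modules that differ only in the container. Layers kept in a plain list are not registered, so their parameters are missing from `parameters()`.

```python
import torch.nn as nn

class WithPlainList(nn.Module):  # hypothetical, for comparison only
    def __init__(self):
        super().__init__()
        # Plain list: the Linear layers are NOT registered as submodules
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class WithModuleList(nn.Module):  # hypothetical, for comparison only
    def __init__(self):
        super().__init__()
        # ModuleList: each Linear layer is registered as a submodule
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

n_plain = len(list(WithPlainList().parameters()))
n_registered = len(list(WithModuleList().parameters()))
print(n_plain, n_registered)  # 0 6
```

With the plain list, an optimizer built from `model.parameters()` would receive nothing to update, and `.to(device)` or `state_dict()` would skip those layers.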

`ModuleDict` is a dictionary-like container for layers or blocks, keyed by name.

In [28]:
class CustomBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleDict({
            'a': nn.Linear(10, 20),
            'b': nn.Linear(10, 30)
        })

    def forward(self, x, branch='a'):
        return self.branches[branch](x)

model = CustomBranch()

x = torch.randn(5, 10)  # batch of 5

out_a = model(x, branch='a')  # uses nn.Linear(10, 20)
out_b = model(x, branch='b')  # uses nn.Linear(10, 30)

print(out_a.shape)  # torch.Size([5, 20])
print(out_b.shape)  # torch.Size([5, 30])

torch.Size([5, 20])
torch.Size([5, 30])


`ModuleDict` is useful when:
* You have multiple branches or modules (e.g. for conditional computation), or you want named modules that are selected dynamically at run time.
* You're building things like Mixture of Experts, task-specific heads in multitask learning, router-based architectures, or transformers with named components.
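As one example of the use cases listed above, here is a minimal sketch of task-specific heads on a shared backbone. The class name `MultiTaskModel`, the task names, and the layer sizes are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):  # hypothetical example model
    def __init__(self, in_dim=10, hidden_dim=32):
        super().__init__()
        # Shared feature extractor used by every task
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # One head per task, looked up by name at call time
        self.heads = nn.ModuleDict({
            "classify": nn.Linear(hidden_dim, 3),
            "regress": nn.Linear(hidden_dim, 1),
        })

    def forward(self, x, task):
        if task not in self.heads:
            raise KeyError(f"Unknown task: {task!r}")
        return self.heads[task](self.backbone(x))

model = MultiTaskModel()
x = torch.randn(5, 10)  # batch of 5

out_cls = model(x, task="classify")
out_reg = model(x, task="regress")
print(out_cls.shape)  # torch.Size([5, 3])
print(out_reg.shape)  # torch.Size([5, 1])
print(list(model.heads.keys()))  # ['classify', 'regress']
```

Because the heads live in a `ModuleDict`, their parameters are registered along with the backbone's, so a single optimizer over `model.parameters()` covers all of them.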