# Parameters Registration
In PyTorch, any `nn.Module` instance that you assign as an attribute of your model (typically inside `__init__`) is registered as a submodule **automatically**, and its parameters become part of the model's parameters.

Because the model below makes these assignments:

```python
self.conv1 = nn.Conv2d(...)
self.FC    = nn.Sequential(...)
```

both `conv1` and the two Linear layers inside `FC` are registered automatically.

---

```python
import torch
import torch.nn


class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(
            in_channels=3, out_channels=4, kernel_size=3, padding=1, stride=2, bias=False)

        self.FC = torch.nn.Sequential(
            torch.nn.Linear(in_features=4*28*28, out_features=1000),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=1000, out_features=100)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = x.flatten(1)
        x = self.FC(x)
        return x


model = SimpleNet()

B, C, H, W = 1, 3, 56, 56
x = torch.randn(B, C, H, W)

# out = model(x)
# print(out.shape)

for name, param in model.named_parameters():
    print(name, param.shape)
```

```
conv1.weight torch.Size([4, 3, 3, 3])
FC.0.weight torch.Size([1000, 3136])
FC.0.bias torch.Size([1000])
FC.2.weight torch.Size([100, 1000])
FC.2.bias torch.Size([100])
```


# Why they get registered automatically

Every time you do:

```python
self.some_name = nn.Module()
```

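PyTorch does two things:

1. **Registers `some_name` as a submodule** (it is stored in the parent module's internal `_modules` dictionary)
2. **Makes every parameter inside it reachable** through `parameters()`, `state_dict()`, and `.to(device)`, which walk the registered submodules recursively

This happens because `nn.Module` overrides `__setattr__`.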

So:

* `self.conv1` → registered module
* `self.FC` → registered module

  * contains Linear(3136→1000) → parameters registered
  * contains Linear(1000→100) → parameters registered
  * contains ReLU → no parameters (but still registered)
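
The effect of `__setattr__` can be observed through the public API. The sketch below uses a hypothetical `Tiny` module (not part of the original notebook) to show how an `nn.Module`, an `nn.Parameter`, and an ordinary Python object are each treated on assignment.

```python
import torch
import torch.nn as nn


class Tiny(nn.Module):
    """Hypothetical module used only to illustrate attribute registration."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3, bias=False)  # nn.Module -> registered submodule
        self.scale = nn.Parameter(torch.ones(1))                 # nn.Parameter -> registered parameter
        self.note = "plain attribute"                            # anything else -> normal attribute


tiny = Tiny()
child_names = [name for name, _ in tiny.named_children()]
param_names = sorted(name for name, _ in tiny.named_parameters())

print(child_names)  # ['conv1']
print(param_names)  # ['conv1.weight', 'scale']
```

The string `note` appears in neither list, since only `nn.Module` and `nn.Parameter` values get special handling.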

---



# 1. What `nn.ModuleList` Actually Is

`nn.ModuleList` is a **container that registers submodules** inside a PyTorch model.

Any tensor parameters inside those submodules must be **registered** so that:

1. They appear in
   `model.parameters()`
2. They are moved correctly with
   `model.cuda()`, `model.to(device)`
3. They appear in the state dictionary
   `model.state_dict()`
4. The optimizer receives them when you pass it `model.parameters()`

A **normal Python list does not register anything**.

---

# 2. Why Registration Matters

Suppose you have layers

* $ W^{(1)} \in \mathbb{R}^{256 \times 256} $
* $ W^{(2)} \in \mathbb{R}^{256 \times 256} $

If they are inside a ModuleList, the model will collect them:

$$
\theta = \{W^{(1)}, W^{(2)}, \dots\}
$$

But if they are inside an ordinary list, these matrices **never appear** in $\theta$.
That means:

* Optimizer will never update them
* State dict will not store them
* `.cuda()` will not transfer them
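
These three consequences can be checked directly. The following is a minimal sketch with two hypothetical classes, `ListNet` and `ModuleListNet`, each holding two bias-free $256 \times 256$ layers. It uses `.double()` as a stand-in for `.cuda()` so that it runs without a GPU; both calls rely on the same traversal of registered submodules.

```python
import torch
import torch.nn as nn


class ListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: the layers are NOT registered.
        self.layers = [nn.Linear(256, 256, bias=False) for _ in range(2)]


class ModuleListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the layers ARE registered.
        self.layers = nn.ModuleList([nn.Linear(256, 256, bias=False) for _ in range(2)])


def summarize(model):
    """Return (parameter count, state_dict keys, weight dtype after .double())."""
    n_params = sum(p.numel() for p in model.parameters())
    keys = list(model.state_dict().keys())
    model.double()  # only reaches registered submodules
    return n_params, keys, model.layers[0].weight.dtype


list_net, good_net = ListNet(), ModuleListNet()
list_summary = summarize(list_net)
good_summary = summarize(good_net)
print(list_summary)  # (0, [], torch.float32)
print(good_summary)  # (131072, ['layers.0.weight', 'layers.1.weight'], torch.float64)

# An optimizer built from an empty parameter iterator refuses to start.
try:
    torch.optim.SGD(list_net.parameters(), lr=0.1)
    optimizer_built = True
except ValueError:
    optimizer_built = False
print(optimizer_built)  # False
```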

---

# 3. Minimal Example: What Goes Wrong with a Python List

### Example (incorrect)



```python
import torch
import torch.nn as nn

class BadNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10), nn.Linear(10, 10)]  # Python list

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = BadNetwork()
print(list(model.parameters()))
```

```
[]
```



The model has **zero parameters**.
Both Linear layers were ignored.

---

# 4. Correct Version Using `ModuleList`




```python
class GoodNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(10, 10),
            nn.Linear(10, 10)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = GoodNetwork()
for name, param in model.named_parameters():
    print(name, param.shape)
```

```
layers.0.weight torch.Size([10, 10])
layers.0.bias torch.Size([10])
layers.1.weight torch.Size([10, 10])
layers.1.bias torch.Size([10])
```


Now both layers are registered.

---

# 5. A Realistic Example: Deep Stacked Blocks

You want multiple blocks:

$$
x_{l+1} = f_l(x_l)
$$

Instead of writing them manually:

```python
self.block1 = ...
self.block2 = ...
self.block3 = ...
```

You can do:

```python
self.blocks = nn.ModuleList([Block() for _ in range(6)])
```

And the forward:

```python
for block in self.blocks:
    x = block(x)
```

This pattern is why implementations of PVT, Swin, FPN, U-Net, and Mask2Former commonly rely on `ModuleList`.
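
Putting the two fragments together gives a complete module. This is a minimal sketch: `Block` is a hypothetical residual block (Linear + ReLU), and the class name `Stack`, the width `dim=16`, and `depth=6` are all illustrative choices rather than anything taken from the architectures named above.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Hypothetical block f_l: a Linear + ReLU with a residual connection."""

    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.act(self.fc(x))


class Stack(nn.Module):
    def __init__(self, dim=16, depth=6):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])

    def forward(self, x):
        # x_{l+1} = f_l(x_l), applied block by block
        for block in self.blocks:
            x = block(x)
        return x


stack = Stack()
out = stack(torch.randn(2, 16))
param_names = [name for name, _ in stack.named_parameters()]

print(out.shape)         # torch.Size([2, 16])
print(len(param_names))  # 12 (a weight and a bias for each of the 6 blocks)
print(param_names[0])    # blocks.0.fc.weight
```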

---

# 6. How `ModuleList` Differs from `nn.Sequential`

| Feature                     | `ModuleList`              | `Sequential`                           |
| --------------------------- | ------------------------- | -------------------------------------- |
| Registers layers            | Yes                       | Yes                                    |
| Execution order             | Decided by your `forward` | Fixed: the order the layers were added |
| You write your own forward  | Yes                       | No (built in)                          |
| Useful for loops / branches | Yes                       | Not ideal                              |

Example: U-Net decoder, FPN lateral layers, transformer blocks → use `ModuleList`.
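
The difference in the table shows up as soon as you try to call each container. The sketch below wraps the same three layers (arbitrary sizes, chosen only for illustration) in both containers: `Sequential` can be called directly, while `ModuleList` has no `forward` of its own and must be iterated, which is what lets you keep intermediate outputs.

```python
import torch
import torch.nn as nn

layers = [nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8)]
seq = nn.Sequential(*layers)    # has a built-in forward
mlist = nn.ModuleList(layers)   # same layer objects, but no forward

x = torch.randn(4, 8)
y_seq = seq(x)

# Calling a ModuleList directly fails because it does not define forward().
try:
    mlist(x)
    modulelist_callable = True
except NotImplementedError:
    modulelist_callable = False

# With ModuleList you write the loop yourself, so you can collect
# every intermediate activation (useful for skip connections).
feats = []
h = x
for layer in mlist:
    h = layer(h)
    feats.append(h)

print(modulelist_callable)            # False
print(len(feats))                     # 3
print(torch.equal(feats[-1], y_seq))  # True
```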

---

# 7. Side-By-Side Comparison

### Python List

```python
self.layers = [nn.Linear(32, 32)]
```

* No registration
* No `parameters()`
* No optimizer updates (the weights are missing from `model.parameters()`)
* No `.cuda()` movement
* No saving in `state_dict`

### ModuleList

```python
self.layers = nn.ModuleList([nn.Linear(32, 32)])
```

* Fully registered
* Appears in `parameters()`
* Optimizable
* Stored in checkpoints

---

# 8. Why FPN-Style Necks on PVT / ResNet Backbones Use ModuleList

### Example: lateral layers over PVT features:

```python
self.lateral = nn.ModuleList([
    nn.Conv2d(c, out_channels, 1) for c in in_channels
])
```

Why?
Because PVT outputs are:

* $P_1$
* $P_2$
* $P_3$
* $P_4$

And for each we need a 1×1 conv to unify channel dimensions:

$$
P_i' = \text{Conv}_{1\times1}(P_i)
$$

Writing 4 separate convs by hand each time is repetitive and fixes the number of levels.
Storing them in a Python list is not an option either, because they would not be registered.

---



### Use `ModuleList` when:

* You need **a variable or iterable number of layers**
* You need loops inside forward
* You build **transformer blocks**
* You build **FPN lateral or top-down layers**
* You build **U-Net decoder lists**
* You need **skip connections** with lists of layers

### Do *not* use a Python list when the elements are modules.



## Manually Registering Each Layer in FPN

You **can** manually register each layer like:

```python
self.lp1 = nn.Conv2d(in_channels[0], out_channels, 1)
self.lp2 = nn.Conv2d(in_channels[1], out_channels, 1)
self.lp3 = nn.Conv2d(in_channels[2], out_channels, 1)
self.lp4 = nn.Conv2d(in_channels[3], out_channels, 1)
```

This works: PyTorch registers the parameters because each layer is assigned as an attribute.

The trade-off is maintainability:

# Why `ModuleList` Is Still Better

## 1. Manual attributes do not scale

If you have N layers:

* Manual registration:

  * Must define N attributes
  * Must write N forward operations
  * Error-prone and repetitive

* ModuleList:

  * Automatically registers each layer
  * Makes forward loops easy

---

# Comparison Example

## Manual Registration

```python
class FPN(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.l1 = nn.Conv2d(in_channels[0], out_channels, 1)
        self.l2 = nn.Conv2d(in_channels[1], out_channels, 1)
        self.l3 = nn.Conv2d(in_channels[2], out_channels, 1)
        self.l4 = nn.Conv2d(in_channels[3], out_channels, 1)

    def forward(self, feats):
        p1 = self.l1(feats[0])
        p2 = self.l2(feats[1])
        p3 = self.l3(feats[2])
        p4 = self.l4(feats[3])
        return p1, p2, p3, p4
```

This works but is rigid: the number of levels is hard-coded to four in both `__init__` and `forward`.

---

## ModuleList Version (recommended)

```python
class FPN(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats):
        return [layer(f) for layer, f in zip(self.lateral, feats)]
```

Advantages:

* Cleaner
* Can support variable number of inputs
* Easy loops
* Less code
* Avoids mistakes
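
A quick usage check makes the "variable number of inputs" point concrete. The sketch below repeats the lateral-conv module under the hypothetical name `LateralConvs` so that it runs on its own. The channel counts `(64, 128, 320, 512)`, the spatial sizes, and `out_channels=256` are illustrative assumptions, not values taken from a specific model configuration.

```python
import torch
import torch.nn as nn


class LateralConvs(nn.Module):
    """One 1x1 conv per feature level, stored in a ModuleList."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats):
        return [layer(f) for layer, f in zip(self.lateral, feats)]


# Four-level case with assumed channel counts and spatial sizes.
channels, sizes = (64, 128, 320, 512), (56, 28, 14, 7)
feats = [torch.randn(1, c, s, s) for c, s in zip(channels, sizes)]
outs = LateralConvs(channels, out_channels=256)(feats)
out_shapes = [tuple(o.shape) for o in outs]
print(out_shapes)
# [(1, 256, 56, 56), (1, 256, 28, 28), (1, 256, 14, 14), (1, 256, 7, 7)]

# The same class handles three levels without any code change.
outs3 = LateralConvs(channels[:3], out_channels=256)(feats[:3])
print(len(outs3))  # 3
```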

---

# 2. ModuleList is essential when you want dynamic number of layers

Example in a Transformer:

```python
self.blocks = nn.ModuleList([Block() for _ in range(depth)])
```

Doing this with manual attributes means hard-coding every depth, or falling back on `setattr` / `add_module` calls in a loop, which is clumsier than a list.
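
A `ModuleList` can also be grown one layer at a time with `append`, so the depth can come from a configuration value. The following is a rough sketch using a hypothetical `MLP` class whose depth is determined entirely by the length of a `widths` list; it stands in for the Transformer case, where `Block` and `depth` would play the same roles.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Hypothetical MLP: one Linear layer per consecutive pair in `widths`."""

    def __init__(self, widths):
        super().__init__()
        self.layers = nn.ModuleList()
        for d_in, d_out in zip(widths[:-1], widths[1:]):
            self.layers.append(nn.Linear(d_in, d_out))

    def forward(self, x):
        last = len(self.layers) - 1
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < last:  # no activation after the final layer
                x = torch.relu(x)
        return x


shallow = MLP([10, 32, 4])
deep = MLP([10, 32, 32, 32, 4])

x = torch.randn(3, 10)
print(len(shallow.layers), shallow(x).shape)  # 2 torch.Size([3, 4])
print(len(deep.layers), deep(x).shape)        # 4 torch.Size([3, 4])
```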

---

# 3. ModuleList integrates better with flexible architectures

Implementations of PVT, Swin, Mask2Former, U-Net, and FPN typically hold their layers in lists, because their depth and number of feature levels are often:

* dynamic
* dataset-dependent
* architecture-dependent

---

