
# Layers and Modules


In [1]:
import torch
from torch import nn 
from torch.nn import functional as F

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

X = torch.rand(2, 20)
net(X).shape



torch.Size([2, 10])

In [3]:
net.__call__(X).shape

torch.Size([2, 10])

## A Custom Module

Basic functionality of a module:

1 - Ingest input data as arguments to its forward propagation method.

2 - Generate an output by having the forward propagation method return a value.

3 - Calculate the gradient of its output with respect to its input during backpropagation. In PyTorch, autograd typically handles this automatically.

4 - Store and provide access to the parameters needed for forward propagation (weights and biases).

5 - Initialize model parameters as needed. 
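The responsibilities above can be seen in a very small module. The following is a minimal sketch, not part of the original notes: `ScaleShift` is a hypothetical layer invented for illustration. It stores its parameters as `nn.Parameter`, defines only `forward`, and leaves the gradient computation to autograd.

```python
import torch
from torch import nn


class ScaleShift(nn.Module):
    """Hypothetical layer for illustration: y = x * scale + shift."""

    def __init__(self, num_features):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them with the module.
        self.scale = nn.Parameter(torch.ones(num_features))
        self.shift = nn.Parameter(torch.zeros(num_features))

    def forward(self, X):
        # Only the forward computation is written by hand.
        return X * self.scale + self.shift


layer = ScaleShift(3)
X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
out = layer(X)

# No backward method was defined; autograd derives it from forward.
out.sum().backward()

param_names = [name for name, _ in layer.named_parameters()]
print(param_names)        # ['scale', 'shift']
print(layer.scale.grad)   # column sums of X: tensor([5., 7., 9.])
print(layer.shift.grad)   # one per row of X: tensor([2., 2., 2.])
```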

In [4]:
class MLP(nn.Module):
  def __init__(self):
    # Calling the constructor of the parent class nn.Module to perform the necessary initialization
    super().__init__()
    self.hidden = nn.LazyLinear(256)
    self.out = nn.LazyLinear(10)
  def forward(self, X):
    return self.out(F.relu(self.hidden(X)))

In [6]:
net = MLP()
net(X).shape



torch.Size([2, 10])

## The Sequential Module

We can build our own version of Sequential if we provide two things:

1 - A method to append modules one by one to a list

2 - A forward propagation method to pass an input through the chain of modules in the order they were added

In [15]:
class MySequential(nn.Module):
  def __init__(self, *args):
    super().__init__()
    for idx, module in enumerate(args):
      self.add_module(str(idx), module)
    
  def forward(self, X):
    for module in self.children():
      X = module(X)
    return X

In [16]:
net = MySequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
net(X).shape



torch.Size([2, 10])
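`MySequential` uses `add_module` rather than keeping the layers in an ordinary Python list. A rough sketch of why that matters is below; `ListHolder` and `RegisteredHolder` are hypothetical names used only for this comparison, and concrete `nn.Linear` layers replace the lazy ones so that parameter counts are defined immediately.

```python
import torch
from torch import nn


class ListHolder(nn.Module):
    """Hypothetical: keeps layers in a plain Python list."""

    def __init__(self, *layers):
        super().__init__()
        self.layers = list(layers)  # a plain list is NOT registered


class RegisteredHolder(nn.Module):
    """Hypothetical: registers each layer, as MySequential does."""

    def __init__(self, *layers):
        super().__init__()
        for idx, layer in enumerate(layers):
            self.add_module(str(idx), layer)


plain = ListHolder(nn.Linear(4, 3), nn.Linear(3, 2))
registered = RegisteredHolder(nn.Linear(4, 3), nn.Linear(3, 2))

num_plain = len(list(plain.parameters()))
num_registered = len(list(registered.parameters()))
child_names = [name for name, _ in registered.named_children()]

print(num_plain)        # 0: the module cannot see layers hidden in a list
print(num_registered)   # 4: two weights and two biases
print(child_names)      # ['0', '1']
```

Parameters that the module cannot see would also be missing from `state_dict()` and from anything built on `parameters()`, such as an optimizer.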

## Executing code in the forward propagation method

`nn.Sequential` is less helpful when we want Python control flow during forward propagation, or want to apply arbitrary mathematical operations to layer outputs instead of relying on predefined layers.

We may also want to use constant parameters: values that are neither the result of a previous layer nor updated during training.

The following MLP does both:


In [18]:
class FixedHiddenMLP(nn.Module):
  def __init__(self):
    super().__init__()
    self.rand_weight = torch.rand((20, 20))
    self.linear = nn.LazyLinear(20)
  def forward(self, X):
    X = self.linear(X)
    X = F.relu(X @ self.rand_weight + 1)
    # Reuse the fully connected layer. This is equivalent to sharing parameters between two fully connected layers
    X = self.linear(X)
    # Unlikely to appear in a real network; it only shows that a custom class allows control flow that Sequential does not.
    while X.abs().sum() > 1:
      X /= 2
    return X.sum()

In [19]:
net = FixedHiddenMLP()
net(X)



tensor(0.0711, grad_fn=<SumBackward0>)
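In `FixedHiddenMLP`, `rand_weight` is a plain tensor attribute, so it is never updated but it also does not appear in `state_dict()`. One alternative, which the original notes do not use, is `register_buffer`. The sketch below assumes a hypothetical `FixedWeight` module and a concrete `nn.Linear(20, 20)` in place of the lazy layer.

```python
import torch
from torch import nn


class FixedWeight(nn.Module):
    """Hypothetical variant of FixedHiddenMLP that uses a buffer."""

    def __init__(self):
        super().__init__()
        # A buffer is saved with the module and follows it across devices,
        # but it is not a parameter and receives no gradient.
        self.register_buffer("rand_weight", torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        return torch.relu(self.linear(X) @ self.rand_weight + 1)


net = FixedWeight()
net(torch.rand(2, 20)).sum().backward()

param_names = [name for name, _ in net.named_parameters()]
state_keys = set(net.state_dict().keys())

print(param_names)                       # ['linear.weight', 'linear.bias']
print("rand_weight" in state_keys)       # True
print(net.rand_weight.grad is None)      # True: the constant is not trained
```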

We can also use Sequential inside a custom class. In other words, modules can be nested.

In [20]:
class NestMLP(nn.Module):
  def __init__(self):
    super().__init__()
    self.net = nn.Sequential(nn.LazyLinear(64), nn.ReLU(), nn.LazyLinear(32), nn.ReLU())
    self.linear = nn.LazyLinear(16)

  def forward(self, X):
    return self.linear(self.net(X))

In [21]:
chimera = nn.Sequential(NestMLP(), nn.LazyLinear(20), FixedHiddenMLP())
chimera(X)



tensor(0.3221, grad_fn=<SumBackward0>)
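Nesting is reflected in how parameters are named. As a small sketch (with a hypothetical `Inner` module and concrete `nn.Linear` layers, chosen here only so the names can be listed without a forward pass), each level of nesting contributes one dotted prefix:

```python
import torch
from torch import nn


class Inner(nn.Module):
    """Hypothetical nested block, similar in spirit to NestMLP."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.linear = nn.Linear(8, 2)

    def forward(self, X):
        return self.linear(self.net(X))


model = nn.Sequential(Inner(), nn.Linear(2, 1))
names = [name for name, _ in model.named_parameters()]

for name in names:
    print(name)
# 0.net.0.weight
# 0.net.0.bias
# 0.linear.weight
# 0.linear.bias
# 1.weight
# 1.bias
```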

## Parameter Management

Sometimes we need to access the parameters of a model directly: for example, to save them to disk, to inspect them, or to control their initialization in a complex model instead of leaving it to the library.

In [22]:
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size = (2, 4))
net(X).shape



torch.Size([2, 1])
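Saving parameters to disk is one of the motivations mentioned above. A minimal sketch of the round trip follows, using `torch.save` and `load_state_dict`. It writes to an in-memory `io.BytesIO` buffer instead of a real file (an assumption made so the example leaves nothing behind) and uses concrete `nn.Linear` layers so both models have initialized shapes.

```python
import io

import torch
from torch import nn

source = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
target = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Serialize only the parameters, not the whole model object.
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)

# Rewind and load the saved parameters into a freshly built model.
buffer.seek(0)
target.load_state_dict(torch.load(buffer))

X = torch.rand(2, 4)
same_output = torch.equal(source(X), target(X))
print(same_output)  # True: both models now hold identical parameters
```

With a real file, a path string would take the place of `buffer` in both calls.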

### Parameter Access

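When a model is built with Sequential, we can reach any layer by indexing into the model as if it were a list. Each layer's parameters are then available through that layer, for example via its `state_dict()`.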

In [23]:
net[2].state_dict()

OrderedDict([('weight',
              tensor([[ 0.0707,  0.0520, -0.2712, -0.2416, -0.2818, -0.3370,  0.2806, -0.0592]])),
             ('bias', tensor([-0.1263]))])

#### Targeted Parameters

Parameters are complex objects containing values, gradients, and additional information. Accessing a parameter returns a `Parameter` object, so we need to request `.data` explicitly if we want the underlying numerical values.

In [24]:
type(net[2].weight), net[2].weight.data

(torch.nn.parameter.Parameter,
 tensor([[ 0.0707,  0.0520, -0.2712, -0.2416, -0.2818, -0.3370,  0.2806, -0.0592]]))

Since backpropagation has not yet been run for this network, the gradient should be `None`.

In [26]:
net[2].bias.grad is None

True
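To see the gradient change from `None` to an actual tensor, a short sketch is given below. It rebuilds a network of the same shape with concrete `nn.Linear` layers (an assumption, made so the snippet runs on its own) and calls `backward()` on the summed output.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(2, 4)

grad_before = net[2].bias.grad      # None: backward has not run yet
net(X).sum().backward()
grad_after = net[2].bias.grad       # populated by backpropagation

print(grad_before)   # None
print(grad_after)    # tensor([2.])
```

The value is 2 because the output bias contributes once to each of the two rows of the output, and the loss here is simply their sum.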

#### All parameters at once

In [27]:
[(name, param.shape) for name, param in net.named_parameters()]

[('0.weight', torch.Size([8, 4])),
 ('0.bias', torch.Size([8])),
 ('2.weight', torch.Size([1, 8])),
 ('2.bias', torch.Size([1]))]
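Iterating over all parameters also makes it easy to summarize a model. As an illustrative sketch (again with concrete `nn.Linear` layers of the same shapes as the output above), the total number of trainable values can be counted with `numel()`:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Number of scalar values held by each named parameter.
counts = {name: param.numel() for name, param in net.named_parameters()}
total = sum(counts.values())

print(counts)  # {'0.weight': 32, '0.bias': 8, '2.weight': 8, '2.bias': 1}
print(total)   # 49
```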

### Tied Parameters/Weight Sharing

In [28]:
shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), shared, nn.ReLU(), shared, nn.ReLU(), nn.LazyLinear(1))
net(X)



tensor([[-0.0531],
        [-0.0540]], grad_fn=<AddmmBackward0>)

In [30]:
print(net[2].weight.data[0] == net[4].weight.data[0])
# Assign (=, not ==) a new value through one of the two references
net[2].weight.data[0, 0] = 100
# Since they are the same object, a change made through one is visible through the other
print(net[2].weight.data[0] == net[4].weight.data[0])

tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])


Because the parameters are shared, the gradients from every use of the shared layer are added together during backpropagation.
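The accumulation of gradients across uses of a shared layer can be checked by hand on a tiny case. The sketch below is an illustration with assumed values: a single bias-free `nn.Linear(1, 1)` with weight 3 is applied twice to the input 2, so the output is w² · x and its derivative with respect to w is 2 · w · x = 12, which is the sum of the two per-use contributions of 6 each.

```python
import torch
from torch import nn

shared = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(3.0)   # w = 3, set by hand for a checkable result

# The same layer object appears twice in the chain.
net = nn.Sequential(shared, shared)

x = torch.tensor([[2.0]])
y = net(x)                     # w * (w * x) = 18
y.sum().backward()

same_object = net[0].weight is net[1].weight
num_params = len(list(net.parameters()))

print(y)                    # tensor([[18.]], grad_fn=...)
print(shared.weight.grad)   # tensor([[12.]]): 6 from each use, added together
print(same_object)          # True
print(num_params)           # 1: the shared weight is counted once
```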