Every neural network layer, architecture, or model in PyTorch is built as a subclass of the `torch.nn.Module` class. Here we implement and explore in depth several related sub-topics.

In [45]:
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F
import matplotlib.pyplot as plt
from collections import OrderedDict

%load_ext autoreload
%autoreload 2



#### PyTorch nn.Parameter and nn.Module

Parameters are `Tensor` subclasses with one special property when used with `Module`s: when assigned as `Module` attributes, they are automatically added to the module's list of parameters and will appear, for example, in the `parameters()` iterator.

In [37]:
# Here, the state consists of randomly-initialized weight and bias tensors that define the affine transformation. 
# Because each of these is defined as a Parameter, they are registered for the module and will automatically be tracked and 
# returned from calls to parameters()

class LinearLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))
        
    def forward(self, input):
        return (input @ self.weight) + self.bias

linear_layer = LinearLayer(3, 4)

# it is a generator object
# print(linear_layer.parameters())

# note that each parameter is named after the attribute it was assigned to in __init__
for name, param in linear_layer.named_parameters():
    print(f"Parameter: {name}, Values: {param}, Requires Grad: {param.requires_grad}")

x_input = torch.randn(3)
y_output = linear_layer(x_input)
print(f"\ninput: {x_input}.\noutput: {y_output}")

Parameter: weight, Values: Parameter containing:
tensor([[ 0.3147, -0.3086,  0.0777, -0.6696],
        [-1.5764,  0.8109,  0.4712,  0.7366],
        [-0.5918,  0.1557, -0.8665, -0.0236]], requires_grad=True), Requires Grad: True
Parameter: bias, Values: Parameter containing:
tensor([-2.4605, -0.4729, -1.6643, -0.1562], requires_grad=True), Requires Grad: True

input: tensor([-0.4777, -2.4996, -2.7967]).
output: tensor([ 2.9846, -2.7880, -0.4557, -1.6114], grad_fn=<AddBackward0>)
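
Because both tensors are registered, they can be counted and handed to an optimizer in a single call. The sketch below re-declares the same `LinearLayer` so that it runs on its own; the batch size, the `SGD` learning rate, and the squared-output loss are arbitrary choices made only for illustration.

```python
import torch
import torch.nn as nn

class LinearLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        return (x @ self.weight) + self.bias

torch.manual_seed(0)
layer = LinearLayer(3, 4)

# Total number of trainable scalars: 3*4 weights + 4 biases
num_params = sum(p.numel() for p in layer.parameters())
print(f"Number of parameters: {num_params}")

# The same layer also accepts a batch: (5, 3) @ (3, 4) -> (5, 4)
batch = torch.randn(5, 3)
out = layer(batch)
print(f"Batched output shape: {out.shape}")

# parameters() is all an optimizer needs to find the learnable tensors
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)  # lr is an arbitrary choice
weight_before = layer.weight.detach().clone()

loss = out.pow(2).mean()  # toy loss, for illustration only
loss.backward()
optimizer.step()

weight_changed = not torch.equal(weight_before, layer.weight.detach())
print(f"Weight updated by optimizer step: {weight_changed}")
```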


How does an `nn.Parameter` differ from a plain tensor created with `torch.randn()` or `torch.tensor()`?

In [43]:
# The parameter below
param1 = nn.Parameter(torch.randn(3, 3)) # torch.Size([3, 3]), requires_grad=True as default
print(param1.requires_grad)

# is similar to ...
tensor = torch.randn(3, 3, requires_grad=True) # Not a parameter!
param2 = nn.Parameter(tensor)
print(param2.requires_grad)

# but the tensor is not a parameter on its own: assigned as a module attribute, it would not be found
# in model.parameters(), so it is not registered as a trainable parameter

# When a tensor is wrapped with nn.Parameter and assigned as an attribute of an nn.Module,
# it is automatically included in what model.parameters() and model.named_parameters() yield, as in the
# example in the previous cell. An optimizer (e.g., torch.optim.Adam) built from model.parameters()
# will then update it during training. Therefore, nn.Parameter is how you introduce learnable weights.

# Note: to freeze a parameter, set param.requires_grad = False
# It stays in model.parameters(), but no gradient is computed for it, so it is not updated.

True
True
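
The comments in the cell above can be checked directly. The following is a minimal sketch using a hypothetical `Demo` module (not part of the original notebook) that holds one plain tensor and one `nn.Parameter`, then freezes the parameter.

```python
import torch
import torch.nn as nn

class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: tracks gradients but is NOT registered
        self.plain = torch.randn(3, 3, requires_grad=True)
        # nn.Parameter attribute: registered automatically
        self.registered = nn.Parameter(torch.randn(3, 3))

demo = Demo()

# Only the nn.Parameter shows up in named_parameters()
names = [name for name, _ in demo.named_parameters()]
print(f"Registered parameters: {names}")

# Freezing: the parameter stays listed but is no longer trainable
demo.registered.requires_grad = False
names_after_freeze = [name for name, _ in demo.named_parameters()]
trainable = [p for p in demo.parameters() if p.requires_grad]
print(f"Still listed after freezing: {names_after_freeze}")
print(f"Trainable parameters left: {len(trainable)}")
```

A common pattern that follows from this is to pass only the trainable subset to the optimizer, for example `filter(lambda p: p.requires_grad, model.parameters())`.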


#### Composing Multiple Layers using Sequential API

`nn.Sequential` is a container module that stores other `nn.Module` layers in the order they are passed and executes them in that order in its `forward()` pass.

In [66]:
block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU()
)
print(block)

# we can nest nn.Sequential blocks inside other modules or inside other Sequentials
model = nn.Sequential(
    block,
    nn.MaxPool2d(2)
)
print(model)
print("Number of layers: " + str(len(model)))          # Number of layers

# by default, layers are named by their integer index
# print(model[1])

# for name, module in model.named_children():
#     print(f"Layer Name: {name}, Module: {module}")

Sequential(
  (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
)
Sequential(
  (0): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
Number of layers: 2
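
To see the sequential execution in action, the sketch below rebuilds the same nested model and pushes a random batch through it. The input size (2 images of 3×32×32) is an assumption chosen for illustration; `eval()` is used only so that `BatchNorm2d` relies on its running statistics and the two forward passes are directly comparable.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU()
)
model = nn.Sequential(block, nn.MaxPool2d(2))
model.eval()  # BatchNorm uses running statistics instead of batch statistics

x = torch.randn(2, 3, 32, 32)  # assumed input: batch of 2 RGB 32x32 images

with torch.no_grad():
    y = model(x)
    # Calling the children one by one, in order, gives the same result
    y_manual = model[1](model[0](x))

# padding=1 keeps 32x32 through the conv; MaxPool2d(2) halves it to 16x16
print(f"Output shape: {y.shape}")
print(f"Matches manual chaining: {torch.allclose(y, y_manual)}")

# Integer indexing also works through nested Sequentials
first_conv = model[0][0]
print(f"model[0][0] is a {type(first_conv).__name__}")
```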


We can also give the layers of a `Sequential` container explicit names by passing an `OrderedDict`:

In [59]:
model = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(3, 8, 3, padding=1)),
    ("relu", nn.ReLU()),
    ("pool", nn.MaxPool2d(2))
]))

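print(f"Accessing the whole named model: \n{model}")
print(f"\nAccessing a single model layer: \n{model.conv}\n")

# iterate over the direct children together with their names
for name, module in model.named_children():
    print(f"Layer Name: {name}, Module: {module}")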

Accessing the whole named model: 
Sequential(
  (conv): Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Accessing a single model layer: 
Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

Layer Name: conv, Module: Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
Layer Name: relu, Module: ReLU()
Layer Name: pool, Module: MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
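
One practical consequence of naming layers is that the names become prefixes of the keys in `state_dict()`, which makes checkpoints easier to read. The sketch below rebuilds the same named model to illustrate this; it also shows that integer indexing still works alongside attribute access.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

model = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(3, 8, 3, padding=1)),
    ("relu", nn.ReLU()),
    ("pool", nn.MaxPool2d(2))
]))

# Only the conv layer has parameters; ReLU and MaxPool2d contribute no keys
state_keys = list(model.state_dict().keys())
print(f"state_dict keys: {state_keys}")

# Attribute access and integer indexing return the same module object
same_object = model[0] is model.conv
print(f"model[0] is model.conv: {same_object}")

# Names of the direct children, in insertion order
child_names = [name for name, _ in model.named_children()]
print(f"Child names: {child_names}")
```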


#### Custom Modules (Decomposing Models into Blocks)