<a href="https://colab.research.google.com/github/Redcoder815/Deep_Learning_PyTorch/blob/main/10ParameterInitialization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch
from torch import nn

In [2]:
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2, 4))
net(X).shape

torch.Size([2, 1])
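The init functions that follow test type(module) == nn.Linear, although net was built from nn.LazyLinear. Here is a minimal sketch of why that works, assuming current PyTorch behavior for lazy modules: the weight has no shape until the first forward pass, after which the layer becomes a regular nn.Linear. The variable names are illustrative.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))

# Before any forward pass the input size is unknown,
# so the weight is only a placeholder without a shape.
lazy_before = isinstance(net[0].weight, UninitializedParameter)

# The first forward pass infers in_features from the input (4 here).
net(torch.rand(2, 4))

became_linear = type(net[0]) is nn.Linear
weight_shape = tuple(net[0].weight.shape)
print(lazy_before, became_linear, weight_shape)
```

Calling net.apply(init_normal) before the first forward pass would skip both layers, since their type would still be nn.LazyLinear at that point.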

Normal distribution

In [8]:
def init_normal(module):
    if type(module) == nn.Linear:
        nn.init.normal_(module.weight, mean=0, std=0.01)
        nn.init.zeros_(module.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]

(tensor([ 0.0219, -0.0026, -0.0062,  0.0102]), tensor(0.))

Constant

In [9]:
def init_constant(module):
    if type(module) == nn.Linear:
        nn.init.constant_(module.weight, 1)
        nn.init.zeros_(module.bias)

net.apply(init_constant)
net[0].weight.data[0], net[0].bias.data[0]

(tensor([1., 1., 1., 1.]), tensor(0.))

Xavier initializer

In [10]:
def init_xavier(module):
    if type(module) == nn.Linear:
        nn.init.xavier_uniform_(module.weight)

def init_42(module):
    if type(module) == nn.Linear:
        nn.init.constant_(module.weight, 42)

net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)

tensor([ 0.5111, -0.0740,  0.5338, -0.4167])
tensor([[42., 42., 42., 42., 42., 42., 42., 42.]])
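As a rough check on the first printed row, the bound used by Xavier (Glorot) uniform initialization can be computed by hand. With the default gain of 1, nn.init.xavier_uniform_ samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The helper name xavier_uniform_bound is made up for this sketch.

```python
import math

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound a of the U(-a, a) distribution used by Xavier uniform init."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

# net[0] maps 4 inputs to 8 outputs, so its weight has shape (8, 4).
bound = xavier_uniform_bound(fan_in=4, fan_out=8)

# First row of net[0].weight as printed in the cell output above.
printed_row = [0.5111, -0.0740, 0.5338, -0.4167]
all_within_bound = all(abs(w) <= bound for w in printed_row)
print(round(bound, 4), all_within_bound)
```

For this layer the bound is about 0.707, and all four printed weights fall inside it.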


Custom initialization

def my_init(module): This defines the custom initialization function.

if type(module) == nn.Linear: As in the earlier initialization functions, this check restricts the logic to nn.Linear layers.

print("Init", *[(name, param.shape) for name, param in module.named_parameters()][0]): This prints the name and shape of the first parameter of the linear layer being initialized. For nn.Linear the first parameter is the weight, so the output shows Init weight torch.Size([8, 4]) and Init weight torch.Size([1, 8]) for the two linear layers.

nn.init.uniform_(module.weight, -10, 10): This initializes the weights of the linear layer with values drawn uniformly from the range [-10, 10].

module.weight.data *= module.weight.data.abs() >= 5: This is the distinctive step of this custom initialization. It conditionally zeroes weights:

module.weight.data.abs(): Computes the absolute value of each weight.
module.weight.data.abs() >= 5: Creates a boolean mask, a tensor of True and False values, where True marks a weight whose absolute value is at least 5 and False marks one whose absolute value is below 5.
*= (boolean mask): Multiplies the weights in place by the mask. In this multiplication PyTorch treats True as 1 and False as 0, so every weight with absolute value below 5 becomes 0 and every weight with absolute value of at least 5 keeps its value.

net.apply(my_init): This calls my_init on every submodule of net and on net itself. Only the nn.Linear layers pass the type check, so only they are re-initialized.

net[0].weight[:2]: Finally, this displays the first two rows of the weight matrix of the first linear layer (net[0]) after my_init has been applied. In the output, some entries are zero because the value drawn from the uniform distribution had magnitude below 5, while others (e.g., -6.3531, -9.4595) are unchanged because their magnitude was at least 5. Entries shown as -0.0000 are negative draws multiplied by 0, which yields a negative zero in floating point.
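The masking step can be sketched on a small fixed tensor, which makes the effect easier to verify than with random draws. The values in w are hypothetical and chosen only for illustration.

```python
import torch

# Fixed example weights (hypothetical values, not taken from the notebook run).
w = torch.tensor([[-6.0, 2.5, -4.9, 9.0],
                  [ 5.0, 0.1, -7.5, 3.0]])

mask = w.abs() >= 5      # boolean tensor: True where |w| >= 5
masked = w * mask        # bool is promoted to float: True -> 1.0, False -> 0.0

# Fraction of weights that survive the mask.
kept_fraction = mask.float().mean().item()
print(masked)
print(kept_fraction)
```

Here four of the eight entries survive. For weights drawn uniformly from [-10, 10], about half are expected to have magnitude of at least 5, so my_init zeroes roughly half of each weight matrix on average.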

In [11]:
def my_init(module):
    if type(module) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in module.named_parameters()][0])
        nn.init.uniform_(module.weight, -10, 10)
        module.weight.data *= module.weight.data.abs() >= 5

net.apply(my_init)
net[0].weight[:2]

Init weight torch.Size([8, 4])
Init weight torch.Size([1, 8])


tensor([[-6.3531, -0.0000, -0.0000, -9.4595],
        [ 7.6521,  0.0000, -9.0553,  7.1844]], grad_fn=<SliceBackward0>)

In [12]:
net[0].weight.data[:] += 1
net[0].weight.data[0, 0] = 42
net[0].weight.data[0]

tensor([42.0000,  1.0000,  1.0000, -8.4595])
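The cell above edits the weights through .data. Below is a sketch of the same kind of edit under torch.no_grad(), which is often suggested as an alternative to .data. The layer here is a hypothetical stand-in for net[0], filled with zeros first so that the result is deterministic.

```python
import torch
from torch import nn

layer = nn.Linear(4, 8)   # hypothetical stand-in for net[0]

# In-place edits on a parameter that requires grad must not be tracked by autograd.
with torch.no_grad():
    layer.weight.fill_(0.0)    # deterministic starting point for this sketch
    layer.weight.add_(1)       # same effect as: weight.data[:] += 1
    layer.weight[0, 0] = 42    # overwrite a single entry

first_row = layer.weight[0].tolist()
print(first_row)
```

The parameter still has requires_grad=True afterwards, so training can proceed from the edited values.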