# Parameter Initialization

Most deep learning frameworks provide default random initializations for their layers. However, we often want to initialize our weights according to other protocols.

In [2]:
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2, 4))
net(X).shape

torch.Size([2, 1])
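
The initializers in the rest of this section pick out layers by testing `type(module) == nn.Linear`, so it is worth checking that a lazy layer really does become an ordinary `nn.Linear` once its input size has been inferred. The following is a minimal sketch of that check; `demo_net` is a name introduced here for illustration and simply mirrors the network defined above.

```python
import torch
from torch import nn

# `demo_net` is a hypothetical name; it mirrors the network defined above.
demo_net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))

# Before any forward pass, the input dimension of the first layer is unknown.
is_lazy_before = isinstance(demo_net[0], nn.LazyLinear)

# The first forward pass infers the shapes and materializes the parameters.
demo_net(torch.rand(size=(2, 4)))

# After that pass the layer's class is plain nn.Linear.
is_linear_after = type(demo_net[0]) == nn.Linear
shapes = {name: tuple(p.shape) for name, p in demo_net.named_parameters()}
print(is_lazy_before, is_linear_after)
print(shapes)
```

The printed shapes belong to parameters that already hold the framework's default random values; the functions below overwrite them in place.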

## Built-in Initialization

The code below initializes all weight parameters as Gaussian random variables with mean 0 and standard deviation 0.01, while bias parameters are set to zero.

In [3]:
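def init_normal(module):
    # Only touch fully connected layers; other modules (e.g., ReLU) are skipped
    if type(module) == nn.Linear:
        # The trailing underscore marks an in-place initializer
        nn.init.normal_(module.weight, mean=0, std=0.01)
        nn.init.zeros_(module.bias)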

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]

(tensor([-0.0086, -0.0004, -0.0026, -0.0049]), tensor(0.))
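
Four weights are too few to judge the distribution by eye. As a rough sanity check, the sketch below applies the same initializer to a much larger layer and measures the empirical mean and standard deviation. The layer `wide` and its size of 256 are assumptions made only for this illustration.

```python
import torch
from torch import nn

def init_normal(module):
    # Same initializer as above, repeated so the sketch is self-contained.
    if type(module) == nn.Linear:
        nn.init.normal_(module.weight, mean=0, std=0.01)
        nn.init.zeros_(module.bias)

# `wide` is a hypothetical layer, chosen large so the statistics are stable.
wide = nn.Linear(256, 256)
wide.apply(init_normal)

weight_mean = wide.weight.data.mean().item()
weight_std = wide.weight.data.std().item()
bias_is_zero = bool((wide.bias.data == 0).all())
print(f"mean={weight_mean:.5f}, std={weight_std:.5f}, bias zero: {bias_is_zero}")
```

With 65,536 weights, the measured standard deviation should land very close to 0.01 and the mean very close to 0, although the exact digits vary from run to run.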

We can also initialize all the weights to a given constant value (say, 1), again setting the biases to zero.

In [4]:
def init_constant(module):
    if type(module) == nn.Linear:
        nn.init.constant_(module.weight, 1)
        nn.init.zeros_(module.bias)

net.apply(init_constant)
net[0].weight.data[0], net[0].bias.data[0]

(tensor([1., 1., 1., 1.]), tensor(0.))
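
Both initializers above rely on `net.apply`, which calls the given function on every submodule and finally on the container itself. That is why each function filters by layer type before touching any parameters. The sketch below records the modules that `apply` visits; `demo` and `visited` are names introduced here for illustration.

```python
from torch import nn

# `demo` is a hypothetical network with explicit (non-lazy) layer sizes.
demo = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

visited = []
# Record the class name of every module that `apply` passes to the function.
demo.apply(lambda module: visited.append(type(module).__name__))
print(visited)
```

The list contains the two `Linear` layers, the `ReLU`, and the enclosing `Sequential`, so an initializer without the type check would fail on modules that have no `weight` attribute.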

We can also apply different initializers to different blocks. For example, below we initialize the first layer with the Xavier initializer and the second layer to a constant value of 42.

In [5]:
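def init_xavier(module):
    if type(module) == nn.Linear:
        # Xavier (Glorot) uniform initialization; the bias is left unchanged
        nn.init.xavier_uniform_(module.weight)

def init_42(module):
    if type(module) == nn.Linear:
        # Set every weight to 42; the bias is left unchanged
        nn.init.constant_(module.weight, 42)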

net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)

tensor([0.1858, 0.5911, 0.0423, 0.2628])
tensor([[42., 42., 42., 42., 42., 42., 42., 42.]])
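
To make the Xavier values above less opaque: with the default gain of 1, `nn.init.xavier_uniform_` samples each weight uniformly from the interval (-a, a), where a = sqrt(6 / (fan_in + fan_out)). For the first layer, fan_in = 4 and fan_out = 8, so a = sqrt(0.5), roughly 0.7071. The sketch below checks this bound on a standalone layer; `layer` is a name introduced here for illustration.

```python
import math
import torch
from torch import nn

# `layer` is a hypothetical stand-in for net[0]: 4 inputs, 8 outputs.
layer = nn.Linear(4, 8)
nn.init.xavier_uniform_(layer.weight)

fan_in, fan_out = layer.in_features, layer.out_features
# Half-width of the uniform interval for the default gain of 1.
bound = math.sqrt(6.0 / (fan_in + fan_out))
max_abs = layer.weight.data.abs().max().item()
print(f"bound={bound:.4f}, largest |weight|={max_abs:.4f}")
```

The largest absolute weight should never exceed the bound, which is consistent with the sample row printed above, where every entry is smaller than 0.7071 in magnitude.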
