In [1]:
import torch
from torch import nn

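By default, PyTorch initializes weights and biases by sampling from a uniform distribution whose range is computed from the layer's dimensions. For `nn.Linear`, the documented range is U(-1/sqrt(in_features), 1/sqrt(in_features)).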
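As a quick sanity check of the default range, the sketch below builds a fresh `nn.Linear` and compares its largest weight and bias magnitudes against 1/sqrt(in_features). It assumes the default documented for `nn.Linear`; the layer sizes and the names `bound`, `max_weight` and `max_bias` are illustrative and do not come from the notebook above.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)  # make the random draw repeatable

in_features, out_features = 4, 8  # illustrative sizes
layer = nn.Linear(in_features, out_features)

# Assumed default for nn.Linear: U(-bound, bound) with bound = 1/sqrt(in_features)
bound = 1 / math.sqrt(in_features)

max_weight = layer.weight.abs().max().item()
max_bias = layer.bias.abs().max().item()
print(f"bound={bound:.4f}, max |weight|={max_weight:.4f}, max |bias|={max_bias:.4f}")
```

Both printed maxima should stay at or below the bound. Other layer types may use different defaults.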

In [3]:
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2, 4))
net(X).shape



torch.Size([2, 1])

### What are the built-in initializations for PyTorch?
All built-in initializations live in **torch.nn.init**. Each function modifies the given tensor in place (hence the trailing underscore). Examples are:
* normal_(tensor, mean=0.0, std=1.0) fills the tensor with samples from a normal distribution.
* zeros_(tensor) fills the tensor with zeros.
* constant_(tensor, val) fills the tensor with the constant `val`.
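The three initializers listed above can also be tried on a plain tensor, without any model. The following is a minimal sketch; the tensor shape and the variable names are illustrative choices.

```python
import torch
from torch import nn

torch.manual_seed(0)

# An uninitialized tensor standing in for a layer's weight (illustrative shape)
w = torch.empty(1000, 50)

# Fill in place with samples from N(0, 0.01^2)
nn.init.normal_(w, mean=0.0, std=0.01)
normal_std = w.std().item()

# Overwrite in place with a constant value
nn.init.constant_(w, 42.0)
all_42 = bool((w == 42).all())

# Overwrite in place with zeros
nn.init.zeros_(w)
all_zero = bool((w == 0).all())

print(normal_std, all_42, all_zero)
```

Because every call writes into the same tensor, only the last initializer's values remain in `w` at the end.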

### How do I apply these to a given NN sequence or model?

Use the `apply` method that every `nn.Module` provides, i.e. `net.apply(initialization_function)`. It calls the function on every submodule of `net` and then on `net` itself.
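To see which modules `apply` actually visits, the sketch below records the class name of each module passed to the callback. The small network and the `record` helper are hypothetical and exist only for illustration.

```python
from torch import nn

# Illustrative network built from eager (non-lazy) layers
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

visited = []

def record(module):
    # Called once per module: children first, then the container itself
    visited.append(type(module).__name__)

returned = net.apply(record)
print(visited)
```

The callback also receives the `ReLU` layer and the `Sequential` container, which is why initialization functions usually check the module type before touching `weight` or `bias`. `apply` returns the module it was called on, so calls can be chained.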

In [7]:
def init_normal(module):
    if type(module) == nn.Linear:
        nn.init.normal_(module.weight, mean=0, std=0.01)
        nn.init.zeros_(module.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]

(tensor([ 0.0170, -0.0032,  0.0056,  0.0070]), tensor(0.))

The initialization does not need to be the same for all layers. Instead, different initializers can be applied to individual layers like so:

In [8]:
def init_xavier(module):
    if type(module) == nn.Linear:
        nn.init.xavier_uniform_(module.weight)

def init_42(module):
    if type(module) == nn.Linear:
        nn.init.constant_(module.weight, 42)

net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)

tensor([-0.1064,  0.5866, -0.3540,  0.2383])
tensor([[42., 42., 42., 42., 42., 42., 42., 42.]])
