# Parameter Initialization
The deep learning framework provides default random initializations for its layers. However, we often want to initialize our weights according to other protocols. The framework provides the most commonly used protocols and also lets us create custom initializers.

##### Pytorch
By default, PyTorch initializes weight and bias matrices uniformly by drawing from a range that is computed according to the input dimension. PyTorch's `nn.init` module provides a variety of preset initialization methods.

In [50]:
import torch
from torch import nn
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2, 4))
net(X).shape



torch.Size([2, 1])
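
Two details of this network are worth checking before applying any initializer. The sketch below, which assumes a recent PyTorch release that ships `nn.LazyLinear`, suggests that a lazy layer turns into a plain `nn.Linear` once the first forward pass has inferred its input size, and that the default weights stay within `1 / sqrt(fan_in)` in magnitude. The names `lazy_net` and `bound` are illustrative only.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)  # make the random draws repeatable

lazy_net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
type_before = type(lazy_net[0])   # nn.LazyLinear: input size still unknown

lazy_net(torch.rand(2, 4))        # first forward pass infers in_features=4
type_after = type(lazy_net[0])    # the module has been converted to nn.Linear

# Default nn.Linear weights are uniform within +/- 1/sqrt(fan_in).
fan_in = lazy_net[0].in_features
bound = 1 / math.sqrt(fan_in)
max_abs_weight = lazy_net[0].weight.abs().max().item()

print(type_before.__name__, "->", type_after.__name__)
print(f"bound={bound:.4f}, max |w|={max_abs_weight:.4f}")
```

A practical consequence is that any type check written after the first forward pass should test for `nn.Linear`, not `nn.LazyLinear`.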

##### Tensorflow
By default, Keras initializes weight matrices uniformly by drawing from a range that is computed according to the input and output dimension, and the bias parameters are all set to zero. TensorFlow provides a variety of initialization methods both in the root module and the keras.initializers module.

In [12]:
import tensorflow as tf
from keras import models, layers
net_tf = models.Sequential([
    layers.Flatten(),
    layers.Dense(4, activation=tf.nn.relu),
    layers.Dense(1)
])
Xtf = tf.random.uniform((2, 4))
net_tf(Xtf).shape

TensorShape([2, 1])
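
To make "a range computed according to the input and output dimension" concrete: the default kernel initializer of a Keras `Dense` layer is Glorot (Xavier) uniform, which draws from `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. The following is a minimal plain-Python sketch of that rule, not Keras's own code; the helper name `glorot_uniform_limit` is made up for illustration.

```python
import math
import random


def glorot_uniform_limit(fan_in, fan_out):
    """Half-width of the Glorot (Xavier) uniform range."""
    return math.sqrt(6.0 / (fan_in + fan_out))


# The first Dense layer above maps 4 inputs to 4 units.
limit = glorot_uniform_limit(fan_in=4, fan_out=4)

# Draw a 4x4 kernel from U(-limit, limit) using only the standard library.
rng = random.Random(0)
kernel = [[rng.uniform(-limit, limit) for _ in range(4)] for _ in range(4)]

print(f"limit = {limit:.4f}")  # limit = 0.8660
```

Wider layers get a smaller limit, which keeps the scale of the activations roughly stable as the layer size grows.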

## Built-in Initialization
Let’s begin by calling on built-in initializers. The code below initializes all weight parameters as Gaussian random variables with standard deviation 0.01, while bias parameters are set to zero.

##### Pytorch

In [13]:
def init_normal(module):
    # After the first forward pass, each LazyLinear has become a plain nn.Linear
    if type(module) == nn.Linear:
        nn.init.normal_(module.weight, mean=0, std=0.01)
        nn.init.zeros_(module.bias)
net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]  # small Gaussian weights, zero bias

##### Tensorflow

In [42]:
net_tf2 = models.Sequential([
    layers.Flatten(),
    layers.Dense(4, activation=tf.nn.relu,
                 kernel_initializer=tf.random_normal_initializer(mean=0, stddev=0.01),
                 bias_initializer=tf.zeros_initializer()),
    layers.Dense(1)])
net_tf2(Xtf)

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[0.00473327],
       [0.00602803]], dtype=float32)>

In [43]:
net_tf2.weights[0], net_tf2.weights[1]

(<tf.Variable 'dense_48/kernel:0' shape=(4, 4) dtype=float32, numpy=
 array([[ 0.01039849, -0.01009278, -0.00451558,  0.00464403],
        [-0.01296778, -0.00861183,  0.01227336, -0.00187183],
        [ 0.00573951,  0.0006217 , -0.01865394, -0.00032949],
        [ 0.00942527, -0.01458713, -0.00321218,  0.004323  ]],
       dtype=float32)>,
 <tf.Variable 'dense_48/bias:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>)
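
With only a handful of weights it is hard to tell by eye whether the requested standard deviation was actually applied. One way to check, sketched here in PyTorch under the assumption of an arbitrarily chosen 256x256 layer (the name `layer` and the size are not from the original example), is to compare sample statistics against the target values.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random draws repeatable

# A wide layer so that the sample statistics are stable.
layer = nn.Linear(256, 256)
nn.init.normal_(layer.weight, mean=0.0, std=0.01)
nn.init.zeros_(layer.bias)

weight_mean = layer.weight.mean().item()
weight_std = layer.weight.std().item()
bias_abs_max = layer.bias.abs().max().item()

print(f"mean={weight_mean:.5f}, std={weight_std:.5f}, max |bias|={bias_abs_max}")
```

The sample mean should be close to 0 and the sample standard deviation close to 0.01, while the bias is exactly zero.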

We can also initialize all the parameters to a given constant value (say, 1).
##### Tensorflow

In [44]:
net_tf = models.Sequential([
    layers.Flatten(),
    layers.Dense(
        4, activation=tf.nn.relu,
        kernel_initializer=tf.keras.initializers.Constant(1),
        bias_initializer=tf.zeros_initializer()),
    layers.Dense(1),
])

net_tf(Xtf)
net_tf.weights[0], net_tf.weights[1]

(<tf.Variable 'dense_50/kernel:0' shape=(4, 4) dtype=float32, numpy=
 array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=float32)>,
 <tf.Variable 'dense_50/bias:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>)

##### Pytorch

In [46]:
def init_constant(module):
    # After the first forward pass, each LazyLinear has become a plain nn.Linear
    if type(module) == nn.Linear:
        nn.init.constant_(module.weight, 1)
        nn.init.zeros_(module.bias)
net.apply(init_constant)
net[0].weight.data[0], net[0].bias.data[0]

(tensor([1., 1., 1., 1.]), tensor(0.))

We can also apply different initializers for certain blocks. For example, below we initialize the first layer with the Xavier initializer and initialize the second layer to a constant value of 42.
##### Pytorch

In [51]:
def init_xavier(module):
    # After the first forward pass, each LazyLinear has become a plain nn.Linear
    if type(module) == nn.Linear:
        nn.init.xavier_uniform_(module.weight)
def init_42(module):
    if type(module) == nn.Linear:
        nn.init.constant_(module.weight, 42)
net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)

tensor([0.4895, 0.2771, 0.3648, 0.0370])
tensor([[42., 42., 42., 42., 42., 42., 42., 42.]])


##### Tensorflow

In [49]:
net_tf = models.Sequential([
    layers.Flatten(),
    layers.Dense(
        4,
        activation=tf.nn.relu,
        kernel_initializer=tf.keras.initializers.GlorotUniform()),
    tf.keras.layers.Dense(
        1, kernel_initializer=tf.keras.initializers.Constant(42)),
])

net_tf(Xtf)
print(net_tf.layers[1].weights[0])
print(net_tf.layers[2].weights[0])

<tf.Variable 'dense_54/kernel:0' shape=(4, 4) dtype=float32, numpy=
array([[ 0.07217056, -0.01334167,  0.3533346 , -0.59766597],
       [ 0.6870113 , -0.7438208 , -0.76574075, -0.5723773 ],
       [-0.4310332 , -0.4496198 ,  0.4885251 ,  0.3261791 ],
       [ 0.22152168, -0.48189497,  0.5968824 , -0.64959604]],
      dtype=float32)>
<tf.Variable 'dense_55/kernel:0' shape=(4, 1) dtype=float32, numpy=
array([[42.],
       [42.],
       [42.],
       [42.]], dtype=float32)>
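
When the layer shapes are known up front, the same per-block scheme can also be written without `apply`, by indexing into the container and calling the initializers directly. This is a minimal PyTorch sketch under that assumption; `block_net` and `xavier_bound` are illustrative names. It also checks that the Xavier-initialized weights respect the bound `sqrt(6 / (fan_in + fan_out))`.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)  # make the random draws repeatable

block_net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Initialize each block separately by indexing into the Sequential container.
nn.init.xavier_uniform_(block_net[0].weight)
nn.init.constant_(block_net[2].weight, 42.0)

# Xavier uniform draws from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).
xavier_bound = math.sqrt(6 / (4 + 8))
first_max = block_net[0].weight.abs().max().item()
second_all_42 = bool((block_net[2].weight == 42).all())

print(f"max |w| in block 0: {first_max:.4f} (bound {xavier_bound:.4f})")
print("block 2 all 42:", second_all_42)
```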


## Custom Initialization
Sometimes, the initialization methods we need are not provided by the deep learning framework.
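
The custom initializers in this section draw each weight from $U(-10, 10)$ and then zero out every entry whose magnitude is below 5. Assuming the draws are independent, half of that interval has magnitude below 5, so the resulting distribution of a single weight $w$ can be written as:

$$
w \sim
\begin{cases}
U(5, 10) & \text{with probability } \tfrac{1}{4} \\
0 & \text{with probability } \tfrac{1}{2} \\
U(-10, -5) & \text{with probability } \tfrac{1}{4}
\end{cases}
$$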

##### Pytorch

In [53]:
def my_init(module):
    # After the first forward pass, each LazyLinear has become a plain nn.Linear
    if type(module) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in module.named_parameters()][0])
        nn.init.uniform_(module.weight, -10, 10)
        # Zero out every weight whose magnitude is below 5
        module.weight.data *= module.weight.data.abs() >= 5

net.apply(my_init)
net[0].weight[:2]  # entries are either 0 or have magnitude of at least 5

Init weight torch.Size([8, 4])
Init weight torch.Size([1, 8])

##### Tensorflow

In [54]:
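class MyInit(tf.keras.initializers.Initializer):
    def __call__(self, shape, dtype=None):
        # Draw from U(-10, 10), then zero out entries with magnitude below 5
        data = tf.random.uniform(shape, -10, 10, dtype=dtype)
        # Cast the mask to the dtype of the data so the product is well defined
        factor = tf.cast(tf.abs(data) >= 5, data.dtype)
        return data * factor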
net_tf = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(
        4,
        activation=tf.nn.relu,
        kernel_initializer=MyInit()),
    tf.keras.layers.Dense(1),
])

net_tf(Xtf)
print(net_tf.layers[1].weights[0])

<tf.Variable 'dense_56/kernel:0' shape=(4, 4) dtype=float32, numpy=
array([[ 6.2356224,  0.       , -6.2199545, -0.       ],
       [ 6.7111206,  0.       ,  0.       ,  0.       ],
       [-6.407988 ,  0.       ,  9.199654 ,  9.038624 ],
       [-0.       , -9.799301 ,  0.       ,  0.       ]], dtype=float32)>
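
A 4x4 kernel is too small to judge how often the mask zeroes a weight. As a rough empirical check of the rule "draw from U(-10, 10), zero out magnitudes below 5", the framework-free sketch below samples many values with the standard library; the function name `sample_custom_weight` and the sample size are arbitrary choices for illustration.

```python
import random


def sample_custom_weight(rng):
    """Draw from U(-10, 10), then zero the value if its magnitude is below 5."""
    w = rng.uniform(-10.0, 10.0)
    return w if abs(w) >= 5 else 0.0


rng = random.Random(0)
samples = [sample_custom_weight(rng) for _ in range(100_000)]

zero_fraction = sum(w == 0.0 for w in samples) / len(samples)
positive_fraction = sum(w > 0 for w in samples) / len(samples)
negative_fraction = sum(w < 0 for w in samples) / len(samples)

print(f"zero: {zero_fraction:.3f}, positive: {positive_fraction:.3f}, "
      f"negative: {negative_fraction:.3f}")
```

Roughly half of the samples should be zero, with the remainder split about evenly between the positive and negative ranges.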


## Summary
We can initialize parameters using built-in and custom initializers.