## Initialization Methods

Traditionally, the weights of a neural network were set to small random numbers.

Weight initialization is a field of study in its own right, because careful initialization of a network can speed up the learning process.

Modern deep learning libraries, such as Keras, offer a host of network initialization methods; most are variations of initializing the weights with small random numbers.

For example, the following initializers are currently available in Keras for all network types:

- **Zeros:** Initializer that generates tensors initialized to 0.
- **Ones:** Initializer that generates tensors initialized to 1.
- **Constant:** Initializer that generates tensors initialized to a constant value.
- **RandomNormal:** Initializer that generates tensors with a normal distribution.
- **RandomUniform:** Initializer that generates tensors with a uniform distribution.
- **TruncatedNormal:** Initializer that generates a truncated normal distribution.
- **VarianceScaling:** Initializer capable of adapting its scale/variance to the shape of weights.
- **Orthogonal:** Initializer that generates a random orthogonal matrix.
- **Identity:** Initializer that generates the identity matrix.
- **glorot_normal:** Glorot normal initializer, also called Xavier normal initializer.
- **glorot_uniform:** Glorot uniform initializer, also called Xavier uniform initializer.
- **he_normal:** He normal initializer.
- **he_uniform:** He uniform variance scaling initializer.
- **lecun_normal:** LeCun normal initializer.
- **lecun_uniform:** LeCun uniform initializer.



See the [documentation](https://keras.io/initializers/) for more details.
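The Glorot, He, and LeCun entries in the list are all special cases of variance scaling: each picks a scale and a fan mode, then samples from a normal or a uniform distribution. The sketch below reproduces that arithmetic in plain Python, assuming scale 1 with `fan_avg` for Glorot, scale 2 with `fan_in` for He, and scale 1 with `fan_in` for LeCun. The names `PRESETS` and `variance_scaling_params` are made up for illustration and are not part of Keras.

```python
import math

# (scale, mode) pairs assumed for each initializer family.
PRESETS = {
    "glorot": (1.0, "fan_avg"),
    "he": (2.0, "fan_in"),
    "lecun": (1.0, "fan_in"),
}


def variance_scaling_params(family, fan_in, fan_out):
    """Return (stddev, uniform_limit) for one initializer family."""
    scale, mode = PRESETS[family]
    n = {
        "fan_in": fan_in,
        "fan_out": fan_out,
        "fan_avg": (fan_in + fan_out) / 2,
    }[mode]
    variance = scale / n
    stddev = math.sqrt(variance)       # target std of the *_normal variant
    limit = math.sqrt(3.0 * variance)  # bound of the *_uniform variant
    return stddev, limit


# Example: a layer with 3 inputs and 4 units.
for family in PRESETS:
    stddev, limit = variance_scaling_params(family, fan_in=3, fan_out=4)
    print(f"{family:>6}: stddev={stddev:.4f}  limit={limit:.4f}")
```

The uniform bound is `sqrt(3 * variance)` because a uniform distribution on `[-limit, limit]` has variance `limit**2 / 3`, so both variants of a family aim for the same spread.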

**Examples: a few important initializers**

![main_image](main_tiny.png)

In [2]:
import tensorflow as tf
from tensorflow import keras

In [7]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

In [9]:
np.set_printoptions(precision=4)

In [65]:
model = keras.Sequential([keras.layers.Dense(4, input_shape=(3,) , activation="relu", kernel_initializer=keras.initializers.glorot_uniform()),
                          keras.layers.Dense(2, activation="sigmoid", kernel_initializer=keras.initializers.he_uniform())])

In [42]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 4)                 16        
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 10        
Total params: 26
Trainable params: 26
Non-trainable params: 0
_________________________________________________________________


In [43]:
model.layers

[<tensorflow.python.keras.layers.core.Dense at 0x1f03fe4a940>,
 <tensorflow.python.keras.layers.core.Dense at 0x1f03fe4a630>]

In [44]:
hidden_layer = model.layers[0]

In [45]:
hidden_layer.get_config()

{'name': 'dense_4',
 'trainable': True,
 'batch_input_shape': (None, 3),
 'dtype': 'float32',
 'units': 4,
 'activation': 'relu',
 'use_bias': True,
 'kernel_initializer': {'class_name': 'GlorotUniform',
  'config': {'seed': None}},
 'bias_initializer': {'class_name': 'Zeros', 'config': {}},
 'kernel_regularizer': None,
 'bias_regularizer': None,
 'activity_regularizer': None,
 'kernel_constraint': None,
 'bias_constraint': None}

In [46]:
hidden_layer_weights, hidden_layer_biases = hidden_layer.get_weights()

In [47]:
hidden_layer_weights.shape

(3, 4)

In [48]:
hidden_layer_weights

array([[ 0.1938, -0.3414,  0.6044,  0.6202],
       [ 0.5674, -0.5828,  0.4806, -0.436 ],
       [-0.6131,  0.6881, -0.3588,  0.8258]], dtype=float32)

In [49]:
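fan_in = 3   # number of inputs to the hidden layer (rows of the kernel)
fan_out = 4  # number of units in the hidden layer (columns of the kernel)
fan_avg = (fan_in + fan_out) / 2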
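The fan values can also be read off the kernel shape instead of being typed by hand. A minimal sketch, assuming a dense kernel is stored as `(inputs, units)` and that convolution kernels keep their channel dimensions last; `compute_fans` is a hypothetical helper written for this example, not a public Keras function:

```python
import math


def compute_fans(shape):
    """Return (fan_in, fan_out) for a kernel of the given shape."""
    if len(shape) == 2:
        # Dense kernel: (inputs, units)
        return shape[0], shape[1]
    # Convolution kernel: (*spatial_dims, in_channels, out_channels)
    receptive_field = math.prod(shape[:-2])
    return shape[-2] * receptive_field, shape[-1] * receptive_field


fan_in, fan_out = compute_fans((3, 4))  # shape of hidden_layer_weights
fan_avg = (fan_in + fan_out) / 2
print(fan_in, fan_out, fan_avg)         # 3 4 3.5
```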

In [50]:
import math
# Standard deviation targeted by Glorot initialization: sqrt(1 / fan_avg)
std_deviation = math.sqrt(1.0/fan_avg)
std_deviation

0.5345224838248488

In [51]:
# Interval of two standard deviations on either side of zero
truncated_range = (-2*std_deviation, 2*std_deviation)
truncated_range

(-1.0690449676496976, 1.0690449676496976)

For the Glorot uniform initializer, weights are drawn from `[-limit, limit]`, where the limit is the square root of 3/fan_avg. Let us compute it:


In [52]:
limit = math.sqrt(3/fan_avg)
limit

0.9258200997725514
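Two quick checks tie these numbers together. A uniform distribution on `[-limit, limit]` has variance `limit**2 / 3`, so its standard deviation should equal `sqrt(1 / fan_avg)`, and every hidden-layer weight printed earlier should fall inside the limit. The sketch copies the printed weights by hand rather than reading them from the model, so it runs without TensorFlow:

```python
import math

fan_avg = (3 + 4) / 2
limit = math.sqrt(3 / fan_avg)

# Standard deviation of a uniform distribution on [-limit, limit].
uniform_std = limit / math.sqrt(3)
print(math.isclose(uniform_std, math.sqrt(1.0 / fan_avg)))  # True

# Hidden-layer weights as printed by hidden_layer.get_weights().
hidden_layer_weights = [
    [0.1938, -0.3414, 0.6044, 0.6202],
    [0.5674, -0.5828, 0.4806, -0.436],
    [-0.6131, 0.6881, -0.3588, 0.8258],
]
largest = max(abs(w) for row in hidden_layer_weights for w in row)
print(largest, largest <= limit)  # 0.8258 True
```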

**Exercise:** Check the `he_uniform` initialization of the output layer.

In [66]:
output_layer = model.layers[1]

In [67]:
output_layer_weights, output_layer_biases = output_layer.get_weights()

In [68]:
output_layer_weights.shape


(4, 2)

In [69]:
output_layer_weights

array([[-0.3992, -0.3198],
       [-1.1526,  0.666 ],
       [-0.6927, -0.5481],
       [-0.2715, -0.3956]], dtype=float32)

**he_uniform**


`keras.initializers.he_uniform(seed=None)`

He uniform variance scaling initializer.

It draws samples from a uniform distribution within `[-limit, limit]`, where `limit` is `sqrt(6 / fan_in)` and `fan_in` is the number of input units in the weight tensor. The output layer receives the 4 hidden units as input, so `fan_in = 4` here.

In [59]:
limit = math.sqrt(6/4)
limit

1.224744871391589
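To finish the exercise, the printed output-layer weights can be compared against this limit. A small sketch, with the weights copied by hand from the output above; the comparison with the Glorot limit is an extra check added here for contrast, not part of the original exercise:

```python
import math

fan_in, fan_out = 4, 2  # the output layer maps 4 hidden units to 2 outputs
he_limit = math.sqrt(6 / fan_in)
glorot_limit = math.sqrt(6 / (fan_in + fan_out))

# Output-layer weights as printed by output_layer.get_weights().
output_layer_weights = [
    [-0.3992, -0.3198],
    [-1.1526, 0.666],
    [-0.6927, -0.5481],
    [-0.2715, -0.3956],
]
largest = max(abs(w) for row in output_layer_weights for w in row)

print(largest <= he_limit)     # True: all weights respect the he_uniform bound
print(largest > glorot_limit)  # True: -1.1526 lies outside the Glorot bound of 1.0
```

The second check suggests the weights could not have come from `glorot_uniform` on this layer, which is consistent with `he_uniform` having been applied.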