# Parameter Initialization
We compare three ways to initialize the weight and bias parameters of a neural network: zeros, random values, and He initialization. A well-chosen initialization can speed
up gradient descent and increase the odds that it converges to a lower training error.

### Three Kinds of Initialization Patterns
- Zeros
- Random
- He

### Zeros Initialization
Two kinds of parameters need to be initialized for each layer:
- Weight matrices
- Bias vectors

In [3]:
import numpy as np

In [24]:
def initialize_parameters_zeros(layers_dims):
    """
    Initializes weights and biases to zeros for each layer in the network
    
    Arguments:
    layers_dims -- a python list/array containing the size of each layer
    
    Returns:
    parameters -- a python dictionary containing parameters "W1", "b1", ... "WL", "bL"
                  
                  "W1" -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  "b1" -- bias vector of shape (layers_dims[1], 1)
                  "WL" -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  "bL" -- bias vector of shape (layers_dims[L], 1)
    """
    
    parameters = {}
    L = len(layers_dims)
    print("L: " + str(L))
    
    for l in range(1, L):
        parameters["W" + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    
    return parameters

In [30]:
parameters = initialize_parameters_zeros([3,2,1])
print(parameters["W1"].shape)
print(parameters["b1"].shape)
print(parameters["W2"].shape)
print(parameters["b2"].shape)

L: 3
(2, 3)
(2, 1)
(1, 2)
(1, 1)
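
To see what all-zero parameters do in practice, the sketch below runs a forward pass through a small network whose weights and biases are all zeros. The ReLU hidden activation, the sigmoid output, and the helper name `forward_with_zeros` are assumptions made for illustration; they are not part of the code above.

```python
import numpy as np

def forward_with_zeros(X, layers_dims):
    """Forward pass through a network whose parameters are all zeros.

    Hypothetical helper: assumes ReLU hidden layers and a sigmoid output.
    X has shape (layers_dims[0], number of examples).
    """
    A = X
    hidden = []                      # activations of each hidden layer
    n_layers = len(layers_dims) - 1  # number of layers with parameters
    for l in range(1, n_layers + 1):
        W = np.zeros((layers_dims[l], layers_dims[l - 1]))
        b = np.zeros((layers_dims[l], 1))
        Z = W @ A + b
        if l < n_layers:
            A = np.maximum(0, Z)     # ReLU (assumed hidden activation)
            hidden.append(A)
        else:
            A = 1.0 / (1.0 + np.exp(-Z))  # sigmoid (assumed output activation)
    return hidden, A

X = np.array([[1.0, -2.0, 0.5, 3.0],
              [0.3, 4.0, -1.0, 2.0],
              [-5.0, 0.1, 2.0, 1.0]])  # 3 features, 4 examples
hidden, output = forward_with_zeros(X, [3, 2, 1])
print(hidden[0])  # hidden-layer activations
print(output)     # network predictions
```

Under these assumptions every hidden unit produces the same activations and the prediction is 0.5 for every example, whatever the input. The units in a layer are interchangeable, which is the symmetry that random initialization is meant to break.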


### Random Initialization
Here we initialize the weight matrices to random numbers to break symmetry, so that each neuron can learn a different function of its inputs. The biases can stay at zero.

In [8]:
def initialize_parameters_random(layers_dims):
    """
    Initializes weights to random numbers and biases to zeros for each layer in the network
    
    Arguments:
    layers_dims -- a python list/array containing the size of each layer
    
    Returns:
    parameters -- a python dictionary containing parameters "W1", "b1", ... "WL", "bL"
                  
                  "W1" -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  "b1" -- bias vector of shape (layers_dims[1], 1)
                  "WL" -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  "bL" -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)
    
    for l in range(1,L):
        parameters["W"+str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1])*100
        parameters["b"+str(l)] = np.zeros((layers_dims[l], 1))
    
    return parameters

In [10]:
parameters = initialize_parameters_random([4,3,2,1])
print(parameters)

{'W1': array([[  39.44652651,  113.81267306, -197.69783122,  134.52115664],
       [ -48.5511674 , -146.43350257,   27.55507781,  -40.72846812],
       [ -54.32649667, -236.51869032,  -59.78637922,  -85.53514902]]), 'b1': array([[ 0.],
       [ 0.],
       [ 0.]]), 'W2': array([[-73.90643561,  68.636284  , -27.28415422],
       [-40.38758833,  42.63536627, -96.9557333 ]]), 'b2': array([[ 0.],
       [ 0.]]), 'W3': array([[-48.96254441, -81.02019568]]), 'b3': array([[ 0.]])}
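
The weights above are scaled by 100, so their magnitudes are in the tens to hundreds. As a rough check of what that scale does, the sketch below pushes the same random weights through a single sigmoid layer at three scales and reports the average sigmoid slope `A * (1 - A)`. The sigmoid activation, the layer sizes, the scales 0.01 and 1.0, and the use of `np.random.default_rng` are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    # tanh form of the sigmoid; avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 1000))  # 4 features, 1000 examples (assumed sizes)
W = rng.standard_normal((3, 4))     # base weights, rescaled below

results = {}
for scale in (0.01, 1.0, 100.0):
    Z = (W * scale) @ X             # pre-activations grow with the scale
    A = sigmoid(Z)
    # The sigmoid's slope is A * (1 - A); it peaks at 0.25 when Z = 0
    results[scale] = np.mean(A * (1.0 - A))
    print("scale", scale, "mean sigmoid slope", results[scale])
```

The average slope shrinks as the scale grows, because large pre-activations push the sigmoid into its flat regions. Small slopes mean small gradients, which is one reason very large initial weights can slow learning.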


### He Initialization
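He initialization is similar to Xavier initialization: both multiply random weights by a scaling factor that depends on the size of the previous layer. A common form of Xavier initialization uses $\sqrt{\frac{1}{\text{dimension of the previous layer}}}$, whereas He initialization uses
$\sqrt{\frac{2}{\text{dimension of the previous layer}}}$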

In [33]:
def initialize_parameters_he(layers_dims):
    """
    Initializes weights to random numbers scaled by sqrt(2 / size of the previous layer) (He initialization)
    and biases to zeros for each layer in the network
    
    Arguments:
    layers_dims -- a python list/array containing the size of each layer
    
    Returns:
    parameters -- a python dictionary containing parameters "W1", "b1", ... "WL", "bL"
                  
                  "W1" -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  "b1" -- bias vector of shape (layers_dims[1], 1)
                  "WL" -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  "bL" -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)
    
    for l in range(1,L):
        # He scaling: standard normal draws times sqrt(2 / size of the previous layer)
        parameters["W"+str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2 / layers_dims[l-1])
        parameters["b"+str(l)] = np.zeros((layers_dims[l], 1))
    
    return parameters

In [34]:
parameters = initialize_parameters_he([3,2,1])
print(parameters)

{'W1': array([[ 0.96853535, -1.13612556,  0.10729257],
       [ 0.57502473,  0.24825157,  0.17108535]]), 'b1': array([[ 0.],
       [ 0.]]), 'W2': array([[ 0.59034135, -1.23267981]]), 'b2': array([[ 0.]])}
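
With only a handful of weights it is hard to see the scaling factor in the printed values. The sketch below draws a much larger weight matrix and compares its empirical standard deviation with the target for He scaling, alongside the $\sqrt{1/n}$ Xavier variant for contrast. The layer sizes and the use of `np.random.default_rng` are assumptions for illustration; the function above uses `np.random.randn` instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_curr = 400, 500  # hypothetical layer sizes, large enough for stable statistics

# Standard normal draws, rescaled by the factor each scheme prescribes
W_he = rng.standard_normal((n_curr, n_prev)) * np.sqrt(2.0 / n_prev)
W_xavier = rng.standard_normal((n_curr, n_prev)) * np.sqrt(1.0 / n_prev)

he_target = np.sqrt(2.0 / n_prev)
xavier_target = np.sqrt(1.0 / n_prev)

print("He:     empirical std", W_he.std(), "target", he_target)
print("Xavier: empirical std", W_xavier.std(), "target", xavier_target)
```

The empirical standard deviations should land close to their targets, with the He weights larger than the Xavier weights by a factor of about $\sqrt{2}$.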
