# Training Deep Neural Networks

What if you need to tackle a complex problem, such as detecting hundreds of types of objects in high-resolution images? You may need to train a much deeper DNN.

Training a deep DNN isn’t a walk in the park. Here are
some of the problems you could run into:

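- You may be faced with the tricky *vanishing gradients* problem (gradients shrink toward 0 as they are propagated back to the lower layers, so those layers' weights are barely updated) or the related *exploding gradients* problem (gradients grow larger and larger until they overflow to infinity or NaN, and training diverges).
- Training may be extremely slow.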
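A back-of-the-envelope sketch of why depth makes both problems worse. It assumes, as a deliberate simplification, that backpropagation multiplies the gradient by roughly the same factor at every layer (chain rule): a factor below 1 shrinks it geometrically, a factor above 1 blows it up. The factors 0.25 and 2.0 are illustrative; 0.25 happens to be the largest value the logistic sigmoid's derivative can take. The helper `gradient_after` is hypothetical.

```python
def gradient_after(n_layers, factor, initial_grad=1.0):
    """Gradient magnitude reaching the first layer if every layer scales it by `factor`."""
    grad = initial_grad
    for _ in range(n_layers):
        grad *= factor  # chain rule: one multiplicative term per layer
    return grad

for depth in (5, 10, 20):
    print(depth, gradient_after(depth, 0.25), gradient_after(depth, 2.0))
```

With only 10 layers, a per-layer factor of 0.25 already leaves less than one millionth of the original gradient, while a factor of 2 multiplies it by 1024.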

In this chapter we will go through each of these problems and present
techniques to solve them. 

## The Vanishing/Exploding Gradients Problems

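In their 2010 paper, Xavier Glorot and Yoshua Bengio pointed to a main culprit: the combination of the popular logistic sigmoid activation function and
the weight initialization technique that was most popular at the time (i.e.,
a normal distribution with a mean of 0 and a standard deviation of 1). With this setup, the variance of each layer's outputs is much greater than the variance of its inputs, so going forward through the network the variance keeps increasing until the activation function saturates in the top layers. This is made worse by the fact
that the logistic function has a mean of 0.5, not 0.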
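A small numerical experiment can make this concrete. It is only a sketch: the width of 100, depth of 5, seed, absence of biases, and the "derivative below 0.01" threshold for calling a unit saturated are all arbitrary choices. With N(0, 1) weights, the pre-activations have a large variance, so many sigmoid units end up stuck near 0 or 1; scaling the weights so their variance is 1/fan_avg (the Xavier/Glorot scheme) keeps them in the responsive middle range. The helper `saturated_fraction` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def saturated_fraction(weight_std, n_layers=5, width=100, n_samples=1000, seed=42):
    """Fraction of last-layer sigmoid units whose derivative is below 0.01."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_samples, width))  # standardized inputs
    for _ in range(n_layers):
        W = rng.normal(0.0, weight_std, size=(width, width))  # no biases, for simplicity
        a = sigmoid(a @ W)
    derivative = a * (1 - a)  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    return np.mean(derivative < 0.01)

naive = saturated_fraction(weight_std=1.0)                # N(0, 1) initialization
glorot = saturated_fraction(weight_std=np.sqrt(1 / 100))  # variance 1 / fan_avg, fan_avg = 100
print(f"N(0, 1): {naive:.1%} saturated, Glorot-scaled: {glorot:.1%} saturated")
```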

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
```

```python
def sigmoid(z):
    # Logistic function: squashes any real z into (0, 1)
    return 1 / (1 + np.exp(-z))
```

<img src='satur.png' />
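The logistic function flattens out (saturates) toward 0 or 1 when |z| is large. Its derivative, `sigmoid(z) * (1 - sigmoid(z))`, peaks at 0.25 at z = 0 and is only about 4.5e-5 at z = ±10, so almost no gradient flows back through a saturated unit. A quick NumPy check; the helper `sigmoid_derivative` is introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)  # maximum value 0.25, reached at z = 0

z = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
print(sigmoid(z).round(5))
print(sigmoid_derivative(z))
```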

### Xavier and He Initialization

```python
[name for name in dir(keras.initializers) if not name.startswith("_")]
```

```
['Constant',
 'GlorotNormal',
 'GlorotUniform',
 'HeNormal',
 'HeUniform',
 'Identity',
 'Initializer',
 'LecunNormal',
 'LecunUniform',
 'Ones',
 'Orthogonal',
 'RandomNormal',
 'RandomUniform',
 'TruncatedNormal',
 'VarianceScaling',
 'Zeros',
 'constant',
 'deserialize',
 'get',
 'glorot_normal',
 'glorot_uniform',
 'he_normal',
 'he_uniform',
 'identity',
 'lecun_normal',
 'lecun_uniform',
 'ones',
 'orthogonal',
 'random_normal',
 'random_uniform',
 'serialize',
 'truncated_normal',
 'variance_scaling',
 'zeros']
```
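The Glorot (Xavier), He and LeCun initializers in this list differ mainly in the weight variance they target: 1/fan_avg for Glorot, 2/fan_in for He, and 1/fan_in for LeCun, where fan_avg = (fan_in + fan_out) / 2. The uniform variants draw from [-r, r] with r = sqrt(3 * variance), since a uniform distribution on [-r, r] has variance r**2 / 3. A minimal sketch of these formulas; `init_scale` is a hypothetical helper, not part of Keras, and the fan values are arbitrary.

```python
import math

def init_scale(scheme, fan_in, fan_out):
    """Return (target variance, uniform limit r) for a given initialization scheme."""
    fan_avg = (fan_in + fan_out) / 2
    variance = {
        "glorot": 1 / fan_avg,  # Xavier/Glorot
        "he": 2 / fan_in,       # He, usually paired with ReLU and its variants
        "lecun": 1 / fan_in,    # LeCun
    }[scheme]
    limit = math.sqrt(3 * variance)  # Uniform(-r, r) has variance r**2 / 3
    return variance, limit

for scheme in ("glorot", "he", "lecun"):
    var, r = init_scale(scheme, fan_in=300, fan_out=100)
    print(f"{scheme:>6}: std={math.sqrt(var):.4f}, uniform limit={r:.4f}")
```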

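```python
# Dense layer whose kernel uses He initialization (std = sqrt(2 / fan_in)), the usual pairing for ReLU
keras.layers.Dense(10, activation='relu', kernel_initializer='he_normal')
```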

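Why `he_normal` pairs with ReLU: ReLU zeroes out roughly half of its inputs, which halves the second moment of the signal at every layer, and the factor of 2 in He initialization compensates for that. The sketch below is plain NumPy with no biases; the width of 256, depth of 20, sample count and seed are arbitrary. It compares the mean squared activation after a deep ReLU stack under He scaling (variance 2/fan_in) and LeCun scaling (variance 1/fan_in, which equals Glorot when fan_in == fan_out). The helper `mean_square_activation` is hypothetical.

```python
import numpy as np

def mean_square_activation(variance_factor, n_layers=20, width=256, n_samples=500, seed=0):
    """Mean of a**2 after a stack of ReLU layers with weights ~ N(0, variance_factor / fan_in)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_samples, width))  # inputs with mean square ~1
    for _ in range(n_layers):
        W = rng.normal(0.0, np.sqrt(variance_factor / width), size=(width, width))
        a = np.maximum(0.0, a @ W)  # ReLU
    return np.mean(a ** 2)

he = mean_square_activation(2.0)     # He: variance 2 / fan_in
lecun = mean_square_activation(1.0)  # LeCun (= Glorot when fan_in == fan_out)
print(f"He: {he:.3f}   LeCun/Glorot: {lecun:.2e}")
```

Under He scaling the signal stays on the order of 1, whereas with variance 1/fan_in it is roughly halved at each of the 20 layers.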

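```python
# He initialization based on fan_avg instead of fan_in: weight variance = scale / fan_avg
init = keras.initializers.VarianceScaling(scale=2., mode='fan_avg', distribution='uniform')
keras.layers.Dense(10, activation='relu', kernel_initializer=init)
```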

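`VarianceScaling` generalizes the named initializers. With `distribution='uniform'`, the Keras documentation describes sampling from [-limit, limit] with `limit = sqrt(3 * scale / n)`, where `n` is fan_in, fan_out or their average depending on `mode`. For example, `scale=2` with `mode='fan_avg'` gives He-style scaling based on fan_avg rather than fan_in. The NumPy function below is a hypothetical re-implementation of that formula for a dense kernel of shape (fan_in, fan_out), written only to check that the empirical variance comes out near `scale / n`; it is not the Keras source.

```python
import numpy as np

def variance_scaling_uniform(fan_in, fan_out, scale=1.0, mode="fan_in", seed=0):
    """Hypothetical NumPy version of VarianceScaling(distribution='uniform') for a Dense kernel."""
    n = {"fan_in": fan_in, "fan_out": fan_out, "fan_avg": (fan_in + fan_out) / 2}[mode]
    limit = np.sqrt(3.0 * scale / n)  # Uniform(-limit, limit) has variance limit**2 / 3 = scale / n
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = variance_scaling_uniform(300, 100, scale=2.0, mode="fan_avg")
print(W.var(), 2.0 / 200)  # empirical vs. target variance
```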

### Nonsaturating Activation Functions

```python
def leaky_relu(z, alpha=0.01):
    # Identity for z > 0, small slope alpha for z < 0 (this form assumes 0 < alpha < 1)
    return np.maximum(alpha * z, z)
```

<img src="leak.png"/>
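ReLU does not saturate for positive inputs, but its gradient is exactly 0 for negative ones, so a unit whose inputs stay negative stops learning (a "dying ReLU"). The leaky variant keeps a small slope `alpha` for z < 0, so the gradient never becomes exactly zero. A sketch of both derivatives; the helper names are hypothetical, and treating z = 0 as part of the negative side is an arbitrary convention.

```python
import numpy as np

def relu_derivative(z):
    return np.where(z > 0, 1.0, 0.0)  # zero for every non-positive input

def leaky_relu_derivative(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)  # never exactly zero, so gradients keep flowing

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu_derivative(z))
print(leaky_relu_derivative(z))
```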

```python
[activation for activation in dir(keras.activations) if not activation.startswith("_")]
```

```
['deserialize',
 'elu',
 'exponential',
 'gelu',
 'get',
 'hard_sigmoid',
 'linear',
 'relu',
 'selu',
 'serialize',
 'sigmoid',
 'softmax',
 'softplus',
 'softsign',
 'swish',
 'tanh']
```
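Two entries in this list, `elu` and `selu`, are further alternatives that do not saturate for positive inputs. ELU returns z for z > 0 and alpha * (exp(z) - 1) otherwise; SELU is a scaled ELU with fixed constants (alpha ≈ 1.6733, scale ≈ 1.0507) chosen so that standard-normal inputs yield outputs with mean ≈ 0 and variance ≈ 1. The NumPy sketch below checks that property empirically; it is not the Keras implementation, and the sample size and seed are arbitrary.

```python
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def elu(z, alpha=1.0):
    # np.minimum keeps exp() from overflowing on large positive z (that branch is discarded anyway)
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0)) - 1))

def selu(z):
    return SELU_SCALE * elu(z, alpha=SELU_ALPHA)

z = np.random.default_rng(42).standard_normal(1_000_000)
out = selu(z)
print(out.mean(), out.var())  # both close to the self-normalizing fixed point (0, 1)
```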

```python
[layer for layer in dir(keras.layers) if "relu" in layer.lower()]
```

```
['LeakyReLU', 'PReLU', 'ReLU', 'ThresholdedReLU']
```