# Chapter 11. Training Deep Neural Networks

## The Vanishing/Exploding Gradients Problems

### Glorot and He Initialization
The table below lists, for each initialization strategy, the activation functions it suits and the weight variance it targets:

| Initialization | Activation functions                     | σ² (Normal) |
|----------------|------------------------------------------|-------------|
| Glorot         | None, tanh, sigmoid, softmax             | 1 / fan_avg |
| He             | ReLU, Leaky ReLU, ELU, GELU, Swish, Mish | 2 / fan_in  |
| LeCun          | SELU                                     | 1 / fan_in  |

* By default, Keras uses Glorot initialization with a uniform distribution. When you create a layer, you can switch to He initialization by setting `kernel_initializer="he_uniform"` or `kernel_initializer="he_normal"`, like this:


In [1]:
import tensorflow as tf

dense = tf.keras.layers.Dense(50, activation="relu", kernel_initializer="he_normal")
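
To make the variance column of the table concrete, here is a minimal NumPy sketch that computes σ² for each strategy and samples a weight matrix with the He variance. It is an illustration of the formulas, not Keras's actual implementation; the helper `init_variance`, its scheme names, and the layer sizes are assumptions made up for this example.

```python
import numpy as np

def init_variance(fan_in, fan_out, scheme):
    """Return the target weight variance for a (hypothetical) scheme name."""
    fan_avg = (fan_in + fan_out) / 2
    variances = {
        "glorot": 1 / fan_avg,
        "he": 2 / fan_in,
        "lecun": 1 / fan_in,
    }
    return variances[scheme]

fan_in, fan_out = 300, 50  # illustrative layer sizes (assumption)
rng = np.random.default_rng(42)

he_var = init_variance(fan_in, fan_out, "he")
# Sample from a plain normal distribution with the He variance
W = rng.normal(loc=0.0, scale=np.sqrt(he_var), size=(fan_in, fan_out))

print(f"target variance:    {he_var:.5f}")
print(f"empirical variance: {W.var():.5f}")
```

Note that Keras's `"he_normal"` draws from a truncated normal distribution, so its samples are not distributed exactly like the ones above, but the variance it aims for is the same 2 / fan_in.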

* Alternatively, you can obtain any of the initializations listed in the table above, and more, using the `VarianceScaling` initializer. For example, if you want **He initialization** with a uniform distribution based on *fan_avg* (rather than *fan_in*), you can use the following code:


In [2]:
he_avg_init = tf.keras.initializers.VarianceScaling(
    scale=2., mode="fan_avg", distribution="uniform")
dense = tf.keras.layers.Dense(
    50,
    activation="sigmoid",
    kernel_initializer=he_avg_init,  # pass the initializer object instead of a string
)
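
With `distribution="uniform"`, the weights are drawn from an interval [-limit, limit]. A uniform distribution on that interval has variance limit² / 3, so reaching a target variance σ² = scale / fan_avg requires limit = sqrt(3 · scale / fan_avg). The NumPy sketch below checks that relationship for the same `scale=2.` and `fan_avg` mode as `he_avg_init`; the layer sizes are assumptions chosen for illustration.

```python
import numpy as np

fan_in, fan_out = 300, 50            # illustrative layer sizes (assumption)
fan_avg = (fan_in + fan_out) / 2
scale = 2.0                          # same scale as he_avg_init above

target_var = scale / fan_avg
limit = np.sqrt(3 * target_var)      # uniform on [-limit, limit] has variance limit**2 / 3

rng = np.random.default_rng(0)
W = rng.uniform(-limit, limit, size=(fan_in, fan_out))

print(f"limit: {limit:.4f}")
print(f"target variance: {target_var:.5f}, empirical: {W.var():.5f}")
```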

### Better Activation Functions

#### Leaky ReLU

* Keras includes the classes `LeakyReLU` and `PReLU` in the `tf.keras.layers` package. Just like for other ReLU variants, you should use He initialization with these. For example:

In [4]:
# `negative_slope` is the Keras 3 argument name; Keras 2 called it `alpha`
leaky_relu = tf.keras.layers.LeakyReLU(negative_slope=0.2)

dense = tf.keras.layers.Dense(50, activation=leaky_relu, kernel_initializer="he_normal")
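
For reference, leaky ReLU computes max(αz, z): positive inputs pass through unchanged and negative inputs are scaled by a small slope α. Here is a minimal NumPy sketch of the function itself; `leaky_relu_np` is an illustrative helper rather than the Keras layer, and its `alpha` argument plays the role of the slope set to 0.2 above.

```python
import numpy as np

def leaky_relu_np(z, alpha=0.2):
    """Leaky ReLU: z for positive inputs, alpha * z for negative ones (0 < alpha < 1)."""
    return np.maximum(alpha * z, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
out = leaky_relu_np(z)

# Negative inputs are scaled by 0.2; non-negative inputs pass through unchanged
print(out)
```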

* If you prefer, you can also use `LeakyReLU` as a separate layer in your model; this makes no difference to training or predictions:

In [7]:
# model = tf.keras.models.Sequential([
#     [...], # more layers
#     tf.keras.layers.Dense(50, kernel_initializer="he_normal"), # no activation
#     tf.keras.layers.LeakyReLU(negative_slope=0.2), # activation as a separate layer
#     [...] # more layers
# ])

#### ELU and SELU

In [8]:
# ELU: use He initialization, as listed in the table above
dense = tf.keras.layers.Dense(50, activation="elu", kernel_initializer="he_normal")

In [9]:
# SELU: use LeCun initialization, as listed in the table above
dense = tf.keras.layers.Dense(50, activation="selu", kernel_initializer="lecun_normal")
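
As a rough illustration of why SELU is paired with LeCun initialization, the NumPy sketch below defines ELU and SELU directly and pushes standardized inputs through a stack of plain dense layers. Under these conditions the activations tend to keep a mean near 0 and a standard deviation near 1. The helper names, the layer width, the batch size, and the depth are all assumptions chosen for this example, and the code is not how Keras implements these activations.

```python
import numpy as np

# Fixed constants from the SELU definition
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def elu(z, alpha=1.0):
    """ELU: z if z > 0, else alpha * (exp(z) - 1)."""
    # np.minimum keeps expm1 from overflowing on large positive inputs
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

def selu(z):
    """SELU: a scaled ELU with fixed alpha and scale."""
    return SELU_SCALE * elu(z, SELU_ALPHA)

rng = np.random.default_rng(42)
n_units = 100                                 # illustrative layer width
Z = rng.normal(size=(500, n_units))           # standardized inputs

for _ in range(50):                           # 50 dense layers, no biases
    # LeCun normal initialization: variance 1 / fan_in
    W = rng.normal(scale=np.sqrt(1 / n_units), size=(n_units, n_units))
    Z = selu(Z @ W)

print(f"after 50 layers: mean {Z.mean():.2f}, std {Z.std():.2f}")
```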