Vanishing/Exploding Gradient Problem
By default, Keras uses Glorot initialization. To switch to He initialization, set kernel_initializer when creating the layer:

In [2]:
# keras.layers.Dense(10, activation="relu", kernel_initializer="he_normal")
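
To make the difference between these initializers concrete, here is a minimal sketch of the scale each one targets, assuming the standard definitions (Glorot uniform uses a limit of sqrt(6 / (fan_in + fan_out)), He normal a variance of 2 / fan_in, LeCun normal a variance of 1 / fan_in). The helper names and layer sizes are hypothetical and not part of Keras.

```python
import math

def glorot_uniform_limit(fan_in, fan_out):
    # Glorot uniform draws weights from U(-limit, +limit)
    return math.sqrt(6.0 / (fan_in + fan_out))

def he_normal_stddev(fan_in):
    # He normal targets a variance of 2 / fan_in (suited to ReLU-like activations)
    return math.sqrt(2.0 / fan_in)

def lecun_normal_stddev(fan_in):
    # LeCun normal targets a variance of 1 / fan_in (used with SELU)
    return math.sqrt(1.0 / fan_in)

fan_in, fan_out = 300, 100  # hypothetical layer sizes, for illustration only
print("Glorot uniform limit:", glorot_uniform_limit(fan_in, fan_out))
print("He normal stddev:    ", he_normal_stddev(fan_in))
print("LeCun normal stddev: ", lecun_normal_stddev(fan_in))
```

Note that He initialization depends only on the number of inputs, while Glorot also takes the number of outputs into account.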

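ReLU is a very good activation function, but it can suffer from a problem called dying ReLU: during training, some neurons effectively die, meaning they stop outputting anything other than 0. One solution is to use leaky ReLU, which keeps a small slope for negative inputs.
Several other ReLU variants, such as PReLU and SELU, can also outperform plain ReLU.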
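
The dying ReLU problem comes down to the gradient being exactly zero for negative inputs. The following is a minimal pure-Python sketch (not Keras code; the function names are made up for illustration) comparing ReLU and leaky ReLU with a slope of 0.2:

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Zero gradient for z <= 0: a neuron stuck here receives no updates
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.2):
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.2):
    # A small non-zero gradient for z <= 0 lets the neuron recover
    return 1.0 if z > 0 else alpha

for z in (-3.0, -0.5, 2.0):
    print(z, relu(z), relu_grad(z), leaky_relu(z), leaky_relu_grad(z))
```

For negative inputs, ReLU passes back no gradient at all, while leaky ReLU always passes back a fraction alpha of it.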

In [None]:
# using leaky ReLU
model = keras.models.Sequential([
    [...]
    keras.layers.Dense(10, kernel_initializer="he_normal"),
    keras.layers.LeakyReLU(alpha=0.2),
    [...]
])
# using PReLU (the negative slope is learned during training)
model = keras.models.Sequential([
    [...]
    keras.layers.Dense(10, kernel_initializer="he_normal"),
    keras.layers.PReLU(),
    [...]
])
# using SELU (pair it with LeCun normal initialization)
model = keras.models.Sequential([
    [...]
    keras.layers.Dense(10, activation="selu", kernel_initializer="lecun_normal"),
    [...]
])
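
For reference, SELU is a scaled ELU with two fixed constants. The sketch below is a plain-Python illustration, assuming the commonly quoted values of those constants; it is not how Keras implements the activation internally.

```python
import math

# Commonly quoted SELU constants (assumed here, not read from Keras)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(z):
    # Scaled identity for positive inputs, scaled exponential curve for negative ones
    if z > 0:
        return SELU_SCALE * z
    return SELU_SCALE * SELU_ALPHA * (math.exp(z) - 1.0)

for z in (-5.0, -1.0, 0.0, 1.0):
    print(z, selu(z))
```

Unlike leaky ReLU, the negative side saturates: for very negative inputs the output approaches -SELU_SCALE * SELU_ALPHA.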


For better optimization, see p. 351.

# Regularization to Avoid Overfitting

In [None]:
layer = keras.layers.Dense(100, activation="elu",
    kernel_initializer="he_normal",
    kernel_regularizer=keras.regularizers.l2(0.01))
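
To see what the l2(0.01) regularizer adds to the loss, here is a minimal sketch assuming the usual definition of the L2 penalty: the factor times the sum of the squared weights. The helper name and the weight values are hypothetical.

```python
def l2_penalty(weights, factor=0.01):
    # factor * sum of squared weights over the whole kernel matrix
    return factor * sum(w * w for row in weights for w in row)

weights = [[0.5, -1.0],
           [2.0, 0.0]]  # hypothetical 2x2 kernel, for illustration only

penalty = l2_penalty(weights)
print(penalty)
```

Because the penalty grows with the square of each weight, large weights are penalized much more heavily than small ones.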

In [None]:
# Avoid repeating the same arguments: preset them once with functools.partial
from functools import partial
RegularizedDense = partial(keras.layers.Dense,
                           activation="elu",
                           kernel_initializer="he_normal",
                           kernel_regularizer=keras.regularizers.l2(0.01))

model = keras.models.Sequential([
    RegularizedDense(300),
    RegularizedDense(200),
    RegularizedDense(100)
])
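
Arguments preset with functools.partial are only defaults and can still be overridden per call, which is handy for a layer that needs different settings (an output layer, for example). The sketch below shows this with a hypothetical stand-in function, make_layer, in place of a real Keras layer so that it runs without Keras installed.

```python
from functools import partial

def make_layer(units, activation=None, kernel_initializer="glorot_uniform"):
    # Hypothetical stand-in for a layer constructor: returns its settings as a dict
    return {"units": units,
            "activation": activation,
            "kernel_initializer": kernel_initializer}

# Preset the arguments shared by all hidden layers
HiddenLayer = partial(make_layer, activation="elu", kernel_initializer="he_normal")

hidden = HiddenLayer(300)
# Keyword arguments passed at call time override the preset ones
output = HiddenLayer(10, activation="softmax", kernel_initializer="glorot_uniform")

print(hidden)
print(output)
```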