In [1]:
ls

training_deep_neural_networks.ipynb


In [2]:
# Chapter 10 introduced artificial neural networks, and we trained our first deep one.

# Some problems you may run into when training a DNN:
# Vanishing or exploding gradients (both make the lower layers hard to train)
# Not enough training data, or data that is too costly to label
# Extremely slow training (sometimes)
# So many parameters that the model risks overfitting the training set

# In this chapter we will study each of these problems and how we can get around them!

In [3]:
# Vanishing and Exploding Gradients

# From chapter 10 we know that backpropagation works by going from the output layer to the
# input layer, propagating the error gradient along the way. Overall, DNNs suffer from unstable
# gradients, so different layers may learn at very different speeds.

# A solution was proposed by Glorot and Bengio: the connection weights must be initialised
# randomly as described below, where fan_avg = (fan_in + fan_out) / 2

# Glorot initialization (when using the logistic activation function)
# normal distribution with mean 0 and variance sigma**2 = 1 / fan_avg
# or a uniform distribution between -r and r, where r = sqrt(3 / fan_avg)
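
# A minimal NumPy sketch of the two Glorot variants described above. The layer sizes
# (300 inputs, 100 outputs) and the helper name glorot_params are made up for illustration;
# this is not how Keras implements it internally, just the formulas applied directly.

```python
import numpy as np

def glorot_params(fan_in, fan_out):
    """Return (sigma, r) for the normal and uniform Glorot variants."""
    fan_avg = (fan_in + fan_out) / 2
    sigma = np.sqrt(1.0 / fan_avg)  # standard deviation of the normal variant
    r = np.sqrt(3.0 / fan_avg)      # half-width of the uniform variant
    return sigma, r

# Hypothetical layer sizes, chosen only for illustration
fan_in, fan_out = 300, 100
sigma, r = glorot_params(fan_in, fan_out)

rng = np.random.default_rng(0)
w_normal = rng.normal(0.0, sigma, size=(fan_in, fan_out))
w_uniform = rng.uniform(-r, r, size=(fan_in, fan_out))

# Both samples should have a variance close to 1 / fan_avg
print(1 / ((fan_in + fan_out) / 2), w_normal.var(), w_uniform.var())
```

# The uniform variant has variance r**2 / 3, which is why r = sqrt(3 / fan_avg) gives the
# same variance as the normal variant.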

In [4]:
# The initialization strategy for the ReLU activation function is sometimes called He
# initialization.
# The SELU activation function will be explained later in the chapter (it uses LeCun init)
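
# For comparison, the three schemes differ only in the variance they target. The sketch below
# assumes the usual definitions (Glorot: 1 / fan_avg, He: 2 / fan_in, LeCun: 1 / fan_in);
# the function name init_variance and the layer sizes are hypothetical.

```python
def init_variance(scheme, fan_in, fan_out):
    """Target weight variance for a given initialization scheme."""
    fan_avg = (fan_in + fan_out) / 2
    if scheme == "glorot":   # logistic activation
        return 1.0 / fan_avg
    if scheme == "he":       # ReLU
        return 2.0 / fan_in
    if scheme == "lecun":    # SELU
        return 1.0 / fan_in
    raise ValueError(f"unknown scheme: {scheme!r}")

# Hypothetical layer with 300 inputs and 100 outputs
variances = {s: init_variance(s, 300, 100) for s in ("glorot", "he", "lecun")}
for scheme, var in variances.items():
    print(f"{scheme:>6}: variance = {var:.5f}")
```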

In [5]:
# By default Keras uses Glorot init with a uniform distribution. You can change this to He with:
import tensorflow as tf
from tensorflow import keras

keras.layers.Dense(10, activation="relu", kernel_initializer="he_normal")

<tensorflow.python.keras.layers.core.Dense at 0x7fe152004950>

In [7]:
# If you want He init with a uniform distribution but based on fan_avg rather than fan_in,
# you can use the VarianceScaling init as follows:

he_avg_init = keras.initializers.VarianceScaling(scale=2, mode='fan_avg',
                                                 distribution='uniform')
keras.layers.Dense(10, activation="sigmoid", kernel_initializer=he_avg_init)

<tensorflow.python.keras.layers.core.Dense at 0x7fe158dc8310>

In [10]:
# A variant of ReLU that often works better is the leaky ReLU, which keeps a small slope
# ("leak") for negative inputs instead of outputting zero.
# Example of use (the other layers are elided):
model = keras.models.Sequential([
    # ...
    keras.layers.Dense(10, kernel_initializer="he_normal"),  # no activation here
    keras.layers.LeakyReLU(alpha=0.2),  # applied as a separate layer
    # ...
])

In [11]:
# For PReLU, replace LeakyReLU with PReLU. Here alpha is no longer a hyperparameter;
# instead it is a parameter that is learned during training. It works well on large image
# datasets!
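
# To make the "leak" concrete, here is a small NumPy sketch of the leaky ReLU function,
# assuming the standard definition max(alpha * z, z). It only illustrates the maths and is
# not the Keras layer itself; in PReLU the same formula applies but alpha is learned.

```python
import numpy as np

def leaky_relu(z, alpha=0.2):
    """Leaky ReLU: identity for z >= 0, slope alpha for z < 0 (0 <= alpha <= 1)."""
    return np.maximum(alpha * z, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
out = leaky_relu(z)
print(out)  # negative inputs are scaled by alpha, non-negative inputs pass through
```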

In [12]:
# Batch Normalisation

# Unstable gradients can still come back during training. Ioffe and Szegedy proposed Batch
# Normalisation (BN) to address this problem.

# It consists of adding an operation in the model just before or after the activation function
# of each hidden layer. The operation simply zero-centres and normalises each input. In other
# words, if you add a BN layer as the first layer of your NN then you don't need to standardise
# your training set (e.g. with a StandardScaler)

# See equation 11-3

# During training, BN standardises its inputs, then rescales and offsets them.
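
# A minimal NumPy sketch of the training-time BN operation described above: standardise over
# the mini-batch, then rescale by gamma and offset by beta. The function name, the eps value
# and the random batch are assumptions for illustration; a real BN layer also tracks moving
# averages of the mean and variance for use at test time, which is omitted here.

```python
import numpy as np

def batch_norm_train(X, gamma, beta, eps=1e-3):
    """Apply BN to a mini-batch X of shape (batch_size, n_features)."""
    mu = X.mean(axis=0)                    # per-feature mean over the batch
    var = X.var(axis=0)                    # per-feature variance over the batch
    X_hat = (X - mu) / np.sqrt(var + eps)  # zero-centred and normalised inputs
    return gamma * X_hat + beta            # rescale and offset

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # unscaled toy inputs
Z = batch_norm_train(X, gamma=np.ones(4), beta=np.zeros(4))

print(Z.mean(axis=0))  # close to 0 for every feature
print(Z.std(axis=0))   # close to 1 for every feature
```

# With gamma = 1 and beta = 0 the output is just the standardised input, which is why a BN
# layer placed first can stand in for a separate scaling step.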

In [13]:
# The main drawbacks of BN are, firstly, that it adds complexity to the model and, worse,
# that it adds a runtime penalty at prediction time. But this can often be avoided by fusing
# the BN layer with the previous layer after training. Now let us implement BN using Keras
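
# One way the runtime cost can be avoided after training is to fold the BN layer into the
# weights of the previous layer. The NumPy sketch below assumes BN is applied directly to the
# linear output X @ W + b with fixed (moving-average) statistics; every name and value in it
# is hypothetical and only there to check that the algebra works out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 5, 3

# Hypothetical trained parameters of a dense layer followed by BN
W = rng.normal(size=(n_in, n_out))
b = rng.normal(size=n_out)
gamma = rng.normal(size=n_out)
beta = rng.normal(size=n_out)
mu = rng.normal(size=n_out)              # moving mean
var = rng.uniform(0.5, 2.0, size=n_out)  # moving variance (positive)
eps = 1e-3

def dense_then_bn(X):
    """Two separate steps at inference time: linear layer, then BN."""
    return gamma * ((X @ W + b) - mu) / np.sqrt(var + eps) + beta

# Fold BN into the dense layer: a single matrix multiply gives the same result
scale = gamma / np.sqrt(var + eps)
W_fused = W * scale                # scales each output column of W
b_fused = (b - mu) * scale + beta

X = rng.normal(size=(8, n_in))
fused_out = X @ W_fused + b_fused
print(np.allclose(dense_then_bn(X), fused_out))
```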

In [None]:
model = keras.model