## Exploding and Vanishing Gradient Demo

Source: https://github.com/grasool/explore-gradients

Increasing the depth of a neural network often improves accuracy. However, as the number of layers grows, the gradients of the loss function with respect to the trainable parameters (weights and biases) may either explode or vanish. Backpropagation multiplies one factor per layer, so the product can grow or shrink exponentially with depth.
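
To make the depth effect concrete, here is a minimal sketch in plain Python. It is a simplification, not the notebook's model: every layer has a single neuron and all layers share one weight. The name `chain_gradient` and the chosen weights are assumptions made for illustration only. With a sigmoid activation each layer contributes a factor of at most 0.25, so the gradient shrinks; with a linear activation and a weight above 1, it grows.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def chain_gradient(depth, weight, activation="sigmoid", x=0.5):
    """Return d(output)/d(input) for a chain of `depth` one-neuron layers.

    Hypothetical helper: each layer computes h <- act(weight * h).
    """
    h = x
    grad = 1.0
    for _ in range(depth):
        z = weight * h
        if activation == "sigmoid":
            h = sigmoid(z)
            local = h * (1.0 - h)  # derivative of sigmoid at z, never above 0.25
        else:  # linear activation
            h = z
            local = 1.0
        grad *= weight * local  # chain rule: one factor per layer
    return grad


for depth in (1, 5, 25):
    vanishing = chain_gradient(depth, weight=1.0, activation="sigmoid")
    exploding = chain_gradient(depth, weight=1.5, activation="linear")
    print(f"depth={depth:2d}  sigmoid chain: {vanishing:.3e}  linear chain: {exploding:.3e}")
```

At depth 25 the sigmoid chain's gradient is below `0.25 ** 25`, while the linear chain's gradient equals `1.5 ** 25`.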

In [9]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TensorFlow log messages (e.g. memory warnings); set before TensorFlow is imported

import math

import numpy as np
import pandas
import matplotlib.pyplot as plt
import keras
from keras.datasets import mnist
from keras.layers import Activation, Dense, Input
from keras.models import Model
from keras.callbacks import TensorBoard
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.optimizers import RMSprop
from sklearn.model_selection import StratifiedShuffleSplit
%matplotlib inline

"""
Creates MLP layers 
Fits the model
Saves the model
Saves csv for loss and accuracy (training and validation)
Saves tensorboard summary for Gradients
@author: Ghulam Rasool
"""

In [10]:
# Specify number of layers
N_LAYERS = 25

# Width of each hidden layer (number of neurons); all hidden layers share the same width.
n_hwidth = 128
batch_size = 128
n_classes = 10
epochs = 20

n_layers = N_LAYERS - 1  # number of hidden layers requested; the output layer is added separately

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices (one-hot encoding)
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)

60000 train samples
10000 test samples


In [11]:
inputs = Input(shape=(784,))

def fCreate_Layers(n_layers, inputs):
    x = inputs
    for k in range(n_layers):
        x = Dense(n_hwidth)(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        print('Layer %d added' % (k+1))
        
    return x

# Create all layers    
x_all = fCreate_Layers(n_layers, inputs)
# Output layer
predictions = Dense(n_classes, activation='softmax')(x_all)
print('Output layer added')
# Create Model
model = Model(inputs=inputs, outputs=predictions)
# Print model summary
model.summary()

model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])

history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, y_test))

model_name = 'MLP_No_SKIP_%s.h5' % N_LAYERS
csv_name = 'MLP_No_SKIP_%s.csv' % N_LAYERS

model.save(model_name)
pandas.DataFrame(history.history).to_csv(csv_name)

Layer 1 added
Layer 2 added
Layer 3 added
Layer 4 added
Layer 5 added
Layer 6 added
Layer 7 added
Layer 8 added
Layer 9 added
Layer 10 added
Layer 11 added
Layer 12 added
Layer 13 added
Layer 14 added
Layer 15 added
Layer 16 added
Layer 17 added
Layer 18 added
Layer 19 added
Layer 20 added
Layer 21 added
Layer 22 added
Layer 23 added
Layer 24 added
Output layer added
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 784)]             0         
                                                                 
 dense (Dense)               (None, 128)               100480    
                                                                 
 batch_normalization (BatchN  (None, 128)              512       
 ormalization)                                                   
                                                                 
 activation (Activati
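
The `Param #` values in the summary above can be checked by hand. The sketch below uses two hypothetical helpers, `dense_params` and `batchnorm_params`, which are not part of Keras. A dense layer has one weight per input-output pair plus one bias per output. Keras batch normalization keeps four values per feature: gamma, beta, the moving mean, and the moving variance.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma, beta, moving mean and moving variance: four values per feature."""
    return 4 * n_features


print(dense_params(784, 128))  # first hidden layer: 100480, as in the summary
print(batchnorm_params(128))   # batch normalization: 512, as in the summary
print(dense_params(128, 128))  # each later 128-to-128 hidden layer: 16512
```

Only the first two numbers appear in the truncated summary; the third is computed from the same formula.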

"""
Creates MLP layers with skip connections
Fits the model
Saves the model
Saves csv for loss and accuracy (training and validation)
Saves tensorboard summary for Gradients
@author: Ghulam Rasool
"""

In [12]:
inputs = Input(shape=(784,))

def fCreate_Layers(n_layers, inputs):
    x = Dense(n_hwidth)(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    print('Layer 1 added')
    
    n_skip_layers = math.floor(n_layers/2)
    
    for k in range(n_skip_layers):
        x = Dense(n_hwidth)(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        print('Layer %d added' % (2*k+2))
        
        y = Dense(n_hwidth)(x)
        y = BatchNormalization()(y)
        x = keras.layers.add([y, x])
        x = Activation('relu')(x)
        print('Layer %d with skip connection  added' % (2*k+3))

    return x
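
Why a skip connection helps can be seen in a one-neuron sketch. It is a simplification of the block above, under stated assumptions: `tanh` stands in for the activation, batch normalization is left out, and the ReLU that the notebook applies after the addition is ignored. The function name and the weight value are hypothetical. A plain block `h <- f(h)` contributes the factor `f'(h)` to the gradient, while a residual block `h <- h + f(h)` contributes `1 + f'(h)`, so the identity path keeps the product from collapsing towards zero.

```python
import math


def plain_vs_residual_gradient(depth, weight=0.1, x=0.5):
    """Return (plain, residual) values of d(output)/d(input) after `depth` blocks.

    Hypothetical helper with f(h) = tanh(weight * h).
    """
    h_plain, g_plain = x, 1.0
    h_res, g_res = x, 1.0
    for _ in range(depth):
        # Plain block: h <- tanh(w * h); local derivative is w * (1 - tanh^2)
        t = math.tanh(weight * h_plain)
        g_plain *= weight * (1.0 - t * t)
        h_plain = t

        # Residual block: h <- h + tanh(w * h); local derivative is 1 + w * (1 - tanh^2)
        t = math.tanh(weight * h_res)
        g_res *= 1.0 + weight * (1.0 - t * t)
        h_res = h_res + t
    return g_plain, g_res


for depth in (1, 5, 25):
    g_plain, g_res = plain_vs_residual_gradient(depth)
    print(f"depth={depth:2d}  plain: {g_plain:.3e}  residual: {g_res:.3e}")
```

With a positive weight every residual factor is at least 1, so the residual gradient never drops below 1, whereas the plain gradient is bounded by `weight ** depth`.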

In [13]:
# Create all layers  
x_all = fCreate_Layers(n_layers, inputs)
# Output layer
predictions = Dense(n_classes, activation='softmax')(x_all)
print('Output layer added')
# Create Model
model = Model(inputs=inputs, outputs=predictions)
# Print model summary
model.summary()

model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])

ttb_dir = './MLP_SKIP_%s' % n_layers

history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, y_test))


model_name = 'MLP_SKIP_%s.h5' % n_layers
csv_name = 'MLP_SKIP_%s.csv' % n_layers

model.save(model_name)
pandas.DataFrame(history.history).to_csv(csv_name)

Layer 1 added
Layer 2 added
Layer 3 with skip connection  added
Layer 4 added
Layer 5 with skip connection  added
Layer 6 added
Layer 7 with skip connection  added
Layer 8 added
Layer 9 with skip connection  added
Layer 10 added
Layer 11 with skip connection  added
Layer 12 added
Layer 13 with skip connection  added
Layer 14 added
Layer 15 with skip connection  added
Layer 16 added
Layer 17 with skip connection  added
Layer 18 added
Layer 19 with skip connection  added
Layer 20 added
Layer 21 with skip connection  added
Layer 22 added
Layer 23 with skip connection  added
Layer 24 added
Layer 25 with skip connection  added
Output layer added
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_2 (InputLayer)           [(None, 784)]        0           []                               
                                     

1st Row - MLP No Skip Connection

2nd Row - MLP No Skip Connection No Batch Normalization

3rd Row - MLP With Skip Connection

![alt text](vanishingGrads.png "Effect of adding skip connections")
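
The docstrings above mention saving gradient summaries, but the cells shown here do not record gradients. One possible way to inspect per-layer gradient magnitudes is sketched below with `tf.GradientTape`. This is not the author's method: `kernel_gradient_norms` and `toy_model` are hypothetical names, and the small model and random batch are stand-ins so that the block runs on its own.

```python
import numpy as np
import tensorflow as tf


def kernel_gradient_norms(model, x_batch, y_batch):
    """Return [(layer name, L2 norm of dLoss/dKernel)] for layers that have a kernel."""
    layers_with_kernel = [layer for layer in model.layers if hasattr(layer, "kernel")]
    kernels = [layer.kernel for layer in layers_with_kernel]
    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)
    grads = tape.gradient(loss, kernels)
    return [(layer.name, float(tf.norm(g))) for layer, g in zip(layers_with_kernel, grads)]


# Small stand-in model (assumption: three hidden layers of width 16)
inputs = tf.keras.Input(shape=(784,))
x = inputs
for _ in range(3):
    x = tf.keras.layers.Dense(16, activation="relu")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
toy_model = tf.keras.Model(inputs, outputs)

# Random batch standing in for MNIST images and one-hot labels
rng = np.random.default_rng(0)
x_batch = rng.random((8, 784)).astype("float32")
y_batch = tf.keras.utils.to_categorical(rng.integers(0, 10, size=8), 10)

for name, norm in kernel_gradient_norms(toy_model, x_batch, y_batch):
    print(name, norm)
```

Calling the helper on the notebook's `model` with a slice of `x_train` and `y_train` would list one norm per dense layer, from the first hidden layer to the output layer. Norms that shrink towards the input side indicate vanishing gradients.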