# Gradient checking

In this notebook, we briefly implement gradient checking to verify that the gradients computed by backpropagation are correct. To do so, we use the two-sided (centered) definition of the derivative:

$$f'(h) = \lim_{\epsilon \to 0} \frac{f(h + \epsilon) - f(h - \epsilon)}{2 \, \epsilon} \tag{1}$$

With a small but finite $\epsilon$ (here $\epsilon = 10^{-7}$), we approximate the derivative by:

$$f_{approx}'(h) = \frac{f(h + \epsilon) - f(h - \epsilon)}{2 \, \epsilon} \tag{2}$$
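A quick way to see why the two-sided formula is preferred: for $f(h) = h^3$ at $h = 2$ (so $f'(2) = 12$), the centered difference of equation (2) has error exactly $\epsilon^2$, while the one-sided difference $\frac{f(h+\epsilon) - f(h)}{\epsilon}$ has error $3h\epsilon + \epsilon^2$. The minimal sketch below uses $\epsilon = 10^{-4}$ rather than $10^{-7}$ so that truncation error, not floating-point rounding, dominates; the test function is an illustrative choice.

```python
import numpy as np

def f(h):
    # Illustrative test function with a known derivative: f'(h) = 3 h^2
    return h ** 3

def centered_diff(f, h, eps):
    # Two-sided (centered) difference, as in equation (2)
    return (f(h + eps) - f(h - eps)) / (2 * eps)

def forward_diff(f, h, eps):
    # One-sided difference, shown only for comparison
    return (f(h + eps) - f(h)) / eps

h, eps = 2.0, 1e-4
exact = 3 * h ** 2
err_centered = abs(centered_diff(f, h, eps) - exact)
err_forward = abs(forward_diff(f, h, eps) - exact)
print(f"centered error: {err_centered:.2e}")
print(f"forward error:  {err_forward:.2e}")
```

The two errors should come out near $\epsilon^2 = 10^{-8}$ and $3h\epsilon = 6 \times 10^{-4}$ respectively, which is why gradient checking relies on the centered form.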


Then, applying this to the cost function
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(A^{[L](i)}, Y^{(i)})\tag{3}$$

we can estimate its derivative with respect to any parameter $h$:
$$ \left(\frac{\partial J}{\partial h}\right)_{approx} = \frac{J(h + \epsilon) - J(h - \epsilon)}{2 \, \epsilon}\tag{4}$$
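As an illustration of equation (4), the sketch below checks $\partial J / \partial w_j$ for a single sigmoid unit on random data. It assumes $\mathcal{L}$ is the binary cross-entropy loss, which this section does not spell out, and all names are hypothetical and independent of `L_NN`.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 4
X = rng.standard_normal((n, m))                # n features, m examples stored as columns
Y = (rng.random((1, m)) > 0.5).astype(float)   # random binary labels
w = rng.standard_normal((1, n))
b = 0.1

def cost(w, b):
    # Assumed loss: binary cross-entropy averaged over the m examples, as in eq. (3)
    A = 1.0 / (1.0 + np.exp(-(w @ X + b)))
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Analytic gradient of this cost for a sigmoid unit: (1/m) (A - Y) X^T
A = 1.0 / (1.0 + np.exp(-(w @ X + b)))
dw = (A - Y) @ X.T / m

# Numerical estimate of dJ/dw_j for a single weight, eq. (4)
j, eps = 2, 1e-7
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, j] += eps
w_minus[0, j] -= eps
dw_j_approx = (cost(w_plus, b) - cost(w_minus, b)) / (2 * eps)

print(dw[0, j], dw_j_approx)
```

The two printed values should agree closely; a large mismatch would indicate an error in the analytic formula.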

## What to do?

At the end of this notebook, we revisit the cat classification example and run the gradient check on the gradients computed during the first training iterations.

## Organization of this Notebook

1. Packages
2. Gradient checking
3. Model construction with gradient checking
4. Try with the cat training set and the 2_L-LayerNN architecture

## 1 - Packages ##

- [numpy](https://www.numpy.org): the fundamental package for scientific computing with Python.
- [h5py](http://www.h5py.org): a common package for interacting with datasets stored in HDF5 (.h5) files.
- [matplotlib](http://matplotlib.org): a widely used library for plotting graphs in Python.
- [L_NN](L_NN.py): my own L-layer neural network implementation.
- py_utils: provides `load_dataset`, a copied dataset loader.

In [1]:
import numpy as np
import h5py
from L_NN import initialize, globalForwardPropagation, computeCost, globalBackwardPropagation, updateParameters
from py_utils import load_dataset # Copy/pasted loader

%matplotlib inline



## 2 - Gradient checking ##

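Because the parameters and gradients are stored in dictionaries, we need helper functions that convert them to and from a single flat NumPy column vector. This lets the gradient check perturb one parameter at a time with a simple loop.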

In [2]:
def ParamNpArrayToDict(params, layers_dims, actFunctions):
    """Rebuild the parameters dictionary from a flat (n, 1) column vector."""
    parameters = {}
    
    L = len(layers_dims)  # number of layers, including the input layer
    ptr = 0
    for l in range(1, L):
        # W[l] has shape (layers_dims[l], layers_dims[l-1])
        next_ptr = ptr + layers_dims[l]*layers_dims[l-1]
        parameters['W' + str(l)] = params[ptr:next_ptr].reshape((layers_dims[l], layers_dims[l-1]))
        ptr = next_ptr
        
        # b[l] has shape (layers_dims[l], 1)
        next_ptr = ptr + layers_dims[l]
        parameters['b' + str(l)] = params[ptr:next_ptr].reshape((layers_dims[l], 1))
        ptr = next_ptr
        
        # The activation function is stored alongside W and b but is not trainable
        parameters['F' + str(l)] = actFunctions[l-1]

    return parameters

def ParamDictToNpArray(parameters):
    """Flatten W1, b1, W2, b2, ... into a single (n, 1) column vector."""
    L = len(parameters) // 3  # each layer stores W, b and F
    
    return np.concatenate([np.reshape(parameters[key + str(l)], (-1, 1))
                           for l in range(1, L + 1) for key in ('W', 'b')], axis=0)

def GrdtDictToNpArray(gradients):
    """Flatten dW1, db1, dW2, db2, ... in the same order as ParamDictToNpArray."""
    L = len(gradients) // 3  # assumes globalBackwardPropagation stores three entries per layer
    
    return np.concatenate([np.reshape(gradients[key + str(l)], (-1, 1))
                           for l in range(1, L + 1) for key in ('dW', 'db')], axis=0)
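Flattening and unflattening must be exact inverses, and a round-trip test is a cheap way to confirm it. The sketch below is a compact, hypothetical alternative (`flatten_params` and `unflatten_params` are illustrative names, not part of `L_NN`) that cuts the vector with `np.split` at cumulative offsets instead of advancing a pointer. It keeps the same W1, b1, W2, b2, ... order and omits the activation entries `F1`, `F2`, ....

```python
import numpy as np

def flatten_params(params, L):
    # Stack W1, b1, ..., WL, bL into one (n, 1) column, in the same order as above
    return np.concatenate(
        [params[k + str(l)].reshape(-1, 1) for l in range(1, L + 1) for k in ("W", "b")],
        axis=0)

def unflatten_params(theta, layers_dims):
    # Record each array's name and shape, then cut theta at the cumulative offsets
    shapes = []
    for l in range(1, len(layers_dims)):
        shapes.append(("W" + str(l), (layers_dims[l], layers_dims[l - 1])))
        shapes.append(("b" + str(l), (layers_dims[l], 1)))
    sizes = [r * c for _, (r, c) in shapes]
    chunks = np.split(theta, np.cumsum(sizes)[:-1])
    return {name: chunk.reshape(shape) for (name, shape), chunk in zip(shapes, chunks)}

# Hypothetical tiny architecture with random parameters
layers_dims = [4, 3, 1]
rng = np.random.default_rng(1)
params = {}
for l in range(1, len(layers_dims)):
    params["W" + str(l)] = rng.standard_normal((layers_dims[l], layers_dims[l - 1]))
    params["b" + str(l)] = rng.standard_normal((layers_dims[l], 1))

theta = flatten_params(params, len(layers_dims) - 1)
restored = unflatten_params(theta, layers_dims)
print(theta.shape)  # (19, 1): (4*3 + 3) + (3*1 + 1) entries
```

Both versions rely on `reshape` using the same (row-major, NumPy's default) order when flattening and when restoring; mixing orders would scramble the weights silently.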

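We can now implement gradient checking. Let $\theta$ be the flattened vector of all weights and biases (`paramArray` in the code) and $e_i$ the vector with a $1$ in position $i$ and $0$ elsewhere. For each $i$, we perturb only the $i$-th parameter:

$$ J^+_i = J(\theta + \epsilon \, e_i)\tag{5}$$
$$ J^-_i = J(\theta - \epsilon \, e_i)\tag{6}$$

$$ d\theta^{approx}_i = \frac{J^+_i - J^-_i}{2 \, \epsilon}\tag{7}$$

Then, with $d\theta$ the backpropagation gradient flattened in the same order (`gradsArray`) and $d\theta^{approx}$ its numerical estimate (`gradApprox`), we compute the relative difference:
$$ \text{gradientCheck} = \frac{\| d\theta - d\theta^{approx} \|_2}{\| d\theta \|_2 + \| d\theta^{approx} \|_2} \tag{8}$$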
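Dividing by the sum of the norms makes equation (8) a relative measure: multiplying both gradients by the same factor leaves it unchanged, whereas the numerator alone (the plain L2 distance between the two gradients) grows with that factor. The minimal sketch below illustrates this with made-up gradient values; `relative_difference` is a hypothetical name.

```python
import numpy as np

def relative_difference(grad, grad_approx):
    # Equation (8): L2 distance normalized by the sum of the two L2 norms
    return np.linalg.norm(grad - grad_approx) / (np.linalg.norm(grad) + np.linalg.norm(grad_approx))

# Made-up gradients differing by 1e-8 in every coordinate
grad = np.array([[0.5], [-1.2], [3.0]])
grad_approx = grad + 1e-8

small = relative_difference(grad, grad_approx)
large = relative_difference(1000 * grad, 1000 * grad_approx)

# The unnormalized distance, by contrast, scales with the gradients
abs_small = np.linalg.norm(grad - grad_approx)
abs_large = np.linalg.norm(1000 * grad - 1000 * grad_approx)
print(small, large, abs_small, abs_large)
```

This scale invariance is what allows a single fixed threshold (such as the `2e-7` used below) to be applied across models whose gradients have very different magnitudes.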

In [3]:
def gradientCheck(parameters, gradients, layers_dims, actFunctions, X, Y, epsilon = 1e-7):
    
    # Flatten the backprop gradients and the parameters into (n, 1) column vectors
    gradsArray = GrdtDictToNpArray(gradients)
    paramArray = ParamDictToNpArray(parameters)
    
    numWeightsAndBiases = gradsArray.shape[0]
    
    gradApprox = np.zeros((numWeightsAndBiases, 1))
    J_plus = np.zeros((numWeightsAndBiases, 1))
    J_minus = np.zeros((numWeightsAndBiases, 1))
    
    # Estimate each partial derivative with a centered difference, equations (5)-(7)
    for i in range(numWeightsAndBiases):
        
        # J+: nudge only the i-th parameter up by epsilon and recompute the cost
        paramPlus = np.copy(paramArray)
        paramPlus[i, 0] += epsilon
        _, _, AL = globalForwardPropagation(X, ParamNpArrayToDict(paramPlus, layers_dims, actFunctions))
        J_plus[i] = computeCost(AL, Y)
        
        # J-: nudge only the i-th parameter down by epsilon and recompute the cost
        paramMinus = np.copy(paramArray)
        paramMinus[i, 0] -= epsilon
        _, _, AL = globalForwardPropagation(X, ParamNpArrayToDict(paramMinus, layers_dims, actFunctions))
        J_minus[i] = computeCost(AL, Y)
        
        gradApprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)
    
    # Relative difference between the two gradient vectors, equation (8)
    grdtCheck = np.linalg.norm(gradsArray - gradApprox) / (np.linalg.norm(gradsArray) + np.linalg.norm(gradApprox))

    return grdtCheck
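Since `gradientCheck` depends on `L_NN`, here is a self-contained sketch of the same procedure on a hypothetical two-layer network (ReLU then sigmoid, with binary cross-entropy assumed as the loss) and a hand-written backward pass. The names `unpack`, `forward`, `backward`, `gradient_check` and the `scale` parameter, used to inject a deliberate bug into `dW2`, are illustrative and not part of `L_NN`.

```python
import numpy as np

rng = np.random.default_rng(0)
layers_dims = [3, 4, 1]   # hypothetical: 3 inputs, 4 ReLU units, 1 sigmoid output
m = 10
X = rng.standard_normal((layers_dims[0], m))
Y = (rng.random((1, m)) > 0.5).astype(float)

def unpack(theta):
    # Slice the flat (n, 1) vector back into W1, b1, W2, b2 (row-major order)
    n0, n1, n2 = layers_dims
    i = 0
    W1 = theta[i:i + n1 * n0].reshape(n1, n0); i += n1 * n0
    b1 = theta[i:i + n1].reshape(n1, 1); i += n1
    W2 = theta[i:i + n2 * n1].reshape(n2, n1); i += n2 * n1
    b2 = theta[i:i + n2].reshape(n2, 1)
    return W1, b1, W2, b2

def forward(theta):
    W1, b1, W2, b2 = unpack(theta)
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)                   # ReLU
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))   # sigmoid
    return Z1, A1, A2

def cost(theta):
    # Binary cross-entropy averaged over the m examples (assumed loss)
    _, _, A2 = forward(theta)
    return -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

def backward(theta, scale=1.0):
    # Analytic gradients; scale != 1 deliberately corrupts dW2 for the demo
    _, _, W2, _ = unpack(theta)
    Z1, A1, A2 = forward(theta)
    dZ2 = (A2 - Y) / m
    dW2 = scale * dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)            # ReLU derivative
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    return np.concatenate([g.reshape(-1, 1) for g in (dW1, db1, dW2, db2)])

def gradient_check(theta, grad, eps=1e-7):
    # Centered difference on each coordinate, then the relative difference of eq. (8)
    approx = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        plus, minus = theta.copy(), theta.copy()
        plus[i, 0] += eps
        minus[i, 0] -= eps
        approx[i, 0] = (cost(plus) - cost(minus)) / (2 * eps)
    return np.linalg.norm(grad - approx) / (np.linalg.norm(grad) + np.linalg.norm(approx))

n_params = sum(layers_dims[l] * layers_dims[l - 1] + layers_dims[l] for l in range(1, len(layers_dims)))
theta = 0.5 * rng.standard_normal((n_params, 1))

good = gradient_check(theta, backward(theta))
bad = gradient_check(theta, backward(theta, scale=2.0))
print(f"correct backprop: {good:.2e}")
print(f"buggy backprop:   {bad:.2e}")
```

The correct backward pass should give a value many orders of magnitude below the buggy one. The small initialization keeps the sigmoid away from saturation, where rounding in `log(1 - A2)` could inflate the numerical error.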

## 3 - Model construction with Gradient checking ##

In [4]:
def modelWithGradientCheck(X, Y, layers_dims, actFunctions, learning_rate = 0.0008, num_iterations = 2, print_cost = False):
    
    # Initialize the weights and the biases
    parameters = initialize(layers_dims, actFunctions)
    
    for i in range(num_iterations):

        # Forward pass, cost, then backpropagation
        Zs, As, AL = globalForwardPropagation(X, parameters)
        cost = computeCost(AL, Y)
        grads = globalBackwardPropagation(Y, Zs, As, parameters)
        
        # Compare the backprop gradients with numerical ones before updating the parameters
        grdtCheck = gradientCheck(parameters, grads, layers_dims, actFunctions, X, Y)
            
        parameters = updateParameters(parameters, grads, learning_rate)
                
        if print_cost:
            print("Cost after iteration %i: %f" % (i, cost))
            
        # 2e-7 is the threshold chosen here; \033[93m / \033[92m color the message yellow / green
        if grdtCheck > 2e-7:
            print("\033[93m" + "There is a mistake in the backward propagation! gradientCheck = " + str(grdtCheck) + "\033[0m")
        else:
            print("\033[92m" + "The Backward propagation works perfectly fine! gradientCheck = " + str(grdtCheck) + "\033[0m")
    
    return parameters

## 4 - Try with the cat training set and the 2_L-LayerNN architecture ##

In [5]:
training_imgLoaded, training_label, testing_imgLoaded, testing_label, classes = load_dataset()

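Here, we flatten and normalize the images: each image becomes a column of $height \times width \times 3$ pixel values, giving a $(height \cdot width \cdot 3) \times m$ matrix ($m$ being the number of images) whose values lie between $0$ and $1$ after division by $255$.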

In [6]:
training_img = training_imgLoaded.reshape(training_imgLoaded.shape[0], -1).T / 255.
testing_img = testing_imgLoaded.reshape(testing_imgLoaded.shape[0], -1).T / 255.
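The `reshape(m, -1).T` idiom above can be checked on a small hypothetical array: each column of the result should be exactly one flattened image. The array sizes below are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the image array: m images of height x width x 3 channels
m, height, width = 5, 4, 4
imgs = np.random.default_rng(0).integers(0, 256, size=(m, height, width, 3))

# Same idiom as in the cell above: one row per image, then transpose to one column per image
flat = imgs.reshape(imgs.shape[0], -1).T / 255.
print(flat.shape)  # (48, 5)
```

Reshaping to `(m, -1)` before transposing matters: reshaping directly to `(-1, m)` would still produce the right shape, but each column would mix values from different images.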

In [7]:
print("Number of pixel values per image: %i" % (training_img.shape[0]))
print("Number of training examples: %i" % (training_img.shape[1]))

Number of pixel values per image: 12288
Number of training examples: 209


Here, we choose the dimensions of each layer and their activation functions:

In [8]:
layerDims = [12288, 20, 7, 5, 1]
actFunctions = ["ReLU", "ReLU", "ReLU", "sigmoid"]
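With these dimensions, the flattened parameter vector handled by `gradientCheck` is long, which is why the check is slow. A quick count, assuming each layer contributes one W and one b as in `ParamNpArrayToDict`:

```python
layerDims = [12288, 20, 7, 5, 1]

# Each layer l contributes W[l] of shape (n[l], n[l-1]) and b[l] of shape (n[l], 1)
n_params = sum(layerDims[l] * layerDims[l - 1] + layerDims[l] for l in range(1, len(layerDims)))

# gradientCheck runs two forward passes (for J+ and J-) per parameter
forward_passes = 2 * n_params
print(n_params, forward_passes)  # 245973 491946
```

Since `modelWithGradientCheck` calls `gradientCheck` at every iteration, the run below with `num_iterations = 2` performs about twice that many forward passes in total. Gradient checking is therefore typically run once to validate backpropagation and then switched off for actual training.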

In [9]:
parameters = modelWithGradientCheck(training_img, training_label, layerDims, actFunctions, print_cost = True)

number of Layers: 4
  1 / W(20, 12288) / b(20, 1) / ReLU
  2 / W(7, 20) / b(7, 1) / ReLU
  3 / W(5, 7) / b(5, 1) / ReLU
  4 / W(1, 5) / b(1, 1) / sigmoid

Cost after iteration 0: 0.764218
The Backward propagation works perfectly fine! gradientCheck = 1.679466501538512e-08
Cost after iteration 1: 0.725598
The Backward propagation works perfectly fine! gradientCheck = 2.254014894917003e-08
