## Building your Deep Neural Network: step by step

Welcome to your Week 4 assignment (part 1 of 2)! Previously, you trained a 2-layer neural network with a single hidden layer. This week, you will build a deep neural network with as many layers as you want.
- In this notebook, you'll implement all the functions required to build a deep neural network.
- In the next assignment, you'll use these functions to build a deep neural network for image classification.

#### By the end of this assignment, you'll be able to:
- Use non-linear units like ReLU to improve your model.
- Build a deeper neural network (with more than 1 hidden layer).
- Implement an easy-to-use neural network class.



### 1 - Packages

First, import all the packages you'll need during this assignment.
- numpy is the core package for array computing in Python.
- matplotlib is a library for plotting graphs in Python.
- dnn_utils provides some necessary functions for this notebook.
- testCases provides some test cases to assess the correctness of your functions.
- np.random.seed(1) keeps all the random function calls consistent, which helps grade your work. Please don't change the seed!
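
To see why a fixed seed keeps the random calls consistent, here is a minimal sketch, best run in a scratch session rather than in the graded notebook. The variable names and the array shape are arbitrary choices made for illustration only.

```python
import numpy as np

np.random.seed(1)
first = np.random.randn(2, 3)    # first draw after seeding

np.random.seed(1)                # resetting the seed restarts the sequence
second = np.random.randn(2, 3)   # identical to `first`

third = np.random.randn(2, 3)    # no reset: continues the sequence, so it differs

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```

Because the grader compares your printed values against values generated with the same seed, changing the seed would change every randomly initialized weight.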



In [9]:
import numpy as np
import h5py
import matplotlib.pyplot as plt

from testCases import *
from dnn_utils import sigmoid, sigmoid_backward, relu, relu_backward
from public_tests import *

import copy

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

### 2 - Outline

To build your neural network, you'll be implementing several "helper functions". These helper functions will be used in the next assignment to build a two-layer neural network and an L-layer neural network.

Each small helper function will have detailed instructions to walk you through the necessary steps. Here is an outline of the steps in this assignment:
- Initialize the parameters for a two-layer network and for an L-layer neural network.
- Implement the forward propagation module (shown in purple in the figure below).
    - Complete the LINEAR part of a layer's forward propagation step (resulting in Z).
    - The ACTIVATION function is provided for you (relu, sigmoid).
    - Combine the previous two steps into a new forward function.
    - Stack the forward function L-1 times and add a sigmoid at the end. This gives you a new L_model_forward function.
- Compute the loss.
- Implement the backward propagation module.
- Finally, update the parameters.



### 3 - Initialization

You will write two helper functions to initialize the parameters for your model. The first function will be used to initialize parameters for a two-layer model. The second one generalizes this initialization process to L layers.


### 3.1 - 2-layer Neural Network 

#### Exercise 1 - initialize_parameters
Create and initialize the parameters of the 2-layer neural network.

**Instructions**
- The model's structure is: LINEAR -> RELU -> LINEAR -> SIGMOID.
- Use random initialization for the weight matrices: `np.random.randn(d0, d1) * 0.01` with the correct shape.
- Use zero initialization for the biases: `np.zeros(shape)` with the correct shape.


In [10]:
def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    parameters -- python dictionary containing your parameters:
        W1 -- weight matrix of shape (n_h, n_x)
        b1 -- bias vector of shape (n_h, 1)
        W2 -- weight matrix of shape (n_y, n_h)
        b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    # YOUR CODE STARTS HERE
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    
    # YOUR CODE ENDS HERE
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters    
    

In [11]:
print("Test Case 1:\n")
parameters = initialize_parameters(3,2,1)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

initialize_parameters_test_1(initialize_parameters)

print("\033[90m\nTest Case 2:\n")
parameters = initialize_parameters(4,3,2)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

initialize_parameters_test_2(initialize_parameters)

Test Case 1:

W1 = [[ 0.01624345 -0.00611756 -0.00528172]
 [-0.01072969  0.00865408 -0.02301539]]
b1 = [[0.]
 [0.]]
W2 = [[ 0.01744812 -0.00761207]]
b2 = [[0.]]
All tests passed.

Test Case 2:

W1 = [[ 0.01624345 -0.00611756 -0.00528172 -0.01072969]
 [ 0.00865408 -0.02301539  0.01744812 -0.00761207]
 [ 0.00319039 -0.0024937   0.01462108 -0.02060141]]
b1 = [[0.]
 [0.]
 [0.]]
W2 = [[-0.00322417 -0.00384054  0.01133769]
 [-0.01099891 -0.00172428 -0.00877858]]
b2 = [[0.]
 [0.]]
All tests passed.


### 3.2 - L-layer Neural Network

The initialization for a deeper L-layer neural network is more complicated because there are many more weight matrices and bias vectors.
When completing the initialize_parameters_deep function, you should make sure that your dimensions match between each layer.
For example, if the size of your input X is (12288, 209) (with m = 209 examples), then:

<table style="width:100%">
    <tr>
        <td>  </td>
        <td> <b>Shape of W</b> </td>
        <td> <b>Shape of b</b> </td>
        <td> <b>Activation</b> </td>
        <td> <b>Shape of Activation</b> </td>
    </tr>
    <tr>
        <td> <b>Layer 1</b> </td>
        <td> $(n^{[1]}, 12288)$ </td>
        <td> $(n^{[1]}, 1)$ </td>
        <td> $Z^{[1]} = W^{[1]} X + b^{[1]}$ </td>
        <td> $(n^{[1]}, 209)$ </td>
    </tr>
    <tr>
        <td> <b>Layer 2</b> </td>
        <td> $(n^{[2]}, n^{[1]})$ </td>
        <td> $(n^{[2]}, 1)$ </td>
        <td> $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ </td>
        <td> $(n^{[2]}, 209)$ </td>
    </tr>
    <tr>
        <td> $\vdots$ </td>
        <td> $\vdots$ </td>
        <td> $\vdots$ </td>
        <td> $\vdots$ </td>
        <td> $\vdots$ </td>
    </tr>
    <tr>
        <td> <b>Layer L-1</b> </td>
        <td> $(n^{[L-1]}, n^{[L-2]})$ </td>
        <td> $(n^{[L-1]}, 1)$ </td>
        <td> $Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$ </td>
        <td> $(n^{[L-1]}, 209)$ </td>
    </tr>
    <tr>
        <td> <b>Layer L</b> </td>
        <td> $(n^{[L]}, n^{[L-1]})$ </td>
        <td> $(n^{[L]}, 1)$ </td>
        <td> $Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$ </td>
        <td> $(n^{[L]}, 209)$ </td>
    </tr>
</table>
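
The shape pattern in the table can be checked numerically. The sketch below is only an illustration: the list `layer_dims` and its hidden-layer sizes (20, 7, 5, 1) are hypothetical values chosen for this example, and zero arrays stand in for real weights and data because only the shapes matter here.

```python
import numpy as np

m = 209                              # number of examples, as in the table
layer_dims = [12288, 20, 7, 5, 1]    # hypothetical sizes; layer_dims[0] is the input size

A = np.zeros((layer_dims[0], m))     # stand-in for the input X
shapes = []
for l in range(1, len(layer_dims)):
    W = np.zeros((layer_dims[l], layer_dims[l - 1]))   # (n[l], n[l-1])
    b = np.zeros((layer_dims[l], 1))                   # (n[l], 1)
    Z = W @ A + b                                      # (n[l], m) after broadcasting b
    shapes.append((W.shape, b.shape, Z.shape))
    print(f"Layer {l}: W {W.shape}, b {b.shape}, Z {Z.shape}")
    A = Z   # an activation is applied element-wise, so it would not change the shape
```

If a weight matrix had its dimensions swapped, the product `W @ A` would raise a shape error at that layer, which is a quick way to spot an initialization mistake.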



Remember that when you compute $WX + b$ in Python, broadcasting is carried out. For example, if:


$$ W = \begin{bmatrix}
    w_{00}  & w_{01} & w_{02} \\
    w_{10}  & w_{11} & w_{12} \\
    w_{20}  & w_{21} & w_{22} 
\end{bmatrix}\;\;\; X = \begin{bmatrix}
    x_{00}  & x_{01} & x_{02} \\
    x_{10}  & x_{11} & x_{12} \\
    x_{20}  & x_{21} & x_{22} 
\end{bmatrix} \;\;\; b =\begin{bmatrix}
    b_0  \\
    b_1  \\
    b_2
\end{bmatrix}\tag{2}$$

Then $WX + b$ will be:

$$ WX + b = \begin{bmatrix}
    (w_{00}x_{00} + w_{01}x_{10} + w_{02}x_{20}) + b_0 & (w_{00}x_{01} + w_{01}x_{11} + w_{02}x_{21}) + b_0 & \cdots \\
    (w_{10}x_{00} + w_{11}x_{10} + w_{12}x_{20}) + b_1 & (w_{10}x_{01} + w_{11}x_{11} + w_{12}x_{21}) + b_1 & \cdots \\
    (w_{20}x_{00} + w_{21}x_{10} + w_{22}x_{20}) + b_2 &  (w_{20}x_{01} + w_{21}x_{11} + w_{22}x_{21}) + b_2 & \cdots
\end{bmatrix}\tag{3}  $$
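
The same behavior can be reproduced in NumPy. In this small sketch the numeric values are arbitrary and chosen only to make the result easy to read; `X` is the identity matrix, so $WX = W$ and the effect of adding `b` to every column is visible directly.

```python
import numpy as np

W = np.arange(9.0).reshape(3, 3)          # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
X = np.eye(3)                             # identity, so W @ X equals W
b = np.array([[10.0], [20.0], [30.0]])    # column vector of shape (3, 1)

Z = W @ X + b                             # b is broadcast across the 3 columns

# Same computation with b copied explicitly into each column
Z_explicit = W @ X + np.tile(b, (1, X.shape[1]))

print(Z)
# [[10. 11. 12.]
#  [23. 24. 25.]
#  [36. 37. 38.]]
print(np.array_equal(Z, Z_explicit))      # True
```

Each row of the result has the same bias added to every column, matching the $b_0$, $b_1$, $b_2$ pattern in the expanded matrix above.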




#### Exercise 2 - initialize_parameters_deep

Implement initialization for an L-layer neural network.

**Instructions**
- The model's structure is [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID. That is, it has L-1 layers using a ReLU activation function, followed by an output layer with a sigmoid activation function.
- 