### Forward propagation with dropout

**Exercise**: Implement forward propagation with dropout. You are using a 3-layer neural network and will add dropout to the first and second hidden layers. Dropout is not applied to the input layer or the output layer.

**Instructions**:
You would like to shut down some neurons in the first and second layers. To do that, you are going to carry out four steps:
1. In lecture, we discussed creating a variable $d^{[1]}$ with the same shape as $a^{[1]}$ using `np.random.rand()` to randomly get numbers between 0 and 1. Here, you will use a vectorized implementation, so create a random matrix $D^{[1]} = [d^{[1](1)} \; d^{[1](2)} \; \dots \; d^{[1](m)}]$ of the same dimension as $A^{[1]}$.
2. Set each entry of $D^{[1]}$ to 1 with probability `keep_prob` and to 0 with probability `1-keep_prob`, by thresholding the values in $D^{[1]}$ appropriately. Hint: to set each entry of a matrix X to 1 (if the entry is less than 0.5) or 0 (otherwise), you would do: `X = (X < 0.5)`. The result is a boolean matrix; in arithmetic, False and True behave as 0 and 1, respectively.
3. Set $A^{[1]}$ to $A^{[1]} * D^{[1]}$. (You are shutting down some neurons). You can think of $D^{[1]}$ as a mask, so that when it is multiplied with another matrix, it shuts down some of the values.
4. Divide $A^{[1]}$ by `keep_prob`. This ensures that the activations passed to the next layer keep the same expected value as without dropout, so the scale of the cost is not shifted. (This technique is also called inverted dropout.)
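
The four steps above can be sketched as a small helper. This is a minimal illustration, not the exercise's reference solution: the function name `dropout_forward`, the all-ones test matrix, and the seed are assumptions made for the example. It uses NumPy's `default_rng` generator instead of `np.random.rand()` only so that the run is reproducible.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Apply inverted dropout to activations A (hypothetical helper)."""
    # Steps 1-2: draw uniform numbers in [0, 1) and threshold them, so each
    # entry of D is 1 with probability keep_prob and 0 otherwise.
    D = (rng.random(A.shape) < keep_prob).astype(A.dtype)
    # Step 3: shut down the neurons whose mask entry is 0.
    A = A * D
    # Step 4: rescale the surviving activations by 1/keep_prob.
    A = A / keep_prob
    return A, D

rng = np.random.default_rng(0)    # arbitrary seed, for reproducibility only
A1 = np.ones((400, 1000))         # stand-in activations (assumed for the demo)
A1_drop, D1 = dropout_forward(A1, keep_prob=0.6, rng=rng)

print(D1.mean())       # fraction of kept units, close to keep_prob
print(A1_drop.mean())  # close to A1.mean(), thanks to the 1/keep_prob scaling
```

The mask `D1` is returned alongside the activations because the backward pass needs the same mask again.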

In [10]:
import numpy as np

X = np.array([
    [0,0],
    [0,1],
    [1,0],
    [1,1]
])

Y = np.array([
    [0],
    [0],
    [0],
    [1]
])

m = X.shape[0]
keep_prob = 0.6
num_nodes = 400

W1 = np.random.randn(num_nodes,X.shape[1])*0.1
b1 = np.zeros((num_nodes,1))

W2 = np.random.randn(1,num_nodes)*0.1
b2 = np.zeros((1,1))  # one bias for the single output unit; broadcasts over the m examples

X = X.T
Y = Y.T

costs = []

for i in range(4000):
    # Forward Prop
    # LAYER 1
    Z1 = np.dot(W1,X) + b1
    A1 = 1/(1+np.exp(-Z1))
    
    # Apply dropout to the hidden layer (layer 1)
    D1 = np.random.rand(A1.shape[0], A1.shape[1])
    D1 = D1 < keep_prob
    A1 = A1 * D1
    A1 = A1 / keep_prob
    
    # LAYER 2
    Z2 = np.dot(W2,A1) + b2
    A2 = 1/(1+np.exp(-Z2))
    
    # Back Prop
    dZ2 = A2 - Y
    dW2 = (1/m)*np.dot(dZ2,A1.T)
    db2 = (1/m)*np.sum(dZ2,axis=1,keepdims=True)
    
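    # Backprop through dropout: reuse the forward mask D1 and the 1/keep_prob scaling
    dA1 = np.dot(W2.T, dZ2) * D1 / keep_prob
    # Sigmoid derivative, recomputed from Z1 because A1 was overwritten by dropout
    S1 = 1/(1+np.exp(-Z1))
    dZ1 = dA1 * S1 * (1 - S1)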
    dW1 = (1/m)*np.dot(dZ1,X.T)
    db1 = (1/m)*np.sum(dZ1,axis=1,keepdims=True)
    
    # Gradient Descent
    W2 = W2 - 0.01*dW2
    b2 = b2 - 0.01*db2
    
    W1 = W1 - 0.01*dW1
    b1 = b1 - 0.01*db1
    
    # Loss 
    L = (-1/m)*np.sum(Y*np.log(A2) + (1-Y)*np.log(1-A2))
    L = np.squeeze(L)
    costs.append(L)
    if i%500 == 0:
        print("=======================================")
        print("Loss = ",L)
        print(Y,"===",A2)

Loss =  0.7333137291590309
[[0 0 0 1]] === [[0.33211028 0.48670396 0.78340635 0.71677886]]
Loss =  0.389660193981372
[[0 0 0 1]] === [[0.08466573 0.36032529 0.16109699 0.42839092]]
Loss =  0.503711467174047
[[0 0 0 1]] === [[0.11836645 0.3867019  0.28488285 0.34484715]]
Loss =  0.38039513573676753
[[0 0 0 1]] === [[0.05191123 0.44623436 0.22913318 0.53954988]]
Loss =  0.26417223114916083
[[0 0 0 1]] === [[0.09689711 0.12925555 0.2825547  0.61612537]]
Loss =  0.15357637721061695
[[0 0 0 1]] === [[0.04646957 0.12845001 0.08485497 0.71136702]]
Loss =  0.1557863015917216
[[0 0 0 1]] === [[0.00136849 0.18256184 0.04861382 0.69048535]]
Loss =  0.12344912810594039
[[0 0 0 1]] === [[0.04198774 0.29365151 0.00955837 0.91060058]]
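
For the backward pass, dropout has to be handled consistently with the forward pass: the gradient flowing into the hidden layer is multiplied by the same mask and divided by `keep_prob`, and the sigmoid derivative is evaluated on the pre-dropout activations. The following is a minimal sketch under those assumptions for a one-hidden-layer sigmoid network like the one in the cell above. The names `sigmoid` and `forward_backward`, the hidden size of 5, and the seed are illustrative choices, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_backward(X, Y, W1, b1, W2, b2, keep_prob, rng):
    """One forward/backward pass with inverted dropout on the hidden layer."""
    m = X.shape[1]
    # Forward pass
    Z1 = W1 @ X + b1
    S1 = sigmoid(Z1)                         # activations before dropout
    D1 = rng.random(S1.shape) < keep_prob    # boolean dropout mask
    A1 = S1 * D1 / keep_prob                 # masked and rescaled activations
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward pass (sigmoid output with cross-entropy loss)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dA1 = (W2.T @ dZ2) * D1 / keep_prob      # same mask, same scaling
    dZ1 = dA1 * S1 * (1 - S1)                # sigmoid derivative
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return A2, (dW1, db1, dW2, db2)

rng = np.random.default_rng(1)               # arbitrary seed
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)    # inputs, shape (2, m)
Y = np.array([[0, 0, 0, 1]], dtype=float)    # AND targets, shape (1, m)
W1 = rng.standard_normal((5, 2)) * 0.1
b1 = np.zeros((5, 1))
W2 = rng.standard_normal((1, 5)) * 0.1
b2 = np.zeros((1, 1))

A2, grads = forward_backward(X, Y, W1, b1, W2, b2, keep_prob=0.6, rng=rng)
print([g.shape for g in grads])
```

Each gradient has the same shape as the parameter it updates, so the gradient descent step can be applied elementwise.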
