# Dropout Regularization

We will now implement forward and backward propagation with dropout regularization.


Our simple neural network will have one input layer, one hidden layer with a tanh activation, and an output layer with a sigmoid activation. We choose tanh because its derivative is easy to calculate.
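As a quick illustration of why the tanh derivative is convenient, the sketch below compares the closed form 1 - tanh(z)^2 with a central finite difference. The helper name `tanh_derivative` and the sample points are assumptions made for this example only.

```python
import numpy as np

def tanh_derivative(z):
    """Closed-form derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

# A few sample points (arbitrary, chosen for illustration)
z = np.linspace(-2.0, 2.0, 9)

# Central finite-difference approximation of the derivative
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

# The two should agree up to small numerical error
max_abs_diff = np.max(np.abs(numeric - tanh_derivative(z)))
print(max_abs_diff)
```

Because the derivative is written in terms of tanh itself, the backward pass can reuse values that the forward pass has already computed.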

To begin, we implement forward propagation. 

In [1]:
import numpy as np

def forward_prop_dropout(n_h, n_f, n_O, X, keep_prob):
    
    '''
    Description: This function performs forward propagation with dropout.
    
    input - 
    n_h - number of nodes in hidden layers
    n_O - number of nodes in output layers
    n_f - number of features in input layer
    X - training samples
    keep_prob - keep probability 
    
    Output - 
    
    forward - the results from the forward prop
    parameters - network parameters
    
    '''
    
    # We start by initializing the weights and biases
    # (no trailing commas, which would turn each array into a one-element tuple)
    W1 = np.random.randn(n_h, n_f) * 0.1
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_O, n_h) * 0.1
    b2 = np.zeros((n_O, 1))
    
    #Next, we perform forward propagation for the layers. 
    
    # First, linear calculation for hidden layer
    # W1 has shape (n_h, n_f) and X has shape (n_f, m), so no transpose is needed
    Z1 = np.dot(W1, X) + b1
    
    # Activation for hidden layer
    A = np.tanh(Z1)
    
    # Get dropout mask for hidden layer with the same shape as activation output
    D = np.random.rand(A.shape[0], A.shape[1])
    
    #Convert D to a boolean mask (True = keep) based on the keep probability 
    D = D < keep_prob
    
    #Multiply A with D so as to randomly drop some nodes
    A = A * D
    
    #Divide A by keep_prob so the expected value of the activations is unchanged
    A = A / keep_prob
    
    # Linear calculation for output layer, fed by the hidden activation A
    Z2 = np.dot(W2, A) + b2
    
    # Activation for output layer
    Y_out = 1 / (1 + (np.exp(-Z2)))
    
    #Save initialized weights
    parameters = {
        'W1': W1,
        'b1': b1,
        'W2': W2,
        'b2': b2
    }
    
    # get results from forward propagation
    forward = {"Z1": Z1,
               "A": A,
               "Z2": Z2,
               "Y_out": Y_out,
               "D": D
              }
    
    return parameters, forward
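The masking and scaling steps in the forward pass are often called inverted dropout. The following is a minimal, self-contained sketch of just those two steps; the all-ones activation matrix, its shape, and the seed are assumptions made for illustration. It suggests that dividing by `keep_prob` keeps the average activation roughly unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed fixed only for reproducibility
keep_prob = 0.8

# Stand-in for a hidden activation matrix (hypothetical values)
A = np.ones((50, 1000))

# Boolean mask: True where a node is kept
D = rng.random(A.shape) < keep_prob

# Drop the masked nodes, then rescale the survivors
A_dropped = (A * D) / keep_prob

kept_fraction = D.mean()        # should be close to keep_prob
mean_after = A_dropped.mean()   # should stay close to the original mean of 1.0
print(kept_fraction, mean_after)
```

Since the scaling happens during training, no extra rescaling is needed at prediction time, when dropout is switched off.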
    

We will now proceed to perform backward propagation with dropout regularization in mind.

In [2]:
#First, get results from forward propagation by running the code below with numbers in place of the arguments

#  parameters, forward = forward_prop_dropout(n_h, n_f, n_O, X, keep_prob)

def backward_prop_dropout(X, Y, forward, parameters, keep_prob):
    
    '''
    Description: This function performs backward propagation with dropout.
    
    input - 
    X - training samples
    Y - output value
    forward - forward prop results
    parameters - parameters of the neural net
    keep_prob - keep probability 
    
    Output - 
    
    gradients - the calculated gradients
    
    '''
    
    
    #Get sample size
    sample_size = X.shape[1]
    
    #Get parameters 
    Z1 = forward['Z1']
    A = forward['A']
    Z2 = forward['Z2']
    Y_out = forward['Y_out']
    D = forward['D']
    
    
    #Calculate gradients for output layer
    dZ2= Y_out - Y
    dW2 = (1/sample_size) * np.dot(dZ2, A.T)
    db2 = (1/sample_size) * np.sum(dZ2, axis=1, keepdims=True)
    
    
    #calculate gradients for hidden layer with drop out in mind
    
    '''
    The gradient calculation is the same as without dropout, except that the mask and scaling
    applied to the hidden activation in the forward prop are also applied to its gradient.
    '''
    
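    #calculate gradient for dA (gradient with respect to the hidden activation)
    dA = np.dot(parameters['W2'].T, dZ2)
    
    #Multiply by drop out binary so dropped nodes receive no gradient
    dA = dA * D
    
    # divide by the keep probability so as to scale
    dA = dA / keep_prob
    
    #Backpropagate through tanh; A was masked and scaled, so recompute tanh from Z1
    dZ1 = dA * (1 - np.power(np.tanh(Z1), 2))
    
    #calculate other gradients
    dW1 = (1/sample_size) * np.dot(dZ1, X.T)
    db1 = (1/sample_size) * np.sum(dZ1, axis=1, keepdims=True)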
    
    # store calculated gradients
    
    gradients = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return gradients
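One way to gain confidence in a dropout backward pass is a numerical gradient check with the mask held fixed. The sketch below is self-contained and does not call the functions above. It assumes a mean binary cross-entropy loss (which is consistent with `dZ2 = Y_out - Y` for a sigmoid output), a single output node, and small random data. The helper names `sigmoid`, `loss_with_mask` and `grad_W1` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_with_mask(W1, b1, W2, b2, X, Y, D, keep_prob):
    """Mean binary cross-entropy with a fixed dropout mask D (assumed loss)."""
    A = np.tanh(W1 @ X + b1) * D / keep_prob
    Y_out = sigmoid(W2 @ A + b2)
    return -np.mean(Y * np.log(Y_out) + (1 - Y) * np.log(1 - Y_out))

def grad_W1(W1, b1, W2, b2, X, Y, D, keep_prob):
    """Analytic gradient of the loss with respect to W1."""
    m = X.shape[1]
    A_raw = np.tanh(W1 @ X + b1)            # activation before dropout
    A = A_raw * D / keep_prob               # activation after dropout
    dZ2 = sigmoid(W2 @ A + b2) - Y
    dA = (W2.T @ dZ2) * D / keep_prob       # same mask and scaling as forward
    dZ1 = dA * (1 - A_raw ** 2)             # tanh derivative uses the raw activation
    return (dZ1 @ X.T) / m

# Small random problem (sizes and seed are arbitrary)
rng = np.random.default_rng(1)
n_f, n_h, m, keep_prob = 3, 4, 5, 0.8
X = rng.standard_normal((n_f, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W1 = rng.standard_normal((n_h, n_f)) * 0.1
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.1
b2 = np.zeros((1, 1))
D = rng.random((n_h, m)) < keep_prob        # fixed mask shared by both passes

analytic = grad_W1(W1, b1, W2, b2, X, Y, D, keep_prob)

# Central finite differences, one weight at a time
numeric = np.zeros_like(W1)
eps = 1e-5
for i in range(n_h):
    for j in range(n_f):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric[i, j] = (
            loss_with_mask(W_plus, b1, W2, b2, X, Y, D, keep_prob)
            - loss_with_mask(W_minus, b1, W2, b2, X, Y, D, keep_prob)
        ) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

The check only makes sense when the same mask `D` is reused for every loss evaluation; drawing a new mask each time would make the loss random and the comparison meaningless.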

### There you have it – the implementation of drop out regularization from scratch in forward propagation and backward propagation. 