In [1]:
import numpy as np

There are various types of dropouts. Below, I have done the scratch implementation of the inverted dropout. Inverted dropout has one simple twist in its implementation that makes it more optimized than the original standard dropout.

In Inverted dropout, after eliminating the neurons, we divide all the remaining neurons by the probability of keeping them. For example: if our dropout rate is 40% then we will keep the remaining 60% of the neurons activated. That means we will divide all the remaining neurons by 0.6 (probabilistic value). 

We have to do this because, when we randomly remove the neurons by making them zero, it will also decrease the value of the mean (expected value) of the layers by the rate of the dropout. For example: if we had 10 neurons and we dropped 50% i.e., 5 neurons, then the possibility that its expected value will also get reduced by 50%. 
This will cause a problem in the testing time since we do not use dropout while testing the model. So, to make this balance, scaling on the training phase like the above-mentioned step is done in inverted dropout.
There are other effective dropouts as well. Some of them are gaussian dropout and monte Carlo dropout. 

The dropout which I have implemented is used in the dense layer (fully connected layer). I read many blogs and article and what I found is we should avoide using dropout in convulation layers in CNN and same with the RNN. Inverted dropout should only be used in dense layers. Instead, we should use batch normalization to get the better result.


In [2]:
X = np.array([[2],[3]]) #Just a dummy data to test the code

In [3]:
X.shape

(2, 1)

In [3]:
Y = np.array([[1]])

In [None]:
z

In [6]:
def activation_pass(Z,activation,keepprob):
    #implementation of inverted dropout
    if activation == 'relu':
        a = np.where(0,Z<0,Z)
    
    else:
        a = 1/(1+np.exp(-Z))
        
    mask = np.random.rand(a.shape[0],a.shape[1]) < keepprob
    a = np.multiply(a,mask.astype(int))
    return a/keepprob
    

Dividing a by keepprob prevents network’s activations from getting too large, and does not require any changes to the network during evaluation. In contrast, traditional dropout requires scaling to be implemented during the test phase.



In [4]:
def parameters_initializer(l):
    '''
    initializes the parameters for each layer. 
    '''
    parameters = {}
    for i in range(1,len(l)):
        parameters["W"+str(i)] = np.random.randn(l[i],l[i-1])
        parameters["b"+str(i)] = np.zeros((l[i],1))
        
    print(parameters,'\n')
    return parameters

In [5]:
def linear_forward(X,W,b):
    #W.X + b
    z = np.dot(W,X) + b  
    return z
        

In [7]:
def cost_function(Y,yhat):
    '''
    binary cross-entropy as a cost function.
    '''
    m = Y.shape[1]
    return np.multiply(-1/m,np.dot(Y,yhat) + np.dot((1-Y),(1-yhat)))

In [8]:
def forward_prop(X,L,Y,keepprob=1):
    '''
    This is not full-fledged forward propagation. I will implement full-fledged soon.
    '''
    layer = len(L)
    parameters = parameters_initializer(L)

    X = X
    caches = {}
    
    
    for i in range(1,layer-1): #for hidden layers
        z = linear_forward(X, parameters["W"+str(i)], parameters["b"+str(i)])
        a = activation_pass(z,'relu',keepprob)
        X = a
        caches['z' + str(i)] = z
        caches["a" + str(i)] = a
        
    z = linear_forward(X, parameters["W"+str(layer-1)], parameters['b'+str(layer-1)]) #for output layer
    a = activation_pass(z,'sigmoid',keepprob)
    X = a
    caches['z' + str(i)] = z
    caches["a" + str(i)] = a
    
    cost_val = cost_function(Y,X)
    
    return caches, cost_val
    

In [9]:
L = [2,4,4,5,1] #number of layers and neurons in each layer.
forward_prop(X,L,Y,1)

{'W1': array([[-0.71314354,  0.11057287],
       [ 0.96394978, -0.37549555],
       [ 0.34669511,  0.30641827],
       [-1.01993101, -2.54204545]]), 'b1': array([[0.],
       [0.],
       [0.],
       [0.]]), 'W2': array([[ 0.98671014, -0.79219814,  0.86651908,  0.51284834],
       [ 1.16395767,  0.27965488, -1.5279927 , -0.25330069],
       [ 1.03145572,  1.55036054,  0.6327283 , -2.10131863],
       [ 1.15807224, -1.13090936,  1.16413935,  0.69559121]]), 'b2': array([[0.],
       [0.],
       [0.],
       [0.]]), 'W3': array([[-1.8885206 , -1.70873625, -0.75854098,  0.49487076],
       [-0.18948547,  0.82294103, -1.1826424 , -0.3775646 ],
       [ 1.03470308,  0.39170679, -0.73598235, -1.2953592 ],
       [ 1.17522386,  0.96359291, -0.0342983 , -1.54382562],
       [-0.00510098,  2.87909031, -0.01801583, -1.72076018]]), 'b3': array([[0.],
       [0.],
       [0.],
       [0.],
       [0.]]), 'W4': array([[ 1.83231561,  1.00296196, -1.31656502, -1.62101661, -1.38368064]]), 'b4': array

({'z1': array([[-1.09456847],
         [ 0.80141292],
         [ 1.61264501],
         [-9.66599837]]),
  'a1': array([[-1.09456847],
         [ 0.80141292],
         [ 1.61264501],
         [-9.66599837]]),
  'z2': array([[-5.27470319],
         [-1.06561805],
         [21.44518868],
         [-7.02015471]]),
  'a2': array([[-5.27470319],
         [-1.06561805],
         [21.44518868],
         [-7.02015471]]),
  'z3': array([[-37.33191721]]),
  'a3': array([[6.12286038e-17]])},
 array([[-6.12286038e-17]]))