# Dropout Layer
In this notebook, we will look into the dropout layer.
Consider a standard network with three hidden layers, shown below in (a):
<img src="https://raw.githubusercontent.com/stencilman/CS763_Spring2017/master/Lec3%2C4/dropout_base_network.png" style="width:400px;"/>
During training with dropout, we randomly set some neurons to zero in the <b>forward pass</b>, as shown in the code snippet below. The network with dropout applied is shown in (b).

In [20]:
require 'nn';
p = 0.5  -- probability of dropping each unit (a unit is kept with probability 1 - p)
x = torch.rand(5)
L1 = nn.Linear(5, 5)
L2 = nn.Linear(5, 5)
L3 = nn.Linear(5, 5)
L4 = nn.Linear(5, 1)


H1 = L1:forward(x)
print('Hidden Layer 1 output before dropout:')
print(H1)
U1 = torch.rand(H1:size(1)):gt(p):double()
print('Dropout Mask:')
print(U1)
H1 = H1:cmul(U1)
print('Hidden Layer 1 output after dropout')
print(H1)
print('-----------------------------------')

H2 = L2:forward(H1)
print('Hidden Layer 2 output before dropout')
print(H2)
U2 = torch.rand(H2:size(1)):gt(p):double()
print('Dropout Mask:')
print(U2)
H2 = H2:cmul(U2)
print('Hidden Layer 2 output after dropout')
print(H2)
print('-----------------------------------')

H3 = L3:forward(H2)
print('Hidden Layer 3 output before dropout')
print(H3)
U3 = torch.rand(H3:size(1)):gt(p):double()
print('Dropout Mask:')
print(U3)
H3 = H3:cmul(U3)
print('Hidden Layer 3 output after dropout')
print(H3)
print('-----------------------------------')

out = L4:forward(H3)

Hidden Layer 1 output before dropout:
 0.2164
 0.0931
-0.3029
-0.6296
 0.3302
[torch.DoubleTensor of size 5]

Dropout Mask:
 1
 1
 1
 1
 0
[torch.DoubleTensor of size 5]

Hidden Layer 1 output after dropout
 0.2164
 0.0931
-0.3029
-0.6296
 0.0000
[torch.DoubleTensor of size 5]

-----------------------------------
Hidden Layer 2 output before dropout
 0.1721
 0.0472
 0.1428
 0.1557
 0.0723
[torch.DoubleTensor of size 5]

Dropout Mask:
 1
 1
 0
 0
 1
[torch.DoubleTensor of size 5]

Hidden Layer 2 output after dropout
 0.1721
 0.0472
 0.0000
 0.0000
 0.0723
[torch.DoubleTensor of size 5]

-----------------------------------
Hidden Layer 3 output before dropout
-0.3645
-0.1621
-0.0378
-0.0005
-0.1822
[torch.DoubleTensor of size 5]

Dropout Mask:
 0
 0
 0
 1
 1
[torch.DoubleTensor of size 5]

Hidden Layer 3 output after dropout
-0.0000
-0.0000
-0.0000
-0.0005
-0.1822
[torch.DoubleTensor of size 5]

-----------------------------------	
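For readers without a Torch install, the masking step from the cell above can be sketched in plain Lua. This is an illustration under the same assumption the cell makes, namely that `p` is the probability of dropping a unit (a unit survives when its uniform draw exceeds `p`); the helper names `dropout_mask` and `apply_mask` are hypothetical and not part of Torch's `nn` package.

```lua
-- Plain-Lua sketch of a dropout forward pass (no Torch needed).
-- Assumption: p is the probability of DROPPING a unit, matching
-- torch.rand(n):gt(p) in the cell above (unit kept when u > p).

-- Draw a 0/1 mask with one independent uniform sample per unit.
local function dropout_mask(n, p)
  local mask = {}
  for i = 1, n do
    mask[i] = (math.random() > p) and 1 or 0
  end
  return mask
end

-- Element-wise product of activations and mask (like H:cmul(U)),
-- returned as a new table so the input is left unchanged.
local function apply_mask(h, mask)
  local out = {}
  for i = 1, #h do
    out[i] = h[i] * mask[i]
  end
  return out
end

math.randomseed(os.time())
-- Example activations copied from "Hidden Layer 1 output before dropout".
local h = {0.2164, 0.0931, -0.3029, -0.6296, 0.3302}
local mask = dropout_mask(#h, 0.5)
local dropped = apply_mask(h, mask)
for i = 1, #h do
  print(string.format("%8.4f  x %d  = %8.4f", h[i], mask[i], dropped[i]))
end
```

Each run draws a fresh mask, so which rows become zero changes from run to run, just as in the Torch cell.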


However, we must compensate for dropout so that the expected magnitude of the activations is the same in the training and test phases. During training each unit is kept only with probability 1 - p, while at test time every unit is active, so we scale the activations down by 1 - p in the test-time forward pass.
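As a short worked check, assuming (as in the code) that p is the drop probability: let m be a unit's mask value, equal to 1 with probability 1 - p and 0 with probability p, and let h be its activation before dropout. Its expected value during training is

```latex
\mathbb{E}[m\,h] = (1-p)\cdot h + p\cdot 0 = (1-p)\,h
```

so multiplying the test-time activation by 1 - p (0.5 for p = 0.5) matches the training-time expectation.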

In [21]:
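require 'nn';
p = 0.5  -- probability of dropping each unit (kept with probability 1 - p)
x = torch.rand(5)
L1 = nn.Linear(5, 5)
L2 = nn.Linear(5, 5)
L3 = nn.Linear(5, 5)
L4 = nn.Linear(5, 1)

-- Training: zero out each hidden unit independently with probability p.
function forward_train(x)
    local H1 = L1:forward(x)
    local U1 = torch.rand(H1:size(1)):gt(p):double()  -- 1 = keep, 0 = drop
    H1 = H1:cmul(U1)

    local H2 = L2:forward(H1)
    local U2 = torch.rand(H2:size(1)):gt(p):double()
    H2 = H2:cmul(U2)

    local H3 = L3:forward(H2)
    local U3 = torch.rand(H3:size(1)):gt(p):double()
    H3 = H3:cmul(U3)

    return L4:forward(H3)
end

-- Test: no units are dropped, so scale each hidden layer's activations by
-- the keep probability (1 - p) to match their expected training-time value.
function forward_test(x)
    local H1 = L1:forward(x) * (1 - p)
    local H2 = L2:forward(H1) * (1 - p)
    local H3 = L3:forward(H2) * (1 - p)
    return L4:forward(H3)
end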

Alternatively, we can scale the activations up by 1 / (1 - p) at training time, a variant usually called inverted dropout, so that the test-time code remains untouched.
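Writing m for a unit's 0/1 mask value (1 with probability 1 - p) and h for its activation, dividing the kept units by 1 - p makes the training-time expectation equal the undropped activation, which is why the test-time pass needs no correction:

```latex
\mathbb{E}\!\left[\frac{m\,h}{1-p}\right] = \frac{(1-p)\,h}{1-p} = h
```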

In [None]:
-- Inverted dropout: scale the kept units up by 1 / (1 - p) during training.
function forward_train(x)
    local H1 = L1:forward(x)
    local U1 = torch.rand(H1:size(1)):gt(p):double()
    H1 = H1:cmul(U1) / (1 - p)

    local H2 = L2:forward(H1)
    local U2 = torch.rand(H2:size(1)):gt(p):double()
    H2 = H2:cmul(U2) / (1 - p)

    local H3 = L3:forward(H2)
    local U3 = torch.rand(H3:size(1)):gt(p):double()
    H3 = H3:cmul(U3) / (1 - p)

    return L4:forward(H3)
end

-- Test: no dropout and no rescaling.
function forward_test(x)
    local H1 = L1:forward(x)
    local H2 = L2:forward(H1)
    local H3 = L3:forward(H2)
    return L4:forward(H3)
end
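To see numerically that the two scaling schemes line up, the following plain-Lua sketch (no Torch; the helper names `dropout` and `mean_of_passes` are hypothetical) averages many dropout passes over the example activations from the first cell's output. Assuming p is the drop probability, the plain average should approach (1 - p) h, the value the first scheme uses at test time, while the inverted average should approach h itself.

```lua
-- Plain-Lua Monte Carlo check of the two scaling schemes for one layer's
-- activations. Assumption: p is the drop probability, as in the cells above.

-- One dropout pass; if inverted is true, kept units are scaled by 1/(1-p).
local function dropout(h, p, inverted)
  local out = {}
  local scale = inverted and 1 / (1 - p) or 1
  for i = 1, #h do
    local keep = math.random() > p
    out[i] = keep and h[i] * scale or 0
  end
  return out
end

-- Average n independent dropout passes element-wise.
local function mean_of_passes(h, p, inverted, n)
  local sum = {}
  for i = 1, #h do sum[i] = 0 end
  for _ = 1, n do
    local out = dropout(h, p, inverted)
    for i = 1, #h do sum[i] = sum[i] + out[i] end
  end
  for i = 1, #h do sum[i] = sum[i] / n end
  return sum
end

math.randomseed(42)
local p, n = 0.5, 100000
local h = {0.2164, 0.0931, -0.3029, -0.6296, 0.3302}
local plain = mean_of_passes(h, p, false, n)    -- should approach (1 - p) * h
local inverted = mean_of_passes(h, p, true, n)  -- should approach h
for i = 1, #h do
  print(string.format("h=%8.4f  (1-p)h=%8.4f  plain=%8.4f  inverted=%8.4f",
    h[i], (1 - p) * h[i], plain[i], inverted[i]))
end
```

The printed averages are random estimates, so with n = 100000 they agree with their targets only to roughly two or three decimal places.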