
# DropOut during back-propagation

I suddenly wondered: what happens if I apply dropout during back-propagation, that is, on the loss itself?

To find out, I built a simple fully connected (FC) model.

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms 
import argparse
import time

In [None]:
parser = argparse.ArgumentParser()

parser.add_argument("--batchsize", type = int, default = 32)
parser.add_argument("--epochs", type = int, default = 10)
parser.add_argument("--betas", type = float, nargs = 2, default = (0.9, 0.999))  # two floats for Adam
parser.add_argument("--lr", type = float, default = 0.001)
parser.add_argument("--img", type = int, default = 28*28)

opt = parser.parse_args(args=[])

In [None]:
# Scale pixels to [-1, 1]; the same transform is used for both splits.
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize([0.5], [0.5])])
train_dataset = datasets.MNIST(root = "../MNIST", train = True, download=True, transform = transform)
test_dataset = datasets.MNIST(root = "../MNIST", train = False, download=True, transform = transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=opt.batchsize, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=opt.batchsize, shuffle = False)



I compare two ways of applying dropout:

1.   Two dropout layers in forward()
2.   A single dropout applied to the loss
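One detail matters for the first variant: `F.dropout` has `training=True` as its default, so unless `training=self.training` is passed it keeps dropping units even after `model.eval()`. The sketch below shows the difference on a toy input. `TinyNet` is a hypothetical module made up for this illustration, not the FC model of this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):  # hypothetical toy module, only for illustration
    def __init__(self, p=0.5):
        super().__init__()
        self.drop = nn.Dropout(p)  # follows model.train() / model.eval()

    def forward(self, x):
        return self.drop(x)


x = torch.ones(1000)
net = TinyNet()

net.eval()
eval_out = net(x)                      # dropout disabled: output equals input
functional_out = F.dropout(x, p=0.5)   # training=True by default, still drops

net.train()
train_out = net(x)                     # entries are either 0 or scaled by 1/(1-p)

print("eval mode unchanged:", torch.equal(eval_out, x))
print("functional call zeroed some entries:", bool((functional_out == 0).any()))
print("train mode values:", train_out.unique().tolist())
```

If the evaluation accuracy of the first variant looks surprisingly low, this is one thing worth ruling out.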



In [None]:
device = torch.device("cpu")

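class FC(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1 = nn.Linear(opt.img, 256)
    self.fc2 = nn.Linear(256, 128)
    self.fc3 = nn.Linear(128, 10)

    self.BN1 = nn.BatchNorm1d(256)
    self.BN2 = nn.BatchNorm1d(128)

  def forward(self, x):
    # Flatten each 28x28 image into a vector of length opt.img.
    x = x.view(-1, opt.img)

    x = self.fc1(x)
    x = self.BN1(x)
    x = F.relu(x)
    # Variant 1: uncomment both dropout lines to use dropout in the forward pass.
    #x = F.dropout(x, p = 0.5, training = self.training)

    x = self.fc2(x)
    x = self.BN2(x)
    x = F.relu(x)
    #x = F.dropout(x, p = 0.5, training = self.training)

    x = self.fc3(x)
    # Pass dim explicitly (the implicit choice is deprecated); dim=1 is the class dimension.
    x = F.log_softmax(x, dim = 1)

    return x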

In [None]:
model = FC().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr = opt.lr, betas = opt.betas)
criterion = nn.CrossEntropyLoss()

In [None]:
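def eval(model, test_loader):
  # Note: this name shadows Python's built-in eval(); it is kept because the training loop calls it.
  model.eval()
  acc = 0
  loss = 0
  with torch.no_grad():
    for img, label in test_loader:
      img = img.to(device)
      label = label.to(device)
      output = model(img)
      # criterion returns the mean loss of the batch.
      loss += criterion(output, label)
      # The index of the largest log-probability is the predicted class.
      pred = output.max(1, keepdim = True)[1]
      acc += pred.eq(label.view_as(pred)).sum().item()

  # The loss is a sum of batch means divided by the number of samples,
  # so it is roughly batchsize times smaller than the per-sample mean.
  return loss/len(test_loader.dataset), 100*acc/len(test_loader.dataset)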
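A side note on the evaluation loss: the function above sums per-batch mean losses and divides by the number of samples, which explains why the printed "Evaluate Loss" is so small. The sketch below compares that quantity with a per-sample mean. The function names and all numbers are made up for illustration.

```python
def notebook_style_loss(batch_means, batch_sizes):
    """Sum of per-batch mean losses divided by the total sample count."""
    return sum(batch_means) / sum(batch_sizes)


def per_sample_mean_loss(batch_means, batch_sizes):
    """Weight each batch mean by its batch size to recover the per-sample mean."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


batch_means = [0.10, 0.20, 0.40]  # made-up per-batch mean losses
batch_sizes = [32, 32, 16]        # the last batch may be smaller than the others

small = notebook_style_loss(batch_means, batch_sizes)
true_mean = per_sample_mean_loss(batch_means, batch_sizes)

print("notebook-style value:", small)
print("per-sample mean:", true_mean)
```

The accuracy is not affected by this, only the scale of the reported loss.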

# Adding a DropOut layer on the loss function

If you run this code and compare the result with the variant that uses two DropOut layers in forward(), you will find that this model performs much better.

At first I thought, "Wow, I found something new!" But after running the model without any DropOut layer, I realized that the model with no DropOut at all is the best of the three.

So the improvement was a coincidence and this turned out to be meaningless work. Still, I'm sure that one day my curiosity will find something cool.
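To see what dropout does to a loss, it helps to look at a single number. As far as I understand `F.dropout`, it treats a scalar like any other tensor: with probability `p` the value becomes 0, otherwise it is scaled by `1/(1-p)`. The gradient is zeroed or scaled in the same way. The sketch below checks this on made-up values; the seed, the stand-in loss of 1.0 and the toy parameter `w` are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable

p = 0.3
loss = torch.tensor(1.0)  # stand-in for a batch loss; the value is made up

# Apply dropout to the same scalar many times and collect the outcomes.
samples = torch.stack([F.dropout(loss, p=p) for _ in range(2000)])
zero_fraction = (samples == 0).float().mean().item()

print("distinct outcomes:", samples.unique().tolist())
print("fraction zeroed:", zero_fraction)
print("mean of outcomes:", samples.mean().item())

# The same happens to gradients: each backward pass gives either 0
# or the usual gradient scaled by 1/(1-p).
w = torch.tensor(3.0, requires_grad=True)  # toy parameter
grads = set()
for _ in range(200):
    w.grad = None
    F.dropout(w * 2.0, p=p).backward()
    grads.add(round(w.grad.item(), 4))

print("distinct gradients:", sorted(grads))
```

So on average the loss and its gradient keep their scale, but a fraction of the steps carry no gradient signal from their batch.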

## I think it could still be useful on a more complex dataset where DropOut works well.

In [None]:
start = time.time()
for epoch in range(1, opt.epochs+1):
  model.train()
  for batch_index, (img, label) in enumerate(train_loader):
    img = img.to(device)
    label = label.to(device)
    output = model(img)
    optimizer.zero_grad()
    loss = criterion(output, label)
    # Variant 2: dropout on the scalar loss, applied before backward().
    loss = F.dropout(loss, p = 0.3)
    loss.backward()
    optimizer.step()
    if batch_index % 200 == 0 :
      # The printed value is the loss after dropout, not the raw criterion output.
      print("[Epochs : %d] [batch_index = %d] [loss = %f]"%(epoch, batch_index, loss.item()))

  # Evaluate on the test set after every epoch.
  Evaluate_loss, acc = eval(model, test_loader)
  print("Evaluate Loss =", Evaluate_loss.item(), "\tAccuracy(%) =", acc)

# Total training time in seconds.
print(time.time() - start)



[Epochs : 1] [batch_index = 0] [loss = 2.569268]
[Epochs : 1] [batch_index = 200] [loss = 0.486209]
[Epochs : 1] [batch_index = 400] [loss = 0.344182]
[Epochs : 1] [batch_index = 600] [loss = 0.068658]
[Epochs : 1] [batch_index = 800] [loss = 0.453863]
[Epochs : 1] [batch_index = 1000] [loss = 0.282217]
[Epochs : 1] [batch_index = 1200] [loss = 0.127292]
[Epochs : 1] [batch_index = 1400] [loss = 0.032784]
[Epochs : 1] [batch_index = 1600] [loss = 0.250965]
[Epochs : 1] [batch_index = 1800] [loss = 0.060236]
Evaluate Loss = 0.002982230857014656 	Accuracy(%) = 97.02
[Epochs : 2] [batch_index = 0] [loss = 0.273739]
[Epochs : 2] [batch_index = 200] [loss = 0.058181]
[Epochs : 2] [batch_index = 400] [loss = 0.070183]
[Epochs : 2] [batch_index = 600] [loss = 0.027527]
[Epochs : 2] [batch_index = 800] [loss = 0.044461]
[Epochs : 2] [batch_index = 1000] [loss = 0.155629]
[Epochs : 2] [batch_index = 1200] [loss = 0.015186]
[Epochs : 2] [batch_index = 1400] [loss = 0.029316]
[Epochs : 2] [batch_