<a href="https://colab.research.google.com/github/EggPudding/Deep-Learning-Practice-with-Codes/blob/main/AlexNet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **ImageNet Classification with Deep Convolutional Neural Networks (NIPS 2012) Tutorial**

*   Practice for the AlexNet architecture
*   Original paper: https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
*   Note that you should first change the **Runtime** type to **GPU**
*   The **MNIST** dataset is used for simplicity

In [None]:
!nvidia-smi # make sure you are using a GPU

Mon Jan 25 02:04:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   72C    P0    32W /  70W |   1281MiB / 15079MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### **AlexNet Model Definition**
* In this **Tutorial**, the model is built for the **MNIST** dataset. The original architecture was designed for 224x224 RGB ImageNet images, so it does not fit the **28x28** grayscale MNIST digit images.
* **MNIST** is a dataset for classifying handwritten digit images into 10 categories, from 0 to 9.
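To see why the input size matters: a convolution or pooling layer maps a spatial size W to floor((W + 2P - K) / S) + 1 for kernel size K, stride S and padding P. The minimal sketch below applies this formula to an ImageNet-style feature extractor. The layer list is an assumption modeled on torchvision's `alexnet` (11x11 stride-4 first convolution, three 3x3 stride-2 max pools), not copied from the paper, and the helper names are illustrative.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of Conv2d / MaxPool2d (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, layers):
    """Apply each (kernel, stride, padding) layer in turn; stop once the map vanishes."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes.append(size)
        if size < 1:  # PyTorch would raise an "output size is too small" error here
            break
    return sizes


# Assumed ImageNet-style stem (modeled on torchvision's AlexNet):
# conv 11x11/4 pad 2, pool 3/2, conv 5x5 pad 2, pool 3/2, 3x conv 3x3 pad 1, pool 3/2
IMAGENET_STEM = [(11, 4, 2), (3, 2, 0), (5, 1, 2), (3, 2, 0),
                 (3, 1, 1), (3, 1, 1), (3, 1, 1), (3, 2, 0)]

print(trace(224, IMAGENET_STEM))  # [224, 55, 27, 27, 13, 13, 13, 13, 6]
print(trace(28, IMAGENET_STEM))   # [28, 6, 2, 2, 0]
```

A 224x224 input ends in the familiar 6x6 feature map, while a 28x28 input shrinks to nothing after the second pooling layer, which is why the model below uses small 3x3 stride-1 convolutions instead.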

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

import torch.backends.cudnn as cudnn
import torch.optim as optim

import os

# AlexNet Implementation
# The architecture is adjusted to the MNIST dataset, so it differs from the original one.
# The paper's two-GPU model parallelism and LRN (Local Response Normalization) are also
# omitted: they are rarely used in recent research, and leaving them out keeps the code easier to follow.
class AlexNet(nn.Module):
  def __init__(self, num_classes=10):
    super(AlexNet, self).__init__()

    self.feature_extract = nn.Sequential(
      nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
      nn.ReLU(inplace=True),
      nn.MaxPool2d(kernel_size=3, stride=2),
      nn.Conv2d(64, 128, kernel_size=3, padding=1),
      nn.ReLU(inplace=True),
      nn.MaxPool2d(kernel_size=3, stride=2),
      nn.Conv2d(128, 256, kernel_size=3, padding=1),
      nn.ReLU(inplace=True),     
      nn.MaxPool2d(kernel_size=3, stride=2),  
    )

    self.classifier = nn.Sequential(
      nn.Dropout(),
      nn.Linear(256 * 2 * 2, 512),
      nn.ReLU(inplace=True),
      nn.Dropout(),
      nn.Linear(512, num_classes),       
    )

  def forward(self, x):
    x = self.feature_extract(x)
    x = torch.flatten(x, 1)
    x = self.classifier(x)

    return x
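The input size of the first `nn.Linear` layer, `256 * 2 * 2`, follows from the feature extractor: each padded 3x3 convolution keeps the spatial size, and each 3x3 stride-2 max pool maps W to floor((W - 3) / 2) + 1. A minimal pure-Python check (the names `out_size`, `MNIST_LAYERS` and `feature_map_size` are illustrative) reproduces the flattened size:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of Conv2d / MaxPool2d (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of each spatial layer in AlexNet.feature_extract
MNIST_LAYERS = [(3, 1, 1), (3, 2, 0),   # conv 1->64, max pool
                (3, 1, 1), (3, 2, 0),   # conv 64->128, max pool
                (3, 1, 1), (3, 2, 0)]   # conv 128->256, max pool


def feature_map_size(size, layers=MNIST_LAYERS):
    """Side length of the final feature map for a square input of the given size."""
    for kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
    return size


side = feature_map_size(28)      # 28 -> 28 -> 13 -> 13 -> 6 -> 6 -> 2
print(side, 256 * side * side)   # 2 1024
```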

### **Hyper Parameter Setting**

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu" # use the GPU if available, otherwise the CPU

model = AlexNet() # model assignment
model.to(device) # move model weights & biases to the selected device's memory
model = torch.nn.DataParallel(model) # wrap the model to split each batch across the available GPUs

cudnn.benchmark = True # let cuDNN benchmark and pick the fastest convolution algorithms for fixed input sizes

learning_rate = 0.01
batch_size = 128
max_epoch = 10

model_path = 'alexnet_mnist.pt'

criterion = nn.CrossEntropyLoss() # standard cross-entropy loss for multi-class classification
optimizer = optim.Adam(model.parameters(), lr=learning_rate) # Adam optimizer
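Because `model` is wrapped in `torch.nn.DataParallel`, the actual network lives in `model.module`, so every key of `model.state_dict()` (and therefore of the checkpoint written by `valid()`) starts with `module.`. Loading that checkpoint into a plain `AlexNet()` requires removing the prefix, or saving `model.module.state_dict()` in the first place. A minimal, framework-free sketch of the key renaming, where the helper `strip_module_prefix` is hypothetical and the toy dictionary stands in for a real checkpoint:

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed from each key."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Toy stand-in for a checkpoint; real values would be tensors
saved = OrderedDict([("module.feature_extract.0.weight", "w0"),
                     ("module.classifier.1.bias", "b1")])
print(list(strip_module_prefix(saved)))
# ['feature_extract.0.weight', 'classifier.1.bias']
```

With PyTorch, this would be used as `AlexNet().load_state_dict(strip_module_prefix(torch.load('checkpoint/alexnet_mnist.pt')))`.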

* **torchsummary** is a package for summarizing PyTorch models.
* It lists each layer together with its output shape and number of parameters.

In [None]:
import torchsummary

torchsummary.summary(model, (1, 28, 28), device=device) # an MNIST image has shape (channels, height, width) = (1, 28, 28)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 28, 28]             640
              ReLU-2           [-1, 64, 28, 28]               0
         MaxPool2d-3           [-1, 64, 13, 13]               0
            Conv2d-4          [-1, 128, 13, 13]          73,856
              ReLU-5          [-1, 128, 13, 13]               0
         MaxPool2d-6            [-1, 128, 6, 6]               0
            Conv2d-7            [-1, 256, 6, 6]         295,168
              ReLU-8            [-1, 256, 6, 6]               0
         MaxPool2d-9            [-1, 256, 2, 2]               0
          Dropout-10                 [-1, 1024]               0
           Linear-11                  [-1, 512]         524,800
             ReLU-12                  [-1, 512]               0
          Dropout-13                  [-1, 512]               0
           Linear-14                   
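The `Param #` column can be checked by hand: a `Conv2d` layer has in_channels * out_channels * k * k weights plus out_channels biases, and a `Linear` layer has in_features * out_features weights plus out_features biases, while ReLU, pooling and dropout layers have none. A minimal sketch (helper names are illustrative) reproduces the counts above and adds the final `Linear(512, 10)` layer:

```python
def conv2d_params(in_ch, out_ch, k):
    return in_ch * out_ch * k * k + out_ch  # weights + biases


def linear_params(in_f, out_f):
    return in_f * out_f + out_f  # weights + biases


counts = {
    "Conv2d-1": conv2d_params(1, 64, 3),
    "Conv2d-4": conv2d_params(64, 128, 3),
    "Conv2d-7": conv2d_params(128, 256, 3),
    "Linear-11": linear_params(256 * 2 * 2, 512),
    "Linear-14": linear_params(512, 10),
}
for name, n in counts.items():
    print(f"{name:>10}: {n:,}")
print(f"{'total':>10}: {sum(counts.values()):,}")  # 899,594 in total
```

In PyTorch, `sum(p.numel() for p in model.parameters())` should give the same total.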

### **Training and Evaluation function definition**

In [None]:
def train(epoch, max_epoch):
    print(f"Train Epoch [{epoch}/{max_epoch}]")
    model.train() # Model to train mode

    train_loss = 0
    correct = 0
    total = 0
    acc = 0

    for idx, (x, y) in enumerate(train_dataloader):
        x = x.to(device) # maps data into GPU memory
        y = y.to(device)

        optimizer.zero_grad() # clear the gradients left over from the previous step

        y_pred = model(x) # model inference
        loss = criterion(y_pred, y) # calculating the loss

        loss.backward() # back-propagation: compute the gradients of the loss w.r.t. the parameters

        optimizer.step() # update model parameters
        train_loss += loss.item()
        _, inference = y_pred.max(1)

        total += x.size(0)
        correct += inference.eq(y).sum().item()

        if idx % 100 == 0:
            print(f"Epoch [{epoch}/{max_epoch}] Batch [{idx}] Train Loss: {loss.item()}")

    acc = 100*correct/total
    print(f"Epoch [{epoch}/{max_epoch}] Train Loss: {train_loss/total} Train Accuracy: {acc}")

def valid(epoch, max_epoch):
    print(f"Valid Epoch [{epoch}/{max_epoch}]")
    model.eval() # Model to evaluation mode

    valid_loss = 0
    correct = 0
    total = 0

    for idx, (x, y) in enumerate(valid_dataloader):
        x = x.to(device) # maps data into GPU memory
        y = y.to(device)

        with torch.no_grad():
            y_pred = model(x) # model inference
            valid_loss += criterion(y_pred, y).item()

            _, inference = y_pred.max(1)

            total += x.size(0)
            correct += inference.eq(y).sum().item()

    acc = 100*correct/total
    print(f"Epoch [{epoch}/{max_epoch}] Valid Loss: {valid_loss/total} Valid Accuracy: {acc}")

    os.makedirs('checkpoint', exist_ok=True) # create the checkpoint directory if needed

    torch.save(model.state_dict(), f'checkpoint/{model_path}')
    print(f"Epoch [{epoch}/{max_epoch}] Valid Model Saved: checkpoint/{model_path}")

# Custom learning rate scheduler:
# divide the learning rate by 10 at epoch 5 (the new value persists for later epochs).
def lr_schedule(optimizer, epoch):
    if epoch == 5:
        lr = learning_rate / 10
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
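`lr_schedule` keeps the learning rate at 0.01 for epochs 0 to 4 and sets it to 0.001 at epoch 5. Because the value is written into `optimizer.param_groups`, it persists for the remaining epochs. The same step decay can be written as a pure function of the epoch. This is a sketch: `lr_at` and its `milestones`/`gamma` parameters are illustrative, chosen to mirror the arguments of PyTorch's `torch.optim.lr_scheduler.MultiStepLR`.

```python
from bisect import bisect_right


def lr_at(epoch, base_lr=0.01, milestones=(5,), gamma=0.1):
    """Step decay: multiply base_lr by gamma once for every milestone <= epoch."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


for epoch in range(10):
    print(epoch, lr_at(epoch))
```

Computing the rate from the epoch number, rather than setting it only when `epoch == 5`, also gives the right value when training resumes from a later epoch. In PyTorch, `optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5], gamma=0.1)` with `scheduler.step()` called once at the end of every epoch should produce the same schedule.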

### **Data Preparation**
* We use **torchvision**, which provides popular vision datasets, to load **MNIST**.


In [None]:
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
])

train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
valid_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
valid_dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
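With `batch_size=128`, the 60,000 training images form 469 batches (the last one holds only 96 images), and the 10,000 test images form 79 batches. `nn.CrossEntropyLoss` averages over the batch by default, so `loss.item()` is a per-batch mean. Summing these means and dividing by the number of samples, as `train()` and `valid()` do, gives roughly the mean loss divided by the batch size, which is why the epoch-level losses in the training log are about 128 times smaller than the batch-level ones. A minimal sketch of a per-sample epoch mean that weights each batch mean by its batch size (the helpers `batch_sizes` and `epoch_mean_loss` are hypothetical, and the losses in the toy call are made up):

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches a DataLoader yields with drop_last=False."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def epoch_mean_loss(batch_means, sizes):
    """Per-sample mean loss from per-batch mean losses (reduction='mean')."""
    return sum(m * n for m, n in zip(batch_means, sizes)) / sum(sizes)


train_sizes = batch_sizes(60000, 128)
valid_sizes = batch_sizes(10000, 128)
print(len(train_sizes), train_sizes[-1])  # 469 96
print(len(valid_sizes), valid_sizes[-1])  # 79 16

# Toy example: two batches of 128 and 96 samples with made-up mean losses
print(epoch_mean_loss([0.5, 0.3], [128, 96]))
```

Inside `train()`, the equivalent change would be `train_loss += loss.item() * x.size(0)` before dividing by `total`.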

### **Training**

In [None]:
for epoch in range(0, max_epoch):
    lr_schedule(optimizer, epoch)
    train(epoch, max_epoch)
    valid(epoch, max_epoch)

Train Epoch [0/10]
Epoch [0/10] Batch [0] Train Loss: 2.3007636070251465
Epoch [0/10] Batch [100] Train Loss: 0.9037967324256897
Epoch [0/10] Batch [200] Train Loss: 0.6736580729484558
Epoch [0/10] Batch [300] Train Loss: 0.37950873374938965
Epoch [0/10] Batch [400] Train Loss: 0.43045854568481445
Epoch [0/10] Train Loss: 0.006541210406025251 Train Accuracy: 72.435
Valid Epoch [0/10]
Epoch [0/10] Valid Loss: 0.0017999021627008915 Valid Accuracy: 93.22
Epoch [0/10] Valid Model Saved: checkpoint/alexnet_mnist.pt
Train Epoch [1/10]
Epoch [1/10] Batch [0] Train Loss: 0.6461343169212341
Epoch [1/10] Batch [100] Train Loss: 0.324937641620636
Epoch [1/10] Batch [200] Train Loss: 0.17657935619354248
Epoch [1/10] Batch [300] Train Loss: 0.34770509600639343
Epoch [1/10] Batch [400] Train Loss: 0.40423160791397095
Epoch [1/10] Train Loss: 0.002769552831227581 Train Accuracy: 89.32
Valid Epoch [1/10]
Epoch [1/10] Valid Loss: 0.0014781719436869026 Valid Accuracy: 94.52
Epoch [1/10] Valid Model Save