<a href="https://colab.research.google.com/github/Sangh0/DeepLearning-Tutorial/blob/main/current_materials/3_cnn_with_mnist.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implementing a CNN (Convolutional Neural Network) and training it on the MNIST handwritten digit dataset

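- A CNN is a neural network that overcomes the weaknesses of a DNN (MLP) on images; for example, a fully connected layer flattens the image into a vector and so ignores its 2D spatial structure.
- There is only one difference from the DNN in the previous lecture: the model.
- A DNN is a stack of fully connected layers, whereas a CNN is built mainly from convolution layers.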

## Step 1. Import packages and modules

import numpy as np # for numerical array operations
import matplotlib.pyplot as plt # for visualization

import torch # for PyTorch tensors
import torch.nn as nn # for building the neural network
import torch.optim as optim # for the optimizer
import torchvision.datasets as dsets # to download the MNIST dataset bundled with torchvision
import torchvision.transforms as transforms # for torchvision preprocessing
from torch.utils.data import DataLoader # to build data loaders for training

## Step 2. Set hyperparameters

# Set hyperparameters
Config = {
    'batch_size': 32,
    'learning_rate': 1e-3,
    'epochs': 10,
}

## Step 3. Load the MNIST handwritten digit dataset

# Load MNIST dataset
train_set = dsets.MNIST(
    root='mnist/',
    train=True,
    transform=transforms.ToTensor(),
    download=True,
)

test_set = dsets.MNIST(
    root='mnist/',
    train=False,
    transform=transforms.ToTensor(),
    download=True,
)

train_loader = DataLoader(
    dataset=train_set,
    batch_size=Config['batch_size'],
    shuffle=True,
    drop_last=True,
)

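test_loader = DataLoader(
    dataset=test_set,
    batch_size=Config['batch_size'],
    shuffle=False,     # order does not matter for evaluation
    drop_last=False,   # keep the final partial batch so all 10,000 test images are evaluated
)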

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 78284623.98it/s]
Extracting mnist/MNIST/raw/train-images-idx3-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to mnist/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 77551660.58it/s]
Extracting mnist/MNIST/raw/train-labels-idx1-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 20761085.85it/s]
Extracting mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 14929881.48it/s]
Extracting mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw

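- Everything up to this point is the same as in the DNN tutorial.
- Next we need to build the model, this time by stacking convolution layers.
- Before that, there are a few concepts you need to know to implement a CNN.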

**Key terms for working with CNNs**
1. convolution
2. channel
3. filter or kernel
4. stride
5. padding
6. pooling


### 1. Convolution

<img src = "https://camo.githubusercontent.com/f3959a36c49ab55f45e2fae1793757e1941eec2ebb1369b3ff025f4a96f88d94/687474703a2f2f646565706c6561726e696e672e7374616e666f72642e6564752f77696b692f696d616765732f362f36632f436f6e766f6c7574696f6e5f736368656d617469632e676966">

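- Convolution works as follows.
- There is a 5x5 green grid, the input feature map, and a 3x3 yellow grid, the sliding window (the filter).
- The sliding window moves across the green grid one position at a time; at each position it multiplies the overlapping values elementwise and sums them, producing one value of a new feature map (a 3x3 output here).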
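To make the sliding-window computation concrete, here is a minimal NumPy sketch of a single-channel convolution without padding. The function name `conv2d_naive` and the example values are our own illustration, not part of the tutorial's code. Like `nn.Conv2d`, it does not flip the kernel, so strictly speaking it computes cross-correlation, which is what deep learning libraries call convolution.

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1):
    """Slide `kernel` over `image` and sum the elementwise products at each position.

    Single channel, no padding. The kernel is not flipped (cross-correlation),
    matching what nn.Conv2d computes.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # The part of the input currently covered by the kernel
            window = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# A 5x5 binary input and a 3x3 kernel (illustrative values)
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

feature_map = conv2d_naive(image, kernel)
print(feature_map)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

A 5x5 input with a 3x3 kernel gives a 3x3 output, because the kernel fits in 5 - 3 + 1 = 3 positions along each axis.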

### 2. channel

<img src = "https://camo.githubusercontent.com/c61ee1746d97812e0b64f60bf8288c8bcdd209148dcde1b9eb18ebe3a5dfa01d/68747470733a2f2f74616577616e6d657265706f2e6769746875622e696f2f323031382f30312f636e6e2f6368616e6e656c2e6a7067" width = 800>

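- A color image consists of three channels: R, G, and B.
- As convolution layers are stacked, the spatial size (height and width) of the feature maps usually shrinks while the number of channels grows.
- In other words, you can think of the early layers as capturing fine image details,
- and the later layers as capturing the overall context.
- We'll see why further below.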
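With several input channels, each filter spans all of them: it has shape `(C_in, k, k)`, and its products are summed over every channel to give a single output value. The number of filters sets the number of output channels. The NumPy sketch below assumes stride 1 and no padding; the function name `conv2d_multichannel` and the tensor sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def conv2d_multichannel(x, weight, bias):
    """Multi-channel convolution (stride 1, no padding).

    x:      input of shape (C_in, H, W), e.g. C_in=3 for an RGB image
    weight: filters of shape (C_out, C_in, kH, kW), one filter per output channel
    bias:   shape (C_out,)
    returns an array of shape (C_out, H - kH + 1, W - kW + 1)
    """
    c_out, c_in, kh, kw = weight.shape
    assert x.shape[0] == c_in, "each filter must span all input channels"
    _, h, w = x.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for o in range(c_out):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                # Sum over ALL input channels of the window, then add this filter's bias
                out[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * weight[o]) + bias[o]
    return out

rng = np.random.default_rng(0)
rgb = rng.random((3, 5, 5))        # a tiny 3-channel (RGB) 5x5 "image"
weight = rng.random((8, 3, 3, 3))  # 8 filters, each of shape 3x3x3
bias = np.zeros(8)

out = conv2d_multichannel(rgb, weight, bias)
print(out.shape)  # (8, 3, 3)
```

Three input channels become eight output channels simply because there are eight filters; this is how the channel count grows from layer to layer (1 to 8 to 16 in the Step 4 model).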

### 3. filter or kernel

<img src = "https://camo.githubusercontent.com/f2ea5e053843a1c7198a74bbebea914b9adb2e8446c52e240e68a48cfdf15140/68747470733a2f2f74616577616e6d657265706f2e6769746875622e696f2f323031382f30312f636e6e2f636f6e762e706e67">

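- A filter detects features in the image.
- The filter weights are the parameters of a CNN,
- that is, they are what the network learns.
- A filter moves over the image at a fixed interval (the stride), convolving with the image to produce a feature map.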
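Since the filters are the learnable parameters, we can count them directly: each of the `C_out` filters holds `C_in * k * k` weights plus one bias. The helper names below are our own; the counts should match the `Param #` column of the `torchsummary` output in Step 4. For contrast, the sketch also counts a fully connected layer on a flattened 28x28 image: because a conv filter is reused at every position, its parameter count does not depend on the image size.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a 2D conv layer with a square kernel.

    Each of the `out_channels` filters has in_channels * kernel_size**2 weights,
    plus one bias per filter.
    """
    weights = out_channels * in_channels * kernel_size ** 2
    biases = out_channels if bias else 0
    return weights + biases

def linear_param_count(in_features, out_features, bias=True):
    """Weights of a fully connected layer plus one bias per output unit."""
    return in_features * out_features + (out_features if bias else 0)

# The two conv layers of the CNN in Step 4 (hidden_dim=8)
print(conv2d_param_count(1, 8, 3))     # 80
print(conv2d_param_count(8, 16, 3))    # 1168
# For comparison: a fully connected layer from a flattened 28x28 image to 8 units
print(linear_param_count(28 * 28, 8))  # 6280
```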

### 4. stride

<img src = "https://camo.githubusercontent.com/b85b2ab96ef08b113a01b4d5476f8dd1af3ce39665ca4194d0bc5f9b943e4228/68747470733a2f2f74616577616e6d657265706f2e6769746875622e696f2f323031382f30312f636e6e2f66696c7465722e6a7067">

- The stride determines how many pixels the filter moves at each step.
- With stride=1, the filter moves one pixel at a time while computing the convolution.
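The effect of the stride on the output size follows the standard formula `floor((n + 2p - k) / s) + 1` for input size `n`, kernel size `k`, padding `p`, and stride `s` (this is the formula PyTorch documents for `nn.Conv2d` with the default dilation of 1). The helper name `conv_output_size` and the 7x7 input are our own illustrative choices.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis of a convolution or pooling window.

    floor((input_size + 2*padding - kernel_size) / stride) + 1
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 3x3 filter over a 7x7 input with different strides (no padding)
for s in (1, 2, 3):
    n = conv_output_size(7, kernel_size=3, stride=s)
    print(f"stride={s}: {n}x{n}")
# stride=1: 5x5
# stride=2: 3x3
# stride=3: 2x2
```

With stride 3 the window no longer lands exactly on the last row and column, so the floor in the formula simply drops the positions where it would not fit.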


### 5. padding

<img src = "https://camo.githubusercontent.com/752b077999e432c520bb95c51b38c98eafc507289760d42d48d6c555eeec0ed5/68747470733a2f2f74616577616e6d657265706f2e6769746875622e696f2f323031382f30312f636e6e2f70616464696e672e706e67">

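- In a convolution layer, the output feature map is smaller than the input image: a kernel larger than 1x1 only fits in a limited number of positions, and a stride greater than 1 shrinks the output further.
- Padding is what prevents the output from shrinking.
- Padding means filling a border of a given number of pixels around the input with a fixed value.
- Zero is the most common choice (zero padding).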
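A short NumPy sketch of zero padding, using `np.pad`: a 1-pixel border of zeros turns a 3x3 input into 5x5. The same arithmetic shows why `padding=1` with a 3x3 kernel and stride 1 keeps a 28x28 MNIST image at 28x28, as in the `Conv2d-1` row of the Step 4 summary. The input values are illustrative.

```python
import numpy as np

x = np.arange(1, 10).reshape(3, 3)  # a 3x3 input: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Zero padding of 1 pixel on every side: 3x3 -> 5x5
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded)
# [[0 0 0 0 0]
#  [0 1 2 3 0]
#  [0 4 5 6 0]
#  [0 7 8 9 0]
#  [0 0 0 0 0]]

# Output size of a 3x3 kernel, stride 1, on a 28x28 image
n, k, p = 28, 3, 1
size_no_pad = n - k + 1        # 26: the output shrinks
size_pad = n + 2 * p - k + 1   # 28: padding by 1 preserves the size
print(size_no_pad, size_pad)
```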


### 6. pooling

<img src = "https://camo.githubusercontent.com/3952a493704e8e9582c4e86a3c075bb2936e2a6a1c11c5551ee2b00631fe7fa5/68747470733a2f2f74616577616e6d657265706f2e6769746875622e696f2f323031382f30322f636e6e2f6d617870756c6c696e672e706e67">

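- Pooling takes the output of a convolution layer as input and reduces its spatial size.
- It is also used to emphasize particular features (for example, max pooling keeps only the strongest response in each window).
- Common types include max pooling and average pooling.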
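Both pooling types can be sketched in NumPy by splitting the input into non-overlapping k x k blocks (stride k, as with `nn.MaxPool2d(kernel_size=2, stride=2)` in Step 4) and taking the maximum or mean of each block. The function name `pool2d` and the 4x4 input are our own illustrative choices; leftover rows or columns that do not fill a whole window are dropped, which matches PyTorch's default floor behavior.

```python
import numpy as np

def pool2d(x, k=2, mode="max"):
    """Non-overlapping k x k pooling with stride k on a 2D array."""
    h, w = x.shape
    h, w = (h // k) * k, (w // k) * k  # drop rows/columns that do not fill a window
    # Shape (h//k, k, w//k, k): axes 1 and 3 index positions inside each window
    blocks = x[:h, :w].reshape(h // k, k, w // k, k)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

x = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
print(pool2d(x, mode="max"))
# [[6 8]
#  [3 4]]
print(pool2d(x, mode="avg"))  # window averages: [[3.25, 5.25], [2.0, 2.0]]
```

Max pooling keeps the strongest response in each window, while average pooling smooths it; both halve the height and width here.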

## Step 4. Build the CNN model

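class CNN(nn.Module):

    def __init__(self, in_dim=1, hidden_dim=8, out_dim=10):
        super(CNN, self).__init__()
        # Feature extractor: two conv -> ReLU -> max-pool blocks
        self.features = nn.Sequential(
            nn.Conv2d(in_dim, hidden_dim, kernel_size=3, stride=1, padding=1),  # keeps 28x28
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 28x28 -> 14x14
            nn.Conv2d(hidden_dim, hidden_dim*2, kernel_size=3, stride=1, padding=1),  # keeps 14x14
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 14x14 -> 7x7
        )

        # Classifier: flattened (hidden_dim*2) x 7 x 7 feature map -> class scores (logits)
        self.classifier = nn.Sequential(
            nn.Linear(7*7*hidden_dim*2, 100),  # 7*7*16 with the default hidden_dim=8
            nn.ReLU(),
            nn.Linear(100, out_dim),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.features(x)        # (N, 16, 7, 7) with the defaults
        x = x.view(batch_size, -1)  # flatten to (N, 784)
        x = self.classifier(x)
        return x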


from torchsummary import summary

summary(CNN(), (1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 28, 28]              80
              ReLU-2            [-1, 8, 28, 28]               0
         MaxPool2d-3            [-1, 8, 14, 14]               0
            Conv2d-4           [-1, 16, 14, 14]           1,168
              ReLU-5           [-1, 16, 14, 14]               0
         MaxPool2d-6             [-1, 16, 7, 7]               0
            Linear-7                  [-1, 100]          78,500
              ReLU-8                  [-1, 100]               0
            Linear-9                   [-1, 10]           1,010
Total params: 80,758
Trainable params: 80,758
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.16
Params size (MB): 0.31
Estimated Total Size (MB): 0.47
---------------------------------------------
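Where does `7*7*16` in the first `nn.Linear` come from? The sketch below traces the feature-map size through the two conv + pool blocks with the standard output-size formula, and also shows the pattern from the channel section: height and width shrink while the channel count grows. The helper `conv_output_size` is our own, not part of the tutorial's code.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """floor((input_size + 2*padding - kernel_size) / stride) + 1"""
    return (input_size + 2 * padding - kernel_size) // stride + 1

size, channels = 28, 1  # MNIST input: 1 x 28 x 28
for out_channels in (8, 16):  # hidden_dim, hidden_dim*2
    size = conv_output_size(size, 3, stride=1, padding=1)  # Conv2d keeps the size
    size = conv_output_size(size, 2, stride=2)             # MaxPool2d halves it
    channels = out_channels
    print(f"after conv+pool: {channels} x {size} x {size}")
# after conv+pool: 8 x 14 x 14
# after conv+pool: 16 x 7 x 7

flat = channels * size * size
print(flat)  # 784 = 7*7*16, the input size of the first nn.Linear
```

Because this number depends on the input resolution, feeding images of a different size would require changing the first `nn.Linear` layer as well.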

## Step 5. Train the model

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = CNN().to(device)
loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=Config['learning_rate'])

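def cal_accuracy(outputs, labels):
    # Fraction of samples whose highest-scoring class matches the label
    preds = torch.argmax(outputs, dim=1)
    return (preds == labels).float().mean()


# Training
for epoch in range(Config['epochs']):
    model.train()  # training mode (Step 6 switches the model to eval mode)
    for batch, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)            # forward pass: class scores (logits)
        acc = cal_accuracy(outputs, labels)
        loss = loss_func(outputs, labels)  # CrossEntropyLoss applies log-softmax internally
        loss.backward()                    # backpropagation
        optimizer.step()                   # update the parameters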

        if (batch+1) % 100 == 0:
            print(f'Epoch {epoch+1}/{Config["epochs"]}, Batch {batch+1}/{len(train_loader)}\n'
                  f'loss: {loss.item():.3f}, accuracy: {acc.item():.3f}')

Epoch 1/10, Batch 100/1875
loss: 0.488, accuracy: 0.844
Epoch 1/10, Batch 200/1875
loss: 0.284, accuracy: 0.938
Epoch 1/10, Batch 300/1875
loss: 0.250, accuracy: 0.906
Epoch 1/10, Batch 400/1875
loss: 0.230, accuracy: 0.938
Epoch 1/10, Batch 500/1875
loss: 0.098, accuracy: 0.969
Epoch 1/10, Batch 600/1875
loss: 0.346, accuracy: 0.875
Epoch 1/10, Batch 700/1875
loss: 0.130, accuracy: 0.969
Epoch 1/10, Batch 800/1875
loss: 0.078, accuracy: 0.969
Epoch 1/10, Batch 900/1875
loss: 0.113, accuracy: 0.938
Epoch 1/10, Batch 1000/1875
loss: 0.154, accuracy: 0.938
Epoch 1/10, Batch 1100/1875
loss: 0.064, accuracy: 0.969
Epoch 1/10, Batch 1200/1875
loss: 0.065, accuracy: 0.969
Epoch 1/10, Batch 1300/1875
loss: 0.101, accuracy: 0.969
Epoch 1/10, Batch 1400/1875
loss: 0.086, accuracy: 0.969
Epoch 1/10, Batch 1500/1875
loss: 0.129, accuracy: 0.969
Epoch 1/10, Batch 1600/1875
loss: 0.273, accuracy: 0.938
Epoch 1/10, Batch 1700/1875
loss: 0.041, accuracy: 1.000
Epoch 1/10, Batch 1800/1875
loss: 0.128,

## Step 6. Evaluate the model

# Testing
model.eval()  # evaluation mode (matters for layers such as dropout or batch norm)
test_loss, correct, total = 0.0, 0, 0
with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        # CrossEntropyLoss returns the batch mean, so weight it by the batch size
        test_loss += loss_func(outputs, labels).item() * labels.size(0)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)

print(f'Test Loss: {test_loss/total:.3f}, Test Accuracy: {correct/total:.3f}')

Test Loss: 0.043, Test Accuracy: 0.988
