<a href="https://colab.research.google.com/github/JiNYouNG2222/pattern-recognition/blob/main/EMNIST_CNN_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transfroms
import pandas as pd
from collections import OrderedDict
from IPython.display import clear_output

In [None]:
learning_rate = 0.001
batch_size = 100
num_classes = 10
epochs = 5
Drp = 0.5

In [None]:
learning_rate = 0.001
batch_size = 128
num_classes = 26
epochs = 30
Drp = 0.5

In [None]:
train_set = torchvision.datasets.EMNIST(
    root = './data/EMNIST',
    split = 'letters',
    train = True,
    download = True,
    transform = transfroms.Compose([
        transfroms.ToTensor() # scales pixel values from the 0-255 range to [0, 1]
    ])
)
test_set = torchvision.datasets.EMNIST(
    root = './data/EMNIST',
    split = 'letters',
    train = False,
    download = True,
    transform = transfroms.Compose([
        transfroms.ToTensor() # scales pixel values from the 0-255 range to [0, 1]
    ])
)

print(train_set, test_set)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)

Dataset EMNIST
    Number of datapoints: 124800
    Root location: ./data/EMNIST
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
           ) Dataset EMNIST
    Number of datapoints: 20800
    Root location: ./data/EMNIST
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
           )


In [None]:
import torch
if torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device('cpu')

# 2 layer model (origin)



In [None]:
class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=100, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.dropout = nn.Dropout(p=Drp)
        self.fc1 = nn.Linear(in_features=100*7*7, out_features=1000)
        self.fc2 = nn.Linear(in_features=1000, out_features=26)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.reshape(x.size(0), -1)
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x

# 4 layer model

A four-layer CNN model.
* layer3: Conv2d with 100 input channels and 200 output channels.
* layer4: Conv2d with 200 input channels and 400 output channels.


Feature map size calculation:

For an EMNIST input of size (1, 28, 28), the MaxPooling in each layer shrinks the feature map 28 -> 14 -> 7 -> 3 -> 1 (odd sizes are floored when halved).
The input size of fc1 is therefore set to 400 * 1 * 1.


Unchanged structure:

Dropout and the fully connected layers (fc1, fc2) are the same as before, except that in_features is adjusted to the final feature map size.
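
The size arithmetic above can be checked by hand with the standard output-size formulas for convolution and pooling. The sketch below is plain Python, not part of the notebook; the helper names `conv_out`, `pool_out`, and `trace_sizes` are made up for illustration, and it assumes every block is a 3x3 convolution (stride 1, padding 1) followed by 2x2 max pooling (stride 2), as in the models here.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output side length of a square convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output side length of max pooling without padding (floor mode)."""
    return (size - kernel) // stride + 1

def trace_sizes(input_size, num_blocks):
    """Side length after each conv + pool block, starting from input_size."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(pool_out(conv_out(sizes[-1])))
    return sizes

sizes = trace_sizes(28, 4)
print(sizes)  # [28, 14, 7, 3, 1]

# fc1 input size for the 4-layer model: channels * height * width
fc1_in_features = 400 * sizes[-1] * sizes[-1]
print(fc1_in_features)  # 400
```

Calling `trace_sizes(28, 2)` gives `[28, 14, 7]`, which matches the `100*7*7` used by fc1 in the 2-layer model.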

In [None]:
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=100, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Added layer3
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=100, out_channels=200, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Added layer4
        self.layer4 = nn.Sequential(
            nn.Conv2d(in_channels=200, out_channels=400, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.dropout = nn.Dropout(p=Drp)
        # Adjust in_features to match the output size after layer4
        self.fc1 = nn.Linear(in_features=400*1*1, out_features=1000)
        self.fc2 = nn.Linear(in_features=1000, out_features=26)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = x.reshape(x.size(0), -1)  # Flatten
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x

# 4 layer model develop

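Code that converts the 5-layer model given below into a 4-layer model.

Batch Normalization, Dropout, and the optimizer's Weight Decay setting are kept as they are.

### Summary of changes
1. Fewer layers:

  The five Conv layers were reduced to four.
The last layer, layer4, still outputs a 128×3×3 feature map.

2. in_features updated:

  The input size of fc1 in the fully connected part was changed to 128×3×3=1152.


3. Dropout and Batch Normalization kept:

 Batch Normalization is applied after each Conv layer.
The Dropout rate stays at p=0.5.


4. Parameter adjustment:

 The output channels of the first Conv layer were increased from 10 to 16, to preserve the network's representational capacity while compensating for the reduced number of layers.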

In [None]:
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Layer 1
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(16),  # Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 28x28 -> 14x14
        )

        # Layer 2
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),  # Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 14x14 -> 7x7
        )

        # Layer 3
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),  # Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 7x7 -> 3x3
        )

        # Layer 4
        self.layer4 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),  # Batch Normalization
            nn.ReLU()
            # No pooling, keeps 3x3 spatial dimension
        )

        # Dropout and Fully Connected Layers
        self.dropout = nn.Dropout(p=0.5)  # Dropout rate fixed to 0.5
        self.fc1 = nn.Linear(in_features=128 * 3 * 3, out_features=1000)
        self.fc2 = nn.Linear(in_features=1000, out_features=26)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = x.reshape(x.size(0), -1)  # Flatten
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x


In [None]:
# Training Loop
net = NeuralNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=learning_rate, weight_decay=1e-5)



pd_results = []

for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        out = net(images)
        # EMNIST 'letters' labels run from 1 to 26, so shift them to 0..25 for CrossEntropyLoss
        loss = criterion(out, labels-1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total = labels.size(0)
        preds = torch.max(out.data, 1)[1]
        correct = (preds==labels-1).sum().item()

        if (i+1)%200==0:
            results = OrderedDict()
            results['epoch'] = epoch+1
            results['idx'] = i+1
            results['loss'] = loss.item()
            results['accuracy'] = 100.*correct/total
            pd_results.append(results)
            df = pd.DataFrame.from_dict(pd_results, orient='columns')

            clear_output(wait=True)
            display(df)

Unnamed: 0,epoch,idx,loss,accuracy
0,1,200,0.566672,81.25000
1,1,400,0.481950,84.37500
2,1,600,0.328704,91.40625
3,1,800,0.336337,88.28125
4,2,200,0.355539,86.71875
...,...,...,...,...
115,29,800,0.169127,96.09375
116,30,200,0.107152,93.75000
117,30,400,0.116802,94.53125
118,30,600,0.094669,96.87500


# 5 layer model


Changes

Added layer5:

layer5 contains only a Conv2d layer (followed by ReLU), with no MaxPooling, so the spatial size of the feature map is preserved.

The input channels are 128 and the output channels are 256.


Feature map size calculation:

    Input size (1, 28, 28):
    layer1: MaxPooling, 28 -> 14
    layer2: MaxPooling, 14 -> 7
    layer3: MaxPooling, 7 -> 3
    layer4: MaxPooling, 3 -> 1
    layer5: spatial size preserved (1x1)


Fully Connected Layers:

    in_features of fc1 is set to 256 * 1 * 1.
    Dropout is kept for regularization.

This model has a deeper structure with five convolutional layers, designed so that feature complexity increases at each stage.

In [None]:
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 28x28 -> 14x14
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 14x14 -> 7x7
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 7x7 -> 3x3
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 3x3 -> 1x1
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.ReLU()
            # No pooling here, keeps 1x1 spatial dimension
        )

        self.dropout = nn.Dropout(p=Drp)
        # Adjust fc1's in_features based on the final layer output
        self.fc1 = nn.Linear(in_features=256 * 1 * 1, out_features=1000)
        self.fc2 = nn.Linear(in_features=1000, out_features=26)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = x.reshape(x.size(0), -1)  # Flatten
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x

# 5 layer model develop



A 5-layer CNN model with regularization (Weight Decay) and Batch Normalization.

Regularization is implemented through Weight Decay, and Batch Normalization is added after each convolutional layer to improve training stability.

**Applying Weight Decay**

Weight Decay is applied when configuring the optimizer. The code below adds Weight Decay to the Adam optimizer:

    import torch.optim as optim

    model = NeuralNet()
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

weight_decay=1e-5: penalizes large weight values to reduce overfitting. The value can be tuned as needed.
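
To give a rough feel for what this term does: in `torch.optim.Adam`, `weight_decay` is added to the gradient as `weight_decay * w` (an L2 penalty) before the update is computed. The sketch below is a simplification and not the notebook's optimizer: it uses a plain gradient-descent step instead of Adam so the arithmetic stays visible. The helper `sgd_step` and the exaggerated values `lr=0.1` and `weight_decay=0.01` are illustrative assumptions.

```python
def sgd_step(w, grad, lr, weight_decay=0.0):
    """One plain gradient-descent step with the L2 penalty folded into the gradient."""
    return w - lr * (grad + weight_decay * w)

# A weight that receives no gradient from the loss at all.
no_decay = 2.0
with_decay = 2.0
for _ in range(1000):
    no_decay = sgd_step(no_decay, grad=0.0, lr=0.1, weight_decay=0.0)
    with_decay = sgd_step(with_decay, grad=0.0, lr=0.1, weight_decay=0.01)

print(no_decay)    # stays at 2.0
print(with_decay)  # shrinks toward zero, roughly 0.735 after 1000 steps
```

Without the penalty the weight never moves; with it, each step multiplies the weight by `1 - lr * weight_decay`, so weights the loss does not actively support drift toward zero.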

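**Main changes**

1. Batch Normalization:

 Added after each convolutional layer to normalize the distribution of the activations.
This stabilizes training, speeds up convergence, and can improve generalization.
2. Dropout:

 Fixed at Dropout(p=0.5).

 Randomly deactivates part of the network during training to reduce overfitting.
3. Weight Decay:

 weight_decay is added to the optimizer.

 This constrains the network from learning excessively large weights.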
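
As a small illustration of the normalization step in item 1, the sketch below normalizes one channel's activations across a batch in plain Python. It is a simplified stand-in, not `nn.BatchNorm2d` itself: the function name `batch_norm` and the sample values are made up, and the running statistics that the real layer tracks for evaluation mode are left out.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations using the batch mean and (biased) variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # gamma and beta are the learnable scale and shift of the layer
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
print([round(v, 3) for v in normalized])

# After normalization the batch has mean ~0 and variance ~1.
mean_after = sum(normalized) / len(normalized)
var_after = sum((v - mean_after) ** 2 for v in normalized) / len(normalized)
print(mean_after, var_after)
```

In the real layer the same computation runs per channel over the batch and both spatial dimensions.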

In [None]:
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Layer 1
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(10),  # Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 28x28 -> 14x14
        )

        # Layer 2
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),  # Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 14x14 -> 7x7
        )

        # Layer 3
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),  # Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 7x7 -> 3x3
        )

        # Layer 4
        self.layer4 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),  # Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 3x3 -> 1x1
        )

        # Layer 5
        self.layer5 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),  # Batch Normalization
            nn.ReLU()
            # No pooling, keeps 1x1 spatial dimension
        )

        # Dropout and Fully Connected Layers
        self.dropout = nn.Dropout(p=0.5)  # Dropout rate fixed to 0.5
        self.fc1 = nn.Linear(in_features=256 * 1 * 1, out_features=1000)
        self.fc2 = nn.Linear(in_features=1000, out_features=26)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = x.reshape(x.size(0), -1)  # Flatten
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x


**Training code**

Now run the model training.

In [None]:
# Training Loop
net = NeuralNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=learning_rate, weight_decay=1e-5)



pd_results = []

for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        out = net(images)
        loss = criterion(out, labels-1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total = labels.size(0)
        preds = torch.max(out.data, 1)[1]
        correct = (preds==labels-1).sum().item()

        if (i+1)%200==0:
            results = OrderedDict()
            results['epoch'] = epoch+1
            results['idx'] = i+1
            results['loss'] = loss.item()
            results['accuracy'] = 100.*correct/total
            pd_results.append(results)
            df = pd.DataFrame.from_dict(pd_results, orient='columns')

            clear_output(wait=True)
            display(df)

Unnamed: 0,epoch,idx,loss,accuracy
0,1,200,0.583307,80.46875
1,1,400,0.386922,83.59375
2,1,600,0.340512,91.40625
3,1,800,0.537898,90.62500
4,2,200,0.310683,89.06250
...,...,...,...,...
95,24,800,0.163612,95.31250
96,25,200,0.114732,94.53125
97,25,400,0.101188,93.75000
98,25,600,0.087260,96.87500


# 6 layer model

### Hyperparameter settings

1.   Convolutional Layers:
  *   Channel counts increase gradually from layer to layer (16 -> 32 -> 64 -> 128 -> 256 -> 512).
  *   BatchNorm2d is added to every layer to stabilize training and improve generalization.
  *   The last three layers (layer4, layer5, layer6) do not use MaxPooling, so the feature map stays at 3x3.


2.   Dropout:
  *   Dropout(p=0.5) is applied before the fully connected layers to reduce overfitting.


3. Fully Connected Layers:
 *   The input size of fc1 is set from the final feature map size (512 * 3 * 3 = 4608).
 *   The number of hidden units is set to 1024 for sufficient representational capacity.


4. Activation Functions:
 *   ReLU is used in every convolutional layer for nonlinearity.


5. Batch Normalization:
 *   Added to every convolutional layer to stabilize training and help faster convergence.

### Model size and dataset considerations
The design gradually shrinks the feature map for the EMNIST dataset (28x28 resolution). The three pooling stages reduce it to 3x3 (28 -> 14 -> 7 -> 3), which is the size fed to fc1.

In [None]:
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Convolutional layers
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 28x28 -> 14x14
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 14x14 -> 7x7
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 7x7 -> 3x3
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(128)  # 3x3 -> 3x3
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(256)  # 3x3 -> 3x3
        )
        self.layer6 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.BatchNorm2d(512)  # 3x3 -> 3x3
        )

        # Fully connected layers
        self.dropout = nn.Dropout(p=Drp)
        self.fc1 = nn.Linear(in_features=512 * 3 * 3, out_features=1024)  # Updated to 512 * 3 * 3 = 4608
        self.fc2 = nn.Linear(in_features=1024, out_features=26)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = self.layer6(x)
        x = x.reshape(x.size(0), -1)  # Flatten
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x


# 7 layer model


Summary of changes

1. Added a 7th Conv layer (layer7):

    Input channels: 512 → output channels: 1024.
    Uses a 1×1 kernel.


2. Adjusted fc1 input size:

    The flattened size is set to 1024×3×3=9216.


3. Model output:

    fc2 outputs 26 values, one per class.
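
As a rough check on what these changes cost, the parameter counts of the affected layers can be worked out by hand. The sketch below is plain Python with made-up helper names (`conv_params`, `linear_params`); it counts weights and biases only and ignores the BatchNorm parameters.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel Conv2d layer."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights plus biases of a Linear layer."""
    return in_features * out_features + out_features

layer7 = conv_params(512, 1024, kernel_size=1)
fc1 = linear_params(1024 * 3 * 3, 1024)
fc2 = linear_params(1024, 26)

print(layer7)  # 525312
print(fc1)     # 9438208
print(fc2)     # 26650
```

Under these assumptions fc1 holds far more parameters than the new convolution, because its input grows to 9216 features.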


In [None]:
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Convolutional layers
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 28x28 -> 14x14
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 14x14 -> 7x7
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 7x7 -> 3x3
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(128)  # 3x3 -> 3x3 (No pooling here)
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(256)  # 3x3 -> 3x3
        )
        self.layer6 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.BatchNorm2d(512)  # 3x3 -> 3x3
        )
        self.layer7 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.BatchNorm2d(1024)  # 3x3 -> 3x3
        )

        # Fully connected layers
        self.dropout = nn.Dropout(p=Drp)
        self.fc1 = nn.Linear(in_features=1024 * 3 * 3, out_features=1024)  # Updated to 1024 * 3 * 3
        self.fc2 = nn.Linear(in_features=1024, out_features=26)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = self.layer6(x)
        x = self.layer7(x)
        x = x.reshape(x.size(0), -1)  # Flatten
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x


# Parameter tuning and training start

In [None]:
learning_rate = 0.001
batch_size = 100
num_classes = 26
epochs = 3
Drp = 0.5

In [None]:
net = NeuralNet()

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=learning_rate)

In [None]:
pd_results = []

for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        out = net(images)
        loss = criterion(out, labels-1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total = labels.size(0)
        preds = torch.max(out.data, 1)[1]
        correct = (preds==labels-1).sum().item()

        if (i+1)%200==0:
            results = OrderedDict()
            results['epoch'] = epoch+1
            results['idx'] = i+1
            results['loss'] = loss.item()
            results['accuracy'] = 100.*correct/total
            pd_results.append(results)
            df = pd.DataFrame.from_dict(pd_results, orient='columns')

            clear_output(wait=True)
            display(df)

Unnamed: 0,epoch,idx,loss,accuracy
0,1,200,0.658513,77.34375
1,1,400,0.546699,81.25
2,1,600,0.29793,91.40625
3,1,800,0.478698,89.84375
4,2,200,0.416404,87.5
5,2,400,0.416659,87.5
6,2,600,0.248346,92.1875
7,2,800,0.420239,92.96875
8,3,200,0.275727,89.84375
9,3,400,0.324023,89.84375


# Saving the state_dict

In [None]:
# Initialize the model
# Note: this creates a new, untrained NeuralNet; to inspect or save the trained weights, use `net` instead
model = NeuralNet()

# Initialize the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Print the model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Print the optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

Model's state_dict:
layer1.0.weight 	 torch.Size([16, 1, 3, 3])
layer1.0.bias 	 torch.Size([16])
layer1.1.weight 	 torch.Size([16])
layer1.1.bias 	 torch.Size([16])
layer1.1.running_mean 	 torch.Size([16])
layer1.1.running_var 	 torch.Size([16])
layer1.1.num_batches_tracked 	 torch.Size([])
layer2.0.weight 	 torch.Size([32, 16, 3, 3])
layer2.0.bias 	 torch.Size([32])
layer2.1.weight 	 torch.Size([32])
layer2.1.bias 	 torch.Size([32])
layer2.1.running_mean 	 torch.Size([32])
layer2.1.running_var 	 torch.Size([32])
layer2.1.num_batches_tracked 	 torch.Size([])
layer3.0.weight 	 torch.Size([64, 32, 3, 3])
layer3.0.bias 	 torch.Size([64])
layer3.1.weight 	 torch.Size([64])
layer3.1.bias 	 torch.Size([64])
layer3.1.running_mean 	 torch.Size([64])
layer3.1.running_var 	 torch.Size([64])
layer3.1.num_batches_tracked 	 torch.Size([])
layer4.0.weight 	 torch.Size([128, 64, 3, 3])
layer4.0.bias 	 torch.Size([128])
layer4.1.weight 	 torch.Size([128])
layer4.1.bias 	 torch.Size([128])
layer4.1.run

In [None]:
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict()
}, "emnist_4layer_ver5.pth")

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# testing loop

In [None]:
net.eval()

correct, total = 0, 0
with torch.no_grad():
    for i, (images, labels) in enumerate(test_loader):
        out = net(images)
        preds = torch.max(out.data, 1)[1]
        correct += (preds==labels-1).sum().item()
        total += len(labels)

    print("Test accuracy: ", 100.*correct/total)

Test accuracy:  94.40865384615384


### 2 layer

origin (2 layer)

learning_rate = 0.001,  batch_size = 100,  num_classes = 26,  epochs = 3

**Test accuracy: 92.48** 15sec


origin2 (2 layer ver2)

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 10

**Test accuracy: 92.94** 15sec


origin3 (2 layer ver3)

learning_rate = 0.001,  batch_size = 100,  num_classes = 10,  epochs = 5

**Test accuracy: 92.87** 16sec



---


### 4 layer

develop1 (4 layer ver1)

learning_rate = 0.001,  batch_size = 100,  num_classes = 26,  epochs = 3

**Test accuracy: 92.88** 16sec


develop2 (4 layer ver2)

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 10

**Test accuracy: 94.00** 24sec

Retry: best was 93.85 (26sec), maybe because of Adam.


develop3 (4 layer ver3)

learning_rate = 0.001,  batch_size = 100,  num_classes = 10,  epochs = 5

**Test accuracy: 93.67** 27sec





---


### 5 layer

develop4 (5 layer ver1)

learning_rate = 0.001,  batch_size = 100,  num_classes = 26,  epochs = 3

**Test accuracy: 92.19** 12sec


develop5 (5 layer ver2)

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 10

**Test accuracy: 93.24** 8sec


develop6 (5 layer ver3)

learning_rate = 0.001,  batch_size = 100,  num_classes = 10,  epochs = 5

**Test accuracy: 93.14** 10sec



---


### 6 layer

develop7 (6 layer ver1)

learning_rate = 0.001,  batch_size = 100,  num_classes = 26,  epochs = 3

**Test accuracy: 89.34** 20sec


develop8 (6 layer ver2)

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 10

**Test accuracy: 93.59** 20sec


develop9 (6 layer ver3)

learning_rate = 0.001,  batch_size = 100,  num_classes = 10,  epochs = 5

**Test accuracy: 93.41** 20sec



---


### 7 layer

develop10 (7 layer ver1)

learning_rate = 0.001,  batch_size = 100,  num_classes = 26,  epochs = 3

**Test accuracy: 89.** 31sec


develop11 (7 layer ver2)

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 10

**Test accuracy: 91.80** 31sec


develop12 (7 layer ver3)

learning_rate = 0.001,  batch_size = 100,  num_classes = 10,  epochs = 5

**Test accuracy: 92.17** 18sec

# Develop version

Regularization: Weight Decay & Batch Normalization

Weight Decay is added to the Adam optimizer to reduce overfitting.
Batch Normalization is added after each Conv layer.


--------------------------------


### 5 layer ver1 (ver4~ver
learning_rate = 0.001,  batch_size = 64,  num_classes = 26,  epochs = 5,   Drp = 0.5

**Test accuracy: 93.46 07sec**

--

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 5,   Drp = 0.5

**Test accuracy: 93.38 10sec**


--

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 20,   Drp = 0.5

**Test accuracy: 93.76 10sec**


--

learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 25,   Drp = 0.5

**Test accuracy: 93.65 09sec**




--------------------------------


### 4 layer ver1 (ver4~ver
learning_rate = 0.001,  batch_size = 64,  num_classes = 26,  epochs = 10,   Drp = 0.5

**Test accuracy: 94.02 11sec**


--


learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 20,   Drp = 0.5

**Test accuracy: 94.41 11sec**


--


learning_rate = 0.001,  batch_size = 128,  num_classes = 26,  epochs = 30,   Drp = 0.5

**Test accuracy: 94.75 ?sec**