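# Model Architecture

- **Basic structure**:
  - MobileNet is built from depthwise separable convolutions, except for the first layer, which is a full (standard) convolution.
  
- **Properties shared by all layers**:
  - Every layer is followed by batch normalization (BatchNorm) and a ReLU nonlinearity.
  - The only exception is the final fully connected layer, which has no nonlinearity and feeds into a softmax layer for classification.
- **Standard vs. separable layers**:
  - Standard convolution layer: 3×3 convolution + BatchNorm + ReLU.
  - Depthwise separable layer: 3×3 depthwise convolution + BatchNorm + ReLU, then 1×1 (pointwise) convolution + BatchNorm + ReLU. BatchNorm and ReLU follow each of the two convolutions.
- **Downsampling**:
  - Downsampling is done with stride-2 convolutions, both in the first layer and in the depthwise convolutions.
  - A final average pooling layer reduces the spatial resolution to 1 before the fully connected layer.
- **Efficient computation**:
  - MobileNet concentrates most of its computation in the 1×1 (pointwise) convolutions.
  - A 1×1 convolution needs no im2col reordering in memory, so it can be implemented directly with a highly optimized GEMM routine.
- **Training**:
  - The models are trained in TensorFlow using RMSprop with asynchronous gradient descent.
  - Unlike the training of large models, less regularization and data augmentation are used, because small models are less prone to overfitting.
- **Distribution of parameters and computation time**:
  - MobileNet spends 95% of its computation time in 1×1 convolutions, which also hold 75% of all parameters (see Tables 1 and 2).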
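
The GEMM point can be checked numerically. The sketch below compares a 1×1 `nn.Conv2d` against a single matrix multiply over flattened pixels. The tensor sizes are arbitrary illustrative values, not taken from the paper or from this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions chosen only for this check).
batch, c_in, c_out, h, w = 2, 8, 16, 5, 5

x = torch.randn(batch, c_in, h, w)
pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)

# Reference result from the convolution layer.
conv_out = pointwise(x)

# Same computation as one matrix multiply: every pixel becomes a row of
# length c_in, which is multiplied by the (c_out x c_in) weight matrix.
weight = pointwise.weight.view(c_out, c_in)        # (c_out, c_in)
pixels = x.permute(0, 2, 3, 1).reshape(-1, c_in)   # (batch*h*w, c_in)
gemm_out = pixels @ weight.t()                     # (batch*h*w, c_out)
gemm_out = gemm_out.view(batch, h, w, c_out).permute(0, 3, 1, 2)

max_abs_diff = (conv_out - gemm_out).abs().max().item()
print(f"max abs difference: {max_abs_diff:.2e}")
```

Only a reshape is needed before the multiply. A 3×3 convolution, by contrast, would first have to gather each pixel's neighborhood into a row.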

# Data Preparation

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader

In [2]:
# Random augmentation followed by per-channel normalization.
transform = transforms.Compose(
    [
        # transforms.Resize(224),
        # transforms.RandomCrop((224, 224), padding=4),
        transforms.RandomCrop((32, 32), padding=4),  # pad to 40x40, then crop back to 32x32
        transforms.RandomVerticalFlip(0.5),
        transforms.RandomHorizontalFlip(0.5),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261)),
    ]
)

train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# Note: the test set reuses the augmenting transform above, so random crops
# and flips are also applied at evaluation time.
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified


# Modeling

## Depthwise Separable Convolution

In [3]:
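class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution.

    No BatchNorm or ReLU is applied between the two convolutions here.
    """

    def __init__(self, in_channels, out_channels):
        super(DepthwiseSeparableConv, self).__init__()
        # groups=in_channels gives one 3x3 filter per input channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels,
                                   kernel_size=3, padding=1,
                                   groups=in_channels, bias=False)
        # The 1x1 convolution mixes channels and sets the output channel count.
        self.pointwise = nn.Conv2d(in_channels, out_channels,
                                   kernel_size=1, bias=False)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out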
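
The `DepthwiseSeparableConv` above has no BatchNorm or ReLU between its two convolutions and no stride argument. As a point of comparison, here is a minimal sketch of a block that places BatchNorm and ReLU after both convolutions and downsamples in the depthwise step, as the architecture notes describe. `PaperStyleSeparableBlock` and `conv_weight_count` are hypothetical names introduced for illustration, and the channel sizes are arbitrary.

```python
import torch
import torch.nn as nn


class PaperStyleSeparableBlock(nn.Module):
    """Hypothetical variant: BatchNorm + ReLU after both the depthwise and pointwise conv."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # Depthwise 3x3: one filter per input channel; stride handles downsampling.
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                      padding=1, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            # Pointwise 1x1: mixes channels and sets the output width.
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


def conv_weight_count(module):
    """Count weights of Conv2d layers only (BatchNorm parameters are excluded)."""
    return sum(m.weight.numel() for m in module.modules() if isinstance(m, nn.Conv2d))


separable = PaperStyleSeparableBlock(64, 128, stride=2)
standard = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)

out = separable(torch.randn(1, 64, 16, 16))
print(out.shape)                       # torch.Size([1, 128, 8, 8])
print(conv_weight_count(separable))    # 64*3*3 + 64*128 = 8768
print(conv_weight_count(standard))     # 64*128*3*3 = 73728
```

For these sizes the separable block uses about 12% of the convolution weights of the standard layer (8768 / 73728), and the pointwise convolution accounts for 8192 of the 8768.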

In [4]:
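class MobileNet(nn.Module):
    """MobileNet-style classifier, used in this notebook on 32x32 CIFAR-10 images.

    The layer layout does not follow the paper's Table 1 exactly.
    """

    class ConvBlock(nn.Module):
        """Standard convolution -> BatchNorm -> ReLU."""
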
        def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
            super(MobileNet.ConvBlock, self).__init__()

            self.conv = nn.Conv2d(in_channels, out_channels,
                                  kernel_size, stride, padding, bias=False)
            self.bn = nn.BatchNorm2d(out_channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.conv(x)
            out = self.bn(out)
            out = self.relu(out)
            return out

    class DWConvBlock(nn.Module):
        """Depthwise separable convolution -> BatchNorm -> ReLU (spatial size preserved)."""

        def __init__(self, in_channels, out_channels):
            super(MobileNet.DWConvBlock, self).__init__()

            self.conv = DepthwiseSeparableConv(in_channels, out_channels)
            self.bn = nn.BatchNorm2d(out_channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.conv(x)
            out = self.bn(out)
            out = self.relu(out)
            return out

    class ConvDWConvBlock(nn.Module):
        """A ConvBlock (changes the channel count) followed by a DWConvBlock."""

        def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=0):
            super(MobileNet.ConvDWConvBlock, self).__init__()

            self.conv_block = MobileNet.ConvBlock(in_channels, out_channels,
                                                  kernel_size, stride, padding)
            self.dw_conv_block = MobileNet.DWConvBlock(out_channels, out_channels)

        def forward(self, x):
            out = self.conv_block(x)
            out = self.dw_conv_block(out)
            return out

    def __init__(self, num_classes=10):
        super(MobileNet, self).__init__()

        # Spatial sizes in the comments assume 32x32 inputs.
        self.layers = nn.Sequential(
            self.ConvDWConvBlock(3, 32, stride=2, padding=1),  # 32x32 -> 16x16
            self.ConvDWConvBlock(32, 64),                      # unpadded 3x3: 16x16 -> 14x14
            self.ConvDWConvBlock(64, 128, kernel_size=1),
            self.ConvDWConvBlock(128, 128, kernel_size=1),
            self.ConvDWConvBlock(128, 256, kernel_size=1),
            self.ConvDWConvBlock(256, 256, kernel_size=1),
            self.ConvDWConvBlock(256, 512, kernel_size=1),
            self.ConvDWConvBlock(512, 512),                    # unpadded 3x3: 14x14 -> 12x12
            self.DWConvBlock(512, 512),
            self.DWConvBlock(512, 512),
            self.DWConvBlock(512, 512),
            self.DWConvBlock(512, 512),
            self.ConvDWConvBlock(512, 512, kernel_size=1),
            self.ConvDWConvBlock(512, 1024, kernel_size=1),
            self.ConvBlock(1024, 1024, kernel_size=1, stride=1, padding=0),
            nn.AdaptiveAvgPool2d(1),                           # 12x12 -> 1x1
            nn.Flatten(),                                      # (N, 1024, 1, 1) -> (N, 1024)
            nn.Linear(1024, num_classes)                       # logits (CrossEntropyLoss applies log-softmax)
        )

    def forward(self, x):
        return self.layers(x)
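
The feature-map size at each stage of `self.layers` can be worked out without running the model. The sketch below uses the usual convolution output-size formula (with dilation 1) and the kernel, stride, and padding values from the constructor calls above. `conv_out_size` and `leading_convs` are hypothetical names used only for this walk-through.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (kernel_size, stride, padding) of the leading convolution in each stage of
# `self.layers`, in order. The depthwise 3x3 (padding=1) and pointwise 1x1
# convolutions that follow keep the size unchanged, so they are omitted,
# as are the four plain DWConvBlock stages.
leading_convs = (
    [(3, 2, 1)]          # ConvDWConvBlock(3, 32, stride=2, padding=1)
    + [(3, 1, 0)]        # ConvDWConvBlock(32, 64)
    + [(1, 1, 0)] * 5    # five ConvDWConvBlocks with kernel_size=1
    + [(3, 1, 0)]        # ConvDWConvBlock(512, 512)
    + [(1, 1, 0)] * 3    # two kernel_size=1 ConvDWConvBlocks and the final ConvBlock
)

size = 32  # CIFAR-10 input resolution
sizes = [size]
for kernel_size, stride, padding in leading_convs:
    size = conv_out_size(size, kernel_size, stride, padding)
    sizes.append(size)

print(sizes)
```

The trace shows a single stride-2 step (32 to 16) and two reductions caused by unpadded 3×3 convolutions (16 to 14, then 14 to 12), so the average pooling layer sees a 12×12 map.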



# Training

In [5]:
import tqdm
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MobileNet()
model.to(device)

criterion = nn.CrossEntropyLoss()
# Adam is used here, whereas the paper trains with RMSprop.
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 10
for epoch in range(num_epochs):
    iterator = tqdm.tqdm(train_loader)
    model.train()
    for images, labels in iterator:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()

        outputs = model(images)
        loss = criterion(outputs, labels)

        loss.backward()
        optimizer.step()

        # Show the loss of the current batch only (not an epoch average).
        iterator.set_description(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')


Epoch [1/10], Loss: 2.0543: 100%|██████████| 3125/3125 [01:17<00:00, 40.27it/s]
Epoch [2/10], Loss: 1.5630: 100%|██████████| 3125/3125 [01:16<00:00, 40.93it/s]
Epoch [3/10], Loss: 1.2879: 100%|██████████| 3125/3125 [01:16<00:00, 40.91it/s]
Epoch [4/10], Loss: 1.2537: 100%|██████████| 3125/3125 [01:19<00:00, 39.52it/s]
Epoch [5/10], Loss: 1.3219: 100%|██████████| 3125/3125 [01:19<00:00, 39.35it/s]
Epoch [6/10], Loss: 1.3167: 100%|██████████| 3125/3125 [01:19<00:00, 39.27it/s]
Epoch [7/10], Loss: 1.7158: 100%|██████████| 3125/3125 [01:16<00:00, 40.85it/s]
Epoch [8/10], Loss: 1.4315: 100%|██████████| 3125/3125 [01:15<00:00, 41.15it/s]
Epoch [9/10], Loss: 1.1968: 100%|██████████| 3125/3125 [01:15<00:00, 41.13it/s]
Epoch [10/10], Loss: 1.2828: 100%|██████████| 3125/3125 [01:15<00:00, 41.17it/s]
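
Each value in the log above is the loss of the last batch of 16 images in that epoch, which helps explain why the numbers jump around (for example, 1.3167 to 1.7158 between epochs 6 and 7). A running average over the epoch is usually a steadier signal. The helper below is a minimal sketch; `AverageMeter` is a hypothetical class that is not part of this notebook, and the batch losses are made up to show the mechanics.

```python
class AverageMeter:
    """Tracks a sample-weighted running mean (hypothetical helper)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # `value` is a per-sample mean (e.g. loss.item()); weight it by batch size n.
        self.total += value * n
        self.count += n

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


# Made-up (batch_loss, batch_size) pairs, for illustration only.
meter = AverageMeter()
for batch_loss, batch_size in [(2.0, 16), (1.0, 16), (1.5, 8)]:
    meter.update(batch_loss, batch_size)

print(f"{meter.average:.4f}")  # (2.0*16 + 1.0*16 + 1.5*8) / 40 = 1.5000
```

In the training loop this would amount to creating one meter per epoch, calling `meter.update(loss.item(), labels.size(0))` after each step, and printing `meter.average` in the progress-bar description.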


# Testing

In [6]:
model.eval()  # use BatchNorm running statistics instead of batch statistics
with torch.no_grad():  # no gradients are needed for evaluation
    total = 0
    correct = 0
    iterator = tqdm.tqdm(test_loader)
    for images, labels in iterator:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the highest logit per sample
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'\nAccuracy of the model on the test images: {100 * correct / total:.2f}%')


100%|██████████| 625/625 [00:07<00:00, 85.96it/s]


Accuracy of the model on the test images: 55.77%
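
A single overall accuracy does not show which classes the model confuses. One possible follow-up is a per-class breakdown, sketched below. `per_class_accuracy` is a hypothetical helper, and the labels and predictions are made-up values for a 3-class toy case, not results from this model.

```python
import torch


def per_class_accuracy(predicted, labels, num_classes):
    """Return a tensor with the accuracy for each class (NaN if a class has no samples)."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    for cls in range(num_classes):
        mask = labels == cls
        total[cls] = mask.sum()
        correct[cls] = (predicted[mask] == cls).sum()
    return correct / total


# Made-up predictions and labels for a 3-class toy problem.
labels = torch.tensor([0, 0, 1, 1, 2, 2])
predicted = torch.tensor([0, 1, 1, 1, 0, 2])
acc = per_class_accuracy(predicted, labels, num_classes=3)
print(acc)  # tensor([0.5000, 1.0000, 0.5000])
```

To use it with the evaluation loop, the `predicted` and `labels` tensors of all batches would be concatenated (for example with `torch.cat`) and passed in with `num_classes=10`.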



