# "[CNN] lr와batch_size가 loss값에 영향을 어떻게 끼치는가"
> "이미지 분류하기 & 전이학습"

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [DL, Pytorch, CNN, VGG]
- author: 도형준

# lr & batch_size가 loss에 영향을 어떻게 미칠까

![colab_gpu](https://user-images.githubusercontent.com/105966480/202898524-61913841-c682-4674-b741-09697e506e1e.png)|![local_gpu](https://user-images.githubusercontent.com/105966480/202898527-5bca2baa-2074-4d26-9d3e-4514ca2a921b.png)
--- | --- | 

## 의문이 생긴 계기

- 코랩과 로컬에서 같은 파라미터를 가진 모델을 학습을 시켰는데, loss 값이 다르게 나왔다.
- 로컬에서의 loss 값을 봤을 때, 최적화가 제대로 이루어지지 않고 있음을 알 수 있다.
- 이 때, gpu의 성능 차이인지 로컬 환경을 잘못 세팅해서인지 의문점이 들었다.
- 사실 아직까지 이 부분에 대해서 제대로 결론을 내리지는 못했다.
- 하지만 이 파트의 의문점을 해소하려고 하는 과정에서, 배치 사이즈가 loss값의 수렴도에 큰 영향을 주게 된다는 것을 알게 되었다.

![image](https://user-images.githubusercontent.com/105966480/202898743-d467f838-1e91-4766-94b6-fbd72068cf5f.png)
- 참조 논문 링크: https://www.sciencedirect.com/science/article/pii/S2405959519303455#fig2
- 결과는 이 노트북 끝의 결과를 보면, 논문을 참고하여 최적은 파라미터들을 제공해주었을 때 성능이 향상되었음을 확인해 볼 수 있다.

In [1]:
import matplotlib.pyplot as plt

from torchvision.datasets.cifar import CIFAR10 # 10가지 클래스를 가지는 이미지 데이터셋 ex. 자동차, 동물 등
from torchvision.transforms import ToTensor

In [2]:
import matplotlib.pyplot as plt
import torchvision.transforms as T

from torchvision.transforms import Compose
from torchvision.transforms import RandomHorizontalFlip, RandomCrop

from torchvision.transforms import Normalize
# 기본 블록 정의
import torch
import torch.nn as nn

from torch.utils.data.dataloader import DataLoader
from torch.optim.adam import Adam

## 모델 정의

In [4]:
# 클래스를 사용하는 이유: 상속을 위해서
# 같은 포맷을 받아서 -> 그 안에 활용
class BasicBlock(nn.Module):
    # 기본 블록 구성하는 기본 정의
    def __init__(self, in_channels, out_channels, hidden_dim):
        # nn.Module
        super(BasicBlock, self).__init__()
        # 합성곱 층
        # in_channels : 입력 채널 수
        # kernel_size : 커널의 크기
        # padding : 이미지 외곽을 둘러쌀 0의 개수
        self.conv1 = nn.Conv2d(in_channels, hidden_dim,
                               kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(hidden_dim, out_channels,
                               kernel_size=3, padding=1)
        self.relu = nn.ReLU()

        # 커널의 이동 거리 stride
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
    
    # 순전파 정의
    def forward(self, x):
        x = self.conv1(x) # 합성곱1을 지나고
        x = self.relu(x) # 활성화함수를 지나고
        x = self.conv2(x) # 합성곱2를 지나고
        x = self.relu(x) # 활성화함수
        x = self.pool(x) # 맥스풀링을 지난걸
        return x # 리턴

### CNN 모델

In [5]:
class CNN(nn.Module):
    def __init__(self, num_classes): # 클래스 개수
        super(CNN, self).__init__()

        # 합성곱 기본 블록 정의
        self.block1 = BasicBlock(in_channels=3, out_channels=32, hidden_dim=16)
        self.block2 = BasicBlock(in_channels=32, out_channels=128, hidden_dim=64)
        self.block3 = BasicBlock(in_channels=128, out_channels=256, hidden_dim=128)

        # 분류기
        self.fc1 = nn.Linear(in_features=4096, out_features=2048)
        self.fc2 = nn.Linear(in_features=2048, out_features=256)
        self.fc3 = nn.Linear(in_features=256, out_features=num_classes)

        # 분류기의 활성화 함수
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x) # 출력 모양 (-1, 256, 4, 4)
        x = torch.flatten(x, start_dim=1)
        
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = self.relu(x)

        return x

In [6]:
transforms = Compose([
    RandomCrop((32, 32), padding=4), # 랜덤 크롭핑
    RandomHorizontalFlip(p=0.5), # 1/2 확률로 y축 뒤집기
    ToTensor(), # 텐서 변환
    Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261))
])

In [7]:
# 학습용/평가용 데이터 불러오기
training_data = CIFAR10(
    root="./", 
    train=True, 
    download=True, 
    transform=transforms)

test_data = CIFAR10(
    root="./", 
    train=False, 
    download=True, 
    transform=transforms)

# 데이터로더 정의(batch_size)
train_loader = DataLoader(training_data,batch_size=(16),shuffle = True)
test_loader = DataLoader(test_data,batch_size=(16),shuffle = False)

# device = cpu or gpu(cuda)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# CNN 모델 정의 (<= CNN 클래스 불러오기(객체 선언))
model = CNN(num_classes=10)

# 모델을 device로
model.to(device)

Files already downloaded and verified
Files already downloaded and verified


CNN(
  (block1): BasicBlock(
    (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block2): BasicBlock(
    (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block3): BasicBlock(
    (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=4096, out_features=2048, bias=True)
  (fc2): Linear(in_features=2048, out_features=256,

In [8]:
import time
from tqdm.notebook import tqdm
lr = 0.0001 # 학습률 정의

# 최적화 기법 정의 (adam)
optim = Adam(model.parameters(), lr=lr)

# 학습 루프 정의
for epoch in tqdm(range(100)):
    for data, label in train_loader: # 데이터 호출
        optim.zero_grad() # 기울기 초기화
        preds = model(data.to(device))
        # 분류 문제 (회귀 문제면 MSE)
        loss = nn.CrossEntropyLoss()(preds, label.to(device))
        loss.backward() # 오차 역전파
        optim.step() # 최적화
    
    if epoch==0 or epoch%10==9:
        print(f"epoch{epoch+1} loss:{loss.item()}")

torch.save(model.state_dict(), "CIFAR.pth")

  0%|          | 0/100 [00:00<?, ?it/s]

epoch1 loss:2.219067096710205
epoch10 loss:0.730250358581543
epoch20 loss:0.7352678179740906
epoch30 loss:0.4655658006668091
epoch40 loss:0.29243195056915283
epoch50 loss:0.15692786872386932
epoch60 loss:0.1570410132408142
epoch70 loss:0.025545910000801086
epoch80 loss:0.0008887342410162091
epoch90 loss:0.3047877252101898
epoch100 loss:0.38577550649642944


In [9]:
model.load_state_dict(torch.load('CIFAR.pth', map_location=device))
num_corr = 0
with torch.no_grad():
    for data, label in test_loader:
        output = model(data.to(device))
        preds = output.data.max(1)[1]
        corr = preds.eq(label.to(device).data).sum().item()
        num_corr += corr
    print(f"Accuracy : {num_corr/len(test_data)}")

Accuracy : 0.0011
Accuracy : 0.0026
Accuracy : 0.0039
Accuracy : 0.0051
Accuracy : 0.0065
Accuracy : 0.0076
Accuracy : 0.0089
Accuracy : 0.0101
Accuracy : 0.0113
Accuracy : 0.0126
Accuracy : 0.014
Accuracy : 0.0152
Accuracy : 0.0164
Accuracy : 0.018
Accuracy : 0.0191
Accuracy : 0.0201
Accuracy : 0.0212
Accuracy : 0.0224
Accuracy : 0.0239
Accuracy : 0.0249
Accuracy : 0.026
Accuracy : 0.0275
Accuracy : 0.0286
Accuracy : 0.0298
Accuracy : 0.0309
Accuracy : 0.0322
Accuracy : 0.0334
Accuracy : 0.0346
Accuracy : 0.0358
Accuracy : 0.0371
Accuracy : 0.0382
Accuracy : 0.0398
Accuracy : 0.041
Accuracy : 0.0423
Accuracy : 0.0437
Accuracy : 0.0449
Accuracy : 0.0465
Accuracy : 0.0477
Accuracy : 0.0491
Accuracy : 0.0506
Accuracy : 0.0518
Accuracy : 0.0528
Accuracy : 0.0536
Accuracy : 0.0552
Accuracy : 0.0566
Accuracy : 0.0577
Accuracy : 0.0586
Accuracy : 0.0599
Accuracy : 0.0611
Accuracy : 0.0622
Accuracy : 0.0633
Accuracy : 0.0645
Accuracy : 0.0656
Accuracy : 0.0669
Accuracy : 0.0682
Accuracy : 0.0

Accuracy : 0.5649
Accuracy : 0.5659
Accuracy : 0.5673
Accuracy : 0.5682
Accuracy : 0.5697
Accuracy : 0.5708
Accuracy : 0.572
Accuracy : 0.5734
Accuracy : 0.5747
Accuracy : 0.5758
Accuracy : 0.5768
Accuracy : 0.5779
Accuracy : 0.579
Accuracy : 0.5802
Accuracy : 0.5815
Accuracy : 0.5829
Accuracy : 0.5843
Accuracy : 0.5856
Accuracy : 0.587
Accuracy : 0.5881
Accuracy : 0.5893
Accuracy : 0.5906
Accuracy : 0.5919
Accuracy : 0.5931
Accuracy : 0.5944
Accuracy : 0.5955
Accuracy : 0.5965
Accuracy : 0.5978
Accuracy : 0.599
Accuracy : 0.5998
Accuracy : 0.6009
Accuracy : 0.6024
Accuracy : 0.6033
Accuracy : 0.6041
Accuracy : 0.6053
Accuracy : 0.6066
Accuracy : 0.6079
Accuracy : 0.6092
Accuracy : 0.6102
Accuracy : 0.6114
Accuracy : 0.6126
Accuracy : 0.6138
Accuracy : 0.6148
Accuracy : 0.6162
Accuracy : 0.6174
Accuracy : 0.6186
Accuracy : 0.6198
Accuracy : 0.6212
Accuracy : 0.6224
Accuracy : 0.6234
Accuracy : 0.6246
Accuracy : 0.6259
Accuracy : 0.627
Accuracy : 0.6283
Accuracy : 0.6294
Accuracy : 0.63

- loss값은 크게 낮아졌으나, 정확도는 크게 오르지 않음.
- epoch 횟수를 늘린다면, 성능이 올라갈 것이라고 예상함.