## Weight Initialization

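1. Initialization with small random numbers
  - Initialize the weights from a normal distribution with mean 0 and standard deviation 0.02
  - Initialize the biases to 0

2. Xavier Glorot Initialization
  - Keeps activations within a stable range, so they neither blow up nor shrink toward zero no matter how many layers the data passes through
  - Initialize each module's weights with Xavier normal initialization
  - Initialize the biases to 0
  - Used with sigmoid and hyperbolic tangent (tanh) activation functions

3. Kaiming He Initialization
  - ReLU sets every activation at or below 0 to 0, which halves the variance passed on to the next layer; He initialization doubles the weight variance to compensate
  - Initialize each module's weights with Kaiming He normal initialization
  - Initialize the biases to 0
  - Used with ReLU and Leaky ReLU activation functions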
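The variance argument behind methods 2 and 3 fits in a few lines. Assume a layer computes $y_i = \sum_{j=1}^{n_{\text{in}}} w_{ij}\,x_j$ with weights that are zero-mean, independent of each other and of the inputs:

$$
\operatorname{Var}(y_i) = n_{\text{in}}\,\operatorname{Var}(w)\,\mathbb{E}[x_j^2]
$$

For tanh-like activations operating near their linear, zero-centered region, $\mathbb{E}[x_j^2] \approx \operatorname{Var}(x_j)$. Keeping the variance constant requires $n_{\text{in}}\operatorname{Var}(w) = 1$ in the forward pass and $n_{\text{out}}\operatorname{Var}(w) = 1$ in the backward pass; Xavier initialization takes the compromise

$$
\operatorname{Var}(w) = \frac{2}{n_{\text{in}} + n_{\text{out}}}
$$

With ReLU, the negative half of the previous pre-activation is zeroed, so $\mathbb{E}[x_j^2] = \tfrac{1}{2}\operatorname{Var}(y^{\text{prev}})$, and He initialization compensates for that factor of $\tfrac{1}{2}$:

$$
\tfrac{1}{2}\,n_{\text{in}}\operatorname{Var}(w) = 1 \quad\Rightarrow\quad \operatorname{Var}(w) = \frac{2}{n_{\text{in}}}
$$

For example, for the first fully connected layer of the model below ($64 \cdot 7 \cdot 7 = 3136 \to 100$), Xavier gives a weight standard deviation of $\sqrt{2/3236} \approx 0.0249$ and He gives $\sqrt{2/3136} \approx 0.0253$.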

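A small toy experiment, not part of the original notebook, makes these claims concrete. It is a sketch under arbitrary assumptions (20 layers, width 512, batch 1024, a fixed seed): random inputs pass through a stack of hypothetical Linear+ReLU layers initialized with each of the three methods. The helper `final_activation_std` is a name invented for this example; it reports the standard deviation of the last layer's output.

```python
import torch
import torch.nn as nn
import torch.nn.init as init


def final_activation_std(scheme, depth=20, width=512, batch=1024, seed=0):
    """Push random inputs through `depth` Linear+ReLU layers and return the std of the output."""
    torch.manual_seed(seed)  # fixed seed: every scheme starts from the same input batch
    x = torch.randn(batch, width)
    with torch.no_grad():
        for _ in range(depth):
            layer = nn.Linear(width, width)
            if scheme == "small":
                init.normal_(layer.weight, mean=0.0, std=0.02)  # method 1
            elif scheme == "xavier":
                init.xavier_normal_(layer.weight)  # method 2
            elif scheme == "kaiming":
                init.kaiming_normal_(layer.weight, nonlinearity="relu")  # method 3
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            init.zeros_(layer.bias)  # biases start at 0 in all three methods
            x = torch.relu(layer(x))
    return x.std().item()


for scheme in ["small", "xavier", "kaiming"]:
    print(f"{scheme:>8}: {final_activation_std(scheme):.3e}")
```

In this ReLU stack, only the Kaiming variant should keep the output on roughly the same scale as the input. Xavier loses about half the variance per layer because it does not account for ReLU discarding the negative half, and a fixed standard deviation of 0.02 shrinks the signal much faster at this width.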
#### module

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

In [2]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


#### parameters

In [3]:
num_epoch = 10
batch_size = 256
learning_rate = 2e-4

#### data

[pytorch dataset download error solution](https://github.com/pytorch/vision/issues/1938)

In [4]:
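# Workaround from the issue linked above: send a browser-like User-agent header
# so the MNIST download request is not refused
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)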

In [5]:
# download
mnist_train = dset.MNIST("./", train=True, transform=transforms.ToTensor(), target_transform=None, download=True)
mnist_test = dset.MNIST("./", train=False, transform=transforms.ToTensor(), target_transform=None, download=True)

In [6]:
mnist_train[0][0].size(), len(mnist_train)

(torch.Size([1, 28, 28]), 60000)

In [7]:
mnist_test[0][0].size(), len(mnist_test)

(torch.Size([1, 28, 28]), 10000)

In [8]:
train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2, drop_last=True)
test_loader = DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2, drop_last=True)
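Both loaders are created with `drop_last=True`, which discards the final batch whenever it holds fewer than `batch_size` samples. For the test split, 10,000 = 39 × 256 + 16, so the last 16 images are never scored and the accuracy reported at the end is computed on 9,984 images. The sketch below checks this arithmetic on a dummy `TensorDataset` of the same size. This stand-in for `mnist_test` is an assumption of the example, chosen so nothing needs to be downloaded.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 256
# Hypothetical stand-in for mnist_test: 10,000 blank 1x28x28 images, all labeled 0
fake_test = TensorDataset(torch.zeros(10000, 1, 28, 28), torch.zeros(10000, dtype=torch.long))

kept = DataLoader(fake_test, batch_size=batch_size, shuffle=False, drop_last=True)
full = DataLoader(fake_test, batch_size=batch_size, shuffle=False, drop_last=False)

# Size of the final batch when nothing is dropped
last_batch_size = [images.size(0) for images, _ in full][-1]
print(len(kept), len(full), last_batch_size)  # 39 40 16
print(len(kept) * batch_size, "of", len(fake_test), "images evaluated with drop_last=True")
```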

#### model

In [9]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        self.fc_layer = nn.Sequential(
            nn.Linear(64*7*7, 100),
            nn.ReLU(),
            nn.Linear(100, 10)
        )
        
        # Initialization: Kaiming He normal for the weights, zeros for the biases
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                # Kaiming Initialization (gain sqrt(2), matched to ReLU)
                init.kaiming_normal_(m.weight, nonlinearity="relu")
                init.zeros_(m.bias)
        
    def forward(self, x):
        out = self.layer(x)
        # Flatten using the actual batch size so a smaller final batch also works
        out = out.view(out.size(0), -1)
        out = self.fc_layer(out)
        return out
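The loop in `__init__` hard-codes Kaiming initialization. To try all three methods from the list at the top, one option is a standalone function passed to `nn.Module.apply`, which calls it on every submodule. The sketch below assumes that approach. The helper name `init_weights`, its `scheme` strings and the small stand-in model are all hypothetical; none of them come from the original notebook.

```python
import functools

import torch.nn as nn
import torch.nn.init as init


def init_weights(m, scheme="kaiming"):
    """Re-initialize a Conv2d or Linear module in place; other module types are skipped."""
    if not isinstance(m, (nn.Conv2d, nn.Linear)):
        return
    if scheme == "small":
        init.normal_(m.weight, mean=0.0, std=0.02)  # method 1: N(0, 0.02^2)
    elif scheme == "xavier":
        init.xavier_normal_(m.weight)  # method 2: Var(w) = 2 / (fan_in + fan_out)
    elif scheme == "kaiming":
        init.kaiming_normal_(m.weight, nonlinearity="relu")  # method 3: Var(w) = 2 / fan_in
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    if m.bias is not None:
        init.zeros_(m.bias)  # biases start at 0 in all three methods


# Hypothetical stand-in model; in this notebook it would be CNN()
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 28 * 28, 10),
)
# Module.apply calls the function on every submodule (and on net itself)
net.apply(functools.partial(init_weights, scheme="xavier"))
```

Because `apply` runs after the constructor, something like `CNN().apply(functools.partial(init_weights, scheme="xavier"))` would replace the Kaiming weights set in `__init__`. Changing the scheme string is then enough to rerun the training cell with a different initialization.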

In [10]:
model = CNN().to(device)

#### loss

In [11]:
loss_func = nn.CrossEntropyLoss()

#### optimizer

In [12]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

#### train

In [13]:
for i in range(num_epoch):
    for j, [image, label] in enumerate(train_loader):
        x = image.to(device)
        y = label.to(device)
        
        optimizer.zero_grad()
        output = model(x)
        loss = loss_func(output, y)
        loss.backward()
        optimizer.step()
        
    # With num_epoch = 10 this fires only at i = 0, printing the last batch loss of the first epoch
    if i % 10 == 0:
        print(loss)

tensor(2.1269, device='cuda:0', grad_fn=<NllLossBackward>)


#### test

In [14]:
correct = 0
total = 0

with torch.no_grad():
    for image, label in test_loader:
        x = image.to(device)
        y = label.to(device)
        
        output = model(x)
        _, output_index = torch.max(output, 1)
        
        total += label.size(0)
        correct += (output_index == y).sum().float()
        
    print(f"Accuracy of Test Data: {correct/total*100}")

Accuracy of Test Data: 87.24960327148438
