## Assignment 1
Implement the ReLU activation function and its derivative.
- Hint: the `np.maximum` function is convenient here.
- Other approaches are fine too.


In [2]:
import numpy as np

def relu(x):
  return np.maximum(0, x)

In [3]:
def d_relu(x):
  # Elementwise derivative of ReLU: 1 where x > 0, otherwise 0
  return np.where(x > 0, 1, 0)
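
As a quick sanity check, the sketch below applies both functions to a small array. The functions are redefined so the snippet runs on its own; the vectorized form of `d_relu` and the choice of 0 for the derivative at `x = 0` are assumptions of this sketch, not requirements of the assignment.

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x)
    return np.maximum(0, x)

def d_relu(x):
    # Elementwise derivative: 1 where x > 0, otherwise 0 (0 is chosen at x == 0)
    return (x > 0).astype(float)

sample = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
relu_out = relu(sample)      # [0., 0., 0., 0.5, 2.]
d_relu_out = d_relu(sample)  # [0., 0., 0., 1., 1.]
```

Both functions return an array with the same shape as the input, which is what a backward pass needs when it multiplies the derivative elementwise with an upstream gradient.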

## Assignment 2
Referring to the "MLP implementation with Numpy library using MNIST dataset" code in the Deep Learning Basic code file,
complete the `backward_pass` function for a three-layer MLP.
- Hint: the example in the code file is a two-layer MLP.


In [4]:
def backward_pass(x, y_true, params):
  # dS3 is the derivative of softmax + cross-entropy loss with respect to S3.
  # See http://machinelearningmechanic.com/deep_learning/2019/09/04/cross-entropy-loss-derivative.html
  dS3 = params["A3"] - y_true

  grads = {}

  grads["dW3"] =  np.dot(dS3, params["A2"].T)/x.shape[1]
  grads["db3"] =  np.sum(dS3, axis=1, keepdims=True)/x.shape[1]

  dA2 = np.dot(params["W3"].T, dS3)
  dS2 = dA2 * d_sigmoid(params["S2"])

  grads["dW2"] =  np.dot(dS2, params["A1"].T)/x.shape[1]
  grads["db2"] =  np.sum(dS2, axis=1, keepdims=True)/x.shape[1]

  dA1 = np.dot(params["W2"].T, dS2)
  dS1 = dA1 * d_sigmoid(params["S1"])

  grads["dW1"] = np.dot(dS1, x.T)/x.shape[1]
  grads["db1"] = np.sum(dS1, axis=1, keepdims=True)/x.shape[1]

  return grads
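
One way to test a hand-written backward pass is to compare it with numerical gradients. The sketch below is a minimal, self-contained version of that check. The `forward_pass` and `loss_fn` helpers, the parameter names `W1`..`b3`, the sigmoid hidden activations, and the tiny layer sizes (6-5-4-3 instead of 784-128-64-10) are all assumptions made for illustration; the referenced code file may differ. Samples are stored as columns, matching the use of `x.shape[1]` as the batch size above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))  # stabilized, column-wise
    return e / e.sum(axis=0, keepdims=True)

def forward_pass(x, params):
    # Hypothetical forward pass: stores pre-activations S and activations A
    params["S1"] = params["W1"] @ x + params["b1"]
    params["A1"] = sigmoid(params["S1"])
    params["S2"] = params["W2"] @ params["A1"] + params["b2"]
    params["A2"] = sigmoid(params["S2"])
    params["S3"] = params["W3"] @ params["A2"] + params["b3"]
    params["A3"] = softmax(params["S3"])
    return params["A3"]

def loss_fn(x, y_true, params):
    # Mean cross-entropy over the batch (columns are samples)
    probs = forward_pass(x, params)
    return -np.sum(y_true * np.log(probs)) / x.shape[1]

def backward_pass(x, y_true, params):
    m = x.shape[1]
    grads = {}
    dS3 = params["A3"] - y_true
    grads["dW3"] = dS3 @ params["A2"].T / m
    grads["db3"] = np.sum(dS3, axis=1, keepdims=True) / m
    dS2 = (params["W3"].T @ dS3) * d_sigmoid(params["S2"])
    grads["dW2"] = dS2 @ params["A1"].T / m
    grads["db2"] = np.sum(dS2, axis=1, keepdims=True) / m
    dS1 = (params["W2"].T @ dS2) * d_sigmoid(params["S1"])
    grads["dW1"] = dS1 @ x.T / m
    grads["db1"] = np.sum(dS1, axis=1, keepdims=True) / m
    return grads

rng = np.random.default_rng(0)
sizes = [6, 5, 4, 3]  # small stand-in sizes, not the MNIST dimensions
params = {}
for i in range(1, 4):
    params[f"W{i}"] = rng.normal(size=(sizes[i], sizes[i - 1]))
    params[f"b{i}"] = rng.normal(size=(sizes[i], 1))

x = rng.normal(size=(6, 8))                        # 8 samples, 6 features
y_true = np.eye(3)[:, rng.integers(0, 3, size=8)]  # one-hot columns

forward_pass(x, params)
grads = backward_pass(x, y_true, params)

# Central-difference estimate of every partial derivative
eps = 1e-6
max_err = 0.0
for name in ["W1", "b1", "W2", "b2", "W3", "b3"]:
    numeric = np.zeros_like(params[name])
    for idx in np.ndindex(params[name].shape):
        original = params[name][idx]
        params[name][idx] = original + eps
        loss_plus = loss_fn(x, y_true, params)
        params[name][idx] = original - eps
        loss_minus = loss_fn(x, y_true, params)
        params[name][idx] = original
        numeric[idx] = (loss_plus - loss_minus) / (2 * eps)
    max_err = max(max_err, np.max(np.abs(numeric - grads["d" + name])))

print(max_err)  # expected to be tiny when the analytic gradients are right
```

A check like this is sensitive to normalization mistakes, for example dividing a bias gradient by the batch size more than once, because the analytic and numerical values then differ by a constant factor.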

## Assignment 3
Referring to the "MLP implementation with Pytorch library using MNIST dataset" code in the Deep Learning Basic code file,
build a three-layer MLP and train it.

Set the hyperparameters as follows:

- epochs: 100
- hidden size: 128, 64 (two hidden layers)
- learning_rate: 0.5

In [6]:
from torchvision import transforms, datasets
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

In [7]:
transform = transforms.Compose([
    transforms.ToTensor()
])

In [8]:
trainset = datasets.MNIST(
    root      = './.data/', 
    train     = True,
    download  = True,
    transform = transform
)
testset = datasets.MNIST(
    root      = './.data/', 
    train     = False,
    download  = True,
    transform = transform
)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./.data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./.data/MNIST/raw/train-images-idx3-ubyte.gz to ./.data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./.data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./.data/MNIST/raw/train-labels-idx1-ubyte.gz to ./.data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./.data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./.data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./.data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./.data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./.data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./.data/MNIST/raw

In [9]:
BATCH_SIZE = 512
train_loader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
test_loader =  DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)  # evaluation does not need shuffling

In [10]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Linear(784,128)
        self.layer2 = nn.Linear(128,64)
        self.layer3 = nn.Linear(64,10)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = x.view(-1, 784)
        out = self.layer1(x)
        out = self.relu(out)
        out = self.layer2(out)
        out = self.relu(out)
        out = self.layer3(out)

        return out

In [11]:
model = Net()
model

Net(
  (layer1): Linear(in_features=784, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=64, bias=True)
  (layer3): Linear(in_features=64, out_features=10, bias=True)
  (relu): ReLU()
)
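
The printed layer shapes determine the number of trainable parameters. As a small worked example, the sketch below counts them in plain Python from the sizes shown above; the variable names are only illustrative.

```python
# Layer sizes taken from the Net definition: 784 -> 128 -> 64 -> 10
layer_sizes = [784, 128, 64, 10]

# A Linear layer with bias holds n_in * n_out weights plus n_out biases
per_layer = [n_in * n_out + n_out
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [100480, 8256, 650]
print(total_params)  # 109386
```

With the model in memory, `sum(p.numel() for p in model.parameters())` should give the same total.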

In [12]:
list(model.parameters())

[Parameter containing:
 tensor([[ 0.0348,  0.0091, -0.0216,  ..., -0.0076, -0.0137,  0.0276],
         [-0.0071, -0.0169, -0.0224,  ..., -0.0134, -0.0300, -0.0108],
         [-0.0140,  0.0021,  0.0251,  ..., -0.0326,  0.0302,  0.0347],
         ...,
         [-0.0145, -0.0279,  0.0325,  ..., -0.0072, -0.0136, -0.0015],
         [-0.0158, -0.0063, -0.0233,  ...,  0.0200, -0.0293,  0.0271],
         [ 0.0179,  0.0153,  0.0063,  ..., -0.0092,  0.0169,  0.0338]],
        requires_grad=True), Parameter containing:
 tensor([ 0.0123,  0.0030, -0.0139, -0.0146,  0.0350, -0.0069,  0.0194, -0.0140,
         -0.0092,  0.0072,  0.0100, -0.0189,  0.0277,  0.0046, -0.0176, -0.0086,
          0.0343, -0.0082,  0.0122,  0.0033, -0.0294, -0.0122,  0.0062,  0.0246,
         -0.0192, -0.0295,  0.0142,  0.0301, -0.0265, -0.0321, -0.0088,  0.0198,
          0.0224, -0.0120, -0.0327, -0.0127, -0.0286,  0.0312, -0.0037, -0.0327,
          0.0058, -0.0273,  0.0133,  0.0294, -0.0051,  0.0047,  0.0145,  0.0104,

In [13]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.5)
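
`nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, which is why `forward` returns the output of `layer3` without a softmax. The NumPy sketch below reproduces that computation for the default mean reduction, as an illustration only. The function name `cross_entropy_from_logits` and the sample values are made up, and the layout is (batch, classes) as in PyTorch.

```python
import numpy as np

def cross_entropy_from_logits(logits, targets):
    # logits: (batch, classes) raw scores; targets: (batch,) integer class ids
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the correct class
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss = cross_entropy_from_logits(logits, targets)
print(loss)
```

The second row has equal logits, so its contribution is `log(3)`, the loss of a uniform guess over three classes.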

In [14]:
def train(model, train_loader, optimizer):
    model.train()
    batch_losses = []

    for data, target in train_loader:
        optimizer.zero_grad()

        output = model(data)

        loss = criterion(output, target)
        batch_losses.append(loss.item())  # store a float so the autograd graph is not kept alive

        loss.backward()

        optimizer.step()
        
    avg_loss = sum(batch_losses) / len(batch_losses)
    
    return avg_loss

In [15]:
def evaluate(model, test_loader):
    model.eval()

    batch_losses = []
    correct = 0 

    with torch.no_grad(): 
        for data, target in test_loader:
            output = model(data)

            loss = criterion(output, target)
            batch_losses.append(loss.item())  # store a plain float rather than a tensor

            pred = output.max(1, keepdim=True)[1]

            correct += pred.eq(target.view_as(pred)).sum().item()

    avg_loss =  sum(batch_losses) / len(batch_losses)

    accuracy = 100. * correct / len(test_loader.dataset)

    return avg_loss, accuracy
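
Note that `sum(batch_losses) / len(batch_losses)` gives every batch equal weight, although the last batch is smaller whenever the dataset size is not a multiple of `BATCH_SIZE` (10,000 test images with batches of 512 leave a final batch of 272). A minimal sketch of a sample-weighted mean follows; the helper `weighted_mean_loss` is hypothetical and the loss values are made up purely to show the difference.

```python
def weighted_mean_loss(batch_losses, batch_sizes):
    # Weight each batch's mean loss by the number of samples in that batch
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# Batch sizes for 10,000 samples with BATCH_SIZE = 512: 19 full batches + 272
batch_sizes = [512] * (10_000 // 512) + [10_000 % 512]

# Illustrative values only: full batches have loss 0.10, the last one 0.50
batch_losses = [0.10] * (len(batch_sizes) - 1) + [0.50]

unweighted = sum(batch_losses) / len(batch_losses)         # about 0.12
weighted = weighted_mean_loss(batch_losses, batch_sizes)   # about 0.11088
```

Inside the evaluation loop, the batch size is available as `data.size(0)`. For MNIST the difference is usually small, so the simple average is a reasonable shortcut.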

In [20]:
EPOCHS = 100

for epoch in range(1, EPOCHS + 1):
    train_loss = train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    
    print('[{}] Train Loss: {:.4f}\tTest Loss: {:.4f}\tAccuracy: {:.2f}%'.format(
          epoch, train_loss, test_loss, test_accuracy))

[1] Train Loss: 0.8074	Test Loss: 0.3102	Accuracy: 90.64%
[2] Train Loss: 0.2339	Test Loss: 0.2566	Accuracy: 91.75%
[3] Train Loss: 0.1650	Test Loss: 0.1437	Accuracy: 95.43%
[4] Train Loss: 0.1277	Test Loss: 0.1707	Accuracy: 94.56%
[5] Train Loss: 0.1046	Test Loss: 0.1193	Accuracy: 96.22%
[6] Train Loss: 0.0883	Test Loss: 0.1003	Accuracy: 96.89%
[7] Train Loss: 0.0766	Test Loss: 0.1265	Accuracy: 95.98%
[8] Train Loss: 0.0669	Test Loss: 0.0979	Accuracy: 96.79%
[9] Train Loss: 0.0594	Test Loss: 0.0792	Accuracy: 97.44%
[10] Train Loss: 0.0516	Test Loss: 0.0812	Accuracy: 97.45%
[11] Train Loss: 0.0455	Test Loss: 0.2104	Accuracy: 93.18%
[12] Train Loss: 0.4771	Test Loss: 0.1652	Accuracy: 95.20%
[13] Train Loss: 0.1075	Test Loss: 0.1225	Accuracy: 96.29%
[14] Train Loss: 0.0834	Test Loss: 0.1353	Accuracy: 95.89%
[15] Train Loss: 0.0726	Test Loss: 0.1242	Accuracy: 96.27%
[16] Train Loss: 0.0706	Test Loss: 0.1363	Accuracy: 96.01%
[17] Train Loss: 0.1390	Test Loss: 0.0958	Accuracy: 96.85%
[18] T

## Assignment 4
Improve the performance of the Assignment 3 model using what you have learned so far.

- Hint: activation function, hyperparameter settings

In [16]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Linear(784,128)
        self.layer2 = nn.Linear(128,64)
        self.layer3 = nn.Linear(64,10)
        self.relu = nn.LeakyReLU(0.1)
        
    def forward(self, x):
        x = x.view(-1, 784)
        out = self.layer1(x)
        out = self.relu(out)
        out = self.layer2(out)
        out = self.relu(out)
        out = self.layer3(out)

        return out

In [17]:
model = Net()
model

Net(
  (layer1): Linear(in_features=784, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=64, bias=True)
  (layer3): Linear(in_features=64, out_features=10, bias=True)
  (relu): LeakyReLU(negative_slope=0.1)
)
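
For reference, `LeakyReLU(negative_slope=0.1)` differs from ReLU only for negative inputs, where it scales the input by the slope instead of zeroing it. The NumPy sketch below illustrates the function and its derivative; the function names are made up for this example, and the derivative at `x = 0` is taken to be the negative slope.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    # x for positive inputs, negative_slope * x otherwise
    return np.where(x > 0, x, negative_slope * x)

def d_leaky_relu(x, negative_slope=0.1):
    # Derivative: 1 for positive inputs, negative_slope otherwise
    return np.where(x > 0, 1.0, negative_slope)

sample = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
leaky_out = leaky_relu(sample)      # [-0.2, -0.05, 0., 0.5, 2.]
d_leaky_out = d_leaky_relu(sample)  # [0.1, 0.1, 0.1, 1., 1.]
```

Because the derivative is never exactly zero, units with negative pre-activations still pass a small gradient backward.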

In [18]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
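
To see why the step size matters, consider plain gradient descent on a one-dimensional quadratic. This is a toy illustration under assumed values (the curvature of 5.0 and the function `gradient_descent` are made up) and not a model of the MNIST loss surface.

```python
def gradient_descent(lr, curvature=5.0, w0=1.0, steps=10):
    # Minimize f(w) = 0.5 * curvature * w**2, whose gradient is curvature * w
    w = w0
    for _ in range(steps):
        w = w - lr * curvature * w
    return w

w_large_lr = gradient_descent(lr=0.5)  # each step multiplies w by (1 - 2.5) = -1.5
w_small_lr = gradient_descent(lr=0.1)  # each step multiplies w by (1 - 0.5) = 0.5

print(abs(w_large_lr), abs(w_small_lr))
```

On this toy problem the larger rate overshoots the minimum and grows in magnitude at every step, while the smaller rate shrinks toward zero. Real networks are more forgiving, but the same mechanism can produce sudden loss spikes during training.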

In [19]:
EPOCHS = 100

for epoch in range(1, EPOCHS + 1):
    train_loss = train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    
    print('[{}] Train Loss: {:.4f}\tTest Loss: {:.4f}\tAccuracy: {:.2f}%'.format(
          epoch, train_loss, test_loss, test_accuracy))

[1] Train Loss: 1.5436	Test Loss: 0.6782	Accuracy: 81.17%
[2] Train Loss: 0.5009	Test Loss: 0.4285	Accuracy: 87.67%
[3] Train Loss: 0.3720	Test Loss: 0.3653	Accuracy: 89.42%
[4] Train Loss: 0.3315	Test Loss: 0.3338	Accuracy: 90.21%
[5] Train Loss: 0.3066	Test Loss: 0.3373	Accuracy: 89.97%
[6] Train Loss: 0.2852	Test Loss: 0.2978	Accuracy: 91.59%
[7] Train Loss: 0.2677	Test Loss: 0.2636	Accuracy: 92.51%
[8] Train Loss: 0.2509	Test Loss: 0.2518	Accuracy: 92.67%
[9] Train Loss: 0.2371	Test Loss: 0.2402	Accuracy: 92.78%
[10] Train Loss: 0.2219	Test Loss: 0.2309	Accuracy: 92.87%
[11] Train Loss: 0.2101	Test Loss: 0.2078	Accuracy: 93.82%
[12] Train Loss: 0.1977	Test Loss: 0.2096	Accuracy: 93.93%
[13] Train Loss: 0.1867	Test Loss: 0.1936	Accuracy: 94.23%
[14] Train Loss: 0.1754	Test Loss: 0.1769	Accuracy: 94.64%
[15] Train Loss: 0.1653	Test Loss: 0.1921	Accuracy: 94.08%
[16] Train Loss: 0.1577	Test Loss: 0.1598	Accuracy: 95.31%
[17] Train Loss: 0.1498	Test Loss: 0.1630	Accuracy: 95.19%
[18] T

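**Free-form description of what was improved and why (below)**

In [None]:
1. Changed the activation function from ReLU to LeakyReLU (negative slope 0.1), so units with negative pre-activations still receive a small gradient.
2. Lowered the learning rate from 0.5 to 0.1. With 0.5, the training loss in Assignment 3 spiked (0.0455 at epoch 11 to 0.4771 at epoch 12); with 0.1, the logged training loss decreases steadily over the epochs shown, though more slowly.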