## Assignment 1
Implement the ReLU activation function and its derivative.
- Hint: `np.maximum` makes this easy.
- Any other approach is also fine.


In [1]:
import numpy as np
def relu(x):
    return np.maximum(0,x)

In [2]:
def d_relu(x):
    return np.greater(x,0).astype(np.int32)
# np.greater(x1, x2) returns the element-wise truth value of (x1 > x2),
# so the derivative is 1 where x > 0 and 0 elsewhere (including x == 0).
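
Since the hint allows other approaches, here is a minimal sketch of one alternative, assuming only NumPy. `relu_where` and the sample array `x` are illustrative names that are not part of the assignment; the two functions above are repeated so the block runs on its own.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_where(x):
    # Alternative formulation: keep x where it is positive, otherwise 0.
    return np.where(x > 0, x, 0)

def d_relu(x):
    # 1 where x > 0, else 0 (the value at exactly x == 0 is a convention).
    return np.greater(x, 0).astype(np.int32)

# Illustrative input covering negative, zero, and positive values.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

same_output = np.array_equal(relu(x), relu_where(x))
print(same_output)
print(d_relu(x))
```

Both versions are element-wise and work on arrays of any shape, so either can be used inside the MLP code.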

## Assignment 2
Referring to the "MLP implementation with Numpy library using MNIST dataset" code in the Deep Learning Basic code file,
complete the `backward_pass` function for a three-layer MLP.   
- Hint: the example in the code file is a two-layer MLP.


In [3]:
def backward_pass(x, y_true, params):
  m = x.shape[1]  # batch size (samples are stored as columns)

  # Error signal at the output pre-activation S3
  dS3 = params["A3"] - y_true

  grads = {}

  # Layer 3: average the gradients over the batch exactly once
  grads["dW3"] = np.dot(dS3, params["A2"].T) / m
  grads["db3"] = np.sum(dS3, axis=1, keepdims=True) / m

  # Backpropagate through layer 3 and the ReLU of layer 2
  dA2 = np.dot(params["W3"].T, dS3)
  dS2 = dA2 * d_relu(params["S2"])

  grads["dW2"] = np.dot(dS2, params["A1"].T) / m
  grads["db2"] = np.sum(dS2, axis=1, keepdims=True) / m

  # Backpropagate through layer 2 and the ReLU of layer 1
  dA1 = np.dot(params["W2"].T, dS2)
  dS1 = dA1 * d_relu(params["S1"])

  grads["dW1"] = np.dot(dS1, x.T) / m
  grads["db1"] = np.sum(dS1, axis=1, keepdims=True) / m

  return grads
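
One way to sanity-check a hand-written backward pass is to compare it with finite differences. The sketch below is not taken from the course code: `forward_pass`, `loss_fn`, `max_abs_error`, and the toy layer sizes are all assumptions made for illustration. It assumes ReLU hidden layers, a softmax output with mean cross-entropy loss (which is consistent with `dS3 = A3 - y_true`), and samples stored as columns. A version of `backward_pass` with each gradient averaged over the batch once is included so the block runs on its own.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def d_relu(x):
    return np.greater(x, 0).astype(np.float64)

def forward_pass(x, params):
    # Assumed forward pass: two ReLU hidden layers, softmax output.
    params["S1"] = params["W1"] @ x + params["b1"]
    params["A1"] = relu(params["S1"])
    params["S2"] = params["W2"] @ params["A1"] + params["b2"]
    params["A2"] = relu(params["S2"])
    params["S3"] = params["W3"] @ params["A2"] + params["b3"]
    e = np.exp(params["S3"] - params["S3"].max(axis=0, keepdims=True))
    params["A3"] = e / e.sum(axis=0, keepdims=True)
    return params["A3"]

def loss_fn(x, y_true, params):
    # Mean cross-entropy over the batch (columns are samples).
    a3 = forward_pass(x, params)
    return -np.sum(y_true * np.log(a3)) / x.shape[1]

def backward_pass(x, y_true, params):
    m = x.shape[1]
    grads = {}
    dS3 = params["A3"] - y_true
    grads["dW3"] = dS3 @ params["A2"].T / m
    grads["db3"] = dS3.sum(axis=1, keepdims=True) / m
    dS2 = (params["W3"].T @ dS3) * d_relu(params["S2"])
    grads["dW2"] = dS2 @ params["A1"].T / m
    grads["db2"] = dS2.sum(axis=1, keepdims=True) / m
    dS1 = (params["W2"].T @ dS2) * d_relu(params["S1"])
    grads["dW1"] = dS1 @ x.T / m
    grads["db1"] = dS1.sum(axis=1, keepdims=True) / m
    return grads

# Toy problem: 5 inputs, hidden sizes 4 and 3, 2 classes, 6 samples.
rng = np.random.default_rng(0)
sizes = [5, 4, 3, 2]  # illustrative sizes, not the MNIST dimensions
params = {}
for i in range(1, 4):
    params[f"W{i}"] = rng.normal(size=(sizes[i], sizes[i - 1]))
    params[f"b{i}"] = rng.normal(size=(sizes[i], 1))
x = rng.normal(size=(5, 6))
y_true = np.eye(2)[:, rng.integers(0, 2, size=6)]  # one-hot columns

forward_pass(x, params)
grads = backward_pass(x, y_true, params)

def max_abs_error(name, eps=1e-6):
    # Central finite differences for every entry of params[name].
    numeric = np.zeros_like(params[name])
    for idx in np.ndindex(params[name].shape):
        old = params[name][idx]
        params[name][idx] = old + eps
        plus = loss_fn(x, y_true, params)
        params[name][idx] = old - eps
        minus = loss_fn(x, y_true, params)
        params[name][idx] = old
        numeric[idx] = (plus - minus) / (2 * eps)
    return np.abs(numeric - grads["d" + name]).max()

errors = {n: max_abs_error(n) for n in ["W1", "b1", "W2", "b2", "W3", "b3"]}
print(errors)
```

If a bias gradient were divided by the batch size twice, its entry in `errors` would be far larger than the others, which makes this kind of check useful for catching scaling mistakes.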

## Assignment 3
Referring to the "MLP implementation with Pytorch library using MNIST dataset" code in the Deep Learning Basic code file,
build a three-layer MLP and then train it.

Use the following hyperparameters:

- epochs : 100
- hiddensize : 128, 64 (two hidden layers)
- learning_rate : 0.5
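
To get a feel for the size of this network, the short sketch below counts its trainable parameters. It assumes fully connected layers with biases, a flattened 28×28 MNIST input (784 values), and 10 output classes; `layer_sizes` and `counts` are illustrative names.

```python
# Layer widths: input, hidden 1, hidden 2, output.
layer_sizes = [784, 128, 64, 10]

# Each fully connected layer has n_in * n_out weights plus n_out biases.
counts = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [100480, 8256, 650]
print(total)   # 109386
```

Most of the parameters sit in the first layer, because the input dimension is much larger than the hidden sizes.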

In [4]:
from torchvision import transforms, datasets
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

In [5]:
transform = transforms.Compose([
    transforms.ToTensor()
])
trainset = datasets.MNIST(
    root      = './.data/', 
    train     = True,
    download  = True,
    transform = transform
)
testset = datasets.MNIST(
    root      = './.data/', 
    train     = False,
    download  = True,
    transform = transform
)

In [6]:
BATCH_SIZE = 512
train_loader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
test_loader =  DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)  # no need to shuffle for evaluation

In [20]:
class Net1(nn.Module):
  def __init__(self):
    super(Net1,self).__init__()
    self.layer1=nn.Linear(784,128)
    self.layer2=nn.Linear(128,64)
    self.layer3=nn.Linear(64,10)
    self.sigmoid=nn.Sigmoid()
  def forward(self,x):
    x=x.view(-1,784)
    out=self.layer1(x)
    out=self.sigmoid(out)
    out=self.layer2(out)
    out=self.sigmoid(out)
    out=self.layer3(out)
    return out

In [21]:
model1=Net1()
criterion=nn.CrossEntropyLoss() # loss function
optimizer=optim.SGD(model1.parameters(),lr=0.5) # optimizer

In [22]:
def train(model,train_loader,optimizer):
    model.train()
    batch_losses=[]
    for data,target in train_loader:
        optimizer.zero_grad()
        output=model(data)
        loss=criterion(output,target)
        batch_losses.append(loss.item())  # store a plain float so the autograd graph is not kept alive
        
        loss.backward()
        optimizer.step()
    
    avg_loss=sum(batch_losses)/len(batch_losses)
    return avg_loss

In [23]:
def evaluate(model, test_loader):
    model.eval()
    batch_losses=[]
    correct=0
    with torch.no_grad():
        for data,target in test_loader:
            output=model(data)
            loss=criterion(output,target)
            batch_losses.append(loss.item())  # store a plain float
            
            pred=output.max(1,keepdim=True)[1]
            correct+=pred.eq(target.view_as(pred)).sum().item()
    
    avg_loss=sum(batch_losses)/len(batch_losses)
    accuracy=100*correct/len(test_loader.dataset)
    return avg_loss,accuracy

In [25]:
epochs=100
for epoch in range(1,epochs+1):
    train_loss=train(model1, train_loader, optimizer)
    test_loss, test_accuracy=evaluate(model1,test_loader)
    print('{} train loss:{:.2f}, test loss:{:.2f}, accuracy:{:.2f}'.format(epoch,train_loss,test_loss,test_accuracy))

1 train loss:2.25, test loss:2.07, accuracy:22.09
2 train loss:1.38, test loss:0.95, accuracy:66.24
3 train loss:0.77, test loss:0.67, accuracy:79.21
4 train loss:0.55, test loss:0.49, accuracy:85.31
5 train loss:0.44, test loss:0.42, accuracy:87.68
6 train loss:0.38, test loss:0.36, accuracy:89.30
7 train loss:0.35, test loss:0.33, accuracy:90.52
8 train loss:0.33, test loss:0.32, accuracy:91.11
9 train loss:0.31, test loss:0.32, accuracy:90.49
10 train loss:0.29, test loss:0.28, accuracy:91.78
11 train loss:0.27, test loss:0.26, accuracy:91.98
12 train loss:0.26, test loss:0.27, accuracy:92.09
13 train loss:0.25, test loss:0.24, accuracy:93.11
14 train loss:0.23, test loss:0.23, accuracy:93.11
15 train loss:0.22, test loss:0.23, accuracy:92.84
16 train loss:0.21, test loss:0.21, accuracy:93.66
17 train loss:0.20, test loss:0.20, accuracy:94.20
18 train loss:0.20, test loss:0.20, accuracy:93.83
19 train loss:0.19, test loss:0.19, accuracy:94.42
20 train loss:0.18, test loss:0.18, accu

## Assignment 4
Improve on the performance from Assignment 3 using what you have learned so far.

- Hint: activation function, hyperparameter settings

In [26]:
class Net2(nn.Module):
    def __init__(self):
        super(Net2,self).__init__()
        self.layer1=nn.Linear(784,128)
        self.layer2=nn.Linear(128,64)
        self.layer3=nn.Linear(64,10)
        self.relu=nn.ReLU()
    def forward(self,x):
        x=x.view(-1,784)
        out=self.layer1(x)
        out=self.relu(out)
        out=self.layer2(out)
        out=self.relu(out)
        out=self.layer3(out)
        return out

In [27]:
model2=Net2()
criterion=nn.CrossEntropyLoss() # loss function
optimizer=optim.SGD(model2.parameters(),lr=0.5) # optimizer

In [28]:
epochs=100
for epoch in range(1,epochs+1):
    train_loss=train(model2, train_loader, optimizer)
    test_loss, test_accuracy=evaluate(model2,test_loader)
    print('{} train loss:{:.2f}, test loss:{:.2f}, accuracy:{:.2f}'.format(epoch,train_loss,test_loss,test_accuracy))

1 train loss:0.81, test loss:0.34, accuracy:89.33
2 train loss:0.24, test loss:0.23, accuracy:92.79
3 train loss:0.17, test loss:0.16, accuracy:94.84
4 train loss:0.13, test loss:0.15, accuracy:95.05
5 train loss:0.11, test loss:0.12, accuracy:95.97
6 train loss:0.09, test loss:0.11, accuracy:96.65
7 train loss:0.08, test loss:0.15, accuracy:95.19
8 train loss:0.07, test loss:0.11, accuracy:96.39
9 train loss:0.06, test loss:0.10, accuracy:97.09
10 train loss:0.05, test loss:0.12, accuracy:96.13
11 train loss:0.05, test loss:0.52, accuracy:86.61
12 train loss:0.46, test loss:0.18, accuracy:94.14
13 train loss:0.10, test loss:0.23, accuracy:92.84
14 train loss:0.08, test loss:0.17, accuracy:94.55
15 train loss:0.08, test loss:0.12, accuracy:96.13
16 train loss:0.06, test loss:0.14, accuracy:95.61
17 train loss:0.05, test loss:0.11, accuracy:96.43
18 train loss:0.05, test loss:0.10, accuracy:97.04
19 train loss:0.04, test loss:0.11, accuracy:96.48
20 train loss:0.04, test loss:0.28, accu

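**Free-form explanation of what was improved and why (below)**<br>
ReLU was used as the activation function instead of sigmoid.<br>
With sigmoid, backpropagation repeatedly multiplies by small derivative values (at most 0.25), so the gradient can vanish on its way back through the layers. ReLU's derivative is exactly 1 wherever the pre-activation is positive, so the gradient passes through those units without shrinking.
With ReLU, training reaches a lower train loss and a higher accuracy. In the logs above, the ReLU model reaches about 97% test accuracy by epoch 9 with a train loss of 0.04 at epoch 19, while the sigmoid model is at about 94% accuracy with a train loss of 0.19 at epoch 19.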
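
The vanishing-gradient argument can be illustrated numerically. The sketch below is a simplification that looks only at the activation-derivative factors and ignores the weight matrices, so it shows an upper bound on those factors rather than the actual gradient of the trained model. `sigmoid`, `d_sigmoid`, `peak`, and `bounds` are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The sigmoid derivative is largest at z = 0, where it equals 0.25.
peak = d_sigmoid(0.0)

# Best-case product of activation derivatives after n layers:
# sigmoid shrinks at least as fast as 0.25**n, while an active ReLU
# unit (positive pre-activation) contributes a factor of exactly 1.
bounds = {n: (peak ** n, 1.0 ** n) for n in (1, 2, 5, 10)}
for n, (sig_bound, relu_factor) in bounds.items():
    print(n, sig_bound, relu_factor)
```

With only two hidden layers the effect is modest, which fits the logs: the sigmoid network still trains, just more slowly than the ReLU one.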