## Assignment 1
Implement the ReLU activation function and its derivative.
- Hint: the np.maximum function is convenient.
- Other approaches are also fine.


In [None]:
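def relu(x):
  # Scalar ReLU: returns max(0, x)
  if x < 0:
    return 0
  else:
    return x

In [None]:
def d_relu(x):
  # Derivative of ReLU: 0 for x < 0, 1 otherwise (the value at x = 0 is a convention)
  if x < 0:
    return 0
  else:
    return 1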

In [None]:
relu(2)

2

In [None]:
relu(-1)

0
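The hint points to np.maximum. The scalar if/else version above fails on NumPy arrays because comparing an array with 0 has no single truth value, and that matters once ReLU is applied to a whole layer at a time. Below is a minimal vectorized sketch. The names relu_np and d_relu_np are hypothetical. The derivative keeps the convention above of returning 1 at x = 0.

```python
import numpy as np

def relu_np(x):
    # Element-wise max(0, x); works for scalars and arrays of any shape
    return np.maximum(0, x)

def d_relu_np(x):
    # 0 where x < 0, 1 elsewhere (including x == 0, matching d_relu above)
    return np.where(x < 0, 0.0, 1.0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu_np(x))
print(d_relu_np(x))
```

Because np.where builds its result from x's shape, the same functions apply unchanged to a (units, batch) activation matrix.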

## Assignment 2
Referring to the "MLP implementation with Numpy library using MNIST dataset" code in the Deep Learning Basic code file, complete the backward_pass function for a three-layer MLP.
- Hint: the example in the code file is a two-layer MLP.


In [None]:
import numpy as np

# Assumes sigmoid hidden layers (d_sigmoid defined as in the reference code file),
# a softmax output with cross-entropy loss, and one sample per column of x.
def backward_pass(x, y_true, params):
  m = x.shape[1]  # number of samples in the batch
  grads = {}

  # Output layer: softmax + cross-entropy gives dL/dS3 = A3 - y_true
  dS3 = params["A3"] - y_true
  grads["dW3"] = np.dot(dS3, params["A2"].T) / m
  grads["db3"] = np.sum(dS3, axis=1, keepdims=True) / m

  # Second hidden layer
  dA2 = np.dot(params["W3"].T, dS3)
  dS2 = dA2 * d_sigmoid(params["S2"])
  grads["dW2"] = np.dot(dS2, params["A1"].T) / m
  grads["db2"] = np.sum(dS2, axis=1, keepdims=True) / m

  # First hidden layer
  dA1 = np.dot(params["W2"].T, dS2)
  dS1 = dA1 * d_sigmoid(params["S1"])
  grads["dW1"] = np.dot(dS1, x.T) / m
  grads["db1"] = np.sum(dS1, axis=1, keepdims=True) / m

  return grads
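A standard way to verify a hand-written backward pass is to compare it against central finite differences of the loss. The reference code file is not reproduced here, so the sketch below makes several assumptions. It uses sigmoid hidden layers, a softmax output with mean cross-entropy loss, one sample per column, and hypothetical tiny layer sizes (4 → 5 → 3 → 2). It re-implements everything it needs so it runs on its own. Each bias gradient is divided by the batch size m exactly once. Dividing twice would shrink it by an extra factor of m, and a check like this catches that immediately.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def d_sigmoid(s):
    sig = sigmoid(s)
    return sig * (1.0 - sig)

def softmax(s):
    # Column-wise softmax (one sample per column), shifted by the max for stability
    e = np.exp(s - s.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def forward_pass(x, params):
    # Stores pre-activations S1..S3 and activations A1..A3 in params
    params["S1"] = params["W1"] @ x + params["b1"]
    params["A1"] = sigmoid(params["S1"])
    params["S2"] = params["W2"] @ params["A1"] + params["b2"]
    params["A2"] = sigmoid(params["S2"])
    params["S3"] = params["W3"] @ params["A2"] + params["b3"]
    params["A3"] = softmax(params["S3"])
    return params

def loss_fn(x, y_true, params):
    # Mean cross-entropy over the m = x.shape[1] samples; y_true is one-hot (classes, m)
    a3 = forward_pass(x, params)["A3"]
    return -np.sum(y_true * np.log(a3)) / x.shape[1]

def backward_pass(x, y_true, params):
    # Three-layer backward pass; every gradient is divided by m exactly once
    m = x.shape[1]
    grads = {}
    dS3 = params["A3"] - y_true
    grads["dW3"] = dS3 @ params["A2"].T / m
    grads["db3"] = dS3.sum(axis=1, keepdims=True) / m
    dS2 = (params["W3"].T @ dS3) * d_sigmoid(params["S2"])
    grads["dW2"] = dS2 @ params["A1"].T / m
    grads["db2"] = dS2.sum(axis=1, keepdims=True) / m
    dS1 = (params["W2"].T @ dS2) * d_sigmoid(params["S1"])
    grads["dW1"] = dS1 @ x.T / m
    grads["db1"] = dS1.sum(axis=1, keepdims=True) / m
    return grads

def numerical_grad(x, y_true, params, name, eps=1e-6):
    # Central difference (L(p + eps) - L(p - eps)) / (2 eps), one entry at a time
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(params[name].shape):
        old = params[name][idx]
        params[name][idx] = old + eps
        plus = loss_fn(x, y_true, params)
        params[name][idx] = old - eps
        minus = loss_fn(x, y_true, params)
        params[name][idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
sizes = [(5, 4), (3, 5), (2, 3)]  # hypothetical (out, in) sizes for layers 1..3
params = {}
for i, (out_dim, in_dim) in enumerate(sizes, start=1):
    params[f"W{i}"] = rng.normal(scale=0.5, size=(out_dim, in_dim))
    params[f"b{i}"] = rng.normal(scale=0.1, size=(out_dim, 1))

x = rng.normal(size=(4, 6))                    # 6 samples, one per column
y_true = np.eye(2)[:, rng.integers(0, 2, 6)]   # one-hot labels, shape (2, 6)

forward_pass(x, params)
grads = backward_pass(x, y_true, params)
errors = {}
for name in ["W1", "b1", "W2", "b2", "W3", "b3"]:
    num = numerical_grad(x, y_true, params, name)
    errors[name] = np.max(np.abs(num - grads["d" + name]))
print(errors)
```

Every entry in errors should be far below 1e-6 in float64. A large value for one parameter localizes the bug to that layer.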

## Assignment 3
Referring to the "MLP implementation with Pytorch library using MNIST dataset" code in the Deep Learning Basic code file, build a three-layer MLP and train it.

Set the hyperparameters as follows:

- epochs: 100
- hidden sizes: 128, 64 (two hidden layers)
- learning_rate: 0.5

In [None]:
from torchvision import transforms, datasets
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

In [None]:
transform = transforms.Compose([
    transforms.ToTensor()
])

In [None]:
trainset = datasets.MNIST(
    root      = './.data/', 
    train     = True,
    download  = True,
    transform = transform
)
testset = datasets.MNIST(
    root      = './.data/', 
    train     = False,
    download  = True,
    transform = transform
)

In [None]:
trainset[0][0].shape

torch.Size([1, 28, 28])
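transforms.ToTensor converts an H × W (× C) image with values 0 to 255 into a C × H × W float tensor scaled to [0, 1]. That is why each MNIST sample has shape [1, 28, 28]. Net.forward later flattens each image with x.view(-1, 784). The NumPy sketch below reproduces both steps on a hypothetical random batch of 32 uint8 images, to make the shapes concrete. It does not call torchvision.

```python
import numpy as np

# Hypothetical batch of 32 grayscale images as uint8, laid out (N, H, W, C)
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(32, 28, 28, 1)).astype(np.uint8)

# ToTensor-like step: HWC uint8 in [0, 255] -> CHW float32 in [0, 1]
chw = raw.transpose(0, 3, 1, 2).astype(np.float32) / 255.0
print(chw.shape)   # (32, 1, 28, 28)

# x.view(-1, 784)-like step: flatten each 1 x 28 x 28 image into 784 features
flat = chw.reshape(-1, 784)
print(flat.shape)  # (32, 784)
```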

In [None]:
train_loader = DataLoader(trainset, batch_size=32, shuffle=True)
test_loader = DataLoader(testset, batch_size=32, shuffle=False)

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Linear(784,128)
        self.layer2 = nn.Linear(128,64)
        self.layer3 = nn.Linear(64,10)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = x.view(-1, 784)
        out = self.layer1(x)
        out = self.relu(out)
        out = self.layer2(out)
        out = self.relu(out)
        out = self.layer3(out)

        return out

In [None]:
model = Net()
model

Net(
  (layer1): Linear(in_features=784, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=64, bias=True)
  (layer3): Linear(in_features=64, out_features=10, bias=True)
  (relu): ReLU()
)
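The printed layer sizes determine the number of trainable parameters. Each nn.Linear(in, out) with bias=True holds an out × in weight matrix plus out bias terms. The short calculation below uses only the sizes shown above.

```python
# (in_features, out_features) of layer1..layer3 as printed above
layers = [(784, 128), (128, 64), (64, 10)]

# Weights (in * out) plus biases (out) for each Linear layer
counts = [i * o + o for i, o in layers]
print(counts)       # [100480, 8256, 650]
print(sum(counts))  # 109386
```

In PyTorch the same total can be read off the model with sum(p.numel() for p in model.parameters()).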

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.5)
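nn.CrossEntropyLoss expects raw logits and integer class labels. It applies log-softmax internally and then averages the negative log-probability of the correct class over the batch (default reduction="mean"). This is why Net.forward ends with layer3 and no softmax. The sketch below computes the same quantity in NumPy. The function name cross_entropy and the example logits are illustrative.

```python
import numpy as np

def cross_entropy(logits, target):
    # logits: (N, C) raw scores; target: (N,) integer class indices
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the correct class, averaged over the batch
    return -log_probs[np.arange(len(target)), target].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
target = np.array([0, 2])
print(cross_entropy(logits, target))
```

A quick sanity check is that all-zero logits over C classes give a loss of log(C), about 2.303 for the 10 MNIST classes.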

In [None]:
def train(model, train_loader, optimizer):
    model.train()
    # List that collects the loss of each batch
    batch_losses = []

    for data, target in train_loader:
        # Reset the gradients of the parameters managed by the optimizer
        optimizer.zero_grad()

        # Forward pass: compute predictions
        output = model(data)
        # Cross-entropy loss against the targets, stored as this batch's loss;
        # .item() keeps a plain float instead of a tensor attached to the autograd graph
        loss = criterion(output, target)
        batch_losses.append(loss.item())

        # Backpropagation: compute gradients
        loss.backward()

        # Update the weights
        optimizer.step()

    # Average loss over batches
    avg_loss = sum(batch_losses) / len(batch_losses)

    return avg_loss
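With plain SGD (no momentum or weight decay), optimizer.step() updates every parameter as p ← p − lr · p.grad. PyTorch accumulates gradients in p.grad across backward() calls, which is why optimizer.zero_grad() runs before each batch. The NumPy sketch below mirrors the structure of train(): shuffled mini-batches, forward pass, per-batch loss stored as a float, gradient, SGD step, and the mean of the batch losses. To keep the gradient short, it swaps in a linear model with mean squared error. The model, the synthetic data, and the names train_epoch and true_w are hypothetical.

```python
import numpy as np

def train_epoch(w, b, X, y, lr, batch_size, rng):
    # Loop over shuffled mini-batches, record each batch loss, return their mean
    batch_losses = []
    order = rng.permutation(len(X))                # like shuffle=True in DataLoader
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        err = xb @ w + b - yb                      # forward pass (linear model)
        batch_losses.append(float(np.mean(err ** 2)))  # MSE as a plain float
        grad_w = 2 * xb.T @ err / len(idx)         # fresh gradient each step (no accumulation)
        grad_b = 2 * err.mean()
        w -= lr * grad_w                           # plain SGD step: p <- p - lr * grad
        b -= lr * grad_b
    return w, b, sum(batch_losses) / len(batch_losses)

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.3 + 0.01 * rng.normal(size=256)

w, b = np.zeros(3), 0.0
history = []
for epoch in range(1, 21):
    w, b, avg_loss = train_epoch(w, b, X, y, lr=0.05, batch_size=32, rng=rng)
    history.append(avg_loss)
print(history[0], history[-1], w, b)
```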

In [None]:
def evaluate(model, test_loader):
    # Switch the model to evaluation mode
    model.eval()

    batch_losses = []
    correct = 0

    with torch.no_grad():
        for data, target in test_loader:
            # Forward pass: compute predictions
            output = model(data)

            # Loss (same as in training), stored as a plain float
            loss = criterion(output, target)
            batch_losses.append(loss.item())

            # Accuracy: the predicted class is the index of the largest logit;
            # add 1 to correct for every prediction that matches the target
            pred = output.max(1, keepdim=True)[1]

            # eq() is True where prediction and target match; sum() counts the matches
            correct += pred.eq(target.view_as(pred)).sum().item()

    # Average loss over batches
    avg_loss = sum(batch_losses) / len(batch_losses)

    # Accuracy in percent over the whole test set
    accuracy = 100. * correct / len(test_loader.dataset)

    return avg_loss, accuracy
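evaluate() averages the per-batch mean losses. With batch_size=32, the 10,000 MNIST test images split into 312 full batches and a final batch of 16, so that last batch is weighted slightly more than its share. The NumPy sketch below weights each batch loss by its size, which gives the exact per-sample mean. It also computes accuracy with argmax, the same index that output.max(1)[1] returns. The function name evaluate_np and the random batches are hypothetical.

```python
import numpy as np

def evaluate_np(batches):
    # batches: iterable of (logits (N, C), target (N,)) pairs
    total_loss, correct, n = 0.0, 0, 0
    for logits, target in batches:
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        batch_loss = -log_probs[np.arange(len(target)), target].mean()
        total_loss += batch_loss * len(target)     # weight each batch by its size
        pred = logits.argmax(axis=1)               # index of the largest logit
        correct += int((pred == target).sum())
        n += len(target)
    return total_loss / n, 100.0 * correct / n

rng = np.random.default_rng(0)
batches = [(rng.normal(size=(32, 10)), rng.integers(0, 10, 32)),
           (rng.normal(size=(16, 10)), rng.integers(0, 10, 16))]
loss, acc = evaluate_np(batches)
print(loss, acc)
```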

In [None]:
EPOCHS = 100

for epoch in range(1, EPOCHS + 1):
    train_loss = train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    
    print('[{}] Train Loss: {:.4f}\tTest Loss: {:.4f}\tAccuracy: {:.2f}%'.format(
          epoch, train_loss, test_loss, test_accuracy))

[1] Train Loss: 0.3056	Test Loss: 0.1471	Accuracy: 95.57%
[2] Train Loss: 0.1295	Test Loss: 0.1210	Accuracy: 96.36%
[3] Train Loss: 0.1019	Test Loss: 0.1165	Accuracy: 96.87%
[4] Train Loss: 0.0847	Test Loss: 0.1198	Accuracy: 96.80%
[5] Train Loss: 0.0718	Test Loss: 0.1215	Accuracy: 96.86%
[6] Train Loss: 0.0649	Test Loss: 0.1104	Accuracy: 96.93%
[7] Train Loss: 0.0543	Test Loss: 0.1344	Accuracy: 96.72%
[8] Train Loss: 0.0547	Test Loss: 0.1619	Accuracy: 96.11%
[9] Train Loss: 0.0506	Test Loss: 0.1232	Accuracy: 97.13%
[10] Train Loss: 0.0411	Test Loss: 0.1450	Accuracy: 96.71%
[11] Train Loss: 0.0505	Test Loss: 0.1335	Accuracy: 97.00%
[12] Train Loss: 0.0412	Test Loss: 0.1212	Accuracy: 97.20%
[13] Train Loss: 0.0367	Test Loss: 0.1502	Accuracy: 97.21%
[14] Train Loss: 0.0433	Test Loss: 0.1434	Accuracy: 97.19%
[15] Train Loss: 0.0376	Test Loss: 0.1366	Accuracy: 97.32%
[16] Train Loss: 0.0369	Test Loss: 0.1143	Accuracy: 97.68%
[17] Train Loss: 0.0345	Test Loss: 0.1551	Accuracy: 97.31%
[18] T
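In the log above, the test loss is lowest at epoch 6 (0.1104) and does not improve afterwards within the epochs shown. Patience-based early stopping formalizes "stop where the monitored loss is lowest": training ends once the loss has not improved for patience consecutive epochs. The sketch below replays the logged test losses for epochs 1 to 17 through a hypothetical EarlyStopping class with an assumed patience of 5.

```python
class EarlyStopping:
    """Signal a stop when the loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, loss):
        # Returns True when training should stop
        if loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.bad_epochs = loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Test losses for epochs 1-17 as logged above (lr = 0.5)
test_losses = [0.1471, 0.1210, 0.1165, 0.1198, 0.1215, 0.1104, 0.1344, 0.1619,
               0.1232, 0.1450, 0.1335, 0.1212, 0.1502, 0.1434, 0.1366, 0.1143, 0.1551]

stopper = EarlyStopping(patience=5)
for epoch, loss in enumerate(test_losses, start=1):
    if stopper.step(epoch, loss):
        break
print(epoch, stopper.best_epoch, stopper.best_loss)  # 11 6 0.1104
```

Inside the real loop, one would typically also save model.state_dict() whenever best_epoch changes and reload it after stopping. Monitoring a held-out validation split rather than the test set keeps the final test accuracy an unbiased estimate.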

## Assignment 4
Using what you have learned so far, improve the performance of the Assignment 3 model.

- Hint: activation function, hyperparameter settings

In [None]:
model2 = Net()
model2

Net(
  (layer1): Linear(in_features=784, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=64, bias=True)
  (layer3): Linear(in_features=64, out_features=10, bias=True)
  (relu): ReLU()
)

In [None]:
optimizer2 = optim.SGD(model2.parameters(), lr=0.01)

In [None]:
EPOCHS = 50

for epoch in range(1, EPOCHS + 1):
    train_loss = train(model2, train_loader, optimizer2)
    test_loss, test_accuracy = evaluate(model2, test_loader)
    
    print('[{}] Train Loss: {:.4f}\tTest Loss: {:.4f}\tAccuracy: {:.2f}%'.format(
          epoch, train_loss, test_loss, test_accuracy))

[1] Train Loss: 1.1580	Test Loss: 0.4247	Accuracy: 88.39%
[2] Train Loss: 0.3790	Test Loss: 0.3206	Accuracy: 90.82%
[3] Train Loss: 0.3145	Test Loss: 0.2815	Accuracy: 92.04%
[4] Train Loss: 0.2773	Test Loss: 0.2553	Accuracy: 92.60%
[5] Train Loss: 0.2465	Test Loss: 0.2273	Accuracy: 93.64%
[6] Train Loss: 0.2196	Test Loss: 0.2081	Accuracy: 93.80%
[7] Train Loss: 0.1966	Test Loss: 0.1809	Accuracy: 94.68%
[8] Train Loss: 0.1771	Test Loss: 0.1668	Accuracy: 95.02%
[9] Train Loss: 0.1608	Test Loss: 0.1551	Accuracy: 95.37%
[10] Train Loss: 0.1467	Test Loss: 0.1456	Accuracy: 95.67%
[11] Train Loss: 0.1343	Test Loss: 0.1350	Accuracy: 95.90%
[12] Train Loss: 0.1240	Test Loss: 0.1256	Accuracy: 96.25%
[13] Train Loss: 0.1149	Test Loss: 0.1221	Accuracy: 96.39%
[14] Train Loss: 0.1070	Test Loss: 0.1128	Accuracy: 96.59%
[15] Train Loss: 0.0998	Test Loss: 0.1111	Accuracy: 96.72%
[16] Train Loss: 0.0934	Test Loss: 0.1032	Accuracy: 97.00%
[17] Train Loss: 0.0877	Test Loss: 0.1060	Accuracy: 96.88%
[18] T

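**Free-form description of what was improved and why (below)**
* First, the learning rate of 0.5 appeared too large. The training loss stopped decreasing steadily and oscillated (e.g., 0.0411 at epoch 10, then 0.0505 at epoch 11). The test loss reached its minimum of 0.1104 at epoch 6 and then rose and fluctuated instead of continuing to fall. I therefore retrained with a learning rate of 0.01.
* In addition, I judged that 100 epochs was too many for a task of this difficulty and also contributed to overfitting, so I retrained with about 50 epochs.
* As a result, the training and test losses decreased together steadily (the test loss fell from 0.4247 at epoch 1 to 0.1032 at epoch 16). Further options would be to stop training around epoch 40, where the test loss is lowest (early stopping), or to lower the learning rate further so that training converges more smoothly.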
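The learning-rate effect can be illustrated on the simplest possible loss, a one-dimensional quadratic f(w) = a·w²/2. Here each gradient-descent step multiplies w by (1 − lr·a), so training oscillates with growing amplitude whenever lr > 2/a and converges when lr < 2/a. The curvature a = 10 is an arbitrary assumption, chosen so that the two learning rates used above fall on opposite sides of the 2/a = 0.2 threshold. A real network has no single curvature, and the logs show that lr = 0.5 still reduced the MNIST training loss. This sketch illustrates the mechanism only and does not model that run.

```python
def gradient_descent(lr, curvature=10.0, w0=1.0, steps=20):
    # Minimise f(w) = curvature * w**2 / 2, whose gradient is curvature * w
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w  # each step multiplies w by (1 - lr * curvature)
    return w

for lr in (0.5, 0.01):
    # |1 - lr * curvature| > 1 means the iterates grow instead of shrinking
    print(lr, abs(1 - lr * 10.0), gradient_descent(lr))
```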