## 과제 1
ReLu activation function과 derivative function을 구현해보세요
- Hint : np.maximum 함수 사용하면 편리합니다
- 다른 방법 사용하셔도 무방합니다


In [1]:
import numpy as np

In [2]:
def relu(x):
  return np.maximum(0, x)

In [3]:
def d_relu(x):
  if x > 0:
    result = 1
  else: 
    result = 0
  return result

In [4]:
relu(7)

7

In [5]:
relu(-3)

0

In [6]:
d_relu(13)

1

In [7]:
d_relu(-4)

0

## 과제 2
Deep Learning Basic 코드 파일의 MLP implementation with Numpy library using MNIST dataset 코드 참고해서
Three layer MLP 일 때의 backward_pass 함수를 완성해주세요.   
- Hint : 코드 파일의 예시는 Two layer MLP


In [8]:
from IPython import get_ipython
get_ipython().magic('reset -sf')
import numpy as np
import sklearn.datasets

In [9]:
mnist = sklearn.datasets.fetch_openml('mnist_784', data_home="mnist_784")

In [10]:
# data preprocessing

num_train = 60000
num_class = 10

x_train = np.float32(mnist.data[:num_train]).T
y_train_index = np.int32(mnist.target[:num_train]).T
x_test = np.float32(mnist.data[num_train:]).T
y_test_index = np.int32(mnist.target[num_train:]).T

# Normalization

x_train /= 255
x_test /= 255
x_size = x_train.shape[0]

y_train = np.zeros((num_class, y_train_index.shape[0]))
for idx in range(y_train_index.shape[0]):
  y_train[y_train_index[idx], idx] = 1

y_test = np.zeros((num_class, y_test_index.shape[0]))
for idx in range(y_test_index.shape[0]):
  y_test[y_test_index[idx], idx] = 1    

In [11]:
x_train.shape

(784, 60000)

In [12]:
#parameter initialization

hidden_size1 = 128 # hidden unit size1
hidden_size2 = 64

# two-layer neural network

params = {"W1": np.random.randn(hidden_size1, x_size) * np.sqrt(1/ x_size),
          "b1": np.zeros((hidden_size1, 1)) * np.sqrt(1/ x_size),
          "W2": np.random.randn(hidden_size2, hidden_size1) * np.sqrt(1/ hidden_size1),
          "b2": np.zeros((hidden_size2, 1)) * np.sqrt(1/ hidden_size1),
          "W3": np.random.randn(num_class, hidden_size2) * np.sqrt(1/ hidden_size2),
          "b3": np.zeros((num_class, 1)) * np.sqrt(1/ hidden_size2)
          }
# Xavier initialization: https://reniew.github.io/13/

In [13]:
def sigmoid(x):
  return 1/(1+np.exp(-x))

def d_sigmoid(x):
  # derivative of sigmoid
  exp = np.exp(-x)
  return (exp)/((1+exp)**2)

def softmax(x):
  exp = np.exp(x)
  return exp/np.sum(exp, axis=0)

In [14]:
def compute_loss(y_true, y_pred):
  # loss calculation

  num_sample = y_true.shape[1]
  Li = -1 * np.sum(y_true * np.log(y_pred))
  
  return Li/num_sample

In [15]:
def foward_pass(x, params):
  
  params["S1"] = np.dot(params["W1"], x) + params["b1"]
  params["A1"] = sigmoid(params["S1"])
  params["S2"] = np.dot(params["W2"], params["A1"]) + params["b2"]
  params["A2"] = softmax(params["S2"])
  params["S3"] = np.dot(params["W3"], params["A2"]) + params["b3"]
  params["A3"] = softmax(params["S3"])

  return params

In [16]:
def foward_pass_test(x, params):

  params_test = {}
  
  params_test["S1"] = np.dot(params["W1"], x) + params["b1"]
  params_test["A1"] = sigmoid(params_test["S1"])
  params_test["S2"] = np.dot(params["W2"], params_test["A1"]) + params["b2"]
  params_test["A2"] = softmax(params_test["S2"])
  params_test["S3"] = np.dot(params["W3"], params_test["A2"]) + params["b3"]
  params_test["A3"] = softmax(params_test["S3"])

  return params_test

In [17]:
def compute_accuracy(y_true, y_pred):
  y_true_idx = np.argmax(y_true, axis = 0)
  y_pred_idx = np.argmax(y_pred, axis = 0)
  num_correct = np.sum(y_true_idx==y_pred_idx)

  accuracy = num_correct / y_true.shape[1] * 100

  return accuracy

In [18]:
def backward_pass(x, y_true, params):

  dS3 = params["A3"] - y_true

  grads = {}

  grads["dW3"] =  np.dot(dS3, params["A2"].T)/x.shape[1]
  grads["db3"] =  (1/x.shape[1])*np.sum(dS3, axis=1, keepdims=True)/x.shape[1]

  dA2 = np.dot(params["W3"].T, dS3)
  dS2 = dA2 * d_sigmoid(params["S2"])

  grads["dW2"] = np.dot(dS2, params["A1"].T)/x.shape[1]
  grads["db2"] = (1/x.shape[1])*np.sum(dS2, axis=1, keepdims=True)/x.shape[1]

  dA1 = np.dot(params["W2"].T, dS2)
  dS1 = dA1 * d_sigmoid(params["S1"])

  grads["dW1"] = np.dot(dS1, x.T)/x.shape[1]
  grads["db1"] = np.sum(dS1, axis=1, keepdims=True)/x.shape[1]


  return grads

In [19]:
epochs = 100
learning_rate = 0.5

for i in range(epochs):

  if i == 0:
    params = foward_pass(x_train, params)
    
  grads = backward_pass(x_train, y_train, params)

  params["W1"] -= learning_rate * grads["dW1"]
  params["b1"] -= learning_rate * grads["db1"]
  params["W2"] -= learning_rate * grads["dW2"]
  params["b2"] -= learning_rate * grads["db2"]
  params["W3"] -= learning_rate * grads["dW3"]
  params["b3"] -= learning_rate * grads["db3"]

  params = foward_pass(x_train, params)
  train_loss = compute_loss(y_train, params["A3"])
  train_acc = compute_accuracy(y_train, params["A3"])

  params_test = foward_pass_test(x_test, params)
  test_loss = compute_loss(y_test, params_test["A3"])
  test_acc = compute_accuracy(y_test, params_test["A3"])

  print("Epoch {}: training loss = {}, training acuracy = {}%, test loss = {}, testing acuracy = {}%"
  .format(i + 1, np.round(train_loss, 6), np.round(train_acc, 2), np.round(test_loss, 6), np.round(test_acc, 2)))

Epoch 1: training loss = 2.302905, training acuracy = 10.93%, test loss = 2.302975, testing acuracy = 10.93%
Epoch 2: training loss = 2.302724, training acuracy = 10.95%, test loss = 2.302788, testing acuracy = 10.91%
Epoch 3: training loss = 2.302542, training acuracy = 11.45%, test loss = 2.302601, testing acuracy = 11.63%
Epoch 4: training loss = 2.30236, training acuracy = 12.5%, test loss = 2.302414, testing acuracy = 12.67%
Epoch 5: training loss = 2.302178, training acuracy = 13.65%, test loss = 2.302227, testing acuracy = 14.02%
Epoch 6: training loss = 2.301995, training acuracy = 14.55%, test loss = 2.302039, testing acuracy = 14.94%
Epoch 7: training loss = 2.301812, training acuracy = 15.2%, test loss = 2.301851, testing acuracy = 15.68%
Epoch 8: training loss = 2.301629, training acuracy = 15.52%, test loss = 2.301662, testing acuracy = 16.02%
Epoch 9: training loss = 2.301444, training acuracy = 15.66%, test loss = 2.301473, testing acuracy = 16.07%
Epoch 10: training los

## 과제 3
Deep Learning Basic 코드 파일의 MLP implementation with Pytorch library using MNIST dataset 코드 참고해서
Three layer MLP를 구한후, 학습을 돌려 보세요

hyperparameter는 다음과 같이 설정

- epochs : 100
- hiddensize : 128, 64 (two layer)
- learning_rate : 0.5

In [20]:
from torchvision import transforms, datasets
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

In [21]:
# 이미지를 텐서로 변경
transform = transforms.Compose([
    transforms.ToTensor()
])

In [22]:
trainset = datasets.MNIST(
    root      = './.data/', 
    train     = True,
    download  = True,
    transform = transform
)
testset = datasets.MNIST(
    root      = './.data/', 
    train     = False,
    download  = True,
    transform = transform
)

In [23]:
BATCH_SIZE = 512
# train set과 test set 각각에 대하여 DataLoader를 생성합니다.
# shuffle=True 매개변수를 넣어 데이터를 섞어주세요.
train_loader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
test_loader =  DataLoader(testset, batch_size=BATCH_SIZE, shuffle=True)

In [24]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Linear(784,128)
        self.layer2 = nn.Linear(128,64)
        self.layer3 = nn.Linear(64,10)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = x.view(-1, 784)
        out = self.layer1(x)
        out = self.relu(out)
        out = self.layer2(out)
        out = self.relu(out)
        out = self.layer3(out)

        return out

In [25]:
model = Net()
model

Net(
  (layer1): Linear(in_features=784, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=64, bias=True)
  (layer3): Linear(in_features=64, out_features=10, bias=True)
  (relu): ReLU()
)

In [26]:
list(model.parameters()) # 행렬들을 직접 살펴볼 수 있음
                         # require_true 얘는 학습되는 애구나 알 수 있음

[Parameter containing:
 tensor([[ 0.0203,  0.0310, -0.0247,  ..., -0.0077, -0.0173, -0.0153],
         [ 0.0252, -0.0022,  0.0350,  ..., -0.0265,  0.0296, -0.0198],
         [ 0.0190, -0.0280,  0.0329,  ...,  0.0156, -0.0292,  0.0091],
         ...,
         [-0.0185, -0.0191, -0.0026,  ...,  0.0234,  0.0222,  0.0092],
         [-0.0269, -0.0042,  0.0065,  ...,  0.0338,  0.0353, -0.0144],
         [ 0.0027, -0.0005,  0.0280,  ..., -0.0162,  0.0003, -0.0122]],
        requires_grad=True), Parameter containing:
 tensor([ 1.0070e-02, -1.5995e-02,  7.2597e-03, -2.4965e-02,  2.0189e-02,
         -2.2571e-02, -2.3948e-02, -2.9904e-02,  1.8628e-02,  2.5505e-03,
          1.0518e-02, -2.7607e-02, -2.1285e-02,  2.4328e-02,  5.6603e-03,
          3.4405e-02,  2.2507e-03,  2.1228e-02,  2.5734e-02,  2.2452e-02,
          2.6311e-02,  2.8190e-03, -4.6057e-03,  2.1130e-02,  1.0302e-02,
         -4.3101e-04, -1.7172e-04,  2.6341e-03,  7.3069e-03, -1.7542e-02,
         -1.5784e-02, -9.7718e-03,  5.637

In [27]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.5)

In [28]:
def train(model, train_loader, optimizer):
    model.train()
    # 배치 당 loss 값을 담을 리스트 생성
    batch_losses = []

    for data, target in train_loader:
        # 옵티마이저의 기울기 초기화
        optimizer.zero_grad()

        # y pred 값 산출
        output = model(data)
        # loss 계산
        # 정답 데이터와의 cross entropy loss 계산
        # 이 loss를 배치 당 loss로 보관
        loss = criterion(output, target)
        batch_losses.append(loss)

        # 기울기 계산
        loss.backward()

        # 가중치 업데이트!
        optimizer.step()
        
    # 배치당 평균 loss 계산
    avg_loss = sum(batch_losses) / len(batch_losses)
    
    return avg_loss

In [29]:
def evaluate(model, test_loader):
    # 모델을 평가 모드로 전환
    model.eval()

    batch_losses = []
    correct = 0 

    with torch.no_grad(): 
        for data, target in test_loader:
            # 예측값 생성
            output = model(data)

            # loss 계산 (이전과 동일)
            loss = criterion(output, target)
            batch_losses.append(loss)

           # Accuracy 계산
           # y pred와 y가 일치하면 correct에 1을 더해주기
            pred = output.max(1, keepdim=True)[1]

            # eq() 함수는 값이 일치하면 1을, 아니면 0을 출력.
            correct += pred.eq(target.view_as(pred)).sum().item()

    # 배치 당 평균 loss 계산 
    avg_loss =  sum(batch_losses) / len(batch_losses)

    #정확도 계산
    accuracy = 100. * correct / len(test_loader.dataset)

    return avg_loss, accuracy

In [30]:
EPOCHS = 100

for epoch in range(1, EPOCHS + 1):
    train_loss = train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    
    print('[{}] Train Loss: {:.4f}\tTest Loss: {:.4f}\tAccuracy: {:.2f}%'.format(
          epoch, train_loss, test_loss, test_accuracy))

[1] Train Loss: 0.8582	Test Loss: 0.2881	Accuracy: 91.10%
[2] Train Loss: 0.2406	Test Loss: 0.2590	Accuracy: 91.69%
[3] Train Loss: 0.1702	Test Loss: 0.2368	Accuracy: 92.34%
[4] Train Loss: 0.1351	Test Loss: 0.2077	Accuracy: 93.20%
[5] Train Loss: 0.1083	Test Loss: 0.1441	Accuracy: 95.52%
[6] Train Loss: 0.0892	Test Loss: 0.1084	Accuracy: 96.66%
[7] Train Loss: 0.0773	Test Loss: 0.1318	Accuracy: 95.81%
[8] Train Loss: 0.0670	Test Loss: 0.1827	Accuracy: 94.62%
[9] Train Loss: 0.0645	Test Loss: 0.0873	Accuracy: 97.24%
[10] Train Loss: 0.0530	Test Loss: 0.2386	Accuracy: 93.01%
[11] Train Loss: 0.1013	Test Loss: 0.0763	Accuracy: 97.64%
[12] Train Loss: 0.0456	Test Loss: 0.1002	Accuracy: 96.93%
[13] Train Loss: 0.0407	Test Loss: 0.0780	Accuracy: 97.64%
[14] Train Loss: 0.0354	Test Loss: 0.1070	Accuracy: 96.93%
[15] Train Loss: 0.0317	Test Loss: 0.0821	Accuracy: 97.59%
[16] Train Loss: 0.0284	Test Loss: 0.1294	Accuracy: 95.74%
[17] Train Loss: 0.0964	Test Loss: 0.0758	Accuracy: 97.73%
[18] T

## 과제 4
과제 3 부분의 성능을 지금까지 배운 지식을 바탕으로 향상시켜보세요

- Hint : Activation function, hyperparameter setting

In [34]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Linear(784,256)
        self.layer2 = nn.Linear(256,128)
        self.layer3 = nn.Linear(128,10)
        self.leakyrelu = nn.LeakyReLU(0.01)
        
    def forward(self, x):
        x = x.view(-1, 784)
        out = self.layer1(x)
        out = self.leakyrelu(out)
        out = self.layer2(out)
        out = self.leakyrelu(out)
        out = self.layer3(out)

        return out

In [35]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

In [36]:
EPOCHS = 40

for epoch in(range(1, EPOCHS + 1)):
    train_loss = train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    
    print('[{}] Train Loss: {:.4f}\tTest Loss: {:.4f}\tAccuracy: {:.2f}%'.format(
          epoch, train_loss, test_loss, test_accuracy))

[1] Train Loss: 0.0004	Test Loss: 0.1015	Accuracy: 97.92%
[2] Train Loss: 0.0004	Test Loss: 0.1011	Accuracy: 97.93%
[3] Train Loss: 0.0004	Test Loss: 0.1023	Accuracy: 97.93%
[4] Train Loss: 0.0004	Test Loss: 0.1004	Accuracy: 97.94%
[5] Train Loss: 0.0004	Test Loss: 0.1014	Accuracy: 97.94%
[6] Train Loss: 0.0004	Test Loss: 0.1003	Accuracy: 97.94%
[7] Train Loss: 0.0004	Test Loss: 0.0988	Accuracy: 97.94%
[8] Train Loss: 0.0004	Test Loss: 0.0997	Accuracy: 97.94%
[9] Train Loss: 0.0004	Test Loss: 0.1005	Accuracy: 97.95%
[10] Train Loss: 0.0004	Test Loss: 0.0996	Accuracy: 97.95%
[11] Train Loss: 0.0004	Test Loss: 0.0999	Accuracy: 97.95%
[12] Train Loss: 0.0004	Test Loss: 0.1014	Accuracy: 97.95%
[13] Train Loss: 0.0004	Test Loss: 0.1000	Accuracy: 97.95%
[14] Train Loss: 0.0004	Test Loss: 0.0999	Accuracy: 97.95%
[15] Train Loss: 0.0004	Test Loss: 0.1003	Accuracy: 97.95%
[16] Train Loss: 0.0004	Test Loss: 0.1010	Accuracy: 97.95%
[17] Train Loss: 0.0004	Test Loss: 0.1007	Accuracy: 97.95%
[18] T

**무엇을 보완하였고, 왜 보완되었는지에 대한 자유 서술 (아래에)**

1. Activation function으로 leakyrelu를 적용해보았다.
2. 128, 64였던 hidden size를 두배씩 늘려 256, 128로 변경하였다.
3. Learning rate가 크다고 생각되어 줄이고자 하였다. 따라서 기존의 0.5였던 learning rate를 0.3으로 줄여보았다.
4. Epochs가 클수록 오히려 중간에 더 높은 accuracy가 나타나다가 다시 줄어드는 경향성이 보여서 기존의 100이던 epochs를 40으로 줄여보았다. 40으로 설정한 이유는 과제 3에서 37번째에서 98.01%라는 가장 높은 accuracy를 보였기 때문이다.
 

결과적으로 위의 변화들을 주었더니 accuracy가 97.93%에서 97.95%로 0.02%만큼만 높아진 것을 확인할 수 있다. 과제 3과의 약간의 차이점은 accuracy가 높아졌다 줄어들었다를 반복하지 않고 약간씩 높아지기다가 97.95%에서 멈추는 것을 확인할 수 있다.
