## 과제 1
ReLu activation function과 derivative function을 구현해보세요
- Hint : np.maximum 함수 사용하면 편리합니다
- 다른 방법 사용하셔도 무방합니다


In [1]:
import numpy as np

In [2]:
x = np.arange(10)-5

In [3]:
def relu(x):
    x = np.where(x>0, x, 0)
    return x

In [4]:
def d_relu(x):
    dx = np.where(x>0, 1, 0)
    return dx

In [5]:
d_relu(x)

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

## 과제 2
Deep Learning Basic 코드 파일의 MLP implementation with Numpy library using MNIST dataset 코드 참고해서
Three layer MLP 일 때의 backward_pass 함수를 완성해주세요.   
- Hint : 코드 파일의 예시는 Two layer MLP


In [6]:
from pprint import pprint

In [7]:
def check_grads(grad):
    for k, v in grad.items():
        print(f"Key {k} : {v.shape}")

In [8]:
def backward_pass(x, y_true, params):
    dS3 = params["A3"] - y_true
    # Please check http://machinelearningmechanic.com/deep_learning/2019/09/04/cross-entropy-loss-derivative.html
    # dS2 is softmax + CE loss derivative

    grads = {}

    grads["dW3"] =  np.dot(dS3, params["A2"].T)/x.shape[1]
    grads["db3"] =  (1/x.shape[1])*np.sum(dS3, axis=1, keepdims=True)/x.shape[1]
    
    dA2 = np.dot(params["W3"].T, dS3)
    dS2 = dA2 * d_sigmoid(params["S2"])
    
    grads["dW2"] = np.dot(dS2, params["A1"].T)/x.shape[1]
    grads["db2"] = np.sum(dS2, axis=1, keepdims=True)/x.shape[1]
    
    dA1 = np.dot(params["W2"].T, dS2)
    dS1 = dA1 * d_relu(params["S1"])

    grads["dW1"] = np.dot(dS1, x.T)/x.shape[1]
    grads["db1"] = np.sum(dS1, axis=1, keepdims=True)/x.shape[1]
    return grads

## 과제 3
Deep Learning Basic 코드 파일의 MLP implementation with Pytorch library using MNIST dataset 코드 참고해서
Three layer MLP를 구한후, 학습을 돌려 보세요

hyperparameter는 다음과 같이 설정

- epochs : 100
- hiddensize : 128, 64 (two layer)
- learning_rate : 0.5

In [9]:
import sklearn.datasets

mnist = sklearn.datasets.fetch_openml('mnist_784', data_home="mnist_784")

In [10]:
# data preprocessing

num_train = 60000
num_class = 10

x_train = np.float32(mnist.data[:num_train]).T
y_train_index = np.int32(mnist.target[:num_train]).T
x_test = np.float32(mnist.data[num_train:]).T
y_test_index = np.int32(mnist.target[num_train:]).T

# Normalization

x_train /= 255
x_test /= 255
x_size = x_train.shape[0]

y_train = np.zeros((num_class, y_train_index.shape[0]))
for idx in range(y_train_index.shape[0]):
    y_train[y_train_index[idx], idx] = 1

y_test = np.zeros((num_class, y_test_index.shape[0]))
for idx in range(y_test_index.shape[0]):
    y_test[y_test_index[idx], idx] = 1   

In [11]:
x_size

784

In [12]:
def xavier(input_dim, hidden_dim):
    weight = np.random.randn(hidden_dim, input_dim) * np.sqrt(1/ input_dim)
    bias = np.zeros((hidden_dim, 1)) * np.sqrt(1/ input_dim)
    return weight, bias

In [13]:
#parameter initialization

hidden_size = [128, 64] # hidden unit size

dims = [x_size]+hidden_size+[num_class]
params = dict()
for i in range(len(dims)-1):
    weight, bias = xavier(dims[i], dims[i+1])
    params[f"W{i+1}"] = weight
    params[f"b{i+1}"] = bias
# Xavier initialization: https://reniew.github.io/13/

In [14]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

def d_sigmoid(x):
    # derivative of sigmoid
    exp = np.exp(-x)
    return (exp)/((1+exp)**2)

def softmax(x):
    exp = np.exp(x)
    return exp/np.sum(exp, axis=0)

In [15]:
def compute_loss(y_true, y_pred):
    # loss calculation

    num_sample = y_true.shape[1]
    Li = -1 * np.sum(y_true * np.log(y_pred))

    return Li/num_sample

In [16]:
def compute_accuracy(y_true, y_pred):
  y_true_idx = np.argmax(y_true, axis = 0)
  y_pred_idx = np.argmax(y_pred, axis = 0)
  num_correct = np.sum(y_true_idx==y_pred_idx)

  accuracy = num_correct / y_true.shape[1] * 100

  return accuracy

In [17]:
def foward_pass(x, params):
  
    params["S1"] = np.dot(params["W1"], x) + params["b1"]
    params["A1"] = relu(params["S1"])
    
    params["S2"] = np.dot(params["W2"], params["A1"]) + params["b2"]
    params["A2"] = sigmoid(params["S2"])
    
    params["S3"] = np.dot(params["W3"], params["A2"]) + params["b3"]
    params["A3"] = softmax(params["S3"])

    return params

In [18]:
def foward_pass_test(x, params):

    params_test = {}

    params_test["S1"] = np.dot(params["W1"], x) + params["b1"]
    params_test["A1"] = sigmoid(params_test["S1"])
    params_test["S2"] = np.dot(params["W2"], params_test["A1"]) + params["b2"]
    params_test["A2"] = softmax(params_test["S2"])

    params_test["S3"] = np.dot(params["W3"], params_test["A2"]) + params["b3"]
    params_test["A3"] = softmax(params_test["S3"])
    return params_test

In [19]:
epochs = 100
learning_rate = 0.5

for i in range(epochs):

    if i == 0:
        params = foward_pass(x_train, params)
        
    grads = backward_pass(x_train, y_train, params)

    params["W1"] -= learning_rate * grads["dW1"]
    params["b1"] -= learning_rate * grads["db1"]
    params["W2"] -= learning_rate * grads["dW2"]
    params["b2"] -= learning_rate * grads["db2"]
    
    params["W3"] -= learning_rate * grads["dW3"]
    params["b3"] -= learning_rate * grads["db3"]

    params = foward_pass(x_train, params)
    train_loss = compute_loss(y_train, params["A3"])
    train_acc = compute_accuracy(y_train, params["A3"])

    params_test = foward_pass_test(x_test, params)
    test_loss = compute_loss(y_test, params_test["A3"])
    test_acc = compute_accuracy(y_test, params_test["A3"])

    print("Epoch {}: training loss = {}, training acuracy = {}%, test loss = {}, training acuracy = {}%"
    .format(i + 1, np.round(train_loss, 6), np.round(train_acc, 2), np.round(test_loss, 6), np.round(test_acc, 2)))

Epoch 1: training loss = 2.304747, training acuracy = 12.34%, test loss = 2.302674, training acuracy = 8.25%
Epoch 2: training loss = 2.276494, training acuracy = 20.33%, test loss = 2.302275, training acuracy = 8.61%
Epoch 3: training loss = 2.258614, training acuracy = 27.62%, test loss = 2.301907, training acuracy = 10.98%
Epoch 4: training loss = 2.240996, training acuracy = 34.25%, test loss = 2.30155, training acuracy = 12.88%
Epoch 5: training loss = 2.222263, training acuracy = 40.01%, test loss = 2.301184, training acuracy = 14.68%
Epoch 6: training loss = 2.202012, training acuracy = 44.85%, test loss = 2.300799, training acuracy = 16.03%
Epoch 7: training loss = 2.180044, training acuracy = 48.88%, test loss = 2.300386, training acuracy = 17.44%
Epoch 8: training loss = 2.156241, training acuracy = 52.17%, test loss = 2.299942, training acuracy = 19.47%
Epoch 9: training loss = 2.130472, training acuracy = 54.75%, test loss = 2.299462, training acuracy = 22.58%
Epoch 10: tra

Epoch 76: training loss = 0.543657, training acuracy = 86.46%, test loss = 2.239886, training acuracy = 71.47%
Epoch 77: training loss = 0.538675, training acuracy = 86.56%, test loss = 2.239426, training acuracy = 71.45%
Epoch 78: training loss = 0.533834, training acuracy = 86.65%, test loss = 2.238972, training acuracy = 71.36%
Epoch 79: training loss = 0.52913, training acuracy = 86.74%, test loss = 2.238527, training acuracy = 71.39%
Epoch 80: training loss = 0.524555, training acuracy = 86.82%, test loss = 2.238088, training acuracy = 71.35%
Epoch 81: training loss = 0.520106, training acuracy = 86.9%, test loss = 2.237657, training acuracy = 71.33%
Epoch 82: training loss = 0.515777, training acuracy = 86.96%, test loss = 2.237232, training acuracy = 71.23%
Epoch 83: training loss = 0.511564, training acuracy = 87.06%, test loss = 2.236814, training acuracy = 71.16%
Epoch 84: training loss = 0.507462, training acuracy = 87.12%, test loss = 2.236402, training acuracy = 71.07%
Epo

## 과제 4
과제 3 부분의 성능을 지금까지 배운 지식을 바탕으로 향상시켜보세요

- Hint : Activation function, hyperparameter setting

In [20]:
#parameter initialization

hidden_size = [256, 32] # hidden unit size

dims = [x_size]+hidden_size+[num_class]
params = dict()
for i in range(len(dims)-1):
    weight, bias = xavier(dims[i], dims[i+1])
    params[f"W{i+1}"] = weight
    params[f"b{i+1}"] = bias
# Xavier initialization: https://reniew.github.io/13/

In [21]:
epochs = 50
learning_rate = 1

for i in range(epochs):

    if i == 0:
        params = foward_pass(x_train, params)
        
    grads = backward_pass(x_train, y_train, params)

    params["W1"] -= learning_rate * grads["dW1"]
    params["b1"] -= learning_rate * grads["db1"]
    params["W2"] -= learning_rate * grads["dW2"]
    params["b2"] -= learning_rate * grads["db2"]
    
    params["W3"] -= learning_rate * grads["dW3"]
    params["b3"] -= learning_rate * grads["db3"]

    params = foward_pass(x_train, params)
    train_loss = compute_loss(y_train, params["A3"])
    train_acc = compute_accuracy(y_train, params["A3"])

    params_test = foward_pass_test(x_test, params)
    test_loss = compute_loss(y_test, params_test["A3"])
    test_acc = compute_accuracy(y_test, params_test["A3"])

    print("Epoch {}: training loss = {}, training acuracy = {}%, test loss = {}, training acuracy = {}%"
    .format(i + 1, np.round(train_loss, 6), np.round(train_acc, 2), np.round(test_loss, 6), np.round(test_acc, 2)))

Epoch 1: training loss = 2.286303, training acuracy = 17.5%, test loss = 2.301917, training acuracy = 13.47%
Epoch 2: training loss = 2.250438, training acuracy = 37.74%, test loss = 2.300512, training acuracy = 10.32%
Epoch 3: training loss = 2.221862, training acuracy = 40.28%, test loss = 2.29938, training acuracy = 10.32%
Epoch 4: training loss = 2.190093, training acuracy = 46.52%, test loss = 2.298203, training acuracy = 10.38%
Epoch 5: training loss = 2.153225, training acuracy = 52.45%, test loss = 2.296877, training acuracy = 11.05%
Epoch 6: training loss = 2.11002, training acuracy = 57.13%, test loss = 2.295336, training acuracy = 12.23%
Epoch 7: training loss = 2.059501, training acuracy = 60.64%, test loss = 2.293514, training acuracy = 13.39%
Epoch 8: training loss = 2.000912, training acuracy = 63.42%, test loss = 2.291346, training acuracy = 14.96%
Epoch 9: training loss = 1.933876, training acuracy = 65.54%, test loss = 2.288764, training acuracy = 17.39%
Epoch 10: tra

**무엇을 보완하였고, 왜 보완되었는지에 대한 자유 서술 (아래에)**

기존에 sigmoid에서 relu를 사용하면서 업데이트가 느리게 되었던 gradient vanishing이 어느정도 해결되었고, 이와 더불어 lr을 개선해서 학습을 빠르게 진행했다. optimizer 자체를 개선하기 위해 params를 개선하는 것도 방법이 될 것 같다.