<a href="https://colab.research.google.com/github/AllyHyeseongKim/CAU11934_MachineLearning/blob/feature%2Fassignment10/assignment/10/assignment10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 10: Multi-label classification using neural networks with regularization

## 1. Load the input data (csv file)

### Mount the google drive

In [0]:
from google.colab import drive

drive.mount('/content/gdrive')

In [0]:
cd

In [0]:
cd ../content/gdrive/My Drive/Colab Notebooks/Machine Learning/assignment10

In [0]:
ls

### Load the Data

Load the data set from the given `csv file` (`'mnist.csv'`). \\
Each row of the data consists of the `label` $l$ and the `image pixel values` $x$ in a `vector form`, where the `label` is one of the 10 digits from 0 to 9, $l \in \{0, 1, ..., 9\}$. \\
Each `image` shows its associated `label` in `grey scale`. There are 10,000 images, each of size $28 \times 28$, so $x \in \mathbb{R}^{784}$. \\
Use the first `1,000 images` for `training` and the remaining `9,000 images` for `testing`. \\
`Normalize` the `intensity values` of each image so that they range from `0 to 1`. \\
Plot the `images` that are loaded from the `'mnist.csv'` file.

In [0]:
import matplotlib.pyplot as plt
import numpy as np
import cupy as cp

file_data   = "mnist.csv"
# read all rows; the context manager closes the file even if reading fails
with open(file_data, "r") as handle_file:
    data = handle_file.readlines()

size_row    = 28    # height of the image
size_col    = 28    # width of the image

num_image   = len(data)
count       = 0     # count for the number of images

#
# normalize the values of the input data to be [0, 1]
#
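def normalize(data):

    # per-image min-max scaling; a constant image is mapped to all zeros
    data_min, data_max = data.min(), data.max()
    if data_max == data_min:
        return np.zeros_like(data)
    data_normalized = (data - data_min) / (data_max - data_min)

    return(data_normalized)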

#
# example of distance function between two vectors x and y
#
def distance(x, y):

    d = (x - y) ** 2
    s = cp.sum(d)
    # r = np.sqrt(s)

    return(s)

#
# make a matrix each column of which represents an image in a vector form
# split the first 1,000 images for training and the rest 9,000 images for testing
#
num_train_image = 1000
num_test_image = 9000

np_list_image_train  = np.empty((size_row * size_col, num_train_image), dtype=float)
np_list_label_train  = np.empty(num_train_image, dtype=int)
np_list_image_test  = np.empty((size_row * size_col, num_test_image), dtype=float)
np_list_label_test  = np.empty(num_test_image, dtype=int)

for line in data[:num_train_image]:

    line_data   = line.split(',')
    label       = int(line_data[0])
    im_vector_train   = np.asarray(line_data[1:], dtype=float)
    im_vector_train   = normalize(im_vector_train)

    np_list_label_train[count]       = label
    np_list_image_train[:, count]    = im_vector_train

    count += 1

count = 0
for line in data[num_train_image:]:
    line_data   = line.split(',')
    label        = int(line_data[0])
    im_vector_test   = np.asarray(line_data[1:], dtype=float)
    im_vector_test   = normalize(im_vector_test)

    np_list_label_test[count]       = label
    np_list_image_test[:, count]    = im_vector_test

    count += 1

In [0]:
list_image_train = cp.array(np_list_image_train)
list_label_train = cp.array(np_list_label_train)
list_image_test = cp.array(np_list_image_test)
list_label_test = cp.array(np_list_label_test)
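The loading loop above can also be written in vectorized form. The sketch below is one possible NumPy-only variant, not the notebook's own code: `load_split` is a hypothetical helper, and a tiny in-memory CSV stands in for `mnist.csv`. It assumes each row is `label, pixel_1, ..., pixel_n` and maps a constant image to all zeros, since min-max scaling would otherwise divide by zero.

```python
import io
import numpy as np

def load_split(source, num_train):
    """Read label,pixel,... rows; return column-major train/test splits scaled to [0, 1]."""
    raw = np.loadtxt(source, delimiter=",", ndmin=2)
    labels = raw[:, 0].astype(int)
    pixels = raw[:, 1:]
    # Per-image min-max scaling; a constant image maps to all zeros.
    lo = pixels.min(axis=1, keepdims=True)
    span = pixels.max(axis=1, keepdims=True) - lo
    scaled = (pixels - lo) / np.where(span == 0, 1.0, span)
    # Columns are images, matching the (num_pixels, num_image) layout used above.
    return (scaled[:num_train].T, labels[:num_train],
            scaled[num_train:].T, labels[num_train:])

# Tiny stand-in for mnist.csv: 3 images of 4 pixels each.
demo_csv = io.StringIO("5,0,128,255,64\n0,10,10,10,10\n7,0,0,255,255\n")
x_train, l_train, x_test, l_test = load_split(demo_csv, num_train=2)
print(x_train.shape, x_test.shape)
```

With the real file, the call would presumably be `load_split("mnist.csv", 1000)`, followed by `cp.array(...)` if the GPU arrays are still wanted.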

## 2. Neural Network Architecture

```mermaid
(input layer : x)  --> (first hidden layer : y)  --> (second hidden layer : z)  -->  (output layer : h)
```

```mermaid
(x)  -- fully connected : u -->  (y_)  -- sigmoid -->  (y)  -- fully connected : v -->  (z_)  -- sigmoid -->  (z)  -- fully connected : w -->  (h_)  -- sigmoid -->  (h)
```


Build a `neural network` for the `multi-label classification` with `10 labels`. \\
Construct a `neural network` with `4 layers` including the `input layer` and the `output layer`. \\
Each `hidden layer` is defined by a `logistic unit`. \\
A `logistic unit` consists of a `fully connected layer` with a `bias` followed by the `sigmoid activation function`. \\
 \\
The `dimension` of each layer is defined by: \\
`Input layer` : 784 (+ a `bias`) \\
`First hidden layer` : 196 (+ a `bias`) \\
`Second hidden layer` : 49 (+ a `bias`) \\
`Output layer` : 10
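The layer sizes determine how many weights the network has. As a small worked example, the sketch below counts them, assuming the sizes 784, 196, 49 and 10 that the weight arrays allocated later in this notebook imply. `count_parameters` is a hypothetical helper introduced only for this illustration.

```python
# Layer sizes are an assumption read off the weight shapes used later in the notebook.
layer_sizes = [784, 196, 49, 10]

def count_parameters(sizes):
    """Number of weights in a chain of fully connected layers, one bias per output unit."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

per_layer = [(n_in + 1) * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(per_layer)                      # [153860, 9653, 500]
print(count_parameters(layer_sizes))  # 164013
```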

### 2.1. Generate the Fully Connected Layer

Define the following `fully connected layer` with a `bias`.

\begin{equation*}
(output \ layer) = 1\times\theta_0^t + (input \ layer)_1\times\theta_1^t + (input \ layer)_2\times\theta_2^t + ... + (input \ layer)_{num \ input}\times\theta_{num \ input}^t, \quad\text{where $t$ is the iteration index of the layer}
\end{equation*}

In [0]:
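def fully_connected(num_input, num_output, weight, input_layer, num_image):
    # prepend a row of ones so that the first weight column acts as the bias
    input_reshaped = cp.ones((num_input + 1, num_image), dtype=float)
    input_reshaped[1:] = cp.asarray(input_layer.astype(float))
    # one row of weights per output unit; cast to float64 before moving to the GPU
    weight_reshaped = cp.asarray(weight.astype(float)).reshape(num_output, num_input + 1)
    # result has shape (num_output, num_image): one column per image
    output_layer = cp.matmul(weight_reshaped, input_reshaped)
    return output_layer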
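To make the bias convention concrete, here is a minimal NumPy sketch of the same computation on a toy layer. `fully_connected_np` is a hypothetical stand-in that takes an already reshaped weight matrix, with the bias assumed to sit in column 0.

```python
import numpy as np

def fully_connected_np(weight, input_layer):
    """weight: (num_output, num_input + 1), column 0 is the bias;
    input_layer: (num_input, num_image), one column per image."""
    num_image = input_layer.shape[1]
    with_bias = np.vstack([np.ones((1, num_image)), input_layer])
    return weight @ with_bias

# Two inputs, one output unit, two images.
weight = np.array([[0.5, 1.0, -2.0]])   # bias 0.5, weights 1.0 and -2.0
images = np.array([[1.0, 3.0],
                   [2.0, 0.0]])         # columns are images
result = fully_connected_np(weight, images)
print(result)  # -2.5 for the first image, 3.5 for the second
```

The first image gives $0.5 + 1.0 \cdot 1 - 2.0 \cdot 2 = -2.5$ and the second gives $0.5 + 1.0 \cdot 3 - 2.0 \cdot 0 = 3.5$.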

### 2.2. Generate the Sigmoid Function as an Activation Function

#### Generate the `sigmoid function`

Define the following `sigmoid function` as an `activation function`.

\begin{equation*}
\sigma(z) = \frac{1}{1 + \exp(-z)} \\
\sigma'(z) = \sigma(z)(1 - \sigma(z))
\end{equation*}

In [0]:
def sigmoid(input_layer, num_image):
    # element-wise logistic function; the output keeps the input's shape
    # (units x num_image), so num_image is kept only for the existing call sites
    xp = cp.get_array_module(input_layer)  # NumPy or CuPy, matching the input
    output_layer = 1.0 / (1.0 + xp.exp(-input_layer))
    return output_layer
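The derivative identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ can be checked numerically. The sketch below compares it with a central finite difference; `sigmoid_np` and `sigmoid_prime` are hypothetical NumPy helpers used only for this check.

```python
import numpy as np

def sigmoid_np(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid_np(z)
    return s * (1.0 - s)

z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
# Central difference approximation of the derivative.
numeric = (sigmoid_np(z + eps) - sigmoid_np(z - eps)) / (2 * eps)
max_gap = np.max(np.abs(numeric - sigmoid_prime(z)))
print(max_gap)
```

The derivative peaks at $z = 0$ with value $0.25$ and decays towards zero for large $|z|$, which is why saturated units learn slowly.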

### 2.3. Generate the `Objective Function`

Define the following `objective function`.

\begin{equation*}
J(\theta) = \frac{1}{m}\sum_{i = 1}^m\sum_{k = 0}^9\left(-l_k^{(i)}\log(h_k^{(i)}) - (1 - l_k^{(i)})\log(1 - h_k^{(i)})\right) + \frac{\lambda}{2n}\sum_{j = 1}^n\theta_j^2, \\
\text{where}\quad \theta_j \text{ denotes a model parameter with $j = 1, 2, ..., n$, and $\theta$ collects the weights of all fully connected layers}, \\
m \text{ is the number of samples, and $l_k^{(i)}$ is 1 if the label of the $i^{th}$ sample is $k$ and 0 otherwise}, \\
\lambda \text{ is a control parameter for the regularization based on the $L_2^2$-norm (weight decay)}, \\
n\text{ is the total number of model parameters over the entire neural network}, \\
\text{and $h_k^{(i)}$ denotes the $k^{th}$ element of the output layer for the $i^{th}$ sample.}
\end{equation*}

In [0]:
import math

In [0]:
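def objective(output_layer, num_image, label):
    # output_layer: (10, num_image) sigmoid outputs; label: digit of each image
    h = np.clip(cp.asnumpy(output_layer).astype(float), 1e-12, 1 - 1e-12)  # keep log() finite
    # one-hot targets: l_k = 1 when the label equals k, else 0
    one_hot = np.arange(10)[:, None] == cp.asnumpy(label)[None, :]
    # cross-entropy (data term of J) summed over the 10 outputs, averaged over the images
    errors = np.where(one_hot, -np.log(h), -np.log(1 - h))
    return float(errors.sum() / num_image)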
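The objective $J(\theta)$ also contains the weight-decay term $\frac{\lambda}{2n}\sum_j \theta_j^2$. A minimal sketch of the full objective, under the assumption that the weights are passed in as a list of NumPy arrays, could look as follows. `regularized_objective` and its parameters are hypothetical names, not part of the notebook.

```python
import numpy as np

def regularized_objective(h, labels, weights, lam, num_classes=10):
    """h: (num_classes, m) sigmoid outputs; labels: (m,) digits; weights: list of arrays."""
    m = h.shape[1]
    one_hot = np.arange(num_classes)[:, None] == labels[None, :]
    h = np.clip(h, 1e-12, 1.0 - 1e-12)   # keep log() finite
    data_term = -np.sum(np.where(one_hot, np.log(h), np.log(1.0 - h))) / m
    # n counts every scalar parameter in the network.
    n = sum(w.size for w in weights)
    penalty = lam / (2.0 * n) * sum(np.sum(w ** 2) for w in weights)
    return data_term + penalty

h = np.full((10, 2), 0.5)                    # an undecided network
labels = np.array([3, 7])
weights = [np.ones((2, 3)), np.ones(4)]      # 10 parameters, all equal to 1
loss_plain = regularized_objective(h, labels, weights, lam=0.0)
loss_decay = regularized_objective(h, labels, weights, lam=2.0)
print(loss_plain, loss_decay)
```

With every output at 0.5 the data term is $10 \log 2$, and with ten unit weights and $\lambda = 2$ the penalty is $\frac{2}{2 \cdot 10} \cdot 10 = 1$.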

### 2.4. Generate the `Gradient Descent` (`Back-Propagation`)

Define the `learning rate`.

\begin{equation*}
\alpha  = 0.001
\end{equation*}

In [0]:
learning_rate = 0.001

Define the following `gradient descent`.

\begin{equation*}
\theta_k^{(t + 1)} := \theta_k^{(t)} - \alpha\frac{\partial J(\theta^{(t)})}{\partial \theta_k}, \quad\text{for all $k$}.
\end{equation*}

In [0]:
def gradient_descent(input_layer, output_layer, weight):
    # one gradient-descent step for a single logistic layer, computed on the host
    input_layer = cp.asnumpy(input_layer)
    output_layer = cp.asnumpy(output_layer)
    num_input = len(input_layer)
    num_output = len(output_layer)
    # prepend a row of ones so that the first weight column is the bias
    input_reshaped = np.ones((num_input + 1, num_train_image), dtype=np.float128)
    input_reshaped[1:] = input_layer
    weight_reshaped = weight.reshape(num_output, num_input + 1)
    # one-hot targets: row k is 1 where the training label equals k
    # (meaningful for the 10-unit output layer)
    label_reshaped = np.arange(num_output)[:, None] == cp.asnumpy(list_label_train)[None, :]
    gradient = np.matmul(output_layer - label_reshaped, np.transpose(input_reshaped)) / num_train_image
    return (weight_reshaped - learning_rate * gradient).reshape(1, -1)
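The update above compares each layer's output directly with the labels. In back-propagation, only the output layer is compared with the labels, and the error term of each hidden layer is obtained by propagating the output error backwards through the weights and the sigmoid derivative. The following is a minimal NumPy sketch of that idea on a toy network, not the notebook's implementation. All helper names (`forward`, `backprop_step`, `add_bias`, `cross_entropy`) are hypothetical, and the weight matrices are assumed to have shape `(num_output, num_input + 1)` with the bias in column 0.

```python
import numpy as np

def sigmoid_np(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    return np.vstack([np.ones((1, a.shape[1])), a])

def forward(weights, x):
    """Return the activations of every layer, input included."""
    activations = [x]
    for w in weights:
        activations.append(sigmoid_np(w @ add_bias(activations[-1])))
    return activations

def cross_entropy(h, one_hot):
    return -np.sum(one_hot * np.log(h) + (1 - one_hot) * np.log(1 - h)) / h.shape[1]

def backprop_step(weights, x, one_hot, alpha, lam=0.0):
    """One gradient-descent step; returns the updated list of weight matrices."""
    m = x.shape[1]
    n = sum(w.size for w in weights)
    activations = forward(weights, x)
    delta = activations[-1] - one_hot            # error term at the output layer
    updated = [None] * len(weights)
    for k in reversed(range(len(weights))):
        # Gradient of the data term plus the weight-decay term (lam / n) * theta.
        grad = delta @ add_bias(activations[k]).T / m + (lam / n) * weights[k]
        updated[k] = weights[k] - alpha * grad
        if k > 0:
            a = activations[k]
            # Drop the bias column, then apply sigma'(z) = a * (1 - a).
            delta = (weights[k][:, 1:].T @ delta) * a * (1 - a)
    return updated

rng = np.random.default_rng(0)
sizes = [4, 5, 3]                                # toy network, not the MNIST one
weights = [rng.normal(0.0, 0.5, (n_out, n_in + 1))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = rng.random((4, 6))
labels = rng.integers(0, 3, 6)
one_hot = (np.arange(3)[:, None] == labels[None, :]).astype(float)

loss_before = cross_entropy(forward(weights, x)[-1], one_hot)
for _ in range(200):
    weights = backprop_step(weights, x, one_hot, alpha=0.2)
loss_after = cross_entropy(forward(weights, x)[-1], one_hot)
print(loss_before, loss_after)
```

Each step computes all gradients from the same forward pass before any weight is changed, so the hidden-layer error terms use the weights that produced the activations.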

### 2.5. Compute the `Accuracy`

Compute the following `accuracy` in `number (%)`.
\begin{equation}
accuracy\ (\%) = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100
\end{equation}

In [0]:
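def accuracy(output_layer, num_image, label):
    # predicted digit = index of the largest output unit in each column (image)
    predicted = np.argmax(cp.asnumpy(output_layer), axis=0)
    # compare on the host so that NumPy and CuPy inputs are both accepted
    num_correct_predict = int(np.sum(predicted == cp.asnumpy(label)))
    return (num_correct_predict / num_image) * 100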

### 2.6. `Train` and `Test` the input data

Define the `initial conditions` of the `weights` $(\theta_{0}^{(0)}, \theta_{1}^{(0)}, \theta_{2}^{(0)}, ..., \theta_{28\times28}^{(0)})$. \\
The `weights` follow a `normal distribution` $\mathcal{N}(0, \sigma^2)$ with `mean` 0 and `standard deviation` $\sigma$.

Define the `standard deviation`.

\begin{equation*}
\sigma = 0.01
\end{equation*}

In [0]:
epoch = 100

In [0]:
def initialize(num, std=0.01):
    # draw num weights from N(0, std^2); std defaults to the sigma defined above
    weight = np.random.randn(num) * std
    return weight
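The choice of $\sigma$ matters because a sigmoid unit saturates when its input is large in magnitude. The sketch below illustrates this under a simplifying assumption: a uniformly random vector stands in for one normalized image, and the layer has 784 inputs and 196 units without a bias. The names `saturated_fraction` and `pre_activation_std` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784)                       # stand-in for one normalized image

saturated_fraction = {}
pre_activation_std = {}
for sigma in (1.0, 0.01):
    w = rng.normal(0.0, sigma, size=(196, 784))
    z = w @ x                             # pre-activations of the 196 units
    pre_activation_std[sigma] = z.std()
    # sigmoid(4) is about 0.982, so |z| > 4 means the unit is close to 0 or 1.
    saturated_fraction[sigma] = np.mean(np.abs(z) > 4.0)

print(pre_activation_std)
print(saturated_fraction)
```

Under these assumptions, most units saturate for $\sigma = 1$ and none do for $\sigma = 0.01$, so the smaller value keeps the sigmoid derivative away from zero at the start of training.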

In [0]:
weight1 = np.empty((epoch, 785 * 196), dtype=np.float128)
weight1[0] = initialize(785 * 196)

In [0]:
weight2 = np.empty((epoch, 197 * 49), dtype=np.float128)
weight2[0] = initialize(197 * 49)

In [0]:
weight3 = np.empty((epoch, 50 * 10), dtype=np.float128)
weight3[0] = initialize(50 * 10)

`Train` the network defined above on the `training data` with `gradient descent`. \\
Find the `optimal parameters` $\theta$ using the `training data` (the first `1,000 images`). \\
`Test` the network with the `parameters` $\theta$ obtained from the `training process` using the `testing data` (the remaining `9,000 images`).

In [0]:
train_input_layer_x = np.empty((784, num_train_image), dtype=np.float128)
train_y = np.empty((num_train_image, 196), dtype=np.float128)
train_hidden_layer_y = np.empty((196, num_train_image), dtype=np.float128)
train_z = np.empty((num_train_image, 49), dtype=np.float128)
train_hidden_layer_z = np.empty((49, num_train_image), dtype=np.float128)
train_h = np.empty((num_train_image, 10), dtype=np.float128)
train_output_layer_h = np.empty((10, num_train_image), dtype=np.float128)
train_input_layer_x = list_image_train

In [0]:
test_input_layer_x = np.empty((784, num_test_image), dtype=np.float128)
test_y = np.empty((num_test_image, 196), dtype=np.float128)
test_hidden_layer_y = np.empty((196, num_test_image), dtype=np.float128)
test_z = np.empty((num_test_image, 49), dtype=np.float128)
test_hidden_layer_z = np.empty((49, num_test_image), dtype=np.float128)
test_h = np.empty((num_test_image, 10), dtype=np.float128)
test_output_layer_h = np.empty((10, num_test_image), dtype=np.float128)
test_input_layer_x = list_image_test

In [0]:
train_loss = np.empty(epoch, dtype=np.float128)
test_loss = np.empty(epoch, dtype=np.float128)

In [0]:
train_accuracy = np.empty(epoch, dtype=np.float128)
test_accuracy = np.empty(epoch, dtype=np.float128)

In [0]:
for i in range(epoch):
    train_y = fully_connected(784, 196, weight1[i], train_input_layer_x, num_train_image)
    train_hidden_layer_y = sigmoid(train_y, num_train_image)
    train_z = fully_connected(196, 49, weight2[i], train_hidden_layer_y, num_train_image)
    train_hidden_layer_z = sigmoid(train_z, num_train_image)
    train_h = fully_connected(49, 10, weight3[i], train_hidden_layer_z, num_train_image)
    train_output_layer_h = sigmoid(train_h, num_train_image)

    train_loss[i] = objective(train_output_layer_h, num_train_image, list_label_train)
    train_accuracy[i] = accuracy(train_output_layer_h, num_train_image, list_label_train)

    print("[", i + 1, "/", epoch, "]", "train loss: ", train_loss[i], ", train accuracy: ", train_accuracy[i])

    test_y = fully_connected(784, 196, weight1[i], test_input_layer_x, num_test_image)
    test_hidden_layer_y = sigmoid(test_y, num_test_image)
    test_z = fully_connected(196, 49, weight2[i], test_hidden_layer_y, num_test_image)
    test_hidden_layer_z = sigmoid(test_z, num_test_image)
    test_h = fully_connected(49, 10, weight3[i], test_hidden_layer_z, num_test_image)
    test_output_layer_h = sigmoid(test_h, num_test_image)

    test_loss[i] = objective(test_output_layer_h, num_test_image, list_label_test)
    test_accuracy[i] = accuracy(test_output_layer_h, num_test_image, list_label_test)

    print("[", i + 1, "/", epoch, "]", "test loss: ", test_loss[i], ", test accuracy: ", test_accuracy[i])

    if i < (epoch - 1):
        weight3[i + 1] =  gradient_descent(train_hidden_layer_z, train_output_layer_h, weight3[i])
        weight2[i + 1] = gradient_descent(train_hidden_layer_y, train_hidden_layer_z, weight2[i])
        weight1[i + 1] = gradient_descent(train_input_layer_x, train_hidden_layer_y, weight1[i])

In [0]:
def plot(list_label, list_image, i):
    label       = list_label
    im_vector   = list_image
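    # matplotlib needs a host (NumPy) array, so copy the image back from the GPU if necessary
    im_matrix   = cp.asnumpy(im_vector).reshape((size_row, size_col))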
    
    plt.subplot(2, 5, i+1)
    plt.title(label)
    plt.imshow(im_matrix, cmap='Greys', interpolation='None')

    frame   = plt.gca()
    frame.axes.get_xaxis().set_visible(False)
    frame.axes.get_yaxis().set_visible(False)

In [0]:
test_correct = []
test_miss = []
for i in range(num_test_image):
    if np.argmax(test_output_layer_h[:, i]) == list_label_test[i]:
        test_correct.append(i)
    else:
        test_miss.append(i)
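The same split into correctly classified and misclassified indices can be computed without an explicit loop. The sketch below assumes NumPy arrays on the host; `split_indices` is a hypothetical helper, and the toy outputs stand in for `test_output_layer_h`.

```python
import numpy as np

def split_indices(output_layer, labels):
    """Indices of correctly and incorrectly classified columns of output_layer."""
    predicted = np.argmax(output_layer, axis=0)
    return np.flatnonzero(predicted == labels), np.flatnonzero(predicted != labels)

outputs = np.array([[0.9, 0.2, 0.1],
                    [0.1, 0.7, 0.3],
                    [0.0, 0.1, 0.6]])      # 3 classes x 3 images
labels = np.array([0, 2, 2])
correct, miss = split_indices(outputs, labels)
print(correct, miss)
```

Here the predictions are 0, 1 and 2, so images 0 and 2 are classified correctly and image 1 is missed.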

## 3. **Results**

### 3.1. **Plot the loss curve**

Plot the `training loss` at `every iteration` of `gradient descent` using the `training data` (the first `1,000 images`) in `blue`. \\
Plot the `testing loss` at `every iteration` of `gradient descent` using the `testing data` (the remaining `9,000 images`) in `red`. \\
Both `curves` should be presented in `one figure`.


In [0]:
plt.figure(figsize=(8, 8))
x_cost1 = np.arange(0, epoch)
x_cost2 = np.arange(0, epoch)
plt.xlabel('t (iteration)')
plt.ylabel('J(theta)')

plt.plot(x_cost1, train_loss[:epoch], color = 'blue', label = 'training loss')
plt.plot(x_cost2, test_loss[:epoch], color = 'red', label = 'testing loss')
plt.legend()

plt.show()

### 3.2. **Plot the accuracy curve**

Plot the `training accuracy` (%) at `every iteration` of `gradient descent` using the `training data` (the first `1,000 images`) in `blue`. \\
Plot the `testing accuracy` (%) at `every iteration` of `gradient descent` using the `testing data` (the remaining `9,000 images`) in `red`. \\
Both `curves` should be presented in `one figure`.

In [0]:
plt.figure(figsize=(8, 8))
plt.xlabel('t (iteration)')
plt.ylabel('accuracy(%)')

plt.plot(x_cost1, train_accuracy[:epoch], color = 'blue', label = 'training accuracy')
plt.plot(x_cost2, test_accuracy[:epoch], color = 'red', label = 'testing accuracy')
plt.legend()

plt.show()

### 3.3. **Print the accuracy value**

Print the `final training accuracy` (%) using the `training data` (the first `1,000 images`). \\
Print the `final testing accuracy` (%) using the `testing data` (the rest `9,000 images`).

In [0]:
print("Final training accuracy: ", train_accuracy[epoch - 1])
print("Final testing accuracy: ", test_accuracy[epoch - 1])

### 3.4. **Plot the classification example**

Present `10 correctly classified testing images` with their `labels` as the title of each sub-figure in a `2x5 array`. \\
Present `10 misclassified testing images` with their misclassified `labels` as the title of each sub-figure in a `2x5 array`.

In [0]:
f1 = plt.figure(figsize=(8, 4))

# show the first 10 correctly classified test images with their predicted labels
for k in range(10):
    plot(np.argmax(test_output_layer_h[:, test_correct[k]]), list_image_test[:, test_correct[k]], k)

plt.show()

In [0]:
f1 = plt.figure(figsize=(8, 4))

# show the first 10 misclassified test images with their (wrong) predicted labels
for k in range(10):
    plot(np.argmax(test_output_layer_h[:, test_miss[k]]), list_image_test[:, test_miss[k]], k)

plt.show()

### 3.5. **`Testing` accuracy**