# Assignment09: Multi-label classification using neural networks

## 1. Load the input data (csv file)

### Mount the google drive

In [0]:
from google.colab import drive

drive.mount('/content/gdrive')

In [0]:
cd

In [0]:
cd ../content/gdrive/My Drive/Colab Notebooks/Machine Learning/assignment09

In [0]:
ls

### Load the Data

Load a set of the data from the given `csv file` (`'mnist.csv'`) for training. \\
Each row of the data consists of the `label`, $l$ and the `image pixel values`, $x$ in a `vector form`, where the `label` is one of the 10 digits from 0 to 9, $l \in [0, 9]$. \\
The `image` represents its associated `label` in the `grey scale` and the number of images is 10,000 and the size of each image is $28 \times 28$, $x \in \mathbb{R}^{784}$. \\
Consider the first `6,000 images` for `training` and the remaining `4,000 images` for `testing`. \\
`Normalize` the `intensity values` of each image so that they range from `0 to 1`. \\
Plot the `images` that are loaded from `'mnist.csv'` file. \\
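
The loading, normalization and split steps described above can also be written in vectorized form. The sketch below is one possible approach, not the notebook's own cell: the helper name `load_and_normalize` and the tiny in-memory CSV are assumptions made for illustration, and a constant image (maximum equal to minimum) is mapped to zeros to avoid a division by zero.

```python
import io
import numpy as np

def load_and_normalize(handle, num_train):
    # each row: label followed by the pixel values of one image
    raw    = np.atleast_2d(np.loadtxt(handle, delimiter=','))
    labels = raw[:, 0].astype(int)
    pixels = raw[:, 1:]

    # min-max normalization per image; constant images keep a scale of 1
    low    = pixels.min(axis=1, keepdims=True)
    high   = pixels.max(axis=1, keepdims=True)
    scale  = np.where(high > low, high - low, 1.0)
    images = ((pixels - low) / scale).T          # one image per column

    return (images[:, :num_train], labels[:num_train],
            images[:, num_train:], labels[num_train:])

# hypothetical 3-image file with 4 pixels per image (stand-in for 'mnist.csv')
toy_csv = io.StringIO("3,0,128,255,64\n7,10,10,10,10\n1,0,51,102,255\n")
x_train, l_train, x_test, l_test = load_and_normalize(toy_csv, num_train=2)
print(x_train.shape, x_test.shape)
```

For the assignment data, the same call would presumably take the opened `'mnist.csv'` file and `num_train=6000`.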

In [0]:
import matplotlib.pyplot as plt
import numpy as np

file_data   = "mnist.csv"
handle_file = open(file_data, "r")
data        = handle_file.readlines()
handle_file.close()

size_row    = 28    # height of the image
size_col    = 28    # width of the image

num_image   = len(data)
count       = 0     # count for the number of images

#
# normalize the values of the input data to be [0, 1]
#
def normalize(data):

    data_normalized = (data - min(data)) / (max(data) - min(data))

    return(data_normalized)

#
# example of distance function between two vectors x and y
#
def distance(x, y):

    d = (x - y) ** 2
    s = np.sum(d)
    # r = np.sqrt(s)

    return(s)

#
# make a matrix each column of which represents an image in a vector form
# split the first 6,000 images for training and the rest 4,000 images for testing
#
num_train_image = 6000
num_test_image = 4000

# np.float128 is not available on every platform; float64 is ample for pixel values in [0, 1]
list_image_train  = np.empty((size_row * size_col, num_train_image), dtype=np.float64)
list_label_train  = np.empty(num_train_image, dtype=int)
list_image_test  = np.empty((size_row * size_col, num_test_image), dtype=np.float64)
list_label_test  = np.empty(num_test_image, dtype=int)

for count, line in enumerate(data[:num_train_image]):

    line_data   = line.split(',')
    label       = int(line_data[0])
    # np.asfarray was removed in NumPy 2.0; np.asarray with dtype=float is equivalent
    im_vector   = np.asarray(line_data[1:], dtype=float)
    im_vector   = normalize(im_vector)

    list_label_train[count]       = label
    list_image_train[:, count]    = im_vector

for count, line in enumerate(data[num_train_image:]):

    line_data   = line.split(',')
    label       = int(line_data[0])
    im_vector   = np.asarray(line_data[1:], dtype=float)
    im_vector   = normalize(im_vector)

    list_label_test[count]       = label
    list_image_test[:, count]    = im_vector

#
# plot first 150 images out of 10,000 with their labels
#
f1 = plt.figure(1)

for i in range(150):

    label       = list_label_train[i]
    im_vector   = list_image_train[:, i]
    im_matrix   = im_vector.reshape((size_row, size_col))

    plt.subplot(10, 15, i+1)
    plt.title(label)
    plt.imshow(im_matrix, cmap='Greys', interpolation='None')

    frame   = plt.gca()
    frame.axes.get_xaxis().set_visible(False)
    frame.axes.get_yaxis().set_visible(False)


#plt.show()

#
# plot the average image of the training images for each digit
#
f2 = plt.figure(2)

im_average  = np.zeros((size_row * size_col, 10), dtype=float)
im_count    = np.zeros(10, dtype=int)

for i in range(6000):

    im_average[:, list_label_train[i]] += list_image_train[:, i]
    im_count[list_label_train[i]] += 1

for i in range(10):

    im_average[:, i] /= im_count[i]

    plt.subplot(2, 5, i+1)
    plt.title(i)
    plt.imshow(im_average[:,i].reshape((size_row, size_col)), cmap='Greys', interpolation='None')

    frame   = plt.gca()
    frame.axes.get_xaxis().set_visible(False)
    frame.axes.get_yaxis().set_visible(False)

plt.show()

## 2. Neural Network Architecture

```mermaid
(input layer : x)  --> (first hidden layer : y)  -->  (second hidden layer : z)  -->  (output layer : h)
```

```mermaid
(x)  -- fully connected : u -->  (y_)  -- sigmoid -->  (y)  -- fully connected : v   -->  (z_)  -- sigmoid -->  (z)  -- fully connected : w -->  (h_)  -- sigmoid -->  (h)
```


Build a `neural network` for the `multi-label classification` with `10 labels`. \\
Construct a `neural network` with `4 layers` including the `input layer` and the `output layer`. \\
Each `hidden layer` is defined by a `logistic unit`. \\
A `logistic unit` consists of a `fully connected layer` with a `bias` followed by the `sigmoid activation function`. \\
 \\
The `dimension` of each layer is defined by: \\
`Input layer` : 784 (+ a `bias`) \\
`First hidden layer` : 196 (+ a `bias`) \\
`Second hidden layer` : 49 (+ a `bias`) \\
`Output layer` : 10
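
The layer dimensions listed above can be checked with a short forward pass. This is a minimal sketch under assumptions: the bias is stored in column 0 of each weight matrix, images are stored one per column, and the helper names `add_bias_row` and `forward` as well as the synthetic input are illustrative only.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def add_bias_row(a):
    # prepend a row of ones so that column 0 of a weight matrix acts as the bias
    return np.vstack((np.ones((1, a.shape[1])), a))

def forward(x, u, v, w):
    y = sigmoid(u @ add_bias_row(x))   # first hidden layer  : 196 x m
    z = sigmoid(v @ add_bias_row(y))   # second hidden layer : 49 x m
    h = sigmoid(w @ add_bias_row(z))   # output layer        : 10 x m
    return y, z, h

rng = np.random.default_rng(0)
x = rng.random((784, 5))                     # 5 synthetic "images" with values in [0, 1)
u = rng.normal(0.0, 1.0, (196, 784 + 1))     # input         -> first hidden
v = rng.normal(0.0, 1.0, (49, 196 + 1))      # first hidden  -> second hidden
w = rng.normal(0.0, 1.0, (10, 49 + 1))       # second hidden -> output

y, z, h = forward(x, u, v, w)
print(y.shape, z.shape, h.shape)
```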

### 2.1. Generate the Fully Connected Layer

Define a `fully connected layer` with a `bias`.

In [0]:
def fully_connected(layer_input, weight):
    # layer_input : (num_input, num_sample), one sample per column
    # weight      : (num_output, num_input + 1), column 0 holds the bias
    bias   = weight[:, :1]
    output = np.matmul(weight[:, 1:], layer_input) + bias
    return output

### 2.2. Generate the Logistic Regression Model

#### Generate the `logistic regression`

Define the following `logistic regression` with a `high-dimensional feature function`.

\begin{equation*}
\hat{h} = \sigma(z) \\
z = g(x, y; \theta), \quad\text{where } \theta = (\theta_{0}, \theta_{1}, \cdots, \theta_{28\times28}) \\
g(x, y; \theta) = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_{28\times28} x_{28\times28} \\
\sigma(z) = \frac{1}{1 + \exp(-z)} \\
\sigma'(z) = \sigma(z)(1 - \sigma(z)) \\
\end{equation*}

Define the `function` $g(x, y; \theta)$:
\begin{equation}
g(x, y; \theta) = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_{28\times28} x_{28\times28}
\end{equation}

In [0]:
def g(weight):
    # linear score theta^T x of every training image, grouped by label 0..9
    # (uses list_image_train and list_label_train from the loading cell)
    weight = np.ravel(weight)
    score  = np.dot(weight, list_image_train)
    return tuple(score[list_label_train == label] for label in range(10))

Define the following `sigmoid function` as an `activation function`.

\begin{equation*}
\hat{h} = \sigma(z), \text{ where }
\sigma(z) = \frac{1}{1 + \exp(-z)}, \\
z = g(x, y; \theta) = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_{28\times28} x_{28\times28} \\
\end{equation*}
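
As a quick sanity check of the sigmoid and of the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, the derivative can be compared with a central finite difference. The sketch below uses the equivalent form $\sigma(z) = \frac{1}{2}(1 + \tanh(z/2))$, which is a choice made here to avoid overflow for large $|z|$; it is not taken from the notebook.

```python
import numpy as np

def sigmoid(z):
    # tanh form of 1 / (1 + exp(-z)); does not overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-2.0, 0.0, 2.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference
print(np.allclose(numeric, sigmoid_derivative(z), atol=1e-8))
```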

In [0]:
def logistic_unit(weight):
    # apply the sigmoid to the linear scores of each label group returned by g
    z = g(weight)
    y_sigmoid_function = [1 / (1 + np.exp(-z_label)) for z_label in z]
    return y_sigmoid_function

### 2.2. Generate the `Objective Function`

Define the following `objective function`.

\begin{equation*}
J(\theta) = \frac{1}{m}\sum_{i = 1}^m\sum_{k = 0}^9\left(-l_k^{(i)}\log(h_k^{(i)}) - (1 - l_k^{(i)})\log(1 - h_k^{(i)})\right), \\
\text{where,}\quad \theta = (u, v, w), \text{ and $h_k^{(i)}$ denotes the $k^{th}$ element of the output layer for $i^{th}$ sample data.}
\end{equation*}
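
The objective function above could be evaluated as in the following sketch. It assumes that the labels are converted to one-hot columns $l^{(i)}$ and that the outputs are clipped by a small `eps` so that the logarithm stays finite; both `one_hot` and the clipping are assumptions of this example rather than part of the assignment text.

```python
import numpy as np

def one_hot(labels, num_class=10):
    # column i is the one-hot encoding l^{(i)} of labels[i]
    encoded = np.zeros((num_class, labels.size))
    encoded[labels, np.arange(labels.size)] = 1.0
    return encoded

def objective(h, labels, eps=1e-12):
    # h : (10, m) output layer values, labels : (m,) integer labels
    l = one_hot(labels, h.shape[0])
    h = np.clip(h, eps, 1.0 - eps)      # keeps log() finite for saturated outputs
    return np.sum(-l * np.log(h) - (1.0 - l) * np.log(1.0 - h)) / labels.size

labels = np.array([3, 0, 9])
h = np.full((10, 3), 0.5)               # every output equal to 0.5
print(objective(h, labels))             # equals 10 * log(2)
```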

### 2.3. Generate the `Gradient Descent` (`Back-Propagation`)

Define the `learning rate`.

\begin{equation*}
\alpha  = 1
\end{equation*}

Define the following `gradient descent`.

\begin{equation*}
\theta_k^{(t + 1)} := \theta_k^{(t)} - \alpha\frac{\partial J(\theta)}{\partial \theta_k}, \quad\text{for all $k$}.
\end{equation*}
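
For the three weight matrices $\theta = (u, v, w)$, the partial derivatives in the update rule can be obtained by back-propagation. The following is a minimal sketch on a tiny synthetic problem, assuming one sample per column, the bias in column 0 of each matrix, and one-hot labels; the layer sizes and helper names are illustrative. A finite-difference check on one entry of $u$ is included as a sanity test.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def add_bias_row(a):
    return np.vstack((np.ones((1, a.shape[1])), a))

def forward(x, u, v, w):
    y = sigmoid(u @ add_bias_row(x))
    z = sigmoid(v @ add_bias_row(y))
    h = sigmoid(w @ add_bias_row(z))
    return y, z, h

def objective(h, l):
    return np.sum(-l * np.log(h) - (1 - l) * np.log(1 - h)) / l.shape[1]

def gradients(x, l, u, v, w):
    m = x.shape[1]
    y, z, h = forward(x, u, v, w)
    delta_h = (h - l) / m                              # dJ/dh_ for sigmoid + cross-entropy
    delta_z = (w[:, 1:].T @ delta_h) * z * (1 - z)     # dJ/dz_, bias column skipped
    delta_y = (v[:, 1:].T @ delta_z) * y * (1 - y)     # dJ/dy_
    grad_u = delta_y @ add_bias_row(x).T
    grad_v = delta_z @ add_bias_row(y).T
    grad_w = delta_h @ add_bias_row(z).T
    return grad_u, grad_v, grad_w

# tiny synthetic problem (sizes are illustrative, not 784/196/49/10)
rng = np.random.default_rng(0)
x = rng.random((6, 4))
l = np.eye(3)[:, rng.integers(0, 3, size=4)]           # one-hot labels as columns
u = rng.normal(size=(5, 7))
v = rng.normal(size=(4, 6))
w = rng.normal(size=(3, 5))

grad_u, grad_v, grad_w = gradients(x, l, u, v, w)

# central finite difference on one entry of u
eps = 1e-6
u_plus, u_minus = u.copy(), u.copy()
u_plus[2, 3] += eps
u_minus[2, 3] -= eps
numeric = (objective(forward(x, u_plus, v, w)[2], l)
           - objective(forward(x, u_minus, v, w)[2], l)) / (2 * eps)
print(abs(numeric - grad_u[2, 3]) < 1e-7)

# one gradient-descent step with the learning rate alpha
alpha = 1.0
u, v, w = u - alpha * grad_u, v - alpha * grad_v, w - alpha * grad_w
```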

### 2.4. `Train` the input data

Define the `initial conditions` of `weights` $(\theta_{0}^{(0)}, \theta_{1}^{(0)}, \theta_{2}^{(0)}, ..., \theta_{28\times28}^{(0)})$. \\
The `weights` follow a `normal distribution` $\mathcal{N}(0, \sigma^2)$ with `mean` 0 and `standard deviation` $\sigma$.

Define the `standard deviation`. \\

\begin{equation*}
\sigma = 1
\end{equation*}
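
For the full network, the same initialization would be applied to each of the three weight matrices. A possible sketch, assuming a hypothetical helper `initialize` that reserves column 0 for the bias and a fixed seed used only for reproducibility:

```python
import numpy as np

def initialize(num_input, num_output, sigma=1.0, rng=None):
    # weights of one fully connected layer drawn from N(0, sigma^2);
    # the extra column (index 0) is reserved for the bias
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(loc=0.0, scale=sigma, size=(num_output, num_input + 1))

rng = np.random.default_rng(seed=0)
u = initialize(784, 196, sigma=1.0, rng=rng)   # input         -> first hidden
v = initialize(196, 49, sigma=1.0, rng=rng)    # first hidden  -> second hidden
w = initialize(49, 10, sigma=1.0, rng=rng)     # second hidden -> output
print(u.shape, v.shape, w.shape)
```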

In [0]:
weight = np.random.randn(size_row*size_col)
list_weight = np.empty((size_row*size_col, 1), dtype=np.float128)
list_weight[:, 0] = weight

`Train` the network on the `training data` using the `Neural Network Architecture` above and `gradient descent`. \\
Find the `optimal parameters` $\theta$ using the `training data` (the first `6,000 images`).
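
A full-batch training loop for this architecture might look like the sketch below. It is written under assumptions: the function `train`, the toy layer sizes, the synthetic data and the learning rate of 0.1 are all illustrative choices, and the outputs are clipped inside the objective so that saturated values do not produce infinite losses. For the assignment data, `sizes` would presumably be `(784, 196, 49, 10)`.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def add_bias_row(a):
    return np.vstack((np.ones((1, a.shape[1])), a))

def objective(h, l, eps=1e-12):
    h = np.clip(h, eps, 1.0 - eps)
    return np.sum(-l * np.log(h) - (1 - l) * np.log(1 - h)) / l.shape[1]

def train(x, l, sizes, alpha, num_iter, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # one weight matrix per fully connected layer, bias in column 0
    theta = [rng.normal(0.0, sigma, (n_out, n_in + 1))
             for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    history = []
    for _ in range(num_iter):
        # forward propagation, keeping the biased input of every layer
        inputs, a = [], x
        for weight in theta:
            inputs.append(add_bias_row(a))
            a = sigmoid(weight @ inputs[-1])
        history.append(objective(a, l))
        # back-propagation from the output layer to the first layer
        delta = (a - l) / x.shape[1]
        for k in reversed(range(len(theta))):
            grad = delta @ inputs[k].T
            if k > 0:
                act = inputs[k][1:]                    # previous activations without the bias row
                delta = (theta[k][:, 1:].T @ delta) * act * (1 - act)
            theta[k] = theta[k] - alpha * grad         # gradient-descent update
    return theta, history

# toy data: 30 samples with 8 features and 3 classes
rng = np.random.default_rng(1)
x = rng.random((8, 30))
l = np.eye(3)[:, rng.integers(0, 3, size=30)]
theta, history = train(x, l, sizes=(8, 6, 4, 3), alpha=0.1, num_iter=200)
print(history[0], history[-1])
```

The `history` list holds the training loss at every iteration, which is the quantity the loss curve in the results section asks for.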

In [0]:
sigmoid = []
sigmoid = logistic_unit(list_weight)

### 2.5. `Test` the input data

`Test` the `test data` with the `Neural Network Architecture` above with the `obtained parameters` $\theta$ from the `training process` using the `testing data` (the rest `4,000 images`).

### 2.6. Present the `output` of the `neural network` with `random weights`

Consider a `neural network` with a `fully connected layer` using a `logistic unit` without a `bias`. \\
Assign `random values` from the `normal distribution` $\mathcal{N}(0, 1)$ with `mean` 0 and `standard deviation` 1 to the `weights` of the `fully connected layer` using a `logistic unit` without a `bias`. \\
Compute the `forward propagation` and take the `average` of the `output values` for the `images` of the `same label`.
Present the `average values` for `each label` in the `increasing order` of the `label`.
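
The per-label averaging described above can be done without one loop per digit. The sketch below is illustrative only: it uses synthetic images in place of the training set, and the function name `average_output_per_label` is an assumption. It also assumes every label occurs at least once; otherwise the division would yield `nan` for the missing label.

```python
import numpy as np

def average_output_per_label(images, labels, weight, num_label=10):
    # images : (784, N) one image per column, labels : (N,), weight : (784,) without a bias
    output = 1.0 / (1.0 + np.exp(-(weight @ images)))      # forward propagation
    total  = np.bincount(labels, weights=output, minlength=num_label)
    count  = np.bincount(labels, minlength=num_label)
    return total / count

rng = np.random.default_rng(0)
images = rng.random((784, 50))             # synthetic stand-in for the training images
labels = np.arange(50) % 10                # every label 0..9 appears five times
weight = rng.normal(0.0, 1.0, 784)         # weights from N(0, 1), no bias

for label, value in enumerate(average_output_per_label(images, labels, weight)):
    print('label', label, ':', value)
```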

In [0]:
averaage_0 = np.mean(sigmoid[0])
averaage_1 = np.mean(sigmoid[1])
averaage_2 = np.mean(sigmoid[2])
averaage_3 = np.mean(sigmoid[3])
averaage_4 = np.mean(sigmoid[4])
averaage_5 = np.mean(sigmoid[5])
averaage_6 = np.mean(sigmoid[6])
averaage_7 = np.mean(sigmoid[7])
averaage_8 = np.mean(sigmoid[8])
averaage_9 = np.mean(sigmoid[9])

In [0]:
print('label 0 : ', averaage_0)
print('label 1 : ', averaage_1)
print('label 2 : ', averaage_2)
print('label 3 : ', averaage_3)
print('label 4 : ', averaage_4)
print('label 5 : ', averaage_5)
print('label 6 : ', averaage_6)
print('label 7 : ', averaage_7)
print('label 8 : ', averaage_8)
print('label 9 : ', averaage_9)

## 3. **Results**

### 1. **Plot the loss curve**

Plot the `training loss` at `every iteration` of `gradient descent` using the `training data` (the first `6,000 images`) (in `blue` color). \\
Plot the `testing loss` at `every iteration` of `gradient descent` using the `testing data` (the rest `4,000 images`) (in `red` color). \\
Both `curves` should be presented in `one figure`.
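
Once the loss has been recorded at every iteration, the two curves could be drawn with a small helper such as the one sketched here. The helper name `plot_curves` is hypothetical, and the numbers passed to it are placeholders that only demonstrate the call; they are not results of this assignment.

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

def plot_curves(train_values, test_values, ylabel):
    # training curve in blue and testing curve in red, in one figure
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(train_values, color='blue', label='training')
    ax.plot(test_values, color='red', label='testing')
    ax.set_xlabel('iteration')
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig, ax

# placeholder values only
loss_train = [2.0, 1.5, 1.2, 1.0]
loss_test  = [2.1, 1.7, 1.5, 1.4]
fig, ax = plot_curves(loss_train, loss_test, ylabel='loss')
```

The same helper could be reused for the accuracy curves by passing `ylabel='accuracy (%)'`.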


In [0]:
f1 = plt.figure(figsize=(8, 4))

plotMean(list_image_0, 0)
plotMean(list_image_1, 1)
plotMean(list_image_2, 2)
plotMean(list_image_3, 3)
plotMean(list_image_4, 4)
plotMean(list_image_5, 5)
plotMean(list_image_6, 6)
plotMean(list_image_7, 7)
plotMean(list_image_8, 8)
plotMean(list_image_9, 9)

plt.show()

### 2. **Present the accuracy curve**

Plot the `training accuracy` (%) at `every iteration` of `gradient descent` using the `training data` (the first `6,000 images`) (in `blue` color). \\
Plot the `testing accuracy` (%) at `every iteration` of `gradient descent` using the `testing data` (the rest `4,000 images`) (in `red` color). \\
Both `curves` should be presented in `one figure`.
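
The accuracy at a given iteration can be computed by taking the index of the largest output as the predicted label. A minimal sketch, assuming the outputs are stored one sample per column; the three-class example is only for illustration:

```python
import numpy as np

def accuracy(h, labels):
    # h : (num_label, m) output layer values, labels : (m,) true labels
    predicted = np.argmax(h, axis=0)
    return 100.0 * np.mean(predicted == labels)

h = np.array([[0.9, 0.2, 0.1, 0.3],
              [0.1, 0.7, 0.2, 0.6],
              [0.0, 0.1, 0.8, 0.1]])
labels = np.array([0, 1, 2, 0])
print(accuracy(h, labels))      # 3 of 4 samples are correct: 75.0
```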

In [0]:
print('label 0 : ', averaage_0)
print('label 1 : ', averaage_1)
print('label 2 : ', averaage_2)
print('label 3 : ', averaage_3)
print('label 4 : ', averaage_4)
print('label 5 : ', averaage_5)
print('label 6 : ', averaage_6)
print('label 7 : ', averaage_7)
print('label 8 : ', averaage_8)
print('label 9 : ', averaage_9)

### 3. **Print the accuracy value**

Print the `final training accuracy` (%) using the `training data` (the first `6,000 images`). \\
Print the `final testing accuracy` (%) using the `testing data` (the rest `4,000 images`).

### 4. **Plot the classification example**

Present `10 correctly classified testing images` with their `labels` at the title of each sub-figure in a `2x5 array`. \\
Present `10 misclassified testing images` with their misclassified `labels` at the title of each sub-figure in a `2x5 array`.
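
Selecting the images to display comes down to comparing the predicted labels with the true ones. The sketch below shows one way to pick the indices; the function name `pick_examples` and the random outputs standing in for a trained network are assumptions of this example.

```python
import numpy as np

def pick_examples(h, labels, num_example=10):
    # h : (10, m) output layer values on the testing images, labels : (m,) true labels
    predicted = np.argmax(h, axis=0)
    correct   = np.flatnonzero(predicted == labels)[:num_example]
    wrong     = np.flatnonzero(predicted != labels)[:num_example]
    return predicted, correct, wrong

rng = np.random.default_rng(0)
h = rng.random((10, 200))              # synthetic outputs, not from a trained network
labels = rng.integers(0, 10, 200)
predicted, correct, wrong = pick_examples(h, labels)
print(correct, wrong)
```

The columns of `list_image_test` at the indices in `correct` and `wrong` could then be reshaped to $28 \times 28$ and drawn with `plt.subplot(2, 5, i+1)`, using `predicted[index]` as the title of each sub-figure.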