<a href="https://colab.research.google.com/github/AllyHyeseongKim/CAU11934_MachineLearning/blob/master/assignment/08/assignment08.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment08: Forward Propagation in the Neural Networks

## 1. Load the input data (text file)

### Mount the google drive

In [0]:
from google.colab import drive

drive.mount('/content/gdrive')

In [0]:
cd

In [0]:
cd ../content/gdrive/My Drive/Colab Notebooks/Machine Learning/assignment07

In [0]:
ls

### Load the Data

Load the data set $\{ (x^{(i)}, y^{(i)}, l^{(i)}) \}$ from the given `text file` (`'data-nonlinear.txt'`) for training. \\
Each row $(x^{(i)}, y^{(i)}, l^{(i)})$ of the data consists of a 2-dimensional point $(x, y)$ with its label $l$, where $x, y \in \mathbb{R}$ and $l \in \{0, 1\}$. \\
Plot the set of points $\{ (x^{(i)}, y^{(i)}) \}$ loaded from the `'data-nonlinear.txt'` file. \\

In [0]:
import numpy as np
import matplotlib.pyplot as plt

data = np.genfromtxt("data-nonlinear.txt", delimiter=',')

x_train = data[:, 0]
y_train = data[:, 1]
label = data[:, 2]

m = len(label)

pointX0 = x_train[label == 0]
pointY0 = y_train[label == 0]

pointX1 = x_train[label == 1]
pointY1 = y_train[label == 1]

plt.figure(figsize=(8, 8))
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.tight_layout()
plt.gca().set_aspect('equal', adjustable='box')
plt.show()
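The loading cell assumes that `data-nonlinear.txt` is a comma-separated file with exactly three numeric columns (x, y, label) and no header row. A minimal sketch for checking that assumption on an in-memory sample; the two rows below are made up for illustration and are not taken from the real file:

```python
import io

import numpy as np

# Hypothetical two-row sample in the assumed "x,y,label" format
sample = io.StringIO("0.5,-0.25,1\n-0.75,0.1,0\n")
sample_data = np.genfromtxt(sample, delimiter=',')

# Each row must have exactly three columns and every label must be 0 or 1
assert sample_data.ndim == 2 and sample_data.shape[1] == 3
assert set(np.unique(sample_data[:, 2])) <= {0.0, 1.0}

sample_x, sample_y, sample_label = sample_data[:, 0], sample_data[:, 1], sample_data[:, 2]
print(sample_data.shape)  # (2, 3)
```

Running the same two `assert` lines on the real `data` array is a cheap way to catch a wrong delimiter, which would typically leave `np.genfromtxt` with a single column of `nan` values.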

## 2. Generate the Logistic Regression Model

### Generate the `logistic regression`

Define the following `logistic regression` model with a high-dimensional polynomial `feature function`.

\begin{equation*}
\hat{h} = \sigma(z) \\
z = g(x, y; \theta), \quad\text{where $g$ is a high dimensional function and $\theta \in \mathbb{R}^{100}$} \\
\theta = (\theta_{0, 0}, \theta_{0, 1}, \ldots, \theta_{9, 9}) \\
g(x, y; \theta) = \sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}x^iy^j \\
\sigma(z) = \frac{1}{1 + \exp(-z)} \\
\sigma'(z) = \sigma(z)(1 - \sigma(z)) \\
\end{equation*}

Define the `function` $g(x, y; \theta)$:
\begin{equation}
g(x, y; \theta) = \sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}x^iy^j
\end{equation}

In [0]:
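def g(weight):
    # z^{(k)} = sum_{i,j} theta_{i,j} * (x^{(k)})^i * (y^{(k)})^j for every training sample k;
    # theta_{i,j} is stored at weight[10 * i + j]
    z = []
    for k in range(m):
        z_k = 0
        for i in range(10):
            for j in range(10):
                z_k = z_k + weight[10 * i + j] * (x_train[k] ** i) * (y_train[k] ** j)
        z.append(z_k)
    return z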
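The nested loops in `g` evaluate 100 terms per sample in pure Python, so each call costs on the order of 100·m scalar operations. A minimal vectorized sketch, assuming the same layout in which $\theta_{i,j}$ is stored at `weight[10 * i + j]`; `polynomial_features` and `g_vectorized` are hypothetical helper names that the rest of the notebook does not use:

```python
import numpy as np

def polynomial_features(x, y, degree=9):
    """Return the (m, (degree + 1)**2) matrix whose column (degree + 1) * i + j
    holds x**i * y**j; for degree 9 this is column 10 * i + j, matching
    the weight[10 * i + j] <-> theta_{i,j} layout assumed above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    powers = range(degree + 1)
    columns = [x ** i * y ** j for i in powers for j in powers]
    return np.stack(columns, axis=1)

def g_vectorized(weight, x, y):
    """z = g(x, y; theta) for every sample at once, as one matrix-vector product."""
    return polynomial_features(x, y) @ np.asarray(weight, dtype=float)
```

Under that assumption, `g_vectorized(weight1[0], x_train, y_train)` should agree with `g(weight1[0])` up to floating-point rounding. Because the feature matrix does not depend on $\theta$, a training loop could build it once and reuse it.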

Define the following `sigmoid function`.

\begin{equation*}
\hat{h} = \sigma(z), \text{ where }
\sigma(z) = \frac{1}{1 + \exp(-z)}, \\
z = g(x, y; \theta) = \sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}x^iy^j \\
\end{equation*}

In [0]:
def logistic_regression(weight):
    y_logistic_regression = []
    z = g(weight)
    for i in range(m):
        y_logistic_regression.append(1/(1 + np.exp(-z[i])))       
    return y_logistic_regression
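`logistic_regression` applies `1/(1 + np.exp(-z[i]))` element by element, which makes `np.exp` overflow (with a runtime warning) when $z$ is a large negative number. A sketch of a numerically stable alternative, together with the derivative identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ stated above; the names `sigmoid` and `sigmoid_prime` are hypothetical:

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z)).

    exp(-|z|) always lies in (0, 1], so it never overflows:
    for z >= 0 it equals exp(-z) and sigma = 1 / (1 + exp(-z));
    for z < 0 it equals exp(z) and sigma = exp(z) / (1 + exp(z))."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def sigmoid_prime(z):
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

Both branches are algebraically identical to $1/(1 + \exp(-z))$; the split only chooses the form whose exponential stays bounded.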

## 3. Generate the `Cost Function` with `Gradient Descent` method

### Generate the `objective function`

Define the `degree of regularization` by the `control parameter`, $\lambda$:

\begin{equation*}
\lambda_1 = 0.000001, \text{  for over-fitting} \\
\lambda_2 = 0.001, \text{  for just-right} \\
\lambda_3 = 0.1, \text{  for under-fitting}
\end{equation*}

In [0]:
lambda1 = 0.000001
lambda2 = 0.001
lambda3 = 0.1

Define the following `objective function` with a `regularization term`.

\begin{equation*}
J(\theta) = \frac{1}{m}\sum_{i = 1}^m\left[-l^{(i)}\log(\sigma(g(x^{(i)}, y^{(i)}; \theta))) - (1 - l^{(i)})\log(1 - \sigma(g(x^{(i)}, y^{(i)}; \theta)))\right] + \frac{\lambda}{2}\sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}^2
\end{equation*}

In [0]:
def objective_function(lambda_value, weight, y_logistic_regression):
    error = []
    regularization = 0
    for i in range(10):
        for j in range(10):
            regularization = regularization + (weight[10 * i + j] ** 2)
    regularization = (lambda_value / 2) * regularization
    for k in range(m):
        error.append((-label[k]) * np.log(y_logistic_regression[k]) - (1 - label[k]) * np.log(1 - y_logistic_regression[k]))
    return (sum(error) / m) + regularization
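`objective_function` accumulates the cross-entropy sample by sample and would return `inf` or `nan` if a prediction ever reached exactly 0 or 1. A vectorized sketch of the same $J(\theta)$; the function name and the clipping constant `eps` are assumptions added here, not part of the assignment:

```python
import numpy as np

def objective_vectorized(lambda_value, weight, probabilities, labels, eps=1e-12):
    """Regularized cross-entropy J(theta) as defined above.

    probabilities: sigma(g(x_i, y_i; theta)) for each sample
    labels: 0/1 labels l_i
    eps clips probabilities away from 0 and 1 so np.log never sees 0
    (a safeguard that objective_function does not have)."""
    p = np.clip(np.asarray(probabilities, dtype=float), eps, 1.0 - eps)
    l = np.asarray(labels, dtype=float)
    w = np.asarray(weight, dtype=float)
    cross_entropy = np.mean(-l * np.log(p) - (1.0 - l) * np.log(1.0 - p))
    # The penalty covers all 100 coefficients, including theta_{0,0}, as in the formula
    return cross_entropy + (lambda_value / 2.0) * np.sum(w ** 2)
```

With $\theta = 0$ every prediction is 0.5, so $J(\theta) = \log 2 \approx 0.693$ regardless of the labels, which makes a convenient sanity check.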

### Generate the `gradient descent`

Define the `learning rate`.

\begin{equation*}
\alpha  = 1
\end{equation*}

In [0]:
learning_rate = 1

Define the following `gradient descent`.

\begin{equation*}
\theta_{i,j}^{(t+1)} := \theta_{i,j}^{(t)} - \alpha\left[\frac{1}{m}\sum_{k = 1}^m\left(\sigma(g(x^{(k)}, y^{(k)}; \theta^{(t)})) - l^{(k)}\right)\frac{\partial g(x^{(k)}, y^{(k)}; \theta^{(t)})}{\partial \theta_{i,j}} + \lambda\theta_{i,j}^{(t)}\right], \quad\text{for all $i, j$}, \\
\text{where}\quad\frac{\partial g(x^{(k)}, y^{(k)}; \theta^{(t)})}{\partial \theta_{i,j}} = \frac{\partial}{\partial \theta_{i,j}}\sum_{p = 0}^9\sum_{q = 0}^9\theta_{p, q}(x^{(k)})^p(y^{(k)})^q = (x^{(k)})^i(y^{(k)})^j, \\
\text{since only the term with $(p, q) = (i, j)$ depends on $\theta_{i,j}$.}
\end{equation*}

In [0]:
def gradient_descent(lambda_value, weight, y_logistic_regression):
    weight_error = []
    for i in range(100):
        weight_error.append([])
    weightPrime = []
    for i in range(10):
        for j in range(10):
            for k in range(m):
                weight_error[10 * i + j].append((y_logistic_regression[k] - label[k]) * (x_train[k] ** i) * (y_train[k] ** j))
    for i in range(10):
        for j in range(10):
            weightPrime.append((1 - learning_rate * lambda_value) * weight[10 * i + j] - learning_rate * (sum(weight_error[10 * i + j]) / m))
    return weightPrime
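The update in `gradient_descent` can be written in matrix form as $\theta \leftarrow \theta - \alpha\left[\frac{1}{m}\Phi^\top(\sigma(\Phi\theta) - l) + \lambda\theta\right]$, where $\Phi$ is the $m \times 100$ matrix whose column $10i + j$ holds $(x^{(k)})^i(y^{(k)})^j$. A sketch under that assumption; `gradient_step` and its argument names are hypothetical:

```python
import numpy as np

def gradient_step(lambda_value, weight, features, labels, learning_rate=1.0):
    """One regularized gradient-descent update.

    features: (m, n) matrix Phi; for this notebook n = 100 and column
    10 * i + j holds x**i * y**j.
    Returns theta - alpha * [(1/m) * Phi^T (sigma(Phi theta) - l) + lambda * theta]."""
    w = np.asarray(weight, dtype=float)
    phi = np.asarray(features, dtype=float)
    l = np.asarray(labels, dtype=float)
    num_samples = phi.shape[0]
    probabilities = 1.0 / (1.0 + np.exp(-(phi @ w)))
    gradient = phi.T @ (probabilities - l) / num_samples + lambda_value * w
    return w - learning_rate * gradient
```

Expanding the bracket gives $(1 - \alpha\lambda)\theta - \frac{\alpha}{m}\Phi^\top(\sigma - l)$, the same form as the notebook's `(1 - learning_rate * lambda_value) * weight[...] - learning_rate * (sum(...) / m)`. Comparing the gradient against central finite differences of $J(\theta)$ is a quick way to validate it.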

## 4. `Train` the input data

Define the initial `weights` $(\theta_{0,0}^{(0)}, \theta_{0,1}^{(0)}, \theta_{0,2}^{(0)}, \ldots, \theta_{9,9}^{(0)})$, where $\theta_{0,0}$ (the coefficient of $x^0y^0$) acts as the `offset`:

\begin{equation*}
\theta_{0,0}^{(0)} = \theta_{0,1}^{(0)} = \theta_{0,2}^{(0)} = ... = \theta_{9,9}^{(0)} = 1
\end{equation*}

In [0]:
# weight1[t] holds theta^{(t)}; start from theta_{i,j}^{(0)} = 1 for all 100 coefficients
weight1 = [[1] * 100]

In [0]:
# weight2[t] holds theta^{(t)}; start from theta_{i,j}^{(0)} = 1 for all 100 coefficients
weight2 = [[1] * 100]

In [0]:
# weight3[t] holds theta^{(t)}; start from theta_{i,j}^{(0)} = 1 for all 100 coefficients
weight3 = [[1] * 100]

In [0]:
cost_convergence = []

`Train` the model on the `input data` by applying `gradient descent` to the `logistic regression` defined above. \\
Find the optimal parameters $\theta$ using the `training data`.

In [0]:
epoch = m  # number of gradient-descent iterations (set equal to the number of samples)
sigma1, sigma2, sigma3 = [], [], []
cost1, cost2, cost3 = [], [], []

# Predictions and costs for the initial weights (t = 0)
sigma1.append(logistic_regression(weight1[0]))
sigma2.append(logistic_regression(weight2[0]))
sigma3.append(logistic_regression(weight3[0]))
cost1.append(objective_function(lambda1, weight1[0], sigma1[0]))
cost2.append(objective_function(lambda2, weight2[0], sigma2[0]))
cost3.append(objective_function(lambda3, weight3[0], sigma3[0]))

for i in range(1, epoch + 1):
    # One gradient-descent step for each regularization parameter
    weight1.append(gradient_descent(lambda1, weight1[i - 1], sigma1[i - 1]))
    weight2.append(gradient_descent(lambda2, weight2[i - 1], sigma2[i - 1]))
    weight3.append(gradient_descent(lambda3, weight3[i - 1], sigma3[i - 1]))

    # Predictions and costs with the updated weights
    sigma1.append(logistic_regression(weight1[i]))
    sigma2.append(logistic_regression(weight2[i]))
    sigma3.append(logistic_regression(weight3[i]))
    cost1.append(objective_function(lambda1, weight1[i], sigma1[i]))
    cost2.append(objective_function(lambda2, weight2[i], sigma2[i]))
    cost3.append(objective_function(lambda3, weight3[i], sigma3[i]))

    # Record every iteration at which none of the three costs changed
    if cost1[i] == cost1[i - 1] and cost2[i] == cost2[i - 1] and cost3[i] == cost3[i - 1]:
        cost_convergence.append(i)

# Append epoch so that cost_convergence[0] falls back to the full run length
# when the costs never stopped changing
cost_convergence.append(epoch)
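The training cell stores every weight vector and every prediction list for all three $\lambda$ values and repeats the same three lines per $\lambda$. A compact sketch that runs the same update for a single $\lambda$ and records $J(\theta)$ at each iteration; `train` is a hypothetical helper, and it assumes a precomputed feature matrix whose column $10i + j$ holds $x^iy^j$:

```python
import numpy as np

def train(lambda_value, features, labels, learning_rate=1.0, iterations=100):
    """Batch gradient descent from theta = 1 (as in the notebook).

    Returns the final weights and the cost J(theta) at t = 0, ..., iterations."""
    phi = np.asarray(features, dtype=float)
    l = np.asarray(labels, dtype=float)
    w = np.ones(phi.shape[1])
    costs = []
    for t in range(iterations + 1):
        p = 1.0 / (1.0 + np.exp(-(phi @ w)))
        # Clip only for the log inside the cost, never for the update
        p_safe = np.clip(p, 1e-12, 1.0 - 1e-12)
        costs.append(np.mean(-l * np.log(p_safe) - (1.0 - l) * np.log(1.0 - p_safe))
                     + (lambda_value / 2.0) * np.sum(w ** 2))
        if t == iterations:
            break
        # Same update as gradient_descent: (1 - alpha * lambda) * theta - alpha * mean gradient
        w = (1.0 - learning_rate * lambda_value) * w - learning_rate * (phi.T @ (p - l)) / phi.shape[0]
    return w, costs
```

A call such as `train(lambda2, features, label, learning_rate=learning_rate, iterations=epoch)` would reproduce one of the three runs, where `features` is a hypothetical $m \times 100$ matrix built from `x_train` and `y_train`. Keeping only the current $\theta$ instead of every intermediate weight vector trades away the per-iteration history that the accuracy plots need, so the notebook's approach of storing all iterations is still required for those plots.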

## 5. Compute the `training accuracy`

Compute the `final training accuracy` as a percentage for each value of the `regularization parameter` $\lambda$.
\begin{equation}
\text{accuracy}\ (\%) = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100
\end{equation}

In [0]:
index_cost_minimum1 = cost1.index(min(cost1))
index_cost_minimum2 = cost2.index(min(cost2))
index_cost_minimum3 = cost3.index(min(cost3))

In [0]:
accuracy1 = []
for i in range(epoch + 1):  # t = 0, ..., epoch (includes the final weights)
    correct_predictions = 0
    for j in range(m):
        # Predict label 1 when sigma >= 0.5 (the decision boundary), otherwise label 0
        prediction = 1 if sigma1[i][j] >= 0.5 else 0
        if prediction == label[j]:
            correct_predictions = correct_predictions + 1
    accuracy1.append((correct_predictions / m) * 100)
index_accuracy_maximum1 = accuracy1.index(max(accuracy1))

In [0]:
accuracy2 = []
for i in range(epoch + 1):  # t = 0, ..., epoch (includes the final weights)
    correct_predictions = 0
    for j in range(m):
        # Predict label 1 when sigma >= 0.5 (the decision boundary), otherwise label 0
        prediction = 1 if sigma2[i][j] >= 0.5 else 0
        if prediction == label[j]:
            correct_predictions = correct_predictions + 1
    accuracy2.append((correct_predictions / m) * 100)
index_accuracy_maximum2 = accuracy2.index(max(accuracy2))

In [0]:
accuracy3 = []
for i in range(epoch + 1):  # t = 0, ..., epoch (includes the final weights)
    correct_predictions = 0
    for j in range(m):
        # Predict label 1 when sigma >= 0.5 (the decision boundary), otherwise label 0
        prediction = 1 if sigma3[i][j] >= 0.5 else 0
        if prediction == label[j]:
            correct_predictions = correct_predictions + 1
    accuracy3.append((correct_predictions / m) * 100)
index_accuracy_maximum3 = accuracy3.index(max(accuracy3))
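The three accuracy cells differ only in which `sigma` list they read. A sketch of a single helper that applies the 0.5 threshold implied by the decision boundary $\sigma(g(x, y; \theta)) = 0.5$; `training_accuracy` is a hypothetical name:

```python
import numpy as np

def training_accuracy(probabilities, labels, threshold=0.5):
    """Percentage of samples whose thresholded prediction matches the label.

    A sample is predicted as label 1 when sigma >= threshold, else label 0."""
    predictions = (np.asarray(probabilities, dtype=float) >= threshold).astype(float)
    return float(np.mean(predictions == np.asarray(labels, dtype=float)) * 100.0)
```

For example, `accuracy1 = [training_accuracy(s, label) for s in sigma1]` would compute the same per-iteration list as the first cell.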


## 6. Visualize the `Classifier`

Generate the `Classifier`.

In [0]:
def classifier1(x, y):
    optimal_g1 = 0
    for i in range(10):
        for j in range(10):
            optimal_g1 = optimal_g1 + weight1[epoch][10 * i + j] * (x ** i) * (y ** j)
    return optimal_g1

In [0]:
def classifier2(x, y):
    optimal_g2 = 0
    for i in range(10):
        for j in range(10):
            optimal_g2 = optimal_g2 + weight2[epoch][10 * i + j] * (x ** i) * (y ** j)
    return optimal_g2

In [0]:
def classifier3(x, y):
    optimal_g3 = 0
    for i in range(10):
        for j in range(10):
            optimal_g3 = optimal_g3 + weight3[epoch][10 * i + j] * (x ** i) * (y ** j)
    return optimal_g3

Visualize the obtained `classifier`, where the `boundary` of the `classifier` is defined by $\{(x, y) | \sigma(g(x, y ; \theta)) = 0.5\} = \{(x, y) | g(x, y ; \theta) = 0\}$.

In [0]:
plt.figure(figsize=(8, 8))
x = np.arange(-1, 1.25, 0.01)
y = np.arange(-1, 1.25, 0.01)
X, Y = np.meshgrid(x, y)
z1 = classifier1(X, Y)
z2 = classifier2(X, Y)
z3 = classifier3(X, Y)

CS = plt.contour(X, Y, z1, [0], colors='red')
CS = plt.contour(X, Y, z2, [0], colors='green')
CS = plt.contour(X, Y, z3, [0], colors='blue')
CS.clabel()
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.show()
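`classifier1`, `classifier2`, and `classifier3` repeat the same double sum with different weights. NumPy's `numpy.polynomial.polynomial.polyval2d(x, y, c)` evaluates $\sum_{i,j} c_{i,j} x^i y^j$ directly, so a sketch of one shared decision function (the name `decision_function` is hypothetical) only needs to reshape the 100 weights into a $10 \times 10$ coefficient array:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def decision_function(weight, X, Y):
    """Evaluate g(x, y; theta) = sum_{i,j} theta_{i,j} x^i y^j on arrays X, Y.

    Reshaping the 100 weights row-major to (10, 10) puts weight[10 * i + j]
    at c[i, j], which polyval2d multiplies by X**i * Y**j."""
    coefficients = np.asarray(weight, dtype=float).reshape(10, 10)
    return P.polyval2d(X, Y, coefficients)
```

For example, `plt.contour(X, Y, decision_function(weight1[epoch], X, Y), [0], colors='red')` would draw the same boundary as `classifier1`.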

## 7. **Results**

### 1. **Plot the training data**

Plot the `training data points` $(x, y)$ with their `labels` $l$ (in `blue` color for `label 0` and `red` color for `label 1`).

In [0]:
plt.figure(figsize=(8, 8))
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.tight_layout()
plt.gca().set_aspect('equal', adjustable='box')
plt.show()

### 2. **Plot the training error with varying regularization parameters**

Choose a value for $\lambda_1$ in such a way that `over-fitting` is demonstrated and plot the `training error` $J(\theta)$ at `every iteration` of `gradient descent` until `convergence` (in `red` color). \\
Choose a value for $\lambda_2$ in such a way that `just-right` is demonstrated and plot the `training error` $J(\theta)$ at `every iteration` of `gradient descent` until `convergence` (in `green` color). \\
Choose a value for $\lambda_3$ in such a way that `under-fitting` is demonstrated and plot the `training error` $J(\theta)$ at `every iteration` of `gradient descent` until `convergence` (in `blue` color). \\
The above three curves should be presented `all together in a single figure`.

In [0]:
plt.figure(figsize=(8, 8))
x_cost1 = np.arange(0, cost_convergence[0])
x_cost2 = np.arange(0, cost_convergence[0])
x_cost3 = np.arange(0, cost_convergence[0])
plt.xlabel('t (iteration)')
plt.ylabel('J(theta)')

plt.plot(x_cost1, cost1[:cost_convergence[0]], color = 'red', label = 'over-fitting error')
plt.plot(x_cost2, cost2[:cost_convergence[0]], color = 'green', label = 'just-right error')
plt.plot(x_cost3, cost3[:cost_convergence[0]], color = 'blue', label = 'under-fitting error')
plt.legend()

plt.show()
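The training cell declares convergence only when consecutive costs are exactly equal, which in floating-point arithmetic may never happen within `epoch` iterations. A common alternative, sketched here with a hypothetical helper and an assumed tolerance of `1e-6`, takes the first iteration at which the cost changes by less than the tolerance:

```python
def first_converged_iteration(costs, tolerance=1e-6):
    """Return the first iteration t >= 1 with |J(t) - J(t-1)| < tolerance,
    or len(costs) - 1 (the last iteration) if the cost never settles that closely."""
    for t in range(1, len(costs)):
        if abs(costs[t] - costs[t - 1]) < tolerance:
            return t
    return len(costs) - 1
```

The tolerance is a judgment call: too large stops while $J(\theta)$ is still visibly decreasing, too small behaves like the exact-equality test.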

### 3. **Display the values of the chosen regularization parameters**

Display the value of the chosen $\lambda_1$ for the demonstration of `over-fitting` (in `red` color). \\
Display the value of the chosen $\lambda_2$ for the demonstration of `just-right` (in `green` color). \\
Display the value of the chosen $\lambda_3$ for the demonstration of `under-fitting` (in `blue` color).

In [0]:
print('\033[31m' + 'lambda 1 = ' , lambda1 , '\033[0m')
print('\033[32m' + 'lambda 2 = ' , lambda2 , '\033[0m')
print('\033[34m' + 'lambda 3 = ' , lambda3 , '\033[0m')

### 4. **Plot the training accuracy with varying regularization parameters**

Plot the `training accuracy` with the chosen $\lambda_1$ for `over-fitting` at `every iteration` of `gradient descent` until `convergence` (in `red` color). \\
Plot the `training accuracy` with the chosen $\lambda_2$ for `just-right` at `every iteration` of `gradient descent` until `convergence` (in `green` color). \\
Plot the `training accuracy` with the chosen $\lambda_3$ for `under-fitting` at `every iteration` of `gradient descent` until `convergence` (in `blue` color). \\
The above three curves should be presented `all together in a single figure`.

In [0]:
plt.figure(figsize=(8, 8))
x_accuracy1 = np.arange(0, epoch)
x_accuracy2 = np.arange(0, epoch)
x_accuracy3 = np.arange(0, epoch)
plt.xlabel('t (iteration)')
plt.ylabel('accuracy (%)')

plt.plot(x_accuracy1, accuracy1[:epoch], color = 'red', label = 'over-fitting accuracy')
plt.plot(x_accuracy2, accuracy2[:epoch], color = 'green', label = 'just-right accuracy')
plt.plot(x_accuracy3, accuracy3[:epoch], color = 'blue', label = 'under-fitting accuracy')
plt.legend()

plt.show()

### 5. **Display the final training accuracy with varying regularization parameters**

Display the `final training accuracy` obtained with the chosen $\lambda_1$ for `over-fitting` in number (%) at `convergence` (in `red` color). \\
Display the `final training accuracy` obtained with the chosen $\lambda_2$ for `just-right` in number (%) at `convergence` (in `green` color). \\
Display the `final training accuracy` obtained with the chosen $\lambda_3$ for `under-fitting` in number (%) at `convergence` (in `blue` color).

In [0]:
# accuracyN[-1] is the accuracy at the last computed iteration
print('\033[31m' + 'accuracy 1 = ' , accuracy1[-1] , '(%)' + '\033[0m')
print('\033[32m' + 'accuracy 2 = ' , accuracy2[-1] , '(%)' + '\033[0m')
print('\033[34m' + 'accuracy 3 = ' , accuracy3[-1] , '(%)' + '\033[0m')

### 6. **Plot the optimal classifier with varying regularization parameters superimposed on the training data**

Plot the boundary of the `optimal classifier` with the chosen $\lambda_1$ for `over-fitting` at `convergence` (in `red` color). \\
Plot the boundary of the `optimal classifier` with the chosen $\lambda_2$ for `just-right` at `convergence` (in `green` color). \\
Plot the boundary of the `optimal classifier` with the chosen $\lambda_3$ for `under-fitting` at `convergence` (in `blue` color). \\
The `boundary` of the `classifier` is defined by $\{ (x, y) \mid \sigma(g(x, y ; \theta)) = 0.5 \} = \{ (x, y) \mid g(x, y ; \theta) = 0 \}$. \\
The `boundaries` of the `classifiers` with `different regularization parameters` should be presented with the `training data points` $(x, y)$ with their `labels` $l$ (in `blue` color for label 0 and `red` color for label 1). \\
You can use the `contour` function of matplotlib (`plt.contour`).

In [0]:
plt.figure(figsize=(8, 8))
x = np.arange(-1, 1.25, 0.01)
y = np.arange(-1, 1.25, 0.01)
X, Y = np.meshgrid(x, y)
z1 = classifier1(X, Y)
z2 = classifier2(X, Y)
z3 = classifier3(X, Y)

CS = plt.contour(X, Y, z1, [0], colors='red')
CS = plt.contour(X, Y, z2, [0], colors='green')
CS = plt.contour(X, Y, z3, [0], colors='blue')
#CS.clabel()
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.show()