
# Assignment06: Logistic regression for a binary classification with a non-linear classification boundary

## 1. Load the input data (text file)

### Mount the google drive

In [0]:
from google.colab import drive

drive.mount('/content/gdrive')

In [0]:
cd gdrive/My Drive/Colab Notebooks/Machine Learning/assignment06

In [0]:
ls

### Load the Data

Load the training data $\{ (x^{(i)}, y^{(i)}, l^{(i)}) \}$ from the given text file `'data-nonlinear.txt'`. \\
Each row $(x^{(i)}, y^{(i)}, l^{(i)})$ consists of a 2-dimensional point $(x, y)$ and its label $l$, where $x, y \in \mathbb{R}$ and $l \in \{0, 1\}$. \\
Plot the set of points $\{ (x^{(i)}, y^{(i)}) \}$ loaded from `'data-nonlinear.txt'`.

In [0]:
import numpy as np
import matplotlib.pyplot as plt

data    = np.genfromtxt("data-nonlinear.txt", delimiter=',', dtype=np.float64)

x_train  = data[:, 0]
y_train  = data[:, 1]
label   = data[:, 2]

m = len(x_train)

pointX0 = x_train[label == 0]
pointY0 = y_train[label == 0]

pointX1 = x_train[label == 1]
pointY1 = y_train[label == 1]

plt.figure(figsize=(8, 8))
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.tight_layout()
plt.gca().set_aspect('equal', adjustable='box')
plt.show()


## 2. Generate the Logistic Regression Model

### Generate the `logistic regression`

Define the following `Sigmoid function`.

\begin{equation*}
\hat{h} = \sigma(z) \\
z = g(x, y; \theta), \quad\text{where $g$ is a high-dimensional function and $\theta \in \mathbb{R}^k$} \\
\theta = (\theta_0, \theta_1, \dots, \theta_{k-1}) \\
g(x, y; \theta) = \theta_0f_0(x, y) + \theta_1f_1(x, y) + \dots + \theta_{k-1}f_{k-1}(x, y) \\
\sigma(z) = \frac{1}{1 + \exp(-z)} \\
\sigma'(z) = \sigma(z)(1 - \sigma(z))
\end{equation*}
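
As a minimal sketch of the definitions above, the sigmoid and its derivative can be written with NumPy. The names `sigmoid` and `sigmoid_prime` are illustrative and do not appear in the notebook. Evaluating through `exp(-|z|)` is one common way to avoid overflow for large $|z|$; it is an assumption added here, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), evaluated without overflow."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(-np.abs(z))  # never exceeds 1, so exp cannot overflow
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z)), the same value
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def sigmoid_prime(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

Both functions accept scalars or arrays and return NumPy arrays of the same shape.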

Define the `dimension` $k$ of $\theta$, where $k \leq 16$.

\begin{equation*}
k  = 16
\end{equation*}

In [0]:
k = 16

Define the feature `functions` $f_j(x, y)$ for $j = 0, \dots, k-1$:
\begin{equation*}
f_0(x, y) = - 0.5 \\
f_1(x, y) = x, 
f_2(x, y) = y, 
f_3(x, y) = xy \\
f_4(x, y) = x^2, 
f_5(x, y) = y^2, 
f_6(x, y) = x^2y^2 \\
f_7(x, y) = x^3, 
f_8(x, y) = y^3, 
f_9(x, y) = x^3y^3 \\
f_{10}(x, y) = x^4, 
f_{11}(x, y) = y^4, 
f_{12}(x, y) = x^4y^4 \\
f_{13}(x, y) = x^6, 
f_{14}(x, y) = y^6, 
f_{15}(x, y) = x^6y^6 \\
\end{equation*}

In [0]:
f = []
for i in range(m):
    f.append([])
    f[i].append(-0.5)
    f[i].append(x_train[i])
    f[i].append(y_train[i])
    f[i].append(x_train[i] * y_train[i])
    f[i].append(x_train[i] ** 2)
    f[i].append(y_train[i] ** 2)
    f[i].append((x_train[i] ** 2) * (y_train[i] ** 2))
    f[i].append(x_train[i] ** 3)
    f[i].append(y_train[i] ** 3)
    f[i].append((x_train[i] ** 3) * (y_train[i] ** 3))
    f[i].append(x_train[i] ** 4)
    f[i].append(y_train[i] ** 4)
    f[i].append((x_train[i] ** 4) * (y_train[i] ** 4))
    f[i].append(x_train[i] ** 6)
    f[i].append(y_train[i] ** 6)
    f[i].append((x_train[i] ** 6) * (y_train[i] ** 6))
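
The same 16 features can also be built for all points at once. The sketch below is an alternative to the loop above, not a replacement the assignment requires; the name `feature_matrix` is hypothetical. It returns an array whose last axis holds $f_0, \dots, f_{15}$ in the order listed in the equation.

```python
import numpy as np

def feature_matrix(x, y):
    """Return an array of shape (..., 16) holding f_0 ... f_15 for each point."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # f_0 = -0.5, f_1 = x, f_2 = y, f_3 = xy
    columns = [np.full_like(x, -0.5), x, y, x * y]
    # For each power p: x^p, y^p, x^p * y^p  (f_4 ... f_15)
    for p in (2, 3, 4, 6):
        columns += [x ** p, y ** p, (x ** p) * (y ** p)]
    return np.stack(columns, axis=-1)
```

Called as `feature_matrix(x_train, y_train)`, this should give an `(m, 16)` array whose row `i` matches `f[i]` from the loop.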

Define the `function` $g(x, y; \theta)$:
\begin{equation}
g(x, y; \theta) = \theta_0f_0(x, y) + \theta_1f_1(x, y) + ... + \theta_{k-1}f_{k-1}(x, y)
\end{equation}

In [0]:
def g(weight, offset):
    # offset is theta_0; weight holds theta_1 ... theta_{k-1}.
    z = []
    for i in range(m):
        z_i = offset * f[i][0]
        for j in range(len(weight)):
            z_i = z_i + weight[j] * f[i][j + 1]
        z.append(z_i)
    return z

In [0]:
def logistic_regression(weight, offset):
    # Apply the sigmoid to g(x, y; theta) for every training point.
    y_logistic_regression = []
    z = g(weight, offset)
    for i in range(m):
        y_logistic_regression.append(1/(1 + np.exp(-z[i])))
    return y_logistic_regression
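
Because $g$ is linear in $\theta$, the two functions above amount to a matrix-vector product followed by the sigmoid. The sketch below assumes a feature matrix `F` of shape `(m, k)` and a single parameter vector `theta`, where `theta[0]` plays the role of `offset` and `theta[1:]` the role of `weight`. The name `predict_proba` is illustrative.

```python
import numpy as np

def predict_proba(F, theta):
    """Return sigma(F @ theta), one probability per row of F."""
    z = F @ theta  # z[i] = sum_j theta[j] * F[i, j] = g(x_i, y_i; theta)
    return 1.0 / (1.0 + np.exp(-z))

# Tiny example with k = 3 features per point: (-0.5, x, y).
F_example = np.array([[-0.5, 1.0, 2.0]])
p_zero = predict_proba(F_example, np.zeros(3))  # g = 0, so sigma(g) = 0.5
```

With all parameters at zero, every prediction is 0.5, which is the starting point used for training in Section 4.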

## 3. Generate the `Cost Function` with `Gradient Descent` method

### Generate the `objective function`

Define the following `objective function`.

\begin{equation*}
J(\theta) = \frac{1}{m}\sum_{i = 1}^m\left(-l^{(i)}\log(\sigma(g(x^{(i)}, y^{(i)}; \theta))) - (1 - l^{(i)})\log(1 - \sigma(g(x^{(i)}, y^{(i)}; \theta)))\right)
\end{equation*}

In [0]:
def objective_function(y_logistic_regression):
    error = []
    for i in range(m):
        error.append((-label[i]) * np.log(y_logistic_regression[i]) - (1 - label[i]) * np.log(1 - y_logistic_regression[i]))
    return sum(error) / m
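
The objective can also be computed in one vectorized expression. The sketch below is a hedged alternative: the name `cross_entropy` and the clipping constant `eps` are assumptions introduced here. Clipping keeps `np.log` away from 0 when a prediction saturates at exactly 0 or 1, at the price of a very small bias in the reported cost.

```python
import numpy as np

def cross_entropy(p, labels, eps=1e-12):
    """Mean of -l*log(p) - (1-l)*log(1-p) over all points."""
    # Clip so that log never receives exactly 0 (eps is an assumed constant).
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=np.float64)
    losses = -labels * np.log(p) - (1.0 - labels) * np.log(1.0 - p)
    return float(np.mean(losses))
```

For predictions that are all 0.5, the result is $\log 2$, which is the cost expected at the zero initialization.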

### Generate the `gradient descent`

Define the following `derivative`. Since $g$ is linear in $\theta$, only the $j$-th term depends on $\theta_j$, for $j = 0, \dots, k-1$:
\begin{equation}
\frac{\partial g(x^{(i)}, y^{(i)}; \theta^{(t)})}{\partial \theta_j} = \frac{\partial}{\partial \theta_j}\left(\theta_0f_0(x^{(i)}, y^{(i)}) + \theta_1f_1(x^{(i)}, y^{(i)}) + \dots + \theta_{k-1}f_{k-1}(x^{(i)}, y^{(i)})\right) = f_j(x^{(i)}, y^{(i)})
\end{equation}

In [0]:
def derivation_g(n_data, k):
    return f[n_data][k]

Define the `learning rate`.

\begin{equation*}
\alpha  = 0.0000012
\end{equation*}

In [0]:
learning_rate = 0.0000012

Define the following `gradient descent`.

\begin{equation*}
\theta_j^{(t+1)} := \theta_j^{(t)} - \alpha\frac{1}{m}\sum_{i = 1}^m(\sigma(g(x^{(i)}, y^{(i)}; \theta^{(t)})) - l^{(i)})\frac{\partial g(x^{(i)}, y^{(i)}; \theta^{(t)})}{\partial \theta_j}, \quad\text{for all $j = 0, \dots, k-1$}
\end{equation*}

In [0]:
def gradient_descent(weight, offset, y_logistic_regression):
    offset_error = []
    weight_error = []
    for i in range(15):
        weight_error.append([])
    weightPrime = []
    for i in range(m):
        regression_error = y_logistic_regression[i] - label[i]
        offset_error.append(regression_error * derivation_g(i, 0))
        for j in range(15):
            weight_error[j].append(regression_error * derivation_g(i, j + 1))
    offsetPrime = offset - learning_rate * sum(offset_error) / m
    for i in range(15):
        # Use the gradient of the i-th weight, not always the first one.
        weightPrime.append(weight[i] - learning_rate * sum(weight_error[i]) / m)
    return weightPrime, offsetPrime
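
The update rule can be written as a single matrix expression, which makes it harder to mix up parameter indices. This is a sketch under the assumption that the features are stored in a matrix `F` of shape `(m, k)` and all parameters in one vector `theta`; the name `gradient_step` is hypothetical.

```python
import numpy as np

def gradient_step(theta, F, labels, learning_rate):
    """One gradient descent update of all parameters at once."""
    p = 1.0 / (1.0 + np.exp(-(F @ theta)))  # sigma(g) for every point
    # grad[j] = (1/m) * sum_i (p[i] - labels[i]) * F[i, j]
    grad = F.T @ (p - labels) / len(labels)
    return theta - learning_rate * grad

# Two points with features (-0.5, x): x = 1 has label 1, x = -1 has label 0.
F_small = np.array([[-0.5, 1.0], [-0.5, -1.0]])
l_small = np.array([1.0, 0.0])
theta_next = gradient_step(np.zeros(2), F_small, l_small, learning_rate=0.1)
```

In this toy case the offset gradient cancels and only the weight on `x` moves, in the direction that raises the probability of the point labelled 1.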

## 4. `Train` the input data

Define the initial `weight`$(\theta_1^{(0)}, \theta_2^{(0)}, ..., \theta_{k-1}^{(0)})$ and `offset`$(\theta_0^{(0)})$:

\begin{equation*}
\theta_0^{(0)} = 0, \theta_1^{(0)} = \theta_2^{(0)}  = ... = \theta_{k-1}^{(0)} = 0.
\end{equation*}

In [0]:
weight = []
offset = []
offset.append(0)
weight.append([])
for i in range(k-1):
    weight[0].append(0)

In [0]:
cost_convergence = []
theta_convergence = []

`Train` the `logistic regression` model on the `input data` using the `gradient descent` defined above. \\
Find the optimal parameters $\theta$ using the `training data`.

In [0]:
epoch = m * 50
sigma = []
j = []
i = 0
# Record the predictions and the cost for the initial parameters.
sigma.append(logistic_regression(weight[i], offset[i]))
j.append(objective_function(sigma[0]))

while i < epoch:
    i = i + 1

    # Run one gradient descent step from the previous parameters.
    weightPrime, offsetPrime = gradient_descent(weight[i - 1], offset[i - 1], logistic_regression(weight[i - 1], offset[i - 1]))
    weight.append(weightPrime)
    offset.append(offsetPrime)

    j.append(objective_function(logistic_regression(weight[i], offset[i])))
    # Record iterations where the cost is unchanged to 6 decimal places.
    if round(j[i], 6) == round(j[i - 1], 6):
        cost_convergence.append(i)
        # Only the offset and the first two weights are compared here.
        if round(offset[i], 6) == round(offset[i - 1], 6):
            if round(weight[i][0], 6) == round(weight[i - 1][0], 6):
                if round(weight[i][1], 6) == round(weight[i - 1][1], 6):
                    theta_convergence.append(i)

theta_convergence.append(epoch)
cost_convergence.append(epoch)
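
For comparison, here is a compact training loop that stops once the cost changes by less than a tolerance. It is a sketch on made-up toy data, not the notebook's data set. The function name `train`, the tolerance `tol`, and the clipping constant are assumptions for illustration.

```python
import numpy as np

def train(F, labels, learning_rate, max_iter=10000, tol=1e-9):
    """Gradient descent from theta = 0; costs[t] is the cost before update t."""
    theta = np.zeros(F.shape[1])
    costs = []
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(F @ theta)))
        p_safe = np.clip(p, 1e-12, 1.0 - 1e-12)  # keep log away from 0
        cost = np.mean(-labels * np.log(p_safe) - (1 - labels) * np.log(1 - p_safe))
        costs.append(float(cost))
        # Stop when two consecutive costs differ by less than tol.
        if len(costs) > 1 and abs(costs[-2] - costs[-1]) < tol:
            break
        theta = theta - learning_rate * (F.T @ (p - labels)) / len(labels)
    return theta, costs

# Toy data (hypothetical): features (-0.5, x), label 1 when x > 0.
F_toy = np.array([[-0.5, -2.0], [-0.5, -1.0], [-0.5, 1.0], [-0.5, 2.0]])
l_toy = np.array([0.0, 0.0, 1.0, 1.0])
theta_hat, cost_history = train(F_toy, l_toy, learning_rate=0.1, max_iter=500)
```

Keeping `cost_history` as a plain list makes it straightforward to plot the training error per iteration afterwards.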

## 5. Compute the `training accuracy`

Compute the `final training accuracy` in `number (%)`.
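
One way to compute the accuracy is to threshold the predicted probabilities at 0.5 and compare with the labels. In the sketch below, the name `training_accuracy` is hypothetical, and assigning a probability of exactly 0.5 to label 1 is an assumption, since the assignment text does not specify how ties are handled.

```python
import numpy as np

def training_accuracy(p, labels, threshold=0.5):
    """Percentage of points whose thresholded prediction equals the label."""
    predictions = (np.asarray(p) >= threshold).astype(int)
    return 100.0 * float(np.mean(predictions == np.asarray(labels)))

# Four made-up predictions: the third one is on the wrong side of 0.5.
acc = training_accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0])
```

In the notebook's setting, `p` would be the output of `logistic_regression` for the final parameters and `labels` would be `label`.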

## 6. Visualize the `Classifier`

Generate the `Classifier`.

Visualize the obtained `classifier`, where the `boundary` of the `classifier` is defined by $\{(x, y) | \sigma(g(x, y ; \theta)) = 0.5\} = \{(x, y) | g(x, y ; \theta) = 0\}$.
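
A common way to draw the set $g(x, y; \theta) = 0$ is to evaluate $g$ on a regular grid and ask `contour` for the zero level. The sketch below assumes a 16-dimensional `theta` in the feature order of Section 2. The helper names and the default plotting range `[-1, 1]` are assumptions, so the range should be adapted to the data.

```python
import numpy as np

def features(x, y):
    """f_0 ... f_15 along the last axis, for arrays x and y of equal shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    cols = [np.full_like(x, -0.5), x, y, x * y]
    for p in (2, 3, 4, 6):
        cols += [x ** p, y ** p, (x ** p) * (y ** p)]
    return np.stack(cols, axis=-1)

def g_on_grid(theta, lo=-1.0, hi=1.0, n=200):
    """Evaluate g(x, y; theta) on an n-by-n grid over [lo, hi]^2."""
    axis = np.linspace(lo, hi, n)
    X, Y = np.meshgrid(axis, axis)
    Z = features(X, Y) @ theta  # shape (n, n)
    return X, Y, Z

def plot_boundary(theta, lo=-1.0, hi=1.0, n=200):
    # Imported lazily so g_on_grid can be used without matplotlib.
    import matplotlib.pyplot as plt
    X, Y, Z = g_on_grid(theta, lo, hi, n)
    plt.figure(figsize=(8, 8))
    plt.contour(X, Y, Z, levels=[0], colors='g')  # the curve g = 0
    plt.gca().set_aspect('equal', adjustable='box')
    plt.show()
```

Calling `plot_boundary(theta)` with the trained parameters draws the boundary in green; scatter plots of the two classes can be added to the same figure before `plt.show()`.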

## 7. **Results**

### 1. **Plot the training data**

Plot the `training data points` $(x, y)$ with their `labels` $l$ (in `blue` color for `label 0` and `red` color for `label 1`).

### 2. **Write down the high dimensional function $g(x, y; \theta)$**

Write down the `equation` for the `non-linear function` $g(x, y; \theta)$ used for the `classifier` in `LaTeX` format.
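
Substituting the feature functions $f_0, \dots, f_{15}$ from Section 2 into the definition of $g$ with $k = 16$ gives the following expansion. It is a direct substitution and adds no modelling assumption.

\begin{equation*}
\begin{aligned}
g(x, y; \theta) = {} & -0.5\,\theta_0 + \theta_1 x + \theta_2 y + \theta_3 xy \\
& + \theta_4 x^2 + \theta_5 y^2 + \theta_6 x^2y^2 \\
& + \theta_7 x^3 + \theta_8 y^3 + \theta_9 x^3y^3 \\
& + \theta_{10} x^4 + \theta_{11} y^4 + \theta_{12} x^4y^4 \\
& + \theta_{13} x^6 + \theta_{14} y^6 + \theta_{15} x^6y^6
\end{aligned}
\end{equation*}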

### 3. **Plot the training error**

Plot the `training error` $J(\theta)$ at `every iteration` of `gradient descent` until `convergence` (in `blue` color).

### 4. **Plot the training accuracy**

Plot the `training accuracy` at `every iteration` of `gradient descent` until `convergence` (in `red` color).

### 5. **Write down the final training accuracy**

Present the `final training accuracy` in `number (%)` at `convergence`.

### 6. **Plot the optimal classifier superimposed on the training data**

Plot the `boundary` of the `optimal classifier` at `convergence` (in `green` color). \\
The `boundary` of the `classifier` is defined by $\{ (x, y) \mid \sigma(g(x, y ; \theta)) = 0.5 \} = \{ (x, y) \mid g(x, y ; \theta) = 0 \}$. \\
Plot the `training data points` $(x, y)$ with their `labels` $l$ superimposed on the illustration of the `classifier`, using the `contour` function in `python3` (in `blue` color for `label 0` and `red` color for `label 1`).