<a href="https://colab.research.google.com/github/AllyHyeseongKim/CAU11934_MachineLearning/blob/master/assignment/07/assignment07.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment07: Logistic regression for binary classification with regularization

## 1. Load the input data (text file)

### Mount the google drive

In [0]:
from google.colab import drive

drive.mount('/content/gdrive')

In [0]:
cd

In [0]:
cd ../content/gdrive/My Drive/Colab Notebooks/Machine Learning/assignment07

In [0]:
ls

### Load the Data

Load the training data set $\{ (x^{(i)}, y^{(i)}, l^{(i)}) \}$ from the given `text file` (`'data-nonlinear.txt'`). \\
Each row $(x^{(i)}, y^{(i)}, l^{(i)})$ of the data consists of a 2-dimensional point $(x, y)$ with its label $l$, where $x, y \in \mathbb{R}$ and $l \in \{0, 1\}$. \\
Plot the set of points $\{ (x^{(i)}, y^{(i)}) \}$ loaded from the `'data-nonlinear.txt'` file. \\

In [0]:
import numpy as np
import matplotlib.pyplot as plt

data    = np.genfromtxt("data-nonlinear.txt", delimiter=',')

x_train  = data[:, 0]
y_train  = data[:, 1]
label   = data[:, 2]

m = len(label)

pointX0 = x_train[label == 0]
pointY0 = y_train[label == 0]

pointX1 = x_train[label == 1]
pointY1 = y_train[label == 1]

plt.figure(figsize=(8, 8))
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.tight_layout()
plt.gca().set_aspect('equal', adjustable='box')
plt.show()

## 2. Generate the Logistic Regression Model

### Generate the `logistic regression`

Define the following `logistic regression` model with a high-dimensional `feature function`.

\begin{equation*}
\hat{h} = \sigma(z) \\
z = g(x, y; \theta), \quad\text{where $g$ is a high dimensional function and $\theta \in \mathbb{R}^{100}$} \\
\theta = (\theta_{0, 0}, \theta_{0, 1}, ..., \theta_{9, 9}) \\
g(x, y; \theta) = \sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}x^iy^j \\
\sigma(z) = \frac{1}{1 + \exp(-z)} \\
\sigma'(z) = \sigma(z)(1 - \sigma(z)) \\
\end{equation*}

Define the `function` $g(x, y; \theta)$:
\begin{equation}
g(x, y; \theta) = \sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}x^iy^j
\end{equation}
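The triple loop in `g` evaluates 100 terms per point. Below is a minimal NumPy sketch of the same sum written as one matrix product. It assumes $\theta$ is stored in the order `weight[10 * i + j]` $= \theta_{i,j}$, which is the order `gradient_descent` uses below. `feature_matrix` and `Phi` are hypothetical names:

```python
import numpy as np

def feature_matrix(x, y, degree=10):
    """Build the m x degree**2 design matrix whose column degree*i + j holds x**i * y**j.

    With theta stored in the same order as weight[10 * i + j], the model input is
    g(x, y; theta) = feature_matrix(x, y) @ theta for every training point at once.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    exponents = np.arange(degree)
    x_pow = x[:, None] ** exponents   # shape (m, degree): x_pow[k, i] = x_k**i
    y_pow = y[:, None] ** exponents   # shape (m, degree): y_pow[k, j] = y_k**j
    # Per-sample outer product (m, degree, degree), flattened row-major so index = degree*i + j
    return (x_pow[:, :, None] * y_pow[:, None, :]).reshape(len(x), degree * degree)

# Usage at two hypothetical points with theta_{1,0} = 2 (all others 0), i.e. g = 2x
Phi = feature_matrix([0.5, -1.0], [2.0, 3.0])
theta = np.zeros(100)
theta[10 * 1 + 0] = 2.0
print(Phi @ theta)   # [ 1. -2.]
```

With the notebook's data, `feature_matrix(x_train, y_train) @ np.array(weight[t])` should reproduce `g(weight[t])` up to floating-point rounding.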

In [0]:
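def g(weight):
    # z^{(i)} = g(x^{(i)}, y^{(i)}; theta) = sum_{j,k} theta_{j,k} x^j y^k for every training point,
    # with theta_{j,k} stored at weight[10 * j + k]
    z = []
    for i in range(m):
        g_i = 0
        for j in range(10):
            for k in range(10):
                g_i = g_i + weight[10 * j + k] * (x_train[i] ** j) * (y_train[i] ** k)
        z.append(g_i)
    return z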

Define the following `sigmoid function`.

\begin{equation*}
\hat{h} = \sigma(z), \text{ where }
\sigma(z) = \frac{1}{1 + \exp(-z)}, \\
z = g(x, y; \theta) = \sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}x^iy^j \\
\end{equation*}
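Below is a minimal NumPy sketch of a sigmoid that avoids the overflow warning `np.exp(-z)` can raise for large negative $z$. It also checks the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ against a central finite difference. `sigmoid` and `sigmoid_prime` are hypothetical helpers and are not used elsewhere in this notebook:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)) = exp(-log(1 + exp(-z))); np.logaddexp(0, -z)
    # computes log(1 + exp(-z)) without overflowing for large negative z
    return np.exp(-np.logaddexp(0.0, -np.asarray(z, dtype=float)))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare the identity with a central finite difference on a few points
z = np.linspace(-6.0, 6.0, 13)
step = 1e-5
numeric = (sigmoid(z + step) - sigmoid(z - step)) / (2 * step)
print(np.max(np.abs(numeric - sigmoid_prime(z))))   # only rounding-level differences expected
```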

In [0]:
def logistic_regression(weight):
    # h = sigma(z) = 1 / (1 + exp(-z)) for every training point
    y_logistic_regression = []
    z = g(weight)
    for i in range(m):
        y_logistic_regression.append(1 / (1 + np.exp(-z[i])))
    return y_logistic_regression

## 3. Generate the `Cost Function` with `Gradient Descent` method

### Generate the `objective function`

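Define the `degree of regularization` by the `control parameter` $\lambda$. A larger $\lambda$ penalizes large coefficients more strongly. Therefore $\lambda = 0$ lets the model over-fit and a large $\lambda$ makes it under-fit:

\begin{equation*}
\lambda_1 = 0, \text{  for over-fitting} \\
\lambda_2 = 10, \text{  for just-right} \\
\lambda_3 = 100, \text{  for under-fitting}
\end{equation*}

In [0]:
lambda1 = 0    # over-fitting: no penalty on theta
lambda2 = 10   # just-right
lambda3 = 100  # under-fitting: strong penalty shrinks theta toward 0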
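The following short derivation, added as a sketch, shows why a large $\lambda$ under-fits. Set the gradient of the objective $J(\theta)$ defined below to zero at a minimizer $\theta^*$ with $\lambda > 0$:

\begin{equation*}
\frac{1}{m}\sum_{k = 1}^m\left(\sigma(g(x^{(k)}, y^{(k)}; \theta^*)) - l^{(k)}\right)(x^{(k)})^i(y^{(k)})^j + \lambda\theta^*_{i,j} = 0
\quad\Longrightarrow\quad
|\theta^*_{i,j}| \le \frac{1}{\lambda m}\sum_{k = 1}^m|x^{(k)}|^i\,|y^{(k)}|^j
\end{equation*}

The bound holds because $|\sigma(\cdot) - l^{(k)}| \le 1$. As $\lambda \to \infty$, every coefficient is driven to $0$, so $g \to 0$ and $\sigma(g) \to 0.5$ at every point, which is under-fitting. With $\lambda = 0$, nothing restrains the 100 coefficients of $g$. That is the regime in which the boundary can bend around individual training points.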

Define the following `objective function` with a `regularization term`.

\begin{equation*}
J(\theta) = \frac{1}{m}\sum_{k = 1}^m\left[-l^{(k)}\log(\sigma(g(x^{(k)}, y^{(k)}; \theta))) - (1 - l^{(k)})\log(1 - \sigma(g(x^{(k)}, y^{(k)}; \theta)))\right] + \frac{\lambda}{2}\sum_{i = 0}^9\sum_{j = 0}^9\theta_{i, j}^2
\end{equation*}
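Below is a vectorized sketch of $J(\theta)$. It assumes a hypothetical $m \times 100$ feature matrix `Phi` whose column `10 * i + j` holds $x^iy^j$. Clipping $\sigma$ away from 0 and 1 is an added safeguard, not part of the formula; it keeps $\log$ finite when the model saturates. `objective` and `eps` are hypothetical names:

```python
import numpy as np

def objective(theta, Phi, labels, lambda_value, eps=1e-12):
    """J(theta) = mean cross-entropy + (lambda / 2) * sum of theta**2, as in the formula above.

    Phi is the m x 100 feature matrix (column 10*i + j holds x**i * y**j).
    """
    h = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    h = np.clip(h, eps, 1.0 - eps)    # keep log() finite if sigma saturates at 0 or 1
    cross_entropy = np.mean(-labels * np.log(h) - (1.0 - labels) * np.log(1.0 - h))
    return cross_entropy + lambda_value / 2.0 * np.sum(theta ** 2)

# At theta = 0 every prediction is 0.5, so J = log 2 for any labels and any lambda
Phi = np.ones((4, 100))
labels = np.array([0.0, 1.0, 1.0, 0.0])
print(objective(np.zeros(100), Phi, labels, lambda_value=10.0))
```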

In [0]:
def objective_function(lambda_value, weight, y_logistic_regression):
    error = []
    regularization = 0
    for i in range(10):
        for j in range(10):
            regularization = regularization + (weight[10 * i + j] ** 2)
    regularization = lambda_value / 2 * regularization
    for k in range(m):
        error.append((-label[k]) * np.log(y_logistic_regression[k]) - (1 - label[k]) * np.log(1 - y_logistic_regression[k]))
    return sum(error) / m + regularization

### Generate the `gradient descent`

Define the `learning rate`.

\begin{equation*}
\alpha  = 0.0009
\end{equation*}

In [0]:
learning_rate = 0.0009

Define the following `gradient descent`.

\begin{equation*}
\theta_{i,j}^{(t+1)} := \theta_{i,j}^{(t)} - \alpha\left[\frac{1}{m}\sum_{k = 1}^m\left(\sigma(g(x^{(k)}, y^{(k)}; \theta^{(t)})) - l^{(k)}\right)\frac{\partial g(x^{(k)}, y^{(k)}; \theta^{(t)})}{\partial \theta_{i,j}} + \lambda\theta_{i,j}^{(t)}\right], \quad\text{for all $i, j \in \{0, \dots, 9\}$}, \\
\text{where}\quad\frac{\partial g(x^{(k)}, y^{(k)}; \theta)}{\partial \theta_{i,j}} = \frac{\partial}{\partial \theta_{i,j}}\sum_{p = 0}^9\sum_{q = 0}^9\theta_{p, q}(x^{(k)})^p(y^{(k)})^q = (x^{(k)})^i(y^{(k)})^j
\end{equation*}
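Because $\partial g / \partial \theta_{i,j} = x^iy^j$, the update for all $(i, j)$ together is a single matrix product. Below is a minimal sketch that assumes the same hypothetical feature matrix `Phi` (column `10 * i + j` holds $x^iy^j$). `cost`, `gradient` and the random data are illustrative only. Comparing against central finite differences is a standard way to confirm an analytic gradient:

```python
import numpy as np

def cost(theta, Phi, labels, lambda_value):
    h = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    cross_entropy = np.mean(-labels * np.log(h) - (1.0 - labels) * np.log(1.0 - h))
    return cross_entropy + lambda_value / 2.0 * np.sum(theta ** 2)

def gradient(theta, Phi, labels, lambda_value):
    # Column 10*i + j of Phi is x^i y^j = dg/dtheta_{i,j}, so the data term of the
    # update rule is (1/m) * Phi^T (sigma(g) - l); the penalty contributes lambda * theta
    h = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    return Phi.T @ (h - labels) / len(labels) + lambda_value * theta

# Compare with central finite differences on small random (hypothetical) data
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 5))
labels = (rng.random(20) < 0.5).astype(float)
theta = 0.1 * rng.normal(size=5)
lam, step = 0.3, 1e-6
numeric = np.array([
    (cost(theta + step * e, Phi, labels, lam) - cost(theta - step * e, Phi, labels, lam)) / (2 * step)
    for e in np.eye(5)
])
print(np.max(np.abs(numeric - gradient(theta, Phi, labels, lam))))   # close to 0
```

One gradient-descent step is then `theta - learning_rate * gradient(theta, Phi, labels, lam)`.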

In [0]:
def gradient_descent(lambda_value, weight, y_logistic_regression):
    # One gradient-descent step on all 100 parameters theta_{i,j} (stored at index 10*i + j)
    weightPrime = []
    for i in range(10):
        for j in range(10):
            # (1/m) * sum_k (sigma_k - l_k) * x_k^i * y_k^j  +  lambda * theta_{i,j}
            gradient = 0
            for k in range(m):
                gradient = gradient + (y_logistic_regression[k] - label[k]) * (x_train[k] ** i) * (y_train[k] ** j)
            gradient = gradient / m + lambda_value * weight[10 * i + j]
            weightPrime.append(weight[10 * i + j] - learning_rate * gradient)
    return weightPrime

## 4. `Train` the input data

Define the initial parameters $\theta^{(0)} = (\theta_{0,0}^{(0)}, \theta_{0,1}^{(0)}, \theta_{0,2}^{(0)}, ..., \theta_{9,9}^{(0)})$, stored in `weight[0]` ($\theta_{0,0}$ plays the role of the offset):

\begin{equation*}
\theta_{0,0}^{(0)} = \theta_{0,1}^{(0)} = \theta_{0,2}^{(0)} = ... = \theta_{9,9}^{(0)} = 0
\end{equation*}

In [0]:
weight = []
weight.append([])
for i in range(100):
    weight[0].append(0)

In [0]:
cost_convergence = []
theta_convergence = []

`Train` the `input data` with the `logistic regression` model above using `gradient descent`. \\
Find the optimal parameters $\theta$ using the `training data`.
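The same training loop can be sketched with NumPy matrix operations. This is typically much faster than element-wise Python loops over 100 parameters and $m$ points. The sketch rests on a few assumptions:

* `train`, `Phi` (the $m \times 100$ matrix of $x^iy^j$ features), `tol` and `max_iterations` are hypothetical names.
* Convergence is declared when $J(\theta)$ changes by less than `tol`. This differs from the exact-equality test used in this notebook.

```python
import numpy as np

def train(Phi, labels, lambda_value, learning_rate=0.0009, max_iterations=20000, tol=1e-10):
    """Batch gradient descent from theta^(0) = 0; returns theta and J(theta) per iteration."""
    m, n = Phi.shape
    theta = np.zeros(n)
    costs = []
    for _ in range(max_iterations):
        h = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
        h_safe = np.clip(h, 1e-12, 1.0 - 1e-12)          # avoid log(0)
        costs.append(np.mean(-labels * np.log(h_safe) - (1.0 - labels) * np.log(1.0 - h_safe))
                     + lambda_value / 2.0 * np.sum(theta ** 2))
        # Declare convergence once J changes by less than tol between iterations
        if len(costs) > 1 and abs(costs[-2] - costs[-1]) < tol:
            break
        theta = theta - learning_rate * (Phi.T @ (h - labels) / m + lambda_value * theta)
    return theta, costs

# Hypothetical usage on random data; with the notebook's data, Phi would be the
# 100-column matrix of x**i * y**j and labels the `label` array
rng = np.random.default_rng(1)
Phi = rng.normal(size=(50, 5))
labels = (Phi[:, 0] + 0.3 * rng.normal(size=50) > 0).astype(float)
for lam in (0.0, 1.0, 10.0):
    theta, costs = train(Phi, labels, lam, learning_rate=0.1, max_iterations=2000)
    print(lam, len(costs), round(costs[-1], 4), round(float(np.linalg.norm(theta)), 4))
```

The demo uses a larger step than the notebook's $\alpha = 0.0009$ only so that it finishes quickly. On real data, the three `costs` lists, one per $\lambda$, are what the single-figure cost plot in the results section needs.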

In [0]:
lambda_value = lambda2   # regularization used for this run; rerun with lambda1 or lambda3
epoch = m * 200

# Start every run from theta^{(0)} = 0 so the cell can be rerun with another lambda
weight = [weight[0]]
sigma = [logistic_regression(weight[0])]
cost = [objective_function(lambda_value, weight[0], sigma[0])]
cost_convergence = []
theta_convergence = []

for i in range(1, epoch + 1):
    # One gradient-descent step from theta^{(i-1)}, then the new predictions and cost
    weight.append(gradient_descent(lambda_value, weight[i - 1], sigma[i - 1]))
    sigma.append(logistic_regression(weight[i]))
    cost.append(objective_function(lambda_value, weight[i], sigma[i]))
    # Record the iterations at which the cost (and all of theta) stopped changing
    if cost[i] == cost[i - 1]:
        cost_convergence.append(i)
        if weight[i] == weight[i - 1]:
            theta_convergence.append(i)

theta_convergence.append(epoch)
cost_convergence.append(epoch)

## 5. Compute the `training accuracy`

Compute the `final training accuracy` as a percentage (%) for each value of the `regularization parameter` $\lambda$.
\begin{equation}
accuracy\ (\%) = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100
\end{equation}
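Below is a vectorized sketch of the accuracy formula. It assumes predictions are thresholded at $0.5$: the predicted label is 1 when $\sigma(g) \ge 0.5$, which matches the decision boundary $\sigma(g) = 0.5$. `training_accuracy` is a hypothetical helper:

```python
import numpy as np

def training_accuracy(h, labels):
    """Percentage of points whose thresholded prediction (1 if h >= 0.5, else 0) matches l."""
    predictions = (np.asarray(h, dtype=float) >= 0.5).astype(float)
    return float(np.mean(predictions == np.asarray(labels, dtype=float)) * 100)

print(training_accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))   # 75.0
```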

In [0]:
index_minimum = cost.index(min(cost))

In [0]:
accuracy = []
for i in range(len(sigma)):
    # Predict label 1 where sigma >= 0.5 and label 0 otherwise, then count matches
    correct_predictions = 0
    for j in range(m):
        prediction = 1 if sigma[i][j] >= 0.5 else 0
        if prediction == label[j]:
            correct_predictions = correct_predictions + 1
    accuracy.append((correct_predictions / m) * 100)
index_accuracy_maximum = accuracy.index(max(accuracy))

## 6. Visualize the `Classifier`

Generate the `Classifier`.

In [0]:
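def classifier(x, y):
    # sigma(g(x, y; theta)) using theta from the iteration with the highest training
    # accuracy; x and y may be NumPy arrays (e.g. a meshgrid), evaluated elementwise
    theta = weight[index_accuracy_maximum]
    optimal_g = 0
    for i in range(10):
        for j in range(10):
            optimal_g = optimal_g + theta[10 * i + j] * (x ** i) * (y ** j)
    return 1 / (1 + np.exp(-optimal_g))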

Visualize the obtained `classifier`, where the `boundary` of the `classifier` is defined by $\{(x, y) | \sigma(g(x, y ; \theta)) = 0.5\} = \{(x, y) | g(x, y ; \theta) = 0\}$.
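NumPy can evaluate $g(x, y; \theta) = \sum_{i,j}\theta_{i,j}x^iy^j$ on a whole grid with `numpy.polynomial.polynomial.polyval2d(x, y, c)`, which computes $\sum_{i,j} c_{i,j}x^iy^j$. Below is a minimal sketch that assumes $\theta$ is stored as `weight[10 * i + j]` $= \theta_{i,j}$. `g_on_grid` and the example `theta_demo` are hypothetical:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def g_on_grid(theta, X, Y):
    # polyval2d returns sum_{i,j} c[i, j] * X**i * Y**j elementwise; reshaping the
    # 100-vector row-major gives c[i, j] = theta[10 * i + j] = theta_{i,j}
    return P.polyval2d(X, Y, np.reshape(theta, (10, 10)))

# Hypothetical theta: g = x^2 + y^2 - 0.25, so the boundary g = 0 is the circle of radius 0.5
theta_demo = np.zeros(100)
theta_demo[10 * 0 + 0] = -0.25
theta_demo[10 * 2 + 0] = 1.0
theta_demo[10 * 0 + 2] = 1.0
X, Y = np.meshgrid(np.arange(-1, 1.25, 0.01), np.arange(-1, 1.25, 0.01))
Z = g_on_grid(theta_demo, X, Y)
print(Z.shape == X.shape)
```

`Z` can then be passed to `plt.contour(X, Y, Z, levels=[0])` to draw the boundary $\{(x, y) \mid g(x, y; \theta) = 0\}$ directly, without evaluating the sigmoid.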

In [0]:
plt.figure(figsize=(8, 8))
x = np.arange(-1, 1.25, 0.01)
y = np.arange(-1, 1.25, 0.01)
X, Y = np.meshgrid(x, y)
z = classifier(X, Y)

CS = plt.contour(X, Y, z, [1/2], colors='green')
CS.clabel()
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.show()

## 7. **Results**

### 1. **Plot the training data**

Plot the `training data points` $(x, y)$ with their `labels` $l$ (in `blue` color for `label 0` and `red` color for `label 1`).

In [0]:
plt.figure(figsize=(8, 8))
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.tight_layout()
plt.gca().set_aspect('equal', adjustable='box')
plt.show()

### 2. **Plot the training error with varying regularization parameters**

Choose a value for $\lambda_1$ in such a way that `over-fitting` is demonstrated and plot the `training error` $J(\theta)$ at `every iteration` of `gradient descent` until `convergence` (in `red` color). \\
Choose a value for $\lambda_2$ in such a way that `just-right` is demonstrated and plot the `training error` $J(\theta)$ at `every iteration` of `gradient descent` until `convergence` (in `green` color). \\
Choose a value for $\lambda_3$ in such a way that `under-fitting` is demonstrated and plot the `training error` $J(\theta)$ at `every iteration` of `gradient descent` until `convergence` (in `blue` color). \\
The above three curves should be presented `all together in a single figure`.

### 3. **Display the values of the chosen regularization parameters**

Display the value of the chosen $\lambda_1$ for the demonstration of `over-fitting` (in `red` color). \\
Display the value of the chosen $\lambda_2$ for the demonstration of `just-right` (in `green` color). \\
Display the value of the chosen $\lambda_3$ for the demonstration of `under-fitting` (in `blue` color).

In [0]:
plt.figure(figsize=(8, 8))
x_out = np.arange(0, cost_convergence[0])
plt.xlabel('t (iteration)')
plt.ylabel('J(theta) (training error)')

plt.plot(x_out, cost[:cost_convergence[0]], color = 'blue')

plt.show()

### 4. **Plot the training accuracy with varying regularization parameters**

Plot the `training accuracy` with the chosen $\lambda_1$ for `over-fitting` at `every iteration` of `gradient descent` until `convergence` (in `red` color). \\
Plot the `training accuracy` with the chosen $\lambda_2$ for `just-right` at `every iteration` of `gradient descent` until `convergence` (in `green` color). \\
Plot the `training accuracy` with the chosen $\lambda_3$ for `under-fitting` at `every iteration` of `gradient descent` until `convergence` (in `blue` color). \\
The above three curves should be presented `all together in a single figure`.

In [0]:
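plt.figure(figsize=(8, 8))
# Show the accuracy up to 20 iterations past its maximum (or to the end of training)
accuracy_shown = accuracy[:index_accuracy_maximum + 20]
x_out = np.arange(0, len(accuracy_shown))
plt.xlabel('t (iteration)')
plt.ylabel('training accuracy (%)')

plt.plot(x_out, accuracy_shown, color = 'red')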

plt.show()

### 5. **Display the final training accuracy with varying regularization parameters**

Display the `final training accuracy` obtained with the chosen $\lambda_1$ for `over-fitting` in number (%) at `convergence` (in `red` color). \\
Display the `final training accuracy` obtained with the chosen $\lambda_2$ for `just-right` in number (%) at `convergence` (in `green` color). \\
Display the `final training accuracy` obtained with the chosen $\lambda_3$ for `under-fitting` in number (%) at `convergence` (in `blue` color).

In [0]:
print(accuracy[-1], '(%)')  # accuracy at the last (converged) iteration

### 6. **Plot the optimal classifier with varying regularization parameters superimposed on the training data**

Plot the boundary of the `optimal classifier` with the chosen $\lambda_1$ for `over-fitting` at `convergence` (in `red` color). \\
Plot the boundary of the `optimal classifier` with the chosen $\lambda_2$ for `just-right` at `convergence` (in `green` color). \\
Plot the boundary of the `optimal classifier` with the chosen $\lambda_3$ for `under-fitting` at `convergence` (in `blue` color). \\
The `boundary` of the `classifier` is defined by $\{ (x, y) \mid \sigma(g(x, y ; \theta)) = 0.5 \} = \{ (x, y) \mid g(x, y ; \theta) = 0 \}$. \\
The `boundaries` of the `classifiers` with `different regularization parameters` should be presented together with the `training data points` $(x, y)$ and their `labels` $l$ (in `blue` color for label 0 and `red` color for label 1). \\
You can use the `contour` function of `matplotlib.pyplot` in Python 3.

In [0]:
plt.figure(figsize=(8, 8))
x = np.arange(-1, 1.25, 0.01)
y = np.arange(-1, 1.25, 0.01)
X, Y = np.meshgrid(x, y)
z = classifier(X, Y)

CS = plt.contour(X, Y, z, [1/2], colors='green')
plt.scatter(pointX0, pointY0, c='b')
plt.scatter(pointX1, pointY1, c='r')
plt.show()