<a href="https://colab.research.google.com/github/AllyHyeseongKim/CAU11934_MachineLearning/blob/feature%2Fassignment11/assignment/11/assignment11.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment11: Text classification using neural networks

## 1. Load the input data (txt file)

### Mount the google drive

In [0]:
from google.colab import drive

drive.mount('/content/gdrive')

In [0]:
cd

In [0]:
cd ../content/gdrive/My Drive/Colab Notebooks/Machine Learning/assignment11

In [0]:
ls

### Load the Data

Load the data from the given `data directory` (`'movie_review'`), which consists of two sub-directories, `'pos'` and `'neg'`, holding the `positive` and `negative` reviews, respectively. \\
Each sub-directory contains a `list of files`, one per `review text`. \\
The `preprocessing` transforms each `text` into `frequency information`.

The `preprocessing` steps aim to transform the `text data` into a quantity that is informative with respect to the class. \\
Any embedding scheme from any library may be used to transform the `text data` into `descriptors`.
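
To make the idea of `frequency information` concrete, here is a minimal standard-library sketch of a TF-IDF transform on two toy sentences. The helper name `tfidf` and the toy documents are hypothetical, and the smoothed IDF formula with L2 normalisation is assumed to mirror the scikit-learn defaults rather than taken from this notebook.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows); each row is an L2-normalised TF-IDF vector."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed inverse document frequency (assumption: scikit-learn style smoothing).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

toy_docs = ["great great movie", "terrible movie"]
vocab, rows = tfidf(toy_docs)
```

In the first toy document, `great` receives a larger weight than `movie` because it occurs twice there and appears in only one of the two documents.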

In [0]:
import numpy as np
import cupy as cp
import re
import nltk
from sklearn.datasets import load_files
nltk.download('stopwords')
nltk.download('wordnet')
import pickle
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt


review_data = load_files(r"movie_review")
X, y = review_data.data, review_data.target

documents = []

stemmer = WordNetLemmatizer()

for sen in range(0, len(X)):
    # Remove all the special characters
    document = re.sub(r'\W', ' ', str(X[sen]))
    
    # remove all single characters
    document = re.sub(r'\s+[a-zA-Z]\s+', ' ', document)
    
    # Remove a single character at the start of the document
    document = re.sub(r'^[a-zA-Z]\s+', ' ', document) 
    
    # Substituting multiple spaces with single space
    document = re.sub(r'\s+', ' ', document, flags=re.I)
    
    # Removing prefixed 'b'
    document = re.sub(r'^b\s+', '', document)
    
    # Converting to Lowercase
    document = document.lower()
    
    # Lemmatization
    document = document.split()
    document = [stemmer.lemmatize(word) for word in document]
    document = ' '.join(document)
    
    documents.append(document)

vectorizer = CountVectorizer(max_features=1500, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
X = vectorizer.fit_transform(documents).toarray()

tfidfconverter = TfidfTransformer()
X = tfidfconverter.fit_transform(X).toarray()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

In [0]:
X_train, X_test, y_train, y_test = cp.array(X_train,dtype=float), cp.array(X_test,dtype=float), cp.array(y_train,dtype=float), cp.array(y_test,dtype=float)

In [0]:
num_train = len(X_train)
num_test = len(X_test)

## 2. Neural Network Architecture

```mermaid
(input layer : x)  --> (first hidden layer : y)  -->  (output layer : h)
```

```mermaid
(x)  -- fully connected : u -->  (y_)  -- sigmoid -->  (y)  -- fully connected : v -->  (h_)  -- sigmoid -->  (h)
```


### 2.1. Generate the Fully Connected Layer

Define the following `fully connected layer` with a `bias`.

\begin{equation*}
(output \ layer) = 1\times\theta_0^t + (input \ layer)_1\times\theta_1^t + (input \ layer)_2\times\theta_2^t + ... + (input \ layer)_{num \ input}\times\theta_{num \ input}^t, \quad\text{where $t$ is the iteration number of the layer}
\end{equation*}
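
As a small worked instance of the formula above, the following NumPy sketch evaluates a fully connected layer with a bias on a two-feature, two-sample input. The function name `dense_forward` and the numbers are hypothetical; the layout (features in rows, samples in columns, bias in the first weight column) is assumed to match the convention used in this notebook.

```python
import numpy as np

def dense_forward(weight, inputs):
    """weight: (num_output, num_input + 1), column 0 holds the bias theta_0.
    inputs: (num_input, num_samples). Returns (num_output, num_samples)."""
    num_samples = inputs.shape[1]
    # Prepend a row of ones so the bias is handled by the same matrix product.
    augmented = np.vstack([np.ones((1, num_samples)), inputs])
    return weight @ augmented

weight = np.array([[0.5, 1.0, -1.0]])   # one output unit: bias 0.5, theta_1 = 1, theta_2 = -1
inputs = np.array([[1.0, 2.0],
                   [3.0, 4.0]])         # two features (rows), two samples (columns)
out = dense_forward(weight, inputs)
```

For the first sample the layer computes 0.5 + 1 × 1 − 1 × 3 = −1.5, and for the second 0.5 + 1 × 2 − 1 × 4 = −1.5.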

In [0]:
def fully_connected(num_input, num_output, weight, input_layer, num_image):
    # Prepend a row of ones so that the bias theta_0 is part of the matrix product
    input_reshaped = cp.ones((num_input + 1, num_image), dtype=float)
    input_reshaped[1:] = input_layer
    # Each row holds (theta_0, theta_1, ..., theta_num_input) for one output unit
    weight_reshaped = weight.reshape(num_output, num_input + 1)
    output_layer = cp.matmul(weight_reshaped, input_reshaped)
    return output_layer

### 2.2. Generate the Sigmoid Function as an Activation Function

#### Generate the `sigmoid function`

Define the following `sigmoid function` as an `activation function`.

\begin{equation*}
\sigma(z) = \frac{1}{1 + exp(-z)} \\
\sigma'(z) = \sigma(z)(1 - \sigma(z)) \\
\end{equation*}
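
The derivative identity above can be checked numerically. The sketch below compares $\sigma(z)(1 - \sigma(z))$ with a central finite difference; the grid of test points and the step size `eps` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-4.0, 4.0, 9)
eps = 1e-6
# Central difference approximation of the derivative.
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_gap = np.max(np.abs(numeric - sigmoid_prime(z)))
```

If the identity holds, `max_gap` should be far smaller than any derivative value on the grid.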

In [0]:
def sigmoid(input_layer, num_image):
    matrix_ones = cp.ones_like(input_layer)
    output_layer  = cp.reciprocal(cp.add(matrix_ones, cp.exp(cp.negative(input_layer))))
    return output_layer

### 2.3. Generate the `Objective Function`

Define the `regularization parameter`.

\begin{equation}
\lambda = 0.5
\end{equation}

In [0]:
regularization_parameter = 0.5

Define the following `objective function`.

\begin{equation*}
J(\theta) = \frac{1}{m}\sum_{i = 1}^m \left(-l^{(i)}\log(h^{(i)}) - (1 - l^{(i)})\log(1 - h^{(i)})\right) + \frac{\lambda}{2n}\sum_{j = 1}^n\theta_j^2, \\
\text{where}\quad \theta_j \text{ denotes a model parameter for $j = 1, 2, ..., n$}, \ \theta = (u, v), \\
\lambda \text{ is a control parameter for the regularization based on the $L_2^2$-norm (weight decay)}, \\
n\text{ is the total number of model parameters over the entire neural network}, \\
m\text{ is the number of samples, $l^{(i)} \in \{0, 1\}$ is the label of the $i^{th}$ sample,} \\
\text{and $h^{(i)}$ denotes the single output of the network for the $i^{th}$ sample.}
\end{equation*}
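
A small worked example may help when checking the objective by hand. This NumPy sketch evaluates the cross-entropy term plus the weight-decay term for two samples and three parameters. The function name `regularized_cross_entropy`, the toy numbers, and the clipping constant are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def regularized_cross_entropy(h, labels, weights, lam):
    """h, labels: shape (m,), predicted probabilities and 0/1 labels.
    weights: list of arrays that together hold every model parameter."""
    eps = 1e-12                       # assumption: clip so that log() stays finite
    h = np.clip(h, eps, 1.0 - eps)
    data_term = np.mean(-labels * np.log(h) - (1.0 - labels) * np.log(1.0 - h))
    n = sum(w.size for w in weights)  # total number of parameters
    penalty = lam / (2.0 * n) * sum(np.sum(w ** 2) for w in weights)
    return data_term + penalty

h = np.array([0.9, 0.2])
labels = np.array([1.0, 0.0])
weights = [np.array([1.0, -1.0]), np.array([2.0])]
loss = regularized_cross_entropy(h, labels, weights, lam=0.5)
```

Here the data term is $(-\log 0.9 - \log 0.8) / 2$ and the penalty is $\frac{0.5}{2 \times 3}(1 + 1 + 4) = 0.5$.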

In [0]:
def objective(output_layer, weight1, weight2, num_image, label):
    matrix_ones = cp.ones_like(output_layer)
    regularization = (cp.sum(cp.power(weight1, 2)) + cp.sum(cp.power(weight2, 2))) * regularization_parameter / (2 * n)
    loss = cp.sum(cp.subtract(cp.multiply(label, cp.sum(cp.log(cp.subtract(cp.reciprocal(output_layer), matrix_ones)), axis=0)), cp.sum(cp.log(cp.subtract(matrix_ones, output_layer)), axis=0))) / num_image
    loss = loss + regularization
    return loss

### 2.4. Generate the `Gradient Descent` (`Back-Propagation`)

Define the `learning rate`.

\begin{equation*}
\alpha  = 0.0001
\end{equation*}

In [0]:
learning_rate = 0.0001

Define the following `gradient descent`.

\begin{equation*}
\theta_k^{(t + 1)} := \theta_k^{(t)} - \alpha\frac{\partial J(\theta^{(t)})}{\partial \theta_k}, \quad\text{for all $k$}.
\end{equation*}
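
Back-propagated gradients are easy to get subtly wrong, so a finite-difference check on a tiny network is a useful sanity test. The sketch below is one way to do it in NumPy under stated assumptions: a 3-input, 2-hidden-unit, 1-output network with sigmoid activations, random toy data, and hypothetical helper names (`loss_and_grads`, `numeric_grad`). It then applies a single update with the learning rate defined above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias_row(a):
    return np.vstack([np.ones((1, a.shape[1])), a])

def loss_and_grads(u, v, x, labels, lam):
    """u: (hidden, inputs + 1), v: (1, hidden + 1), x: (inputs, m), labels: (m,)."""
    m = x.shape[1]
    n = u.size + v.size
    xb = add_bias_row(x)
    y = sigmoid(u @ xb)                      # hidden activations, (hidden, m)
    yb = add_bias_row(y)
    h = sigmoid(v @ yb)                      # network output, (1, m)
    loss = np.mean(-labels * np.log(h) - (1 - labels) * np.log(1 - h))
    loss += lam / (2 * n) * (np.sum(u ** 2) + np.sum(v ** 2))
    delta_h = h - labels                     # error at the output pre-activation
    grad_v = delta_h @ yb.T / m + lam / n * v
    # Propagate the error back through v (bias column skipped) and the sigmoid.
    delta_y = (v[:, 1:].T @ delta_h) * y * (1 - y)
    grad_u = delta_y @ xb.T / m + lam / n * u
    return loss, grad_u, grad_v

def numeric_grad(param, loss_fn, eps=1e-6):
    """Central finite differences, perturbing one entry of param at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = loss_fn()
        param[idx] = old - eps
        minus = loss_fn()
        param[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
labels = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
u = rng.normal(scale=0.1, size=(2, 4))
v = rng.normal(scale=0.1, size=(1, 3))
lam = 0.5

loss, grad_u, grad_v = loss_and_grads(u, v, x, labels, lam)
loss_only = lambda: loss_and_grads(u, v, x, labels, lam)[0]
gap_u = np.max(np.abs(grad_u - numeric_grad(u, loss_only)))
gap_v = np.max(np.abs(grad_v - numeric_grad(v, loss_only)))

# One gradient-descent step: theta := theta - alpha * dJ/dtheta.
alpha = 0.0001
u_next = u - alpha * grad_u
v_next = v - alpha * grad_v
```

Small values of `gap_u` and `gap_v` indicate that the analytic gradients agree with the numerical ones. Note that both gradients are computed from the same weights before either layer is updated.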

In [0]:
def gradient_descent2(weight):
    num_y = len(train_hidden_layer_y)
    num_h = len(train_output_layer_h)
    weight_reshaped = weight.reshape(num_h, num_y + 1)
    y_reshaped = cp.ones((num_y + 1, num_train), dtype=float)
    y_reshaped[1:] = train_hidden_layer_y
    matrix_partial_loss = cp.matmul(cp.subtract(train_output_layer_h, cp.tile(y_train, (num_h, 1))), cp.transpose(y_reshaped)) / num_train
    matrix_partial_loss = matrix_partial_loss + weight_reshaped * (regularization_parameter / n)
    weight_update = cp.subtract(weight_reshaped, learning_rate * matrix_partial_loss)
    return weight_update.reshape(1, -1)

In [0]:
def gradient_descent1(weight_output, weight_hidden):
    # weight_output: flattened output-layer weights, weight_hidden: flattened hidden-layer weights
    num_x = len(train_input_layer_x)
    num_y = len(train_hidden_layer_y)
    num_h = len(train_output_layer_h)
    weight_reshaped = weight_hidden.reshape(num_y, num_x + 1)
    x_reshaped = cp.ones((num_x + 1, num_train), dtype=float)
    x_reshaped[1:] = train_input_layer_x
    # Derivative of the sigmoid at the hidden layer: y * (1 - y)
    sigmoid_derivative = cp.multiply(train_hidden_layer_y, 1 - train_hidden_layer_y)
    # Output error (h - label), propagated back through the output weights (bias column skipped)
    output_error = cp.subtract(train_output_layer_h, cp.tile(y_train, (num_h, 1)))
    hidden_error = cp.matmul(cp.transpose(weight_output.reshape(num_h, num_y + 1)[:, 1:]), output_error) * sigmoid_derivative
    matrix_partial_loss = cp.matmul(hidden_error, cp.transpose(x_reshaped)) / num_train
    matrix_partial_loss = matrix_partial_loss + weight_reshaped * (regularization_parameter / n)
    weight_update = cp.subtract(weight_reshaped, learning_rate * matrix_partial_loss)
    return weight_update.reshape(1, -1)

### 2.5. Compute the `Accuracy`

Compute the following `accuracy` in `number (%)`.
\begin{equation}
accuracy\ (\%) = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100
\end{equation}
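
As a quick illustration of the formula, this NumPy sketch thresholds four made-up output probabilities and compares them with their labels. The function name `accuracy_percent` is hypothetical, and treating an output of exactly 0.5 as class 1 is an assumption of this sketch (a rounding-based rule may treat that edge case differently).

```python
import numpy as np

def accuracy_percent(probabilities, labels, threshold=0.5):
    # Assumption: outputs at or above the threshold are predicted as class 1.
    predictions = (probabilities >= threshold).astype(float)
    return float(np.mean(predictions == labels) * 100.0)

probabilities = np.array([0.91, 0.40, 0.55, 0.08])
labels = np.array([1.0, 0.0, 0.0, 0.0])
acc = accuracy_percent(probabilities, labels)
```

Three of the four predictions match their labels, so `acc` is 75.0.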

In [0]:
def accuracy(output_layer, num_image, label):
    num_correct_predict = cp.count_nonzero(cp.equal(cp.around(output_layer), label))
    return (num_correct_predict / num_image) * 100

### 2.6. `Train` and `Test` the input data

Define the `initial conditions` of the `weights` $(\theta_{1}^{(0)}, \theta_{2}^{(0)}, ..., \theta_{n}^{(0)})$. \\
The `weights` follow a `normal distribution` $\mathcal{N}(0, \sigma^2)$ with `mean` 0 and `standard deviation` $\sigma$.

Define the `standard deviation`. \\

\begin{equation*}
\sigma = 0.1
\end{equation*}
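
For reference, the sketch below draws weights of this kind with NumPy and checks their empirical statistics. The layer sizes (1500 inputs, 36 hidden units, 1 output) follow the sizes used in this notebook; the fixed seed and the variable names are assumptions made only for reproducibility of the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # assumption: fixed seed for a repeatable example
sigma = 0.1
num_inputs, num_hidden = 1500, 36
# One extra column per layer for the bias term.
u0 = rng.normal(loc=0.0, scale=sigma, size=(num_hidden, num_inputs + 1))
v0 = rng.normal(loc=0.0, scale=sigma, size=(1, num_hidden + 1))
n_params = u0.size + v0.size          # total number of model parameters
```

With these sizes the network has 36 × 1501 + 37 = 54,073 parameters, which is the value of $n$ in the regularization term.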

In [0]:
mean = 0
standard_deviation = 0.1

In [0]:
epoch = 10000

In [0]:
def initialize(size):
    weight = cp.random.normal(mean, standard_deviation, size)
    return weight

In [0]:
weight1 = cp.empty((epoch, 1501 * 36), dtype=float)
weight1[0] = initialize(1501 * 36)

In [0]:
weight2 = cp.empty((epoch, 37 * 1), dtype=float)
weight2[0] = initialize(37 * 1)

In [0]:
n = len(weight1[0]) + len(weight2[0])

`Train` the `Neural Network Architecture` above on the `training data` with `gradient descent`. \\
Find the `optimal parameters` $\theta$ using the `training data` (the first 70% of the reviews). \\
`Test` the `Neural Network Architecture` above with the `obtained parameters` $\theta$ from the `training process` using the `testing data` (the remaining 30% of the reviews).
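
The overall train-and-test procedure can be sketched end to end on a toy problem. The NumPy example below is a simplified stand-in, not the notebook's implementation: it uses synthetic features instead of the TF-IDF matrix, omits the regularization term for brevity, and uses an assumed learning rate and epoch count chosen for the toy data. Unlike an approach that stores the weights of every epoch, it keeps only the current weights and the per-epoch losses, which needs far less memory.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def with_bias(a):
    return np.vstack([np.ones((1, a.shape[1])), a])

def forward(u, v, x):
    y = sigmoid(u @ with_bias(x))     # hidden layer
    h = sigmoid(v @ with_bias(y))     # output layer
    return y, h

def cross_entropy(h, labels):
    return float(np.mean(-labels * np.log(h) - (1 - labels) * np.log(1 - h)))

rng = np.random.default_rng(0)
# Synthetic stand-in for the features: the label depends on feature 0 only.
x = rng.normal(size=(4, 200))
labels = (x[0] > 0).astype(float)
x_train, x_test = x[:, :140], x[:, 140:]      # first 70% / remaining 30%
l_train, l_test = labels[:140], labels[140:]

u = rng.normal(scale=0.1, size=(3, 5))
v = rng.normal(scale=0.1, size=(1, 4))
alpha, epochs = 0.5, 300                      # assumed values for the toy problem
train_loss, test_loss = [], []

for _ in range(epochs):
    y, h = forward(u, v, x_train)
    train_loss.append(cross_entropy(h, l_train))
    test_loss.append(cross_entropy(forward(u, v, x_test)[1], l_test))
    m = x_train.shape[1]
    delta_h = h - l_train
    delta_y = (v[:, 1:].T @ delta_h) * y * (1 - y)
    # Both gradients use the current weights; the update happens afterwards.
    grad_v = delta_h @ with_bias(y).T / m
    grad_u = delta_y @ with_bias(x_train).T / m
    u, v = u - alpha * grad_u, v - alpha * grad_v
```

The lists `train_loss` and `test_loss` then hold one value per epoch and can be plotted directly.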

In [0]:
train_input_layer_x = cp.empty((1500, num_train), dtype=float)
train_y = cp.empty((36, num_train), dtype=float)
train_hidden_layer_y = cp.empty((36, num_train), dtype=float)
train_h = cp.empty((1, num_train), dtype=float)
train_output_layer_h = cp.empty((1, num_train), dtype=float)
train_input_layer_x = cp.transpose(X_train)

In [0]:
test_input_layer_x = cp.empty((1500, num_test), dtype=float)
test_y = cp.empty((36, num_test), dtype=float)
test_hidden_layer_y = cp.empty((36, num_test), dtype=float)
test_h = cp.empty((1, num_test), dtype=float)
test_output_layer_h = cp.empty((1, num_test), dtype=float)
test_input_layer_x = cp.transpose(X_test)

In [0]:
y_pred_train = cp.empty((epoch, num_train), dtype=float)
train_loss = np.empty(epoch, dtype=float)
test_loss = np.empty(epoch, dtype=float)

In [0]:
y_pred_test = cp.empty((epoch, num_test), dtype=float)
train_accuracy = np.empty(epoch, dtype=float)
test_accuracy = np.empty(epoch, dtype=float)

In [0]:
for i in range(epoch):
    train_y = fully_connected(1500, 36, weight1[i], train_input_layer_x, num_train)
    train_hidden_layer_y = sigmoid(train_y, num_train)
    train_h = fully_connected(36, 1, weight2[i], train_hidden_layer_y, num_train)
    train_output_layer_h = sigmoid(train_h, num_train)
    y_pred_train[i, :] = cp.around(train_output_layer_h)[0]

    train_loss[i] = objective(train_output_layer_h, weight1[i], weight2[i], num_train, y_train)
    train_accuracy[i] = accuracy(train_output_layer_h, num_train, y_train)

    print("[", i + 1, "/", epoch, "]", "train loss: ", train_loss[i], ", train accuracy: ", train_accuracy[i])

    test_y = fully_connected(1500, 36, weight1[i], test_input_layer_x, num_test)
    test_hidden_layer_y = sigmoid(test_y, num_test)
    test_h = fully_connected(36, 1, weight2[i], test_hidden_layer_y, num_test)
    test_output_layer_h = sigmoid(test_h, num_test)
    y_pred_test[i, :] = cp.around(test_output_layer_h)[0]

    test_loss[i] = objective(test_output_layer_h, weight1[i], weight2[i], num_test, y_test)
    test_accuracy[i] = accuracy(test_output_layer_h, num_test, y_test)

    print("[", i + 1, "/", epoch, "]", "test loss: ", test_loss[i], ", test accuracy: ", test_accuracy[i])

    if i < (epoch - 1):
        weight2[i + 1] = gradient_descent2(weight2[i])[0]
        weight1[i + 1] = gradient_descent1(weight2[i], weight1[i])[0]

## 3. **Results**

### 3.1. **Plot the loss curve**

Plot the `training loss` at `every iteration` of `gradient descent` using the `training data` (in `blue` color). \\
Plot the `testing loss` at `every iteration` of `gradient descent` using the `testing data` (in `red` color). \\
Both `curves` should be presented in `one figure`.


In [0]:
plt.figure(figsize=(6, 6))
x_cost1 = np.arange(0, epoch)
x_cost2 = np.arange(0, epoch)
plt.xlabel('t (iteration)')
plt.ylabel('J(theta)')

plt.plot(x_cost1, train_loss[:epoch], color = 'blue', label = 'training loss')
plt.plot(x_cost2, test_loss[:epoch], color = 'red', label = 'testing loss')
plt.legend()

plt.show()

### 3.2. **Plot the accuracy curve**

Plot the `training accuracy` (%) at `every iteration` of `gradient descent` using the `training data` (in `blue` color). \\
Plot the `testing accuracy` (%) at `every iteration` of `gradient descent` using the `testing data` (in `red` color). \\
Both `curves` should be presented in `one figure`.

In [0]:
plt.figure(figsize=(6, 6))
plt.xlabel('t (iteration)')
plt.ylabel('accuracy(%)')

plt.plot(x_cost1, train_accuracy[:epoch], color = 'blue', label = 'training accuracy')
plt.plot(x_cost2, test_accuracy[:epoch], color = 'red', label = 'testing accuracy')
plt.legend()

plt.show()

### 3.3. **Plot the quantitative results**

#### `Training results`
Print the `confusion matrix` using the `function confusion_matrix` based on the `training data`. \\
Print the `classification report` using the `function classification_report` based on the `training data`. \\
Print the `accuracy score` using the `function accuracy_score` based on the `training data`. \\
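
Because the argument order matters when building a confusion matrix, a tiny hand-countable example can be useful. The NumPy sketch below uses the convention that rows index the true class and columns the predicted class, which is the convention scikit-learn documents for `confusion_matrix(y_true, y_pred)`. The helper `confusion_counts` and the toy labels are hypothetical.

```python
import numpy as np

def confusion_counts(true_labels, predicted_labels, num_classes=2):
    """Rows index the true class, columns index the predicted class."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[int(t), int(p)] += 1
    return matrix

true_labels = [0, 0, 1, 1, 1]
predicted_labels = [0, 1, 1, 1, 1]
cm = confusion_counts(true_labels, predicted_labels)
swapped = confusion_counts(predicted_labels, true_labels)
accuracy = np.trace(cm) / cm.sum()    # correct predictions lie on the diagonal
```

Swapping the two arguments transposes the matrix, so false positives and false negatives trade places even though the accuracy is unchanged.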

In [0]:
# Report the predictions of the last epoch; confusion_matrix expects (y_true, y_pred)
print("Confusion Matrix of training:")
print(confusion_matrix(y_train.tolist(), y_pred_train[epoch - 1].tolist()))
print()
print("Classification Report of training:")
print(classification_report(y_train.tolist(), y_pred_train[epoch - 1].tolist()))
print("Accuracy Score of training : ", accuracy_score(y_train.tolist(), y_pred_train[epoch - 1].tolist()) * 100, "%")

#### `Testing results`

Print the `confusion matrix` using the `function confusion_matrix` based on the `testing data`. \\
Print the `classification report` using the `function classification_report` based on the `testing data`. \\
Print the `accuracy score` using the `function accuracy_score` based on the `testing data`.

In [0]:
# Report the predictions of the last epoch; confusion_matrix expects (y_true, y_pred)
print("Confusion Matrix of testing:")
print(confusion_matrix(y_test.tolist(), y_pred_test[epoch - 1].tolist()))
print()
print("Classification Report of testing:")
print(classification_report(y_test.tolist(), y_pred_test[epoch - 1].tolist()))
print("Accuracy Score of testing : ", accuracy_score(y_test.tolist(), y_pred_test[epoch - 1].tolist()) * 100, "%")


### 3.4. **Final training and testing accuracy**

In [0]:
print("Final training accuracy: ", train_accuracy[epoch - 1], "%")
print("Final testing accuracy: ", test_accuracy[epoch - 1], "%")