---

# The Learning Framework

The model we will use is a simple hyperplane. This plane will represent the decision boundary through the data space, separating positive from negative ratings.

1. __Hyperplane model__. The model is a hyperplane, $f(x, \omega) = sgn(\omega^\top x)$, where $sgn(\cdot)$ is the sign function. Note that $x_0$ in this notation is the pseudo input 1, so $\omega_0$ acts as the bias term. Evaluating the model gives the predictions $\hat y_i = f(X_i, \omega^{(t)})$, where $\omega^{(t)}$ is the parameter vector at optimization iteration $t$ and $\omega^{(0)}$ is the initial guess for the parameter vector.

2. __Objective function__. The loss function for our model is the hinge loss, which will be used together with $l_2$ regularization. <br> <br> $\mathfrak{L}(X, y, \omega) = \frac{\lambda}{2} ||\omega||^2 + \sum_{i=1}^{|X|} \max(0, 1-y_i \cdot \omega^\top X_i)$. <br> <br> Regularization adds a penalty on the norm of the parameter vector to the objective function. A shorter parameter vector gives a larger margin for this model. The $l_2$ norm is defined as $||\omega|| = \sqrt{\sum_{i=1}^n \omega_i^2}$, so the penalty above equals $\frac{\lambda}{2} \sum_{i=1}^n \omega_i^2$. The strength of the regularization is set by a positive parameter $\lambda \in \mathbb{R}$, chosen small enough that the penalty does not dominate the objective function. It represents a trade-off between a more accurate classification and wider margins, and it also gives the objective function a unique solution.

3. __Gradient descent__. The update for gradient descent looks like $\omega^{(t)} = \omega^{(t-1)} - \gamma \nabla \mathfrak{L}(\omega^{(t-1)})$, where the update gradient is defined as $\nabla \mathfrak{L} = \left ( \frac{\partial \mathfrak{L}}{\partial \omega_1}, \frac{\partial \mathfrak{L}}{\partial \omega_2}, \ldots, \frac{\partial \mathfrak{L}}{\partial \omega_n} \right )^\top$. The expression for this gradient $\nabla \mathfrak{L}$ is given analytically as: <br> <br> $\nabla \mathfrak{L}(X, y, \omega) = \lambda \omega + \sum_{i=1}^{|X|}
\begin{cases}
0 & \text{if } y_i \omega^\top X_i \geq 1\\
-y_i X_i & \text{else}
\end{cases}$

In the expression for the gradient, $X_i$ is a vector and $y_i$ is a scalar; they refer to the $i$-th data point and its label. The learning rate $\gamma \in \mathbb{R}$ acts as a scaling/dampening factor on the gradient update. The updates should run until some stopping criterion is met (e.g., $\omega^{(t+1)}\approx \omega^{(t)}$). The default stopping criterion for SGDClassifier is when **_loss_<sub>_current_</sub> > _loss_<sub>_best_</sub> - 0.001** for five consecutive epochs (passes over the training data).
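The objective and its gradient can be written compactly with NumPy. The following is a minimal sketch rather than the implementation used later in this notebook: the function names `hinge_objective` and `hinge_subgradient`, the toy data, and the values of $\gamma$ and $\lambda$ are all assumptions made for illustration. Because the hinge loss has a kink at $y_i \omega^\top X_i = 1$, the expression is strictly a subgradient there.

```python
import numpy as np

def hinge_objective(X, y, w, lamb):
    """(lamb / 2) * ||w||^2 + sum_i max(0, 1 - y_i * w^T X_i)."""
    margins = y * (X @ w)
    return 0.5 * lamb * np.dot(w, w) + np.sum(np.maximum(0.0, 1.0 - margins))

def hinge_subgradient(X, y, w, lamb):
    """lamb * w minus the sum of y_i * X_i over points with y_i * w^T X_i < 1."""
    margins = y * (X @ w)
    active = margins < 1.0  # points inside the margin or misclassified
    return lamb * w - X[active].T @ y[active]

# Hypothetical toy data: two features plus the pseudo input x_0 = 1.
features = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
X_toy = np.hstack([np.ones((len(features), 1)), features])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])

# Plain gradient descent from the zero vector (assumed step size and lambda).
w = np.zeros(X_toy.shape[1])
gamma, lamb = 0.1, 0.01
for _ in range(50):
    w = w - gamma * hinge_subgradient(X_toy, y_toy, w, lamb)

predictions = np.sign(X_toy @ w)
print(hinge_objective(X_toy, y_toy, w, lamb), predictions)
```

On this small separable example the predicted signs match the labels. One way to sanity-check the analytic gradient is to compare it against finite differences of `hinge_objective`.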

---

In [None]:
!pip install wget



In [None]:
# Downloading the data

import wget
url = "http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz"
data = wget.download(url)
!tar zxf review_polarity.tar.gz
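If the `wget` package or a `tar` binary is not available, the same step can be done with the standard library alone. This is a sketch under that assumption; the helper name `download_and_extract` and its parameters are hypothetical and not part of the original notebook.

```python
import tarfile
import urllib.request
from pathlib import Path

def download_and_extract(url, archive_path="review_polarity.tar.gz", dest="."):
    """Download a .tar.gz archive (unless already present) and unpack it into dest."""
    archive = Path(archive_path)
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target
```

Calling `download_and_extract(url)` with the `url` defined above should leave the extracted folders in the working directory, just as the shell command does.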

In [None]:
# Part 1: Parsing the dataset

import os
import pathlib
import numpy as np

DEFAULT_FOLDER_NAME = "txt_sentoken/"

def parse_dataset():
    reviews_lst, labels_lst = [], []
    current_dir = os.path.join(str(pathlib.Path().absolute()), DEFAULT_FOLDER_NAME)

    for folder in os.listdir(current_dir): # 'pos' folder or 'neg' folder
        sub_path = os.path.join(current_dir, folder) # Update path
        pos = folder == 'pos'  # negative or positive
        label = 1 if pos else -1

        for file in os.listdir(sub_path):
            _sub_path = os.path.join(sub_path, file) # Update path
            with open(_sub_path, encoding='utf-8') as f: # Open file(s) with proper encoding
                reviews_lst.append(f.read())
                labels_lst.append(label)

    return reviews_lst, np.asarray(labels_lst)

X_raw, y = parse_dataset()

In [None]:
# Part 2: Feature extraction

import re

RE_PATTERN = r"('(?=\s)|''|``|--|[&$%,;:.!?\)\(\"]|\d+[\.,]\d+|\w+(?=n't)|n't|\w+(?=')|'\w+|\w+\-\w+|[A-Z][a-z]+\.(?=\s[A-Z])|(?:[A-Za-z]\.)+|\w+)"
STOPWORDS = {'.', ',', '?', '"', '``', "''", "'", '--', '-', ':', ';', '(', ')', 'i', 'a', 'about', 'after', 'all', 'also', 'an', 'any',
             'are', 'as', 'at', 'and', 'be', 'being', 'because', 'been', 'but', 'by', 'can', "'d", 'did', 'do', "don'", 'don', 'for',
             'from', 'had', 'has', 'have', 'he', 'her', 'him', 'his', 'how', 'if', 'is', 'in', 'it', 'its', "'ll", "'m", 'me',
             'more', 'my', 'n', 'of', 'on', 'one', 'or', "'re", "'s", "s", 'said', 'say', 'says', 'she', 'so', 'some', 'such', "'t",
             'than', 'that', 'the', 'them', 'they', 'their', 'there', 'this', 'to', 'up', 'us', "'ve", 'was', 'we', 'were', 'what',
             'when', 'where', 'which', 'who', 'will', 'with', 'you', 'your', '&', "n't"}

ordered_vocabulary = set()

print('Tokenizing the data...')
for review in X_raw:
    tokens = [token.lower() for token in re.findall(RE_PATTERN, review.strip()) if token.lower() not in STOPWORDS]
    ordered_vocabulary.update(tokens)

ordered_vocabulary = sorted(ordered_vocabulary)
print('Tokenization complete!')

X = np.zeros((len(y), len(ordered_vocabulary)))

print('Populating matrix...')
for j, review in enumerate(X_raw):
    tokens_in_review = set(token.lower() for token in re.findall(RE_PATTERN, review.strip()) if token.lower() not in STOPWORDS)
    X[j, :] = [1 if word in tokens_in_review else 0 for word in ordered_vocabulary]

print('Matrix is populated!')

Tokenizing the data...
Tokenization complete!
Populating matrix...
Matrix is populated!
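Scanning the whole vocabulary once per review is correct but slow. A possible alternative, sketched here on already-tokenized toy documents, is to look up each token's column through a dictionary and set only those entries. The function name `binary_bag_of_words` and the example documents are assumptions for illustration.

```python
import numpy as np

def binary_bag_of_words(tokenized_docs):
    """Return (X, vocabulary) where X[j, k] = 1 if vocabulary[k] occurs in document j."""
    vocabulary = sorted({tok for doc in tokenized_docs for tok in doc})
    column_of = {tok: k for k, tok in enumerate(vocabulary)}
    X = np.zeros((len(tokenized_docs), len(vocabulary)))
    for j, doc in enumerate(tokenized_docs):
        cols = [column_of[tok] for tok in set(doc)]  # each distinct token once
        X[j, cols] = 1
    return X, vocabulary

# Hypothetical, already-tokenized documents.
docs = [["great", "plot", "great", "cast"], ["dull", "plot"]]
X_demo, vocab_demo = binary_bag_of_words(docs)
print(vocab_demo)
print(X_demo)
```

The result has the same meaning as the matrix built above: one row per review, one column per vocabulary word, and a 1 wherever the word occurs at least once.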


In [None]:
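# Testing Parts 1 and 2: parsing and feature extraction

import numpy as np
assert len(X_raw) == 2000
assert np.all([isinstance(x, str) for x in X_raw])
assert len(X_raw) == y.shape[0]
assert len(np.unique(y)) == 2
assert y.min() == -1
assert y.max() == 1
# The feature matrix has one row per review and only binary entries
assert X.shape == (len(X_raw), len(ordered_vocabulary))
assert X.min() == 0 and X.max() == 1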

In [None]:
# Shuffle 'X' and 'y' in the same way
p = np.random.permutation(len(X))
X, y = X[p], y[p]

# Reshape 'y'
y = y.reshape(-1, 1)  # Use -1 to infer the size of the first dimension automatically

# Get training and test sets
divisor = int(X.shape[0] * 0.8)
X_train, X_test, y_train, y_test = X[:divisor], X[divisor:], y[:divisor], y[divisor:]
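The shuffle above uses NumPy's global random state, so the split changes from run to run. A seeded generator makes it repeatable. The sketch below assumes a hypothetical helper `shuffled_split` and small toy arrays; it is not the split used for the results in this notebook.

```python
import numpy as np

def shuffled_split(X, y, train_fraction=0.8, seed=0):
    """Shuffle rows of X and y together, then cut into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(len(X) * train_fraction)
    train_idx, test_idx = order[:cut], order[cut:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: row i is [2i, 2i + 1], and the label is +1 for even i, -1 for odd i.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.where(np.arange(10) % 2 == 0, 1, -1).reshape(-1, 1)
X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

Calling the function again with the same `seed` returns the same partition, which makes accuracy figures easier to compare between runs.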

In [None]:
# Part 3: Learning framework

class Model:
    def __init__(self, lamb, learning_rate, max_iterations=1000):
        self.lamb = lamb
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations


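    def fit(self, X, y):
        n_samples, n_features = X.shape
        weights = np.zeros(n_features)
        consecutive, best_loss = 0, float('inf')

        # Full-batch gradient descent, capped at max_iterations steps
        for _ in range(self.max_iterations):
            gradients, loss = self.__gradients(weights, X, y)
            weights_g = self.lamb * weights + gradients
            weights = weights - self.learning_rate * weights_g
            # Count consecutive steps that fail to beat the best loss by 0.001
            if loss > best_loss - 0.001:
                consecutive += 1
            else:
                consecutive = 0
            best_loss = min(best_loss, loss)
            # Stop after five such steps in a row
            if consecutive >= 5:
                break
        return weights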


    def predict(self, X, weight):
        prediction_lst = []
        for feature in X:
            score = np.dot(weight, feature)
            if score > 0:
                prediction_lst.append(1)
            else:
                prediction_lst.append(-1)
        return np.array(prediction_lst)


    def score(self, X, y, weight):
        prediction, correct_num = self.predict(X, weight), 0
        for index, score in enumerate(prediction):
            if score == y[index][0]:
                correct_num += 1
        return float(correct_num / len(y))


    def __gradients(self, weights, X, y):
        # Returns the hinge-loss part of the gradient (fit() adds lamb * weights)
        # together with the full regularized loss at the given weights.
        n_samples, n_features = X.shape
        gradients = np.zeros(n_features)
        loss = 0.0

        for j, feature in enumerate(X):
            margin = y[j][0] * np.dot(weights, feature)
            if margin < 1:  # inside the margin or misclassified
                loss += 1 - margin
                gradients += -y[j][0] * feature

        # l2 regularization term: (lamb / 2) * ||weights||^2
        loss += self.lamb / 2 * np.dot(weights, weights)

        return gradients, loss
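The per-row loops in `predict` and `score` can also be expressed as array operations. The sketch below shows one way to do this; the function names `predict_labels` and `accuracy` and the toy inputs are assumptions, and the tie-breaking rule (a score of exactly 0 maps to -1) mirrors the `predict` method above.

```python
import numpy as np

def predict_labels(X, weights):
    """+1 where the score w^T x is positive, -1 otherwise."""
    return np.where(X @ weights > 0, 1, -1)

def accuracy(X, y, weights):
    """Fraction of rows whose predicted label equals the true label."""
    return float(np.mean(predict_labels(X, weights) == np.ravel(y)))

# Hypothetical toy inputs; y is a column vector as in this notebook.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_demo = np.array([1.0, -2.0])
y_demo = np.array([[1], [-1], [1]])
print(predict_labels(X_demo, w_demo), accuracy(X_demo, y_demo, w_demo))
```

`np.ravel(y)` flattens the `(n, 1)` label column so the comparison is element-wise rather than broadcast to an `n x n` matrix.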

In [None]:
# # Part 3: Learning framework
# # Improved Upon Code

# class Model:
#     def __init__(self, lamb, learning_rate, max_iterations=1000):
#         self.lamb = lamb
#         self.learning_rate = learning_rate
#         self.max_iterations = max_iterations

#     def fit(self, X, y):
#         n_samples, n_features = X.shape
#         weights = np.zeros(n_features)
#         consecutive, best_loss = 0, float('inf')

#         for _ in range(self.max_iterations):
#             gradients, loss = self.__gradients(weights, X, y)
#             weights_g = self.lamb * weights + gradients
#             weights = weights - self.learning_rate * weights_g
#             # Count consecutive steps that fail to beat the best loss by 0.001
#             if loss > best_loss - 0.001:
#                 consecutive += 1
#             else:
#                 consecutive = 0
#             best_loss = min(best_loss, loss)
#             if consecutive >= 5:
#                 break
#         return weights

#     def predict(self, X, weight):
#         prediction_lst = []
#         for feature in X:
#             score = np.dot(weight, feature)
#             if score > 0:
#                 prediction_lst.append(1)
#             else:
#                 prediction_lst.append(-1)
#         return np.array(prediction_lst)

#     def score(self, X, y, weight):
#         prediction = self.predict(X, weight)
#         correct_num = np.sum(prediction == y[:, 0])
#         return float(correct_num / len(y))

#     def __gradients(self, weights, X, y):
#         n_samples, n_features = X.shape
#         gradients = np.zeros(n_features)
#         loss, norm = 0, 0

#         for j, feature in enumerate(X):
#             g_step = y[j][0] * np.dot(weights, feature)
#             if 1 - g_step > 0:
#                 loss += 1 - g_step
#             if y[j][0] * np.dot(weights, feature) < 1:
#                 gradients += -y[j][0] * feature

#         for w in weights:
#             norm += w ** 2
#         loss += norm * self.lamb / 2

#         return gradients, loss

In [None]:
model = Model(0.0001, 0.0001)
weight = model.fit(X_train, y_train)
score = model.score(X_test, y_test, weight)
print(f'Learning rate: {0.0001}, Lambda: {0.0001}, Accuracy: {score}')

Learning rate: 0.0001, Lambda: 0.0001, Accuracy: 0.8525


In [None]:
# # Part 4: Exploring hyperparameters

# learning_rate_list = [0.0001, 0.0003] #, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
# lamb_list = [0.0001, 0.0003] #, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0]

# for learning_rate in learning_rate_list:
#     for lamb in lamb_list:
#         model = Model(lamb, learning_rate)
#         weight = model.fit(X_train, y_train)
#         score = model.score(X_test, y_test, weight)
#         print(f'Learning rate: {learning_rate}, Lambda: {lamb}, Accuracy: {score}')
#         del model
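The grid search above can be kept compact with `itertools.product` and a dictionary of results. The sketch below is self-contained, so it uses a hypothetical stand-in `fit_and_score` (a short batch subgradient descent on the hinge loss) and synthetic data instead of the `Model` class and the review matrix. The iteration count is likewise an assumption.

```python
import itertools
import numpy as np

def fit_and_score(X_train, y_train, X_test, y_test, lamb, learning_rate, n_iter=200):
    """Stand-in for Model.fit + Model.score: train on the hinge loss, return test accuracy."""
    w = np.zeros(X_train.shape[1])
    for _ in range(n_iter):
        active = y_train * (X_train @ w) < 1  # margin violators
        w -= learning_rate * (lamb * w - X_train[active].T @ y_train[active])
    return float(np.mean(np.where(X_test @ w > 0, 1, -1) == y_test))

# Hypothetical toy data: the label is the sign of the first feature.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = np.where(X_toy[:, 0] > 0, 1.0, -1.0)
X_tr, X_te, y_tr, y_te = X_toy[:160], X_toy[160:], y_toy[:160], y_toy[160:]

learning_rates = [0.0001, 0.0003]
lambdas = [0.0001, 0.0003]

# Evaluate every (learning rate, lambda) pair and remember its accuracy.
results = {}
for learning_rate, lamb in itertools.product(learning_rates, lambdas):
    results[(learning_rate, lamb)] = fit_and_score(X_tr, y_tr, X_te, y_te, lamb, learning_rate)

best = max(results, key=results.get)
print(f"Best (learning rate, lambda): {best}, accuracy: {results[best]}")
```

Choosing hyperparameters by test accuracy tends to give an optimistic estimate; holding out a separate validation split from the training data is a common way to avoid that.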