# Linear classifier

Classifier: SVM

$$
\hat{\theta} = \operatorname*{arg\,min}_{\theta} \sum_{i=1}^n \max\bigl(0,\, 1 - y_i f_{\theta}(x_i)\bigr) + \lambda \Omega_2(\theta)
$$

- Hinge loss
- L2 regularizer (squared Euclidean norm)
- Numerical solution using stochastic gradient descent (SGD)
- Empirical risk minimization (ERM) is about the *objective* (minimizing the empirical risk), while gradient descent is about the *method* (an algorithm to find the parameters that minimize the empirical risk).
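
To make the objective concrete, here is a minimal sketch of how the regularized hinge objective could be evaluated for a linear model $f_{\theta}(x) = x \cdot \theta$. The function name, the toy data, and the convention $\Omega_2(\theta) = \frac{1}{2}\|\theta\|_2^2$ are assumptions made for illustration; they are not taken from `LinearClassifierHelper`.

```python
import numpy as np

def regularized_hinge_objective(theta, X, y, lambda_value):
    """Sum of hinge losses plus lambda * 0.5 * ||theta||^2 (assumed convention)."""
    margins = y * (X @ theta)                # y_i * f_theta(x_i) for a linear model
    hinge = np.maximum(0.0, 1.0 - margins)   # per-instance hinge loss
    return hinge.sum() + lambda_value * 0.5 * np.dot(theta, theta)

# Toy data, made up for illustration: two points in 2-D with labels in {-1, +1}.
X = np.array([[1.0, 2.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
theta = np.array([0.5, 0.0])

objective_value = regularized_hinge_objective(theta, X, y, lambda_value=0.1)
print(objective_value)
```

Both toy points have margin 0.5, so each contributes a hinge loss of 0.5, and the regularizer adds a small penalty on top.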

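1. Check the stopping criterion

Iteration continues while successive parameter vectors still differ by more than $\epsilon$, and stops once the difference falls to $\epsilon$ or below:

$$
\| \theta^t - \theta^{t+1} \|_2 > \epsilon
$$

The L2 norm is used, so for a parameter vector with $d$ components:

$$
\|\theta^t - \theta^{t+1}\|_2 = \sqrt{(\theta^t_1 - \theta^{t+1}_1)^2 + (\theta^t_2 - \theta^{t+1}_2)^2 + \cdots + (\theta^t_d - \theta^{t+1}_d)^2}
$$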
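
A minimal sketch of this check in NumPy follows. The helper name `has_converged` and the example vectors are hypothetical; `np.linalg.norm` computes the L2 norm by default for one-dimensional arrays. The loop keeps running while the norm exceeds $\epsilon$ and stops once it no longer does.

```python
import numpy as np

def has_converged(theta_old, theta_new, epsilon):
    """True when the L2 distance between successive iterates is at most epsilon."""
    return bool(np.linalg.norm(theta_new - theta_old) <= epsilon)

theta_t = np.array([1.0, 2.0])

# Tiny change between iterates: the loop may stop.
print(has_converged(theta_t, np.array([1.0, 2.000001]), epsilon=1e-5))

# Large change between iterates: keep iterating.
print(has_converged(theta_t, np.array([1.5, 2.0]), epsilon=1e-5))
```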

2. Calculate the gradient of the hinge loss and the L2 regularizer

Hinge loss with linear model:

$$
\max\bigl(0,\, 1 - y_i (\mathbf{x}_i \cdot \theta)\bigr)
$$

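Gradient for a single instance, with the regularizer spread evenly over the $n$ instances (hence the factor $\frac{\lambda}{n}$):

$$
\nabla_{\theta} L(\theta) = \frac{\partial \max\bigl(0,\, 1 - y_i (\mathbf{x}_i \cdot \theta)\bigr)}{\partial \theta} + \frac{\lambda}{n}\frac{\partial \Omega_2(\theta)}{\partial \theta}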
$$

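Derivative of the hinge loss with respect to $\theta$ (a subgradient, since the hinge loss has a kink where the margin equals 1):

$$
\frac{\partial \max\bigl(0,\, 1 - y_i (\mathbf{x}_i \cdot \theta)\bigr)}{\partial \theta} = 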
\begin{cases}
0, & \text{if } 1 - y_i (\mathbf{x}_i \cdot \theta) \leq 0 \\
-y_i \mathbf{x}_i, & \text{if } 1 - y_i (\mathbf{x}_i \cdot \theta) > 0 
\end{cases}
$$


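Gradient of the L2 regularizer. With the convention $\Omega_2(\theta) = \frac{1}{2}\|\theta\|_2^2$ the factor of 2 cancels; without the $\frac{1}{2}$ the gradient would be $2\theta$, and the constant can be absorbed into $\lambda$:

$$
\nabla_{\theta} \Omega_2(\theta) = \theta
$$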
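
The regularizer gradient can be sanity-checked numerically. The sketch below assumes $\Omega_2(\theta) = \frac{1}{2}\|\theta\|_2^2$ and compares a central finite-difference approximation against $\theta$ itself; the helper names and the test vector are made up for illustration.

```python
import numpy as np

def omega_2(theta):
    """L2 regularizer under the assumed convention 0.5 * ||theta||^2."""
    return 0.5 * np.dot(theta, theta)

def numerical_gradient(f, theta, h=1e-6):
    """Central finite-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad

theta = np.array([0.3, -1.2, 2.0])
approx = numerical_gradient(omega_2, theta)

# The numerical gradient should agree with theta up to floating-point error.
print(np.allclose(approx, theta, atol=1e-6))
```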

Combining both terms gives the gradient for a single instance:

$$
\nabla_{\theta} L(\theta) = 
\begin{cases}
\frac{\lambda}{n}\theta, & \text{if } 1 - y_i (\mathbf{x}_i \cdot \theta) \leq 0 \\
-y_i \mathbf{x}_i + \frac{\lambda}{n}\theta, & \text{if } 1 - y_i (\mathbf{x}_i \cdot \theta) > 0 
\end{cases}
$$
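
The two cases translate directly into code. This is a minimal sketch, not the implementation inside `LinearClassifierHelper`: the function name, the argument order, and the example values are assumptions.

```python
import numpy as np

def instance_gradient(theta, x_i, y_i, lambda_value, n):
    """Per-instance (sub)gradient of hinge loss plus (lambda / n) * 0.5 * ||theta||^2."""
    reg_grad = (lambda_value / n) * theta
    if 1.0 - y_i * np.dot(x_i, theta) > 0:
        # Margin violated: the hinge term contributes -y_i * x_i.
        return -y_i * x_i + reg_grad
    # Margin satisfied: only the regularizer contributes.
    return reg_grad

theta = np.array([0.5, -0.5])

# x . theta = 0, so 1 - y * (x . theta) = 1 > 0: margin violated.
grad_violated = instance_gradient(theta, np.array([1.0, 1.0]), 1.0, lambda_value=0.1, n=10)

# x . theta = 2, so 1 - y * (x . theta) = -1 <= 0: margin satisfied.
grad_satisfied = instance_gradient(theta, np.array([4.0, 0.0]), 1.0, lambda_value=0.1, n=10)

print(grad_violated, grad_satisfied)
```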




3. Calculate the step size $\alpha$ at iteration $t$

$$
\alpha(t) = \frac{\alpha_0}{1 + \text{decay\_rate} \cdot t}
$$
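
Putting the three steps together, an SGD loop for this objective might look like the sketch below. It is an illustrative reimplementation under stated assumptions, not the code behind `LinearClassifierHelper.reg_erm_stoch`: the function names, the zero initialization, the per-epoch convergence check, the `max_epochs` cap, and the toy data are all assumptions. Only the parameter names `lambda_value`, `epsilon`, `alpha_0`, and `decay_rate` mirror the notebook.

```python
import numpy as np

def step_size(t, alpha_0, decay_rate):
    """Decaying step size alpha(t) = alpha_0 / (1 + decay_rate * t)."""
    return alpha_0 / (1.0 + decay_rate * t)

def sgd_hinge_l2(X, y, lambda_value, epsilon, alpha_0, decay_rate,
                 max_epochs=1000, seed=0):
    """SGD on hinge loss with an L2 regularizer for a linear model without bias."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    t = 0
    for _ in range(max_epochs):
        theta_before = theta.copy()
        for i in rng.permutation(n):          # visit instances in random order
            grad = (lambda_value / n) * theta
            if 1.0 - y[i] * np.dot(X[i], theta) > 0:
                grad = grad - y[i] * X[i]     # hinge term is active
            theta = theta - step_size(t, alpha_0, decay_rate) * grad
            t += 1
        # Stop once a full pass changes theta by no more than epsilon.
        if np.linalg.norm(theta - theta_before) <= epsilon:
            break
    return theta

# Toy data, made up for illustration: linearly separable through the origin.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

theta_hat = sgd_hinge_l2(X, y, lambda_value=0.001, epsilon=1e-5,
                         alpha_0=0.1, decay_rate=0.001)
print(theta_hat, np.sign(X @ theta_hat))
```

Checking convergence once per epoch rather than after every single update is a design choice of this sketch: a single update can be tiny when the sampled instance already satisfies the margin, which would end the loop prematurely.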

## Learn the classifier


In [1]:
import numpy as np
from scipy.io import loadmat
from matplotlib import pyplot as plt
from sklearn.model_selection import ParameterGrid

import torch
import pickle

import sys
# Make the helper modules in the parent directory importable.
sys.path.append('..')
from linear_classifier_helper import LinearClassifierHelper
from dataset import BaseDataset, FeatureEngineeredDataset

# loadmat returns a dict that maps MATLAB variable names to arrays.
file_path = "../../data/laser.mat"
mat_dict = loadmat(file_path)

dataset = FeatureEngineeredDataset(mat_dict, "X", "Y", "r2")

In [2]:
# param_grid = {
#     'lambda_value': [0.001, 0.01, 0.1, 1.0, 10.0],
#     'alpha_0': [0.001, 0.01, 0.1],
#     'decay_rate': [0.001, 0.01, 0.1],
#     'epsilon': [1e-4, 1e-5, 1e-6]
# }

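# def score_params(params):
#     # reg_erm_stoch returns the learned theta, so score it by its accuracy
#     # on the training data instead of comparing theta to best_score.
#     theta = LinearClassifierHelper.reg_erm_stoch(dataset.inputs, dataset.labels,
#                                                  lambda_value=params['lambda_value'],
#                                                  epsilon=params['epsilon'],
#                                                  alpha_0=params['alpha_0'],
#                                                  decay_rate=params['decay_rate'])
#     raw_predictions = LinearClassifierHelper.predict(dataset.inputs, theta)
#     return LinearClassifierHelper.evaluate(raw_predictions, dataset.labels)

# best_params = None
# best_score = float('-inf')

# for params in ParameterGrid(param_grid):
#     score = score_params(params)
#     if score > best_score:
#         best_score = score
#         best_params = params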

# print("Best parameters found:", best_params)


In [3]:
optimal_theta = LinearClassifierHelper.reg_erm_stoch(
    dataset.inputs,
    dataset.labels,
    lambda_value=0.001,
    epsilon=1e-05,
    alpha_0=0.1,
    decay_rate=0.001,
)
print(optimal_theta)

[1.38743536]


In [4]:
raw_predictions = LinearClassifierHelper.predict(dataset.inputs, optimal_theta)

accuracy = LinearClassifierHelper.evaluate(raw_predictions, dataset.labels)
print(accuracy)

0.5
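
What `predict` and `evaluate` compute internally is not shown in this notebook. As a rough sketch, assuming the raw predictions are linear scores $\mathbf{x}_i \cdot \theta$ and the labels are in $\{-1, +1\}$, accuracy could be computed as follows. The function names and the toy data are hypothetical.

```python
import numpy as np

def predict_scores(X, theta):
    """Raw linear scores x_i . theta for every row of X."""
    return X @ theta

def accuracy_from_scores(scores, labels):
    """Fraction of instances whose thresholded score matches the label."""
    predicted = np.where(scores >= 0, 1, -1)   # threshold the raw scores at zero
    return float(np.mean(predicted == labels))

# Toy data, made up for illustration: one feature, four instances.
X = np.array([[1.0], [2.0], [-1.0], [3.0]])
labels = np.array([1, 1, -1, -1])
theta = np.array([1.5])

scores = predict_scores(X, theta)
toy_accuracy = accuracy_from_scores(scores, labels)
print(toy_accuracy)
```

With a single weight and no bias term, as in this toy setup, the decision boundary is fixed at the origin, so the last instance (positive feature, negative label) cannot be classified correctly.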
