# Squared hinge SVM

Consider the soft-margin SVM with the **squared** hinge loss:
$$
\begin{align}
\min_{\mathbf w\in \mathbb R^d, b\in\mathbb R}  ~ \frac{1}{2} \|\mathbf w\|_2^2 + C\sum_{i=1}^n \max\{ 1- y_i(\mathbf w^\top \mathbf x_i + b) , 0 \}^{\textcolor{red}{2}},
\end{align}
$$
where $\mathbf x_i \in \mathbb R^d$, $y_i \in \{\pm 1\}$, and $\|\mathbf w\|_2 := \sqrt{\sum_{j=1}^d w_j^2}$ is the Euclidean norm.
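
As a minimal sketch, the objective above can be evaluated directly with NumPy. The function name `squared_hinge_objective` and the two-point data set `X_demo`, `y_demo` are hypothetical and chosen only for illustration; they do not come from the original notebook.

```python
import numpy as np

def squared_hinge_objective(w, b, X, y, C):
    """Evaluate 0.5 * ||w||^2 + C * sum_i max(1 - y_i (w . x_i + b), 0)^2."""
    margins = y * (X @ w + b)               # y_i * (w^T x_i + b) for every i
    slack = np.maximum(1.0 - margins, 0.0)  # hinge part, zero when the margin is >= 1
    return 0.5 * np.dot(w, w) + C * np.sum(slack ** 2)

# Hypothetical two-point data set, for illustration only.
X_demo = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_demo = np.array([1.0, -1.0])

# At w = 0, b = 0 every margin is 0, so each point contributes C * 1^2.
obj_at_zero = squared_hinge_objective(np.zeros(2), 0.0, X_demo, y_demo, C=1.0)
# At w = (1, 0), b = 0 both margins equal 1, so only the regularizer remains.
obj_at_unit = squared_hinge_objective(np.array([1.0, 0.0]), 0.0, X_demo, y_demo, C=1.0)
print(obj_at_zero, obj_at_unit)
```

The two evaluations illustrate the trade-off in the objective: a larger $\mathbf w$ costs more in the regularizer but can remove the loss terms entirely.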


Take the following data points $\mathbf x_i$ and their labels $y_i$:
$$
\{x,y\}=\{
    [(1,2),-1],[(2,1),1],[(3,2),-1],[(3,1),1]
    \}
$$
Requiring every point to satisfy the margin constraint $y_i(\mathbf w^\top \mathbf x_i+b)\ge 1$ (so that every hinge term vanishes), with $\mathbf w=(w_0,w_1)$, gives:
$$
\begin{cases}
w_0+2w_1+b \le -1 \\ 2w_0+w_1+b \ge 1 \\ 3w_0+2w_1+b \le -1 \\ 3w_0+w_1+b \ge 1
\end{cases}
$$

Subtracting the fourth constraint from the third gives $w_1\le -2$. Taking $w_1=-2$ and substituting into the first two constraints gives $w_0+b\le 3$ and $2w_0+b\ge 3$, so one feasible solution is $w_0=0$, $w_1=-2$, $b=3$.

Since every feasible point has $w_1\le -2$, we get $\Vert w\Vert^2=w_0^2+w_1^2\ge w_1^2\ge 4$, so $\frac{1}{2}\Vert w\Vert^2\ge 2$ on the feasible set. Our solution attains the objective value $2$, so it is optimal.
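
The feasibility and objective value of the candidate $w_0=0$, $w_1=-2$, $b=3$ can be checked numerically. This is a small sketch; the variable names `X_toy`, `y_toy`, `w_star`, and `b_star` are illustrative and not part of the original notebook.

```python
import numpy as np

# The four data points and labels from the text.
X_toy = np.array([[1, 2], [2, 1], [3, 2], [3, 1]], dtype=float)
y_toy = np.array([-1, 1, -1, 1], dtype=float)

# Candidate solution (w_0, w_1) = (0, -2), b = 3.
w_star = np.array([0.0, -2.0])
b_star = 3.0

# Each margin y_i * (w^T x_i + b) must be at least 1 for feasibility.
margins = y_toy * (X_toy @ w_star + b_star)
objective = 0.5 * np.dot(w_star, w_star)

print("margins:", margins)
print("feasible:", bool(np.all(margins >= 1.0)))
print("objective:", objective)
```

All four margins come out exactly equal to 1, so every constraint is tight at this solution.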

## Stochastic Gradient Descent for SVM

We first compute the gradient with respect to $\mathbf w$ and $b$ of each summand in the second term of (1):
$$
\begin{align}
C (1-y_i \hat y_i)_+^2, ~~ \text{where} ~~ \hat y_i = \mathbf w^\top \mathbf x_i + b ~~\text{ and }~~ (t)_+^2 :=[ \max\{t, 0\}]^2.
\end{align}
$$

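Writing $\ell_i(\mathbf w, b) := C(1-y_i\hat y_i)_+^2 = C\big(1-y_i(\mathbf w^\top \mathbf x_i+b)\big)_+^2$ for the $i$-th term, the chain rule gives
$$
\begin{align*}
\frac{\partial \ell_i}{\partial \mathbf w}&=\begin{cases}
-2C\big(1-y_i(\mathbf w^\top \mathbf x_i+b)\big)y_i\mathbf x_i & \text{if } y_i\hat y_i< 1 \\ \mathbf 0 & \text{if } y_i\hat y_i\ge 1 
\end{cases} \\
\frac{\partial \ell_i}{\partial b}&=\begin{cases}
-2C\big(1-y_i(\mathbf w^\top \mathbf x_i+b)\big)y_i & \text{if } y_i\hat y_i< 1 \\ 0 & \text{if } y_i\hat y_i\ge 1 
\end{cases}
\end{align*}
$$
The two cases agree at $y_i\hat y_i = 1$, so $\ell_i$ is differentiable everywhere.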
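
One way to sanity-check these formulas is to compare them against central finite differences. The sketch below does this at a single point; the helper names `sq_hinge_term` and `sq_hinge_grad` and the test values `w0`, `b0`, `x0`, `y0`, `C0` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sq_hinge_term(w, b, x, y, C):
    """One loss term: C * max(1 - y * (w . x + b), 0)^2."""
    return C * max(1.0 - y * (np.dot(w, x) + b), 0.0) ** 2

def sq_hinge_grad(w, b, x, y, C):
    """Gradient of one loss term with respect to w and b."""
    slack = 1.0 - y * (np.dot(w, x) + b)
    if slack <= 0.0:
        # Margin is at least 1: the term is identically zero nearby.
        return np.zeros_like(w), 0.0
    coeff = -2.0 * C * slack * y
    return coeff * x, coeff

# Hypothetical test point with an active hinge term (margin below 1).
w0, b0 = np.array([0.3, -0.2]), 0.1
x0, y0, C0 = np.array([1.0, 2.0]), -1.0, 1.5

grad_w, grad_b = sq_hinge_grad(w0, b0, x0, y0, C0)

# Central finite differences along each coordinate of w, and along b.
eps = 1e-6
num_grad_w = np.array([
    (sq_hinge_term(w0 + eps * e, b0, x0, y0, C0)
     - sq_hinge_term(w0 - eps * e, b0, x0, y0, C0)) / (2 * eps)
    for e in np.eye(len(w0))
])
num_grad_b = (sq_hinge_term(w0, b0 + eps, x0, y0, C0)
              - sq_hinge_term(w0, b0 - eps, x0, y0, C0)) / (2 * eps)

print("analytic :", grad_w, grad_b)
print("numerical:", num_grad_w, num_grad_b)
```

Because the term is quadratic wherever the hinge is active, the central differences should match the analytic gradient up to rounding error.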

In [None]:
import numpy as np
from numpy import linalg as LA
import matplotlib.pyplot as plt
import pandas as pd
import time
from sklearn.preprocessing import StandardScaler
import math

In [66]:
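def sgd_svm(X, Y, w, b, C, max_pass, step_size):
    """Cyclic SGD for the squared-hinge SVM; returns the final (w, b)."""
    n = len(Y)
    w = np.array(w, dtype=float)  # float copy, so the caller's array is not modified
    for _ in range(max_pass):
        for i in range(n):
            y_hat = np.dot(X[i], w) + b
            # The squared hinge term is active only when the margin is below 1.
            if Y[i] * y_hat < 1:
                # Common factor of the loss gradient w.r.t. w (times X[i]) and b.
                d = -2 * C * (1 - Y[i] * y_hat) * Y[i]
                # Gradient step on the loss term, scaled by 1/n.
                w -= d * X[i] * step_size / n
                b -= d * step_size / n
            # Shrink w after every sample: w / (1 + step_size) is the
            # proximal step for the 0.5 * ||w||^2 regularizer.
            w = w / (1 + step_size)
    return w, b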

In [67]:
toy_x = np.array([[1,2],[2,1],[3,2],[3,1]])
toy_y = np.array([-1,1,-1,1])
w, b = sgd_svm(toy_x, toy_y, np.zeros(len(toy_x[0])), 0, 360, 500, 0.001)
print("The w and b values obtained from the soft-margin SVM with C = 360 are:")
print(f"w: {w}")
print(f"b: {b}")

The w and b values obtained from the soft-margin SVM with C = 360 are:
w: [ 0.04797872 -2.00850051]
b: 2.915148269391086
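
The printed values are close to the hard-margin solution $w_0=0$, $w_1=-2$, $b=3$ found earlier. As a rough check, the sketch below recomputes the margins $y_i(\mathbf w^\top \mathbf x_i+b)$ for the printed `w` and `b`. The numbers are copied from the output above, and the names `w_sgd`, `b_sgd`, and `margins_sgd` are illustrative.

```python
import numpy as np

toy_x = np.array([[1, 2], [2, 1], [3, 2], [3, 1]], dtype=float)
toy_y = np.array([-1, 1, -1, 1], dtype=float)

# Values copied from the printed output of the SGD run.
w_sgd = np.array([0.04797872, -2.00850051])
b_sgd = 2.915148269391086

scores = toy_x @ w_sgd + b_sgd
margins_sgd = toy_y * scores     # y_i * (w^T x_i + b)
predictions = np.sign(scores)    # predicted labels in {-1, +1}

print("margins:", margins_sgd)
print("all correctly classified:", bool(np.all(predictions == toy_y)))
```

A margin slightly below 1 is expected for a soft-margin solution, since a finite $C$ penalizes margin violations rather than forbidding them.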
