# Support Vector Machines (SVM)

## Linear SVM

### Model 

$$
    y = sign(\textbf w^{T}\textbf x + b)
$$

where $\textbf w$ is the weight vector, $\textbf x$ is the feature matrix, $b$ is the bias and $sign()$ is the signun function.

### Loss ( Hinge Loss ) with L2 Regularization

$$
    J(\textbf w, b) = \frac {1} {2} ||\textbf w||^2 + C \sum \limits_{i=1}^{n} ( max(0, [1 - \textbf y^{(i)}(\textbf w^{T}\textbf x^{(i)} + b)]))
$$


### Partial Derivative of the Loss Function

$$
   \frac {\partial} {\partial \textbf w} J(\textbf w, b) = \textbf w + C \sum \limits_{i=1}^{n} \mathbb 1(1 - \textbf y^{(i)}(\textbf w^{T}\textbf x^{(i)} + b) > 0)\textbf y^{(i)} \textbf x^{(i)}
$$

$$
   \frac {\partial} {\partial b} J(\textbf w, b) = C \sum \limits_{i=1}^{n} \mathbb 1(1 - \textbf y^{(i)}(\textbf w^{T}\textbf x^{(i)} + b) > 0)\textbf y^{(i)}
$$





In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_blobs

sns.set_theme(style="whitegrid")

In [9]:
class softSVM:
    def __init__(self, C):
        self.support_vectors = None
        self.C = C
        self.b = None
        
    def __decision_function(self, X):
        return X @ self.w + self.b
    
    def __loss(self, margin):
        return 0.5 * self.w.dot(self.w) + self.C * np.sum(np.maximum(0, 1 - margin))
    
    def __margin(self, X, y):
        return y * self.__decision_function(X)
    
    def fit(self, X, y, lr=1e-3, epochs=500):
        n, d = X.shape
        self.w = np.random.randn(d)
        self.b = 0
        
        losses = []
        for _ in range(epochs):
            margin = self.__margin(X, y)
            loss = self.__loss(margin)
            losses.append(loss)
            
            misclassified_pts_idx = np.where(margin < 1)[0]
            d_w = self.w - self.C * y[misclassified_pts_idx].dot(X[misclassified_pts_idx])
            self.w = self.w - lr * d_w
            
            d_b = - self.C * np.sum(y[misclassified_pts_idx])
            self.b = self.b - lr * d_b
            
            self.support_vectors = np.where(margin <= 1)[0]
    
    def predict(self, X):
        return np.sign(self.__decision_function(X))
    
    def score(self, X, y):
        p = self.predict(X)
        return np.mean(y == p)

Note that usually we calculated other evaluation metrics such as f1-score, precision, recall, etc. instead of mean as mentioned above

**Testing the `softSVM()` class**

In [18]:
X, Y = make_blobs(n_samples=100, n_features=2, centers=[[0,0],[6,6]], cluster_std=1, random_state=12)
Y = np.where(Y == 0, -1, 1)

print(X.shape, Y.shape)

svm = softSVM(C=1)
svm.fit(X, Y)

print(svm.w, svm.b)
print(svm.support_vectors)

(100, 2) (100,)
[0.32549484 0.27311968] -1.7029999999999719
[ 0 52 55 74]


Adding some noise to the data. Also note that **as the value of C increases, them margin gets smaller and value of b increases**

In [20]:
X, Y = make_blobs(n_samples=60, n_features=2, centers=2, cluster_std=1.1, random_state=0)
Y = np.where(Y == 0, -1, 1)

print(Y)

svm = softSVM(C=100)  
svm.fit(X, Y)

print(svm.w, svm.b)

newExample = np.array([-2.2,2.2])
svm.predict(newExample)

[ 1 -1 -1 -1 -1  1  1  1  1  1  1 -1  1 -1 -1 -1  1 -1  1  1 -1 -1 -1 -1
  1 -1 -1 -1 -1 -1  1  1  1 -1 -1 -1  1 -1  1  1  1 -1  1 -1 -1 -1  1  1
  1  1  1 -1  1 -1 -1  1  1 -1  1  1]
[ 2.9132867  -7.72155226] 12.200000000000012


-1.0

**`softSVM()` will not work well if you have non-linear decision boundaries. In such cases, use `kernelSVM()`**

## Kernel SVM

From $\mathcal{L}_d$, the ( Wolfe ) dual for soft SVM is as follows :

$$
    \mathcal{L}_d(\alpha) = \sum \limits_{i=1}^{n} \alpha^{(i)} - \frac {1} {2} \sum \limits_{i=1}^{n} \sum \limits_{k=1}^{n} \langle \alpha^{(i)} \textbf y^{(i)} \textbf x^{(i)}, \alpha^{(k)}, \textbf y^{(k)}, \textbf x^{(k)} \rangle
$$

where $\langle u,v \rangle$ is the inner product ( here, dot product )

**Subject to constrains $\forall i \in 1...n$ :**
- $0 \le \alpha^{(i)} \le C$
- $\sum \limits_{i=1}^{n} \alpha^{(i)} \textbf y^{(i)} = 0$


### Kernels

**Linear Kernel** :
$$
    K(\textbf x^{i}, \textbf x^{(j)}) = \textbf x^{(i)^T} \textbf x^{(j)}
$$

**Polynomial Kernel** :
$$
    K(\textbf x^{i}, \textbf x^{(j)}) = \left ( 1 + \textbf x^{(i)^T} \textbf x^{(j)} \right )^{d}
$$

**Radial Basis Functions (RBF) Kernel** :
$$
    K(\textbf x^{i}, \textbf x^{(j)}) = e^{\left ( - \frac {(\textbf x^{(i)} - \textbf x^{(j)})^2} {\sigma^2} \right )}
$$



In [69]:
class kernelSVM():
    def __init__(self, C=0, kernel='rbf', sigma=0.1, degree=2):
        self.C = C
        if kernel == 'poly':
            self.kernel = self.__polynomial_kernel
            self.degree = degree
        else:
            self.kernel = self.__rbf_kernel
            self.sigma = sigma
        self.alpha = None
        self.b = 0
    
    def __rbf_kernel(self, X1, X2):
        return np.exp(-(1/self.sigma ** 2) * np.linalg.norm(X1[:, np.newaxis] - X2[np.newaxis, :], axis=2) ** 2)
    
    def __polynomial_kernel(self, X1, X2):
        return (self.C + X1.dot(X2.T)) ** self.degree
    
    def fit(self, X, y, lr=1e-3, epochs=100):
        self.alpha = np.random.random(X.shape[0])
        self.X = X
        self.y = y
        self.b = 0
        y_iy_kx_ix_k = np.outer(y, y) * self.kernel(X, X)
        
        losses = []
        for _ in range(epochs):
            gradient = np.ones(X.shape[0]) - y_iy_kx_ix_k.dot(self.alpha)
            self.alpha += lr * gradient
            self.alpha[self.alpha > self.C] = self.C
            self.alpha[self.alpha < 0] = 0
            
            loss = np.sum(self.alpha) - 0.5 * np.sum(np.outer(self.alpha, self.alpha) * y_iy_kx_ix_k)
            losses.append(loss)
        index = np.where((self.alpha > 0) & (self.alpha < self.C))[0]
        b_i = y[index] - (self.alpha * y).dot(self.kernel(X, X[index]))
        self.b = np.mean(b_i)
    
    def __decision_function(self, X):
        return (self.alpha * self.y).dot(self.kernel(self.X, X)) + self.b
    
    def predict(self, X):
        return np.sign(self.__decision_function(X))
    
    def score(self, X, y):
        y_hat = self.predict(X)
        return np.maen(y == y_hat)

**Testing the `kernelSVM()` class**

In [72]:
from sklearn.datasets import make_circles

X, y = make_circles(500, factor=0.5, noise=0.08)
y[y == 0] == -1

svm = kernelSVM(C=1.0, kernel='poly', degree=2)
svm.fit(X, y, lr=1e-3)

print(svm.C)
print(svm.b)

# print(svm.alpha)

1.0
-1.3634304244320172
