## Chapter 5 - Support Vector Machines

### Nonlinear SVC

The support vector classifier is used when the boundary between the two classes is linear. In practice, however, we are sometimes faced with non-linear boundaries. In that case, we can enlarge the feature space using higher-order functions of the predictors. For example, rather than fitting a support vector classifier on the $p$ features $(X_1, \ldots, X_p)$, we can add a squared term for each feature and fit the support vector classifier on the $2p$ features $(X_1, X_1^2, \ldots, X_p, X_p^2)$. The optimisation problem then becomes:

$$\underset{\beta_0, \beta_{11}, \beta_{12}, \cdots, \beta_{p1},\beta_{p2},\epsilon_1, \cdots, \epsilon_n}{\text{Maximise }}M \text{ s. t. }$$
$$\sum_{j=1}^p\sum_{k=1}^2 \beta_{jk}^2=1$$
$$y_i\begin{pmatrix}\beta_0 + \sum_{j=1}^p\beta_{j1}x_{ij} + \sum_{j=1}^p\beta_{j2}x^2_{ij} \end{pmatrix}\geq M(1-\epsilon_i)\,\,\forall i \in \{1,\cdots,n\}$$
$$\epsilon_i \geq 0\,\,\forall i \in \{1,\cdots,n\}\,\,, \sum_{i=1}^n \epsilon_i \leq C$$

In this enlarged feature space, the decision boundary is linear. In the original feature space, however, the decision boundary has the form $q(x)=0$, where $q$ is a quadratic polynomial, and its solutions are generally non-linear. By extension, we can enlarge the feature space further with higher-order polynomial terms or with interaction terms $X_jX_{j'}$ for $j \neq j'$.
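A minimal sketch of this idea on a hypothetical one-dimensional toy dataset (the values below are an assumption for illustration): class 1 sits at both ends of the line and class 0 in the middle, so no single threshold on $X_1$ separates them. Adding $X_1^2$ as a second feature makes the classes linearly separable, and the linear boundary in $(X_1, X_1^2)$ corresponds to a quadratic $q(x) = 0$ in the original space.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical toy data: class 1 at both ends, class 0 in the middle
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Linear classifier on X_1 only: in 1-D its boundary is a single threshold on x
clf_1d = LinearSVC(C=100, dual=True, max_iter=100000)
clf_1d.fit(x.reshape(-1, 1), y)
acc_1d = clf_1d.score(x.reshape(-1, 1), y)

# Enlarged feature space (X_1, X_1^2): the boundary is linear in these two features
X_enlarged = np.column_stack([x, x ** 2])
clf_2d = LinearSVC(C=100, dual=True, max_iter=100000)
clf_2d.fit(X_enlarged, y)
acc_2d = clf_2d.score(X_enlarged, y)

print(acc_1d, acc_2d)
```

The 1-D model cannot reach perfect training accuracy, whereas the model on $(X_1, X_1^2)$ separates the classes with a boundary that is (approximately) a threshold on $X_1^2$.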

```python
# Silence library warnings (e.g. convergence or deprecation notices) to keep the output clean
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import (make_moons, load_iris)
from sklearn.preprocessing import (PolynomialFeatures, StandardScaler)
from sklearn.svm import LinearSVC, SVC, LinearSVR, SVR
from sklearn.model_selection import train_test_split
```

In scikit-learn, this can be done by first transforming the inputs with `PolynomialFeatures` and then training a linear classifier such as `LinearSVC` on the transformed features.

```python
# Noisy two-moons data; train_test_split holds out 25% of the samples by default
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```

```python
# Transform to 3rd-degree polynomial features, then standardise them
polyfeatures1 = PolynomialFeatures(degree=3)
scaler1 = StandardScaler()
X_expt1 = polyfeatures1.fit_transform(X_train)
X_expt1 = scaler1.fit_transform(X_expt1)

# Train a linear SVM (hinge loss) on the polynomial features
clf_expt1 = LinearSVC(C=10, loss='hinge', max_iter=1000000)
clf_expt1.fit(X_expt1, y_train)
```

```text
LinearSVC(C=10, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='hinge', max_iter=1000000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)
```
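The cell above fits the polynomial transform and the scaler on `X_train` only, so the same fitted transforms must be reapplied to `X_test` before evaluating. One way to keep the three steps together is a scikit-learn `Pipeline`; the sketch below is an assumption about how one might organise it, not the original notebook's code, and it passes `dual=True` explicitly because the hinge loss is only supported by the dual solver.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The pipeline fits each transform on X_train and reuses the fitted
# transforms, unchanged, when scoring or predicting on X_test.
poly_svc = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LinearSVC(C=10, loss='hinge', dual=True, max_iter=1000000),
)
poly_svc.fit(X_train, y_train)

test_acc = poly_svc.score(X_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```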

There are endless ways to enlarge the feature space, and the number of features quickly becomes very large, at which point the computations become unmanageable. The support vector machine allows us to enlarge the feature space used by the support vector classifier in a way that leads to efficient computations.
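To put a number on "very large": with $p$ inputs, all monomials of total degree at most $d$ (including the constant) number $\binom{p+d}{d}$. The sketch below checks this count against `PolynomialFeatures`, whose `n_output_features_` attribute reports how many columns the transform produces.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Count of monomials of total degree <= d in p variables (bias included): C(p + d, d)
for p in (2, 10, 100):
    for d in (2, 3):
        n_feats = PolynomialFeatures(degree=d).fit(np.zeros((1, p))).n_output_features_
        assert n_feats == comb(p + d, d)
        print(f"p={p:>3}, degree={d}: {n_feats:>7} features")
```

With $p = 100$ inputs and degree 3 there are already $\binom{103}{3} = 176{,}851$ features, before any interaction with sample size.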

### Nonlinear SVM Classification - Using Kernels

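The support vector machine (SVM) is an extension of the support vector classifier that results from <u>enlarging the feature space using kernels</u>. The solution of the support vector classifier depends on the observations only through their inner products $\langle x_i, x_{i'}\rangle$, so each inner product can be replaced by a kernel function $K(x_i, x_{i'})$ and the enlarged feature space never has to be computed explicitly. This makes the method far more efficient computationally.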
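As a concrete check of why a kernel can stand in for an explicit feature enlargement, the sketch below uses a degree-2 polynomial kernel $K(x, z) = (1 + \langle x, z\rangle)^2$ on 2-D inputs. The explicit map `phi` (a hypothetical helper written for this example) produces six features whose inner product equals the kernel value, while the kernel itself only needs a 2-D dot product.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input, chosen so that
    phi(x) @ phi(z) == (1 + x @ z) ** 2."""
    v1, v2 = v
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * v1, r2 * v2, v1 ** 2, v2 ** 2, r2 * v1 * v2])

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (coef0 + x . z) ** degree, computed in the original 2-D space."""
    return (coef0 + np.dot(x, z)) ** degree

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)    # inner product in the enlarged 6-D space
implicit = poly_kernel(x, z)  # same value without building the 6-D features
print(explicit, implicit)
```

Here $\langle x, z\rangle = 1$, so both values equal $(1 + 1)^2 = 4$ (up to floating-point rounding for the explicit version).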

The following uses scikit-learn's `SVC` with a third-degree polynomial kernel, $K(x, x') = (\gamma\langle x, x'\rangle + r)^d$, where $d$ is set by `degree` and $r$ by `coef0`.

```python
# 3rd-degree polynomial kernel: the SVC works on the original two features,
# with no explicit polynomial feature expansion
clf_expt12 = SVC(kernel='poly', degree=3, coef0=1, C=10)
clf_expt12.fit(X_train, y_train)
```

```text
SVC(C=10, break_ties=False, cache_size=200, class_weight=None, coef0=1,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='poly',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```
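To see that `kernel='poly'` really uses $(\gamma\langle x, x'\rangle + r)^d$, one can build the Gram matrix by hand and pass it to `SVC(kernel='precomputed')`. The sketch below fixes `gamma=0.5` explicitly (an assumption; the cell above uses `gamma='scale'`) so that both models see exactly the same kernel, then compares their test predictions. `poly_gram` is a hypothetical helper, cross-checked against scikit-learn's `polynomial_kernel`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gamma, coef0, degree = 0.5, 1.0, 3  # gamma fixed by assumption so both models match

def poly_gram(A, B):
    """Gram matrix (gamma * <a, b> + coef0) ** degree for all row pairs of A and B."""
    return (gamma * A @ B.T + coef0) ** degree

# Built-in polynomial kernel
clf_builtin = SVC(kernel='poly', degree=degree, gamma=gamma, coef0=coef0, C=10)
clf_builtin.fit(X_train, y_train)

# Same kernel supplied as a precomputed (n_train x n_train) matrix
clf_pre = SVC(kernel='precomputed', C=10)
clf_pre.fit(poly_gram(X_train, X_train), y_train)

pred_builtin = clf_builtin.predict(X_test)
pred_pre = clf_pre.predict(poly_gram(X_test, X_train))  # rows: test, columns: train
agreement = np.mean(pred_builtin == pred_pre)
print(f"Prediction agreement: {agreement:.3f}")
```

The two models should agree on (essentially) every test point, because they solve the same optimisation problem with the same kernel values.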

```python
# Using the RBF (Gaussian) kernel on standardised features
scl2 = StandardScaler()
X_expt3_feats = scl2.fit_transform(X_train)
clf4 = SVC(kernel='rbf', gamma=5, C=0.001)
clf4.fit(X_expt3_feats, y_train)
```
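The RBF kernel used above is $K(x, x') = \exp(-\gamma\lVert x - x'\rVert^2)$: it equals 1 when the two points coincide and decays towards 0 as they move apart, faster for larger $\gamma$. A minimal sketch that computes it by hand (the helper `gaussian_kernel` and the random inputs are assumptions for illustration) and checks it against scikit-learn's `rbf_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every row pair of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))
B = rng.normal(size=(3, 2))

K_manual = gaussian_kernel(A, B, gamma=5)
K_sklearn = rbf_kernel(A, B, gamma=5)
print(np.allclose(K_manual, K_sklearn))

# Same pairs with a smaller gamma: kernel values decay more slowly with distance
K_wide = gaussian_kernel(A, B, gamma=0.5)
```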

### Support Vector Regression (SVR)

```python
# # Linear SVR, no polynomial features
# svr1 = LinearSVR(epsilon=1.5)
# svr1.fit(X_train, y_train)
```
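SVR fits a tube of half-width $\epsilon$ around the regression function and only penalises residuals that fall outside it, using the $\epsilon$-insensitive loss $\max(0, |r| - \epsilon)$. Because `y_train` from `make_moons` holds 0/1 class labels, the sketch below instead uses hypothetical synthetic linear data (an assumption, $y = 4 + 3x + \text{noise}$) to fit `LinearSVR(epsilon=1.5)`; `eps_insensitive_loss` is a helper written for this example.

```python
import numpy as np
from sklearn.svm import LinearSVR

def eps_insensitive_loss(residual, epsilon):
    """Zero inside the tube |r| <= epsilon, growing linearly outside it."""
    return np.maximum(0.0, np.abs(residual) - epsilon)

residuals = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
losses = eps_insensitive_loss(residuals, epsilon=1.5)
print(losses)

# Hypothetical regression data (assumption): y = 4 + 3x + standard normal noise
rng = np.random.default_rng(42)
X_reg = 2 * rng.random((50, 1))
y_reg = 4 + 3 * X_reg[:, 0] + rng.normal(size=50)

# dual=True because the (non-squared) epsilon-insensitive loss needs the dual solver
svr_lin = LinearSVR(epsilon=1.5, dual=True, max_iter=100000)
svr_lin.fit(X_reg, y_reg)
print(svr_lin.coef_, svr_lin.intercept_)
```

Only the residuals of magnitude 2 exceed the tube half-width of 1.5, so they are the only ones that incur a loss (0.5 each).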

```python
# # SVR with a 2nd-degree polynomial kernel
# svr2 = SVR(kernel='poly', degree=2, C=100, epsilon=0.1)
# svr2.fit(X_train, y_train)
```
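A sketch of the kernelised SVR on hypothetical quadratic data (an assumption for illustration), using the same `degree=2` and `C=100` as the cell above. It sets `coef0=1`: `SVR`'s default `coef0` is 0, in which case the polynomial kernel keeps only the degree-2 terms. Comparing two values of `epsilon` shows the effect of the tube width: points strictly inside the tube are not support vectors, so widening the tube reduces their number.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical quadratic data (assumption): y = 0.2 + 0.1x + 0.5x^2 + small noise
rng = np.random.default_rng(42)
X_quad = 2 * rng.random((100, 1)) - 1
y_quad = 0.2 + 0.1 * X_quad[:, 0] + 0.5 * X_quad[:, 0] ** 2 + 0.1 * rng.normal(size=100)

n_support = {}
for eps in (0.1, 0.5):
    svr = SVR(kernel='poly', degree=2, coef0=1, C=100, epsilon=eps)
    svr.fit(X_quad, y_quad)
    n_support[eps] = len(svr.support_)  # indices of the support vectors

print(n_support)
```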