# Support Vector Machines (SVM)

An SVM is a powerful and versatile model capable of performing linear and nonlinear classification, regression, and even outlier detection.

In [26]:
# Linear SVM Classification

from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC
from sklearn import datasets
from sklearn.pipeline import Pipeline
import numpy as np

iris = datasets.load_iris()
iris.keys()

X = iris.data[:,(2,3)] # petal length and petal width
y = (iris['target'] == 2).astype('int')

svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svc', LinearSVC(loss='hinge', C=1, dual=True))
])

svm_clf.fit(X,y)
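pred = svm_clf.predict([[0.5, 2]])
print("Prediction: {}".format(pred))
# y is binary (1 = Iris virginica, 0 = any other species), so map the
# prediction to a binary label instead of indexing iris['target_names'].
labels = np.array(['not virginica', 'virginica'])
print('Predicted label: {}'.format(labels[pred]))


Prediction: [0]
Predicted label: ['not virginica']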


0.9533333333333334

Note: the LinearSVC class regularizes the bias term, so you should center the training set first by subtracting its mean. This happens automatically if you scale the data with StandardScaler. Also make sure to set the loss hyperparameter to "hinge", since it is not the default value ("squared_hinge" is). Finally, for better performance you should generally set the dual hyperparameter to False unless there are more features than training instances. Scikit-Learn does not support dual=False together with loss="hinge", however, which is why the pipeline above keeps dual=True.
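The interaction between the loss and dual hyperparameters can be checked directly. The following is a minimal sketch, assuming a recent Scikit-Learn release; the names `hinge_dual_false_ok`, `clf_primal`, and `primal_accuracy` are illustrative and not part of the original notebook.

```python
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]               # petal length and petal width
y = (iris.target == 2).astype(int)     # 1 = Iris virginica, 0 = other

# The hinge loss is only implemented for the dual problem, so asking for
# dual=False is expected to raise a ValueError at fit time.
try:
    LinearSVC(loss='hinge', C=1, dual=False).fit(X, y)
    hinge_dual_false_ok = True
except ValueError:
    hinge_dual_false_ok = False

# The default squared hinge loss does support dual=False.
clf_primal = make_pipeline(
    StandardScaler(),
    LinearSVC(loss='squared_hinge', C=1, dual=False),
)
clf_primal.fit(X, y)
primal_accuracy = clf_primal.score(X, y)  # accuracy on the training set
print(hinge_dual_false_ok, primal_accuracy)
```

In other words, solving the primal problem (dual=False) means switching to the squared hinge loss, which is a slightly different model from the hinge-loss classifier trained above.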

In [9]:
# Alternatively, use SVC with a linear kernel; it is much slower, especially on large training sets
svc_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svc', SVC(C=1, kernel='linear'))
])

svc_clf.fit(X,y)
svc_clf.predict([[1.5,2]])

array([0])

In [10]:
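# Alternatively, use SGDClassifier with the hinge loss
from sklearn.linear_model import SGDClassifier
m = 150  # number of training instances
C = 1
# alpha = 1/(m*C) roughly matches the regularization of LinearSVC with the same C
sgd_clf = SGDClassifier(loss='hinge', alpha=1/(m*C))
sgd_clf.fit(X, y)
sgd_clf.predict([[1.5, 2]])

# This applies regular Stochastic Gradient Descent to train a linear SVM classifier.
# It is also useful for training sets that are too large to fit in memory (out-of-core training).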

array([0])
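SGDClassifier can also be trained incrementally with `partial_fit`, which is what makes out-of-core training possible. Below is a minimal sketch, assuming the Iris data stands in for batches that would normally be streamed from disk; the number of epochs, the number of batches, and the `random_state` values are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]               # petal length and petal width
y = (iris.target == 2).astype(int)     # 1 = Iris virginica, 0 = other

# SGD is sensitive to feature scales. With data that truly does not fit in
# memory, StandardScaler.partial_fit could be used batch by batch instead.
X_scaled = StandardScaler().fit_transform(X)

m, C = len(X), 1
sgd_clf = SGDClassifier(loss='hinge', alpha=1 / (m * C), random_state=42)

rng = np.random.default_rng(42)
n_epochs, n_batches = 50, 5
for epoch in range(n_epochs):
    shuffled = rng.permutation(m)
    for batch in np.array_split(shuffled, n_batches):
        # classes must be passed so the model knows every label up front
        sgd_clf.partial_fit(X_scaled[batch], y[batch], classes=np.array([0, 1]))

sgd_accuracy = sgd_clf.score(X_scaled, y)  # accuracy on the training set
print(sgd_accuracy)
```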

# Nonlinear SVM Classification
One approach to handling nonlinear datasets is to add more features, such as polynomial features. In some cases this results in a linearly separable dataset.
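As a small illustration of this idea, the sketch below builds a one-dimensional toy dataset (an assumption made for the example, not data from this notebook) that is not linearly separable, then adds the squared feature $x_2 = x_1^2$. The names `acc_1d` and `acc_2d` are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Nine points on a line: the middle ones are class 1, the outer ones class 0.
x1 = np.arange(-4, 5, dtype=float).reshape(-1, 1)
y = (np.abs(x1.ravel()) <= 2).astype(int)

# With the single feature x1, no threshold can separate the two classes.
acc_1d = SVC(kernel='linear', C=100).fit(x1, y).score(x1, y)

# After adding x2 = x1**2, a horizontal line separates them.
X2 = np.c_[x1, x1 ** 2]
acc_2d = SVC(kernel='linear', C=100).fit(X2, y).score(X2, y)

print(acc_1d, acc_2d)
```

The second classifier reaches perfect training accuracy because in the $(x_1, x_1^2)$ plane every class-1 point has $x_2 \le 4$ and every class-0 point has $x_2 \ge 9$.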

In [11]:
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


X, y = make_moons()  # toy dataset: two interleaving half circles

polynomial_svm_clf = Pipeline([
    ('polynomial', PolynomialFeatures(degree=2)),
    ('scaler', StandardScaler()),
    ('svm_clf', LinearSVC(C=10, loss='hinge', dual=True))
])

polynomial_svm_clf.fit(X,y)
polynomial_svm_clf.predict([[2e-1, 4e-8]])

array([1], dtype=int64)

# Polynomial Kernel
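Adding polynomial features is simple to implement and can work well with all sorts of Machine Learning algorithms (not just SVMs). However, at a low polynomial degree it cannot deal with very complex datasets, and at a high polynomial degree it creates a huge number of features, making the model too slow. With SVMs, the kernel trick gives the same result as adding many polynomial features without actually creating them, which is what SVC with kernel='poly' does.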
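To make the growth in feature count concrete: for $n$ input features and degree $d$, PolynomialFeatures (with the bias column) produces $\binom{n+d}{d}$ columns. The sketch below checks that formula against Scikit-Learn on a small case and then evaluates it for a hypothetical dataset with 100 features; the helper `n_poly_features` is an illustrative name, not a library function.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features, degree):
    """Number of columns PolynomialFeatures outputs (bias term included)."""
    return comb(n_features + degree, degree)


# Sanity check against scikit-learn: 2 features, degree 3.
poly = PolynomialFeatures(degree=3).fit(np.zeros((1, 2)))
small_case_matches = poly.n_output_features_ == n_poly_features(2, 3)

# Feature counts for a hypothetical dataset with 100 input features.
for degree in (2, 3, 5):
    print(degree, n_poly_features(100, degree))
```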

In [12]:
from sklearn.svm import SVC

poly_kernel_svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    # coef0 controls how much the model is influenced by high-degree
    # terms versus low-degree terms
    ('svm_clf', SVC(kernel='poly', C=1, degree=3, coef0=1))
])

poly_kernel_svm_clf.fit(X, y)
poly_kernel_svm_clf.predict([[2e-1, 4e-8]])


array([1], dtype=int64)

# Adding Similarity Features: Gaussian RBF
Another technique to tackle nonlinear problems is to add features computed using a similarity function that measures how much each instance resembles a particular landmark.

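Mathematically, the Gaussian Radial Basis Function (RBF) with landmark $\ell$ is:

$\phi_\gamma(\mathbf{x}, \ell) = \exp(-\gamma \lVert \mathbf{x} - \ell \rVert^2)$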
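As a worked example of the formula, the sketch below computes the similarity features of a single one-dimensional instance with respect to two landmarks. The instance, the landmarks, and the value of $\gamma$ are made-up numbers chosen for illustration, and `gaussian_rbf` is a hypothetical helper rather than a Scikit-Learn function.

```python
import numpy as np


def gaussian_rbf(x, landmark, gamma):
    """Similarity between instance x and a landmark, in the range (0, 1]."""
    x = np.atleast_1d(x).astype(float)
    landmark = np.atleast_1d(landmark).astype(float)
    return np.exp(-gamma * np.sum((x - landmark) ** 2))


gamma = 0.3
x = -1.0
landmarks = [-2.0, 1.0]

# One new feature per landmark: approximately 0.74 and 0.30 here.
similarity_features = np.array([gaussian_rbf(x, l, gamma) for l in landmarks])
print(similarity_features.round(2))
```

The instance is at distance 1 from the first landmark and distance 2 from the second, so its similarity to the first is higher. An instance located exactly on a landmark gets a similarity of 1.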

In [13]:
rbf_kernel_svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel='rbf', gamma=3, C=0.001))
])
rbf_kernel_svm_clf.fit(X, y)
rbf_kernel_svm_clf.predict([[2e-1, 4e-8]])

array([1], dtype=int64)

# Computational Complexity

| Class | Time complexity | Out-of-core support | Scaling required | Kernel trick |
|:------|:---------------:|:-------------------:|:----------------:|:------------:|
| LinearSVC | $O(m \times n)$ | No | Yes | No |
| SGDClassifier | $O(m \times n)$ | Yes | Yes | No |
| SVC | $O(m^2 \times n)$ to $O(m^3 \times n)$ | No | Yes | Yes |

Here $m$ is the number of training instances and $n$ is the number of features.

# SVM Regression
The SVM algorithm is quite versatile: not only does it support linear and nonlinear classification, but it also supports linear and nonlinear regression. The trick is to reverse the objective: instead of fitting the widest possible street between two classes, SVM regression tries to fit as many instances as possible on the street while limiting the number of instances that fall off it. The width of the street is controlled by the hyperparameter $\epsilon$ (epsilon): a higher $\epsilon$ gives a wider street, and a lower $\epsilon$ gives a narrower one. Adding more training instances within the margin does not affect the model's predictions; thus, the model is said to be $\epsilon$-insensitive.

SVMs can also be used for outlier detection; see Scikit-Learn's documentation for more details.
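The effect of $\epsilon$ on the street can be seen by counting how many training instances end up within $\epsilon$ of the fitted line. The following is a rough sketch on synthetic linear data (generated here purely for illustration, not taken from this notebook); `fraction_on_street` is a hypothetical helper name.

```python
import numpy as np
from sklearn.svm import LinearSVR

# Synthetic data: y = 4 + 3x plus Gaussian noise.
rng = np.random.default_rng(42)
X_reg = 2 * rng.random((200, 1))
y_reg = 4 + 3 * X_reg.ravel() + rng.normal(size=200)


def fraction_on_street(epsilon):
    """Fit a LinearSVR and return the share of instances within epsilon of it."""
    reg = LinearSVR(epsilon=epsilon, random_state=42, max_iter=10_000)
    reg.fit(X_reg, y_reg)
    residuals = np.abs(y_reg - reg.predict(X_reg))
    return np.mean(residuals <= epsilon)


wide = fraction_on_street(1.5)     # wide street
narrow = fraction_on_street(0.5)   # narrow street
print(wide, narrow)
```

With the wider street, a larger share of the training instances lies inside the margin, and those instances do not contribute to the loss.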

In [17]:
from sklearn.svm import LinearSVR, SVR

In [18]:
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X,y)



In [20]:
svm_poly_reg = SVR(degree=2, kernel='poly', C=100, epsilon=0.1)
svm_poly_reg.fit(X,y)

# Kernel
A kernel is a function capable of computing the dot product $\phi(a)^T \phi(b)$ based only on the original vectors $a$ and $b$, without having to compute (or even to know about) the transformation $\phi$.

Common kernels:

Linear: $K(a, b) = a^T b$

Polynomial: $K(a, b) = (\gamma a^T b + r)^d$

Gaussian RBF: $K(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$

Sigmoid: $K(a, b) = \tanh(\gamma a^T b + r)$
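The kernel idea can be verified numerically for the polynomial kernel with $\gamma = 1$, $r = 0$, and $d = 2$ on two-dimensional vectors, where the explicit mapping is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below uses two arbitrary example vectors and compares the explicit dot product with the kernel value; `phi` is an illustrative helper, while `polynomial_kernel` is Scikit-Learn's pairwise function.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(x):
    """Explicit degree-2 polynomial mapping of a 2D vector."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = phi(a) @ phi(b)   # dot product in the transformed space
kernel = (a @ b) ** 2        # K(a, b) with gamma=1, r=0, d=2

# Same value from scikit-learn's pairwise helper.
sk_kernel = polynomial_kernel(a.reshape(1, -1), b.reshape(1, -1),
                              degree=2, gamma=1, coef0=0)[0, 0]
print(explicit, kernel, sk_kernel)
```

All three values agree, yet the kernel version never builds the transformed vectors, which is why kernelized SVMs stay tractable even when the implicit feature space is very large.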