# Support Vector Machines

A Support Vector Machine (SVM) is a powerful and versatile Machine Learning
model, capable of performing linear or nonlinear classification, regression, and even
outlier detection. It is one of the most popular models in Machine Learning, and
anyone interested in Machine Learning should have it in their toolbox. SVMs are
particularly well suited for classification of complex small- or medium-sized datasets.

## Linear SVM Classification

You can think of an SVM classifier as fitting the
widest possible street between the classes. This is called large margin classification.

##### Warning: SVMs are sensitive to the feature scales.
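
To make the warning concrete, here is a minimal sketch of what `StandardScaler` does to the two iris petal features used in the examples of this section. The variable names (`X_raw`, `X_scaled`) are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_raw = iris["data"][:, (2, 3)]  # petal length, petal width (cm)

# StandardScaler centers each feature and divides it by its standard deviation,
# so that no single feature dominates the width of the street.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)

print("raw mean/std:   ", X_raw.mean(axis=0), X_raw.std(axis=0))
print("scaled mean/std:", X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, each feature has mean 0 and standard deviation 1, which is why the pipelines in these notes put a `StandardScaler` in front of the SVM.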

### Soft Margin Classification

If we strictly impose that all instances must be off the street and on the right side, this
 is called hard margin classification.

There are two main issues with hard margin classification.
First, it only works if the data is linearly separable. Second, it is sensitive to
outliers.

##### Important point below


To avoid these issues, use a more flexible model. The objective is to find a good
balance between keeping the street as large as possible and limiting the margin violations
(i.e., instances that end up in the middle of the street or even on the wrong side). This
is called soft margin classification.

When creating an SVM model using Scikit-Learn, we can specify a number of
hyperparameters. C is one of those hyperparameters. If we set it to a low value, we end
up with a model that has a wider street but more margin violations. If the value of C is
high, the model tolerates fewer margin violations and the street gets narrower, moving
toward hard margin classification.
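
The effect of C on the street can be checked numerically. The sketch below is only an illustration: it assumes `SVC(kernel="linear")` on the scaled iris petal features, and the helper `street_width` is a hypothetical name. For a linear SVM with weight vector w, the street width is 2 / ‖w‖.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_iris = iris["data"][:, (2, 3)]                    # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)   # Iris virginica vs. the rest
X_iris_scaled = StandardScaler().fit_transform(X_iris)

def street_width(C):
    """Fit a linear SVM with the given C and return the margin width 2 / ||w||."""
    clf = SVC(kernel="linear", C=C).fit(X_iris_scaled, y_iris)
    w = clf.coef_[0]
    return 2.0 / np.linalg.norm(w)

width_low_C = street_width(0.01)   # weak penalty on margin violations
width_high_C = street_width(100)   # strong penalty, closer to hard margin

print("street width with C=0.01:", width_low_C)
print("street width with C=100: ", width_high_C)
```

The low-C model should report the wider street of the two.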

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

In [2]:
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)

In [3]:
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", dual=True)),
])

In [4]:
svm_clf.fit(X, y)

In [5]:
svm_clf.predict([[5.5, 1.7]])

array([1.])

##### Note: Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.
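
What a linear SVM does expose is a signed score for each instance. The following sketch rebuilds the pipeline from above (with `max_iter` and `random_state` added as assumptions, purely for reproducibility) and shows that `predict` is just the sign of `decision_function`. The second sample, `[1.5, 0.2]`, is a made-up small flower for contrast.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X_iris = iris["data"][:, (2, 3)]
y_iris = (iris["target"] == 2).astype(np.float64)

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", dual=True,
                             max_iter=10000, random_state=42)),
])
clf.fit(X_iris, y_iris)

samples = [[5.5, 1.7], [1.5, 0.2]]
scores = clf.decision_function(samples)  # signed score: which side of the boundary
labels = clf.predict(samples)            # positive score -> class 1, otherwise class 0

print("scores:", scores)
print("labels:", labels)
print("has predict_proba:", hasattr(clf.named_steps["linear_svc"], "predict_proba"))
```

The magnitude of a score grows with the distance from the decision boundary, but it is not a probability.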

##### Important note below

Instead of using the LinearSVC class, we could use the SVC class with a linear kernel.
When creating the SVC model, we would write SVC(kernel="linear", C=1). Or we
could use the SGDClassifier class, with SGDClassifier(loss="hinge", alpha=1/(m*C)).
This applies regular Stochastic Gradient Descent (see Chapter 4) to train a
linear SVM classifier. It does not converge as fast as the LinearSVC class, but it can be
useful to handle online classification tasks or huge datasets that do not fit in memory
(out-of-core training).
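
A minimal sketch of the two alternatives, fitted on the same iris data. Here m is taken to be the number of training instances, and `random_state=42` is an assumption added only to make the SGD run repeatable.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_iris = iris["data"][:, (2, 3)]
y_iris = (iris["target"] == 2).astype(np.float64)

C = 1
m = len(X_iris)  # number of training instances

svc_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="linear", C=C)),
])
sgd_clf = Pipeline([
    ("scaler", StandardScaler()),
    # alpha is the regularization strength, the counterpart of 1 / (m * C)
    ("sgd", SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42)),
])

svc_acc = svc_clf.fit(X_iris, y_iris).score(X_iris, y_iris)
sgd_acc = sgd_clf.fit(X_iris, y_iris).score(X_iris, y_iris)

print("SVC(kernel='linear') training accuracy:", svc_acc)
print("SGDClassifier training accuracy:       ", sgd_acc)
```

The two models will not be identical, since SGD only approximates the optimum, but on this small dataset their training accuracy should be close.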

## Nonlinear SVM Classification

Although linear SVM classifiers are efficient and work surprisingly well in many
cases, many datasets are not even close to being linearly separable. One approach to
handling nonlinear datasets is to add more features, such as polynomial features.
To implement this idea using Scikit-Learn, create a Pipeline containing a
PolynomialFeatures transformer followed by a StandardScaler and a LinearSVC.
Let’s test this on the moons dataset: this is a toy dataset for binary classification
in which the data points are shaped as two interleaving half circles.

In [6]:
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

In [7]:
X, y = make_moons(n_samples=100, noise=0.15)

In [8]:
X

array([[ 0.40044917,  1.1047466 ],
       [-0.52934994,  0.86595106],
       [ 0.59530747,  0.7543267 ],
       [ 1.75279633, -0.32396073],
       [-0.5123682 ,  0.32378545],
       [ 1.8596307 ,  0.49102431],
       [ 0.86479901, -0.44444734],
       [ 0.28450651, -0.13149847],
       [ 2.02305335,  0.38492183],
       [ 0.93225242,  0.07388434],
       [ 0.3330345 ,  0.89119604],
       [ 0.00431553,  0.54080641],
       [-0.19880315,  0.91224413],
       [ 1.45329275, -0.25602455],
       [ 0.34540817, -0.30281142],
       [-0.67752732,  0.64495056],
       [ 2.10805706, -0.01401647],
       [ 1.8016006 ,  0.28258508],
       [ 0.13510703,  0.86778895],
       [ 0.91172883,  0.52995625],
       [-0.1543749 ,  0.70960688],
       [ 0.57348857, -0.25206275],
       [-0.05727664,  0.31329396],
       [ 0.74585247,  0.6013514 ],
       [ 0.90637248,  0.33331217],
       [-1.26888146,  0.18205431],
       [ 0.08759532,  0.79568126],
       [ 0.75309254, -0.64034006],
       [ 0.74024036,

In [9]:
y

array([0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
       1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,
       1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0], dtype=int64)

In [10]:
polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge")),
])

In [11]:
polynomial_svm_clf.fit(X, y)



## Polynomial Kernel

#### Important point below

Adding polynomial features is simple to implement and can work great with all sorts
of Machine Learning algorithms (not just SVMs). That said, at a low polynomial
degree, this method cannot deal with very complex datasets, and with a high
polynomial degree it creates a huge number of features, making the model too slow.
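
To get a feel for how quickly the feature count grows, here is a small sketch. With n input features and degree d, `PolynomialFeatures` (bias term included) produces C(n + d, d) output features. The helper name `n_poly_features` is hypothetical.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def n_poly_features(n_features, degree):
    """Number of monomials of total degree <= degree, including the bias term."""
    return comb(n_features + degree, degree)

# Cross-check the formula against scikit-learn on dummy data (2 features, degree 3).
poly = PolynomialFeatures(degree=3).fit(np.zeros((1, 2)))
print("scikit-learn:", poly.n_output_features_, "formula:", n_poly_features(2, 3))

# The count explodes as the number of features and the degree increase.
for n, d in [(2, 3), (10, 3), (100, 3), (100, 5)]:
    print(f"n={n:3d}, degree={d}: {n_poly_features(n, d):,} features")
```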

#### Important point below

Fortunately, when using SVMs you can apply an almost miraculous mathematical
 technique called the kernel trick (explained in a moment). The kernel trick makes it
 possible to get the same result as if you had added many polynomial features, even
 with very high-degree polynomials, without actually having to add them. So there is
 no combinatorial explosion of the number of features because you don’t actually add
 any features. This trick is implemented by the SVC class. Let’s test it on the moons
 dataset:

In [12]:
from sklearn.svm import SVC

In [13]:
poly_kernel_svm_clf = Pipeline([
 ("scaler", StandardScaler()),
 ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
 ])

In [14]:
poly_kernel_svm_clf.fit(X, y)

The hyperparameter coef0 controls how much the model is influenced by high-degree
polynomials versus low-degree polynomials.
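
For reference, scikit-learn documents the polynomial kernel as K(a, b) = (gamma · aᵀb + coef0)^degree. The sketch below computes it by hand and compares it with `sklearn.metrics.pairwise.polynomial_kernel`. The random data and the values of `gamma` and `coef0` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))   # 4 instances, 2 features
B = rng.normal(size=(3, 2))   # 3 instances, 2 features
gamma, degree = 0.5, 3

def poly_kernel(A, B, gamma, coef0, degree):
    """Polynomial kernel between every row of A and every row of B."""
    return (gamma * (A @ B.T) + coef0) ** degree

K_manual = poly_kernel(A, B, gamma, coef0=1, degree=degree)
K_sklearn = polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=1)

print("shapes:", K_manual.shape, K_sklearn.shape)
print("match: ", np.allclose(K_manual, K_sklearn))
```

Expanding the power with the binomial theorem shows where coef0 enters: it multiplies every term of degree lower than `degree`, so changing it shifts the relative weight of the low-degree terms.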

## Similarity Features

Another technique to tackle nonlinear problems is to add features computed using a
similarity function, which measures how much each instance resembles a particular
landmark.

You may wonder how to select the landmarks. The simplest approach is to create a
landmark at the location of each and every instance in the dataset. Doing that creates
many dimensions and thus increases the chances that the transformed training set
will be linearly separable. The downside is that a training set with m instances and n
features gets transformed into a training set with m instances and m features
(assuming you drop the original features). If your training set is very large, you end up with
an equally large number of features.
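
The m × n to m × m transformation can be sketched as follows. The notes do not fix a particular similarity function at this point, so the Gaussian RBF exp(-gamma · ‖x − landmark‖²) and the value `gamma=0.3` are assumptions, and `rbf_similarity_features` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_moons

def rbf_similarity_features(X, landmarks, gamma):
    """Return one similarity feature per landmark for every instance in X."""
    # Squared Euclidean distance between every instance and every landmark.
    diffs = X[:, np.newaxis, :] - landmarks[np.newaxis, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

X_moons, y_moons = make_moons(n_samples=100, noise=0.15, random_state=42)

# Use every training instance as a landmark and drop the original features.
X_sim = rbf_similarity_features(X_moons, landmarks=X_moons, gamma=0.3)

print(X_moons.shape, "->", X_sim.shape)
```

Each instance has similarity 1 to its own landmark, and values decay toward 0 as the distance grows.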

## Gaussian RBF Kernel

Just like the polynomial features method, the similarity features method can be useful
 with any Machine Learning algorithm, but it may be computationally expensive to
 compute all the additional features, especially on large training sets. Once again the
 kernel trick does its SVM magic, making it possible to obtain a similar result as if you
 had added many similarity features. Let’s try the SVC class with the Gaussian RBF
 kernel:

In [15]:
rbf_kernel_svm_clf = Pipeline([
 ("scaler", StandardScaler()),
 ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
 ])

In [16]:
rbf_kernel_svm_clf.fit(X, y)

With so many kernels to choose from, how can you decide which
one to use? As a rule of thumb, you should always try the linear
kernel first (remember that LinearSVC is much faster than
SVC(kernel="linear")), especially if the training set is very large or if it
has plenty of features. If the training set is not too large, you should
also try the Gaussian RBF kernel; it works well in most cases. Then
if you have spare time and computing power, you can experiment
with a few other kernels, using cross-validation and grid search.
You’d want to experiment like that especially if there are kernels
specialized for your training set’s data structure.
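
One possible way to run such an experiment is `GridSearchCV` over a pipeline. The grid values below are arbitrary starting points rather than recommendations, and `random_state=42` is an assumption added so the moons data is reproducible.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_moons, y_moons = make_moons(n_samples=100, noise=0.15, random_state=42)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC()),
])

# One sub-grid per kernel, so each kernel is only tuned over its own hyperparameters.
param_grid = [
    {"svm_clf__kernel": ["linear"], "svm_clf__C": [0.1, 1, 10]},
    {"svm_clf__kernel": ["rbf"], "svm_clf__C": [0.1, 1, 10],
     "svm_clf__gamma": [0.1, 1, 5]},
    {"svm_clf__kernel": ["poly"], "svm_clf__C": [0.1, 1, 10],
     "svm_clf__degree": [2, 3], "svm_clf__coef0": [0, 1]},
]

grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_moons, y_moons)

print("best parameters:", grid_search.best_params_)
print("best CV accuracy:", grid_search.best_score_)
```

The `svm_clf__` prefix routes each parameter to the pipeline step of that name.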

## Computational Complexity

### Linear SVC

The LinearSVC class is based on the liblinear library, which implements an
optimized algorithm for linear SVMs. It does not support the kernel trick, but it scales
almost linearly with the number of training instances and the number of features. Its
training time complexity is roughly O(m × n).

### SVC(kernel = 'linear')

The SVC class is based on the libsvm library, which implements an algorithm that
supports the kernel trick. The training time complexity is usually between O(m² × n)
and O(m³ × n). Unfortunately, this means that it gets dreadfully slow when the
number of training instances gets large (e.g., hundreds of thousands of instances).

## SVM Regression

As mentioned earlier, the SVM algorithm is versatile: not only does it support linear
 and nonlinear classification, but it also supports linear and nonlinear regression. To
 use SVMs for regression instead of classification, the trick is to reverse the objective:
 instead of trying to fit the largest possible street between two classes while limiting
 margin violations, SVM Regression tries to fit as many instances as possible on the
 street while limiting margin violations (i.e., instances off the street).

### SVM regression hyperparameter

The width of the street is controlled by a hyperparameter called epsilon (ϵ): a large ϵ
gives a wide street (large margin), and a small ϵ gives a narrow street (small margin).

### Important point below

Adding more training instances within the margin does not affect the model’s
predictions; thus, the model is said to be ϵ-insensitive.
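
The ϵ-insensitive idea can be written as a loss function. The notes do not spell the formula out, so the sketch below assumes the standard form max(0, |y − ŷ| − ϵ), and the sample values are made up for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones cost the excess."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.2, 2.0, 4.5, 1.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(losses)  # the first two instances lie inside the street, so their loss is 0
```

Instances inside the street contribute zero loss, which is why adding more of them leaves the fitted model unchanged.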

You can use Scikit-Learn’s LinearSVR class to perform linear SVM Regression.

In [17]:
from sklearn.svm import LinearSVR

In [18]:
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X, y)



In [19]:
from sklearn.svm import SVR

In [20]:
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)

### Important point below

The SVR class is the regression equivalent of the SVC class, and the LinearSVR class is
 the regression equivalent of the LinearSVC class. The LinearSVR class scales linearly
 with the size of the training set (just like the LinearSVC class), while the SVR class gets
 much too slow when the training set grows large (just like the SVC class).

## Under the Hood