# Support Vector Machines (SVM)
An SVM is a powerful and versatile machine learning model, capable of performing linear or nonlinear classification, regression, and outlier detection.

## Linear SVM Classification
__Linearly separable__: The two classes can be separated cleanly with a straight line.<br>
__Large margin classification__: An SVM classifier fits the widest possible street between the classes.<br>
__Support vectors__: The instances located on the edge of the street. They alone determine the decision boundary.

### Hard Margin Classification
If we strictly impose that all instances must be off the street and on the right side, then it is called hard margin classification.  

_Limitations_: 
- Only works with linearly separable data.   
- Sensitive to outliers (instances that deviate from the rest of the instances)
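
As a rough illustration of the hard margin idea, the sketch below fits a linear `SVC` on the two iris classes that are linearly separable (setosa and versicolor). Scikit-Learn has no dedicated hard margin mode, so a very large `C` (here `1e6`, an assumed stand-in) is used to leave almost no room for margin violations. The variable names are illustrative only.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()

# Keep only setosa (0) and versicolor (1), which are linearly separable
# on petal length and petal width.
mask = iris["target"] < 2
X_sep = iris["data"][mask][:, [2, 3]]
y_sep = iris["target"][mask]

# Assumption: a very large C approximates a hard margin.
hard_margin_clf = SVC(kernel="linear", C=1e6)
hard_margin_clf.fit(X_sep, y_sep)

# Only the instances on the edge of the street become support vectors.
n_support = len(hard_margin_clf.support_vectors_)
train_accuracy = hard_margin_clf.score(X_sep, y_sep)
print(n_support, train_accuracy)
```

Only a handful of the 100 instances end up as support vectors, which is why a single outlier near the boundary can shift a hard margin so much.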


### Soft Margin Classification
It overcomes the limitations of hard margin by finding a good balance between keeping the street as large as possible and limiting the margin violations (instances that end up in the middle of the street or on the wrong side).

__Hyperparameters__: settings chosen before training that control how the model is fit, rather than being learned from the data.<br> In Scikit-Learn's SVM classes, the main one is `C`. A lower `C` gives a wider street but more margin violations; a higher `C` gives fewer margin violations but a narrower street. If the model is overfitting, reducing `C` regularizes it.
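
To make the effect of the `C` hyperparameter concrete, here is a minimal sketch that counts margin violations on the iris data for a small and a large `C`. The helper `count_margin_violations` is hypothetical (not part of Scikit-Learn), and an instance is counted as a violation when its signed score is below 1, which is an assumption of this sketch.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X_iris = iris["data"][:, [2, 3]]            # petal length, petal width
y_iris = (iris["target"] == 2).astype(int)  # 1 for iris virginica, else 0

def count_margin_violations(C):
    """Hypothetical helper: fit a scaled LinearSVC and count training
    instances that lie inside the street or on the wrong side of it."""
    clf = make_pipeline(
        StandardScaler(),
        LinearSVC(C=C, loss="hinge", dual=True, random_state=42, max_iter=100_000),
    )
    clf.fit(X_iris, y_iris)
    t = 2 * y_iris - 1                       # map labels {0, 1} to {-1, +1}
    scores = clf.decision_function(X_iris)
    return int(np.sum(t * scores < 1))

violations_small_C = count_margin_violations(0.01)
violations_large_C = count_margin_violations(100)
print(violations_small_C, violations_large_C)
```

A smaller `C` tolerates more violations in exchange for a wider street, so the first count should be the larger of the two.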


In [5]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
x = iris["data"][:,(2,3)] # petal length, petal width
y = (iris["target"]==2).astype(np.float64) # iris virginica

svm_clf = Pipeline([
    ("scaler",StandardScaler()),
    ("linear_svc",LinearSVC(C=1,loss="hinge")),
])

svm_clf.fit(x,y)


Pipeline(steps=[('scaler', StandardScaler()),
                ('linear_svc', LinearSVC(C=1, loss='hinge'))])

In [6]:
svm_clf.predict([[5.5,1.7]])

array([1.])

In [20]:
scaler = StandardScaler()
svm_clf1 = LinearSVC(C=1, loss="hinge", random_state=42)
svm_clf2 = LinearSVC(C=100, loss="hinge", random_state=42)

scaled_svm_clf1 = Pipeline([
        ("scaler", scaler),
        ("linear_svc", svm_clf1),
    ])
scaled_svm_clf2 = Pipeline([
        ("scaler", scaler),
        ("linear_svc", svm_clf2),
    ])

scaled_svm_clf1.fit(x, y)  # `x` (lowercase) is the iris feature matrix defined above
scaled_svm_clf2.fit(x, y)

## Nonlinear SVM Classification
One way to handle a dataset that is not linearly separable is to add more features. For example, adding a second feature (such as $x^2$) to a simple one-feature dataset can turn it into a 2D dataset that is perfectly linearly separable.  
__SVM Kernel__: a transformation from a lower-dimensional space into a higher-dimensional one using a mathematical function (e.g. $x^2$), so that the data becomes linearly separable.
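
The idea can be checked on a tiny made-up dataset. In the sketch below, nine points on a line are labelled 1 in the middle and 0 at both ends, so no single threshold separates them; adding the squared value as a second feature makes them separable. The data and names are illustrative assumptions, not taken from the cells in this notebook.

```python
import numpy as np
from sklearn.svm import SVC

# One feature: the middle points are class 1, both ends are class 0.
x1 = np.linspace(-4, 4, 9).reshape(-1, 1)
y_1d = np.array([0, 0, 1, 1, 1, 1, 1, 0, 0])

# Add a second feature x1^2; the classes are now split by a horizontal line.
X_2d = np.c_[x1, x1 ** 2]

clf_1d = SVC(kernel="linear", C=100).fit(x1, y_1d)
clf_2d = SVC(kernel="linear", C=100).fit(X_2d, y_1d)

acc_1d = clf_1d.score(x1, y_1d)
acc_2d = clf_2d.score(X_2d, y_1d)
print(acc_1d, acc_2d)
```

The linear classifier cannot reach perfect training accuracy on the 1D data, but it does on the 2D version.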

In [10]:
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

x,y = make_moons(n_samples=100,noise=0.15)
polynomial_svm_clf = Pipeline([
    ("poly_features",PolynomialFeatures(degree=3)),
    ("scaler",StandardScaler()),
    ("svm_clf",LinearSVC(C=10,loss="hinge"))
])
polynomial_svm_clf.fit(x,y)

Pipeline(steps=[('poly_features', PolynomialFeatures(degree=3)),
                ('scaler', StandardScaler()),
                ('svm_clf', LinearSVC(C=10, loss='hinge'))])

### Polynomial Kernel
A low polynomial degree cannot deal with very complex datasets, while a high polynomial degree creates a huge number of features, making the model too slow. With SVMs, however, we can apply a mathematical technique called the __kernel trick__, which makes it possible to get the same result as if we had added many polynomial features, even with high-degree polynomials, without actually having to add them.
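
A small worked example may help show why the trick works. For 2D vectors, the second-degree polynomial kernel $(a \cdot b)^2$ gives the same number as first mapping both vectors with $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ and then taking the dot product. The functions `phi` and `poly2_kernel` below are hypothetical helpers written for this sketch.

```python
import numpy as np

def phi(v):
    """Explicit 2nd-degree polynomial mapping of a 2D vector into 3D."""
    v1, v2 = v
    return np.array([v1 ** 2, np.sqrt(2) * v1 * v2, v2 ** 2])

def poly2_kernel(a, b):
    """2nd-degree polynomial kernel, computed in the original 2D space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

explicit = np.dot(phi(a), phi(b))  # map to 3D first, then take the dot product
via_kernel = poly2_kernel(a, b)    # never leaves 2D
print(explicit, via_kernel)
```

Both routes give 16 here, but the kernel version never builds the higher-dimensional vectors, which is what keeps high-degree polynomial kernels affordable.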

In [13]:
from sklearn.svm import SVC

poly_kernel_svm_clf = Pipeline([
    ("scaler",StandardScaler()),
    ("svm_clf",SVC(kernel="poly",degree=3,coef0=1,C=5))
])
poly_kernel_svm_clf.fit(x,y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('svm_clf', SVC(C=5, coef0=1, kernel='poly'))])

### Similarity Features
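Another technique for nonlinear problems is to add features computed with a similarity function, which measures how much each instance resembles a particular landmark.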

<div class="alert alert-block alert-info"><b>Eqn 1: </b> Gaussian RBF<br>$\phi_\gamma(\mathbf{x}, \ell) = \exp\left(-\gamma \lVert \mathbf{x} - \ell \rVert^2\right)$</div>
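
As a quick numerical check of Eqn 1, the sketch below computes the similarity features of a single 1D instance with respect to two landmarks. The instance, the landmarks and the width parameter (written `gamma` in the code, set to 0.3) are made-up values for illustration, and `gaussian_rbf` is a hypothetical helper. The result is compared against Scikit-Learn's `rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_rbf(x, landmark, gamma):
    """Gaussian RBF similarity between each row of x and one landmark."""
    return np.exp(-gamma * np.linalg.norm(x - landmark, axis=1) ** 2)

X_1d = np.array([[-1.0]])               # one instance with one feature
landmarks = np.array([[-2.0], [1.0]])   # two illustrative landmarks
gamma = 0.3

# One new feature per landmark.
manual = np.c_[
    gaussian_rbf(X_1d, landmarks[0], gamma),
    gaussian_rbf(X_1d, landmarks[1], gamma),
]
library = rbf_kernel(X_1d, landmarks, gamma=gamma)
print(manual)  # approximately [[0.74, 0.30]]
```

The instance is closer to the first landmark (distance 1) than to the second (distance 2), so its first similarity feature is larger.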

### Gaussian RBF Kernel
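As with the polynomial kernel, the kernel trick makes it possible to get the same result as if many similarity features had been added, without actually computing them. The `gamma` hyperparameter acts like a regularization hyperparameter: increasing it narrows the bell-shaped curve, so each instance's range of influence shrinks and the decision boundary becomes more irregular. If the model is overfitting, reduce `gamma`; if it is underfitting, increase it (the same applies to `C`).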

In [14]:
rbf_kernel_svm_clf = Pipeline([
    ("scaler",StandardScaler()),
    ("svm_clf",SVC(kernel="rbf",gamma=5,C=0.001))
])
rbf_kernel_svm_clf.fit(x,y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('svm_clf', SVC(C=0.001, gamma=5))])

### Computational Complexity

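|Class|Time complexity|Out-of-core support|Scaling required|Kernel trick|
|-----|------|------|-----|-----|
|LinearSVC|$O(m \times n)$|No|Yes|No|
|SGDClassifier|$O(m \times n)$|Yes|Yes|No|
|SVC|$O(m^2 \times n)$ to $O(m^3 \times n)$|No|Yes|Yes|

Here $m$ is the number of training instances and $n$ is the number of features.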


In the LinearSVC class, the algorithm takes longer if we require high precision. This is controlled by the tolerance hyperparameter ε (called `tol` in Scikit-Learn).

In the SVC class, the given time complexity means training gets dreadfully slow when the number of training instances gets large.
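
For datasets too large for `SVC`, the `SGDClassifier` row of the table suggests an alternative. The sketch below trains a linear SVM (hinge loss) in mini-batches with `partial_fit`, which is what makes out-of-core learning possible. The iris data stands in for a large dataset, the features are scaled up front for brevity, and setting `alpha = 1 / (m * C)` is a commonly used heuristic that is assumed here, not a requirement of the API.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_all = StandardScaler().fit_transform(iris["data"][:, [2, 3]])
y_all = (iris["target"] == 2).astype(int)

m = len(X_all)
C = 1.0  # illustrative value
sgd_clf = SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42)

rng = np.random.default_rng(42)
batch_size = 30
classes = np.array([0, 1])  # partial_fit must be told all classes up front

for epoch in range(20):
    order = rng.permutation(m)  # reshuffle the instances each epoch
    for start in range(0, m, batch_size):
        batch = order[start:start + batch_size]
        sgd_clf.partial_fit(X_all[batch], y_all[batch], classes=classes)

sgd_accuracy = sgd_clf.score(X_all, y_all)
print(sgd_accuracy)
```

In a real out-of-core setting each batch would be read from disk instead of sliced from an in-memory array, and the scaler would also need to be fit incrementally.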


## SVM Regression
Its objective is the reverse of classification's. Instead of trying to fit the largest possible street between two classes while limiting margin violations, SVM Regression tries to fit as many instances as possible on the street while limiting margin violations (i.e. instances off the street). The width of the street is controlled by the hyperparameter ε.  
Adding more training instances within the margin does not affect the model’s predictions; thus, the model is said to be _ε-insensitive_.
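
The role of ε can be sketched on synthetic data. Below, noisy linear data is generated (an assumption of this example, not data from this notebook), and a hypothetical helper `count_off_street` counts how many training instances fall off the street for a narrow and a wide ε.

```python
import numpy as np
from sklearn.svm import LinearSVR

# Synthetic data: y = 4 + 3x plus Gaussian noise.
rng = np.random.default_rng(42)
m = 50
X_lin = 2 * rng.random((m, 1))
y_lin = 4 + 3 * X_lin[:, 0] + rng.standard_normal(m)

def count_off_street(epsilon):
    """Hypothetical helper: fit a LinearSVR and count training instances
    whose absolute error is at least epsilon (i.e. off the street)."""
    reg = LinearSVR(epsilon=epsilon, dual=True, random_state=42, max_iter=100_000)
    reg.fit(X_lin, y_lin)
    residuals = np.abs(y_lin - reg.predict(X_lin))
    return int(np.sum(residuals >= epsilon))

off_narrow = count_off_street(0.5)
off_wide = count_off_street(1.5)
print(off_narrow, off_wide)
```

A wider street (larger ε) should leave fewer instances off the street, and only those instances contribute to the loss.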

In [15]:
from sklearn.svm import LinearSVR

svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(x,y)

LinearSVR(epsilon=1.5)

In [18]:
from sklearn.svm import SVR

svm_poly_reg = SVR(kernel="poly",degree=2,C=100,epsilon=0.1)
svm_poly_reg.fit(x,y)

SVR(C=100, degree=2, kernel='poly')