### SVM
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on one side.

<img src = 'https://cdn-images-1.medium.com/max/800/1*BpeH5_M58kJ5xXfwzxI8yA.png'/>

Support vectors are the training observations that lie closest to the separating hyperplane; they alone determine where it sits. A Support Vector Machine finds the frontier (hyperplane or line) that best segregates the two classes.
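To make the idea of support vectors concrete, here is a minimal sketch using scikit-learn's `SVC` with a linear kernel. The toy points `X_toy` and labels `y_toy` are invented for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters in 2D.
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin classifier on separable data.
clf = SVC(kernel="linear", C=1e3).fit(X_toy, y_toy)

# Only the points nearest the boundary are kept as support vectors.
print("support vectors:\n", clf.support_vectors_)

# For a linear kernel the separating hyperplane is w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width =", 2.0 / np.linalg.norm(w))
```

Points far from the boundary could be removed from `X_toy` without changing the fitted hyperplane, which is why only the support vectors matter.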



### Large Margin Classifier

<img src = 'SVM_21.png' />

<img src ='http://www.liuhaihua.cn/wp-content/uploads/2017/02/7nuAZvb.jpg' />

### Soft Margin Classifier
- When the classes cannot be separated perfectly, the constraint of maximizing the margin of the line that separates the classes must be relaxed. This is often called the soft margin classifier. This change allows some points in the training data to violate the separating line.

<img src ='http://www.jydata.top/ueditor/jsp/upload/image/20171222/1513902951824065263.jpg'/>
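The relaxation described above is commonly written as the following optimization problem. This is the standard textbook formulation rather than something taken from the original notebook; the symbols are assumptions introduced here: $w$ and $b$ define the hyperplane, $y_i \in \{-1, +1\}$ are the labels, $\xi_i$ are slack variables, and $C$ is the penalty on violations.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n
```

A point with $\xi_i = 0$ respects the margin, $0 < \xi_i \le 1$ means it lies inside the margin but on the correct side, and $\xi_i > 1$ means it is misclassified.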

### Kernels

<img src = 'https://static.commonlounge.com/fp/original/xt5HXYtcYI1pkmlXtSsYBy9gG1520492872_kc'/>

SVM can solve this problem easily by introducing an additional feature. Here, we add a new feature z = x^2 + y^2 and plot the data points on the x and z axes.

<img src = 'SVM_9.png'/>
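The effect of the extra feature can be checked with a short sketch. The data here is an assumed stand-in generated with scikit-learn's `make_circles` (one class inside a ring of the other), not the data behind the figures.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed stand-in data: an inner circle surrounded by an outer ring.
X_circ, y_circ = make_circles(n_samples=200, factor=0.3, noise=0.05,
                              random_state=0)

# A linear SVM on the raw (x, y) coordinates cannot separate the rings.
raw_acc = SVC(kernel="linear").fit(X_circ, y_circ).score(X_circ, y_circ)

# Add the new feature z = x^2 + y^2 and keep the (x, z) axes as in the text.
z = X_circ[:, 0] ** 2 + X_circ[:, 1] ** 2
X_xz = np.column_stack([X_circ[:, 0], z])
xz_acc = SVC(kernel="linear").fit(X_xz, y_circ).score(X_xz, y_circ)

print("linear SVM on (x, y):", raw_acc)
print("linear SVM on (x, z):", xz_acc)
```

Because z is the squared distance from the origin, the inner class has small z and the outer class has large z, so a horizontal line in the (x, z) plane is enough to split them.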

In this transformed space it is easy to place a linear hyperplane between the two classes. But another question arises: do we need to add such a feature manually to get a hyperplane? No, SVM has a technique called the kernel trick. Kernels are functions that take a low-dimensional input space and implicitly transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable one. They are mostly useful in non-linear separation problems. Simply put, a kernel performs a complex data transformation, and the SVM then works out how to separate the data based on the labels or outputs you have defined.
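A small numerical check illustrates why this is called a trick: the kernel value computed in the original space equals an inner product in a higher-dimensional space that is never built explicitly. The helper names `phi` and `poly2_kernel` below are illustrative, and the example assumes the homogeneous degree-2 polynomial kernel on 2D inputs.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D vector (illustrative helper)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))  # inner product after mapping to 3D
implicit = poly2_kernel(a, b)      # same value without any mapping
print(explicit, implicit)
```

For kernels such as the Gaussian one, the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.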

### Types of kernels

### Polynomial kernel
It is popular in image processing.

The equation is:

<img src = 'https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/polynomial-kernel.png'/>

where d is the degree of the polynomial.
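As a rough sketch, a polynomial kernel can be computed by hand and compared with scikit-learn's `polynomial_kernel`. The parameterization `(gamma * <a, b> + coef0) ** d` is the one scikit-learn uses and is assumed here; it may differ in its constants from the equation in the image. `poly_kernel` and `X_demo` are illustrative names.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def poly_kernel(a, b, d=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel in scikit-learn's parameterization (assumed form)."""
    return (gamma * np.dot(a, b) + coef0) ** d

X_demo = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])

# Kernel (Gram) matrix computed pair by pair.
K_manual = np.array([[poly_kernel(a, b) for b in X_demo] for a in X_demo])

# The same matrix from scikit-learn.
K_sklearn = polynomial_kernel(X_demo, degree=3, gamma=1.0, coef0=1.0)
print(np.allclose(K_manual, K_sklearn))
```

In `SVC(kernel='poly')`, the `degree`, `gamma`, and `coef0` arguments play the same roles.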

### Gaussian kernel
It is a general-purpose kernel, used when there is no prior knowledge about the data. The equation is:

<img src = 'https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/gaussian-kernel.png'/>

### Gaussian radial basis function (RBF)
It is a general-purpose kernel, used when there is no prior knowledge about the data.

The equation is:

<img src = 'https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/gaussian-radial-basis-function-RBF.png'/>
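The Gaussian kernel and the RBF kernel are usually two parameterizations of the same function: one written with a width `sigma`, the other with `gamma = 1 / (2 * sigma**2)`. The sketch below assumes those common forms, which may not match the images exactly, and checks them against scikit-learn's `rbf_kernel`. `gaussian_kernel` and `X_pts` are illustrative names.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(a, b, sigma=1.0):
    """exp(-||a - b||^2 / (2 * sigma^2)), the assumed Gaussian form."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

sigma = 0.8
gamma = 1.0 / (2.0 * sigma ** 2)  # equivalent RBF parameter

X_pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])

K_manual = np.array([[gaussian_kernel(a, b, sigma) for b in X_pts]
                     for a in X_pts])
K_sklearn = rbf_kernel(X_pts, gamma=gamma)  # exp(-gamma * ||a - b||^2)
print(np.allclose(K_manual, K_sklearn))
```

A larger `gamma` (smaller `sigma`) makes the kernel value fall off faster with distance, so each training point influences a smaller neighbourhood.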

### Sigmoid kernel
It can be used as a proxy for neural networks. The equation is:

<img src = 'https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/sigmoid-kernel.png'/>
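The link to neural networks comes from the `tanh` function, a classic neuron activation. A minimal sketch, assuming scikit-learn's form `tanh(gamma * <a, b> + coef0)`, which may differ in its constants from the image; `sigmoid_k` and `X_sig` are illustrative names.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

def sigmoid_k(a, b, gamma=0.5, coef0=1.0):
    """tanh(gamma * <a, b> + coef0), the assumed sigmoid kernel form."""
    return np.tanh(gamma * np.dot(a, b) + coef0)

X_sig = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

K_manual = np.array([[sigmoid_k(a, b) for b in X_sig] for a in X_sig])
K_sklearn = sigmoid_kernel(X_sig, gamma=0.5, coef0=1.0)
print(np.allclose(K_manual, K_sklearn))
```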

In [19]:
# imports for the SVM examples on the iris dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn import datasets
from sklearn import metrics

In [20]:
iris = datasets.load_iris()
x = iris.data
y = iris.target
x.shape

(150, 4)

In [5]:
model = SVC(C=10.0)  # kernel defaults to 'rbf'; C is the penalty for margin violations

In [6]:
model.fit(x,y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
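One way to see this trade-off is to count support vectors as C changes: a wide margin (small C) leaves many points on or inside it, while a narrow margin (large C) leaves few. The sketch below is an illustration on the iris data with a linear kernel. The C values and the names `X_iris`, `y_iris`, and `sv_counts` are chosen here and do not come from the notebook.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X_iris, y_iris = iris.data, iris.target

sv_counts = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X_iris, y_iris)
    # n_support_ holds the number of support vectors per class.
    sv_counts[C] = int(clf.n_support_.sum())
    print("C =", C,
          "| support vectors:", sv_counts[C],
          "| training accuracy:", clf.score(X_iris, y_iris))
```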

In [9]:
pred = model.predict(x)  # predictions on the same data used for fitting

In [13]:
print(metrics.classification_report(y,pred))


             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       1.00      0.96      0.98        50
          2       0.96      1.00      0.98        50

avg / total       0.99      0.99      0.99       150



In [14]:
metrics.accuracy_score(y,pred)


0.98666666666666669

In [15]:
model2 = SVC(kernel = 'poly',degree=5)

In [16]:
model2.fit(x,y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [17]:
pred2 = model2.predict(x)

In [18]:
metrics.accuracy_score(y,pred2)

0.97999999999999998
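The accuracies above are measured on the data the models were fitted on, so they may overstate performance on unseen examples. A hedged sketch of a held-out evaluation follows. The 70/30 split, the `random_state`, and the name `held_out_model` are choices made here, not part of the original notebook.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# Hold out 30% of the samples, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3,
    stratify=iris.target, random_state=0)

held_out_model = SVC(C=10.0).fit(X_train, y_train)

# Accuracy on samples the model never saw during fitting.
test_acc = metrics.accuracy_score(y_test, held_out_model.predict(X_test))
print("held-out accuracy:", test_acc)
```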