# SVM: Support Vector Machine

A support vector machine (SVM) selects a hyperplane, a boundary that splits the input variable space, to best separate the points by their class, either class 0 or class 1. In two dimensions, you can visualize this as a line. The distance between the hyperplane and the closest data points is referred to as the **margin**. The best or optimal hyperplane is the one with the largest margin, and an optimization algorithm is used to find the coefficient values that maximize it. Only the closest points, called the **support vectors**, are relevant in defining the hyperplane and in the construction of the classifier.
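As a minimal sketch of these ideas, the toy example below fits a linear SVM to four hand-picked points and reads back the support vectors and the margin. The data and the large `C` value are assumptions chosen for illustration. For a linear kernel, the distance from the hyperplane to the closest points is `1 / ||w||`, where `w` is the learned coefficient vector.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (assumed for illustration only).
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])

# A large C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]                   # hyperplane coefficients (linear kernel only)
margin = 1.0 / np.linalg.norm(w)   # distance from the hyperplane to the closest points

print("support vectors:\n", clf.support_vectors_)
print("margin:", round(float(margin), 3))
```

Only the two points closest to the boundary, (1, 1) and (3, 3), should come back as support vectors. Moving the outer points (0, 0) or (4, 4) further away would not change the fitted hyperplane.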

Pros | Cons
:----|:----
Effective in high-dimensional spaces | Does not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
Works well when there is a clear margin of separation | Can overfit when the kernel and regularization parameters are poorly chosen
Effective when the number of dimensions is greater than the number of samples | Scales poorly to large data sets because training time grows quickly with the number of samples
Memory efficient (only the support vectors are used in the decision function) | Performs poorly on very noisy data sets, especially where the target classes overlap
Powerful out-of-the-box classifier |

## Parameters

`sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)`

Parameters with the highest impact on model performance: `kernel`, `gamma`, and `C`.
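One common way to explore these three parameters is a small grid search. The sketch below uses `GridSearchCV` with scikit-learn's bundled copy of the iris data (`load_iris`, chosen so the snippet is self-contained). The grid values are arbitrary assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values are assumptions chosen for illustration.
# gamma is ignored by the linear kernel, so some combinations are redundant.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1, 1.0],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))
```

Roughly speaking, a larger `C` penalizes misclassified training points more heavily, and a larger `gamma` makes each training point's influence more local, so both push toward a tighter fit of the training data.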

## Kernels: a way to transform features

Instead of being restricted to the original coordinate system, such as `<x0, …, xn>`, we can transform our data into a new coordinate system in which the problem is easier to solve.

Consider a plot of two circles of points, one inside the other, and try to draw a straight line that separates the two circles.

![svm_image1.png](svm_image1.png)

These are ordinary concentric circles, so there doesn't appear to be any straight line that could separate them. That is true in a 2D Cartesian coordinate system, but if you project the points into a 3D Cartesian coordinate system, `<x, y> → <x², √2·xy, y²>`, the two classes turn out to be linearly separable.

Now you can see that the two circles are separated, and you can easily draw a plane between them. If you map that plane back to the original 2D space, it becomes a third circle lying between the two original circles, which acts as the decision boundary.

![svm_image2.png](svm_image2.png)
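The projection can be checked numerically. The sketch below places points on two concentric circles (radii 1 and 3 are assumptions, since the figure does not state them), applies the mapping, and tests them against the plane `z1 + z3 = 4` in the new space. It also checks that the dot product of two mapped points equals the squared dot product of the original points, which is why a kernel can be evaluated without building the 3D coordinates explicitly.

```python
import math

def feature_map(x, y):
    """Map a 2D point to 3D: (x, y) -> (x^2, sqrt(2)*x*y, y^2)."""
    return (x * x, math.sqrt(2) * x * y, y * y)

def circle(radius, n=12):
    """Return n evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = circle(1.0)   # assumed radius for the inner class
outer = circle(3.0)   # assumed radius for the outer class

def plane_side(point, threshold=4.0):
    """Signed side of the plane z1 + z3 = threshold in the mapped space."""
    z1, _, z3 = feature_map(*point)
    return z1 + z3 - threshold

inner_separated = all(plane_side(p) < 0 for p in inner)
outer_separated = all(plane_side(p) > 0 for p in outer)
print(inner_separated, outer_separated)

# Dot product in the mapped space vs. squared dot product in the original space.
a, b = (1.0, 2.0), (3.0, -1.0)
explicit = sum(u * v for u, v in zip(feature_map(*a), feature_map(*b)))
kernel = (a[0] * b[0] + a[1] * b[1]) ** 2
print(math.isclose(explicit, kernel))
```

Because `z1 + z3` equals `x² + y²`, the separating plane corresponds to the circle of radius 2 in the original coordinates.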

As a side note, there are many different types of projections (or kernels) such as:

- Polynomial kernels (homogeneous and inhomogeneous)
- Radial basis function (RBF) kernels
- Gaussian kernels (the most widely used RBF kernel)
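For reference, the kernels listed above can be written as short functions. This is a sketch with hypothetical helper names (`poly_kernel`, `gaussian_rbf_kernel`) and assumed default values. scikit-learn's own polynomial kernel also scales the dot product by `gamma`, which is left out here for simplicity.

```python
import numpy as np

def poly_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel (a . b + coef0) ** degree.

    coef0 = 0 gives the homogeneous form; coef0 > 0 the inhomogeneous form.
    """
    return (np.dot(a, b) + coef0) ** degree

def gaussian_rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a) - np.asarray(b)
    return np.exp(-gamma * np.dot(diff, diff))

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

print(poly_kernel(a, b, coef0=0.0))   # homogeneous polynomial
print(poly_kernel(a, b, coef0=1.0))   # inhomogeneous polynomial
print(gaussian_rbf_kernel(a, a))      # identical points give the maximum value, 1.0
print(gaussian_rbf_kernel(a, b))      # decays toward 0 as points move apart
```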

Options available with `sklearn.svm.SVC`:

- `linear`, **`rbf`** (the default), `poly`, and others
- `rbf` and `poly` are useful when the classes need a non-linear decision boundary.
- The iris example below uses the default `rbf` kernel on all four features of the iris data set to classify species.
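To illustrate why `rbf` and `poly` help with non-linear boundaries, here is a minimal sketch that fits three kernels to synthetic concentric circles generated with `make_circles`. The data set, its parameters, and `degree=2` are assumptions for illustration, and accuracy is measured on the training data only.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic two-class data: a small circle inside a larger one (assumed settings).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    # degree only affects the poly kernel; the other kernels ignore it.
    model = SVC(kernel=kernel, degree=2)
    scores[kernel] = model.fit(X, y).score(X, y)

for kernel, accuracy in scores.items():
    print(f"{kernel:<6} training accuracy: {accuracy:.2f}")
```

On data like this, the linear kernel would be expected to score well below the `poly` and `rbf` kernels, since no straight line separates an inner circle from an outer one.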

In [1]:
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# ignore warnings
import warnings
warnings.filterwarnings("ignore")

In [2]:
df = sns.load_dataset('iris')
df.head()

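| | sepal_length | sepal_width | petal_length | petal_width | species |
|:----|:----|:----|:----|:----|:----|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |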


In [7]:
X = df.drop(['species'],axis=1)
y = df['species']  # 1-D Series; a single-column DataFrame triggers a DataConversionWarning in fit
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .30, random_state = 123)

## Build SVM Model

In [8]:
svm = SVC(probability = True, random_state = 123)
svm.fit(X_train, y_train)
y_pred = svm.predict(X_train)
y_pred_proba = svm.predict_proba(X_train)
y_pred_proba[0:10]

array([[0.01428155, 0.01439904, 0.97131941],
       [0.0163127 , 0.00985372, 0.97383358],
       [0.00757722, 0.95064107, 0.04178171],
       [0.95145722, 0.02451949, 0.02402328],
       [0.94195462, 0.02745419, 0.03059119],
       [0.01615358, 0.17168914, 0.81215728],
       [0.93045699, 0.04387476, 0.02566825],
       [0.94782817, 0.0276442 , 0.02452763],
       [0.01337596, 0.93357643, 0.05304762],
       [0.00980796, 0.94920915, 0.04098289]])
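The columns of the probability array above follow the order of the fitted model's `classes_` attribute. The sketch below shows this on scikit-learn's bundled iris data (`load_iris` is used here, with species names as labels, so the snippet is self-contained; the variable names are illustrative). Because the probabilities come from a separate cross-validated calibration step, the most probable class can occasionally differ from what `predict` returns.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]   # string labels, as in the seaborn data set

model = SVC(probability=True, random_state=123).fit(X, y)
proba = model.predict_proba(X[:5])

print(model.classes_)       # column order used by predict_proba
print(proba.sum(axis=1))    # each row sums to 1, up to floating-point error

# Label with the highest estimated probability in each row.
most_likely = model.classes_[np.argmax(proba, axis=1)]
print(most_likely)
```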

## Evaluate Model

In [9]:
print('Accuracy of SVM classifier on training set: {:.2f}'
      .format(svm.score(X_train, y_train)))

Accuracy of SVM classifier on training set: 0.98


In [10]:
print('Confusion Matrix\n',confusion_matrix(y_train, y_pred))

Confusion Matrix
 [[32  0  0]
 [ 0 39  1]
 [ 0  1 32]]


In [12]:
print(classification_report(y_train, y_pred))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        32
  versicolor       0.97      0.97      0.97        40
   virginica       0.97      0.97      0.97        33

    accuracy                           0.98       105
   macro avg       0.98      0.98      0.98       105
weighted avg       0.98      0.98      0.98       105
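The numbers in the classification report can be reproduced by hand from the confusion matrix printed earlier. The sketch below hard-codes that matrix (rows are true classes and columns are predicted classes, following scikit-learn's convention) and derives precision, recall, F1, and accuracy from it.

```python
import numpy as np

# Confusion matrix copied from the training-set output above.
cm = np.array([[32, 0, 0],
               [0, 39, 1],
               [0, 1, 32]])
labels = ["setosa", "versicolor", "virginica"]

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)   # column sums = predicted counts per class
recall = true_positives / cm.sum(axis=1)      # row sums = actual counts per class (support)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:<12} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.3f}")
```

The row sums (32, 40, 33) match the `support` column of the report, and 103 correct predictions out of 105 give the reported accuracy of 0.98.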



## Test Model

In [11]:
print('Accuracy of SVM classifier on test set: {:.2f}'
     .format(svm.score(X_test, y_test)))

Accuracy of SVM classifier on test set: 0.98


## Further Resources for SVM

- [SKLearn Example SVM](https://scikit-learn.org/stable/auto_examples/exercises/plot_iris_exercise.html#sphx-glr-auto-examples-exercises-plot-iris-exercise-py)
- [Support Vector Machines: A Visual Explanation with Sample Python Code](https://www.youtube.com/watch?v=N1vOgolbjSc)
- [Muffin vs. Cupcake SVM Classification example code](https://github.com/adashofdata/muffin-cupcake)
- [SVM Deep dive on turning decision boundaries into decision rules](https://www.youtube.com/watch?v=_PwhiWxHK8o)


## Exercises

Write a function to train, predict, and evaluate an SVM classifier. 
In your function, add a 'kernel' argument so that you can then run the function multiple times with different values for the kernel and compare results. 