<h1>Support Vector Machines</h1>

<p>Support vector machines (SVMs) can be used for both linear and non-linear classification. An SVM is a discriminative classifier: it finds a hyperplane that separates the data into categories.</p>

<p>The form of an SVM's decision boundary depends on the dimensionality N of the feature space:</p>
<ul>
  <li>1D: the boundary is a point</li>
  <li>2D: the boundary is a line</li>
  <li>3D: the boundary is a plane</li>
  <li>4D and higher: the boundary is a hyperplane</li>
</ul>
<p>In mathematical terms, each of these boundaries is a flat affine subspace of dimension N-1 in the N-dimensional feature space.</p>
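In N dimensions such a boundary can be written as the set of points x with w·x + b = 0, where w is a normal vector and b an offset (this notation is an assumption here, not taken from the text above). The minimal NumPy sketch below classifies points by the sign of w·x + b and measures their distance to the plane; the weights are hand-picked illustrative values, not learned ones.

```python
import numpy as np

# Hypothetical hyperplane in 3D (N = 3): the set w . x + b = 0 is a 2D plane.
w = np.array([1.0, -2.0, 0.5])  # normal vector, chosen by hand for illustration
b = -1.0                        # offset, also hand-picked

def side_of_hyperplane(points: np.ndarray) -> np.ndarray:
    """Return +1 or -1 per row depending on the side of the plane (ties -> +1)."""
    scores = points @ w + b
    return np.where(scores >= 0, 1, -1)

def distance_to_hyperplane(points: np.ndarray) -> np.ndarray:
    """Euclidean distance of each row to the plane: |w . x + b| / ||w||."""
    return np.abs(points @ w + b) / np.linalg.norm(w)

points = np.array([
    [3.0, 0.0, 0.0],  # score  3 - 1 =  2 -> +1 side
    [0.0, 1.0, 0.0],  # score -2 - 1 = -3 -> -1 side
    [1.0, 0.0, 0.0],  # score  1 - 1 =  0 -> lies on the plane
])
labels = side_of_hyperplane(points)
print(labels)                          # [ 1 -1  1]
print(distance_to_hyperplane(points))
```

An SVM learns w and b from data instead of fixing them by hand; the decision rule itself is just this sign test.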


<p>A hyperplane is an (N-1)-dimensional flat affine subspace of N-dimensional Euclidean space. It divides the space into two disconnected half-spaces, and a point is classified according to the side on which it falls. While several hyperplanes may correctly separate the points into distinct regions, the optimal hyperplane is the one that maximizes the margin (the distance between the hyperplane and the closest points of each category).

When the data are linearly separable, the hyperplane that lies equidistant from the closest points of the two categories, with the widest possible margin, defines the <strong>Maximal Margin Classifier</strong>. However, this classifier is very sensitive to outliers: a single unusual point lying close to the other category can drag the hyperplane toward it, or make perfect separation impossible. Instead, we allow some points to fall inside the margin or even on the wrong side of the hyperplane. Accepting a few training errors (more bias) in exchange for a classifier that depends less on individual points (less variance) is an instance of the bias-variance tradeoff, and a margin that permits such violations is called a <strong>Soft Margin</strong>.</p>
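To see the soft margin at work, here is a minimal scikit-learn sketch on made-up 2D data containing one outlier. In scikit-learn's SVC, the parameter C sets how heavily margin violations are penalized: a very large C approximates the maximal margin classifier, while a small C gives a softer, wider margin. For a linear kernel the fitted normal vector is exposed as coef_, and the full margin width is 2/||w||. The data values and the two C values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2D data: class 0 near (1, 1), class 1 near (4.5, 4.5),
# plus one class-0 "outlier" at (3.5, 3.5) sitting right next to class 1.
X = np.array([
    [1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.5, 3.5],  # class 0 (last row is the outlier)
    [4.0, 4.0], [5.0, 4.0], [4.0, 5.0],              # class 1
])
y = np.array([0, 0, 0, 0, 1, 1, 1])

def margin_width(C: float) -> float:
    """Fit a linear SVC with the given C and return the margin width 2 / ||w||."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    return 2.0 / np.linalg.norm(w)

# Large C: violations are expensive, so the margin squeezes between the outlier and class 1.
hard_ish = margin_width(C=1000.0)
# Small C: violations are cheap, so the margin widens and the outlier falls inside it.
soft = margin_width(C=0.01)

print(f"C=1000 margin width: {hard_ish:.3f}")
print(f"C=0.01 margin width: {soft:.3f}")
```

With C=1000 the margin can be no wider than the gap between the outlier and the nearest class-1 point, whereas with C=0.01 it is many times wider.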

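<p>How soft should the margin be, i.e. how many observations may fall inside it or on the wrong side? This is chosen with <strong>cross-validation</strong>: candidate soft margins are evaluated on held-out folds of the training data, and the one with the best held-out performance is kept. In scikit-learn, the softness is controlled by the regularization parameter C: a smaller C tolerates more violations and gives a wider margin. The resulting classifier is the <strong>Support Vector Classifier (SVC)</strong>.</p>

<p>However, suppose the data in an N-dimensional space cannot be split into two categories by any hyperplane, even with a soft margin (for example, when one category is surrounded by the other). This is where <strong>Support Vector Machines</strong> are needed. Conceptually, the data are first mapped into a higher-dimensional space in which the categories become separable by a hyperplane, and an SVC is then fit in that space. The SVM uses a kernel to define this transformation; in practice, the kernel computes the similarity between pairs of points as if they had been mapped, so the higher-dimensional coordinates never have to be computed explicitly.</p>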
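The idea of lifting data into a higher dimension can be illustrated with a minimal scikit-learn sketch on made-up 1D data in which one class sits between two groups of the other. No single threshold (the 1D "hyperplane") separates them, but after a hand-chosen mapping x → (x, x²) a straight line does. A kernel SVM achieves this kind of effect implicitly; the explicit map here is only an illustrative assumption.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 1D data: class 1 lies on both sides of class 0,
# so no single threshold on x can separate the two classes.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])

# Linear SVC in the original 1D space: at least three points must be misclassified.
flat = SVC(kernel="linear").fit(x.reshape(-1, 1), y)
acc_1d = flat.score(x.reshape(-1, 1), y)  # mean training accuracy

# Hypothetical explicit feature map phi(x) = (x, x^2) lifts the data into 2D,
# where the classes are separated by the horizontal line x^2 = constant.
lifted = np.column_stack([x, x ** 2])
lifted_clf = SVC(kernel="linear").fit(lifted, y)
acc_2d = lifted_clf.score(lifted, y)

print(f"training accuracy in 1D: {acc_1d:.2f}")
print(f"training accuracy after x -> (x, x^2): {acc_2d:.2f}")
```

The same linear SVC that fails in 1D separates the data perfectly once the extra x² coordinate is added.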

<h3>Implementation</h3>

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
```

<h3>Importing the Data</h3>

```python
# Read in the banknote authentication data
bankData = pd.read_csv('bill_authentication.csv')
```

<h3>Exploratory Data Analysis</h3>

```python
bankData.shape
```

```
(1372, 5)
```

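```python
# We can check what this data contains
bankData.head()
```

<table>
  <tr><th></th><th>Variance</th><th>Skewness</th><th>Curtosis</th><th>Entropy</th><th>Class</th></tr>
  <tr><th>0</th><td>3.6216</td><td>8.6661</td><td>-2.8073</td><td>-0.44699</td><td>0</td></tr>
  <tr><th>1</th><td>4.5459</td><td>8.1674</td><td>-2.4586</td><td>-1.4621</td><td>0</td></tr>
  <tr><th>2</th><td>3.866</td><td>-2.6383</td><td>1.9242</td><td>0.10645</td><td>0</td></tr>
  <tr><th>3</th><td>3.4566</td><td>9.5228</td><td>-4.0112</td><td>-3.5944</td><td>0</td></tr>
  <tr><th>4</th><td>0.32924</td><td>-4.4552</td><td>4.5718</td><td>-0.9888</td><td>0</td></tr>
</table>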


<h3>Data Preprocessing</h3>

The SVM algorithm takes numerical features (the first four columns: Variance, Skewness, Curtosis and Entropy) and predicts a class label (the "Class" column). We therefore separate the features (X) from the labels (y) before training.

```python
X = bankData.drop('Class', axis=1)
y = bankData['Class']
```

Scikit-learn's train_test_split shuffles the rows and holds out 20% of them (test_size = 0.20) as a test set; the model is trained on the remaining 80%.

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
```
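With test_size=0.20, scikit-learn rounds the test-set size up, so 20% of the 1372 rows gives 275 test rows and 1097 training rows; this matches the 275 samples counted in the confusion matrix under Results. The cell above sets no random_state, so each run draws a different split and may give slightly different results. The sketch below uses placeholder arrays of the same size; random_state and stratify are optional train_test_split parameters that the original cell does not use, added here to show a reproducible, class-balanced split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as bankData (1372) and 4 features;
# the values are dummies, only the shapes matter here.
X_demo = np.zeros((1372, 4))
y_demo = np.array([0, 1] * 686)

# random_state fixes the shuffle; stratify keeps the class proportions in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0, stratify=y_demo
)
print(X_tr.shape, X_te.shape)  # (1097, 4) (275, 4)
```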

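We start with a linear kernel (kernel='linear'). It does not use the <strong>kernel trick</strong>: the separating hyperplane is fit directly in the original four-dimensional feature space, which works well when the classes are (nearly) linearly separable. Non-linear kernels, such as the polynomial kernel below, use the kernel trick to fit the hyperplane in a higher-dimensional space without explicitly computing coordinates there.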

```python
# Linear kernel: the hyperplane is fit directly in the original feature space
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='linear', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
```
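The cell above keeps scikit-learn's default C=1.0, as the printed SVC shows. The cross-validation step described in the introduction can be sketched with GridSearchCV, which fits one SVC per candidate C on each fold and keeps the value with the best mean held-out accuracy. To keep the sketch self-contained it uses synthetic data from make_classification (a stand-in, not the banknote data) and distinct variable names so it does not overwrite the notebook's split; passing the notebook's X_train and y_train instead should work the same way. The grid of C values is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the banknote features: 4 numeric columns, 2 classes.
X_syn, y_syn = make_classification(n_samples=400, n_features=4, n_informative=3,
                                   n_redundant=0, random_state=0)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, test_size=0.20, random_state=0
)

# Each candidate C corresponds to a different soft margin.
C_grid = [0.01, 0.1, 1, 10]
search = GridSearchCV(SVC(kernel="linear"), param_grid={"C": C_grid}, cv=5)
search.fit(X_syn_train, y_syn_train)  # 5-fold cross-validation on the training set only

print("best C:", search.best_params_["C"])
print(f"mean CV accuracy: {search.best_score_:.3f}")
# GridSearchCV refits the best model on the full training set, so it can score the test set.
print(f"test accuracy: {search.score(X_syn_test, y_syn_test):.3f}")
```

Tuning C on the training folds only and checking the test set once at the end keeps the test accuracy an honest estimate.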

```python
y_pred = svclassifier.predict(X_test)
```

<h3>Results</h3>

```python
print(confusion_matrix(y_test, y_pred))
```

```
[[155   7]
 [  0 113]]
```


<img src="https://www.dataschool.io/content/images/2015/01/confusion_matrix_simple2.png">

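The image above is a generic example showing what each row and column of a confusion matrix represents. In scikit-learn's confusion_matrix, the rows are the actual classes and the columns are the predicted classes, ordered by label (0, then 1). In our case, all 113 test notes of class 1 are classified correctly, while 7 of the 162 class-0 notes are predicted as class 1: 7 misclassifications out of 275 test samples, an accuracy of about 97.5%.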
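Several summary metrics follow directly from the four counts. A short NumPy sketch, using the matrix printed above and scikit-learn's convention that rows are actual classes and columns are predicted classes:

```python
import numpy as np

# Confusion matrix copied from the output above (rows = actual, columns = predicted).
cm = np.array([[155, 7],
               [0, 113]])

total = cm.sum()          # all test samples
correct = np.trace(cm)    # diagonal = correct predictions
errors = total - correct  # off-diagonal = misclassifications
accuracy = correct / total

# Recall per class: share of each actual class predicted correctly (row-wise).
recall = np.diag(cm) / cm.sum(axis=1)
# Precision per class: share of each predicted class that is correct (column-wise).
precision = np.diag(cm) / cm.sum(axis=0)

print(total, errors, round(accuracy, 4))  # 275 7 0.9745
print("recall:", recall.round(3))
print("precision:", precision.round(3))
```

scikit-learn's accuracy_score, precision_score and recall_score compute the same quantities directly from y_test and y_pred.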

<h3>Polynomial Kernel</h3>
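A polynomial kernel of degree d computes K(x, z) = (γ x·z + r)^d; in scikit-learn these are the degree, gamma and coef0 parameters (SVC's coef0 defaults to 0). The sketch below checks the kernel trick numerically for d = 2, γ = 1, r = 1 on random 2D points: the kernel value equals a dot product in an explicit 6-dimensional feature space that the SVM never has to build. The feature map phi is written out by hand for this specific case. In the cell below (degree=8, coef0=0, four features) the implicit space has one dimension per degree-8 monomial of the four features, i.e. C(11, 3) = 165 dimensions.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(X: np.ndarray) -> np.ndarray:
    """Explicit degree-2 feature map for 2D inputs, matching (x . z + 1) ** 2."""
    x1, x2 = X[:, 0], X[:, 1]
    s = np.sqrt(2.0)
    return np.column_stack([np.ones_like(x1), s * x1, s * x2,
                            x1 ** 2, x2 ** 2, s * x1 * x2])

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))  # 5 random 2D points
B = rng.normal(size=(3, 2))  # 3 random 2D points

# Kernel trick: (gamma * <a, b> + coef0) ** degree, computed directly in 2D ...
K_trick = polynomial_kernel(A, B, degree=2, gamma=1.0, coef0=1.0)
# ... versus mapping to 6D explicitly and taking ordinary dot products.
K_explicit = phi(A) @ phi(B).T

print(np.allclose(K_trick, K_explicit))  # True
```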

```python
# Polynomial kernel of degree 8; gamma='auto' uses 1 / n_features
svclassifier = SVC(kernel='poly', degree=8, gamma='auto')
svclassifier.fit(X_train, y_train)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=8, gamma='auto', kernel='poly',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```

```python
y_pred = svclassifier.predict(X_test)
```

```python
print(confusion_matrix(y_test, y_pred))
```

```
[[155   7]
 [  0 113]]
```


Once again, 