A Support Vector Machine (SVM) is a model that can be used for both classification and
regression.

<img src="svm1.png" width=300 height=200>

References: https://commons.wikimedia.org/wiki/File:SVM_Example_of_Hyperplanes.png

Let's look at another example:

<img src="svm2.png" width=300 height=200>

References: https://commons.wikimedia.org/wiki/File:SVM_margin.png

Support vectors are the training points that lie closest to the decision boundary. They alone determine where the boundary is placed.
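To make this concrete, here is a minimal sketch on a made-up four-point dataset. The points, the labels, and the names `X_toy`, `y_toy`, and `toy_clf` are assumptions chosen for illustration and are not part of the datasets used later. After fitting, scikit-learn exposes the support vectors through the `support_vectors_` attribute.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (hypothetical): two points per class, all lying on the line x1 = x2
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y_toy = np.array([-1, -1, 1, 1])

toy_clf = SVC(kernel="linear")
toy_clf.fit(X_toy, y_toy)

# Only the points nearest the boundary are kept as support vectors
print(toy_clf.support_vectors_)
```

On this toy set the two inner points, (1, 1) and (3, 3), should be reported as the support vectors. The outer points could be removed without moving the boundary.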

The dataset consists of pairs $(x_i, y_i),$ where $y_i$ is either 1 or -1
and indicates the class to which $x_i$ belongs.

Our goal is to find the hyperplane that separates the two classes with the maximum margin. This hyperplane can be written as
$\vec{w}\cdot\vec{x} - b = 0,$
where $\vec{w}$ is the normal vector to the hyperplane and $b$ is a scalar offset.

Using the training dataset, we compute $\vec{w}$ and $b.$

Any point that lies on or above the hyperplane

$\vec{w}\cdot\vec{x} - b = 1$ is classified as class $1,$
and any point that lies on or below the hyperplane

$\vec{w}\cdot\vec{x} - b = -1$ is classified as
class $-1.$
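The decision rule can be sketched in a few lines of plain Python. The values of `w` and `b` below are hypothetical, not learned from any dataset in this notebook, and the helper name `classify` is an assumption. Points that fall between the two margin hyperplanes are assigned here by the sign of the score, which is an assumption about how new points would be labeled.

```python
def classify(w, b, x):
    """Return +1 or -1 according to the sign of w . x - b."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) - b
    return 1 if score >= 0 else -1

# Hypothetical parameters, not learned from the notebook's data
w = [0.5, 0.5]
b = 2.0

print(classify(w, b, [3.0, 3.0]))  # score = 1.0, on the +1 margin hyperplane
print(classify(w, b, [1.0, 1.0]))  # score = -1.0, on the -1 margin hyperplane
print(classify(w, b, [4.0, 4.0]))  # score = 2.0, beyond the +1 margin hyperplane
```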

The distance between these two hyperplanes is $\frac{2}{||\vec{w}||}.$ We want to maximize this distance, which is the same as minimizing $||\vec{w}||.$
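A quick numeric check of the margin formula, using two hypothetical weight vectors that point in the same direction (the helper name `margin_width` is an assumption). The vector with the smaller norm gives the wider margin.

```python
import math

def margin_width(w):
    """Distance between the hyperplanes w . x - b = 1 and w . x - b = -1."""
    return 2.0 / math.sqrt(sum(w_j * w_j for w_j in w))

w_small = [0.5, 0.5]  # hypothetical weight vector
w_large = [2.0, 2.0]  # same direction, four times the norm

print(margin_width(w_small))  # wider margin
print(margin_width(w_large))  # narrower margin
```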

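In a hard-margin SVM, no data point may lie inside the margin, which is only possible when the classes are linearly separable. Because every point must stay outside, hard margins tend to be narrow.

In a soft-margin SVM, some data points are allowed to lie inside the margin, or even on the wrong side of it, at a cost. Soft margins can therefore be wider. In scikit-learn, the `C` parameter controls this trade-off: a smaller `C` tolerates more violations.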

The loss function for a soft-margin SVM is the hinge loss, defined for each training example by
$\max(0, 1-y_i(\vec{w}\cdot\vec{x_i} - b)).$
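The loss can be evaluated by hand for a few points. This is a minimal sketch with hypothetical parameters `w` and `b` and an assumed helper name `hinge_loss`. It shows the three regimes: zero loss beyond the margin, a small loss inside the margin, and a larger loss on the wrong side of the boundary.

```python
def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y * (w . x - b)) for one example with label y in {+1, -1}."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) - b
    return max(0.0, 1.0 - y * score)

w, b = [0.5, 0.5], 2.0  # hypothetical parameters

print(hinge_loss(w, b, [4.0, 4.0], +1))  # beyond the margin: 0.0
print(hinge_loss(w, b, [2.5, 2.5], +1))  # inside the margin: 0.5
print(hinge_loss(w, b, [1.0, 1.0], +1))  # wrong side of the boundary: 2.0
```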



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv("bill_authentication.csv")

In [3]:
df.shape

(1372, 5)

In [4]:
df.head()

   Variance  Skewness  Curtosis   Entropy  Class
0    3.6216    8.6661   -2.8073  -0.44699      0
1    4.5459    8.1674   -2.4586   -1.4621      0
2     3.866   -2.6383    1.9242   0.10645      0
3    3.4566    9.5228   -4.0112   -3.5944      0
4   0.32924   -4.4552    4.5718   -0.9888      0


Useful links for understanding skewness, kurtosis (spelled "Curtosis" in this dataset), and entropy:

https://www.mathsisfun.com/data/skewness.html

https://www.statology.org/can-kurtosis-be-negative/
    
https://towardsdatascience.com/entropy-is-a-measure-of-uncertainty-e2c000301c2c

In [4]:
# Features: every column except the label
x = df.drop('Class', axis=1)
# Target: the class label (0 or 1)
y = df['Class']

In [5]:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; no random_state is set, so the split changes on each run
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)

In [6]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear') # create a linear-kernel SVM classifier
svclassifier.fit(x_train, y_train) # fit the model to the training data

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='linear', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

In [7]:
y_pred = svclassifier.predict(x_test)

In [8]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[145   0]
 [  0 130]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       145
           1       1.00      1.00      1.00       130

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275



### Kernel Trick


References: http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

https://prateekvjoshi.com/2012/09/01/kernel-functions-for-machine-learning/

<img src="kernel1.png" width=400 height=300>
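The identity behind the kernel trick can be checked numerically. The sketch below assumes a homogeneous degree-2 polynomial kernel on 2-D inputs. The helper names `poly2_kernel` and `phi` and the sample points are hypothetical. Evaluating the kernel on the original inputs gives the same number as an inner product in the explicit 3-D feature space, without ever building that space.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z) ** 2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x = (1.0, 2.0)
z = (3.0, 4.0)

# Inner product computed in the mapped 3-D space
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))

# Both values agree up to floating-point rounding
print(poly2_kernel(x, z), explicit)
```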

#### Different kernel functions

References: https://www.slideshare.net/okamoto-laboratory/families-of-triangular-norm-based-kernel-function-and-its-application-to-kernel-kmeans-conference

<img src="kernel2.png" width=400 height=300>

When to use which kernel?

Use a linear kernel when the classes are (approximately) linearly separable, and a non-linear kernel such as RBF when they are not.
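A small sketch of why a non-linear mapping helps, on a hypothetical 1-D dataset. No single threshold on $x$ separates the classes, but a threshold on $x^2$ does. The helper `separable_by_threshold` is an assumption written for this illustration and only tests thresholds halfway between neighboring values.

```python
# Hypothetical 1-D dataset: the outer points are class +1, the inner points class -1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, -1, -1, 1]

def separable_by_threshold(values, labels):
    """True if some threshold t and sign s make s * (v - t) > 0 exactly for the +1 points."""
    ordered = sorted(values)
    thresholds = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    for t in thresholds:
        for s in (1, -1):
            if all((s * (v - t) > 0) == (y == 1) for v, y in zip(values, labels)):
                return True
    return False

print(separable_by_threshold(xs, ys))                   # raw feature x: False
print(separable_by_threshold([x * x for x in xs], ys))  # mapped feature x**2: True
```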

Let us now apply an SVM with the kernel trick to the Iris dataset.

In [9]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)

In [10]:
print(irisdata.shape)

(150, 5)


In [11]:
print(irisdata.head())

   sepal-length  sepal-width  petal-length  petal-width        Class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


In [12]:
print(irisdata['Class'].unique())

['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


In [14]:
c = {'Iris-setosa':0, 'Iris-versicolor':1, 'Iris-virginica':2}
irisdata.Class = [c[item] for item in irisdata.Class] # list comprehension 

In [15]:
print(irisdata.head())

   sepal-length  sepal-width  petal-length  petal-width  Class
0           5.1          3.5           1.4          0.2      0
1           4.9          3.0           1.4          0.2      0
2           4.7          3.2           1.3          0.2      0
3           4.6          3.1           1.5          0.2      0
4           5.0          3.6           1.4          0.2      0


In [16]:
x = irisdata.drop('Class', axis=1)
y = irisdata['Class']

In [17]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state=42)

Link for understanding the `gamma` and `degree` parameters:

https://scikit-learn.org/stable/modules/svm.html
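As a rough guide to what these parameters do, the sketch below evaluates the polynomial and RBF kernel formulas given in the scikit-learn documentation on two rows of the Iris table shown above. The helper names `poly_kernel_value` and `rbf_kernel_value` are hypothetical and written in plain Python; they are not scikit-learn functions. With `gamma='auto'`, scikit-learn uses 1 / n_features.

```python
import math

def poly_kernel_value(x, z, gamma, degree, coef0=0.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

def rbf_kernel_value(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# First and third rows of the Iris table (features only)
x = [5.1, 3.5, 1.4, 0.2]
z = [4.7, 3.2, 1.3, 0.2]

gamma_auto = 1.0 / len(x)  # gamma='auto' means 1 / n_features, here 0.25

print(poly_kernel_value(x, z, gamma_auto, degree=2))
print(rbf_kernel_value(x, z, gamma_auto))
print(rbf_kernel_value(x, z, gamma=10.0))  # larger gamma: similarity decays faster
```

A larger `gamma` makes the RBF similarity fall off more quickly with distance, and `degree` sets the power used by the polynomial kernel.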

In [19]:
# polynomial kernel

from sklearn.svm import SVC
svclassifier = SVC(kernel='poly', degree=2, gamma='auto')
svclassifier.fit(x_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=2, gamma='auto', kernel='poly',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [20]:
y_pred = svclassifier.predict(x_test)

In [21]:
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
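The per-class figures in this report can be recomputed from the confusion matrix by hand. The sketch below copies the matrix printed above and uses an assumed helper name, `per_class_metrics`. Rows are true classes and columns are predicted classes.

```python
# Confusion matrix from the polynomial-kernel run (rows: true class, columns: predicted class)
cm = [[10, 0, 0],
      [0, 8, 1],
      [0, 0, 11]]

def per_class_metrics(cm):
    """Return a list of (precision, recall, f1, support) tuples, one per class."""
    n = len(cm)
    results = []
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum (support)
        precision = true_pos / predicted_k
        recall = true_pos / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        results.append((precision, recall, f1, actual_k))
    return results

metrics = per_class_metrics(cm)
for k, (p, r, f1, support) in enumerate(metrics):
    print(k, round(p, 2), round(r, 2), round(f1, 2), support)

# Accuracy is the diagonal total divided by the number of test samples
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print(round(accuracy, 2))
```

The single misclassified sample, a class 1 flower predicted as class 2, is what lowers the recall of class 1 and the precision of class 2.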



In [22]:
# RBF kernel

from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf', gamma='auto')
svclassifier.fit(x_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [23]:
y_pred = svclassifier.predict(x_test)

In [24]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



In [None]:
"""
In-class activity: apply an SVM model to the Titanic dataset.
"""