In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
bankdata = pd.read_csv("bill_authentication.csv")

In [4]:
bankdata.head()

|   | Variance | Skewness | Curtosis | Entropy | Class |
|---|---|---|---|---|---|
| 0 | 3.6216 | 8.6661 | -2.8073 | -0.44699 | 0 |
| 1 | 4.5459 | 8.1674 | -2.4586 | -1.4621 | 0 |
| 2 | 3.866 | -2.6383 | 1.9242 | 0.10645 | 0 |
| 3 | 3.4566 | 9.5228 | -4.0112 | -3.5944 | 0 |
| 4 | 0.32924 | -4.4552 | 4.5718 | -0.9888 | 0 |


# Exploratory Data Analysis

In [5]:
bankdata.shape

(1372, 5)
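Beyond the shape, a few standard pandas calls give a quick first look at the data: summary statistics, the number of samples per class, and missing values. The sketch below is an illustration, not part of the original notebook: so that it runs without bill_authentication.csv, it rebuilds a small DataFrame (hypothetically named `bankdata_sample`) from the five rows printed by `head()` above. With the real file you would call the same methods on `bankdata`.

```python
import pandas as pd

# Hypothetical stand-in: the five rows shown by bankdata.head() above.
# With the real CSV loaded, use the full bankdata DataFrame instead.
bankdata_sample = pd.DataFrame(
    {
        "Variance": [3.6216, 4.5459, 3.866, 3.4566, 0.32924],
        "Skewness": [8.6661, 8.1674, -2.6383, 9.5228, -4.4552],
        "Curtosis": [-2.8073, -2.4586, 1.9242, -4.0112, 4.5718],
        "Entropy": [-0.44699, -1.4621, 0.10645, -3.5944, -0.9888],
        "Class": [0, 0, 0, 0, 0],
    }
)

# Count, mean, std, min, quartiles and max for every numeric column
summary = bankdata_sample.describe()

# Samples per class: a quick check for class imbalance
class_counts = bankdata_sample["Class"].value_counts()

# Number of missing values in each column
missing = bankdata_sample.isnull().sum()

print(summary)
print(class_counts)
print(missing)
```

On the full dataset, `value_counts()` is the call that tells you whether the two classes are roughly balanced, which matters when reading the precision and recall figures later on.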

# Data Preprocessing

Data preprocessing involves (1) dividing the data into attributes and labels and (2)
dividing the data into training and testing sets.

In [6]:
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']

In the first line of the script above, all the columns of the bankdata dataframe except the "Class" column, which is the
label column, are stored in the X variable. The drop() method returns a copy of the dataframe without that column; bankdata
itself is left unchanged.

In the second line, only the "Class" column is stored in the y variable. At this point, X contains the attributes and y
contains the corresponding labels.

Once the data is divided into attributes and labels, the final preprocessing step is to divide the data into training and
test sets. Luckily, Scikit-Learn's model_selection module contains the train_test_split function, which divides the data
into training and test sets in a single call.

In [7]:
from sklearn.model_selection import train_test_split
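# 80% of the rows are used for training and 20% for testing. No random_state is
# set, so the split (and the scores below) will differ from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)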
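It can help to know exactly how many rows end up in each set. With a float test_size, scikit-learn rounds the test set size up and gives the remaining rows to training. The sketch below checks this on stand-in arrays with the same number of rows as bankdata (1372); the arrays, the `random_state=0` and the `stratify` argument are illustrative choices, not part of the original notebook. `stratify=y` keeps the class proportions the same in both sets, and `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 1372 rows and 4 feature columns, like bankdata.
# Only the row count matters here; the values themselves are arbitrary.
X_demo = np.zeros((1372, 4))
y_demo = np.array([0, 1] * 686)  # 1372 labels, two classes

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0, stratify=y_demo
)

# Test size is rounded up: ceil(0.20 * 1372) = ceil(274.4) = 275,
# leaving 1372 - 275 = 1097 rows for training.
print(len(X_tr), len(X_te))
```

The 275 test rows match the support column of the classification report further down (157 + 118 = 275).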


# Training the Algorithm

We have divided the data into training and testing sets. Now it is time to train our SVM on the training data.
Scikit-Learn's svm module contains built-in classes for different SVM algorithms. Since we are going to perform a
classification task, we will use the support vector classifier class, which is called SVC in Scikit-Learn's svm module.
This class accepts several parameters, but the most important one here is kernel, which sets the kernel type.
For a simple SVM we set this parameter to "linear", since a simple SVM can only find a linear decision boundary (a
straight line, or a hyperplane in higher dimensions). We will see non-linear kernels in the next section.

The fit method of the SVC class trains the algorithm on the training data, which is passed to it as arguments.
Execute the following code to train the algorithm:

In [8]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
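With a linear kernel, what the SVC learns is a single hyperplane w·x + b, and scikit-learn exposes it through the `coef_` and `intercept_` attributes (these exist only for the linear kernel). The sketch below, which uses synthetic data from `make_classification` in place of the banknote features, checks that `decision_function` returns exactly w·x + b (a value proportional to the signed distance from the hyperplane) and that the predicted class follows its sign. The data and the names `X_demo`, `clf`, `w` and `b` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data with 4 features, standing in for the banknote data
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

clf = SVC(kernel="linear")
clf.fit(X_demo, y_demo)

w = clf.coef_.ravel()   # weight vector of the separating hyperplane
b = clf.intercept_[0]   # bias term

# Score of every sample: w . x + b
scores = X_demo @ w + b

# Samples with a positive score go to clf.classes_[1], the others to clf.classes_[0]
manual_pred = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(scores, clf.decision_function(X_demo)))
print((manual_pred == clf.predict(X_demo)).mean())
```

Inspecting `w` on the real banknote model would show how strongly each of Variance, Skewness, Curtosis and Entropy pushes a note towards one class or the other.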

# Making Predictions

In [9]:
y_pred = svclassifier.predict(X_test)

# Evaluating the Algorithm

Confusion matrix, precision, recall, and F1 score are among the most commonly used metrics for classification tasks.
Scikit-Learn's metrics module contains the classification_report and confusion_matrix functions, which compute these
values directly.

In [10]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))


[[156   1]
 [  0 118]]
             precision    recall  f1-score   support

          0       1.00      0.99      1.00       157
          1       0.99      1.00      1.00       118

avg / total       1.00      1.00      1.00       275
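Every number in the report can be recovered from the confusion matrix. In scikit-learn's convention, rows are the true classes and columns the predicted classes, so the single off-diagonal 1 is a class-0 note that was predicted as class 1. The sketch below recomputes precision, recall and F1 per class by hand from the matrix printed above; the helper `per_class_metrics` is a hypothetical name introduced only for this illustration.

```python
# Confusion matrix printed above (rows: true class, columns: predicted class)
cm = [[156, 1],
      [0, 118]]

def per_class_metrics(cm, k):
    """Precision, recall and F1 for class index k of a square confusion matrix.

    Undefined ratios are returned as 0.0, which is what scikit-learn reports
    by default (with a warning) when a class is never predicted.
    """
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # predicted k, truly another class
    fn = sum(cm[k]) - tp                              # truly k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

for k in range(len(cm)):
    p, r, f = per_class_metrics(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# Correct predictions (the diagonal) divided by all test samples: 274 / 275
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print(f"accuracy={accuracy:.4f}")
```

For class 0, recall is 156 / 157 ≈ 0.994, which the report rounds to 0.99; for class 1, precision is 118 / 119 ≈ 0.992, also shown as 0.99.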



# Kernel SVM

In the previous section we saw how the simple SVM algorithm can be used to find a decision boundary for linearly separable
data. However, in the case of non-linearly separable data, such as the one shown in Fig. 3, a straight line cannot be used
as a decision boundary.

For non-linearly separable data, the simple SVM algorithm cannot be used. Instead, a modified version of SVM, called
Kernel SVM, is used.

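Basically, the kernel SVM maps data that is not linearly separable in its original, lower-dimensional space into a
higher-dimensional space where it becomes linearly separable, so that points belonging to different classes can be
separated by a hyperplane. Thanks to the so-called kernel trick, this mapping is never computed explicitly: the algorithm
only needs a kernel function, which returns the inner product of two points in the higher-dimensional space.
Again, there is complex mathematics involved in this, but you do not have to worry about it in order to use SVM.
Instead, we can simply use Python's Scikit-Learn library to implement and use the kernel SVM.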
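The kernel trick can be seen on a tiny example. For 2-D points, the degree-2 polynomial kernel k(a, b) = (a · b)² equals an ordinary inner product after the explicit mapping φ(x₁, x₂) = (x₁², √2·x₁x₂, x₂²). The sketch below computes the same number both ways; the two points are arbitrary, and `phi` and `poly2_kernel` are illustrative names, not scikit-learn functions.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map: 2-D point (x1, x2) -> 3-D point.

    Note that phi[0] + phi[2] = x1**2 + x2**2, so a circle in the original
    plane becomes a flat plane in this 3-D space, i.e. linearly separable.
    """
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))  # map to 3-D first, then take the inner product
implicit = poly2_kernel(a, b)      # never leaves 2-D
print(explicit, implicit)
```

Both values agree (up to floating-point rounding), but the kernel version never builds the 3-D vectors. For higher degrees and more features the explicit space grows quickly, which is why SVMs work with kernels instead.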

# Implementing Kernel SVM with Scikit-Learn

Implementing Kernel SVM with Scikit-Learn is similar to the simple SVM. In this section, we will use the famous iris
dataset to predict the species to which a plant belongs based on four attributes: sepal-width, sepal-length, petal-width,
and petal-length.

Reference: https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/

The dataset can be downloaded from the following link:

https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [4]:
# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Load the dataset; adjust the path to wherever iris.data was saved
irisdata = pd.read_csv("iris.data", names=colnames)
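If the file is not at hand, scikit-learn ships a copy of the iris data that `load_iris()` returns without any download. The sketch below rebuilds a DataFrame with the same column names and the same "Iris-..." label strings as the UCI file, so the rest of the notebook should run unchanged. This is an alternative to the original loading step, not the author's method; also, according to the scikit-learn documentation, its copy differs from the UCI file in two data points, so individual results may vary slightly.

```python
import pandas as pd
from sklearn.datasets import load_iris

colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

iris = load_iris()  # bundled with scikit-learn, no download needed
irisdata = pd.DataFrame(iris.data, columns=colnames[:4])

# Convert the integer targets 0/1/2 into the labels used in the UCI file,
# e.g. 'setosa' -> 'Iris-setosa'
irisdata['Class'] = ['Iris-' + str(iris.target_names[t]) for t in iris.target]

print(irisdata.shape)
print(irisdata['Class'].value_counts())
```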

In [5]:
irisdata.head()

|   | sepal-length | sepal-width | petal-length | petal-width | Class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


# Preprocessing

In [6]:
X = irisdata.drop('Class', axis=1)
y = irisdata['Class']

# Train Test Split

In [7]:
from sklearn.model_selection import train_test_split
# No random_state is set, so the split and the results below vary between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Training the Algorithm
To train the kernel SVM, we use the same SVC class from Scikit-Learn's svm module. The difference lies in the value of the kernel parameter. For the simple SVM we used "linear" as the value of the kernel parameter. For kernel SVM, you can instead use a Gaussian ('rbf'), polynomial ('poly'), or sigmoid ('sigmoid') kernel, a precomputed kernel matrix ('precomputed'), or your own kernel function passed as a callable. We will implement polynomial, Gaussian, and sigmoid kernels to see which one works better for our problem.
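For reference, scikit-learn documents the built-in kernels in terms of the SVC parameters gamma (γ), coef0 (r) and degree (d): linear ⟨x, x′⟩, polynomial (γ⟨x, x′⟩ + r)^d, Gaussian exp(−γ‖x − x′‖²) and sigmoid tanh(γ⟨x, x′⟩ + r). The sketch below writes these formulas in NumPy and checks them against the functions in `sklearn.metrics.pairwise` on random data. The helper names `k_poly`, `k_rbf` and `k_sigmoid` and the parameter values are illustrative assumptions. Note that the pairwise functions default to coef0=1 while SVC defaults to coef0=0, so coef0 is passed explicitly.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

def k_poly(X, Y, gamma, coef0, degree):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * X @ Y.T + coef0) ** degree

def k_rbf(X, Y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    # Squared Euclidean distance between every row of X and every row of Y
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def k_sigmoid(X, Y, gamma, coef0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return np.tanh(gamma * X @ Y.T + coef0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))  # 5 random points with 4 features, like iris rows
Y = rng.normal(size=(3, 4))

gamma, coef0, degree = 0.25, 0.0, 3

poly_ok = np.allclose(k_poly(X, Y, gamma, coef0, degree),
                      polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0))
rbf_ok = np.allclose(k_rbf(X, Y, gamma), rbf_kernel(X, Y, gamma=gamma))
sigmoid_ok = np.allclose(k_sigmoid(X, Y, gamma, coef0),
                         sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0))
print(poly_ok, rbf_ok, sigmoid_ok)
```

Each function returns a 5 × 3 matrix of kernel values between the rows of X and the rows of Y; a matrix of this kind is what SVC works with internally, and also what it expects as input when kernel='precomputed'.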

# 1. Polynomial Kernel

For the polynomial kernel, you also have to pass a value for the degree parameter of the SVC class, which is the degree
of the polynomial (it defaults to 3 and is ignored by the other kernels). Take a look at how we can use a polynomial kernel
to implement kernel SVM:

In [8]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='poly', degree=8)
svclassifier.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=8, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

# Making Predictions

Now that we have trained the algorithm, the next step is to make predictions on the test data.

Execute the following script to do so:

In [9]:
y_pred = svclassifier.predict(X_test)

# Evaluating the Algorithm

As usual, the final step is to evaluate the algorithm, in this case with the polynomial kernel.
Execute the following script:

In [10]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[12  0  0]
 [ 0 10  2]
 [ 0  0  6]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        12
Iris-versicolor       1.00      0.83      0.91        12
 Iris-virginica       0.75      1.00      0.86         6

    avg / total       0.95      0.93      0.94        30
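The "avg / total" row in this older scikit-learn output is the support-weighted average: each class's score is weighted by the number of true samples of that class. Newer scikit-learn releases print separate accuracy, macro avg and weighted avg rows instead. The sketch below recomputes the weighted precision and recall from the polynomial-kernel confusion matrix above, using plain Python and illustrative variable names.

```python
# Polynomial-kernel confusion matrix from above (rows: true, columns: predicted).
# Class order: Iris-setosa, Iris-versicolor, Iris-virginica
cm = [[12, 0, 0],
      [0, 10, 2],
      [0, 0, 6]]

n = len(cm)
support = [sum(row) for row in cm]                                 # true samples per class
predicted = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predictions per class

precision = [cm[k][k] / predicted[k] if predicted[k] else 0.0 for k in range(n)]
recall = [cm[k][k] / support[k] for k in range(n)]

total = sum(support)
# Weight each class's score by its support, then divide by the number of samples
weighted_precision = sum(p * s for p, s in zip(precision, support)) / total
weighted_recall = sum(r * s for r, s in zip(recall, support)) / total

print(round(weighted_precision, 2), round(weighted_recall, 2))
```

The virginica precision of 0.75 comes from the two versicolor samples predicted as virginica (6 correct out of 8 predictions). The weighted recall, 28 / 30 ≈ 0.93, is the same as the overall accuracy.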



In [11]:
# Now let's repeat the same steps for Gaussian and sigmoid kernels.
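Instead of copying the fit/predict/evaluate cells for each kernel, the repetition can be written as a loop. The sketch below is a variation on the notebook, not the original code: it loads iris with `load_iris` so that it runs on its own, fixes `random_state=0` with a stratified split so the numbers are reproducible, and reports plain accuracy. The results will therefore not match the outputs shown in this notebook exactly.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in for the irisdata DataFrame used above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

# The three kernels tried in this notebook; degree only affects 'poly'
models = {
    "poly": SVC(kernel="poly", degree=8),
    "rbf": SVC(kernel="rbf"),
    "sigmoid": SVC(kernel="sigmoid"),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name:8s} accuracy = {acc:.3f}")
```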

# 2. Gaussian Kernel
To use the Gaussian kernel, specify 'rbf' (radial basis function) as the value of the kernel parameter of the SVC class.

In [13]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
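The Gaussian kernel's width is controlled by gamma. The outputs in this notebook show gamma='auto', the default in the older scikit-learn release that was used; since scikit-learn 0.22 the default is gamma='scale'. According to the scikit-learn documentation, 'auto' means 1 / n_features and 'scale' means 1 / (n_features * X.var()). The sketch below computes both values for the iris features (loaded with `load_iris` as a stand-in) and checks that passing the 'scale' value as a number gives the same predictions as gamma='scale'.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Numeric values behind the two string options for gamma
gamma_scale = 1.0 / (X.shape[1] * X.var())  # gamma='scale': uses the variance of all features
gamma_auto = 1.0 / X.shape[1]               # gamma='auto': depends only on the feature count

# Passing the number explicitly should reproduce gamma='scale'
pred_named = SVC(kernel="rbf", gamma="scale").fit(X, y).predict(X)
pred_numeric = SVC(kernel="rbf", gamma=gamma_scale).fit(X, y).predict(X)

print(gamma_scale, gamma_auto)
print(np.array_equal(pred_named, pred_numeric))
```

Because 'scale' takes the spread of the data into account, it tends to be a more robust starting point than 'auto' when features are not standardised. Larger gamma values make the decision boundary more flexible, smaller values make it smoother.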

# Prediction and Evaluation

In [14]:
y_pred = svclassifier.predict(X_test)

In [15]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[12  0  0]
 [ 0 11  1]
 [ 0  0  6]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        12
Iris-versicolor       1.00      0.92      0.96        12
 Iris-virginica       0.86      1.00      0.92         6

    avg / total       0.97      0.97      0.97        30



# 3. Sigmoid Kernel

Finally, let's use a sigmoid kernel to implement Kernel SVM. To use the sigmoid kernel, specify 'sigmoid' as the value of the kernel parameter of the SVC class.

In [17]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='sigmoid')
svclassifier.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='sigmoid',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

# Prediction and Evaluation

In [18]:
y_pred = svclassifier.predict(X_test)

In [19]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[ 0  0 12]
 [ 0  0 12]
 [ 0  0  6]]
                 precision    recall  f1-score   support

    Iris-setosa       0.00      0.00      0.00        12
Iris-versicolor       0.00      0.00      0.00        12
 Iris-virginica       0.20      1.00      0.33         6

    avg / total       0.04      0.20      0.07        30





# Comparison of Kernel Performance

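If we compare the performance of the different kernels, we can clearly see that the sigmoid kernel performs the worst:
it predicted Iris-virginica for every test sample. This is not because the sigmoid kernel can only handle two classes.
The kernel, tanh(gamma * <x, x'> + coef0), is a continuous function, and SVC handles multiclass problems the same way
(one-vs-one) whatever the kernel. A more likely cause is that the sigmoid kernel is not positive semi-definite for many
parameter values, which makes it sensitive to the choice of gamma and coef0 and to unscaled features; with the default
settings used here, it failed on our three-class problem.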
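A quick way to see that the sigmoid kernel is not a valid (Mercer) kernel in general is to build its kernel matrix for just two points and look at the eigenvalues: a positive semi-definite matrix has none below zero. The sketch below uses two 1-D points with gamma = 1 and coef0 = 0 (SVC's default coef0); these values are arbitrary choices for illustration. The Gaussian kernel matrix for the same points is shown for comparison.

```python
import numpy as np

# Two 1-D points; gamma and coef0 chosen only for illustration
X = np.array([[1.0], [2.0]])
gamma, coef0 = 1.0, 0.0

K_sigmoid = np.tanh(gamma * X @ X.T + coef0)  # [[tanh(1), tanh(2)], [tanh(2), tanh(4)]]
sq_dist = (X - X.T) ** 2                      # pairwise squared distances for 1-D points
K_rbf = np.exp(-gamma * sq_dist)              # [[1, e^-1], [e^-1, 1]]

eig_sigmoid = np.linalg.eigvalsh(K_sigmoid)  # one eigenvalue is negative
eig_rbf = np.linalg.eigvalsh(K_rbf)          # both eigenvalues are positive
print(eig_sigmoid)
print(eig_rbf)
```

The sigmoid matrix has a negative determinant (about 0.761 × 0.999 − 0.964² < 0), so one of its eigenvalues must be negative, whereas the Gaussian kernel always yields a positive semi-definite matrix.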

Between the Gaussian and polynomial kernels, the Gaussian kernel misclassified one test instance (29 of 30 correct),
while the polynomial kernel misclassified two (28 of 30). Therefore the Gaussian kernel performed slightly better,
although with only 30 test samples a difference of a single prediction is small. There is no hard and fast rule as to
which kernel performs best in every scenario. It is all about testing the kernels and selecting the one with the best
results, ideally on a validation set or with cross-validation rather than on the final test dataset.
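One way to select a kernel without tuning on the test set is a cross-validated grid search over the kernel parameter. The sketch below uses `GridSearchCV` on the training split only and scores the test set once at the end. It goes beyond the notebook in two respects, both assumptions: it loads iris with `load_iris`, and it standardises the features with `StandardScaler` inside a pipeline, which the notebook does not do. `make_pipeline` names the SVC step "svc", hence the "svc__" prefix in the parameter grid.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

# Scale the features, then fit an SVC (assumption: scaling is not in the notebook)
pipe = make_pipeline(StandardScaler(), SVC())

# Candidate kernels; degree is only searched for the polynomial kernel
param_grid = [
    {"svc__kernel": ["linear"]},
    {"svc__kernel": ["poly"], "svc__degree": [2, 3, 8]},
    {"svc__kernel": ["rbf", "sigmoid"]},
]

# 5-fold cross-validation on the training data only
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
# The test set is used once, after the kernel has been chosen
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

The grid could be extended with C and gamma values in the same way. The test accuracy printed at the end is then an honest estimate, because the test set played no part in choosing the kernel.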