# Linear SVM Classification

1. Introduction

    * Support Vector Machines (SVMs) are built on the concept of hyperplanes. A hyperplane is an $(n-1)$-dimensional flat subspace of an $n$-dimensional space. For example, a three-dimensional space is divided by a two-dimensional hyperplane (a flat plane).
    * For a two-dimensional space, the hyperplane is a line.
    * SVMs classify data by finding the hyperplane that maximizes the margin between the classes in the training dataset. They can also be used for regression analysis, i.e. numerical prediction.
    * Data points that lie on the margin boundaries are called support vectors, and the line running midway between the two margins is the optimal hyperplane.
    
        <img src="data/images/SVM_concept.png" width="30%">
        source: https://en.proft.me/2014/04/22/how-simulate-support-vector-machine-svm-r/
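
The ideas of hyperplane, margin and support vectors can be made concrete on a tiny example. The sketch below is illustrative only: the six 2D points are made up, and it uses `SVC(kernel="linear")` with a very large `C` to approximate a hard-margin SVM (an assumption for this sketch, not something used elsewhere in this notebook). For a linear SVM with weight vector $w$, the distance between the two margins is $2/\|w\|$.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data (made up for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations,
# which approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margins

print("w =", w, "b =", b)
print("support vectors:")
print(clf.support_vectors_)
print("margin width:", round(margin_width, 3))
```

Only the points closest to the other class end up in `support_vectors_`; moving any of the remaining points (without crossing the margin) would leave the hyperplane unchanged.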
        
   
2. Principles of SVMs
    * Creates a boundary in the (possibly multi-dimensional) feature space that separates the data points according to their target value (y).
    * Uses the kernel trick: the available data are (implicitly) transformed into a higher-dimensional space, and the optimal boundary between the possible outputs is found in that transformed space.
    * Types of kernel functions:
        * Linear kernel - no data transformation.
        * Polynomial kernel - a simple nonlinear transformation of the data, controlled by the polynomial degree $d$.
        * Sigmoid kernel - results in an SVM model somewhat analogous to a neural network using a sigmoid activation function.
        * Gaussian Radial Basis Function (RBF) kernel - similar to an RBF neural network.

For more information about the kernel functions, see http://www.jstatsoft.org/v15/i09/paper
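
As a rough illustration of the four kernels listed above, the sketch below writes each one out in NumPy and checks it against the corresponding helper in `sklearn.metrics.pairwise`. The data are random and the values of `gamma`, `coef0` and `degree` are arbitrary choices for this example, not recommended settings.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 random samples with 3 features

gamma, coef0, degree = 0.5, 1.0, 3   # arbitrary illustrative values

dot = X @ X.T                        # pairwise inner products <x, x'>
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

K_lin = dot                                # linear: <x, x'>
K_poly = (gamma * dot + coef0) ** degree   # polynomial of degree d
K_sig = np.tanh(gamma * dot + coef0)       # sigmoid
K_rbf = np.exp(-gamma * sq_dists)          # Gaussian RBF

kernels_match = all([
    np.allclose(K_lin, linear_kernel(X)),
    np.allclose(K_poly, polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)),
    np.allclose(K_sig, sigmoid_kernel(X, gamma=gamma, coef0=coef0)),
    np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)),
])
print("hand-written kernels match scikit-learn:", kernels_match)
```

Each kernel matrix plays the role of the inner products in the transformed space, which is why the SVM never needs to compute the transformation explicitly.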


3. Advantages of SVMs
    * Uses a regularisation parameter, which helps to limit over-fitting when it is chosen well.
    * The kernel can be engineered to incorporate domain knowledge about the problem.
    * An SVM is defined by a convex optimisation problem, hence there are no local minima other than the global one.


4. Disadvantages of SVMs
    * The convex solution is only optimal for a given choice of regularisation parameter, kernel function and kernel parameters. This shifts the over-fitting problem from parameter fitting to model selection: the model-selection criterion itself can be over-fitted.
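
One common way to reduce the risk of over-fitting during model selection is to tune the hyperparameters by cross-validation on one part of the data and to report performance on a hold-out set that played no role in the selection. The sketch below is only an illustration under assumptions: the grid of `C` values, the 80/20 split, `max_iter=5000` and the variable names are choices made for this example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# Hold out 20% of the data; it is never seen during model selection
X_dev, X_holdout, y_dev, y_holdout = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(loss="hinge", max_iter=5000, random_state=0)),
])

# Illustrative grid of regularisation strengths
param_grid = {"linear_svc__C": [0.01, 0.1, 1, 10]}

# 5-fold cross-validation on the development set selects C
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_dev, y_dev)

cv_score = search.best_score_                        # score used for selection
holdout_score = search.score(X_holdout, y_holdout)   # independent estimate

print("best C:", search.best_params_["linear_svc__C"])
print("cross-validated accuracy:", round(cv_score, 3))
print("hold-out accuracy:", round(holdout_score, 3))
```

The cross-validated score was used to pick `C`, so it tends to be slightly optimistic; the hold-out score is the more honest estimate of how the selected model generalises.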

In [1]:
# importing the required libraries
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.datasets import load_digits
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

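# define model for use later: standardise the features, then fit a linear SVM
# C trades off the street (margin) width against margin violations:
# a smaller C gives a wider street but tolerates more violations
svm_class_linear = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])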
# svm_class_linear = LinearSVC(random_state=0, tol=1e-5)
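
The effect of `C` on the street width can be checked numerically. The sketch below is a separate illustration, not part of the digits workflow: it uses `SVC(kernel="linear")` instead of `LinearSVC`, two overlapping synthetic blobs from `make_blobs`, and three arbitrary values of `C`. The street width is computed as $2/\|w\|$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters (illustrative data only)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

widths = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # street width = distance between the two margin boundaries
    widths[C] = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: street width={widths[C]:.2f}, "
          f"support vectors={len(clf.support_)}")
```

With a small `C` the optimiser accepts many margin violations in exchange for a wide street; with a large `C` violations are penalised heavily and the street narrows.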

In [2]:
# loading in digits data
dataset = load_digits()
print(dataset.data.shape)

X_digits = dataset.data / dataset.data.max() # scale pixel intensities to [0, 1] by dividing by the maximum pixel value
y_digits = dataset.target

(1797, 64)


In [3]:
# training and validation dataset
# sequential split (no shuffling): the first 80% of the samples are used
# for training and the remaining 20% for testing
train_fraction = 0.8
size = len(X_digits)
split_index = int(train_fraction * size)
print(split_index)

X_train = X_digits[:split_index]
y_train = y_digits[:split_index]
X_test = X_digits[split_index:]
y_test = y_digits[split_index:]

1437
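
The split above takes the samples in their stored order. If that order is not random, a shuffled and stratified split may give a more representative test set. The following is a minimal sketch of that alternative using `train_test_split`; the `random_state` value and the variable names are arbitrary choices for this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / X.max()  # same scaling to [0, 1] as in the notebook

# Shuffle before splitting and keep the class proportions equal
# in the training and test parts (stratify=y)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.8, shuffle=True, stratify=y, random_state=42
)

print(X_tr.shape, X_te.shape)
print("test samples per digit:", np.bincount(y_te))
```

Stratifying keeps each digit represented in roughly the same proportion in both parts, which makes the accuracy figure less sensitive to how the data happen to be ordered.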


In [8]:
# model training
result1 = svm_class_linear.fit(X_train, y_train)

# model validation
y_predict = result1.predict(X_test)

# flatten both label arrays so they can be compared element-wise
y_test = np.ravel(y_test)
y_predict = np.ravel(y_predict)

# count matching and non-matching predictions
correct = int(np.sum(y_test == y_predict))
wrong = len(y_test) - correct
print('The number of matching ones is ' + str(correct) + '.')
print('The number of non-matching ones is ' + str(wrong) + '.')

# accuracy = share of test samples that were classified correctly
accuracy = (correct / len(y_test)) * 100
print('Average model accuracy is ' + str(round(accuracy, 1)) + '%' + '.')

The number of matching ones is 319.
The number of non-matching ones is 41.
Average model accuracy is 88.6%.
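
The manual count of matching predictions corresponds to what `sklearn.metrics.accuracy_score` computes, and `confusion_matrix` additionally shows which classes are confused with each other. The sketch below uses short made-up label arrays (`y_true`, `y_pred`) as stand-ins for `y_test` and `y_predict`, so the numbers are not results of the digits model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels standing in for y_test and y_predict
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])

acc = accuracy_score(y_true, y_pred)   # fraction of matching labels
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print("accuracy:", acc)
print(cm)
```

The diagonal of the confusion matrix holds the correctly classified samples per class, so its sum divided by the number of samples equals the accuracy.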
