# Support Vector Machines


Support Vector Machines (SVMs) are widely used for classification tasks. They offer distinct advantages over other methods such as Linear Discriminant Analysis (LDA) and logistic regression. LDA models the joint distribution of the target variable $(Y)$ and the input features $(X)$, whereas logistic regression models the conditional distribution of $(Y)$ given $(X)$.

In contrast, an SVM takes a geometric approach and does not rely on assumptions about the underlying distribution of the data. Instead, it looks for an optimal separating hyperplane between the classes. Because it handles complex data distributions without explicit distributional assumptions, the SVM is a versatile and widely adopted technique.
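To make the geometric idea concrete, here is a minimal sketch of how a separating hyperplane classifies a point. The values of `w` and `b` and the helper names are hypothetical and chosen only for illustration; they are not fitted to the data in this notebook.

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen only for illustration.
w = np.array([1.0, -1.0])   # normal vector of the hyperplane
b = 0.5                     # intercept

def decision_value(x, w, b):
    """Signed score w . x + b; its sign gives the predicted class."""
    return float(np.dot(w, x) + b)

def predict(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(x, w, b) >= 0 else -1

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(x, w, b)) / float(np.linalg.norm(w))

point_a = np.array([2.0, 0.0])   # score = 2.0 + 0.5 = 2.5
point_b = np.array([0.0, 2.0])   # score = -2.0 + 0.5 = -1.5
print(predict(point_a, w, b), predict(point_b, w, b))
print(distance_to_hyperplane(point_a, w, b))
```

The distance computed here is the quantity a margin-based classifier tries to keep large for the training points closest to the boundary.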

Three classifiers belong to this family: the Maximal Margin Classifier, the Support Vector Classifier, and the Support Vector Machine. We focus on the Support Vector Machine, because the first two rely on a linear boundary and our classes are not linearly separable. The Support Vector Machine combines a soft margin with a non-linear boundary in the original feature space. We also compare the polynomial, RBF, and sigmoid kernels, which can be useful when the observations are not linearly separable.
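For reference, the soft-margin problem can be written as follows. The symbols $w$, $b$, $\xi_i$, and $C$ do not appear elsewhere in this notebook and are introduced here as assumptions: $w$ and $b$ define the hyperplane, $\xi_i$ are slack variables that allow margin violations, and $C$ is the cost of those violations.

```latex
\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

With a kernel, $x_i$ is replaced by an implicit feature map of $x_i$, which is what produces a non-linear boundary in the original feature space.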

## Libraries

In [2]:
import pandas as pd
import numpy as np

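# Import only the libsvm helpers used below instead of a wildcard import
from libsvm.svmutil import svm_parameter, svm_problem, svm_train, svm_predict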

from sklearn.metrics import accuracy_score

## Import Data

In [3]:
X_train = pd.read_csv('0_X_train.csv', index_col='Id')
X_valid = pd.read_csv('1_X_valid.csv', index_col='Id')
X_test  = pd.read_csv('2_X_test.csv', index_col='Id')
y_train = pd.read_csv('0_y_train.csv', index_col='Id')
y_valid = pd.read_csv('1_y_valid.csv', index_col='Id')
y_test  = pd.read_csv('2_y_test.csv', index_col='Id')

X = pd.concat([X_train, X_valid, X_test], axis=0)
y = pd.concat([y_train, y_valid, y_test], axis=0)

num_vars = ['age', 'time_spent', 'banner_views', 'banner_views_old', 'days_elapsed_old', 'X4']

In [4]:
# Create a numpy array for y_train so that the methods can appropriately read the data
y_train = np.array(y_train)
y_train = y_train.ravel()

# Create a numpy array for y (all splits combined) so that the methods can appropriately read the data
y = np.array(y)
y = y.ravel()

In [5]:
X_train = np.asarray(X_train).astype('float')
y_train = np.asarray(y_train).ravel()

X = np.asarray(X).astype('float')
y = np.asarray(y).ravel()

In [6]:
X_valid = np.asarray(X_valid).astype('float')
y_valid = np.asarray(y_valid).ravel()

In [7]:
X_test = np.asarray(X_test).astype('float')
y_test = np.asarray(y_test).ravel()
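The conversion cells above repeat the same two steps for every split. As a sketch, they could be wrapped in one helper; `to_float_arrays` is a hypothetical name, not part of the original notebook or of any library.

```python
import numpy as np

def to_float_arrays(features, labels):
    """Convert a feature table and a label column to plain NumPy arrays.

    features: any 2-D array-like (for example a pandas DataFrame)
    labels:   any array-like holding one label per row
    """
    X_arr = np.asarray(features).astype(float)   # float feature matrix
    y_arr = np.asarray(labels).ravel()           # flat 1-D label vector
    return X_arr, y_arr

# Small illustrative input, not the notebook's data
X_demo, y_demo = to_float_arrays([[1, 2], [3, 4]], [[0], [1]])
print(X_demo.shape, y_demo.shape)
```

Each split could then be converted with a single call, for example `X_valid, y_valid = to_float_arrays(X_valid, y_valid)`.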

## SVM with Polynomial Kernel
SVM with a polynomial kernel applies a polynomial function to map the input data into a higher-dimensional feature space, allowing for non-linear decision boundaries.

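The polynomial kernel function is defined as $k(x, y) = (x \cdot y + c)^d$, where $x$ and $y$ are the input feature vectors, $\cdot$ represents the dot product, $c$ is a constant term, and $d$ is the degree of the polynomial. libsvm uses the slightly more general form $(\gamma \, x \cdot y + c)^d$, with defaults $d = 3$ and $c = 0$; the cells below keep those defaults.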
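The following sketch illustrates the formula with NumPy. For $d = 2$ and two-dimensional inputs, it checks that the kernel value equals a dot product in an explicit higher-dimensional feature space. The function names and the sample vectors are hypothetical and used only for illustration.

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=2):
    """k(x, y) = (x . y + c)^d"""
    return float((np.dot(x, y) + c) ** d)

def explicit_features_degree2(x, c=1.0):
    """Explicit feature map for d = 2 and 2-D input.

    Chosen so that phi(x) . phi(y) == (x . y + c)^2.
    """
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2.0) * x1 * x2,
        np.sqrt(2.0 * c) * x1,
        np.sqrt(2.0 * c) * x2,
        c,
    ])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

kernel_value = polynomial_kernel(x, y, c=1.0, d=2)
mapped_value = float(np.dot(explicit_features_degree2(x), explicit_features_degree2(y)))
print(kernel_value, mapped_value)
```

The kernel needs one dot product in two dimensions, while the explicit map works in six, which is why the kernel form is preferred as the degree and the number of features grow.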

In [8]:
param = svm_parameter('-q')  # '-q' runs libsvm in quiet mode
param.kernel_type = 1        # 1 = polynomial kernel in libsvm
problem = svm_problem(y_train, X_train)
model1 = svm_train(problem, param)

In [9]:
pred_lbl, pred_acc, pred_val = svm_predict(y_train, X_train, model1)

Accuracy = 65.0335% (4075/6266) (classification)


In [10]:
pred_lbl, pred_acc, pred_val = svm_predict(y_valid, X_valid, model1)


Accuracy = 63.589% (854/1343) (classification)


In [11]:
pred_lbl, pred_acc, pred_val = svm_predict(y_test, X_test, model1)

Accuracy = 65.9717% (886/1343) (classification)


In [12]:
pred_lbl, pred_acc, pred_val = svm_predict(y, X, model1)

Accuracy = 64.9576% (5815/8952) (classification)


In [13]:
# Refit the polynomial-kernel SVM on all of the data (train + validation + test)
param = svm_parameter('-q')
param.kernel_type = 1
problem = svm_problem(y, X)
model2 = svm_train(problem, param)

In [14]:
pred_lbl, pred_acc, pred_val = svm_predict(y, X, model2)

Accuracy = 64.3878% (5764/8952) (classification)


## SVM with Radial Basis Function Kernel

The RBF kernel is effective for non-linearly separable data because it lets the model capture complex decision boundaries without explicitly mapping the data into a higher-dimensional feature space.

The RBF kernel function is defined as $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$, where $x$ and $y$ are the input feature vectors, $\lVert x - y \rVert$ represents the Euclidean distance between $x$ and $y$, and $\gamma$ is a hyperparameter that controls the shape of the decision boundary. A smaller value of $\gamma$ gives each training point a wider influence and leads to a smoother decision boundary, while a larger value makes the boundary follow individual training points more closely, which increases the risk of overfitting.
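A short NumPy sketch shows how $\gamma$ changes the similarity between two fixed points. The function name, the sample points, and the $\gamma$ values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = [0.0, 0.0]
y = [1.0, 1.0]   # squared Euclidean distance to x is 2

# The same pair of points looks less and less similar as gamma grows
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x, y, gamma))
```

With a large $\gamma$ the kernel value drops towards zero even for nearby points, so each training point only influences its immediate neighbourhood.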

In [15]:
param = svm_parameter('-q')  # '-q' runs libsvm in quiet mode
param.kernel_type = 2        # 2 = RBF kernel in libsvm
problem = svm_problem(y_train, X_train)
model3 = svm_train(problem, param)

In [16]:
pred_lbl, pred_acc, pred_val = svm_predict(y_train, X_train, model3)

Accuracy = 91.1586% (5712/6266) (classification)


In [17]:
pred_lbl, pred_acc, pred_val = svm_predict(y_valid, X_valid, model3)


Accuracy = 79.8213% (1072/1343) (classification)


In [18]:
pred_lbl, pred_acc, pred_val = svm_predict(y_test, X_test, model3)

Accuracy = 80.0447% (1075/1343) (classification)


In [19]:
pred_lbl, pred_acc, pred_val = svm_predict(y, X, model3)

Accuracy = 87.7904% (7859/8952) (classification)


In [20]:
# Refit the RBF-kernel SVM on all of the data (train + validation + test)
param = svm_parameter('-q')
param.kernel_type = 2
problem = svm_problem(y, X)
model4 = svm_train(problem, param)

In [21]:
pred_lbl, pred_acc, pred_val = svm_predict(y, X, model4)

Accuracy = 90.8847% (8136/8952) (classification)


## SVM with Sigmoid Kernel

The sigmoid kernel is derived from the sigmoid activation function, which is also commonly used in neural networks. It is defined as $k(x, y) = \tanh(\alpha \, x^\top y + c)$, where $x$ and $y$ are the input feature vectors, $\alpha$ is a hyperparameter that controls the shape of the decision boundary, and $c$ is another hyperparameter that controls the bias term.

Compared with other kernel functions such as the RBF kernel above, the sigmoid kernel is generally harder to tune, and it is not a valid (positive semidefinite) kernel for every choice of $\alpha$ and $c$.
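The sigmoid kernel formula can be sketched with NumPy as follows. The example also builds a small kernel matrix for one hypothetical parameter choice ($\alpha = 1$, $c = -1$) and inspects its smallest eigenvalue, as a way to see how sensitive this kernel is to its hyperparameters. The function names, points, and parameter values are assumptions made for illustration.

```python
import numpy as np

def sigmoid_kernel(x, y, alpha, c):
    """k(x, y) = tanh(alpha * x^T y + c)"""
    return float(np.tanh(alpha * np.dot(x, y) + c))

def gram_matrix(points, alpha, c):
    """Kernel matrix with entries k(x_i, x_j) for all pairs of points."""
    pts = np.asarray(points, dtype=float)
    return np.tanh(alpha * pts @ pts.T + c)

# Two hypothetical one-dimensional points
points = [[0.1], [0.2]]
K = gram_matrix(points, alpha=1.0, c=-1.0)

# Every entry is tanh of a negative number, so the diagonal is negative
min_eig = float(np.linalg.eigvalsh(K).min())
print(K)
print(min_eig)
```

A negative eigenvalue means the matrix is not positive semidefinite for this parameter choice, which is one reason the sigmoid kernel can behave less predictably than the RBF kernel.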

In [22]:
param = svm_parameter('-q')  # '-q' runs libsvm in quiet mode
param.kernel_type = 3        # 3 = sigmoid kernel in libsvm
problem = svm_problem(y_train, X_train)
model5 = svm_train(problem, param)

In [23]:
pred_lbl, pred_acc, pred_val = svm_predict(y_train, X_train, model5)

Accuracy = 58.3307% (3655/6266) (classification)


In [24]:
pred_lbl, pred_acc, pred_val = svm_predict(y_valid, X_valid, model5)


Accuracy = 57.7811% (776/1343) (classification)


In [25]:
pred_lbl, pred_acc, pred_val = svm_predict(y_test, X_test, model5)

Accuracy = 59.0469% (793/1343) (classification)


In [26]:
pred_lbl, pred_acc, pred_val = svm_predict(y, X, model5)

Accuracy = 58.3557% (5224/8952) (classification)


In [27]:
# Refit the sigmoid-kernel SVM on all of the data (train + validation + test)
param = svm_parameter('-q')
param.kernel_type = 3
problem = svm_problem(y, X)
model6 = svm_train(problem, param)

In [28]:
pred_lbl, pred_acc, pred_val = svm_predict(y, X, model6)

Accuracy = 58.3557% (5224/8952) (classification)


In [30]:
print("SVM Best Model")
print("")
print("Training accuracy:",  np.round(.911586, 4))
print("Validation accuracy:", np.round(.798213, 4))
print("Test accuracy:", np.round(.800447, 4))
print("X accuracy on Partially Trained Model:", np.round(.877904, 4))
print("X accuracy on Fully Trained Model:", np.round(.908847, 4))

SVM Best Model

Training accuracy: 0.9116
Validation accuracy: 0.7982
Test accuracy: 0.8004
X accuracy on Partially Trained Model: 0.8779
X accuracy on Fully Trained Model: 0.9088
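The summary above types the accuracies in by hand. As a sketch, the same percentage can be computed directly from true and predicted labels. `accuracy_percent` is a hypothetical helper, and the labels below are synthetic: they only reproduce the validation count reported for the RBF model (1072 correct out of 1343), not the actual predictions.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Share of matching labels, expressed as a percentage."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return 100.0 * float(np.mean(y_true == y_pred))

# Synthetic labels: 1343 observations, the first 1072 predicted correctly
y_true_demo = np.ones(1343, dtype=int)
y_pred_demo = np.ones(1343, dtype=int)
y_pred_demo[1072:] = 0

validation_accuracy = accuracy_percent(y_true_demo, y_pred_demo)
print(round(validation_accuracy, 4))
```

In the notebook itself, the second value returned by `svm_predict` (named `pred_acc` in the cells above) already holds the accuracy as its first element, so it could be stored after each call instead of being copied from the printed output.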


## Resources

* https://dataaspirant.com/svm-kernels/#t-1608054630728

* https://data-flair.training/blogs/svm-kernel-functions/

* 18_SVM_MMC.pdf

* 20_SVM.pdf