# **Support Vector Machine (SVM) Classification Implementation Using Scikit-Learn**

*By Carlos Santiago Bañón*

* **Year:** 2020
* **Technologies:** Python, Pandas, NumPy, Scikit-Learn
* **Areas:** Machine Learning, Classification
* **Keywords:** `classification`, `kernels`, `linear-kernel`, `machine-learning`, `quadratic-kernel`, `radial-basis-function-kernel`, `rbf-kernel`, `support-vector-machine`, `svm`
* **Description:** This notebook presents an implementation of support vector machine classification using the Scikit-Learn library. More specifically, we compare SVM implementations with (a) linear, (b) quadratic, and (c) radial basis function (RBF) kernels. The data used is a preprocessed version of the Kaggle Titanic dataset hosted in the GitHub repository for this notebook.

## 1. Import Statements
---

In [1]:
```python
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
```

## 2. Load the Data
---

First, we import the preprocessed Kaggle Titanic dataset hosted in the GitHub repository for this notebook.

In [2]:
```python
# Import the data into Pandas DataFrames.
train_df = pd.read_csv('https://bit.ly/39AQRJj')
test_df = pd.read_csv('https://bit.ly/3aoJzHG')
y_test_df = pd.read_csv('https://bit.ly/2YxfKzi')
```

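In [3]:
```python
# Show the training set.
train_df
```

| Unnamed: 0 | Age | Survived | Pclass | SibSp | Fare | Gender |
|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 3 | 1 | 0 | 0 |
| 1 | 3 | 1 | 1 | 1 | 3 | 1 |
| 2 | 2 | 1 | 3 | 0 | 1 | 1 |
| 3 | 3 | 1 | 1 | 1 | 3 | 1 |
| 4 | 3 | 0 | 3 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 886 | 2 | 0 | 2 | 0 | 1 | 0 |
| 887 | 2 | 1 | 1 | 0 | 2 | 1 |
| 888 | 2 | 0 | 3 | 1 | 2 | 1 |
| 889 | 2 | 1 | 1 | 0 | 2 | 0 |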


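In [4]:
```python
# Show the test set.
test_df
```

| Unnamed: 0 | Age | Pclass | SibSp | Fare | Gender |
|---|---|---|---|---|---|
| 0 | 3 | 3 | 0 | 0 | 0 |
| 1 | 4 | 3 | 1 | 0 | 1 |
| 2 | 5 | 2 | 0 | 1 | 0 |
| 3 | 2 | 3 | 0 | 1 | 0 |
| 4 | 2 | 3 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... |
| 413 | 3 | 3 | 0 | 1 | 0 |
| 414 | 3 | 1 | 0 | 3 | 1 |
| 415 | 3 | 3 | 0 | 0 | 0 |
| 416 | 3 | 3 | 0 | 1 | 0 |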


In [5]:
```python
# Set up the learning matrices.
X_train = train_df.drop('Survived', axis=1, inplace=False).to_numpy()
y_train = train_df[['Survived']].to_numpy()
X_test = test_df.to_numpy()
y_test = y_test_df.drop('PassengerId', axis=1, inplace=False).to_numpy()
```
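
The cell above produces a 2-D feature matrix and a 2-D, single-column label array, which is why later cells call `y_train.ravel()` before fitting. The following is a minimal sketch of those shapes on a hypothetical three-row frame; `toy_df` and its values are made up for illustration and only reuse the column names of the training set.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame that reuses the training set's column names.
toy_df = pd.DataFrame({
    'Age': [2, 3, 2],
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 3],
    'SibSp': [1, 1, 0],
    'Fare': [0, 3, 1],
    'Gender': [0, 1, 1],
})

# Dropping the label column leaves the feature matrix: (n_samples, n_features).
X_toy = toy_df.drop('Survived', axis=1).to_numpy()

# Double brackets select a one-column DataFrame, so the array is 2-D: (n_samples, 1).
y_toy = toy_df[['Survived']].to_numpy()

# ravel() flattens the column into the 1-D label vector that fit() expects.
y_flat = y_toy.ravel()

print(X_toy.shape, y_toy.shape, y_flat.shape)  # (3, 5) (3, 1) (3,)
```

Selecting the label with single brackets, `toy_df['Survived'].to_numpy()`, would give the 1-D shape directly and make the later `ravel()` calls unnecessary.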

## 3. Support Vector Machine Classification
---

### 3.1. Linear Kernel

In [6]:
```python
# Create the SVM model.
clf = svm.SVC(kernel='linear')
```

In [7]:
```python
# Train the model.
clf.fit(X_train, y_train.ravel());
```
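
With a linear kernel, the fitted classifier reduces to a single separating hyperplane, which scikit-learn exposes through `coef_` and `intercept_`. The sketch below illustrates this on synthetic data from `make_classification`; the data and the names `X_demo`, `y_demo`, and `linear_clf` are assumptions for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in data (an assumption; not the Titanic features).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

linear_clf = svm.SVC(kernel='linear')
linear_clf.fit(X_demo, y_demo)

# For a linear kernel, the model exposes the hyperplane weights and the bias.
w = linear_clf.coef_        # shape (1, n_features) for a binary problem
b = linear_clf.intercept_   # shape (1,)

# The decision function is the signed score w . x + b for each sample.
manual_scores = (X_demo @ w.T + b).ravel()
library_scores = linear_clf.decision_function(X_demo)

# A positive score maps to classes_[1], any other score to classes_[0].
manual_pred = np.where(manual_scores > 0, linear_clf.classes_[1], linear_clf.classes_[0])

print(np.allclose(manual_scores, library_scores, atol=1e-6))
print(np.array_equal(manual_pred, linear_clf.predict(X_demo)))
```

Note that `coef_` is only available for `kernel='linear'`; the other kernels in this notebook have no explicit weight vector in the input space.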

In [8]:
```python
# Calculate the accuracy using cross-validation.
accuracy_score = cross_val_score(clf, X_train, y_train.ravel(), cv=5, scoring='accuracy').mean()
print("Accuracy Score for Linear Kernel:", accuracy_score)
```

Accuracy Score for Linear Kernel: 0.7867365513778168
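
The reported figure is the mean of five per-fold accuracies. For a classifier and an integer `cv`, `cross_val_score` splits the data with unshuffled stratified k-fold and fits a fresh clone of the estimator on each fold, so the score does not depend on the earlier `fit` call. The sketch below reproduces this by hand on synthetic data; `X_demo`, `y_demo`, and the other names are assumptions for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption; not the Titanic features).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Manual version: five stratified folds, one freshly fitted model per fold.
folds = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, val_idx in folds.split(X_demo, y_demo):
    fold_clf = svm.SVC(kernel='linear')
    fold_clf.fit(X_demo[train_idx], y_demo[train_idx])
    # score() returns the accuracy on the held-out fold.
    manual_scores.append(fold_clf.score(X_demo[val_idx], y_demo[val_idx]))

# Library version, as used in the notebook.
library_scores = cross_val_score(
    svm.SVC(kernel='linear'), X_demo, y_demo, cv=5, scoring='accuracy'
)

print(np.allclose(manual_scores, library_scores))
print("Mean accuracy:", np.mean(manual_scores))
```

Because the folds come from the training data only, this estimate never touches `X_test` or `y_test`.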


### 3.2. Quadratic Kernel

In [9]:
```python
# Create the SVM model.
clf = svm.SVC(kernel='poly', degree=2)
```
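
In scikit-learn, the polynomial kernel is `(gamma * <x, z> + coef0) ** degree`. `SVC` leaves `coef0` at `0.0` unless told otherwise, so the model above uses a homogeneous quadratic kernel. The sketch below checks that formula against `polynomial_kernel` and shows that it equals an ordinary dot product in the space of all pairwise feature products. It assumes the `gamma='scale'` default of recent scikit-learn releases, and the random points and the helper `quadratic_features` are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Synthetic points (an assumption; not the Titanic features).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 3))

# gamma='scale' resolves to 1 / (n_features * X.var()).
gamma = 1.0 / (X_demo.shape[1] * X_demo.var())

# Quadratic kernel with coef0 = 0, written out by hand.
K_manual = (gamma * (X_demo @ X_demo.T) + 0.0) ** 2
K_library = polynomial_kernel(X_demo, X_demo, degree=2, gamma=gamma, coef0=0.0)

def quadratic_features(x, gamma):
    """Hypothetical explicit feature map: all products x_i * x_j, scaled by gamma."""
    return gamma * np.outer(x, x).ravel()

# The same kernel values arise as plain dot products of the mapped points.
Phi = np.array([quadratic_features(x, gamma) for x in X_demo])
K_features = Phi @ Phi.T

print(np.allclose(K_manual, K_library))
print(np.allclose(K_manual, K_features))
```

The kernel form never builds the `n_features ** 2` mapped features explicitly, which is the practical appeal of the kernel trick.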

In [10]:
```python
# Train the model.
clf.fit(X_train, y_train.ravel());
```

In [11]:
```python
# Calculate the accuracy using cross-validation.
accuracy_score = cross_val_score(clf, X_train, y_train.ravel(), cv=5, scoring='accuracy').mean()
print("Accuracy Score for Quadratic Kernel:", accuracy_score)
```

Accuracy Score for Quadratic Kernel: 0.8159186491745654


### 3.3. Radial Basis Function (RBF) Kernel

In [12]:
```python
# Create the SVM model.
clf = svm.SVC(kernel='rbf')
```
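
The RBF kernel scores the similarity of two points as `exp(-gamma * ||x - z||^2)`, and the fitted model's decision function is a weighted sum of these similarities to its support vectors plus a bias. The sketch below rebuilds that decision function by hand on synthetic data. It passes `gamma='scale'` explicitly (the default in recent scikit-learn releases), and the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in data (an assumption; not the Titanic features).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

rbf_clf = svm.SVC(kernel='rbf', gamma='scale')
rbf_clf.fit(X_demo, y_demo)

# gamma='scale' resolves to 1 / (n_features * X.var()).
gamma = 1.0 / (X_demo.shape[1] * X_demo.var())

# RBF kernel between every sample and every support vector.
diffs = X_demo[:, None, :] - rbf_clf.support_vectors_[None, :, :]
K = np.exp(-gamma * np.sum(diffs ** 2, axis=2))   # shape (n_samples, n_support_vectors)

# Decision score: kernel values weighted by the dual coefficients, plus the intercept.
manual_scores = (K @ rbf_clf.dual_coef_.T + rbf_clf.intercept_).ravel()

print(np.allclose(manual_scores, rbf_clf.decision_function(X_demo), atol=1e-6))
```

A larger `gamma` makes each support vector's influence more local, so it is a natural parameter to tune alongside `C`.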

In [13]:
```python
# Train the model.
clf.fit(X_train, y_train.ravel());
```

In [14]:
```python
# Calculate the accuracy using cross-validation.
accuracy_score = cross_val_score(clf, X_train, y_train.ravel(), cv=5, scoring='accuracy').mean()
print("Accuracy Score for RBF Kernel:", accuracy_score)
```

Accuracy Score for RBF Kernel: 0.8114305442219572
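
The three experiments differ only in the kernel settings, so the comparison can also be written as a single loop. The following is a sketch of that pattern on synthetic data, not a rerun of the notebook; `X_demo`, `y_demo`, `models`, and `results` are hypothetical names, and the scores it prints will differ from the Titanic figures above.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; not the Titanic features).
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

# One unfitted estimator per kernel, mirroring sections 3.1 to 3.3.
models = {
    'Linear': svm.SVC(kernel='linear'),
    'Quadratic': svm.SVC(kernel='poly', degree=2),
    'RBF': svm.SVC(kernel='rbf'),
}

# Mean 5-fold cross-validated accuracy for each kernel.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')
    results[name] = scores.mean()
    print(f"Accuracy Score for {name} Kernel: {results[name]:.4f}")
```

Swapping `X_demo` and `y_demo` for `X_train` and `y_train.ravel()` would apply the same loop to the notebook's data.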
