# Support Vector Machine


# The Technique (Support Vector Machine)

The Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression, but in machine learning it is used primarily for classification.

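The goal of the SVM algorithm is to find the decision boundary that best separates an n-dimensional feature space into classes, so that new data points can later be assigned to the correct category. SVM defines "best" as the boundary with the largest margin, that is, the greatest distance to the closest training points of each class. Those closest points are called support vectors, and the decision boundary itself is called a hyperplane.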
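The sketch below illustrates the maximum-margin idea with scikit-learn's `SVC` and a linear kernel on six made-up 2-D points. The very large `C` is an assumption chosen to approximate a hard margin, so the classes are separated exactly. For a linear SVM the margin width is 2 / ||w||, where `w` is the fitted `coef_`.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (toy data made up for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                      # normal vector of the hyperplane w.x + b = 0
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)  # close to 5 / sqrt(2), about 3.54, for these points
print("prediction for [3, 3.5]:", clf.predict([[3.0, 3.5]]))
```

Only the points closest to the other class, here (2, 1), (1, 2) and (4, 4), should end up as support vectors. Moving any other point slightly would not change the hyperplane.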

# The Problem
SVM can be illustrated with the example used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, we can build one with the SVM algorithm. We first train the model on many images of cats and dogs so that it learns their distinguishing features, and then we test it on this strange creature.

# Code

In [None]:
# Importing Libraries
# (SVC, train_test_split and the metrics are imported in the cells that use them)
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Importing the Dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)
# Preprocessing

print(irisdata.head())
X = irisdata.drop('Class', axis=1)
y = irisdata['Class']
# Train Test Split
from sklearn.model_selection import train_test_split

# No random_state is set, so the split (and every result below) changes from run to run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

   sepal-length  sepal-width  petal-length  petal-width        Class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
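The split above is neither stratified nor seeded, which is why the 30-sample test set in the reports below holds 9, 11 and 10 flowers of the three classes. The sketch below shows a stratified, reproducible split. It assumes scikit-learn's bundled copy of the Iris data (`load_iris`, labels encoded as 0, 1, 2) instead of the UCI download, so it runs offline.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled Iris data: 150 samples, 50 per class, labels 0, 1, 2
X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in both parts; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

# Number of test samples per class
print(np.bincount(y_test))  # [10 10 10]
```

With balanced classes, stratification guarantees each class gets exactly 10 test samples, so per-class scores are comparable across runs.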


# Results for RBF Kernel

In [None]:
# RBF Kernel
from sklearn.svm import SVC

svclassifier1 = SVC(kernel='rbf')  # degree only affects the 'poly' kernel, so it is omitted
svclassifier1.fit(X_train, y_train)
# Making Predictions
y_pred = svclassifier1.predict(X_test)
# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print('RBF Kernel Results:')
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

RBF Kernel Results:
[[ 9  0  0]
 [ 0  9  2]
 [ 0  0 10]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         9
Iris-versicolor       1.00      0.82      0.90        11
 Iris-virginica       0.83      1.00      0.91        10

       accuracy                           0.93        30
      macro avg       0.94      0.94      0.94        30
   weighted avg       0.94      0.93      0.93        30
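With `kernel='rbf'`, SVC measures the similarity of two samples with the Gaussian (RBF) kernel K(x, x') = exp(-gamma ||x - x'||²). SVC's default `gamma='scale'` sets gamma to 1 / (n_features * X.var()). The sketch below writes the kernel in NumPy and checks it against scikit-learn's `rbf_kernel`. The random points and gamma = 0.5 are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(X, Y, gamma):
    # K(x, x') = exp(-gamma * ||x - x'||^2) for every pair of rows of X and Y
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))  # 5 random points with 4 features, like the iris measurements
gamma = 0.5

K_manual = rbf(X, X, gamma)
K_sklearn = rbf_kernel(X, X, gamma=gamma)
print(np.allclose(K_manual, K_sklearn))  # True
print(np.diag(K_manual))                 # each point has similarity 1 with itself
```

Similarity decays with distance, and a larger gamma makes it decay faster, so the decision boundary becomes more local and wiggly.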



# Results for Polynomial Kernel

In [None]:
# Polynomial Kernel
from sklearn.svm import SVC

svclassifier = SVC(kernel='poly', degree=2)
svclassifier.fit(X_train, y_train)
# Making Predictions
y_pred = svclassifier.predict(X_test)
# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print('Polynomial Kernel Results:')
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Polynomial Kernel Results:
[[ 9  0  0]
 [ 0 11  0]
 [ 0  1  9]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         9
Iris-versicolor       0.92      1.00      0.96        11
 Iris-virginica       1.00      0.90      0.95        10

       accuracy                           0.97        30
      macro avg       0.97      0.97      0.97        30
   weighted avg       0.97      0.97      0.97        30
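For `kernel='poly'` the kernel is K(x, x') = (gamma <x, x'> + coef0)^degree, so `degree=2` lets the classifier use pairwise products of the four measurements. The sketch below checks a NumPy version against `polynomial_kernel`. Note that `polynomial_kernel` defaults to `coef0=1` while `SVC` defaults to `coef0=0.0`, so the value is passed explicitly. The inputs are arbitrary illustration values.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def poly(X, Y, gamma, coef0, degree):
    # K(x, x') = (gamma * <x, x'> + coef0) ** degree
    return (gamma * X @ Y.T + coef0) ** degree

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))

K_manual = poly(X, X, gamma=0.25, coef0=0.0, degree=2)
K_sklearn = polynomial_kernel(X, X, degree=2, gamma=0.25, coef0=0.0)
print(np.allclose(K_manual, K_sklearn))  # True

# Worked example: (1*3 + 2*4) ** 2 = 11 ** 2 = 121
print(poly(np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]]), gamma=1.0, coef0=0.0, degree=2))
```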



# Results for Sigmoid Kernel

In [None]:
# Sigmoid Kernel
from sklearn.svm import SVC

svclassifier2 = SVC(kernel='sigmoid')  # degree only affects the 'poly' kernel, so it is omitted
svclassifier2.fit(X_train, y_train)
# Making Predictions
y_pred = svclassifier2.predict(X_test)

# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print('Sigmoid Kernel Results:')
print(confusion_matrix(y_test, y_pred))
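# zero_division=0 reports 0.00 (instead of warning) for classes that are never predicted
print(classification_report(y_test, y_pred, zero_division=0))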

Sigmoid Kernel Results:
[[ 9  0  0]
 [11  0  0]
 [10  0  0]]
                 precision    recall  f1-score   support

    Iris-setosa       0.30      1.00      0.46         9
Iris-versicolor       0.00      0.00      0.00        11
 Iris-virginica       0.00      0.00      0.00        10

       accuracy                           0.30        30
      macro avg       0.10      0.33      0.15        30
   weighted avg       0.09      0.30      0.14        30
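The sigmoid kernel is K(x, x') = tanh(gamma <x, x'> + coef0). On the Iris split above it predicted every test sample as Iris-setosa. One commonly cited caveat is that, unlike the RBF kernel, the sigmoid kernel is not guaranteed to be positive semidefinite. Whether that, the default hyperparameters, or the unscaled features caused the result here has not been checked. The sketch below only verifies the formula against scikit-learn's `sigmoid_kernel` on arbitrary random points.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

def sigmoid(X, Y, gamma, coef0):
    # K(x, x') = tanh(gamma * <x, x'> + coef0)
    return np.tanh(gamma * X @ Y.T + coef0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))

# coef0 is passed explicitly: sigmoid_kernel defaults to coef0=1, SVC to coef0=0.0
K_manual = sigmoid(X, X, gamma=0.25, coef0=0.0)
K_sklearn = sigmoid_kernel(X, X, gamma=0.25, coef0=0.0)
print(np.allclose(K_manual, K_sklearn))  # True

# tanh keeps every entry in [-1, 1], but unlike RBF the diagonal is not fixed at 1
print(np.diag(K_manual))
```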





# Lab Assigned

Apply SVM to a new dataset, spam.csv.
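spam.csv holds raw text messages, which `SVC` cannot use directly, so each message is first turned into token counts with `CountVectorizer`. The sketch below shows what that produces on a made-up four-message corpus (hypothetical data, not taken from spam.csv), and how `make_pipeline` can bundle the vectorizer with the classifier. The linear kernel is an illustrative choice, not one of the kernels the lab compares.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny made-up corpus, only for illustration
messages = [
    "win a free prize now",
    "claim your free cash prize",
    "are we still meeting for lunch",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag of words: one column per token (default tokens have 2+ characters, so "a" is dropped)
cv = CountVectorizer()
counts = cv.fit_transform(messages)
print(sorted(cv.vocabulary_))  # learned vocabulary
print(counts.shape)            # (4, 17): 4 messages, 17 distinct tokens

# The pipeline fits the vocabulary on the training messages only, then trains the SVC
model = make_pipeline(CountVectorizer(), SVC(kernel="linear"))
model.fit(messages, labels)
print(model.predict(["free prize for you", "lunch tomorrow?"]))  # ['spam' 'ham']
```

Fitting the vectorizer inside the pipeline mirrors the notebook's `fit_transform` on the training set and `transform` on the test set, so no test-set vocabulary leaks into training.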

# Code

In [None]:
# New Data set

# Assign column names to the dataset; only 'label' and 'message' are used,
# the other three names are placeholders for the file's extra columns
colnames = ['label', 'message', 'extra_1', 'extra_2', 'extra_3']

# Read dataset to pandas dataframe
# from google.colab import files
# upload = files.upload()
# If spam.csv starts with its own header row, add header=0 so that row is not read as a message
df = pd.read_csv('spam.csv', names=colnames, encoding="ISO-8859-1")
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df.replace(np.nan, 0, inplace=True)

# Preprocessing
X = df['message'].values
y = df['label'].values
# Train Test Split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state= 0)

# Convert each message into a vector of token counts (bag of words);
# the vocabulary is learned from the training messages only
cv = CountVectorizer()
X_train = cv.fit_transform(X_train)
X_test = cv.transform(X_test)

# Polynomial Kernel
from sklearn.svm import SVC

svclassifier3 = SVC(kernel='poly', degree=2, random_state=0)
svclassifier3.fit(X_train, y_train)
# Making Predictions
y_pred = svclassifier3.predict(X_test)
# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print('Spam Polynomial Kernel Results:')
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# RBF Kernel
from sklearn.svm import SVC

svclassifier4 = SVC(kernel='rbf', random_state=0)
svclassifier4.fit(X_train, y_train)
# Making Predictions
y_pred = svclassifier4.predict(X_test)
# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print('Spam RBF Kernel Results:')
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Sigmoid Kernel
from sklearn.svm import SVC

svclassifier5 = SVC(kernel='sigmoid')  # degree only affects the 'poly' kernel, so it is omitted
svclassifier5.fit(X_train, y_train)
# Making Predictions
y_pred = svclassifier5.predict(X_test)

Spam Polynomial Kernel Results:
[[952   0]
 [ 38 125]]
              precision    recall  f1-score   support

         ham       0.96      1.00      0.98       952
        spam       1.00      0.77      0.87       163

    accuracy                           0.97      1115
   macro avg       0.98      0.88      0.92      1115
weighted avg       0.97      0.97      0.96      1115

Spam RBF Kernel Results:
[[950   2]
 [ 21 142]]
              precision    recall  f1-score   support

         ham       0.98      1.00      0.99       952
        spam       0.99      0.87      0.93       163

    accuracy                           0.98      1115
   macro avg       0.98      0.93      0.96      1115
weighted avg       0.98      0.98      0.98      1115



# Confusion Matrix

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print('Spam Sigmoid Kernel Results:')
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Spam Sigmoid Kernel Results:
[[927  25]
 [ 38 125]]
              precision    recall  f1-score   support

         ham       0.96      0.97      0.97       952
        spam       0.83      0.77      0.80       163

    accuracy                           0.94      1115
   macro avg       0.90      0.87      0.88      1115
weighted avg       0.94      0.94      0.94      1115



# Result

In [None]:
print('RBF Kernel', svclassifier4.score(X_test, y_test))

print('Polynomial Kernel', svclassifier3.score(X_test, y_test))

print('Sigmoid Kernel', svclassifier5.score(X_test, y_test))

RBF Kernel 0.979372197309417
Polynomial Kernel 0.9659192825112107
Sigmoid Kernel 0.9434977578475336
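Every kernel above runs with default `C` and `gamma`, and the Iris features are not scaled. One way to compare kernels more systematically is a cross-validated grid search. The sketch below runs 5-fold `GridSearchCV` over kernel and `C`, with `StandardScaler` in front of the SVC. It assumes scikit-learn's bundled copy of Iris (`load_iris`) rather than the UCI download, and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# make_pipeline names the steps "standardscaler" and "svc", hence the "svc__" prefixes
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["rbf", "poly", "sigmoid"],
    "svc__C": [0.1, 1, 10],
}

# 5-fold cross-validation scores every combination on accuracy (the classifier default)
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Averaging over five folds gives a steadier comparison than a single unseeded 30-sample test set, and scaling inside the pipeline keeps each fold's scaler fitted on its own training part only.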


# Conclusion

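This notebook applied SVM classifiers with three kernels to two datasets. On the Iris data, the polynomial kernel (degree 2) reached 0.97 accuracy and the RBF kernel 0.93 on a 30-sample test set, while the sigmoid kernel predicted every test sample as Iris-setosa (accuracy 0.30). Because the Iris split is not seeded, these figures change between runs. On the spam dataset, the RBF kernel performed best (accuracy 0.979), followed by the polynomial kernel (0.966) and the sigmoid kernel (0.943). The RBF kernel also had the highest spam recall (0.87, against 0.77 for the other two kernels).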