# Lab 7 - Support Vector Machine

## 1.
Please answer the following questions about `SVM`. Refer to the lecture notes titled `“Support Vector Machine”` and `“Perceptron”` before starting the assignment.

1.1 In SVM, what is the meaning of margin? What are the equations of the two margin hyperplanes H+ and H-? (1 Mark)

In SVM, the `margin` is the distance between the decision boundary (the separating hyperplane) and the closest data points on either side. Those closest points lie on the margin hyperplanes H+ and H-.
<br><br>
The equations of the two margin hyperplanes:<br>
<br>
H+: W·X + b = +1  <br>
H-: W·X + b = -1
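
Because H+ and H- are defined by W·X + b = ±1, the distance between them works out to 2/||W||. The following is a minimal sketch of that arithmetic. The weight vector and bias are made-up values chosen for easy numbers, not values from the lab.

```python
import numpy as np

def margin_width(w):
    """Distance between H+ (w.x + b = +1) and H- (w.x + b = -1)."""
    return 2.0 / np.linalg.norm(w)

def signed_distance(w, b, x):
    """Signed distance from point x to the decision boundary w.x + b = 0."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

# Hypothetical hyperplane parameters (assumed for illustration only).
w = np.array([3.0, 4.0])
b = -1.0

width = margin_width(w)

# A point that satisfies w.x + b = +1, so it lies on H+.
x_on_h_plus = np.array([2.0 / 3.0, 0.0])
half = signed_distance(w, b, x_on_h_plus)

print("margin width:", width)
print("distance from boundary to H+:", half)
```

A point on H+ sits exactly half the margin width away from the decision boundary, which is why maximizing the margin is the same as minimizing ||W||.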

1.2 Consider the three linearly separable two-dimensional input vectors in the following figure. Find the linear SVM that optimally separates the classes by maximizing the margin. (1 Mark)

<img src="https://cdn.discordapp.com/attachments/941943020578820136/951319829825060894/Screen_Shot_2022-03-09_at_10.24.39_PM.png" width=50% alt="figure">


The 3 points in the given figure are the support vectors.<br>
The margin hyperplane H+ is the line that passes through the positive points.<br>
The margin hyperplane H- is parallel to H+ and passes through the negative point.<br>
The decision boundary is the line in the center, halfway between H+ and H-.<br>
The equation of the decision boundary is `-x+2=0`.
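
An answer of this kind can be checked numerically with `sklearn`. The sketch below is only an illustration: the three coordinates are assumed, chosen so that they are consistent with the boundary `-x+2=0` stated above, and are not read from the figure. A very large `C` is used to approximate a hard-margin SVM.

```python
import numpy as np
from sklearn import svm

# Hypothetical points (assumed, not taken from the figure):
# two positive points on x = 1 and one negative point on x = 3.
X = np.array([[1.0, 1.0], [1.0, -1.0], [3.0, 0.0]])
y = np.array([1, 1, -1])

# A large C makes the soft-margin SVC behave like a hard-margin SVM.
clf = svm.SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]          # weight vector of the decision boundary
b = clf.intercept_[0]     # bias term
margin = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin)
print("support vectors:\n", clf.support_vectors_)
```

For these assumed points the fitted weights should come out close to w = (-1, 0) and b = 2, that is, the line `-x+2=0`, with all three points acting as support vectors.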

1.3 What is a kernel function? (1 Mark)

<dl>
<dt>Kernel</dt>
<dd>- A method of using a linear classifier to solve a non-linear problem <i>(Kernel Trick Method)</i></dd>
<dd>- A function that is used in place of the "dot product" between two vectors</dd>
<dd>- Transforms linearly inseparable data into linearly separable data</dd>
</dl>
<br>

<dl>
<dt>Kernel Functions</dt>
<dd>- Used to implicitly map the data to a new feature space <i>(Based on dot product)</i></dd>
<dd>- Important because the kernel matrix summarizes all the data.</dd>
<dd>- Passed as a parameter in SVM code to determine the shape of the hyperplane and decision boundary <em>(Does all the hard work without complex calculations)</em> </dd>
</dl>

<br>
Kernel types:
<ul>
    <li> Linear</li>
    <li> Polynomial</li>
    <li> Gaussian RBF</li>
    <li> Sigmoid </li>
    <li>Chi-squared</li>
</ul>
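
The idea that a kernel replaces a dot product in a new feature space can be sketched with a small example. The degree-2 polynomial kernel and the explicit feature map `phi` below are a standard textbook pair, used here as an illustration. The two input vectors are arbitrary.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return np.dot(x, z) ** 2

# Arbitrary example vectors.
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # dot product in the mapped 3-D space
implicit = poly2_kernel(x, z)      # same value, computed in the original 2-D space

print(explicit, implicit)
```

Both numbers agree, so the kernel gives the dot product in the mapped space without ever building `phi(x)`. This is what makes high-dimensional mappings cheap to use.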

<hr>

In [50]:
#Import
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn import svm

## Heart Disease Dataset

<br>
<b>Heart Disease Dataset:</b> Here is the link to the heart disease dataset of patients: <a href="http://archive.ics.uci.edu/ml/datasets/Heart+Disease">http://archive.ics.uci.edu/ml/datasets/Heart+Disease</a>
<br>
After going to this link you will find two folders:
<ul>
<li>One: Data Folder</li>
<li>Two: Dataset description</li>
</ul>
It is better to use the processed Cleveland data.
<br>
<br>
In the dataset description folder, you will find the description of the column names for the 14 columns of the dataset. <strong>The last attribute (number 14) is the result.</strong> Include your R source code of regression analysis, training and generating results. Here are examples of the attributes and their information (please see the dataset documents for more details):

<ol>
<li> #3 (age) </li>
<li> #4 (sex) </li>
<li> #9 (cp) </li>
<li> #10 (trestbps)</li>
<li> #12 (chol)</li>
<li> #16 (fbs) </li>
<li> #19 (restecg) </li>
<li> #32 (thalach) </li>
<li> #38 (exang) </li>
<li> #40 (oldpeak)</li>
</ol>
.........
<ol start="13">
<li> #51 (thal)</li>
<li> #58 (num): the result</li>
</ol>
<br>


In [51]:
#read in heart disease data
heartDiseaseData = pd.read_csv('heart-disease-dataset1.csv')
heartDiseaseData.describe()

| |age|sex|cp|tresbps|chol|fbs|restecg|thalach|exang|oldpeak|slope|result|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|count|303.0|303.0|303.0|303.0|303.0|303.0|303.0|303.0|303.0|303.0|303.0|303.0|
|mean|54.438944|0.679868|3.158416|131.689769|246.693069|0.148515|0.990099|149.607261|0.326733|1.039604|1.60066|0.937294|
|std|9.038662|0.467299|0.960126|17.599748|51.776918|0.356198|0.994971|22.875003|0.469794|1.161075|0.616226|1.228536|
|min|29.0|0.0|1.0|94.0|126.0|0.0|0.0|71.0|0.0|0.0|1.0|0.0|
|25%|48.0|0.0|3.0|120.0|211.0|0.0|0.0|133.5|0.0|0.0|1.0|0.0|
|50%|56.0|1.0|3.0|130.0|241.0|0.0|1.0|153.0|0.0|0.8|2.0|0.0|
|75%|61.0|1.0|4.0|140.0|275.0|0.0|2.0|166.0|1.0|1.6|2.0|2.0|
|max|77.0|1.0|4.0|200.0|564.0|1.0|2.0|202.0|1.0|6.2|3.0|4.0|


### Data Cleaning

In [52]:
#count missing cells; '?' placeholders are strings, so isna() does not count them
print("Number of missing data:",heartDiseaseData.isna().sum().sum())
print("Number of duplicate data:",heartDiseaseData.duplicated().sum())
#replacing all the '?' to 0
heartDiseaseData = heartDiseaseData.replace('?',0)

#change object type to float
heartDiseaseData['thal'] = heartDiseaseData['thal'].astype(float)
heartDiseaseData['ca'] = heartDiseaseData['ca'].astype(float)

Number of missing data: 0
Number of duplicate data: 0
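
The missing-data count of 0 is worth a second look, because `'?'` placeholders are ordinary strings and `isna()` does not flag them. The sketch below shows this on a tiny made-up frame (the values are invented, only the column names `ca` and `thal` match the lab). It also shows `pd.to_numeric(..., errors="coerce")` as one alternative way to expose the placeholders as `NaN`. This is not the approach used in the cell above, which replaces `'?'` with 0.

```python
import pandas as pd

# Toy data, invented for illustration; '?' marks an unknown value.
toy = pd.DataFrame({
    "ca": ["0.0", "?", "2.0"],
    "thal": ["3.0", "7.0", "?"],
})

# isna() sees no missing values, because '?' is just a string.
nan_before = int(toy.isna().sum().sum())

# Counting the placeholder directly reveals the gaps.
placeholders = int((toy == "?").sum().sum())

# Coercing to numeric turns every non-numeric entry into NaN.
numeric = toy.apply(pd.to_numeric, errors="coerce")
nan_after = int(numeric.isna().sum().sum())

print(nan_before, placeholders, nan_after)
```

Whether to fill such gaps with 0, with a column statistic, or to drop the rows is a modelling choice. Filling with 0 keeps all 303 rows but introduces a value that may not be a valid category.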


<hr>

## 2. 
Compare Neural Network and SVM for classification of the heart disease dataset in Python. You can use the sklearn Python library to implement both Neural Networks and SVM. <br><br>For `SVM`, build the model with `different kernels` such as `Linear`, `Gaussian` and `Sigmoid` and note down the <u>model accuracy</u>. <br> Similarly, use `Stochastic Gradient Descent` and `Adam Gradient Descent` to build the **multi-layer Neural Network** and note down the <u>model accuracy</u> for each. <br><br>Finally, tell us which model performs better and why.
(5 Marks)

In [61]:
from sklearn.preprocessing import scale

X = heartDiseaseData.drop(['result'],axis=1)
y = heartDiseaseData['result']

#standardize the features (zero mean, unit variance)
X = scale(X)
X_train, X_test, y_train,y_test = train_test_split(X,y,test_size=0.2, random_state=1)


print('----------------SVM Classification---------------------')
clf = svm.SVC(kernel ='linear')
clf.fit(X_train,y_train)
print('Linear Kernel Accuracy:', clf.score(X_test,y_test))

clf = svm.SVC(kernel ='rbf')
clf.fit(X_train,y_train)
print('Gaussian RBF Kernel Accuracy:', clf.score(X_test,y_test))

clf = svm.SVC(kernel ='sigmoid')
clf.fit(X_train,y_train)
print('Sigmoid Kernel Accuracy:', clf.score(X_test,y_test))

#----------Multi-layer Neural Network----------------
print('\n----------------Multi-layer Neural Network---------------------')
clf = MLPClassifier(solver='sgd',random_state=1,max_iter=1000)
clf.fit(X_train,y_train)
print('Stochastic Gradient Descent Accuracy:', clf.score(X_test,y_test))

clf = MLPClassifier(solver='adam',random_state=1,max_iter=1000)
clf.fit(X_train,y_train)
print('MLP Adam Gradient Descent Accuracy:', clf.score(X_test,y_test))

----------------SVM Classification---------------------
Linear Kernel Accuracy: 0.639344262295082
Gaussian RBF Kernel Accuracy: 0.6229508196721312
Sigmoid Kernel Accuracy: 0.6229508196721312

----------------Multi-layer Neural Network---------------------
Stochastic Gradient Descent Accuracy: 0.6229508196721312
MLP Adam Gradient Descent Accuracy: 0.5737704918032787




|Model|Kernel / Solver|Accuracy|
|--------|-----------|----------------|
|SVM Classification|Linear |0.64|
|SVM Classification|RBF|0.62|
|SVM Classification|Sigmoid|0.62|
|Multi-Layer Neural Network|Stochastic Gradient Descent|0.62|
|Multi-Layer Neural Network|Adam Gradient Descent|0.57|

<br>

## Results
Based on the results, the best `SVM Classification` model is more accurate than both Neural Network models.
The `Linear Kernel accuracy` is `0.64`, the highest of all the models and above the Neural Network values (`0.62` and `0.57`). The RBF and Sigmoid kernels (`0.62`) tie with the Stochastic Gradient Descent network and beat the Adam network. The differences are small: on a 61-sample test set, `0.64` versus `0.62` is a single prediction.
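
The accuracies can be converted back into counts of correct predictions to see how far apart the models really are. The sketch assumes the test set has 61 rows, which is 20% of the 303 rows rounded up, the way `train_test_split` rounds a fractional `test_size`. The accuracy values are copied from the printed output.

```python
import math

n_rows = 303
n_test = math.ceil(0.2 * n_rows)  # size of the held-out test set

# Accuracies copied from the printed results.
accuracies = {
    "SVM linear": 0.639344262295082,
    "SVM rbf": 0.6229508196721312,
    "SVM sigmoid": 0.6229508196721312,
    "MLP sgd": 0.6229508196721312,
    "MLP adam": 0.5737704918032787,
}

# Number of correctly classified test samples per model.
correct = {name: round(acc * n_test) for name, acc in accuracies.items()}

for name, count in correct.items():
    print(f"{name}: {count}/{n_test} correct")
```

The linear kernel is ahead of the three tied models by one test sample and ahead of the Adam network by four, so the ranking could easily change with a different `random_state`.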
<br> <br>

## Explanation
One reason why `SVM` might do better than Neural Networks here is that SVMs are generally `fast` to train on small datasets: training solves a convex problem with a single optimum, and the resulting decision function depends only on the support vectors, which are a `subset of the training data`. `Neural networks`, in contrast, are trained iteratively with gradient descent. The result depends on the initial weights and on the order in which the data is presented, and each pass has to `process` the `entire` training dataset. Training therefore takes more time, and it is also expensive to restart training from a new initialization.
<br>
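
The point that an SVM's decision function depends only on a subset of the training data can be illustrated with a short sketch. The data below is synthetic (two Gaussian clusters generated with an arbitrary seed), not the heart disease dataset, so the numbers say nothing about the lab's results.

```python
import numpy as np
from sklearn import svm

# Synthetic, well-separated two-class data (assumed for illustration).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-3.0, 1.0, size=(100, 2)),
    rng.normal(3.0, 1.0, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

clf = svm.SVC(kernel="linear").fit(X, y)

# Only the support vectors carry non-zero weight in the fitted model.
n_sv = int(clf.n_support_.sum())
print(f"{n_sv} support vectors out of {len(X)} training points")

# Rebuild the decision function from the support vectors alone.
manual = (clf.dual_coef_ @ clf.support_vectors_ @ X.T + clf.intercept_).ravel()
same = np.allclose(manual, clf.decision_function(X))
print("decision function reproduced from support vectors:", same)
```

All training points are still used while fitting, but once the model is trained, the remaining points could be discarded without changing any prediction.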

