<h2>Support Vector Machines</h2>

In [48]:
import warnings
warnings.filterwarnings('ignore')

<h4>1. Concept - Linear SVM</h4>

Consider a classification problem with two features $x_1$ and $x_2$, as illustrated in the image below.
<img src="Media/svm_classification.png" width="400px"/>

A support vector classifier aims to find a hyperplane (an $(n-1)$-dimensional surface in an $n$-dimensional feature space; a line when $n = 2$) that separates the datapoints into two distinct classes. This hyperplane is computed using <b>optimisation theory</b>, by searching for the hyperplane coefficients that <b>maximise the margin</b> between the hyperplane and its closest datapoints (i.e. the <b>support vectors</b>).

Decision function and hyperplane for $n$ features:
$$
  f(w,x) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b, \qquad \text{hyperplane: } f(w,x) = 0
$$

$$
\text{Maximize:} \quad \frac{2}{\|w\|}
\\
\text{Subject to:} \quad y_i(w \cdot x_i + b) \geq 1 \quad \text{for all} \ i
$$

<b>is equivalent to </b>

$$
\text{Minimize:} \quad \frac{1}{2} \|w\|^2
\\
\text{Subject to:} \quad y_i(w \cdot x_i + b) \geq 1 \quad \text{for all} \ i
$$
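
The equivalence follows from the geometry of the margin. A short derivation, assuming labels $y_i \in \{-1, +1\}$ (implied by the constraints, though not stated explicitly above): the margin is bounded by the two parallel hyperplanes $w \cdot x + b = 1$ and $w \cdot x + b = -1$, and the distance between them is

$$
\frac{|1 - (-1)|}{\|w\|} = \frac{2}{\|w\|}
$$

so that

$$
\arg\max_{w,b} \frac{2}{\|w\|} = \arg\min_{w,b} \|w\| = \arg\min_{w,b} \frac{1}{2}\|w\|^2
$$

Squaring does not change the minimiser because $\|w\| \geq 0$, and it turns the objective into a differentiable, convex quadratic.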

$$
SVMClassifier(x_i) =\begin{cases}
   \text{if } f(w,x_i) \geq 0, \text{assign class 1}\\
    \text{if } f(w,x_i) < 0, \text{assign class 0}\\
\end{cases}
$$
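
The formulation above can be checked numerically on a tiny example. The sketch below is illustrative only: the six datapoints are made up, and a very large `C` in scikit-learn's `SVC` is used as an approximation of the hard-margin problem (scikit-learn has no dedicated hard-margin solver). Labels are taken as $-1$/$+1$ for the constraints and mapped to class 0/1 for the decision rule.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data with two features (hypothetical values)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C leaves almost no tolerance for margin violations
model = SVC(kernel="linear", C=1e6).fit(X, y)

w = model.coef_[0]        # hyperplane coefficients w_1, w_2
b = model.intercept_[0]   # bias term b
f = X @ w + b             # decision function f(w, x_i) for every datapoint

margin_width = 2.0 / np.linalg.norm(w)   # the quantity being maximised
pred = np.where(f >= 0, 1, 0)            # decision rule: class 1 if f >= 0, else class 0

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("y_i * f(x_i) =", y * f)           # each value should be (approximately) >= 1
print("support vectors:\n", model.support_vectors_)
```

The printed products $y_i f(w, x_i)$ show which constraints are active: points where the product is close to 1 lie on the margin and are the support vectors.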

<h4>2. Linear SVM - Soft Margin</h4>

<img src="Media/soft_margin.png"/>

Perfectly separable datapoints are rare in real-world data. The soft margin accommodates this by introducing slack variables: $\xi_i$ measures how far datapoint $i$ falls on the wrong side of its margin ($\xi_i = 0$ when the point lies on or beyond the margin on its correct side).

$$
\text{Minimize:} \quad \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i
\\
\text{Subject to:} \quad y_i(w \cdot x_i + b) \geq 1 - \xi_i \quad \text{for all} \ i
\\
\text{and} \quad \xi_i \geq 0 \quad \text{for all} \ i
$$

<b>C is a hyperparameter called the regularization parameter</b>. It controls the trade-off between maximizing the margin and minimizing the classification error. Higher values of C correspond to a harder margin (less tolerance for misclassification), while lower values allow for a softer margin (more tolerance for misclassification).
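
The effect of C can be seen by fitting the same data with a very small and a very large value. This is a minimal sketch on synthetic, overlapping blobs (not the dataset used later in this notebook). The cluster centres, spread and the two C values are arbitrary choices for illustration, and the slack is computed as $\xi_i = \max(0, 1 - y_i f(x_i))$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic classes (illustrative values only)
X, y01 = make_blobs(n_samples=200, centers=[[-2, 0], [2, 0]],
                    cluster_std=2.0, random_state=0)
y = np.where(y01 == 1, 1, -1)   # labels in {-1, +1} to match the constraints

results = {}
for C in (0.0001, 100.0):
    model = SVC(kernel="linear", C=C).fit(X, y)
    f = model.decision_function(X)
    slack = np.maximum(0.0, 1.0 - y * f)   # xi_i for every datapoint
    results[C] = {
        "margin_width": 2.0 / np.linalg.norm(model.coef_[0]),
        "total_slack": float(slack.sum()),
        "n_support": int(model.n_support_.sum()),
    }
    print(f"C={C}: {results[C]}")
```

The small C should yield the wider margin, at the price of more total slack; the large C narrows the margin to reduce violations.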

A linear SVM works well when the datapoints are linearly separable, or nearly so. However, when the datapoints are not linearly separable,
<b>a linear SVM is not effective</b>.

<figure>
    <figcaption>Figure 2: Linearly separable vs. nonlinearly separable datapoints</figcaption>
    <img src="Media/linear_vs_nonlinear_svm_datapoints.png" alt="Figure 1">
    
</figure>


<h4>3. Concept - Nonlinear SVM: Kernel-based SVM</h4>

To overcome this limitation of linear SVM on datapoints that are not linearly separable, a kernel transformation can be applied
that maps the datapoints into a different space in which they become separable.


<img src="Media/kernel_transformation.png" width="500px">
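
The idea of moving to a space where the classes become separable can be sketched with an explicit feature map. The example below is an illustration under stated assumptions: the data are synthetic concentric circles from `make_circles`, and the map that appends the squared radius $x_1^2 + x_2^2$ as a third feature is a hand-picked, hypothetical choice. A kernel SVM does not need such an explicit map.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data: one class forms an inner circle, the other an outer ring
X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

# Linear SVM in the original 2-D space: no straight line separates the rings
raw_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# Hand-picked feature map: append the squared radius as a third feature
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])

# Linear SVM in the lifted 3-D space: a plane can now separate the classes
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print("training accuracy, original space:", raw_acc)
print("training accuracy, lifted space:  ", lifted_acc)
```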

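The <b>kernel</b> is the function that computes inner products between datapoints in the transformed space, so the transformation itself never has to be evaluated explicitly. Kernels can be of different types: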

<ul>
    <li>Radial Basis Kernel: $K(x,y) = \exp\left\{-\frac{\|x-y\|^2}{2\sigma^2}\right\}$</li>
    <li>Polynomial Kernel: $K(x,y) = (x^Ty + 1)^p$</li>
    <li>Hyperbolic Tangent Kernel: $K(x,y) = \tanh(kx^Ty+\delta)$</li>
    <li>etc.</li>
</ul>
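
The kernel formulas listed above can be written directly in NumPy and compared with scikit-learn's pairwise kernel functions. This is a sketch under a few assumptions: the two vectors and the parameter values are arbitrary, the second argument is named `z` to avoid confusion with the label vector `y`, and scikit-learn's `gamma` corresponds to $1/(2\sigma^2)$ for the RBF kernel and to $k$ for the hyperbolic tangent kernel (with `coef0` playing the role of $\delta$).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, sigmoid_kernel

def rbf(x, z, sigma=1.0):
    """Radial basis kernel: exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def poly(x, z, p=3):
    """Polynomial kernel: (x^T z + 1)^p."""
    return (x @ z + 1.0) ** p

def tanh_kernel(x, z, k=0.5, delta=0.0):
    """Hyperbolic tangent kernel: tanh(k x^T z + delta)."""
    return np.tanh(k * (x @ z) + delta)

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
sigma = 1.5

# scikit-learn expects 2-D arrays of shape (n_samples, n_features)
X2, Z2 = x[None, :], z[None, :]
sk_rbf = rbf_kernel(X2, Z2, gamma=1.0 / (2.0 * sigma ** 2))[0, 0]
sk_poly = polynomial_kernel(X2, Z2, degree=3, gamma=1.0, coef0=1.0)[0, 0]
sk_tanh = sigmoid_kernel(X2, Z2, gamma=0.5, coef0=0.0)[0, 0]

print("RBF: ", rbf(x, z, sigma), sk_rbf)
print("Poly:", poly(x, z, p=3), sk_poly)
print("Tanh:", tanh_kernel(x, z, k=0.5, delta=0.0), sk_tanh)
```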

<h4>Example: Linear and Nonlinear SVM</h4>

In [49]:
import pandas as pd

df = pd.read_csv('../../datasets/Healthcare-Diabetes.csv')
df.head()

Unnamed: 0,Id,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,1,6,148,72,35,0,33.6,0.627,50,1
1,2,1,85,66,29,0,26.6,0.351,31,0
2,3,8,183,64,0,0,23.3,0.672,32,1
3,4,1,89,66,23,94,28.1,0.167,21,0
4,5,0,137,40,35,168,43.1,2.288,33,1


In [50]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

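X = df.iloc[:, 1:9]  # feature columns: Pregnancies ... Age (the Id column is skipped)
y = df.iloc[:, -1]   # target column (Outcome) as a 1-D Series, the shape SVC expects

# --- Data scaling: standardise each feature to zero mean and unit variance
Sc = StandardScaler()
X_d = Sc.fit_transform(X)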

In [51]:
#--Train - Test Split
X_train, X_test, y_train, y_test = train_test_split(X_d, y, test_size=0.2, random_state=1234)
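
In the cells above, the scaler is fitted on all rows before the split, so the test rows contribute to the scaling statistics. A common alternative is to split first and wrap the scaler and the classifier in a pipeline, so that scaling is learned from the training rows only. The sketch below uses synthetic data from `make_classification` as a stand-in for the diabetes dataset; the names `X_demo`, `y_demo` and `pipe` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data with 8 features (not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Split first, so the test rows stay unseen during scaling
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1234)

# The pipeline fits the scaler on the training rows only,
# then applies the same transformation when predicting or scoring
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
pipe.fit(X_tr, y_tr)
test_acc = pipe.score(X_te, y_te)

scaler = pipe.named_steps["standardscaler"]
print("scaler means come from the training rows:",
      np.allclose(scaler.mean_, X_tr.mean(axis=0)))
print("test accuracy:", test_acc)
```

A pipeline of this kind can also be passed to cross-validation utilities, in which case the scaler is refitted inside every fold.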

<h4>Linear SVM</h4>

In [52]:
from sklearn.svm import SVC
from sklearn import metrics

linear_svc = SVC(kernel='linear')
linear_svc.fit(X_train,y_train)

y_pred_test = linear_svc.predict(X_test)
class_acc_linear_svm = metrics.accuracy_score(y_test,y_pred_test)#get classification accuracy

In [53]:
from sklearn.metrics import classification_report
targets = ['no-diabetes','has-diabetes']

print("Linear SVM - Test Performance: \n",classification_report(y_test,y_pred_test,target_names=targets))

Linear SVM - Test Performance: 
               precision    recall  f1-score   support

 no-diabetes       0.77      0.93      0.84       349
has-diabetes       0.82      0.53      0.64       205

    accuracy                           0.78       554
   macro avg       0.80      0.73      0.74       554
weighted avg       0.79      0.78      0.77       554
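
The columns of the report follow from the confusion-matrix counts. As a minimal sketch on made-up labels (the two vectors below are hypothetical and unrelated to the diabetes data), precision, recall and F1 for the positive class can be computed by hand and compared with scikit-learn's metric functions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical true labels and predictions (1 = positive class)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_hat = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 0])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print("tn, fp, fn, tp =", tn, fp, fn, tp)
print("precision =", precision, "recall =", recall, "f1 =", f1)
```

Applying the same F1 formula to the has-diabetes row above gives $2 \times 0.82 \times 0.53 / (0.82 + 0.53) \approx 0.64$, which matches the reported f1-score.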



<h4>Nonlinear SVM</h4>

In [54]:
rbf_svc = SVC(kernel='rbf')
rbf_svc.fit(X_train,y_train)

y_pred_test_rbfsvm = rbf_svc.predict(X_test)
class_acc_rbf_rbfsvm = metrics.accuracy_score(y_test,y_pred_test_rbfsvm)#get classification accuracy

print("RBF SVM - Test Performance: \n",classification_report(y_test,y_pred_test_rbfsvm,target_names=targets))

RBF SVM - Test Performance: 
               precision    recall  f1-score   support

 no-diabetes       0.82      0.95      0.88       349
has-diabetes       0.88      0.63      0.74       205

    accuracy                           0.83       554
   macro avg       0.85      0.79      0.81       554
weighted avg       0.84      0.83      0.82       554



<h4>Hyperparameter Tuning - SVM</h4>

In [55]:
parameters = {'kernel':('linear', 'rbf'),
              'C':(0.25,0.75,1.0),
              'gamma': (0.5,1,2)}

In [56]:
#--GridSearch: Exhaustive Search with Crossvalidation 

In [57]:
from sklearn.model_selection import GridSearchCV

svm = SVC()
clf = GridSearchCV(svm, parameters,cv=5)
clf.fit(X_train,y_train)
clf.best_params_

{'C': 1.0, 'gamma': 2, 'kernel': 'rbf'}
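
With a single dictionary, the grid pairs every `gamma` value with the linear kernel as well, although `gamma` has no effect there, so the 18 candidates include repeated linear fits. One possible alternative is to pass a list of dictionaries, one per kernel. The sketch below assumes synthetic stand-in data from `make_classification`; `X_demo`, `y_demo`, `param_grid` and `search` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data with 8 features (not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# One sub-grid per kernel: gamma is searched only where it is used
param_grid = [
    {"kernel": ["linear"], "C": [0.25, 0.75, 1.0]},
    {"kernel": ["rbf"], "C": [0.25, 0.75, 1.0], "gamma": [0.5, 1, 2]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_demo, y_demo)

n_candidates = len(search.cv_results_["params"])   # 3 linear + 9 rbf settings
best_model = search.best_estimator_                # already refitted on all of X_demo

print("candidates evaluated:", n_candidates)
print("best parameters:", search.best_params_)
```

Because `refit=True` is the default, `best_estimator_` is a model already trained with the selected parameters and can be used for prediction directly.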

In [58]:
# Refit an SVC with the hyperparameters selected by the grid search
rbf_svc_hyper = SVC(**clf.best_params_)
rbf_svc_hyper.fit(X_train, y_train)

# Predict with the tuned model (not the earlier, untuned rbf_svc)
y_pred_test_hyper = rbf_svc_hyper.predict(X_test)
class_acc_rbf_svm = metrics.accuracy_score(y_test, y_pred_test_hyper)  # classification accuracy

print("Hyper-RBF SVM - Test Performance: \n", classification_report(y_test, y_pred_test_hyper, target_names=targets))

