# **Support Vector Machine Classifier Model Theory**


## Theory
Support Vector Machine (SVM) is a discriminative classifier that finds the hyperplane separating the classes with the largest margin in a (possibly high-dimensional) feature space. A wide margin tends to improve generalization, the soft-margin penalty C limits the influence of individual outliers, and kernels extend the method to non-linear classification.
The main idea is to:
- Map data into high-dimensional feature space
- Find optimal separating hyperplane
- Maximize margin between classes
- Use kernel trick for non-linear classification

## Mathematical Formulation
- **Linear SVM Objective**:
$$ \min_{w,b} \frac{1}{2}||w||^2 + C\sum_{i=1}^{n} \xi_i $$
subject to:
$$ y_i(w^Tx_i + b) \geq 1 - \xi_i $$
$$ \xi_i \geq 0 $$

- **Dual Form**:
$$ \max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i\alpha_jy_iy_jK(x_i,x_j) $$
subject to:
$$ 0 \leq \alpha_i \leq C $$
$$ \sum_{i=1}^{n} \alpha_iy_i = 0 $$
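
As a rough illustration of the primal objective above, the following sketch evaluates it for a fixed `w` and `b`. It relies on the fact that, for fixed `w` and `b`, the smallest feasible slack is `max(0, 1 - y_i(w^T x_i + b))`. The toy data, the helper name `soft_margin_objective`, and the chosen `w`, `b`, and `C` are assumptions made for this example only.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C=1.0):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi) for fixed w and b."""
    # For fixed w and b, the smallest slack satisfying both constraints is
    # xi_i = max(0, 1 - y_i * (w^T x_i + b)).
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * slack.sum(), slack

# Hypothetical toy data with labels in {-1, +1}
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 0.0])
b = 0.0

objective, slack = soft_margin_objective(w, b, X, y, C=1.0)
print(slack)      # per-sample slack values
print(objective)  # regularization term plus C times the total slack
```

Only the second and fourth points need slack here: the second lies inside the margin and the fourth is on the wrong side of the hyperplane.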

## Kernel Functions
- **Linear**: 
$$ K(x_i,x_j) = x_i^Tx_j $$
- **Polynomial**:
$$ K(x_i,x_j) = (\gamma x_i^Tx_j + r)^d $$
- **RBF (Gaussian)**:
$$ K(x_i,x_j) = \exp(-\gamma||x_i-x_j||^2) $$
- **Sigmoid**:
$$ K(x_i,x_j) = \tanh(\gamma x_i^Tx_j + r) $$
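
The kernels listed above can be sketched in plain NumPy as follows. The function names and toy inputs are assumptions for this example. The `gamma` factor in the polynomial kernel follows the scikit-learn convention; with `gamma=1.0` it reduces to the unscaled inner product.

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def polynomial_kernel(A, B, degree=3, gamma=1.0, coef0=0.0):
    return (gamma * (A @ B.T) + coef0) ** degree

def rbf_kernel(A, B, gamma=1.0):
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def sigmoid_kernel(A, B, gamma=1.0, coef0=0.0):
    return np.tanh(gamma * (A @ B.T) + coef0)

# Hypothetical toy points
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

K_lin = linear_kernel(X, X)
K_poly = polynomial_kernel(X, X, degree=2, gamma=1.0, coef0=1.0)
K_rbf = rbf_kernel(X, X, gamma=0.5)
K_sig = sigmoid_kernel(X, X, gamma=1.0, coef0=0.0)
print(K_rbf)
```

Each call returns a Gram matrix whose entry `(i, j)` is the kernel value between row `i` of the first argument and row `j` of the second.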

## Classification Process
1. **Training**:
   - Choose an appropriate kernel
   - Optimize the dual form to find the support vectors
   - Calculate the decision boundary

2. **Prediction**:
$$ f(x) = \text{sign}(\sum_{i=1}^{n} \alpha_iy_iK(x_i,x) + b) $$

## Key Hyperparameters
- **C**: Regularization parameter
- **kernel**: Type of kernel function
- **gamma**: Kernel coefficient (RBF, Polynomial, Sigmoid)
- **degree**: Degree of polynomial kernel
- **coef0**: Independent term in kernel function

## Decision Function
- **Linear**:
$$ f(x) = w^Tx + b $$
- **Non-linear**:
$$ f(x) = \sum_{i \in SV} \alpha_iy_iK(x_i,x) + b $$
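
To make the non-linear decision function concrete, the sketch below rebuilds it by hand from a fitted `SVC` and compares the result with `decision_function`. The synthetic dataset and the value of `gamma` are assumptions chosen for illustration. In scikit-learn, `dual_coef_` already stores the product of the dual coefficient and the label for each support vector.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical synthetic binary problem
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

gamma = 0.1
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# RBF kernel between every support vector and every query point
sv = clf.support_vectors_
sq_dists = ((sv[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-gamma * sq_dists)  # shape (n_SV, n_samples)

# f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b, summed over support vectors
manual = (clf.dual_coef_ @ K).ravel() + clf.intercept_[0]

print(np.allclose(manual, clf.decision_function(X), atol=1e-6))
```

The sign of `manual` selects between `clf.classes_[0]` (negative) and `clf.classes_[1]` (positive).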

## Advantages
- Effective in high-dimensional spaces
- Memory efficient (uses support vectors)
- Versatile through different kernel functions
- Strong theoretical guarantees
- Soft margin limits the influence of individual outliers (controlled by C)

## Disadvantages
- Sensitive to choice of kernel and parameters
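- Computationally intensive for large datasets: kernel SVM training time grows at least quadratically with the number of samples
- Kernel evaluations add further cost for non-linear problems on large datasets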
- No probabilistic predictions directly
- Requires feature scaling

## Implementation Tips
- Scale features before training
- Use cross-validation for parameter tuning
- Consider linear kernel for high-dimensional data
- Use RBF kernel for non-linear problems
- Balance C parameter for regularization
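
A minimal sketch of the scaling tip, assuming scikit-learn's bundled breast cancer dataset as a stand-in (it is not the heart data used later in this document). Placing the scaler inside a pipeline means it is fitted on the training split only, so no test-set statistics leak into training.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption for this example)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The scaler is fitted on the training data only, then applied to the test data
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```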


## Model Evaluation for Support Vector Machine Classifier

### 1. Accuracy Score
Formula:
$$
\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}
$$
Description:
- Measures the overall correctness of SVM predictions.
- Most commonly used metric for balanced datasets.
Interpretation:
- Higher values indicate better classification performance.
- Should be used alongside other metrics for imbalanced data.
---

### 2. Margin Distance
Formula:
$$
\text{Margin} = \frac{2}{\|\mathbf{w}\|}
$$
Description:
- Width of the band between the two margin boundaries; each boundary (and the nearest support vectors) lies at half this distance, 1/||w||, from the hyperplane.
- Key concept in SVM optimization.
Interpretation:
- Larger margin generally indicates better generalization.
- Trade-off between margin size and classification errors.
---
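
A small numeric sketch of the margin formula, using a hypothetical weight vector and bias chosen so the arithmetic is easy to follow. The helper `distance_to_hyperplane` is an assumption for this example.

```python
import numpy as np

w = np.array([3.0, 4.0])  # hypothetical weight vector, ||w|| = 5
b = -1.0                  # hypothetical bias

norm_w = np.linalg.norm(w)
margin_width = 2.0 / norm_w  # distance between the two margin boundaries
half_margin = 1.0 / norm_w   # distance from the hyperplane to one boundary

def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w^T x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

# A point that satisfies w^T x + b = 1, so it sits on the positive margin boundary
x_on_margin = np.array([0.24, 0.32])

print(margin_width, half_margin)
print(distance_to_hyperplane(x_on_margin, w, b))
```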

### 3. Support Vector Count
Description:
- Number of training points with non-zero dual coefficients: points on the margin boundaries, inside the margin, or misclassified.
- Subset of training data that defines the hyperplane.
Interpretation:
- Fewer support vectors may indicate better generalization.
- High number might suggest overfitting or complex decision boundary.
---
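
The sketch below counts support vectors on a fitted `SVC` and checks where they sit relative to the margin. The overlapping blob data is an assumption for illustration. Up to solver tolerance, support vectors are expected to have a functional margin `y_i * f(x_i)` of at most about 1.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical two-class data with wide clusters
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

n_sv_per_class = clf.n_support_  # support vectors per class
n_sv_total = len(clf.support_)   # total number of support vectors

# Functional margin y_i * f(x_i), with labels mapped to {-1, +1}
y_signed = np.where(y == clf.classes_[1], 1, -1)
functional_margin = y_signed * clf.decision_function(X)
sv_margins = functional_margin[clf.support_]

print(n_sv_per_class, n_sv_total)
print(sv_margins.max())  # expected to be close to 1 or below
```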

### 4. Kernel Function Metrics
Formula:
$$
K(x_i, x_j) = \phi(x_i)^T\phi(x_j)
$$
Description:
- Measures similarity between data points in transformed space.
- Common kernels: Linear, RBF (Gaussian), Polynomial.
Interpretation:
- Higher values indicate more similar points.
- Helps understand data separation in feature space.
---

### 5. Decision Function Values
Formula:
$$
f(x) = \mathbf{w}^T\phi(x) + b
$$
Description:
- Raw scores from the SVM decision function.
- Proportional to the signed distance from a test point to the decision boundary (divide by ||w|| to obtain the geometric distance).
Interpretation:
- Magnitude indicates prediction confidence.
- Useful for probability calibration.
---

### 6. Dual Coefficients
Formula:
$$
0 \leq \alpha_i \leq C, \quad \sum_{i=1}^n \alpha_i y_i = 0
$$
Description:
- Lagrange multipliers from SVM optimization.
- Non-zero values identify support vectors.
Interpretation:
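- Magnitude indicates how strongly a support vector influences the decision boundary; values at the upper bound C typically mark points inside the margin or misclassified.
- Dual coefficients weight training samples, not features; per-feature weights (`coef_`) are available only for the linear kernel.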
---

### 7. Cross-Validation Performance
Description:
- K-fold validation results with different kernel parameters.
- Includes bias-variance trade-off analysis.
Interpretation:
- Stable performance across folds indicates good generalization.
- Helps in kernel and hyperparameter selection.
---
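
One possible way to run such a search is sketched below with `GridSearchCV`. The dataset (scikit-learn's bundled breast cancer data) and the parameter grid are assumptions for this example, not recommended values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption for this example)
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Hypothetical grid; gamma is ignored by the linear kernel
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": ["scale", 0.01],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv)
search.fit(X, y)

best = search.best_params_
# Spread of the fold scores for the best configuration
fold_std = search.cv_results_["std_test_score"][search.best_index_]
print(best, search.best_score_, fold_std)
```

A small `fold_std` relative to the mean score is one sign that performance is stable across folds.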

### 8. ROC-AUC Score
Formula:
$$
\text{AUC} = \int_{0}^{1} \text{TPR}\, d(\text{FPR})
$$
Description:
- Area under the ROC curve for binary classification.
- Measures discrimination ability across thresholds.
Interpretation:
- 1.0 represents perfect classification.
- 0.5 indicates random prediction.
---
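
AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way in NumPy. The helper `roc_auc` and the toy scores are assumptions for this example; with an SVM the scores would typically come from `decision_function`.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score; ties count as 1/2
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

# Hypothetical labels and scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```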

### 9. Hinge Loss
Formula:
$$
L(y, f(x)) = \max(0, 1 - yf(x))
$$
Description:
- SVM-specific loss function.
- Penalizes misclassifications and small margins.
Interpretation:
- Lower values indicate better model fit.
- Zero loss means perfect classification with sufficient margin.
---
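
A short NumPy sketch of the hinge loss, averaged over samples. The labels are assumed to be in {-1, +1} and the scores are hypothetical raw decision values.

```python
import numpy as np

def hinge_loss(y, f):
    """Mean hinge loss for labels y in {-1, +1} and raw scores f."""
    return np.mean(np.maximum(0.0, 1.0 - y * f))

# Hypothetical labels and decision-function values
y = np.array([1, -1, 1])
f = np.array([2.0, -0.5, -0.3])

per_sample = np.maximum(0.0, 1.0 - y * f)
loss = hinge_loss(y, f)
print(per_sample)  # 0 for the confident hit, 0.5 inside the margin, 1.3 for the miss
print(loss)
```

The second sample is classified correctly but still incurs a loss because its score lies inside the margin.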

### 10. Kernel Alignment
Formula:
$$
\text{Alignment}(K_1, K_2) = \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F \langle K_2, K_2 \rangle_F}}
$$
Description:
- Measures similarity between two kernel matrices using the Frobenius inner product.
- Used for kernel selection and combination.
Interpretation:
- Higher alignment with the ideal target kernel yy^T (built from the labels) suggests a better kernel choice.
- Helps in optimizing kernel parameters.
---
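
The alignment formula can be sketched in NumPy as below, comparing an RBF kernel at two `gamma` values against the target kernel built from the labels. The toy data, the helper names, and the `gamma` values are assumptions for this example.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Frobenius inner product of K1 and K2, normalized by their Frobenius norms."""
    inner = np.sum(K1 * K2)
    return inner / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def rbf(X, gamma):
    # X has shape (n, 1), so X - X.T broadcasts to all pairwise differences
    return np.exp(-gamma * (X - X.T) ** 2)

# Hypothetical 1-D data: two tight clusters with opposite labels
X = np.array([[0.0], [0.2], [2.0], [2.2]])
y = np.array([-1, -1, 1, 1])
K_target = np.outer(y, y)

a_local = kernel_alignment(rbf(X, 1.0), K_target)   # kernel resolves the clusters
a_flat = kernel_alignment(rbf(X, 1e-4), K_target)   # kernel is almost constant
print(a_local, a_flat)
```

The nearly constant kernel carries almost no label information, so its alignment with the target is close to zero.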

## sklearn template [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

### class `sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)`

| **Parameter**               | **Description**                                                                                                                                     | **Default**      |
|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|------------------|
| `C`                        | Regularization parameter. Strength of regularization is inversely proportional to C                                                                  | `1.0`            |
| `kernel`                   | Kernel type ('linear', 'poly', 'rbf', 'sigmoid', 'precomputed')                                                                                     | `rbf`            |
| `degree`                   | Degree of polynomial kernel function ('poly')                                                                                                        | `3`              |
| `gamma`                    | Kernel coefficient for 'rbf', 'poly', 'sigmoid'. ('scale', 'auto' or float)                                                                         | `scale`          |
| `coef0`                    | Independent term in kernel function for 'poly' and 'sigmoid'                                                                                         | `0.0`            |
| `probability`              | Whether to enable probability estimates                                                                                                              | `False`          |
| `class_weight`             | Weights for classes (dict or 'balanced')                                                                                                            | `None`           |
| `random_state`             | Controls random seed for probability estimates                                                                                                      | `None`           |
| `tol`                      | Tolerance for stopping criterion                                                                                                                     | `0.001`          |
| `max_iter`                 | Hard limit on iterations (-1 means no limit)                                                                                                         | `-1`             |


| **Attribute**              | **Description**                                                                                                                                     |
|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| `support_`                 | Indices of support vectors                                                                                                                          |
| `support_vectors_`         | Support vectors                                                                                                                                     |
| `n_support_`               | Number of support vectors for each class                                                                                                            |
| `dual_coef_`              | Coefficients of the support vectors in the decision function                                                                                        |
| `intercept_`              | Constants in decision function                                                                                                                      |
| `classes_`                 | Class labels                                                                                                                                        |
| `n_features_in_`           | Number of features seen during fit                                                                                                                  |


| **Method**                 | **Description**                                                                                                                                     |
|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| `fit(X, y)`                | Fit the SVM model according to the given training data                                                                                              |
| `predict(X)`               | Perform classification on samples in X                                                                                                               |
| `predict_proba(X)`         | Predict probabilities for X (only if probability=True)                                                                                              |
| `decision_function(X)`     | Evaluate the decision function for samples in X (signed score, proportional to the distance to the separating hyperplane)                           |
| `score(X, y)`              | Return the mean accuracy on the given test data and labels                                                                                          |
| `get_params()`             | Get parameters for this estimator                                                                                                                   |

# Support Vector Machine Classifier - Example

## Data loading

```python
# Import Python packages
import numpy as np
import pandas as pd

# Import the heart data
data = pd.read_csv("/home/petar-ubuntu/Learning/ML_Theory/ML_Models/Supervised Learning/Classification models/SupportVectorMaschineClassifier/data/heart.csv")

# Display the first 5 rows of the heart data
data.head()
```

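| Unnamed: 0 | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|------------|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0          | 63  | 1   | 3  | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0  | 1    | 1      |
| 1          | 37  | 1   | 2  | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0  | 2    | 1      |
| 2          | 41  | 0   | 1  | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0  | 2    | 1      |
| 3          | 56  | 1   | 1  | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0  | 2    | 1      |
| 4          | 57  | 0   | 0  | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0  | 2    | 1      |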


##  Data processing

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Separate the feature matrix and the target vector
x = data.drop('target', axis=1)
y = data.target

# Split the dataset into a training set (70%) and a test set (30%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=109)
```

## Plotting data

## Model definition

```python
from sklearn import svm

# Create an SVM classifier with a linear kernel
ml = svm.SVC(kernel='linear')

# Train the model on the training set
ml.fit(x_train, y_train)

# Predict the labels of the test set
y_pred = ml.predict(x_test)
```


## Model evaluation

```python
from sklearn.metrics import confusion_matrix

# Model accuracy: fraction of test samples classified correctly
ml.score(x_test, y_test)
```

0.9010989010989011

```python
confusion_matrix(y_test, y_pred)
```

array([[35,  5],
       [ 4, 47]])
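
As a quick consistency check, the accuracy reported by `ml.score` can be recovered from the confusion matrix above. The sketch assumes scikit-learn's layout for binary labels 0 and 1, where rows are true classes and columns are predicted classes, so the flattened matrix reads TN, FP, FN, TP.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[35, 5],
               [4, 47]])

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
print(accuracy)  # 82 correct out of 91 test samples, about 0.9011
```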