# 2.1.2.6. SUPPORT VECTOR MACHINE (SVM)

## DEFINITION
**SVM (Support Vector Machine)** is a supervised machine learning algorithm widely used for classification problems. It is **based on the idea of finding the optimal hyperplane that separates the data points into different classes with the maximum margin**.

## STEPS
1. Given a set of labeled data points that belong to two classes, find the hyperplane that maximizes the margin between the classes. The margin is defined as the distance between the hyperplane and the closest data points from each class, which are called support vectors. The hyperplane can be represented by a linear equation of the form w^T x + b = 0, where w is the normal vector and b is the bias term.
2. To find the optimal hyperplane, we solve an optimization problem that minimizes the norm of w (equivalently, (1/2)||w||^2) subject to the constraints y_i (w^T x_i + b) >= 1 for every training point (x_i, y_i) with label y_i in {-1, +1}. These constraints ensure that every point is correctly classified and lies outside the margin. Using Lagrange multipliers, the problem is transformed into a dual form that involves the data points only through their dot products.
3. To make predictions for a new data point x, we need to evaluate the sign of w^T x + b. If it is positive, then x belongs to the positive class; if it is negative, then x belongs to the negative class; if it is zero, then x lies on the hyperplane.
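
The three steps above can be sketched with scikit-learn on a tiny, made-up 2-D data set. The names `X_toy`, `y_toy`, and `clf` are illustrative assumptions, and the very large `C` is only an approximation of the hard-margin problem. Under the usual scaling in which support vectors satisfy |w^T x + b| = 1, the distance from the hyperplane to the closest points is 1/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (values invented for illustration)
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin formulation
clf = SVC(kernel='linear', C=1e6)
clf.fit(X_toy, y_toy)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # bias term

# Step 3: classify by the sign of w^T x + b
scores = X_toy @ w + b
manual_pred = np.where(scores > 0, 1, -1)

# Distance from the hyperplane to the closest points: 1 / ||w||
margin = 1.0 / np.linalg.norm(w)

print('w =', w, 'b =', b)
print('support vectors:')
print(clf.support_vectors_)
print('margin:', margin)
print('matches clf.predict:', np.array_equal(manual_pred, clf.predict(X_toy)))
```

Only the points returned in `clf.support_vectors_` determine the hyperplane; moving any other point without crossing the margin should leave `w` and `b` unchanged.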

## ADVANTAGES
- It can achieve high accuracy and good generalization. Because the decision function depends only on the support vectors, a model with few support vectors is cheap to store and fast to evaluate at prediction time.
- It can handle both linear and nonlinear classification problems by using different kernels. A kernel is a function that computes the dot product between two data points in an implicit, usually higher-dimensional, feature space where the classes may become linearly separable, without constructing that mapping explicitly (the kernel trick).
- It can deal with imbalanced data sets by using different penalty parameters for different classes, which allows us to control the trade-off between precision and recall.
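
As a rough illustration of the last point, the sketch below uses the `class_weight` parameter of scikit-learn's `SVC` on a synthetic imbalanced data set. The data set, the variable names, and the choice of `class_weight='balanced'` are assumptions made for this example; how much recall changes depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data with roughly 90% of samples in class 0 (illustrative only)
X_imb, y_imb = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_imb, y_imb, test_size=0.3, stratify=y_imb, random_state=0)

# Same penalty for both classes
plain = SVC(kernel='linear', C=1).fit(X_tr, y_tr)

# Penalty scaled inversely to class frequency, so minority errors cost more
weighted = SVC(kernel='linear', C=1, class_weight='balanced').fit(X_tr, y_tr)

for name, model in (('plain', plain), ('weighted', weighted)):
    pred = model.predict(X_te)
    print(name,
          'precision:', precision_score(y_te, pred, zero_division=0),
          'recall:', recall_score(y_te, pred))

# Per-class multipliers of C that were actually applied
print('class weights:', weighted.class_weight_)
```

Raising the penalty on the minority class typically trades some precision for higher recall on that class, which is the trade-off mentioned above.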

## DISADVANTAGES
- It can be sensitive to noise and outliers, which can affect the position of the hyperplane and reduce the margin.
- It can be difficult to choose the best kernel and its parameters, which can have a significant impact on the performance and complexity of the model.
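- It does not natively provide probability estimates for its predictions, which some applications need. Probabilities can be obtained afterwards by calibrating the decision scores (for example, with Platt scaling), at extra computational cost.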
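
Regarding the last point, probability estimates can be added on top of an SVM by calibrating its decision scores. The sketch below wraps `SVC` in scikit-learn's `CalibratedClassifierCV` with sigmoid (Platt) calibration. The variable names and the choice of 5 folds are assumptions for illustration, not a recommendation.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Fit the SVM inside 5-fold cross-validation and learn a sigmoid mapping
# from decision scores to probabilities (Platt scaling)
calibrated = CalibratedClassifierCV(
    SVC(kernel='linear', C=1), method='sigmoid', cv=5)
calibrated.fit(X_iris, y_iris)

# One row per sample, one column per class
proba = calibrated.predict_proba(X_iris[:3])
print(proba)
```

The calibrated probabilities are estimates fitted after the fact, so they cost extra training time and may not agree exactly with the class chosen by the raw decision function.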

## KERNEL CHOICE
Choosing the best kernel and its parameters is not trivial and depends on several factors, such as:

- **The characteristics and distribution of the data set**. A linear kernel may work well for simple or linearly separable data sets, while a nonlinear kernel may be needed for complex or nonlinearly separable data sets.
- **The trade-off between bias and variance**. A more complex kernel may lead to low bias but high variance, meaning that it can capture the nonlinear patterns well but may overfit the noise. A simpler kernel may lead to high bias but low variance, meaning that it can avoid overfitting but may miss some important details.
- **The computational cost and efficiency**. A more complex kernel may require more computation time and memory space than a simpler kernel.
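
To make the first factor concrete, the sketch below compares a linear and an RBF kernel on a synthetic data set of two concentric circles, which no straight line can separate. The data set and its parameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: a nonlinearly separable toy problem
X_c, y_c = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_ctr, X_cte, y_ctr, y_cte = train_test_split(
    X_c, y_c, test_size=0.3, random_state=0)

kernel_scores = {}
for kernel in ('linear', 'rbf'):
    model = SVC(kernel=kernel, C=1, gamma='scale')
    model.fit(X_ctr, y_ctr)
    # Mean accuracy on the held-out part
    kernel_scores[kernel] = model.score(X_cte, y_cte)

print(kernel_scores)
```

On data like this the RBF kernel should score far better than the linear one; on data that is already close to linearly separable, the simpler linear kernel is usually the safer first choice.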

## OPTIMALITY
One way to find a good kernel and its parameters is to use cross-validation: split the data into training and validation sets (typically several times, as in k-fold cross-validation), fit each candidate kernel and parameter setting on the training portion, and evaluate it on the validation portion. The kernel and parameters with the lowest average validation error are chosen.
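
A minimal sketch of this procedure with scikit-learn's `GridSearchCV` follows. The candidate grid, the 5 folds, and the variable names are assumptions for illustration; a real search would tailor the grid to the data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)

# Keep a final hold-out set that the search never sees
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42)

# Candidate kernels and parameters (an illustrative grid)
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 0.1, 1]},
]

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
search.fit(X_dev, y_dev)

print('best parameters:', search.best_params_)
print('mean cross-validated accuracy:', search.best_score_)
print('hold-out accuracy:', search.score(X_hold, y_hold))
```

Reporting the hold-out accuracy separately matters because the cross-validated score of the winning configuration is optimistically biased by the selection itself.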

## CONCLUSION
In conclusion, SVM is a powerful and popular machine learning algorithm for classification problems that relies on finding the optimal hyperplane that separates the data points into different classes with the maximum margin. It has some advantages such as being accurate and flexible to different types of data, but also some disadvantages such as being sensitive to noise and outliers and requiring a good choice of kernel and parameters. Choosing the best kernel and parameters is crucial for achieving good results with SVM and can be done using cross-validation or other methods.

## HANDS-ON: SVM Classification

### 1. IMPORTS

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
```

### 2. DATASET

```python
iris = load_iris()
data = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
data['species'] = iris['target']
```

### 3. PREPROCESSING

```python
X = data.drop('species', axis=1)
y = data['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

### 4. SVM CLASSIFIER WITH LINEAR KERNEL

```python
svm = SVC(kernel='linear', C=1)
svm.fit(X_train, y_train)
```

### 5. PREDICTION AND EVALUATION

```python
# Make predictions on the testing data
y_pred = svm.predict(X_test)

# Measure the performance of the model
print('Accuracy:', accuracy_score(y_test, y_pred))
```

```
Accuracy: 1.0
```
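
A single accuracy figure on a small test set says little about where errors would occur. As an optional follow-up, the self-contained sketch below repeats the same split and model, then prints the confusion matrix and the number of support vectors per class. The variable names are assumptions, and the features are passed as arrays rather than a DataFrame.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_i, y_i = load_iris(return_X_y=True)
X_i_train, X_i_test, y_i_train, y_i_test = train_test_split(
    X_i, y_i, test_size=0.3, random_state=42)

lin = SVC(kernel='linear', C=1).fit(X_i_train, y_i_train)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_i_test, lin.predict(X_i_test))
print(cm)

# How many training points ended up as support vectors, per class
print('support vectors per class:', lin.n_support_)
```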

