# Support Vector Machine (SVM) Classification Algorithm

## Introduction

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression tasks, although it is primarily used for classification. SVM aims to find a hyperplane in an N-dimensional space (where N is the number of features) that separates the data points of different classes.

SVM is effective in high-dimensional spaces, including cases where the number of features is greater than the number of samples. When the data is not linearly separable, it can still perform well by implicitly mapping the original features into a higher-dimensional space (the kernel trick).

### Why is SVM Used?

- **Effective in High Dimensions**: SVM performs well in high-dimensional spaces, making it suitable for datasets with many features.
- **Works Well with Non-Linear Data**: By using the kernel trick, SVM can handle non-linear classification problems.
- **Robust**: SVM is particularly effective in cases where there is a clear margin of separation between the classes.

## How Does SVM Work?

1. **Hyperplane**: In SVM, the idea is to find a hyperplane that best divides the dataset into classes. For 2D data, this hyperplane is simply a line; for 3D data, it is a plane; and in higher dimensions, it is a hyperplane.

2. **Margin**: The margin is the distance between the hyperplane and the nearest data point from either class. SVM tries to maximize this margin to increase the classifier's robustness.

3. **Support Vectors**: The data points that are closest to the hyperplane are called support vectors. These support vectors define the position of the hyperplane and are critical to the classifier's performance.

4. **Kernel Trick**: When the data is not linearly separable, SVM uses a technique called the kernel trick. A kernel function computes inner products as if the data points had been mapped into a higher-dimensional space, without computing that mapping explicitly. In that space the classes may become linearly separable, allowing the SVM to find a hyperplane for classification.
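
The role of the support vectors can be checked with a small sketch. It assumes scikit-learn is installed; the toy points, the variable names (`X_toy`, `y_toy`, `clf`, `clf_sv`), and the choice of `C=1e3` to approximate a hard margin are illustrative assumptions, not part of the original notebook. If the support vectors really define the hyperplane, refitting on them alone should give (numerically) the same weights.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2D data (made up for illustration): two linearly separable clusters.
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [0.5, 0.5],
                  [4.0, 4.0], [4.5, 5.0], [5.0, 4.0], [5.5, 5.5]])
y_toy = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# A large C approximates a hard margin on separable data (assumed setting).
clf = SVC(kernel="linear", C=1e3).fit(X_toy, y_toy)
print("Support vector indices:", clf.support_)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])

# Refit using only the support vectors: the other points should not matter.
sv = clf.support_
clf_sv = SVC(kernel="linear", C=1e3).fit(X_toy[sv], y_toy[sv])
print("w (support vectors only) =", clf_sv.coef_[0], " b =", clf_sv.intercept_[0])
```

Only a subset of the training points ends up in `clf.support_`; the remaining points could be removed without moving the boundary.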

### Types of SVM

- **Linear SVM**: Used when the data is linearly separable.
- **Non-Linear SVM**: When the data is not linearly separable, SVM uses different kernels such as polynomial, Gaussian Radial Basis Function (RBF), and sigmoid to map the data into a higher-dimensional space.
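
As a rough illustration of the difference between the two types, the sketch below fits a linear and an RBF SVM on synthetic data where one class forms a ring around the other. The dataset (`make_circles` with the parameters shown), the variable names, and the default hyperparameters are assumptions chosen for the example; on data like this the RBF kernel would be expected to score clearly higher than the linear one.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner cluster surrounded by a ring, so no straight line separates the classes.
X_c, y_c = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_c, y_c, test_size=0.25, random_state=0
)

# Fit one model per kernel and record its test accuracy.
scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel).fit(Xc_train, yc_train)
    scores[kernel] = model.score(Xc_test, yc_test)
    print(f"{kernel:>6} kernel test accuracy: {scores[kernel]:.2f}")
```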
---
## Key Terms in SVM

1. **Support Vectors**: These are the data points that are closest to the decision boundary or hyperplane. These points are crucial as they influence the placement of the hyperplane.

2. **Margin**: The margin is the distance between the support vectors and the decision boundary. SVM aims to maximize this margin for optimal classification.

3. **Kernel Function**: A function used to map the input features into a higher-dimensional space where a linear decision boundary can be found. Popular kernel functions are:
   - **Linear Kernel**: Used when the data is linearly separable.
   - **Polynomial Kernel**: Used when the relationship between the features is polynomial.
   - **RBF (Radial Basis Function) Kernel**: Used when there is a non-linear relationship between the features.
   - **Sigmoid Kernel**: Used in some special cases, similar to the activation function in neural networks.
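
To make the kernels above concrete, the following sketch evaluates each one on a pair of made-up vectors and compares the result with the closed-form expression. It assumes scikit-learn's parameterization (`gamma`, `coef0`, `degree`) as exposed in `sklearn.metrics.pairwise`; the vectors `a`, `b` and the value `gamma = 0.5` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel,
)

# Two made-up feature vectors, shaped (n_samples, n_features) as scikit-learn expects.
a = np.array([[1.0, 2.0]])
b = np.array([[2.0, 0.5]])
gamma = 0.5  # illustrative kernel coefficient

dot = np.dot(a[0], b[0])             # inner product a . b
sq_dist = np.sum((a[0] - b[0]) ** 2)  # squared Euclidean distance ||a - b||^2

k_lin = linear_kernel(a, b)[0, 0]
k_poly = polynomial_kernel(a, b, degree=3, gamma=gamma, coef0=1)[0, 0]
k_rbf = rbf_kernel(a, b, gamma=gamma)[0, 0]
k_sig = sigmoid_kernel(a, b, gamma=gamma, coef0=1)[0, 0]

# Each library value next to the formula it implements.
print("linear :", k_lin, "vs", dot)
print("poly   :", k_poly, "vs", (gamma * dot + 1) ** 3)
print("rbf    :", k_rbf, "vs", np.exp(-gamma * sq_dist))
print("sigmoid:", k_sig, "vs", np.tanh(gamma * dot + 1))
```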
---
## Mathematical Formulation

1. **Objective**: The objective of SVM is to find a hyperplane that maximizes the margin between the two classes.

2. **Equation of Hyperplane**: The equation of the hyperplane is given by:

   $$
   \mathbf{w} \cdot \mathbf{x} + b = 0
   $$

   Where:
   - $\mathbf{w}$ is the weight vector normal to the hyperplane.
   - $\mathbf{x}$ is the input feature vector.
   - $b$ is the bias term.

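3. **Maximizing the Margin**: When $\mathbf{w}$ and $b$ are scaled so that the points nearest to the hyperplane satisfy $|\mathbf{w} \cdot \mathbf{x} + b| = 1$, the margin (the distance from the hyperplane to the nearest point) is given by: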

   $$
   \text{Margin} = \frac{1}{\|\mathbf{w}\|}
   $$

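4. **Optimization Problem**: Maximizing $\frac{1}{\|\mathbf{w}\|}$ is equivalent to minimizing $\|\mathbf{w}\|^2$, so for linearly separable data the optimization problem becomes: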

   $$
   \text{Minimize} \quad \frac{1}{2} \|\mathbf{w}\|^2
   $$

   Subject to:

   $$
   y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \quad \text{for each} \quad i
   $$

   Where $y_i$ is the class label (+1 or -1) of the $i$-th sample.
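
The formulation above can be checked numerically on a small separable dataset. This is a sketch under stated assumptions: the six points are made up, and a very large `C` (here `1e6`) is used so that scikit-learn's soft-margin `SVC` behaves approximately like the hard-margin problem written above. The names `X_sep`, `y_sep`, and `hard` are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up separable data with labels in {-1, +1}, matching the formulation above.
X_sep = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y_sep = np.array([-1, -1, -1, 1, 1, 1])

# Very large C: an approximation of the hard-margin problem (assumption).
hard = SVC(kernel="linear", C=1e6).fit(X_sep, y_sep)
w, b = hard.coef_[0], hard.intercept_[0]

# Constraint check: y_i (w . x_i + b) should be >= 1 for every sample.
functional_margins = y_sep * (X_sep @ w + b)
print("y_i (w . x_i + b):", np.round(functional_margins, 3))

# Margin from the formula versus the measured distance to the nearest point.
margin = 1.0 / np.linalg.norm(w)
nearest = np.min(np.abs(X_sep @ w + b)) / np.linalg.norm(w)
print("1 / ||w||             :", round(margin, 4))
print("distance to nearest pt:", round(nearest, 4))
```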
---
## Advantages of SVM

- **Effective in high-dimensional spaces**: Works well with datasets that have many features.
- **Memory Efficient**: SVM is memory efficient as it only requires a subset of the training data (the support vectors) to build the classifier.
- **Versatile**: SVM can handle both linear and non-linear classification problems using the kernel trick.
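- **Robust to Overfitting**: Maximizing the margin acts as a form of regularization, which makes SVM less prone to overfitting, especially in high-dimensional spaces. How strongly the model is regularized depends on the regularization parameter (C).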

## Disadvantages of SVM

- **Computationally Expensive**: Training an SVM means solving a quadratic optimization problem, which is costly compared with simpler linear models.
- **Requires Tuning**: SVM requires careful tuning of parameters such as the regularization parameter (C), which trades margin width against training errors, and the kernel function parameters (for example, gamma for the RBF kernel).
- **Not Ideal for Large Datasets**: Training time for kernel SVMs typically grows at least quadratically with the number of samples, so SVMs do not scale well to very large datasets.
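
One common way to handle the tuning requirement is a cross-validated grid search. The sketch below is a minimal example on the Iris dataset, assuming an RBF kernel, feature standardization, and a small hand-picked grid; the grid values and the names `pipe`, `param_grid`, and `search` are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Scale features, then fit an RBF SVM; make_pipeline names the SVC step "svc".
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Small illustrative grid over C and gamma (values are assumptions).
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1],
}

# 5-fold cross-validation over every (C, gamma) combination.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_iris, y_iris)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Because every parameter combination is refit once per fold, the cost of the search grows quickly with the grid size, which compounds the training cost noted above.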

## Applications of SVM

- **Text Classification**: SVM is commonly used for document classification, such as spam detection and sentiment analysis.
- **Image Classification**: SVM is effective in classifying images, especially in object recognition tasks.
- **Bioinformatics**: SVM has been used in DNA sequence classification, protein structure prediction, and other bioinformatics applications.
- **Face Detection**: SVM is used in face detection tasks to identify whether an image contains a human face.

---
## Conclusion

Support Vector Machine (SVM) is a powerful algorithm for classification tasks that works well for both linear and non-linear problems. Its ability to maximize the margin between classes and its use of the kernel trick make it a versatile algorithm for many machine learning applications. However, it can be computationally expensive on large datasets and usually requires parameter tuning to achieve the best performance.


```python
# Importing the required libraries
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
```



```python
# Loading a sample dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Displaying the first few rows of the dataset
data = pd.DataFrame(data=X, columns=iris.feature_names)
data['target'] = y
data.head()
```


```python
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Checking the dimensions of the split data
print(f"Training data shape: {X_train.shape}")
print(f"Testing data shape: {X_test.shape}")
```


```python
# Creating the SVM model with a linear kernel
svm_model = SVC(kernel='linear')

# Training the model with the training data
svm_model.fit(X_train, y_train)
```
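
After fitting, it can be useful to look at what the model actually stored. The following self-contained sketch repeats the same split and fit under new, hypothetical names (`X_tr`, `y_tr`, `linear_svm`) and inspects the fitted attributes. Note that the mathematical formulation earlier is binary, while Iris has three classes; scikit-learn's `SVC` handles this with a one-vs-one scheme, so a linear kernel yields one weight vector per pair of classes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data, split, and model settings as the cells above, under separate names.
X_i, y_i = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_i, y_i, test_size=0.2, random_state=42)
linear_svm = SVC(kernel="linear").fit(X_tr, y_tr)

print("Classes:", linear_svm.classes_)
print("Support vectors per class:", linear_svm.n_support_)

# Only the support vectors are kept by the model, not the whole training set.
fraction = linear_svm.support_.size / len(X_tr)
print(f"Fraction of training points that are support vectors: {fraction:.2f}")

# One hyperplane per pair of classes: 3 classes -> 3 pairs, 4 features each.
print("coef_ shape:", linear_svm.coef_.shape)
```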


```python
# Making predictions using the test set
y_pred = svm_model.predict(X_test)

# Displaying the predicted labels
print(f"Predicted Labels: {y_pred}")
```


```python
# Evaluating the model performance using accuracy and classification report
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy*100:.2f}%")

# Displaying the classification report for detailed evaluation
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
```
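
With 150 samples, a 20% test split contains only 30 points, so a single accuracy figure can vary noticeably with the choice of split. A possible complement, sketched here as an assumption rather than as part of the original workflow, is 5-fold cross-validation. The pipeline with `StandardScaler` and the names `model_cv` and `cv_scores` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_cv, y_cv = load_iris(return_X_y=True)

# Scaling is fitted inside each fold, so no information leaks from the held-out part.
model_cv = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# Five accuracy scores, one per held-out fold.
cv_scores = cross_val_score(model_cv, X_cv, y_cv, cv=5)
print("Fold accuracies:", cv_scores.round(3))
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```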


```python
import matplotlib.pyplot as plt

# Use only the first two features so the decision regions can be drawn in 2D
X_2d = X[:, :2]

# Separate variable names keep the 4-feature split from the earlier cells intact
X_train_2d, X_test_2d, y_train_2d, y_test_2d = train_test_split(
    X_2d, y, test_size=0.2, random_state=42
)

# Create and fit the SVM model with a linear kernel on the two selected features
svm_model_2d = SVC(kernel='linear')
svm_model_2d.fit(X_train_2d, y_train_2d)

# Generate a grid of points covering the training data, padded so edge points are not clipped
pad = 0.5
xx, yy = np.meshgrid(
    np.linspace(X_train_2d[:, 0].min() - pad, X_train_2d[:, 0].max() + pad, 200),
    np.linspace(X_train_2d[:, 1].min() - pad, X_train_2d[:, 1].max() + pad, 200),
)

# Predict a class label for each grid point and reshape back to the grid
Z = svm_model_2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Plot the decision regions, then overlay the training points
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X_train_2d[:, 0], X_train_2d[:, 1], c=y_train_2d, edgecolors='k', marker='o')
plt.title('SVM Decision Boundary (Linear Kernel)')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
```
