# Introduction to Support Vector Machine (SVM)

>*SVM is one of the most popular algorithms in machine learning and data science. Since the discovery of this algorithm in the 1990s, it has been widely popular among experts.*

**SVM is a supervised machine learning algorithm that is used for both classification and regression problems.**

A support vector machine works by finding an optimal separating boundary, called a ‘hyperplane’, that accurately separates two or more classes in a classification problem. Training the SVM algorithm on linearly separable data amounts to finding this optimal hyperplane.

# What is a Hyperplane?

Geometry builds up in stages. First there is the point, then the line on which that point lies, then the plane on which that line lies. A hyperplane generalizes the idea of a "plane" to any number of dimensions.

In 2D space, it's a line; in 3D space, it's a plane; in P-dimensional space, it's a flat (P-1)-dimensional subspace.

>**Fundamentally, a hyperplane separates the space into two parts.**
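
To make the "two parts" idea concrete, here is a minimal sketch that reports which side of a hyperplane a point falls on. It assumes the hyperplane is written as `beta0 + beta . x = 0`; the function name and the example coefficients are illustrative, not part of the original text.

```python
def hyperplane_side(beta0, beta, x):
    """Return +1 or -1 for the two sides of the hyperplane, 0 if x lies on it."""
    value = beta0 + sum(b * xi for b, xi in zip(beta, x))
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0

# In 2D the hyperplane is a line: -1 + x1 + x2 = 0 (hypothetical coefficients)
line = (-1.0, [1.0, 1.0])
print(hyperplane_side(*line, [2.0, 2.0]))  # 1  (one side)
print(hyperplane_side(*line, [0.0, 0.0]))  # -1 (other side)
print(hyperplane_side(*line, [0.5, 0.5]))  # 0  (on the line itself)

# In 3D the hyperplane is a plane: -2 + x3 = 0 (hypothetical coefficients)
plane = (-2.0, [0.0, 0.0, 1.0])
print(hyperplane_side(*plane, [5.0, 5.0, 3.0]))  # 1
```

The same function works in any number of dimensions, because only the length of `beta` and `x` changes.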

# Hyperplane as a boundary

![image.png](attachment:67c7362e-e8c8-41ba-a8a0-963cc4a41369.png)

![image.png](attachment:118cd336-c5b0-4dba-a55e-2a1834a6a44e.png)

# What is a Maximal Margin Classifier?

One natural choice is the hyperplane that lies farthest from the training samples. The smallest distance from the hyperplane to any training sample is called the margin. Let’s say W1 and W2 are the distances from the hyperplane to the nearest green and red dots, respectively. We want to maximize this margin.

![image.png](attachment:eddf1d9d-176e-4b58-9cbe-348d94704ce4.png)

For a given orientation of the hyperplane, W1 + W2 is constant (it is the width of the gap between the two classes), so increasing one distance decreases the other. The smaller of the two is therefore largest when W1 = W2 = W. The hyperplane placed at this distance from both classes is called the maximal margin hyperplane.

In layman's terms, we can say that the maximal margin hyperplane represents the mid-line of the widest slab that we can insert between two classes represented as green and red dots.

So our overall objective is to tune the coefficient values (β0, β1, β2, …, βn) so that W is maximized, subject to two **"constraints"**:

1. All green and red dots should be present on the correct side of the line. In general, all training samples should be present on the correct side of the plane.
  
2. Distance between the maximal margin hyperplane and the training samples should always be greater than or equal to W (minimum margin distance) as we made W1=W2=W.
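
The objective and the two constraints can be written compactly. The formulation below is a sketch in the usual maximal margin form; the class labels $y_i \in \{-1, +1\}$, the number of training samples $m$, and the unit-length condition on the coefficients are notational assumptions that the text above does not state explicitly.

$$
\max_{\beta_0, \beta_1, \ldots, \beta_n,\; W} \; W
\quad \text{subject to} \quad
\sum_{j=1}^{n} \beta_j^2 = 1,
\qquad
y_i \left( \beta_0 + \beta_1 x_{i1} + \cdots + \beta_n x_{in} \right) \ge W
\quad \text{for } i = 1, \ldots, m.
$$

With unit-length coefficients, the left-hand side of the inequality is the signed distance of sample $i$ from the hyperplane. Requiring it to be positive covers the first constraint (correct side), and requiring it to be at least $W$ covers the second (minimum margin distance).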

Let's say X = [X1, X2]' is a test vector from our test set. We substitute this vector into the hyperplane equation, and the sign of the output decides the class of X: if positive, it falls in the red class; if negative, in the green class. The magnitude of the output is proportional to the distance of X from the hyperplane, and equals that distance when the coefficient vector (β1, …, βn) has unit length. This distance also serves as a confidence score for the classification: the larger the distance, the higher the confidence.
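
A minimal sketch of this prediction step is shown below. The function name `classify` and the coefficient values are hypothetical; the sketch divides by the length of the coefficient vector so that the reported value is a true geometric distance even when the coefficients are not normalized.

```python
import math

def classify(beta0, beta, x):
    """Return (label, distance) for sample x and the hyperplane beta0 + beta . x = 0."""
    value = beta0 + sum(b * xi for b, xi in zip(beta, x))
    norm = math.sqrt(sum(b * b for b in beta))
    # Positive output -> red, negative -> green, as in the text.
    # A value of exactly 0 lies on the hyperplane; this sketch assigns it to green arbitrarily.
    label = "red" if value > 0 else "green"
    return label, abs(value) / norm

# Hypothetical hyperplane: -5 + 3*X1 + 4*X2 = 0
beta0, beta = -5.0, [3.0, 4.0]
print(classify(beta0, beta, [3.0, 4.0]))  # far from the boundary, high confidence
print(classify(beta0, beta, [1.0, 0.0]))  # close to the boundary, low confidence
```

The first sample lies at distance 4 on the red side, while the second lies at distance 0.4 on the green side, so the first prediction is the more confident one.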

# What are Support Vectors?

Support vectors are the training points that lie closest to the hyperplane separating the classes. These are the data points that are the most difficult to classify.

In general, the larger the margin, that is, the distance between the support vectors of the opposing classes, the easier it is for the algorithm to classify accurately. Hence, once the hyperplane is optimised, it is said to be the optimal separation or the maximum margin classifier.

![image.png](attachment:e4efc33d-998b-4ff4-80f3-9346f3f48644.png)
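
As a rough illustration, the sketch below finds the points closest to a given hyperplane. It assumes the hyperplane has already been optimised; the toy points and coefficients are hypothetical, and a real SVM solver identifies the support vectors as part of training rather than by a search like this.

```python
import math

def distance(beta0, beta, x):
    """Geometric distance from point x to the hyperplane beta0 + beta . x = 0."""
    value = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return abs(value) / math.sqrt(sum(b * b for b in beta))

def support_vectors(beta0, beta, points, tol=1e-9):
    """Return the margin and the points that lie at exactly that distance."""
    dists = [distance(beta0, beta, p) for p in points]
    margin = min(dists)
    closest = [p for p, d in zip(points, dists) if d - margin <= tol]
    return margin, closest

# Toy data on either side of the vertical line x1 = 2, i.e. -2 + 1*x1 + 0*x2 = 0
points = [(0.0, 0.0), (1.0, 1.0), (1.0, -2.0), (3.0, 0.5), (4.0, 3.0)]
margin, closest = support_vectors(-2.0, [1.0, 0.0], points)
print(margin)   # 1.0
print(closest)  # [(1.0, 1.0), (1.0, -2.0), (3.0, 0.5)]
```

Moving either of the two far-away points, `(0.0, 0.0)` or `(4.0, 3.0)`, a little would not change the result, which is why only the closest points matter.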

## Linear Classification

A linear SVM can be applied to linearly separable data. In linearly separable data, a straight line can be drawn that separates all the items in class A from those in class B.

![image.png](attachment:9ed1754f-2242-4199-85f4-072fad946c79.png)

A straight red line (hyperplane) can be optimised to differentiate items in Class A and Class B.

>**Actually, an infinite number of hyperplanes can be drawn to separate the classes for linearly separable data.**

This is where the basic idea of SVM comes into play: finding the optimal hyperplane, or maximum margin classifier, which is the one farthest from the observations.
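
The idea is easiest to see in one dimension, where a "hyperplane" is just a threshold point. The sketch below uses hypothetical toy values and a handful of hand-picked candidate thresholds; every candidate separates the two classes, but only one of them is farthest from the observations.

```python
# Hypothetical 1D data: class A on the left, class B on the right
class_a = [1.0, 2.0, 2.5]
class_b = [6.0, 7.5, 9.0]

def margin(threshold, a, b):
    """Smallest distance from the threshold to any sample, or None if it does not separate a from b."""
    if not (max(a) < threshold < min(b)):
        return None
    return min(threshold - max(a), min(b) - threshold)

# Every candidate below separates the classes, yet their margins differ
candidates = [3.0, 4.0, 4.25, 5.0, 5.5]
for t in candidates:
    print(t, margin(t, class_a, class_b))

best = max(candidates, key=lambda t: margin(t, class_a, class_b))
print("best threshold:", best)  # 4.25, the midpoint between 2.5 and 6.0
```

Any threshold strictly between 2.5 and 6.0 separates the data, so there are infinitely many valid choices; the midpoint is the one with the widest margin.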

## Non-Linear Classification

What happens if the data that was presented to us for classification is not linearly separable? 

![1 XAxmE3M3rGFQJMcfP8PBwg.png](attachment:9518d1ba-c645-4417-951d-e18b10c47a21.png)

## What is the Kernel Trick?

The kernel trick is widely used in support vector machine (SVM) models to bridge linearity and non-linearity. It maps data that is not linearly separable in its original, lower-dimensional space into a higher-dimensional space where a linear classification becomes possible. In effect, we project the data with some extra features so that it lives in a higher-dimensional space.
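
Here is a small sketch of that projection, using hypothetical 1D data. The extra feature `x * x` is an illustrative choice of mapping, not one prescribed by the text.

```python
# Hypothetical 1D data: the green points sit between the red ones
green = [-1.0, 0.0, 1.0]
red = [-3.0, -2.0, 2.0, 3.0]

def separable_by_threshold(a, b):
    """True if one threshold puts all of a on one side and all of b on the other."""
    return max(a) < min(b) or max(b) < min(a)

def feature_map(x):
    """Project a 1D value into 2D by adding the squared value as an extra feature."""
    return (x, x * x)

# In the original space no single threshold works
print(separable_by_threshold(green, red))  # False

# Along the new second coordinate the classes fall on opposite sides of a threshold
green_sq = [feature_map(x)[1] for x in green]
red_sq = [feature_map(x)[1] for x in red]
print(separable_by_threshold(green_sq, red_sq))  # True
```

In the projected 2D space, a horizontal line such as "second coordinate = 2.5" separates the two classes, even though no single point on the original number line could.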

![1 gXvhD4IomaC9Jb37tzDUVg.webp](attachment:f28616ba-456b-4f46-81a0-a542b98eb4c8.webp)

Suppose we want to classify the red squares and green dots. No straight line can separate them, because the boundary between them is non-linear. Real-world data is often scattered in the same way, which makes a linear separation impossible. Instead, we can use a decision surface that separates the green dots from the red squares. The kernel trick works only with the original features: it computes the similarity two points would have in the higher-dimensional space without ever constructing that space explicitly, because working directly in a higher-dimensional space becomes more and more expensive as the dimension grows.
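
The sketch below illustrates this with one specific, assumed example: a degree-2 kernel `(x . y)^2` on 2D inputs, together with the explicit 3D feature map `phi` that it corresponds to. Both routes give the same similarity value, but the kernel never builds the 3D vectors.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map from 2D to 3D (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """Same similarity, computed from the original 2D features only."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))   # map to 3D first, then take the dot product
implicit = poly2_kernel(x, y)    # stay in 2D
print(explicit, implicit)        # both are 121 up to floating-point rounding
```

For two features the saving is negligible, but the explicit map grows quickly with the number of features and the degree, while the kernel still needs only one dot product in the original space.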

## What are the types of Kernel methods in SVM models?

##### Support vector machines use various kinds of kernel methods in machine learning. Here are a few of them:

#### 1. Linear Kernel

- The linear kernel is the simplest and is used when the data is linearly separable.
- It calculates the dot product between the feature vectors.

    ***`K(x1, x2) = x1 . x2`***

![Linear_Kernel.png](attachment:ba097528-6722-44f7-b107-76f8e9449388.png)  
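
As a quick sketch, the linear kernel is a plain dot product. The function name and sample vectors are illustrative.

```python
def linear_kernel(x1, x2):
    """Dot product of two feature vectors of equal length."""
    return sum(a * b for a, b in zip(x1, x2))

print(linear_kernel([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
print(linear_kernel([1.0, 0.0], [0.0, 1.0]))            # 0.0 (perpendicular vectors)
```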

#### 2. Polynomial Kernel

- The polynomial kernel is effective for non-linear data.
- It computes the similarity between two vectors in terms of the polynomial of the original variables.

    ***`K(x1, x2) = (x1 . x2 + 1)^d`***

![Polynomial_Kernel-300x258.webp](attachment:993f4132-a4f4-432b-89e9-39f327e8ece3.webp)  
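
A minimal sketch of the polynomial kernel follows, reading `d` in the formula as the degree (the exponent). The function name, default degree, and sample vectors are illustrative.

```python
def polynomial_kernel(x1, x2, d=2):
    """Polynomial kernel (x1 . x2 + 1) raised to the power d."""
    dot = sum(a * b for a, b in zip(x1, x2))
    return (dot + 1) ** d

u, v = [1.0, 2.0], [3.0, 0.5]   # u . v = 3.0 + 1.0 = 4.0
print(polynomial_kernel(u, v, d=1))  # 5.0
print(polynomial_kernel(u, v, d=2))  # 25.0
print(polynomial_kernel(u, v, d=3))  # 125.0
```

Raising `d` makes the decision boundary more flexible, at the cost of a higher risk of overfitting.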

#### 3. Gaussian Kernel

The Gaussian kernel is an example of a radial basis function kernel. It can be represented with this equation:

    ***`k(xi, xj) = exp(-γ ||xi - xj||^2)`***

#### 4. Exponential Kernel

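Closely related to the Gaussian kernel, but it uses the distance itself rather than the squared distance, so similarity falls off more sharply for nearby points.

    ***`k(x, y) = exp(-||x - y|| / (2σ^2))`***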
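
To compare the two, here is a sketch that evaluates both kernels at a few distances. It assumes the commonly cited form of the exponential kernel, `exp(-||x - y|| / (2σ^2))`, and picks `gamma = 0.5` and `sigma = 1.0` so that the two kernels are on the same scale; these parameter values are illustrative assumptions.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def gaussian_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2)"""
    return math.exp(-gamma * euclidean(x, y) ** 2)

def exponential_kernel(x, y, sigma=1.0):
    """Assumed form: exp(-||x - y|| / (2 * sigma^2)), i.e. the distance is not squared."""
    return math.exp(-euclidean(x, y) / (2 * sigma ** 2))

origin = (0.0, 0.0)
for r in (0.0, 0.5, 1.0, 3.0):
    p = (r, 0.0)
    g = gaussian_kernel(origin, p)
    e = exponential_kernel(origin, p)
    print(f"distance={r}: gaussian={g:.4f}, exponential={e:.4f}")
```

With these settings the two kernels agree at distance 1. Below that the exponential kernel gives the smaller similarity, and beyond it the Gaussian kernel does.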

#### 5. Hyperbolic or the Sigmoid Kernel

- The sigmoid kernel can be used as an alternative to the RBF kernel.
- It is based on the hyperbolic tangent function, which is also used as an activation function in neural networks.
- Like the other non-linear kernels, it implicitly maps the input data into a higher-dimensional space.

    ***`k(x, y) = tanh(x^T y + c)`***

![sigmoid_Kernel.webp](attachment:a693a03a-693f-4c95-a096-8a8c7b330c0e.webp)  
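
A minimal sketch of the sigmoid kernel is given below, with the offset `c` as an optional argument. The function name, the default `c = 0.0`, and the sample vectors are illustrative.

```python
import math

def sigmoid_kernel(x, y, c=0.0):
    """tanh of the dot product plus an offset c; the result always lies in (-1, 1)."""
    return math.tanh(sum(a * b for a, b in zip(x, y)) + c)

print(sigmoid_kernel([1.0, 2.0], [0.5, -1.0]))        # tanh(-1.5), a negative similarity
print(sigmoid_kernel([0.0, 0.0], [1.0, 1.0]))         # tanh(0) = 0.0
print(sigmoid_kernel([1.0, 2.0], [0.5, -1.0], c=1.5)) # the offset shifts the result to tanh(0) = 0.0
```

Unlike the RBF kernel, the output can be negative, and it saturates near -1 or +1 for large dot products.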

#### 6. Radial-basis function kernel

- The RBF kernel is a common type of Kernel in SVM for handling non-linear decision boundaries.
- It maps the data into an infinite-dimensional space.

    ***`K(x, y) = exp(-γ ||x - y||^2)`***

![RBF_Kernel.webp](attachment:b0f13e58-4845-4aac-abec-36e5402b6738.webp)  
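
The sketch below builds the kernel (Gram) matrix of pairwise RBF similarities for three hypothetical points and shows how γ changes it. The helper names and γ values are illustrative.

```python
import math

def rbf_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram_matrix(points, gamma):
    """Matrix of kernel values between every pair of points."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
grams = {gamma: gram_matrix(points, gamma) for gamma in (0.1, 1.0, 10.0)}

for gamma, K in grams.items():
    print(f"gamma={gamma}")
    for row in K:
        print("  ", [round(v, 3) for v in row])
```

Each point has similarity 1 with itself, so the diagonal is all ones. As γ grows, the off-diagonal entries shrink towards 0, meaning each point is treated as similar only to its very close neighbours.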

## How to Choose the Right Kernel Function in SVM?

**The choice of kernel in SVM depends on the type of data being analyzed & the problem being solved. Here's how you can go about choosing the right kernel function:**

#### 1. Understand the problem:

Understand the type of data, the features, and the complexity of the relationship between the features.

#### 2. Choose a simple kernel function:

Start with the linear kernel function as it serves as the baseline for comparison with the complex kernel functions.

#### 3. Test different kernel functions:

Test the polynomial kernel, the RBF kernel, etc., and compare their performance.

#### 4. Tune the parameters:

Experiment with different parameter values & choose the values that deliver the best performance.
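
Steps 2 to 4 can be sketched as a small comparison loop. To keep the example free of external libraries, the sketch below swaps in a kernel perceptron as a stand-in for an SVM; it is not an SVM solver, and the XOR-style toy data, kernel parameters, and helper names are all hypothetical. It also scores each kernel on the training data only, whereas a real comparison would use held-out data or cross-validation.

```python
import math

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, d=2):
    return (linear(x, y) + 1) ** d

def rbf(x, y, gamma=2.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision(x, X, y, alpha, kernel):
    """Weighted sum of kernel similarities between x and the training points."""
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))

def train_kernel_perceptron(X, y, kernel, epochs=50):
    """Stand-in learner: count, per training point, how often it was misclassified."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, xi in enumerate(X):
            if y[i] * decision(xi, X, y, alpha, kernel) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def accuracy(X, y, alpha, kernel):
    correct = 0
    for xi, yi in zip(X, y):
        predicted = 1 if decision(xi, X, y, alpha, kernel) > 0 else -1
        if predicted == yi:
            correct += 1
    return correct / len(X)

# XOR-style toy data: not linearly separable
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]

results = {}
for name, kernel in [("linear", linear), ("polynomial", polynomial), ("rbf", rbf)]:
    alpha = train_kernel_perceptron(X, y, kernel)
    results[name] = accuracy(X, y, alpha, kernel)
    print(f"{name}: training accuracy = {results[name]}")
```

On this data the linear baseline cannot classify every point, while the polynomial and RBF kernels can, which is the kind of evidence that would justify moving beyond the simple kernel.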

#### 5. Use domain knowledge:

Based on the type of data, use the domain knowledge & choose the right type of kernel for your data set.

#### 6. Consider computational complexity:

Estimate the computation time & the resources that would be required for larger data sets.
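
As a back-of-the-envelope sketch, the snippet below estimates the memory needed to hold a full kernel matrix, which has one entry per pair of training samples. It assumes a dense matrix of 64-bit floats; practical solvers often avoid storing the whole matrix, so treat the numbers as a rough upper bound rather than an actual requirement.

```python
def kernel_matrix_bytes(n_samples, bytes_per_value=8):
    """Memory for a dense n x n kernel matrix (assumes 8-byte floats by default)."""
    return n_samples * n_samples * bytes_per_value

for n in (1_000, 10_000, 100_000):
    gib = kernel_matrix_bytes(n) / 2**30
    print(f"{n:>7} samples -> {gib:,.2f} GiB")
```

The cost grows with the square of the number of samples, so ten times more data needs roughly a hundred times more memory.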