# <p style='text-align: center;'> Support Vector Machine (SVM) Algorithm </p>

- Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression problems, but in practice it is used primarily for classification.


- The goal of the SVM algorithm is to find the best decision boundary that separates n-dimensional space into classes, so that a new data point can easily be placed in the correct category. This best decision boundary is called a **hyperplane**.


- SVM chooses the extreme points/vectors that help define the hyperplane. These extreme cases are called **support vectors**, hence the name **Support Vector Machine**. Consider the diagram below, in which two different categories are separated by a decision boundary or hyperplane :

![image.png](attachment:image.png)

- **Example:** Suppose we see a strange cat that also has some features of a dog, and we want a model that can accurately identify whether it is a cat or a dog. Such a model can be created with the SVM algorithm. We first train the model on many images of cats and dogs so that it learns the distinguishing features of each, and then we test it on this strange creature. Because the SVM draws its decision boundary between the two classes (cat and dog) using the extreme cases (the support vectors), it classifies the creature by the side of the boundary it falls on. Here, on the basis of the support vectors, it classifies it as a cat. Consider the diagram below :

![image.png](attachment:image.png)




- The SVM algorithm can be used for face detection, image classification, text categorization, etc.

## Types of SVM :
<b> There are two types of SVM : </b>
   - Linear SVM
   - Non-linear SVM
    
    
- **Linear SVM:** Linear SVM is used for linearly separable data. If a dataset can be split into two classes by a single straight line (or, with more than two features, a single flat hyperplane), the data is called linearly separable, and the classifier used is called a linear SVM classifier.


- **Non-linear SVM:** Non-linear SVM is used for data that is not linearly separable. If a dataset cannot be split into classes by a straight line, the data is called non-linear, and the classifier used is called a non-linear SVM classifier.
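
The difference between the two types can be sketched with scikit-learn. This is a minimal illustration, not a definitive setup: the library choice, the `make_circles` toy dataset, and all parameter values are assumptions that do not come from the text above. The `kernel` argument selects between a straight-line boundary and a curved one (kernels are covered later in this document).

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed toy data: two concentric rings, which no straight line can separate.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM: restricted to a straight-line boundary.
linear_clf = SVC(kernel="linear").fit(X, y)
# Non-linear SVM: the RBF kernel allows a curved boundary.
rbf_clf = SVC(kernel="rbf").fit(X, y)

linear_acc = linear_clf.score(X, y)  # accuracy on the training data
rbf_acc = rbf_clf.score(X, y)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

On ring-shaped data like this, the non-linear classifier should fit the training points much better than the linear one.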

## Hyperplane and Support Vectors in the SVM algorithm :
- **Hyperplane:** There can be many lines/decision boundaries that separate the classes in n-dimensional space, but we need to find the one that best classifies the data points. This best boundary is known as the **hyperplane of SVM**.


- The dimension of the hyperplane depends on the number of features in the dataset: with 2 features (as shown in the image), the hyperplane is a straight line, and with 3 features it is a 2-dimensional plane. In general, with n features the hyperplane has n-1 dimensions. We always choose the hyperplane with the maximum margin, which means the maximum distance between the hyperplane and the nearest data points of each class.


- The data points or vectors that lie closest to the hyperplane and determine its position are called **support vectors**. They are so named because they support the hyperplane.
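
A minimal sketch of how the hyperplane and its support vectors can be inspected, assuming scikit-learn's `SVC` with a linear kernel. The six data points are hypothetical and chosen only so that the result is easy to check by eye.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two features, two well-separated classes.
X = np.array([[1, 1], [2, 1], [1, 2],    # class 0
              [4, 4], [5, 4], [4, 5]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The hyperplane is w . x + b = 0; with 2 features it is a straight line.
w = clf.coef_[0]
b = clf.intercept_[0]

# Only the points closest to the hyperplane are kept as support vectors.
support_vectors = clf.support_vectors_

print("w =", w, "b =", b)
print("support vectors:")
print(support_vectors)
```

The remaining points could be moved further away from the boundary, or removed, without changing the fitted hyperplane, which is why only the support vectors matter.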



## How does SVM work ?
### Linear SVM :
The working of the SVM algorithm can be understood through an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify each pair (x1, x2) of coordinates as either green or blue. Consider the image below :

![image.png](attachment:image.png)


Since this is a 2-d space, we can easily separate the two classes with a straight line. But more than one line can separate these classes. Consider the image below :

![image-2.png](attachment:image-2.png)


Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary is called a **hyperplane**. The SVM algorithm finds the points from both classes that lie closest to the boundary. These points are called **support vectors**. The distance between the support vectors and the hyperplane is called the **margin**, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the **optimal hyperplane**.

![image-3.png](attachment:image-3.png)
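
The idea of comparing several separating lines by their margin can be sketched with NumPy. The data points and the three candidate lines below are hypothetical; a real SVM searches over all possible lines rather than a fixed list.

```python
import numpy as np

# Hypothetical toy data: label -1 for one class, +1 for the other.
X = np.array([[1, 1], [2, 1], [1, 2],
              [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

def margin(w, b):
    """Smallest signed distance from any point to the line w . x + b = 0.

    The value is positive only if the line separates the two classes.
    """
    w = np.asarray(w, dtype=float)
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# Three candidate separating lines (hypothetical), each given as (w, b).
candidates = {
    "x1 = 3":        ([1.0, 0.0], -3.0),
    "x2 = 3":        ([0.0, 1.0], -3.0),
    "x1 + x2 = 5.5": ([1.0, 1.0], -5.5),
}

margins = {name: margin(w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, m in margins.items():
    print(f"{name:>14}: margin = {m:.3f}")
print("largest margin:", best)
```

All three lines separate the classes, but the diagonal line keeps the nearest points furthest away, so it is the one a maximum-margin classifier would prefer among these candidates.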

### Non-Linear SVM :
If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line that separates the classes. Consider the image below :

![image.png](attachment:image.png)


To separate these data points, we need to add one more dimension. For linear data we used two dimensions, x and y, so for non-linear data we add a third dimension, z. It can be calculated as :

    z=x^2 +y^2
    
    
By adding the third dimension, the sample space becomes as shown in the image below :

![image-2.png](attachment:image-2.png)


Now SVM divides the dataset into classes in the following way. Consider the image below :

![image-3.png](attachment:image-3.png)


Since we are in 3-d space, the boundary looks like a plane parallel to the x-axis (a plane of constant z). If we convert it back to 2-d space with z=1, it becomes :

![image-4.png](attachment:image-4.png)


Hence, for this non-linear data, we get a circular decision boundary of radius 1.
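
The mapping z = x^2 + y^2 can be tried out on a small example. This is a sketch with NumPy on hypothetical data: an inner ring of radius 0.5 and an outer ring of radius 1.5, chosen so that the plane z = 1 separates them.

```python
import numpy as np

# Hypothetical data: an inner ring (one class) surrounded by an outer ring (the other).
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inner = 0.5 * np.column_stack([np.cos(angles), np.sin(angles)])  # radius 0.5
outer = 1.5 * np.column_stack([np.cos(angles), np.sin(angles)])  # radius 1.5

def add_z(points):
    """Append the third dimension z = x^2 + y^2 to each (x, y) point."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x, y, x**2 + y**2])

inner_3d = add_z(inner)
outer_3d = add_z(outer)

# In 3-d the flat plane z = 1 separates the classes;
# back in 2-d that plane corresponds to the circle x^2 + y^2 = 1.
print("max z of inner ring:", inner_3d[:, 2].max())
print("min z of outer ring:", outer_3d[:, 2].min())
```

Every inner point ends up below z = 1 and every outer point above it, so a linear boundary in the new space does the job that no straight line could do in the original one.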

## Types of Kernel Functions in Support Vector Machine (SVM) :
A kernel function takes the data as input and transforms it into the form required for processing. The term “kernel” refers to the set of mathematical functions used in a Support Vector Machine that provide a window through which to manipulate the data.


In SVM, a kernel is a function that measures the similarity between two data points as if they had been mapped into a higher-dimensional space, without computing that mapping explicitly. With the help of a kernel, we can work in higher dimensions, even an infinite number of them, while keeping the calculations manageable. Kernels play a vital role in classifying and analyzing patterns in a dataset, and they are very helpful for solving a non-linear problem with a linear classifier.


### Some commonly used kernel types are : 

<b> 1) Linear Kernel : </b>

It is the most basic kernel type: the plain dot product of the two input vectors, with no mapping to a higher dimension. It tends to work well when there are many features, and linear kernel functions are faster to compute than the other kernels.
    

<b> Linear Kernel Formula : </b>

    F(x, xj) = sum(x * xj)


Here, x and xj are the two data points (feature vectors) being compared, and the sum of their elementwise products is the dot product x.xj.
    
    
![image.png](attachment:image.png)
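
A minimal NumPy sketch of the linear kernel, reading the formula as the sum of elementwise products (the dot product). The two vectors are hypothetical.

```python
import numpy as np

def linear_kernel(x, xj):
    """Linear kernel: the dot product of the two feature vectors."""
    return np.sum(np.asarray(x) * np.asarray(xj))

x = np.array([1.0, 2.0, 3.0])
xj = np.array([4.0, 5.0, 6.0])

k = linear_kernel(x, xj)   # 1*4 + 2*5 + 3*6
print(k)
```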
    

<b> 2) Polynomial Kernel : </b>

It is a more generalized form of the linear kernel. It is often less preferred than other kernel functions because it can be less efficient and less accurate, although this depends on the data.
    
    
<b> Polynomial Kernel Formula : </b>

    F(x, xj) = (x.xj+1)^d


Here ‘.’ denotes the dot product of the two vectors and d denotes the degree of the polynomial.
F(x, xj) is the similarity between x and xj, which the SVM uses to build the decision boundary that separates the given classes.
    
![image-2.png](attachment:image-2.png)
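
The sketch below shows, for degree d = 2 and two features, that the polynomial kernel gives the same number as a dot product in a higher-dimensional space. The feature map `phi` is written out only for illustration and is an assumption of this example; the kernel itself never needs it.

```python
import numpy as np

def polynomial_kernel(x, xj, d=2):
    """Polynomial kernel (x . xj + 1)^d."""
    return (np.dot(x, xj) + 1) ** d

def phi(v):
    """Explicit degree-2 feature map for a 2-feature vector (illustrative only)."""
    v1, v2 = v
    s = np.sqrt(2)
    return np.array([1.0, s * v1, s * v2, v1**2, v2**2, s * v1 * v2])

x = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])

k_direct = polynomial_kernel(x, xj, d=2)   # computed in the original 2-d space
k_mapped = np.dot(phi(x), phi(xj))         # computed in the mapped 6-d space
print(k_direct, k_mapped)
```

Both values agree (up to floating-point rounding), which is the sense in which a kernel lets a linear classifier work in a higher dimension without ever building the higher-dimensional vectors.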

<b> 3) Gaussian RBF kernel : </b>

It is one of the most preferred and widely used kernel functions in SVM. It is usually chosen for non-linear data, and it is a reasonable default when there is no prior knowledge about the data.
    
    
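<b> Gaussian Radial Basis Formula : </b>

    F(x, xj) = exp(-gamma * ||x - xj||^2)


Gamma must be positive. Values between 0 and 1 are common in practice, but larger values are allowed. A larger gamma makes the kernel value fall off faster with distance, so each training point influences a smaller region. Gamma can be set manually in the code; 0.1 is a common starting value, though the best value depends on the data and is usually found by tuning.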
    

![image.png](attachment:image.png)
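
A small NumPy sketch of the Gaussian RBF formula and of how gamma changes the result. The function name `gaussian_rbf`, the points, and the two gamma values are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_rbf(x, xj, gamma=0.1):
    """Gaussian RBF kernel exp(-gamma * ||x - xj||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([0.0, 0.0])
near = np.array([1.0, 0.0])   # squared distance from x is 1
far = np.array([3.0, 4.0])    # squared distance from x is 25

for gamma in (0.1, 1.0):
    print(f"gamma={gamma}: near={gaussian_rbf(x, near, gamma):.4f}, "
          f"far={gaussian_rbf(x, far, gamma):.4f}")
```

The kernel value is 1 for identical points and shrinks towards 0 as the points move apart; with the larger gamma it shrinks much faster.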

<b> 4) Sigmoid Kernel : </b>

It is mostly associated with neural networks: an SVM with this kernel is similar to a two-layer perceptron, in which the tanh function plays the role of the activation function for the neurons.
    
    
<b> Sigmoid Kernel Formula : </b>

    F(x, xj) = tanh(α x.xj + c)

Here ‘.’ denotes the dot product, α is a scaling factor, and c is a constant offset.
    
    
![image.png](attachment:image.png)
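
A minimal NumPy sketch of the sigmoid kernel, assuming the formula is read as tanh(α (x . xj) + c) with the dot product inside. The default values of `alpha` and `c` and the two vectors are hypothetical.

```python
import numpy as np

def sigmoid_kernel(x, xj, alpha=0.5, c=0.0):
    """Sigmoid kernel tanh(alpha * (x . xj) + c)."""
    return np.tanh(alpha * np.dot(x, xj) + c)

x = np.array([1.0, 2.0])
xj = np.array([0.5, -1.0])

k = sigmoid_kernel(x, xj)   # x . xj = -1.5, so this is tanh(-0.75)
print(k)
```

Because of the tanh, the kernel value always lies strictly between -1 and 1.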


### Regularization :

The regularization parameter (in Python it is called C) tells the SVM optimization how much you want to avoid misclassifying each training example.

If C is high, the optimization chooses a smaller-margin hyperplane, so the misclassification rate on the training data is lower.

On the other hand, if C is low, the margin is larger, even if some training examples are misclassified. This is shown in the following two diagrams :

![image.png](attachment:image.png)


As you can see in the image, when C is low, the margin is larger (so the boundary has fewer curves and does not strictly follow the data points), even though two apples were classified as lemons. When C is high, the boundary is full of curves and all the training data is classified correctly. Keep in mind that even if all the training data is classified correctly, increasing C will not always improve accuracy on new data, because the model may overfit.
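
The effect of C on the margin can be checked with a short experiment. This is a sketch assuming scikit-learn's `SVC` with a linear kernel; the `make_blobs` dataset and the two C values are hypothetical choices, and the margin is measured as 1 / ||w||, the distance from the hyperplane to the margin boundary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed toy data: two overlapping clusters.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.2, random_state=0)

results = {}
for C in (0.01, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Distance from the hyperplane to the margin boundary is 1 / ||w||.
    margin = 1.0 / np.linalg.norm(clf.coef_[0])
    results[C] = {
        "margin": margin,
        "train_accuracy": clf.score(X, y),
        "n_support_vectors": len(clf.support_vectors_),
    }
    print(C, results[C])
```

The low-C model should report the wider margin. Whether that also gives better accuracy on unseen data has to be checked on a held-out set, which this sketch does not do.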