## SVM

A support vector machine (SVM) is a widely used classification algorithm that handles both linear and nonlinear problems. It finds the decision boundary (a hyperplane) that best separates the two classes by maximizing the margin between them.

<img title="svm" alt="text" src="Images\svm.png"  width="300">

### Step 1: Collecting Data 


* The first step is to collect data that contains features and their corresponding labels. Here, the ‘features’ refer to the attributes of each sample that we will use to classify it, and ‘labels’ represent the target variable that we want to predict based on those features.
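As a minimal sketch of what such a dataset looks like in code, the snippet below stores six two-feature samples in a NumPy array with one label per sample. The values are an illustrative toy dataset (the same one used in the scikit-learn example later in this section), not real measurements.

```python
import numpy as np

# Feature matrix: one row per sample, one column per feature (toy values)
X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

# Label vector: one class label per sample
y = np.array([0, 1, 0, 1, 0, 1])

print(X.shape)  # (n_samples, n_features)
print(y.shape)  # (n_samples,)
```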

### Step 2: Preparing Data

* Next, we need to split the data into training and testing sets. The training set is used to train our SVM model while the test set is used for evaluating its performance.

```
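from sklearn.model_selection import train_test_split

# Hold out 30% of the samples for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)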
```
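To make the effect of the split concrete, here is a small self-contained sketch. The data is a hypothetical placeholder, and the `stratify=y` argument is an optional addition (not part of the snippet above) that keeps the class proportions similar in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, balanced binary labels
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The test set receives 30% of the samples, the training set the rest
print(X_train.shape, X_test.shape)
```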

### Step 3: Defining the Hyperparameters

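Hyperparameters are settings that are not learned during training and must be chosen beforehand. For an SVM these include the type of kernel function (e.g., linear, polynomial, or radial basis function), the regularization strength C, and kernel-specific parameters such as the polynomial degree or the RBF coefficient gamma. The margin width, by contrast, is a result of training rather than something we set directly.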
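In scikit-learn, these choices are passed to the model constructor. The sketch below shows one possible configuration; the specific values are illustrative defaults, not recommendations for any particular dataset.

```python
from sklearn.svm import SVC

# Hyperparameters are fixed when the model object is created, before any training
clf = SVC(
    kernel="rbf",   # kernel type: "linear", "poly", "rbf", or "sigmoid"
    C=1.0,          # regularization parameter (smaller C means stronger regularization)
    gamma="scale",  # kernel coefficient used by "rbf", "poly", and "sigmoid"
)

print(clf.get_params()["kernel"], clf.get_params()["C"])
```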

### Step 4: Training the Model

Training the SVM involves finding the optimal hyperplane/decision boundary that maximizes the margin between classes. This can be done using optimization tools such as quadratic programming.

### SVM Optimization Objective


Here's the standard objective function (also known as the primal problem), written in its soft-margin hinge-loss form:

<img title="svm" alt="text" src="Images\svm objective.jpg" width="500">


```
minimize: ½ ||w||^2 + C ∑_{i=1..n} max(0, 1 - yi (w . xi + b))
```
where each label yi ∈ {-1, 1} and n is the number of training samples.

In this equation, w represents the weights assigned to each feature, b is the bias term, C is a regularization parameter, xi is a data point, and yi is the class label (-1 or 1).

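The first term (½ ||w||^2) is half the squared Euclidean norm of the weight vector. Because the margin width is 2/||w||, minimizing this term maximizes the margin and helps prevent overfitting. The second term is the hinge loss, which is zero for points that lie on the correct side of the margin and grows linearly for points inside the margin or on the wrong side. The parameter C controls the trade-off between the two terms.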
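To see how the two terms combine, the following sketch evaluates the objective for a fixed w and b. The function name `svm_objective` and the toy values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin primal objective: 0.5*||w||^2 + C * sum of hinge losses."""
    margins = y * (X @ w + b)               # yi (w . xi + b) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)  # zero when the margin is at least 1
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Hypothetical toy values, chosen only for illustration
X = np.array([[2.0, 0.0], [0.0, 0.5], [-2.0, 0.0]])
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])
b = 0.0

objective = svm_objective(w, b, X, y, C=1.0)
print(objective)
```

Here the first and third samples have a margin of 2 and contribute no loss, while the second sample has a margin of 0 and contributes a hinge loss of 1, so the objective is 0.5 + 1 = 1.5.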

To implement this in code, we can use Python's scikit-learn library. Here is some sample code to create an SVM with maximum margin using scikit-learn:

```
from sklearn import svm
import numpy as np

# Create the training data
X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

# Class labels
y = [0, 1, 0, 1, 0, 1]

# Create the SVM model
clf = svm.SVC(kernel='linear', C=1.0)

# Train the model
clf.fit(X, y)

# Print the support vectors
print(clf.support_vectors_)

# Get the indices of the support vectors
print(clf.support_)

# Print the number of support vectors for each class
print(clf.n_support_)
```
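Once fitted, the model can classify new points, and for a linear kernel the learned hyperplane is exposed through `coef_` and `intercept_`. The sketch below repeats the toy data so that it runs on its own; the two query points are hypothetical.

```python
import numpy as np
from sklearn import svm

X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])
y = [0, 1, 0, 1, 0, 1]

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Classify two unseen points (hypothetical queries, one near each cluster)
predictions = clf.predict([[0.58, 0.76], [10.58, 10.76]])
print(predictions)

# For a linear kernel, w and b of the hyperplane w . x + b = 0 are available
w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)
print(w, b, margin_width)
```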


Alternatively, a linear SVM can be implemented from scratch with gradient descent, as in the following subsection.

### SVM from scratch

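```
import numpy as np

class SVM:
    """Linear SVM trained with per-sample sub-gradient descent on the hinge loss."""

    def __init__(self, lr=0.001, lambda_param=0.01, iters=1000):
        self.lr = lr                      # learning rate
        self.lambda_param = lambda_param  # regularization strength
        self.iters = iters                # number of passes over the data
        self.w = None
        self.b = None

    def fit(self, X, y):
        # Map labels to {-1, 1}, as the hinge loss expects (e.g. 0 becomes -1)
        y_ = np.where(np.asarray(y) <= 0, -1, 1)

        # Initialize parameters
        _, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0

        # Gradient descent; note the decision function used here is w . x - b
        for _ in range(self.iters):
            for index, sample in enumerate(X):
                condition = y_[index] * (np.dot(sample, self.w) - self.b) >= 1
                if condition:
                    # Outside the margin: only the regularization term contributes
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # Inside the margin or misclassified: add the hinge-loss gradient
                    self.w -= self.lr * (2 * self.lambda_param * self.w - y_[index] * sample)
                    self.b -= self.lr * y_[index]

    def predict(self, X):
        # Returns -1 or 1 (0 only for points exactly on the hyperplane)
        approx = np.dot(X, self.w) - self.b
        return np.sign(approx)
```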
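The same idea can also be written as a vectorized batch update. The sketch below is a variant of the per-sample loop above, not a drop-in copy of it: it averages the hinge-loss gradient over all samples in each iteration. The function names `train_linear_svm` and `predict` and the toy data are hypothetical, and labels are assumed to already be -1 or 1.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lambda_param=0.01, iters=200):
    """Batch sub-gradient descent on lambda*||w||^2 + mean hinge loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(iters):
        margins = y * (X @ w - b)   # same w . x - b convention as the class above
        violated = margins < 1      # samples inside the margin or misclassified
        grad_w = 2 * lambda_param * w - X[violated].T @ y[violated] / n_samples
        grad_b = y[violated].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return np.sign(X @ w - b)

# Hypothetical, clearly separable toy data with labels in {-1, 1}
X = np.array([[-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0],
              [2.0, 2.0], [3.0, 1.0], [1.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

w, b = train_linear_svm(X, y)
train_predictions = predict(X, w, b)
new_predictions = predict(np.array([[2.5, 2.5], [-2.5, -2.0]]), w, b)
print(train_predictions, new_predictions)
```

Averaging over the batch makes each step less noisy than the per-sample loop, at the cost of one full pass over the data per update.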


## Kernels

### Introduction to kernels

A kernel is a method that lets a linear classifier solve a non-linear problem.

Its function is to take the data as input and transform it, implicitly, into a feature space where a linear separator works. Different SVM variants use different kernel functions, for example linear, polynomial, radial basis function (RBF), and sigmoid.


The most widely used kernel for SVMs is the RBF kernel, because it has a localized and finite response: its value is bounded between 0 and 1 and decays smoothly toward zero as two points move apart.
A kernel function returns the inner product between two points in a suitable feature space. It thereby defines a notion of similarity at little computational cost, even when that feature space is very high-dimensional.

### Types of kernels

#### Polynomial kernel

The polynomial kernel is used in Support Vector Machines (SVMs) to extend the model to datasets that are not linearly separable. It is one of the most commonly used kernels, along with the Radial Basis Function (RBF) kernel.

With a polynomial kernel, every data point is implicitly mapped from the original feature space to a higher-dimensional feature space whose coordinates are polynomial combinations of the original features. The degree of the polynomial, together with the number of input features, determines the number of dimensions in the new feature space.

The kernel function computes the dot product of two transformed data points in the high dimensional space, allowing SVM to find a hyperplane that separates the transformed points as far apart as possible. The polynomial kernel has a parameter called the degree, which specifies the complexity of the decision boundary. As the degree increases, the polynomial function becomes more complex and can fit more intricate shapes.

Here is the mathematical expression for polynomial kernel:

```
K(x, z) = (x . z + c)^d

```

where x and z are two data vectors in the original feature space, . is the dot product operator, d is the degree of the polynomial, and c is an optional constant that trades off the influence of higher-order terms against lower-order terms (c = 0 gives a homogeneous polynomial kernel).
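A small numeric check can make the "dot product in a higher-dimensional space" idea concrete. The sketch below assumes 2-D inputs with c = 1 and d = 2, for which an explicit 6-dimensional feature map can be written down; the helper name `phi` and the sample vectors are hypothetical.

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=2):
    """K(x, z) = (x . z + c)^d"""
    return (np.dot(x, z) + c) ** d

def phi(x):
    """Explicit feature map for 2-D inputs with c = 1 and d = 2."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = polynomial_kernel(x, z)   # computed in the original 2-D space
k_explicit = np.dot(phi(x), phi(z))    # computed in the 6-D feature space
print(k_implicit, k_explicit)
```

Both computations give the same value (here x . z = 1, so the kernel is (1 + 1)^2 = 4), but the kernel never builds the 6-dimensional vectors.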

In summary, the polynomial kernel is a useful tool for SVMs to handle non-linearly separable datasets by transforming the data into a high dimensional feature space. It uses a polynomial function to measure the similarity between two data points and find a decision boundary that maximizes the margin between classes.

<img title="svm" alt="text" src="Images\poly.png" width="500">

##### Polynomial kernel using Python:

```
import numpy as np

def polynomial_kernel(x, y, p=3):
    # Polynomial kernel with constant c = 1 and degree d = p
    return (1 + np.dot(x, y)) ** p
```


#### RBF kernel

The Radial Basis Function (RBF) kernel is one of the most commonly used kernels in SVM. RBF kernel maps data points to an infinite-dimensional feature space where it becomes easier to separate them using a linear hyperplane. It is also known as Gaussian Kernel due to its similarity measure that resembles the probability density function of a Gaussian distribution.

The RBF kernel formula is defined as:


```
K(x, xi) = exp(-gamma * ||x - xi||^2)

```
where x and xi are two data points, ||x - xi|| is the Euclidean distance between them, and gamma is a hyperparameter that determines the width of the kernel: a larger gamma makes the kernel narrower, so each training point influences only its close neighbors.
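The effect of gamma is easy to see numerically. This sketch evaluates the formula above for one fixed pair of points at several gamma values; the points and the gamma grid are arbitrary illustrative choices.

```python
import numpy as np

def rbf_kernel(x, xi, gamma):
    """K(x, xi) = exp(-gamma * ||x - xi||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x = [0.0, 0.0]
xi = [1.0, 1.0]   # squared distance to x is 2

gammas = (0.01, 0.1, 1.0, 10.0)
values = [rbf_kernel(x, xi, g) for g in gammas]
for g, v in zip(gammas, values):
    print(g, v)
```

The similarity shrinks toward zero as gamma grows, while the similarity of a point with itself is always 1.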

The RBF kernel has several advantages over other kernels such as the polynomial kernel:

* Non-linear separation: As mentioned earlier, the RBF kernel implicitly maps data into an infinite-dimensional feature space, which makes it capable of separating non-linearly separable datasets more easily.
* Flexibility: By tuning the hyperparameters (gamma and C), the RBF kernel can be adjusted to fit different types of datasets.
* Robustness: Because the kernel is smooth, an RBF SVM can tolerate a moderate amount of noise and outliers, provided gamma and C are tuned sensibly.


<img title="svm" alt="text" src="Images\RBF kernel.jpeg" width="500">

##### RBF kernel using Python:

```
import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    # Squared Euclidean distance between the two points
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))
```
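A hand-written kernel like this can be plugged into scikit-learn by precomputing the matrix of kernel values and passing `kernel="precomputed"` to `SVC`. The sketch below is one way to do that under stated assumptions: `rbf_gram` is a hypothetical vectorized helper, the training data is the toy dataset from earlier, and the query points are made up.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_gram(A, B, gamma=0.1):
    """Matrix of RBF kernel values between every row of A and every row of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

X_train = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]], dtype=float)
y_train = np.array([0, 1, 0, 1, 0, 1])
X_new = np.array([[1.0, 1.0], [8.0, 9.0]])   # hypothetical query points

K_train = rbf_gram(X_train, X_train)         # shape (n_train, n_train)
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_train, y_train)

# At prediction time the kernel is evaluated between new and training points
predictions = clf.predict(rbf_gram(X_new, X_train))   # input shape (n_new, n_train)
print(predictions)
```

In practice `SVC(kernel="rbf", gamma=0.1)` computes the same kernel internally; the precomputed route is mainly useful for custom kernels.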


#### Sigmoid kernel

The Sigmoid kernel is a non-linear kernel function used in Support Vector Machines (SVMs) for binary classification tasks. It is defined as:

```
K(x, y) = tanh(α(x·y) + c)

```
where x, y are input data instances and α and c are hyperparameters.

The sigmoid kernel is intended to map the original data instances into a higher-dimensional feature space where the instances become separable by a linear decision boundary.

In summary, the sigmoid kernel is a non-linear kernel function used in SVMs. However, it should be used carefully: unlike the RBF kernel, it is not positive semi-definite for every choice of α and c, so it does not always correspond to a valid inner product in a feature space, and its results can be sensitive to these parameters.
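One known caveat of the sigmoid kernel is that its kernel matrix is not positive semi-definite for every choice of α and c. The sketch below demonstrates this with a deliberately simple, hypothetical pair of points and a negative c; the helper name `sigmoid_gram` is an assumption for illustration.

```python
import numpy as np

def sigmoid_gram(X, alpha, c):
    """Kernel matrix of K(x, y) = tanh(alpha * (x . y) + c) for the rows of X."""
    return np.tanh(alpha * (X @ X.T) + c)

# Two hypothetical orthogonal unit vectors
X = np.array([[1.0, 0.0], [0.0, 1.0]])

K = sigmoid_gram(X, alpha=1.0, c=-1.0)
eigenvalues = np.linalg.eigvalsh(K)   # eigenvalues of a symmetric matrix, ascending
print(K)
print(eigenvalues)
```

The diagonal entries are tanh(0) = 0 and the off-diagonal entries are tanh(-1), so one eigenvalue is negative. A matrix like this cannot come from an inner product, which is why parameter choices matter for this kernel.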

<img title="svm" alt="text" src="Images\sigmoid.png" width="500">

##### Sigmoid kernel using Python:

```
import numpy as np

def sigmoid_kernel(x, y, alpha=0.01, c=0):
    # Hyperbolic tangent of the scaled and shifted dot product
    return np.tanh(alpha * np.dot(x, y) + c)
```


## Comparisons

Advantages and disadvantages:

1. Polynomial kernel: The polynomial kernel is an extension of the linear kernel and can handle nonlinear problems. The degree parameter controls the degree of the polynomial.

   Advantages:

   * Works well with complex, multiclass data.
   * Can learn complex decision boundaries.

   Disadvantages:

   * A degree that is too high can lead to overfitting.
   * High computational complexity for large datasets.

2. RBF kernel: The radial basis function (RBF) kernel is a popular kernel in SVM that can easily separate data points that are not linearly separable.

   Advantages:

   * Effective in high-dimensional spaces.
   * Handles a wide range of similarity between two data samples.

   Disadvantages:

   * The results of this kernel method are difficult to interpret.
   * Prediction accuracy depends strongly on the choice of the gamma parameter.

3. Sigmoid kernel: The sigmoid kernel applies a hyperbolic tangent to the dot product between two vectors, scaled by a parameter α and shifted by a constant c.

   Advantages:

   * Can be useful in neural network architectures.
   * Can detect local structures within the data.

   Disadvantages:

   * Sensitive to parameter tuning, requiring careful selection of the regularization parameter C and the kernel parameters.
   * Can be unstable and produce variable results depending on random re-sampling of the training data.
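Which kernel works best depends on the data, so an empirical comparison is usually the deciding factor. The sketch below is one way to run such a comparison with scikit-learn; the synthetic two-moons dataset, the feature scaling step, and the hyperparameter values are all assumptions made for illustration, and the scores it prints say nothing about other datasets.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (an assumption made for illustration)
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0, gamma="scale"))
    # Mean accuracy over 5 cross-validation folds
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()

for kernel, score in scores.items():
    print(f"{kernel:8s} {score:.3f}")
```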

## Conclusion

SVM is a powerful machine learning algorithm that can be used for both classification and regression tasks. With a suitable kernel, it also works well in cases where there is no clear or simple linear separation between classes. Additionally, its optimization problem is convex, so the solution found is a global optimum.

Kernels play a crucial role in SVM by allowing non-linear classification boundaries via a transformation. This transformation maps the input features to a higher dimensional feature space where the data may be linearly separable. Popular kernel functions include polynomial, Gaussian RBF, and sigmoid kernels.

Overall, SVM is widely used in various applications such as image classification, text classification, and bioinformatics.