Q1. What is the mathematical formula for a linear SVM?

Ans:

A Support Vector Machine (SVM) is a supervised machine learning algorithm primarily used for binary classification. It aims to find an optimal hyperplane that effectively separates two classes within a dataset. Here’s how it works mathematically:

1. Representation of Data Points:
Suppose we have a set of data points, each represented by a feature vector x in a D-dimensional vector space (denoted as R^D).
Each data point corresponds to a unique feature vector, and our goal is to classify these points into different categories (e.g., male vs. female).
2. Linear Decision Boundary:
The SVM seeks to find a linear decision boundary (hyperplane) that best separates the two classes.
The decision function is:

- [ f(X) = w^T X + b ]

Here:
(w) is the weight vector, normal to the hyperplane. Training minimizes its norm, which maximizes the margin.
(X) is the feature vector of the point we’re trying to classify.
(b) is the bias (intercept) term estimated from the training data.
The decision boundary is the set of points where (f(X) = 0), and each data point is assigned to a category based on the sign of (f(X)).
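
The decision rule can be sketched in a few lines of NumPy. The weight vector, bias, and sample points below are made-up values chosen for illustration, not parameters learned from data.

```python
import numpy as np

# Hypothetical parameters of an already-trained linear SVM in R^2.
w = np.array([2.0, -1.0])   # weight vector (normal to the hyperplane)
b = -1.0                    # bias term

def decision_function(X):
    """Return f(x) = w^T x + b for each row x of X."""
    return X @ w + b

def predict(X):
    """Assign +1 or -1 according to the sign of f(x)."""
    return np.where(decision_function(X) >= 0, 1, -1)

X = np.array([[2.0, 1.0],    # f = 4 - 1 - 1 =  2, so class +1
              [0.0, 1.0]])   # f = 0 - 1 - 1 = -2, so class -1
scores = decision_function(X)
labels = predict(X)
print(scores, labels)
```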

Q2. What is the objective function of a linear SVM?

Ans:

The objective function of a linear Support Vector Machine (SVM) aims to find the optimal hyperplane that maximizes the margin between two classes while correctly classifying as many data points as possible.

Objective Function:

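The objective, minimized over (w) and (b), combines a margin term, which also acts as regularization against overfitting, with a hinge-loss penalty on training errors:

[ \text{Objective} = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \max(0, 1 - y_i (w^T x_i + b)) ]

- The first term (\frac{1}{2} \|w\|^2) drives margin maximization: the margin width is (2 / \|w\|), so a smaller (\|w\|) means a wider margin.
- The second term (\sum_{i=1}^{N} \max(0, 1 - y_i (w^T x_i + b))) is the hinge loss. It penalizes data points that are misclassified or that fall inside the margin, with labels (y_i \in \{-1, +1\}).
The hyperparameter (C) controls the trade-off between maximizing the margin and minimizing classification errors: a larger (C) penalizes margin violations more heavily, while a smaller (C) favors a wider margin.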
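
As a small worked example, the objective can be evaluated directly with NumPy. The function name `svm_objective` and the toy data are assumptions made for illustration; labels are taken to be -1 or +1.

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b)), with y_i in {-1, +1}."""
    margins = y * (X @ w + b)               # y_i * f(x_i) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)  # zero for points outside the margin
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Toy data: the second point is correctly classified but lies inside the margin.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])
b = 0.0

# margins = [2, 0.5, 1], hinge = [0, 0.5, 0], objective = 0.5 + 1.0 * 0.5
value = svm_objective(w, b, X, y, C=1.0)
print(value)
```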

Q3. What is the kernel trick in SVM?

Ans:

The kernel trick in Support Vector Machines (SVM) is a powerful technique that allows SVMs to handle non-linearly separable data by implicitly mapping input data into a higher-dimensional feature space without explicitly calculating the transformation. Let’s explore this concept further:

1. Linear Separability and Decision Boundaries:
- In SVM, we aim to find a decision boundary (hyperplane) that separates data points into different classes.
- If the data is linearly separable (i.e., classes can be separated by a straight line or plane), a linear SVM suffices.

2. Non-Linear Data and the Kernel Trick:
- Real-world data is often not linearly separable. In such cases, a linear decision boundary won’t work effectively.
- The kernel trick addresses this limitation by working in a higher-dimensional space where the data may become linearly separable.
- Instead of explicitly applying the transformation, SVMs use a kernel function to compute pairwise similarities between data points.

3. How It Works:
- Suppose we have data points (x) in the original feature space (lower-dimensional).
- The kernel function (K(x, x’) = \phi(x)^T \phi(x’)) returns the dot product of (x) and another data point (x’) in the higher-dimensional space, computed directly from their coordinates in the original space.
- The transformed representation (\phi(x)) in the higher-dimensional space is not explicitly calculated; instead, we work with the kernel similarities.
- The decision function is expressed in terms of these kernel similarities: 
- [ f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b ]
- Here:
- (N) is the number of training data points.
- (\alpha_i) are coefficients obtained during training; they are nonzero only for the support vectors.
- (y_i) is the class label of each data point.
- (b) is the bias term.
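
The kernel form of the decision function can be checked against scikit-learn. This is a sketch under assumptions: the two-moons dataset, the RBF kernel, and the values `gamma=0.5` and `C=1.0` are arbitrary illustration choices. In a fitted binary `SVC`, `dual_coef_` holds the products (\alpha_i y_i) for the support vectors only.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A small non-linearly separable toy dataset (illustrative choice).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

# f(x) = sum_i alpha_i y_i K(x, x_i) + b, summed over support vectors only.
K = rbf_kernel(X[:5], clf.support_vectors_, gamma)
manual = K @ clf.dual_coef_[0] + clf.intercept_[0]
reference = clf.decision_function(X[:5])
print(np.allclose(manual, reference))
```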

4. Common Kernel Functions:
- Popular kernel functions include:
- Linear Kernel: (K(x, x’) = x^T x’)
- Polynomial Kernel: (K(x, x’) = (x^T x’ + c)^d)
- Radial Basis Function (RBF) Kernel: (K(x, x’) = e^{-\gamma \|x - x’\|^2})
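
These kernels are short to write in NumPy. The sketch below also checks the kernel trick on one concrete case: for 2-D inputs, the polynomial kernel with (c = 0) and (d = 2) equals a dot product under an explicit 3-D feature map. The helper `phi_quadratic` and the sample vectors are assumptions introduced for illustration.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, c=1.0, d=2):
    return (x @ z + c) ** d

def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def phi_quadratic(x):
    """Explicit feature map matching the 2-D polynomial kernel with c=0, d=2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = polynomial_kernel(x, z, c=0.0, d=2)   # computed in the original 2-D space
k_explicit = phi_quadratic(x) @ phi_quadratic(z)   # computed in the 3-D feature space
print(k_implicit, k_explicit)
```

Both values agree, but the kernel never builds the 3-D vectors; for the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit route is practical.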

Q4. What is the role of support vectors in SVM? Explain with an example.

Ans:

1. What Are Support Vectors?
- Support vectors are the data points that play a critical role in defining the SVM’s decision boundary (hyperplane).
- These points are the closest to the hyperplane and significantly influence its position and orientation.
- The SVM algorithm maximizes the margin (distance) between the support vectors and the hyperplane.

2. Maximizing the Margin:
- The primary objective of SVM is to find the optimal hyperplane that separates data points from different classes.
- The margin is the distance between the hyperplane and the closest support vectors.
- By maximizing this margin, SVM ensures robust classification.
- Example: Linearly Separable Data
- Imagine we have a dataset with two features, (x_1) and (x_2), and two classes: blue circles and red circles.
- Our goal is to find the best hyperplane that separates these classes.
- Multiple hyperplanes can achieve this separation, but we want the one with the largest margin.

3. Handling Outliers:
- A soft-margin SVM can tolerate outliers.
- Consider the scenario below:
- [Figure: Selecting Hyperplane with Outlier]
- Here, the blue ball near the boundary of red circles is an outlier.
- SVM treats the outlier as a margin violation instead of bending the boundary around it, and still finds the best hyperplane that maximizes the margin (L2).

4. Role of Support Vectors:
- Support vectors define the hyperplane and influence its position.
- Deleting support vectors would alter the hyperplane’s location, whereas deleting any other data point leaves it unchanged.
- SVM’s effectiveness relies on these critical data points.
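
A small experiment can illustrate this role. The eight 2-D points below are made up for illustration, and a very large `C` is used as an assumption to approximate a hard margin on separable data. The sketch fits a linear SVM, then removes one point that is not a support vector and refits.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical values).
X = np.array([[0, 0], [1, 0], [0, 1], [-1, -1],
              [3, 3], [4, 3], [3, 4], [5, 5]], dtype=float)
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# A very large C approximates a hard margin on this separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
sv_idx = clf.support_   # indices of the support vectors in X

# Drop one point that is NOT a support vector and refit.
non_sv = [i for i in range(len(X)) if i not in sv_idx][0]
keep = np.arange(len(X)) != non_sv
clf_reduced = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])

print("support vectors:\n", clf.support_vectors_)
print("w, b (all points):     ", clf.coef_[0], clf.intercept_[0])
print("w, b (one non-SV gone):", clf_reduced.coef_[0], clf_reduced.intercept_[0])
```

The two fits should agree up to solver tolerance, since only the support vectors enter the solution. Removing one of the support vectors instead would generally move the hyperplane.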

Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in SVM?

Ans:

1. Hyperplane:
- A hyperplane is a flat affine subspace in the feature space. In SVM, it serves as the decision boundary that separates data points belonging to different classes.
- For a binary classification problem, the hyperplane equation is given by: [ f(x) = w^T x + b = 0 ] Here:
- (w) is the weight vector perpendicular to the hyperplane.
- (x) represents the input feature vector.
- (b) is the bias term.
- SVM chooses the hyperplane that maximizes the margin (distance) between itself and the nearest data points (support vectors).

2. Marginal Plane:
- The marginal planes are the two planes that run parallel to the hyperplane, one on each side of it.
- These planes define the margin (distance) from the hyperplane to the nearest data points.
- The margin is essential for robustness and generalization.
- The support vectors lie on these marginal planes.
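
Under the usual scaling, in which the closest points satisfy (|w^T x + b| = 1), the marginal planes and the margin width can be written out as follows. The points (x_+) and (x_-) are notation introduced here for illustration: they lie on the positive and negative marginal plane respectively, with (x_+ - x_-) parallel to (w).

```latex
\begin{aligned}
\text{marginal planes:}\quad & w^T x + b = +1 \quad\text{and}\quad w^T x + b = -1 \\
\text{subtracting, for } x_+ \text{ and } x_-:\quad & w^T (x_+ - x_-) = 2 \\
\text{margin width:}\quad & \frac{w^T (x_+ - x_-)}{\|w\|} = \frac{2}{\|w\|}
\end{aligned}
```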

3. Hard Margin SVM:
- In a hard margin SVM, the goal is to find a hyperplane that perfectly separates data points of different classes without any misclassifications.
- The margin width is maximized, ensuring a clear demarcation.
- All data points are correctly classified, and there is no overlap.
- Example graph: [Figure: Hard Margin SVM]
- The black solid line represents the hyperplane.
- The dashed lines on both sides are the margins.
- Data points falling on the margins are support vectors.
- Perfect separation is achieved.

4. Soft Margin SVM:
- When data is not perfectly separable or contains outliers, SVM uses a soft margin.
- Soft margin allows some misclassification by introducing slack variables ((\xi_i)).
- The objective is to find a hyperplane that balances margin maximization and misclassification penalty.
- Example graph: [Figure: Soft Margin SVM]
- The hyperplane allows some data points to fall within the margin.
- Points that fall within the margin or on the wrong side of the hyperplane contribute to the penalty.
- Flexibility is gained at the cost of slight misclassification.
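
The effect of the soft margin can be sketched by fitting the same data with a small and a large penalty `C`. The blob dataset, its spread, and the two `C` values are assumptions chosen for illustration; the clusters are given a large spread so that they are likely to overlap.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clusters with a large spread, so margin violations are likely.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

def fit_and_describe(C):
    """Fit a linear SVM and return (margin width 2/||w||, number of support vectors)."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(clf.coef_[0])
    return margin_width, len(clf.support_)

margin_small_C, n_sv_small_C = fit_and_describe(C=0.01)   # violations are cheap
margin_large_C, n_sv_large_C = fit_and_describe(C=100.0)  # violations are expensive

print(f"C=0.01: margin={margin_small_C:.3f}, support vectors={n_sv_small_C}")
print(f"C=100:  margin={margin_large_C:.3f}, support vectors={n_sv_large_C}")
```

A smaller `C` yields a margin at least as wide as a larger `C`, because the penalty on slack is traded against (\|w\|). As `C` grows, the fit moves toward hard-margin behavior.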

Q6. SVM Implementation through Iris dataset.

- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
- Train a linear SVM classifier on the training set and predict the labels for the testing set.
- Compute the accuracy of the model on the testing set.
- Plot the decision boundaries of the trained model using two of the features.
- Try different values of the regularisation parameter C and see how it affects the performance of the model.

Ans:

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
from sklearn.datasets import load_iris

In [4]:
iris=load_iris()

In [6]:
x=iris.data
y=iris.target

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.33,random_state=42)

In [21]:
from sklearn.svm import SVR


In [22]:
svr=SVR(kernel='linear')

In [23]:
svr.fit(x_train,y_train)

In [24]:
svr.coef_

array([[-0.02521167, -0.0748024 ,  0.2199637 ,  0.56094357]])

In [25]:
y_pred=svr.predict(x_test)

In [28]:
from sklearn.metrics import r2_score

In [29]:
print(r2_score(y_test,y_pred))

0.9357667743708263
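
The cells above fit a regressor (`SVR`) and report R², while the task asks for a classifier and its accuracy. A minimal sketch of the classification variant follows, assuming `SVC(kernel="linear")`, the same split parameters as above, and an arbitrary list of `C` values. No output is shown because this sketch was not part of the original run.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42
)

# Linear SVM classifier with the default C=1.0
clf = SVC(kernel="linear")
clf.fit(x_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(x_test))
print("test accuracy (C=1.0):", accuracy)

# Effect of the regularisation parameter C on test accuracy
accuracy_by_C = {}
for C in [0.01, 0.1, 1, 10, 100]:
    model = SVC(kernel="linear", C=C).fit(x_train, y_train)
    accuracy_by_C[C] = accuracy_score(y_test, model.predict(x_test))
    print(f"C={C}: accuracy={accuracy_by_C[C]:.3f}")
```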


In [30]:
from sklearn.model_selection import GridSearchCV

In [31]:
param_grid={
    'C':[0.1,1,10,100,1000],
    # gamma has no effect when kernel='linear', so these values only repeat the same fit
    'gamma':[1,0.1,0.01,0.001,0.0001],
    'kernel':['linear'],
    'epsilon':[0.1,0.2,0.3]
}

In [32]:
grid=GridSearchCV(SVR(),param_grid=param_grid,cv=5,verbose=3)

In [33]:
grid.fit(x_train,y_train)

Fitting 5 folds for each of 75 candidates, totalling 375 fits
[CV 1/5] END C=0.1, epsilon=0.1, gamma=1, kernel=linear;, score=0.938 total time=   0.0s
[CV 2/5] END C=0.1, epsilon=0.1, gamma=1, kernel=linear;, score=0.894 total time=   0.0s
[CV 3/5] END C=0.1, epsilon=0.1, gamma=1, kernel=linear;, score=0.918 total time=   0.0s
[CV 4/5] END C=0.1, epsilon=0.1, gamma=1, kernel=linear;, score=0.896 total time=   0.0s
[CV 5/5] END C=0.1, epsilon=0.1, gamma=1, kernel=linear;, score=0.891 total time=   0.0s
[CV 1/5] END C=0.1, epsilon=0.1, gamma=0.1, kernel=linear;, score=0.938 total time=   0.0s
[CV 2/5] END C=0.1, epsilon=0.1, gamma=0.1, kernel=linear;, score=0.894 total time=   0.0s
[CV 3/5] END C=0.1, epsilon=0.1, gamma=0.1, kernel=linear;, score=0.918 total time=   0.0s
[CV 4/5] END C=0.1, epsilon=0.1, gamma=0.1, kernel=linear;, score=0.896 total time=   0.0s
[CV 5/5] END C=0.1, epsilon=0.1, gamma=0.1, kernel=linear;, score=0.891 total time=   0.0s
[CV 1/5] END C=0.1, epsilon=0.1, gamma

In [34]:
grid.best_params_

{'C': 1, 'epsilon': 0.1, 'gamma': 1, 'kernel': 'linear'}

In [35]:
y_pred=grid.predict(x_test)

In [36]:
# Arguments are passed as (y_pred, y_test); the signature of r2_score is r2_score(y_true, y_pred)
print(r2_score(y_pred,y_test))

0.9359325275675416
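
The task also asks for a decision-boundary plot on two features. One possible sketch follows, assuming a linear `SVC` refit on the first two features only (sepal length and sepal width, an arbitrary choice) and a 200 by 200 evaluation grid. The name `clf2` is introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X2 = iris.data[:, :2]   # first two features only: sepal length, sepal width
x_train, x_test, y_train, y_test = train_test_split(
    X2, iris.target, test_size=0.33, random_state=42
)

clf2 = SVC(kernel="linear", C=1.0).fit(x_train, y_train)

# Evaluate the classifier on a grid covering the two features
x_min, x_max = X2[:, 0].min() - 0.5, X2[:, 0].max() + 0.5
y_min, y_max = X2[:, 1].min() - 0.5, X2[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))
zz = clf2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, zz, alpha=0.3)   # predicted class regions
ax.scatter(x_train[:, 0], x_train[:, 1], c=y_train, edgecolors="k")
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_title("Linear SVM decision regions (C=1.0)")
```

In a notebook the figure renders at the end of the cell; in a plain script, add `plt.show()`. Changing `C` in the call to `SVC` and re-running shows how the regions shift with the regularisation strength.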
