## Q1. What is the mathematical formula for a linear SVM?

The mathematical formula for a linear Support Vector Machine (SVM) in its primal form is:

minimize (1/2) ||w||^2
subject to y_i (w^T x_i + b) >= 1, for all i

Here:
- w is the weight vector, which defines the direction of the hyperplane.
- x_i is the i-th training sample.
- y_i is the label of the i-th training sample, which can be either -1 or 1.
- b is the bias term, which shifts the hyperplane away from the origin.
- ||w||^2 is the squared Euclidean norm of w. Minimizing it maximizes the margin, because the distance between the two margin boundaries is 2 / ||w||.

The goal of the linear SVM is to find the values of w and b that minimize the objective function while satisfying the constraints. The constraints ensure that the hyperplane classifies every training sample correctly with a functional margin of at least 1, and minimizing ||w||^2 places the hyperplane as far as possible from the nearest samples, which are known as the support vectors. This hard-margin form assumes that the data are linearly separable.
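
As a quick numerical check of the formulation above, the following sketch evaluates the constraints and the margin width for a hand-picked hyperplane. The toy points and the values of `w` and `b` are assumptions chosen for illustration; they do not come from any dataset used in this notebook.

```python
import numpy as np

# Toy, linearly separable data (illustrative assumption).
X = np.array([[0.0, 0.0], [0.0, 2.0], [-1.0, 1.0],   # class -1
              [2.0, 1.0], [3.0, 0.0], [3.0, 2.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# Candidate hyperplane w^T x + b = 0 (here the vertical line x1 = 1).
w = np.array([1.0, 0.0])
b = -1.0

# Functional margins y_i (w^T x_i + b); the constraints require each to be >= 1.
margins = y * (X @ w + b)
feasible = bool(np.all(margins >= 1))

# Primal objective (1/2)||w||^2 and geometric margin width 2 / ||w||.
objective = 0.5 * np.dot(w, w)
width = 2.0 / np.linalg.norm(w)

print("functional margins:", margins)
print("all constraints satisfied:", feasible)
print("objective (1/2)||w||^2:", objective)
print("margin width 2/||w||:", width)
```

Points whose functional margin equals exactly 1 lie on the margin boundaries; the remaining points satisfy the constraint with room to spare.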

## Q2. What is the objective function of a linear SVM?

The objective function of a linear Support Vector Machine (SVM) in its primal form is:

minimize (1/2) ||w||^2

Here, w is the weight vector, which defines the direction of the hyperplane, and ||w||^2 is its squared Euclidean norm. Because the margin width is 2 / ||w||, minimizing ||w||^2 is equivalent to maximizing the margin.

The goal of the linear SVM is to find the values of w and b (the bias term) that minimize the objective function while satisfying the constraints:

y_i (w^T x_i + b) >= 1, for all i

Here, x_i is the i-th training sample and y_i is its label, which can be either -1 or 1. The constraints ensure that the hyperplane classifies every training sample correctly with a functional margin of at least 1. Minimizing the objective then places the hyperplane as far as possible from the nearest samples, which are known as the support vectors.

In summary, the linear SVM minimizes the squared Euclidean norm of the weight vector, subject to the constraint that every training sample is classified correctly with a functional margin of at least 1. Minimizing the norm is what pushes the hyperplane as far as possible from the nearest samples.
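
The link between this objective and the margin can be made explicit with a short derivation. It uses only the symbols defined above, plus H as a label for the hyperplane, which is introduced here for convenience.

```latex
\[
\operatorname{dist}(x_i, H) = \frac{|w^\top x_i + b|}{\lVert w \rVert},
\qquad H = \{\, x : w^\top x + b = 0 \,\}.
\]
For a sample on the margin boundary, $y_i (w^\top x_i + b) = 1$, so
\[
\operatorname{dist}(x_i, H) = \frac{1}{\lVert w \rVert},
\qquad \text{margin width} = \frac{2}{\lVert w \rVert}.
\]
Maximizing the width is therefore equivalent to
\[
\min_{w,\,b} \ \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1 \ \text{ for all } i.
\]
```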

## Q3. What is the kernel trick in SVM?

The kernel trick is a technique used in Support Vector Machines (SVMs) and other kernel-based methods to transform the original input data into a higher-dimensional feature space, where the data can be more easily separated or modeled.

In SVMs, the kernel trick is used to enable the construction of non-linear decision boundaries, while still maintaining the computational efficiency and mathematical elegance of the linear SVM formulation. This is achieved by replacing the dot product between the input vectors in the linear SVM formulation with a non-linear kernel function, which implicitly maps the input vectors to a higher-dimensional feature space.

The most commonly used kernel functions are the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel. The choice of kernel function depends on the nature of the data and the problem being solved.

The key advantage of the kernel trick is that it allows us to work with the high-dimensional feature space implicitly, without having to explicitly compute the coordinates of the data in the feature space. This works because the SVM dual problem and its decision function depend on the data only through inner products, and the kernel function K(x, z) returns the feature-space inner product directly from the input vectors x and z.

In summary, the kernel trick is a technique used in SVMs and other kernel-based methods to transform the input data into a higher-dimensional feature space, where the data can be more easily separated or modeled. This is achieved by using a non-linear kernel function to replace the dot product between the input vectors in the linear SVM formulation, enabling the construction of non-linear decision boundaries while still maintaining computational efficiency and mathematical elegance.
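
To make the idea concrete, the sketch below compares an explicit feature map with the corresponding kernel for the degree-2 homogeneous polynomial kernel K(x, z) = (x . z)^2 in two dimensions. The function names `phi` and `poly_kernel` and the sample vectors are hypothetical, chosen only for this illustration.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(x, z):
    """Kernel K(x, z) = (x . z)^2, computed entirely in the input space."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in the 3-D feature space
implicit = poly_kernel(x, z)              # same quantity without building phi

print("explicit feature-space inner product:", explicit)
print("kernel evaluated in input space:     ", implicit)
```

Both numbers agree up to floating-point rounding, which is the point of the trick: the kernel gives the feature-space inner product without ever constructing the feature vectors.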

## Q4. What is the role of support vectors in SVM? Explain with an example.

Support vectors are a crucial component of Support Vector Machines (SVMs), as they play a key role in defining the optimal decision boundary that separates the classes in the data.

In SVMs, the goal is to find the hyperplane that maximally separates the classes, while also minimizing the classification error. The hyperplane is defined by the weight vector w and the bias term b, and the distance between the hyperplane and the nearest data points is called the margin. The optimal hyperplane is the one that maximizes the margin.

The data points that lie closest to the hyperplane are called support vectors. These points are important because they alone determine the margin and the optimal hyperplane: removing any other training point leaves the solution unchanged. Specifically, the support vectors are the points for which the constraint holds with equality, y_i (w^T x_i + b) = 1, so their distance to the hyperplane equals the margin, 1 / ||w||.

To illustrate the role of support vectors in SVMs, consider a simple example with two classes of data that are linearly separable. The data points are shown in the following figure, along with the optimal hyperplane that separates the classes:

In this example, there are three support vectors, which are indicated by the circles. The two support vectors on the left side of the hyperplane are the closest data points to the hyperplane in the negative class, and the one support vector on the right side of the hyperplane is the closest data point to the hyperplane in the positive class.

The margin is defined as the distance between the hyperplane and the support vectors, and is indicated by the dashed lines in the figure. At the optimum, the constraint y_i (w^T x_i + b) >= 1 holds with equality for the support vectors and with strict inequality for every other point.

In summary, support vectors are the data points that lie closest to the hyperplane in SVMs, and are used to define the margin and to constrain the optimization problem that is used to find the optimal hyperplane. The optimal hyperplane is the one that maximizes the margin, and is determined by the weight vector w and the bias term b. The role of support vectors is to ensure that the hyperplane is the optimal one that separates the classes in the data with the maximum possible margin.
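
The example described above can be approximated in code. The sketch below fits scikit-learn's `SVC` with a linear kernel on a small toy dataset that was constructed (as an assumption, for illustration) so that two negative points and one positive point end up closest to the boundary. A very large `C` is used to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative assumption).
X = np.array([[0.0, 0.0], [0.0, 2.0], [-1.0, 1.0],   # class -1
              [2.0, 1.0], [3.0, 0.0], [3.0, 2.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b)

# Functional margins of the support vectors; these should be close to 1.
sv_margins = y[clf.support_] * (clf.support_vectors_ @ w + b)
print("y_i (w^T x_i + b) on support vectors:", sv_margins)
```

Refitting after deleting a point that is not a support vector should leave `w` and `b` essentially unchanged, which is one way to see that only the support vectors define the boundary.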

## Q5. Illustrate with examples and graphs: hyperplane, marginal plane, soft margin, and hard margin in SVM.


1. Hyperplane:

A hyperplane is a linear decision boundary that separates the classes in the data. In SVMs, the goal is to find the hyperplane that maximally separates the classes, while also minimizing the classification error.

Here's an example of a hyperplane in a two-dimensional feature space:

In this example, the hyperplane is represented by the solid line, and it separates the positive class (red circles) from the negative class (blue squares).

2. Marginal plane:

The marginal planes are the two hyperplanes parallel to the optimal hyperplane and equidistant from it, w^T x + b = 1 and w^T x + b = -1; they pass through the support vectors. The distance between each marginal plane and the optimal hyperplane is called the margin.

Here's an example of a marginal plane in a two-dimensional feature space:

In this example, the optimal hyperplane is represented by the solid line, and the marginal plane is represented by the dashed lines. The distance between the marginal plane and the optimal hyperplane is the margin, which is indicated by the arrows.

3. Soft margin:

In practice, the data may not be perfectly separable by a hyperplane. In such cases, SVMs can be modified to allow for some misclassifications, while still maximizing the margin. This is known as a soft margin SVM.

Here's an example of a soft margin SVM in a two-dimensional feature space:

In this example, the data is not perfectly separable by a hyperplane. The soft margin SVM allows for some misclassifications by introducing slack variables, which are represented by the vertical lines. The optimal hyperplane is represented by the solid line, and it maximizes the margin while also minimizing the classification error.

4. Hard margin:

A hard margin SVM is a special case of a soft margin SVM, where the slack variables are all set to zero. In other words, a hard margin SVM assumes that the data is perfectly separable by a hyperplane.

Here's an example of a hard margin SVM in a two-dimensional feature space:

In this example, the data is perfectly separable by a hyperplane. The hard margin SVM finds the optimal hyperplane that maximally separates the classes, while also ensuring that there are no misclassifications.

In summary, the hyperplane is a linear decision boundary that separates the classes in the data. The marginal planes are the two parallel hyperplanes equidistant from the optimal hyperplane, and the distance from each of them to the optimal hyperplane is the margin. Soft margin SVMs allow for some misclassifications, while still maximizing the margin, and hard margin SVMs assume that the data is perfectly separable by a hyperplane.
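
The difference between the two margins can be sketched numerically. The standard soft-margin objective is (1/2) ||w||^2 + C * sum(xi_i), where the slack xi_i = max(0, 1 - y_i (w^T x_i + b)) measures how far sample i violates the hard-margin constraint. The helper name `soft_margin_objective`, the toy points, and the fixed hyperplane below are assumptions made for illustration.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for (1/2)||w||^2 + C * sum(xi_i)."""
    # Slack is zero whenever the hard-margin constraint y_i (w^T x_i + b) >= 1 holds.
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    objective = 0.5 * np.dot(w, w) + C * slacks.sum()
    return objective, slacks

# Toy data with one point inside the margin and one misclassified point.
X = np.array([[0.0, 0.0],    # class -1, on the margin boundary
              [2.0, 1.0],    # class +1, on the margin boundary
              [0.5, 1.0],    # class -1, inside the margin but correctly classified
              [1.5, 0.0],    # class -1, on the wrong side of the hyperplane
              [3.0, 2.0]])   # class +1, well outside the margin
y = np.array([-1, 1, -1, -1, 1])

w = np.array([1.0, 0.0])
b = -1.0

obj, slacks = soft_margin_objective(w, b, X, y, C=1.0)
print("slacks:", slacks)
print("soft-margin objective with C=1:", obj)
print("hard-margin feasible:", bool(np.all(slacks == 0)))
```

A slack between 0 and 1 marks a point inside the margin that is still classified correctly, and a slack above 1 marks a misclassified point. Increasing `C` penalizes slack more heavily and moves the solution toward hard-margin behaviour.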

## Q6. SVM Implementation through Iris dataset.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation.
- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
- Train a linear SVM classifier on the training set and predict the labels for the testing set.
- Compute the accuracy of the model on the testing set.
- Plot the decision boundaries of the trained model using two of the features.
- Try different values of the regularisation parameter C and see how it affects the performance of the model.
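
A minimal sketch of the loading, training, accuracy, and C-sweep steps listed above, using scikit-learn's `SVC` with a linear kernel. The variable names, the list of C values, and the stratified split are assumptions made for this illustration; the from-scratch implementation and the decision-boundary plot are not covered here.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the iris data: 150 samples, 4 features, 3 classes.
iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# Hold out 20% of the samples for testing, keeping class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)

# Train a linear SVM for several values of C and record the test accuracy.
results = {}
for C in (0.01, 0.1, 1.0, 10.0, 100.0):
    model = SVC(kernel="linear", C=C)
    model.fit(X_tr, y_tr)
    results[C] = accuracy_score(y_te, model.predict(X_te))
    print(f"C={C:<6} test accuracy={results[C]:.3f}")
```

Small values of C favour a wide margin at the cost of more training errors, while large values fit the training set more tightly; on a test set this small, differences in accuracy between settings should be interpreted cautiously.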

In [8]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.svm import SVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error  # regression metrics, since SVR predicts a continuous target

In [5]:
data=load_diabetes()
x=data['data']
y=data['target']
feature_names=data['feature_names']

In [6]:
x

array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990749, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06833155, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286131, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04688253,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452873, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00422151,  0.00306441]])

In [7]:
y

array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
       128.,  52.,  37., 170., 170.,  61., 144.,  52., 128.,  71., 163.,
       150.,  97., 160., 178.,  48., 270., 202., 111.,  85.,  42., 170.,
       200., 252., 113., 143.,  51.,  52., 210.,  65., 141.,  55., 134.,
        42., 111.,  98., 164.,  48.,  96.,  90., 162., 150., 279.,  92.,
        83., 128., 102., 302., 198.,  95.,  53., 134., 144., 232.,  81.,
       104.,  59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
       173., 180.,  84., 121., 161.,  99., 109., 115., 268., 274., 158.,
       107.,  83., 103., 272.,  85., 280., 336., 281., 118., 317., 235.,
        60., 174., 259., 178., 128.,  96., 126., 28

In [9]:
feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

In [13]:
import pandas as pd
x=pd.DataFrame(x,columns=feature_names)
x

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005670 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.025930 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 0.041708 | 0.050680 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 |
| 438 | -0.005515 | 0.050680 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018114 | 0.044485 |
| 439 | 0.041708 | 0.050680 | -0.015906 | 0.017293 | -0.037344 | -0.013840 | -0.024993 | -0.011080 | -0.046883 | 0.015491 |
| 440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.026560 | 0.044529 | -0.025930 |


In [16]:
y=pd.DataFrame(y,columns=['outcome'])
y

| | outcome |
|---|---|
| 0 | 151.0 |
| 1 | 75.0 |
| 2 | 141.0 |
| 3 | 206.0 |
| 4 | 135.0 |
| ... | ... |
| 437 | 178.0 |
| 438 | 104.0 |
| 439 | 132.0 |
| 440 | 220.0 |


In [18]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

In [20]:
x_train.shape,x_test.shape

((353, 10), (89, 10))

In [23]:
svr=SVR()
svr.fit(x_train,y_train.values.ravel())  # pass a 1-D target; a single-column DataFrame triggers a conversion warning


In [27]:
svr.n_features_in_

10

In [28]:
svr.epsilon

0.1

In [29]:
svr.dual_coef_

array([[-1.        ,  1.        ,  1.        , -1.        , -1.        ,
         1.        ,  1.        ,  1.        , -1.        , -1.        ,
         1.        , -1.        ,  1.        , -1.        ,  1.        ,
        -1.        , -1.        , -1.        ,  1.        , -1.        ,
        -1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         1.        , -1.        , -1.        , -1.        ,  1.        ,
        -1.        , -0.16948097, -1.        ,  1.        , -0.70910852,
         1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        -1.        , -1.        ,  1.        ,  1.        ,  1.        ,
         1.        ,  1.        ,  1.        , -1.        ,  1.        ,
        -1.        , -1.        ,  1.        , -1.        ,  1.        ,
         1.        ,  1.        ,  1.        , -1.        , -1.        ,
        -1.        ,  1.        ,  1.        ,  1.        , -1.        ,
        -1.        ,  1.        , -1.        ,  1. 

In [30]:
y_pred=svr.predict(x_test)
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
r2=r2_score(y_test,y_pred)
mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
print('r2 score is',r2*100)
print('mean squared error is',mse)
print('mean absolute error is ',mae)

r2 score is 18.211365770500286
mean squared error is 4333.285954518086
mean absolute error is  56.02372412801096


| | outcome |
|---|---|
| 287 | 219.0 |
| 211 | 70.0 |
| 72 | 202.0 |
| 321 | 230.0 |
| 73 | 111.0 |
| ... | ... |
| 255 | 153.0 |
| 90 | 98.0 |
| 57 | 37.0 |
| 391 | 63.0 |


In [32]:
from sklearn.svm import LinearSVR,NuSVR
svr1=LinearSVR()
svr2=NuSVR()
# pass 1-D targets; a single-column DataFrame triggers a conversion warning
svr1.fit(x_train,y_train.values.ravel())
svr2.fit(x_train,y_train.values.ravel())


In [33]:
y_pred=svr1.predict(x_test)
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
r2=r2_score(y_test,y_pred)
mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
print('r2 score for linear is',r2*100)
print('mean squared error for linear is',mse)
print('mean absolute error for linear svr is ',mae)

r2 score for linear is -27.232785634821767
mean squared error for linear is 6740.986056797689
mean absolute error for linear svr is  63.284693419483766


In [34]:
y_pred=svr2.predict(x_test)
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
r2=r2_score(y_test,y_pred)
mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
print('r2 score for nusvr is',r2*100)
print('mean squared error for nusvr is',mse)
print('mean absolute error for nusvr is ',mae)

r2 score for nusvr is 16.831999301920586
mean squared error for nusvr is 4406.366883191594
mean absolute error for nusvr is  57.650730289526834
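
The three evaluation cells above repeat the same metric code. One way to consolidate them is sketched below with a hypothetical helper named `evaluate`; the variable names are also assumptions. The sketch reloads the diabetes data so that it runs on its own, and its printed values may differ from the outputs recorded above depending on the scikit-learn version and on the solver's random state for `LinearSVR`.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR, LinearSVR, NuSVR

def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit `model` and return its test-set R^2, MSE, and MAE (hypothetical helper)."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }

# Same dataset and split settings as the cells above; y is already 1-D here.
X_d, y_d = load_diabetes(return_X_y=True)
Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(X_d, y_d, test_size=0.2, random_state=42)

models = {"SVR": SVR(), "LinearSVR": LinearSVR(), "NuSVR": NuSVR()}
scores = {name: evaluate(m, Xd_tr, yd_tr, Xd_te, yd_te) for name, m in models.items()}

for name, s in scores.items():
    print(f"{name:<10} r2={s['r2']:.4f}  mse={s['mse']:.2f}  mae={s['mae']:.2f}")
```

Keeping the metrics in a dictionary makes it straightforward to add further models or hyperparameter settings, such as different values of C, without copying the evaluation code again.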
