Let's go step by step through each of the questions and tasks:

## Q1. **What is the mathematical formula for a linear SVM?**

A linear Support Vector Machine (SVM) aims to find a decision boundary that best separates the data into different classes. The mathematical model of a linear SVM can be represented as:

\[
f(x) = w^T x + b
\]

Where:
- \( x \) is the feature vector (input data).
- \( w \) is the weight vector.
- \( b \) is the bias term.
- \( f(x) \) is the decision function, and the sign of \( f(x) \) determines the class of \( x \).

For binary classification:
- If \( f(x) \geq 0 \), \( x \) belongs to class \( +1 \).
- If \( f(x) < 0 \), \( x \) belongs to class \( -1 \).
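
As a minimal sketch of the decision rule above, the snippet below evaluates \( f(x) = w^T x + b \) with NumPy and maps its sign to a class label. The values of `w`, `b`, and `X` are hypothetical and chosen by hand for illustration; they are not learned from data.

```python
import numpy as np

def decision_function(X, w, b):
    """Return f(x) = w^T x + b for each row of X."""
    return X @ w + b

def predict(X, w, b):
    """Map f(x) >= 0 to class +1 and f(x) < 0 to class -1."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

# Hypothetical parameters, picked by hand rather than trained
w = np.array([2.0, -1.0])
b = -1.0

X = np.array([[2.0, 1.0],   # f = 4 - 1 - 1 =  2
              [0.0, 1.0],   # f = 0 - 1 - 1 = -2
              [1.0, 1.0]])  # f = 2 - 1 - 1 =  0 (on the boundary)

scores = decision_function(X, w, b)
labels = predict(X, w, b)
print(scores)
print(labels)
```

The third point lies exactly on the hyperplane and is assigned to class \( +1 \), following the \( f(x) \geq 0 \) convention used above.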

## Q2. **What is the objective function of a linear SVM?**

A linear SVM seeks the hyperplane that maximizes the margin between the two classes while keeping classification errors small. Since the margin width is \( 2 / \|w\| \), maximizing it is equivalent to minimizing \( \frac{1}{2}\|w\|^2 \), which gives the soft-margin optimization problem:

\[
\min_{w, b} \left( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \right)
\]

Subject to the constraints:
\[
y_i(w^T x_i + b) \geq 1 - \xi_i \quad \text{for all} \ i = 1, 2, \dots, n
\]
\[
\xi_i \geq 0
\]

Where:
- \( w \) is the weight vector.
- \( b \) is the bias term.
- \( y_i \) is the label of the \( i \)-th training sample.
- \( x_i \) is the feature vector of the \( i \)-th training sample.
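- \( \xi_i \) are slack variables that measure how far sample \( i \) violates the margin (soft margin): \( 0 < \xi_i \leq 1 \) means the sample lies inside the margin but is not misclassified, and \( \xi_i > 1 \) means it is misclassified.
- \( C > 0 \) is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error. A larger \( C \) penalizes margin violations more heavily, which typically gives a narrower margin.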
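
To make the objective concrete, here is a small sketch that evaluates it for a fixed, hand-picked \( (w, b) \). It relies on the fact that, for given \( w \) and \( b \), the smallest slack satisfying both constraints is \( \xi_i = \max(0,\ 1 - y_i(w^T x_i + b)) \). The function name `soft_margin_objective` and the toy data are assumptions made for illustration.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi) for fixed w and b.

    The slack xi_i = max(0, 1 - y_i * (w^T x_i + b)) is the smallest
    value that satisfies both constraints of the optimization problem.
    """
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * xi.sum(), xi

# Hypothetical toy data: the third point sits inside the margin
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, -1, 1])
w = np.array([1.0, 0.0])
b = 0.0

obj_c1, xi = soft_margin_objective(w, b, X, y, C=1.0)
obj_c10, _ = soft_margin_objective(w, b, X, y, C=10.0)
print(xi)       # slack per sample
print(obj_c1)   # 0.5 * 1 + 1 * 0.5
print(obj_c10)  # 0.5 * 1 + 10 * 0.5
```

The same margin violation costs ten times more when \( C = 10 \), which is why a larger \( C \) pushes the optimizer toward fewer violations at the expense of a larger \( \|w\| \).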

## Q3. **What is the kernel trick in SVM?**

The kernel trick allows SVMs to classify data that is not linearly separable by implicitly mapping it into a higher-dimensional space where a linear separator may exist.

The idea is to replace the dot product \( \langle x, x' \rangle \) in the original feature space with a kernel function \( K(x, x') \), which computes the dot product in the higher-dimensional space without explicitly performing the transformation.

Some common kernel functions are:
- **Linear kernel**: \( K(x, x') = x^T x' \)
- **Polynomial kernel**: \( K(x, x') = (x^T x' + c)^d \)
- **Radial Basis Function (RBF)**: \( K(x, x') = \exp\left(-\gamma \|x - x'\|^2\right) \)

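Because training and prediction only need kernel values between pairs of points, the SVM never has to construct the high-dimensional feature vectors. This keeps the computation tractable even when the implicit space is very large (it is infinite-dimensional for the RBF kernel).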
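
The equivalence between a kernel and a dot product in a higher-dimensional space can be checked numerically. The sketch below uses the polynomial kernel with \( c = 1 \) and \( d = 2 \) on 2D inputs, for which one explicit feature map is \( \phi(x) = (x_1^2,\ x_2^2,\ \sqrt{2}x_1x_2,\ \sqrt{2}x_1,\ \sqrt{2}x_2,\ 1) \). The helper names `poly_kernel` and `phi` are hypothetical.

```python
import numpy as np

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel K(x, z) = (x^T z + c)^d, computed in the input space."""
    return (np.dot(x, z) + c) ** d

def phi(x):
    """Explicit 6-dimensional feature map matching poly_kernel with c=1, d=2 in 2D."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_implicit = poly_kernel(x, z)        # (1*3 + 2*4 + 1)^2 = 12^2
k_explicit = np.dot(phi(x), phi(z))   # dot product in the 6-D feature space
print(k_implicit, k_explicit)
```

Both routes give the same number, but the kernel needs only a 2D dot product while the explicit route builds 6D vectors; the gap widens quickly as the input dimension and \( d \) grow.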

## Q4. **What is the role of support vectors in SVM? Explain with an example**

Support vectors are the data points that lie closest to the decision boundary (or hyperplane). They are critical to defining the position and orientation of the hyperplane because they are the points that "support" the margin between the two classes.

The decision boundary depends only on these support vectors: moving or removing points that lie farther from the margin leaves it unchanged. The model is therefore insensitive to points far from the boundary and is determined by the points that are hardest to classify.

### Example:
Suppose we have a two-class problem where we are trying to separate circles and squares. Some of these circles and squares are very close to the boundary. The ones closest to this boundary are called support vectors, and they play a key role in determining where the boundary will be placed.
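
The example can be sketched with scikit-learn on a hand-made 2D dataset (the coordinates are assumptions for illustration). The closest pair of opposite-class points is \( (2, 2) \) and \( (0, 0) \), so these should be the support vectors, and dropping a point far from the boundary should leave the fitted hyperplane essentially unchanged.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: three "squares" (+1) and three "circles" (-1)
X = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 4.0],      # squares, label +1
              [0.0, 0.0], [-1.0, -1.0], [-2.0, 0.0]])  # circles, label -1
y = np.array([1, 1, 1, -1, -1, -1])

# A large C approximates a hard margin on this separable data
clf = SVC(kernel="linear", C=1000).fit(X, y)
support_idx = sorted(clf.support_.tolist())
print("support vector indices:", support_idx)
print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# Remove (4, 4), a point far from the boundary, and refit
keep = np.arange(len(X)) != 2
clf_reduced = SVC(kernel="linear", C=1000).fit(X[keep], y[keep])
print("w after removal =", clf_reduced.coef_[0], "b =", clf_reduced.intercept_[0])
```

`support_`, `support_vectors_`, `coef_`, and `intercept_` are attributes of a fitted `SVC`; `coef_` is only available for the linear kernel.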

## Q5. **Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin, and Hard margin in SVM**

1. **Hyperplane**: The hyperplane is the decision boundary that separates the data into classes. In a 2D space, the hyperplane is a line. In 3D, it's a plane, and in higher dimensions, it's called a hyperplane.

   - Example: For a linearly separable 2D dataset, the hyperplane would be a straight line that best divides the two classes.

2. **Marginal Plane**: The marginal planes are the two planes parallel to the hyperplane that pass through the support vectors on either side of it, \( w^T x + b = +1 \) and \( w^T x + b = -1 \). The distance between them, \( 2 / \|w\| \), is the width of the margin.

   - Example: For two classes, each marginal plane passes through the points of one class that lie closest to the hyperplane (the support vectors).

3. **Hard Margin**: When the data is perfectly linearly separable, the SVM finds the hyperplane with the maximum margin and no misclassification (i.e., all points lie on or outside the marginal planes). This is known as a hard margin SVM.

   - Example: If the two classes do not overlap, a hard margin SVM draws a straight line between them and leaves no points inside the margin.

4. **Soft Margin**: In real-world datasets, data is rarely perfectly separable. To handle this, soft margin SVMs allow some misclassifications by introducing slack variables \( \xi_i \). This is controlled by the parameter \( C \), which allows for a trade-off between maximizing the margin and minimizing the classification error.

   - Example: If a few data points are misclassified or are within the margin, a soft margin SVM would tolerate these misclassifications while still trying to maximize the margin.
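
A possible way to draw these four concepts is sketched below. It fits a linear `SVC` on synthetic 2D data (two Gaussian blobs, an assumption made for illustration) with a small and a very large \( C \), then plots the hyperplane \( f(x) = 0 \) as a solid line, the marginal planes \( f(x) = \pm 1 \) as dashed lines, and circles the support vectors. A very large \( C \) only approximates a hard margin, and only when the data is separable. The helper `plot_margins` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

# Synthetic 2D data (assumed for illustration): two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=0.8, size=(20, 2)),
               rng.normal(loc=[3.0, 3.0], scale=0.8, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

def plot_margins(ax, clf, X, y):
    """Draw the data, the hyperplane (solid) and the marginal planes (dashed)."""
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
        np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    # Level 0 is the hyperplane; levels -1 and +1 are the marginal planes
    ax.contour(xx, yy, zz, levels=[-1, 0, 1], colors="k",
               linestyles=["--", "-", "--"])
    # Circle the support vectors
    sv = clf.support_vectors_
    ax.scatter(sv[:, 0], sv[:, 1], s=150, facecolors="none", edgecolors="k")

margin_widths = {}
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, C in zip(axes, [0.01, 1000]):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin_widths[C] = 2.0 / np.linalg.norm(clf.coef_)  # margin width 2 / ||w||
    plot_margins(ax, clf, X, y)
    ax.set_title(f"C = {C}, margin width = {margin_widths[C]:.2f}")
```

The left panel (small \( C \)) shows a soft margin: the margin is wide and many points fall inside it. The right panel (large \( C \)) approaches the hard margin: the margin shrinks until few or no points violate it.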



## Q6. SVM Implementation through Iris dataset

### 1. Load the iris dataset from the scikit-learn library and split it into a training set and a testing set

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [7]:
from sklearn.datasets import load_iris
dataset = load_iris()

In [8]:
print(dataset.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

                Min  Max   Mean    SD   Class Correlation
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fis

In [22]:
import seaborn as sns
df = sns.load_dataset('iris')

In [23]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [24]:
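# Features: every column of the seaborn DataFrame except the last (species)
x = df.iloc[:, :-1]
# Integer labels from scikit-learn's copy of the data; this relies on both
# copies listing the 150 samples in the same order
y = dataset.target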

In [25]:
x

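|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |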


In [26]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [27]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)

In [28]:
x_train

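|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 22 | 4.6 | 3.6 | 1.0 | 0.2 |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 |
| 65 | 6.7 | 3.1 | 4.4 | 1.4 |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 |
| ... | ... | ... | ... | ... |
| 71 | 6.1 | 2.8 | 4.0 | 1.3 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 |
| 92 | 5.8 | 2.6 | 4.0 | 1.2 |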


### 2. Train a linear SVM classifier on the training set and predict the labels for the testing set

In [30]:
from sklearn.svm import SVC
svc = SVC(kernel='linear')

In [31]:
svc.fit(x_train,y_train)
y_pred = svc.predict(x_test)

In [33]:
y_pred

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

In [34]:
from sklearn.metrics import accuracy_score
# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(y_test, y_pred))

1.0


### 4. Try different values of the regularization parameter \( C \) and see how it affects the performance of the model

In [35]:
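C_values = [0.01, 0.1, 1, 10, 100]
accuracies = {}  # test accuracy for each value of C
for C in C_values:
    svm = SVC(kernel='linear', C=C)
    svm.fit(x_train, y_train)
    y_pred = svm.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies[C] = accuracy  # keep every score, not only the last one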



In [36]:
accuracy

1.0
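
Note that `accuracy` above holds only the score from the last iteration (\( C = 100 \)), and a single 30-sample test set may not reveal differences between values of \( C \). One way to compare them more reliably, sketched here as a suggestion rather than as part of the original notebook, is 5-fold cross-validation on the full dataset with scikit-learn's `cross_val_score`. The number of support vectors is also recorded as a rough indicator of how soft the margin is. The variable names `cv_scores` and `n_support` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

C_values = [0.01, 0.1, 1, 10, 100]
cv_scores = {}   # mean 5-fold cross-validated accuracy per C
n_support = {}   # total number of support vectors when fit on all data

for C in C_values:
    model = SVC(kernel="linear", C=C)
    cv_scores[C] = cross_val_score(model, X, y, cv=5).mean()
    n_support[C] = int(model.fit(X, y).n_support_.sum())
    print(f"C={C}: mean CV accuracy={cv_scores[C]:.3f}, "
          f"support vectors={n_support[C]}")
```

For a classifier, `cross_val_score` with an integer `cv` uses stratified folds, so each fold keeps roughly the same class balance as the full dataset.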