## Q1. What is the mathematical formula for a linear SVM?


 Linear Support Vector Machine (SVM)

The mathematical formula for a linear Support Vector Machine (SVM) can be represented as follows:

Given a training dataset of labeled samples:

\( D = \{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\} \)

where \( x_i \) represents the input features of sample \( i \) and \( y_i \) represents the corresponding class label (either +1 or -1 for a binary classification problem), the objective of the linear SVM is to find a hyperplane that best separates the two classes in the feature space.

The hyperplane is represented by the equation:

\[ w \cdot x + b = 0 \]

where \( w \) is the weight vector perpendicular to the hyperplane and \( b \) is the bias term that shifts the hyperplane parallel to itself.
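As a minimal sketch of how this equation is used for prediction, the snippet below classifies a point by the sign of \( w \cdot x + b \). The values of `w` and `b` and the helper names `decision_value` and `classify` are illustrative assumptions, not the output of a trained model.

```python
def decision_value(w, b, x):
    """Return w . x + b for weight vector w, bias b, and point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Predict +1 on the positive side of the hyperplane, otherwise -1."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical hyperplane x1 = 1, written as w . x + b = 0
w = [1.0, 0.0]
b = -1.0

print(classify(w, b, [3.0, 1.0]))   # point to the right of the line x1 = 1
print(classify(w, b, [-2.0, 5.0]))  # point to the left of the line x1 = 1
```

The first point is assigned to class +1 and the second to class -1, because they fall on opposite sides of the line.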

In a binary classification setting, the SVM aims to find the optimal hyperplane that maximizes the margin between the two classes. The margin is defined as the distance between the hyperplane and the closest samples from each class. The samples that lie on the margin are called support vectors, and they are crucial for determining the hyperplane.

To ensure the margin is maximized and the hyperplane separates the classes correctly, the SVM optimization problem is typically formulated as follows:

Minimize: \( \frac{1}{2} \lVert w \rVert^2 \)

subject to: \( y_i (w \cdot x_i + b) \geq 1 \) for all training samples \( (x_i, y_i) \)

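The inequality constraint ensures that every sample is classified correctly, i.e., lies on the correct side of the hyperplane, with a functional margin \( y_i (w \cdot x_i + b) \) of at least 1. Under this scaling, the distance from the hyperplane to the closest samples is \( \frac{1}{\lVert w \rVert} \), so the margin between the two classes has width \( \frac{2}{\lVert w \rVert} \). Minimizing \( \frac{1}{2} \lVert w \rVert^2 \) is therefore equivalent to maximizing the margin; the same term also acts as a regularizer that helps avoid overfitting.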
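To make the constraint and the margin concrete, here is a small sketch on a hypothetical six-point dataset. The data and the candidate solution `w = (1, 0)`, `b = -1` are assumptions chosen so the result can be checked by hand: the closest points of the two classes, (2, 0) and (0, 0), are 2 apart, so no separating hyperplane can have a margin wider than 2.

```python
import math

# Hypothetical toy dataset: (features, label)
data = [
    ((2.0, 0.0), +1), ((3.0, 1.0), +1), ((4.0, -1.0), +1),
    ((0.0, 0.0), -1), ((-1.0, 1.0), -1), ((-2.0, -1.0), -1),
]

# Candidate hyperplane x1 = 1 (assumed, verified by hand for this data)
w = (1.0, 0.0)
b = -1.0


def functional_margin(w, b, x, y):
    """Return y * (w . x + b); the hard-margin constraint requires this to be >= 1."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


margins = [functional_margin(w, b, x, y) for x, y in data]
feasible = all(m >= 1 for m in margins)

norm_w = math.sqrt(sum(wi * wi for wi in w))
margin_width = 2.0 / norm_w      # distance between the planes w . x + b = +1 and -1
objective = 0.5 * norm_w ** 2    # the quantity the SVM minimizes

print(margins, feasible, margin_width, objective)
```

Every constraint is satisfied, and the margin width of 2 matches the upper bound, which is why this candidate is the maximum-margin solution for the toy data.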

The optimization problem is typically solved using techniques from convex optimization, such as the quadratic programming method. Once the optimal values of \( w \) and \( b \) are obtained, the hyperplane equation is defined, and the SVM can be used for classification of new, unseen data points.

Please note that this formulation is the hard-margin linear SVM, which assumes linearly separable data. For data that is not linearly separable, a soft-margin formulation tolerates some constraint violations, and kernels can be used to map the input space into a higher-dimensional feature space in which a separating hyperplane can be found.

## Q2. What is the objective function of a linear SVM?

The objective of a linear Support Vector Machine (SVM) is to find the optimal hyperplane that maximizes the margin between the two classes in the feature space. The margin is defined as the distance between the hyperplane and the closest samples (support vectors) from each class.

For a binary classification problem, where the class labels are either +1 or -1, the objective function of a linear SVM can be formulated as follows:

Minimize: \( \frac{1}{2} \lVert w \rVert^2 \)

subject to: \( y_i (w \cdot x_i + b) \geq 1 \) for all training samples \( (x_i, y_i) \)

Where:
- \( w \) is the weight vector perpendicular to the hyperplane,
- \( b \) is the bias term that shifts the hyperplane parallel to itself,
- \( x_i \) represents the input features of sample \( i \),
- \( y_i \) represents the corresponding class label (+1 or -1) for sample \( i \),
- \( \lVert w \rVert \) denotes the Euclidean norm (magnitude) of the weight vector.

The optimization problem consists of two parts: the objective \( \frac{1}{2} \lVert w \rVert^2 \) and the inequality constraints \( y_i (w \cdot x_i + b) \geq 1 \). Because the margin has width \( \frac{2}{\lVert w \rVert} \), minimizing the objective maximizes the margin and acts as regularization against overfitting, while the constraints ensure that all training samples are classified correctly and lie on or outside the margin.

In this formulation, the SVM aims to find the values of \( w \) and \( b \) that minimize \( \frac{1}{2} \lVert w \rVert^2 \) while satisfying the inequality constraint for all training samples. The optimal values of \( w \) and \( b \) define the hyperplane equation, which can be used for classification of new, unseen data points.
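As a short derivation of why minimizing \( \frac{1}{2} \lVert w \rVert^2 \) maximizes the margin, take a point \( x_+ \) on the plane \( w \cdot x + b = 1 \) and a point \( x_- \) on the plane \( w \cdot x + b = -1 \) (these two names are introduced here only for illustration). Subtracting the two equations gives

\[ w \cdot (x_+ - x_-) = 2 \]

The distance between the two planes is the projection of \( x_+ - x_- \) onto the unit normal \( w / \lVert w \rVert \):

\[ \text{margin} = \frac{w \cdot (x_+ - x_-)}{\lVert w \rVert} = \frac{2}{\lVert w \rVert} \]

Maximizing \( \frac{2}{\lVert w \rVert} \) is equivalent to minimizing \( \lVert w \rVert \), and hence to minimizing \( \frac{1}{2} \lVert w \rVert^2 \), which is convex and differentiable.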

The optimization problem is typically solved using techniques from convex optimization, such as the quadratic programming method, to find the optimal values of \( w \) and \( b \) that fulfill the objective function and constraints. Once the optimization is complete, the SVM can be used as a linear classifier for the given binary classification task.

## Q3. What is the kernel trick in SVM?

The kernel trick is a fundamental concept in Support Vector Machines (SVM) that allows SVMs to efficiently handle non-linearly separable data by implicitly mapping the input features into a higher-dimensional feature space. It enables linear classifiers, like SVMs, to effectively learn and classify non-linear patterns in the data.

The basic idea behind the kernel trick is to avoid the explicit computation of the high-dimensional feature space, which may be computationally expensive or even impossible in some cases. Instead, the kernel function computes the dot product between the feature vectors in the high-dimensional space without explicitly transforming the data into that space. This is achieved by defining a kernel function \( K(x, x') \) that corresponds to the dot product of the feature vectors \( \phi(x) \) and \( \phi(x') \) in the high-dimensional space:

\[ K(x, x') = \phi(x) \cdot \phi(x') \]

The kernel function allows us to implicitly operate in the higher-dimensional space while remaining in the original input space. This can be computationally advantageous, especially when the dimensionality of the higher-dimensional feature space is very large or even infinite.
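The identity \( K(x, x') = \phi(x) \cdot \phi(x') \) can be checked numerically. The sketch below assumes the degree-2 polynomial kernel \( (x \cdot x' + 1)^2 \) on 2-dimensional inputs, for which an explicit 6-dimensional feature map is known; the function names `phi` and `poly2_kernel` are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Explicit feature map for the kernel (x . x' + 1)^2 on 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0)


def poly2_kernel(x, z):
    """The same quantity computed directly in the 2-D input space."""
    return (dot(x, z) + 1.0) ** 2


x = (1.0, 2.0)
z = (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # dot product in the 6-D feature space
implicit = poly2_kernel(x, z)    # kernel evaluated in the 2-D input space
print(explicit, implicit)
```

Both values equal 4 (up to floating-point rounding), but the kernel version never builds the 6-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit route is available.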

The kernel trick is most commonly used with SVMs to obtain a non-linear decision boundary in the original input space. When using a kernel, the optimization problem of the SVM changes from finding a linear hyperplane in the original feature space to finding a linear hyperplane in the high-dimensional feature space; mapped back to the input space, that hyperplane corresponds to a non-linear boundary. This enables the SVM to discover more complex decision boundaries and handle data that is not linearly separable in the original space.

Some common kernel functions used in SVMs include:
1. Linear kernel: \( K(x, x') = x \cdot x' \) (same as the original dot product in the input space).
2. Polynomial kernel: \( K(x, x') = (x \cdot x' + c)^d \), where \( c \) and \( d \) are user-defined constants.
3. Radial Basis Function (RBF) kernel: \( K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2) \), where \( \gamma \) is a user-defined parameter.
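
The three kernels listed above can be written as plain functions. This is a minimal sketch in pure Python; the default values chosen for `c`, `d`, and `gamma` are arbitrary assumptions for illustration, not recommended settings.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, c=1.0, d=3):
    return (dot(x, z) + c) ** d


def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


x = (1.0, 0.0)
z = (0.0, 1.0)
print(linear_kernel(x, z))
print(polynomial_kernel(x, z))
print(rbf_kernel(x, z))
print(rbf_kernel(x, x))  # a point compared with itself
```

Note that the RBF kernel of a point with itself is always 1, and it decays toward 0 as the two points move apart, at a rate set by `gamma`.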

By choosing an appropriate kernel function and its associated parameters, SVMs can efficiently handle non-linear data and achieve high accuracy in various classification tasks. The kernel trick is one of the key reasons why SVMs are widely used and highly effective in practice.

## Q4. What is the role of support vectors in SVM? Explain with an example.

In Support Vector Machines (SVM), support vectors play a crucial role in defining the optimal hyperplane that separates the classes in a binary classification problem. Support vectors are the data points that lie closest to the separating hyperplane and have the most significant influence on its position and orientation. These are the points that "support" the definition of the hyperplane, and hence the name "support vectors."

To better understand the role of support vectors, let's consider a simple example of a 2-dimensional binary classification problem. Assume we have two classes of points, sketched schematically in the following scatter plot:

```
  |         *
  |       *   *
  |    *    *   *
  |   *      *    *
  | *       SV    *
  |____________________
         Feature 1
```

In this example, the asterisks (*) represent the data points, and "SV" marks a support vector. We can observe the following:

1. The support vectors are the points closest to the separating hyperplane (the line in 2D). They are the ones that have the smallest margin to the hyperplane.

2. The other data points that are farther away from the hyperplane do not influence its position. Only the support vectors contribute to determining the hyperplane.

3. If a non-support vector is removed, or moved without entering the margin, the optimal hyperplane's position does not change.

The role of support vectors becomes more apparent when we realize that the SVM's decision boundary (the hyperplane) is solely determined by these critical points. The margin of the hyperplane is defined by the distance between the support vectors from each class. The SVM aims to maximize this margin while correctly classifying all the training data.
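The sketch below illustrates this on a hypothetical six-point dataset. The hyperplane `w = (1, 0)`, `b = -1` is assumed to be the maximum-margin solution (it can be verified by hand, since the closest points of the two classes are (2, 0) and (0, 0)). The code does not re-solve the optimization; it only checks which constraints hold with equality, which is what identifies the support vectors.

```python
# Hypothetical toy dataset: (features, label)
data = [
    ((2.0, 0.0), +1), ((3.0, 1.0), +1), ((4.0, -1.0), +1),
    ((0.0, 0.0), -1), ((-1.0, 1.0), -1), ((-2.0, -1.0), -1),
]

# Max-margin hyperplane for this data, derived by hand (assumption): x1 = 1
w, b = (1.0, 0.0), -1.0


def functional_margin(x, y):
    """Return y * (w . x + b) for the fixed hyperplane above."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def support_vectors(points, tol=1e-9):
    """Points whose constraint y * (w . x + b) >= 1 holds with equality."""
    return [(x, y) for x, y in points if abs(functional_margin(x, y) - 1.0) < tol]


svs = support_vectors(data)
print(svs)

# Move a non-support vector further from the boundary: the same two points stay tight
moved = [((10.0, 5.0), +1) if x == (4.0, -1.0) else (x, y) for x, y in data]
print(support_vectors(moved) == svs)
```

Only two of the six points are support vectors, and relocating a point that lies well outside the margin leaves them unchanged.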

In practice, the number of support vectors is often relatively small compared to the total number of training data points. SVMs are designed to find and utilize these critical support vectors to construct the decision boundary effectively, even in high-dimensional spaces or when the data is not linearly separable.

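Because the solution depends only on the support vectors, points far from the boundary have no effect on the model, which tends to help generalization. Points near the boundary, including outliers, can still become support vectors, so a soft margin is typically used to limit their influence. Prediction is also efficient, since the decision function is evaluated using only the support vectors rather than the entire training set.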

In summary, support vectors play a central role in SVM by defining the optimal hyperplane and determining the decision boundary. They are the key data points that influence the SVM's learning process, and their positions are critical in achieving good classification performance.

## Q5. Illustrate with examples and graphs the Hyperplane, Marginal plane, Soft margin and Hard margin in SVM.

To illustrate the concepts of hyperplane, marginal plane, soft margin, and hard margin in SVM, let's consider a simple 2-dimensional example with two classes: blue squares and red circles.

Example Data Points:
- Blue Squares: (2, 3), (3, 4), (4, 2)
- Red Circles: (1, 1), (3, 1), (4, 4)

Let's visualize these data points on a 2D plane:

```
Feature 2
  4 |        B  R
  3 |     B
  2 |           B
  1 |  R     R
    +----------------
       1  2  3  4     Feature 1

  B = Blue square, R = Red circle
```

In SVM, the goal is to find a hyperplane that separates the two classes effectively. A hyperplane in a 2-dimensional space is a line, in a 3-dimensional space it is a plane, and in general it is an \( (n-1) \)-dimensional flat subspace of an \( n \)-dimensional space. The hyperplane is represented by the equation \( w \cdot x + b = 0 \), where \( w \) is the weight vector perpendicular to the hyperplane, and \( b \) is the bias term.

1. **Hard Margin SVM:**
In a hard margin SVM, the goal is to find a hyperplane that perfectly separates the two classes, with no misclassifications and no data points inside the margin. This is only possible when the data is linearly separable. Note that the six example points above are not linearly separable as given: the segment joining the blue points (2, 3) and (4, 2) crosses the segment joining the red points (1, 1) and (4, 4). The sketch below is therefore schematic and shows the kind of hyperplane a hard margin SVM would find on a separable dataset:

```
  |         * (4,4) [Red]
  |         \
  |       * \ 
  |    *    \  (3,1) [Red]
  |   *      \
  | * (2,3)   \______________
  |____________________
         Feature 1
```

The hyperplane (line) separates the two classes (red and blue) perfectly, and no data points are present inside the margin.

2. **Soft Margin SVM:**
In real-world scenarios, data may not be perfectly separable due to noise or overlapping classes. In such cases, we use a soft margin SVM, which allows some data points to lie inside the margin or even on the wrong side of the hyperplane. The goal of the soft margin SVM is to find a hyperplane that separates the two classes as well as possible while tolerating such violations. How heavily violations are penalized is controlled by a parameter called "C."

Let's consider a soft margin SVM with \( C = 1 \) in our example:

```
  |         * (4,4) [Red]
  |         \
  |       * \ 
  |    *    \  (3,1) [Red]
  |   *      \
  | * (2,3)   \______________
  |          /   (3,4) [Blue]
  |        /   
  |      /    
  |    /     
  |  /       * (4,2) [Blue]
  | /       
  |/_______
```

The hyperplane still separates most of the data, but some data points are allowed inside the margin. Each violation is measured by a slack variable \( \xi_i \geq 0 \), which is zero for points on or outside the margin and grows as a point moves further onto the wrong side; the sum of the slack variables, weighted by C, is added to the objective function. The parameter C controls the trade-off between maximizing the margin and minimizing the classification error. A smaller C tolerates more points inside the margin and gives a wider margin, while a larger C penalizes violations more heavily, leading to a narrower margin.
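In the standard soft-margin formulation, the problem becomes: minimize \( \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i \) subject to \( y_i (w \cdot x_i + b) \geq 1 - \xi_i \) and \( \xi_i \geq 0 \). The sketch below evaluates the slack values and this objective for a fixed, hypothetical hyperplane and five made-up points; it does not optimize anything, and the helper names `slack` and `soft_margin_objective` are assumptions.

```python
def slack(w, b, x, y):
    """Slack xi = max(0, 1 - y * (w . x + b)): zero on or outside the margin."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * (sum of slacks) for a given hyperplane."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(slack(w, b, x, y) for x, y in data)


# Hypothetical hyperplane x1 = 1 and made-up points
w, b = (1.0, 0.0), -1.0
data = [
    ((3.0, 1.0), +1),    # outside the margin
    ((1.5, 0.0), +1),    # inside the margin, still correctly classified
    ((0.5, 2.0), +1),    # on the wrong side of the hyperplane
    ((-1.0, 0.0), -1),   # outside the margin
    ((0.0, 1.0), -1),    # exactly on the marginal plane
]

slacks = [slack(w, b, x, y) for x, y in data]
print(slacks)
for C in (0.1, 1.0, 10.0):
    print(C, soft_margin_objective(w, b, data, C))
```

The two violating points contribute slack 0.5 and 1.5. As C grows, their penalty dominates the objective, which is why a large C pushes the optimizer toward fewer violations at the cost of a narrower margin.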

It's important to note that the soft margin SVM is more robust to noisy data and can handle non-linearly separable data by using appropriate kernel functions.

In summary, the hyperplane is the decision boundary that separates the classes in an SVM. The marginal planes are the two hyperplanes \( w \cdot x + b = \pm 1 \), parallel to the decision boundary, that pass through the support vectors, and the margin is the space between them. Hard margin SVM seeks a perfect separation, while soft margin SVM allows some misclassifications and is more flexible in handling real-world data. The choice between hard and soft margin depends on the nature of the data and the classification task.
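
As a small numeric illustration of the hyperplane and its marginal planes \( w \cdot x + b = \pm 1 \), the sketch below computes signed distances for an assumed hyperplane. The values of `w` and `b` and the two sample points are hypothetical, picked so that \( \lVert w \rVert = 5 \).

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5
w = (3.0, 4.0)
b = -5.0

half_margin = 1.0 / math.sqrt(sum(wi * wi for wi in w))  # hyperplane to one marginal plane
margin_width = 2.0 * half_margin                          # between the two marginal planes

on_positive_plane = (2.0, 0.0)   # satisfies w . x + b = +1
on_negative_plane = (0.0, 1.0)   # satisfies w . x + b = -1

print(signed_distance(w, b, on_positive_plane), signed_distance(w, b, on_negative_plane))
print(half_margin, margin_width)
```

Points on the two marginal planes sit at equal and opposite distances from the decision boundary, and that common distance is \( 1 / \lVert w \rVert \).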

## Q6. SVM Implementation through Iris dataset.


- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set
- Train a linear SVM classifier on the training set and predict the labels for the testing set
- Compute the accuracy of the model on the testing set
- Plot the decision boundaries of the trained model using two of the features
- Try different values of the regularisation parameter C and see how it affects the performance of the model.

In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_iris

In [2]:
df1=load_iris()

In [4]:

df=pd.DataFrame(df1.data,columns=df1.feature_names)
df['target']=df1.target

In [13]:
df.shape

(150, 5)

In [15]:
# Independent variables (features) and dependent variable (target)
X=df.iloc[:,:-1]
y=df['target']

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=10)

In [20]:
from sklearn.svm import SVC
# Note: SVC() uses the RBF kernel by default; pass kernel='linear' for a linear SVM
svc=SVC()

In [26]:
parameters={'gamma' : [1, 0.1, 0.01, 0.001, 0.0001],
            'C': [0.1, 1, 10, 100, 1000]
}

In [27]:
from sklearn.model_selection import GridSearchCV

In [28]:
clf=GridSearchCV(SVC(),param_grid=parameters,scoring='accuracy',cv=5)

In [29]:
clf.fit(X_train,y_train)

In [30]:
clf.best_params_

{'C': 10, 'gamma': 0.1}

In [31]:
from sklearn.svm import SVC
svc=SVC(C=10,gamma=0.1)

In [51]:
svc.fit(X_train,y_train)

In [33]:
y_pred=svc.predict(X_test)

In [34]:
from sklearn.metrics import classification_report,accuracy_score
print(classification_report(y_test,y_pred))
# Call accuracy_score on the labels; printing the bare function only shows its repr
print('\n',accuracy_score(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       1.00      0.93      0.97        15
           2       0.92      1.00      0.96        12

    accuracy                           0.97        38
   macro avg       0.97      0.98      0.98        38
weighted avg       0.98      0.97      0.97        38


 0.9736842105263158


In [59]:
iris = load_iris()
X, y = iris.data, iris.target


In [None]:
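import matplotlib.pyplot as plt

# The model above was trained on all 4 features, so it cannot predict on a 2-D grid.
# Fit a separate SVC on the first two features (sepal length and width) for plotting;
# it is fit on the full dataset, for visualization only.
X_2d = X[:, :2]
svc_2d = SVC(C=10, gamma=0.1).fit(X_2d, y)

# Build a grid covering the range of the two features
x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = svc_2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Plot the decision regions, then overlay the data points
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, alpha=0.4, cmap="viridis")
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis", edgecolors="k")
plt.xlabel("Sepal length (cm)")
plt.ylabel("Sepal width (cm)")
plt.title("Decision Boundaries of SVM (RBF kernel, two features)")
plt.show()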


In [65]:
# Evaluate the model with different values of C
# (SVC defaults to the RBF kernel, so these results are for an RBF SVM)
C_values = [0.01, 0.1, 1.0, 10.0, 100.0]
for C in C_values:
    svc = SVC(C=C)
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy of RBF SVM (C={C}): {accuracy:.2f}")


Accuracy of RBF SVM (C=0.01): 0.29
Accuracy of RBF SVM (C=0.1): 0.95
Accuracy of RBF SVM (C=1.0): 1.00
Accuracy of RBF SVM (C=10.0): 1.00
Accuracy of RBF SVM (C=100.0): 1.00
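
Since `SVC` defaults to `kernel="rbf"`, a strictly linear SVM needs the kernel set explicitly. The following is a minimal, self-contained sketch of the same C sweep with `kernel="linear"`, reusing the split settings from earlier (`test_size=0.25`, `random_state=10`). The `results` dictionary and the support-vector count are additions for illustration; no accuracies are quoted here because they were not part of the recorded run.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the data and recreate the same train/test split as above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10
)

results = {}
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = SVC(kernel="linear", C=C)  # explicitly linear, unlike the default RBF kernel
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    n_sv = int(model.n_support_.sum())  # total support vectors across classes
    results[C] = (acc, n_sv)
    print(f"Linear SVM (C={C}): accuracy={acc:.2f}, support vectors={n_sv}")
```

Tracking the number of support vectors alongside accuracy gives a rough view of how the margin tightens as C increases.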
