In [None]:
Q1. What is the mathematical formula for a linear SVM?

In [None]:
The mathematical formulation of a linear Support Vector Machine (SVM) involves finding a hyperplane that best 
separates two classes in a high-dimensional space. The goal is to maximize the margin between the closest points 
of the classes, known as support vectors.

### Mathematical Formulation

1. **Hyperplane Equation**: In an \( n \)-dimensional space, a hyperplane can be represented as:

   \[
   w \cdot x + b = 0
   \]

   where:
   - \( w \) is the weight vector (normal to the hyperplane).
   - \( x \) is the feature vector.
   - \( b \) is the bias term (intercept).

2. **Classification Rule**: The SVM classifies data points based on which side of the hyperplane they fall:

   \[
   f(x) = \text{sign}(w \cdot x + b)
   \]

3. **Margin Maximization**: The SVM aims to maximize the margin \( \gamma \), which is defined as the distance 
    between the hyperplane and the closest data points from either class. This is done subject to the constraints:

   \[
   y_i (w \cdot x_i + b) \geq 1 \quad \forall i
   \]

   where \( y_i \) is the label of the \( i \)-th sample, which can be either +1 or -1.

4. **Optimization Problem**: The optimization problem can be expressed as:

   \[
   \min_{w, b} \frac{1}{2} \|w\|^2
   \]

   subject to the constraints mentioned above.
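
The link between the constraint and the margin can be made explicit. Assuming the canonical scaling used above, where the closest points satisfy \( y_i (w \cdot x_i + b) = 1 \), the distance from a point to the hyperplane gives the margin directly:

\[
d(x_i) = \frac{|w \cdot x_i + b|}{\|w\|}
\quad\Longrightarrow\quad
\gamma = \min_i d(x_i) = \frac{1}{\|w\|}, \qquad \text{total margin width} = \frac{2}{\|w\|}
\]

Since \( 2/\|w\| \) grows as \( \|w\| \) shrinks, maximizing the margin is equivalent to minimizing \( \|w\| \), and therefore to minimizing the convex, differentiable objective \( \frac{1}{2}\|w\|^2 \).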

In [None]:
Q2. What is the objective function of a linear SVM?

In [None]:
The objective function of a linear Support Vector Machine (SVM) is focused on maximizing the margin between two 
classes while ensuring that the data points are correctly classified. This involves minimizing the norm of the weight
vector \( w \), which is related to the margin, subject to certain constraints. 

### Objective Function

The objective function can be formally stated as:

\[
\min_{w, b} \frac{1}{2} \|w\|^2
\]

### Constraints

This minimization is subject to the constraints that ensure correct classification of the training samples:

\[
y_i (w \cdot x_i + b) \geq 1 \quad \forall i
\]

Where:
- \( y_i \) is the label of the \( i \)-th sample, taking values +1 or -1.
- \( x_i \) is the feature vector of the \( i \)-th sample.
- \( w \) is the weight vector, and \( b \) is the bias term.

### Explanation

1. **Minimizing \(\frac{1}{2} \|w\|^2\)**: This term is half the squared magnitude of the weight vector \( w \). 
    Minimizing it corresponds to maximizing the margin between the hyperplane and the nearest data points 
    (support vectors), because the margin width is \( 2/\|w\| \). The factor of \( \frac{1}{2} \) is used for mathematical 
    convenience: the gradient of \( \frac{1}{2}\|w\|^2 \) is simply \( w \).

2. **Constraints**: The constraints ensure that each training point is classified correctly and lies on or outside the margin.
    Specifically, for a point in the positive class (\( y_i = +1 \)) the constraint becomes \( w \cdot x_i + b \geq 1 \),
    and for a point in the negative class (\( y_i = -1 \)) it becomes \( w \cdot x_i + b \leq -1 \).
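
To make the objective and constraints concrete, the sketch below evaluates \( \frac{1}{2}\|w\|^2 \) and checks \( y_i (w \cdot x_i + b) \geq 1 \) for a few candidate hyperplanes. The helper `hard_margin_objective` and the two-point toy dataset are hypothetical, chosen only for illustration. The third candidate describes the same line as the first but with a smaller \( w \): it lowers the objective yet violates the constraints, which is exactly why the constraints are needed.

```python
import numpy as np

def hard_margin_objective(w, b, X, y):
    """Hypothetical helper: return (0.5 * ||w||^2, feasible) for a candidate (w, b).

    feasible is True when every sample satisfies y_i (w . x_i + b) >= 1.
    """
    w = np.asarray(w, dtype=float)
    functional_margins = y * (X @ w + b)      # y_i (w . x_i + b) for every sample
    objective = 0.5 * np.dot(w, w)            # 0.5 * ||w||^2
    feasible = bool(np.all(functional_margins >= 1 - 1e-12))
    return objective, feasible

# Hypothetical toy data: one point per class
X = np.array([[0.0, 0.0], [2.0, 2.0]])
y = np.array([-1, 1])

candidates = [
    ([0.5, 0.5], -1.0),    # line x1 + x2 = 2, canonically scaled
    ([1.0, 0.0], -1.0),    # line x1 = 1, also feasible but a narrower margin
    ([0.25, 0.25], -0.5),  # same line as the first, but w scaled down too far
]
for w, b in candidates:
    obj, ok = hard_margin_objective(w, b, X, y)
    # Geometric margin = smallest functional margin divided by ||w||
    geo = np.min(y * (X @ np.asarray(w) + b)) / np.linalg.norm(w)
    print(f"w={w}, b={b}: objective={obj:.4f}, feasible={ok}, geometric margin={geo:.3f}")
```

Among the feasible candidates, the one with the smaller objective (the first) also has the wider geometric margin.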


In [None]:
Q3. What is the kernel trick in SVM?

In [None]:
The kernel trick is a powerful technique used in Support Vector Machines (SVM) that allows the algorithm to operate 
in a higher-dimensional space without explicitly computing the coordinates of the data in that space. Instead, it 
uses a kernel function to compute the inner products between the data points in the transformed feature space. 
This enables SVMs to effectively handle non-linear relationships between data points.

### Key Concepts of the Kernel Trick

1. **Non-linearly Separable Data**: In many real-world scenarios, data points from different classes cannot be 
    separated by a straight line (or hyperplane) in their original feature space. The kernel trick allows SVM to 
    create a decision boundary that is non-linear in the original space by mapping the data into a higher-dimensional 
    space.

2. **Kernel Function**: A kernel function computes the dot product of two vectors in a higher-dimensional feature 
    space without explicitly transforming the data. Common kernel functions include:
   - **Linear Kernel**: \( K(x_i, x_j) = x_i \cdot x_j \)
   - **Polynomial Kernel**: \( K(x_i, x_j) = (x_i \cdot x_j + c)^d \) for constants \( c \) and degree \( d \)
   - **Radial Basis Function (RBF) Kernel**: \( K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \)
   - **Sigmoid Kernel**: \( K(x_i, x_j) = \tanh(\alpha x_i \cdot x_j + c) \)

3. **Implicit Mapping**: The kernel trick allows SVM to learn complex decision boundaries by implicitly mapping the 
    original features into a higher-dimensional space. This mapping is performed through the kernel function without 
    the need to calculate the coordinates of the points in that space.
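
The implicit mapping can be checked numerically. The sketch below assumes 2-D inputs and the polynomial kernel with \( c = 1 \), \( d = 2 \); the explicit feature map `phi_poly2` is hand-derived for that case only (a hypothetical helper, not a library function). It also compares the RBF formula above with scikit-learn's `rbf_kernel`, which is parameterized by `gamma`, so \( \gamma = 1/(2\sigma^2) \) here.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel K(x, z) = (x . z + c)^d, computed in the original space."""
    return (np.dot(x, z) + c) ** d

def phi_poly2(x):
    """Hypothetical explicit feature map for (x . z + 1)^2 with 2-D inputs (6 features)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_implicit = poly_kernel(x, z)                    # 2-D computation
k_explicit = np.dot(phi_poly2(x), phi_poly2(z))   # 6-D computation
print(f"{k_implicit:.4f} {k_explicit:.4f}")       # both 144 up to floating-point rounding

# RBF kernel: formula with sigma vs scikit-learn's gamma parameterization
sigma = 1.5
k_rbf_manual = np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))
k_rbf_sklearn = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1),
                           gamma=1 / (2 * sigma ** 2))[0, 0]
print(k_rbf_manual, k_rbf_sklearn)
```

The kernel needs one 2-D dot product, while the explicit route builds 6-D vectors first; for higher degrees or RBF kernels the explicit space grows much larger (infinite for RBF).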

### Benefits of the Kernel Trick

- **Computational Efficiency**: By avoiding the explicit transformation of the data, the kernel trick computes each 
    inner product at roughly the cost of working in the original space, even when the implicit feature space is very 
    high-dimensional (or, for the RBF kernel, infinite-dimensional).
- **Flexibility**: Different kernel functions can be used to suit the nature of the data and the problem at hand,
    allowing for a wide variety of decision boundaries.
- **Improved Performance**: On datasets that are not linearly separable, a kernel SVM can achieve better 
    classification performance than a linear SVM.
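
A quick way to see this in practice is to compare a linear and an RBF kernel on data that no straight line can separate. The sketch below assumes scikit-learn's synthetic `make_circles` dataset (two concentric rings); the sample size, noise level, and split are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)    # test-set accuracy
    print(f"{kernel:>6} kernel: test accuracy = {scores[kernel]:.3f}")
```

The linear kernel can do little better than chance here, while the RBF kernel separates the rings almost perfectly.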

In [None]:
Q4. What is the role of support vectors in SVM? Explain with an example.

In [None]:
Support vectors are the data points that are closest to the decision boundary (hyperplane) in a Support Vector Machine
(SVM) model. They play a crucial role in defining the optimal hyperplane that separates different classes. Here’s a 
detailed explanation of their role, along with an example:

### Role of Support Vectors

1. **Defining the Decision Boundary**: Support vectors are the critical elements of the training dataset that determine
    the position and orientation of the hyperplane. If any of the other points were removed (keeping all the support 
    vectors), the hyperplane would not change; removing a support vector, by contrast, can move it.

2. **Maximizing the Margin**: The SVM aims to maximize the margin between the hyperplane and the closest points from 
    each class. The distance from the hyperplane to the support vectors is crucial because the SVM algorithm focuses 
    on maximizing this margin.

3. **Robustness**: Because the decision boundary is determined by only a subset of the data (the support vectors), 
    points that are far from the decision boundary do not influence the position of the hyperplane. Outliers that lie 
    near the boundary, however, can become support vectors and shift it, which is one reason soft-margin SVMs are used.

### Example

Imagine a 2D dataset with two classes (red and blue) that can be separated by a line (hyperplane). Here’s how support 
vectors would work in this context:

1. **Visual Representation**:
   - Suppose we have the following points:
     - Red points: (1, 2), (2, 3), (3, 3)
     - Blue points: (5, 5), (6, 5), (7, 6)

   If we plot these points, the red points sit in the lower left and the blue points in the upper right, so a 
   straight line between the two groups separates them.

2. **Identifying Support Vectors**:
   - The support vectors are the points closest to the hyperplane. Here the closest red point is (3, 3) and the 
     closest blue point is (5, 5), and the maximum-margin hyperplane is the line \( x_1 + x_2 = 8 \), midway between them.
   - These points are critical because if we moved either of them, the hyperplane would adjust accordingly, whereas 
     moving any of the other four points (without entering the margin) would leave it unchanged.

3. **Maximizing the Margin**:
   - The SVM calculates the margin (the distance from the hyperplane to the support vectors), here \( \sqrt{2} \), 
     and chooses the hyperplane that maximizes it.
   - The support vectors directly determine the width of the margin. For example, if we added a new red point (4, 3), 
     which lies closer to the hyperplane than the current support vector (3, 3), it would become a new support vector 
     and the margin would shrink.
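
The example can be reproduced with scikit-learn. This sketch assumes red = -1 and blue = +1, and uses a very large `C` to approximate a hard-margin SVM on separable data. By hand, the max-margin solution for the six points is \( w = (0.5, 0.5) \), \( b = -4 \) (the line \( x_1 + x_2 = 8 \)); after adding (4, 3) it becomes \( w = (0.4, 0.8) \), \( b = -5 \). The fitted values should be close to these, up to solver tolerance.

```python
import numpy as np
from sklearn.svm import SVC

# The six points from the example: red = -1, blue = +1
X = np.array([[1, 2], [2, 3], [3, 3],     # red
              [5, 5], [6, 5], [7, 6]])    # blue
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vectors:", clf.support_vectors_.tolist())
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("margin 1/||w|| =", 1 / np.linalg.norm(clf.coef_[0]))

# Add a red point that falls inside the current margin, then refit
X_new = np.vstack([X, [4, 3]])
y_new = np.append(y, -1)
clf_new = SVC(kernel="linear", C=1e6).fit(X_new, y_new)
print("new support vectors:", clf_new.support_vectors_.tolist())
print("new w =", clf_new.coef_[0], "b =", clf_new.intercept_[0])
print("new margin 1/||w|| =", 1 / np.linalg.norm(clf_new.coef_[0]))
```

Only the points listed in `support_vectors_` shape the solution; (1, 2), (2, 3), (6, 5) and (7, 6) could be deleted without changing the first hyperplane.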

In [None]:
Q5. Illustrate the hyperplane, marginal planes, soft margin, and hard margin in SVM with examples and graphs.

In [None]:
To illustrate the concepts of hyperplane, margin planes, soft margin, and hard margin in Support Vector Machines (SVM),
we can break these down with examples and graphical representations.

### 1. Hyperplane

A **hyperplane** is a decision boundary that separates different classes in the feature space. In a 2D space, it is
simply a line. For example:

- **Example**: Consider a dataset with two features \(x_1\) and \(x_2\), and two classes: red and blue.

**Graph**:
![Hyperplane](https://i.imgur.com/3Z2p0vU.png)

In this plot, the solid line represents the hyperplane that separates the red points from the blue points.

### 2. Margin Planes

The **margin planes** are the two boundaries parallel to the hyperplane that pass through the closest points (support vectors)
of each class, \( w \cdot x + b = +1 \) and \( w \cdot x + b = -1 \). The region between them is the margin, whose width the SVM maximizes.

- **Example**: In the same dataset, the margin planes would be lines parallel to the hyperplane that are closest to 
    the red and blue points.

**Graph**:
![Margin Planes](https://i.imgur.com/MHGg7Y6.png)

Here, the dashed lines represent the margin planes. The distance between them, \( 2/\|w\| \), is maximized
to achieve the best separation.
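
Plots like these can be generated directly from a fitted model. The sketch below assumes a synthetic, linearly separable dataset from `make_blobs` (the parameters are illustrative) and a large `C` to approximate a hard margin. It draws the level sets of `decision_function`: level 0 is the hyperplane and levels -1 and +1 are the margin planes; circled points are the support vectors.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative, well-separated two-class data
X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)

# Evaluate w . x + b on a grid covering the data
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.figure(figsize=(7, 5))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", edgecolors="k")
# Level 0 is the hyperplane; levels -1 and +1 are the margin planes
plt.contour(xx, yy, Z, levels=[-1, 0, 1], colors="k", linestyles=["--", "-", "--"])
# Circle the support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=150, facecolors="none", edgecolors="k", label="support vectors")
plt.title("Hyperplane (solid) and margin planes (dashed)")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.legend()
plt.show()
```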

### 3. Hard Margin SVM

A **hard margin** SVM assumes that the data is linearly separable without any noise. All data points must lie on the 
correct side of the margin planes.

- **Example**: When the classes are perfectly separable.

**Graph**:
![Hard Margin](https://i.imgur.com/HQPIyg5.png)

In this graph, all points are correctly classified, and there is a clear gap (the margin) between the classes, with 
no data points lying within the margin area.

### 4. Soft Margin SVM

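A **soft margin** SVM allows for some margin violations and misclassifications when the data is not perfectly separable. 
This introduces a trade-off between maximizing the margin and minimizing classification errors, controlled by the 
regularization parameter \( C \): a larger \( C \) penalizes errors more heavily and yields a narrower margin.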

- **Example**: When some points are close to or within the margin.

**Graph**:
![Soft Margin](https://i.imgur.com/c73RT5g.png)

In this plot, the hyperplane is adjusted to account for the misclassified points (indicated by the different colors). 
The soft margin allows some data points to fall within the margin or even be misclassified, providing flexibility to 
handle noise and overlapping classes.
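
The hard/soft trade-off can be visualized by fitting the same overlapping data with a large and a small `C`. This sketch assumes two heavily overlapping Gaussian clusters (illustrative centers and spread); since no line separates them, the large-`C` model only approximates a hard margin. The margin width is computed as \( 2/\|w\| \) from `coef_`.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters (illustrative settings), so no line separates them perfectly
X, y = make_blobs(n_samples=100, centers=[[0, 0], [2, 2]], cluster_std=1.0, random_state=0)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))

widths = {}
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, C in zip(axes, [100, 0.05]):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    widths[C] = 2 / np.linalg.norm(clf.coef_[0])   # distance between the margin planes
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", edgecolors="k")
    ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors="k", linestyles=["--", "-", "--"])
    ax.set_title(f"C={C}: margin width={widths[C]:.2f}, support vectors={len(clf.support_)}")
plt.show()
```

The small-`C` (softer) model accepts more points inside its margin in exchange for a wider margin.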

In [None]:
Q6. SVM Implementation through Iris dataset.
~ Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
~ Train a linear SVM classifier on the training set and predict the labels for the testing set.
~ Compute the accuracy of the model on the testing set.
~ Plot the decision boundaries of the trained model using two of the features.
~ Try different values of the regularisation parameter C and see how it affects the performance of the model.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Using only the first two features for visualization
y = iris.target

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a linear SVM classifier
svm_model = SVC(kernel='linear', C=1.0)  # Using C=1.0 as default
svm_model.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svm_model.predict(X_test)

# Compute the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')

# Plotting decision boundaries
def plot_decision_boundaries(model, X, y):
    # Create a mesh grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    
    # Predict the class label for every grid point
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plotting
    plt.figure(figsize=(10, 6))
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.title('SVM Decision Boundaries with Linear Kernel')
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.show()

# Plot the decision boundaries
plot_decision_boundaries(svm_model, X, y)

# Try different values of the regularization parameter C
C_values = [0.01, 0.1, 1.0, 10, 100]
accuracies = []

for C in C_values:
    svm_model = SVC(kernel='linear', C=C)
    svm_model.fit(X_train, y_train)
    y_pred = svm_model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f'Accuracy with C={C}: {accuracy:.4f}')

# Plot the effect of C on accuracy
plt.figure(figsize=(10, 6))
plt.plot(C_values, accuracies, marker='o')
plt.xscale('log')
plt.title('Effect of Regularization Parameter C on SVM Accuracy')
plt.xlabel('C (Regularization Parameter)')
plt.ylabel('Accuracy')
plt.xticks(C_values)
plt.grid()
plt.show()
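
Accuracy alone hides how `C` changes the boundary itself. The sketch below reuses the same two features and split as above and draws the boundaries for a few `C` values side by side, reporting test accuracy and the number of support vectors (`len(model.support_)`) for each. The choice of three `C` values and the grid step are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

C_values = [0.01, 1.0, 100]
xx, yy = np.meshgrid(np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.02),
                     np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.02))

fig, axes = plt.subplots(1, len(C_values), figsize=(15, 4), sharey=True)
n_support = {}
for ax, C in zip(axes, C_values):
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    n_support[C] = len(model.support_)      # support vectors across all one-vs-one classifiers
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, edgecolors="k")
    ax.set_title(f"C={C}: test acc={model.score(X_test, y_test):.2f}, SVs={n_support[C]}")
    ax.set_xlabel(iris.feature_names[0])
axes[0].set_ylabel(iris.feature_names[1])
plt.tight_layout()
plt.show()
```

Smaller `C` tolerates more margin violations, so more training points end up as support vectors and the boundaries are smoother; larger `C` fits the training points more tightly.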
