In [1]:
# Q1. What is the mathematical formula for a linear SVM?
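# A linear Support Vector Machine (SVM) classifies a point with the decision function \( f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b) \). Training finds the hyperplane \( \mathbf{w} \cdot \mathbf{x} + b = 0 \) that maximizes the margin between the classes; in the hard-margin case this is the optimization problem: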

# \[ \min_{\mathbf{w}, b} \frac{1}{2} \|\mathbf{w}\|^2 \]

# subject to the constraints:

# \[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \quad \text{for all } i = 1, \ldots, n \]

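# where \( \mathbf{w} \) is the weight vector perpendicular to the hyperplane, \( b \) is the bias term, \( \mathbf{x}_i \) are the feature vectors, and \( y_i \) are the class labels (\( y_i = \pm 1 \)). Because the distance between the two marginal planes \( \mathbf{w} \cdot \mathbf{x} + b = \pm 1 \) is \( 2/\|\mathbf{w}\| \), minimizing \( \frac{1}{2}\|\mathbf{w}\|^2 \) maximizes the margin, while the constraints require every training point to be correctly classified. This hard-margin form assumes the training data are linearly separable.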
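# As a worked check of the formula above, the sketch below takes a small hypothetical separable dataset together with a hand-derived solution (the line \( x_1 = 1 \), i.e. \( \mathbf{w} = (1, 0) \), \( b = -1 \)), verifies every constraint \( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \), and applies the decision rule \( \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b) \). The data and the names `X`, `y`, `w`, `b`, `X_new` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Hypothetical linearly separable toy data: class -1 on the left, class +1 on the right
X = np.array([[0.0, 0.0], [-1.0, -1.0], [-2.0, 0.0],
              [2.0, 0.0], [3.0, 1.0], [4.0, 0.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hand-derived maximum-margin solution for this data: the line x1 = 1
w = np.array([1.0, 0.0])
b = -1.0

# Constraint check: y_i (w . x_i + b) >= 1 for every training point
functional_margins = y * (X @ w + b)
print(functional_margins)                          # [1. 2. 3. 1. 2. 3.]
print("all constraints hold:", bool(np.all(functional_margins >= 1)))
print("objective 0.5*||w||^2 =", 0.5 * (w @ w))    # 0.5

# Decision rule of the trained classifier: sign(w . x + b)
X_new = np.array([[0.5, 3.0], [1.5, -2.0]])
predictions = np.sign(X_new @ w + b)
print(predictions)                                 # [-1.  1.]
```

# The two points whose functional margin is exactly 1, (0, 0) and (2, 0), are the ones that make the constraints tight; they are the support vectors of this toy problem.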

In [2]:
# Q2. What is the objective function of a linear SVM?
# The objective of a linear Support Vector Machine (SVM) is to find the optimal hyperplane that separates the classes while maximizing the margin between them. In the hard-margin case, the objective function is:

# \[ \min_{\mathbf{w}, b} \frac{1}{2} \|\mathbf{w}\|^2 \]

# subject to the constraints:

# \[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \quad \text{for all } i = 1, \ldots, n \]

# where:
# - \( \mathbf{w} \) is the weight vector perpendicular to the hyperplane,
# - \( b \) is the bias term,
# - \( \mathbf{x}_i \) are the feature vectors,
# - \( y_i \) are the class labels (\( y_i = \pm 1 \)).

# In this formulation:
# - \( \frac{1}{2} \|\mathbf{w}\|^2 \) represents the regularization term that controls the margin and the complexity of the decision boundary.
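# - The constraints \( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \) ensure that every data point is correctly classified with a functional margin of at least 1, i.e. it lies on or outside the marginal planes \( \mathbf{w} \cdot \mathbf{x} + b = \pm 1 \).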

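# Minimizing \( \|\mathbf{w}\|^2 \) is equivalent to maximizing the margin width \( 2/\|\mathbf{w}\| \) between the support vectors of the two classes, which tends to improve generalization to unseen data. When the data are not linearly separable, the soft-margin version adds slack variables \( \xi_i \geq 0 \): \( \min_{\mathbf{w}, b, \boldsymbol{\xi}} \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n} \xi_i \) subject to \( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i \), where \( C \) trades margin width against training errors.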
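# A minimal sketch of the objective in code, assuming the standard soft-margin primal in hinge-loss form, \( \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b)) \), which reduces to the hard-margin objective whenever every constraint holds. The toy data and the helper `soft_margin_objective` are hypothetical. The sketch compares a violation-free hyperplane with a wider-margin candidate that puts two points inside its margin, showing how \( C \) decides which one is preferred.

```python
import numpy as np

# Hypothetical separable toy data (same layout as a simple two-class example)
X = np.array([[0.0, 0.0], [-1.0, -1.0], [-2.0, 0.0],
              [2.0, 0.0], [3.0, 1.0], [4.0, 0.0]])
y = np.array([-1, -1, -1, 1, 1, 1])


def soft_margin_objective(w, b, X, y, C):
    """0.5*||w||^2 + C * sum of hinge losses (hypothetical helper)."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + C * hinge.sum()


candidates = {
    "narrow (w=(1,0), b=-1)": (np.array([1.0, 0.0]), -1.0),
    "wide (w=(0.5,0), b=-0.5)": (np.array([0.5, 0.0]), -0.5),
}
for C in (1.0, 0.1):
    for name, (w, b) in candidates.items():
        obj = soft_margin_objective(w, b, X, y, C)
        width = 2.0 / np.linalg.norm(w)   # distance between the marginal planes
        print(f"C={C}: {name}: objective={obj:.3f}, margin width={width:.1f}")
```

# With C = 1 the narrow, violation-free hyperplane has the lower objective (0.5 vs 1.125); with C = 0.1 the wider margin wins (0.225 vs 0.5) despite its two margin violations.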

In [3]:
# Q3. What is the kernel trick in SVM?
# The kernel trick in Support Vector Machines (SVMs) is a method to handle nonlinear decision boundaries by implicitly mapping input data into a higher-dimensional feature space without explicitly computing the transformations. Here's how it works:

# 1. **Nonlinear Mapping**: A linear SVM finds a linear decision boundary in the original feature space. However, many real-world problems require nonlinear decision boundaries for accurate classification.

# 2. **Feature Space Expansion**: The kernel trick lets SVMs operate in a higher-dimensional feature space where the data may become linearly separable even if they were not in the original space.

# 3. **Kernel Functions**: Instead of computing the mapping \( \phi \) explicitly, a kernel function \( K(\mathbf{x}, \mathbf{x'}) = \phi(\mathbf{x})^\top \phi(\mathbf{x'}) \) returns the dot product of the mapped vectors directly from the original inputs.

# 4. **Types of Kernels**: Common kernel functions include:
#    - **Linear Kernel**: \( K(\mathbf{x}, \mathbf{x'}) = \mathbf{x}^\top \mathbf{x'} \)
#    - **Polynomial Kernel**: \( K(\mathbf{x}, \mathbf{x'}) = (\gamma \mathbf{x}^\top \mathbf{x'} + r)^d \), with scale \( \gamma > 0 \), offset \( r \), and degree \( d \)
#    - **RBF (Radial Basis Function) Kernel**: \( K(\mathbf{x}, \mathbf{x'}) = \exp(-\gamma \| \mathbf{x} - \mathbf{x'} \|^2) \)

# 5. **Computational Efficiency**: Instead of explicitly mapping each data point into the higher-dimensional space (which can be computationally expensive or even infeasible for infinite-dimensional spaces), kernel functions compute the similarity between data points efficiently.

# 6. **Flexibility**: Different kernels capture different types of relationships between data points, allowing SVMs to fit complex decision boundaries tailored to the problem at hand.

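# 7. **Support Vector Expansion**: In the dual form of the SVM, the training data appear only through dot products, so the kernel can replace them and the decision function becomes \( f(\mathbf{x}) = \operatorname{sign}\big(\sum_{i} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\big) \), where the dual coefficients \( \alpha_i \geq 0 \) are nonzero only for the support vectors. This makes very high-dimensional, even infinite-dimensional, feature spaces tractable, although training cost still grows quickly with the number of samples because the kernel matrix is \( n \times n \).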

# 8. **Practical Application**: The kernel trick significantly enhances SVMs' ability to handle nonlinearities, making them powerful tools for classification tasks in various domains, including image recognition, bioinformatics, and natural language processing.

# 9. **Parameter Tuning**: The choice of kernel and its parameters (such as \( \gamma \) for the RBF kernel) strongly affects the SVM's performance and should be selected carefully, typically by cross-validation with a grid search.

# 10. **Interpretability**: A kernel SVM is harder to interpret than a linear one, because the weight vector lives in the implicit feature space and is not available explicitly. The support vectors still indicate which training points define the decision boundary.
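# To make points 3 and 4 concrete, the sketch below checks numerically that the degree-2 polynomial kernel with \( \gamma = 1, r = 0 \) equals an ordinary dot product after the explicit feature map \( \phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2) \), and that scikit-learn's `polynomial_kernel` and `rbf_kernel` (from `sklearn.metrics.pairwise`) agree with the formulas listed above. The vectors and the helper `phi_poly2` are illustrative choices.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel


def phi_poly2(v):
    """Explicit feature map for K(x, x') = (x . x')^2 in 2D (gamma=1, r=0)."""
    v1, v2 = v
    return np.array([v1 ** 2, np.sqrt(2.0) * v1 * v2, v2 ** 2])


x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

explicit = phi_poly2(x) @ phi_poly2(z)   # dot product in the 3D feature space
implicit = (x @ z) ** 2                  # kernel evaluated in the original 2D space
print(explicit, implicit)                # both equal 25 up to rounding

# scikit-learn's pairwise kernels expect 2D arrays of shape (n_samples, n_features)
sk_poly = polynomial_kernel(x[None, :], z[None, :], degree=2, gamma=1.0, coef0=0.0)[0, 0]

gamma = 0.5
sk_rbf = rbf_kernel(x[None, :], z[None, :], gamma=gamma)[0, 0]
manual_rbf = np.exp(-gamma * np.sum((x - z) ** 2))   # exp(-gamma * ||x - z||^2)
print(sk_poly, sk_rbf, manual_rbf)
```

# The kernel side never builds \( \phi \); for the RBF kernel the corresponding feature space is infinite-dimensional, which is why only the kernel form is usable there.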

In [4]:
# Q4. What is the role of support vectors in SVM? Explain with an example.
# Support vectors are critical components of Support Vector Machines (SVMs) that directly influence the construction of the decision boundary. Here's their role explained with an example:

# 1. **Definition**: Support vectors are the training points that lie closest to the decision boundary (hyperplane): on the marginal planes or, in a soft-margin SVM, inside the margin or on the wrong side. They are the "support" for the optimal separation of the classes.

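# 2. **Determining the Margin**: The margin is the distance from the hyperplane to the nearest training points. In the hard-margin case the support vectors lie exactly on the marginal planes \( \mathbf{w} \cdot \mathbf{x} + b = \pm 1 \), so they fix both the position of the hyperplane and the width of the margin.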

# 3. **Influence on the Model**: Only support vectors influence the hyperplane; training points that are correctly classified and lie strictly outside the margin have zero weight and do not affect the decision boundary.

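# 4. **Example**: Consider a binary problem whose negative class contains the points (0, 0), (-1, -1), (-2, 0) and whose positive class contains (2, 0), (3, 1), (4, 0). The maximum-margin hyperplane is the line \( x_1 = 1 \) (\( \mathbf{w} = (1, 0) \), \( b = -1 \)), and only (0, 0) and (2, 0) lie on the marginal planes, so they are the only support vectors. Deleting any of the other four points, or moving it without crossing the margin, leaves the hyperplane unchanged.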

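# 5. **Critical for Generalization**: Because the boundary depends only on the support vectors, the points nearest the boundary where classification is hardest, the solution is not swayed by easy, well-separated points; this is one intuition for why SVMs often generalize well.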

# 6. **Margin Calculation**: The distance of the support vectors to the decision boundary (margin) is maximized during SVM training, ensuring robust classification performance on unseen data.

# 7. **Sparse Model**: SVMs are often sparse because the decision function is defined by a subset of the training data (the support vectors), making them memory-efficient for deployment.

# 8. **Kernel Methods**: In kernel SVMs, support vectors are crucial as they define the decision boundary in the higher-dimensional feature space induced by the kernel function.

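# 9. **Outlier Robustness**: Points far from the boundary on the correct side have no influence at all. An outlier that lands inside the margin or on the wrong side does become a support vector, but in a soft-margin SVM its dual coefficient is capped at \( C \), which bounds its influence. A hard-margin SVM has no such cap, so a single outlier can shift the boundary sharply or make the problem infeasible.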

# 10. **Interpretation**: Understanding the support vectors helps interpret the SVM model's decision-making process and provides insights into which data points are critical for classification in complex datasets.
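# A small sketch of points 3 and 7, assuming scikit-learn's `SVC` with a linear kernel and a very large `C` to approximate a hard margin on a small hypothetical dataset. The fitted model exposes its support vectors through `support_vectors_`, their indices through `support_`, and the count per class through `n_support_`. Refitting on the support vectors alone should reproduce essentially the same hyperplane, because the remaining points have zero dual coefficients.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable toy data
X = np.array([[0.0, 0.0], [-1.0, -1.0], [-2.0, 0.0],
              [2.0, 0.0], [3.0, 1.0], [4.0, 0.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1e3).fit(X, y)
print("support vectors:\n", clf.support_vectors_)
print("indices:", clf.support_, "per class:", clf.n_support_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])   # close to w=(1, 0), b=-1

# Refit using only the support vectors: the hyperplane should not change
clf_sv = SVC(kernel="linear", C=1e3).fit(X[clf.support_], y[clf.support_])
print("w (SV only) =", clf_sv.coef_[0], "b =", clf_sv.intercept_[0])
```

# Small differences in the printed weights come from the solver's stopping tolerance, not from the non-support points.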

In [5]:
# Q5. Illustrate the hyperplane, marginal planes, soft margin, and hard margin in SVM with examples and graphs.
# Below is a concise explanation of the hyperplane, marginal planes, hard margin, and soft margin in SVM:

# 1. **Hyperplane**: In SVM, the hyperplane \( \mathbf{w} \cdot \mathbf{x} + b = 0 \) is the decision boundary that separates the classes in feature space. In 2D it is a line, in 3D a plane, and in higher dimensions a hyperplane. Example: \( w_1x_1 + w_2x_2 + b = 0 \).

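# 2. **Marginal planes**: These are the two boundaries parallel to the hyperplane, \( \mathbf{w} \cdot \mathbf{x} + b = +1 \) and \( \mathbf{w} \cdot \mathbf{x} + b = -1 \), that pass through the support vectors. The margin is the region between them, of width \( 2/\|\mathbf{w}\| \).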

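# 3. **Hard margin**: An SVM with a hard margin allows no misclassifications and no points inside the margin. It finds the maximum-margin hyperplane that separates the classes perfectly, so it only applies to linearly separable data and can be strongly affected by a single outlier.

# 4. **Soft margin**: A soft-margin SVM introduces slack variables \( \xi_i \geq 0 \) so that some points may fall inside the margin or be misclassified, and penalizes them with \( C \sum_i \xi_i \) in the objective. A small \( C \) tolerates more violations and yields a wider margin; a very large \( C \) approaches the hard-margin solution.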
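# The graphs requested in Q5 can be sketched with matplotlib, assuming scikit-learn's `make_blobs` toy data (`n_samples=40, centers=2, random_state=6`, an arbitrary choice) and a linear `SVC`. Each panel draws the hyperplane \( \mathbf{w} \cdot \mathbf{x} + b = 0 \) (solid), the marginal planes \( \mathbf{w} \cdot \mathbf{x} + b = \pm 1 \) (dashed), and circles the support vectors. A very large `C` stands in for a hard margin and a small `C` gives a soft margin; the helper `fit_linear_svm` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical toy data: two Gaussian blobs
X, y = make_blobs(n_samples=40, centers=2, random_state=6)


def fit_linear_svm(C):
    """Fit a linear SVM and return it with its margin width 2/||w||."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    return clf, 2.0 / np.linalg.norm(clf.coef_[0])


# Grid covering the data, used to evaluate the decision function w.x + b
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

settings = [("Hard margin (C=1000)", 1000.0), ("Soft margin (C=0.01)", 0.01)]
results = {}
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, (label, C) in zip(axes, settings):
    clf, width = fit_linear_svm(C)
    results[label] = (clf, width)
    Z = clf.decision_function(grid).reshape(xx.shape)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=30, edgecolors="k")
    # Level 0 is the hyperplane; levels -1 and +1 are the marginal planes
    ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors="k", linestyles=["--", "-", "--"])
    sv = clf.support_vectors_
    ax.scatter(sv[:, 0], sv[:, 1], s=120, facecolors="none", edgecolors="k")
    ax.set_title(f"{label}: margin width {width:.2f}, {len(sv)} support vectors")
plt.show()
```

# Because \( \|\mathbf{w}\| \) can only grow or stay the same as C increases, the soft-margin panel should show a margin at least as wide as the hard-margin one, typically with more support vectors.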

In [None]:
# Q6. SVM implementation on the Iris dataset.

# Bonus task: Implement a linear SVM classifier from scratch using Python and compare its
# performance with the scikit-learn implementation.
# ~ Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
# ~ Train a linear SVM classifier on the training set and predict the labels for the testing set.
# ~ Compute the accuracy of the model on the testing set.
# ~ Plot the decision boundaries of the trained model using two of the features.
# ~ Try different values of the regularisation parameter C and see how it affects the performance of
# the model.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Taking only the first two features for visualization purposes
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Function to plot decision boundaries
def plot_decision_boundary(clf, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title('Decision Boundary')

# Train a linear SVM classifier using scikit-learn
def train_and_evaluate_svm(C=1.0):
    svm_classifier = SVC(kernel='linear', C=C, random_state=42)
    svm_classifier.fit(X_train, y_train)
    
    # Predict labels on the test set
    y_pred = svm_classifier.predict(X_test)
    
    # Compute accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy with C={C}: {accuracy:.2f}")
    
    # Plot the decision boundary on the training data, labelled with the C used
    plt.figure(figsize=(8, 6))
    plot_decision_boundary(svm_classifier, X_train, y_train)
    plt.title(f'Decision boundary (C={C})')
    plt.show()

# Try different values of C
for C in [0.001, 0.01, 0.1, 1.0, 10.0]:
    train_and_evaluate_svm(C)
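# For the bonus task, one possible from-scratch approach is sketched below: a binary linear SVM trained by full-batch subgradient descent on the soft-margin primal \( \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b)) \). To keep it short it assumes a binary task (setosa vs. the other two species, labels mapped to \( \pm 1 \)) on the same two sepal features, standardizes them with `StandardScaler`, and compares against scikit-learn's `SVC(kernel='linear')`. The class `LinearSVMScratch`, the learning rate, and the iteration count are illustrative assumptions; a three-class version would need a one-vs-rest wrapper.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class LinearSVMScratch:
    """Hypothetical from-scratch binary linear SVM (labels must be -1/+1)."""

    def __init__(self, C=1.0, lr=0.001, n_iters=5000):
        self.C, self.lr, self.n_iters = C, lr, n_iters

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.n_iters):
            # Points with y_i (w.x_i + b) < 1 contribute to the hinge subgradient
            viol = y * (X @ self.w + self.b) < 1
            grad_w = self.w - self.C * (y[viol] @ X[viol])
            grad_b = -self.C * y[viol].sum()
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict(self, X):
        return np.where(X @ self.w + self.b >= 0, 1, -1)


iris = datasets.load_iris()
X_bin = iris.data[:, :2]                      # sepal length, sepal width
y_bin = np.where(iris.target == 0, 1, -1)     # setosa -> +1, others -> -1

Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_bin, y_bin, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(Xb_train)       # fit scaling on training data only
Xb_train, Xb_test = scaler.transform(Xb_train), scaler.transform(Xb_test)

scratch = LinearSVMScratch(C=1.0).fit(Xb_train, yb_train)
sk = SVC(kernel="linear", C=1.0).fit(Xb_train, yb_train)

acc_scratch = accuracy_score(yb_test, scratch.predict(Xb_test))
acc_sklearn = accuracy_score(yb_test, sk.predict(Xb_test))
print(f"From scratch: {acc_scratch:.2f}  scikit-learn: {acc_sklearn:.2f}")
print("w (scratch):", scratch.w, " w (scikit-learn):", sk.coef_[0])
```

# Subgradient descent with a fixed step only approaches the optimum approximately, so its weights will differ somewhat from `sk.coef_`; comparing test accuracies and the direction of the two weight vectors is a reasonable first check.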
