Q1. What is the mathematical formula for a linear SVM?
Answer--The mathematical formula for a linear Support Vector Machine (SVM) can be represented as follows:

Given a set of training data $(x_i, y_i)$, where $x_i$ represents the feature vectors and $y_i$ represents the class labels, the linear SVM aims to find the optimal hyperplane that separates the data into two classes.

For a binary classification problem, the decision function of a linear SVM can be expressed as:

$$f(x) = w^T x + b$$

where:

- $f(x)$ is the decision function that determines the class label of input vector $x$.
- $w$ is the weight vector perpendicular to the hyperplane.
- $x$ is the input feature vector.
- $b$ is the bias term.

The predicted class label $\hat{y}$ for an input vector $x$ is determined by the sign of $f(x)$. Specifically, if $f(x) \ge 0$, the predicted class label is $+1$; otherwise, the predicted class label is $-1$.
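The decision rule can be sketched in a few lines of plain Python. The weight vector `w`, the bias `b`, and the two input points below are hypothetical values chosen only for illustration; they do not come from a trained model.

```python
def decision_function(w, x, b):
    """Return f(x) = w^T x + b for plain Python sequences w and x."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def predict(w, x, b):
    """Return +1 if f(x) >= 0, otherwise -1."""
    return 1 if decision_function(w, x, b) >= 0 else -1


# Hypothetical parameters, chosen only for illustration
w = [2.0, -1.0]
b = -0.5

x_a = [1.0, 0.5]   # f(x_a) = 2*1.0 - 1*0.5 - 0.5 = 1.0
x_b = [0.0, 1.0]   # f(x_b) = 2*0.0 - 1*1.0 - 0.5 = -1.5

score_a = decision_function(w, x_a, b)
score_b = decision_function(w, x_b, b)
label_a = predict(w, x_a, b)
label_b = predict(w, x_b, b)

print("f(x_a) =", score_a, "-> class", label_a)
print("f(x_b) =", score_b, "-> class", label_b)
```

Only the sign of the score decides the class; its magnitude is proportional to the distance of the point from the hyperplane.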

Q2. What is the objective function of a linear SVM?
Answer--The objective function of a linear Support Vector Machine (SVM) is 
formulated to find the optimal hyperplane that separates the data into 
different classes while maximizing the margin between the classes. In a 
linear SVM, the objective function is designed to minimize the classification
error and maximize the margin simultaneously.

The objective function of a linear SVM can be expressed as a combination of two components:

Margin Maximization: The SVM aims to maximize the margin, which is the distance
between the hyperplane and the nearest data points (support vectors). Maximizing 
the margin helps improve the generalization ability of the classifier.

Classification Error Minimization: The SVM also aims to minimize the classification error, 
ensuring that data points are correctly classified according to their respective classes.

The objective function of a linear SVM is often defined as a convex optimization problem.
Mathematically, it can be represented as:

$$\min_{w,b} \ \frac{1}{2}\|w\|^2$$

subject to the constraint:

$$y_i (w^T x_i + b) \ge 1$$

for $i = 1, 2, \ldots, n$, where:

- $w$ is the weight vector perpendicular to the hyperplane.
- $b$ is the bias term.
- $\|w\|^2$ represents the squared Euclidean norm of the weight vector.
- $y_i$ represents the class label of the $i$th training example.
- $x_i$ represents the feature vector of the $i$th training example.
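As a rough illustration of the two components, the sketch below evaluates the margin term $\frac{1}{2}\|w\|^2$ and checks the constraints $y_i (w^T x_i + b) \ge 1$ on a tiny data set. The data and the candidate weights are hypothetical. The `soft_margin_objective` function is an addition not written out above: it is the standard soft-margin form, which replaces the hard constraints with a hinge-loss penalty weighted by `C`.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def hard_margin_objective(w):
    """Margin term of the objective: (1/2) * ||w||^2."""
    return 0.5 * dot(w, w)


def constraints_satisfied(w, b, X, y):
    """True if y_i * (w^T x_i + b) >= 1 holds for every training example."""
    return all(y_i * (dot(w, x_i) + b) >= 1 for x_i, y_i in zip(X, y))


def soft_margin_objective(w, b, X, y, C):
    """(1/2)*||w||^2 + C * sum of hinge losses (standard soft-margin form)."""
    hinge = sum(max(0.0, 1 - y_i * (dot(w, x_i) + b)) for x_i, y_i in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge


# Hypothetical toy data with labels in {-1, +1}
X = [(2.0, 2.0), (3.0, 1.0), (-2.0, -1.0), (-1.0, -3.0)]
y = [1, 1, -1, -1]

w_feasible, b_feasible = (0.5, 0.5), 0.0   # satisfies every constraint
w_small, b_small = (0.1, 0.1), 0.0         # smaller norm, but violates constraints

for name, w, b in [("feasible", w_feasible, b_feasible), ("small", w_small, b_small)]:
    print(name,
          "objective:", hard_margin_objective(w),
          "constraints ok:", constraints_satisfied(w, b, X, y),
          "soft objective (C=1):", soft_margin_objective(w, b, X, y, C=1.0))
```

The second candidate has the smaller norm but breaks the constraints, which is why the norm cannot be minimized on its own: the constraints (or the hinge penalty) are what tie the margin to correct classification.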

Q3. What is the kernel trick in SVM?
Answer--The kernel trick is a fundamental concept in Support Vector Machine 
(SVM) algorithms, particularly in cases where the data is not linearly separable
in its original feature space. It allows SVMs to implicitly map input data into
higher-dimensional feature spaces without explicitly computing the transformed 
feature vectors. This transformation enables SVMs to find linear decision boundaries 
in higher-dimensional spaces, even though the classification problem might be
non-linear in the original feature space.

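The kernel trick works by introducing a kernel function $K$ that computes the inner product between pairs of data points in the higher-dimensional
space without explicitly computing the transformation. Mathematically, the kernel function $K$ is defined as:

$$K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$

where $\phi$ is the mapping from the original feature space to the higher-dimensional space.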
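To make the idea concrete, the sketch below uses the degree-2 homogeneous polynomial kernel $K(x, z) = (x^T z)^2$ on two-dimensional inputs. This kernel is chosen here only as an example. The function `phi` is one explicit feature map whose inner product reproduces it, and the two sample points are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit 3-D feature map for a 2-D input matching poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


# Hypothetical sample points
x = (1.0, 2.0)
z = (3.0, -1.0)

kernel_value = poly2_kernel(x, z)        # computed in the original 2-D space
explicit_value = dot(phi(x), phi(z))     # computed in the 3-D feature space

print("kernel:", kernel_value)
print("explicit mapping:", explicit_value)
print("match:", math.isclose(kernel_value, explicit_value))
```

Both routes give the same number up to floating-point rounding, but the kernel never builds the three-dimensional vectors. For kernels such as the RBF kernel, where the feature space is infinite-dimensional, the kernel route is the only practical one.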
    
Q4. What is the role of support vectors in SVM? Explain with an example.
Answer--In Support Vector Machine (SVM), support vectors play a crucial role in 
defining the decision boundary or hyperplane that separates different classes in 
the feature space. Support vectors are the data points that lie closest to the
decision boundary, and they effectively determine the position and orientation
of the decision boundary.

Here's how support vectors function in SVM with an example:

Consider a binary classification problem where we want to distinguish between 
two classes, say Class A and Class B, in a two-dimensional feature space.
We have the following data points:
Class A: {(-1, 1), (0, 0), (1, 1)}
Class B: {(-1, -1), (0, -1), (1, -1)}
In the feature space, the SVM algorithm seeks to find the optimal 
hyperplane that maximizes the margin between the two classes while 
minimizing classification errors. The hyperplane is defined by the
support vectors, which are the data points closest to the decision boundary.

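In this example, the maximum-margin hyperplane is the horizontal line $x_2 = -0.5$, which
corresponds to $w = (0, 2)$ and $b = 1$. The points that lie on the margin are (0, 0) from
Class A and (-1, -1), (0, -1), and (1, -1) from Class B, so the support vectors are drawn from
these points. The points (-1, 1) and (1, 1) lie farther from the boundary and could be removed
without changing it.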
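The margins in this example can be checked by hand. The sketch below assumes the maximum-margin solution for this data is $w = (0, 2)$, $b = 1$. That solution is worked out here from the geometry: the closest points of the two classes are (0, 0) and (0, -1). It is an assumption of the sketch, not the output of a solver. A point lies on the margin exactly when $y_i (w^T x_i + b) = 1$.

```python
# Data from the example, with Class A labelled +1 and Class B labelled -1
points = {
    "A": [(-1, 1), (0, 0), (1, 1)],
    "B": [(-1, -1), (0, -1), (1, -1)],
}
labels = {"A": 1, "B": -1}

# Assumed maximum-margin solution (derived by hand for this sketch)
w = (0.0, 2.0)
b = 1.0


def functional_margin(x, y):
    """Return y * (w^T x + b); equals 1 for points lying on the margin."""
    return y * (w[0] * x[0] + w[1] * x[1] + b)


on_margin = []
for cls, pts in points.items():
    for x in pts:
        m = functional_margin(x, labels[cls])
        if abs(m - 1.0) < 1e-9:
            on_margin.append(x)
        print(f"Class {cls} point {x}: functional margin = {m}")

print("Points on the margin:", on_margin)
```

Every point has a functional margin of at least 1, so the constraints hold. Only points with a value of exactly 1 can act as support vectors. Which of the three Class B points a particular solver reports may vary, because they all lie on the same margin line.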

Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in
SVM?
Answer--To illustrate the concepts of hyperplane, marginal plane, soft margin, 
and hard margin in SVM, let's consider a simple two-dimensional classification 
problem with two classes: Class 1 (blue circles) and Class 2 (red squares).

Here are the definitions of each concept:

Hyperplane: In SVM, a hyperplane is a decision boundary that separates data points
of different classes. For a two-dimensional problem, the hyperplane is a line;
in three dimensions it is a plane, and in general it has one dimension fewer than
the feature space. The goal of SVM is to find the optimal hyperplane that maximizes
the margin between the classes.

Marginal plane: The marginal plane is the plane parallel to the hyperplane and 
touching the support vectors. It defines the margins of the SVM classifier.

Soft margin: In soft margin SVM, the classifier allows for some misclassification 
of training examples to achieve a wider margin and better generalization to unseen
data. Soft margin SVM uses a penalty parameter (C) to control the trade-off between 
maximizing the margin and minimizing the classification error.

Hard margin: In hard margin SVM, the classifier does not allow any misclassification
of training examples. It seeks to find a hyperplane that perfectly separates the
classes if such a hyperplane exists. Hard margin SVM can be sensitive to outliers and noisy data.

Let's visualize these concepts using matplotlib in Python:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

# Generate some sample data
np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
y = [0] * 20 + [1] * 20

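# Fit the models
# A very large C approximates a hard margin (almost no violations are tolerated);
# a small C gives a soft margin that accepts violations in exchange for a wider margin.
clf_hard = svm.SVC(kernel='linear', C=1000)  # Approximately hard margin SVM
clf_soft = svm.SVC(kernel='linear', C=0.1)   # Soft margin SVM
clf_hard.fit(X, y)
clf_soft.fit(X, y)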

# Get the separating hyperplane
w_hard = clf_hard.coef_[0]
w_soft = clf_soft.coef_[0]
a_hard = -w_hard[0] / w_hard[1]
a_soft = -w_soft[0] / w_soft[1]
xx = np.linspace(-5, 5)
yy_hard = a_hard * xx - (clf_hard.intercept_[0]) / w_hard[1]
yy_soft = a_soft * xx - (clf_soft.intercept_[0]) / w_soft[1]

# Plot the data
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=30)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

# Plot the hyperplanes, margins, and support vectors
# The margin lines satisfy w.x + b = +/-1, so they sit at perpendicular distance
# 1/||w|| from the hyperplane, i.e. a vertical offset of sqrt(1 + a^2) / ||w||.
offset_hard = np.sqrt(1 + a_hard ** 2) / np.linalg.norm(w_hard)
offset_soft = np.sqrt(1 + a_soft ** 2) / np.linalg.norm(w_soft)
plt.plot(xx, yy_hard, 'k-', label='Hard margin')
plt.plot(xx, yy_soft, 'k--', label='Soft margin')
plt.plot(xx, yy_hard + offset_hard, 'k:')
plt.plot(xx, yy_hard - offset_hard, 'k:')
plt.plot(xx, yy_soft + offset_soft, 'k-.')
plt.plot(xx, yy_soft - offset_soft, 'k-.')
plt.scatter(clf_hard.support_vectors_[:, 0], clf_hard.support_vectors_[:, 1],
            s=100, facecolors='none', edgecolors='k', label='Support vectors (hard margin)')

plt.legend()
plt.show()
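The effect of the penalty parameter can also be read off numerically. The sketch below regenerates the same kind of two-cluster data and, for a few values of `C`, reports the margin width $2 / \|w\|$ and the number of support vectors. The specific `C` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn import svm

# Two Gaussian clusters, generated the same way as in the plot above
rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) - [2, 2], rng.randn(20, 2) + [2, 2]]
y = [0] * 20 + [1] * 20

results = {}
for C in (0.01, 0.1, 1, 1000):
    clf = svm.SVC(kernel='linear', C=C).fit(X, y)
    # The margin boundaries are w.x + b = +/-1, so they are 2/||w|| apart
    margin_width = 2 / np.linalg.norm(clf.coef_[0])
    n_support = len(clf.support_vectors_)
    results[C] = (margin_width, n_support)
    print(f"C={C}: margin width = {margin_width:.3f}, support vectors = {n_support}")
```

Smaller values of `C` penalize margin violations less, so the optimizer can afford a smaller $\|w\|$ and therefore a wider margin. Larger values push the solution toward the hard-margin case.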
Q6. SVM Implementation through Iris dataset.

- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
- Train a linear SVM classifier on the training set and predict the labels for the testing set.
- Compute the accuracy of the model on the testing set.
- Plot the decision boundaries of the trained model using two of the features.
- Try different values of the regularisation parameter C and see how it affects the performance of the model.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its
performance with the scikit-learn implementation.
Answer--
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train a linear SVM classifier
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)

# Step 4: Predict the labels for the testing set
y_pred = clf.predict(X_test)

# Step 5: Compute the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Step 6: Plot the decision boundaries using two features
# The classifier above was trained on all four features, so it cannot predict on a
# two-feature grid. Train a separate classifier on the first two features instead.
X_subset = X[:, :2]
X_train_subset = X_train[:, :2]
X_test_subset = X_test[:, :2]
clf_2d = SVC(kernel='linear')
clf_2d.fit(X_train_subset, y_train)

# Evaluate the two-feature classifier on a grid that covers the data
x_min, x_max = X_subset[:, 0].min() - 1, X_subset[:, 0].max() + 1
y_min, y_max = X_subset[:, 1].min() - 1, X_subset[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
Z = clf_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X_train_subset[:, 0], X_train_subset[:, 1], c=y_train, s=20, edgecolors='k', label='Training set')
plt.scatter(X_test_subset[:, 0], X_test_subset[:, 1], c=y_test, s=100, marker='x', label='Testing set')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Decision Boundaries of Linear SVM')
plt.legend()
plt.show()

# Step 7: Experiment with different values of the regularization parameter C
C_values = [0.1, 1, 10, 100]
for C in C_values:
    clf = SVC(kernel='linear', C=C)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"C={C}: accuracy={accuracy:.3f}")
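For the bonus task, one possible from-scratch approach is sketched below. It is a minimal sketch under several assumptions and not a reference implementation:

- It handles only a binary problem: setosa vs. versicolor, relabelled to -1 and +1.
- It minimizes $\frac{1}{2}\|w\|^2 + C \cdot \text{mean hinge loss}$ by full-batch subgradient descent, which is not the solver scikit-learn uses.
- The learning rate, the epoch count, and the helper names `train_linear_svm` and `predict` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + C*mean(hinge loss) by subgradient descent.

    y must contain labels in {-1, +1}.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # examples that violate the margin
        # Subgradient of the objective with respect to w and b
        grad_w = w - C * (y[active] @ X[active]) / n_samples
        grad_b = -C * y[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Return +1 where w^T x + b >= 0 and -1 elsewhere."""
    return np.where(X @ w + b >= 0, 1, -1)


# Binary subset of Iris: setosa (0) vs. versicolor (1), relabelled to -1/+1
iris = load_iris()
mask = iris.target < 2
X = iris.data[mask]
y = np.where(iris.target[mask] == 0, -1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize with training-set statistics so gradient descent behaves well
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std

w, b = train_linear_svm(X_train_s, y_train, C=1.0)
scratch_accuracy = np.mean(predict(X_test_s, w, b) == y_test)

sk_clf = SVC(kernel='linear', C=1.0).fit(X_train_s, y_train)
sklearn_accuracy = sk_clf.score(X_test_s, y_test)

print("From-scratch accuracy:", scratch_accuracy)
print("scikit-learn accuracy:", sklearn_accuracy)
```

The two models optimize slightly different objectives (mean hinge loss here, summed hinge loss in scikit-learn), so their weights need not match even when their test accuracies are similar. Extending the sketch to all three Iris classes would need a one-vs-rest or one-vs-one scheme.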
