1) What is the mathematical formula for a linear SVM?

A linear SVM (Support Vector Machine) is a type of classifier that separates data points using a hyperplane. The mathematical formula for a linear SVM can be expressed as:

f(x) = sign(w^T x + b)

where:

x is the input data point 

w is a vector of weights that determines the orientation of the hyperplane (w is normal to it)

b is a scalar bias term that shifts the hyperplane away from the origin

sign() is the sign function, which returns -1 if the argument is negative, 0 if it is zero, and +1 if it is positive.
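
As a minimal sketch of the decision rule above, the snippet below evaluates sign(w^T x + b) with NumPy. The values of `w` and `b` are hand-picked for illustration (they are not learned from data), and `linear_svm_predict` is a hypothetical helper name.

```python
import numpy as np

def linear_svm_predict(X, w, b):
    """Return sign(w^T x + b) for each row x of X."""
    return np.sign(X @ w + b)

# Hypothetical, hand-picked parameters (not the result of training).
w = np.array([1.0, -1.0])
b = -0.5

X = np.array([[2.0, 0.0],   # score  1.5 -> positive side
              [0.0, 2.0],   # score -2.5 -> negative side
              [1.0, 0.5]])  # score  0.0 -> exactly on the hyperplane
predictions = linear_svm_predict(X, w, b)
print(predictions)
```

A score of exactly zero means the point lies on the hyperplane itself, which is why sign() can return 0.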

The goal of training a linear SVM is to find the values of w and b that maximize the margin (distance) between the hyperplane and the closest data points from each class. This is achieved by solving an optimization problem that minimizes the norm of w subject to the constraint that all data points are classified correctly with a margin of at least 1.

2) What is the objective function of a linear SVM?

The objective of a linear SVM is to find the hyperplane that maximizes the margin, defined as the distance between the hyperplane and the closest data points from each class. A larger margin tends to give better generalization performance on unseen data.

The objective function of a linear SVM can be expressed as a constrained optimization problem:

minimize 1/2 ||w||^2

subject to y_i(w^T x_i + b) >= 1 for all i = 1, ..., n

where:

w is a vector of weights that determines the orientation of the hyperplane (w is normal to it)

b is a scalar bias term that shifts the hyperplane away from the origin

x_i is the i-th input data point 

y_i is the corresponding label (+1 or -1) of the i-th data point

n is the number of data points.

The objective 1/2 ||w||^2 acts as a regularization term that penalizes large weights; because the margin equals 1/||w||, minimizing it is the same as maximizing the margin. The constraint enforces that all data points are classified correctly with a functional margin of at least 1: the product of the actual label y_i and the raw score w^T x_i + b must be greater than or equal to 1 for every data point.
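
To make the constrained problem concrete, here is a small sketch that checks the constraints y_i(w^T x_i + b) >= 1 and evaluates the objective for one candidate hyperplane. The toy points and the candidate `w`, `b` are assumptions chosen by hand; nothing here solves the optimization itself.

```python
import numpy as np

# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Hand-picked candidate hyperplane (an assumption, not a solver's output).
w = np.array([0.5, 0.5])
b = 0.0

functional_margins = y * (X @ w + b)              # y_i (w^T x_i + b) per point
feasible = bool(np.all(functional_margins >= 1))  # are all constraints satisfied?
objective = 0.5 * np.dot(w, w)                    # 1/2 ||w||^2
margin_width = 2.0 / np.linalg.norm(w)            # distance between the two margin planes

print(functional_margins, feasible, objective, margin_width)
```

The points whose functional margin is exactly 1 are the ones sitting on the margin boundary; shrinking ||w|| any further would violate their constraints.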

3) What is the kernel trick in SVM?

The kernel trick is a technique used in support vector machines (SVM) that allows the algorithm to efficiently find non-linear decision boundaries in high-dimensional feature spaces.

In SVM, the goal is to find a hyperplane that separates two classes of data points in a way that maximizes the margin, or the distance between the hyperplane and the closest points of each class. However, when the data is not linearly separable in the original feature space, the SVM algorithm can't find a linear decision boundary that maximizes the margin.

The kernel trick solves this problem by implicitly mapping the data into a higher-dimensional space, where it may become linearly separable. The mapping itself is a nonlinear feature map; the kernel function calculates the dot product between pairs of data points in the transformed space, without explicitly computing the coordinates of the data points in that space.

The most common kernel functions used in SVM are the linear kernel, polynomial kernel, and radial basis function (RBF) kernel. The RBF kernel is the most commonly used kernel function as it has proven to be effective in many practical applications.

The kernel trick enables SVM to learn non-linear decision boundaries in high-dimensional feature spaces without actually computing the coordinates of the transformed data points. This keeps the computation tractable even when the transformed space is very high-dimensional (or infinite-dimensional, as with the RBF kernel).
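
The idea can be checked numerically. The sketch below compares the degree-2 polynomial kernel K(x, z) = (x . z)^2, computed in the original 2D space, with the dot product of an explicit 3D feature map. The function names `phi` and `poly2_kernel` and the sample points are illustrative assumptions.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the homogeneous degree-2 polynomial kernel in 2D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without leaving the original 2D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))  # dot product in the 3D transformed space
implicit = poly2_kernel(x, z)      # same value via the kernel function
print(explicit, implicit)
```

Both numbers agree (up to floating-point rounding), but the kernel version never builds the 3D vectors. For kernels such as RBF the explicit map cannot be written out as a finite vector at all, so the kernel form is the only practical option.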

4) What is the role of support vectors in SVM? Explain with an example.

In support vector machines (SVM), support vectors are the data points that lie closest to the decision boundary, or hyperplane, and therefore play a critical role in defining the decision boundary and the margin.

Support vectors are important because they determine the orientation and position of the decision boundary, as well as the width of the margin. The decision boundary is defined as the hyperplane that separates the data into two classes, with the maximum margin possible, i.e., the distance between the hyperplane and the closest data points of each class.

The support vectors are the data points that lie on the margin boundary or within the margin, and their positions determine the hyperplane's location and orientation. In other words, if we move or remove a support vector, the decision boundary can change and the margin with it, whereas removing any other point leaves the boundary unchanged.

Example: consider a simple binary classification problem where we have two classes of data points in a 2D feature space, and the goal is to find a linear decision boundary that separates the two classes. The following figure shows the two classes of data points and a possible decision boundary (blue line) that separates them.

In this case, we can see that there are three data points (circled in red) that lie on the margin boundary or within the margin, and they are the support vectors. If we move or remove any of these support vectors, the decision boundary and the margin would change.

The SVM algorithm finds the support vectors by solving an optimization problem that aims to maximize the margin between the classes while minimizing the classification error. The support vectors are the data points that are closest to the decision boundary, and they are used to calculate the hyperplane's orientation and position.
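
The following sketch illustrates this with scikit-learn on a hypothetical four-point toy set. It fits a linear `SVC`, reads the support vectors from the fitted model, then refits on the support vectors alone to check that the boundary stays the same. A very large `C` is used here as an assumption to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two points per class, separable along the first feature.
X = np.array([[0.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates a hard margin on this separable set.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)

# Refit using only the support vectors (all other points removed).
sv_idx = clf.support_
clf_sv_only = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])

# The hyperplane parameters should match up to solver tolerance.
same_boundary = (np.allclose(clf.coef_, clf_sv_only.coef_, atol=1e-2)
                 and np.allclose(clf.intercept_, clf_sv_only.intercept_, atol=1e-2))
print("same boundary after dropping non-support vectors:", same_boundary)
```

Only the two points closest to the opposite class end up as support vectors; the other two could be moved further away or deleted without affecting the fitted hyperplane.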

Let's consider a simple example of a binary classification problem with two features, x1 and x2. We want to separate the two classes using SVM.

The red points belong to one class, and the blue points belong to the other class. We can see that there is no linear boundary that can separate the two classes.

Now, let's explore the different concepts in SVM:

Hyperplane:

In SVM, a hyperplane is a decision boundary that separates the two classes. For a two-dimensional dataset, the hyperplane is a line; for a three-dimensional dataset, it is a plane; in general, for d features it is a (d-1)-dimensional flat surface.

ex:

[Figure: SVM_hyperplane]

The line shown in the figure is the hyperplane. It separates the red and blue classes. The points closest to the hyperplane, which lie on the margin planes, are called support vectors.

Margin plane:

A margin plane is a hyperplane parallel to the decision hyperplane that passes through the support vectors of one class; there is one on each side. The margin is the distance between the hyperplane and a margin plane.

ex:

[Figure: SVM_marginplane]

The blue line is the margin plane, and the red line is the hyperplane. The margin is the distance between the two lines. In this case, the margin is maximized.
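
The distance between the hyperplane and a margin plane can be worked out directly. The short derivation below assumes the canonical scaling used in the constraint y_i(w^T x_i + b) >= 1, so the margin planes are w^T x + b = +1 and w^T x + b = -1.

```latex
% Start from a point x_0 on the hyperplane and step a distance d along the unit normal.
w^\top x_0 + b = 0, \qquad x = x_0 + d \, \frac{w}{\lVert w \rVert}

% Substitute into the score and require that x lies on the margin plane.
w^\top x + b = \underbrace{w^\top x_0 + b}_{=\,0} + d \, \frac{w^\top w}{\lVert w \rVert} = d \, \lVert w \rVert = 1

% Hence the margin, and the distance between the two margin planes:
d = \frac{1}{\lVert w \rVert}, \qquad 2d = \frac{2}{\lVert w \rVert}
```

This is why minimizing ||w|| is equivalent to maximizing the margin.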

Hard margin:

In a hard margin SVM, the goal is to find a hyperplane that separates the two classes with maximum margin and with no misclassified points. This is only possible when the two classes are linearly separable.

ex:

[Figure: SVM_hardmargin]

The red and blue lines represent the hyperplane and margin plane, respectively. All the points are correctly classified, and the margin is maximized.

Soft margin:

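In a soft margin SVM, the goal is to find a hyperplane that separates the two classes with as large a margin as possible while allowing some points to fall inside the margin or be misclassified. The regularisation parameter C controls this trade-off: a larger C penalizes violations more heavily. This is necessary when the two classes are not linearly separable.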

ex:

[Figure: SVM_softmargin]

The red and blue lines represent the hyperplane and margin plane, respectively. In this case, there are some misclassified points (the blue point on the left), but the margin is still as wide as the penalty on those points allows.
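
The text above does not write out the soft-margin objective, so the sketch below assumes the standard formulation: 1/2 ||w||^2 + C * sum of slacks, where each slack is max(0, 1 - y_i(w^T x_i + b)). The data, the hyperplane and the helper name `soft_margin_objective` are illustrative assumptions.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * slacks.sum(), slacks

# Hypothetical data: the last two points violate the margin of the hand-picked hyperplane.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.5, 0.0], [1.5, 0.0]])
y = np.array([-1, 1, -1, -1])
w = np.array([1.0, 0.0])
b = -1.0

obj_small_C, slacks = soft_margin_objective(w, b, X, y, C=1.0)
obj_large_C, _ = soft_margin_objective(w, b, X, y, C=10.0)
print(slacks, obj_small_C, obj_large_C)
```

A slack between 0 and 1 marks a point inside the margin but still on the correct side, while a slack above 1 marks a misclassified point. Raising C makes the same violations cost more, which pushes the optimizer toward a narrower margin with fewer violations.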

6) SVM Implementation through Iris dataset.
~ Load the iris dataset from the scikit-learn library and split it into a training set and a testing set

~ Train a linear SVM classifier on the training set and predict the labels for the testing set

~ Compute the accuracy of the model on the testing set

~ Plot the decision boundaries of the trained model using two of the features

~ Try different values of the regularisation parameter C and see how it affects the performance of the model.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its
performance with the scikit-learn implementation.

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


In [11]:
iris=load_iris()

In [12]:
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)


In [13]:
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)


In [14]:
y_pred=svm.predict(X_test)

In [15]:
y_pred

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

In [16]:
accuracy_score(y_test,y_pred)

1.0

In [17]:
f1=0
f2=1


In [19]:
x_min, x_max = X_train[:, f1].min() - 1, X_train[:, f1].max() + 1
y_min, y_max = X_train[:, f2].min() - 1, X_train[:, f2].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))


In [None]:
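# svm was fit on all 4 features; fit a separate 2-feature model for the 2D plot
svm_2d = SVC(kernel='linear').fit(X_train[:, [f1, f2]], y_train)
Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)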

In [None]:
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X_train[:, f1], X_train[:, f2], c=y_train, cmap=plt.cm.coolwarm)
plt.xlabel(iris.feature_names[f1])
plt.ylabel(iris.feature_names[f2])
plt.title("SVM Decision Boundaries")
plt.show()

In [24]:
for C in [0.1, 1, 10]:
    svm = SVC(kernel='linear', C=C)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy (C={}): {}".format(C, accuracy))

Accuracy (C=0.1): 1.0
Accuracy (C=1): 1.0
Accuracy (C=10): 0.9666666666666667
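
With only 30 test samples, a single misclassified point changes the accuracy by about 3.3 percentage points, so the differences above are hard to interpret. As a hedged alternative (a technique not used in the cells above), the sketch below repeats the comparison with 5-fold cross-validation via scikit-learn's `cross_val_score`; the choice of `cv=5` is an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
mean_scores = {}
for C in [0.1, 1, 10]:
    # One accuracy value per fold; the classifier is refit on each training split.
    scores = cross_val_score(SVC(kernel="linear", C=C), iris.data, iris.target, cv=5)
    mean_scores[C] = scores.mean()
    print("C={}: mean CV accuracy {:.3f} (std {:.3f})".format(C, scores.mean(), scores.std()))
```

Averaging over folds uses every sample for testing once, which gives a steadier basis for comparing values of C than one small hold-out split.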


Implement a linear SVM classifier from scratch using Python and compare its
performance with the scikit-learn implementation.

In [25]:
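class LinearSVM:
    """Linear SVM trained by per-sample subgradient descent on the hinge loss.

    Note: the decision function here is w.x - b (bias subtracted), so the
    learned bias has the opposite sign to b in f(x) = sign(w^T x + b).
    """

    def __init__(self, lr=0.01, lambda_reg=0.01, num_epochs=1000):
        self.lr = lr                  # learning rate (step size)
        self.lambda_reg = lambda_reg  # L2 regularization strength
        self.num_epochs = num_epochs  # number of passes over the training set
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        # Map labels to {-1, +1}: anything <= 0 becomes -1
        y_ = np.where(y <= 0, -1, 1)

        for epoch in range(self.num_epochs):
            for i in range(n_samples):
                # True when sample i is on the correct side, on or outside the margin
                condition = y_[i] * (np.dot(X[i], self.weights) - self.bias) >= 1
                if condition:
                    # Only the regularizer contributes to the gradient
                    self.weights -= self.lr * (2 * self.lambda_reg * self.weights)
                else:
                    # Hinge loss is active: include its subgradient as well
                    self.weights -= self.lr * (2 * self.lambda_reg * self.weights - y_[i] * X[i])
                    self.bias -= self.lr * y_[i]

    def predict(self, X):
        # Returns labels in {-1, +1} (0 only for points exactly on the hyperplane)
        linear_output = np.dot(X, self.weights) - self.bias
        return np.sign(linear_output)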
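
The update rules in `fit` can be read as subgradient steps on a per-sample regularized hinge loss. The derivation below is a reconstruction from the code, under the assumption that this is the loss being minimized, and it uses the code's convention of subtracting the bias.

```latex
% Per-sample objective (bias subtracted, as in the code):
J_i(w, b) = \lambda \lVert w \rVert^2 + \max\bigl(0,\; 1 - y_i (w^\top x_i - b)\bigr)

% Subgradient with respect to w:
\nabla_w J_i =
\begin{cases}
2 \lambda w & \text{if } y_i (w^\top x_i - b) \ge 1 \\
2 \lambda w - y_i x_i & \text{otherwise}
\end{cases}

% Subgradient with respect to b:
\frac{\partial J_i}{\partial b} =
\begin{cases}
0 & \text{if } y_i (w^\top x_i - b) \ge 1 \\
y_i & \text{otherwise}
\end{cases}

% Gradient step with learning rate \eta (lr in the code):
w \leftarrow w - \eta \, \nabla_w J_i, \qquad b \leftarrow b - \eta \, \frac{\partial J_i}{\partial b}
```

Each case corresponds to one branch of the `if condition:` statement: the first branch applies only the regularization term, and the second adds the hinge-loss term for samples that violate the margin.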

In [29]:
from sklearn.datasets import make_blobs

In [30]:
X, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=42)

In [32]:
from sklearn.svm import LinearSVC

In [33]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = LinearSVM()
clf.fit(X_train, y_train)
clf_sklearn = LinearSVC()
clf_sklearn.fit(X_train, y_train)
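# predict() returns -1/+1 but y_test holds 0/1, so map the labels first;
# comparing them directly (as in the recorded output below) can only match class 1.
accuracy = np.mean(clf.predict(X_test) == np.where(y_test <= 0, -1, 1))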
accuracy_sklearn = clf_sklearn.score(X_test, y_test)

print("Accuracy (our implementation): {:.2f}%".format(accuracy*100))
print("Accuracy (scikit-learn implementation): {:.2f}%".format(accuracy_sklearn*100))

Accuracy (our implementation): 49.67%
Accuracy (scikit-learn implementation): 100.00%
