### Q1. What is the mathematical formula for a linear SVM?

A linear SVM (Support Vector Machine) is a type of binary classifier that separates two classes by finding the hyperplane with the maximum margin between them. The mathematical formula for a linear SVM can be expressed as follows:

Given a training dataset of labeled points (x1, y1), (x2, y2), ..., (xn, yn), where xi is the input feature vector of dimension d, and yi is either +1 or -1 denoting the class label of the corresponding point, the objective of a linear SVM is to find the hyperplane w.x + b = 0 that maximizes the margin between the two classes.

Here, w is the weight vector perpendicular to the hyperplane, and b is the bias term.

The optimization problem for a linear SVM can be formulated as follows:

minimize ||w|| subject to yi(w.xi + b) >= 1 for all i

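where ||w|| is the L2-norm of the weight vector w, and yi(w.xi + b) is the functional margin of the i-th point with respect to the hyperplane. Because the geometric margin between the two classes has width 2/||w||, minimizing ||w|| is equivalent to maximizing the margin.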

The solution to the optimization problem is the weight vector w and the bias term b that satisfy the above constraints with the smallest ||w||, and therefore the largest margin. Once the optimal hyperplane is found, a new point x is classified by the sign of w.x + b: a positive value assigns the point to class +1, and a negative value assigns it to class -1.
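As a minimal sketch of this decision rule, the snippet below evaluates the sign of w.x + b for a hand-picked hyperplane. The helper name `predict` and the values of `w` and `b` are assumptions chosen for illustration; they are not parameters learned from data.

```python
def predict(w, b, x):
    """Return +1 or -1 depending on which side of w.x + b = 0 the point x lies."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 3 = 0 (assumed for illustration only)
w = [1.0, 1.0]
b = -3.0

print(predict(w, b, [3.0, 2.0]))  # score = 2.0, so the point is assigned to class +1
print(predict(w, b, [0.5, 1.0]))  # score = -1.5, so the point is assigned to class -1
```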

### Q2. What is the objective function of a linear SVM?

The objective of a linear SVM (Support Vector Machine) is to find the hyperplane that maximizes the margin between two classes. The margin is the distance between the hyperplane and the closest points of each class; those closest points are known as support vectors.

Given a training dataset of labeled points (x1, y1), (x2, y2), ..., (xn, yn), where xi is the input feature vector of dimension d, and yi is either +1 or -1 denoting the class label of the corresponding point, the objective of a linear SVM can be formulated as follows:

minimize (1/2)||w||^2 subject to yi(w.xi + b) >= 1 for all i

where ||w|| is the L2-norm of the weight vector w, and yi(w.xi + b) is the functional margin of the i-th point with respect to the hyperplane. The constant 1/2 and the square are used for mathematical convenience (they give a smooth quadratic objective whose gradient is simply w) and do not change the minimizer.

The objective function can be interpreted as finding the hyperplane that separates the two classes with the largest margin while ensuring that all the data points are correctly classified. The constraint yi(w.xi + b) >= 1 forces every training point to lie on the correct side of the hyperplane, on or outside the margin planes w.x + b = +1 and w.x + b = -1, so small perturbations of the data are less likely to change a point's predicted class.
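The link between the margin and the norm of w can be made explicit with a short derivation. This is a standard sketch; the symbols x_+ and x_- are notation assumed here for points lying on the two margin planes.

$$
\begin{aligned}
w \cdot x_+ + b &= +1, \qquad w \cdot x_- + b = -1 \\
\Rightarrow \quad w \cdot (x_+ - x_-) &= 2 \\
\text{margin width} &= \frac{w}{\lVert w \rVert} \cdot (x_+ - x_-) = \frac{2}{\lVert w \rVert}
\end{aligned}
$$

Maximizing 2/||w|| is therefore the same as minimizing ||w||, which in turn has the same minimizer as (1/2)||w||^2.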

The solution to the optimization problem is the pair (w, b) that satisfies the constraints with the smallest ||w||. As in Q1, a new point x is then classified by the sign of w.x + b.

### Q3. What is the kernel trick in SVM?

The kernel trick is a technique used in SVM (Support Vector Machine) to extend the linear classifier to a nonlinear one, without explicitly computing the nonlinear feature space. It is used to find a decision boundary that can separate the data in a high-dimensional space.

The idea behind the kernel trick is to transform the input data into a higher-dimensional space where a linear boundary can be used to separate the data. This is done by replacing the dot product between two data points with a kernel function that computes the similarity between the data points in the high-dimensional space. The kernel function is a mathematical function that takes two vectors as input and returns a scalar value that represents their similarity.

The most commonly used kernel functions are the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel. The linear kernel corresponds to the dot product between two vectors and is used when the data is linearly separable. The polynomial kernel and the RBF kernel are used when the data is not linearly separable.

By using the kernel trick, the SVM can learn a nonlinear decision boundary in the high-dimensional feature space without explicitly computing the coordinates of the data points in that space. This is computationally efficient because training and prediction only require kernel evaluations between pairs of points, so the cost does not depend on the dimensionality of the feature space, which may be very large or even infinite (as with the RBF kernel).
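A small numeric check can make this concrete. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, for which the explicit feature map is known, and shows that the kernel value equals the dot product in the mapped space. The function names `poly_kernel` and `phi` and the sample points are assumptions made for illustration.

```python
import math

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x.z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map for the same kernel in 2-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

# Two arbitrary sample points (assumed for illustration only)
x = (1.0, 2.0)
z = (3.0, 0.5)

kernel_value = poly_kernel(x, z)                               # computed in the 2-D input space
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))    # computed in the 3-D feature space

print(kernel_value, explicit_value)  # the two values agree up to floating-point error
```

The kernel needs one 2-D dot product and a square, while the explicit route has to build the mapped vectors first; for higher degrees or the RBF kernel the explicit route quickly becomes impractical.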

The kernel trick is a powerful technique that allows SVM to be applied to a wide range of applications, including image recognition, text classification, and bioinformatics.

### Q4. What is the role of support vectors in SVM? Explain with an example.

Support vectors play a crucial role in SVM (Support Vector Machine) as they are the data points closest to the decision boundary or the hyperplane. These points are used to define the decision boundary and determine the margin of the SVM.

In SVM, the goal is to find the hyperplane that separates the two classes of data with the maximum margin, where the margin is the distance between the hyperplane and the closest data points of each class. The data points that lie on the margin boundaries, that is, the points closest to the hyperplane, are known as support vectors.

Support vectors are important because they determine the position of the decision boundary, and any change in their position could result in a different decision boundary. Therefore, the support vectors define the SVM and have a significant impact on its performance.

For example, consider a binary classification problem where the goal is to classify images of cats and dogs. The SVM algorithm tries to find a hyperplane that separates the two classes of images with the maximum margin. The images that are closest to the hyperplane, or lie on the margin, are the support vectors. These images are the most informative and are used to define the decision boundary.

If the position of the support vectors changes, the decision boundary of the SVM also changes, whereas moving or removing any other point (without crossing the margin) leaves it unchanged. In practice, the support vectors are not chosen by hand: they emerge from the training optimization as the points that lie on or inside the margin.
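The following sketch illustrates this on a tiny dataset: it fits a linear `SVC`, lists the support vectors, and then refits using only those points to show that the hyperplane stays essentially the same. The toy data and the very large `C` (used to approximate a hard margin) are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (assumed for illustration only)
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("Support vectors:\n", clf.support_vectors_)

# Keep only the support vectors and refit
sv_idx = clf.support_
clf_sv_only = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])

print("w (all points):          ", clf.coef_[0], "b:", clf.intercept_[0])
print("w (support vectors only):", clf_sv_only.coef_[0], "b:", clf_sv_only.intercept_[0])
```

The two fits should agree up to the solver's tolerance, since the discarded points have no influence on the solution.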

### Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in SVM?

Hyperplane:
In SVM, the hyperplane is the decision boundary that separates the data points of different classes. In a binary classification problem, the hyperplane is a linear boundary that separates the data points into two classes. For example, consider the following dataset of two classes, where the blue circles represent class 0, and the red triangles represent class 1.
[Figure: svm_hyperplane_1]

The hyperplane that separates the two classes is a line in this case, represented by the equation w.x + b = 0, where w is the weight vector, x is the input feature vector, and b is the bias term.

Marginal plane:
The marginal plane in SVM is the plane that is parallel to the hyperplane and passes through the support vectors. It is used to define the margin, which is the distance between the hyperplane and the closest data points of each class. For example, consider the following dataset, where the two classes are not linearly separable.
[Figure: svm_marginal_plane_1]

In this case, we can use the kernel trick to transform the data into a higher-dimensional space where it is separable. The hyperplane in this space is a plane, and the marginal plane is the parallel plane that passes through the support vectors.

[Figure: svm_marginal_plane_2]

Hard margin:
A hard margin SVM is an SVM with no tolerance for misclassification. It tries to find the hyperplane that separates the data with the maximum margin while ensuring that all the data points are correctly classified, which is only possible when the classes are linearly separable. For example, consider the following dataset, where the two classes are linearly separable.
[Figure: svm_hard_margin_1]

In this case, the hard margin SVM finds the separating hyperplane with the maximum margin, and no training point lies inside the margin.

[Figure: svm_hard_margin_2]

Soft margin:
A soft margin SVM is an SVM with some tolerance for misclassification. It allows some data points to fall inside the margin or on the wrong side of the hyperplane in order to find a more generalizable decision boundary. For example, consider the following dataset, where the two classes are not linearly separable.
[Figure: svm_soft_margin_1]

In this case, a hard margin SVM cannot be used since the data is not linearly separable. Instead, a soft margin SVM can be used to find the hyperplane that separates the two classes with a margin that allows some misclassification. The trade-off between the margin size and the number of misclassified points is controlled by a parameter C, which determines the penalty for misclassification. A large C penalizes violations heavily and approaches hard margin behaviour, while a small C tolerates more violations in exchange for a wider margin.
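The role of C can be seen by evaluating the usual soft margin objective, (1/2)||w||^2 + C * sum of slacks, where the slack of a point is max(0, 1 - yi(w.xi + b)). The sketch below does this for a fixed hyperplane; the function name `soft_margin_objective`, the hyperplane, and the points are assumptions made for illustration.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (1/2)||w||^2 + C * sum of slacks, together with the per-point slacks."""
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))  # zero when the point is on or outside the margin
    reg = 0.5 * sum(w_j ** 2 for w_j in w)
    return reg + C * sum(slacks), slacks

# Hypothetical hyperplane and points (assumed for illustration only)
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [0.5, 1.0], [-1.0, 0.0], [0.5, -1.0]]
y = [1, 1, -1, -1]

for C in (0.1, 10.0):
    value, slacks = soft_margin_objective(w, b, X, y, C)
    print(f"C={C}: objective={value:.2f}, slacks={slacks}")
```

A slack between 0 and 1 marks a point that is inside the margin but still correctly classified (the second point here), while a slack above 1 marks a misclassified point (the fourth point). With the larger C the same violations contribute far more to the objective, which is why the optimizer then prefers a narrower margin with fewer violations.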

[Figure: svm_soft_margin_2]

In summary, the concepts of Hyperplane, Marginal plane, Soft margin, and Hard margin are important in understanding how SVM works and how to choose the appropriate parameters for a given dataset.

### Q6. SVM Implementation through Iris dataset.
- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
- Train a linear SVM classifier on the training set and predict the labels for the testing set.
- Compute the accuracy of the model on the testing set.
- Plot the decision boundaries of the trained model using two of the features.
- Try different values of the regularisation parameter C and see how they affect the performance of the model.
   

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train linear SVM classifier
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

# Predict labels for testing set
y_pred = svm.predict(X_test)

# Compute accuracy of model on testing set
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)


Accuracy: 1.0
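The remaining two steps of the task (plotting the decision boundaries on two features and trying different values of C) could be sketched as follows. The choice of the two petal features, the list `C_values`, and the variable names ending in `2` are assumptions made for illustration; the values of C are not tuned.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X2 = iris.data[:, 2:4]   # petal length and petal width (an arbitrary choice of two features)
y2 = iris.target

X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.2, random_state=42
)

# Grid covering the feature range, used to colour the decision regions
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 0.5, X2[:, 0].max() + 0.5, 200),
    np.linspace(X2[:, 1].min() - 0.5, X2[:, 1].max() + 0.5, 200),
)
grid = np.c_[xx.ravel(), yy.ravel()]

C_values = [0.01, 1, 100]   # illustrative values, not tuned
fig, axes = plt.subplots(1, len(C_values), figsize=(15, 4))
accuracies = {}

for ax, C in zip(axes, C_values):
    model = SVC(kernel="linear", C=C).fit(X2_train, y2_train)
    accuracies[C] = model.score(X2_test, y2_test)

    # Predict the class of every grid point and draw the resulting regions
    Z = model.predict(grid).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X2_train[:, 0], X2_train[:, 1], c=y2_train, edgecolors="k")
    ax.set_xlabel(iris.feature_names[2])
    ax.set_ylabel(iris.feature_names[3])
    ax.set_title(f"C={C}, test accuracy={accuracies[C]:.2f}")
    print(f"C={C}: test accuracy={accuracies[C]:.3f}, "
          f"support vectors={model.n_support_.sum()}")

plt.tight_layout()
plt.show()
```

Smaller values of C generally give a wider margin with more support vectors, and larger values a narrower margin with fewer; how much this changes the test accuracy depends on the data and the split.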


### Bonus task: Implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation.

In [3]:
import numpy as np

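class LinearSVM:
    """Binary linear SVM trained by per-sample subgradient descent on the hinge loss.

    Labels must be -1 or +1. The decision function is w.x - b (note the sign of b).
    Here C scales the L2 penalty on w, so a larger C means stronger regularization,
    which is the opposite of the C parameter in scikit-learn.
    """

    def __init__(self, lr=0.01, num_iters=1000, C=1.0):
        self.lr = lr                # learning rate (step size)
        self.num_iters = num_iters  # number of passes over the training set
        self.C = C                  # weight of the L2 penalty on w

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # Initialize parameters
        self.w = np.zeros(n_features)
        self.b = 0

        # Per-sample subgradient descent
        for _ in range(self.num_iters):
            for i, x_i in enumerate(X):
                # True when the point is on or outside the margin (zero hinge loss)
                cond = y[i] * (np.dot(x_i, self.w) - self.b) >= 1

                if cond:
                    # Only the penalty term contributes to the gradient
                    self.w -= self.lr * (2 * self.C * self.w)
                else:
                    # Penalty term plus the hinge-loss subgradient
                    self.w -= self.lr * (2 * self.C * self.w - y[i] * x_i)
                    self.b -= self.lr * y[i]

    def predict(self, X):
        # Sign of the decision function: -1 or +1 (0 only if the score is exactly zero)
        linear_output = np.dot(X, self.w) - self.b
        return np.sign(linear_output)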


This code implements a binary linear SVM classifier using per-sample (stochastic) subgradient descent on the L2-regularized hinge loss to find values for the parameters w and b. The fit method applies the update rule once per training point in every iteration, while the predict method computes the linear output w.x - b and returns its sign as the predicted class label, so the labels are expected to be -1 and +1.
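Reading the update rules in `fit` backwards, the per-sample objective being minimized appears to be an L2 penalty plus the hinge loss. The symbol J_i is notation introduced here for illustration, and the sign convention w.x - b follows the code rather than the formulas in Q1 and Q2.

$$
J_i(w, b) = C \,\lVert w \rVert^2 + \max\bigl(0,\; 1 - y_i (w \cdot x_i - b)\bigr)
$$

$$
\nabla_w J_i =
\begin{cases}
2Cw & \text{if } y_i (w \cdot x_i - b) \ge 1 \\
2Cw - y_i x_i & \text{otherwise}
\end{cases}
\qquad
\frac{\partial J_i}{\partial b} =
\begin{cases}
0 & \text{if } y_i (w \cdot x_i - b) \ge 1 \\
y_i & \text{otherwise}
\end{cases}
$$

Each step subtracts the learning rate times these subgradients, which gives exactly the two branches of the loop.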

To compare the performance of this implementation with scikit-learn, we can use the same iris dataset and split it into training and testing sets, train both the custom implementation and scikit-learn implementation of linear SVM classifier on the training set, and evaluate their performance on the testing set. Here's an example code to do that:

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train custom linear SVM classifier
svm_custom = LinearSVM(lr=0.01, num_iters=1000, C=1.0)
svm_custom.fit(X_train, y_train)
y_pred_custom = svm_custom.predict(X_test)

# Train scikit-learn linear SVM classifier
svm_sklearn = LinearSVC(C=1.0)
svm_sklearn.fit(X_train, y_train)
y_pred_sklearn = svm_sklearn.predict(X_test)

# Compute accuracy of models on testing set
accuracy_custom = accuracy_score(y_test, y_pred_custom)
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)

print('Custom Linear SVM Accuracy:', accuracy_custom)
print('Scikit-learn Linear SVM Accuracy:', accuracy_sklearn)


Custom Linear SVM Accuracy: 0.3
Scikit-learn Linear SVM Accuracy: 1.0




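This code trains both the custom implementation and the scikit-learn implementation of a linear SVM classifier on the same iris split and computes their accuracy on the testing set. The comparison is not like for like, however. Iris has three classes labelled 0, 1, and 2, whereas the custom `LinearSVM` is a binary classifier that expects labels of -1 and +1 and returns the sign of its score. Its predictions are therefore values in {-1, 0, +1}, which can coincide with the true labels only by accident, so the accuracy of 0.3 reflects this label mismatch rather than the quality of the optimizer. `LinearSVC`, by contrast, handles the three classes with a one-vs-rest scheme.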
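A like-for-like comparison needs a two-class problem, because the custom class is binary. The sketch below is one way to set that up, under several assumptions that are not part of the original code: the task is reduced to setosa versus the other species, the features are standardized, the class `BinaryLinearSVM` uses the w.x + b convention from Q1, and its hyperparameters (`lr`, `lam`, `num_iters`) are illustrative rather than tuned.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score


class BinaryLinearSVM:
    """Hinge-loss linear SVM trained by per-sample subgradient descent (labels in {-1, +1})."""

    def __init__(self, lr=0.001, lam=0.01, num_iters=200):
        self.lr, self.lam, self.num_iters = lr, lam, num_iters

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.num_iters):
            for x_i, y_i in zip(X, y):
                if y_i * (np.dot(x_i, self.w) + self.b) >= 1:
                    # Point is on or outside the margin: only the penalty term acts
                    self.w -= self.lr * (2 * self.lam * self.w)
                else:
                    # Margin violation: penalty term plus hinge-loss subgradient
                    self.w -= self.lr * (2 * self.lam * self.w - y_i * x_i)
                    self.b += self.lr * y_i
        return self

    def predict(self, X):
        return np.where(np.dot(X, self.w) + self.b >= 0, 1, -1)


iris = load_iris()
X = iris.data
y = np.where(iris.target == 0, 1, -1)   # setosa -> +1, other species -> -1 (assumed task)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

custom = BinaryLinearSVM().fit(X_train_s, y_train)
reference = LinearSVC(C=1.0).fit(X_train_s, y_train)

accuracy_custom = accuracy_score(y_test, custom.predict(X_test_s))
accuracy_sklearn = accuracy_score(y_test, reference.predict(X_test_s))
print("Custom (binary) accuracy:", accuracy_custom)
print("LinearSVC (binary) accuracy:", accuracy_sklearn)
```

Both models now see the same two-class labels, so their accuracy scores can be compared directly. Extending the custom class to all three species would require a multiclass scheme such as one-vs-rest on top of it.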



