## Q1. What is the mathematical formula for a linear SVM?

f(x) = sign(w^T * x + b)

In this formula:

* f(x) represents the predicted class label for a given input x.
* w is the weight vector that defines the hyperplane separating the classes.
* x is the input vector.
* b is the bias term or intercept.
* w^T denotes the transpose of the weight vector w.
* sign() is the sign function that returns -1 if the argument is negative, 0 if the argument is zero, and 1 if the argument is positive.
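
Here's a small example of how this decision rule could be evaluated with NumPy. The function name `linear_svm_predict` and the values of `w`, `b` and `X` are made up for illustration only.

```python
import numpy as np


def linear_svm_predict(X, w, b):
    """Return sign(w^T x + b) for every row x of X."""
    scores = X @ w + b  # signed score of each point relative to the hyperplane
    return np.sign(scores)


# Hypothetical hyperplane x1 + x2 - 3 = 0 in two dimensions
w = np.array([1.0, 1.0])
b = -3.0

# Three points: one on the negative side, one on the positive side, one on the hyperplane
X = np.array([[0.0, 0.0], [4.0, 4.0], [1.0, 2.0]])

labels = linear_svm_predict(X, w, b)
print(labels)
```

A point that falls exactly on the hyperplane gets the value 0 from sign(), so in practice such ties are usually assigned to one of the two classes by convention.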

## Q2. What is the objective function of a linear SVM?

The objective of a linear Support Vector Machine (SVM) is to find the hyperplane that maximizes the margin between the classes while keeping the classification errors small. The objective function therefore combines a term that rewards a wide margin with a term that penalizes misclassified points and margin violations.

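For a soft-margin linear SVM, the objective function is commonly formulated as follows:

minimize: (1/2) ||w||^2 + C * Σ(max(0, 1 - y_i(w^T * x_i + b)))

In this formula:

* (1/2) ||w||^2 is the regularization term. Minimizing it maximizes the margin, whose width is 2 / ||w||.
* max(0, 1 - y_i(w^T * x_i + b)) is the hinge loss of the training example x_i with label y_i in {-1, +1}. It is zero when the example lies on the correct side of the margin.
* C is the regularization parameter that controls the trade-off between the two terms.
* The sum Σ runs over all training examples i.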
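
As a rough sketch, the objective above could be computed like this. The function name `svm_objective` and the toy data are assumptions for illustration, and the labels are taken to be -1 and +1.

```python
import numpy as np


def svm_objective(w, b, X, y, C):
    """(1/2) ||w||^2 + C * sum of hinge losses, with labels y in {-1, +1}."""
    margins = y * (X @ w + b)               # y_i (w^T x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins)  # zero for points outside the margin
    return 0.5 * np.dot(w, w) + C * hinge.sum()


# Toy data: the third point lies on the wrong side of the hyperplane x1 = 0
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, -1, -1])
w = np.array([1.0, 0.0])
b = 0.0

obj_small_C = svm_objective(w, b, X, y, C=1.0)
obj_large_C = svm_objective(w, b, X, y, C=10.0)
print(obj_small_C, obj_large_C)
```

Only the misclassified third point contributes a hinge loss here, so increasing C raises the cost of keeping this particular hyperplane.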


## Q3. What is the kernel trick in SVM?

The kernel trick in Support Vector Machines (SVM) is a technique that allows SVMs to efficiently and effectively operate in high-dimensional feature spaces without explicitly computing the coordinates of the transformed data. It allows SVMs to implicitly map the input data into a higher-dimensional space by using a kernel function, thus enabling the linear separation of non-linearly separable data.

The basic idea behind the kernel trick is that instead of explicitly transforming the input data into a higher-dimensional feature space, we can define a kernel function that directly calculates the inner products between the data points in that feature space. Because the SVM training and prediction steps only need these inner products, which act as a measure of similarity between pairs of data points, the algorithm can operate as if it were in the higher-dimensional space.

Mathematically, the kernel trick can be expressed as follows:

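K(x, y) = Φ(x) • Φ(y)

Here Φ is the mapping from the input space to the higher-dimensional feature space, and K is the kernel function that returns the inner product of Φ(x) and Φ(y) without ever computing Φ itself.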
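
Here's a small numerical check of this identity for one particular kernel. It assumes the degree-2 polynomial kernel K(x, y) = (x • y)^2 on 2D inputs, whose explicit feature map is Φ(x) = (x1^2, √2 * x1 * x2, x2^2). The function names `phi` and `poly_kernel` are made up for this example.

```python
import numpy as np


def phi(v):
    """Explicit feature map of the degree-2 polynomial kernel for 2D input."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, computed entirely in the original 2D space."""
    return np.dot(x, y) ** 2


x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(y))  # inner product in the 3D feature space
implicit = poly_kernel(x, y)       # same value without computing phi
print(explicit, implicit)
```

Both values agree, which is why the SVM never needs to build the transformed vectors explicitly.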


## Q4. What is the role of support vectors in SVM? Explain with an example.

In Support Vector Machines (SVM), support vectors play a crucial role in defining the decision boundary and determining the classification of new data points. Support vectors are the data points from the training set that lie closest to the decision boundary or are on the margin. They are the critical elements used by SVM to make predictions.

The main roles of support vectors in SVM are as follows:

* Definition of the decision boundary: In a binary classification problem, the decision boundary is a hyperplane that separates the two classes. The support vectors are the training points closest to this hyperplane, and they alone determine its position and orientation. Removing any other training point leaves the boundary unchanged.

* Margin calculation: The margin is the distance between the decision boundary and the closest data points from each class. The support vectors are the data points that lie exactly on the margin or, in a soft-margin SVM, inside it or on the wrong side of it. They define the width of the margin, which SVM aims to maximize.

* Influence on the classifier: The decision function of the SVM is a weighted sum of kernel values (similarities) between a new data point and the support vectors, plus the bias term. The predicted label therefore depends only on the support vectors, not on the rest of the training set.

Here's an example to illustrate the role of support vectors in SVM:

Consider a 2D dataset with two classes that are not linearly separable. By using the kernel trick, SVM can find a separating hyperplane in a higher-dimensional feature space, which corresponds to a non-linear decision boundary in the original input space. The training points that lie on or inside the margin become the support vectors.

These support vectors determine the position and orientation of the decision boundary. When a new data point is encountered, its kernel similarity to each support vector is computed, and the SVM classifier assigns a class label based on the sign of the weighted sum of these values plus the bias.

In summary, support vectors are the key elements in SVM that define the decision boundary, determine the margin, and influence the classification of new data points. They are the critical data points that contribute to the effectiveness and robustness of the SVM algorithm.
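
The role of support vectors can also be checked numerically. The sketch below uses scikit-learn's `SVC` on a made-up, linearly separable dataset with a very large C to approximate a hard margin. It refits the model on the support vectors only and rebuilds the decision function from them by hand.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well separated groups of three points each
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin
clf = SVC(kernel="linear", C=1e6).fit(X, y)
sv_idx = clf.support_  # indices of the training points that are support vectors
print("Support vectors:\n", clf.support_vectors_)

# Refit using only the support vectors: the hyperplane should stay the same
clf_sv = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])
print("All points:     ", clf.coef_, clf.intercept_)
print("Support vectors:", clf_sv.coef_, clf_sv.intercept_)

# Decision function as a weighted sum over support vectors (linear kernel)
x_new = np.array([2.0, 1.0])
manual = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_
print(manual, clf.decision_function([x_new]))
```

The points that are not support vectors can be dropped without moving the boundary, and the prediction for `x_new` uses nothing but the support vectors, their weights and the bias.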

## Q5. Illustrate with examples and graphs: Hyperplane, Marginal plane, Soft margin and Hard margin in SVM.

To illustrate the concepts of hyperplane, marginal plane, soft margin, and hard margin in Support Vector Machines (SVM), let's consider a simple example with a two-dimensional dataset.

Suppose we have a binary classification problem with two classes, represented by red and blue points. The different concepts in SVM are as follows:

Hyperplane: In SVM, a hyperplane is a decision boundary that separates the two classes. In a two-dimensional space, a hyperplane is a straight line. The goal of SVM is to find the optimal hyperplane that maximizes the margin between the classes.

Marginal plane: The marginal planes are the two planes parallel to the hyperplane, one on each side, that pass through the closest points of each class (the support vectors). The margin is the region between these two marginal planes. It is a region of separation between the classes.

Soft Margin and Hard Margin: In SVM, the margin can be soft or hard, depending on whether misclassified points or margin violations are allowed. In the soft-margin case, the regularization parameter C controls how strongly such violations are penalized. The hard margin corresponds to the limit of a very large C.

Hard Margin: In a hard-margin SVM, no misclassifications are allowed, and the margin must be maximized without any violations. This is suitable only for linearly separable data. If the data is not linearly separable, the hard-margin problem has no solution.

Soft Margin: In a soft-margin SVM, some misclassifications or margin violations may be allowed within certain limits. The regularization parameter C controls the trade-off between the margin width and the number of misclassifications. A larger C value allows fewer misclassifications, resulting in a narrower margin, while a smaller C value allows more misclassifications, resulting in a wider margin. Soft-margin SVMs are useful when dealing with data that is not perfectly separable.
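
Here's a minimal sketch of how the effect of C on the margin could be observed with scikit-learn. The toy data (two overlapping Gaussian blobs) and the chosen C values are assumptions for illustration. The margin width is computed as 2 / ||w|| from the fitted weight vector.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two overlapping Gaussian blobs, so a soft margin is needed
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 2)),
               rng.normal(2.0, 1.0, size=(40, 2))])
y = np.array([0] * 40 + [1] * 40)

margin_widths = {}
for C in [0.001, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    margin_widths[C] = 2.0 / np.linalg.norm(w)  # distance between the two marginal planes
    print(f"C={C}: margin width={margin_widths[C]:.3f}, "
          f"support vectors={len(clf.support_)}")
```

A small C tolerates many violations and gives a wide margin, while a large C penalizes violations heavily and gives a narrower one.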

## Q6. SVM Implementation through Iris dataset.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation.

* Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
* Train a linear SVM classifier on the training set and predict the labels for the testing set.
* Compute the accuracy of the model on the testing set.
* Plot the decision boundaries of the trained model using two of the features.
* Try different values of the regularisation parameter C and see how it affects the performance of the model.

Let's implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation on the Iris dataset. Here's an example code that accomplishes the tasks mentioned:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC


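# Implement Linear SVM Classifier from scratch
class LinearSVM:
    """Linear SVM trained by batch subgradient descent on the hinge loss.

    Iris has three classes, so one binary classifier is trained per class
    (one-vs-rest) and the class with the highest score is predicted.
    """

    def __init__(self, learning_rate=0.001, num_iterations=1000, reg_strength=1):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.reg_strength = reg_strength

    def fit(self, X, y):
        X = np.insert(X, 0, 1, axis=1)  # Insert bias column
        num_samples, num_features = X.shape
        self.classes_ = np.unique(y)
        self.weights = np.zeros((len(self.classes_), num_features))

        for k, cls in enumerate(self.classes_):
            y_k = np.where(y == cls, 1, -1)  # Current class -> +1, all others -> -1
            w = self.weights[k]  # View into self.weights, updated in place
            for _ in range(self.num_iterations):
                margins = y_k * np.dot(X, w)
                violating = margins < 1  # Only these samples have non-zero hinge loss
                gradient = -(1 / num_samples) * np.dot(y_k[violating], X[violating])
                gradient += (2 * self.reg_strength / num_samples) * w
                w -= self.learning_rate * gradient

    def decision_function(self, X):
        X = np.insert(X, 0, 1, axis=1)  # Insert bias column
        return np.dot(X, self.weights.T)  # One score per sample and class

    def predict(self, X):
        # Return the original label of the class with the highest score
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]

    def get_weights(self):
        return self.weights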


# Load the Iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Selecting only the first two features for visualization purposes
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear SVM classifier from scratch
svm_scratch = LinearSVM()
svm_scratch.fit(X_train, y_train)

# Predict the labels for the testing set using the scratch implementation
y_pred_scratch = svm_scratch.predict(X_test)

# Compute the accuracy of the scratch model on the testing set
accuracy_scratch = accuracy_score(y_test, y_pred_scratch)
print("Accuracy (Scratch):", accuracy_scratch)

# Train a linear SVM classifier using scikit-learn
svm_sklearn = SVC(kernel='linear')
svm_sklearn.fit(X_train, y_train)

# Predict the labels for the testing set using scikit-learn
y_pred_sklearn = svm_sklearn.predict(X_test)

# Compute the accuracy of the scikit-learn model on the testing set
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print("Accuracy (scikit-learn):", accuracy_sklearn)

# Plot the decision boundaries of the trained models
# Create a meshgrid of points to evaluate the model over the feature space
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = 0.02  # Step size in the meshgrid
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Make predictions on the meshgrid points using the scratch model
Z_scratch = svm_scratch.predict(np.c_[xx.ravel(), yy.ravel()])
Z_scratch = Z_scratch.reshape(xx.shape)

# Make predictions on the meshgrid points using the scikit-learn model
Z_sklearn = svm_sklearn.predict(np.c_[xx.ravel(), yy.ravel()])
Z_sklearn = Z_sklearn.reshape(xx.shape)

# Plot the decision boundaries and the data points
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z_scratch, alpha=0.6)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Linear SVM Decision Boundaries (Scratch)')
plt.show()

plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z_sklearn, alpha=0.6)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Linear SVM Decision Boundaries (scikit-learn)')
plt.show()
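
The last point of the task, trying different values of C, could be sketched as follows. The list of C values is an arbitrary choice for illustration, and the split uses the same two features and `random_state` as the code above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same setup as above: first two features, 80/20 split
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train one scikit-learn linear SVM per C value and record the test accuracy
accuracies = {}
for C in [0.01, 0.1, 1, 10, 100]:
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    accuracies[C] = accuracy_score(y_test, model.predict(X_test))
    print(f"C={C}: test accuracy={accuracies[C]:.3f}")
```

Comparing the printed accuracies shows how sensitive the model is to C on this split. A single train/test split is a rough indicator only, so cross-validation would give a more reliable picture.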
