In [None]:
#Q1):-
The mathematical formula for a linear Support Vector Machine (SVM) can be expressed as follows:

Given a training dataset with input feature vectors (x1, x2, ..., xn) and
their corresponding binary class labels (y1, y2, ..., yn), where each yi is either +1 or -1,
the goal of the linear SVM is to find the optimal hyperplane that best separates the two classes.

The hyperplane can be represented as:

 w.x+ b = 0

Where:
 w is the weight vector (normal vector) perpendicular to the hyperplane.
 x represents the input feature vector.
 b is the bias term, which determines the offset of the hyperplane from the origin.

The decision function of the linear SVM is given by:

 f(x) = w.x + b

The predicted class for a given input x is determined by the sign of f(x):
If f(x) > 0, the sample is classified as the positive class.
If f(x) < 0, the sample is classified as the negative class.

The training objective of the linear SVM is to find w and b that maximize the margin between the two classes while minimizing
the classification error. The margin is the distance between the hyperplane and the closest data points of the two classes.
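
The decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming hand-picked values for w and b (they are not learned from any data) and treating f(x) = 0 as the positive class by convention; the function names are illustrative.

```python
import numpy as np

# Hypothetical parameters of an already-trained linear SVM (illustrative values)
w = np.array([2.0, -1.0])   # weight vector, normal to the hyperplane
b = -1.0                    # bias term

def decision_function(X):
    """Return f(x) = w.x + b for each row of X."""
    return X @ w + b

def predict(X):
    """Classify by the sign of f(x): +1 if f(x) >= 0, otherwise -1."""
    return np.where(decision_function(X) >= 0, 1, -1)

X_new = np.array([[2.0, 1.0],
                  [0.0, 3.0],
                  [1.0, 0.5]])

scores = decision_function(X_new)
labels = predict(X_new)
print("f(x):  ", scores)
print("labels:", labels)
```

The magnitude of f(x) grows with the distance of the point from the hyperplane, so it can also be read as a rough confidence score.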

In [None]:
#Q2):-

The objective function of a linear Support Vector Machine (SVM) is to find the parameters w and b that define the optimal hyperplane, 
which maximizes the margin between the two classes while minimizing the classification error.

In the case of a linearly separable dataset, the objective function of the linear SVM is to maximize the margin while ensuring that all data points 
are correctly classified. The margin is defined as the distance between the hyperplane and the closest data points (support vectors) of the two
classes.

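For this hard-margin case, the objective function is typically formulated as follows:

Minimize: 1/2 ||w||^2

Subject to the constraints:
   yi (w.xi + b) >= 1, for i = 1, 2, ..., n

Because the margin width equals 2/||w||, minimizing ||w||^2 is equivalent to maximizing the margin.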
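
To make the objective and its constraints concrete, the sketch below checks a candidate (w, b) against a tiny hand-made dataset. The four points and the parameter values are assumptions chosen for illustration; the code only evaluates the formulas, it does not solve the optimization problem.

```python
import numpy as np

# Toy, linearly separable data (illustrative values)
X = np.array([[0.0, 0.0],
              [-1.0, 0.0],
              [2.0, 2.0],
              [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# Candidate hyperplane parameters (assumed, not learned here)
w = np.array([0.5, 0.5])
b = -1.0

# Constraint values y_i (w.x_i + b); each must be at least 1
functional_margins = y * (X @ w + b)
feasible = bool(np.all(functional_margins >= 1 - 1e-12))

# Objective value and the geometric margin width it implies
objective = 0.5 * float(w @ w)
margin_width = 2.0 / np.linalg.norm(w)

print("y_i (w.x_i + b):", functional_margins)
print("all constraints satisfied:", feasible)
print("objective 1/2 ||w||^2:", objective)
print("margin width 2/||w||:", margin_width)
```

The points whose constraint value equals exactly 1 lie on the margin boundaries; here those are (0, 0) and (2, 2).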


In [None]:
#Q3):-
The kernel trick is a fundamental concept in Support Vector Machines (SVM) that allows SVMs to efficiently handle non-linearly separable data by
implicitly transforming the original feature space into a higher-dimensional space. It is a clever mathematical technique that avoids the explicit 
computation of the transformed feature space, making SVMs computationally more efficient.

In the standard linear SVM, the decision boundary is a hyperplane in the original feature space. However, many real-world datasets are not linearly
separable, and finding a linear boundary would result in poor classification performance. The kernel trick addresses this limitation by introducing a 
kernel function, which implicitly computes the dot product between the transformed feature vectors in a higher-dimensional space, without explicitly
transforming the data into that space.

The general idea is to find a mapping function ϕ(x)  that maps the input feature vector x into a higher-dimensional space, often referred to as the 
feature space. The kernel function K(xi,xj) then calculates the dot product of the transformed feature vectors ϕ(xi)  and ϕ(xj) without having to 
explicitly compute ϕ(xi) and ϕ(xj).

The kernel function K(xi,xj) takes two input feature vectors xi and xj and computes their dot product in the feature space:
    K(xi,xj) = ϕ(xi)⋅ϕ(xj)
    
The key benefit of using the kernel trick is that it avoids the computational overhead of explicitly transforming the data into the 
higher-dimensional feature space, which can become impractical or even infeasible for very high-dimensional spaces. Instead, the kernel
function allows SVM to operate directly in the original feature space while effectively leveraging the benefits of working in a higher-dimensional
space.

Some commonly used kernel functions include:

Linear Kernel: K(xi,xj) = xi⋅xj
Polynomial Kernel: K(xi,xj) = (xi⋅xj + c)^d, where d is the degree and c >= 0 is a constant

The choice of the kernel function depends on the nature of the data and the problem at hand. By selecting an appropriate kernel function, 
SVMs can effectively handle complex decision boundaries and capture non-linear relationships between features, making them powerful and versatile 
classifiers.
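
The kernel trick can be checked numerically for a small case. The sketch below assumes a polynomial kernel with d = 2 and c = 1 on 2-dimensional inputs, for which an explicit 6-dimensional feature map ϕ is known; the function names and the sample vectors are illustrative.

```python
import numpy as np

def poly_kernel(xi, xj, c=1.0, d=2):
    """Polynomial kernel K(xi, xj) = (xi.xj + c)^d, computed in the input space."""
    return (np.dot(xi, xj) + c) ** d

def phi(x):
    """Explicit feature map matching the kernel above for d = 2, c = 1, 2-D input."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

k_implicit = poly_kernel(xi, xj)          # never leaves the 2-D input space
k_explicit = float(np.dot(phi(xi), phi(xj)))  # dot product in the 6-D feature space

print("kernel value:      ", k_implicit)
print("explicit phi.phi:  ", k_explicit)
```

Both routes give the same number, but the kernel needs only one 2-dimensional dot product, while the explicit route first builds two 6-dimensional vectors. The gap widens quickly as the degree and input dimension grow.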

In [None]:
#Q4):-
In Support Vector Machines (SVM), support vectors play a crucial role in defining the optimal hyperplane that separates the data into different classes. Support vectors are the data points from the training set that lie closest to the decision boundary (hyperplane). These points are the most informative for defining the decision boundary because they are the ones that "support" the placement of the hyperplane.

To illustrate the role of support vectors, let's consider a simple example with a 2-dimensional dataset and a linear SVM.

Example:
Suppose we have a binary classification problem with two classes: circles (O) and crosses (X). The dataset is as follows:

Circles (O): (2, 2), (3, 3), (4, 4)
Crosses (X): (1, 4), (2, 5), (3, 6)
The goal of the SVM is to find the best hyperplane (line in 2D) that separates the circles from the crosses.
All circles lie on the line x2 = x1 and all crosses lie on the parallel line x2 = x1 + 3, so the maximum-margin
decision boundary is the line midway between them:
x2 = x1 + 1.5.

The support vectors are the points closest to this boundary. Here every point is at the same distance,
1.5/sqrt(2) ≈ 1.06, from it, so all six points lie on the margin boundaries:

Circles (O): (2, 2), (3, 3), (4, 4) on x2 = x1
Crosses (X): (1, 4), (2, 5), (3, 6) on x2 = x1 + 3
Any of these points can act as a support vector. A line such as x2 = x1 + 1 also separates the classes, but it
passes closer to the circles (distance ≈ 0.71) than to the crosses (distance ≈ 1.41), so it is not the maximum-margin solution.
The support vectors determine the position of the hyperplane, and the distance between the hyperplane and the support vectors is known as the margin.

In SVM, the objective is to maximize the margin while minimizing the classification error. The hyperplane is chosen so that it maximizes
the distance to the closest points of each class (the support vectors); a larger margin generally leads to better generalization.

Support vectors are crucial for the SVM's robustness and generalization because they represent the most critical data points that define the decision
boundary. SVM focuses on these support vectors rather than the entire dataset, which allows it to be efficient and effective, especially in
high-dimensional spaces. Because the decision boundary depends only on the support vectors, points far from the boundary have no influence on it.
An outlier that falls near or across the boundary, however, becomes a support vector itself and can shift the boundary.
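
The distance of each point in the example above to a candidate line x2 = x1 + k can be checked numerically. The sketch below is an illustration only: the helper name `distances_to_line` is hypothetical, and it compares the offset k = 1 with the midline offset k = 1.5 to show which line leaves the larger minimum distance.

```python
import numpy as np

circles = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
crosses = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])

def distances_to_line(points, k):
    """Distance of each point to the line x2 = x1 + k, i.e. -x1 + x2 - k = 0."""
    w = np.array([-1.0, 1.0])
    return np.abs(points @ w - k) / np.linalg.norm(w)

for k in (1.0, 1.5):
    d_circles = distances_to_line(circles, k)
    d_crosses = distances_to_line(crosses, k)
    smallest = min(d_circles.min(), d_crosses.min())
    print(f"k = {k}: circles {d_circles.round(3)}, crosses {d_crosses.round(3)}, "
          f"smallest distance {smallest:.3f}")

margin_k1 = min(distances_to_line(circles, 1.0).min(), distances_to_line(crosses, 1.0).min())
margin_k15 = min(distances_to_line(circles, 1.5).min(), distances_to_line(crosses, 1.5).min())
```

The line with the larger smallest distance is the better separator in the maximum-margin sense, and the points that attain that smallest distance are the ones that constrain it.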

In [None]:
#Q5):-
To illustrate the concepts of hyperplane, marginal plane, soft margin, and hard margin in SVM, let's consider a simple 2-dimensional dataset 
with two classes: circles (O) and crosses (X).
Suppose we have the following data points:
Circles (O): (2, 3), (3, 5), (4, 4), (5, 5)
Crosses (X): (1, 1), (2, 2), (4, 2), (5, 3)

Hyperplane:
The hyperplane is the decision boundary that separates the two classes. In a 2-dimensional space, the hyperplane is a line. 
SVM aims to find the optimal hyperplane that maximizes the margin between the two classes.

Example hyperplane for the data above: x1 − 3x2 + 5.5 = 0 (equivalently x2 = x1/3 + 11/6). All circles lie above this line and all crosses lie below it.

Marginal Plane:
The marginal planes are the two hyperplanes that lie parallel to the optimal hyperplane, one on each side, and pass through the support
vectors. They are important because they define the margin, which is the distance between the support vectors and the hyperplane.
Example marginal planes corresponding to the hyperplane: x1 − 3x2 + 7 = 0, through the circle (2, 3), and x1 − 3x2 + 4 = 0, through the
crosses (2, 2) and (5, 3). Each lies 1.5/sqrt(10) ≈ 0.47 from the hyperplane.

Hard Margin SVM:
In hard margin SVM, the objective is to find a hyperplane that perfectly separates the two classes with no misclassifications.
This means that all data points of one class are on one side of the hyperplane, and all data points of the other class are on the other side.
Hard margin SVM works only when the data is linearly separable.
Example hyperplane for hard margin SVM: x1 − 3x2 + 5.5 = 0, which separates the data above with no point inside the margin.

Soft Margin SVM:
In soft margin SVM, the objective is to find a hyperplane that separates the two classes with some tolerance for errors:
a small number of data points may fall inside the margin or even be misclassified, at a cost controlled by the penalty parameter C.
Soft margin SVM is used when the data is not perfectly separable by a hyperplane, or when a wider margin is preferred over a perfect fit.
Example for soft margin SVM: because the data above is separable, a soft margin SVM with a large C gives the same hyperplane,
x1 − 3x2 + 5.5 = 0; with a smaller C it may choose a wider margin and leave some points inside it.
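
The difference between hard and soft margin can be made concrete with slack values, xi_i = max(0, 1 − yi (w.xi + b)). The sketch below is an illustration under stated assumptions: it uses the eight points listed above with circles labelled +1 and crosses −1, takes the separating line x1 − 3x2 + 5.5 = 0 rescaled so that the closest points have yi (w.xi + b) = 1, and adds one hypothetical extra circle at (4, 3) that is not part of the dataset.

```python
import numpy as np

circles = np.array([[2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [5.0, 5.0]])
crosses = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 2.0], [5.0, 3.0]])
X = np.vstack([circles, crosses])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# Line x1 - 3*x2 + 5.5 = 0, divided by -1.5 so circles get f(x) >= +1
w = np.array([-2.0 / 3.0, 2.0])
b = -11.0 / 3.0

def slack(X, y):
    """Slack xi_i = max(0, 1 - y_i (w.x_i + b)); zero means the margin is respected."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

slack_original = slack(X, y)
margin = 1.0 / np.linalg.norm(w)   # distance from hyperplane to each marginal plane

# Hypothetical extra circle placed on the crosses' side of the line
extra_point = np.array([[4.0, 3.0]])
slack_extra = slack(extra_point, np.array([1]))

print("slack on original data:", slack_original.round(6))
print("margin:", round(float(margin), 3))
print("slack of extra circle (4, 3):", slack_extra.round(3))
```

Zero slack everywhere means the hard-margin constraints hold. A slack between 0 and 1 marks a point inside the margin but still on the correct side, and a slack above 1 marks a misclassified point, which only a soft margin formulation can tolerate.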

In [None]:
#Q6):-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the SVM classifier with a linear kernel
# Note: C is the regularization parameter. Smaller C values correspond to stronger regularization.
C = 1.0
svm_classifier = SVC(kernel='linear', C=C)

# Train the classifier on the training set
svm_classifier.fit(X_train, y_train)
# Predict labels for the testing set
y_pred = svm_classifier.predict(X_test)

# Compute the accuracy of the model on the testing set
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
# Function to plot the decision boundaries
def plot_decision_boundaries():
    # We'll use the first two features for visualization
    feature1 = 0
    feature2 = 1

    # The classifier above was trained on all four features, so it cannot
    # predict on a 2-D grid; fit a separate two-feature model for the plot
    svm_2d = SVC(kernel='linear', C=C)
    svm_2d.fit(X_train[:, [feature1, feature2]], y_train)

    # Create a meshgrid to plot the decision boundaries
    h = 0.02  # Step size in the mesh
    x_min, x_max = X[:, feature1].min() - 1, X[:, feature1].max() + 1
    y_min, y_max = X[:, feature2].min() - 1, X[:, feature2].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Make predictions on the meshgrid points
    Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot the decision boundaries and data points
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
    plt.scatter(X[:, feature1], X[:, feature2], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.xlabel(iris.feature_names[feature1])
    plt.ylabel(iris.feature_names[feature2])
    plt.title("Decision Boundaries of Linear SVM")
    plt.show()

# Plot the decision boundaries
plot_decision_boundaries()
# Try different values of C
C_values = [0.01, 0.1, 1.0, 10.0, 100.0]

# Train and evaluate the model for each value of C
for C in C_values:
    svm_classifier = SVC(kernel='linear', C=C)
    svm_classifier.fit(X_train, y_train)
    y_pred = svm_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"C = {C}, Accuracy: {accuracy:.2f}")
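
Accuracy alone can hide the effect of C on a dataset as easy as Iris. One additional thing to inspect, sketched below, is how many training points end up as support vectors for each C. The sketch repeats the same split as above and reads the fitted `n_support_` attribute of `SVC`; the dictionary name `support_counts` is illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

support_counts = {}
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel='linear', C=C)
    clf.fit(X_train, y_train)
    # n_support_ holds the number of support vectors per class
    support_counts[C] = int(clf.n_support_.sum())
    print(f"C = {C}, support vectors: {support_counts[C]} of {len(X_train)}")
```

A small C typically widens the margin, so more points fall on or inside it and become support vectors; a large C typically narrows it and leaves fewer.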


In [None]:
#Q):-
BONUS TASK:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Define the Linear SVM Classifier
class LinearSVM:
    def __init__(self, learning_rate=0.01, max_iters=1000, C=1.0):
        self.learning_rate = learning_rate
        self.max_iters = max_iters
        self.C = C
        self.W = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # Initialize weight vector and bias term
        self.W = np.zeros(n_features)
        self.b = 0

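        # Training with stochastic subgradient descent on the regularized hinge loss
        # 1/2 ||W||^2 + C * sum_i max(0, 1 - y_i (W.x_i + b)); labels must be -1 or +1
        for _ in range(self.max_iters):
            for i in range(n_samples):
                # Regularization term, spread evenly over the samples
                self.W = self.W - self.learning_rate * self.W / n_samples
                if y[i] * (np.dot(self.W, X[i]) + self.b) < 1:
                    # Margin violated: step along the hinge-loss subgradient
                    self.W = self.W + self.learning_rate * self.C * y[i] * X[i]
                    self.b = self.b + self.learning_rate * self.C * y[i]

    def predict(self, X):
        # Map the sign of the decision function to the labels -1 / +1
        return np.where(np.dot(X, self.W) + self.b >= 0, 1, -1)

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
# LinearSVM is a binary classifier that expects labels in {-1, +1}, so reduce
# Iris to a two-class problem: setosa (+1) versus the other two species (-1)
y = np.where(iris.target == 0, 1, -1)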

# Split the dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the linear SVM classifier from scratch
svm_classifier_scratch = LinearSVM(learning_rate=0.01, max_iters=1000, C=1.0)
svm_classifier_scratch.fit(X_train, y_train)

# Predict labels for the testing set
y_pred_scratch = svm_classifier_scratch.predict(X_test)

# Compute the accuracy of the model on the testing set
accuracy_scratch = accuracy_score(y_test, y_pred_scratch)
print(f"Accuracy of SVM from scratch: {accuracy_scratch:.2f}")


Now, let's use the scikit-learn implementation of the linear SVM classifier and compare the accuracy:

from sklearn.svm import SVC

# Train the linear SVM classifier using scikit-learn
svm_classifier_sklearn = SVC(kernel='linear', C=1.0)
svm_classifier_sklearn.fit(X_train, y_train)

# Predict labels for the testing set using scikit-learn implementation
y_pred_sklearn = svm_classifier_sklearn.predict(X_test)

# Compute the accuracy of the model on the testing set using scikit-learn implementation
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print(f"Accuracy of SVM using scikit-learn: {accuracy_sklearn:.2f}")
