Q1. What is the mathematical formula for a linear SVM?


Ans:
    
    
    The mathematical formula for a linear Support Vector Machine (SVM) can be expressed as follows:

Given a set of labeled training data points:
- Input vectors: x1, x2,....,xn
- Corresponding labels: y1, y2,...,yn, where yi ∈ {-1, 1}

The goal of a linear SVM is to find a hyperplane in the feature space that
best separates the data into two classes while maximizing the margin.
The equation of this hyperplane can be expressed as:

    
    w⋅x + b = 0

Where:
- w is a vector of weights (coefficients) that defines the orientation of the hyperplane.
- x is the input feature vector.
- b is the bias term or intercept, which determines the position of the hyperplane.

The decision function of the linear SVM can be defined as:

    f(x)= w⋅x+b

The SVM aims to find (w) and (b) such that the following conditions hold:
1. f(xi) ≥ 1 for all data points xi with label yi = 1 (positive class).
2. f(xi) ≤ -1 for all data points xi with label yi = -1 (negative class).
3. The margin between the two parallel hyperplanes f(x) = 1 and f(x) = -1 is maximized.

The margin is the distance between these two hyperplanes and equals 2/∥w∥.
The vector w is perpendicular to the hyperplane, and its magnitude (∥w∥) is inversely
proportional to the margin. Maximizing the margin is therefore equivalent to minimizing ∥w∥,
and the problem of finding the best hyperplane is usually formulated as a convex quadratic
programming problem: minimize (1/2)∥w∥² subject to the constraints above, which can be
combined into yi (w⋅xi + b) ≥ 1 for all i.

The SVM then makes predictions by evaluating the sign of f(x). If f(x) ≥ 0,
the input x is classified as the positive class (1); otherwise, 
it is classified as the negative class (-1).
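
As a minimal sketch of the decision rule above, the snippet below evaluates f(x) = w⋅x + b and the margin 2/∥w∥ with NumPy. The weight vector, bias, and sample points are made-up values chosen for illustration, not the result of training.

```python
import numpy as np

# Hypothetical hyperplane parameters (illustrative values, not a trained model)
w = np.array([1.0, 1.0])
b = -3.0

def decision_function(X, w, b):
    """Return f(x) = w.x + b for every row of X."""
    return X @ w + b

def predict(X, w, b):
    """Classify as +1 when f(x) >= 0, otherwise -1."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

X_new = np.array([[3.0, 2.0],    # positive side of the hyperplane
                  [1.0, 1.0],    # negative side of the hyperplane
                  [1.5, 1.5]])   # exactly on the hyperplane

scores = decision_function(X_new, w, b)
labels = predict(X_new, w, b)
margin = 2.0 / np.linalg.norm(w)   # distance between f(x) = 1 and f(x) = -1

print("f(x):  ", scores)
print("labels:", labels)
print("margin:", margin)
```

A point that lies exactly on the hyperplane (f(x) = 0) is assigned to the positive class here, matching the rule stated above.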









Q2. What is the objective function of a linear SVM?


Ans:
    
    The objective function of a linear Support Vector Machine (SVM) is to find the hyperplane
    that best separates the data into two classes while maximizing the margin between the two
    classes. In a binary classification problem, where you have two classes, typically
    labeled as +1 and -1, the objective function is to find a hyperplane defined by the equation:

w · x + b = 0

Where:
- "w" is the weight vector perpendicular to the hyperplane.
- "x" is the input data point.
- "b" is the bias term or the intercept.

The goal of the linear SVM is to find the "w" and "b" that maximize the margin
between the two classes. Under the constraints given below, the nearest data point of each
class lies at a distance of 1 / ||w|| from the hyperplane, so the total margin (the gap
between the two classes) is given by:

Margin = 2 / ||w||

Here, "||w||" represents the Euclidean norm of the weight vector "w."

The objective function can be formulated as an optimization problem, typically a
quadratic programming problem, as follows:

Minimize: 1/2 * ||w||^2

Subject to the constraints:

y_i * (w · x_i + b) ≥ 1 for all training data points (x_i, y_i), where y_i
is +1 or -1 depending on the class label.

In this formulation, the SVM minimizes the squared L2 norm of the weight vector
"w" while ensuring that every data point is correctly classified with a functional margin
of at least 1, that is, at a geometric distance of at least 1 / ||w|| from the hyperplane.
Because the margin is 2 / ||w||, minimizing ||w|| maximizes the margin.

The linear SVM finds the optimal "w" and "b" values that satisfy these
constraints and minimize the objective function, resulting in a hyperplane
that effectively separates the two classes with a maximum margin.
This approach makes SVMs particularly effective for binary classification
tasks with a clear margin of separation.
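
To make the optimization problem concrete, the sketch below checks the constraints y_i * (w · x_i + b) ≥ 1 and evaluates the objective 1/2 * ||w||^2 for three candidate hyperplanes on a tiny hand-made dataset. The data and all candidates are hypothetical, and no solver is run here.

```python
import numpy as np

# Hypothetical toy data: two points per class
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

def is_feasible(w, b, X, y):
    """True when every point satisfies y_i * (w . x_i + b) >= 1."""
    return bool(np.all(y * (X @ w + b) >= 1))

def objective(w):
    """The hard margin objective 1/2 * ||w||^2."""
    return 0.5 * float(w @ w)

# Three hand-picked candidates (w, b) describing the same line x1 = 1
candidates = {
    "A": (np.array([1.0, 0.0]), -1.0),
    "B": (np.array([2.0, 0.0]), -2.0),
    "C": (np.array([0.5, 0.0]), -0.5),
}

results = {}
for name, (w, b) in candidates.items():
    results[name] = (is_feasible(w, b, X, y), objective(w), 2.0 / np.linalg.norm(w))
    print(name, "feasible:", results[name][0],
          "objective:", results[name][1], "margin:", results[name][2])
```

Candidates A and B both satisfy every constraint, but A has the smaller objective and therefore the wider margin. Candidate C has an even smaller norm but violates the constraints, so it is not a valid solution.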












Q3. What is the kernel trick in SVM?


Ans:
    
    The kernel trick is a fundamental concept in Support Vector Machines (SVMs) that
    allows SVMs to perform non-linear classification or regression tasks by implicitly 
    mapping the input data into a higher-dimensional space. This trick makes it possible
    for SVMs to find a hyperplane (or decision boundary)
    that can separate the data even when the data is not linearly separable in
    the original feature space.

Here's how the kernel trick works:

1. Original Feature Space: In a typical SVM, you start with your original feature space,
where your data points are represented as vectors in a lower-dimensional space.

2. Non-Linear Mapping: Conceptually, the data is mapped from the original feature space to a
higher-dimensional feature space by a function φ. This mapping is typically non-linear and allows
for more complex relationships to be captured. A "kernel" is a function K(x, z) that returns
the dot product φ(x)·φ(z) of two mapped points.

3. Linear Separation: In this higher-dimensional space, SVM tries to find a hyperplane that
best separates the data points into different classes while maximizing the margin (distance)
between the classes. This is done as if the data were linearly separable in this higher-dimensional space.

4. Implicit Calculation: The clever part of the kernel trick is that it doesn't
explicitly compute the transformation into the higher-dimensional space, which
would be computationally expensive for large datasets or complex transformations. 
Instead, it relies on a kernel function that computes the dot product between pairs 
of data points in the higher-dimensional space without explicitly calculating the
transformation. Common kernel functions include the linear kernel, polynomial kernel,
and radial basis function (RBF) kernel, among others.

By using the kernel trick, SVMs can effectively learn complex decision boundaries
in the original feature space without explicitly representing the data in the 
higher-dimensional space, making them powerful tools for both linear and
non-linear classification and regression tasks. Different kernel functions 
are chosen based on the specific characteristics of the data and the problem at hand.
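
A small numerical check can make the "implicit calculation" step concrete. The sketch below uses the degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2D inputs, for which one explicit feature map is phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2). The function names are illustrative and the sample vectors are arbitrary.

```python
import numpy as np

def phi(x):
    """Explicit feature map into 3D for the kernel (x . z)^2 on 2D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """The same quantity computed directly in the original 2D space."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))   # dot product after mapping to 3D
implicit = poly2_kernel(x, z)              # kernel evaluated without any mapping

print("explicit:", explicit)
print("implicit:", implicit)
```

Both values agree up to floating-point rounding, which is the point of the trick: the SVM only ever needs these dot products, so the mapped vectors never have to be built.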
    
    
    
    
    
    
    
    
    
Q4. What is the role of support vectors in SVM? Explain with an example.



Ans:
    
In Support Vector Machines (SVM), support vectors are the data points that play a 
crucial role in defining the decision boundary (hyperplane) between different classes.
SVM is a supervised machine learning algorithm used for classification and regression tasks,
and it works by finding the optimal hyperplane that maximizes the margin between classes.
Support vectors are the data points closest to this hyperplane, and they are the ones that
are most influential in determining the position and orientation of the hyperplane.

Here's an explanation of the role of support vectors in SVM with an example:

**Example: Binary Classification**

Let's say you have a binary classification problem where you want to classify emails
as either spam or non-spam based on two features: the number of words in the email
and the number of links in the email. You have a dataset with labeled examples.

1. **Data Preparation**: Your dataset consists of several email examples, and you
represent them as points in a two-dimensional feature space (number of words, number of links).

2. **Training SVM**: When you train an SVM on this data, the algorithm's objective is 
to find the hyperplane that best separates the two classes (spam and non-spam)
while maximizing the margin between them. The margin is the distance between the 
hyperplane and the nearest data points from each class.

3. **Support Vectors**: Support vectors are the data points from each class that lie
closest to the decision boundary. These are the data points that are the most challenging
to classify correctly and have the smallest margin. They are essentially the "support"
for the decision boundary: moving or removing a support vector can change the position and
orientation of the hyperplane, whereas removing any of the other points leaves it unchanged.

4. **Optimal Hyperplane**: The hyperplane found by the SVM is positioned in such
a way that it maximizes the margin between the support vectors of the two classes. 
This results in the most robust decision boundary that generalizes well to unseen data.

5. **Classification**: When you want to classify a new email as spam or non-spam, 
you can do so by checking on which side of the hyperplane the new data point falls.
If it's on the same side as the support vectors of the spam class, it's classified
as spam; otherwise, it's classified as non-spam.

In summary, support vectors are the data points that are closest to the decision boundary 
in an SVM. They are critical because they define the position and orientation of the
hyperplane, which, in turn, determines the algorithm's ability to classify new, 
unseen data accurately. SVM aims to maximize the margin between these support
vectors to create a robust classifier. 
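
The role of support vectors can be checked on a toy dataset. The sketch below (with made-up 2D points, not the email data from the example) fits scikit-learn's `SVC` with a linear kernel, reads the support vectors from `support_vectors_`, and then refits using only those points.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data
X = np.array([[0.0, 0.0], [-2.0, -1.0], [-3.0, 1.0],   # class -1
              [2.0, 2.0], [4.0, 3.0], [5.0, 1.0]])     # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C keeps the fit close to a hard margin on separable data
clf = SVC(kernel="linear", C=1000).fit(X, y)
print("support vector indices:", clf.support_)
print("support vectors:")
print(clf.support_vectors_)

# Refit using only the support vectors: the hyperplane should barely move
sv_idx = clf.support_
clf_sv = SVC(kernel="linear", C=1000).fit(X[sv_idx], y[sv_idx])

print("w, b (all points):  ", clf.coef_[0], clf.intercept_[0])
print("w, b (support only):", clf_sv.coef_[0], clf_sv.intercept_[0])
```

In this dataset only the closest pair of points from opposite classes ends up as support vectors, and discarding the other four points leaves the fitted hyperplane essentially unchanged.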












Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in
SVM?



Ans:
    Support Vector Machines (SVM) are a class of supervised machine learning algorithms used
    for classification and regression tasks. They work by finding a hyperplane that best 
    separates the data into different classes. In SVM, there are different concepts related to hyperplanes 
    and margins, including the hyperplane itself, marginal plane, soft margin, and hard margin. Let's
    illustrate these concepts with examples and graphs.

1. **Hyperplane**:
   - A hyperplane is a decision boundary that separates data points of one class from
another in a binary classification problem.
   - In a 2D space, a hyperplane is a straight line. In higher dimensions, it's 
    a flat affine subspace.
   - The equation of a hyperplane in 2D is given by: `w0 + w1*x1 + w2*x2 = 0`, where 
`w0`, `w1`, and `w2` are the coefficients of the hyperplane, and `x1` and `x2` are the features.

   Example:
   Consider a simple 2D classification problem with two classes (blue and red points).
The hyperplane that separates the two classes is shown in the graph below as a straight line.

import matplotlib.pyplot as plt

from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.inspection import DecisionBoundaryDisplay

# we create 50 points in two clusters
X, y = make_blobs(n_samples=50, centers=2, random_state=6)

# fit the model with a large C (very weak regularization) for illustration purposes
clf = svm.SVC(kernel="linear", C=1000)
clf.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# plot the decision function
ax = plt.gca()
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    plot_method="contour",
    colors="k",
    levels=[-1, 0, 1],
    alpha=0.5,
    linestyles=["--", "-", "--"],
    ax=ax,
)
# plot support vectors
ax.scatter(
    clf.support_vectors_[:, 0],
    clf.support_vectors_[:, 1],
    s=100,
    linewidth=1,
    facecolors="none",
    edgecolors="k",
)
plt.show() 
   
 
    
    
2. **Marginal Plane**:
   - The marginal planes are the two hyperplanes parallel to the decision boundary that pass
through the nearest data points (support vectors) of each class. With the scaling used
earlier, they satisfy `w·x + b = 1` and `w·x + b = -1`.
   - These planes are important because they define the margin, which is the distance between
them (`2 / ||w||`). Each marginal plane lies at a distance of `1 / ||w||` from the
separating hyperplane.

   Example:
   In the same 2D problem, the marginal planes are the dashed lines in the plot produced by the
code above (contour levels -1 and 1). The support vectors, which lie on the marginal planes,
are drawn as circled points.

   
3. **Hard Margin SVM**:
   - In a hard margin SVM, the goal is to find a hyperplane that perfectly separates
the two classes without any misclassification and with no points inside the margin.
   - This is only possible when the data is linearly separable, meaning a hyperplane
can completely separate the two classes without errors.

   Example:
   The first plot above (a linear SVC with C=1000) approximates a hard margin SVM: with C this
large, the fit tolerates essentially no margin violations when the data is linearly separable.


4. **Soft Margin SVM**:
   - In cases where the data is not linearly separable or when we want to allow some
misclassification, we use a soft margin SVM.
   - Soft margin SVM introduces a parameter (C) that controls the trade-off between 
    maximizing the margin and allowing some misclassification. A smaller C allows 
    more misclassification but a wider margin, while a larger C allows
    fewer misclassifications but a narrower margin.
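
The trade-off controlled by C is usually written with slack variables: each point gets a slack ξ_i = max(0, 1 - y_i (w·x_i + b)), and the soft margin objective is 1/2 ||w||^2 + C * Σ ξ_i. The sketch below evaluates these quantities for a fixed, hypothetical hyperplane and toy points; it does not solve the optimization problem.

```python
import numpy as np

# Hypothetical points on the x1 axis; the hyperplane below is x1 = 0
X = np.array([[3.0, 0.0],     # positive, outside the margin
              [0.5, 0.0],     # positive, inside the margin but correctly classified
              [-0.5, 0.0],    # positive, on the wrong side (misclassified)
              [-2.0, 0.0]])   # negative, outside the margin
y = np.array([1, 1, 1, -1])

w = np.array([1.0, 0.0])   # illustrative weights, not a trained model
b = 0.0

def slack(w, b, X, y):
    """Slack per point: 0 outside the margin, up to 1 inside it, above 1 if misclassified."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(w, b, X, y, C):
    """1/2 * ||w||^2 + C * (sum of slacks)."""
    return 0.5 * float(w @ w) + C * float(slack(w, b, X, y).sum())

xi = slack(w, b, X, y)
print("slack per point:", xi)
for C in (0.1, 10.0):
    print(f"C={C}: objective = {soft_margin_objective(w, b, X, y, C)}")
```

The same two margin violations cost far more when C is large, which is why a large C pushes the solver toward a narrower margin with fewer violations.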

   Example:
    The code below fits a soft margin SVM to two Gaussian clusters with two values of C
    (1 and 0.05). With the smaller C the margin is wider and more points fall inside it:

   

import matplotlib.pyplot as plt
import numpy as np

from sklearn import svm

# we create two Gaussian clusters of 80 points each (160 points in total)
np.random.seed(0)
X = np.r_[np.random.randn(80, 2) - [2, 2], np.random.randn(80, 2) + [2, 2]]
Y = [0] * 80 + [1] * 80

# figure number
fignum = 1

# fit the model
for name, penalty in (("unreg", 1), ("reg", 0.05)):
    clf = svm.SVC(kernel="linear", C=penalty)
    clf.fit(X, Y)

    # get the separating hyperplane
    w = clf.coef_[0]
    a = -w[0] / w[1]
    xx = np.linspace(-5, 5)
    yy = a * xx - (clf.intercept_[0]) / w[1]

    # plot the parallels to the separating hyperplane that pass through the
    # support vectors (margin away from hyperplane in direction
    # perpendicular to hyperplane). This is sqrt(1+a^2) away vertically in
    # 2-d.
    margin = 1 / np.sqrt(np.sum(clf.coef_**2))
    yy_down = yy - np.sqrt(1 + a**2) * margin
    yy_up = yy + np.sqrt(1 + a**2) * margin

    # plot the line, the points, and the nearest vectors to the plane
    plt.figure(fignum, figsize=(4, 3))
    plt.clf()
    plt.plot(xx, yy, "k-")
    plt.plot(xx, yy_down, "k--")
    plt.plot(xx, yy_up, "k--")

    # circle the support vectors (no colormap needed: no color data is passed)
    plt.scatter(
        clf.support_vectors_[:, 0],
        clf.support_vectors_[:, 1],
        s=80,
        facecolors="none",
        zorder=10,
        edgecolors="k",
    )
    plt.scatter(
        X[:, 0], X[:, 1], c=Y, zorder=10, cmap=plt.get_cmap("RdBu"), edgecolors="k"
    )

    plt.axis("tight")
    x_min = -4.8
    x_max = 4.2
    y_min = -6
    y_max = 6

    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T
    Z = clf.decision_function(xy).reshape(XX.shape)

    # Put the result into a filled contour plot (line styles do not apply to filled contours)
    plt.contourf(XX, YY, Z, cmap=plt.get_cmap("RdBu"), alpha=0.5)

    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)

    plt.xticks(())
    plt.yticks(())
    fignum = fignum + 1

plt.show()


In summary, SVM aims to find a hyperplane that separates data into different classes.
The marginal planes pass through the support vectors and bound the margin on either side
of the hyperplane. Hard margin SVM requires perfect separation, while soft margin SVM allows
some misclassification with a trade-off controlled by the parameter C.
    
    
    
    
    
    
    
    
    
    
    
    
Q6. SVM Implementation through Iris dataset.

- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set
- Train a linear SVM classifier on the training set and predict the labels for the testing set
- Compute the accuracy of the model on the testing set
- Plot the decision boundaries of the trained model using two of the features
- Try different values of the regularisation parameter C and see how it affects the performance of
the model.


Ans:
    
    
    To implement SVM on the Iris dataset using scikit-learn,
    you can follow these steps. First, make sure you have scikit-learn installed. You can install it using pip
    if you don't have it already:

pip install scikit-learn

The code below does the following:

Loads the Iris dataset from scikit-learn.

Splits the dataset into a training set and a testing set (80% training, 20% testing).

Trains a linear SVM classifier on the training set, using only the first two features
so that the decision boundaries can be plotted in 2D.

Predicts labels for the testing set and computes the accuracy of the model.

Plots the decision boundaries using the first two features (Sepal length and Sepal width).

You can experiment with different values of the regularization parameter C to see
how it affects the model's performance. The loop at the end of the code passes each
value in C_values to the SVC constructor.
A smaller C value allows for more margin violations and may
result in a simpler model, while a larger C value enforces a stricter margin
and may lead to a more complex model.

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into a training set and a testing set (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear SVM classifier
def train_svm_classifier(C):
    svm_classifier = SVC(kernel='linear', C=C)
    svm_classifier.fit(X_train[:, [0, 1]], y_train)
    return svm_classifier

# Predict the labels for the testing set
def predict_labels(svm_classifier, X_test):
    return svm_classifier.predict(X_test)

# Compute the accuracy of the model on the testing set
def compute_accuracy(y_true, y_pred):
    return accuracy_score(y_true, y_pred)

# Plot the decision boundaries using two of the features
def plot_decision_boundaries(svm_classifier, X, y, feature_indices, C):
    h = .02  # step size in the mesh

    x_min, x_max = X[:, feature_indices[0]].min() - 1, X[:, feature_indices[0]].max() + 1
    y_min, y_max = X[:, feature_indices[1]].min() - 1, X[:, feature_indices[1]].max() + 1

    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = svm_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
    plt.scatter(X[:, feature_indices[0]], X[:, feature_indices[1]], c=y, cmap=plt.cm.coolwarm)
    plt.xlabel(iris.feature_names[feature_indices[0]])
    plt.ylabel(iris.feature_names[feature_indices[1]])
    plt.title(f"SVM Decision Boundaries (C={C})")
    plt.show()

# Try different values of the regularization parameter C
C_values = [0.01, 0.1, 1, 10, 100]

for C in C_values:
    svm_classifier = train_svm_classifier(C)
    y_pred = predict_labels(svm_classifier, X_test[:, [0, 1]])
    accuracy = compute_accuracy(y_test, y_pred)
    print(f"Accuracy (C={C}): {accuracy:.2f}")
    plot_decision_boundaries(svm_classifier, X_train, y_train, [0, 1], C)
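
The accuracies printed above come from a single train/test split, so they can vary with `random_state`. As a hedged alternative, the sketch below compares the same C values with 5-fold cross-validation via `cross_val_score`. The choice of five folds and the name `X2` are assumptions made for illustration; the first two features are used to mirror the code above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X2 = iris.data[:, [0, 1]]   # sepal length and sepal width only
y = iris.target

cv_results = {}
for C in [0.01, 0.1, 1, 10, 100]:
    # cross_val_score fits a fresh copy of the classifier on each of the 5 folds
    scores = cross_val_score(SVC(kernel="linear", C=C), X2, y, cv=5)
    cv_results[C] = (scores.mean(), scores.std())
    print(f"C={C}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Averaging over folds gives a steadier picture of how C affects accuracy than one split does, at the cost of fitting each model five times.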
    
    
    
    
    
    
    
    
    
    
    
    
Bonus task: Implement a linear SVM classifier from scratch using Python and compare its
performance with the scikit-learn implementation.



Ans:
    Now, let's implement a basic linear SVM from scratch and compare it with scikit-learn's SVM:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Create a simple dataset for binary classification
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)
# make_classification returns labels in {0, 1}; the hinge-loss updates need {-1, +1}
y = np.where(y == 0, -1, 1)

# Custom Linear SVM Implementation (sub-gradient descent on the regularized hinge loss)
class LinearSVM:
    def __init__(self, lr=0.01, lambda_param=0.001, epochs=1000):
        self.lr = lr
        self.lambda_param = lambda_param  # strength of the L2 penalty on w
        self.epochs = epochs

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0

        for _ in range(self.epochs):
            for i in range(n_samples):
                if y[i] * (np.dot(X[i], self.w) + self.b) >= 1:
                    # margin satisfied: only the regularization term contributes
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # margin violated: add the hinge-loss sub-gradient
                    self.w -= self.lr * (2 * self.lambda_param * self.w - y[i] * X[i])
                    self.b += self.lr * y[i]

    def predict(self, X):
        # f(x) >= 0 is the positive class, otherwise the negative class
        return np.where(np.dot(X, self.w) + self.b >= 0, 1, -1)

# Split the dataset into training and testing sets
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Train the custom SVM classifier
custom_svm = LinearSVM()
custom_svm.fit(X_train, y_train)

# Predict using the custom SVM classifier
y_pred_custom = custom_svm.predict(X_test)

# Train and predict using scikit-learn's SVM classifier
sklearn_svm = SVC(kernel='linear')
sklearn_svm.fit(X_train, y_train)
y_pred_sklearn = sklearn_svm.predict(X_test)

# Compare accuracy
accuracy_custom = accuracy_score(y_test, y_pred_custom)
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)

print(f"Custom SVM Accuracy: {accuracy_custom}")
print(f"Scikit-learn SVM Accuracy: {accuracy_sklearn}")


This code creates a simple dataset, implements a basic linear SVM from scratch using
sub-gradient descent on the regularized hinge loss (not the SMO-type solver that scikit-learn's
SVC uses through libsvm), and compares its test accuracy with scikit-learn's SVM classifier.
The labels are converted to -1 and +1 because the hinge-loss update and the sign-based
prediction both assume that encoding.
    
    
