# Q1. What is the mathematical formula for a linear SVM?

## A linear SVM (Support Vector Machine) is a type of SVM that uses a linear hyperplane to separate two classes of data. The mathematical formula for a linear SVM can be expressed as follows:
## Given a training set of input vectors x_1, x_2, ..., x_n with x_i ∈ R^d and corresponding class labels y_i ∈ {-1, +1}, the linear SVM seeks a hyperplane in the d-dimensional input space that separates the two classes.
## The hyperplane is defined by the equation: w^T x + b = 0
## where w is the weight vector perpendicular to the hyperplane, and b is the bias term. A point x is assigned to the class sign(w^T x + b).
## The distance between the hyperplane and the closest training point is called the margin. When w and b are scaled so that the closest points satisfy y_i(w^T x_i + b) = 1, the margin equals 1/||w||, so maximizing the margin is equivalent to minimizing ||w||.
## This can be formulated as the (hard-margin) optimization problem: minimize ||w||^2/2 subject to y_i(w^T x_i + b) >= 1 for all i
## where ||w|| is the Euclidean norm of the weight vector w.
## This is a convex optimization problem, and can be solved using quadratic programming techniques.
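## The hyperplane equation and the constraint above can be checked numerically. The following is a minimal sketch, assuming a small hypothetical 2-D data set and a hand-derived pair (w, b); neither comes from the notebook, and the names `decision_function` and `predict` are illustrative only.

```python
import numpy as np

# Hypothetical toy data (not from the notebook): three points per class in R^2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [2.0, 2.0], [3.0, 3.0], [2.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hand-derived separating hyperplane for this toy set (an assumption of the sketch).
w = np.array([2.0 / 3.0, 2.0 / 3.0])
b = -5.0 / 3.0


def decision_function(X, w, b):
    """Signed score w^T x + b for every row of X."""
    return X @ w + b


def predict(X, w, b):
    """Class label in {-1, +1}, taken from the sign of the score."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)


# The constraint y_i (w^T x_i + b) >= 1 must hold for every training point.
functional_margins = y * decision_function(X, w, b)
constraints_hold = bool(np.all(functional_margins >= 1 - 1e-9))
print("predictions:", predict(X, w, b))
print("all constraints satisfied:", constraints_hold)
```

## The small tolerance in the comparison only absorbs floating-point rounding for the points that sit exactly on the margin.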





# Q2. What is the objective function of a linear SVM?

## The objective function of a linear SVM (Support Vector Machine) is to find the hyperplane that maximizes the margin between two classes of data while minimizing the classification error.
## The margin is defined as the distance between the hyperplane and the closest point in each class. The hyperplane that maximizes the margin is chosen because it provides the best generalization performance on unseen data.
## The objective function of a linear SVM can be formulated as a constrained optimization problem: minimize ||w||^2/2
## subject to y_i(w^T x_i + b) >= 1 for all i
## where ||w|| is the Euclidean norm of the weight vector w, and b is the bias term. The constraint ensures that the hyperplane correctly classifies all training examples, with a margin of at least 1.
## The objective function seeks to minimize the norm of the weight vector while satisfying the classification constraint. Because the margin equals 1/||w||, a smaller norm means a wider margin. Minimizing the norm also acts as L2 regularization: it keeps the weights small (it does not make them sparse), which helps to avoid overfitting and improve generalization performance.
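## The link between the objective and the margin can be made concrete with a few lines of NumPy. This is a sketch under the scaling used in the constraint above (closest points satisfy y_i(w^T x_i + b) = 1); the two weight vectors are arbitrary illustrative values.

```python
import numpy as np


def objective(w):
    """Hard-margin SVM objective ||w||^2 / 2."""
    return 0.5 * float(np.dot(w, w))


def margin(w):
    """Distance from the hyperplane to the closest point, 1 / ||w||,
    assuming the closest points satisfy y_i (w^T x_i + b) = 1."""
    return 1.0 / float(np.linalg.norm(w))


# Two arbitrary weight vectors pointing in the same direction.
w_small = np.array([0.5, 0.5])
w_large = np.array([2.0, 2.0])

for name, w in [("w_small", w_small), ("w_large", w_large)]:
    print(name, "objective =", objective(w), "margin =", margin(w))
```

## The vector with the smaller norm has the smaller objective value and the larger margin, which is why minimizing ||w||^2/2 and maximizing the margin are the same problem.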

# Q3. What is the kernel trick in SVM?

## The kernel trick is a technique used in Support Vector Machines (SVMs) to transform the input data into a higher-dimensional space in order to find a better separating hyperplane. Instead of computing the dot product between input vectors directly, the kernel function computes the dot product between the vectors in a higher-dimensional feature space.
## The kernel function K(x, z) returns the dot product φ(x)^T φ(z) of the inputs under a feature map φ into a higher-dimensional space where the data may be more separable. This is done without computing φ explicitly, which can be computationally expensive or even impossible if the dimensionality is very high (for the RBF kernel the feature space is infinite-dimensional).
## The kernel trick allows the SVM to learn non-linear decision boundaries in the original input space, even though the optimization problem is formulated in terms of a linear hyperplane. This is achieved by replacing the dot product of input vectors with a kernel function that implicitly computes the dot product in a higher-dimensional space.
## The most commonly used kernel functions in SVMs are the linear kernel, polynomial kernel, and radial basis function (RBF) kernel. The choice of kernel function depends on the specific problem and the properties of the data.
## In summary, the kernel trick is a powerful technique that allows SVMs to handle complex data by transforming it into a higher-dimensional feature space without explicitly computing the transformation. This enables the SVM to find a better separating hyperplane and improve classification accuracy.
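## The claim that a kernel computes a dot product in a higher-dimensional space without building that space can be verified on a small case. The sketch below assumes the homogeneous degree-2 polynomial kernel K(x, z) = (x^T z)^2 on 2-D inputs, whose explicit feature map is φ(x) = (x1^2, √2·x1·x2, x2^2); the function names and the two sample vectors are illustrative.

```python
import numpy as np


def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel (2-D input)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def poly2_kernel(x, z):
    """K(x, z) = (x^T z)^2, computed entirely in the original 2-D space."""
    return float(np.dot(x, z)) ** 2


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2); no finite explicit map exists for it."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # dot product in the 3-D feature space
implicit = poly2_kernel(x, z)             # same value, without ever forming phi
print("explicit:", explicit, "implicit:", implicit)
```

## Both routes give the same number, but the kernel route never constructs the higher-dimensional vectors, which is what makes kernels such as RBF usable in practice.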

# Q4. What is the role of support vectors in SVM? Explain with an example.

## Support vectors are the training points that lie closest to the separating hyperplane in Support Vector Machines (SVMs). In the hard-margin case they lie exactly on the margin, so they have the smallest distance to the hyperplane of all training points, and they play a crucial role in the SVM's decision boundary.
## In SVM, the decision boundary is determined by the weights assigned to each feature in the input data. The support vectors are the training examples that are closest to the decision boundary and they determine the orientation of the hyperplane.
## During the training process, the SVM algorithm seeks to find the hyperplane that maximizes the margin between the two classes while correctly classifying all training examples. The margin is defined as the distance between the hyperplane and the closest point in each class. The training examples that lie on the margin (or, with a soft margin, violate the classification constraint) are the support vectors. Examples that lie strictly outside the margin are not used in determining the orientation of the hyperplane: removing them would leave the solution unchanged.
## Once the optimal hyperplane is found, the support vectors are the data points that are closest to the hyperplane and provide the most information about the problem. These support vectors determine the position and orientation of the decision boundary and are used to make predictions on new data.
## For example, consider a binary classification problem where the input data is two-dimensional and consists of two classes that are not linearly separable. The SVM algorithm transforms the data into a higher-dimensional feature space using a kernel function, and finds the hyperplane that maximizes the margin between the two classes.
## The support vectors are the data points that lie closest to the decision boundary and determine the orientation of the hyperplane. These support vectors are crucial in predicting the class of new data points that lie close to the decision boundary.
## In summary, support vectors play a crucial role in SVMs by determining the position and orientation of the decision boundary. They are the training examples that are closest to the decision boundary and provide the most information about the problem.
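## A small linearly separable case makes the role of the support vectors visible. The sketch below assumes a hypothetical 2-D toy set and its maximum-margin solution w = (2/3, 2/3), b = -5/3, worked out by hand rather than by a solver; the multipliers in `alpha` are likewise hand-derived assumptions.

```python
import numpy as np

# Hypothetical toy data (not from the notebook).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [2.0, 2.0], [3.0, 3.0], [2.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hand-derived maximum-margin hyperplane for this set (assumption of the sketch).
w = np.array([2.0 / 3.0, 2.0 / 3.0])
b = -5.0 / 3.0

# Support vectors are the points whose constraint is tight: y_i (w^T x_i + b) = 1.
functional_margins = y * (X @ w + b)
support_indices = np.flatnonzero(np.isclose(functional_margins, 1.0))
print("support vector indices:", support_indices)
print("support vectors:\n", X[support_indices])

# The weight vector is a combination of the support vectors only:
# w = sum_i alpha_i * y_i * x_i, with alpha_i = 0 for every other point.
alpha = np.array([0.0, 2.0 / 9.0, 2.0 / 9.0, 4.0 / 9.0, 0.0, 0.0])
w_from_support = (alpha * y) @ X
print("w rebuilt from support vectors:", w_from_support)
```

## Points 0, 4 and 5 have zero weight in the sum, so moving or deleting them (without crossing the margin) would not change the hyperplane.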

# Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin and Hard margin in SVM?

## 1. Hyperplane: In SVM, the hyperplane w^T x + b = 0 is the decision boundary that separates the input data into two classes. The SVM chooses the hyperplane that maximizes the margin between the two classes. The hyperplane is always linear in the feature space; with a non-linear kernel, the corresponding decision boundary in the original input space is non-linear.

## 2. Marginal plane: The marginal planes are the two planes parallel to the hyperplane, w^T x + b = +1 and w^T x + b = -1, that pass through the support vectors of each class. The distance between them, 2/||w||, is the width of the margin between the two classes.

## 3. Hard margin: In SVM, the hard margin refers to the case where the SVM algorithm requires that all training examples be correctly classified with a margin of at least 1. In other words, there is no tolerance for misclassification.

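## 4. Soft margin: In SVM, the soft margin refers to the case where the SVM algorithm allows some training examples to fall inside the margin or be misclassified in order to find a better separating hyperplane. Each example gets a slack variable ξ_i >= 0, the constraint becomes y_i(w^T x_i + b) >= 1 - ξ_i, and the objective becomes ||w||^2/2 + C Σ ξ_i. The regularization parameter C sets the trade-off: a large C penalizes violations heavily (approaching the hard margin), while a small C favors a wider margin.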
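## The difference between the hard and soft margin can be sketched numerically. The example below assumes a hypothetical toy set in which one positive point sits among the negatives, so no hyperplane can satisfy every hard-margin constraint; the slack of each point is computed as max(0, 1 - y_i(w^T x_i + b)) for a fixed, illustrative (w, b).

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """Return (||w||^2 / 2 + C * sum(slack), slack), where
    slack_i = max(0, 1 - y_i (w^T x_i + b)) is the hinge loss of point i."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * float(w @ w) + C * float(slack.sum()), slack


# Hypothetical toy data: the last positive point lies among the negatives.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [2.0, 2.0], [3.0, 3.0], [0.5, 0.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Illustrative hyperplane (an assumption of the sketch, not a fitted model).
w = np.array([2.0 / 3.0, 2.0 / 3.0])
b = -5.0 / 3.0

value, slack = soft_margin_objective(w, b, X, y, C=1.0)
hard_margin_feasible = bool(np.all(y * (X @ w + b) >= 1 - 1e-9))

print("slack per point:", np.round(slack, 3))
print("soft-margin objective (C=1):", round(value, 4))
print("hard-margin constraints satisfied:", hard_margin_feasible)
```

## Only the overlapping point has non-zero slack. Raising C makes that single violation more expensive, which is how C moves the solution between a wide, tolerant margin and a narrow, strict one.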

# Q6. SVM Implementation through Iris dataset.
# Bonus task: Implement a linear SVM classifier from scratch using Python and compare its
# performance with the scikit-learn implementation.
# ~ Load the iris dataset from the scikit-learn library and split it into a training set and a testing set
# ~ Train a linear SVM classifier on the training set and predict the labels for the testing set
# ~ Compute the accuracy of the model on the testing set
# ~ Plot the decision boundaries of the trained model using two of the features
# ~ Try different values of the regularisation parameter C and see how it affects the performance of the model.

In [4]:
from sklearn.datasets import load_iris

In [5]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:
data = load_iris()

In [7]:
print(data.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [8]:
data.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [9]:
data.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [10]:
## Independent & dependent features 
X = data.data
y = data.target

In [11]:
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [12]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [17]:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)


In [18]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((100, 4), (50, 4), (100,), (50,))

In [19]:
# Train the SVM model
model = SVC(kernel='linear')
model.fit(X_train, y_train)

In [20]:
# Predict labels for the test set and compute the accuracy
y_pred = model.predict(X_test)
accuracy = (y_pred == y_test).mean()
print("Accuracy:", accuracy)


Accuracy: 1.0
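
The bonus task asks for a linear SVM written from scratch. A minimal sketch follows, assuming batch subgradient descent on the soft-margin objective ||w||^2/2 + C Σ max(0, 1 - y_i(w^T x_i + b)) and binary labels in {-1, +1}. The function names, the learning rate, the epoch count and the toy data are illustrative choices, not part of the notebook; applying it to the three iris classes would additionally need a one-vs-rest wrapper.

```python
import numpy as np


def fit_linear_svm(X, y, C=1.0, lr=0.01, epochs=5000):
    """Batch subgradient descent on ||w||^2 / 2 + C * sum(hinge loss).
    Expects labels y in {-1, +1}. Returns (w, b)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Points with y_i (w^T x_i + b) < 1 contribute to the hinge subgradient.
        violating = y * (X @ w + b) < 1
        grad_w = w - C * (X[violating].T @ y[violating])
        grad_b = -C * float(y[violating].sum())
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b


def predict_linear_svm(X, w, b):
    """Class label in {-1, +1} from the sign of w^T x + b."""
    return np.where(X @ w + b >= 0, 1, -1)


# Hypothetical linearly separable toy data (not the iris data).
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [2.0, 2.0], [3.0, 3.0], [2.0, 3.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

w_fit, b_fit = fit_linear_svm(X_toy, y_toy)
train_accuracy = float((predict_linear_svm(X_toy, w_fit, b_fit) == y_toy).mean())
print("w =", w_fit, "b =", b_fit, "training accuracy =", train_accuracy)
```

With a constant step size the iterates hover near the optimum rather than settling exactly on it, so the fitted values of w and b are approximate. For a comparison with scikit-learn, the same binary subset of the data would have to be passed to both this function and SVC(kernel='linear').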


In [22]:
# Effect of the regularization parameter C
from sklearn.model_selection import cross_val_score
C_values = [0.1, 0.5, 1, 5, 10, 50, 100]

# Build a new classifier for each C and compute 5-fold cross-validation scores
for C in C_values:
    clf = SVC(kernel='linear', C=C)
    scores = cross_val_score(clf, data.data, data.target, cv=5)
    print("C = {}, Mean accuracy = {:.2f}, Standard deviation = {:.2f}".format(
        C, scores.mean(), scores.std()))

C = 0.1, Mean accuracy = 0.98, Standard deviation = 0.02
C = 0.5, Mean accuracy = 0.98, Standard deviation = 0.02
C = 1, Mean accuracy = 0.98, Standard deviation = 0.02
C = 5, Mean accuracy = 0.98, Standard deviation = 0.02
C = 10, Mean accuracy = 0.98, Standard deviation = 0.02
C = 50, Mean accuracy = 0.98, Standard deviation = 0.02
C = 100, Mean accuracy = 0.98, Standard deviation = 0.02


# The scores recorded above are identical for every C because that run evaluated the same `model` (default C=1) on each iteration. C only takes effect when it is passed to the estimator, as in SVC(kernel='linear', C=C), so the recorded output does not measure the effect of C.

In [23]:
# Plot decision boundaries using the first two features (sepal length, sepal width)
X = data.data[:, :2]
y = data.target

# Train an SVM model
svm = SVC(kernel='linear', C=1)
svm.fit(X, y)

# Define a grid of points to plot the decision boundaries
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision boundaries and the data points
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, alpha=0.8)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Decision boundaries of SVM on iris dataset')
plt.show()

In this code, we use the iris dataset loaded earlier with scikit-learn's load_iris function. We choose the first two features of the dataset (sepal length and sepal width) and the target variable for plotting.

We then train an SVM model with a linear kernel and C=1 using scikit-learn's SVC class, and fit it to the data using the fit method.

Next, we define a grid of points (xx and yy) using NumPy's np.meshgrid function, and use the predict method of the trained SVM to assign a class label to each point on the grid. We reshape the predicted labels to the same shape as the grid (Z = Z.reshape(xx.shape)).

Finally, we use a contour plot (plt.contourf(xx, yy, Z, alpha=0.4)) to visualize the decision boundaries of the SVM model, and scatter plot (plt.scatter(X[:, 0], X[:, 1], c=y, alpha=0.8)) to plot the data points. We also add labels to the x-axis and y-axis, a title to the plot, and display the plot using plt.show().

The output of this code might look like this:

svm_decision_boundaries_iris.png

As we can see from the plot, the decision boundaries of the SVM model are linear, since we used a linear kernel. The plot shows how the SVM model separates the different classes of iris based on the values of the chosen features (feature 1 and feature 2). We can see that the model is able to separate the blue class (setosa) from the other two classes (versicolor and virginica) quite well, but there is some overlap between the green and






