# Q1. What is the mathematical formula for a linear SVM?

A Support Vector Machine (SVM) is a machine learning algorithm that can be used for both classification and regression analysis. In the case of a linear SVM, the mathematical formula for the decision boundary is given by:

w^T x + b = 0

where w is a vector of weights, x is a vector of input features, b is a bias term, and the superscript T denotes the transpose operation.

To classify a new input vector, the SVM evaluates the decision function, which is proportional to the signed distance of the input vector from the decision boundary (the distance itself is f(x) / ||w||):

f(x) = w^T x + b

If f(x) is positive, the input vector is classified as belonging to the positive class, and if it is negative, the input vector is classified as belonging to the negative class.
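
The classification rule above can be sketched in a few lines of NumPy. The values of `w` and `b` below are illustrative assumptions, not parameters learned from any dataset, and the helper names `decision_function` and `predict` are made up for this example. The sketch also shows that dividing f(x) by ||w|| gives the geometric signed distance to the hyperplane.

```python
import numpy as np

# Hypothetical parameters of an already-trained linear SVM (illustrative values only)
w = np.array([3.0, 4.0])   # weight vector, so ||w|| = 5
b = -5.0                   # bias term

def decision_function(x):
    """Raw score f(x) = w^T x + b."""
    return float(w @ x + b)

def predict(x):
    """Class label from the sign of f(x): +1 or -1."""
    return 1 if decision_function(x) >= 0 else -1

x_new = np.array([3.0, 4.0])
score = decision_function(x_new)          # 3*3 + 4*4 - 5 = 20
distance = score / np.linalg.norm(w)      # geometric signed distance: 20 / 5 = 4
label = predict(x_new)
print(score, distance, label)
```

A point with a negative score, such as the origin here (f = -5), is assigned to the negative class.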

The weights w and bias b are learned from the training data by minimizing the objective function:

minimize 1/2 ||w||^2 + C Σ_i ξ_i
subject to y_i (w^T x_i + b) ≥ 1 - ξ_i, ξ_i ≥ 0

where ||w||^2 is the squared norm of the weight vector, C is a hyperparameter that controls the trade-off between maximizing the margin and minimizing the classification error, y_i is the class label of the i-th training example, and ξ_i is a slack variable that allows for some misclassification of the training examples.

# Q2. What is the objective function of a linear SVM?

The objective of a linear Support Vector Machine (SVM) is to find the hyperplane that separates the two classes in the training data with maximum margin. The corresponding objective function can be expressed as follows:

minimize 1/2 ||w||^2 + C Σ_i ξ_i

subject to y_i (w^T x_i + b) ≥ 1 - ξ_i, ξ_i ≥ 0

where w is a vector of weights, b is a bias term, and x_i and y_i represent the input features and the class label (+1 or -1), respectively, of the i-th training example. The term ||w||^2 is the squared norm of the weight vector; because the distance between the two margin boundaries is 2 / ||w||, minimizing ||w||^2 maximizes the margin between the decision boundary and the closest points from each class. The constant C is a hyperparameter that controls the trade-off between maximizing the margin and minimizing the classification error.

The second part of the objective function, C Σ_i ξ_i, is the penalty term for margin violations (the first part, 1/2 ||w||^2, acts as the regularizer). The slack variable ξ_i measures how far the i-th training example falls on the wrong side of its margin: ξ_i = 0 for a point on or outside the margin, 0 < ξ_i ≤ 1 for a point inside the margin but still correctly classified, and ξ_i > 1 for a misclassified point. By minimizing this term, the SVM keeps violations small while still maximizing the margin.

The optimization problem is solved using quadratic programming techniques to obtain the optimal values of w and b that define the decision boundary.
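
To make the objective concrete, the sketch below evaluates it for a fixed, hand-picked `w` and `b` on a tiny made-up dataset. It does not solve the optimization problem. It relies on the fact that, for given w and b, the smallest slack satisfying both constraints is ξ_i = max(0, 1 - y_i (w^T x_i + b)). The function name `primal_objective` and all data values are assumptions for illustration.

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Return 1/2 ||w||^2 + C * sum_i xi_i together with the slack values xi_i."""
    margins = y * (X @ w + b)                # y_i (w^T x_i + b)
    slack = np.maximum(0.0, 1.0 - margins)   # smallest xi_i satisfying both constraints
    value = 0.5 * float(w @ w) + C * float(slack.sum())
    return value, slack

# Made-up data: the second point sits inside the margin, the fourth is misclassified
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 0.0])
b = 0.0

value, slack = primal_objective(w, b, X, y, C=1.0)
print(slack)   # slack per training example
print(value)   # 0.5 * 1 + 1.0 * (0.5 + 1.5) = 2.5
```

The point inside the margin gets a slack of 0.5, while the misclassified point gets a slack above 1 (here 1.5), which matches the interpretation of ξ_i.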







# Q3. What is the kernel trick in SVM?

The kernel trick in Support Vector Machines (SVMs) is a technique that allows SVMs to handle nonlinearly separable data without explicitly transforming the input features into a higher-dimensional space.

The basic idea of the kernel trick is to introduce a kernel function that computes the dot product of the input feature vectors in a higher-dimensional space, without actually computing the feature vectors themselves. This allows the SVM to implicitly work with the data in the higher-dimensional space, while still operating in the original input space.

In other words, the kernel function defines a similarity measure between pairs of input feature vectors, which is used by the SVM to find a decision boundary that maximizes the margin between the classes. The kernel corresponds to an implicit mapping of the input features into a higher-dimensional space, where it may be possible to find a linear decision boundary that separates the classes.

The most commonly used kernel functions are the radial basis function (RBF) kernel, the polynomial kernel, and the linear kernel. The RBF kernel is often used for its flexibility and ability to capture complex patterns in the data, while the polynomial kernel captures feature interactions up to a chosen degree. The linear kernel, which is the ordinary dot product, is used when the data is linearly separable or nearly so.

The kernel trick allows SVMs to handle high-dimensional data and nonlinear decision boundaries, without incurring the computational cost of explicitly transforming the input features into a higher-dimensional space.
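
A small numerical check can make the idea tangible. The sketch below uses the degree-2 homogeneous polynomial kernel K(x, z) = (x · z)^2 on 2-D inputs, chosen here only because its explicit feature map is short enough to write down. The helper names `phi` and `poly2_kernel` are assumptions for this example.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed entirely in the original 2-D space."""
    return float(x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(phi(x) @ phi(z))   # dot product in the 3-D feature space
implicit = poly2_kernel(x, z)       # same value without ever building phi
print(explicit, implicit)           # both equal 121 up to floating-point rounding
```

Both routes give the same number, but the kernel never constructs the higher-dimensional vectors. For kernels such as RBF, the corresponding feature space is infinite-dimensional, so the explicit route is not available at all.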

# Q4. What is the role of support vectors in SVM? Explain with an example.

Support vectors are the data points that lie closest to the decision boundary or the margin in a Support Vector Machine (SVM). These are the data points that are most critical to the performance of the SVM, as they determine the position and orientation of the decision boundary.

The role of support vectors in SVM can be illustrated with the help of an example. Consider a simple two-dimensional dataset with two classes that are not linearly separable in the input space. The SVM tries to find a linear decision boundary that maximizes the margin between the classes. However, since the classes are not linearly separable, the SVM uses a kernel function to map the input data into a higher-dimensional space where it may be possible to find a linear decision boundary.

During the training process, the SVM identifies the support vectors that lie closest to the decision boundary or the margin. These support vectors are the data points that are most difficult to classify, as they lie closest to the decision boundary. The SVM uses these support vectors to define the decision boundary and compute the margins.

Once the decision boundary is defined, the SVM can be used to classify new input data points. The SVM evaluates the decision function for each new point and assigns the class according to its sign, that is, according to which side of the decision boundary the point falls on. The support vectors play a crucial role in this process: the decision boundary depends only on them, so removing any other training point leaves it unchanged.
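
The role of the support vectors can be checked on a tiny made-up dataset with scikit-learn's `SVC`. This is a minimal sketch: the six points are assumptions chosen so that one point per class is closest to the other class, and a very large `C` is used to approximate a hard margin. It fits the model, lists the support vectors, then refits after dropping two points that are not support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2-D data: three points per class
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard margin
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# Drop two points that are NOT support vectors and refit
keep = np.array([1, 2, 3, 5])
clf_reduced = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])
print("w after removal =", clf_reduced.coef_[0], "b =", clf_reduced.intercept_[0])
```

In this dataset the points (1, 1) and (3, 3) are the closest pair across the two classes, so they are the ones expected to come back as support vectors, and the refitted boundary should match the original up to solver tolerance.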

# Q5. Illustrate Hyperplane, Marginal plane, Soft margin and Hard margin in SVM with examples and graphs.

- (1) Hyperplane:

In SVM, the hyperplane is the decision boundary that separates the two classes in the feature space. A linear SVM looks for the hyperplane that separates the two classes with the maximum margin. The hyperplane is defined by the weight vector w and the bias term b as the set of points where w^T x + b = 0.

Example: Consider a 2D dataset with two classes, labeled as red and blue. The following graph shows the hyperplane that separates the two classes using a linear kernel.

- (2) Marginal plane:
The marginal planes are the two planes parallel to the hyperplane, one on each side, that pass through the closest training points of each class (the support vectors); they satisfy w^T x + b = +1 and w^T x + b = -1. The distance between a marginal plane and the hyperplane is called the margin. In a hard margin SVM, no training point lies between the two marginal planes.

Example: The following graph shows the hyperplane and the marginal plane for the same 2D dataset. The blue and red points are the support vectors. The distance between the hyperplane and the marginal plane is the margin.


- (3) Soft margin:
The concept of soft margin is used when the training data is not linearly separable. In soft margin SVM, some training data points are allowed to violate the margin, but the violations are penalized. The soft margin SVM uses slack variables to allow some misclassifications. The penalty is controlled by the hyperparameter C: a large C penalizes violations heavily and approaches hard margin behavior, while a small C tolerates more violations in exchange for a wider margin.

Example: The following graph shows a 2D dataset that is not linearly separable. The soft margin SVM finds the best hyperplane that separates the two classes while allowing some misclassifications. The blue and red points represent the support vectors, while the dotted line represents the margin.

- (4) Hard margin:
The concept of hard margin is used when the training data is linearly separable. In hard margin SVM, no misclassification is allowed, and the margin is maximized. The hard margin SVM does not use a slack variable and tries to find the best hyperplane that separates the two classes without any misclassifications.

Example: The following graph shows a 2D dataset that is linearly separable. The hard margin SVM finds the best hyperplane that separates the two classes without any misclassifications. The blue and red points represent the support vectors, while the dotted line represents the margin.
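
The difference between hard and soft margins can also be seen numerically. The sketch below is an illustration under stated assumptions: the nine points are made up, a very large `C` stands in for a hard margin (scikit-learn's `SVC` has no separate hard margin mode), and `fit_and_describe` is a hypothetical helper. It reports the distance between the two marginal planes, 2 / ||w||, and how many training points fall inside them.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up data: two clusters plus one positive point close to the negative cluster
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0], [4.0, 4.0],
              [1.5, 1.5]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1, 1])

def fit_and_describe(C):
    """Fit a linear SVC and return it with the marginal-plane spacing and violation count."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])       # distance between the marginal planes
    scores = y * clf.decision_function(X)            # y_i (w^T x_i + b)
    violations = int(np.sum(scores < 1.0 - 1e-2))    # points strictly inside the margin
    return clf, width, violations

hard_clf, hard_width, hard_violations = fit_and_describe(C=1e6)   # approximates hard margin
soft_clf, soft_width, soft_violations = fit_and_describe(C=0.1)   # soft margin

print("hard margin: width =", hard_width, "violations =", hard_violations)
print("soft margin: width =", soft_width, "violations =", soft_violations)
```

With the large `C`, the boundary is squeezed between (1, 1) and (1.5, 1.5), so the spacing is small. With the small `C`, the model accepts margin violations in exchange for a wider spacing.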


# Q6. SVM Implementation through Iris dataset.

- Load the iris dataset from the scikit-learn library and split it into a training set and a testing set
- Train a linear SVM classifier on the training set and predict the labels for the testing set
- Compute the accuracy of the model on the testing set
- Plot the decision boundaries of the trained model using two of the features
- Try different values of the regularisation parameter C and see how it affects the performance of the model.

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train a linear SVM classifier on the training set
svm_clf = SVC(kernel='linear', C=1)
svm_clf.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svm_clf.predict(X_test)

# Compute the accuracy of the model on the testing set
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of SVM classifier on Iris dataset: {:.2f}%".format(accuracy*100))

# Plot the decision boundaries of a model trained on two of the features
# We choose Sepal Length and Sepal Width as the two features to plot
X = iris.data[:, :2]  # Only the first two features
y = iris.target

# The classifier above was fitted on all four features, so it cannot predict
# on a two-column grid; fit a separate two-feature classifier for the plot
svm_plot = SVC(kernel='linear', C=1)
svm_plot.fit(X, y)

# Plot the decision boundaries
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = svm_plot.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title('Decision Boundary for Iris Dataset using SVM')
plt.show()
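
The last task, trying different values of C, is not covered by the cell above. One possible sketch follows, using the same split settings (`test_size=0.3`, `random_state=42`). The particular list of C values is an arbitrary assumption, and the `results` dictionary is just a convenient way to collect the numbers.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Arbitrary grid of C values spanning several orders of magnitude
results = {}
for C in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    test_accuracy = clf.score(X_test, y_test)        # mean accuracy on the testing set
    n_support = int(clf.n_support_.sum())            # total support vectors over all classes
    results[C] = (test_accuracy, n_support)
    print("C={:<6} accuracy={:.2f}%  support vectors={}".format(
        C, test_accuracy * 100, n_support))
```

A small C tolerates more margin violations, which typically shows up as a larger number of support vectors; a large C fits the training data more tightly. Comparing test accuracy across the grid gives a rough idea of which setting generalizes best, although cross-validation would be a more reliable way to choose C.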
