In [None]:
#Q1. What is the mathematical formula for a linear SVM?

In [None]:
'''
The decision function of a linear Support Vector Machine (SVM) is:
f(x) = w^T * x + b

The predicted class label is y = sign(f(x)).

Where:
y: The predicted class label (-1 or 1).
w: The weight vector, which is normal to the separating hyperplane and determines its orientation.
x: The input feature vector.
b: The bias term, which determines the offset of the hyperplane from the origin.

The goal of the linear SVM is to find the hyperplane w^T * x + b = 0 that maximizes the margin between the two classes.

For linearly separable data, this is achieved by solving the following constrained optimization problem:
minimize ||w||^2
subject to: y_i * (w^T * x_i + b) >= 1, for all i

Where:

||w||^2: The squared norm of the weight vector. The margin width is 2 / ||w||, so minimizing ||w||^2 maximizes the margin.
y_i: The true class label for the i-th training example.
x_i: The input feature vector for the i-th training example.
The constraint ensures that every training example is correctly classified and lies on or outside the marginal planes, i.e. has a functional margin of at least 1.'''
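
The decision rule above can be sketched in a few lines of plain Python. The values of w and b here are hypothetical, picked by hand for illustration rather than learned from data, and a score of exactly 0 is assigned to class +1 by convention.

```python
# Hypothetical hyperplane parameters, chosen by hand (not learned).
w = [2.0, -1.0]
b = -1.0

def decision_function(x, w, b):
    """Return the signed score w^T x + b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b

def predict(x, w, b):
    """Return +1 if the score is non-negative, otherwise -1."""
    return 1 if decision_function(x, w, b) >= 0 else -1

points = [[2.0, 1.0], [0.0, 2.0], [1.0, 1.0]]
scores = [decision_function(x, w, b) for x in points]
labels = [predict(x, w, b) for x in points]

print("scores:", scores)
print("labels:", labels)
```

The third point has a score of 0, so it lies exactly on the hyperplane.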

In [None]:
#Q2. What is the objective function of a linear SVM?

In [None]:
'''
The objective function of a linear SVM is:
minimize ||w||^2
subject to: y_i * (w^T * x_i + b) >= 1, for all i

Where:

||w||^2: The squared norm of the weight vector. The margin width is 2 / ||w||, so a smaller ||w||^2 means a larger margin.
y_i: The true class label for the i-th training example.
x_i: The input feature vector for the i-th training example.
b: The bias term.
The objective is to minimize the squared norm of the weight vector while ensuring that all training examples are correctly classified with
a margin of at least 1. This leads to finding the hyperplane that maximizes the margin between the two classes.'''
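
To make the link between the objective and the margin concrete, here is a small sketch that checks the constraints for two candidate hyperplanes on a hand-made toy data set and compares their margin widths. The data and both candidates are assumptions made up for illustration; no optimizer is run.

```python
import math

# Hand-made, linearly separable toy data (purely illustrative).
X = [[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 1.0]]
y = [1, 1, -1, -1]

def functional_margins(w, b):
    """Return y_i * (w^T x_i + b) for every training example."""
    return [
        y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        for x_i, y_i in zip(X, y)
    ]

def is_feasible(w, b):
    """True if every example satisfies y_i * (w^T x_i + b) >= 1."""
    return all(m >= 1 for m in functional_margins(w, b))

def margin_width(w):
    """Distance between the two marginal planes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(w_j * w_j for w_j in w))

# Two hypothetical candidates describing the same hyperplane at different scales.
candidates = {"A": ([1.0, 1.0], -4.0), "B": ([0.5, 0.5], -2.0)}

feasible = {}
widths = {}
for name, (w, b) in candidates.items():
    feasible[name] = is_feasible(w, b)
    widths[name] = margin_width(w)
    squared_norm = sum(w_j * w_j for w_j in w)
    print(f"{name}: feasible={feasible[name]}, ||w||^2={squared_norm}, "
          f"margin width={widths[name]:.3f}")
```

Both candidates satisfy the constraints, but B has the smaller ||w||^2 and therefore the wider margin, which is what the objective rewards.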

In [None]:
#Q3. What is the kernel trick in SVM?

In [None]:
'''
The kernel trick is a technique that lets a Support Vector Machine (SVM) work in a higher-dimensional feature space without ever 
computing the coordinates of the data in that space. The SVM only needs inner products between data points, so each inner product is 
replaced by a kernel function evaluated on the original inputs. Data that is not linearly separable in the original space can become linearly separable in the feature space.

How it works:

Kernel function: A kernel function is chosen to compute the inner product between two data points in the transformed feature space without explicitly calculating the transformed features.
Kernel matrix: A kernel matrix is constructed, where each element represents the inner product between two data points in the transformed feature space.
SVM optimization: The dual form of the SVM optimization problem depends on the data only through inner products, so it is solved using the kernel matrix instead of the original input features.

Commonly used kernel functions:

Linear kernel: K(x, y) = x^T * y
Polynomial kernel: K(x, y) = (x^T * y + c)^d
Radial basis function (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2)
Sigmoid kernel: K(x, y) = tanh(gamma * x^T * y + c)

Benefits of the kernel trick:

Non-linear separability: It allows SVMs to handle non-linearly separable data.
Computational efficiency: It avoids the explicit calculation of high-dimensional features, which can be computationally expensive.
Flexibility: A variety of kernel functions can be used to explore different feature spaces.'''
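
As a small illustration of the idea, the sketch below compares a polynomial kernel with c = 0 and d = 2 against the inner product of an explicit feature map for 2-D inputs. The function names and the two sample points are made up for this example.

```python
import math

def poly_kernel(x, y):
    """Polynomial kernel with c = 0 and d = 2: K(x, y) = (x^T y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def feature_map(x):
    """Explicit 3-D feature map for 2-D input that corresponds to poly_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x = [1.0, 2.0]
y = [3.0, 4.0]

# Kernel trick: one inner product in the original 2-D space, then a square.
k_implicit = poly_kernel(x, y)

# Explicit route: map both points to 3-D, then take the inner product there.
k_explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))

print(k_implicit, k_explicit)
```

Both routes give the same value up to floating-point rounding, but the kernel never builds the higher-dimensional vectors.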

In [None]:
#Q4. What is the role of support vectors in SVM? Explain with an example.

In [None]:
'''
Support Vectors in SVM

Support vectors are the training examples that lie on the marginal planes or, in a soft-margin SVM, inside the margin or on the wrong side of the hyperplane.
They are the only examples with non-zero weight in the solution, so they alone determine the orientation and position of the separating hyperplane.

Why are they important?

Define the margin: The distance between the separating hyperplane and the nearest support vectors on either side defines the margin. A larger margin generally indicates better generalization performance.
Determine the model: The SVM model is defined solely by the support vectors, their labels, and their learned coefficients. Removing a non-support vector from the training set and retraining leaves the hyperplane unchanged.
Computational efficiency: For a kernel SVM, the decision function is a sum over the support vectors only, so fewer support vectors mean faster prediction and a smaller stored model.

Key points about support vectors:

Margin maximization: SVMs aim to maximize the margin between the classes, which is determined by the support vectors.
Sparsity: SVMs often have a sparse representation, meaning that only a small subset of training examples (support vectors) are needed to define the model.
Sensitivity: An outlier or noisy point that ends up as a support vector can shift the hyperplane, which can hurt the model's performance.'''
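
A minimal example, assuming scikit-learn is available: fit a linear SVC on four hand-made points, inspect which ones are support vectors, then refit using only those points. The toy data is an assumption made up for illustration, and the very large C is used only to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hand-made, linearly separable toy data (purely illustrative).
X = np.array([[1.0, 1.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
support_vectors = clf.support_vectors_
print("Support vectors:\n", support_vectors)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# Keep only the support vectors and refit on that subset.
mask = np.zeros(len(X), dtype=bool)
mask[clf.support_] = True
clf_sv_only = SVC(kernel="linear", C=1e6).fit(X[mask], y[mask])
print("w (support vectors only) =", clf_sv_only.coef_[0],
      "b =", clf_sv_only.intercept_[0])
```

The points (1, 1) and (3, 3) are the closest pair across the two classes, so they sit on the marginal planes. Refitting without the remaining points should give essentially the same w and b.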

In [None]:
#Q5. Illustrate with examples and graphs: hyperplane, marginal plane, soft margin and hard margin in SVM.

In [None]:
'''
SVM: Hyperplane, Marginal Plane, Soft Margin, and Hard Margin

Hyperplane:

A hyperplane is a decision boundary that separates the data points into two classes.
In a linear SVM, the hyperplane is the set of points x that satisfy the linear equation w^T * x + b = 0.
It is oriented perpendicular to the weight vector w.

Marginal Plane:

The marginal planes are parallel to the hyperplane and define the boundaries of the margin: w^T * x + b = 1 and w^T * x + b = -1.
Each is located at a distance of 1/||w|| from the hyperplane, so the margin width is 2/||w||.
The goal of SVM is to maximize this distance between the marginal planes.

Hard Margin:

A hard margin SVM requires all training examples to be correctly classified with a functional margin of at least 1.
This means that no training example may lie between the marginal planes, so the data must be linearly separable.
Hard margin SVMs are sensitive to outliers and are not suitable for noisy or overlapping data.

Soft Margin:

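A soft margin SVM allows some training examples to violate the margin, or even to be misclassified, in order to achieve a larger margin.
This is achieved by introducing a slack variable ξ_i >= 0 for each training example and relaxing the constraint to y_i * (w^T * x_i + b) >= 1 - ξ_i.
The objective function is modified to include a penalty term for these violations: minimize ||w||^2 + C * sum_i ξ_i, where the regularization parameter C sets how strongly violations are penalized.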

Visualization

In a typical two-dimensional plot of a linear SVM:

The two classes are shown as points in two colors (for example, blue and red).
The solid line is the hyperplane.
The dashed lines are the marginal planes.
The points on the marginal planes are the support vectors.

Hard Margin:

All data points are correctly classified and lie on or outside the marginal planes.
The margin is maximized.

Soft Margin:

Some data points may lie inside the margin or on the wrong side of the hyperplane, and in return the margin is wider than a hard margin would allow.
The slack variables, weighted by C, control the trade-off between margin maximization and margin violations.

Key Points:

The hyperplane determines the decision boundary.
The marginal planes define the margin.
A hard margin SVM requires all data points to be correctly classified.
A soft margin SVM tolerates some margin violations and misclassifications to achieve a larger margin.
The choice between hard and soft margin depends on the data and the desired trade-off between accuracy and robustness.'''
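
One way to draw these elements, assuming scikit-learn and matplotlib are available, is sketched below. It fits a linear SVC twice on synthetic blobs: once with a small C (soft margin) and once with a very large C (approximating a hard margin). The data set, the two C values, and the helper name plot_margins are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, linearly separable 2-D data (purely illustrative).
X, y = make_blobs(n_samples=50, centers=2, cluster_std=0.60, random_state=0)

def plot_margins(ax, clf, X, y, title):
    """Draw the hyperplane (solid), marginal planes (dashed) and support vectors."""
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=25)
    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
        np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
    )
    scores = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    scores = scores.reshape(xx.shape)
    # Level 0 is the hyperplane; levels -1 and +1 are the marginal planes.
    ax.contour(xx, yy, scores, levels=[-1, 0, 1], colors="k",
               linestyles=["--", "-", "--"])
    sv = clf.support_vectors_
    ax.scatter(sv[:, 0], sv[:, 1], s=120, facecolors="none", edgecolors="g")
    ax.set_title(title)

widths = {}
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, C in zip(axes, [0.01, 1e6]):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    widths[C] = 2.0 / np.linalg.norm(clf.coef_)  # margin width = 2 / ||w||
    plot_margins(ax, clf, X, y, f"C={C:g}, margin width={widths[C]:.2f}")
plt.show()
```

Circled points are the support vectors. With the small C the margin is wider and more points fall inside it, while the large C gives a narrow margin with no training points between the dashed lines.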

In [None]:
'''Q6. SVM Implementation through Iris dataset.
~ Load the iris dataset from the scikit-learn library and split it into a training set and a testing set.
~ Train a linear SVM classifier on the training set and predict the labels for the testing set.
~ Compute the accuracy of the model on the testing set.
~ Plot the decision boundaries of the trained model using two of the features.
~ Try different values of the regularisation parameter C and see how it affects the performance of
  the model.'''

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear SVM classifier
svm = SVC(kernel='linear')

# Train the model on the training set (all four features)
svm.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svm.predict(X_test)

# Compute the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Visualize the decision boundaries (using the first two features).
# The 2-D mesh can only be scored by a model trained on the same two
# features, so fit a separate classifier on sepal length and sepal width.
svm_2d = SVC(kernel='linear')
svm_2d.fit(X_train[:, :2], y_train)

h = 0.02  # Step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral, edgecolors='k')
plt.title('SVM Decision Boundaries')
plt.show()

# Try different values of the regularization parameter C
C_values = [0.1, 1, 10]
for C in C_values:
    svm_c = SVC(kernel='linear', C=C)
    svm_c.fit(X_train, y_train)
    y_pred = svm_c.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"C={C}, Accuracy: {accuracy}")
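
A single 80/20 split of 150 samples leaves only 30 test points, so accuracy differences between C values can be mostly noise. As a hedged alternative, not part of the original exercise, the sketch below compares the same C values with 5-fold cross-validation. The number of folds is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

cv_means = {}
for C in [0.1, 1, 10]:
    # Each score is the accuracy on one held-out fold.
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    cv_means[C] = scores.mean()
    print(f"C={C}: mean CV accuracy = {cv_means[C]:.3f} (+/- {scores.std():.3f})")
```

The fold-to-fold standard deviation printed next to each mean gives a rough sense of whether a difference between two C values is meaningful.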