In [None]:
'''
Q1. The decision function of a linear SVM is f(x) = w dot x + b, where w is the weight vector
and b is the bias. The predicted class is sign(f(x)).

Q2. The objective of a (hard-margin) linear SVM is to maximize the margin 2/||w||, which is equivalent to:
minimize (1/2) ||w||^2 subject to y_i(w dot x_i + b) >= 1 for all i = 1, ..., n

Here, ||w|| is the Euclidean norm of the weight vector, y_i is the label of the ith training example,
and x_i is the corresponding feature vector.

Q3. The kernel trick in SVM is used to implicitly map the input data into a higher-dimensional 
feature space without actually computing the coordinates of the data in that space. 
This is done by replacing every dot product x dot x_i between the input vector and a training
vector in the dual form with a kernel function K(x, x_i) = phi(x) dot phi(x_i), which gives:
f(x) = sign(sigma_i alpha_i y_i K(x, x_i) + b)
Here, sigma_i denotes the sum over the training examples, and alpha_i are the dual coefficients
(Lagrange multipliers). They are non-zero only for the support vectors, which are the training
examples that lie on or inside the margin.

Q4. The role of support vectors in SVM is to define the decision boundary and determine the margin.
Only the support vectors have non-zero coefficients alpha_i, so they alone contribute to the
decision function of a binary classifier:

f(x) = sign(sigma_i alpha_i y_i K(x, x_i) + b)

Here, the support vectors are the data points with non-zero alpha_i values. In the hard-margin
case they lie exactly on the margin; in the soft-margin case they may also lie inside the margin
or on the wrong side of the hyperplane.

Q5. The following are the formulas for the hyperplane, the marginal planes, and the soft and hard margins:

Hyperplane: The hyperplane is the decision boundary of the SVM and can be expressed as w dot x + b = 0.
Marginal plane: The marginal planes w dot x + b = +1 and w dot x + b = -1 are parallel to the
hyperplane and lie at a distance of 1/||w|| from it, one on each side.
Soft margin: The soft margin allows for some margin violations by introducing a slack variable xi_i
for each training example, which measures how far that example falls short of the margin.
The objective function for the soft margin SVM can be written as:
minimize (1/2) ||w||^2 + C sigma_i xi_i subject to y_i(w dot x_i + b) >= 1 - xi_i
and xi_i >= 0 for all i = 1, ..., n

Here, C is a hyperparameter that controls the tradeoff between maximizing the margin and minimizing
the total slack: a larger C penalizes margin violations more heavily.

Hard margin: The hard margin SVM requires that all training examples are correctly classified and
lie on or outside the margin, so there are no slack variables. It is feasible only when the data
are linearly separable. The objective function for the hard margin SVM can be written as:
minimize (1/2) ||w||^2 subject to y_i(w dot x_i + b) >= 1 for all i = 1, ..., n
'''
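The formulas in Q3 to Q5 can be checked numerically. The following is a minimal sketch, not part of the original answers: the toy data from make_blobs, the value gamma = 0.5, and all variable names are assumptions chosen for illustration. It rebuilds f(x) = sigma_i alpha_i y_i K(x, x_i) + b by hand from a fitted scikit-learn SVC and compares it with decision_function, then reads the distance 1/||w|| off a linear-kernel model.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Toy two-class data (an assumption for illustration, not from the notebook)
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

# Kernel form: f(x) = sum_i alpha_i y_i K(x, x_i) + b
gamma = 0.5  # fixed by hand so the same kernel can be recomputed below
rbf_svm = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i, and only for the support vectors
K = rbf_kernel(X, rbf_svm.support_vectors_, gamma=gamma)
manual_scores = K @ rbf_svm.dual_coef_.ravel() + rbf_svm.intercept_[0]
print("matches decision_function:",
      np.allclose(manual_scores, rbf_svm.decision_function(X)))

# Linear form: hyperplane w.x + b = 0, marginal planes at distance 1/||w||
lin_svm = SVC(kernel="linear", C=1.0).fit(X, y)
w = lin_svm.coef_.ravel()
margin_distance = 1.0 / np.linalg.norm(w)
print("support vectors:", len(lin_svm.support_), "of", len(X))
print("distance from hyperplane to each marginal plane:", margin_distance)
```

Only the rows of support_vectors_ enter the sum, which is the sense in which the support vectors alone define the decision boundary.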

In [4]:
from warnings import filterwarnings
filterwarnings('ignore')

In [7]:
# Q6

In [6]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np

# Load the iris dataset
iris = load_iris()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train a linear SVM classifier on the training set
svm = LinearSVC()
svm.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svm.predict(X_test)

# Compute the accuracy of the model on the testing set
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Try different values of the regularisation parameter C and see how it affects the performance of the model
for C in [0.1, 1, 10]:
    svm = LinearSVC(C=C)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"C = {C}, Accuracy = {accuracy}")


Accuracy: 1.0
C = 0.1, Accuracy = 1.0
C = 1, Accuracy = 1.0
C = 10, Accuracy = 1.0
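
Every value of C scores 1.0 on the 30-sample test split, so this split cannot show how C affects the model. One possible way to get a finer comparison is cross-validation over all 150 samples. The sketch below is an assumption-laden illustration rather than part of the original exercise: the StandardScaler step, the 5-fold StratifiedKFold, the wider grid of C values, and the name cv_results are all choices made here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()

# 5 stratified folds, shuffled with a fixed seed for repeatability
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for C in [0.01, 0.1, 1, 10, 100]:
    # Scaling is refit inside each fold, so no test-fold statistics leak in
    model = make_pipeline(StandardScaler(), LinearSVC(C=C, max_iter=10000))
    scores = cross_val_score(model, iris.data, iris.target, cv=cv)
    cv_results[C] = scores.mean()
    print(f"C = {C}, mean CV accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Averaging over folds uses every sample for evaluation once, so small differences between C values are less likely to be hidden by one easy split.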
