# Q1. What is the mathematical formula for a linear SVM?

A linear support vector machine (SVM) is a type of binary classification algorithm that seeks to find a hyperplane that separates the input data into two classes. The mathematical formula for a linear SVM can be expressed as follows:

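y = sign(w^T x + b)

where y is the predicted class label (+1 or -1), x is the input vector, w is the weight vector (the normal vector of the separating hyperplane), and b is the bias term. The decision boundary itself is the hyperplane w^T x + b = 0.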
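As a quick sanity check of this formula, the sketch below fits scikit-learn's `SVC` with a linear kernel on a small synthetic dataset (an assumption made here; any binary data would do), reads w from `coef_` and b from `intercept_`, and evaluates sign(w^T x + b) by hand. The labels are mapped to -1/+1 to match the formula, so the hand-computed labels should agree with `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic binary dataset; relabel classes 0/1 as -1/+1 to match the formula
X, y01 = make_classification(n_samples=200, n_features=2, n_redundant=0,
                             n_clusters_per_class=1, random_state=0)
y = np.where(y01 == 1, 1, -1)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_.ravel()   # weight vector w, shape (2,)
b = clf.intercept_[0]   # bias term b

# y_hat = sign(w^T x + b), evaluated for every row of X at once
y_hat = np.sign(X @ w + b)

print("matches SVC.predict:", np.array_equal(y_hat, clf.predict(X)))
```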

# Q2. What is the objective function of a linear SVM?

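The objective of a linear SVM is to find the hyperplane with the largest margin between the two classes while keeping training errors small. Because the margin width equals 2/||w||, maximizing the margin is equivalent to minimizing (1/2)||w||^2. In the hard-margin case this is a constrained optimization problem: minimize (1/2)||w||^2 subject to y_i(w^T x_i + b) >= 1 for every training example, i.e. every point must be correctly classified and lie at least a margin away from the decision boundary (not within it). The soft-margin version relaxes these constraints with slack variables xi_i >= 0 and minimizes (1/2)||w||^2 + C * sum_i xi_i subject to y_i(w^T x_i + b) >= 1 - xi_i, where C controls the trade-off between a wide margin and few training errors.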
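The soft-margin objective can also be written without explicit slack variables: at the optimum each xi_i equals the hinge loss max(0, 1 - y_i(w^T x_i + b)). The sketch below evaluates that expression for a given w, b and C. The function name `soft_margin_objective` is hypothetical, and the three points are a hand-made toy case chosen so the result can be checked by hand.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective in hinge-loss form:
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w^T x_i + b)), labels in {-1, +1}."""
    margins = y * (X @ w + b)               # functional margin of each point
    hinge = np.maximum(0.0, 1.0 - margins)  # slack xi_i each point needs
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Hand-checkable toy case: w = (1, 0), b = 0
w = np.array([1.0, 0.0])
X = np.array([[2.0, 0.0],    # margin 2.0 -> outside the margin, no penalty
              [0.5, 0.0],    # margin 0.5 -> inside the margin, slack 0.5
              [-1.0, 0.0]])  # margin 1.0 -> exactly on the marginal plane, no penalty
y = np.array([1, 1, -1])

print(soft_margin_objective(w, 0.0, X, y, C=1.0))  # 0.5 + 1.0 * 0.5 = 1.0
```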

# Q3. What is the kernel trick in SVM?

The kernel trick lets an SVM classify data that is not linearly separable. Conceptually, each input x is mapped by a feature map phi(x) into a higher-dimensional feature space in which a linear separator may exist. The SVM's training problem (in its dual form) and its decision function depend on the data only through inner products, so phi never has to be computed explicitly: a kernel function K(x, z) = phi(x)^T phi(z) returns the feature-space inner product directly from the original inputs. This keeps the computation tied to the number of training examples rather than to the dimension of the feature space (which, for the RBF kernel, is infinite), and lets the SVM learn a non-linear decision boundary in the original input space.

A kernel can therefore be read as a similarity measure between pairs of inputs. The most commonly used kernels are the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel and the sigmoid kernel.
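A minimal illustration of the identity K(x, z) = phi(x)^T phi(z), using the degree-2 homogeneous polynomial kernel K(x, z) = (x^T z)^2 in two dimensions, whose explicit feature map is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The two input vectors are arbitrary values picked for the example; scikit-learn's `polynomial_kernel` is used only as a cross-check.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(v):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

explicit = phi(x) @ phi(z)   # inner product computed in the 3-D feature space
implicit = (x @ z) ** 2      # kernel trick: same value from the 2-D inputs only

# polynomial_kernel computes (gamma * <x, z> + coef0) ** degree
sk = polynomial_kernel(x.reshape(1, -1), z.reshape(1, -1),
                       degree=2, gamma=1.0, coef0=0.0)[0, 0]

print(explicit, implicit, sk)  # all three are 25 (up to floating-point rounding)
```

The implicit route needs one 2-D dot product, while the explicit route builds 3-D vectors first; for higher degrees or the RBF kernel the gap grows quickly, which is what makes the trick worthwhile.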

# Q4. What is the role of support vectors in SVM? Explain with an example.

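In SVM, support vectors are the training examples that lie on the marginal planes or, in a soft-margin SVM, inside the margin or on the wrong side of the decision boundary. They are the only examples with non-zero weight in the solution, so they alone determine the position and orientation of the hyperplane: removing any other training example and retraining would leave the hyperplane unchanged.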

Support vectors are important because they define the margin of the hyperplane, which is the distance between the hyperplane and the closest examples from either class. Maximizing the margin is a key goal of SVM, as it helps to improve the generalization performance of the model and reduce overfitting.

In [4]:
## Example
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

In [5]:
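# No random_state is set, so a different dataset (and slightly different scores) is generated on every run
X,y=make_classification(n_samples=1000,n_features=2,n_classes=2,
                        n_clusters_per_class=2,n_redundant=0)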

In [6]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=10)

In [7]:
from sklearn.model_selection import GridSearchCV
 
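# defining parameter range
# note: gamma is ignored when kernel='linear', so the 25 candidates below
# are really only 5 distinct models (one per value of C)
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel':['linear']
              }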
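The claim that `gamma` has no effect with a linear kernel is easy to check: the linear kernel is just the dot product x^T z, so `gamma` never enters the computation. The sketch below fits two linear `SVC` models that differ only in `gamma` on a throwaway synthetic dataset (an assumption made for the check) and compares their hyperplanes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=200, n_features=2, n_redundant=0,
                                     random_state=0)

# Same C, very different gamma values, linear kernel in both cases
svc_g1 = SVC(kernel="linear", C=1.0, gamma=1).fit(X_demo, y_demo)
svc_g0001 = SVC(kernel="linear", C=1.0, gamma=0.0001).fit(X_demo, y_demo)

# If gamma is unused, both fits should give the same w and b
print(np.allclose(svc_g1.coef_, svc_g0001.coef_),
      np.allclose(svc_g1.intercept_, svc_g0001.intercept_))
```

A leaner grid for a linear kernel would therefore be `{'C': [0.1, 1, 10, 100, 1000], 'kernel': ['linear']}`, which runs 25 fits instead of 125 with 5-fold CV.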

In [9]:
from sklearn.svm import SVC

In [10]:
grid=GridSearchCV(SVC(),param_grid=param_grid,refit=True,cv=5,verbose=3)

In [11]:
grid.fit(X_train,y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.853 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.853 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.887 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.867 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.867 total time=   0.0s
[CV 1/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.853 total time=   0.0s
[CV 2/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.853 total time=   0.0s
[CV 3/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.887 total time=   0.0s
[CV 4/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.867 total time=   0.0s
[CV 5/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.867 total time=   0.0s
[CV 1/5] END ..C=0.1, gamma=0.01, kernel=linear;, score=0.853 total time=   0.0s
[CV 2/5] END ..C=0.1, gamma=0.01, kernel=linear

In [12]:
grid.best_params_

{'C': 0.1, 'gamma': 1, 'kernel': 'linear'}

In [13]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

In [15]:
y_pred4=grid.predict(X_test)
print(classification_report(y_test,y_pred4))
print(confusion_matrix(y_test,y_pred4))
print(accuracy_score(y_test,y_pred4))

              precision    recall  f1-score   support

           0       0.88      0.87      0.87       131
           1       0.86      0.87      0.86       119

    accuracy                           0.87       250
   macro avg       0.87      0.87      0.87       250
weighted avg       0.87      0.87      0.87       250

[[114  17]
 [ 16 103]]
0.868
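The grid-search example above reports accuracy but never looks at the support vectors themselves. The sketch below uses a separate small, well-separated toy dataset from `make_blobs` (an assumption made so the result is easy to read), inspects `support_` and `n_support_`, and computes the functional margins y_i f(x_i). If the description above holds, support vectors should have margins of about 1, all other points margins of at least about 1, and refitting on the support vectors alone should leave the hyperplane essentially unchanged.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters; a large C makes this behave like a hard margin
X_sv, y_sv = make_blobs(n_samples=40, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X_sv, y_sv)

print("support vectors per class:", clf.n_support_)
print("indices of support vectors:", clf.support_)

# Functional margin y_i * f(x_i), with labels mapped to -1/+1
y_pm = np.where(y_sv == clf.classes_[1], 1, -1)
margins = y_pm * clf.decision_function(X_sv)
is_sv = np.zeros(len(X_sv), dtype=bool)
is_sv[clf.support_] = True
print("margins of support vectors:", np.round(margins[is_sv], 3))        # about 1
print("smallest margin of the others:", np.round(margins[~is_sv].min(), 3))  # at least about 1

# Refit using ONLY the support vectors: the hyperplane should barely move
clf_sv = SVC(kernel="linear", C=1000).fit(X_sv[is_sv], y_sv[is_sv])
print("full data:       w =", clf.coef_, "b =", clf.intercept_)
print("support vectors: w =", clf_sv.coef_, "b =", clf_sv.intercept_)
```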


# Q5. Illustrate the hyperplane, marginal planes, soft margin and hard margin in SVM with examples and graphs.

Hyperplane: In SVM, the hyperplane is the decision boundary w^T x + b = 0 that separates the data into classes. For example, in a binary classification problem with two input features, the hyperplane is a straight line in the two-dimensional input space; with three features it is a plane, and in general, with n features, it is an (n-1)-dimensional flat surface.

Marginal planes: The marginal planes are the two hyperplanes parallel to the decision boundary, w^T x + b = +1 and w^T x + b = -1. The margin is the distance from the decision boundary to the closest data points of each class, which is 1/||w|| on each side, so the two marginal planes are 2/||w|| apart. In a hard-margin SVM they pass through the support vectors of each class.

Hard margin: In a hard-margin SVM, the goal is to find a hyperplane that separates the two classes with no training errors and no points inside the margin. This works only if the data is linearly separable, i.e., there exists a hyperplane that completely separates the two classes.

Soft margin: In a soft-margin SVM, the goal is to find a hyperplane that maximizes the margin while tolerating some margin violations. This is useful when the data is not linearly separable, or when it contains noise or outliers that a hard margin would overfit. The soft-margin SVM introduces one slack variable per training point, which lets that point fall inside the margin or even be misclassified, and penalizes the total slack with the cost parameter C. The higher the value of C, the harder the model tries to avoid margin violations, which generally gives a narrower margin; a smaller C gives a wider margin and more tolerance for errors.
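The question asks for graphs, so here is a plotting sketch on synthetic `make_blobs` data (an assumed dataset, not from the original). It draws the hyperplane (solid line), the two marginal planes (dashed) and the support vectors (circled) for two values of C. C = 1000 serves as a stand-in for a hard margin; a true hard margin exists only when the data is linearly separable. The margin width 2/||w|| is shown in each title, and the smaller C should give a margin at least as wide.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two Gaussian clusters with some spread, so the choice of C visibly matters
X_m, y_m = make_blobs(n_samples=100, centers=2, cluster_std=1.2, random_state=0)

# Grid covering the data, used to draw the lines f(x) = -1, 0, +1
x_min, x_max = X_m[:, 0].min() - 1, X_m[:, 0].max() + 1
y_min, y_max = X_m[:, 1].min() - 1, X_m[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
grid_points = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))
widths = {}
# Large C approximates a hard margin; small C gives a soft margin
for ax, C in zip(axes, [1000, 0.01]):
    clf = SVC(kernel="linear", C=C).fit(X_m, y_m)
    widths[C] = 2 / np.linalg.norm(clf.coef_[0])  # distance between the marginal planes

    ax.scatter(X_m[:, 0], X_m[:, 1], c=y_m, cmap="coolwarm", s=25)
    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=120,
               facecolors="none", edgecolors="k", label="support vectors")

    zz = clf.decision_function(grid_points).reshape(xx.shape)
    # Solid line: hyperplane (f = 0); dashed lines: marginal planes (f = -1 and f = +1)
    ax.contour(xx, yy, zz, levels=[-1, 0, 1], colors="k",
               linestyles=["--", "-", "--"])
    ax.set_title(f"C = {C}: margin width = {widths[C]:.2f}")
    ax.legend(loc="best")

plt.tight_layout()
plt.show()
```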

# Q6. SVM Implementation through Iris dataset.

In [25]:
# Load the iris dataset from the scikit-learn library and split it into a training set and a testing set
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

In [26]:
iris = load_iris()

In [27]:
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
        ...

In [29]:
X = iris.data
y = iris.target

In [30]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Train a linear SVM classifier on the training set and predict the labels for the testing set

In [31]:
from sklearn.svm import SVC

In [32]:
svc=SVC(kernel='linear')

In [33]:
svc.fit(X_train,y_train)

In [34]:
svc.coef_  # 3 classes: SVC fits one-vs-one, one row per pair of classes (0 vs 1, 0 vs 2, 1 vs 2)

array([[-0.04631136,  0.52105578, -1.0030165 , -0.46411816],
       [-0.00641373,  0.17867392, -0.5389119 , -0.29158729],
       [ 0.54628096,  1.19553697, -1.92187359, -1.86235093]])
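Because iris has three classes, `SVC` trains one binary linear classifier per pair of classes (one-vs-one), which is why `coef_` above has three rows rather than one. The sketch below, fitted on the full iris data purely for illustration, sets `decision_function_shape='ovo'` to expose one score per pair; according to the scikit-learn documentation, the pairs are ordered 0 vs 1, 0 vs 2, 1 vs 2.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X_i, y_i = load_iris(return_X_y=True)

# decision_function_shape='ovo' returns the raw pairwise (one-vs-one) scores
ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X_i, y_i)

print(ovo.coef_.shape)                       # (3, 4): one weight vector per pair of classes
print(ovo.decision_function(X_i[:5]).shape)  # (5, 3): one score per sample and pair
```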

In [35]:
y_pred=svc.predict(X_test)

In [36]:
y_pred

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0, 1, 2, 2, 1, 2])

# Compute the accuracy of the model on the testing set

In [37]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

In [38]:
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        15
           2       1.00      1.00      1.00        16

    accuracy                           1.00        50
   macro avg       1.00      1.00      1.00        50
weighted avg       1.00      1.00      1.00        50

[[19  0  0]
 [ 0 15  0]
 [ 0  0 16]]
1.0
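The reported scores can be recovered from the confusion matrix alone, since scikit-learn's `confusion_matrix` puts true classes in rows and predicted classes in columns. The hypothetical helper `scores_from_confusion` below applies this to the iris matrix above and to the matrix from the Q4 example, where the results are less trivial.

```python
import numpy as np

def scores_from_confusion(cm):
    """Accuracy and per-class precision/recall from a confusion matrix
    laid out as scikit-learn returns it (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    accuracy = correct.sum() / cm.sum()
    precision = correct / cm.sum(axis=0)   # column sums = predicted counts
    recall = correct / cm.sum(axis=1)      # row sums = true counts (support)
    return accuracy, precision, recall

for name, cm in [("iris (Q6)", [[19, 0, 0], [0, 15, 0], [0, 0, 16]]),
                 ("make_classification (Q4)", [[114, 17], [16, 103]])]:
    acc, prec, rec = scores_from_confusion(cm)
    print(name, round(float(acc), 3), np.round(prec, 2), np.round(rec, 2))
# iris (Q6) 1.0 [1. 1. 1.] [1. 1. 1.]
# make_classification (Q4) 0.868 [0.88 0.86] [0.87 0.87]
```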


# Try different values of the regularisation parameter C and see how it affects the performance of the model.

In [53]:
from sklearn.model_selection import GridSearchCV
 
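# defining parameter range
# note: gamma is ignored when kernel='linear', so the 25 candidates below
# are really only 5 distinct models (one per value of C)
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel':['linear']
              }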

In [54]:
grid=GridSearchCV(SVC(),param_grid=param_grid,refit=True,cv=5,verbose=3)

In [55]:
grid.fit(X_train,y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.900 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.900 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.900 total time=   0.0s
[CV 1/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.900 total time=   0.0s
[CV 3/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.900 total time=   0.0s
[CV 4/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.900 total time=   0.0s
[CV 1/5] END ..C=0.1, gamma=0.01, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END ..C=0.1, gamma=0.01, kernel=linear

In [56]:
grid.best_params_

{'C': 1, 'gamma': 1, 'kernel': 'linear'}

In [57]:
y_pred1=grid.predict(X_test)
print(classification_report(y_test,y_pred1))
print(confusion_matrix(y_test,y_pred1))
print(accuracy_score(y_test,y_pred1))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        15
           2       1.00      1.00      1.00        16

    accuracy                           1.00        50
   macro avg       1.00      1.00      1.00        50
weighted avg       1.00      1.00      1.00        50

[[19  0  0]
 [ 0 15  0]
 [ 0  0 16]]
1.0
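Because `gamma` has no effect with `kernel='linear'`, the grid above effectively compares only five models. A leaner sketch of the same experiment loops over C directly with `cross_val_score` on the same split, and also reports how many support vectors each fitted model uses. With a smaller C the margin is wider, so typically more points fall on or inside it and become support vectors; the exact numbers depend on the split and are not reproduced here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

results = {}
for C in [0.1, 1, 10, 100, 1000]:
    model = SVC(kernel="linear", C=C)
    # 5-fold CV accuracy on the training set (cross_val_score fits clones of model)
    cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
    # Fit once on the full training set to count the support vectors
    n_sv = int(model.fit(X_train, y_train).n_support_.sum())
    results[C] = (cv_acc, n_sv)
    print(f"C={C:<6} mean CV accuracy={cv_acc:.3f}  support vectors={n_sv}")
```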
