Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?
In machine learning, kernel functions allow algorithms to operate in a high-dimensional feature space without explicitly computing the coordinates of the data in that space. The polynomial kernel is one such kernel function: it implicitly maps the original features into a higher-dimensional space whose coordinates are polynomial combinations (powers and products) of those features.

The polynomial kernel function is defined as:

$$K(x_i, x_j) = (x_i^T x_j + c)^d$$

where:

$x_i$ and $x_j$ are input feature vectors.
$c$ is a constant term that controls the influence of higher-order versus lower-order terms.
$d$ is the degree of the polynomial.
This kernel function allows the SVM to create non-linear decision boundaries by transforming the data into a higher-dimensional space where a linear separator might be more effective.
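
To make the "implicit mapping" idea concrete, here is a minimal sketch for the degree-2 case with 2-D inputs. The helper names `poly_kernel` and `explicit_feature_map` are hypothetical and only serve this illustration. The sketch checks numerically that the kernel value equals an ordinary dot product taken after an explicit mapping into a 6-dimensional space.

```python
import numpy as np


def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (np.dot(x, y) + c) ** d


def explicit_feature_map(x, c=1.0):
    """Explicit feature map matching the degree-2 kernel for a 2-D input.

    Hypothetical helper for illustration only; an SVM never builds this
    vector, it only evaluates the kernel.
    """
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2.0) * x1 * x2,
        np.sqrt(2.0 * c) * x1,
        np.sqrt(2.0 * c) * x2,
        c,
    ])


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Kernel evaluated directly in the original 2-D space
kernel_value = poly_kernel(x, y, c=1.0, d=2)

# Same quantity computed as a dot product in the explicit 6-D space
explicit_value = np.dot(explicit_feature_map(x), explicit_feature_map(y))

print(kernel_value, explicit_value)
```

The two printed numbers should agree up to floating-point rounding. For larger degrees and more features the explicit space grows quickly, which is why evaluating the kernel directly is attractive.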

In [1]:
#Q2

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Use only the first two features (sepal length and sepal width)
y = iris.target

# We will only use the first two classes for binary classification
X = X[y != 2]
y = y[y != 2]

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create an SVM with a polynomial kernel
clf = SVC(kernel='poly', degree=3, C=1.0)
clf.fit(X_train, y_train)

# Predict the labels of the testing set
y_pred = clf.predict(X_test)

# Evaluate the performance of the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")




Accuracy: 1.0


Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?
In Support Vector Regression (SVR), the epsilon parameter ($\epsilon$) defines a margin of tolerance around the prediction (the $\epsilon$-insensitive tube) within which errors incur no penalty. When $\epsilon$ increases, the tube becomes wider, so more data points fall inside it without contributing to the loss function. As a result, the number of support vectors typically decreases, because only points on or outside the tube become support vectors and influence the model.
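
The effect can be checked empirically. The following is a small sketch on synthetic data (a noisy sine curve, which is an assumption made for this illustration and not part of the original exercise). It fits an RBF-kernel `SVR` for several values of `epsilon` and counts the support vectors each time.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem (assumed for illustration): noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(100, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

epsilons = [0.01, 0.1, 0.5, 1.0]
n_support = []

for eps in epsilons:
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    n_support.append(len(model.support_))
    print(f"epsilon={eps}: {n_support[-1]} support vectors")
```

On data like this, the count for the widest tube should be far below the count for the narrowest one, although the exact numbers depend on the noise level and on `C`.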



Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?
Kernel function: The choice of kernel (e.g., linear, polynomial, RBF) determines the way the input space is transformed.

Use a linear kernel when the target depends approximately linearly on the features.
Use a polynomial kernel for data with polynomial relationships.
Use an RBF kernel for complex, non-linear patterns whose functional form is not known in advance.
C parameter: The regularization parameter $C$ controls the trade-off between achieving a low error on the training data and minimizing the model complexity.

A high C value penalizes points outside the epsilon margin heavily, so the model follows the training data closely, which can lead to overfitting.
A low C value tolerates larger errors in exchange for a simpler, flatter function, which can lead to underfitting.
Epsilon parameter: The epsilon parameter ($\epsilon$) in SVR defines the margin within which no penalty is given to errors.

A large epsilon creates a wider margin, which can reduce the number of support vectors and might underfit the data.
A small epsilon results in a narrower margin, potentially increasing the number of support vectors and the risk of overfitting.
Gamma parameter: In the RBF kernel, the gamma parameter defines how far the influence of a single training example reaches. In scikit-learn's polynomial and sigmoid kernels, gamma instead scales the inner product of the two inputs.

A low gamma means a larger influence, which can lead to underfitting.
A high gamma means a smaller influence, which can lead to overfitting.
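
The trends described above can be explored with a short experiment. This is a sketch under stated assumptions: the data is a synthetic noisy sine curve, the helper `fit_and_score` is hypothetical, and the specific parameter values are chosen only to make the extremes visible. It reports the R² score on the training and test splits while sweeping `gamma` and then `C`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic 1-D regression problem (assumed for illustration)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def fit_and_score(**params):
    """Fit an RBF-kernel SVR and return (train R^2, test R^2)."""
    model = SVR(kernel='rbf', **params).fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


# Sweep gamma with C and epsilon fixed
gamma_scores = {}
for gamma in [0.001, 1.0, 10000.0]:
    gamma_scores[gamma] = fit_and_score(C=10.0, epsilon=0.1, gamma=gamma)
    train_r2, test_r2 = gamma_scores[gamma]
    print(f"gamma={gamma}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")

# Sweep C with gamma and epsilon fixed
c_scores = {}
for C in [0.001, 1.0, 100.0]:
    c_scores[C] = fit_and_score(C=C, epsilon=0.1, gamma=1.0)
    train_r2, test_r2 = c_scores[C]
    print(f"C={C}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

A very low `gamma` or a very low `C` should score poorly on both splits (underfitting), while a very high `gamma` should show a large gap between training and test scores (overfitting). In practice these parameters interact, so they are usually tuned together with cross-validation instead of one at a time.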

In [2]:
##Q5

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV
import joblib

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


In [3]:
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


In [9]:

# Create an SVC classifier with a linear kernel
svc = SVC(kernel='linear', C=1.0)

# Train the classifier on the training data
svc.fit(X_train, y_train)

# Predict the labels of the testing data
y_pred = svc.predict(X_test)

# Evaluate the performance of the classifier
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-Score: {f1}")

# Define the parameter grid for hyperparameter tuning
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'gamma': ['scale', 'auto']
}

# Create a GridSearchCV instance
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit the GridSearchCV to the training data
grid_search.fit(X_train, y_train)

# Print the best parameters
print(f"Best parameters: {grid_search.best_params_}")

# Refit the tuned classifier on the full training set (redundant but harmless:
# GridSearchCV with the default refit=True has already done this)
best_svc = grid_search.best_estimator_
best_svc.fit(X_train, y_train)

# Save the trained classifier to a file
joblib.dump(best_svc, 'best_svc_model.joblib')

# To load the model in the future, use:
# best_svc = joblib.load('best_svc_model.joblib')



Accuracy: 0.9777777777777777
Precision: 0.9793650793650793
Recall: 0.9777777777777777
F1-Score: 0.9777448559670783
Best parameters: {'C': 10, 'gamma': 'scale', 'kernel': 'linear'}


['best_svc_model.joblib']
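
One caveat with the saved file: it contains only the classifier, so anyone loading it must also reproduce the `StandardScaler` fitted on the training data. A possible alternative, sketched below, is to bundle the scaler and the classifier in a single scikit-learn pipeline and save that. The hyperparameters are copied from the grid search output above, and the file name `svc_pipeline.joblib` is hypothetical; a temporary directory is used so the sketch leaves no files behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scaler and classifier in one estimator; hyperparameters assumed from the
# grid search result reported above
pipeline = make_pipeline(
    StandardScaler(),
    SVC(kernel='linear', C=10, gamma='scale'),
)
pipeline.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'svc_pipeline.joblib')  # hypothetical name
    joblib.dump(pipeline, model_path)
    loaded = joblib.load(model_path)

# The loaded pipeline accepts raw (unscaled) features directly
original_pred = pipeline.predict(X_test)
loaded_pred = loaded.predict(X_test)
print(np.array_equal(original_pred, loaded_pred))
```

With this layout the scaling step travels with the model, which removes one common source of mismatch between training and later use.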