Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?   

Q5. Assignment:

Import the necessary libraries and load the dataset

Split the dataset into training and testing sets

Preprocess the data using any technique of your choice (e.g., scaling, normalization)

Create an instance of the SVC classifier and train it on the training data

Use the trained classifier to predict the labels of the testing data

Evaluate the performance of the classifier using any metric of your choice (e.g., accuracy, precision, recall, F1-score)

Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance

Train the tuned classifier on the entire dataset


# Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

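A polynomial kernel is a kernel function built from a polynomial of the dot product: K(x, y) = (γ x·y + r)^d, where d is the degree, γ scales the dot product, and r is a constant term (`degree`, `gamma`, and `coef0` in Scikit-learn). Evaluating it is equivalent to taking the inner product of x and y after mapping both into a feature space made of monomials of the original features (all monomials up to degree d when r > 0), without ever constructing that space explicitly. This is the kernel trick. An SVM with a polynomial kernel therefore learns a decision boundary that is linear in the expanded feature space and polynomial in the original one, which lets it capture feature interactions that a linear model would miss.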
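The equivalence between a polynomial kernel and an explicit polynomial feature map can be checked numerically. The sketch below assumes 2-D inputs, degree 2, γ = 1, and r = 1; the helper `explicit_degree2_map` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def explicit_degree2_map(X):
    """Hypothetical helper: explicit feature map for (x.y + 1)^2 on 2-D inputs."""
    x1, x2 = X[:, 0], X[:, 1]
    s = np.sqrt(2.0)
    # Features: 1, sqrt(2)*x1, sqrt(2)*x2, x1^2, x2^2, sqrt(2)*x1*x2
    return np.column_stack([np.ones(len(X)), s * x1, s * x2, x1**2, x2**2, s * x1 * x2])


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

# Kernel evaluated directly in the original 2-D space
K_kernel = polynomial_kernel(X, X, degree=2, gamma=1.0, coef0=1.0)

# Same quantity computed as plain dot products in the 6-D expanded space
Phi = explicit_degree2_map(X)
K_explicit = Phi @ Phi.T

print("Kernel matches explicit feature map:", np.allclose(K_kernel, K_explicit))
```

The two Gram matrices agree up to floating-point error, which is why the SVM never needs to build the expanded features itself.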

# Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

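# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features, fitting the scaler on the training set only
# so that test-set statistics do not leak into preprocessing
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)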

# Create and train the SVM model with a polynomial kernel
svm_poly = SVC(kernel='poly', degree=3, C=1.0, coef0=1)
svm_poly.fit(X_train, y_train)

# Predict on the test set
y_pred = svm_poly.predict(X_test)

# Evaluate the model
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



# Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), the ϵ parameter sets the half-width of a "tube" around the regression function. Training points whose residual is smaller than ϵ incur no loss; only points on or outside the tube are penalized.

Effect of Increasing ϵ:

As ϵ increases, the tube widens and more training points fall strictly inside it. Those points do not become support vectors, because only points on the boundary of the tube or outside it do. The number of support vectors therefore generally decreases, giving a sparser and smoother model. If ϵ is too large, the model ignores real structure in the data and underfits.
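The effect of ϵ on the support vector count can be observed directly. The following is a minimal sketch on synthetic data (a noisy sine curve, chosen here purely as an assumption for illustration); the specific ϵ values and `C=1.0` are also arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem (an assumption made for illustration)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

support_counts = {}
for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    support_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {support_counts[eps]} support vectors out of {len(X)}")
```

With a tube much narrower than the noise level, almost every point lies outside it and becomes a support vector; with a wide tube, only a few do.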

# Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?  

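Kernel: Determines the family of functions the model can fit, so choose it based on the nature of your data. A linear kernel suits roughly linear relationships or very high-dimensional data; the RBF kernel is a common default for smooth non-linear relationships; a polynomial kernel fits when interactions between features up to a known degree matter.

C: Controls the balance between fitting the training data and generalizing to new data. A larger C penalizes points outside the ϵ-tube more heavily, so the model follows the training data more closely and may overfit; a smaller C gives a flatter, more regularized function. Increase C when the model underfits, and decrease it when the data is noisy or contains outliers.

Epsilon: Sets the width of the tube within which errors are ignored. Increase it when the targets are noisy or when a sparser model with fewer support vectors is wanted; decrease it when small errors matter.

Gamma: Used by the RBF, polynomial, and sigmoid kernels. For the RBF kernel it sets how far the influence of a single training point reaches, and therefore how smooth the fitted regression function is (SVR has no decision boundary). A large gamma produces a wiggly function that can overfit; a small gamma produces a smoother function that can underfit.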

Each parameter needs to be tuned based on the specific problem, often using cross-validation to find the best combination.
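One way to carry out this tuning is a cross-validated grid search over all four parameters. The sketch below is an assumption-laden illustration, not a recommended grid: the synthetic data, the candidate values, and the choice of mean squared error as the score are all made up for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data, assumed for illustration only
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=150)

# Scaling lives inside the pipeline so it is refit on each training fold
pipeline = Pipeline([("scaler", StandardScaler()), ("svr", SVR())])

param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": ["scale", 0.1, 1],  # ignored by the linear kernel
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Putting the scaler in the pipeline keeps each validation fold unseen during preprocessing, so the cross-validated score is not inflated by leakage.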

# Q5. Assignment:

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load the dataset (Breast Cancer dataset)
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Preprocess the data (Scaling)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 4: Create an instance of the SVC classifier and train it on the training data
svc = SVC(kernel='rbf', random_state=42)
svc.fit(X_train, y_train)

# Step 5: Use the trained classifier to predict the labels of the testing data
y_pred = svc.predict(X_test)

# Step 6: Evaluate the performance of the classifier
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Classification Report:\n", report)

# Step 7: Tune the hyperparameters of the SVC classifier using GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.01, 0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly']
}

grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print("Best Parameters:", best_params)

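# Evaluate the tuned model on the held-out test set
# (GridSearchCV refits the best estimator on the training data by default)
y_pred_tuned = grid_search.predict(X_test)
tuned_accuracy = accuracy_score(y_test, y_pred_tuned)
tuned_report = classification_report(y_test, y_pred_tuned)

print(f"Tuned Accuracy: {tuned_accuracy}")
print("Tuned Classification Report:\n", tuned_report)

# Step 8: Train the tuned classifier on the entire dataset
# The scaler is refit on all samples; no held-out data remains after this step,
# so the test-set scores above are the performance estimate to report
svc_tuned = SVC(**best_params)
svc_tuned.fit(scaler.fit_transform(X), y)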


Accuracy: 0.9766081871345029
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.97      0.97        63
           1       0.98      0.98      0.98       108

    accuracy                           0.98       171
   macro avg       0.97      0.97      0.97       171
weighted avg       0.98      0.98      0.98       171

