# ### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

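# In machine learning, a kernel function computes the inner product of two inputs in a (usually higher-dimensional) feature space without explicitly computing their coordinates in that space. This technique is known as the "kernel trick." A polynomial kernel is the kernel whose implicit feature space consists of monomials of the input features (all monomials of degree at most \( d \) when \( c > 0 \)), so a model that is linear in that space is a polynomial function of the original inputs. This lets Support Vector Machines (SVMs) learn curved, polynomial decision boundaries instead of only straight lines or flat hyperplanes.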

# A polynomial kernel of degree \( d \) is defined as:
# \[ K(x, x') = (x \cdot x' + c)^d \]

# where:
# - \( x \) and \( x' \) are input vectors.
# - \( c \ge 0 \) is a constant that trades off the influence of higher-order versus lower-order terms (\( c = 0 \) gives a homogeneous kernel).
# - \( d \) is the degree of the polynomial.
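
# The kernel trick can be checked numerically for a small case. The sketch below is a minimal illustration, assuming 2-D inputs, \( d = 2 \), and \( c = 1 \); the helper names `poly_kernel` and `explicit_features` are hypothetical and not part of any library. It compares the kernel value with the dot product of explicitly expanded feature vectors.

```python
import math

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** d

def explicit_features(x, c=1.0):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    return [
        x1 * x1,
        x2 * x2,
        math.sqrt(2) * x1 * x2,
        math.sqrt(2 * c) * x1,
        math.sqrt(2 * c) * x2,
        c,
    ]

x, y = (1.0, 2.0), (3.0, -1.0)

# Kernel evaluated directly in the 2-D input space
k_implicit = poly_kernel(x, y)

# Same quantity via an explicit 6-D feature expansion
k_explicit = sum(a * b for a, b in zip(explicit_features(x), explicit_features(y)))

print(k_implicit, k_explicit)
```

# Both values agree up to floating-point rounding, although the kernel never builds the six-dimensional feature vector.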

# ### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

# You can implement an SVM with a polynomial kernel in Python using the Scikit-learn library by specifying the `kernel='poly'` parameter in the `SVC` class. Here is an example:

# ```python
# from sklearn import datasets
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import StandardScaler
# from sklearn.svm import SVC
# from sklearn.metrics import accuracy_score

# # Load the Iris dataset
# iris = datasets.load_iris()
# X = iris.data
# y = iris.target

# # Split the dataset into training and testing sets
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# # Standardize the features
# scaler = StandardScaler()
# X_train = scaler.fit_transform(X_train)
# X_test = scaler.transform(X_test)

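# # Create an SVM classifier with a degree-3 polynomial kernel.
# # scikit-learn computes (gamma * <x, x'> + coef0)^degree; coef0 plays the role of c and defaults to 0.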
# svc = SVC(kernel='poly', degree=3, C=1, random_state=42)
# svc.fit(X_train, y_train)

# # Predict the labels for the testing set
# y_pred = svc.predict(X_test)

# # Compute the accuracy of the model
# accuracy = accuracy_score(y_test, y_pred)
# print(f'Accuracy: {accuracy:.2f}')
# ```

# ### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

# In Support Vector Regression (SVR), epsilon (\(\epsilon\)) sets the half-width of a tube around the fitted function inside which errors are not penalized. Only training points that lie on or outside this tube become support vectors. Increasing \(\epsilon\) widens the tube, so more points fall strictly inside it and stop influencing the model. As a result, increasing \(\epsilon\) typically reduces the number of support vectors and yields a sparser, flatter model, at the cost of ignoring errors smaller than \(\epsilon\).
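
# The effect can be observed empirically. The following sketch assumes a synthetic noisy sine dataset and an RBF kernel with `C=1.0`, none of which come from the question itself; it fits scikit-learn's `SVR` for several \(\epsilon\) values and counts the support vectors through the `support_` attribute.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (an assumption, for illustration only)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# Fit one model per epsilon and record how many support vectors it keeps
sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    sv_counts[eps] = len(model.support_)
    print(f'epsilon={eps}: {sv_counts[eps]} support vectors')
```

# On data like this the count should shrink as \(\epsilon\) grows, though the exact numbers depend on the data, the kernel, and `C`.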

# ### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)?

# - **Kernel Function**: The kernel function determines the family of functions the regressor can fit. Common choices include linear, polynomial, and RBF (radial basis function) kernels. A linear kernel can only fit linear trends, while polynomial and RBF kernels can capture nonlinear relationships in the data.

# - **C Parameter**: The regularization parameter \( C \) controls the trade-off between a low error on the training data and low model complexity. A high \( C \) penalizes deviations beyond \(\epsilon\) heavily, which lowers training error but can lead to overfitting; a low \( C \) tolerates larger deviations, giving a flatter function that can underfit.

# - **Epsilon Parameter**: The \(\epsilon\) parameter defines the margin of tolerance where no penalty is given to errors. A larger \(\epsilon\) creates a wider margin, reducing the number of support vectors and making the model less sensitive to small variations in the data.

# - **Gamma Parameter**: The gamma parameter (used in the RBF, polynomial, and sigmoid kernels) scales the kernel. For the RBF kernel it sets how far the influence of a single training example reaches: a low gamma means a large radius of influence and a smoother fitted function, while a high gamma means a small radius of influence and a more wiggly function that can overfit the training data.
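
# The gamma effect described in the list above can be illustrated with a small experiment. This is a sketch under assumptions: the noisy sine data, the fixed `C=10` and `epsilon=0.1`, and the three gamma values are chosen only for demonstration. It records the training and test \( R^2 \) of an RBF `SVR` for each gamma.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic noisy sine data (an assumption, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit an RBF SVR for a low, a moderate, and a high gamma
scores = {}
for gamma in (0.01, 1, 100):
    model = SVR(kernel='rbf', C=10, epsilon=0.1, gamma=gamma).fit(X_train, y_train)
    # score() returns the coefficient of determination R^2
    scores[gamma] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f'gamma={gamma}: train R^2={scores[gamma][0]:.2f}, test R^2={scores[gamma][1]:.2f}')
```

# A very high gamma typically gives the best training score because the model can follow the noise; comparing it with the test score shows whether that extra flexibility generalizes.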

# ### Q5. Assignment Implementation

# Here is a full implementation of an SVM classifier using Scikit-learn, with hyperparameter tuning using GridSearchCV:

# ```python
# from sklearn import datasets
# from sklearn.model_selection import train_test_split, GridSearchCV
# from sklearn.preprocessing import StandardScaler
# from sklearn.svm import SVC
# from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
# import joblib

# # Load the dataset
# iris = datasets.load_iris()
# X = iris.data
# y = iris.target

# # Split the dataset into training and testing sets
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# # Standardize the features
# scaler = StandardScaler()
# X_train = scaler.fit_transform(X_train)
# X_test = scaler.transform(X_test)

# # Create an instance of the SVC classifier
# svc = SVC(kernel='rbf', random_state=42)

# # Define the parameter grid for GridSearchCV
# param_grid = {
#     'C': [0.1, 1, 10, 100],
#     'gamma': [1, 0.1, 0.01, 0.001]
# }

# # Perform GridSearchCV to tune hyperparameters
# grid_search = GridSearchCV(svc, param_grid, cv=5, scoring='accuracy')
# grid_search.fit(X_train, y_train)

# # Print the best parameters and the mean cross-validated accuracy they achieved
# print(f'Best parameters: {grid_search.best_params_}')
# print(f'Best cross-validation accuracy: {grid_search.best_score_:.3f}')

# # GridSearchCV (refit=True by default) has already refit the best model on the full training set
# best_svc = grid_search.best_estimator_

# # Save the trained classifier to a file
# joblib.dump(best_svc, 'svm_classifier.pkl')

# # Predict the labels for the testing set
# y_pred = best_svc.predict(X_test)

# # Evaluate the performance of the classifier
# accuracy = accuracy_score(y_test, y_pred)
# precision = precision_score(y_test, y_pred, average='weighted')
# recall = recall_score(y_test, y_pred, average='weighted')
# f1 = f1_score(y_test, y_pred, average='weighted')

# print(f'Accuracy: {accuracy:.2f}')
# print(f'Precision: {precision:.2f}')
# print(f'Recall: {recall:.2f}')
# print(f'F1 Score: {f1:.2f}')
# print('\nClassification Report:\n', classification_report(y_test, y_pred))
# ```
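
# One possible refinement of the implementation above, offered as a sketch rather than as part of the assignment: wrapping the scaler and the classifier in a `Pipeline` means each cross-validation fold is scaled using only its own training portion, and the saved file contains the preprocessing together with the model. The file name `svm_pipeline.pkl` and the temporary directory are placeholders.

```python
import os
import tempfile

import joblib
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling lives inside the pipeline, so it is re-fit within every CV fold
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf'))])
param_grid = {'svc__C': [0.1, 1, 10, 100], 'svc__gamma': [1, 0.1, 0.01, 0.001]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)

# Persist the whole pipeline (scaler + classifier) to a placeholder path
model_path = os.path.join(tempfile.mkdtemp(), 'svm_pipeline.pkl')
joblib.dump(search.best_estimator_, model_path)

# Reload and predict directly on raw, unscaled test features
loaded = joblib.load(model_path)
y_pred_loaded = loaded.predict(X_test)
print(f'Test accuracy after reload: {(y_pred_loaded == y_test).mean():.2f}')
```

# Because the reloaded object applies the same scaling it was trained with, new samples can be passed to `predict` without a separately stored `StandardScaler`.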

