# Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Polynomial kernel functions compute the dot product between feature vectors in a higher-dimensional space whose coordinates are polynomial functions (monomials) of the original features. In scikit-learn the polynomial kernel is $K(\mathbf{x}, \mathbf{z}) = (\gamma\,\mathbf{x}^\top\mathbf{z} + r)^d$, where $d$ is `degree`, $\gamma$ is `gamma`, and $r$ is `coef0`. This lets SVMs and other kernel-based algorithms capture non-linear relationships between features without explicitly computing the transformed feature vectors (the "kernel trick"). In essence, the polynomial kernel implicitly maps the data into a space of feature products (all degrees up to $d$ when $r > 0$), where data that is not linearly separable in the original space may become linearly separable.
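To make the kernel trick concrete, here is a small sketch for two 2-D vectors. It assumes the degree-2 kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top\mathbf{z} + 1)^2$ (that is, $\gamma = 1$, $r = 1$, $d = 2$) and a feature map `phi` written by hand for this illustration; scikit-learn's `polynomial_kernel` is used only as a cross-check.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(v):
    """Hand-written degree-2 feature map for a 2-D vector, matching K(x, z) = (x.z + 1)^2."""
    x1, x2 = v
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1**2, x2**2, r2 * x1 * x2])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)          # dot product in the 6-D transformed space
kernel_trick = (x @ z + 1.0) ** 2   # same quantity computed in the original 2-D space
sk = polynomial_kernel(x.reshape(1, -1), z.reshape(1, -1),
                       degree=2, gamma=1.0, coef0=1.0)[0, 0]

# All three values should equal 4.0 up to floating-point rounding
print(explicit, kernel_trick, sk)
```

The explicit map needs 6 coordinates for only 2 input features, and the count grows quickly with the degree and the number of features; the kernel gets the same number from a single 2-D dot product, which is why the SVM never has to build the transformed vectors.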

# Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [3]:
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn import datasets
from sklearn.model_selection import train_test_split

In [6]:
x, y = datasets.load_iris(return_X_y=True)

In [10]:
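# train_size=0.25: only 25% of the samples (37) are used for training; the other 113 form the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.25, random_state=42)
svc = SVC(kernel="poly")  # polynomial kernel with the default degree=3
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)
print(accuracy_score(y_test, y_pred))  # signature is accuracy_score(y_true, y_pred)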

0.9823008849557522


# Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), epsilon (ε) sets the half-width of the ε-insensitive tube around the regression function: residuals smaller than ε incur no penalty. Increasing epsilon widens the tube, so more training points fall strictly inside it. Points inside the tube have zero dual coefficients and do not influence the model; only points on or outside the tube boundary become support vectors. As a result, increasing epsilon tends to decrease the number of support vectors, giving a sparser and smoother (but less precise) model.
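A quick way to see this is to fit scikit-learn's `SVR` with several epsilon values and count the support vectors via `len(model.support_)`. The noisy sine-wave data, the RBF kernel, and `C=1.0` are arbitrary choices for this sketch; the exact counts depend on the data, but they should shrink as epsilon grows.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (assumption for illustration): noisy sine wave
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

sv_counts = {}
for eps in [0.01, 0.1, 0.3, 0.5]:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of points on or outside the epsilon-tube
    sv_counts[eps] = len(model.support_)
    print(f"epsilon={eps:<5} support vectors={sv_counts[eps]}")
```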

# Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

Choice of Kernel Function: Different kernel functions (e.g., linear, polynomial, RBF) can capture different types of relationships in the data. The choice depends on the data's characteristics.

C Parameter: The C parameter controls the trade-off between keeping the regression function flat (the SVR analogue of a wide margin) and minimizing the errors that fall outside the ε-tube. Smaller C values tolerate more of those errors and give a smoother, more regularized fit, while larger C values penalize them more heavily, fitting the training data more closely at the risk of overfitting.

Epsilon Parameter (ε): Epsilon defines the half-width of the tube within which errors are ignored. Larger epsilon values let more data points fall inside the tube, which yields fewer support vectors and a simpler model but can sacrifice precision.

Gamma Parameter (for the RBF, polynomial, and sigmoid kernels): Gamma controls how far the influence of a single training point reaches. Smaller gamma values give each point a wide influence and produce a smoother, less flexible function (risking underfitting), while larger values make the function more flexible and can lead to overfitting.
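The effect of gamma can be sketched by fitting an RBF-kernel `SVR` with very different gamma values and comparing R² on the training split and on a held-out split. The synthetic data and the fixed `C=10.0` and `epsilon=0.1` are assumptions for illustration. A very small gamma should underfit (low training R²); a very large gamma fits the training points closely, and a gap between training and held-out R² is the sign of overfitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data (assumption for illustration): noisy sine wave on [0, 10]
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

train_r2, test_r2 = {}, {}
for gamma in [0.001, 0.1, 10.0, 100.0]:
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=gamma).fit(X_tr, y_tr)
    train_r2[gamma] = model.score(X_tr, y_tr)  # R^2 on the data the model was fit on
    test_r2[gamma] = model.score(X_te, y_te)   # R^2 on unseen data
    print(f"gamma={gamma:<6} train R^2={train_r2[gamma]:.3f} test R^2={test_r2[gamma]:.3f}")
```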

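You might increase or decrease these parameters in the following scenarios:

- Increase C when you want to reduce training errors and can accept a less flat (more complex) model; decrease it if the model overfits.
- Increase epsilon if you want to ignore small deviations and reduce sensitivity to individual data points; decrease it when you need more precise predictions.
- Choose the kernel based on the linearity or complexity of the data: a linear kernel for roughly linear relationships, an RBF or polynomial kernel for non-linear ones.
- Adjust gamma to control overfitting or underfitting with non-linear kernels: decrease it if the model overfits, increase it if the model underfits.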
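In practice these four settings are usually tuned together with cross-validation. A minimal sketch with `GridSearchCV`, assuming synthetic sine-wave data and an arbitrary small grid; scaling sits inside a pipeline so each fold is standardized with its own training statistics, and `gamma` is simply ignored when the kernel is linear.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (assumption for illustration), left unsorted so CV folds are random subsets
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# make_pipeline names the SVR step "svr", hence the "svr__" prefixes below
pipe = make_pipeline(StandardScaler(), SVR())
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": ["scale", 0.1, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```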

# Q5. Assignment:

- Import the necessary libraries and load the dataset.
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization).
- Create an instance of the SVC classifier and train it on the training data.
- Use the trained classifier to predict the labels of the testing data.
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.

In [13]:
from sklearn.datasets import load_iris
dataset = load_iris()

In [15]:
import seaborn as sns
df = sns.load_dataset("iris")

In [18]:
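x = df.iloc[:, :-1]  # the four numeric feature columns (drops 'species')
y = dataset.target   # integer labels 0, 1, 2 from load_iris; both sources list the rows in the same species order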

In [23]:
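from sklearn.model_selection import train_test_split
# train_size=0.25: only 25% of the samples (37) are used for training; the other 113 form the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.25, random_state=42)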

In [24]:
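from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn mean/std from the training data only
x_test_scaled = scaler.transform(x_test)        # apply the training statistics to the test data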

In [34]:
from sklearn.svm import SVC
svc = SVC()
svc.fit(x_train_scaled,y_train)

In [39]:
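from sklearn.metrics import accuracy_score, classification_report
# Predict on the scaled test features, since the model was trained on scaled features
y_pred = svc.predict(x_test_scaled)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Output when predicting on the unscaled `x_test` instead (a mismatch with the standardized training data): the model assigns all 113 test samples to class 0, so accuracy is only 0.37 and classes 1 and 2 score zero:

0.37168141592920356
              precision    recall  f1-score   support

           0       0.37      1.00      0.54        42
           1       0.00      0.00      0.00        36
           2       0.00      0.00      0.00        35

    accuracy                           0.37       113
   macro avg       0.12      0.33      0.18       113
weighted avg       0.14      0.37      0.20       113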



In [40]:
## Hyperparameter tuning

In [41]:
from sklearn.model_selection import GridSearchCV
# 'degree' only affects the 'poly' kernel; it is ignored for 'linear' and 'rbf'
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly'], 'degree': [2, 3, 4]}
clf = GridSearchCV(svc, param_grid=param_grid, cv=5)

In [42]:
clf.fit(x_train,y_train)

In [None]:
clf.best_params_

{'C': 1, 'degree': 2, 'kernel': 'linear'}

In [45]:
svc = SVC(C=1, kernel="linear", degree=2)  # degree has no effect with the linear kernel

In [46]:
svc.fit(x_train,y_train)

In [48]:
y_pred = svc.predict(x_test)
print(accuracy_score(y_test, y_pred))  # accuracy_score(y_true, y_pred)

0.9646017699115044
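The tuning cells above search on the unscaled `x_train`, so the `StandardScaler` from the preprocessing step plays no part in the tuned model. One way to keep scaling inside the search is a `Pipeline`. The sketch below is self-contained, uses its own stratified `test_size=0.25` split as an assumption, and is an alternative to the notebook's cells rather than a reproduction of their results.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# The scaler is a pipeline step, so in each CV fold it is fit on that fold's training part only
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__degree": [2, 3, 4],  # only used by the 'poly' kernel
}
search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)

# The best pipeline is refit on all of X_train and scored (accuracy) on X_test
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```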


In [51]:
import pickle
# A context manager ensures the file is flushed and closed after writing
with open("svc.pkl", "wb") as f:
    pickle.dump(svc, f)
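The assignment also asks to train the tuned classifier on the entire dataset before saving it, whereas `svc` above was fit on `x_train` only. A minimal self-contained sketch, reusing the best parameters found above (`C=1`, linear kernel) and a hypothetical file name `svc_full.pkl`, that refits on all 150 samples, saves the model, and reloads it to check the round trip:

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Final model: tuned hyperparameters, refit on the entire dataset
final_model = SVC(C=1, kernel="linear")
final_model.fit(X, y)

# Save the model ("svc_full.pkl" is a hypothetical file name), then load it back
with open("svc_full.pkl", "wb") as f:
    pickle.dump(final_model, f)
with open("svc_full.pkl", "rb") as f:
    loaded_model = pickle.load(f)

# The reloaded model should make exactly the same predictions
same_predictions = np.array_equal(loaded_model.predict(X), final_model.predict(X))
print(same_predictions)
```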
