Relationship between Polynomial Functions and Kernel Functions in Machine Learning: The polynomial kernel is a kernel function built directly from a polynomial function. A kernel function computes the similarity between two data points as an inner product in a higher-dimensional feature space, without constructing that space explicitly. A linear separation may be possible in that space even if the original data is not linearly separable. The polynomial kernel computes this similarity as K(x, x') = (gamma * <x, x'> + coef0)^degree, which is the form scikit-learn uses for kernel='poly'.
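To make the link between the kernel and an explicit polynomial feature space concrete, here is a minimal sketch in plain Python. It assumes 2-D inputs, degree 2, gamma = 1 and coef0 = 1; the helper names `poly_kernel` and `explicit_features` are illustrative and not part of scikit-learn. Under these assumptions the kernel value should match the dot product of the hand-written feature vectors up to floating-point rounding.

```python
import math

def poly_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

def explicit_features(x):
    """Degree-2 feature map for 2-D inputs (assumes gamma=1, coef0=1)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x = (1.0, 2.0)
z = (3.0, -1.0)

# Similarity computed directly in the original 2-D space
kernel_value = poly_kernel(x, z)

# Same similarity computed as a dot product in the 6-D feature space
explicit_value = sum(a * b for a, b in zip(explicit_features(x), explicit_features(z)))

print(kernel_value, explicit_value)
```

The kernel reaches the same number without ever building the six-dimensional vectors, which is why higher degrees and larger input dimensions stay cheap to evaluate.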

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset (150 samples, 4 features, 3 classes)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# SVM with a degree-3 polynomial kernel; coef0 is the constant term added inside the polynomial
svm_poly = SVC(kernel='poly', degree=3, coef0=1)
svm_poly.fit(X_train, y_train)
y_pred = svm_poly.predict(X_test)

# Fraction of test samples classified correctly
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy with Polynomial Kernel: {accuracy * 100:.2f}%")

Accuracy with Polynomial Kernel: 97.78%


Effect of Increasing Epsilon on the Number of Support Vectors in SVR: In Support Vector Regression (SVR), epsilon (ϵ) defines the margin of tolerance within which no penalty is given for errors. As ϵ increases, the model becomes more tolerant of errors, leading to fewer support vectors. This happens because more data points fall strictly inside the ϵ-tube, and only points on or outside the tube become support vectors. When ϵ is small, the tube is narrow, so more points lie on or outside it and the model uses more support vectors to fit the data closely.
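The effect can be checked empirically. The sketch below is an illustration on synthetic data, not part of the original notebook: the noisy sine curve, the random seed, the RBF kernel with C=1.0 and the three epsilon values are all assumptions chosen for the demonstration. It fits one SVR per epsilon and counts the support vectors through the fitted model's `support_` attribute.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise (assumed data)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

support_counts = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    support_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {support_counts[eps]} support vectors out of {len(X)}")
```

On data like this, the count should drop noticeably between the narrowest and the widest tube, since a wide tube leaves most points strictly inside it.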

Kernel Function: The choice of kernel (linear, polynomial, RBF) impacts how data is transformed into a higher-dimensional space. Polynomial and RBF kernels are commonly used for non-linear relationships, while a linear kernel is used for simpler, linear relationships.

C Parameter: Controls the trade-off between maximizing the margin and minimizing classification error. A high C penalizes margin violations heavily, so the model fits the training data closely and may overfit. A low C tolerates more margin violations, which widens the margin and often generalizes better.

Epsilon (𝜖): Defines the margin of tolerance where no penalty is assigned for errors. A higher epsilon results in fewer support vectors, as the model becomes more lenient in how it treats deviations from the true output. A small epsilon value results in a model that fits the data more closely.

Gamma Parameter: In non-linear kernels, gamma controls how much influence each training point has. A high gamma makes the decision boundary very sensitive to individual points (risk of overfitting), while a low gamma leads to a smoother boundary (risk of underfitting).
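To illustrate the gamma description, the following sketch evaluates the RBF kernel exp(-gamma * ||x - z||^2) by hand for one fixed pair of points. The helper `rbf_kernel`, the two points and the gamma values are illustrative assumptions. A larger gamma makes the similarity to a neighbor at the same distance shrink faster, so each training point influences a smaller region.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x = (0.0, 0.0)
neighbor = (1.0, 1.0)  # squared distance to x is 2

# Similarity between the same two points under increasing gamma
similarities = {g: rbf_kernel(x, neighbor, g) for g in (0.01, 0.1, 1.0, 10.0)}
for g, s in similarities.items():
    print(f"gamma={g}: similarity={s:.6f}")
```

With a low gamma the neighbor still counts as very similar, which smooths the boundary; with a high gamma it is treated as almost unrelated, which lets the boundary bend around individual points.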

In [3]:
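from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report
import joblib

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Baseline: linear SVC with the default C
svc = SVC(kernel='linear')
svc.fit(X_train_scaled, y_train)

y_pred = svc.predict(X_test_scaled)

print(classification_report(y_test, y_pred))

# Search C, gamma and kernel with 5-fold cross-validation
# (gamma has no effect on the linear kernel, so those candidates are duplicates)
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 'auto'], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)
grid_search.fit(X_train_scaled, y_train)

print("Best parameters from GridSearchCV:", grid_search.best_params_)

# Refit the best configuration on the full dataset. The scaler and classifier are
# wrapped in one pipeline so the saved file applies the same scaling at prediction time.
final_model = make_pipeline(StandardScaler(), SVC(**grid_search.best_params_))
final_model.fit(X, y)

joblib.dump(final_model, 'iris_svc_model.pkl')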

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45

Fitting 5 folds for each of 12 candidates, totalling 60 fits
[CV 1/5] END .C=0.1, gamma=scale, kernel=linear;, score=0.952 total time=   0.0s
[CV 2/5] END .C=0.1, gamma=scale, kernel=linear;, score=0.905 total time=   0.0s
[CV 3/5] END .C=0.1, gamma=scale, kernel=linear;, score=0.905 total time=   0.0s
[CV 4/5] END .C=0.1, gamma=scale, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END .C=0.1, gamma=scale, kernel=linear;, score=0.905 total time=   0.0s
[CV 1/5] END ....C=0.1, gamma=scale, kernel=rbf;, score=0.952 total time=   0.0s
[CV 2/5] END ....C=0.1, gamma=scale, kernel=rbf;, score=0.762 total time

['iris_svc_model.pkl']
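A saved model is only useful if it can be reloaded and given raw, unscaled measurements. The sketch below shows one way to do that, under assumptions: it bundles a `StandardScaler` and an `SVC` in a scikit-learn pipeline, writes it to a temporary file (the name `iris_pipeline_demo.pkl` is hypothetical), and checks that the reloaded object reproduces the original predictions. The hyperparameters are placeholders, not the grid-search result.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Scaler and classifier travel together, so callers pass unscaled features
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_pipeline_demo.pkl")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The reloaded pipeline should give exactly the same predictions
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Reloaded model reproduces predictions:", same_predictions)
```

Persisting only the classifier would force every consumer of the file to recreate the exact scaling separately, which is easy to get wrong.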