Q1

In machine learning, a kernel function lets a model behave as if the data had been transformed into a higher-dimensional space, where a linear decision boundary may be easier to find, without computing that transformation explicitly. The polynomial kernel is one kernel function commonly used in this context.

The relationship is that a polynomial kernel is a kernel function built from a polynomial. It scores the similarity of two data points by applying a polynomial to their dot product: K(x, z) = (gamma * <x, z> + coef0)^degree. This value equals the dot product of the two points after a polynomial feature transformation, so the model works in that higher-dimensional space while only ever computing dot products in the original feature space. Other kernels, such as the Gaussian (RBF) kernel, also correspond to implicit higher-dimensional mappings; the polynomial kernel is the one whose mapping consists of polynomial terms of the original features.
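The equivalence between the kernel value and a dot product in the transformed space can be checked by hand for a small case. The sketch below assumes 2-D inputs, degree 2, and the form (x . z + 1)^2; the helper names `poly_kernel` and `explicit_map_degree2` are made up for this illustration. Note that scikit-learn's `'poly'` kernel additionally multiplies the dot product by `gamma` and uses `coef0=0` by default.

```python
import math

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree, computed in the original space."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree

def explicit_map_degree2(x):
    """Explicit feature map matching the 2-D, degree-2, coef0=1 kernel above."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x = [1.0, 2.0]
z = [3.0, -1.0]

# Kernel value computed without leaving the 2-D input space
k_implicit = poly_kernel(x, z)

# Same value computed as a dot product in the 6-D transformed space
k_explicit = sum(a * b for a, b in zip(explicit_map_degree2(x), explicit_map_degree2(z)))

print(k_implicit, k_explicit)
```

Both numbers agree up to floating-point rounding, which is the reason the explicit transformation never has to be computed.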

Q2

In [2]:
from sklearn import datasets
from sklearn.svm import SVC

# Load your dataset
X, y = datasets.load_iris(return_X_y=True)

# Create an SVM classifier with a polynomial kernel
svm_classifier = SVC(kernel='poly', degree=3)  # You can adjust the 'degree' parameter for the polynomial order

# Fit the SVM classifier to your data
svm_classifier.fit(X, y)

# Make predictions using the trained SVM
predictions = svm_classifier.predict(X)
predictions


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Q3

In Support Vector Regression (SVR), the value of epsilon (ε) determines the width of the tube within which errors are ignored. Increasing the value of epsilon typically decreases the number of support vectors, because only points that lie on or outside the tube become support vectors. Here's how it works:

1. **Smaller Epsilon (Tighter Tube):** When you set a smaller epsilon value, you are specifying a narrower tube around the regression function, so the model is less tolerant of deviations from it. More training points end up on or outside the tube, and each of those points becomes a support vector. This leads to a larger number of support vectors, and the model may fit the training data more closely.

2. **Larger Epsilon (Wider Tube):** Conversely, when you set a larger epsilon value, you are allowing a wider tube around the regression function, meaning that the model can tolerate larger errors. More training points fall strictly inside the tube, where they contribute no loss and do not become support vectors. This results in a smaller number of support vectors.

The choice of epsilon in SVR is a trade-off between model complexity and fitting accuracy. A smaller epsilon may lead to a model that fits the training data more closely but might be sensitive to noise or overfit. A larger epsilon allows for a more robust model that is less sensitive to individual data points but may have a looser fit to the training data. The optimal epsilon value depends on the specific characteristics of your data and the trade-offs you are willing to make between model complexity and fitting accuracy.
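The effect of epsilon on the number of support vectors can be observed directly. The following is a minimal sketch on synthetic data (a noisy sine curve invented for this illustration, not part of the assignment data); the epsilon values and the other SVR settings are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: sine curve plus Gaussian noise (assumed data)
rng = np.random.default_rng(0)
X_demo = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=80)

support_counts = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X_demo, y_demo)
    # support_ holds the indices of the training points kept as support vectors
    support_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {support_counts[eps]} support vectors out of {len(X_demo)}")
```

On data like this, the count should shrink as epsilon grows, since more points fall strictly inside the wider tube.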

Q4

1. **Kernel Function:** The choice of kernel function affects the mapping of the data into a higher-dimensional space. Different kernel functions (e.g., linear, polynomial, radial basis function) can capture different types of relationships in the data. For example, a polynomial kernel may work well for data with polynomial patterns, while an RBF kernel is more flexible and can capture complex non-linear relationships.

2. **C Parameter (Regularization):** The C parameter controls the trade-off between maximizing the margin and minimizing the classification error. A small C makes the decision boundary smoother, allowing some misclassifications, while a large C results in a narrower margin but fewer misclassifications. You might increase C when you want to reduce misclassifications (potentially overfit), or decrease it to encourage a wider margin (potentially underfit).

3. **Epsilon Parameter (for Epsilon-Support Vector Regression, ε-SVR):** The epsilon parameter (ε) defines a tube around the regression line within which errors are ignored. It controls the sensitivity to errors. Larger ε allows more errors within the tube, while smaller ε enforces tighter tolerance to errors. You might increase ε when you have noisy data or decrease it for a more precise fit.

4. **Gamma Parameter (for RBF Kernel):** For the Radial Basis Function (RBF) kernel, gamma sets how far the influence of a single training point reaches. A large gamma gives each point a short reach, which produces a more flexible fit that captures fine details in the data and can lead to overfitting. A small gamma gives each point a long reach, which produces a smoother fit that may underfit the data. You might increase gamma when the data is complex and you want a more intricate fit, or decrease it for a smoother, less complex one.

The choice of these parameters should be made through experimentation and model validation, as the optimal values depend on the specific dataset and problem you're working with. Adjusting these parameters allows you to fine-tune your SVR model to achieve the best balance between model complexity, accuracy, and generalization.
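One way to see why validation matters is to compare training and cross-validated scores while varying a single parameter. The sketch below does this for gamma on a synthetic noisy sine curve (assumed data, chosen only for illustration); the gamma values, C, and epsilon are arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

# Synthetic 1-D regression problem (assumed data, not from the assignment)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 5, size=(120, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=120)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for gamma in (0.1, 1.0, 1000.0):
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=gamma)
    # R^2 on held-out folds estimates how well the setting generalizes
    cv_r2 = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2").mean()
    # R^2 on the data the model was fitted on
    train_r2 = model.fit(X_demo, y_demo).score(X_demo, y_demo)
    scores[gamma] = {"train": train_r2, "cv": cv_r2}
    print(f"gamma={gamma}: train R^2={train_r2:.3f}, CV R^2={cv_r2:.3f}")
```

A large gap between the training and cross-validated scores at a very large gamma is the usual sign of overfitting; the same comparison can be repeated for C and epsilon.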

Q5


IRIS dataset

In [3]:
#load dataset
from sklearn.datasets import load_iris
data=load_iris()
data.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [4]:
X=data.data
y=data.target
print(X.shape,y.shape)


(150, 4) (150,)


In [6]:
#scaling
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
X_scaled=scaler.fit_transform(X)

In [7]:
#split the dataset (no random_state is set, so the split and the scores below change between runs)
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X_scaled,y,test_size=0.3)
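The cells above scale the full dataset before splitting it, so the scaler also sees the test rows. A possible alternative, sketched here under the assumption that a fixed `random_state` and a stratified split are acceptable, is to put the scaler and the classifier in a pipeline so the scaling statistics come from the training split only. The names `X_tr`, `X_te`, and `leak_free_model` are introduced for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Split first; random_state makes the split repeatable, stratify keeps class balance
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42, stratify=y_iris
)

# The pipeline fits the scaler on the training split only, then fits the SVC
leak_free_model = make_pipeline(StandardScaler(), SVC())
leak_free_model.fit(X_tr, y_tr)

# score() applies the training-set scaling to the test split before predicting
test_accuracy = leak_free_model.score(X_te, y_te)
print(f"test accuracy: {test_accuracy:.3f}")
```

On a dataset as small and clean as Iris the difference is usually minor, but the pipeline form carries over unchanged to cross-validation and grid search.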


In [8]:
from sklearn.svm import SVC
classifier=SVC()
classifier.fit(X_train,y_train)


SVC()

In [12]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report,f1_score
y_pred=classifier.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(f1_score(y_test,y_pred,average='macro'))

[[16  0  0]
 [ 0 12  0]
 [ 0  2 15]]
0.9555555555555556
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.86      1.00      0.92        12
           2       1.00      0.88      0.94        17

    accuracy                           0.96        45
   macro avg       0.95      0.96      0.95        45
weighted avg       0.96      0.96      0.96        45

0.953525641025641


## Hyperparameter tuning ##

In [27]:
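import numpy as np
# Note: 'degree' is only used by the 'poly' kernel and is meant to be an integer;
# recent scikit-learn releases validate this and may reject values such as 1.5.
parameters={ 'C':np.arange(1,10,0.5),
            'kernel':['linear', 'poly', 'rbf', 'sigmoid'],
            'degree':np.arange(1,10,0.5),
            'gamma':['scale','auto']
}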

In [28]:
from sklearn.model_selection import GridSearchCV
classifier=SVC()
grid=GridSearchCV(classifier,param_grid=parameters,scoring='accuracy',cv=5,)

In [29]:
grid.fit(X_train,y_train)

GridSearchCV(cv=5, estimator=SVC(),
             param_grid={'C': array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. ,
       7.5, 8. , 8.5, 9. , 9.5]),
                         'degree': array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. ,
       7.5, 8. , 8.5, 9. , 9.5]),
                         'gamma': ['scale', 'auto'],
                         'kernel': ['linear', 'poly', 'rbf', 'sigmoid']},
             scoring='accuracy')

In [30]:
grid.best_params_

{'C': 5.0, 'degree': 1.0, 'gamma': 'auto', 'kernel': 'poly'}

In [32]:
grid.best_score_

0.9714285714285715
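The grid above crosses every parameter with every kernel, although `degree` only affects the `'poly'` kernel and `gamma` is ignored by `'linear'`. A hedged alternative is to pass `GridSearchCV` a list of dictionaries, one per kernel family, with integer degrees. The value ranges, the fixed `random_state`, and the pipeline step names below are choices made for this sketch, not the settings used in the run above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris
)

# make_pipeline names the SVC step "svc", hence the "svc__" prefixes
pipeline = make_pipeline(StandardScaler(), SVC())

# One dictionary per kernel family, so each kernel is only paired with
# the parameters it actually uses
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [1, 5, 10]},
    {"svc__kernel": ["rbf", "sigmoid"], "svc__C": [1, 5, 10],
     "svc__gamma": ["scale", "auto"]},
    {"svc__kernel": ["poly"], "svc__C": [1, 5, 10],
     "svc__gamma": ["scale", "auto"], "svc__degree": [1, 2, 3]},
]

search = GridSearchCV(pipeline, param_grid=param_grid, scoring="accuracy", cv=5)
search.fit(X_tr, y_tr)

n_candidates = len(search.cv_results_["params"])
print(n_candidates, search.best_params_, round(search.best_score_, 3))
```

This evaluates 33 candidate settings (3 linear, 12 RBF/sigmoid, 18 polynomial) instead of the full cross product, and avoids fitting duplicate models that differ only in an ignored parameter.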

In [33]:
#Refit a model with the best parameters found by the grid search
classifier=SVC(C=5,degree=1,gamma='auto',kernel='poly')
classifier.fit(X_train,y_train)
accuracy_score(y_test,classifier.predict(X_test))

0.9777777777777777

In [34]:
import pickle as pik
# Use a context manager so the file is closed once the model has been written
with open('SVC.pkl','wb') as f:
    pik.dump(classifier,f)
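To confirm that a saved model can be used later, it can be loaded back and compared against the original. The sketch below is self-contained: it trains a small stand-in model on the full Iris data and writes to a temporary directory (both assumptions made so the example runs on its own) rather than reusing `classifier` and `SVC.pkl` from the cells above.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Stand-in model using the parameter values reported by the grid search above
model = SVC(C=5, degree=1, gamma="auto", kernel="poly").fit(X_iris, y_iris)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "SVC.pkl")

    # Save the fitted model
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load it back into a new object
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly
same_predictions = bool((restored.predict(X_iris) == model.predict(X_iris)).all())
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and a model is best loaded with the same scikit-learn version that saved it.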