#### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

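Kernel functions let an algorithm work in a higher-dimensional feature space, where a linear hyperplane can separate data points that are not linearly separable in the original space. A kernel does not construct that space explicitly. Instead, $K(\mathbf{x}, \mathbf{y})$ returns the inner product $\phi(\mathbf{x})^\top \phi(\mathbf{y})$ of the mapped vectors directly (the "kernel trick"). As a result, algorithms that use the data only through inner products, such as SVMs, never have to compute the mapping $\phi$ itself.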

The polynomial kernel is one type of kernel function commonly used in SVMs. Its implicit feature space has the monomials of the original features, up to a chosen degree, as coordinates. The degree therefore sets the highest order of feature interactions the model can represent. A higher degree produces a much larger implicit feature space: for $n$ input features and $c > 0$, it contains all $\binom{n+d}{d}$ monomials of degree at most $d$.

Polynomial functions and kernel functions are therefore related through the polynomial kernel. This kernel lets linear models such as SVMs solve non-linear problems by measuring the similarity of vectors (training samples) in a feature space built from polynomials of the original variables. The polynomial kernel is defined as:

$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^\top \mathbf{y} + c)^d$$

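Here $\mathbf{x}$ and $\mathbf{y}$ are vectors in the input space, and $d$ is the degree of the polynomial. The free parameter $c \ge 0$ trades off the influence of higher-order versus lower-order terms in the polynomial. Scikit-learn's `SVC` writes this kernel as $(\gamma\,\mathbf{x}^\top \mathbf{y} + r)^d$. In that form, $d$ is set by `degree` (default 3), $r$ by `coef0` (default 0.0), and an extra scale factor $\gamma$ by `gamma`.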
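To make the kernel trick concrete, here is a small check for 2-D inputs with $d = 2$. It is a sketch that assumes NumPy and scikit-learn are installed. The helpers `poly_kernel` and `phi` are hypothetical names for illustration, not library functions. The check confirms that the kernel value equals an ordinary dot product in an explicit 6-dimensional feature space. It also compares the result with scikit-learn's `polynomial_kernel`, which computes $(\gamma\,\mathbf{x}^\top\mathbf{y} + \text{coef0})^{\text{degree}}$, so `gamma=1.0` recovers the formula above.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def poly_kernel(x, y, c=1.0, d=2):
    """Hypothetical helper: polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (np.dot(x, y) + c) ** d


def phi(v, c=1.0):
    """Hypothetical helper: explicit degree-2 feature map for a 2-D vector.

    Chosen so that phi(x) @ phi(y) == (x^T y + c)^2.
    """
    x1, x2 = v
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
c = 1.0

k_direct = poly_kernel(x, y, c=c, d=2)      # kernel trick: the 6-D vectors are never formed
k_explicit = phi(x, c) @ phi(y, c)          # dot product in the explicit 6-D feature space
k_sklearn = polynomial_kernel(
    x.reshape(1, -1), y.reshape(1, -1), degree=2, gamma=1.0, coef0=c
)[0, 0]

print(k_direct, k_explicit, k_sklearn)  # all equal 4, up to floating-point rounding
```

The explicit map needs 6 coordinates even for 2 inputs and degree 2. The kernel computes the same number from a single 2-D dot product, which is why SVMs work with $K$ rather than $\phi$.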

#### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

To implement an SVM with a polynomial kernel in Python using Scikit-learn:
1. Import the necessary modules, such as `sklearn.svm` and `sklearn.datasets`.
2. Load the dataset and split it into training and test sets with `train_test_split`.
3. Create an instance of the `SVC` class with `kernel='poly'` (optionally setting `degree`, `gamma`, and `coef0`).
4. Train the model with the `fit` method.
5. Predict labels for the test data with the `predict` method.
6. Evaluate the model's performance with metrics such as `accuracy_score` and `confusion_matrix`.

```python
# Import the necessary libraries
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset
df = load_iris()
X = df.data    # features
y = df.target  # target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create an SVC instance with a polynomial kernel (degree=3 is the scikit-learn default)
svc = SVC(kernel='poly', degree=3)

# Train the model on the training data
svc.fit(X_train, y_train)

# Predict labels for the test data
y_pred = svc.predict(X_test)

# Evaluate the performance of the model
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))
```

```
Accuracy:  0.9777777777777777
Confusion matrix: 
 [[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]
```
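The default `SVC(kernel='poly')` above uses degree 3 on unscaled features. Because the polynomial kernel multiplies raw feature values, it is often worth standardizing first and comparing a few degrees. The following is a sketch under stated assumptions. It uses a `Pipeline` (via `make_pipeline`) with `StandardScaler`, 5-fold cross-validation on the training split, and the same split as above. The degree values and `coef0=1.0` are illustrative choices, not tuned ones.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

cv_scores = {}
for degree in (2, 3, 4):
    # coef0=1.0 keeps lower-order terms in the expansion of (gamma * x^T y + coef0)^degree;
    # with the default coef0=0.0 only monomials of exactly this degree remain
    model = make_pipeline(
        StandardScaler(), SVC(kernel='poly', degree=degree, coef0=1.0)
    )
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_scores[degree] = scores.mean()
    print(f"degree={degree}: mean CV accuracy = {scores.mean():.3f}")

# Refit the best degree on the whole training split and check it on the test split
best_degree = max(cv_scores, key=cv_scores.get)
final_model = make_pipeline(
    StandardScaler(), SVC(kernel='poly', degree=best_degree, coef0=1.0)
)
final_model.fit(X_train, y_train)
print("Best degree:", best_degree)
print("Test accuracy:", final_model.score(X_test, y_test))
```

Selecting the degree by cross-validation on the training split keeps the test split untouched until the final check.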


#### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Epsilon controls the width of the epsilon-tube around the regression function. Training points whose predictions lie within a distance epsilon of their actual values incur no penalty in the training loss. In other words, epsilon defines a margin of tolerance within which errors are ignored. The support vectors are the training points with non-zero dual coefficients, and these lie on or outside the tube. Points strictly inside the tube do not affect the solution. The larger epsilon is, the larger the errors we tolerate, the more points fall inside the tube, and the fewer support vectors we have. Conversely, the smaller epsilon is, the stricter we are with errors, and the more support vectors we have.

Therefore, increasing epsilon tends to decrease the number of support vectors in SVR, and decreasing it tends to increase them. Choosing epsilon is a trade-off between model complexity and generalization ability. Too large an epsilon may result in underfitting, while too small an epsilon may result in overfitting.
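A quick way to see this effect is to fit scikit-learn's `SVR` with several epsilon values and count the support vectors. This is a sketch that assumes synthetic data: a noisy sine curve generated with a fixed seed. The epsilon values and `C=1.0` are illustrative. `SVR.support_` holds the indices of the support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (illustrative assumption): a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

epsilons = [0.01, 0.1, 0.5, 1.0]
n_support = []
for eps in epsilons:
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    # support_ lists the training points with non-zero dual coefficients,
    # i.e. the points on or outside the epsilon-tube
    n_support.append(len(svr.support_))
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```

With a very small epsilon almost every point sits outside the tube and becomes a support vector. As epsilon widens, more points fall inside and drop out of the solution.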

#### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

Kernel function:<br>
The kernel determines the implicit feature space in which SVR fits a linear function, which lets a linear model handle non-linear relationships. Common choices are linear, polynomial, radial basis function (RBF), and sigmoid. In scikit-learn's `SVR`, `kernel='rbf'` is the default. Each kernel has advantages and disadvantages that depend on the nature and complexity of the data:

- The linear kernel is simple and fast, but it cannot capture non-linear patterns.
- The polynomial kernel captures feature interactions up to its degree, but high degrees can be numerically unstable and slow to train.
- The RBF kernel is flexible enough to model smooth non-linear relationships, but its behavior depends strongly on gamma and on feature scaling.

A linear kernel is a reasonable first choice when there are many features relative to samples or the relationship is roughly linear. Otherwise, RBF is a common default. In either case, the final choice is best made by comparing candidates with cross-validation.
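Comparing kernels by cross-validated score is one way to make that choice. This is a sketch that assumes synthetic non-linear data (a noisy sine curve) and standardized features. All other `SVR` parameters are left at their scikit-learn defaults, so the numbers only illustrate the comparison and are not tuned results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic non-linear regression data (illustrative assumption): noisy sine curve
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

kernel_scores = {}
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    # Scale features first; every other SVR parameter stays at its default
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    # For regressors, cross_val_score uses the R^2 score by default
    kernel_scores[kernel] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel:>8}: mean CV R^2 = {kernel_scores[kernel]:.3f}")
```

On data like this, the linear kernel cannot follow the curve, while RBF can. On a roughly linear problem the ranking could reverse, which is why the comparison should be run on the actual data.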

C parameter:<br>
This parameter controls the regularization of the SVR model. The strength of the regularization is inversely proportional to C, so a larger C means less regularization and a smaller C means more. Concretely, SVR minimizes $\frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_i(\xi_i + \xi_i^*)$. Here the slack variables $\xi_i, \xi_i^*$ measure how far each training point falls outside the epsilon-tube. C sets how heavily those deviations are penalized relative to keeping the function flat (a small $\lVert\mathbf{w}\rVert$).

The optimal value of C depends on the trade-off between bias and variance. Bias is the error due to incorrect assumptions or oversimplification of the model. Variance is the error due to the model's sensitivity to small changes in the training data. Too large a C may result in high variance and low bias, which means overfitting. Too small a C may result in low variance and high bias, which means underfitting. We might increase C when we have plenty of data with little noise, and decrease C when the data are scarce or noisy.
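The bias-variance effect of C can be seen with scikit-learn's `validation_curve`, which reports training and cross-validated scores for each parameter value. This is a sketch that assumes the same kind of synthetic noisy-sine data as above, an RBF kernel, and `epsilon=0.1`. The C grid is an illustrative choice.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.svm import SVR

# Synthetic regression data (illustrative assumption): noisy sine curve
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

C_range = np.logspace(-2, 3, 6)  # 0.01, 0.1, 1, 10, 100, 1000
train_scores, val_scores = validation_curve(
    SVR(kernel='rbf', epsilon=0.1), X, y,
    param_name='C', param_range=C_range, cv=5,
)

# Each row holds the per-fold R^2 scores for one value of C
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for C, tr, va in zip(C_range, train_mean, val_mean):
    print(f"C={C:g}: train R^2={tr:.3f}, validation R^2={va:.3f}")
```

A very small C keeps the function nearly flat, so both scores are low (underfitting). Raising C improves the training score. A growing gap between training and validation scores is the sign of overfitting to watch for.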

Epsilon parameter:<br>
Epsilon sets the half-width of the epsilon-tube around the regression function. Training points whose predictions lie within epsilon of the true value contribute no loss, and only points on or outside the tube become support vectors. As explained in Q3, a larger epsilon tolerates larger errors and yields fewer support vectors, giving a sparser, smoother model that may underfit. A smaller epsilon yields more support vectors and a closer fit that may overfit. Because epsilon is measured in the units of the target, a sensible value depends on the target's scale and noise level. We might increase epsilon when the targets are noisy or a sparser, faster model is wanted, and decrease it when the targets are precise and small errors matter. In scikit-learn's `SVR`, the default is `epsilon=0.1`.

Gamma parameter:<br>
This parameter is used by the RBF, polynomial, and sigmoid kernels; the linear kernel ignores it. For the RBF kernel $K(\mathbf{x}, \mathbf{y}) = \exp(-\gamma\lVert\mathbf{x} - \mathbf{y}\rVert^2)$, gamma controls how far the influence of a single training example reaches. It is inversely related to the width of the kernel: a larger gamma gives a narrower kernel, and a smaller gamma gives a wider one. For the polynomial and sigmoid kernels, gamma instead scales the inner product $\mathbf{x}^\top\mathbf{y}$.

With a narrow kernel, each training example influences only a small region around it, which produces a more complex, wiggly fitted function (or decision boundary, in classification). With a wide kernel, each example influences a large region, which produces a smoother, simpler function. The optimal value of gamma also depends on the bias-variance trade-off. Too large a gamma may result in high variance and low bias, which means overfitting. Too small a gamma may result in low variance and high bias, which means underfitting. We might increase gamma when the target varies rapidly with the inputs and the data are plentiful and clean, and decrease it when the data are noisy or sparse. Because gamma acts on distances, features should be scaled first. Scikit-learn's default `gamma='scale'` uses `1 / (n_features * X.var())`.
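To see gamma's effect on model complexity, we can fit an RBF `SVR` over a range of gamma values and compare training and test R². This is a sketch that assumes synthetic noisy-sine data, a 70/30 split, and illustrative values `C=10.0` and `epsilon=0.1`. The exact scores depend on these choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic regression data (illustrative assumption): noisy sine curve
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

gammas = [0.001, 0.01, 0.1, 1, 10, 100]
train_r2, test_r2 = [], []
for gamma in gammas:
    # Larger gamma -> narrower RBF kernel -> more flexible fitted function
    svr = SVR(kernel='rbf', C=10.0, epsilon=0.1, gamma=gamma).fit(X_train, y_train)
    train_r2.append(svr.score(X_train, y_train))
    test_r2.append(svr.score(X_test, y_test))
    print(f"gamma={gamma:g}: train R^2={train_r2[-1]:.3f}, test R^2={test_r2[-1]:.3f}")
```

A tiny gamma makes every point influence the whole input range, so the fit is nearly flat and underfits. As gamma grows the training score rises. The value to pick is the one where the test (or cross-validated) score peaks, not the one with the best training score.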

#### Q5. Assignment:
- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.

```python
# Import the necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset
df = load_iris()
X = df.data    # features
y = df.target  # target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Preprocess the data with StandardScaler: fit on the training split only,
# then apply the same transformation to the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create an instance of the SVC classifier and train it on the training data
svc = SVC(kernel='linear')
svc.fit(X_train, y_train)

# Use the trained classifier to predict the labels of the testing data
y_pred = svc.predict(X_test)

# Evaluate the performance of the classifier with accuracy and a confusion matrix
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
```

```
Accuracy:  1.0
Confusion Matrix: 
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
```


```python
# Hyperparameter tuning using GridSearchCV
from sklearn.model_selection import GridSearchCV

# gamma is ignored by the linear kernel, so those combinations simply repeat
param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.0001],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid']
}

# 5-fold cross-validated search; refit=True retrains the best configuration
# on the full training split so svc_tuning can be used for prediction directly
svc_tuning = GridSearchCV(SVC(), param_grid=param_grid, refit=True, cv=5)
svc_tuning.fit(X_train, y_train)

# Evaluate the tuned classifier on the test data
y_pred = svc_tuning.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

```
1.0
[[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
```
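The grid search above refits its best configuration on the training split only. The assignment step "Train the tuned classifier on the entire dataset" can be carried out by cloning the best estimator and fitting it on all of `X` and `y`. The sketch below is self-contained and rests on a few assumptions. It bundles `StandardScaler` and `SVC` in a `Pipeline`, so the scaler is refit inside each cross-validation fold. It also uses a smaller, illustrative grid. In the notebook you would reuse the grid and results from `svc_tuning`.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scaling lives inside the pipeline, so each CV fold scales with its own training part
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 0.1, 0.01],
    'svc__kernel': ['linear', 'rbf'],
}
search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Held-out accuracy:", search.score(X_test, y_test))

# Final model: same hyperparameters, unfitted copy, trained on the entire dataset
final_model = clone(search.best_estimator_).fit(X, y)
```

The held-out score is reported before the final refit. Once the model has seen all 150 samples, no untouched data remain to evaluate it on.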


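```python
# Save the fitted scaler and the tuned classifier to files for future use
import pickle

# Context managers make sure the files are flushed and closed
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)
with open('svc.pkl', 'wb') as f:
    pickle.dump(svc_tuning, f)
```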
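To reuse a saved model, load it back with `pickle.load` and call `predict` as before. The sketch below is self-contained. It assumes a scaler-plus-SVC pipeline and writes to a temporary directory rather than the notebook's `svc.pkl`, so it can run on its own. The file name `svc_pipeline.pkl` is hypothetical. Pickled scikit-learn models should only be loaded from trusted sources, ideally with the same scikit-learn version that saved them.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0)).fit(X, y)

# Hypothetical file location for this demo
path = os.path.join(tempfile.mkdtemp(), 'svc_pipeline.pkl')
with open(path, 'wb') as f:
    pickle.dump(model, f)

# Later (e.g. in another session): load the model and predict on new measurements
with open(path, 'rb') as f:
    loaded = pickle.load(f)

new_sample = np.array([[5.1, 3.5, 1.4, 0.2]])  # sepal/petal measurements in cm
print(loaded.predict(new_sample))
```

Saving a single pipeline, rather than the scaler and classifier separately, ensures that new data always passes through the same scaling before prediction.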