## Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Polynomial functions and kernel functions are related concepts in machine learning, particularly in the context of kernel methods such as Support Vector Machines (SVMs) and kernel ridge regression.

A polynomial function is a mathematical function that can be expressed in the form of a sum of terms, where each term is a product of a coefficient and a power of the input variable(s). For example, the polynomial function `f(x) = 3x^2 + 2x - 1` is a second-degree polynomial in one variable.

A kernel function, in the context of machine learning, is a function that takes two input vectors and returns a scalar value that represents the similarity between the vectors. Kernel methods use it to work implicitly in a higher-dimensional feature space: the kernel value equals an inner product of the transformed inputs, so the transformation itself never has to be computed (the "kernel trick"). In that feature space the data can often be separated or modeled more easily.

The relationship between the two arises because the polynomial kernel is a kernel function built from a polynomial of the dot product of its inputs. Specifically, a polynomial kernel function of degree `d` is defined as:

`K(x, y) = (x^T y + c)^d`

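where `x` and `y` are input vectors, `c` is a non-negative constant, and `d` is the degree of the polynomial. The polynomial kernel computes the dot product between the input vectors, adds the constant, and raises the result to the power of `d`. For `c > 0`, the resulting scalar equals the inner product of `x` and `y` after both are mapped to a feature space whose coordinates are scaled monomials of the inputs up to degree `d`, without that space ever being built explicitly. In scikit-learn the dot product is additionally scaled by `gamma`, giving `(gamma * x^T y + coef0)^degree`.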
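The equivalence between the kernel value and an inner product in an expanded feature space can be checked numerically. The sketch below assumes 2-D inputs, `c = 1` and `d = 2`; the helper names `poly_kernel` and `phi_degree2` are illustrative and not part of any library.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (np.dot(x, y) + c) ** d


def phi_degree2(x, c=1.0):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Kernel evaluated directly on the 2-D inputs
k_direct = poly_kernel(x, y, c=1.0, d=2)

# Same value as a plain dot product in the 6-D feature space
k_explicit = np.dot(phi_degree2(x), phi_degree2(y))

# scikit-learn's version with gamma=1 so that it matches the formula above
k_sklearn = polynomial_kernel(
    x.reshape(1, -1), y.reshape(1, -1), degree=2, gamma=1.0, coef0=1.0
)[0, 0]

print(k_direct, k_explicit, k_sklearn)
```

All three values agree (here `(1*3 + 2*(-1) + 1)^2 = 4`), which is why a kernel method never needs to construct the 6-D features.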

In summary, a polynomial kernel is a kernel function defined by a polynomial of the dot product of its inputs. It lets kernel methods such as SVMs fit polynomial decision boundaries or regression functions of degree up to `d` while working only with pairwise kernel values.

## Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

To implement an SVM with a polynomial kernel in Python using scikit-learn, you can use the SVC or SVR classes, depending on whether you're doing classification or regression, and specify the kernel parameter as 'poly'. You can also specify the degree of the polynomial kernel using the degree parameter.

In [1]:
from sklearn.svm import SVC
clf=SVC(kernel='poly',degree=3)

In [2]:
clf

In [3]:
#clf.fit(x_train,y_train)
#y_pred=clf.predict(x_test)

In this example, `x_train` and `y_train` are the feature matrix and target vector of the training data, and `x_test` is the feature matrix of the test data. The `fit()` and `predict()` calls are commented out because no data has been loaded at this point. The `SVC` class creates an SVM classifier, and `kernel='poly'` with `degree=3` selects a polynomial kernel of degree 3. Once data is available, `fit()` trains the classifier on the training data and `predict()` returns the predicted labels for the test data.

The evaluation of the classifier's performance can be done using various metrics, such as accuracy, precision, recall, F1 score, etc., depending on the problem and the data.
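As a minimal end-to-end sketch of the steps described above, the following fits a degree-3 polynomial SVC and reports two of the metrics mentioned. The synthetic dataset from `make_classification`, the `StandardScaler` step, and `coef0=1.0` are assumptions chosen to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data (stand-in for a real dataset)
x, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# Scale the features, then fit an SVM with a degree-3 polynomial kernel
clf = make_pipeline(StandardScaler(), SVC(kernel='poly', degree=3, coef0=1.0))
clf.fit(x_train, y_train)

# Predict test labels and evaluate
y_pred = clf.predict(x_test)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print('accuracy:', acc)
print('F1 score:', f1)
```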

Similarly, to implement an SVM with a polynomial kernel for regression, you can use the SVR class and specify the kernel and degree parameters in the same way. The fit() and predict() methods are also used in the same way, but the evaluation of the regression model's performance can be done using metrics such as mean squared error, mean absolute error, R2 score, etc.
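The regression case can be sketched the same way. The cubic toy target, the noise level, and the values of `C`, `epsilon`, and `coef0` below are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Noisy cubic relationship (synthetic stand-in for real data)
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(200, 1))
y = x[:, 0] ** 3 - x[:, 0] + rng.normal(scale=0.2, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# SVR with a degree-3 polynomial kernel
reg = SVR(kernel='poly', degree=3, coef0=1.0, C=10.0, epsilon=0.1)
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)

# Regression metrics mentioned in the text
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('MSE:', mse)
print('MAE:', mae)
print('R2 :', r2)
```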

## Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), the value of the parameter `epsilon` controls the width of the "epsilon-insensitive" tube around the regression function. The epsilon-insensitive tube is a region where the errors between the predicted and actual values are ignored, and only the errors outside of this region are used to penalize the model during training.

The number of support vectors in SVR is determined by the training samples that lie on or outside the boundary of the epsilon-insensitive tube. Only these samples receive non-zero coefficients in the fitted model, so they alone "support" the regression function; samples strictly inside the tube have no influence on it.

Increasing the value of `epsilon` has the effect of widening the epsilon-insensitive tube, which in turn reduces the number of training samples that lie outside of it. This can lead to a decrease in the number of support vectors in the model.

Conversely, decreasing the value of `epsilon` has the effect of narrowing the epsilon-insensitive tube, which in turn increases the number of training samples that lie outside of it. This can lead to an increase in the number of support vectors in the model.

However, it's important to note that the relationship between `epsilon` and the number of support vectors is not always straightforward, and can depend on other factors such as the noise in the data, the complexity of the regression function, and the values of other hyperparameters in the model.
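The general trend can be observed on a small experiment. This is a sketch on an assumed noisy sine curve with an RBF kernel and `C=1.0`; the exact counts depend on the data and on the other hyperparameters.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve (synthetic data, noise standard deviation 0.1)
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(x[:, 0]) + rng.normal(scale=0.1, size=200)

# Fit the same model with progressively wider epsilon tubes
sv_counts = {}
for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(x, y)
    # support_ holds the indices of the training samples used as support vectors
    sv_counts[eps] = len(model.support_)
    print(f'epsilon={eps}: {sv_counts[eps]} support vectors out of {len(x)}')
```

With a tube much narrower than the noise, most samples fall outside it and become support vectors; with a tube wider than the noise, only a few do.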

In practice, the value of `epsilon` should be chosen based on the specific problem and data at hand, and should be tuned using techniques such as cross-validation and grid search.

## Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

The performance of Support Vector Regression (SVR) can be significantly affected by the choice of kernel function, C parameter, epsilon parameter, and gamma parameter. Here's an explanation of each parameter and how it can affect the performance of the model:

1. Kernel function: The kernel function is used to transform the input data into a higher-dimensional feature space, where the regression function can be more easily modeled. The choice of kernel function can have a significant impact on the performance of the model. Commonly used kernel functions in SVR include linear, polynomial, and radial basis function (RBF). The linear kernel is the simplest and fastest kernel, but it may not be able to capture complex non-linear relationships in the data. The polynomial kernel can capture more complex relationships, but it can be sensitive to the choice of degree and can overfit the data if the degree is too high. The RBF kernel is a popular and flexible kernel that can capture a wide range of non-linear relationships, but it can be sensitive to the choice of gamma parameter and can be computationally expensive for large datasets.
2. C parameter: The C parameter is used to control the trade-off between the margin of the regression function and the errors of the training samples. A larger value of C corresponds to a higher penalty for errors, which can lead to a more complex and potentially overfitted model. A smaller value of C corresponds to a lower penalty for errors, which can lead to a simpler and potentially underfitted model. The optimal value of C depends on the specific problem and data at hand, and should be tuned using techniques such as cross-validation and grid search.
3. Epsilon parameter: The epsilon parameter is used to control the width of the "epsilon-insensitive" tube around the regression function. The epsilon-insensitive tube is a region where the errors between the predicted and actual values are ignored, and only the errors outside of this region are used to penalize the model during training. A larger value of epsilon corresponds to a wider tube and a potentially underfitted model, while a smaller value of epsilon corresponds to a narrower tube and a potentially overfitted model. The optimal value of epsilon depends on the specific problem and data at hand, and should be tuned using techniques such as cross-validation and grid search.
4. Gamma parameter: The gamma parameter is used to control the width of the RBF kernel. A larger value of gamma corresponds to a narrower kernel and a potentially overfitted model, while a smaller value of gamma corresponds to a wider kernel and a potentially underfitted model. The optimal value of gamma depends on the specific problem and data at hand, and should be tuned using techniques such as cross-validation and grid search.

Here are some examples of when you might want to increase or decrease the value of each parameter:

* Kernel function: If the relationship between the features and the target is approximately linear, a linear kernel may be sufficient. If it is non-linear, a polynomial or RBF kernel may be more appropriate. If a non-linear kernel is too slow for a large dataset, a linear model such as `LinearSVR` is usually the faster alternative.
* C parameter: If the model is underfitting the data, increasing the value of C may help to reduce the bias and improve the performance. If the model is overfitting the data, decreasing the value of C may help to reduce the variance and improve the performance.
* Epsilon parameter: If the model is underfitting the data, decreasing the value of epsilon narrows the tube, penalizes more of the errors, and may help to reduce the bias. If the model is overfitting the data (for example, fitting noise), increasing the value of epsilon widens the tube and may help to reduce the variance.
* Gamma parameter: If the model is underfitting the data, increasing the value of gamma makes the kernel narrower and the model more flexible, which may help to reduce the bias. If the model is overfitting the data, decreasing the value of gamma makes the kernel wider and the model smoother, which may help to reduce the variance.
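Rather than adjusting each parameter by hand, the three numeric hyperparameters can be searched jointly with cross-validation. The sketch below assumes a synthetic sine dataset, an RBF kernel, and an arbitrary small grid; the step names `scale` and `reg` are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Noisy sine curve (synthetic stand-in for real data)
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=(150, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.1, size=150)

# Scaling is part of the pipeline so it is refit inside every CV fold
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('reg', SVR(kernel='rbf')),
])

# Small illustrative grid over C, epsilon and gamma
param_grid = {
    'reg__C': [0.1, 1, 10],
    'reg__epsilon': [0.01, 0.1, 0.5],
    'reg__gamma': [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(x, y)

print('best parameters:', search.best_params_)
print('best CV MSE    :', -search.best_score_)
```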

## Q5. Assignment:
- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score,GridSearchCV,RandomizedSearchCV,train_test_split
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
from sklearn.preprocessing import StandardScaler,Normalizer
from sklearn.svm import SVC,LinearSVC,NuSVC

In [5]:
df=pd.read_csv('winequality-red.csv')
df.sample(5)

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 408 | 10.4 | 0.34 | 0.58 | 3.7 | 0.174 | 6.0 | 16.0 | 0.997 | 3.19 | 0.7 | 11.3 | 6 |
| 832 | 10.4 | 0.44 | 0.42 | 1.5 | 0.145 | 34.0 | 48.0 | 0.99832 | 3.38 | 0.86 | 9.9 | 3 |
| 1103 | 7.4 | 0.49 | 0.27 | 2.1 | 0.071 | 14.0 | 25.0 | 0.99388 | 3.35 | 0.63 | 12.0 | 6 |
| 27 | 7.9 | 0.43 | 0.21 | 1.6 | 0.106 | 10.0 | 37.0 | 0.9966 | 3.17 | 0.91 | 9.5 | 5 |
| 1061 | 9.1 | 0.4 | 0.5 | 1.8 | 0.071 | 7.0 | 16.0 | 0.99462 | 3.21 | 0.69 | 12.5 | 8 |


In [6]:
df.shape

(1599, 12)

In [7]:
df.isnull().sum()

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64

In [8]:
df.duplicated().sum()

240

In [9]:
df.drop_duplicates(inplace=True)

In [10]:
x=df.iloc[:,:-1]
y=df.iloc[:,-1]
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
x_train.shape,y_train.shape

((1087, 11), (1087,))

In [11]:
x_train

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1016 | 8.9 | 0.38 | 0.40 | 2.2 | 0.068 | 12.0 | 28.0 | 0.99486 | 3.27 | 0.75 | 12.6 |
| 1519 | 6.6 | 0.70 | 0.08 | 2.6 | 0.106 | 14.0 | 27.0 | 0.99665 | 3.44 | 0.58 | 10.2 |
| 452 | 6.8 | 0.56 | 0.03 | 1.7 | 0.084 | 18.0 | 35.0 | 0.99680 | 3.44 | 0.63 | 10.0 |
| 847 | 7.4 | 0.68 | 0.16 | 1.8 | 0.078 | 12.0 | 39.0 | 0.99770 | 3.50 | 0.70 | 9.9 |
| 58 | 7.8 | 0.59 | 0.18 | 2.3 | 0.076 | 17.0 | 54.0 | 0.99750 | 3.43 | 0.59 | 10.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1285 | 11.3 | 0.37 | 0.50 | 1.8 | 0.090 | 20.0 | 47.0 | 0.99734 | 3.15 | 0.57 | 10.5 |
| 1329 | 7.4 | 0.60 | 0.26 | 2.1 | 0.083 | 17.0 | 91.0 | 0.99616 | 3.29 | 0.56 | 9.8 |
| 1526 | 6.8 | 0.47 | 0.08 | 2.2 | 0.064 | 18.0 | 38.0 | 0.99553 | 3.30 | 0.65 | 9.6 |
| 1011 | 8.9 | 0.32 | 0.31 | 2.0 | 0.088 | 12.0 | 19.0 | 0.99570 | 3.17 | 0.55 | 10.4 |


In [12]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1359 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1359 non-null   float64
 1   volatile acidity      1359 non-null   float64
 2   citric acid           1359 non-null   float64
 3   residual sugar        1359 non-null   float64
 4   chlorides             1359 non-null   float64
 5   free sulfur dioxide   1359 non-null   float64
 6   total sulfur dioxide  1359 non-null   float64
 7   density               1359 non-null   float64
 8   pH                    1359 non-null   float64
 9   sulphates             1359 non-null   float64
 10  alcohol               1359 non-null   float64
 11  quality               1359 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 138.0 KB


In [19]:
list(range(7))

[0, 1, 2, 3, 4, 5, 6]

In [46]:
ct1=ColumnTransformer([
    ('ss',StandardScaler(),list(range(11)))
])

In [47]:
ct2=SVC()

In [48]:
pipe1=Pipeline([
    ('trf1',ct1),
    ('clf',ct2)
])

In [49]:
pipe1

In [50]:
pipe1.fit(x_train,y_train)

In [51]:
y_pred_svc=pipe1.predict(x_test)
print(y_pred_svc)
print('accuracy score of svc is',accuracy_score(y_test,y_pred_svc))

[5 6 7 5 6 7 6 5 6 5 7 6 6 6 6 5 6 5 5 6 5 6 5 5 6 5 6 5 5 6 6 7 6 5 6 6 5
 6 5 6 5 6 6 5 6 5 5 5 5 6 6 5 6 5 6 5 6 5 5 5 5 6 5 6 6 6 5 5 6 5 5 7 6 6
 6 5 5 5 5 5 6 5 5 5 5 6 5 5 6 6 6 6 5 6 6 6 5 7 6 5 5 5 5 5 6 5 6 6 6 7 6
 6 5 6 5 5 5 5 5 6 5 5 5 5 6 5 5 5 5 5 5 5 6 6 5 5 7 6 6 5 5 5 6 6 6 6 6 5
 6 5 5 6 5 5 5 6 6 5 6 7 6 6 5 6 5 6 5 5 6 6 7 6 6 6 7 6 5 5 5 7 5 6 5 6 5
 6 5 5 5 6 5 5 6 6 5 5 7 5 6 6 5 7 5 6 5 6 5 5 5 6 6 6 6 5 6 5 7 6 5 6 5 6
 5 6 5 6 6 6 5 6 6 6 7 5 6 7 6 6 5 5 5 6 5 5 6 5 6 6 6 7 5 5 6 6 6 6 5 5 6
 6 7 6 5 5 6 5 6 7 6 6 6 5]
accuracy score of svc is 0.6470588235294118


In [52]:
confusion_matrix(y_test,y_pred_svc)

array([[ 0,  0,  4,  0,  0,  0],
       [ 0,  0,  6,  5,  0,  0],
       [ 0,  0, 89, 31,  0,  0],
       [ 0,  0, 27, 72,  4,  0],
       [ 0,  0,  2, 14, 15,  0],
       [ 0,  0,  0,  2,  1,  0]], dtype=int64)

In [53]:
print(classification_report(y_test,y_pred_svc))

              precision    recall  f1-score   support

           3       0.00      0.00      0.00         4
           4       0.00      0.00      0.00        11
           5       0.70      0.74      0.72       120
           6       0.58      0.70      0.63       103
           7       0.75      0.48      0.59        31
           8       0.00      0.00      0.00         3

    accuracy                           0.65       272
   macro avg       0.34      0.32      0.32       272
weighted avg       0.61      0.65      0.62       272





In [45]:
ct1=ColumnTransformer([
    ('ss',StandardScaler(),list(range(11)))
])
ct2=LinearSVC(max_iter=2500)
pipe2=Pipeline([
    ('trf1',ct1),
    ('clf',ct2)
])
pipe2.fit(x_train,y_train)
y_pred_linear_svc=pipe2.predict(x_test)
print(y_pred_linear_svc)
print('accuracy score of svc is',accuracy_score(y_test,y_pred_linear_svc))

[5 6 6 5 5 6 6 5 6 6 6 6 6 5 6 5 6 5 5 6 5 6 5 5 6 5 6 5 5 6 6 6 6 5 6 6 5
 6 5 6 5 6 6 5 5 5 5 5 5 6 6 5 6 6 6 5 5 5 5 6 5 6 5 6 5 6 5 5 7 6 5 6 6 6
 6 5 5 5 5 5 6 5 5 5 5 6 5 5 6 6 5 5 5 5 6 5 5 6 6 5 5 5 5 5 6 5 6 6 6 6 6
 6 5 6 5 5 5 5 5 6 6 5 6 5 5 6 5 5 5 5 5 5 5 5 5 5 7 6 6 5 5 5 6 6 6 6 6 5
 5 5 5 5 5 5 5 6 5 5 6 6 6 5 6 6 5 6 6 5 6 6 6 5 6 6 6 6 5 5 5 6 5 5 5 6 5
 6 5 5 5 6 5 5 6 5 5 5 6 5 6 7 5 7 5 6 5 6 5 6 5 6 6 6 6 5 6 5 6 6 5 6 5 6
 5 6 5 6 6 6 5 6 5 5 6 5 6 7 6 6 5 5 5 6 5 5 6 5 5 6 6 6 5 5 6 6 6 6 5 5 6
 6 5 5 5 5 6 5 5 6 5 6 6 5]
accuracy score of svc is 0.6029411764705882




In [41]:
confusion_matrix(y_test,y_pred_linear_svc)

array([[ 0,  0,  4,  0,  0,  0],
       [ 0,  0,  7,  4,  0,  0],
       [ 0,  0, 96, 24,  0,  0],
       [ 0,  0, 35, 66,  2,  0],
       [ 0,  0,  3, 25,  3,  0],
       [ 0,  0,  0,  3,  0,  0]], dtype=int64)

In [54]:
cross_val_score(pipe1,x_train,y_train,scoring='accuracy',cv=5).mean()

0.5933243140405022

In [55]:
cross_val_score(pipe2,x_train,y_train,scoring='accuracy',cv=5).mean()



0.5611465776011499

In [58]:
from sklearn.metrics import make_scorer,precision_score, recall_score, f1_score

In [76]:
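# Weighted averages are used because the quality target has more than two classes
scoring_metrics = {
    'Accuracy': 'accuracy',
    # zero_division=0 avoids warnings for classes that are never predicted
    'Precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'Recall': make_scorer(recall_score, average='weighted'),
    'F1': make_scorer(f1_score, average='weighted'),
    # The plain 'roc_auc' scorer is omitted: it does not support a multiclass target
}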

In [95]:
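# LinearSVC supports only some penalty/loss/dual combinations,
# so the valid ones are listed explicitly instead of taking the full product
c_values = [0.1, 0.4, 1, 10]
param_grid = [
    {'clf__C': c_values, 'clf__penalty': ['l2'], 'clf__loss': ['hinge'], 'clf__dual': [True]},
    {'clf__C': c_values, 'clf__penalty': ['l2'], 'clf__loss': ['squared_hinge'], 'clf__dual': [True, False]},
    {'clf__C': c_values, 'clf__penalty': ['l1'], 'clf__loss': ['squared_hinge'], 'clf__dual': [False]},
]

# 'f1_weighted' is used because the plain 'f1' scorer is defined for binary targets only
gcv = GridSearchCV(
    estimator=pipe2,
    param_grid=param_grid,
    cv=5,
    scoring='f1_weighted',
    verbose=4,
    n_jobs=-1
)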




In [84]:
import warnings
warnings.filterwarnings('ignore')

In [101]:
param_grid = {
    'clf__C': [0.1, 1,  0.4],
    'clf__kernel': ['rbf', 'linear', 'poly'],
    'clf__degree': [2, 3, 4],
    'clf__gamma': ['scale', 'auto', 0.1, 1]
}

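# Set up the GridSearchCV object with the pipeline, hyperparameter grid, and scoring metric
# NOTE: 'f1' is the binary F1 scorer; for the multiclass quality target,
# 'f1_weighted' or 'f1_macro' would be needed to obtain a valid score
gcv = GridSearchCV(
    estimator=pipe1,
    param_grid=param_grid,
    cv=3,
    scoring='f1',
    verbose=2,
    n_jobs=-1
)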

# Fit the GridSearchCV object to the training data
gcv.fit(x_train, y_train)

Fitting 3 folds for each of 108 candidates, totalling 324 fits


In [102]:
gcv.best_params_

{'clf__C': 0.1, 'clf__degree': 2, 'clf__gamma': 'scale', 'clf__kernel': 'rbf'}

In [103]:
gcv.best_score_

nan
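The `nan` score most likely comes from `scoring='f1'`, which is the binary F1 scorer: on a target with six quality classes every candidate fails to score, so the reported `best_params_` is simply the first combination in the grid rather than a real optimum. The sketch below shows one way the remaining assignment steps (tuning with a multiclass scorer, retraining on the entire dataset, saving the model) could look. It uses a synthetic 3-class dataset as a stand-in for the wine data, a reduced grid, and a hypothetical file name `svc_model.joblib` in the system temp directory; all of these are assumptions.

```python
import os
import tempfile

import joblib
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class data (assumption: stand-in for the wine quality dataset)
x, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

pipe = Pipeline([
    ('ss', StandardScaler()),
    ('clf', SVC()),
])

# Reduced illustrative grid
param_grid = {
    'clf__C': [0.1, 1, 10],
    'clf__kernel': ['rbf', 'poly'],
    'clf__gamma': ['scale', 0.1],
}

# 'f1_weighted' averages per-class F1 scores, so it works for multiclass targets
search = GridSearchCV(pipe, param_grid, cv=3, scoring='f1_weighted')
search.fit(x_train, y_train)
print('best parameters:', search.best_params_)
print('best CV weighted F1:', search.best_score_)

# Retrain a fresh copy of the tuned pipeline on the entire dataset
final_model = clone(search.best_estimator_).fit(x, y)

# Save the trained pipeline to a file and load it back
model_path = os.path.join(tempfile.gettempdir(), 'svc_model.joblib')
joblib.dump(final_model, model_path)
restored = joblib.load(model_path)
print('restored model predicts:', restored.predict(x_test[:5]))
```

Saving the whole pipeline rather than only the classifier keeps the fitted scaler together with the model, so new data can be passed to `restored.predict()` without separate preprocessing.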