Q1. What is the relationship between polynomial functions and kernel functions in machine learning
algorithms?

1. Polynomial Kernel:

- The polynomial kernel is a type of kernel function commonly used with support vector machines (SVMs) and other kernelized models.
- It represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables.
- Essentially, it allows learning of non-linear models by transforming the data into a higher-dimensional space.
- The polynomial kernel is particularly useful when the data is not linearly separable.
- It captures non-linear relationships between input data and can handle complex datasets.
- The kernel function is defined as: $K(x, y) = (x \cdot y + c)^d$
- where:

- $x$ and $y$ are input vectors.
- $c$ is a constant term.
- $d$ is the degree of the polynomial.


- Other kernels can be related to the polynomial kernel: for example, the Taylor series expansion of the Gaussian (RBF) kernel is an infinite sum of polynomial terms.
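
To make the "higher-dimensional space" idea concrete, here is a minimal sketch for the special case of 2-D inputs with $d = 2$. The helper names `poly_kernel` and `phi` are illustrative and not part of any library; `phi` is one possible explicit feature map whose inner product reproduces the kernel value.

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def phi(x, c=1.0):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# The kernel computes the 6-D inner product without ever building phi(x).
k_implicit = poly_kernel(x, y)
k_explicit = np.dot(phi(x), phi(y))
print(k_implicit, k_explicit)
```

Both values agree, which is the point of the kernel trick: the model works in the expanded feature space while only evaluating dot products of the original vectors.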


2. Other Major Kernel Functions:

- Gaussian Kernel (Radial Basis Function, RBF): A common default when there is no prior knowledge about the structure of the data.
- Provides a non-linear transformation based on the distance between points.


- Sigmoid Kernel: Applies tanh to a scaled inner product, the same function used as an activation for artificial neurons.
- An SVM with this kernel behaves like a two-layer perceptron.


- Linear Kernel: Used when data is linearly separable.
- Represents the inner product between input vectors.


- Other Custom Kernels: Besides the built-in kernels, you can also create custom kernels based on specific problem requirements.
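
As a rough illustration of how these kernels differ, the sketch below writes each one as a plain NumPy function. The names `linear_k`, `rbf_k`, and `sigmoid_k` and the default values of `gamma` and `coef0` are assumptions chosen for the example, not library defaults.

```python
import numpy as np

def linear_k(x, y):
    """Linear kernel: the plain inner product."""
    return np.dot(x, y)

def rbf_k(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: decays with squared Euclidean distance."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_k(x, y, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh of a scaled inner product."""
    return np.tanh(gamma * np.dot(x, y) + coef0)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Compare each kernel on an identical pair (a, a) and an orthogonal pair (a, b).
for name, k in [("linear", linear_k), ("rbf", rbf_k), ("sigmoid", sigmoid_k)]:
    print(name, k(a, a), k(a, b))
```

Scikit-learn's `SVC` and `SVR` also accept a Python callable as the `kernel` argument, which is one way to plug in a custom kernel.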

Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

In [2]:
from sklearn.datasets import make_regression

In [3]:
x,y=make_regression(n_samples=50,n_targets=1,n_features=2,noise=3.0)

In [4]:
from sklearn.svm import SVR

In [6]:
svr=SVR(kernel='poly')

In [7]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=32,test_size=0.23)

In [8]:
svr.fit(x_train,y_train)
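
The cells above stop after fitting. A minimal end-to-end sketch that also scores the model on the held-out split could look like the following; the `random_state=0` on the dataset and the explicit `degree`, `coef0`, and `C` values are assumptions added for reproducibility and illustration, not settings from the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Same synthetic data shape as above; random_state is an added assumption.
x, y = make_regression(n_samples=50, n_targets=1, n_features=2,
                       noise=3.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=32, test_size=0.23)

# degree is d and coef0 is c in K(x, y) = (gamma * x . y + c)^d.
svr = SVR(kernel='poly', degree=3, coef0=1.0, C=100.0)
svr.fit(x_train, y_train)

# Score the fitted model on the test split.
y_pred = svr.predict(x_test)
print(r2_score(y_test, y_pred))
```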

Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

1. Effect of Increasing Epsilon on Support Vectors:
- When we increase the value of $\epsilon$:
- The $\epsilon$-tube around the regression function widens.
- More data points fall inside the tube, where they incur no loss.
- Points strictly inside the tube are not support vectors; only points on or outside its boundary are.
- Therefore, increasing $\epsilon$ generally results in fewer support vectors and a sparser model.
- The remaining support vectors are the only points that contribute to defining the regression function.
- Decreasing $\epsilon$ has the opposite effect: the tube narrows, more points fall on or outside it, and the number of support vectors grows.
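
One way to check the effect of epsilon empirically is to fit the same data with several values and count the support vectors. The sketch below uses a noisy sine curve as a stand-in dataset; the data, the seed, and the particular epsilon values are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

counts = {}
for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors.
    counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {counts[eps]} support vectors")
```

On data like this, the count should drop as epsilon grows, because points strictly inside the wider tube no longer act as support vectors.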

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works
and provide examples of when you might want to increase or decrease its value?

1. Kernel Function: The choice of kernel function determines how SVR captures relationships between input features.
- Common kernel types include:
- Linear Kernel: Simple dot product between input vectors.
- Polynomial Kernel: Captures non-linear patterns using polynomial functions.
- Radial Basis Function (RBF) Kernel: Suitable for complex, non-linear data.
- Sigmoid Kernel: Applies tanh to a scaled dot product, similar to a neural network activation.
- Example:
- Use a polynomial kernel when the data exhibits non-linear behavior (e.g., stock market predictions).


2. C Parameter:
- The regularization parameter $C$ controls the trade-off between fitting the training data and preventing overfitting.
- Larger $C$ values:
- Tend to minimize training error (less tolerance for errors).
- May lead to overfitting if the model becomes too complex.
- Smaller $C$ values:
- Allow larger errors (more tolerance for errors).
- Result in a simpler model.
- Example:
- Increase $C$ when you want to fit the training data more closely (low tolerance for errors); decrease it when the training data is noisy and the model overfits.
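
The trade-off can be seen by comparing training error across a few values of C. This is a small sketch on an assumed synthetic dataset (a noisy sine curve); the seed, the C values, and the fixed `epsilon=0.05` are illustrative choices.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

train_mse = {}
for C in [0.01, 1.0, 100.0]:
    model = SVR(kernel='rbf', C=C, epsilon=0.05).fit(X, y)
    # Training error only; a held-out set is needed to detect overfitting.
    train_mse[C] = mean_squared_error(y, model.predict(X))
    print(f"C={C}: training MSE = {train_mse[C]:.4f}")
```

A low training error at large C does not by itself mean a better model, so the same comparison would normally be repeated on validation data.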

3. Epsilon Parameter:
- In the epsilon-SVR model, $\epsilon$ specifies the half-width of the epsilon-tube around the predicted values.
- Points within this tube do not incur a penalty in the training loss function.
- Larger $\epsilon$ values:
- Widen the tube, so more points fall inside it and are ignored by the loss.
- Generally lead to fewer support vectors and a flatter, simpler fit.
- Smaller $\epsilon$ values:
- Narrow the tube, penalizing even small deviations from the actual values.
- Generally result in more support vectors.
- Example: Increase $\epsilon$ when the targets are noisy and small deviations are acceptable; decrease it when predictions must track the targets closely.

4. Gamma Parameter:
- Gamma is the kernel coefficient for the RBF, polynomial, and sigmoid kernels; for the RBF kernel it controls how far the influence of a single training example reaches.
- High gamma values:
- Limit each point's influence to its close neighbors.
- Result in a more complex, wiggly fitted function that can overfit.
- Low gamma values:
- Let each point influence a wider region.
- Result in a smoother fitted function that can underfit.
- Example:
- Increase gamma when the model underfits a strongly non-linear pattern; decrease it when the model overfits.
- Remember that parameter tuning depends on the specific problem and dataset. It's essential to experiment and validate using cross-validation to find optimal values for these parameters.
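
A cross-validated search over all four settings might be sketched as follows. The synthetic dataset, the seed, and the grid values are assumptions for illustration; a real grid should be chosen to suit the data at hand.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# Candidate values for the kernel, C, epsilon, and gamma.
param_grid = {
    'kernel': ['rbf', 'poly'],
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1],
    'gamma': ['scale', 0.1, 1],
}

# 5-fold cross-validation, scored by negative mean squared error.
search = GridSearchCV(SVR(), param_grid, cv=5,
                      scoring='neg_mean_squared_error')
search.fit(X, y)
print(search.best_params_)
```

`search.best_estimator_` is refit on the full data with the winning combination and can be used directly for prediction.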

In [31]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

In [32]:
X,Y=make_classification(n_samples=1000,n_features=2,n_classes=2,n_clusters_per_class=2,n_redundant=0)

In [33]:
X

array([[-0.92409277,  1.32412279],
       [ 0.14204365, -0.31038257],
       [-1.43996262, -1.5218374 ],
       ...,
       [ 0.04569757, -0.66855252],
       [-1.10948996, -0.95351233],
       [-1.40423511, -0.56908751]])

In [12]:
Y

array([-1.18156438e+02,  8.91859403e+01, -4.55383342e+01, -2.82185865e+01,
       -2.91049391e+01, -3.83357335e+01, -7.16020395e+01, -4.75596468e+01,
        1.12485168e+02, -7.01525429e+01,  1.12266028e+01,  3.43592077e+01,
       -1.27347044e+02,  7.54492582e+01,  5.34226728e+01,  1.40392332e+02,
       -2.14264617e+01,  4.10839607e+01,  5.16000914e+01,  3.21439641e+02,
        8.54796830e+01, -1.26712438e+01,  6.64666225e+01,  1.18908878e+02,
       -4.99722217e+01,  1.68268435e+02, -1.20602811e+02,  1.21023682e+02,
       -5.74576958e+01,  3.36037755e+01,  6.15797125e+01, -6.94130495e+01,
       -1.06822792e+02, -4.85479879e+00, -1.44748072e+02,  1.37736992e+02,
       -7.32751676e+01,  7.25685720e+01, -1.21864147e+02, -3.11378004e+01,
        7.97095689e+01,  4.88242848e+01,  8.18275591e+01, -7.49750985e+01,
       -2.42951098e+01, -4.60653184e+00,  1.29499162e+02,  7.83149149e+01,
       -5.61653221e+01,  8.57317646e+01,  9.01849958e+01, -2.67352321e+00,
        3.17752425e+01, -

In [34]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.23,random_state=34)

In [35]:
from sklearn.svm import SVC

In [36]:
svc=SVC(kernel='linear')

In [37]:
svc.fit(x_train,y_train)

In [38]:
svc.coef_

array([[0.55774284, 2.38075129]])

In [39]:
y_pred=svc.predict(x_test)

In [40]:
from sklearn.metrics import accuracy_score,f1_score,precision_score,recall_score

In [41]:
print(accuracy_score(y_test,y_pred))
print(f1_score(y_test,y_pred))
print(precision_score(y_test,y_pred))
print(recall_score(y_test,y_pred))

0.9260869565217391
0.923076923076923
0.9532710280373832
0.8947368421052632


In [42]:
from sklearn.model_selection import GridSearchCV

In [44]:
param_grid={
    'C':[0.1,1,10,100,1000],
    'gamma':[1,0.1,0.01,0.001,0.0001]
}

In [45]:
grid=GridSearchCV(SVC(),param_grid=param_grid,cv=5,verbose=3)

In [46]:
grid.fit(x_train,y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ....................C=0.1, gamma=1;, score=0.942 total time=   0.0s
[CV 2/5] END ....................C=0.1, gamma=1;, score=0.922 total time=   0.0s
[CV 3/5] END ....................C=0.1, gamma=1;, score=0.942 total time=   0.0s
[CV 4/5] END ....................C=0.1, gamma=1;, score=0.922 total time=   0.0s
[CV 5/5] END ....................C=0.1, gamma=1;, score=0.916 total time=   0.0s
[CV 1/5] END ..................C=0.1, gamma=0.1;, score=0.922 total time=   0.0s
[CV 2/5] END ..................C=0.1, gamma=0.1;, score=0.916 total time=   0.0s
[CV 3/5] END ..................C=0.1, gamma=0.1;, score=0.942 total time=   0.0s
[CV 4/5] END ..................C=0.1, gamma=0.1;, score=0.916 total time=   0.0s
[CV 5/5] END ..................C=0.1, gamma=0.1;, score=0.909 total time=   0.0s
[CV 1/5] END .................C=0.1, gamma=0.01;, score=0.909 total time=   0.0s
[CV 2/5] END .................C=0.1, gamma=0.01

In [47]:
y_pred=grid.predict(x_test)

In [None]:
print(accuracy_score(y_test,y_pred))