### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

### Q5. Assignment:
- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

## Answer

### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?



#### Polynomial Functions:

Polynomial functions are mathematical functions built from variables raised to non-negative integer powers.
In the context of machine learning, polynomial features are additional features created by taking products of the input features up to a chosen degree.
For example, given input features x and y, adding polynomial features of degree 2 would include x^2, y^2, and xy as new features.

#### Kernel Functions:

- Kernel functions in machine learning are used to compute the similarity or inner product between pairs of data points in a transformed feature space without explicitly calculating the transformation.
- Common kernel functions include the polynomial kernel and the radial basis function (RBF) kernel.
- The polynomial kernel calculates the similarity between data points as if they had been mapped into a higher-dimensional space using polynomial feature transformations.

The key relationship between these concepts is that certain kernel functions, such as the polynomial kernel, implicitly apply polynomial transformations to the data points. This means that instead of explicitly computing the polynomial features, which can be computationally expensive, the kernel function computes the inner product in the higher-dimensional space without needing to expand the features explicitly.
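The equivalence described above can be checked numerically. The following is a minimal sketch, assuming a degree-2 polynomial kernel of the form (x · z + 1)^2 on 2-D inputs; the helper names `poly_kernel` and `explicit_degree2_map` are hypothetical and exist only for this illustration.

```python
import numpy as np


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree, computed in the input space."""
    return (np.dot(x, z) + coef0) ** degree


def explicit_degree2_map(x):
    """Explicit feature map for a 2-D input whose inner product equals
    the degree-2 polynomial kernel with coef0 = 1."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Kernel trick: one dot product in 2-D, then a power.
k_implicit = poly_kernel(x, z)

# Explicit route: expand both points to 6-D, then take the dot product.
k_explicit = np.dot(explicit_degree2_map(x), explicit_degree2_map(z))

print(k_implicit, k_explicit)  # the two values agree up to floating-point error
```

The explicit map already needs 6 dimensions for 2 inputs at degree 2, and it grows quickly with more features or a higher degree, which is why the kernel form is usually cheaper.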

### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?



When creating the SVM classifier, set the `kernel` parameter to `'poly'`:

    from sklearn.svm import SVC

    clf = SVC(kernel='poly')
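A slightly fuller sketch of the same idea is shown below. It assumes the iris data loaded through `sklearn.datasets.load_iris` and adds a `StandardScaler` step. The `degree`, `coef0`, and `C` values are illustrative choices, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# degree, coef0 and C are illustrative values (assumptions), not tuned settings.
clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1.0, C=1.0),
)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Scaling is included because polynomial kernels are sensitive to feature magnitudes; whether it helps on a given dataset is worth checking with cross-validation.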

### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?



In Support Vector Regression (SVR), epsilon (ε) is a hyperparameter that defines the width of the epsilon-insensitive tube around the predicted values. The epsilon-insensitive tube allows for a certain amount of error or deviation between the predicted and actual target values, and any data points falling within this tube do not contribute to the loss function during training. Epsilon plays a significant role in determining the number of support vectors in SVR.

#### Smaller Epsilon (ε):

- A smaller epsilon value tightens the epsilon-insensitive tube, reducing the allowable error or deviation between predicted and actual values.
- As epsilon decreases, the SVR model becomes more sensitive to small errors and strives for a more precise fit to the training data.
- This often leads to a larger number of support vectors, because more training points end up on or outside the narrower tube, and those points are the support vectors.

#### Larger Epsilon (ε):

- A larger epsilon value widens the epsilon-insensitive tube, allowing for a greater error margin between predicted and actual values.
- As epsilon increases, the SVR model becomes more tolerant of errors and focuses on capturing the general trend in the data rather than fitting the training data exactly.
- This typically results in a smaller number of support vectors, because more training points fall inside the wider tube, and only points on or outside the tube become support vectors.
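The trend can be observed on a small synthetic example. The sketch below assumes a noisy sine curve as data and three arbitrary epsilon values. It counts support vectors through the fitted model's `support_` attribute.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data (an assumption for illustration): noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors.
    sv_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X)}")
```

On this data the count drops as epsilon grows. The exact numbers depend on the noise level, `C`, and the kernel.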

### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?



The performance of Support Vector Regression (SVR) is significantly affected by various hyperparameters, including the choice of kernel function, C parameter, epsilon parameter (ε), and gamma parameter (γ). Let's explore each parameter, how it works, and when you might want to adjust its value:

1. **Kernel Function**:
   - **Purpose**: The kernel function determines the mapping of data points into a higher-dimensional space, which is crucial for capturing non-linear relationships.
   - **Common Choices**: Linear, Polynomial, Radial Basis Function (RBF/Gaussian), Sigmoid, etc.
   - **When to Choose**:
     - Use a linear kernel (`kernel='linear'`) when the data exhibits a linear relationship.
     - Use a polynomial kernel (`kernel='poly'`) when the data is moderately non-linear and you want to control the degree of the polynomial transformation with the `degree` parameter.
     - Use an RBF kernel (`kernel='rbf'`) when the data is highly non-linear or when you don't know the degree of non-linearity. The RBF kernel is a versatile choice.
     - Experiment with different kernels to find the one that best fits your data.

2. **C Parameter**:
   - **Purpose**: The C parameter controls the trade-off between a larger margin (a flatter, simpler function) and a smaller training error, where only errors larger than ε are penalized. It balances the model's bias-variance trade-off.
   - **Effect of Increasing**:
     - Increasing C penalizes training errors more heavily, so the model is less tolerant of deviations outside the tube and fits the training data more closely.
     - The model may overfit if C is too large.
   - **Effect of Decreasing**:
     - Decreasing C encourages a larger margin but may allow more training errors.
     - The model may become underfit if C is too small.
   - **When to Adjust**:
     - Increase C when you want the model to focus on accurately fitting the training data, especially when you believe the data contains minimal noise.
     - Decrease C when you want the model to prioritize a larger margin and generalization, even if it means accepting some training errors.

3. **Epsilon Parameter (ε)**:
   - **Purpose**: The epsilon parameter defines the width of the epsilon-insensitive tube around the predicted values. Data points inside this tube do not contribute to the loss during training.
   - **Effect of Increasing**:
     - Increasing ε makes the model more tolerant of deviations between predicted and actual values.
     - It may result in a smaller number of support vectors.
   - **Effect of Decreasing**:
     - Decreasing ε tightens the tube, requiring the model to fit the data more precisely.
     - It may result in a larger number of support vectors.
   - **When to Adjust**:
     - Increase ε when you want the model to be less sensitive to noise or small fluctuations in the target values.
     - Decrease ε when you need the model to fit the training data more closely or when you believe the target values are very accurate.

4. **Gamma Parameter (γ)**:
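   - **Purpose**: The gamma parameter controls the shape of the radial basis function (RBF) kernel. It determines how far the influence of a single training example reaches. In scikit-learn, `gamma` is also used as the kernel coefficient for the polynomial and sigmoid kernels.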
   - **Effect of Increasing**:
     - Increasing γ makes the RBF kernel more peaked and localizes the influence of training examples.
     - It can lead to a more complex model that fits the training data more closely.
   - **Effect of Decreasing**:
     - Decreasing γ makes the RBF kernel more spread out and influences a larger region around each training example.
     - It can lead to a simpler, smoother model, which may underfit if γ is too small.
   - **When to Adjust**:
     - Increase γ when you suspect that the target function is highly non-linear or when you have a small dataset. Be cautious about overfitting.
     - Decrease γ when you have a large dataset or when you want the model to focus on capturing global trends rather than local variations.
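In practice these parameters interact, so they are usually tuned together rather than one at a time. The following is a minimal sketch, assuming a synthetic noisy sine dataset and a small, arbitrary grid; the grid values are placeholders, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (an assumption for illustration): noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=150)

# make_pipeline names the SVR step "svr", hence the "svr__" prefixes below.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": [0.01, 0.1, 1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated MSE: {-search.best_score_:.4f}")
```

Adding `"svr__kernel": ["linear", "poly", "rbf"]` to the grid would compare kernels in the same search, at the cost of more fits.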



### Q5. Assignment:
- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
data=sns.load_dataset('iris')
data.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [4]:
from sklearn.preprocessing import LabelEncoder
encoder=LabelEncoder()
data['species']=encoder.fit_transform(data['species'])
data.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [5]:
X=data.iloc[:,:-1]
y=data.iloc[:,-1]

In [6]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=42)
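The assignment also asks for a preprocessing step such as scaling, which the cells that follow do not apply. One way it could be slotted in after the split is sketched below, assuming `StandardScaler` and the iris data loaded from `sklearn.datasets` so that the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

scaler = StandardScaler()
# Learn the per-feature mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test data to avoid information leakage.
X_test_scaled = scaler.transform(X_test)

print(np.round(X_train_scaled.mean(axis=0), 6))
print(np.round(X_train_scaled.std(axis=0), 6))
```

Fitting the scaler on the training split alone keeps the test set untouched until evaluation.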

In [7]:
from sklearn.svm import SVC
svc=SVC()

In [8]:
svc.fit(X_train,y_train)

In [9]:
y_pred=svc.predict(X_test)

In [10]:
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
score=accuracy_score(y_test,y_pred)
score

1.0

In [11]:
confusion_matrix(y_test,y_pred)

array([[15,  0,  0],
       [ 0, 11,  0],
       [ 0,  0, 12]])

In [12]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

In [13]:
from sklearn.model_selection import GridSearchCV

In [14]:
param_grid={'C':[0.1,1,10,100,1000],
            'gamma':[1,0.1,0.01,0.001,0.0001]}

In [15]:
grid=GridSearchCV(SVC(),param_grid=param_grid,refit=True,cv=5,verbose=3)

In [16]:
grid.fit(X_train,y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ....................C=0.1, gamma=1;, score=1.000 total time=   0.0s
[CV 2/5] END ....................C=0.1, gamma=1;, score=0.957 total time=   0.0s
[CV 3/5] END ....................C=0.1, gamma=1;, score=0.818 total time=   0.0s
[CV 4/5] END ....................C=0.1, gamma=1;, score=1.000 total time=   0.0s
[CV 5/5] END ....................C=0.1, gamma=1;, score=0.955 total time=   0.0s
[CV 1/5] END ..................C=0.1, gamma=0.1;, score=1.000 total time=   0.0s
[CV 2/5] END ..................C=0.1, gamma=0.1;, score=0.870 total time=   0.0s
[CV 3/5] END ..................C=0.1, gamma=0.1;, score=0.864 total time=   0.0s
[CV 4/5] END ..................C=0.1, gamma=0.1;, score=0.955 total time=   0.0s
[CV 5/5] END ..................C=0.1, gamma=0.1;, score=0.864 total time=   0.0s
[CV 1/5] END .................C=0.1, gamma=0.01;, score=0.348 total time=   0.0s
[CV 2/5] END .................C=0.1, gamma=0.01

In [17]:
y_pred=grid.predict(X_test)
accuracy_score(y_test,y_pred)

1.0

In [18]:
confusion_matrix(y_test,y_pred)

array([[15,  0,  0],
       [ 0, 11,  0],
       [ 0,  0, 12]])

In [19]:
grid.best_params_

{'C': 100, 'gamma': 0.01}
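The assignment step "train the tuned classifier on the entire dataset" could be sketched as follows. The snippet assumes the best parameters reported above and reloads iris from `sklearn.datasets` so that it is self-contained. The name `final_model` is a hypothetical choice.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Values copied from the grid search result reported above.
best_params = {"C": 100, "gamma": 0.01}

final_model = SVC(**best_params)
final_model.fit(X, y)  # all 150 samples, no held-out split

# This is a training score, not an estimate of generalization performance.
train_accuracy = final_model.score(X, y)
print(f"training accuracy: {train_accuracy:.3f}")
```

Note that `GridSearchCV(..., refit=True)` refits only on the data passed to `fit`, which here was the training split, so a separate fit is needed to use every sample.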

In [21]:
import pickle
model_filename = 'grid.pkl'

# Serialize (pickle) the model to a file
with open(model_filename, 'wb') as model_file:
    pickle.dump(grid, model_file)
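To use the saved model later, it can be read back with `pickle.load`. The sketch below is a self-contained round trip that assumes a small `SVC` fitted on iris and writes to a temporary directory, so it does not touch `grid.pkl`.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
model = SVC(C=100, gamma=0.01).fit(X, y)

# A temporary directory keeps the sketch from overwriting any existing file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Serialize the fitted model.
    with open(path, "wb") as model_file:
        pickle.dump(model, model_file)

    # Deserialize it again, as a later session would.
    with open(path, "rb") as model_file:
        restored = pickle.load(model_file)

same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.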
