### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Polynomial functions and kernel functions are closely related in machine learning, particularly in algorithms such as Support Vector Machines (SVMs), where kernel methods let a linear model learn non-linear relationships in the data.

### Polynomial Functions:
- Polynomial functions are mathematical functions of the form \( f(x) = a_nx^n + a_{n-1}x^{n-1} + \dots + a_1x + a_0 \), where \( x \) is the variable, \( a_0, \dots, a_n \) are coefficients, and \( n \) is the degree.
- In the context of machine learning, polynomial functions are used for feature transformation. For instance, in polynomial regression, a polynomial function is used to model the relationship between the features and the target variable by introducing polynomial terms (e.g., \( x^2 \), \( x^3 \)) of the original features.

### Kernel Functions:
- Kernel functions compute the inner product of two data points in a (typically higher-dimensional) feature space, which serves as a similarity measure, without explicitly transforming the data into that space.
- Polynomial kernel functions are one type of kernel function used in SVMs and other algorithms. They compute the similarity (or inner product) between two data points as if they were mapped into a higher-dimensional space using polynomial transformations.

### Relationship:
- Polynomial functions are used to perform explicit transformations on features to a higher-dimensional space.
- Polynomial kernel functions, on the other hand, implicitly compute the similarity between data points as if they were transformed into a higher-dimensional space using polynomial transformations, without actually performing the transformation explicitly.

### Polynomial Kernel:
- The polynomial kernel function is \( K(\mathbf{x}_i, \mathbf{x}_j) = (\gamma \mathbf{x}_i \cdot \mathbf{x}_j + r)^d \), where \( \mathbf{x}_i \) and \( \mathbf{x}_j \) are data points, \( \gamma \) is a scaling coefficient, \( r \) is a constant term (\( r = 0 \) gives the homogeneous kernel), and \( d \) is the degree of the polynomial. In scikit-learn these correspond to the `gamma`, `coef0`, and `degree` parameters.
- It computes the inner product of the transformed feature vectors without explicitly performing the transformation, which lets SVMs learn non-linear decision boundaries while avoiding the cost of building the polynomial features, whose number grows rapidly with the input dimension and the degree.
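To make the "implicit" part concrete, here is a minimal check for \( d = 2 \), \( \gamma = 1 \), \( r = 1 \) and two-dimensional inputs. The hand-written feature map `phi` is an illustrative construction for this specific case, not a library function. Its six-dimensional inner product should equal the kernel value computed directly in the original two-dimensional space, and also scikit-learn's `polynomial_kernel`, which uses the same \( (\gamma \langle \mathbf{x}, \mathbf{y} \rangle + \text{coef0})^d \) form.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(v):
    """Illustrative explicit degree-2 feature map for a 2-D vector, built so that
    phi(x) . phi(z) == (x . z + 1) ** 2  (gamma = 1, r = 1, d = 2)."""
    x1, x2 = v
    s = np.sqrt(2.0)
    return np.array([x1**2, x2**2, s * x1 * x2, s * x1, s * x2, 1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)        # inner product after mapping to 6-D
implicit = (x @ z + 1.0) ** 2     # kernel evaluated in the original 2-D space
library = polynomial_kernel(
    x.reshape(1, -1), z.reshape(1, -1), degree=2, gamma=1.0, coef0=1.0
)[0, 0]

print(explicit, implicit, library)
```

For these points \( \mathbf{x} \cdot \mathbf{z} = 1 \), so all three values equal \( (1 + 1)^2 = 4 \) up to floating-point rounding. The kernel route never builds the 6-D vectors.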

### Summary:
- Polynomial functions are used explicitly to transform features to a higher-dimensional space.
- Polynomial kernel functions in algorithms like SVMs perform similar transformations implicitly by computing similarities between data points as if they were transformed into a higher-dimensional space using polynomial functions.

### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generating a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2, random_state=42)

# Splitting the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating an SVM model with a polynomial kernel
# (scikit-learn defaults: gamma='scale', coef0=0.0; adjust degree as needed)
svm_poly = SVC(kernel='poly', degree=3)

# Training the SVM model
svm_poly.fit(X_train, y_train)

# Making predictions on the test set
y_pred = svm_poly.predict(X_test)

# Calculating the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")


Accuracy: 0.895
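The cell above feeds raw features to the polynomial kernel. Because the kernel is built on dot products, feature scale affects it directly, so a common variant standardizes the inputs first. The sketch below is such a variant under stated assumptions, not a tuned result: it wraps `StandardScaler` and `SVC` in a pipeline and sets `coef0=1.0` (the constant \( r \) in the kernel formula; scikit-learn's default is `0.0`). Whether it beats 0.895 on this data is not claimed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The scaler is fit on the training split only; coef0=1.0 adds the constant term r
svm_poly_scaled = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, gamma="scale", coef0=1.0, C=1.0),
)
svm_poly_scaled.fit(X_train, y_train)

accuracy_scaled = accuracy_score(y_test, svm_poly_scaled.predict(X_test))
print(f"Accuracy (scaled, coef0=1): {accuracy_scaled:.3f}")
```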


### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), epsilon (\(\varepsilon\)) is a hyperparameter that sets the width of the tolerance band around the regression function (the epsilon-tube, \( f(\mathbf{x}) \pm \varepsilon \)): training errors smaller than \(\varepsilon\) incur no loss. It therefore controls the trade-off between the model's complexity and how precisely it fits the training targets.

### Relationship between Epsilon and Support Vectors:

1. **Larger Epsilon:** When the value of epsilon is increased, the margin of tolerance around the predicted values becomes wider. This means the model allows for larger deviations between the predicted and actual target values.

2. **Impact on Support Vectors:** Increasing epsilon typically results in fewer support vectors. In SVR, only points lying on or outside the epsilon-tube become support vectors; points strictly inside the tube incur no loss and have zero dual coefficients. A wider tube leaves more points inside, so fewer support vectors remain.

3. **Smoother Predictions:** With a larger epsilon, the model tends to have smoother predictions, as it tolerates larger errors and is less sensitive to individual data points. This can lead to a simpler model with fewer support vectors.

### Summary:
- **Larger epsilon:** 
  - Wider margin of tolerance.
  - Fewer support vectors as the model tolerates larger deviations.
  - Smoother predictions, potentially resulting in a simpler model.

It's essential to carefully select the value of epsilon based on the problem's requirements, balancing the need for model flexibility with the goal of avoiding overfitting or underfitting. Cross-validation or grid search can help identify an appropriate epsilon value that optimizes model performance.
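The effect on support vectors can be observed directly, because scikit-learn's `SVR` exposes the indices of its support vectors in `support_`. The sketch below fits an RBF `SVR` with fixed `C` on hypothetical noisy sine data (noise standard deviation 0.2) for several \(\varepsilon\) values and counts the support vectors. Exact counts depend on the data and on `C`, but the count should shrink as the tube widens past the noise level.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical noisy 1-D regression data: y = sin(x) + Gaussian noise (std 0.2)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

n_support = {}
for eps in [0.01, 0.1, 0.3, 0.5, 1.0]:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the support vectors (points on or outside the tube)
    n_support[eps] = len(model.support_)
    print(f"epsilon={eps:<4}  support vectors: {n_support[eps]}")
```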

### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

Each parameter in Support Vector Regression (SVR) plays a crucial role in determining the model's performance and its ability to generalize to unseen data. The impact and function of each parameter are as follows:

### 1. Kernel Function:
- **Function:** Determines the type of mapping used to transform the input features into a higher-dimensional space.
- **Effect:** Affects the model's ability to capture non-linear patterns in the data.
- **Examples:**
  - Use a `linear` kernel for linear relationships.
  - Choose `poly` for polynomial relationships.
  - Opt for `rbf` (Radial Basis Function) for complex, non-linear patterns.

### 2. C Parameter:
- **Function:** Controls the trade-off between the flatness of the regression function (a small weight norm) and the penalty on training points that fall outside the epsilon-tube.
- **Effect:** Regulates the penalty for errors, influencing the model's flexibility.
- **Examples:**
  - Increase `C` to penalize points outside the \(\varepsilon\)-tube more heavily, so the model fits the training data more closely (at the risk of overfitting).
  - Decrease `C` to favor a flatter function that tolerates more violations, promoting a smoother model.

### 3. Epsilon Parameter (\(\varepsilon\)):
- **Function:** Specifies the margin of tolerance around the predicted values.
- **Effect:** Governs the width of the epsilon-tube around the regression line.
- **Examples:**
  - Increase \(\varepsilon\) to allow larger deviations between predicted and actual values, resulting in fewer support vectors.
  - Decrease \(\varepsilon\) for a narrower margin, potentially leading to more support vectors and a tighter fit.

### 4. Gamma Parameter:
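- **Function:** For the `rbf`, `poly`, and `sigmoid` kernels, sets how far the influence of a single training example reaches, which shapes the curvature of the fitted regression function. For the RBF kernel, \( K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2) \).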
- **Effect:** Controls the reach of the kernel function and the impact of individual data points.
- **Examples:**
  - Increase `gamma` to consider only nearby points for modeling, creating a more complex and sensitive model.
  - Decrease `gamma` to have a broader influence, potentially leading to smoother predictions.
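The reach of `gamma` can be seen by evaluating the RBF kernel for one fixed pair of points at several `gamma` values, using scikit-learn's `rbf_kernel`. The two points are arbitrary illustrations.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])  # squared distance ||a - b||^2 = 2

ks = []
for gamma in [0.01, 0.1, 1.0, 10.0]:
    k = rbf_kernel(a, b, gamma=gamma)[0, 0]  # exp(-gamma * ||a - b||^2)
    ks.append(k)
    print(f"gamma={gamma:<5} K(a, b)={k:.4f}")
```

As `gamma` grows, the similarity between the same two points falls from close to 1 toward 0, so each training point influences only its immediate neighborhood.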

### Parameter Choices:
- **Balancing Act:** These parameters are interconnected and involve a trade-off between model complexity and generalization.
- **Tuning:** Parameter values are typically optimized using techniques like grid search or randomized search, considering the dataset and problem requirements.
- **Overfitting vs. Underfitting:** Be mindful of overfitting (high model complexity) or underfitting (oversimplified model) while adjusting these parameters.
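A minimal grid-search sketch for SVR, assuming hypothetical two-feature data and a `StandardScaler` in front of the model (scaling matters for distance- and dot-product-based kernels). The grid values are illustrative starting points, not recommendations; `gamma` is ignored by the `linear` kernel.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical regression data for illustration
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

pipe = Pipeline([("scaler", StandardScaler()), ("svr", SVR())])

# The "svr__" prefix routes each value to the SVR step of the pipeline
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": ["scale", 0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
print(f"Best CV MSE: {-search.best_score_:.4f}")
```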

### Example Scenarios:
- **High Bias (Underfitting):** Increase `C` and decrease \(\varepsilon\) to reduce tolerance for errors.
- **High Variance (Overfitting):** Decrease `gamma` (and, if needed, `C`) to obtain a smoother regression function.
- **Non-Linearity:** Choose a suitable kernel function (`poly` or `rbf`) to capture complex relationships.

In practice, finding the right combination of these parameters involves experimentation, domain knowledge, and understanding the dataset's characteristics to achieve optimal SVR performance. Tuning these parameters is crucial for obtaining a well-performing and well-generalizing model.

### Q5. Assignment:
- Import the necessary libraries and load the dataset.
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization).
- Create an instance of the SVC classifier and train it on the training data.
- Use the trained classifier to predict the labels of the testing data.
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.

In [118]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
from sklearn.model_selection import train_test_split,GridSearchCV
import pickle
from warnings import filterwarnings
filterwarnings(action="ignore")

In [119]:
dataset = load_iris()

In [120]:
print(dataset.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [121]:
df = pd.DataFrame(data=dataset.data,columns=dataset.feature_names)
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [122]:
# from sklearn.preprocessing import StandardScaler

In [123]:
# scaler = StandardScaler()
# X_scaled = scaler.fit_transform(dataset.data)
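The commented-out cells above would fit `StandardScaler` on `dataset.data` before the train/test split, letting test-set statistics leak into preprocessing, and `X_scaled` is not used by the later cells. A minimal leakage-free sketch wraps the scaler and `SVC` in a pipeline so the scaler is fit on the training split only. It uses all four iris features and new variable names (`X_tr`, `X_te`, `scaled_svc`, chosen here for illustration) so it does not overwrite the notebook's variables.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)

# fit() learns the scaler's mean and std from X_tr only;
# predict() reuses those same statistics to transform X_te
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(X_tr, y_tr)

acc_scaled = accuracy_score(y_te, scaled_svc.predict(X_te))
print(acc_scaled)
```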

In [124]:
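# df holds only the four feature columns (the target is in dataset.target),
# so iloc[:, :-1] drops petal width and keeps three features; X = df would keep all four
X = df.iloc[:,:-1]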
y = dataset.target

In [125]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.25, random_state=42)

In [126]:
classifier = SVC()

In [127]:
classifier.fit(X_train,y_train)

SVC()

In [128]:
y_pred = classifier.predict(X_test)

In [129]:
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))

[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

1.0


In [130]:
params = {
    'C': [0.1, 1, 10, 100],  # Values for the C parameter
    'gamma': [0.001, 0.01, 0.1, 1],  # Values for the gamma parameter
    'kernel': ['linear', 'rbf', 'poly']  # Kernel functions to try
}

In [131]:
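# note: neg_mean_squared_error is a regression metric; for a classifier,
# scoring="accuracy" (or "f1_macro") is the usual choice
class_cv = GridSearchCV(classifier,param_grid=params,cv=5,scoring="neg_mean_squared_error")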

In [132]:
class_cv.fit(X_train,y_train)

GridSearchCV(cv=5, estimator=SVC(),
             param_grid={'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1],
                         'kernel': ['linear', 'rbf', 'poly']},
             scoring='neg_mean_squared_error')

In [133]:
class_cv.best_params_

{'C': 1, 'gamma': 0.1, 'kernel': 'poly'}

In [134]:
y_pred = class_cv.predict(X_test)

In [135]:
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))

[[15  0  0]
 [ 0 11  0]
 [ 0  3  9]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.79      1.00      0.88        11
           2       1.00      0.75      0.86        12

    accuracy                           0.92        38
   macro avg       0.93      0.92      0.91        38
weighted avg       0.94      0.92      0.92        38

0.9210526315789473
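The grid search above scores candidates with `neg_mean_squared_error`, a regression metric applied here to class labels, which may be one reason the "tuned" model does worse on the test set than the default `SVC`. The assignment also allows `RandomizedSearchCV`; the sketch below uses it with `scoring="accuracy"` and log-uniform distributions (`scipy.stats.loguniform`) for `C` and `gamma`. The ranges, the kernel list, and `n_iter` are illustrative assumptions, and no particular result is implied.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)

# Sample C and gamma on a log scale instead of from a fixed grid
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-3, 1e0),
    "kernel": ["linear", "rbf"],
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_distributions,
    n_iter=30,
    cv=5,
    scoring="accuracy",  # a classification metric
    random_state=42,
)
search.fit(X_tr, y_tr)
print(search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}, test accuracy: {search.score(X_te, y_te):.3f}")
```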


In [136]:
# note: this pickles `classifier`, the untuned SVC; the tuned model is class_cv.best_estimator_
pickle.dump(classifier,open("SVC.pkl","wb"))

In [137]:
svc_algo = pickle.load(open("SVC.pkl","rb"))

In [139]:
svc_algo.predict([[5.1,3.5,1.4]])  # three values: the model was trained on sepal length, sepal width, petal length

array([0])
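The assignment's last two steps (retrain the tuned classifier on the entire dataset, then save it) are not covered by the cells above, which pickle the untuned `classifier`. A sketch of those steps follows, assuming the `best_params_` printed above are reused purely for illustration and using a hypothetical file name, `SVC_tuned.pkl`. It trains on all four features, so each sample passed to `predict` needs four values.

```python
import pickle
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X_all, y_all = iris.data, iris.target  # all four features, all 150 samples

# Hypothetical reuse of the notebook's best_params_ output, for illustration only
best_params = {"C": 1, "gamma": 0.1, "kernel": "poly"}

final_model = SVC(**best_params)
final_model.fit(X_all, y_all)  # retrain on the entire dataset

# Context managers close the files even if pickling fails
with open("SVC_tuned.pkl", "wb") as f:
    pickle.dump(final_model, f)

with open("SVC_tuned.pkl", "rb") as f:
    loaded_model = pickle.load(f)

sample = [[5.1, 3.5, 1.4, 0.2]]  # first iris sample, all four features
print(loaded_model.predict(sample))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.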