# Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Polynomial functions and kernel functions are both used in machine learning algorithms, particularly in the context of support vector machines (SVMs) and kernel methods. They serve different purposes but are related in how they can transform data for better learning and decision-making.

1. **Polynomial Functions**:
   - A polynomial function is a sum of terms, each a constant multiplied by the input variables raised to non-negative integer powers (for example, 3x^2 + 2x + 1).
   - In the context of machine learning, polynomial features are used to transform the input data into a higher-dimensional space.
   - For example, if you have a 1-dimensional input (x), you can create polynomial features of higher degrees (x^2, x^3, etc.), which can help in fitting more complex relationships between the input and output.

2. **Kernel Functions**:
   - A kernel function takes two inputs in the original space and returns the dot product of their images in some (possibly infinite-dimensional) feature space.
   - In machine learning, kernel functions are used in algorithms like Support Vector Machines (SVMs) to effectively handle non-linearly separable data.
   - The kernel trick allows the algorithm to implicitly work in a higher-dimensional feature space without explicitly calculating the transformed features. This can save computational resources.
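
The explicit polynomial feature expansion described in item 1 can be sketched with scikit-learn's `PolynomialFeatures`. The two input values and the choice of degree 3 are assumptions made only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples of a single feature x (illustrative values)
x = np.array([[2.0], [3.0]])

# Expand x into the columns [x, x^2, x^3];
# include_bias=False drops the constant column of ones
poly = PolynomialFeatures(degree=3, include_bias=False)
x_poly = poly.fit_transform(x)

print(x_poly)
```

Each row now holds the original value followed by its square and cube, which is the higher-dimensional representation a linear model can then fit.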

**Relationship**:

The relationship between polynomial functions and kernel functions lies in the fact that some kernel functions effectively perform a polynomial transformation in a higher-dimensional space without explicitly calculating the transformed features.

For instance, the polynomial kernel computes the dot product of two polynomially transformed vectors directly from the original vectors. In scikit-learn's parameterization it is K(x, z) = (gamma * <x, z> + coef0)^degree. This allows SVMs to implicitly work with polynomial features, making it possible to separate data that is not linearly separable in the original feature space.
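
To make the equivalence concrete, here is a small numerical check for a degree-2 polynomial kernel on 2-D inputs. The helper names `phi` and `poly_kernel` and the sample vectors are hypothetical and chosen only for this sketch; gamma is assumed to be 1 and the constant term 1.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = v
    s = np.sqrt(2.0)
    return np.array([x1**2, x2**2, s * x1 * x2, s * x1, s * x2, 1.0])

def poly_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel (a . b + coef0)^degree, computed in the original space."""
    return (np.dot(a, b) + coef0) ** degree

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))  # dot product in the 6-D feature space
implicit = poly_kernel(x, z)       # same value without building phi

print(explicit, implicit)
```

Both quantities agree, but the kernel needs only one dot product in the original 2-D space, whereas the explicit route first builds 6-D vectors.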

In summary, while polynomial functions are a specific type of feature transformation that can be used, kernel functions offer a more general way to achieve similar results without explicitly calculating the transformed features, which can be computationally expensive. Different types of kernels (e.g., Gaussian Radial Basis Function kernel, sigmoid kernel) can perform different types of transformations. The choice of kernel depends on the specific problem and the nature of the data.

# Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [1]:
from sklearn.datasets import load_iris
data = load_iris()

from sklearn.model_selection import train_test_split
import pandas as pd

In [2]:
df = pd.DataFrame(data.data , columns=data.feature_names)
df['target'] = data.target

X = df.drop('target' , axis=1)
y = df.target

X_train, X_test, y_train, y_test = train_test_split(X,y,
                                                   test_size=0.20,
                                                   random_state=42)
from sklearn.svm import SVC
model = SVC(kernel='poly')

model.fit(X_train , y_train)

In [3]:
y_pred = model.predict(X_test)

from sklearn.metrics import accuracy_score

# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(y_test , y_pred))

1.0
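
The cell above relies on the default polynomial settings of `SVC`. As a hedged variation, the sketch below sets the `degree` and `coef0` parameters explicitly, scales the features inside a pipeline, and uses 5-fold cross-validation instead of a single split. The parameter values are assumptions for illustration, not tuned choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaling happens inside the pipeline, so each CV fold is scaled
# using statistics from its own training portion only
poly_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1.0, C=1.0),
)

# One accuracy value per fold
scores = cross_val_score(poly_svm, X, y, cv=5)
print(scores.mean())
```

Averaging over folds gives a more stable estimate than a single 30-sample test set, on which a perfect score is easy to reach by chance.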


# Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), the parameter epsilon (ε) is a crucial hyperparameter that controls the margin of tolerance around the predicted value. It determines the width of the epsilon-tube within which no penalty is associated in the training loss function. This tube contains the training data points that are considered to have zero error.

Increasing the value of epsilon has an impact on the number of support vectors in SVR in the following way:

1. **Wider Tube (Larger Epsilon)**:
   - When epsilon is increased, the epsilon-tube becomes wider. This means that more data points are allowed to fall within this tube without incurring a penalty.
   - This results in a larger margin of tolerance for errors in the prediction.
   - Consequently, fewer data points are treated as support vectors, as the model allows for a greater degree of error.

2. **Fewer Support Vectors**:
   - Support vectors are the training points that lie on or outside the boundary of the epsilon-tube; these are the points with a non-zero Lagrange multiplier (alpha) in the SVR optimization problem. Points strictly inside the tube have alpha equal to zero and do not affect the fitted function.
   - When epsilon is increased, more points fall strictly inside the tube, so fewer points remain on or outside its boundary and fewer are designated as support vectors.

3. **Smoothing Effect**:
   - A larger epsilon yields a more "relaxed" model that ignores deviations smaller than epsilon rather than fitting the data points exactly.
   - This can lead to a smoother regression function that is less sensitive to individual data points.

The choice of epsilon should reflect the specific problem and the desired trade-off between accuracy and model simplicity. A larger epsilon yields a sparser, smoother model but may sacrifice precision and, if set too large, underfit. Conversely, a smaller epsilon may lead to a more precise fit but uses more support vectors and may increase the risk of overfitting. It is a hyperparameter that requires careful tuning during model selection.
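
The effect on the support vector count can be checked empirically. The sketch below is a minimal experiment on synthetic data (a noisy sine curve); the dataset, the random seed, and the three epsilon values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: sine curve plus Gaussian noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Fit one SVR per epsilon and record how many support vectors it keeps
counts = {}
for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    counts[eps] = len(model.support_)

print(counts)
```

With a tube much narrower than the noise level almost every point lies on or outside it, while a tube much wider than the noise leaves only a few points as support vectors.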

# Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

Support Vector Regression (SVR) is sensitive to several hyperparameters, each of which plays a crucial role in determining the performance and behavior of the model. Here's an explanation of each parameter and how they influence SVR:

1. **Kernel Function**:
   - **Explanation**: The kernel function determines the type of transformation implicitly applied to the input data. Common choices include linear, polynomial, radial basis function (RBF), and sigmoid.
   - **Impact**:
     - The choice of kernel can have a significant effect on how well the SVR model fits the data.
     - Linear kernels assume a linear relationship between the input features and the target variable, while non-linear kernels (like RBF or polynomial) allow for more complex relationships.
   - **Example**:
     - If you suspect a non-linear relationship between the features and the target variable, using a non-linear kernel (e.g., RBF) may be more appropriate.

2. **C Parameter**:
   - **Explanation**: The C parameter controls the trade-off between fitting the training data closely and keeping the model simple (in SVR, keeping the regression function flat). A smaller C applies stronger regularization and tolerates more training error, while a larger C penalizes errors outside the epsilon-tube more heavily.
   - **Impact**:
     - Higher values of C lead to a more "strict" model that tries to minimize training errors, at the risk of overfitting.
     - Lower values of C result in a "softer" fit, which may generalize better but tolerates more training errors.
   - **Example**:
     - Use a higher C if you believe the training data is very reliable and should be fitted closely.
     - Use a lower C if the data is noisy and you want stronger regularization.

3. **Epsilon Parameter (ε)**:
   - **Explanation**: Epsilon defines the margin of tolerance for errors in the prediction. It determines the width of the epsilon-tube around the predicted value within which no penalty is associated.
   - **Impact**:
     - A larger epsilon widens the epsilon-tube, so more data points fall inside it without penalty and fewer become support vectors.
     - A smaller epsilon enforces a narrower epsilon-tube and a stricter fit to the data.
   - **Example**:
     - Use a larger epsilon if you want the model to be more tolerant of prediction errors.
     - Use a smaller epsilon for a more precise fit to the data.

4. **Gamma Parameter**:
   - **Explanation**: Gamma is the kernel coefficient for the RBF, polynomial, and sigmoid kernels. With the RBF kernel, it determines how far the influence of a single training example reaches: a low gamma means a far reach, and a high gamma means a close reach.
   - **Impact**:
     - Higher values of gamma lead to a more complex, more "wiggly" regression function that follows individual points closely. This may result in overfitting.
     - Lower values of gamma lead to a smoother regression function, potentially improving generalization.
   - **Example**:
     - Use a higher gamma if you suspect the data has complex, non-linear relationships.
     - Use a lower gamma for simpler, more linear relationships.

It's crucial to perform hyperparameter tuning (e.g., using cross-validation) to find the optimal values for these parameters based on the specific dataset and problem at hand. The appropriate values often depend on factors like the complexity of the data, noise level, and the desired trade-off between bias and variance.
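
To illustrate how gamma controls the reach of a training example, the sketch below evaluates the RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) for two points at unit distance. The helper `rbf_kernel_value` and the gamma values are hypothetical, chosen only for this example.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) for a single pair of points."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 0.0])  # at distance 1 from x

# Similarity between the same two points under different gamma values
values = {gamma: rbf_kernel_value(x, z, gamma) for gamma in [0.1, 1.0, 10.0]}
for gamma, value in values.items():
    print(f"gamma={gamma}: K(x, z)={value:.5f}")
```

The similarity drops sharply as gamma grows, which is why a high gamma makes each training point matter only in its immediate neighbourhood.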

In [4]:
from sklearn.datasets import load_diabetes
d_data = load_diabetes()

from sklearn.svm import SVR
svr_model = SVR()

In [5]:
d_df = pd.DataFrame(d_data.data , columns=d_data.feature_names)
d_df['target'] = d_data.target

X = d_df.drop('target' , axis=1)
y = d_df.target

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30 , random_state=1)
svr_model.fit(X_train , y_train)
y_pred = svr_model.predict(X_test)
svr_model.score(X_test , y_test)  # R^2 on the held-out test set

0.15652882548296554

In [6]:
parameters = {
    'kernel' : ['poly' , 'sigmoid' , 'rbf'],
    'C' : [1,2,3,4,5,6,7,8,9,10],
    'epsilon' : [0.1 , 0.01 , 0.001],
    'gamma' : ['scale' , 'auto']
}

svr = SVR()
    
from sklearn.model_selection import GridSearchCV

clf = GridSearchCV(svr , param_grid=parameters , cv=5)
clf.fit(X_train , y_train)

In [7]:
clf.best_params_

{'C': 7, 'epsilon': 0.1, 'gamma': 'scale', 'kernel': 'sigmoid'}

In [11]:
best_model = SVR(C=7 , epsilon=0.1 , gamma='scale' , kernel='sigmoid')

best_model.fit(X_train , y_train)
y_pred = best_model.predict(X_test)

best_model.score(X_test , y_test)

0.4378296078745709
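
The diabetes target is not rescaled, so small values of `C` and `epsilon` may be far from the useful range. The sketch below is one possible alternative search, assuming a log-spaced grid and a `Pipeline` with `StandardScaler`; the step names `scale` and `svr` and all grid values are assumptions, and no particular score is implied.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])

# Log-spaced values cover several orders of magnitude with few candidates;
# pipeline parameters are addressed as <step name>__<parameter>
param_grid = {
    "svr__C": [1, 10, 100, 1000],
    "svr__epsilon": [0.1, 1, 10],
    "svr__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)

# best_estimator_ is already refit on the full training set
test_r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, test_r2)
```

Using `best_estimator_` avoids copying the best parameters by hand into a new `SVR` instance.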

# Q5. Assignment:

- Import the necessary libraries and load the dataset.
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization).
- Create an instance of the SVC classifier and train it on the training data.
- Use the trained classifier to predict the labels of the testing data.
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.

In [15]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_wine

data = load_wine()

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
model = SVC()

# r2_score is a regression metric, so it is not imported for this classification task
from sklearn.metrics import accuracy_score , recall_score

In [21]:
df = pd.DataFrame(data.data , columns=data.feature_names)

df['target'] = data.target

df.head()

X = df.drop('target' , axis=1)
y = df.target

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.20,random_state=42)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model.fit(X_train_scaled , y_train)

In [23]:
y_pred = model.predict(X_test_scaled)

# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(y_test , y_pred))

1.0


In [26]:
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf'], 'gamma': [0.01, 0.1, 1]}

g_clf = GridSearchCV(model , param_grid=param_grid , cv=5)

g_clf.fit(X_train_scaled , y_train)

In [27]:
g_clf.best_params_

{'C': 1, 'gamma': 0.01, 'kernel': 'rbf'}

In [30]:
entire_scaled = scaler.fit_transform(X)

new_model = SVC(C=1 , gamma=0.01 , kernel='rbf')
new_model.fit(entire_scaled , y)

In [31]:
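# Caution: the test rows were part of the data used to fit new_model, and
# X_test_scaled was produced by the earlier train-only scaling, so this score
# is a sanity check rather than an unbiased estimate of generalization.
new_model.score(X_test_scaled , y_test)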

1.0

In [34]:
import joblib


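# Only the classifier is saved; new inputs must be scaled with the same
# fitted scaler before calling predict on the reloaded model.
joblib.dump(new_model , 'new_model.joblib')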

['new_model.joblib']
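
Because the saved classifier expects scaled inputs, one option is to bundle the scaler and the classifier into a single pipeline and persist that instead. The sketch below assumes the tuned parameters reported above; the file name `wine_pipeline.joblib` and the temporary directory are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_wine
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Scaler and classifier travel together, so raw features can be passed to predict
wine_pipeline = make_pipeline(StandardScaler(), SVC(C=1, gamma=0.01, kernel="rbf"))
wine_pipeline.fit(X, y)

# Save to a temporary directory and load it back
path = os.path.join(tempfile.mkdtemp(), "wine_pipeline.joblib")
joblib.dump(wine_pipeline, path)
loaded = joblib.load(path)

# The reloaded pipeline should reproduce the original predictions
same_predictions = bool((loaded.predict(X) == wine_pipeline.predict(X)).all())
print(same_predictions)
```

Whoever loads the file later then needs no knowledge of the preprocessing step.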