Q1. What is the relationship between polynomial functions and kernel functions in machine learning
algorithms?


In [None]:
"""
Polynomial functions and kernel functions in machine learning are related through their capacity to capture 
non-linear relationships in data. Polynomial functions are explicit mathematical expressions that involve powers 
and combinations of variables, commonly used in feature engineering to create polynomial features. For instance,
in a 2D feature space, introducing polynomial features like X1^2 or X1*X2 can help capture non-linear patterns.

Kernel functions, on the other hand, play a vital role in machine learning algorithms, such as Support Vector Machines
(SVMs), for implicitly transforming data into higher-dimensional spaces without the need to explicitly calculate and 
store the transformed features. The polynomial kernel is one such kernel: it computes
K(x, y) = (gamma * <x, y> + coef0)^degree, which equals a dot product between polynomial feature expansions of
x and y, so it corresponds to the polynomial features created during feature engineering.

The relationship between them lies in the fact that polynomial kernel functions effectively perform the same type of 
non-linear transformations as polynomial features but do so implicitly and efficiently, making them suitable for 
high-dimensional data. Kernel functions, including polynomial kernels, enable SVMs to handle non-linear data by 
working in higher-dimensional spaces while avoiding the computational burden of explicitly expanding the feature space.
"""

Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?


In [None]:
"""
To implement a Support Vector Machine (SVM) with a polynomial kernel in Python using Scikit-learn, 
follow these steps:

Import Libraries:
Import necessary libraries, including Scikit-learn components for SVM, dataset loading, data splitting,
and performance evaluation.

Load Data:
Load a dataset suitable for classification. In this example, we use the Iris dataset as a demonstration.

Split Data: 
Divide the dataset into training and testing sets to evaluate model performance accurately. The train_test_split 
function from Scikit-learn is commonly used for this purpose.

Create SVM Classifier:
Instantiate an SVM classifier with the desired kernel. Specify the kernel as 'poly' to indicate a polynomial 
kernel. You can also set the polynomial degree with the degree parameter (default 3) and the constant term of
the kernel with coef0.

Train Classifier:
Fit the SVM classifier on the training data using the fit method.

Make Predictions:
Use the trained classifier to make predictions on the test data with the predict method.

Evaluate Performance:
Assess the model's performance by calculating a performance metric, such as accuracy, comparing the predicted
labels to the true labels.

Adjust Hyperparameters:
Experiment with different polynomial degrees and other hyperparameters to optimize model performance based on
cross-validation or other evaluation techniques.

This implementation allows you to harness the power of polynomial kernels in SVMs to capture complex non-linear
relationships in your data for classification tasks.
"""

Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?


In [None]:
"""
In Support Vector Regression (SVR), epsilon sets the half-width of the epsilon-insensitive tube around the
regression function: training errors smaller than epsilon are not penalized at all. Support vectors are the
training points that lie on the boundary of this tube or outside it; points strictly inside the tube do not
influence the fitted model.

Increasing the value of epsilon widens the tube, so the model tolerates larger errors. More training points
fall inside the tube, fewer of them become support vectors, and the resulting model is sparser and less
sensitive to small fluctuations in the data.

Conversely, reducing epsilon narrows the tube and enforces a stricter error tolerance, so the SVR tries 
to fit the training data more closely. As a consequence, more data points end up on or outside the tube and
become support vectors.

The choice of epsilon in SVR involves a trade-off: a larger epsilon favors simplicity and generalization, while a 
smaller epsilon emphasizes fitting the training data more accurately but may lead to overfitting. Selecting the
appropriate epsilon depends on the specific problem and the balance between model complexity and performance.
"""

Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works
and provide examples of when you might want to increase or decrease its value?


In [None]:
"""
The performance of Support Vector Regression (SVR) is heavily influenced by the choice of kernel function, C parameter,
epsilon parameter, and gamma parameter:

Kernel Function:
The kernel function determines the transformation applied to the data to capture non-linear patterns. Selecting the
appropriate kernel is essential for modeling complex relationships in the data. For instance, using an RBF kernel
is beneficial when data exhibits intricate non-linearities.

C Parameter:
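The C parameter controls how strongly deviations larger than epsilon are penalized, and so regulates the trade-off
between a smooth model and a low training error. A higher C fits the training data more closely and may overfit noisy
data, so increase it when the model underfits clean data. A lower C tolerates more error and yields a smoother model,
so decrease it when the data is noisy or contains outliers.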

Epsilon Parameter:
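Epsilon defines the width of the tube around the regression function inside which errors are ignored. Increasing 
epsilon widens the tube, which gives robustness to noise and fewer support vectors; use it when small errors do not
matter. Decreasing epsilon narrows the tube for a closer fit to the training data; use it when precise predictions
are required and the noise level is low.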

Gamma Parameter:
In RBF kernels, gamma controls how far the influence of a single training point reaches. Smaller gamma values result
in broader, smoother kernels, while larger gamma values lead to narrower, more peaked kernels that can fit individual
points. Decrease gamma when the model overfits, and increase it when the fitted function is too smooth to follow
the structure in the data.

The optimal parameter values depend on the specific dataset and the trade-offs you are willing to make. Cross-validation 
and experimentation are often necessary to find the right combination for optimal SVR performance.
"""

Q5. Assignment:
    
Import the necessary libraries and load the dataset

Split the dataset into training and testing sets

Preprocess the data using any technique of your choice (e.g. scaling, normalization)

Create an instance of the SVC classifier and train it on the training data

Use the trained classifier to predict the labels of the testing data

Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy,
precision, recall, F1-score)

Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to
improve its performance

Train the tuned classifier on the entire dataset

Save the trained classifier to a file for future use.

In [7]:
# Import the necessary libraries and load the dataset
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

df=sns.load_dataset('geyser')

In [8]:
df.head()

   duration  waiting   kind
0     3.600       79   long
1     1.800       54  short
2     3.333       74   long
3     2.283       62  short
4     4.533       85   long


In [10]:
# Split the dataset into training and testing sets
X=df.drop('kind',axis=1)
encoder=LabelEncoder()
y=encoder.fit_transform(df['kind'])

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.20,random_state=2)

In [11]:
# Preprocess the data using any technique of your choice (e.g. scaling, normalization)

scaler=StandardScaler()
X_train_scaled=scaler.fit_transform(X_train)
X_test_scaled=scaler.transform(X_test)


In [12]:
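# Create an instance of the SVC classifier and train it on the training data
# Note: the model is fit on the unscaled X_train; X_train_scaled from the previous cell is not used here
svc=SVC(kernel='linear')
svc.fit(X_train,y_train)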

In [14]:
# Use the trained classifier to predict the labels of the testing data
y_pred=svc.predict(X_test)

In [15]:
# Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
from sklearn.metrics import confusion_matrix,classification_report

# Scikit-learn metrics expect the true labels first, then the predictions
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[35  0]
 [ 0 20]]
1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        35
           1       1.00      1.00      1.00        20

    accuracy                           1.00        55
   macro avg       1.00      1.00      1.00        55
weighted avg       1.00      1.00      1.00        55



In [17]:
# Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
from sklearn.model_selection import RandomizedSearchCV
param={'C':[1.0,10,100],
        'kernel':['linear', 'poly', 'rbf'],
         'gamma':['scale','auto']}

clf=RandomizedSearchCV(SVC(),param_distributions=param,cv=5)

In [20]:
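# Run the hyperparameter search on the training split, then evaluate the best estimator on the test split
clf.fit(X_train,y_train)
y_pred4=clf.predict(X_test)

# Scikit-learn metrics expect the true labels first, then the predictions
print(confusion_matrix(y_test,y_pred4))
print(accuracy_score(y_test,y_pred4))
print(classification_report(y_test,y_pred4))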

[[35  0]
 [ 0 20]]
1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        35
           1       1.00      1.00      1.00        20

    accuracy                           1.00        55
   macro avg       1.00      1.00      1.00        55
weighted avg       1.00      1.00      1.00        55



In [21]:
# Save the trained classifier to a file for future use.
import pickle

# The with-block closes the file even if pickling fails
with open('classifier.pkl','wb') as f:
    pickle.dump(clf,f)
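The assignment steps "train the tuned classifier on the entire dataset" and "save the trained classifier" could be combined as in the sketch below. It is not the notebook's own run: it assumes the Iris dataset as a stand-in, a hypothetical best_params dictionary in place of the values found by the search, and a temporary directory for the file.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption); replace with the full feature matrix and labels
X, y = load_iris(return_X_y=True)

# Hypothetical settings standing in for the search result (e.g. clf.best_params_)
best_params = {"C": 1.0, "kernel": "rbf", "gamma": "scale"}

# Bundle scaling and the classifier so the saved object handles raw inputs
final_model = make_pipeline(StandardScaler(), SVC(**best_params))
final_model.fit(X, y)  # fit on the entire dataset

# Save to disk, then load it back to confirm the round trip works
path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
with open(path, "wb") as f:
    pickle.dump(final_model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

same_predictions = bool((restored.predict(X) == final_model.predict(X)).all())
print("Restored model reproduces predictions:", same_predictions)
```

Pickled models should only be loaded from trusted sources and with a compatible Scikit-learn version.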