# Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

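Polynomial functions represent non-linear relationships between variables. Many algorithms use them as basis functions, for example by expanding the inputs into terms such as $x_1^2$ and $x_1 x_2$ and then fitting a linear model on the expanded features. Kernel functions compute the inner product of two data points in a (possibly very high-dimensional) feature space without constructing that space explicitly, which is why they act as a similarity measure. They are the core of kernel methods such as support vector machines.

The two ideas meet in the polynomial kernel $K(x, y) = (\gamma \langle x, y \rangle + r)^d$. Its value equals the inner product of suitably weighted polynomial feature expansions of $x$ and $y$ of degree at most $d$. A kernel method that uses it therefore learns a polynomial decision function without ever computing the expanded features (the "kernel trick"). In scikit-learn, $\gamma$, $r$ and $d$ correspond to the `gamma`, `coef0` and `degree` arguments of `SVC(kernel='poly')`.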
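The following sketch makes this connection concrete. It checks numerically, for two-dimensional inputs with degree 2, γ = 1 and r = 1, that the polynomial kernel equals an ordinary dot product after an explicit feature expansion. The mapping `phi` is written out by hand for this illustration only. scikit-learn's `polynomial_kernel` appears purely for comparison.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(v):
    """Explicit degree-2 feature map for a 2-D point, matching (x . y + 1)^2."""
    x1, x2 = v
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Kernel evaluated directly in the original 2-D space
k_direct = (x @ y + 1.0) ** 2
# Same quantity as an explicit dot product in the 6-D feature space
k_explicit = phi(x) @ phi(y)
# scikit-learn's version: (gamma * <x, y> + coef0) ** degree
k_sklearn = polynomial_kernel(x.reshape(1, -1), y.reshape(1, -1),
                              degree=2, gamma=1.0, coef0=1.0)[0, 0]

print(k_direct, k_explicit, k_sklearn)
```

The direct computation touches only two numbers per point. The explicit map needs six, and the feature count grows combinatorially with the input dimension and degree, which is exactly the cost the kernel avoids.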

# Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

In [1]:
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


In [2]:
# Generate a random classification dataset
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [4]:
# Create an SVM classifier with polynomial kernel
svm = SVC(kernel='poly', degree=3)

# Train the SVM classifier
svm.fit(X_train, y_train)

In [5]:
y_pred = svm.predict(X_test)

# Evaluate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9
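A single 20-sample test split says little about the polynomial kernel, which is sensitive to feature scale and to its `degree` and `coef0` settings. The minimal sketch below assumes the same synthetic data as above. It standardizes the features inside a pipeline and compares a few degrees with 5-fold cross-validation. The chosen degrees and `coef0=1` are illustrative, not tuned, values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same synthetic problem as in the cells above
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

mean_scores = {}
for degree in [1, 2, 3, 4]:
    # The scaler is refit inside each CV fold, so no held-out statistics leak into training
    model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=degree, coef0=1))
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[degree] = scores.mean()
    print(f"degree={degree}: mean CV accuracy = {scores.mean():.3f}")
```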


# Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Increasing the value of epsilon in Support Vector Regression (SVR) typically leads to a decrease in the number of support vectors.

In SVR, the parameter epsilon sets the half-width of the epsilon-insensitive tube. This tube is a region around the predicted function within which errors incur no penalty, so epsilon fixes how much error the model tolerates on each training point. A larger epsilon widens the tube, and more training points fall strictly inside it.

Points strictly inside the tube have zero dual coefficients. They do not affect the fitted function, and they are not support vectors. The support vectors are the points that lie on the boundary of the tube or outside it. Widening the tube moves points from the boundary or the outside to the inside, so the number of support vectors goes down and the model becomes sparser and typically flatter. Conversely, a very small epsilon leaves almost every training point on or outside the tube, so nearly all of them become support vectors.

However, the exact relationship also depends on the dataset, the noise level in the target, and the other hyperparameters (C, the kernel, gamma). The decrease need not be smooth. Once epsilon is well below the noise level, changing it further may barely change the count. Once it exceeds the spread of the residuals, almost no points remain outside the tube.
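A small experiment shows the trend. The sketch below fits scikit-learn's `SVR` on a hypothetical noisy sine curve, which is not part of the original notebook, using several epsilon values. For each fit it counts the support vectors through the fitted model's `support_` attribute. The exact counts depend on the data, on `C`, and on the kernel.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical toy data: noisy sine curve (noise standard deviation 0.1)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=X.shape[0])

def count_support_vectors(epsilon):
    """Fit an RBF SVR with the given epsilon and return its number of support vectors."""
    model = SVR(kernel="rbf", C=1.0, epsilon=epsilon)
    model.fit(X, y)
    return len(model.support_)

for eps in [0.01, 0.05, 0.1, 0.2, 0.5]:
    print(f"epsilon={eps:<5} support vectors={count_support_vectors(eps)}")
```

With epsilon far below the noise level, nearly all 200 points should end up as support vectors. With epsilon = 0.5, most points sit inside the tube, and the count should be much lower.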

# Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

The choice of kernel function, C parameter, epsilon parameter, and gamma parameter in Support Vector Regression (SVR) can significantly affect its performance. Here's a brief explanation of each parameter and how it can be tuned:

1-Kernel Function: SVR uses a kernel function to map the data implicitly into a higher-dimensional feature space, where it fits a linear function. The choice of kernel therefore determines the shape of the functions the model can represent. Common kernels are linear, polynomial, radial basis function (RBF), and sigmoid. The right choice depends on the nature of the data and the problem. For example, the RBF kernel is a good default for smooth non-linear relationships, while the linear kernel works well when the target depends roughly linearly on the features.

2-C Parameter: C is the regularization parameter. It controls the trade-off between keeping the regression function flat (simple) and penalizing training points whose errors exceed epsilon. A smaller C tolerates more such deviations and yields a smoother, more strongly regularized model, at the risk of underfitting. A larger C penalizes deviations outside the tube more heavily, so the model follows the training data more closely. This reduces training error but increases the risk of overfitting, especially with noisy targets. For example, decrease C when the data are noisy or cross-validation shows overfitting, and increase it when the model is clearly underfitting. The appropriate value depends on the dataset and the desired balance between fitting the training data and generalizing.

3-Epsilon Parameter: Epsilon defines the half-width of the epsilon-insensitive tube around the regression function, that is, how large a training error can be before it is penalized. Data points inside the tube contribute nothing to the loss function. A larger epsilon lets more points fall inside the tube, which usually means fewer support vectors and a simpler, flatter model. A smaller epsilon makes the model respond to smaller errors and usually increases the number of support vectors. Because epsilon is measured in the units of the target, its sensible range depends on how the target is scaled. For example, increase epsilon when the target is noisy and small errors are not meaningful, and decrease it when the predictions must be precise.

4-Gamma Parameter: In scikit-learn, gamma is the kernel coefficient for the RBF, polynomial, and sigmoid kernels; it has no effect with the linear kernel. For the RBF kernel, gamma controls how far the influence of a single training example reaches. A smaller gamma gives each example a broad influence and produces a smoother regression function. A larger gamma makes the influence very local, so the function can bend around individual points. Increasing gamma therefore raises the risk of overfitting, while decreasing it can lead to underfitting. Like C, gamma is best chosen by cross-validation, and the two are usually tuned together.
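These parameters interact, so they are usually tuned together with cross-validation rather than one at a time. The minimal sketch below uses hypothetical noisy sine data as a stand-in for a real regression problem. The grid values are illustrative starting points only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Hypothetical 1-D regression data: noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=X.shape[0])

param_grid = {
    "C": [0.1, 1, 10, 100],       # larger C: deviations outside the tube cost more
    "epsilon": [0.01, 0.1, 0.5],  # wider tube: more errors ignored, fewer support vectors
    "gamma": [0.1, 1, 10],        # larger gamma: more local RBF influence, wigglier fit
}

# Shuffled folds so that every fold covers the whole input range
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=cv, scoring="r2")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best mean CV R^2: {search.best_score_:.3f}")
```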

# Q5. Assignment:

# Import the necessary libraries and load the dataset

In [6]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

In [7]:
iris = load_iris()

In [8]:
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        ...

In [9]:
X = iris.data
y = iris.target

# Split the dataset into training and testing sets

In [10]:
from sklearn.model_selection import train_test_split

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
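Iris has 50 samples per class, and a purely random split can leave the test classes slightly unbalanced (10/9/11 in the classification report below). Passing `stratify=y` to `train_test_split` keeps the class proportions identical in both sets. This is an optional variation, not what the notebook does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Stratified split: each class keeps its one-third share in both train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

test_counts = np.bincount(y_test)
print("Test samples per class:", test_counts)
```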

# Preprocess the data using any technique of your choice (e.g. scaling, normalization)

In [12]:
from sklearn.preprocessing import StandardScaler

In [13]:
scaled = StandardScaler()

In [15]:
X_train_scaled = scaled.fit_transform(X_train)
X_test_scaled = scaled.transform(X_test)
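A quick check shows what `StandardScaler` did. The scaler was fit on the training set only, so each column of the scaled training features has mean 0 and standard deviation 1. The test features are transformed with the training statistics and will only be close to those values. The sketch repeats the split so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from the training data
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

print("Training means learned:", scaler.mean_.round(2))
print("Scaled train mean:", X_train_scaled.mean(axis=0).round(3))
print("Scaled train std: ", X_train_scaled.std(axis=0).round(3))
print("Scaled test mean: ", X_test_scaled.mean(axis=0).round(3))
```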

# Create an instance of the SVC classifier and train it on the training data

In [16]:
from sklearn.svm import SVC

In [17]:
svm = SVC()

In [18]:
svm.fit(X_train_scaled,y_train)
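`SVC()` with no arguments uses scikit-learn's defaults, which matter when reading the results below. In scikit-learn 0.22 and later, these are an RBF kernel, `C=1.0` and `gamma='scale'`. They can be inspected with `get_params()`.

```python
from sklearn.svm import SVC

params = SVC().get_params()
# Print only the hyperparameters discussed in this notebook
for name in ["kernel", "C", "gamma", "degree", "coef0"]:
    print(f"{name} = {params[name]!r}")
```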

# Use the trained classifier to predict the labels of the testing data

In [19]:
y_pred = svm.predict(X_test_scaled)

# Evaluate the performance of the classifier using accuracy

In [20]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0


# Evaluate the performance of the classifier using classification report

In [21]:
from sklearn.metrics import classification_report

In [22]:
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
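A confusion matrix gives a per-class view that complements the report. The minimal, self-contained sketch below repeats the notebook's split, scaling and default `SVC`, then calls `sklearn.metrics.confusion_matrix`. Rows are true classes and columns are predicted classes, so off-diagonal entries count misclassifications.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling + default RBF SVC, equivalent to the separate notebook cells
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```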



# Tune the hyperparameters of the SVC classifier using GridSearchCV

In [23]:
from sklearn.model_selection import GridSearchCV

In [24]:
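# Note: gamma is ignored when kernel='linear', so each C value is effectively
# evaluated five times with identical results
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['linear']
              }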

In [25]:
grid = GridSearchCV(SVC(), param_grid=param_grid, refit=True, cv=5, verbose=3)

In [26]:
grid.fit(X_train_scaled, y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.958 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.958 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.875 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.958 total time=   0.0s
[CV 1/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.958 total time=   0.0s
[CV 2/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.958 total time=   0.0s
[CV 3/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.875 total time=   0.0s
[CV 4/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.958 total time=   0.0s
[CV 1/5] END ..C=0.1, gamma=0.01, kernel=linear;, score=0.958 total time=   0.0s
...

In [27]:
grid.best_params_

{'C': 10, 'gamma': 1, 'kernel': 'linear'}

In [28]:
best_estimator = grid.best_estimator_
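`gamma` has no effect with `kernel='linear'`, so the grid above effectively tests only five values of `C`. The `gamma=1` in `best_params_` is just the first of several tied candidates. The sketch below shows a grid that pairs each kernel with the parameters it actually uses, passing a list of dictionaries to `GridSearchCV`. It also keeps the scaler inside a pipeline so that the scaler is refit within every fold. The grid values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# One sub-grid per kernel, so gamma is only searched where it is used
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10, 100]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10, 100],
     "svc__gamma": [1, 0.1, 0.01, 0.001]},
]

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print(f"Test accuracy: {grid.score(X_test, y_test):.3f}")
```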

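# Train the tuned classifier on the entire dataset

The hyperparameters were tuned on standardized features, so the scaler is refit together with the tuned SVC. Fitting the bare SVC on raw `X` would no longer match the preprocessing used during tuning.

In [29]:
from sklearn.pipeline import make_pipeline
best_estimator = make_pipeline(StandardScaler(), grid.best_estimator_).fit(X, y)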

# Save the trained classifier to a file for future use.

In [32]:
import pickle

In [33]:
with open('trained_classifier.pkl', 'wb') as file:
    pickle.dump(best_estimator, file)

In [34]:
with open('trained_classifier.pkl', 'rb') as file:
    loaded_classifier = pickle.load(file)
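Whichever format is used, it is worth confirming two things: the reloaded model behaves identically, and it includes any preprocessing it depends on. The minimal sketch below uses `joblib`, which scikit-learn's documentation suggests as an alternative to `pickle` for estimators that hold large NumPy arrays. The hyperparameters mirror the `best_params_` reported above, and the temporary file path is only for illustration. As with `pickle`, load files only from trusted sources, and load them with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Persist scaler and classifier together so raw feature values can be passed directly
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=10)).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "trained_classifier.joblib")
joblib.dump(model, path)
restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), restored.predict(X))
print("Predictions identical after reload:", same)
```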
