# SVM 2

**Q1. What is the relationship between polynomial functions and kernel functions in machine learning
algorithms?**

**Ans:**  
  
**Polynomial Functions and Kernel Functions in Machine Learning**

Polynomial functions and kernel functions are closely related concepts in machine learning, particularly in the context of Support Vector Machines (SVMs) and other kernel-based algorithms. Here's an explanation of their relationship:

**Polynomial Functions**

A polynomial function is a sum of terms, each consisting of a coefficient multiplied by non-negative integer powers of one or more variables. For example, a polynomial function of degree 2 in two variables $x$ and $y$ can be written as:

$$
P(x, y) = a_0 + a_1x + a_2y + a_3x^2 + a_4xy + a_5y^2
$$

In machine learning, polynomial functions are often used to capture non-linear relationships between features. By transforming the feature space using polynomial functions, we can fit more complex models.

**Kernel Functions**

A kernel function is a mathematical tool used to compute the inner product of two vectors in a high-dimensional space without explicitly transforming the data into that space. This is particularly useful in algorithms like SVMs where the goal is to classify data that is not linearly separable in the original feature space.

The idea is to implicitly map the input features into a higher-dimensional space where linear separation might be possible. This is achieved through the use of kernel functions.

**Polynomial Kernel**

A polynomial kernel is a specific type of kernel function that corresponds to polynomial feature expansion. The polynomial kernel function computes the inner product of vectors in a high-dimensional polynomial feature space. For a polynomial kernel of degree $d$, the kernel function is given by:

$$
K(x, x') = (x \cdot x' + c)^d
$$

where:
- $x$ and $x'$ are feature vectors.
- $c$ is a constant (often set to 1).
- $d$ is the degree of the polynomial.

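This kernel function implicitly maps the input data into a high-dimensional space whose coordinates are (scaled) monomials of the input features. For $c > 0$ these include all monomials of degree up to $d$. For $c = 0$ they include only the monomials of degree exactly $d$.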
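As a quick sanity check of the formula, the sketch below evaluates $(x \cdot x' + c)^d$ by hand and compares it with scikit-learn's `sklearn.metrics.pairwise.polynomial_kernel`. Note that scikit-learn parameterizes the kernel as $(\gamma\, x \cdot x' + c)^d$, so `gamma=1` and `coef0=c` reproduce the formula above. The two example vectors are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def poly_kernel(x, x_prime, c=1.0, d=2):
    """Polynomial kernel K(x, x') = (x . x' + c)^d."""
    return (np.dot(x, x_prime) + c) ** d


x = np.array([1.0, 2.0])
x_prime = np.array([3.0, -1.0])

manual = poly_kernel(x, x_prime, c=1.0, d=2)

# scikit-learn computes (gamma * <x, x'> + coef0)^degree; gamma=1 matches the formula above
library = polynomial_kernel(
    x.reshape(1, -1), x_prime.reshape(1, -1), degree=2, gamma=1, coef0=1
)[0, 0]

print(manual, library)  # prints: 4.0 4.0
```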

**Relationship**

1. **Implicit Feature Mapping**:
   - **Polynomial Functions**: Directly expand the feature space by including polynomial terms.
   - **Polynomial Kernel**: Computes the dot product in the high-dimensional feature space without explicitly creating the polynomial features.

2. **Computational Efficiency**:
   - **Polynomial Functions**: Explicitly compute and work with high-dimensional polynomial features, which can be computationally expensive.
   - **Polynomial Kernel**: Efficiently computes the dot product in the high-dimensional space using the kernel trick, avoiding the need to explicitly construct polynomial features.

3. **Applications in SVMs**:
   - **Polynomial Functions**: Used in models where explicit feature expansion is feasible.
   - **Polynomial Kernel**: Used in kernel methods like SVMs to classify data that is not linearly separable by transforming it into a higher-dimensional space implicitly.
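To make the computational-efficiency point concrete, the sketch below counts how many columns scikit-learn's `PolynomialFeatures` produces for 10 input features (an arbitrary choice) and compares the count with the closed form $\binom{n+d}{d}$ for monomials of total degree at most $d$ in $n$ variables. A polynomial kernel works in the same feature space while only ever computing an $n$-dimensional dot product.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 10  # arbitrary input dimension for the illustration
X = np.random.default_rng(0).normal(size=(5, n_features))

results = []
for degree in (2, 3, 4):
    # Explicit expansion: every monomial of total degree <= degree (bias column included)
    n_expanded = PolynomialFeatures(degree=degree).fit_transform(X).shape[1]
    # Closed form for the number of such monomials in n variables: C(n + d, d)
    n_expected = comb(n_features + degree, degree)
    results.append((degree, n_expanded, n_expected))
    print(f"degree={degree}: {n_expanded} explicit features (formula: {n_expected})")
```

For 10 features this gives 66, 286 and 1001 explicit columns for degrees 2, 3 and 4, while the kernel evaluation cost stays tied to the original 10 dimensions.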

**Example**

Consider using a polynomial kernel with a degree of 2. For two-dimensional input vectors $x = [x_1, x_2]$ and $x' = [x'_1, x'_2]$, the polynomial kernel of degree 2 would compute:

$$
K(x, x') = (x \cdot x' + 1)^2
$$

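Expanding this:

$$
K(x, x') = (x_1x'_1 + x_2x'_2 + 1)^2 = x_1^2 (x'_1)^2 + x_2^2 (x'_2)^2 + 2x_1x_2x'_1x'_2 + 2x_1x'_1 + 2x_2x'_2 + 1
$$

This equals the dot product $\phi(x) \cdot \phi(x')$ for the explicit feature map

$$
\phi(x) = [x_1^2,\ x_2^2,\ \sqrt{2}\,x_1x_2,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ 1]
$$

so this kernel function corresponds to the polynomial feature expansion up to degree 2.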
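The correspondence can be checked numerically. The sketch below assumes the explicit degree-2 map $\phi(x) = [x_1^2, x_2^2, \sqrt{2}x_1x_2, \sqrt{2}x_1, \sqrt{2}x_2, 1]$ for 2-D inputs (the function name `phi` and the test vectors are illustrative) and compares $\phi(x) \cdot \phi(x')$ with $(x \cdot x' + 1)^2$.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map whose dot product equals (x . x' + 1)^2 for 2-D inputs."""
    v1, v2 = v
    s = np.sqrt(2.0)
    return np.array([v1**2, v2**2, s * v1 * v2, s * v1, s * v2, 1.0])


x = np.array([1.0, 2.0])
x_prime = np.array([3.0, -1.0])

# Kernel trick: one 2-D dot product, then square
kernel_value = (np.dot(x, x_prime) + 1.0) ** 2
# Explicit route: build 6-D features, then take their dot product
explicit_value = np.dot(phi(x), phi(x_prime))

print(kernel_value, explicit_value)  # equal up to floating-point rounding
```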


**Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?**

**Ans:**  
  
#### Steps:

**1. Import Libraries**

Start by importing the necessary libraries for data handling, model building, and visualization.

**2. Load and Prepare Data**

Load the Iris dataset and prepare it for training and testing. For simplicity, use only two classes and two features. Split the dataset into training and testing sets and standardize the features.

**3. Initialize and Train the SVM Model**

Create an instance of the `SVC` class with a polynomial kernel. Specify the degree of the polynomial kernel and other parameters as needed. Train the model using the training data.

**4. Predict and Evaluate**

Make predictions on the test set using the trained model. Compute the accuracy of the model to evaluate its performance.

**5. Plot Decision Boundaries**

(Optional) Define a function to plot the decision boundaries of the model. Visualize the decision boundaries to understand how the model classifies different regions in the feature space.
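A minimal sketch of these steps, assuming classes 0 and 1 (setosa and versicolor) and the first two features (sepal length and sepal width), since the steps above do not say which to keep. `degree=3` and `coef0=1` are illustrative choices; scikit-learn's `SVC` defaults to `coef0=0`, which keeps only the degree-3 terms of the kernel. For step 5, the model is evaluated on a grid of points, which is what a decision-boundary plot would draw.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 2: load Iris, keep two classes (0 = setosa, 1 = versicolor) and two features
iris = load_iris()
mask = iris.target != 2
X = iris.data[mask][:, :2]  # sepal length and sepal width
y = iris.target[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Standardize: fit on the training split only, then apply to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 3: polynomial kernel K(x, x') = (gamma * x.x' + coef0)^degree
model = SVC(kernel="poly", degree=3, coef0=1, C=1.0)
model.fit(X_train_scaled, y_train)

# Step 4: predict and evaluate
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")

# Step 5 (optional): predict over a grid covering the scaled feature space
xx, yy = np.meshgrid(
    np.linspace(X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1, 200),
    np.linspace(X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1, 200),
)
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# With matplotlib: plt.contourf(xx, yy, Z, alpha=0.3)
# then plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train)
```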


**Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?**

**Ans:**  
  
In Support Vector Regression (SVR), the parameter `epsilon` (often denoted as ε) is crucial in defining the margin of tolerance within which errors are considered acceptable. Specifically, it determines the width of the tube or epsilon-insensitive zone around the regression function, where deviations are ignored.

**Understanding Epsilon in SVR**

- **Epsilon-Insensitive Tube**: In SVR, the epsilon-insensitive tube is a region around the predicted regression line within which errors are not penalized. This means that if the true value falls within this tube, no penalty is assigned to the deviation from the predicted value.
  
- **Epsilon (ε)**: The value of ε determines the width of this tube. Larger values of ε correspond to a wider tube.

**Effect of Increasing Epsilon on the Number of Support Vectors**

1. **Decrease in the Number of Support Vectors**:
   - As ε increases, the epsilon-insensitive tube becomes wider. This means that more deviations from the predicted value fall within this tube and are not penalized.
   - Consequently, fewer data points lie outside the tube or on its boundary. Only those points become support vectors; points strictly inside the tube have zero dual coefficients and do not affect the fitted function.

2. **Less Sensitivity to Small Deviations**:
   - With a larger ε, the SVR model becomes less sensitive to small deviations between the predicted values and the actual values, as these deviations are considered acceptable if they fall within the tube.
   - This reduced sensitivity generally results in fewer support vectors, because only data points whose deviation is at least ε contribute to the solution.

3. **Smoother Model**:
   - Increasing ε can lead to a smoother, flatter regression function, because fewer points constrain the fit and small fluctuations inside the tube are ignored.
   - However, this could also lead to underfitting if ε is too large, as the model may become too generalized and not capture the underlying patterns of the data.
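One way to observe this effect is to fit `SVR` with increasing `epsilon` on the same data and count the support vectors through the fitted model's `support_` attribute. The sketch below assumes a synthetic noisy sine curve, an RBF kernel, and a fixed `C=1.0`; exact counts depend on the data, but the count should fall as the tube widens.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic regression data: y = sin(x) + Gaussian noise
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

epsilons = [0.01, 0.1, 0.3, 0.5, 1.0]
n_support = []
for eps in epsilons:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points that are support vectors
    n_support.append(len(model.support_))
    print(f"epsilon={eps:<5} support vectors={len(model.support_)}")
```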


**Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter
affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works
and provide examples of when you might want to increase or decrease its value?**

**Ans:**  

The performance of Support Vector Regression (SVR) is influenced by several key parameters, including the choice of kernel function, the regularization parameter $C$, the epsilon parameter $\epsilon$, and the gamma parameter $\gamma$. Each parameter affects the behavior and effectiveness of the SVR model in different ways.

**1. Kernel Function**

The kernel function defines the type of transformation applied to the input features to map them into a higher-dimensional space where a linear regression model can be applied. Common kernel functions include:

- **Linear Kernel**: $K(x, x') = x \cdot x'$
  - **Use Case**: Suitable when the relationship between the features and the target is approximately linear. It is computationally efficient and simple.

- **Polynomial Kernel**: $K(x, x') = (x \cdot x' + c)^d$
  - **Use Case**: Ideal for capturing polynomial relationships between features. Increasing the degree $d$ allows the model to fit more complex polynomial relationships.

- **Radial Basis Function (RBF) Kernel**: $K(x, x') = \exp(-\gamma \|x - x'\|^2)$
  - **Use Case**: Effective for complex, non-linear relationships. It corresponds to an infinite-dimensional feature space and can model smooth non-linear functions.

- **Sigmoid Kernel**: $K(x, x') = \tanh(\gamma x \cdot x' + c)$
  - **Use Case**: Inspired by the activation function used in neural networks. It is not a valid (positive semi-definite) kernel for all values of $\gamma$ and $c$, and in practice it often does not outperform RBF.

**Impact**:
- The choice of kernel affects the model's ability to capture various types of patterns in the data. More complex kernels can model intricate patterns but may also lead to overfitting if not chosen properly.
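To see how the kernel choice plays out, the sketch below compares the four kernels by 5-fold cross-validated $R^2$ on a synthetic noisy sine curve (an assumed toy dataset), using scikit-learn's default `SVR` settings for every kernel. With defaults, the RBF kernel should fit this curve much better than the linear kernel; the poly and sigmoid scores depend heavily on their own parameters.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic non-linear data: y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=300)

scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    # Scaling inside the pipeline keeps each CV fold free of test-fold statistics
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    scores[kernel] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{kernel:<8} mean R^2 = {scores[kernel]:.3f}")
```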

**2. Regularization Parameter $C$**

The parameter $C$ controls the trade-off between achieving a low training error and keeping the model simple (flat). It sets the penalty for errors outside the epsilon-insensitive tube; deviations inside the tube are not penalized, whatever the value of $C$.

- **High C**:
  - **Effect**: Penalizes deviations outside the tube more heavily, so the model follows the training points more closely. This increases model complexity and the risk of overfitting. The width of the tube itself is set by $\epsilon$, not by $C$.
  - **Use Case**: When the data is relatively clean and you want a tight fit to the training data.

- **Low C**:
  - **Effect**: Tolerates more deviations outside the tube, producing a flatter, simpler model that often generalizes better.
  - **Use Case**: When prioritizing model generalization over fitting the training data exactly, particularly with noisy data.

**3. Epsilon Parameter $\epsilon$**

The epsilon parameter defines the width of the epsilon-insensitive tube around the predicted regression function within which errors are not penalized.

- **High $\epsilon$**:
  - **Effect**: Increases the width of the epsilon-insensitive tube, meaning more deviations are considered acceptable, resulting in fewer support vectors.
  - **Use Case**: When allowing a larger margin of tolerance for errors and thus simplifying the model. Useful for noisy data or when a smoother model is preferred.

- **Low $\epsilon$**:
  - **Effect**: Decreases the width of the epsilon-insensitive tube, making the model more sensitive to deviations and potentially increasing the number of support vectors.
  - **Use Case**: When a more precise fit to the training data is needed and you are willing to accept more support vectors.

**4. Gamma Parameter $\gamma$**

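The gamma parameter is used with the RBF, polynomial, and sigmoid kernels. For the RBF kernel it defines how far the influence of a single training example reaches: the kernel value $\exp(-\gamma \|x - x'\|^2)$ decays faster with distance as $\gamma$ grows. For the polynomial and sigmoid kernels it scales the dot product (in scikit-learn the polynomial kernel is $(\gamma\, x \cdot x' + c)^d$). The effects below describe the RBF case.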

- **High $\gamma$**:
  - **Effect**: Restricts the influence of a single training example to a small region, leading to a more complex model that captures finer patterns.
  - **Use Case**: When expecting high local interactions in the data and wanting a model that can fit these details. Be cautious of overfitting with very high values.

- **Low $\gamma$**:
  - **Effect**: Extends the influence of a single training example over a larger region, leading to a smoother and less complex model.
  - **Use Case**: When a model that captures more general patterns is preferred to avoid overfitting. Useful for data with less localized patterns.
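Because $C$, $\epsilon$, and $\gamma$ interact, they are usually tuned together. A minimal sketch with `GridSearchCV`, assuming a synthetic noisy sine curve and an illustrative grid (the values are not recommendations). The `svr__` prefix routes each parameter to the `SVR` step of the pipeline.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data: y = sin(x) + noise
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
param_grid = {
    "svr__C": [0.1, 1, 10, 100],          # penalty for deviations outside the tube
    "svr__epsilon": [0.01, 0.1, 0.5],     # half-width of the epsilon-insensitive tube
    "svr__gamma": [0.01, 0.1, 1, 10],     # reach of each training point (RBF)
}

# 5-fold cross-validated search scored by R^2
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_)
print(f"best CV R^2: {search.best_score_:.3f}")
```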


**Q5. Assignment:**

1. Import the necessary libraries and load the dataset

```python
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Load the penguins dataset (fetched from seaborn's online data repository)
penguins = sns.load_dataset('penguins')
```


```python
penguins.head(5)
```

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |


```python
penguins.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 18.9+ KB
```


```python
penguins.describe()
```

| | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| count | 342.0 | 342.0 | 342.0 | 342.0 |
| mean | 43.92193 | 17.15117 | 200.915205 | 4201.754386 |
| std | 5.459584 | 1.974793 | 14.061714 | 801.954536 |
| min | 32.1 | 13.1 | 172.0 | 2700.0 |
| 25% | 39.225 | 15.6 | 190.0 | 3550.0 |
| 50% | 44.45 | 17.3 | 197.0 | 4050.0 |
| 75% | 48.5 | 18.7 | 213.0 | 4750.0 |
| max | 59.6 | 21.5 | 231.0 | 6300.0 |


```python
penguins.shape
```

```
(344, 7)
```

```python
penguins.isnull().sum()
```

```
species               0
island                0
bill_length_mm        2
bill_depth_mm         2
flipper_length_mm     2
body_mass_g           2
sex                  11
dtype: int64
```

```python
# Drop every row that contains at least one missing value
penguins.dropna(axis=0, inplace=True)
```

```python
penguins.shape
```

```
(333, 7)
```

```python
penguins.species.unique()
```

```
array(['Adelie', 'Chinstrap', 'Gentoo'], dtype=object)
```

2. Split the dataset into training and testing sets

```python
from sklearn.model_selection import train_test_split

# Features: every column except the target; target: species
X = penguins.drop("species", axis=1)
y = penguins["species"]

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```

3. Preprocess the data using any technique of your choice (e.g. scaling, normalization)

```python
from sklearn.preprocessing import LabelEncoder

# Encode the target labels (species) with a dedicated encoder
species_encoder = LabelEncoder()
y_train_encoded = species_encoder.fit_transform(y_train)  # fit on training labels only
y_test_encoded = species_encoder.transform(y_test)        # reuse the same mapping

# Encode categorical features in X_train and X_test
X_train_encoded = X_train.copy()
X_test_encoded = X_test.copy()

feature_encoders = {}
for column in ['island', 'sex']:
    # Separate encoder per column, so the species encoder is not overwritten
    # (note: integer codes impose an artificial order on the nominal 'island' column)
    encoder = LabelEncoder()
    X_train_encoded[column] = encoder.fit_transform(X_train_encoded[column])
    X_test_encoded[column] = encoder.transform(X_test_encoded[column])
    feature_encoders[column] = encoder

# Display encoded datasets
print("Training set after encoding:")
print(X_train_encoded.head())
print("\nTesting set after encoding:")
print(X_test_encoded.head())
```

```
Training set after encoding:
     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  \
321       0            55.9           17.0              228.0       5600.0
265       0            43.6           13.9              217.0       4900.0
36        1            38.8           20.0              190.0       3950.0
308       0            47.5           14.0              212.0       4875.0
191       1            53.5           19.9              205.0       4500.0

     sex
321    1
265    0
36     1
308    0
191    1

Testing set after encoding:
     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  \
30        1            39.5           16.7              178.0       3250.0
317       0            46.9           14.6              222.0       4875.0
79        2            42.1           19.1              195.0       4000.0
201       1            49.8           17.3              198.0       3675.0
63        0
```

```python
# Standardization: fit the scaler on the training split only, then apply it to both splits
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_encoded)
X_test_scaled = scaler.transform(X_test_encoded)
```

4. Create an instance of the SVC classifier and train it on the training data

```python
from sklearn.svm import SVC

# Default SVC: RBF kernel, C=1.0, gamma='scale'
svc_clf = SVC()
svc_clf.fit(X_train_scaled, y_train_encoded)
```

5. Use the trained classifier to predict the labels of the testing data

```python
y_pred = svc_clf.predict(X_test_scaled)
```

6. Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

acc_score = accuracy_score(y_test_encoded, y_pred)
precision = precision_score(y_test_encoded, y_pred, average='micro')
recall = recall_score(y_test_encoded, y_pred, average='micro')
f1 = f1_score(y_test_encoded, y_pred, average='micro')

print(f"Accuracy score is {acc_score}")
print(f"Precision score is {precision}")
print(f"Recall score is {recall}")
print(f"F1 score is {f1}")
```

```
Accuracy score is 0.9880952380952381
Precision score is 0.9880952380952381
Recall score is 0.9880952380952381
F1 score is 0.9880952380952381
```


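With `average='micro'`, precision, recall, and F1 pool true and false positives over all three classes. In a single-label multiclass problem every wrong prediction counts as one false positive and one false negative, so all three micro-averaged scores equal accuracy, which is why the four numbers above are identical. Use `average='macro'` or `average='weighted'` (or `classification_report`) to see differences between classes.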
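A small hand-made example (the label arrays are arbitrary) makes the difference between micro and macro averaging visible: micro precision matches accuracy, while macro precision averages the per-class precisions.

```python
from sklearn.metrics import accuracy_score, precision_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc = accuracy_score(y_true, y_pred)                      # 4 of 6 predictions are correct
micro = precision_score(y_true, y_pred, average="micro")  # pooled counts: also 4 / 6
macro = precision_score(y_true, y_pred, average="macro")  # mean of per-class 1/2, 2/3 and 1
print(acc, micro, macro)
```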

7. Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance

```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.1, 1, 10, 100, 1000],                  # Regularization parameter
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'], # Type of kernel function
    'degree': [2, 3, 4, 5],                        # Degree for polynomial kernel (if kernel='poly')
    'gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1, 10] # Kernel coefficient for 'rbf', 'poly', and 'sigmoid'
}

svc_1 = SVC()

# 5-fold cross-validated search over every combination in param_grid
svc_grid = GridSearchCV(svc_1, param_grid=param_grid, cv=5)
svc_grid.fit(X_train_scaled, y_train_encoded)
```
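The grid above contains 5 × 4 × 4 × 7 = 560 combinations (2,800 fits with `cv=5`), although `degree` only affects the `'poly'` kernel and `gamma` is ignored by `'linear'`. `GridSearchCV` also accepts a list of dicts, so each kernel can be searched only over the parameters it uses. The sketch below shows the pattern on scikit-learn's bundled Iris data (a stand-in, since the penguins data requires a download) with a deliberately smaller grid; wrapping `StandardScaler` and `SVC` in a pipeline also keeps scaling inside each CV fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# make_pipeline names the SVC step "svc", hence the "svc__" prefixes
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["poly"], "svc__C": [0.1, 1, 10],
     "svc__degree": [2, 3], "svc__gamma": ["scale", 0.1]},
    {"svc__kernel": ["rbf", "sigmoid"], "svc__C": [0.1, 1, 10],
     "svc__gamma": ["scale", 0.1]},
]

search = GridSearchCV(make_pipeline(StandardScaler(), SVC()), param_grid, cv=5)
search.fit(X, y)
print(len(search.cv_results_["params"]), "candidates")
print(search.best_params_)
```

Here only 3 + 12 + 12 = 27 candidates are evaluated, and `best_params_` no longer reports a `degree` for kernels that ignore it.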

```python
svc_grid.best_estimator_
```

```python
svc_grid.best_params_
```

```
{'C': 10, 'degree': 2, 'gamma': 'scale', 'kernel': 'rbf'}
```

```python
svc_grid.best_score_
```

```
0.9960000000000001
```

8. Train the tuned classifier on the entire dataset

With the default `refit=True`, `GridSearchCV` has already refit `best_estimator_` with the best parameters on the whole training split, so it can be used directly. The cell below evaluates that model on the held-out test set.

```python
best_svc = svc_grid.best_estimator_

y_pred = best_svc.predict(X_test_scaled)

acc_score = accuracy_score(y_test_encoded, y_pred)
precision = precision_score(y_test_encoded, y_pred, average='micro')
recall = recall_score(y_test_encoded, y_pred, average='micro')
f1 = f1_score(y_test_encoded, y_pred, average='micro')

print(f"Accuracy score is {acc_score}")
print(f"Precision score is {precision}")
print(f"Recall score is {recall}")
print(f"F1 score is {f1}")
```

```
Accuracy score is 1.0
Precision score is 1.0
Recall score is 1.0
F1 score is 1.0
```
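The step title asks for training on the entire dataset, whereas `best_svc` was fitted on the training split only. One common pattern is to evaluate on the held-out split first and then refit a fresh, unfitted copy (via `sklearn.base.clone`) on all rows for the final model. The sketch below uses scikit-learn's bundled Iris data as a stand-in for the encoded penguins data and bundles `StandardScaler` with `SVC` in a pipeline so the scaler is refit as well; the hyperparameters are taken from the `best_params_` shown above purely for illustration.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Tuned settings (illustrative: C=10, gamma='scale', kernel='rbf')
tuned = make_pipeline(StandardScaler(), SVC(C=10, gamma="scale", kernel="rbf"))

# 1) Estimate generalization on the held-out split
tuned.fit(X_train, y_train)
print("held-out accuracy:", tuned.score(X_test, y_test))

# 2) Refit an unfitted copy on every row for the final model
final_model = clone(tuned).fit(X, y)
```

After step 2 there is no untouched data left, so the held-out score from step 1 is the performance estimate to report.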


9. Save the trained classifier to a file for future use.

```python
import pickle

# Save the model to a file
with open("svc_model.pkl", 'wb') as file:
    pickle.dump(best_svc, file)
```
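The pickled `best_svc` expects inputs that are already label-encoded and standardized, so the fitted encoders and `StandardScaler` also have to be saved to reuse the model later. A minimal sketch of one alternative is to save a single pipeline containing both the scaler and the classifier, load it back, and check that predictions match. It uses Iris data and a temporary directory as stand-ins (the file name is hypothetical). Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
model = make_pipeline(StandardScaler(), SVC(C=10, kernel="rbf")).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svc_pipeline.pkl")  # hypothetical file name

    # Save scaler + classifier together
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Load it back; raw (unscaled) features can be passed straight in
    with open(path, "rb") as f:
        restored = pickle.load(f)

print("predictions match:", (restored.predict(X) == model.predict(X)).all())
```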