## Assignment 61 - 07 April 2023 Divya

**Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?**


Polynomial functions and kernel functions are both mathematical tools used in machine learning, particularly in the context of support vector machines (SVMs) and kernelized methods. Let's explore their relationship:

1. **Polynomial Functions:**
   - Polynomial functions are mathematical functions that involve variables raised to whole number powers and multiplied by coefficients.
   - In the context of machine learning, polynomial functions are often used as the basis functions for feature transformation in polynomial regression or kernelized SVMs.
   - Polynomial regression involves fitting a polynomial function to the data, allowing for more complex relationships to be captured compared to linear regression.

2. **Kernel Functions:**
   - Kernel functions are used in machine learning algorithms, especially in support vector machines (SVMs), to transform the input data into a higher-dimensional space without explicitly computing the transformed feature vectors.
   - The kernel trick is a way to implicitly map input data into a higher-dimensional space, allowing linear algorithms to effectively operate in a nonlinear feature space.
   - Common kernel functions include the linear kernel, polynomial kernel, radial basis function (RBF) kernel, and others.

3. **Relationship:**
   - The polynomial kernel is a specific type of kernel function used in SVMs that is based on polynomial functions. It allows SVMs to capture non-linear relationships between features.
   - The polynomial kernel is defined as $K(x, y) = (x \cdot y + c)^d$, where $x$ and $y$ are input feature vectors, $c$ is a constant, and $d$ is the degree of the polynomial.
   - In other words, the polynomial kernel takes the inner product of the two input vectors, adds the constant $c$, and raises the result to the power $d$. For $c > 0$ this equals an inner product in the space of all monomials of the input features up to degree $d$, without ever constructing that space.

4. **Usage in SVMs:**
   - In SVMs, the choice of kernel function, including the polynomial kernel, is crucial for capturing complex relationships in the data.
   - The polynomial kernel allows SVMs to model non-linear decision boundaries by implicitly transforming the input features into a higher-dimensional space.

In summary, the relationship between polynomial functions and kernel functions in machine learning lies in the fact that the polynomial kernel is a type of kernel function that uses polynomial functions to implicitly map data into a higher-dimensional space, enabling SVMs to capture non-linear patterns in the data.
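
To make the implicit mapping concrete, the sketch below checks numerically that the degree-2 polynomial kernel on 2-D inputs equals an ordinary inner product after an explicit feature map. The helper names `poly_kernel` and `phi` are illustrative and do not come from any library.

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def phi(x, c=1.0):
    """Explicit feature map for d=2 and 2-D input (illustrative helper)."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, y)        # computed in the original 2-D space
k_explicit = np.dot(phi(x), phi(y))   # computed in the 6-D feature space
print(k_implicit, k_explicit)         # the two values agree up to rounding
```

The kernel needs only one dot product in two dimensions, while the explicit route builds six features per point. That gap grows quickly with the input dimension and the degree, which is why the kernel trick matters.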

**Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?**

Implementing an SVM with a polynomial kernel in Python using Scikit-learn is straightforward. Scikit-learn provides the `SVC` (Support Vector Classification) class, which allows you to specify the kernel type, including the polynomial kernel. Here's an example:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a sample dataset (e.g., the Iris dataset)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (important for SVMs)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create an SVM classifier with a polynomial kernel
# You can adjust the degree parameter for the polynomial kernel
# For example, degree=3 represents a cubic polynomial kernel
svm_classifier = SVC(kernel='poly', degree=3, C=1.0, gamma='scale')
# C is the regularization parameter, and gamma is the kernel coefficient
# ('scale' uses 1 / (n_features * X.var()); 'auto' uses 1 / n_features)

# Train the SVM classifier
svm_classifier.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = svm_classifier.predict(X_test_scaled)

# Evaluate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```

In this example, we use the Iris dataset, split it into training and testing sets, standardize the features (important for SVMs), and then create an SVM classifier with a polynomial kernel using the `SVC` class. The `degree` parameter in the `SVC` class is used to specify the degree of the polynomial kernel.

You can adjust the `degree`, `C`, and other parameters based on the characteristics of your dataset and the complexity of the relationships you want the SVM to capture. Experimentation with these parameters is often necessary to find the best model for your specific problem.
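
As a complement to the single train/test split above, the following sketch compares a few polynomial degrees with 5-fold cross-validation. Wrapping the scaler and the classifier in a pipeline keeps the scaling inside each fold. The degrees tried and the `coef0=1.0` setting (Scikit-learn's name for the constant $c$ in the kernel) are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

mean_scores = {}
for degree in (1, 2, 3, 4):
    # The scaler is re-fitted on the training part of every fold
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel="poly", degree=degree, coef0=1.0, C=1.0, gamma="scale"),
    )
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[degree] = scores.mean()
    print(f"degree={degree}: mean CV accuracy = {mean_scores[degree]:.3f}")
```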

**Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?**

In Support Vector Regression (SVR), the parameter epsilon (denoted as ε) defines a tube around the predicted values within which errors incur no penalty. Two parameters shape the SVR loss: ε, which sets the width of the ε-insensitive tube, and C, which sets how heavily points outside the tube are penalized.

- **ε (epsilon):** The half-width of the ε-insensitive tube. It determines the margin within which errors are not penalized. Any prediction within ε of the true target incurs no loss.

- **C:** The regularization parameter, which controls the trade-off between a flat (simple) regression function and penalizing points that fall outside the tube. Higher values of C penalize those deviations more heavily, pushing the model to fit the training data more closely at the risk of overfitting.

Now, let's address your question about the effect of increasing the value of epsilon on the number of support vectors in SVR:

1. **Increasing Epsilon:**
   - When you increase the value of epsilon, you are essentially widening the ε-insensitive tube. This means that SVR becomes more tolerant to errors within this wider margin.
   - As epsilon increases, the model becomes less sensitive to small deviations between the predicted values and the actual targets. This results in a larger margin where errors are not penalized.

2. **Effect on Support Vectors:**
   - Increasing epsilon generally leads to a decrease in the number of support vectors. A wider ε-insensitive tube lets more data points fall strictly inside it, and those points incur no penalty and play no role in defining the regression function.
   - Support vectors are the data points that lie on the boundary of the ε-insensitive tube or outside it. When epsilon is larger, fewer points reach or cross that boundary, so the solution becomes sparser.

3. **Impact on Model Complexity:**
   - A larger epsilon makes the model more tolerant to errors and typically simpler, but may lead to a less precise fit to the training data. It tends to produce a smoother, flatter regression function.

In summary, increasing the value of epsilon in SVR widens the ε-insensitive tube, making the model more tolerant to errors. This, in turn, tends to decrease the number of support vectors, because more data points fall strictly inside the tube, where they incur no penalty and do not influence the fit. The choice of epsilon should reflect how much error is acceptable for the specific problem at hand.
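
A small experiment on synthetic data makes it easy to check how ε relates to the number of support vectors. In Scikit-learn's `SVR`, training points strictly inside the ε-tube receive zero dual coefficients and are not stored as support vectors, so the count reported through `support_` typically falls as ε grows. The noisy sine data and the specific ε values below are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine wave with Gaussian noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

sv_counts = {}
for eps in (0.01, 0.1, 0.5, 1.0):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points kept as support vectors
    sv_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X)}")
```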

**Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?**

Support Vector Regression (SVR) is a powerful technique for regression tasks in machine learning. The performance of SVR is heavily influenced by its various parameters. Let's discuss the key parameters and their impact on SVR:

1. **Kernel Function:**
   - **Role:** The kernel function determines the shape of the regression function that SVR can fit, by defining how similarity between inputs is measured.
   - **Options:**
     - **Linear Kernel (`'linear'`):** Fits a linear function of the inputs.
     - **Polynomial Kernel (`'poly'`):** Introduces non-linearity, and you can control the degree of the polynomial with the `degree` parameter.
     - **Radial Basis Function Kernel (`'rbf'`, also called the Gaussian kernel):** Introduces non-linearity and is often suitable for capturing complex relationships. In Scikit-learn the accepted string is `'rbf'`.

   - **Example:** Choose a polynomial kernel when the relationship between inputs and outputs is expected to be polynomial, or use an RBF kernel for a more flexible, non-linear fit.

2. **C Parameter:**
   - **Role:** The C parameter controls the trade-off between keeping the regression function flat (simple) and penalizing training points that fall outside the ε-tube. A smaller C means stronger regularization: the model tolerates more and larger deviations in exchange for a smoother function.
   - **Impact:** Higher C values penalize deviations outside the tube more heavily, so the model follows the training data more closely and becomes more prone to overfitting noise.

   - **Example:** If the dataset has noise or outliers, a smaller C may be preferred to obtain a more robust model. If a precise fit to the training data is desired and the data is not noisy, a higher C may be chosen.

3. **Epsilon Parameter (ε):**
   - **Role:** Epsilon defines the width of the ε-insensitive tube, within which errors are not penalized.
   - **Impact:** Larger epsilon values increase the tolerance for errors, resulting in a wider tube and typically fewer support vectors, since points strictly inside the tube do not become support vectors.

   - **Example:** Use a larger epsilon when a certain amount of error is acceptable in the predictions, and a wider margin is desired. Smaller epsilon values make the model less tolerant to errors.

4. **Gamma Parameter:**
   - **Role:** Gamma defines how far the influence of a single training example reaches. Low values mean a far reach, and high values mean a closer reach.
   - **Impact:** A small gamma value leads to a smoother fitted function, while a large gamma value makes the fit more strongly influenced by individual data points. In Scikit-learn, gamma applies to the `'rbf'`, `'poly'`, and `'sigmoid'` kernels and is ignored by the linear kernel.

   - **Example:** Use a small gamma when the target varies slowly across the input space or the data is noisy. Use a larger gamma when the target changes quickly over short distances and there is enough data to support a more local fit; very large values tend to overfit.

In summary, the choice of kernel function, C parameter, epsilon parameter, and gamma parameter in SVR depends on the characteristics of the dataset and the desired trade-off between model flexibility and precision. It often involves experimentation and tuning to find the values that result in the best performance for a specific regression task. Regularization (C), tube width (epsilon), and the shape of the fitted function (kernel function and gamma) are critical aspects to consider when fine-tuning an SVR model.
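
In practice these parameters interact, so they are usually tuned together. The sketch below runs a small grid search over `kernel`, `C`, `epsilon`, and `gamma` for an `SVR` inside a scaling pipeline. The synthetic data and the grid values are placeholders, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (illustration only)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=150)

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])

param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": ["scale", 0.1, 1.0],  # ignored by the linear kernel
}

# GridSearchCV maximizes the score, so MSE is passed in negated form
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV MSE:", -search.best_score_)
```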

**Question 5: Assignment**
- Dataset : survey_lung_cancer
- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

### Importing Important Libraries

In [1]:
!pip install feature-engine




In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap

# From scikit-learn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

# From feature-engine
from feature_engine.encoding import OneHotEncoder

# From mlxtend
from mlxtend.plotting import plot_decision_regions

import warnings
warnings.filterwarnings('ignore')

### Importing the dataset

In [3]:
# Load the lung cancer survey dataset
df = pd.read_csv('survey_lung_cancer.csv')

In [4]:
df.head()

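| | GENDER | AGE | SMOKING | YELLOW_FINGERS | ANXIETY | PEER_PRESSURE | CHRONIC DISEASE | FATIGUE | ALLERGY | WHEEZING | ALCOHOL CONSUMING | COUGHING | SHORTNESS OF BREATH | SWALLOWING DIFFICULTY | CHEST PAIN | LUNG_CANCER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 69 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | YES |
| 1 | M | 74 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | YES |
| 2 | F | 59 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | NO |
| 3 | M | 63 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 2 | NO |
| 4 | F | 63 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | NO |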


In [5]:
df.shape

(309, 16)

### Feature Engineering

In [6]:
encoder = OneHotEncoder(
    top_categories=2,
    variables= ['GENDER', 'LUNG_CANCER'],
    ignore_format=True,
    )

# fit the encoder
df = encoder.fit_transform(df)

In [7]:
df.columns

Index(['AGE', 'SMOKING', 'YELLOW_FINGERS', 'ANXIETY', 'PEER_PRESSURE',
       'CHRONIC DISEASE', 'FATIGUE ', 'ALLERGY ', 'WHEEZING',
       'ALCOHOL CONSUMING', 'COUGHING', 'SHORTNESS OF BREATH',
       'SWALLOWING DIFFICULTY', 'CHEST PAIN', 'GENDER_M', 'GENDER_F',
       'LUNG_CANCER_YES', 'LUNG_CANCER_NO'],
      dtype='object')

In [8]:
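# Preview without the redundant dummy columns. The result is not assigned,
# so df itself still contains GENDER_F and LUNG_CANCER_NO.
df.drop(columns=['GENDER_F', 'LUNG_CANCER_NO']).head()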

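| | AGE | SMOKING | YELLOW_FINGERS | ANXIETY | PEER_PRESSURE | CHRONIC DISEASE | FATIGUE | ALLERGY | WHEEZING | ALCOHOL CONSUMING | COUGHING | SHORTNESS OF BREATH | SWALLOWING DIFFICULTY | CHEST PAIN | GENDER_M | LUNG_CANCER_YES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 |
| 1 | 74 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 |
| 2 | 59 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | 0 | 0 |
| 3 | 63 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 0 |
| 4 | 63 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 0 | 0 |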


In [9]:
#Check data types in df

numerical_features = [feature for feature in df.columns if df[feature].dtypes != 'O']

discrete_features = [feature for feature in numerical_features if len(df[feature].unique())<25]

continuous_features = [feature for feature in numerical_features if feature not in discrete_features]

categorical_features = [feature for feature in df.columns if feature not in numerical_features]

binary_categorical_features = [feature for feature in categorical_features if len(df[feature].unique()) <=3]

print(f"Numerical Features Count {len(numerical_features)}")
print(f"Discrete features Count {len(discrete_features)}")
print(f"Continuous features Count {len(continuous_features)}")
print(f"Categorical features Count {len(categorical_features)}")
print(f"Binary Categorical features Count {len(binary_categorical_features)}")

Numerical Features Count 18
Discrete features Count 17
Continuous features Count 1
Categorical features Count 0
Binary Categorical features Count 0


### Splitting the dataset into the Training set and Test set

In [10]:
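# Note: df still has all 18 columns here, so the last column (used as y) is
# LUNG_CANCER_NO, while X still contains LUNG_CANCER_YES, its exact complement.
# This leaks the label into the features and most likely explains the
# perfect scores reported further down.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]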

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
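
Selecting columns by position depends on the column order left by earlier cells. One way to keep the target out of the feature matrix is to select it by name instead. The sketch below uses a tiny made-up frame with a few of the encoded column names from this notebook (the values are random, not the real survey data). It also stratifies the split so that both classes appear in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the encoded DataFrame (random values, illustration only)
rng = np.random.default_rng(0)
n = 100
gender_m = rng.integers(0, 2, size=n)
cancer_yes = rng.integers(0, 2, size=n)
toy_df = pd.DataFrame({
    "AGE": rng.integers(30, 80, size=n),
    "SMOKING": rng.integers(1, 3, size=n),
    "GENDER_M": gender_m,
    "GENDER_F": 1 - gender_m,
    "LUNG_CANCER_YES": cancer_yes,
    "LUNG_CANCER_NO": 1 - cancer_yes,
})

target = "LUNG_CANCER_YES"
# Drop the target, its complement, and the redundant gender dummy from X
X = toy_df.drop(columns=[target, "LUNG_CANCER_NO", "GENDER_F"])
y = toy_df[target]

# stratify=y keeps the class proportions similar in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X.columns.tolist())
print(y_test.value_counts())
```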

### Feature Scaling

In [12]:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

### Training the SVM model on the training set

In [13]:
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)

### Evaluating the performance of the classifier

In [14]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

y_pred = classifier.predict(X_test)

In [15]:
# Classification Report
print("Classification Report:")
print(classification_report(y_test, y_pred))

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        60
           1       1.00      1.00      1.00         2

    accuracy                           1.00        62
   macro avg       1.00      1.00      1.00        62
weighted avg       1.00      1.00      1.00        62



In [16]:
# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[60  0]
 [ 0  2]]


In [17]:
# accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Test Set Accuracy:", accuracy)


Test Set Accuracy: 1.0
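
With a test set split 60 to 2 between the classes, accuracy alone says little, because a model that always predicts the majority class already scores highly. The sketch below illustrates this with a majority-class baseline on synthetic labels that mirror the 60/2 split in the report above. The feature array is a dummy placeholder, since this baseline ignores the features.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Synthetic labels with the same 60/2 class split as the test set above
y_true = np.array([0] * 60 + [1] * 2)
X_dummy = np.zeros((len(y_true), 1))  # features are irrelevant for this baseline

# Always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
y_base = baseline.predict(X_dummy)

acc = accuracy_score(y_true, y_base)
bal_acc = balanced_accuracy_score(y_true, y_base)
print(f"Baseline accuracy: {acc:.3f}")
print(f"Baseline balanced accuracy: {bal_acc:.3f}")
```

Metrics that average over classes, such as balanced accuracy or the macro-averaged F1-score, make the difference between a useful classifier and this baseline easier to see.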


### Tune the hyperparameters

In [18]:
parameters = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly'], 'gamma': ['scale', 'auto']}
svc = SVC()
grid_search = GridSearchCV(svc, parameters, cv=3)
grid_search.fit(X_train, y_train)

In [19]:
# Get the best hyperparameters
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

Best Hyperparameters: {'C': 0.1, 'gamma': 'scale', 'kernel': 'linear'}


In [20]:
# Train the svc model with the best hyperparameters
best_svc_model = SVC(**best_params)
best_svc_model.fit(sc.transform(X), y)

In [21]:
import joblib  # for saving the trained model

# Save the trained classifier to a file for future use
joblib.dump(best_svc_model, 'tuned_svm_classifier_model.pkl')

['tuned_svm_classifier_model.pkl']
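
An alternative worth considering is to save the scaler together with the classifier, so that new data can be passed to the loaded model without re-creating the preprocessing by hand. A minimal sketch, assuming synthetic data from `make_classification` and a temporary file path in place of the real dataset and filename:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset (illustration only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Scaler and classifier are fitted, saved, and loaded as one object
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=0.1))
model.fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "svm_pipeline.pkl")
joblib.dump(model, path)

# The loaded pipeline accepts raw, unscaled features
loaded = joblib.load(path)
same = np.array_equal(loaded.predict(X), model.predict(X))
print("Loaded model reproduces predictions:", same)
```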