# Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

In machine learning, especially in support vector machines (SVMs) and other kernel methods, kernel functions let an algorithm work as if the input data had been mapped into a higher-dimensional space. The primary goal of this transformation is to make the data linearly separable (or closer to it), even if it was not in its original form. Polynomial functions and kernel functions are related because one common kernel, the polynomial kernel, uses a polynomial function to define this mapping.

**1. Polynomial Functions:** These are mathematical functions built from terms with non-negative integer powers (degrees). A polynomial of degree $d$ in one variable $x$ can be represented as:

$$p(x) = a_d x^d + a_{d-1} x^{d-1} + \dots + a_1 x + a_0, \qquad a_d \neq 0$$

**2. Kernel Functions:** In machine learning, a kernel function computes the dot product between the images of two vectors under some transformation $\phi$. Essentially, it computes:

$$K(x, y) = \langle \phi(x), \phi(y) \rangle$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product. The key advantage of kernel functions is that they compute this dot product in the transformed space without explicitly calculating the transformation $\phi$.

**3. Polynomial Kernel:** This is a kernel function based on polynomial functions. The polynomial kernel of degree $d$ is given by:

$$K(x, y) = (x \cdot y + c)^d$$

where $x$ and $y$ are vectors, $\cdot$ represents the dot product, $c$ is a constant (often set to 1), and $d$ is the degree of the polynomial.
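To make the kernel trick concrete, here is a small sketch for 2-D inputs with $d = 2$ and $c = 1$ (the vectors and the helper `phi` are hypothetical, chosen only for illustration). It checks that $(x \cdot y + c)^2$ equals an ordinary dot product after the explicit 6-dimensional feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2, \sqrt{2c}\,x_1, \sqrt{2c}\,x_2, c)$, and that scikit-learn's `polynomial_kernel` gives the same value when `gamma=1.0` and `coef0=c`.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(v, c=1.0):
    """Explicit degree-2 feature map for a 2-D vector v, matching (x . y + c) ** 2."""
    x1, x2 = v
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
c = 1.0

# Kernel trick: evaluate (x . y + c)^2 directly in the original 2-D space
k_direct = (x @ y + c) ** 2

# Same quantity via an explicit dot product in the 6-D feature space
k_explicit = phi(x, c) @ phi(y, c)

# scikit-learn computes (gamma * <x, y> + coef0) ** degree;
# gamma=1.0 and coef0=c reproduce the formula above
k_sklearn = polynomial_kernel(
    x.reshape(1, -1), y.reshape(1, -1), degree=2, gamma=1.0, coef0=c
)[0, 0]

print(k_direct, k_explicit, k_sklearn)
```

All three values should agree (here they come out as 4, up to floating-point rounding). The kernel needs only one 2-D dot product, while the explicit route builds 6-D vectors; that gap grows quickly with the input dimension and the degree.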

In summary, polynomial functions provide one specific form of transformation in kernel methods. When data is not linearly separable in its original space, a polynomial kernel implicitly maps it to a higher-dimensional space of polynomial features (products of the original features up to degree $d$), where it may become linearly separable. This allows algorithms such as SVMs to find a separating hyperplane in that space, which corresponds to a non-linear decision boundary in the original space.
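A quick way to see this effect is a toy dataset that is not linearly separable. The sketch below uses scikit-learn's `make_circles` (the sample size, noise level, and `coef0=1.0` are arbitrary illustrative choices) and compares the training accuracy of a linear SVM with a degree-2 polynomial SVM.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in the original 2-D space separates them
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

# A degree-2 polynomial kernel implicitly adds x1^2, x2^2 and x1*x2 terms,
# so the circular boundary becomes a hyperplane in the feature space
poly_svm = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0).fit(X, y)

lin_acc = linear_svm.score(X, y)
poly_acc = poly_svm.score(X, y)
print(f"Linear kernel training accuracy:     {lin_acc:.2f}")
print(f"Polynomial kernel training accuracy: {poly_acc:.2f}")
```

The linear kernel should stay near chance level, while the polynomial kernel should separate the rings almost perfectly; the exact scores depend on the noise and the random seed.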

# Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

Implementing an SVM with a polynomial kernel using Scikit-learn is straightforward.

## 1. Import necessary libraries:

In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

## 2. Load a dataset:
**We use the well-known Iris dataset.**

In [4]:
iris = datasets.load_iris()
X = iris.data
y = iris.target

## 3. Split the dataset into training and test sets:

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## 4. Create an SVM classifier with a polynomial kernel:

In [6]:
# For this example, we'll use a polynomial of degree 3.

svm_poly = SVC(kernel='poly', degree=3, C=1, gamma='scale')

*    **`kernel='poly'`**: This specifies that we want to use the polynomial kernel.
*    **`degree=3`**: This sets the degree of the polynomial kernel.
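*    **`C=1`**: The regularization parameter. A smaller value of **`C`** tolerates more margin violations, giving a wider margin and possibly more misclassifications on the training data. A larger value of **`C`** penalizes violations more heavily, giving a narrower margin and a higher risk of overfitting.
*    **`gamma='scale'`**: The kernel coefficient. With `gamma='scale'`, it is calculated as `1 / (n_features * X.var())` from the training data `X`.
*    **`coef0`** (left at its default of `0` here): The constant term in scikit-learn's polynomial kernel, $K(x, y) = (\gamma \, x \cdot y + \text{coef0})^{d}$. With the default, this corresponds to $c = 0$ in the polynomial kernel formula from Q1.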

## 5. Train the model: 

In [7]:
svm_poly.fit(X_train, y_train)

## 6. Predict on the test set: 

In [8]:
y_pred = svm_poly.predict(X_test)

## 7. Evaluate the model:

In [9]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 100.00%
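To confirm what `kernel='poly'` computes, the sketch below (a verification aid, not part of the original walkthrough) rebuilds the same kernel matrix with `sklearn.metrics.pairwise.polynomial_kernel`, using the formula $(\gamma \, x \cdot y + \text{coef0})^{d}$ with `coef0=0` and `gamma` reproduced from the `'scale'` rule, and trains an `SVC` on it via `kernel='precomputed'`. If the formula is right, both models should make the same predictions.

```python
from sklearn import datasets
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Built-in polynomial kernel, as in the steps above (coef0 defaults to 0)
svm_poly = SVC(kernel="poly", degree=3, C=1, gamma="scale").fit(X_train, y_train)

# Reproduce gamma='scale' by hand: 1 / (n_features * X.var())
gamma = 1.0 / (X_train.shape[1] * X_train.var())

# Same kernel computed explicitly: (gamma * <x, y> + coef0) ** degree
K_train = polynomial_kernel(X_train, X_train, degree=3, gamma=gamma, coef0=0)
K_test = polynomial_kernel(X_test, X_train, degree=3, gamma=gamma, coef0=0)  # rows: test, cols: train
svm_pre = SVC(kernel="precomputed", C=1).fit(K_train, y_train)

pred_builtin = svm_poly.predict(X_test)
pred_precomputed = svm_pre.predict(K_test)
print("Predictions agree:", (pred_builtin == pred_precomputed).all())
```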


# Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Support Vector Regression (SVR) is an adaptation of the Support Vector Machine (SVM) for regression problems. Instead of separating two classes, SVR looks for a function (a hyperplane in the feature space) that keeps as many data points as possible within a distance ε of its predictions while staying as flat as possible; only deviations larger than ε are penalized.

The parameter ε (epsilon) in SVR defines a margin of tolerance within which errors are not penalized. Specifically, a training point whose residual is smaller than ε lies inside the ε-insensitive zone and contributes nothing to the loss. Such points do not affect the fitted model and are not support vectors; only points on or outside the boundary of this zone become support vectors.

Here is how changing the value of ε affects the number of support vectors in SVR:

**1. Increase in ε:** As ε increases, the ε-insensitive zone (the tube around the regression function) becomes wider. More data points fall inside this zone and are not counted as support vectors, so the number of support vectors typically decreases.

**2. Decrease in ε:** Conversely, as ε decreases, the ε-insensitive zone becomes narrower. Fewer data points fall inside it, so the number of support vectors typically increases.
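A minimal sketch of this behaviour, assuming a synthetic noisy sine curve as data (the data, `C=1.0`, and the ε values are illustrative choices): fit an RBF `SVR` for several ε values and count the support vectors through the fitted model's `support_` attribute.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sine curve (illustrative only)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

epsilons = [0.01, 0.1, 0.3, 0.5, 1.0]
n_support = []
for eps in epsilons:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points that became support vectors
    n_support.append(len(model.support_))
    print(f"epsilon={eps:<5} support vectors: {n_support[-1]}")
```

With a tiny ε almost every point lies outside the tube and becomes a support vector; with ε close to the amplitude of the data only a handful remain.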

# Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

**1. Kernel Function:**

*    **How it works:** The kernel function is responsible for transforming the input data into a higher-dimensional space. By doing this, the data might become more easily fit by a hyperplane, even if it seems non-linear in the original space.
*    **Effect on performance:** The choice of kernel can have a significant impact. For instance, a linear kernel assumes a linear relationship between the inputs and the output, while a polynomial or RBF kernel can capture more complex relationships.

**Examples:**

*    **Linear kernel:** Use when the relationship between the input features and the output seems linear. It's computationally less intensive.
*    **Polynomial kernel:** Use when the data has a polynomial relationship. Be cautious with higher degrees as they can lead to overfitting.
*    **RBF (Radial Basis Function) kernel:** A good default choice for many problems as it can capture a wide range of relationships. It's non-linear and can model complex data structures.
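The sketch below compares the three kernels with 5-fold cross-validated R² on a synthetic noisy sine curve (the data, `degree=3`, and `coef0=1.0` are illustrative assumptions). Features are standardized inside a pipeline, which keeps the polynomial kernel's values in a reasonable range. On this kind of data the RBF kernel would be expected to beat the linear one, though the exact numbers depend on the data.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic non-linear regression data: a noisy sine curve (illustrative only)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# Shuffle the folds: X is sorted, so unshuffled folds would only test extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)

kernels = {
    "linear": SVR(kernel="linear"),
    "poly (degree 3)": SVR(kernel="poly", degree=3, coef0=1.0),
    "rbf": SVR(kernel="rbf"),
}
scores = {}
for name, svr in kernels.items():
    model = make_pipeline(StandardScaler(), svr)
    # cross_val_score uses the regressor's default score, which is R^2
    scores[name] = cross_val_score(model, X, y, cv=cv).mean()
    print(f"{name:<16} mean R^2 = {scores[name]:.3f}")
```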

**2. C Parameter (Regularization parameter):**

*    **How it works:** The C parameter trades off the flatness of the regression function (a simpler model with a larger margin) against how heavily deviations larger than ε are penalized. In simple terms, a smaller C tolerates more such errors, while a larger C penalizes them more heavily.
*    **Effect on performance:** A small C may underfit the data, while a large C might overfit.
 
**Examples:**

* **Increase C**: When the model is too simple and has high bias (underfitting).
*  **Decrease C**: When the model is too complex and has high variance (overfitting).
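A validation curve is one way to see this trade-off. The sketch below (the synthetic data and the `C` grid are illustrative assumptions) records the mean training and validation R² of an RBF `SVR` for several `C` values. Training R² should generally rise with `C`; a widening gap between training and validation scores is a sign of overfitting, while low scores on both sides point to underfitting.

```python
import numpy as np
from sklearn.model_selection import KFold, validation_curve
from sklearn.svm import SVR

# Synthetic noisy sine curve (illustrative only)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

C_range = [0.01, 0.1, 1, 10, 100]
train_scores, val_scores = validation_curve(
    SVR(kernel="rbf", epsilon=0.1),
    X,
    y,
    param_name="C",
    param_range=C_range,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
# Each array has shape (len(C_range), n_folds); average over folds
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for C, tr, va in zip(C_range, train_mean, val_mean):
    print(f"C={C:<6} train R^2={tr:.3f}  validation R^2={va:.3f}")
```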

**3. Epsilon Parameter (ε):**

*    **How it works**: Defines the ε-insensitive zone. Errors within this zone are not penalized. It essentially controls the width of the "tube" around the regression line.
*    **Effect on performance**: A larger ε can lead to fewer support vectors and a simpler model. However, it might miss out on finer variations in the data.

**Examples:**
*   **Increase ε**: When you want a simpler model and are okay with tolerating larger errors.
*   **Decrease ε**: When you want the model to be more sensitive to errors and capture finer variations.

**4. Gamma Parameter:**

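*   **How it works**: For RBF, polynomial, and sigmoid kernels, γ is the kernel coefficient that scales the inner products or distances inside the kernel. For the RBF kernel, $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$, it defines how far the influence of a single training example reaches: low values mean 'far' and high values mean 'close'.
*    **Effect on performance**: A small γ produces a smoother, more constrained model (risk of underfitting), while a large γ produces a more flexible model that can fit noise (risk of overfitting).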

**Examples:**
*   **Increase γ**: When the model is underfitting and you want to capture more complexity.
*   **Decrease γ**: When the model is overfitting and you want it to generalize better.
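The same validation-curve idea applies to γ. In this sketch (synthetic data and the γ grid are illustrative assumptions), a very small γ makes the RBF kernel almost constant over the input range, so the model behaves nearly linearly, while a large γ lets it follow individual points.

```python
import numpy as np
from sklearn.model_selection import KFold, validation_curve
from sklearn.svm import SVR

# Synthetic noisy sine curve (illustrative only)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

gamma_range = [0.01, 0.1, 1, 10, 100]
train_scores, val_scores = validation_curve(
    SVR(kernel="rbf", C=1.0, epsilon=0.1),
    X,
    y,
    param_name="gamma",
    param_range=gamma_range,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for g, tr, va in zip(gamma_range, train_mean, val_mean):
    print(f"gamma={g:<6} train R^2={tr:.3f}  validation R^2={va:.3f}")
```

Low training and validation scores at small γ indicate underfitting (increase γ); a high training score with a noticeably lower validation score at large γ indicates overfitting (decrease γ).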

# Q5. Assignment:

* Import the necessary libraries and load the dataset

* Split the dataset into training and testing sets

* Preprocess the data using any technique of your choice (e.g. scaling, normalization)

* Create an instance of the SVC classifier and train it on the training data

* Use the trained classifier to predict the labels of the testing data

* Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy,
  precision, recall, F1-score)
  
* Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to
  improve its performance
  
* Train the tuned classifier on the entire dataset

* Save the trained classifier to a file for future use.

**NOTE**: You can use any dataset of your choice for this assignment, but make sure it is suitable for
classification and has a sufficient number of features and samples.

## Step 1: Import the necessary libraries and load the dataset.

In [11]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
import joblib

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Check the shape of the dataset to ensure it's loaded correctly
X.shape, y.shape

((150, 4), (150,))

## Step 2: Split the dataset into a training and testing set.

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Check the shape of the training and testing datasets
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((105, 4), (45, 4), (105,), (45,))

##  Step 3: Preprocess the data using standard scaling.

We use standard scaling so that each feature has a mean of 0 and a standard deviation of 1. The scaler is fit on the training data only and then applied to the test data.

In [14]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Check the mean and standard deviation of the scaled training data for the first feature as an example
X_train_scaled[:, 0].mean(), X_train_scaled[:, 0].std()

(2.5418820487608346e-15, 0.9999999999999999)

The first feature of the scaled training data now has a mean close to 0 and a standard deviation of approximately 1, indicating successful scaling.

## Step 4: Create an instance of the SVC classifier and train it.

In [15]:
# Initialize the SVC classifier (default RBF kernel; random_state only matters when probability=True)
svc = SVC(random_state=42)

# Train the classifier
svc.fit(X_train_scaled, y_train)

# Check if the model is trained by returning its support vectors
svc.support_vectors_[:5]  # Displaying the first 5 support vectors as an example

array([[-1.01631531, -0.02284379, -1.32533157, -1.40568508],
       [-0.89573553,  0.69673574, -1.26695916, -0.99982734],
       [-0.4134164 ,  2.85547435, -1.44207638, -1.40568508],
       [-1.73979401, -0.26270364, -1.44207638, -1.40568508],
       [-1.8603738 , -0.02284379, -1.6171936 , -1.540971  ]])

The SVC classifier has been successfully trained, as indicated by the display of the first five support vectors.

## Step 5: Use the trained classifier to predict the labels of the testing data.

In [16]:
y_pred = svc.predict(X_test_scaled)
y_pred[:10]  # Displaying the first 10 predicted labels as an example

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1])

The model has made predictions for the test set. The first 10 predicted labels are displayed above.

## Step 6: Evaluate the classifier's performance.

In [17]:
# Generate a classification report
classification_rep = classification_report(y_test, y_pred, target_names=iris.target_names)

classification_rep

'              precision    recall  f1-score   support\n\n      setosa       1.00      1.00      1.00        19\n  versicolor       1.00      1.00      1.00        13\n   virginica       1.00      1.00      1.00        13\n\n    accuracy                           1.00        45\n   macro avg       1.00      1.00      1.00        45\nweighted avg       1.00      1.00      1.00        45\n'

**Setosa:**

*    Precision: 1.00
*    Recall: 1.00
*    F1-score: 1.00

**Versicolor:**

*    Precision: 1.00
*    Recall: 1.00
*    F1-score: 1.00

**Virginica:**

*    Precision: 1.00
*    Recall: 1.00
*    F1-score: 1.00

Overall Accuracy: 1.00

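The classifier performed exceptionally well on this test split, achieving an accuracy of 100%. Note that the test set has only 45 samples and the Iris classes are well separated, so perfect scores are common on this dataset.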

## Step 7: Tune the hyperparameters of the SVC classifier using GridSearchCV.

In [18]:
# Define the parameter grid
# (gamma is ignored when kernel='linear', so the linear candidates differ only in C)
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf']
}

# Initialize GridSearchCV
grid_search = GridSearchCV(SVC(), param_grid, refit=True, verbose=2, cv=5)

# Fit GridSearchCV on the scaled training data
grid_search.fit(X_train_scaled, y_train)

# Get the best parameters from the grid search
best_params = grid_search.best_params_
best_params

Fitting 5 folds for each of 32 candidates, totalling 160 fits

{'C': 1, 'gamma': 0.1, 'kernel': 'rbf'}

The best hyperparameters found using GridSearchCV for the SVC classifier are:

*    C: 1
*    γ: 0.1
*    Kernel: 'rbf'
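The assignment also allows `RandomizedSearchCV`. Below is a minimal sketch of that alternative; the log-uniform search ranges and `n_iter=20` are illustrative assumptions, not tuned values. It also places `StandardScaler` inside a `Pipeline`, so scaling is re-fit on each cross-validation training fold instead of once on the whole training set.

```python
from scipy.stats import loguniform
from sklearn import datasets
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# The scaler inside the pipeline is re-fit on each CV training fold
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names steps after the lowercased class name, hence the "svc__" prefix
param_distributions = {
    "svc__C": loguniform(1e-2, 1e2),
    "svc__gamma": loguniform(1e-3, 1e0),
}
search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=20, cv=5, random_state=42
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
print(f"Test accuracy:    {search.score(X_test, y_test):.3f}")
```

Randomized search samples a fixed number of candidates from continuous ranges, which scales better than an exhaustive grid when there are many hyperparameters.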

## Step 8: Train the tuned classifier on the entire dataset.

In [19]:
# Combine the training and testing datasets for training
X_combined = scaler.fit_transform(X)  # Scaling the entire dataset
y_combined = y

# Train the classifier with the best parameters on the entire dataset
svc_tuned = SVC(C=best_params['C'], gamma=best_params['gamma'], kernel=best_params['kernel'], random_state=42)
svc_tuned.fit(X_combined, y_combined)

# Check if the model is trained by returning its support vectors
svc_tuned.support_vectors_[:5]  # Displaying the first 5 support vectors as an example

array([[-0.17367395,  3.09077525, -1.2833891 , -1.05217993],
       [-0.90068117,  0.55861082, -1.16971425, -0.92054774],
       [-1.02184904, -0.13197948, -1.22655167, -1.3154443 ],
       [-1.62768839, -1.74335684, -1.39706395, -1.18381211],
       [-1.26418478, -0.13197948, -1.34022653, -1.18381211]])

The SVC classifier has been successfully trained with the tuned hyperparameters on the entire dataset, as evidenced by the display of the first five support vectors.

## Step 9: Save the trained classifier for future use.

In [27]:
filename = "/Users/MANOJ/Data Scientist/PW Skills course/april/6th to 8th April/svc_tuned_classifier.pkl"
joblib.dump(svc_tuned, filename)

filename

'/Users/MANOJ/Data Scientist/PW Skills course/april/6th to 8th April/svc_tuned_classifier.pkl'
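One practical caveat: `svc_tuned` was trained on standardized features, but only the classifier is saved in Step 9, not the fitted `StandardScaler`, so raw measurements passed to the reloaded model would not be scaled. A minimal sketch of one common remedy, assuming the best parameters found above, is to bundle the scaler and classifier in a scikit-learn `Pipeline` and persist that single object. The file name `svc_tuned_pipeline.joblib` is hypothetical.

```python
import joblib
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hyperparameters taken from the grid-search result above
final_model = make_pipeline(StandardScaler(), SVC(C=1, gamma=0.1, kernel="rbf"))
final_model.fit(X, y)

# Hypothetical relative path; adjust to your own project layout
path = "svc_tuned_pipeline.joblib"
joblib.dump(final_model, path)

# Later (e.g. in another session): load and predict on raw, unscaled measurements
loaded = joblib.load(path)
sample = [[5.1, 3.5, 1.4, 0.2]]  # first Iris sample: sepal/petal length and width in cm
print(iris.target_names[loaded.predict(sample)[0]])
```

Because scaling happens inside the pipeline, the loaded object applies exactly the same transformation that was learned during training.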