### [Q1.] What is the relationship between polynomial functions and kernel functions in machine learning algorithms?
##### [Ans]
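
A polynomial kernel is a kernel function built from a polynomial of the dot product, K(x, z) = (γ·x·z + r)^d. Evaluating it gives the same result as an ordinary dot product taken after mapping both inputs into a space of polynomial feature combinations, but without ever constructing that space (the kernel trick). As a minimal sketch of this equivalence, the example below assumes 2-D inputs with d = 2, γ = 1, r = 0. It uses a hand-written feature map `phi` (a name introduced here purely for illustration) and compares the result with scikit-learn's `polynomial_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (homogeneous case, r = 0)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])


x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Dot product in the explicit polynomial feature space
explicit = phi(x) @ phi(z)

# Same quantity computed directly from the original inputs: (x . z) ** 2
kernel = polynomial_kernel(
    x.reshape(1, -1), z.reshape(1, -1), degree=2, gamma=1.0, coef0=0.0
)[0, 0]

print(explicit, kernel)
```

Both values agree up to floating-point error, which is why an SVM with a polynomial kernel behaves like a linear SVM trained on polynomial features.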



### [Q2.] How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?
##### [Ans]

```python
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load Iris; keep the first two features (sepal length, sepal width) and classes 0 and 1
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
X, y = X[y < 2], y[y < 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# SVM with a polynomial kernel. Note: with degree=1 and the default coef0=0 the
# kernel reduces to a scaled linear kernel; raise `degree` for a curved boundary.
svm = SVC(kernel="poly", C=1, gamma="scale", degree=1)
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"ACCURACY : {accuracy:.2f}")
```


```text
ACCURACY : 1.00
```


### [Q3.] How does increasing the value of epsilon affect the number of support vectors in SVR?
##### [Ans]

In Support Vector Regression (SVR):

- The epsilon (ϵ) parameter sets the half-width of the tolerance tube around the regression function: prediction errors smaller than ϵ incur no loss.
- Only training points that lie on or outside this tube become support vectors.
- Increasing ϵ widens the tube, leading to:
    - Fewer support vectors (as more points fall within the margin of tolerance).
    - A simpler model that may underfit the data.
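
The effect can be checked empirically. The following is a minimal sketch on synthetic data (a noisy sine curve, chosen here as an assumption for illustration), counting support vectors via the fitted model's `support_` attribute for a few values of ϵ:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: sine curve plus Gaussian noise (sd = 0.1)
rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((100, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

n_support = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    n_support[eps] = len(model.support_)
    print(f"epsilon={eps}: {n_support[eps]} support vectors")
```

With ϵ far below the noise level almost every point lies outside the tube, while a wide tube leaves only a small number of support vectors.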

### [Q4.] How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?
##### [Ans]

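**Kernel Function:**
Determines the shape of the regression function that SVR can fit:

- Linear kernel: Best when the target is approximately a linear function of the features.
- Polynomial kernel: Suitable for data with polynomial relationships (feature interactions up to the chosen degree).
- RBF kernel: The most flexible of the three; handles smooth non-linear relationships well.

Example: Use RBF for highly non-linear patterns and a polynomial kernel when the relationship is known to be polynomial.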
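
As a rough illustration, the sketch below fits SVR with each kernel to the same synthetic non-linear target and reports held-out R². The data, the `coef0=1.0` setting for the polynomial kernel, and the variable names are assumptions made for this example only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Non-linear target: noisy sine curve on [0, 5]
rng = np.random.default_rng(0)
X = 5 * rng.random((200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": SVR(kernel="linear"),
    # coef0=1.0 lets the polynomial kernel include lower-order terms as well
    "poly": SVR(kernel="poly", degree=3, coef0=1.0),
    "rbf": SVR(kernel="rbf"),
}

kernel_scores = {}
for name, svr in models.items():
    pipe = make_pipeline(StandardScaler(), svr).fit(X_tr, y_tr)
    kernel_scores[name] = pipe.score(X_te, y_te)  # R^2 on held-out data
    print(f"{name:>6}: R^2 = {kernel_scores[name]:.3f}")
```

On a curve like this the linear kernel cannot follow the shape, so the RBF kernel should score noticeably higher.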

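**C Parameter:**
Controls the trade-off between a flat (simple) regression function and the penalty on training points that deviate from it by more than ϵ.
- Small C: Strong regularization, tolerates larger deviations (risk of underfitting). Decrease C when the data is noisy or contains outliers.
- Large C: Penalizes deviations heavily and fits the training data closely (risk of overfitting). Increase C when the data is clean and the model underfits.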
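
A quick way to see this trade-off is to sweep C and compare training and held-out scores. This is a sketch on synthetic data (a noisy sine curve, an assumption for illustration), not a tuning recipe:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = 5 * rng.random((200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

c_scores = {}
for C in (0.001, 1.0, 100.0):
    model = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X_tr, y_tr)
    # Store (train R^2, test R^2) for each value of C
    c_scores[C] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"C={C}: train R^2={c_scores[C][0]:.3f}, test R^2={c_scores[C][1]:.3f}")
```

A very small C keeps the function almost flat, so the training score stays low; larger values let the model follow the data more closely.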

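**Epsilon Parameter:**
In SVR, it defines the margin of tolerance for error: deviations smaller than ϵ are not penalized.
- Small ϵ: Captures finer details but increases the number of support vectors (risk of fitting noise). Decrease ϵ when the targets are precise and small errors matter.
- Large ϵ: Simplifies the model and yields fewer support vectors (risk of underfitting). Increase ϵ when the targets are noisy, roughly up to the noise level.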

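**Gamma Parameter:**
Controls how far the influence of a single training sample reaches. It applies to the RBF, polynomial, and sigmoid kernels and has no effect on the linear kernel.
- Small gamma: Far-reaching influence (smoother model). Decrease gamma when the model fits the training data well but generalizes poorly.
- Large gamma: Focuses on nearby points (risk of overfitting). Increase gamma when the fitted function is too smooth to follow the data.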
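
The same kind of sweep works for gamma. The sketch below (synthetic data and variable names are assumptions for illustration) fits RBF-kernel SVR models with very different gamma values and prints training and held-out R² side by side:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = 5 * rng.random((200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

gamma_scores = {}
for gamma in (0.001, 1.0, 100.0):
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=gamma).fit(X_tr, y_tr)
    # Store (train R^2, test R^2) for each value of gamma
    gamma_scores[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(
        f"gamma={gamma}: train R^2={gamma_scores[gamma][0]:.3f}, "
        f"test R^2={gamma_scores[gamma][1]:.3f}"
    )
```

A growing gap between the training and held-out scores as gamma increases is the usual sign that the model has started to fit noise.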

### [Q5.] Assignment:
- Import the necessary libraries and load the dataset.
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization).
- Create an instance of the SVC classifier and train it on the training data.
- Use the trained classifier to predict the labels of the testing data.
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.
##### [Ans]

```python
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load Iris; keep the first two features and classes 0 and 1
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
X, y = X[y < 2], y[y < 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Baseline classifier before tuning
svm = SVC(kernel='linear', C=1)
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)
print("Performance before tuning:")
print(classification_report(y_test, y_pred))

# `degree` only affects the 'poly' kernel; it is ignored for 'linear' and 'rbf'
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'poly', 'rbf'], 'degree': [2, 3, 4]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best Parameters : ", grid_search.best_params_)

# Retrain the best configuration on the entire dataset. The search ran on
# standardized features, so the full data is standardized the same way first.
full_scaler = StandardScaler()
X_full = full_scaler.fit_transform(X)
tuned_svm = grid_search.best_estimator_
tuned_svm.fit(X_full, y)
```

```text
Performance before tuning:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      1.00      1.00        11

    accuracy                           1.00        25
   macro avg       1.00      1.00      1.00        25
weighted avg       1.00      1.00      1.00        25

Best Parameters :  {'C': 0.1, 'degree': 2, 'kernel': 'linear'}
```
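
The last assignment step, saving the trained classifier, can be sketched with `joblib`. This is one possible approach under stated assumptions: the hyperparameters are copied from the best-parameter output reported above (linear kernel, C=0.1), the scaler and classifier are bundled in a pipeline so that both are saved together, and `svc_iris.joblib` is a hypothetical file name:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
X, y = X[y < 2], y[y < 2]

# Assumed best hyperparameters (C=0.1, linear kernel), trained on the entire dataset
final_model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=0.1))
final_model.fit(X, y)

# "svc_iris.joblib" is a hypothetical file name; a temp directory keeps the example tidy
model_path = os.path.join(tempfile.mkdtemp(), "svc_iris.joblib")
joblib.dump(final_model, model_path)

# Reload and confirm the restored model predicts the same labels
restored = joblib.load(model_path)
same = np.array_equal(restored.predict(X), final_model.predict(X))
print("Restored model reproduces predictions:", same)
```

Saving the whole pipeline rather than the bare classifier means new, unscaled samples can be passed straight to `restored.predict`.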
