<a href="https://colab.research.google.com/github/DIVYA14797/Machine-Learning/blob/main/SVM_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?



Polynomial functions and kernel functions are both used in machine learning algorithms, particularly in the context of support vector machines (SVMs) and kernel methods. Here's a brief overview of their relationship:

1. Polynomial functions:
* Polynomial functions are mathematical functions of the form $f(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$, where $x$ is the independent variable and $a_0, a_1, \dots, a_n$ are coefficients.
* In machine learning, polynomial functions are often used as basis functions to transform the input features into a higher-dimensional space. This transformation can help in capturing complex relationships between features.
* In the context of SVMs, polynomial kernels compute the similarity between two samples as the inner product of the transformed feature vectors in a higher-dimensional space.

2. Kernel functions:

* Kernel functions are used to measure similarity between pairs of data points in a feature space without explicitly transforming them into that space.
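* A kernel function $K(x_i, x_j)$ takes two input data points $x_i$ and $x_j$ and returns the inner product of their images in the feature space, which serves as a similarity score. Computing this value directly from the original inputs, without ever constructing the feature vectors, is known as the kernel trick.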
* The most common kernel functions include linear kernel, polynomial kernel, Gaussian (RBF) kernel, sigmoid kernel, etc.
* The polynomial kernel is one type of kernel function. It computes the similarity between two samples as $K(x_i, x_j) = (\gamma \, x_i^\top x_j + r)^d$, where $d$ is the degree of the polynomial.

The relationship between polynomial functions and polynomial kernel functions in machine learning is that the polynomial kernel function essentially computes the similarity between data points in a higher-dimensional space induced by a polynomial transformation of the original feature space. This allows SVMs to capture non-linear relationships between features by implicitly mapping the data into a higher-dimensional space where linear separation may be more feasible.

In summary, an explicit polynomial transformation maps the input features into a higher-dimensional space, while a polynomial kernel works directly on the original inputs and returns the inner product that the transformed vectors would have. The result is the same as transforming the features first, but without the computational cost of building the higher-dimensional vectors.
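The equivalence between a polynomial kernel and an explicit polynomial feature map can be checked numerically. The sketch below assumes 2-D inputs and the degree-2 kernel $(x^\top z + 1)^2$; the helper names `poly_features_deg2` and `poly_kernel_deg2` are made up for this illustration and are not part of any library.

```python
import numpy as np

def poly_features_deg2(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

def poly_kernel_deg2(x, z):
    """Polynomial kernel (x . z + 1)^2, evaluated in the original 2-D space."""
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Inner product after the explicit 6-dimensional transformation
explicit = np.dot(poly_features_deg2(x), poly_features_deg2(z))
# Same quantity computed directly from the 2-D inputs
implicit = poly_kernel_deg2(x, z)

print("explicit:", explicit)
print("implicit:", implicit)
```

Both values agree up to floating-point rounding, although the kernel never builds the 6-dimensional vectors.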







2. How can we implement an SVM with a polynomial kernel in Python using scikit-learn?

We can implement an SVM with a polynomial kernel in Python using the SVC class from scikit-learn with kernel='poly'. The example below trains one on the iris dataset.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
# Create an SVM classifier with a polynomial kernel
svm_classifier = SVC(kernel='poly', degree=3)  # 'degree' parameter specifies the degree of the polynomial kernel

# Train the classifier
svm_classifier.fit(X_train, y_train)

In [None]:
# Make predictions on the test set
y_pred = svm_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0
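A single train/test split on a dataset this small gives only a coarse estimate. As a rough sketch, the degree of the polynomial kernel could be compared with 5-fold cross-validation. The pipeline with StandardScaler, the value coef0=1.0, and the list of degrees are assumptions chosen for illustration, not settings from the cells above.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

cv_means = {}
for degree in (1, 2, 3, 4):
    # Scaling happens inside the pipeline, so each fold is scaled on its own training part
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel='poly', degree=degree, coef0=1.0),
    )
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[degree] = scores.mean()
    print(f"degree={degree}: mean CV accuracy = {scores.mean():.3f}")
```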


3. How does increasing the value of epsilon affect the number of support vectors in SVR?

In Support Vector Regression (SVR), the epsilon parameter (ε) determines the margin of tolerance around the predicted value. It defines a margin within which no penalty is associated with errors, essentially controlling the tube width within which errors are ignored.

As we increase the value of epsilon:

1. Wider Margin:

* A larger epsilon results in a wider margin around the predicted value. This means that SVR tolerates larger errors before considering them as part of the loss function.
* With a wider margin, the SVR model allows more data points to fall within the margin of tolerance without being penalized.

2. Fewer Support Vectors:

* Support vectors are the data points with non-zero coefficients in the solution: the points that lie on the boundary of the ε-tube or outside it.
* Points that fall strictly inside the tube incur no loss and receive zero coefficients, so they are not support vectors.
* When epsilon is increased, more data points fall inside the tube, which means fewer data points remain as support vectors and the model becomes sparser.

Therefore, increasing the value of epsilon tends to decrease the number of support vectors in SVR. Conversely, decreasing the value of epsilon results in a narrower tube, leaving more points on or outside its boundary and so producing more support vectors.

However, it's important to note that the exact impact of epsilon on the number of support vectors can vary depending on other factors such as the complexity of the data, the choice of kernel function, and other hyperparameters like C and gamma.
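The relationship between epsilon and the support-vector count can be checked empirically. The sketch below fits SVR on a synthetic noisy sine curve (the data, the noise level, and the epsilon values are assumptions made for this illustration) and counts the support vectors through the fitted model's support_ attribute.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: noisy sine curve (illustrative data)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

sv_counts = {}
for eps in (0.01, 0.1, 0.3):
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of training points with non-zero coefficients
    sv_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X)}")
```

On data like this, the count typically drops as epsilon grows, because points strictly inside the tube receive zero coefficients.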

4. How do the choice of kernel function, the C parameter, the epsilon parameter, and the gamma parameter affect the performance of SVR? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

Support Vector Regression (SVR) is a type of Support Vector Machine (SVM) algorithm used for regression tasks. The performance of SVR can be significantly affected by the choice of kernel function and various hyperparameters such as C, epsilon, and gamma. Let's discuss each parameter and how it affects SVR performance:

1. Kernel Function:

* The kernel function determines the type of mapping that transforms the input data into a higher-dimensional space.
* Common kernel functions include linear, polynomial, radial basis function (RBF), sigmoid, etc.
* Choice of kernel function depends on the nature of the data and the problem at hand:
 * Linear kernel: Suitable for linear relationships between features.
 * Polynomial kernel: Suitable for data with complex, non-linear relationships. Higher degrees can capture more complex relationships, but be cautious of overfitting.
 * RBF kernel: Suitable for non-linear relationships whose form is not known in advance. It is highly flexible but can be prone to overfitting if gamma is too large.

2. C Parameter:

* C is the regularization parameter that controls the trade-off between keeping the fitted function flat (smooth) and penalizing training errors that exceed ε.
* A smaller C value means stronger regularization, resulting in a smoother fitted function. This can help prevent overfitting, especially if the data is noisy or if there are outliers.
* Conversely, a larger C value penalizes deviations outside the ε-tube more heavily, so the model follows the training data more closely. This can improve accuracy on training data but may lead to overfitting.

3. Epsilon Parameter:

* Epsilon (ε) is the margin of tolerance around the predicted value.
* It defines a margin of tolerance where no penalty is associated with errors that fall within this margin. SVR will ignore errors smaller than ε.
* A smaller epsilon allows for a smaller margin of tolerance, which means the model will try to fit the data more closely, potentially leading to overfitting.
* A larger epsilon allows for a larger margin of tolerance, which can result in a more robust model that generalizes better to unseen data but may sacrifice accuracy on the training data.

4. Gamma Parameter:

* Gamma (γ) defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’.
* It is a coefficient of the RBF, polynomial, and sigmoid kernels; for the RBF kernel it controls the smoothness of the fitted function.
* A smaller gamma value results in a smoother fitted function, which can help prevent overfitting but might lead to underfitting if set too low.
* A larger gamma value makes the model more sensitive to the training data, potentially leading to overfitting. It can capture intricate details of the training data but may not generalize well to unseen data.
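The influence of gamma can be illustrated by comparing training and test scores for a very small, a moderate, and a very large value. This is a minimal sketch on a synthetic noisy sine curve; the data and the three gamma values are assumptions chosen to make the contrast visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic 1-D regression problem: noisy sine curve (illustrative data)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=120)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

r2 = {}
for gamma in (0.001, 1.0, 100.0):
    model = SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma=gamma)
    model.fit(X_train, y_train)
    # score() returns the coefficient of determination R^2
    r2[gamma] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"gamma={gamma}: train R^2 = {r2[gamma][0]:.3f}, test R^2 = {r2[gamma][1]:.3f}")
```

A very small gamma gives an almost linear fit that underfits the curve, while a very large gamma tracks the training points closely; the gap between the training and test scores is the thing to watch.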

Examples of when to increase or decrease each parameter:

* Increase C when the model underfits and you want it to follow the training data more closely.
* Decrease C when the model is overfitting or when you have noisy data.
* Increase epsilon to allow for more tolerance of errors, especially if the data has noise or outliers.
* Decrease epsilon if you want the model to fit the training data more closely.
* Increase gamma when the model underfits and the target varies over short distances in feature space, so a more flexible fit is needed.
* Decrease gamma if the fitted function is too wiggly and the model is overfitting.

It's important to note that the choice of parameters often involves experimentation and tuning, and there's no one-size-fits-all solution. Cross-validation and grid search techniques can help in finding the optimal values for these parameters.
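As a sketch of such tuning, the block below runs a small grid search over C, epsilon, and gamma for an RBF SVR. The synthetic data and the candidate values in param_grid are assumptions for illustration; a real problem would need its own ranges.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic 1-D regression problem: noisy sine curve (illustrative data)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=120)

# Scaling inside the pipeline keeps each CV fold free of information from its validation part
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svr', SVR(kernel='rbf')),
])

# Candidate values are illustrative, not recommendations
param_grid = {
    'svr__C': [0.1, 1, 10],
    'svr__epsilon': [0.01, 0.1, 0.5],
    'svr__gamma': [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV mean squared error:", -search.best_score_)
```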

5. Assignment: Import a dataset from scikit-learn and apply the following steps: train/test split, scaling, normalization, training an SVC classifier, prediction, metrics, and RandomizedSearchCV.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
# Feature scaling (standardization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_test_scaled

array([[ 0.35451684, -0.58505976,  0.55777524,  0.02224751],
       [-0.13307079,  1.65083742, -1.16139502, -1.17911778],
       [ 2.30486738, -1.0322392 ,  1.8185001 ,  1.49058286],
       [ 0.23261993, -0.36147005,  0.44316389,  0.4227026 ],
       [ 1.2077952 , -0.58505976,  0.61508092,  0.28921757],
       [-0.49876152,  0.75647855, -1.27600637, -1.04563275],
       [-0.2549677 , -0.36147005, -0.07258719,  0.15573254],
       [ 1.32969211,  0.08570939,  0.78699794,  1.49058286],
       [ 0.47641375, -1.92659808,  0.44316389,  0.4227026 ],
       [-0.01117388, -0.80864948,  0.09932984,  0.02224751],
       [ 0.84210448,  0.30929911,  0.78699794,  1.09012776],
       [-1.23014297, -0.13788033, -1.33331205, -1.44608785],
       [-0.37686461,  0.98006827, -1.39061772, -1.31260282],
       [-1.10824606,  0.08570939, -1.27600637, -1.44608785],
       [-0.86445224,  1.65083742, -1.27600637, -1.17911778],
       [ 0.59831066,  0.53288883,  0.55777524,  0.55618763],
       [ 0.84210448, -0.

In [None]:
# Feature normalization (min-max scaling)
normalizer = MinMaxScaler()
X_train_normalized = normalizer.fit_transform(X_train)
X_test_normalized = normalizer.transform(X_test)
X_test_normalized

array([[0.52941176, 0.33333333, 0.64912281, 0.45833333],
       [0.41176471, 0.75      , 0.12280702, 0.08333333],
       [1.        , 0.25      , 1.03508772, 0.91666667],
       [0.5       , 0.375     , 0.61403509, 0.58333333],
       [0.73529412, 0.33333333, 0.66666667, 0.54166667],
       [0.32352941, 0.58333333, 0.0877193 , 0.125     ],
       [0.38235294, 0.375     , 0.45614035, 0.5       ],
       [0.76470588, 0.45833333, 0.71929825, 0.91666667],
       [0.55882353, 0.08333333, 0.61403509, 0.58333333],
       [0.44117647, 0.29166667, 0.50877193, 0.45833333],
       [0.64705882, 0.5       , 0.71929825, 0.79166667],
       [0.14705882, 0.41666667, 0.07017544, 0.        ],
       [0.35294118, 0.625     , 0.05263158, 0.04166667],
       [0.17647059, 0.45833333, 0.0877193 , 0.        ],
       [0.23529412, 0.75      , 0.0877193 , 0.08333333],
       [0.58823529, 0.54166667, 0.64912281, 0.625     ],
       [0.64705882, 0.41666667, 0.84210526, 0.875     ],
       [0.38235294, 0.20833333,

In [None]:
# Train an SVC classifier (using scaled features)
svc_classifier = SVC(kernel='rbf', C=1.0, gamma='scale')  # Example parameters
svc_classifier.fit(X_train_scaled, y_train)



In [None]:
# Make predictions on the test set
y_pred_scaled = svc_classifier.predict(X_test_scaled)
y_pred_scaled

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

In [None]:
# Evaluate metrics
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)
print("Accuracy (scaled):", accuracy_scaled)


Accuracy (scaled): 1.0


In [None]:
# Classification report
print("Classification Report (scaled):\n", classification_report(y_test, y_pred_scaled))


Classification Report (scaled):
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



In [None]:
# Hyperparameter tuning using RandomizedSearchCV
param_distributions = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 'auto'],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid']
}

randomized_search = RandomizedSearchCV(SVC(), param_distributions, n_iter=10, cv=5, random_state=42)
randomized_search.fit(X_train_scaled, y_train)

best_params = randomized_search.best_params_
best_svc_classifier = randomized_search.best_estimator_

print("Best Parameters:", best_params)


Best Parameters: {'kernel': 'linear', 'gamma': 'scale', 'C': 10}


In [None]:
# Make predictions with the best model
y_pred_best = best_svc_classifier.predict(X_test_scaled)
y_pred_best

array([1, 0, 2, 1, 1, 0, 1, 2, 2, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

In [None]:
# Evaluate metrics with the best model
accuracy_best = accuracy_score(y_test, y_pred_best)
print("Accuracy (best model):", accuracy_best)

Accuracy (best model): 0.9666666666666667


In [None]:
# Classification report with the best model
print("Classification Report (best model):\n", classification_report(y_test, y_pred_best))

Classification Report (best model):
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
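One possible variation on the search above puts the scaler and the classifier in a single Pipeline, so that scaling is refit inside each cross-validation fold, and samples C and gamma from continuous log-uniform distributions instead of short lists. The ranges and n_iter=20 are assumptions for illustration, and the results may differ from the output shown above.

```python
from scipy.stats import loguniform
from sklearn import datasets
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC()),
])

# Parameter names are prefixed with the pipeline step name ('svc__')
param_distributions = {
    'svc__C': loguniform(1e-2, 1e2),
    'svc__gamma': loguniform(1e-3, 1e1),  # ignored by the linear kernel
    'svc__kernel': ['linear', 'rbf', 'poly'],
}

search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=20, cv=5, random_state=42
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))
```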

