<a href="https://colab.research.google.com/github/Vaishnavi-P-Kudalkar/LocalRepo/blob/main/SVM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Linear Kernel vs. RBF Kernel
Linear Kernel:

A linear kernel is appropriate when the data is (approximately) linearly separable, meaning a straight line (or a hyperplane in higher dimensions) can separate the classes. An SVM with a linear kernel looks for the hyperplane that maximizes the margin between the classes.
With a linear kernel, the similarity between two data points is the dot product of their input vectors.
It is simpler and faster to compute than non-linear kernels.
RBF Kernel:

The Radial Basis Function (RBF) kernel, also known as the Gaussian kernel, is used for data that is not linearly separable. It implicitly maps the input features into a higher-dimensional space where a linear separation is possible.
The RBF kernel measures the similarity of two data points by the distance between them. The transformation is controlled by the parameter gamma, which determines how far the influence of a single training example reaches.
It is more flexible and can model more complex relationships between data points.
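
To make the two definitions above concrete, here is a minimal sketch that computes both kernels by hand and compares them with scikit-learn's pairwise helpers. The vectors `x`, `z` and the value of `gamma` are made up for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

# Two small illustrative vectors (hypothetical values).
x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
gamma = 0.5  # illustrative value only

# Linear kernel: the dot product of the two input vectors.
k_linear = np.dot(x, z)

# RBF kernel: exp(-gamma * squared Euclidean distance).
k_rbf = np.exp(-gamma * np.sum((x - z) ** 2))

# The same quantities computed by scikit-learn (inputs must be 2D).
k_linear_sk = linear_kernel(x.reshape(1, -1), z.reshape(1, -1))[0, 0]
k_rbf_sk = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1), gamma=gamma)[0, 0]

print("linear:", k_linear, k_linear_sk)
print("rbf:", k_rbf, k_rbf_sk)
```

The RBF value shrinks toward 0 as the two points move apart and equals 1 when they coincide, which is why gamma controls how local each training example's influence is.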

In [5]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

In [7]:
digits = load_digits()
df = pd.DataFrame(digits.data)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,54,55,56,57,58,59,60,61,62,63
0,0.0,0.0,5.0,13.0,9.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,6.0,13.0,10.0,0.0,0.0,0.0
1,0.0,0.0,0.0,12.0,13.0,5.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,11.0,16.0,10.0,0.0,0.0
2,0.0,0.0,0.0,4.0,15.0,12.0,0.0,0.0,0.0,0.0,...,5.0,0.0,0.0,0.0,0.0,3.0,11.0,16.0,9.0,0.0
3,0.0,0.0,7.0,15.0,13.0,1.0,0.0,0.0,0.0,8.0,...,9.0,0.0,0.0,0.0,7.0,13.0,13.0,9.0,0.0,0.0
4,0.0,0.0,0.0,1.0,11.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,2.0,16.0,4.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1792,0.0,0.0,4.0,10.0,13.0,6.0,0.0,0.0,0.0,1.0,...,4.0,0.0,0.0,0.0,2.0,14.0,15.0,9.0,0.0,0.0
1793,0.0,0.0,6.0,16.0,13.0,11.0,1.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,6.0,16.0,14.0,6.0,0.0,0.0
1794,0.0,0.0,1.0,11.0,15.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,2.0,9.0,13.0,6.0,0.0,0.0
1795,0.0,0.0,2.0,10.0,7.0,0.0,0.0,0.0,0.0,0.0,...,2.0,0.0,0.0,0.0,5.0,12.0,16.0,12.0,0.0,0.0


In [9]:
X_train, X_test, y_train, y_test = train_test_split(df, digits.target, test_size=0.2, random_state=42)

In [15]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

#train and evaluate svm with linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)
print("Linear kernel accuracy:", accuracy_linear)

#train and evaluate svm with rbf kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)
print("RBF kernel accuracy:", accuracy_rbf)

Linear kernel accuracy: 0.9777777777777777
RBF kernel accuracy: 0.9861111111111112


Tune Hyperparameters
Now, let's tune the hyperparameters (C and gamma) to try to improve the accuracy of the RBF kernel SVM.

Hyperparameter tuning is an essential part of optimizing a machine learning model's performance. It involves selecting the best parameters (hyperparameters) that control the learning process of the algorithm. For an SVM with an RBF kernel, important hyperparameters are C and gamma.

Explanation of Hyperparameters
C (Regularization Parameter):

Controls the trade-off between achieving a low error on the training data and minimizing the model complexity to avoid overfitting.
A smaller C value makes the decision surface smoother, while a larger C value aims to classify all training examples correctly by allowing the model to have a more complex decision surface.
Gamma:

Defines how far the influence of a single training example reaches.
A low gamma means a larger radius of influence for each example, leading to smoother decision boundaries. A high gamma value means the radius of influence is small, capturing more detail but potentially leading to overfitting.
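
The effect of gamma described above can be sketched on a small toy dataset. This example uses `make_moons` rather than the digits data, and the gamma values, `C=1.0`, and the split are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy two-class data (not the digits data), used only for illustration.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for gamma in [0.01, 1.0, 100.0]:
    model = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_tr, y_tr)
    results[gamma] = {
        "train": model.score(X_tr, y_tr),          # accuracy on training data
        "test": model.score(X_te, y_te),           # accuracy on held-out data
        "n_support": int(model.n_support_.sum()),  # total support vectors
    }
    print(gamma, results[gamma])
```

Comparing the train and test columns for each gamma gives a rough sense of where the model starts fitting noise; the same loop can be repeated over C with gamma held fixed.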

The code below uses GridSearchCV to search for the best values of C and gamma for an SVM with an RBF kernel by trying every combination of the listed values.



In [17]:
from sklearn.model_selection import GridSearchCV

In [19]:
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01],
    'kernel': ['rbf']
}

GridSearchCV is initialized with the SVM classifier (SVC()), the parameter grid (param_grid), and additional parameters (refit=True, verbose=2):
refit=True: Once the best parameters are found, the model is retrained on the entire training set.
verbose=2: This provides detailed output of the grid search process.

In [21]:
grid_search = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)

The grid search is performed on the training data (X_train, y_train). It evaluates all combinations of the parameters in param_grid using 5-fold cross-validation (the default when cv is not specified) and selects the combination with the best mean validation score.
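
Conceptually, the search described above is a loop over parameter combinations with a cross-validation score for each. The sketch below writes that loop out by hand and compares it with GridSearchCV. The 500-sample subset and the small grid are assumptions made to keep the example fast, and the real GridSearchCV does more (refitting, timing, result tables) than this loop shows.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # small subset, only to keep the sketch fast

# A reduced, illustrative grid (not the notebook's grid).
small_grid = {"C": [1, 10], "gamma": [0.01, 0.001], "kernel": ["rbf"]}

# Manual version: score every combination with 5-fold cross-validation.
manual_best_score, manual_best_params = -np.inf, None
for params in ParameterGrid(small_grid):
    mean_score = cross_val_score(SVC(**params), X, y, cv=5).mean()
    if mean_score > manual_best_score:
        manual_best_score, manual_best_params = mean_score, params

# The same search done by GridSearchCV.
search = GridSearchCV(SVC(), small_grid, cv=5).fit(X, y)

print("manual:", manual_best_params, manual_best_score)
print("GridSearchCV:", search.best_params_, search.best_score_)
```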


In [22]:
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 12 candidates, totalling 60 fits
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.2s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.2s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.2s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.2s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.2s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.2s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.2s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.2s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.2s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.2s
[CV] END ......................C=0.1, gamma=0.01, kernel=rbf; total time=   0.2s
[CV] END ......................C=0.1, gamma=0.01

best_params_: The best combination of parameters found by the grid search.
best_score_: The mean cross-validation accuracy achieved with these parameters.


In [24]:
best_param = grid_search.best_params_
best_score = grid_search.best_score_
print(best_param)
print(best_score)

{'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}
0.7613046844754161


best_model: The model trained with the best hyperparameters.
y_pred_best: Predictions on the test set using the best model.
accuracy_best: Accuracy of the best model on the test set.


In [25]:
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test)
accuracy_best = accuracy_score(y_test, y_pred_best)
print(accuracy_best)

0.8138888888888889
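
Note that the tuned model scores lower on the test set (about 0.81) than the untuned RBF model earlier in the notebook (about 0.986). One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the grid only contains gamma values of 0.01 and above, while the SVC default `gamma='scale'` resolves to `1 / (n_features * X.var())`, which can be much smaller for unscaled pixel data. The sketch below computes that value and compares a few gamma settings; `C=10` and the extra gamma value 0.001 are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Value that gamma='scale' resolves to for this training set.
gamma_scale = 1.0 / (X_train.shape[1] * X_train.var())
print("gamma implied by 'scale':", gamma_scale)

# Compare the grid's smallest gamma with smaller alternatives.
acc = {}
for gamma in [0.01, 0.001, "scale"]:
    model = SVC(kernel="rbf", C=10, gamma=gamma).fit(X_train, y_train)
    acc[gamma] = accuracy_score(y_test, model.predict(X_test))
    print("gamma =", gamma, "test accuracy =", acc[gamma])
```

If the smaller values do better, extending the grid with entries such as `'scale'` or `0.001` would be a natural next step.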


To visualize how the trained model classifies the data, we can plot the decision boundaries or the confusion matrix. However, since the digits dataset is high-dimensional (each digit is an 8x8 image, i.e. 64 features), the decision boundaries cannot be plotted directly. Instead, we can use a dimensionality reduction technique such as PCA (Principal Component Analysis) to project the data into a 2D space and visualize the decision boundaries there.
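
The confusion matrix mentioned above can be computed without any dimensionality reduction. The following is a minimal sketch that rebuilds the same train/test split and uses a default RBF SVC as a stand-in model; any fitted classifier could be substituted.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stand-in model: an RBF SVC with default hyperparameters.
model = SVC(kernel="rbf").fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true digits, columns are predicted digits (0 through 9).
cm = confusion_matrix(y_test, y_pred, labels=list(range(10)))
print(cm)
```

Off-diagonal entries show which digits are confused with which, something a single accuracy number hides.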

Visualizing the Classification with PCA
Let's follow these steps:

Reduce the dimensionality of the dataset using PCA.
Train the SVM on the reduced data.
Plot the decision boundaries and the classified points.
First, let's perform PCA to reduce the dataset to 2 dimensions and then train the SVM model on this reduced dataset.

In [31]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

#reduce the dimensionality of the dataset to 2 dimension using pca
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

#train the svm with best hyper parameter found on the reduced data
best_svm_pca = SVC(C=best_param['C'], gamma=best_param['gamma'],kernel='rbf')
best_svm_pca.fit(X_train_pca, y_train)

#plot the decision boundaries and the classified points
def plot_decision_boundary(X, y, model, ax):
  x_min, x_max = X[:,0].min() - 1, X[:,0].max() + 1
  y_min, y_max = X[:,1].min() - 1, X[:,1].max() + 1
  #use a fixed 300x300 grid; a 0.01 step over the PCA range would create millions of points
  xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))
  Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
  Z = Z.reshape(xx.shape)
  ax.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdYlBu)

  #plot also the training points
  scatter = ax.scatter(X[:,0], X[:,1], c=y, edgecolor='k', s=20, cmap=plt.cm.RdYlBu)
  legend = ax.legend(*scatter.legend_elements(), title='Classes')
  ax.add_artist(legend)

#create a plot (outside the function body, so that it actually runs)
fig, ax = plt.subplots()
#pass y_train so that the labels match the points in X_train_pca
plot_decision_boundary(X_train_pca, y_train, best_svm_pca, ax)
ax.set_title('SVM Decision Boundaries with RBF Kernel')
plt.show()
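
Because the plot relies on only two principal components, it is worth checking how much of the original variance they retain. This is a minimal sketch that fits PCA on the full digits data instead of the notebook's training split, a simplification made here for brevity.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Fit a 2-component PCA and inspect the share of variance each one keeps.
pca_check = PCA(n_components=2).fit(X)
ratios = pca_check.explained_variance_ratio_
retained = ratios.sum()

print("variance ratio per component:", ratios)
print("total variance retained:", retained)
```

A low retained share means the 2D decision boundaries are only a rough picture of what the model does in the full 64-dimensional space.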