<a href="https://colab.research.google.com/github/AmruthaReddy1397/MachineLearning/blob/main/Hyperparameter_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Perform Hyperparameter Tuning using Grid Search**

##### Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

##### Loading the handwritten digits dataset from sklearn

In [None]:
from sklearn.datasets import load_digits

##### Separating the dataset into features (x) and target (y)

In [None]:
digits = load_digits()
x = digits.data
y = digits.target

##### Splitting the data into train and test sets

In [None]:
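from sklearn.model_selection import train_test_split
# By default 25% of the samples form the test set; without random_state the split (and the accuracy below) changes between runs
trainX, testX, trainY, testY = train_test_split(x, y)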

##### Building SVM model without hyperparameter tuning
SVC is imported from sklearn.svm, and the model is built with the following hyperparameter values:

*  The regularization parameter (C) is set to 15
*  The kernel is the sigmoid kernel function

In [None]:
from sklearn.svm import SVC
svclassifier = SVC(C = 15, kernel = 'sigmoid')

##### Training the model

In [None]:
svclassifier.fit(trainX, trainY)

SVC(C=15, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='sigmoid',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

##### Making predictions

In [None]:
y_pred = svclassifier.predict(testX)

##### Determining the accuracy

In [None]:
#Evaluating the Algorithm
from sklearn.metrics import accuracy_score
print(accuracy_score(testY,y_pred))

0.8066666666666666


Without hyperparameter tuning, the model reaches an accuracy of about 0.807 (80.7%) on the test set. This can be improved by hyperparameter tuning, here done with grid search.

##### Using the SVM classification algorithm and performing grid search on its hyperparameters

**Model Hyperparameters:** A model hyperparameter is a configuration value that shapes the model (its architecture or training behavior) and cannot be learned or estimated from the training data. Hyperparameters are set externally, before training, and are tuned to obtain the best-performing model.
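
To make the distinction concrete, the sketch below reuses the baseline SVC settings (C=15, sigmoid kernel). The hyperparameters are chosen by us and can be read back with `get_params()`, whereas `support_vectors_` and `dual_coef_` are model parameters that only exist after `fit()` has learned them from the data. Fitting on the whole digits dataset is only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Hyperparameters: chosen before training and passed to the constructor
clf = SVC(C=15, kernel='sigmoid')
params = clf.get_params()
print(params['C'], params['kernel'])

# Model parameters: learned from the data during fit()
clf.fit(X, y)
print(clf.support_vectors_.shape)  # (number of support vectors, 64 features)
print(clf.dual_coef_.shape)        # (number of classes - 1, number of support vectors)
```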

**Hyperparameters of SVM:**

*   C : Regularization parameter. The strength of the regularization is inversely proportional to C, so smaller values give a smoother decision boundary that tolerates more misclassified training points. Default value is 1.0.
*   Kernel : The kernel implicitly maps the low-dimensional input space into a higher-dimensional feature space, which makes it mostly useful for problems that are not linearly separable. Possible values are 'linear', 'poly', 'rbf', 'sigmoid' and 'precomputed'; the default is 'rbf'.
*   Degree : Degree of the polynomial kernel ('poly'); it is ignored by all other kernels. Default value is 3.
*   Gamma : Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. It accepts 'scale', 'auto' or a float. Default value is 'scale'.
*   coef0 : Independent term in the kernel function. It is only significant for 'poly' and 'sigmoid'. Default value is 0.0.
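
For reference, the kernel functions as given in the scikit-learn SVC documentation are written out below. They show where `degree` ($d$), `gamma` ($\gamma$) and `coef0` ($r$) enter each kernel. The second formula shows how the two string options for gamma are computed from the training matrix $X$ with $n_{\text{features}}$ columns.

$$
\begin{aligned}
\text{linear:}\quad & K(x, x') = \langle x, x' \rangle \\
\text{poly:}\quad & K(x, x') = \left(\gamma \langle x, x' \rangle + r\right)^{d} \\
\text{rbf:}\quad & K(x, x') = \exp\!\left(-\gamma \lVert x - x' \rVert^{2}\right) \\
\text{sigmoid:}\quad & K(x, x') = \tanh\!\left(\gamma \langle x, x' \rangle + r\right)
\end{aligned}
$$

$$
\gamma_{\text{scale}} = \frac{1}{n_{\text{features}} \cdot \operatorname{Var}(X)}, \qquad \gamma_{\text{auto}} = \frac{1}{n_{\text{features}}}
$$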

**Hyperparameter Tuning:**

Hyperparameter tuning is the process of searching for the combination of hyperparameter values that gives the best model performance. Building the model with well-chosen values usually improves its accuracy on unseen data compared with arbitrary or default settings.

**Grid Search:**

Grid search finds the best set of hyperparameter values by exhaustive search. For every possible combination of the candidate values provided (the "grid"), a model is built and evaluated, and the combination whose model performs best is selected.
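
The idea can be sketched by hand in a few lines. This is a minimal illustration, not how GridSearchCV is implemented internally: `itertools.product` enumerates every combination of a deliberately small, hypothetical grid, each combination is scored with 5-fold `cross_val_score`, and the combination with the best mean score is kept.

```python
from itertools import product

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Hypothetical small grid, chosen only to keep the example fast
grid = {'C': [1, 10], 'kernel': ['rbf', 'linear']}
keys = list(grid)

results = []
# product() yields every combination of the candidate values
for values in product(*grid.values()):
    params = dict(zip(keys, values))
    # Mean accuracy over 5 cross-validation folds for this combination
    score = cross_val_score(SVC(**params), X, y, cv=5).mean()
    results.append((params, score))

best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, round(best_score, 4))
```

GridSearchCV performs the same enumeration and, with its default `refit=True`, additionally refits the best combination on all the data passed to `fit()` and records per-fold details in `cv_results_`.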

The `svm` module is imported from sklearn to build a model with the support vector machine algorithm.

In [None]:
from sklearn import svm

Instead of fitting the model once on a single train/test split, we combine grid search with cross-validation.

**Cross-Validation:**

Cross-validation is a technique in which the model is trained on one subset of the dataset and evaluated on the complementary subset, repeating this over several different splits and averaging the results.

**K-fold cross-validation:**

In k-fold cross-validation, the data is divided into k subsets (folds) and the procedure is repeated k times. Each time, one of the k subsets is used as the test/validation set and the other k-1 subsets are combined to form the training set. The evaluation scores are averaged over all k trials to estimate the overall effectiveness of the model. This reduces bias, because most of the data is used for fitting in every round, and reduces the variance of the estimate, because every sample is used for validation exactly once instead of relying on a single split.

The general procedure is as follows:

*  Shuffle the dataset randomly.
*  Split the dataset into k groups.
*  For each unique group:
     1. Take the group as a hold-out or test data set
     2. Take the remaining groups as a training data set
     3. Fit a model on the training set and evaluate it on the test set
     4. Retain the evaluation score and discard the model
*  Summarize the performance of the model using the sample of evaluation scores (for example, their mean and standard deviation).
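
The steps above can be written out as a minimal sketch with scikit-learn's `KFold`. The hyperparameter values (C=7, rbf kernel, gamma='scale') and `random_state=0` are arbitrary choices for illustration. Note that GridSearchCV with an integer `cv` and a classifier uses stratified folds without shuffling, so its splits differ from the ones here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Shuffle the data and split it into k = 5 groups
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kfold.split(X):
    # Hold out one group; train on the remaining k - 1 groups
    clf = SVC(C=7, kernel='rbf', gamma='scale')
    clf.fit(X[train_idx], y[train_idx])
    # Retain the score; the fitted model is discarded on the next iteration
    scores.append(clf.score(X[test_idx], y[test_idx]))

# Summarize the model using the sample of k scores
print(np.round(scores, 4), np.mean(scores), np.std(scores))
```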

Here, the following hyperparameters are tuned:

*  **C (the regularization parameter):** The default value of C is **1**. Six values are tried: **1, 5, 7, 10, 15 and 25**.
*  **Kernel:** All kernel values except 'precomputed' are tried: **'rbf', 'linear', 'poly' and 'sigmoid'**.
*  **Gamma:** Although the **default value of gamma** is **'scale'**, it is tuned by providing both **'auto' and 'scale'** as candidate values.
*  **Degree:** Because the 'poly' kernel is included, degree is tuned with three values: **3, 4 and 5.** Degree is ignored by the other kernels, so for 'rbf', 'linear' and 'sigmoid' the three degree values produce identical models.
*  For **cross-validation**, the number of folds is set to **5** (cv=5; for a classifier such as SVC this means stratified 5-fold splitting). The grid has 6 × 4 × 3 × 2 = 144 combinations, so 144 × 5 = 720 models are fit, and the performance of each combination on each split of the data is recorded in cv_results_.
*  cv_results_ is a dictionary containing all the evaluation results from the grid search.
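
The size of this grid can be checked before running the search with `ParameterGrid`, which GridSearchCV uses to enumerate candidates, so the order matches the rows of cv_results_. The second grid is a hypothetical alternative, not what this notebook runs: GridSearchCV also accepts a list of dictionaries, so each hyperparameter can be tuned only for the kernels that use it, which avoids fitting identical models repeatedly.

```python
from sklearn.model_selection import ParameterGrid

C_values = [1, 5, 7, 10, 15, 25]
n_folds = 5

# The grid used in the search below
param_grid = {'C': C_values,
              'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
              'degree': [3, 4, 5],
              'gamma': ['auto', 'scale']}

candidates = list(ParameterGrid(param_grid))
print(len(candidates))            # 144 combinations
print(len(candidates) * n_folds)  # 720 model fits
print(candidates[0])              # {'C': 1, 'degree': 3, 'gamma': 'auto', 'kernel': 'rbf'}

# Hypothetical alternative: degree only for 'poly', gamma not for 'linear'
compact_grid = [
    {'kernel': ['poly'], 'C': C_values, 'degree': [3, 4, 5], 'gamma': ['auto', 'scale']},
    {'kernel': ['rbf', 'sigmoid'], 'C': C_values, 'gamma': ['auto', 'scale']},
    {'kernel': ['linear'], 'C': C_values},
]
print(len(ParameterGrid(compact_grid)))  # 66 combinations
```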

In [None]:
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [1, 5, 7, 10, 15, 25],
              'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
              'degree': [3, 4, 5],
              'gamma': ['auto', 'scale']}
# Every combination in param_grid is evaluated with 5-fold cross-validation
model = GridSearchCV(svm.SVC(coef0=0.0), param_grid, cv=5, return_train_score=False)
model.fit(digits.data, digits.target)
model.cv_results_

{'mean_fit_time': array([0.47055898, 0.04038973, 0.04850478, 0.30887938, 0.0857162 ,
        0.04146824, 0.04991612, 0.15181007, 0.46677351, 0.04017673,
        0.04936352, 0.31136465, 0.08534813, 0.04192472, 0.04984746,
        0.14989324, 0.46957326, 0.0405755 , 0.05036554, 0.30945153,
        0.08605752, 0.04065304, 0.05258927, 0.15273247, 0.46574292,
        0.04281125, 0.04884148, 0.31640534, 0.07403064, 0.04226875,
        0.04889102, 0.08082914, 0.46428695, 0.04062696, 0.0491775 ,
        0.31388993, 0.08081589, 0.04116364, 0.04879618, 0.08115568,
        0.47042966, 0.04008269, 0.05085273, 0.31455488, 0.07487016,
        0.04160991, 0.0511137 , 0.08396773, 0.46137218, 0.04372954,
        0.04937825, 0.31908603, 0.07833076, 0.04159756, 0.05046844,
        0.07423258, 0.47733111, 0.04055109, 0.04810929, 0.31044946,
        0.07459612, 0.04106512, 0.04975519, 0.07149272, 0.46926308,
        0.04091325, 0.05239511, 0.31061506, 0.07525973, 0.04054418,
        0.04968414, 0.07280555,

For easier inspection, the dictionary is converted into a pandas DataFrame and displayed.

In [None]:
df = pd.DataFrame(model.cv_results_)
df

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_degree | param_gamma | param_kernel | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.470559 | 0.006276 | 0.053037 | 0.000258 | 1 | 3 | auto | rbf | {'C': 1, 'degree': 3, 'gamma': 'auto', 'kernel... | 0.411111 | 0.450000 | 0.454039 | 0.448468 | 0.479109 | 0.448545 | 0.021761 | 124 |
| 1 | 0.040390 | 0.001095 | 0.010493 | 0.000255 | 1 | 3 | auto | linear | {'C': 1, 'degree': 3, 'gamma': 'auto', 'kernel... | 0.963889 | 0.919444 | 0.966574 | 0.963788 | 0.924791 | 0.947697 | 0.020978 | 55 |
| 2 | 0.048505 | 0.002292 | 0.012267 | 0.000322 | 1 | 3 | auto | poly | {'C': 1, 'degree': 3, 'gamma': 'auto', 'kernel... | 0.983333 | 0.944444 | 0.980501 | 0.988858 | 0.947075 | 0.968842 | 0.019056 | 17 |
| 3 | 0.308879 | 0.003906 | 0.038873 | 0.000469 | 1 | 3 | auto | sigmoid | {'C': 1, 'degree': 3, 'gamma': 'auto', 'kernel... | 0.100000 | 0.100000 | 0.103064 | 0.100279 | 0.100279 | 0.100724 | 0.001177 | 127 |
| 4 | 0.085716 | 0.002982 | 0.024445 | 0.000196 | 1 | 3 | scale | rbf | {'C': 1, 'degree': 3, 'gamma': 'scale', 'kerne... | 0.961111 | 0.944444 | 0.983287 | 0.988858 | 0.938719 | 0.963284 | 0.020086 | 40 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 139 | 0.307232 | 0.001971 | 0.039789 | 0.000345 | 25 | 5 | auto | sigmoid | {'C': 25, 'degree': 5, 'gamma': 'auto', 'kerne... | 0.100000 | 0.100000 | 0.103064 | 0.100279 | 0.100279 | 0.100724 | 0.001177 | 127 |
| 140 | 0.075421 | 0.001812 | 0.021716 | 0.000301 | 25 | 5 | scale | rbf | {'C': 25, 'degree': 5, 'gamma': 'scale', 'kern... | 0.980556 | 0.958333 | 0.983287 | 0.988858 | 0.958217 | 0.973850 | 0.012995 | 4 |
| 141 | 0.041664 | 0.001216 | 0.010675 | 0.000366 | 25 | 5 | scale | linear | {'C': 25, 'degree': 5, 'gamma': 'scale', 'kern... | 0.963889 | 0.919444 | 0.966574 | 0.963788 | 0.924791 | 0.947697 | 0.020978 | 55 |
| 142 | 0.050904 | 0.000624 | 0.010710 | 0.000241 | 25 | 5 | scale | poly | {'C': 25, 'degree': 5, 'gamma': 'scale', 'kern... | 0.961111 | 0.925000 | 0.977716 | 0.986072 | 0.935933 | 0.957167 | 0.023490 | 44 |


Next, only the hyperparameter columns and the mean cross-validation score are displayed.

In [None]:
df[['param_C','param_degree','param_gamma','param_kernel','mean_test_score']]

| | param_C | param_degree | param_gamma | param_kernel | mean_test_score |
|---|---|---|---|---|---|
| 0 | 1 | 3 | auto | rbf | 0.448545 |
| 1 | 1 | 3 | auto | linear | 0.947697 |
| 2 | 1 | 3 | auto | poly | 0.968842 |
| 3 | 1 | 3 | auto | sigmoid | 0.100724 |
| 4 | 1 | 3 | scale | rbf | 0.963284 |
| ... | ... | ... | ... | ... | ... |
| 139 | 25 | 5 | auto | sigmoid | 0.100724 |
| 140 | 25 | 5 | scale | rbf | 0.973850 |
| 141 | 25 | 5 | scale | linear | 0.947697 |
| 142 | 25 | 5 | scale | poly | 0.957167 |
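
Beyond showing the first and last rows, the DataFrame can be sorted and summarized to compare combinations. The sketch below is self-contained, so it reruns a much smaller, hypothetical search (two C values, three kernels, cv=3) to stay fast; the same pandas calls apply to the full `df` built above.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Reduced, hypothetical grid so the example runs quickly
search = GridSearchCV(SVC(), {'C': [1, 10], 'kernel': ['rbf', 'linear', 'sigmoid']}, cv=3)
search.fit(X, y)

df = pd.DataFrame(search.cv_results_)
cols = ['param_C', 'param_kernel', 'mean_test_score', 'rank_test_score']

# All combinations, best first
print(df[cols].sort_values('rank_test_score'))

# Best mean cross-validation score reached by each kernel
best_per_kernel = df.groupby('param_kernel')['mean_test_score'].max()
print(best_per_kernel)
```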


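The best_params_ attribute gives the combination of hyperparameter values with the highest mean cross-validation score. Because degree does not affect the rbf kernel, the rbf candidates with degree 3, 4 and 5 tie, and the reported degree of 3 is simply the first of the tied candidates in the grid.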

In [None]:
model.best_params_

{'C': 7, 'degree': 3, 'gamma': 'scale', 'kernel': 'rbf'}
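
Degree has no effect on the rbf kernel, so rbf candidates that differ only in degree receive identical scores and share the same rank; GridSearchCV then reports the first tied candidate (`best_index_` is the first position with the lowest rank). The minimal sketch below checks this with a one-dimensional grid, a simplification that fixes C=7, gamma='scale' and kernel='rbf' and varies only degree.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Only degree varies; the rbf kernel ignores it
search = GridSearchCV(SVC(kernel='rbf', C=7, gamma='scale'), {'degree': [3, 4, 5]}, cv=5)
search.fit(X, y)

scores = search.cv_results_['mean_test_score']
print(scores)                                  # the same score for all three degrees
print(search.cv_results_['rank_test_score'])   # all tied at rank 1
print(search.best_index_, search.best_params_) # the first tied candidate is reported
```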

The best_score_ attribute gives the mean cross-validated accuracy of the best combination.

In [None]:
model.best_score_

0.974405756731662

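Before hyperparameter tuning, the model accuracy was about 80.7% on a single test split; after grid search, the best combination achieves a mean 5-fold cross-validation accuracy of about 97.4%. The two numbers are measured differently: the grid search was run on the full dataset, so 0.974 is a cross-validation score rather than an accuracy on data the search never saw.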
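
A cleaner before/after comparison keeps a test set out of the search entirely. The sketch below shows one way to do that, under two assumptions: a fixed `random_state=0` split and a reduced grid chosen only to keep the run short. Because GridSearchCV refits the best combination on the training data by default, `search.score` evaluates that refit model on the held-out test set.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# Fixed random_state (an assumption) makes the split reproducible
trainX, testX, trainY, testY = train_test_split(X, y, random_state=0)

# Reduced grid (an assumption) to keep the run short
param_grid = {'C': [1, 7, 25], 'kernel': ['rbf', 'poly'], 'gamma': ['scale']}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(trainX, trainY)  # the search never sees the test split

test_acc = search.score(testX, testY)
print(search.best_params_)
print(search.best_score_)  # mean CV accuracy on the training data
print(test_acc)            # accuracy of the refit best model on held-out data
```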