Hyperparameter tuning is an essential part of a machine learning project: with good hyperparameters, a model can perform much better than with poorly chosen ones. To optimize our machine learning model's performance, we must tune its hyperparameters.

**What is a hyperparameter?**

A hyperparameter is a variable of a machine learning algorithm that is not learned by training the model, and its value cannot be estimated from the data. Instead, we set it ourselves before training, evaluate the model's performance, and keep the values for which the model gives the best result.
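As a minimal illustration (a sketch, not taken from any library), consider ridge regression written directly in NumPy: the regularization strength `alpha` is a hyperparameter that we choose before training, while the weights `w` are parameters learned from the data through the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy. The helper `fit_ridge` and the toy data are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Learn ridge weights (no intercept) via w = (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# hypothetical toy data: y depends linearly on 3 features plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# alpha is a hyperparameter: we pick it, training does not learn it
for alpha in [0.01, 1.0, 100.0]:
    w = fit_ridge(X, y, alpha)  # w are parameters: learned from the data
    print(f"alpha={alpha:>6}: w={np.round(w, 3)}")
```

Larger values of `alpha` shrink the learned weights toward zero; which value works best can only be found by evaluating the model, which is what the tuning methods below automate.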

Hyperparameter tuning is a very time-consuming part of the machine learning pipeline if we do it manually and try out every possible combination of hyperparameters for the algorithm. Various methods and libraries are available for this task, such as:

- GridSearchCV 

- RandomizedSearchCV

- Bayesian Optimization

- Sequential Model-Based Optimization (e.g. tuning a scikit-learn estimator with skopt)

- Optuna

- Genetic Algorithms 

- Hyperopt


In this notebook, I discuss how GridSearchCV, RandomizedSearchCV, and Bayesian Optimization work and how to use them to tune the hyperparameters of several machine learning algorithms.

```python
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
```

```python
from sklearn import datasets  # toy datasets used in this notebook
from sklearn import model_selection  # provides GridSearchCV and RandomizedSearchCV
from skopt import BayesSearchCV  # Bayesian optimization (scikit-optimize package)
```

The following evaluation metrics (scorer names) are available in the sklearn library and can be passed to the `scoring` parameter:

```python
from sklearn import metrics

# scorer names accepted by the `scoring` parameter of the search classes
# (SCORERS was removed in scikit-learn 1.3; newer versions provide
# metrics.get_scorer_names() instead)
metrics.SCORERS.keys()
```

```
dict_keys(['explained_variance', 'r2', 'max_error', 'neg_median_absolute_error', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_root_mean_squared_error', 'neg_mean_poisson_deviance', 'neg_mean_gamma_deviance', 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted', 'roc_auc_ovo_weighted', 'balanced_accuracy', 'average_precision', 'neg_log_loss', 'neg_brier_score', 'adjusted_rand_score', 'homogeneity_score', 'completeness_score', 'v_measure_score', 'mutual_info_score', 'adjusted_mutual_info_score', 'normalized_mutual_info_score', 'fowlkes_mallows_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'jaccard', 'jaccard_macro', 'jaccard_micro', 'jaccard_samples', 'jaccard_weighted'])
```
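Every scorer follows a "greater is better" convention, which is why error and loss metrics appear with a `neg_` prefix: for example, `neg_log_loss` (used for the SVM later in this notebook) is the log loss multiplied by -1. The sketch below checks this with `sklearn.metrics.get_scorer` and `log_loss`; the logistic-regression model on Iris is only an illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, log_loss

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

scorer = get_scorer("neg_log_loss")  # callable with signature scorer(estimator, X, y)
score = scorer(clf, X, y)
loss = log_loss(y, clf.predict_proba(X))

# the search classes maximize the score, so maximizing -log_loss minimizes log_loss
print(f"neg_log_loss scorer: {score:.4f}, log_loss: {loss:.4f}")
```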

For the regression examples, this notebook uses the Boston house-price dataset.

```python
# load the Boston housing dataset for the regression examples
# (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older scikit-learn version)
boston_data = datasets.load_boston()
r_X = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
r_y = boston_data.target
print(f"Shape of the dataset : {r_X.shape}")
print(r_X.head())

# standardize the features (zero mean, unit variance)
from sklearn.preprocessing import StandardScaler
sc_r = StandardScaler()
sc_r_X = sc_r.fit_transform(r_X)

# split the dataset: 80% training data, 20% test data
from sklearn.model_selection import train_test_split
r_X_train, r_X_test, r_Y_train, r_Y_test = train_test_split(sc_r_X, r_y, test_size=0.20, random_state=42)
```

```
Shape of the dataset : (506, 13)
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0   
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0   
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0   
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0   
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0   

   PTRATIO       B  LSTAT  
0     15.3  396.90   4.98  
1     17.8  396.90   9.14  
2     17.8  392.83   4.03  
3     18.7  394.63   2.94  
4     18.7  396.90   5.33  
```


For the classification examples, this notebook uses the Iris flower dataset.

```python
# load the Iris dataset for the classification examples
iris_data = datasets.load_iris()
c_X = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
c_y = iris_data.target
print(f"Shape of the dataset : {c_X.shape}")
print(c_X.head())

# standardize the features (zero mean, unit variance)
from sklearn.preprocessing import StandardScaler
sc_c = StandardScaler()
sc_c_X = sc_c.fit_transform(c_X)

# split the dataset: 80% training data, 20% test data
from sklearn.model_selection import train_test_split
c_X_train, c_X_test, c_Y_train, c_Y_test = train_test_split(sc_c_X, c_y, test_size=0.20, random_state=42)
```

```
Shape of the dataset : (150, 4)
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
```


### GridSearchCV :
GridSearchCV performs an exhaustive search: it trains and cross-validates the model for every combination of the given hyperparameter values and reports the combination for which the model performs best. Because the number of combinations is the product of the number of values given for each hyperparameter, this method can take a very long time. Here we discuss how to use GridSearchCV from the sklearn library.<br>
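The exhaustive search can be sketched in plain Python: the candidate settings are the Cartesian product of the value lists (scikit-learn builds the same set internally with `ParameterGrid`), and each candidate is fit once per cross-validation fold. The grid below is the Ridge grid used later in this notebook.

```python
from itertools import product

param_grid = {
    "alpha": [0.01, 0.1, 1.0, 10, 100],
    "fit_intercept": [True, False],
    "normalize": [True, False],
}

# every combination of the listed values is one candidate setting
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv = 3  # number of cross-validation folds per candidate
n_fits = len(candidates) * cv
print(len(candidates), "candidates,", n_fits, "fits")
print(candidates[0])
```

With the default `refit=True`, GridSearchCV also performs one final fit of the best setting on the whole training set, so this grid costs 61 fits in total.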


Some important parameters and attributes:<br>

 ***Parameters*** :
 - estimator : the model object whose hyperparameters we want to tune. It must be a scikit-learn estimator or follow the scikit-learn estimator interface.
 - param_grid : a dictionary that maps each hyperparameter name to the list of values to try.
 - scoring : a string naming the evaluation metric to optimize, e.g. 'r2' or 'accuracy'.
 - cv : the number of cross-validation folds the training data is split into. The default value is 5. <br>
 
 ***Attributes***
 - best_estimator_ : the estimator that gave the best score on the left-out data (refit on the whole training set by default).
 - best_params_ : the parameter setting that gave the best results on the hold-out data.
 - best_score_ : the mean cross-validated score of best_estimator_.

Here we discuss how to tune the hyperparameters of **Ridge regression** using GridSearchCV.

```python
from sklearn.linear_model import Ridge
R_reg = Ridge()
# parameter grid dictionary: 5 x 2 x 2 = 20 combinations
# ('normalize' was removed from Ridge in scikit-learn 1.2; drop it from the grid on newer versions)
params = {
    'alpha': [0.01, 0.1, 1.0, 10, 100],
    'fit_intercept': [True, False],
    'normalize': [True, False]
}
# r2 is chosen as the evaluation metric; cv=3 cross-validates each combination on 3 folds
gs = model_selection.GridSearchCV(estimator=R_reg, param_grid=params, scoring='r2', cv=3, n_jobs=-1)
print(gs)
# fit the training data to GridSearchCV
gs.fit(r_X_train, r_Y_train)
best_est = gs.best_estimator_
print(f"Best Estimator is : {best_est}")
```

```
GridSearchCV(cv=3, estimator=Ridge(), n_jobs=-1,
             param_grid={'alpha': [0.01, 0.1, 1.0, 10, 100],
                         'fit_intercept': [True, False],
                         'normalize': [True, False]},
             scoring='r2')
Best Estimator is : Ridge(alpha=0.01, normalize=True)
```


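With alpha = 0.01 and normalize = True, the model gives the best mean cross-validated R² score (fit_intercept = True is the default value, so it is not shown in the printed estimator).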
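Because the notebook standardizes the data before splitting, the fitted scaler has already seen the validation folds (and the test set). A common alternative, sketched below on synthetic data from `make_regression` (an assumption, not the Boston data), is to put `StandardScaler` and `Ridge` in a `Pipeline` so that scaling is re-fit inside every cross-validation fold. Hyperparameters of a pipeline step are addressed as `<step>__<param>`, and the grid omits `normalize`, which newer scikit-learn versions no longer accept.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([("scaler", StandardScaler()), ("ridge", Ridge())])
params = {
    "ridge__alpha": [0.01, 0.1, 1.0, 10, 100],
    "ridge__fit_intercept": [True, False],
}
gs = GridSearchCV(pipe, param_grid=params, scoring="r2", cv=3)
gs.fit(X_train, y_train)  # the scaler is fit on the training folds only

print(gs.best_params_)
print(f"mean CV R2: {gs.best_score_:.3f}, test R2: {gs.score(X_test, y_test):.3f}")

# one row per candidate setting, best first
results = pd.DataFrame(gs.cv_results_)
print(results[["params", "mean_test_score", "rank_test_score"]].sort_values("rank_test_score"))
```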

### RandomizedSearchCV :
RandomizedSearchCV does not try every combination. Instead, it samples a fixed number of hyperparameter settings, which we choose with `n_iter`: when all values are given as lists, it samples combinations from the grid without replacement, and values can also be given as distributions to sample from. It trains the model for each sampled combination, measures its performance with cross-validation, and returns the best estimator. Because it evaluates only `n_iter` combinations, it usually takes much less time than GridSearchCV, at the risk of missing the best combination.<br>
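Besides lists, `param_distributions` accepts any object with an `rvs` method, such as the distributions in `scipy.stats`. A sketch assuming SciPy's `loguniform`, which samples `C` and `gamma` continuously on a log scale instead of from a fixed list; the ranges are illustrative, and `random_state` makes the sampled candidates reproducible.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_distributions = {
    "C": loguniform(1e-1, 1e3),      # continuous, log-uniform between 0.1 and 1000
    "gamma": loguniform(1e-4, 1e0),  # continuous, log-uniform between 0.0001 and 1
    "kernel": ["linear", "rbf"],     # a list is sampled uniformly
}
rs = RandomizedSearchCV(
    SVC(),
    param_distributions=param_distributions,
    n_iter=20,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
rs.fit(X, y)
print(rs.best_params_, round(rs.best_score_, 3))
```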


Some important parameters and attributes:<br>

 ***Parameters*** :
 - n_iter : the number of parameter settings that are sampled and evaluated. The default value is 10. A higher value makes it more likely to find a good combination, at the cost of a longer run time.
 - estimator : the model object whose hyperparameters we want to tune. It must be a scikit-learn estimator or follow the scikit-learn estimator interface.
 - param_distributions : a dictionary that maps each hyperparameter name to a list of values (sampled uniformly) or to a distribution with an rvs method, such as those in scipy.stats.
 - scoring : a string naming the evaluation metric to optimize.
 - cv : the number of cross-validation folds the training data is split into. The default value is 5. <br>
 
 ***Attributes***
 - best_estimator_ : the estimator that gave the best score on the left-out data (refit on the whole training set by default).
 - best_params_ : the parameter setting that gave the best results on the hold-out data.
 - best_score_ : the mean cross-validated score of best_estimator_.

Here I discuss how to tune the hyperparameters of a **support vector machine** classifier.

```python
from sklearn import svm
# probability=True enables predict_proba, which the neg_log_loss scorer needs
svm_clf = svm.SVC(probability=True)
# parameter grid dictionary: 5 x 5 x 2 = 50 possible combinations
params = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],  # ignored by the linear kernel
    'kernel': ['linear', 'rbf']  # other kernels are 'poly', 'sigmoid' and 'precomputed'
}
# neg_log_loss is chosen as the evaluation metric;
# n_iter=20 samples 20 of the 50 combinations (increase it to search more);
# without random_state, the sampled combinations differ between runs
gs = model_selection.RandomizedSearchCV(n_iter=20, estimator=svm_clf, param_distributions=params, scoring='neg_log_loss', cv=3, n_jobs=-1)
print(gs)
# fit the training data to RandomizedSearchCV
gs.fit(c_X_train, c_Y_train)
best_est = gs.best_estimator_
print(f"Best Estimator is : {best_est}")
```

```
RandomizedSearchCV(cv=3, estimator=SVC(probability=True), n_iter=20, n_jobs=-1,
                   param_distributions={'C': [0.1, 1, 10, 100, 1000],
                                        'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                                        'kernel': ['linear', 'rbf']},
                   scoring='neg_log_loss')
Best Estimator is : SVC(C=100, gamma=1, kernel='linear', probability=True)
```
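The notebook splits off a test set but never evaluates on it. Because `refit=True` by default, the fitted search object refits the best setting on the whole training set and can score held-out data directly. A self-contained sketch that mirrors the SVM search above on Iris (same grid, 20% test split with `random_state=42`); the `Pipeline` step names and the extra `random_state` values are assumptions added for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# scaling inside the pipeline is fit on the training folds only
model = Pipeline([("scaler", StandardScaler()), ("svc", SVC(probability=True, random_state=0))])
params = {
    "svc__C": [0.1, 1, 10, 100, 1000],
    "svc__gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "svc__kernel": ["linear", "rbf"],
}
rs = RandomizedSearchCV(model, param_distributions=params, n_iter=20,
                        scoring="neg_log_loss", cv=3, random_state=0)
rs.fit(X_train, y_train)

# rs.score applies the search's scoring (neg_log_loss) with the refit best estimator
test_score = rs.score(X_test, y_test)
test_acc = accuracy_score(y_test, rs.predict(X_test))
print(rs.best_params_)
print(f"test neg_log_loss: {test_score:.3f}, test accuracy: {test_acc:.3f}")
```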


### Bayesian Optimization :
Bayesian optimization usually needs fewer model evaluations than GridSearchCV and RandomizedSearchCV to find a good combination. It is based on Bayes' theorem and provides a probabilistically principled method for global optimization: it keeps track of the combinations evaluated so far, fits a probabilistic surrogate model of the score as a function of the hyperparameters, and uses it to choose the most promising combination to try next, so later iterations concentrate on the good regions of the search space.<br>
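To illustrate this loop, here is a from-scratch sketch of Bayesian optimization on a hypothetical one-dimensional objective that stands in for a validation loss as a function of one hyperparameter. It uses scikit-learn's `GaussianProcessRegressor` as the surrogate and expected improvement as the acquisition function; skopt's own implementation differs in its details, and the objective, bounds and iteration counts are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Hypothetical validation loss as a function of one hyperparameter x."""
    return (x - 2.0) ** 2 + 0.5 * np.sin(5.0 * x)


def expected_improvement(candidates, gp, best_y, xi=0.01):
    """Expected improvement over best_y for a minimization problem."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    improvement = best_y - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
low, high = 0.0, 5.0
candidates = np.linspace(low, high, 500).reshape(-1, 1)

# start with a few random evaluations
X = rng.uniform(low, high, size=(3, 1))
y = np.array([objective(x[0]) for x in X])

for _ in range(15):
    # surrogate model fitted to every evaluation made so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(X, y)
    # evaluate the objective where the expected improvement is largest
    ei = expected_improvement(candidates, gp, y.min())
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

best_x = X[np.argmin(y), 0]
print(f"best x = {best_x:.3f}, best loss = {y.min():.3f}")
```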

***Bayesian optimization is not available in the sklearn library, so we use the skopt (scikit-optimize) library.***

Some important parameters and attributes:<br>

 ***Parameters*** :
 - n_iter : the number of parameter settings that are sampled and evaluated. The default value is 50. A higher value makes it more likely to find a good combination, at the cost of a longer run time.
 - estimator : the model object whose hyperparameters we want to tune. It must be a scikit-learn estimator or follow the scikit-learn estimator interface.
 - search_spaces : a dictionary that maps each hyperparameter name to its search space, either a list of values or a dimension object from skopt.space (Real, Integer or Categorical).
 - optimizer_kwargs : a dictionary of arguments passed to the underlying skopt Optimizer. For example, {'base_estimator': 'RF'} uses a random forest surrogate instead of the default Gaussian process.
 - cv : the number of cross-validation folds the training data is split into. If it is not set, the scikit-learn default is used (5 folds in recent versions). <br>
 
 ***Attributes***
 - best_estimator_ : the estimator that gave the best score on the left-out data (refit on the whole training set by default).
 - best_params_ : the parameter setting that gave the best results on the hold-out data.
 - best_score_ : the mean cross-validated score of best_estimator_.

### Hyperparameter Tuning of Random Forest Classifier

```python
# importing the skopt library for Bayesian optimization
import skopt
from sklearn import ensemble

rf_clf = ensemble.RandomForestClassifier()
# search space dictionary
params = {
    'n_estimators': [int(x) for x in np.linspace(50, 1000, num=20)],  # 20 equally spaced values from 50 to 1000
    'max_features': [len(c_X.columns), "sqrt", "log2"],  # len(c_X.columns) = 4, i.e. all features
    'max_depth': [int(x) for x in np.linspace(5, 50, num=10)],
    'min_samples_split': [3, 4, 6, 7, 8, 9, 10],
    'min_samples_leaf': [1, 3, 5, 7, 9, 10],
    'bootstrap': [True, False]
}

# n_iter=50 settings are evaluated (increase it to search more);
# 'RF' uses a random forest surrogate instead of the default Gaussian process
gs = skopt.BayesSearchCV(n_iter=50, estimator=rf_clf, search_spaces=params, optimizer_kwargs={'base_estimator': 'RF'}, cv=5, n_jobs=-1)
print(gs)
# fit the training data to BayesSearchCV
gs.fit(c_X_train, c_Y_train)
best_est = gs.best_estimator_
best_score = gs.best_score_  # mean 5-fold cross-validated accuracy
print(f"Best Estimator is : {best_est}")
print(f"Best Score is : {best_score}")
```

```
BayesSearchCV(cv=5, estimator=RandomForestClassifier(), n_jobs=-1,
              optimizer_kwargs={'base_estimator': 'RF'},
              search_spaces={'bootstrap': [True, False],
                             'max_depth': [5, 10, 15, 20, 25, 30, 35, 40, 45,
                                           50],
                             'max_features': [4, 'sqrt', 'log2'],
                             'min_samples_leaf': [1, 3, 5, 7, 9, 10],
                             'min_samples_split': [3, 4, 6, 7, 8, 9, 10],
                             'n_estimators': [50, 100, 150, 200, 250, 300, 350,
                                              400, 450, 500, 550, 600, 650, 700,
                                              750, 800, 850, 900, 950, 1000]})
Best Estimator is : RandomForestClassifier(max_depth=50, max_features='log2', min_samples_leaf=7,
                       min_samples_split=8, n_estimators=500)
Best Score is : 0.9583333333333334
```


*In this notebook, I discussed some common approaches to hyperparameter tuning that you can apply when optimizing your machine learning models. If you find this notebook useful, please upvote it, and if you spot a mistake, please mention it in the comment section.*

