# Hyperparameter tuning
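`Hyperparameter tuning` is the process of finding the set of hyperparameters (settings fixed before training, such as the number of trees or the maximum tree depth) that gives a model the best performance on a chosen evaluation metric.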
Common hyperparameter tuning methods:
1. Grid Search: evaluates every combination of the candidate hyperparameter values.
2. Random Search: evaluates a random sample of combinations drawn from the hyperparameter space, using a fixed budget of trials.
3. Bayesian Optimization: fits a surrogate model to the scores observed so far and uses it to choose the next hyperparameters to evaluate.
4. Gradient-based optimization: computes the gradient of a validation loss with respect to continuous hyperparameters (such as a regularization strength) and updates them by gradient descent.
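A minimal sketch of Bayesian optimization, assuming a single integer hyperparameter: `max_depth` of a `DecisionTreeClassifier` on Iris (chosen for speed, not taken from this notebook). The surrogate is scikit-learn's `GaussianProcessRegressor`, and the next point is picked with an upper-confidence-bound acquisition function. The `objective` helper, the 1 to 20 search range, and the budget of 10 evaluations are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)


def objective(max_depth):
    """Hypothetical black-box score to maximize: mean 5-fold CV accuracy."""
    model = DecisionTreeClassifier(max_depth=int(max_depth), random_state=0)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


# Assumed search space: max_depth in 1..20 (floats so the GP can use them).
candidates = np.arange(1, 21, dtype=float)

# Seed the surrogate with 3 random evaluations.
rng = np.random.default_rng(0)
tried = [float(d) for d in rng.choice(candidates, size=3, replace=False)]
scores = [objective(d) for d in tried]

# Surrogate model: Gaussian process mapping max_depth -> CV accuracy.
gp = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * Matern(length_scale=5.0, nu=2.5),
    alpha=1e-6,  # small jitter for numerical stability
    random_state=0,
)

for _ in range(7):  # 7 more evaluations -> 10 in total
    y_mean = np.mean(scores)
    # Fit on centred scores so the GP's zero prior mean sits at the average score.
    gp.fit(np.array(tried).reshape(-1, 1), np.array(scores) - y_mean)
    mean, std = gp.predict(candidates.reshape(-1, 1), return_std=True)
    # Upper confidence bound: high predicted score (exploitation)
    # plus high uncertainty (exploration).
    ucb = mean + y_mean + 2.0 * std
    ucb[np.isin(candidates, tried)] = -np.inf  # never re-evaluate a point
    next_depth = float(candidates[np.argmax(ucb)])
    tried.append(next_depth)
    scores.append(objective(next_depth))

best = int(np.argmax(scores))
print(f"best max_depth={int(tried[best])}, CV accuracy={scores[best]:.4f}")
```

Unlike grid or random search, each new trial here depends on the results of the previous ones, which is what lets Bayesian optimization spend a small evaluation budget more deliberately.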
# Cross-validation
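`Cross-validation` is the process of evaluating a model's performance by repeatedly training it on some subsets (folds) of the data and testing it on the held-out fold, then averaging the scores. With `cv=5`, the data is split into 5 folds and each fold serves as the validation set exactly once.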
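A minimal sketch of 5-fold cross-validation on the Iris data, written once by hand with `StratifiedKFold` and once with `cross_val_score`. For a classifier, an integer `cv=5` (as passed to the searches in this notebook) means stratified 5-fold splitting without shuffling, so the two sets of scores should match. The `DecisionTreeClassifier(random_state=0)` model is an assumption chosen to keep the results deterministic.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Manual 5-fold CV: every sample lands in the validation fold exactly once.
skf = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, val_idx in skf.split(X, y):
    model.fit(X[train_idx], y[train_idx])                      # train on 4 folds
    manual_scores.append(model.score(X[val_idx], y[val_idx]))  # accuracy on the 5th
manual_scores = np.array(manual_scores)

# The same procedure in one call; cv=5 on a classifier uses StratifiedKFold(5).
sk_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print(manual_scores.round(3), manual_scores.mean().round(3))
print(sk_scores.round(3), sk_scores.mean().round(3))
```

Stratification matters for Iris because its rows are ordered by class; plain `KFold` without shuffling would produce folds with heavily skewed class proportions.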


In [26]:
# import libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier


In [27]:
# Load the Iris dataset bundled with scikit-learn
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target


In [28]:
# define the model
model = RandomForestClassifier()

In [29]:
# create the parameter grid
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [4, 5, 10, 20],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}
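The search output further down ("64 candidates, totalling 320 fits") follows directly from this grid: 4 × 4 × 2 × 2 = 64 combinations, each fitted once per fold with `cv=5`. scikit-learn's `ParameterGrid` and `ParameterSampler` helpers, which `GridSearchCV` and `RandomizedSearchCV` use to generate candidates, make this easy to check and show how random search differs. A sketch; `random_state=0` is an added assumption for reproducibility.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [4, 5, 10, 20],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# Grid search: every combination of the listed values.
grid_candidates = list(ParameterGrid(param_grid))
print(len(grid_candidates))        # 64
print(len(grid_candidates) * 5)    # 320 fits with cv=5

# Random search: a fixed budget of draws (10 is RandomizedSearchCV's default n_iter).
# When every value is a list, sampling is done without replacement.
random_candidates = list(ParameterSampler(param_grid, n_iter=10, random_state=0))
print(len(random_candidates))      # 10
print(random_candidates[0])
```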


In [30]:
%%time
# set up the grid search (5-fold CV, accuracy metric, all CPU cores)
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    verbose=2,
    scoring='accuracy',
    n_jobs=-1
)
# fit the model
grid.fit(X,y)
# print the best parameters
print(grid.best_params_)
# print the best score
print(grid.best_score_)

Fitting 5 folds for each of 64 candidates, totalling 320 fits
{'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 100}
0.9666666666666668
CPU times: total: 922 ms
Wall time: 12.5 s
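`best_params_` and `best_score_` summarize only the winning candidate. The full per-candidate results are stored in `cv_results_`, which converts directly to a pandas DataFrame. On a small dataset like Iris several candidates can tie on mean accuracy, so the `std_test_score` column is worth checking as well. A sketch with a deliberately small grid and `random_state=0` (both assumptions, to keep it fast and repeatable):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Small 2 x 2 grid -> 4 candidates x 5 folds = 20 fits.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={'n_estimators': [10, 50], 'max_depth': [2, 4]},
    cv=5,
    scoring='accuracy',
)
search.fit(X, y)

# One row per candidate; rank_test_score == 1 marks the best mean score.
results = pd.DataFrame(search.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').to_string(index=False))
```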


In [39]:
%%time
# define the model
model_01 = RandomForestClassifier()
# candidate values to sample from (same grid as above)
param_grid_01 = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [4, 5, 10, 20],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}
# set up the randomized search: evaluates only n_iter sampled candidates
grid = RandomizedSearchCV(
    estimator=model_01,
    param_distributions=param_grid_01,
    n_iter=10,  # the default value; hence "10 candidates" in the output
    cv=5,
    verbose=2,
    scoring='accuracy',
    n_jobs=-1
)
# fit the model
grid.fit(X,y)
# print the best parameters
print(grid.best_params_)
# print the best score
print(grid.best_score_)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
{'n_estimators': 50, 'max_depth': 5, 'criterion': 'entropy', 'bootstrap': True}
0.9666666666666668
CPU times: total: 125 ms
Wall time: 1.81 s
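Both searches report the same `best_score_` (0.9667), but random search needed 50 fits instead of 320. Note that `best_score_` is the cross-validated accuracy of the candidate selected on those same folds, so it tends to be optimistic. A common remedy is to hold out a test set with `train_test_split` (imported in the first cell but otherwise unused) and tune only on the training split. A minimal sketch; the 20% split, the smaller search space, `n_iter=5`, and the `random_state` values are assumptions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions={
        'n_estimators': [10, 50, 100],
        'max_depth': [2, 4, 8],
        'criterion': ['gini', 'entropy'],
    },
    n_iter=5,
    cv=5,
    scoring='accuracy',
    random_state=42,
)
search.fit(X_train, y_train)

# refit=True (the default) retrains the best candidate on all of X_train.
test_accuracy = search.best_estimator_.score(X_test, y_test)
print(search.best_params_)
print(f"CV accuracy (train split): {search.best_score_:.4f}")
print(f"Held-out test accuracy:    {test_accuracy:.4f}")
```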
