# Hyperparameter Tuning
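Hyperparameter tuning is the process of finding the combination of hyperparameters (settings chosen before training, such as the number of trees in a random forest) that gives a model the best performance.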

#### Types:
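1. `Grid Search`: Evaluates every combination of the hyperparameter values you specify.
2. `Random Search`: Evaluates a fixed number of combinations sampled at random from the specified values or distributions.
3. `Bayesian Optimization`: Builds a probabilistic model of the objective function and uses it to choose which hyperparameters to try next.
4. `Gradient-based Optimization`: Computes gradients of the validation objective with respect to the hyperparameters and updates them by gradient descent.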
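To make the difference between the first two strategies concrete, here is a small sketch using scikit-learn's `ParameterGrid` and `ParameterSampler` helpers, which, as far as I know, are what `GridSearchCV` and `RandomizedSearchCV` use to generate candidates. The search space is the same one used in the cells below; `random_state=0` is an arbitrary choice for reproducibility.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same search space as the RandomForestClassifier cells in this notebook
parameters = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
}

# Grid search: every combination, 6 * 7 * 2 * 2
grid_candidates = list(ParameterGrid(parameters))
print(len(grid_candidates))  # 168

# Random search: only n_iter combinations, drawn from the same space
random_candidates = list(ParameterSampler(parameters, n_iter=20, random_state=0))
print(len(random_candidates))  # 20
print(random_candidates[0])    # one sampled combination (a dict)
```

Because every value here is a list, the sampled combinations are a subset of the full grid; random search trades exhaustiveness for a fixed, smaller budget.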


### Cross Validation
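Cross-validation is a statistical technique used in machine learning and data analysis to estimate how well a model generalizes to new, unseen data. In k-fold cross-validation the data is split into k folds; the model is trained on k-1 folds and evaluated on the remaining fold, this is repeated so that each fold is held out once, and the k scores are averaged.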
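A minimal sketch of 5-fold cross-validation on the iris data with scikit-learn's `cross_val_score`. `random_state=0` is an assumption added only to make the forest reproducible. For a classifier, an integer `cv` uses stratified folds, so each fold keeps roughly the same class proportions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Train on 4 folds, score on the held-out fold, repeat for all 5 folds
model = RandomForestClassifier(random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')

print(scores)         # one accuracy value per held-out fold
print(scores.mean())  # the average that a CV-based search compares across candidates
```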

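`GridSearchCV` combines grid search with cross-validation: it evaluates every combination of the given hyperparameter values with k-fold cross-validation and selects the combination with the best mean score.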
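Roughly what that means can be written as a plain loop. This is a simplified sketch, not scikit-learn's implementation: it leaves out parallelism, refitting the best model on the full data, and the result bookkeeping `GridSearchCV` does. The small grid and `random_state=0` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, cross_val_score

X, y = load_iris(return_X_y=True)

# A deliberately small grid so the loop runs quickly
parameters = {
    'n_estimators': [50, 100],
    'max_depth': [2, 4],
}

best_score, best_params = -1.0, None
for params in ParameterGrid(parameters):
    model = RandomForestClassifier(random_state=0, **params)
    # Mean accuracy over 5 cross-validation folds for this combination
    score = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    if score > best_score:  # strict '>' keeps the first combination on ties
        best_score, best_params = score, params

print('Best hyperparameters:', best_params)
print('Best mean CV accuracy:', best_score)
```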

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix
```

```python
# load the iris dataset from sklearn
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # feature matrix, shape (150, 4)
y = iris.target  # class labels 0, 1, 2
```

#### GridSearchCV

```python
%%time
# define model
model = RandomForestClassifier()

# hyperparameter values to search over
parameters = {
    'n_estimators': [50, 100, 200, 300, 400, 500],  # number of trees (6 values)
    'max_depth': [4, 5, 6, 7, 8, 9, 10],            # maximum depth of each tree (7 values)
    'criterion': ['gini', 'entropy'],               # splitting criterion (2 values)
    'bootstrap': [True, False],                     # whether to use bootstrap sampling (2 values)
}
# 6 * 7 * 2 * 2 = 168 combinations; with cv=5 that is 168 * 5 = 840 fits

# set up the grid search
grid = GridSearchCV(
    estimator=model,        # our model
    param_grid=parameters,  # hyperparameter grid to check
    cv=5,                   # number of cross-validation folds
    scoring='accuracy',     # scoring metric
    verbose=1,              # print progress messages
    n_jobs=-1,              # use all available CPU cores
)

# run the search (fits every combination on every fold)
grid.fit(X, y)

# print the best hyperparameters
print('Best hyperparameters:', grid.best_params_)
```

```
Fitting 5 folds for each of 168 candidates, totalling 840 fits
Best hyperparameters: {'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 400}
CPU times: total: 2.7 s
Wall time: 2min 42s
```
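The cell above prints only `best_params_`, but a fitted `GridSearchCV` also exposes `best_score_`, the per-candidate `cv_results_` dictionary, and `best_estimator_` (refitted on all the data when `refit=True`, the default). A minimal sketch on a small, illustrative grid with `random_state=0` assumed for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={'n_estimators': [50, 100], 'max_depth': [2, 4]},
    cv=5,
    scoring='accuracy',
)
grid.fit(X, y)

# Mean CV accuracy of the winning combination
print('Best mean CV accuracy:', grid.best_score_)

# cv_results_ is a dict of arrays with one entry per candidate
results = grid.cv_results_
for params, mean, std, rank in zip(results['params'],
                                   results['mean_test_score'],
                                   results['std_test_score'],
                                   results['rank_test_score']):
    print(rank, params, f'{mean:.3f} +/- {std:.3f}')

# best_estimator_ is a ready-to-use model trained on all of X, y
best_model = grid.best_estimator_
print(best_model.predict(X[:3]))
```

Looking at `std_test_score` alongside the mean is useful here: on a dataset as small as iris, several candidates often differ by less than one standard deviation.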


```python
%%time
# define model
model = RandomForestClassifier()

# smaller grid: only the number of trees and the tree depth
parameters = {
    'n_estimators': [50, 100, 200, 300, 400, 500],  # number of trees (6 values)
    'max_depth': [4, 5, 6, 7, 8, 9, 10],            # maximum depth of each tree (7 values)
}
# 6 * 7 = 42 combinations; with cv=5 that is 42 * 5 = 210 fits

# set up the grid search
grid = GridSearchCV(
    estimator=model,        # our model
    param_grid=parameters,  # hyperparameter grid to check
    cv=5,                   # number of cross-validation folds
    scoring='accuracy',     # scoring metric
    verbose=1,              # print progress messages
    n_jobs=-1,              # use all available CPU cores
)

# run the search (fits every combination on every fold)
grid.fit(X, y)

# print the best hyperparameters
print('Best hyperparameters:', grid.best_params_)
```

```
Fitting 5 folds for each of 42 candidates, totalling 210 fits
Best hyperparameters: {'max_depth': 4, 'n_estimators': 100}
CPU times: total: 453 ms
Wall time: 48.4 s
```
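The first cell imports `train_test_split`, `accuracy_score` and `confusion_matrix` but never uses them. Both searches above tune on all of `X, y`, so the cross-validation score that picks the winner tends to be an optimistic estimate of performance on new data. One common pattern is to hold out a test split the search never sees. A minimal sketch, assuming an 80/20 stratified split, `random_state=42`, and a small illustrative grid:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data that the search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={'n_estimators': [50, 100], 'max_depth': [4, 5]},
    cv=5,
    scoring='accuracy',
)
grid.fit(X_train, y_train)  # cross-validation runs only inside the training split

# Evaluate the refitted best model on the untouched test split
y_pred = grid.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print('Best hyperparameters:', grid.best_params_)
print('Test accuracy:', test_accuracy)
print(cm)  # rows: true class, columns: predicted class
```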


#### RandomizedSearchCV

```python
%%time
# define model
model = RandomForestClassifier()

# hyperparameter values to sample combinations from
parameters = {
    'n_estimators': [50, 100, 200, 300, 400, 500],  # number of trees
    'max_depth': [4, 5, 6, 7, 8, 9, 10],            # maximum depth of each tree
    'criterion': ['gini', 'entropy'],               # splitting criterion
    'bootstrap': [True, False],                     # whether to use bootstrap sampling
}

# set up the randomized search
random_search = RandomizedSearchCV(
    estimator=model,                 # our model
    param_distributions=parameters,  # values to sample combinations from
    n_iter=20,                       # number of combinations sampled (out of 168)
    cv=5,                            # number of cross-validation folds
    scoring='accuracy',              # scoring metric
    verbose=1,                       # print progress messages
    n_jobs=-1,                       # use all available CPU cores
)

# run the search (fits only the 20 sampled combinations on every fold)
random_search.fit(X, y)

# print the best hyperparameters
print('Best hyperparameters:', random_search.best_params_)
```

```
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best hyperparameters: {'n_estimators': 100, 'max_depth': 6, 'criterion': 'entropy', 'bootstrap': True}
CPU times: total: 422 ms
Wall time: 13.7 s
```
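`RandomizedSearchCV` also accepts `scipy.stats` distributions in `param_distributions`, which lets it try values between the grid points instead of only the listed ones. The cell above sets no `random_state`, so each run samples different candidates and may report different best hyperparameters. A minimal sketch, assuming integer ranges that mirror the lists above, `n_iter=10` (reduced from 20 only to keep it quick), and `random_state=0`:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# randint(low, high) draws integers uniformly from [low, high)
param_distributions = {
    'n_estimators': randint(50, 501),  # 50..500 trees
    'max_depth': randint(4, 11),       # depth 4..10
    'criterion': ['gini', 'entropy'],  # lists are sampled uniformly
    'bootstrap': [True, False],
}

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,
    cv=5,
    scoring='accuracy',
    random_state=0,  # makes the sampled candidates reproducible
)
random_search.fit(X, y)

print('Best hyperparameters:', random_search.best_params_)
```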
