# Hyperparameter Tuning

Hyperparameter tuning is the process of finding the best combination of hyperparameters for a given model.

**Types:**

- Grid Search: Exhaustive search over all possible combinations of the listed hyperparameter values.
- Random Search: Randomly sample combinations of hyperparameters from a given distribution.
- Bayesian Optimization: Build a probabilistic model of the objective function and use it to choose which combinations to evaluate next.
- Gradient-Based Optimization: Use gradient descent to find the minimum of the objective function; this requires hyperparameters that the objective is differentiable with respect to.
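To make the difference between the first two strategies concrete, here is a minimal sketch using scikit-learn's `ParameterGrid` and `ParameterSampler` helpers. The search space `space` is a hypothetical one chosen only for illustration; it is not the grid used later in this notebook.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, chosen only for illustration
space = {
    'n_estimators': [50, 100, 200],
    'max_depth': [4, 6, 8],
    'criterion': ['gini', 'entropy'],
}

# Grid search enumerates every combination: 3 * 3 * 2 = 18 candidates
grid_candidates = list(ParameterGrid(space))
print(len(grid_candidates))

# Random search draws a fixed number of combinations from the same space
random_candidates = list(ParameterSampler(space, n_iter=5, random_state=0))
print(len(random_candidates))
```

The grid grows multiplicatively with every added hyperparameter, while the random sample stays at whatever `n_iter` is set to.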

# Cross Validation
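Cross validation is a technique for estimating how well a model generalizes to unseen data. The data is split into k folds; the model is trained on k-1 folds and evaluated on the remaining fold, and this is repeated until every fold has served once as the validation set.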
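As a rough sketch of the idea, `cross_val_score` can run 5-fold cross validation for a single model. The choice of `n_estimators=50` and the fixed `random_state` are assumptions made here only to keep the example quick and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# random_state is fixed only to make this sketch repeatable
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross validation: train on 4 folds, score on the held-out fold, 5 times
scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
print(scores)
print(f'Mean accuracy: {scores.mean():.3f}')
```

Each entry in `scores` is the accuracy on one held-out fold; the mean is the usual summary.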

In [1]:
# Import libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [2]:
# load the data
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

In [3]:
%%time
# define the model
model = RandomForestClassifier()

# create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500], #6
    # 'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8, 9, 10], #7
    'criterion': ['gini', 'entropy'], #2
    'bootstrap': [True, False] #2
    # 6 * 7 * 2 * 2 = 168 candidates, the number reported in the first line of the fit output
}

# set up the grid search
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1  # use all available CPU cores
)

# fit the model
grid.fit(X, y)

# print the best parameters
print(f'Best Parameters: {grid.best_params_}')

Fitting 5 folds for each of 168 candidates, totalling 840 fits
Best Parameters: {'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 50}
CPU times: total: 1.52 s
Wall time: 1min 30s
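The search above is fitted on the full dataset, so the cross-validated score is the only estimate of performance. One possible follow-up, sketched here under assumptions, is to hold out a test set with `train_test_split` and score the refitted best model on it with `accuracy_score`. The reduced grid `small_grid`, the 80/20 split, and the fixed `random_state` values are illustrative choices, not part of the original run.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the search never sees it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Deliberately small grid (an assumption) so the sketch runs quickly
small_grid = {'n_estimators': [50, 100], 'max_depth': [4, 6]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
    scoring='accuracy',
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on all of X_train
y_pred = search.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f'Best CV accuracy: {search.best_score_:.3f}')
print(f'Held-out accuracy: {test_accuracy:.3f}')
```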


# The above is grid search (GridSearchCV)

# RandomizedSearchCV

In [4]:
%%time

from sklearn.model_selection import RandomizedSearchCV
# define the model
model = RandomForestClassifier()

# create the parameter space to sample from
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    # 'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# set up the randomized search
grid = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
    n_iter=20  # evaluate 20 sampled combinations instead of all 168
)

# fit the model
grid.fit(X, y)

# print the best parameters
print(f'Best Parameters: {grid.best_params_}')

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'n_estimators': 200, 'max_depth': 7, 'criterion': 'gini', 'bootstrap': True}
CPU times: total: 312 ms
Wall time: 9.28 s
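The run above samples from fixed lists of values. `RandomizedSearchCV` also accepts distribution objects, which matches the "sample from a given distribution" description of random search. The sketch below uses `scipy.stats.randint`; the ranges, `n_iter=5`, `cv=3`, and the fixed `random_state` values are assumptions picked to keep the example short and repeatable.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions instead of fixed lists; the ranges are illustrative assumptions
param_distributions = {
    'n_estimators': randint(50, 201),   # integers from 50 to 200 inclusive
    'max_depth': randint(4, 11),        # integers from 4 to 10 inclusive
    'criterion': ['gini', 'entropy'],   # lists are sampled uniformly
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled candidates
    cv=3,
    scoring='accuracy',
    random_state=0,    # makes the sampled candidates repeatable
)
search.fit(X, y)
print(search.best_params_)
```

With distributions, the search is no longer limited to a handful of hand-picked values, and the cost is still controlled by `n_iter`.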
