# Hyperparameter Tuning

Hyperparameters are settings you choose before training, such as the number of trees in a random forest or the maximum depth of each tree. Unlike model parameters, they are not learned from the data. The best values differ from one dataset and model to the next, so in practice you find them by experiment: pick a set of hyperparameters, train and evaluate the model with them, and compare the results. This process is called hyperparameter tuning.

**Types**
- **Grid Search:** Exhaustively search over all possible combinations of hyperparameters.
- **Random Search:** Randomly sample combinations of hyperparameters from given distributions.
- **Bayesian Optimization:** Model the objective function and use that model to search for its optimum.
- **Gradient-based Optimization:** Use gradient descent to find the minimum of the objective function.
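
To make the difference between the first two strategies concrete, here is a minimal sketch in plain Python. The search space and the `toy_score` function are made up for illustration; in a real run the score would come from training and validating a model. Note that this sketch samples with replacement, which is a simplification.

```python
import itertools
import random

# Hypothetical search space, for illustration only
search_space = {
    "max_depth": [2, 4, 6, 8],
    "n_estimators": [50, 100, 200],
}

def toy_score(params):
    """Stand-in for a validation score; peaks at max_depth=6, n_estimators=100."""
    return -abs(params["max_depth"] - 6) - abs(params["n_estimators"] - 100) / 100

names = list(search_space)

# Grid search: evaluate every combination in the search space
grid_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*search_space.values())
]
best_grid = max(grid_candidates, key=toy_score)

# Random search: evaluate a fixed budget of randomly drawn combinations
# (drawn with replacement here, so duplicates are possible)
rng = random.Random(0)
random_candidates = [
    {name: rng.choice(options) for name, options in search_space.items()}
    for _ in range(5)
]
best_random = max(random_candidates, key=toy_score)

print(f"Grid search tried {len(grid_candidates)} candidates, best: {best_grid}")
print(f"Random search tried {len(random_candidates)} candidates, best: {best_random}")
```

Grid search is guaranteed to find the best point on the grid but its cost grows multiplicatively with every added hyperparameter, while random search keeps the cost fixed at the chosen budget and may miss the best combination.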

In [25]:
# import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# import ML libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [3]:
# load the dataset using seaborn
df = sns.load_dataset('iris')
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [5]:
# Choose features (X) and labels (y)
X = df.drop('species', axis = 1)
y = df['species']

In [23]:
# instantiate the model
model = RandomForestClassifier()

# create the hyperparameter grid
param_grid = {
    'n_estimators' : [50, 100, 200, 300, 400, 500],
    'criterion' : ['gini', 'entropy'],
    'max_depth' :  [4, 5, 6, 7, 8, 9, 10],
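    # 'auto' (an alias of 'sqrt' for classifiers) was removed in scikit-learn 1.3; None considers all features
    'max_features' : ['sqrt', 'log2', None]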
}

# set up the grid
grid  = GridSearchCV(
    estimator = model,
    param_grid = param_grid,
    cv = 5,
    scoring = 'accuracy',
    verbose = 1,
    n_jobs = -1

)

In [24]:
# train the model
grid.fit(X, y)

# print the best parameters
print(f"Best Parameters: {grid.best_params_}")

Fitting 5 folds for each of 252 candidates, totalling 1260 fits
Best Parameters: {'criterion': 'gini', 'max_depth': 4, 'max_features': 'sqrt', 'n_estimators': 200}
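
The fit count reported above follows directly from the size of the grid. As a quick check, the arithmetic can be sketched as follows; the dictionary simply records how many options the grid lists for each hyperparameter.

```python
import math

# Number of options listed for each hyperparameter in the grid
options_per_parameter = {
    "n_estimators": 6,
    "criterion": 2,
    "max_depth": 7,
    "max_features": 3,
}
cv_folds = 5

# Grid search tries every combination, so the counts multiply
n_candidates = math.prod(options_per_parameter.values())

# Each candidate is trained once per cross-validation fold
total_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates x {cv_folds} folds = {total_fits} fits")
# prints: 252 candidates x 5 folds = 1260 fits
```

Adding one more option to any list, or one more hyperparameter, multiplies this total, which is why exhaustive grids become expensive quickly.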


In [26]:
# instantiate the model
model = RandomForestClassifier()

# create the hyperparameter grid
param_grid = {
    'n_estimators' : [50, 100, 200, 300, 400, 500],
    'criterion' : ['gini', 'entropy'],
    'max_depth' :  [4, 5, 6, 7, 8, 9, 10],
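    # 'auto' (an alias of 'sqrt' for classifiers) was removed in scikit-learn 1.3; None considers all features
    'max_features' : ['sqrt', 'log2', None]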
}

# set up the randomized search (samples n_iter = 10 candidates by default)
grid  = RandomizedSearchCV(
    estimator = model,
    param_distributions = param_grid,
    cv = 5,
    scoring = 'accuracy',
    verbose = 1,
    n_jobs = -1

)

In [29]:
%%time

# train the model
grid.fit(X, y)

# print the best parameters
print(f"Best Parameters: {grid.best_params_}")

Fitting 5 folds for each of 10 candidates, totalling 50 fits


Best Parameters: {'n_estimators': 400, 'max_features': 'sqrt', 'max_depth': 7, 'criterion': 'entropy'}
CPU times: total: 4.34 s
Wall time: 1min 14s
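
The searches above fit on the full dataset and report only the best parameters. A possible next step, sketched below under some assumptions, is to hold out a test set and fix the random seeds so the search is repeatable. The smaller search space, `n_iter=5`, `cv=3`, and the seed value 42 are illustrative choices made to keep the example fast, not settings from the cells above. The data comes from scikit-learn's bundled copy of Iris (`load_iris`) instead of seaborn, so no download is needed.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Load Iris from scikit-learn's bundled copy
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows so the tuned model is scored on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Deliberately small search space (an assumption, chosen for speed)
param_distributions = {
    "n_estimators": [10, 25, 50],
    "max_depth": [2, 3, 4, None],
    "max_features": ["sqrt", "log2", None],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled candidates (the default is 10)
    cv=3,
    scoring="accuracy",
    random_state=42,   # makes the sampled candidates repeatable
)
search.fit(X_train, y_train)

# With the default refit=True, the search object predicts with the best model
test_accuracy = accuracy_score(y_test, search.predict(X_test))

print(f"Best parameters: {search.best_params_}")
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
print(f"Held-out test accuracy: {test_accuracy:.3f}")
```

Comparing the cross-validated score with the held-out score gives a rough sense of whether the tuned settings generalize beyond the data used to select them.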
