# Hyperparameter Tuning
Hyperparameter tuning is the process of finding the combination of hyperparameters that gives the best performance for a given model.
## Types
- Grid Search: Exhaustive search over every combination of values in a predefined hyperparameter grid.
- Random Search: Randomly sample a fixed number of hyperparameter combinations from given distributions (or lists of values).
- Bayesian Optimization: Fit a probabilistic surrogate model of the objective function and use it to decide which hyperparameters to evaluate next.
- Gradient-based Optimization: Use gradient descent to minimize the objective function with respect to continuous hyperparameters.
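
To make the difference between the first two strategies concrete, here is a minimal pure-Python sketch. It assumes the same four hyperparameter lists that appear in the grid later in this notebook; the helper names `grid_candidates` and `random_candidates` are hypothetical and are not part of scikit-learn.

```python
import itertools
import random

# Search space: the same values as the parameter grid used later in this notebook.
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 5, 6],
    "criterion": ["gini", "entropy"],
    "bootstrap": [True, False],
}


def grid_candidates(space):
    """Return every combination of values in the space (what grid search tries)."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]


def random_candidates(space, n_iter, seed=0):
    """Return n_iter distinct combinations drawn at random (what random search tries)."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(space), n_iter)


all_combos = grid_candidates(search_space)
sampled = random_candidates(search_space, n_iter=20)

print(len(all_combos))  # 3 * 3 * 2 * 2 = 36 candidates
print(len(sampled))     # 20 candidates
```

Grid search evaluates all 36 candidates, while random search evaluates only the sampled subset, which is why it tends to be cheaper at the cost of possibly missing the best combination.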
# Cross Validation
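Cross-validation is a technique for estimating how well a model generalizes to unseen data. The data is split into k folds; the model is trained on k - 1 folds and scored on the remaining one, and this is repeated so that each fold serves once as the validation set. The k scores are then averaged.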
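
As a rough illustration of how the data is partitioned, the sketch below splits 150 sample indices (the size of the iris dataset used in this notebook) into 5 folds. The function `k_fold_indices` is a hypothetical helper, not a scikit-learn API. It uses plain contiguous folds, whereas scikit-learn defaults to stratified folds when `cv` is an integer and the estimator is a classifier.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=150, k=5))
for i, (train, validation) in enumerate(folds):
    print(f"fold {i}: {len(train)} train, {len(validation)} validation")
```

Every sample lands in exactly one validation fold, so each one is used for scoring once and for training k - 1 times.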

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [2]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

# Grid Search

In [3]:
%%time
# define the model
model = RandomForestClassifier()

# create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [4, 5, 6],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# set up the grid search
grid = GridSearchCV(
    estimator = model,
    param_grid = param_grid,
    cv = 5,
    scoring = 'accuracy',
    verbose = 1,
    n_jobs = -1
)

# fit the model
grid.fit(X,y)

# print the best parameters
print(f"Best parameters: {grid.best_params_}")

Fitting 5 folds for each of 36 candidates, totalling 180 fits
Best parameters: {'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 100}
CPU times: total: 1.3 s
Wall time: 47.9 s
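
The cell above fits the search on all of `X` and `y`, so the cross-validated score of the best candidate is the only performance estimate available, and it tends to be optimistic because the same folds were used to pick the hyperparameters. One common way to get an independent estimate is to hold out a test set first. The sketch below assumes an 80/20 split, a fixed seed, and a smaller grid so that it runs quickly; none of these choices come from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the split ratio and seed are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A deliberately small grid so the sketch runs quickly.
small_grid = {"n_estimators": [50, 100], "max_depth": [4, 6]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best candidate.
print(f"Best CV accuracy: {search.best_score_:.3f}")

# With the default refit=True, the best estimator is retrained on all of
# X_train, so it can be scored directly on the held-out test set.
test_accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```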


# Random Search

In [4]:
%%time
# define the model
model = RandomForestClassifier()

# define the values to sample from
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [4, 5, 6],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# set up the randomized search
grid = RandomizedSearchCV(
    estimator = model,
    param_distributions = param_grid,
    cv = 5,
    scoring = 'accuracy',
    verbose = 1,
    n_jobs = -1,
    n_iter = 20
)

# fit the model
grid.fit(X,y)

# print the best parameters
print(f"Best parameters: {grid.best_params_}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best parameters: {'n_estimators': 200, 'max_depth': 6, 'criterion': 'entropy', 'bootstrap': True}
CPU times: total: 1.14 s
Wall time: 22.5 s
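
The random search above samples from lists, so it can only pick values that grid search would also try. `RandomizedSearchCV` also accepts distribution objects with an `rvs` method, such as those in `scipy.stats`. The sketch below is one possible setup: the ranges, `n_iter=5`, `cv=3`, and the seeds are assumptions chosen to keep the run short, not settings from the original notebook.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

param_distributions = {
    "n_estimators": randint(50, 201),   # integers from 50 to 200 inclusive
    "max_depth": randint(4, 7),         # integers from 4 to 6 inclusive
    "criterion": ["gini", "entropy"],   # lists are sampled uniformly
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled candidates
    cv=3,
    scoring="accuracy",
    random_state=0,    # makes the sampled candidates reproducible
)
search.fit(X, y)

print(f"Best parameters: {search.best_params_}")
```

Sampling from a distribution lets the search reach values between the grid points, for example `n_estimators=137`, which a fixed list would never propose.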
