# Hyperparameter Tuning

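Hyperparameter tuning is the process of finding good values for a model's hyperparameters: settings that are chosen before training (for example, the number of trees in a random forest) rather than learned from the data. The goal is to improve the model's performance on data it has not seen.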

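**Types:**
- Grid Search: Evaluates every combination of the values listed for each hyperparameter. It is exhaustive, but the cost grows multiplicatively with each added hyperparameter.
- Random Search: Evaluates a fixed number of combinations sampled at random from the search space. It is usually much cheaper than grid search, but it may miss the best combination.
- Bayesian Optimization: Builds a probabilistic model of the score as a function of the hyperparameters and uses it to choose the next combination to try, based on the results seen so far.
- Gradient-based Optimization: Updates hyperparameters using gradients of the validation objective with respect to them. It applies only to continuous hyperparameters for which such gradients can be computed.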
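
The difference between the first two strategies can be seen without training any model. The sketch below uses scikit-learn's `ParameterGrid` and `ParameterSampler` to list the candidates each strategy would try; the `search_space` dictionary and the budget of 5 samples are assumptions made up for illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical, deliberately small search space (illustration only)
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 6, 8],
    "criterion": ["gini", "entropy"],
}

# Grid search tries every combination: 3 * 3 * 2 = 18 candidates
all_candidates = list(ParameterGrid(search_space))

# Random search draws a fixed budget of candidates from the same space
sampled_candidates = list(
    ParameterSampler(search_space, n_iter=5, random_state=0)
)

print(f"Grid search candidates:   {len(all_candidates)}")
print(f"Random search candidates: {len(sampled_candidates)}")
```

Each candidate is then fitted once per cross-validation fold, so the total number of fits is the number of candidates multiplied by the number of folds.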

# Cross Validation

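Cross-validation estimates how well a machine learning model performs on unseen data. In k-fold cross-validation the data is split into k parts; the model is trained on k-1 of them and scored on the remaining one, and this is repeated until every part has served once as the validation set. Cross-validation does not prevent overfitting by itself, but it helps detect it and gives a more reliable basis for comparing hyperparameter settings than a single train/validation split.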
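
As a minimal sketch of 5-fold cross-validation, the example below scores a random forest on the Iris dataset with `cross_val_score`. The choices of `n_estimators=50` and `random_state=0` are assumptions made to keep the example small and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Assumed settings, chosen only to keep the example fast and repeatable
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# For a classifier, an integer cv uses stratified k-fold splitting
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The spread across folds is as informative as the mean: a large standard deviation suggests the score depends heavily on which samples land in the validation fold.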

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [5]:
# Load the data
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target


In [7]:
%%time
# Define the model
model = RandomForestClassifier()

# Create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# Set up the grid search with 5-fold cross-validation
grid = GridSearchCV(
    estimator=model, 
    param_grid=param_grid, 
    cv=5, 
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

# Run the search: fits every candidate on every fold
grid.fit(X, y)

# Print the best parameter combination
print(f"Best Parameter: {grid.best_params_}")

Fitting 5 folds for each of 168 candidates, totalling 840 fits


Best Parameter: {'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 50}
CPU times: total: 8.86 s
Wall time: 5min 52s
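
Besides `best_params_`, a fitted search object also exposes the cross-validated score of the winning combination (`best_score_`) and a per-candidate summary (`cv_results_`). The sketch below shows one way to read them; it uses a much smaller grid (`small_grid`, a hypothetical name) so that it runs in seconds, and the fixed `random_state` is likewise an assumption.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical small grid so the example runs quickly
small_grid = {"n_estimators": [25, 50], "max_depth": [2, 4]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# Mean cross-validated accuracy of the best combination
print(f"Best CV accuracy: {search.best_score_:.3f}")

# One row per candidate, best-ranked first
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))
```

On a small dataset such as Iris, several candidates often score almost identically, so the reported best combination can change from run to run unless the random seed is fixed.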


In [11]:
%%time
from sklearn.model_selection import RandomizedSearchCV
# Define the model
model = RandomForestClassifier()

# Define the search space (candidates are sampled from it, not all tried)
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# Set up the randomized search: n_iter=20 sampled candidates, 5-fold CV
grid = RandomizedSearchCV(
    estimator=model, 
    param_distributions=param_grid, 
    cv=5, 
    scoring='accuracy',
    n_jobs=-1,
    verbose=1,
    n_iter=20
)

# Run the search: fits each sampled candidate on every fold
grid.fit(X, y)

# Print the best parameter combination among the sampled candidates
print(f"Best Parameter: {grid.best_params_}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits


Best Parameter: {'n_estimators': 200, 'max_depth': 9, 'criterion': 'gini', 'bootstrap': True}
CPU times: total: 1.33 s
Wall time: 49.1 s
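
The randomized search ran 100 fits where the grid search ran 840, which accounts for most of the difference in wall time. Both searches above were fitted on the full dataset, so their cross-validated scores are the only performance estimate available. A common variation, sketched below, holds out a test set before searching and scores the refitted best model on it. The 80/20 split, the `small_space` dictionary, and the fixed seeds are assumptions for illustration, not part of the runs above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the search never sees it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical small search space so the example runs quickly
small_space = {"n_estimators": [25, 50, 100], "max_depth": [2, 4, 6]}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=small_space,
    n_iter=4,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

# refit=True (the default) retrains the best candidate on all training data
y_pred = search.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"Best CV accuracy:       {search.best_score_:.3f}")
print(f"Held-out test accuracy: {test_accuracy:.3f}")
```

Because the hyperparameters were selected using the cross-validated score, that score tends to be slightly optimistic; the held-out accuracy is the less biased estimate.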
