# Hyperparameter tuning of Machine Learning Models

* Hyperparameter tuning is the process of `finding the best hyperparameters` for a machine learning model.
* `Hyperparameters` are settings that the model does not learn from the data, such as the number of trees in a random forest.
* They are set before training the model.
* The process of hyperparameter tuning is also known as `hyperparameter optimization`.
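
To make the distinction concrete, here is a minimal sketch that contrasts a hyperparameter with a learned parameter. The `Ridge` model and the synthetic data are assumptions chosen only for illustration; they are not part of the Tips example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# alpha is a hyperparameter: it is chosen before training.
model = Ridge(alpha=1.0)
hyperparams = model.get_params()
has_coef_before_fit = hasattr(model, "coef_")  # nothing has been learned yet

# coef_ is a learned parameter: it only exists after fit().
model.fit(X_toy, y_toy)
learned_coef = model.coef_

print("Hyperparameter alpha:", hyperparams["alpha"])
print("Learned coefficients:", learned_coef)
```

Changing `alpha` and refitting gives different coefficients, which is why the choice of hyperparameters affects the final model.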

## Why Is Hyperparameter Tuning Important?

* The performance of a model can depend heavily on its hyperparameters, so the right choice can make a large difference.
* Tuning searches for the hyperparameter values that give the best cross-validated performance on the chosen metric.
* It helps to balance overfitting and underfitting, for example by limiting tree depth (`max_depth`) or requiring more samples per split (`min_samples_split`).
* It helps to make the model more robust, because candidates are compared on held-out folds rather than on the training data.

## Techniques for Hyperparameter Tuning

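There are several techniques for hyperparameter tuning. Some of the most popular ones available in scikit-learn are:
* Grid Search (`GridSearchCV`): evaluates every combination in the grid
* Random Search (`RandomizedSearchCV`): evaluates a fixed number of randomly sampled combinations
* Successive Halving (experimental in scikit-learn): starts all candidates on a small budget and keeps only the best ones for each following round
  * Halving Grid Search (`HalvingGridSearchCV`)
  * Halving Random Search (`HalvingRandomSearchCV`)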

**ARTICLE**: https://scikit-learn.org/stable/modules/grid_search.html
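
The difference between grid search and random search can be sketched with the standard library alone. The search space and the helper names `grid_candidates` and `random_candidates` below are hypothetical and exist only for this illustration; scikit-learn's own classes also handle cross-validation and scoring.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
}

def grid_candidates(space):
    """Return every combination of values (what a grid search evaluates)."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]

def random_candidates(space, n_iter, seed=0):
    """Return n_iter distinct combinations drawn at random from the grid."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(space), n_iter)

all_candidates = grid_candidates(search_space)
sampled = random_candidates(search_space, n_iter=10)

print("Grid search candidates:", len(all_candidates))
print("Random search candidates:", len(sampled))
```

The grid grows multiplicatively with every added hyperparameter, while the random search budget stays at whatever `n_iter` is set to.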

## Example of Hyperparameter Tuning

In [None]:
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV #, HalvingGridSearchCV, HalvingRandomSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Hyperparameter Tuning with scikit-learn on the Tips Dataset
# This notebook demonstrates how to perform hyperparameter tuning using scikit-learn's GridSearchCV on the Tips dataset.


# Load and Explore the Data
tips = sns.load_dataset('tips')
tips.head()

In [None]:
%%time
# Preprocess the Data
# Convert categorical variables using one-hot encoding
tips_encoded = pd.get_dummies(tips, drop_first=True)
print(tips_encoded.head())

# Define features and target variable
X = tips_encoded.drop('tip', axis=1)
y = tips_encoded['tip']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter Tuning with GridSearchCV
# Define the model
rf = RandomForestRegressor(random_state=42) 
print("Parameters that could be tuned:", rf.get_params()) # list all hyperparameters and their current values

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200], # length = 3
    'max_depth': [None, 10, 20, 30], # length = 4
    'min_samples_split': [2, 5, 10], # length = 3
    # 'min_samples_leaf': [1, 2, 3],
    # 'max_features': [1.0, 'sqrt', 'log2']  # note: 'auto' is no longer accepted by newer scikit-learn versions
}
# Total number of parameter combinations = 3 * 4 * 3 = 36.
# GridSearchCV evaluates all 36 combinations, each one with cross-validation.
# RandomizedSearchCV evaluates only a random subset (e.g. 10 or 20 of the 36), which is more efficient for large parameter grids.

# Initialize GridSearchCV
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5, # 5-fold cross-validation: each of the 36 combinations is fitted 5 times, i.e. 36 * 5 = 180 fits, plus one final refit of the best combination on the whole training set
    scoring='neg_mean_squared_error', # scoring metric for regression tasks. For classification, metrics such as 'accuracy' or 'roc_auc' are used instead
    # n_jobs=-1,
    # verbose=1,
)

# Fit GridSearchCV
grid_search.fit(X_train, y_train)

# Best Parameters and Evaluation
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Score (Negative MSE): {grid_search.best_score_:.2f}") # best_score_ is the mean cross-validated negative MSE (higher is better)
print(f"Best Score (MSE): {-grid_search.best_score_:.2f}") # negate it to recover the MSE

# Predict on test set
best_rf = grid_search.best_estimator_ # best fitted model
y_pred = best_rf.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.2f}")

In [None]:
%%time
from sklearn.model_selection import RandomizedSearchCV
# Preprocess the Data
# Convert categorical variables using one-hot encoding
tips_encoded = pd.get_dummies(tips, drop_first=True)

# Define features and target variable
X = tips_encoded.drop('tip', axis=1)
y = tips_encoded['tip']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter Tuning with RandomizedSearchCV
# Define the model
rf = RandomForestRegressor(random_state=42)

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_grid,
    n_iter=10, # number of parameter combinations sampled at random from the 36 possible ones; only these 10 are evaluated
    cv=5, # 5-fold cross-validation: 10 combinations * 5 folds = 50 fits, plus one final refit of the best combination
    # n_jobs=-1,
    scoring='neg_mean_squared_error',
    random_state=42, # makes the sampled combinations reproducible
)

# Fit RandomizedSearchCV
random_search.fit(X_train, y_train)

# Best Parameters and Evaluation
print(f"Best Parameters: {random_search.best_params_}")
print(f"Best Score (Negative MSE): {random_search.best_score_:.2f}")

# Predict on test set
best_rf = random_search.best_estimator_

y_pred = best_rf.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.2f}")
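
Random search is not limited to lists of values: `param_distributions` also accepts distributions to sample from. The following is a minimal sketch under stated assumptions: the synthetic dataset, the value ranges, and the small `n_iter` and `cv` settings are illustrative choices to keep the run short, not recommendations.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data so the sketch runs without downloading anything.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Distributions instead of fixed lists: each candidate draws fresh integers.
param_distributions = {
    "n_estimators": randint(10, 60),      # integers from 10 to 59
    "min_samples_split": randint(2, 11),  # integers from 2 to 10
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,   # only 5 sampled candidates are evaluated
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Candidates evaluated:", len(search.cv_results_["params"]))
```

With distributions, the search can reach values that a hand-written grid would never contain, at the same fixed budget.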

In [None]:
%%time

from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (required: enables the experimental halving estimators)
from sklearn.model_selection import HalvingGridSearchCV, HalvingRandomSearchCV

# Initialize HalvingGridSearchCV
halving_grid_search = HalvingGridSearchCV(
    estimator=rf,
    param_grid=param_grid,
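    cv=5, # 5-fold cross-validation in every round; not all 36 combinations are evaluated on the full data
    factor=2, # halving factor: each round keeps roughly the best half of the candidates and doubles the resources (by default, the number of training samples)
    # resource='n_estimators', # alternative resource; requires removing 'n_estimators' from param_grid and setting max_resources explicitly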
    scoring='neg_mean_squared_error'
)

# Fit HalvingGridSearchCV
halving_grid_search.fit(X_train, y_train)

# Best Parameters and Evaluation
print(f"Best Parameters: {halving_grid_search.best_params_}")

# Predict on test set
best_rf = halving_grid_search.best_estimator_
y_pred = best_rf.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.2f}")
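
The elimination schedule behind successive halving can be sketched in a few lines. This is a simplified illustration, not scikit-learn's exact implementation: the helper `halving_schedule` and the starting budget of 10 samples are hypothetical, and the real estimators also cap the resources at the size of the training set.

```python
import math

def halving_schedule(n_candidates, min_resources, factor=2):
    """Return (candidates, resources) per round for an idealized halving search."""
    schedule = []
    resources = min_resources
    while True:
        schedule.append((n_candidates, resources))
        if n_candidates <= factor:
            break  # the next round would leave at most one candidate
        n_candidates = math.ceil(n_candidates / factor)  # keep the best 1/factor
        resources *= factor                              # give survivors more budget
    return schedule

schedule = halving_schedule(n_candidates=36, min_resources=10, factor=2)
for round_index, (candidates, resources) in enumerate(schedule):
    print(f"Round {round_index}: {candidates} candidates, {resources} samples each")
```

Most candidates are discarded while they are still cheap to evaluate, which is where the time savings over a full grid search come from.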