# Hyperparameter tuning of Machine Learning Models

* Hyperparameter tuning is the process of `finding the best hyperparameters` for a machine learning model.
* `Hyperparameters` are settings that the model does not learn from the data, such as the number of trees in a random forest.
* They are set before training and control how the model learns.
* Hyperparameter tuning is also known as `hyperparameter optimization`.
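
To make the distinction concrete, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The toy data and the values `n_estimators=5` and `max_depth=2` are illustrative assumptions, not settings used elsewhere in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data (an assumption for illustration): y = 2 * x + 1
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Hyperparameters: chosen by us before training
model = RandomForestRegressor(n_estimators=5, max_depth=2, random_state=0)
hyperparams = model.get_params()
print(hyperparams["n_estimators"], hyperparams["max_depth"])

# Learned parameters: the fitted trees exist only after calling fit()
model.fit(X, y)
print(len(model.estimators_))
```

`get_params()` returns what was set by hand, while `estimators_` holds the trees the model built from the data.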

## Why Is Hyperparameter Tuning Important?

* Model performance depends heavily on the hyperparameters, so the right choice can make a large difference.
* Tuning searches for the hyperparameter values that give the best validation score for the model.
* It helps to avoid overfitting and underfitting, because many hyperparameters (for example, tree depth) control model complexity.
* It helps to make the model more robust on unseen data.
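
The link between a hyperparameter and over- or underfitting can be sketched with scikit-learn's `validation_curve`. This is a minimal illustration on synthetic data; the data, the `DecisionTreeRegressor`, and the depth values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

depths = [1, 3, 10]
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="neg_mean_squared_error",
)

# Scores are negated MSE, so flip the sign to report errors
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)
for depth, tr, va in zip(depths, train_mse, val_mse):
    print(f"max_depth={depth}: train MSE={tr:.3f}, validation MSE={va:.3f}")
```

Training error keeps falling as the tree gets deeper. A validation error that stops improving, or gets worse, while training error still falls is the usual sign of overfitting.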

## Techniques for Hyperparameter Tuning

Scikit-learn offers several techniques for hyperparameter tuning. Some of the most popular are:
* Grid Search
* Random Search
* Successive Halving
  * Halving Grid Search
  * Halving Random Search 
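
Grid search evaluates every combination and random search evaluates a fixed number of sampled combinations. Successive halving instead starts all candidates on a small budget and keeps roughly the best `1/factor` of them each round while multiplying the budget by `factor`. The sketch below shows only that schedule. `halving_schedule` is a hypothetical helper and a simplification, not scikit-learn's exact logic, and the numbers passed to it are illustrative.

```python
import math


def halving_schedule(n_candidates, min_resources, max_resources, factor):
    """Return (candidates, resources) per round for a simplified halving run."""
    schedule = []
    resources = min_resources
    while resources <= max_resources:
        schedule.append((n_candidates, resources))
        if n_candidates == 1:
            break  # a single survivor remains, nothing left to eliminate
        # Keep about 1/factor of the candidates, give each more resources
        n_candidates = math.ceil(n_candidates / factor)
        resources *= factor
    return schedule


for n_cand, n_res in halving_schedule(36, 10, 195, 2):
    print(f"{n_cand:>2} candidates evaluated with {n_res} resources each")
```

Early rounds are cheap because each candidate sees few resources (for example, few training samples). Only the survivors are evaluated on larger budgets.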

## Example of Hyperparameter Tuning

In [None]:
import seaborn as sns
import pandas as pd
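# The halving searches are experimental and must be enabled before they can be imported
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV, HalvingRandomSearchCV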
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Hyperparameter Tuning with scikit-learn on the Tips Dataset
# This notebook demonstrates how to perform hyperparameter tuning using scikit-learn's GridSearchCV on the Tips dataset.


# Load and Explore the Data
tips = sns.load_dataset('tips')
tips.head()

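|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |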


In [3]:
%%time
# Preprocess the Data
# Convert categorical variables using one-hot encoding
tips_encoded = pd.get_dummies(tips, drop_first=True)

# Define features and target variable
X = tips_encoded.drop('tip', axis=1)
y = tips_encoded['tip']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter Tuning with GridSearchCV
# Define the model
rf = RandomForestRegressor(random_state=42)

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,
    # n_jobs=-1,
    scoring='neg_mean_squared_error'
)

# Fit GridSearchCV
grid_search.fit(X_train, y_train)

# Best Parameters and Evaluation
print(f"Best Parameters: {grid_search.best_params_}")

# Predict on test set
best_rf = grid_search.best_estimator_
y_pred = best_rf.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.2f}")

Best Parameters: {'max_depth': 10, 'min_samples_split': 10, 'n_estimators': 100}
Mean Squared Error on Test Set: 0.97
CPU times: total: 26.9 s
Wall time: 31.4 s


In [6]:
%%time
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import RandomizedSearchCV, HalvingGridSearchCV, HalvingRandomSearchCV
# Preprocess the Data
# Convert categorical variables using one-hot encoding
tips_encoded = pd.get_dummies(tips, drop_first=True)

# Define features and target variable
X = tips_encoded.drop('tip', axis=1)
y = tips_encoded['tip']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter Tuning with RandomizedSearchCV
# Define the model
rf = RandomForestRegressor(random_state=42)

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_grid,
    n_iter=10,
    cv=5,
    # n_jobs=-1,
    scoring='neg_mean_squared_error'
)

# Fit RandomizedSearchCV
random_search.fit(X_train, y_train)

# Best Parameters and Evaluation
print(f"Best Parameters: {random_search.best_params_}")

# Predict on test set
best_rf = random_search.best_estimator_

y_pred = best_rf.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.2f}")

Best Parameters: {'n_estimators': 100, 'min_samples_split': 10, 'max_depth': 10}
Mean Squared Error on Test Set: 0.97
CPU times: total: 6.44 s
Wall time: 8.07 s
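
The gap in wall time between the grid search and the random search is roughly what the number of model fits would suggest. The sketch below counts the cross-validation fits for the settings used above (`cv=5`, `n_iter=10`, and the same `param_grid`). It is back-of-the-envelope arithmetic, not a benchmark.

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
}
cv = 5
n_iter = 10

# Grid search tries every combination of the listed values
n_grid_candidates = len(list(product(*param_grid.values())))
grid_fits = n_grid_candidates * cv

# Random search tries only n_iter sampled combinations
random_fits = n_iter * cv

print(n_grid_candidates, grid_fits, random_fits)  # 36 180 50
print(f"fit ratio: {grid_fits / random_fits:.1f}")  # fit ratio: 3.6
```

Each search also refits the best candidate once on the full training set. The measured wall times (31.4 s against 8.07 s, a ratio of about 3.9) are in line with the 3.6x difference in fits, although the sampled candidates differ in cost, so the match is only approximate.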


In [14]:
%%time
# Initialize HalvingGridSearchCV
halving_grid_search = HalvingGridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,
    factor=2,
    # resource='n_estimators',
    scoring='neg_mean_squared_error'
)

# Fit HalvingGridSearchCV
halving_grid_search.fit(X_train, y_train)

# Best Parameters and Evaluation
print(f"Best Parameters: {halving_grid_search.best_params_}")

# Predict on test set
best_rf = halving_grid_search.best_estimator_
y_pred = best_rf.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.2f}")

Best Parameters: {'max_depth': None, 'min_samples_split': 5, 'n_estimators': 200}
Mean Squared Error on Test Set: 1.01
CPU times: total: 55.3 s
Wall time: 1min 3s
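
Halving Random Search is listed among the techniques above but is not demonstrated. A minimal sketch of `HalvingRandomSearchCV` follows. To keep the block runnable on its own, it uses synthetic data from `make_regression` and a smaller search space. Both are assumptions for illustration, as are `n_candidates=8`, `min_resources=20`, and `cv=3`. On the Tips data, `X_train` and `y_train` from the cells above could be used instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import HalvingRandomSearchCV, train_test_split

# Synthetic stand-in data (an assumption, not the Tips dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Smaller search space than above so the sketch runs quickly
param_distributions = {
    "n_estimators": [10, 20, 40],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}

halving_random_search = HalvingRandomSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_candidates=8,    # number of sampled candidates in the first round
    min_resources=20,  # training samples available in the first round
    factor=2,          # keep about half the candidates each round
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=42,
)
halving_random_search.fit(X_train, y_train)

print(f"Best Parameters: {halving_random_search.best_params_}")

# Evaluate the refitted best model on the held-out test set
y_pred = halving_random_search.best_estimator_.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.2f}")
```

With the default `resource='n_samples'`, early rounds train on small subsets, so rankings from those rounds can be noisy on a small dataset. This may partly explain why halving does not always beat a plain grid search in time or test error.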
