https://www.geeksforgeeks.org/machine-learning/hyperparameter-tuning/

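Hyperparameter tuning is the process of selecting the optimal values for a machine learning model's hyperparameters. Unlike model parameters, which are learned from the data during training, hyperparameters are set before training begins and control aspects of the learning process itself, such as the regularization strength C of logistic regression.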

Effective tuning helps the model learn better patterns, avoid overfitting or underfitting and achieve higher accuracy on unseen data.
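To make the distinction concrete, here is a minimal sketch using scikit-learn's `LogisticRegression` on the same two iris features used below. The regularization strength `C` is a hyperparameter passed to the constructor before training, while `coef_` and `intercept_` are parameters that `fit()` learns from the data. The value `C=1.0` (scikit-learn's default) is chosen only for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# same two-feature iris data as in the cells below
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# C is a hyperparameter: it is chosen before fit() is called
model = LogisticRegression(C=1.0)
print(model.get_params()["C"])  # 1.0

# coef_ and intercept_ are parameters: fit() learns them from the data
model.fit(X, y)
print(model.coef_.shape)       # (3, 2): one weight per class and feature
print(model.intercept_.shape)  # (3,): one intercept per class
```

Tuning changes only the first kind of value; the second kind is always re-learned by `fit()` for each candidate.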

In [12]:
import numpy as np

from sklearn import datasets
from sklearn.linear_model import LogisticRegression

In [13]:
# load iris dataset
iris = datasets.load_iris()

# keep only the first two features (sepal length and sepal width)
X = iris.data[:, :2]

y = iris.target
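A quick sanity check of what the cell above loads can help when reading the scores later. This sketch only inspects the data: it prints the names of the two retained columns, the shape of `X`, and how many samples fall in each of the three classes.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # first two columns only
y = iris.target

print(iris.feature_names[:2])  # ['sepal length (cm)', 'sepal width (cm)']
print(X.shape)                 # (150, 2)
print(np.bincount(y))          # [50 50 50]: three balanced classes
```

Because the classes are balanced, always predicting one class would score about 33% accuracy, which gives a baseline for the cross-validation scores reported below.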

In [14]:
# create model
model = LogisticRegression()

# 15 candidate values of C (inverse regularization strength), log-spaced from 1e-5 to 1e8
c_space = np.logspace(-5, 8, 15)

# map each hyperparameter name to its candidate values
param_grid = {'C': c_space}
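`np.logspace(-5, 8, 15)` spaces the exponents evenly rather than the values themselves, so the grid covers thirteen orders of magnitude with only 15 points. The sketch below checks this numerically; it assumes nothing beyond NumPy and the same `c_space` as above.

```python
import numpy as np

# the same 15 candidate values of C as above
c_space = np.logspace(-5, 8, 15)

# taking log10 recovers an evenly spaced grid of exponents
exponents = np.log10(c_space)
print(np.isclose(exponents[0], -5), np.isclose(exponents[-1], 8))  # True True

# consecutive exponents differ by (8 - (-5)) / (15 - 1) = 13/14
print(np.allclose(np.diff(exponents), 13 / 14))  # True

# equivalently, each candidate is about 8.5 times the previous one
print(np.allclose(c_space[1:] / c_space[:-1], 10 ** (13 / 14)))  # True
```

A log scale suits `C` because its effect tends to depend on the order of magnitude more than on small absolute changes.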

### GridSearchCV

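GridSearchCV trains and evaluates the model on every combination of the specified hyperparameter values and keeps the best-performing one. The number of combinations is the product of the number of values per hyperparameter, and each combination is fitted once per cross-validation fold, so the search becomes slow and computationally expensive for large datasets or large grids.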
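The cost can be counted before running anything. This sketch uses scikit-learn's `ParameterGrid`, which enumerates the same combinations a grid search iterates over. The second hyperparameter, `fit_intercept`, is a hypothetical addition used only to show how the count multiplies; it is never fitted here.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

c_space = np.logspace(-5, 8, 15)

# one hyperparameter with 15 values -> 15 candidates
grid_1d = ParameterGrid({'C': c_space})
print(len(grid_1d))  # 15

# hypothetical second hyperparameter with 2 values -> 15 * 2 candidates
grid_2d = ParameterGrid({'C': c_space, 'fit_intercept': [True, False]})
print(len(grid_2d))  # 30

# with 5-fold cross-validation, each candidate is fitted 5 times
cv = 5
print(len(grid_1d) * cv, len(grid_2d) * cv)  # 75 150
```

With the default `refit=True`, GridSearchCV also performs one extra fit of the best candidate on the full dataset.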

Steps:

  - Create a grid of potential values for each hyperparameter

  - Train the model for every combination in the grid

  - Evaluate each model using cross-validation

  - Select the combination with the highest mean cross-validation score
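The four steps above can be sketched by hand with `ParameterGrid` and `cross_val_score`. This is an illustration of the procedure under the same data and grid as this notebook, not GridSearchCV's internal implementation, which additionally parallelizes fits, records detailed results, and refits the best model on the full data.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid, cross_val_score

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Step 1: the grid of candidate values
param_grid = {'C': np.logspace(-5, 8, 15)}

best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    # Steps 2 and 3: train and evaluate this combination with 5-fold CV
    scores = cross_val_score(LogisticRegression(**params), X, y, cv=5)
    mean_score = scores.mean()
    # Step 4: keep the combination with the highest mean CV score
    # (strict ">" keeps the first candidate when scores tie)
    if mean_score > best_score:
        best_score, best_params = mean_score, params

print(best_params, best_score)
```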

In [15]:
from sklearn.model_selection import GridSearchCV

# GridSearchCV tries all combinations from param_grid and uses 5-fold cross-validation
logreg_cv = GridSearchCV(model, param_grid, cv=5)

logreg_cv.fit(X, y)

print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
print("Best score is {}".format(logreg_cv.best_score_))

Tuned Logistic Regression Parameters: {'C': np.float64(0.05179474679231213)}
Best score is 0.8200000000000001


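The best value on the grid is C ≈ 0.0518, the candidate that produced the highest mean cross-validation accuracy.

The best score of 0.82 is the mean accuracy of that model over the 5 validation folds, so it estimates how well C ≈ 0.0518 generalizes to unseen data. Because the same folds were also used to pick C, this estimate tends to be slightly optimistic.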
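To see where `best_score_` comes from, the fitted search object exposes per-fold and per-candidate results in `cv_results_`. This sketch refits the same search as above and shows that `best_score_` equals the mean of the best candidate's five fold scores; it also lists the mean accuracy for every `C`, which shows how flat or sharp the optimum is.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

logreg_cv = GridSearchCV(LogisticRegression(), {'C': np.logspace(-5, 8, 15)}, cv=5)
logreg_cv.fit(X, y)

results = logreg_cv.cv_results_
best = logreg_cv.best_index_

# accuracy of the best C on each of the 5 held-out folds
fold_scores = [results[f"split{k}_test_score"][best] for k in range(5)]
print(np.round(fold_scores, 3))

# best_score_ is the mean of those fold scores
print(np.isclose(np.mean(fold_scores), logreg_cv.best_score_))  # True

# mean CV accuracy for every candidate C
for c, score in zip(results["param_C"], results["mean_test_score"]):
    print(f"C={c:.3g}  mean accuracy={score:.3f}")
```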

### RandomizedSearchCV

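RandomizedSearchCV samples a fixed number of hyperparameter combinations (`n_iter`, 10 by default) from the given values or distributions instead of checking every single combination like GridSearchCV. This keeps the cost under control when the search space is large.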
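RandomizedSearchCV also accepts continuous distributions instead of a fixed list, so it is not limited to the 15 grid points. This sketch, an assumption rather than part of the original example, draws `C` from SciPy's `loguniform` over the same range as `c_space`. The budget `n_iter=15` and `random_state=0` are arbitrary illustrative choices; the variable name `rs_continuous` is hypothetical.

```python
from scipy.stats import loguniform
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# sample C log-uniformly between 1e-5 and 1e8 (same range as c_space)
param_dist = {'C': loguniform(1e-5, 1e8)}

rs_continuous = RandomizedSearchCV(
    LogisticRegression(),
    param_dist,
    n_iter=15,       # number of sampled candidates (default is 10)
    cv=5,
    random_state=0,  # fixes which candidates are drawn, for reproducibility
)
rs_continuous.fit(X, y)

print(rs_continuous.best_params_, rs_continuous.best_score_)
```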

In [16]:
from sklearn.model_selection import RandomizedSearchCV

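# samples n_iter=10 (the default) of the 15 C values; random_state is not set, so the sample can differ between runs
logreg_rs = RandomizedSearchCV(model, param_grid, cv=5)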

logreg_rs.fit(X, y)

print("Tuned Logistic Regression Parameters: {}".format(logreg_rs.best_params_))
print("Best score is {}".format(logreg_rs.best_score_))

Tuned Logistic Regression Parameters: {'C': np.float64(0.05179474679231213)}
Best score is 0.8200000000000001


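In this run the result is the same as with GridSearchCV. This is not guaranteed: RandomizedSearchCV evaluated only 10 of the 15 candidate values of C, and because `random_state` was not fixed, another run may leave out C ≈ 0.0518 and report a different best parameter and score.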
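Which candidates a randomized search tries can be inspected directly. RandomizedSearchCV draws its candidates with scikit-learn's `ParameterSampler`; when every hyperparameter is given as a list, it samples without replacement. This sketch, assuming the default `n_iter=10`, checks for a few seeds whether the grid's best value (index 4 of `c_space`) is among the 10 sampled values.

```python
import numpy as np
from sklearn.model_selection import ParameterSampler

c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

# the C selected by GridSearchCV above is the 5th grid point (index 4)
best_c = c_space[4]
print(round(best_c, 4))  # 0.0518

# draw 10 of the 15 values, without replacement, for several seeds
for seed in range(5):
    sampled = [p['C'] for p in ParameterSampler(param_grid, n_iter=10, random_state=seed)]
    print(seed, any(np.isclose(c, best_c) for c in sampled))
```

Any single draw includes a given grid point with probability 10/15, so passing a fixed `random_state` is the simplest way to make RandomizedSearchCV results reproducible.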