#### About

> Hyperparameter tuning

Hyperparameter tuning in machine learning refers to the process of selecting the optimal values for the hyperparameters of a machine learning algorithm. Hyperparameters are parameters that are not learned during the training process, but rather set by the user prior to training. They control the behavior and performance of the machine learning model, and tuning them can significantly impact the model's performance.

Hyperparameter tuning is important because the performance of a machine learning model can be highly sensitive to the values of hyperparameters. Selecting the right hyperparameter values can result in better model performance, while poor hyperparameter choices can lead to suboptimal performance or even failure to converge during training.

#### Types of hyperparameter tuning

1. Grid Search: This is a brute-force approach that tries every combination of values from a predefined grid. Grid Search evaluates the model's performance for each combination, typically with cross-validation. It guarantees that every point in the grid is evaluated, but the number of combinations grows multiplicatively with each added hyperparameter, so it can be computationally expensive, and values outside the grid are never tried.

2. Random Search: This approach randomly samples hyperparameter values from predefined distributions or lists. Random Search is usually less computationally expensive than Grid Search because the number of sampled configurations is fixed in advance rather than determined by the size of the grid. However, it gives no guarantee that the best combination is among those sampled.

3. Bayesian Optimization: This is a more advanced technique that models the relationship between hyperparameter values and model performance using probabilistic models. It uses this information to guide the search towards more promising regions of the hyperparameter space, which can lead to faster convergence to optimal values.

4. Genetic Algorithms: This technique is inspired by the concept of natural selection and involves evolving a population of hyperparameter configurations over multiple generations. Genetic Algorithms explore the hyperparameter space by evolving and mutating hyperparameter values to find better solutions iteratively.

5. Automated Hyperparameter Tuning: Some libraries provide ready-made tools for automated hyperparameter tuning, such as scikit-learn's GridSearchCV and RandomizedSearchCV, KerasTuner for Keras and TensorFlow models, or Optuna, which the final example in this notebook uses. These tools automate the search loop, making tuning more efficient and less prone to human error.
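
The notebook has no cell for genetic algorithms, so here is a minimal sketch of the idea using only the standard library. Everything in it is an assumption made for illustration: the `SEARCH_SPACE` values, the `toy_fitness` function (a stand-in for a cross-validated score, not a real model), and the selection, crossover, and mutation choices. In practice the fitness function would train and score a model for each configuration.

```python
import random

# Hypothetical search space: each hyperparameter maps to its candidate values.
SEARCH_SPACE = {
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],
    "kernel": ["linear", "rbf", "poly"],
}


def toy_fitness(config):
    """Stand-in for a cross-validated score (assumption, not a real model).

    Peaks at C=1.0 with the 'rbf' kernel so the search has a known optimum.
    """
    c_penalty = abs(SEARCH_SPACE["C"].index(config["C"]) - 2) * 0.1
    kernel_bonus = {"linear": 0.0, "rbf": 0.1, "poly": -0.1}[config["kernel"]]
    return 0.8 + kernel_bonus - c_penalty


def random_config(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    # Uniform crossover: take each hyperparameter from one parent at random.
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    # With probability `rate`, resample a hyperparameter from its candidates.
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def genetic_search(fitness, population_size=10, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged as parents (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Reproduction: fill the remaining slots with mutated offspring.
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


best_config = genetic_search(toy_fitness)
print("Best configuration:", best_config, "fitness:", toy_fitness(best_config))
```

Because the better half of each generation is carried over unchanged, the best score in the population never gets worse from one generation to the next.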

In [2]:
# 1. Grid search
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

In [3]:
# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()


In [4]:
# Candidate values: 3 settings of C x 2 kernels = 6 combinations
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}


In [5]:
svm = SVC()


In [6]:
# Score every combination in the grid with 5-fold cross-validation
grid_search = GridSearchCV(svm, param_grid, cv=5)
grid_search.fit(iris.data, iris.target)

In [7]:
# Best hyperparameters and corresponding model performance
best_params = grid_search.best_params_
best_score = grid_search.best_score_
print("Best Hyperparameters:", best_params)
print("Best Score:", best_score)

Best Hyperparameters: {'C': 1, 'kernel': 'linear'}
Best Score: 0.9800000000000001
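
To make the cost of the grid search above concrete, the following sketch enumerates the same `param_grid` by hand with `itertools.product`. It is only an illustration of what "every combination" means, not how GridSearchCV is implemented internally; the `cv_folds` name is introduced here for the example.

```python
from itertools import product

# Same grid as the GridSearchCV example above.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
cv_folds = 5  # matches cv=5 in the example

# Cartesian product of all candidate values, one dict per configuration.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

for combo in combinations:
    print(combo)

total_fits = len(combinations) * cv_folds
print(f"{len(combinations)} combinations x {cv_folds} folds = {total_fits} fits")
```

With the default `refit=True`, GridSearchCV additionally fits the best configuration once more on the full dataset after the search.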


In [8]:
# 2. Random search
from sklearn.model_selection import RandomizedSearchCV
import numpy as np


In [9]:
# 7 log-spaced values of C from 1e-3 to 1e3, combined with 2 kernels
param_dist = {'C': np.logspace(-3, 3, 7), 'kernel': ['linear', 'rbf']}


In [10]:
# Perform Random Search: sample 10 configurations, each scored with 5-fold CV
random_search = RandomizedSearchCV(svm, param_distributions=param_dist, n_iter=10, cv=5)
random_search.fit(iris.data, iris.target)

In [11]:

# Best hyperparameters and corresponding model performance
best_params = random_search.best_params_
best_score = random_search.best_score_
print("Best Hyperparameters:", best_params)
print("Best Score:", best_score)

Best Hyperparameters: {'kernel': 'linear', 'C': 1.0}
Best Score: 0.9800000000000001
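
In the example above, `C` is drawn from a list of seven fixed values, so the search still only sees points on a grid. Random search is more commonly run over a continuous distribution. The sketch below shows what log-uniform sampling of `C` looks like using only the standard library; the `sample_config` helper is hypothetical and written for this illustration. With scikit-learn, passing a distribution object such as `scipy.stats.loguniform(1e-3, 1e3)` in `param_distributions` should serve the same purpose.

```python
import random

rng = random.Random(0)  # fixed seed so the draws are repeatable


def sample_config(rng):
    # Draw C log-uniformly from [1e-3, 1e3]: uniform in the exponent.
    c = 10 ** rng.uniform(-3, 3)
    kernel = rng.choice(["linear", "rbf"])
    return {"C": c, "kernel": kernel}


n_iter = 10  # same budget as the RandomizedSearchCV example
samples = [sample_config(rng) for _ in range(n_iter)]
for config in samples:
    print(config)
```

Sampling the exponent uniformly gives each order of magnitude of `C` the same chance of being tried, which suits hyperparameters whose useful range spans several decades.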


In [15]:
# 3. Bayesian optimization

from skopt import BayesSearchCV
# C is searched over the continuous range [0.1, 10.0]; kernel is categorical
param_space = {'C': (0.1, 10.0), 'kernel': ['linear', 'rbf']}



In [16]:
# Evaluate 10 configurations, each scored with 5-fold cross-validation
bayes_search = BayesSearchCV(svm, param_space, n_iter=10, cv=5)
bayes_search.fit(iris.data, iris.target)

In [17]:
best_params = bayes_search.best_params_
best_score = bayes_search.best_score_
print("Best Hyperparameters:", best_params)
print("Best Score:", best_score)

Best Hyperparameters: OrderedDict([('C', 5.957716121282662), ('kernel', 'rbf')])
Best Score: 0.9866666666666667


In [20]:
# 5. Automated hyperparameter tuning (Optuna)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import optuna





In [21]:
# Define an objective function for Optuna to optimize
def objective(trial):
    # Define hyperparameters to tune and their search spaces
    n_estimators = trial.suggest_int('n_estimators', 10, 100)
    max_depth = trial.suggest_int('max_depth', 1, 10)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)

    # Create a random forest classifier with the sampled hyperparameters
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   min_samples_split=min_samples_split,
                                   random_state=42)

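    # Train on the training split and score on the validation split
    # (X_train, X_val, y_train, y_val are globals defined in the next cell)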
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    accuracy = accuracy_score(y_val, y_pred)

    return accuracy


In [22]:
# Hold out 20% of the data as a validation set for scoring each trial
X_train, X_val, y_train, y_val = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)


In [23]:
# Create an Optuna study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)  # Run optimization for a certain number of trials


[I 2023-04-21 02:33:39,878] A new study created in memory with name: no-name-830ae6c2-83f8-4ffd-8427-bcb8958ae62e
[I 2023-04-21 02:33:40,097] Trial 0 finished with value: 1.0 and parameters: {'n_estimators': 53, 'max_depth': 4, 'min_samples_split': 7}. Best is trial 0 with value: 1.0.
[I 2023-04-21 02:33:40,208] Trial 1 finished with value: 1.0 and parameters: {'n_estimators': 38, 'max_depth': 5, 'min_samples_split': 3}. Best is trial 0 with value: 1.0.
[I 2023-04-21 02:33:40,319] Trial 2 finished with value: 1.0 and parameters: {'n_estimators': 46, 'max_depth': 7, 'min_samples_split': 2}. Best is trial 0 with value: 1.0.
[I 2023-04-21 02:33:40,451] Trial 3 finished with value: 1.0 and parameters: {'n_estimators': 41, 'max_depth': 5, 'min_samples_split': 8}. Best is trial 0 with value: 1.0.
[I 2023-04-21 02:33:40,535] Trial 4 finished with value: 1.0 and parameters: {'n_estimators': 28, 'max_depth': 3, 'min_sampl

In [25]:
# Get the best hyperparameters found by Optuna
best_params = study.best_trial.params


In [29]:
# Train a model with the best hyperparameters on the entire training set
best_model = RandomForestClassifier(**best_params, random_state=42)
best_model.fit(X_train, y_train)
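
The validation split above is used to pick the hyperparameters, so a score on that same split tends to be optimistic. One common remedy, sketched below under stated assumptions, is to set aside a separate test set before tuning and evaluate the final model on it once. The `example_params` values are hypothetical placeholders standing in for `study.best_trial.params`, and the 60/20/20 split is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()

# First hold out a test set that tuning never touches...
X_rest, X_test, y_rest, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
# ...then split the remainder into training and validation sets for tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

# Hypothetical stand-in for study.best_trial.params.
example_params = {"n_estimators": 50, "max_depth": 4, "min_samples_split": 2}

# Refit on training + validation data, then score once on the test set.
final_model = RandomForestClassifier(**example_params, random_state=42)
final_model.fit(X_rest, y_rest)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))
print("Held-out test accuracy:", test_accuracy)
```

On a dataset as small as Iris, a single 30-sample test set gives a noisy estimate, so cross-validation inside the objective function may be a better fit than a fixed validation split.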