# Hyperopt Definitions

**Hyperopt: Space Configuration**
One of Hyperopt's most valuable features is the flexibility it offers for defining priors over the hyperparameter distributions.

**Hyperopt offers**:
* Multiple distributions
* Possibility to combine distributions
* Possibility to create nested spaces
* Multiple configuration ways including lists, dictionaries and tuples

**Distributions**

* hp.choice: returns one of several options (suitable for categorical hyperparams)
* hp.randint: returns a random integer in the range [0, upper), i.e. the upper limit is excluded
* hp.uniform: returns a value uniformly between specified limits
* hp.quniform: returns a value like round(uniform(low, high) / q) * q

**hp.quniform** with q=1 behaves like hp.randint, except that both the upper and **lower** limits can be specified and the samples are returned as floats. hp.quniform also accepts bigger values of q. So if we search for the optimal number of trees in a random forest, we could use hp.quniform('n_estimators', 10, 1000, 50). Because each draw is rounded to the nearest multiple of q, the samples are multiples of 50 up to 1000. A draw below 25 rounds down to 0, so pick a larger lower limit (for example 50) if 0 is not a valid value.
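
The rounding behaviour of the quniform formula can be checked without Hyperopt. The sketch below re-implements the formula in plain Python; the `quniform` helper is a hypothetical stand-in for illustration, not Hyperopt's own code.

```python
import random


def quniform(rng, low, high, q):
    """Plain-Python stand-in for round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q


rng = random.Random(0)

# Collect the distinct values produced for the n_estimators example
values = sorted({quniform(rng, 10, 1000, 50) for _ in range(5000)})

# Every value is a multiple of q, and draws below 25 collapse to 0
print(values[:4], values[-1])
```

The printed values show that the grid is anchored at multiples of q rather than at `low`, which is why a lower limit of 10 can still produce 0.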

# Basic Syntax

In [None]:
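# NumPy provides np.log, which the log-scale examples below use for their bounds
import numpy as np
# Importing hp module for hyperparameter space definitions
# This module provides functions to define various types of search spaces for hyperparameter tuning.
from hyperopt import hp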
# Importing sample function for sampling from distributions
from hyperopt.pyll.stochastic import sample

### Integer and Continuous Sampling:


In [None]:
# This space samples an integer uniformly from [0, 10)
# This allows for selecting integer hyperparameters within a specific range randomly.
# Example: Choosing a random 'max_leaf_nodes' value for a decision tree.
space = [{'example': hp.randint('example', 10)}]

# This space samples a float uniformly from [1, 10]
# This allows for continuous hyperparameters to be sampled within a range.
# Example: Selecting a float value for 'max_features' that determines the number of features to consider at each split in a decision tree.
space = [{'example': hp.uniform('example', 1, 10)}]

# This space samples a float uniformly from [1, 10] at discrete intervals of 1
# This is useful for sampling continuous values but constrained to certain intervals.
# The samples are floats such as 3.0, so cast them with int() before passing them to a model.
# Example: Sampling the 'min_samples_leaf' in a decision tree, ensuring it can only take whole number values.
space = [{'example': hp.quniform('example', 1, 10, 1)}]

### Logarithmic Sampling:


In [None]:
# This space samples a float from a log-uniform distribution over [1, 10]
# hp.loguniform returns exp(uniform(low, high)), so low and high are the logs of the desired limits.
# This method is used when we expect the optimal values to be better expressed on a logarithmic scale.
# Example: Setting 'alpha' in a regularized decision tree model, where smaller values are more common.
space = [{'example': hp.loguniform('example', np.log(1), np.log(10))}]

# This space samples from a log-uniform distribution over [10, 1000], rounded to multiples of 50
# low and high are logs of the limits, but q is applied after exponentiation, so it is given on the original scale.
# This samples values in a way that is biased towards smaller values; draws below 25 round to 0.
# Example: Tuning 'n_estimators' of a tree ensemble with an emphasis on smaller ensembles.
space = [{'example': hp.qloguniform('example', np.log(10), np.log(1000), 50)}]
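
To make the role of the log bounds concrete, the sketch below re-implements the two formulas, exp(uniform(low, high)) and its quantized variant, in plain Python. The helpers `loguniform` and `qloguniform` are hypothetical stand-ins written for this illustration, not Hyperopt functions.

```python
import math
import random


def loguniform(rng, low, high):
    """exp(uniform(low, high)): low and high live on the log scale."""
    return math.exp(rng.uniform(low, high))


def qloguniform(rng, low, high, q):
    """Same draw, then rounded to a multiple of q on the original scale."""
    return round(math.exp(rng.uniform(low, high)) / q) * q


rng = random.Random(0)

# Bounds are passed as logs of the limits we actually want: [1, 10]
draws = [loguniform(rng, math.log(1), math.log(10)) for _ in range(1000)]

# Roughly half of the draws fall below sqrt(10), the geometric midpoint
share_small = sum(d < math.sqrt(10) for d in draws) / len(draws)

# Quantized version over [10, 1000] in steps of 50
q_draws = [qloguniform(rng, math.log(10), math.log(1000), 50) for _ in range(1000)]

print(min(draws), max(draws), share_small)
print(sorted(set(q_draws))[:5])
```

Passing the raw limits instead of their logs would sample between e^1 and e^10, which is rarely what is intended.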

### Normal Distribution Sampling:


In [None]:
# This space samples from a standard normal distribution (mean=0, std=1)
# This allows for sampling continuous values with a specified normal distribution.
# Example: Adjusting 'max_features' based on a normal distribution around a preferred value.
space = [{'example': hp.normal('example', 0, 1)}]

# This space samples from a normal distribution (mean=100, std=25) at discrete intervals of 10
# This provides samples that are rounded to specific intervals, useful for hyperparameters.
# Example: Sampling 'min_samples_split' for a decision tree, ensuring it only takes discrete values.
space = [{'example': hp.qnormal('example', 100, 25, 10)}]

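# This space samples from a log-normal distribution: exp(normal(mu, sigma)) with mu=np.log(100), sigma=np.log(25)
# mu and sigma are the mean and standard deviation of the logarithm of the value, so the median sample is 100.
# This method is used for parameters that are strictly positive and follow a log-normal distribution.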
# Example: Setting 'min_impurity_decrease' in a decision tree to ensure it’s always a positive value.
space = [{'example': hp.lognormal('example', np.log(100), np.log(25))}]

# This space samples from a log-normal distribution at discrete intervals
# This samples values that follow a log-normal distribution but constrained to specific intervals.
# Example: Tuning 'max_depth' of a decision tree while ensuring the values are rounded.
space = [{'example': hp.qlognormal('example', 0, 1, 1)}]
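
The log-normal parameters are easy to misread, so here is a small plain-Python check of the formula exp(normal(mu, sigma)). The `lognormal` helper is a hypothetical stand-in for illustration; it shows that mu sets the median on the log scale and that every draw is positive.

```python
import math
import random
import statistics


def lognormal(rng, mu, sigma):
    """exp(normal(mu, sigma)): mu and sigma describe the log of the value."""
    return math.exp(rng.gauss(mu, sigma))


rng = random.Random(0)
mu, sigma = math.log(100), math.log(25)

draws = [lognormal(rng, mu, sigma) for _ in range(10000)]

# The median of a log-normal distribution is exp(mu), here 100
median = statistics.median(draws)
print(min(draws) > 0, median)
```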

### Categorical and Probabilistic Sampling:


In [None]:
# This space samples from a predefined list of options: 'A', 'B', or 'C'
# This allows for selecting categorical hyperparameters from a fixed set of choices.
# Example: Choosing a specific criterion ('gini' or 'entropy') for splitting in a decision tree.
space = [{'example': hp.choice('example', ['A', 'B', 'C'])}]

# This space samples a loss function based on given probabilities
# 80% chance for 'deviance' and 20% for 'exponential'
# This method allows for sampling based on defined probabilities, which is useful in decision-making scenarios.
# Example: Choosing a loss function for training a decision tree based on their effectiveness in the context.
space = hp.pchoice('example', [
    (0.8, {'loss': 'deviance'}),
    (0.2, {'loss': 'exponential'})
])

# Nested Hyperparameters

In [None]:
# Step 1: Define a hyperparameter space for selecting different classifier types
space = hp.choice('classifier_type', [

    # Step 2: Define the naive_bayes classifier configuration
    {
        'type': 'naive_bayes',
    },

    # Step 3: Define the SVM classifier configuration with hyperparameters
    {
        'type': 'svm',

        # Step 4: Define the regularization parameter C for SVM sampled from a log-normal distribution
        'C': hp.lognormal('svm_C', 0, 1),

        # Step 5: Define the kernel type for SVM with options for linear and RBF kernels
        'kernel': hp.choice('svm_kernel', [
            # Step 6: Define the linear kernel configuration
            {'ktype': 'linear'},

            # Step 7: Define the RBF kernel configuration with a width parameter sampled from a log-normal distribution
            {'ktype': 'RBF', 'width': hp.lognormal('svm_rbf_width', 0, 1)},
        ]),
    },

    # Step 8: Define the decision tree classifier configuration with hyperparameters
    {
        'type': 'dtree',

        # Step 9: Define the criterion for the decision tree with options for 'gini' and 'entropy'
        'criterion': hp.choice('dtree_criterion', ['gini', 'entropy']),

        # Step 10: Define the maximum depth for the decision tree with an option for None and a sampled value
        'max_depth': hp.choice('dtree_max_depth',
            [None, hp.qlognormal('dtree_max_depth_int', 3, 1, 1)]),

        # Step 11: Define the minimum samples required to split an internal node, sampled from a log-normal distribution
        'min_samples_split': hp.qlognormal('dtree_min_samples_split', 2, 1, 1),
    },
])

# Search algorithms within Hyperopt


## Hyperopt provides 3 search algorithms:
* Randomized search
* Annealing
* Tree-structured Parzen Estimators

I find the documentation for Hyperopt quite unintuitive, so it helps to refer to the original article to understand the different parameters and classes.

**Procedure**

To tune the hyperparameters of our model we need to:

* Define a model
* Define the hyperparameter space
* Define the objective function we want to minimize
* Run the minimization

In [None]:
# Step 1: Import NumPy and the XGBoost package
# NumPy supplies np.log for the log-scale bounds and the random state passed to fmin
import numpy as np
import xgboost as xgb

# hp: define the hyperparameter space
# fmin: optimization function
from hyperopt import hp, fmin
# the search algorithms
from hyperopt import rand, anneal, tpe

# Step 2: Define the hyperparameter space using a parameter grid
# Creates a dictionary that specifies the range of hyperparameters for tuning
param_grid = {
    'n_estimators': hp.quniform('n_estimators', 200, 2500, 100),  # Number of boosting rounds
    'max_depth': hp.quniform('max_depth', 1, 10, 1),  # Maximum tree depth
    'learning_rate': hp.loguniform('learning_rate', np.log(0.001), np.log(1)),  # Learning rate
    'booster': hp.choice('booster', ['gbtree', 'dart']),  # Booster type
    'gamma': hp.loguniform('gamma', np.log(0.01), np.log(10)),  # Minimum loss reduction
    'subsample': hp.uniform('subsample', 0.50, 0.90),  # Subsample ratio of training data
    'colsample_bytree': hp.uniform('colsample_bytree', 0.50, 0.99),  # Subsample ratio of features per tree
    'colsample_bylevel': hp.uniform('colsample_bylevel', 0.50, 0.99),  # Subsample ratio of features per level
    'colsample_bynode': hp.uniform('colsample_bynode', 0.50, 0.99),  # Subsample ratio of features per node
    'reg_lambda': hp.uniform('reg_lambda', 1, 20)  # L2 regularization term
}

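# Step 3: Perform random search optimization
# Uses random search to minimize the objective function over the hyperparameter space
# Note: `objective` must already be defined; it receives one sample drawn from param_grid,
# in which quniform values such as n_estimators and max_depth arrive as floats and need int() casting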
random_search = fmin(
    fn=objective,  # Function to minimize
    space=param_grid,  # Hyperparameter space
    max_evals=50,  # Maximum evaluations
    rstate=np.random.default_rng(42),  # Random state for reproducibility
    algo=rand.suggest  # Random search algorithm
)
random_search

# Step 4: Perform annealing search optimization
# Uses simulated annealing to minimize the objective function over the hyperparameter space
anneal_search = fmin(
    fn=objective,  # Function to minimize
    space=param_grid,  # Hyperparameter space
    max_evals=50,  # Maximum evaluations
    rstate=np.random.default_rng(42),  # Random state for reproducibility
    algo=anneal.suggest  # Annealing search algorithm
)
anneal_search

# Step 5: Perform Tree-structured Parzen Estimator (TPE) search optimization
# Uses TPE to minimize the objective function over the hyperparameter space
tpe_search = fmin(
    fn=objective,  # Function to minimize
    space=param_grid,  # Hyperparameter space
    max_evals=50,  # Maximum evaluations
    rstate=np.random.default_rng(42),  # Random state for reproducibility
    algo=tpe.suggest  # TPE algorithm
)
tpe_search

# Conditional Search Spaces with Hyperopt
In this notebook, we will create a conditional (or nested) hyperparameter space to optimize the hyperparameters of 3 different off-the-shelf algorithms.

Using the breast cancer dataset, we will optimize the hyperparameters for a logistic regression, a random forest and a gradient boosting machine within the same search.
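
The objective function in this section expects `X_train` and `y_train` to exist. One way to create them is sketched below with scikit-learn's copy of the breast cancer dataset; the 70/30 split, the stratification and the `random_state` value are assumptions, not settings taken from the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the features and the binary target as NumPy arrays
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set; the split parameters here are illustrative choices
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```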

In [None]:
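# NumPy supplies the random state passed to fmin
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
# cross_val_score is used inside the objective function
from sklearn.model_selection import cross_val_score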
# hp: define the hyperparameter space
# fmin: optimization function
# Trials: to evaluate the different searched hyperparameters
from hyperopt import hp, fmin, Trials

# the search algorithms
from hyperopt import anneal

# Step 1: Define the hyperparameter space for multiple classifiers
# We define a list of dictionaries with hyperparameters for different models
param_grid = hp.choice('classifier', [

    # Step 2: Logistic Regression model
    # Specify hyperparameter space for Logistic Regression
    {'model': LogisticRegression,  # Model class
    'params': {
        'penalty': hp.choice('penalty', ['l1', 'l2']),  # Regularization penalty
        'C': hp.uniform('C', 0.001, 10),  # Inverse of regularization strength
        'solver': 'saga',  # Solver that supports both penalties
    }},

    # Step 3: Random Forest Classifier model
    # Specify hyperparameter space for Random Forest
    {'model': RandomForestClassifier,  # Model class
    'params': {
        'n_estimators': hp.quniform('n_estimators_rf', 50, 1500, 50),  # Number of trees
        'max_depth': hp.quniform('max_depth_rf', 1, 5, 1),  # Maximum depth of trees
        'criterion': hp.choice('criterion_rf', ['gini', 'entropy']),  # Split quality criterion
    }},

    # Step 4: Gradient Boosting Classifier model
    # Specify hyperparameter space for Gradient Boosting
    {'model': GradientBoostingClassifier,  # Model class
    'params': {
        'n_estimators': hp.quniform('n_estimators_gbm', 50, 1500, 50),  # Number of boosting rounds
        'max_depth': hp.quniform('max_depth_gbm', 1, 5, 1),  # Maximum tree depth
        'criterion': hp.choice('criterion_gbm', ['friedman_mse', 'squared_error']),  # Split quality criterion
    }},
])

## Objective Functions

In [None]:
def objective(params):

    # Step 1: Instantiate the model
    # Create a new instance of the model class based on the sampled hyperparameters
    model = params['model']()  # Don't forget the () to instantiate the class

    # Step 2: Capture the sampled hyperparameters
    # Copy the hyperparameters from the params dictionary so the sample itself is not modified
    hyperparams = dict(params['params'])

    # Step 3: Convert n_estimators and max_depth to integers
    # hp.quniform returns floats, but the tree-based models require integer values;
    # logistic regression has neither parameter, so only cast the keys that are present
    for name in ('n_estimators', 'max_depth'):
        if name in hyperparams:
            hyperparams[name] = int(hyperparams[name])

    # Step 4: Print the model and hyperparameters for visualization (optional)
    # Displays the current model and hyperparameters being used for tuning
    print(model, hyperparams)

    # Step 5: Set the sampled hyperparameters to the model
    # Apply the sampled hyperparameters to the model using set_params
    model.set_params(**hyperparams)

    # Step 6: Train the model using cross-validation
    # Perform 3-fold cross-validation and calculate accuracy
    cross_val_data = cross_val_score(
        model,
        X_train,  # Training data features
        y_train,  # Training data labels
        scoring='accuracy',  # Accuracy as the performance metric
        cv=3,  # 3-fold cross-validation
        n_jobs=4,  # Run jobs in parallel using 4 CPU cores
    )

    # Step 7: Calculate loss
    # Negate the average cross-validation accuracy to minimize the loss
    loss = -cross_val_data.mean()
    print(loss)
    print()

    # Step 8: Return the loss
    # Return the calculated loss to the optimization function
    return loss

## Model Evaluation

In [None]:
# Step 1: Initialize the Trials object
# Trials object stores details of each evaluation during the optimization process
trials = Trials()

# Step 2: Perform annealing search optimization
# Use the anneal algorithm to minimize the objective function over the hyperparameter space
anneal_search = fmin(
    fn=objective,  # Function to minimize
    space=param_grid,  # Hyperparameter space
    max_evals=50,  # Maximum number of evaluations
    rstate=np.random.default_rng(42),  # Random state for reproducibility
    algo=anneal.suggest,  # Annealing search algorithm
    trials=trials  # Store the trials for later analysis
)

# Step 3: Display the best hyperparameters found by the anneal search
# For hp.choice parameters (e.g. 'classifier'), fmin reports the index of the chosen option, not the option itself
anneal_search

# Step 4: Return the hyperparameters with the minimum loss
# trials.argmin holds the same assignment, read from the stored trials
trials.argmin

# Step 5: Get the best error (loss) recorded in the trials
# With deterministic losses this is the loss of the best trial, i.e. the negative of the best mean CV accuracy
trials.average_best_error()

# Step 6: Return the best trial from the optimization process
# Provides detailed information about the best trial during the search
trials.best_trial

Overall, trials.best_trial gives you detailed information about the best-performing trial in your hyperparameter optimization process, including the trial's unique ID, loss value, status, and the specific hyperparameters used. This is useful for understanding what settings yielded the best results and for potential deployment or further analysis.