# Bayesian optimization
Bayesian optimization is useful when you have no analytical form of the objective function or its derivatives, and each evaluation of the function (such as training a machine learning model with a particular set of hyperparameters) is time-consuming or resource-intensive.

**Surrogate Model:** Bayesian optimization builds a probabilistic model, known as the surrogate model, to approximate the objective function. This model is typically a Gaussian Process, though a Random Forest or another model can also be used. At any given point in the hyperparameter space, it provides a prediction of the objective function's output and an estimate of the uncertainty of that prediction.
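
To make the "prediction plus uncertainty" idea concrete, here is a minimal sketch of a Gaussian Process surrogate written with NumPy only. It assumes a zero-mean prior, an RBF kernel with unit variance, and three made-up observations; none of these choices come from the notebook, and a real project would more likely rely on a library implementation.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    # Solve linear systems instead of forming the inverse explicitly
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    v = np.linalg.solve(k_tt, k_qt.T)
    var = 1.0 - np.sum(k_qt * v.T, axis=1)  # prior variance k(x, x) is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Three hypothetical observations of an expensive objective
x_train = np.array([0.0, 1.0, 3.0])
y_train = np.array([0.2, 0.9, 0.4])

# Query an observed point, a point between observations, and a point far from all data
x_query = np.array([1.0, 2.0, 6.0])
mean, std = gp_posterior(x_train, y_train, x_query)

for x, m, s in zip(x_query, mean, std):
    print(f"x={x:.1f}  predicted={m:.3f}  uncertainty={s:.3f}")
```

The uncertainty is close to zero at the point that was already evaluated and grows as the query moves away from the data, which is the signal the acquisition function uses.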

**Acquisition Function:** The algorithm uses an acquisition function to decide where to sample next. The acquisition function balances exploration (sampling where the model is uncertain) and exploitation (sampling where the model predicts high performance). Common acquisition functions include Expected Improvement, Probability of Improvement, and Upper Confidence Bound.
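
The three acquisition functions named above can be sketched for a single candidate point as follows. This assumes a maximization problem and a Gaussian predictive distribution with the given mean and standard deviation; the function names, the `kappa` default, and the two candidate points are illustrative, not taken from any particular library.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mean, std, best):
    """Probability that the candidate beats the best value seen so far."""
    if std == 0.0:
        return 1.0 if mean > best else 0.0
    return normal_cdf((mean - best) / std)

def expected_improvement(mean, std, best):
    """Expected amount by which the candidate beats the best value seen so far."""
    improvement = mean - best
    if std == 0.0:
        return max(improvement, 0.0)
    z = improvement / std
    return improvement * normal_cdf(z) + std * normal_pdf(z)

def upper_confidence_bound(mean, std, kappa=2.0):
    """Optimistic estimate: larger kappa favors exploration."""
    return mean + kappa * std

best_so_far = 0.94
confident = (0.95, 0.01)  # (mean, std): slightly better, low uncertainty
uncertain = (0.90, 0.10)  # (mean, std): predicted worse, high uncertainty

for name, (m, s) in [("confident", confident), ("uncertain", uncertain)]:
    print(
        name,
        round(probability_of_improvement(m, s, best_so_far), 4),
        round(expected_improvement(m, s, best_so_far), 4),
        round(upper_confidence_bound(m, s), 4),
    )
```

In this example Probability of Improvement prefers the confident candidate (exploitation), while Expected Improvement and Upper Confidence Bound prefer the uncertain one (exploration), which shows why the choice of acquisition function matters.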


Each time the objective function is evaluated (e.g., training a model with a certain hyperparameter set), the result is fed back into the model, updating its understanding of the function. This updated model is then used to determine the next point to evaluate.


Bayesian optimization is an iterative process. In each iteration, it uses the current surrogate model to select the most promising point to evaluate next based on the acquisition function. After evaluation, the surrogate model is updated with the new observation. This process continues until a stopping criterion is met (like a maximum number of iterations or a convergence threshold).
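
The loop described above can be sketched end to end on a toy one-dimensional problem. This is a minimal illustration under stated assumptions, not how Optuna works internally: the objective is a cheap stand-in (`sin`), the surrogate is a hand-written zero-mean Gaussian Process with an RBF kernel, the acquisition function is Expected Improvement evaluated on a fixed candidate grid, and the stopping criterion is a fixed budget of 10 evaluations.

```python
import math
import numpy as np

def objective(x):
    """Stand-in for an expensive evaluation (hypothetical toy function)."""
    return np.sin(x)

def rbf_kernel(a, b, length_scale=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

_erf = np.vectorize(math.erf)

def expected_improvement(mean, std, best):
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

candidates = np.linspace(0.0, 5.0, 101)
sampled = [10, 90]  # initial design: grid indices for x = 0.5 and x = 4.5
x_obs = candidates[sampled]
y_obs = objective(x_obs)

for _ in range(10):  # stopping criterion: fixed evaluation budget
    # 1. Fit the surrogate to all observations so far
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    # 2. Score every candidate with the acquisition function
    ei = expected_improvement(mean, std, y_obs.max())
    ei[sampled] = -np.inf  # never re-evaluate a point
    # 3. Evaluate the objective at the most promising candidate
    nxt = int(np.argmax(ei))
    sampled.append(nxt)
    x_obs = np.append(x_obs, candidates[nxt])
    y_obs = np.append(y_obs, objective(candidates[nxt]))

best_x = x_obs[np.argmax(y_obs)]
best_y = y_obs.max()
print(f"best x = {best_x:.2f}, best value = {best_y:.4f}")
```

Each pass through the loop performs the three steps from the text: update the surrogate, maximize the acquisition function, and evaluate the objective at the selected point.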


The key advantage of Bayesian optimization is its sample efficiency when each evaluation of the objective function is costly (as in hyperparameter tuning of complex machine learning models). Because every new point is chosen using all previous results, it typically needs fewer evaluations of the objective function than grid search or random search to reach a comparable result.
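
A quick count shows why an exhaustive grid becomes expensive. The grid below is hypothetical: it picks a handful of values per hyperparameter from the same ranges used in the Optuna example in this notebook, and compares the resulting number of configurations with that example's budget of 100 trials.

```python
from itertools import product

# Hypothetical grid: a few values per hyperparameter
grid = {
    "n_estimators": [10, 50, 100, 150, 200],
    "max_depth": [1, 8, 16, 24, 32],
    "min_samples_split": [0.1, 0.3, 0.5, 0.7, 1.0],
    "min_samples_leaf": [1, 2, 3, 4],
    "max_features": ["sqrt", "log2"],
}

# Grid search evaluates every combination: 5 * 5 * 5 * 4 * 2
grid_evaluations = len(list(product(*grid.values())))
trial_budget = 100  # n_trials in the Optuna example

print(grid_evaluations, trial_budget)  # 1000 100
```

Even this coarse grid needs ten times as many model fits as the trial budget, and each added hyperparameter multiplies the count again.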


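The primary limitation of Bayesian optimization is the cost and complexity of the surrogate model, especially for high-dimensional hyperparameter spaces, where the surrogate needs many observations to model the objective well. For example, exact Gaussian Process inference scales cubically with the number of observations. Performance can also depend heavily on the choice of surrogate model and acquisition function.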

In [1]:
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Define the objective function to be optimized
def objective(trial):
    # Define the hyperparameter search space using Optuna
    n_estimators = trial.suggest_int("n_estimators", 10, 200)
    max_depth = trial.suggest_int("max_depth", 1, 32)
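    # A float min_samples_split is a fraction of the training samples;
    # values near 1 prevent almost all splits and give near-chance accuracy
    min_samples_split = trial.suggest_float("min_samples_split", 0.1, 1)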
    min_samples_leaf = trial.suggest_int("min_samples_leaf", 1, 4)
    max_features = trial.suggest_categorical("max_features", ["sqrt", "log2"])

    # Initialize and train the RandomForestClassifier
    clf = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features,
        random_state=42
    )
    # Perform cross-validation and return the mean accuracy
    return cross_val_score(clf, X, y, n_jobs=-1, cv=3).mean()

# Create a study object and specify the optimization direction.
# No sampler is passed, so Optuna uses its default, the Tree-structured Parzen Estimator (TPE).
study = optuna.create_study(direction="maximize")
# Perform the optimization
study.optimize(objective, n_trials=100)

# Best parameters and score
best_params = study.best_params
best_score = study.best_value

best_params, best_score

[I 2023-11-14 12:44:52,562] A new study created in memory with name: no-name-007e38e4-a3d9-4197-8e19-4f043b5774c0
[I 2023-11-14 12:44:54,813] Trial 0 finished with value: 0.32666666666666666 and parameters: {'n_estimators': 50, 'max_depth': 28, 'min_samples_split': 0.8222671658601485, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.32666666666666666.
[I 2023-11-14 12:44:55,756] Trial 1 finished with value: 0.9466666666666667 and parameters: {'n_estimators': 93, 'max_depth': 28, 'min_samples_split': 0.2545242559478056, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 1 with value: 0.9466666666666667.
[I 2023-11-14 12:44:56,102] Trial 2 finished with value: 0.3333333333333333 and parameters: {'n_estimators': 104, 'max_depth': 5, 'min_samples_split': 0.9185716659147138, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 1 with value: 0.9466666666666667.
[I 2023-11-14 12:44:56,369] Trial 3 finished with value: 0.32 and parameters: {'n_e

({'n_estimators': 64,
  'max_depth': 20,
  'min_samples_split': 0.1425687516993196,
  'min_samples_leaf': 3,
  'max_features': 'log2'},
 0.9733333333333333)

In [2]:
best_params

{'n_estimators': 64,
 'max_depth': 20,
 'min_samples_split': 0.1425687516993196,
 'min_samples_leaf': 3,
 'max_features': 'log2'}

In [None]:
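from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Rebuild the classifier from the best hyperparameters found by the study
rf = RandomForestClassifier(**best_params, random_state=42)

# Pipeline steps must be (name, estimator) pairs.
# StandardScaler is an assumption for the original "Scaler" step;
# tree-based models do not require feature scaling.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", rf),
])
pipeline.fit(X, y)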