# Hyperparameter optimization - Bayesian optimization method

In [5]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib as mlp
import matplotlib.pyplot as plt
import seaborn as sns

import time
import re
import os  # used to modify environment settings

from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import KFold, cross_validate

![](https://skojiangdoc.oss-cn-beijing.aliyuncs.com/2021MachineLearning/Ensembles/Public/06.png?versionId=CAEQIBiBgMC_i.2V7xciIGM4MDJiNTI2ZmY2NDQxYWI5ZDdkZTkwOGQzY2Y4ZWVk)

![](https://skojiangdoc.oss-cn-beijing.aliyuncs.com/2021MachineLearning/Ensembles/Public/09.png)

**To some extent, this frequency reflects the probability that the minimum occurs at a given point: the higher the frequency, the more likely that point is the true minimum of the function.**

![](https://skojiangdoc.oss-cn-beijing.aliyuncs.com/2021MachineLearning/Ensembles/Public/07.png?versionId=CAEQIBiBgMC9je2V7xciIDU3MDA3Y2Q4NjlmMDQ4OTliZTNlNTQ2ZWM3YTZlOTE0)

![](https://skojiangdoc.oss-cn-beijing.aliyuncs.com/2021MachineLearning/Ensembles/Public/08.gif)

---

### Bayesian Optimization for HPO

- 1 Define the function $f(x)$ to be estimated and the domain of $x$<br>

- 2 Select a finite number $n$ of values of $x$ and compute the corresponding $f(x)$ (these are the observed values)<br>

- 3 Based on these limited observations, estimate the function (in Bayesian optimization this assumption is called prior knowledge) and obtain the target value (the maximum or minimum) of the estimate $f^*$<br>

- 4 Define a rule that determines the next observation point to evaluate
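The four steps can be sketched as a short loop. This is only a toy illustration under stated assumptions, not the implementation of any library: the objective `f`, the helpers `rbf_kernel` and `gp_posterior`, the kernel length scale, and the constant `2.0` in the selection rule are all invented for the example, and a small NumPy Gaussian process stands in for the estimate $f^*$.

```python
import numpy as np

def f(x):
    # Toy objective standing in for an expensive cross-validation score.
    return (x - 2.0) ** 2 + np.sin(5.0 * x)

def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between two 1-D arrays of points.
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-4):
    # Posterior mean and standard deviation of a GP fitted to standardised targets.
    y_mean = y_obs.mean()
    y_std = y_obs.std() or 1.0
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid)
    alpha = np.linalg.solve(K, (y_obs - y_mean) / y_std)
    mu = y_mean + y_std * (K_s.T @ alpha)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, y_std * np.sqrt(np.clip(var, 0.0, None))

rng = np.random.default_rng(0)
x_grid = np.linspace(-1.0, 5.0, 601)       # step 1: domain of x
x_obs = rng.uniform(-1.0, 5.0, size=4)     # step 2: n = 4 initial observations
y_obs = f(x_obs)

for _ in range(16):                         # 4 + 16 = 20 observations in total
    mu, sigma = gp_posterior(x_obs, y_obs, x_grid)  # step 3: estimate f*
    score = mu - 2.0 * sigma                        # step 4: rule for the next point
    x_next = x_grid[np.argmin(score)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next))

best = int(np.argmin(y_obs))
print(f"best x = {x_obs[best]:.3f}, f(x) = {y_obs[best]:.3f}")
```

The selection rule here favours points whose predicted value is low or whose uncertainty is high, which is one simple way to trade off exploitation against exploration.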

This procedure is known as Sequential Model-Based Optimisation (SMBO). A few points are worth noting:

- Outside HPO, the $f(x)$ in Bayesian optimisation can be a complete black-box function, i.e. a function for which only the correspondence between $x$ and $f(x)$ is known, whose internal behaviour is entirely unknown, and for which no explicit expression can be written. Bayesian optimisation is therefore regarded as a classical method for black-box function estimation. In HPO, however, $f(x)$ is usually defined as a cross-validation result or a loss function, and we often know the expression of the loss function perfectly well; what we do not know is how the loss behaves as the hyperparameters change. So $f(x)$ in HPO is not a black-box function in the strict sense.

- In HPO, the independent variable $x$ ranges over the hyperparameter space. In the two-dimensional plots above $x$ is one-dimensional, but in real optimisation the hyperparameter space is usually high-dimensional and extremely complex.

- The initial number of observations $n$ and the maximum number of observations $m$ are themselves hyperparameters of Bayesian optimisation; $m$ also determines the total number of iterations of the optimisation.

- In step 3, the tool that estimates the distribution of the function from a limited number of observations is called a **probabilistic surrogate model**; after all, in a mathematical computation we cannot actually invite tens of thousands of people to connect the dots between our observations. **These surrogate models come with certain assumptions, and from just a handful of observations they can estimate the distribution of the objective function $f^*$ (including the value at every point on $f^*$ and the confidence associated with that point).** In practice, surrogate models are usually powerful algorithms in their own right, most commonly Gaussian processes, Gaussian mixture models, and so on. Gaussian processes are the usual choice in traditional mathematical derivations, but the most popular optimisation libraries today mostly default to TPE (Tree-structured Parzen Estimator), which is based on Gaussian mixture models.

- The rule used in step 4 to choose the next observation point is called the **acquisition function**. It measures how much observing a point would affect the fitted $f^*$ and selects the point with the largest effect for the next observation, so we focus on the point with the largest **acquisition function value**. Common acquisition functions include Probability of Improvement (PI), Expected Improvement (EI), Upper Confidence Bound (UCB), and entropy-based criteria. The gif above shows PI, UCB, and EI; most optimisation libraries use Expected Improvement by default.
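To make the acquisition functions concrete, here is a minimal sketch of PI, EI, and the confidence-bound rule for a minimisation problem, assuming the surrogate's prediction at a candidate point is Gaussian with mean `mu` and standard deviation `sigma > 0`. The function names, the constant `kappa`, and the two example candidates are assumptions made for this illustration. Because we minimise, the confidence bound is written as a lower bound (the mirror image of UCB).

```python
import math

def norm_pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best):
    # P(f(x) < f_best) when f(x) ~ N(mu, sigma^2).
    return norm_cdf((f_best - mu) / sigma)

def expected_improvement(mu, sigma, f_best):
    # E[max(f_best - f(x), 0)] when f(x) ~ N(mu, sigma^2).
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def lower_confidence_bound(mu, sigma, kappa=2.0):
    # Optimistic estimate of f(x); smaller is more attractive when minimising.
    return mu - kappa * sigma

f_best = 1.0                       # best loss observed so far
candidates = {
    "A": (0.9, 0.05),              # slightly better mean, very certain
    "B": (1.2, 0.60),              # worse mean, very uncertain
}
for name, (mu, sigma) in candidates.items():
    print(name,
          round(probability_of_improvement(mu, sigma, f_best), 3),
          round(expected_improvement(mu, sigma, f_best), 3),
          round(lower_confidence_bound(mu, sigma), 3))
```

In this example PI prefers the safe candidate A, while EI and the confidence bound prefer the uncertain candidate B, because both reward the chance of a large improvement and not just the chance of any improvement.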

HPO libraries that implement Bayesian optimisation are listed at https://www.automl.org/automl/hpo-packages/ . This notebook considers three of them: `bayesian-optimization`, `hyperopt`, and `optuna`.

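|HPO library|Pros and cons|Rating|
|-|-|-|
|**bayes_opt**|✅ Implements Gaussian-process-based Bayesian optimisation<br>✅ Suitable when the parameter space consists mostly of continuous parameters<br><br>⛔ Avoid when there are many discrete parameters<br>⛔ Avoid when compute or time is scarce|⭐⭐|
|**hyperopt**|✅ Implements TPE-based Bayesian optimisation<br>✅ Supports a range of efficiency tools<br>✅ Clear progress bar, tidy output, few odd warnings or errors<br>✅ Can be extended to deep learning<br><br>⛔ Does not support Gaussian-process-based Bayesian optimisation<br>⛔ Many coding restrictions, relatively complex, less flexible|⭐⭐⭐⭐|
|**optuna**|✅ Implements Bayesian optimisation based on a variety of algorithms (possibly in combination with other libraries)<br>✅ The most concise code, with a reasonable degree of flexibility<br>✅ Can be extended to deep learning<br><br>⛔ Non-critical features are poorly maintained; odd warnings and errors occur|⭐⭐⭐⭐|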

- Bayes_opt


In [None]:
#!pip install bayesian-optimization
#!conda install -c conda-forge bayesian-optimization

- Hyperopt

In [None]:
#!pip install hyperopt

- Optuna

In [None]:
#!pip install optuna
#!conda install -c conda-forge optuna

- Skopt (installed as a helper package for Optuna; it can also be used on its own)

In [None]:
#!pip install scikit-optimize

---

### Implementing Several Bayesian Optimisation Methods with Optuna

Optuna is by far the most mature and extensible hyperparameter optimisation framework, and it is clearly designed specifically for machine learning and deep learning. To meet the needs of machine learning developers, Optuna offers a powerful and stable API, so Optuna code is simple and highly modular; it is the most concise of the libraries presented here. A further strength is that Optuna integrates with deep learning frameworks such as PyTorch and TensorFlow, and with scikit-optimize, an optimisation library built on sklearn, so it can be used in a wide variety of optimisation scenarios. More on Optuna's other features can be found at https://github.com/optuna/optuna .

In [1]:
import optuna
print(optuna.__version__)

3.5.0


In [9]:
data = pd.read_csv(r"D:\Practice\Machine Learning\datasets\House Price\train_encode.csv",index_col=0)

X = data.iloc[:,:-1]
y = data.iloc[:,-1]

In [10]:
def optuna_objective(trial):
    
    # Define the parameter space
    n_estimators = trial.suggest_int("n_estimators",80,100,step=1) # integer: (name, lower bound, upper bound, step)
    max_depth = trial.suggest_int("max_depth",10,25,step=1)
    max_features = trial.suggest_int("max_features",10,20,step=1)
    #max_features = trial.suggest_categorical("max_features",["log2","sqrt","auto"]) # categorical (string)
    min_impurity_decrease = trial.suggest_int("min_impurity_decrease",0,5,step=1)
    #min_impurity_decrease = trial.suggest_float("min_impurity_decrease",0,5,log=False) # float
    
    # Define the estimator
    # Parameters being optimised come from the parameter space above
    # Parameters not being optimised are set to fixed values
    reg = RFR(n_estimators = n_estimators
              ,max_depth = max_depth
              ,max_features = max_features
              ,min_impurity_decrease = min_impurity_decrease
              ,random_state=1412
              ,verbose=False
              ,n_jobs=-1
             )
    
    # Cross-validation; the scorer returns the negative root mean squared error (-RMSE)
    # Optuna supports both maximisation and minimisation:
    # return -RMSE to maximise, or RMSE to minimise
    cv = KFold(n_splits=5,shuffle=True,random_state=1412)
    validation_loss = cross_validate(reg,X,y
                                     ,scoring="neg_root_mean_squared_error"
                                     ,cv=cv 
                                     ,verbose=False 
                                     ,n_jobs=10 
                                     ,error_score='raise'
                                    )
    # Return the mean RMSE (absolute value of the negative scores)
    return np.mean(abs(validation_loss["test_score"]))

In [11]:
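def optimizer_optuna(n_trials, algo):
    
    # Choose the sampler: TPE or GP
    if algo == "TPE":
        algo = optuna.samplers.TPESampler(n_startup_trials = 10, n_ei_candidates = 24)
    elif algo == "GP":
        from optuna.integration import SkoptSampler
        import skopt # fails early if scikit-optimize is not installed
        algo = SkoptSampler(skopt_kwargs={'base_estimator':'GP', # Gaussian process
                                          'n_initial_points':10, # 10 initial observations
                                          'acq_func':'EI'} # acquisition function: EI (expected improvement)
                           )
    else:
        raise ValueError("algo must be 'TPE' or 'GP'")
    
    # The optimisation itself: first instantiate the study
    study = optuna.create_study(sampler = algo # sampling algorithm to use
                                , direction="minimize" # optimisation direction: minimize or maximize
                               )
    # Start optimising; n_trials is the maximum number of iterations allowed
    # The parameter space is defined inside the objective, so it is not passed here
    study.optimize(optuna_objective # objective function
                   , n_trials=n_trials # maximum number of iterations (including the initial observations)
                   , show_progress_bar=True # progress bar
                  )
    
    # The results can be read directly from the finished study object
    # Print the best parameters and the best loss
    print("\n","\n","best params: ", study.best_trial.params,
          "\n","\n","best score: ", study.best_trial.values,
          "\n")
    
    return study.best_trial.params, study.best_trial.values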
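The TPE branch above sets `n_startup_trials` and `n_ei_candidates`. The following toy, one-dimensional sketch shows roughly what those two numbers mean. It is not Optuna's implementation: the fixed `bandwidth`, the `gamma` split, and the helper names `parzen_density` and `tpe_suggest` are simplifying assumptions for this illustration, and candidates are not clipped to the search range.

```python
import math
import random

def parzen_density(x, centers, bandwidth):
    # Equal-weight mixture of Gaussians centred on past observations.
    total = 0.0
    for c in centers:
        z = (x - c) / bandwidth
        total += math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))
    return total / len(centers)

def tpe_suggest(xs, losses, rng, gamma=0.25, n_ei_candidates=24, bandwidth=0.5):
    # Split past observations into a "good" group (lowest losses) and the rest.
    order = sorted(range(len(xs)), key=lambda i: losses[i])
    n_good = max(1, math.ceil(gamma * len(xs)))
    good = [xs[i] for i in order[:n_good]]
    bad = [xs[i] for i in order[n_good:]]
    # Draw candidates from the density of good points ...
    candidates = [rng.gauss(rng.choice(good), bandwidth)
                  for _ in range(n_ei_candidates)]
    # ... and keep the one that looks most "good" relative to "bad".
    def ratio(x):
        return parzen_density(x, good, bandwidth) / (parzen_density(x, bad, bandwidth) + 1e-12)
    return max(candidates, key=ratio)

def loss(x):
    return (x - 2.0) ** 2

rng = random.Random(0)
xs = [rng.uniform(-10.0, 10.0) for _ in range(10)]   # 10 random start-up trials
losses = [loss(x) for x in xs]
for _ in range(40):                                   # 40 model-guided trials
    x_new = tpe_suggest(xs, losses, rng)
    xs.append(x_new)
    losses.append(loss(x_new))

print(min(losses))
```

The start-up trials are purely random; only after them does the density ratio guide the search, which is why a very small trial budget behaves almost like random search.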

In [12]:
import warnings
warnings.filterwarnings('ignore', message='The objective has been evaluated at this point before.')

In [13]:
best_params, best_score = optimizer_optuna(10,"GP") # the iteration log is printed by default

[I 2024-02-06 16:55:21,459] A new study created in memory with name: no-name-42282041-b2c7-45bb-ae4d-6dc89253cb56


  0%|          | 0/10 [00:00<?, ?it/s]



[I 2024-02-06 16:55:24,110] Trial 0 finished with value: 29161.52387699812 and parameters: {'n_estimators': 97, 'max_depth': 24, 'max_features': 17, 'min_impurity_decrease': 3}. Best is trial 0 with value: 29161.52387699812.
[W 2024-02-06 16:55:24,121] Trial 1 failed with parameters: {} because of the following error: AttributeError("module 'numpy' has no attribute 'int'.\n`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.\nThe aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:\n    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations").
Traceback (most recent call last):
  File "d:\CODE\Lib\site-packages\

AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

In [14]:
optuna.logging.set_verbosity(optuna.logging.ERROR) # turn off the automatic info messages and show only the progress bar
#optuna.logging.set_verbosity(optuna.logging.INFO)
best_params, best_score = optimizer_optuna(300,"TPE")

  0%|          | 0/300 [00:00<?, ?it/s]




 
 best params:  {'n_estimators': 93, 'max_depth': 15, 'max_features': 18, 'min_impurity_decrease': 2} 
 
 best score:  [28529.20882339953] 



In [15]:
# This cell fails: the GP sampler path calls np.int, which was removed in NumPy 1.24 (the builtin int replaces it)
optuna.logging.set_verbosity(optuna.logging.ERROR)
best_params, best_score = optimizer_optuna(300,"GP")



  0%|          | 0/300 [00:00<?, ?it/s]



AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
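The `AttributeError` is raised because code on the GP sampler path still refers to the alias `np.int`, which NumPy removed in version 1.24. One commonly used workaround, sketched here on the assumption that this alias is the only incompatibility, is to restore it before the sampler is created. Installing an older NumPy (for example `pip install "numpy<1.24"`) is an alternative.

```python
import numpy as np

# Assumption: the removed alias np.int is the only thing the failing code needs.
# Restore it only when it is missing, so older NumPy versions are left untouched.
if "int" not in vars(np):
    np.int = int

print(np.int("42") + 1)
```

Run this in a cell before calling `optimizer_optuna(..., "GP")`. Optuna releases from 3.6 onward also appear to provide a built-in `optuna.samplers.GPSampler`, which would avoid scikit-optimize altogether, but it is not available in the 3.5.0 used here.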