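# Learning Three Hyperparameter Tuning Approaches
Ways to tune a machine learning model's hyperparameters:
- Manual tuning: slow, labor-intensive, and tedious.
- Grid/random search: grid search is exhaustive and skips no combination in the grid; random search samples the space instead, which makes the search more efficient.
- Bayesian search: also called knowledge-based search. It uses the results of earlier trials to choose the next one, which improves search efficiency further. (In high-dimensional parameter spaces, however, Bayesian optimization becomes expensive and its results tend to approach those of random search.)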
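
The difference between exhaustive and sampled search can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation: `toy_score` is a hypothetical stand-in for a cross-validated score, and the two value ranges are made up for the example.

```python
import itertools
import random


def toy_score(depth, trees):
    # Hypothetical score surface with its single maximum at depth=6, trees=30
    return -((depth - 6) ** 2) - ((trees - 30) / 10) ** 2


depths = range(1, 11)            # 10 candidate values
tree_counts = range(10, 60, 10)  # 5 candidate values: 10, 20, 30, 40, 50

# Grid search: evaluate every combination in the grid
grid = list(itertools.product(depths, tree_counts))
best_grid = max(grid, key=lambda p: toy_score(*p))

# Random search: evaluate only a fixed budget of sampled combinations
rng = random.Random(0)
sampled = rng.sample(grid, k=10)
best_random = max(sampled, key=lambda p: toy_score(*p))

print(len(grid), best_grid)
print(len(sampled), best_random)
```

Grid search always finds the best point in the grid but pays for all 50 evaluations; random search pays for 10 and returns the best point it happened to draw, which can never score higher than the grid optimum.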

# Grid/Random Search

In [5]:
import time
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load the data
iris = load_iris()
x = iris.data
y = iris.target

# Choose the model
model = RandomForestClassifier()
# Parameter search space
param_grid = {
    "max_depth": np.arange(1, 20, 1),
    "n_estimators": np.arange(1, 50, 10),
    "max_leaf_nodes": np.arange(2,100,10)
}

{'max_depth': 2, 'max_leaf_nodes': 2, 'n_estimators': 41}
0.9733333333333334
RandomForestClassifier(max_depth=2, max_leaf_nodes=2, n_estimators=41)
Grid search execution time: 78.13324093818665


## Grid Search

In [7]:
# Search the model parameters with grid search
start_time = time.time()
grid_search = GridSearchCV(model, param_grid, cv=5, scoring="f1_micro")
grid_search.fit(x, y)
end_time = time.time()
execution_time = end_time - start_time
print(grid_search.best_params_)
print(grid_search.best_score_)
print(grid_search.best_estimator_)
print("Grid search execution time:", execution_time)

{'max_depth': 7, 'max_leaf_nodes': 32, 'n_estimators': 1}
0.9733333333333334
RandomForestClassifier(max_depth=7, max_leaf_nodes=32, n_estimators=1)
Grid search execution time: 78.97888278961182


## Random Search

In [6]:
# Search the model parameters with random search
start_time = time.time()
rd_search = RandomizedSearchCV(model, param_grid, n_iter=200,
                               cv=5, scoring="f1_micro")
rd_search.fit(x,y)
end_time = time.time()
execution_time = end_time - start_time

print(rd_search.best_params_)
print(rd_search.best_score_)
print(rd_search.best_estimator_)
print("Random search execution time:", execution_time)

{'n_estimators': 41, 'max_leaf_nodes': 92, 'max_depth': 18}
0.9666666666666668
RandomForestClassifier(max_depth=18, max_leaf_nodes=92, n_estimators=41)
Random search execution time: 17.121122121810913


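## Summary
Using the iris dataset and a random forest classifier, we compared grid search and random search on the best cross-validated score (micro-averaged F1) and on run time.
The scores are nearly identical (0.973 for grid search, 0.967 for random search), but the run times differ considerably.
Grid search took 78.98 s against 17.12 s for random search, about 4.6 times as long, because it evaluates every combination in the grid while random search evaluates only the `n_iter=200` sampled ones.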
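
The run-time gap can be checked against the number of model fits each search performs. The sketch below mirrors the `np.arange` ranges of `param_grid` with plain `range` objects (an assumption made only to keep it dependency-free) and ignores the single final refit of the best estimator.

```python
from itertools import product

# Plain-Python mirror of param_grid; range stands in for np.arange
param_grid = {
    "max_depth": range(1, 20, 1),         # 19 values
    "n_estimators": range(1, 50, 10),     # 5 values
    "max_leaf_nodes": range(2, 100, 10),  # 10 values
}
cv = 5        # folds used by both searches
n_iter = 200  # candidates sampled by the random search

# Grid search fits every combination once per fold
grid_candidates = len(list(product(*param_grid.values())))
grid_fits = grid_candidates * cv

# Random search fits only the sampled candidates once per fold
random_fits = n_iter * cv

print(grid_candidates)             # 950
print(grid_fits, random_fits)      # 4750 1000
print(grid_fits / random_fits)     # 4.75
```

The fit-count ratio of 4.75 is close to the measured time ratio of roughly 4.6, which suggests the extra time is spent almost entirely on the additional candidates.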

# Bayesian Optimization

In [10]:
!pip install scikit-optimize

Collecting scikit-optimize
  Downloading scikit_optimize-0.9.0-py2.py3-none-any.whl (100 kB)
Collecting pyaml>=16.9
  Downloading pyaml-23.9.6-py3-none-any.whl (22 kB)
Installing collected packages: pyaml, scikit-optimize
Successfully installed pyaml-23.9.6 scikit-optimize-0.9.0


In [14]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

# Load the data
iris = load_iris()
x = iris.data
y = iris.target

# Split into training and test sets
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2,
                                                    random_state=42)
# Define the evaluation metric
def model_metrics(model, x, y):
    """
    Evaluation metric: micro-averaged F1 score.
    """
    yhat = model.predict(x)
    return f1_score(y, yhat, average="micro")

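# Define the objective function. gp_minimize passes the candidate as a list
# of values in the order of the search dimensions:
# max_depth, n_estimators, max_leaf_nodes
def objective(params):
    max_depth, n_estimators, max_leaf_nodes = params
    fit_params = {
        "max_depth": int(max_depth),
        "n_estimators": int(n_estimators),
        "max_leaf_nodes": int(max_leaf_nodes)
    }

    model = RandomForestClassifier(**fit_params)
    score = cross_val_score(model, train_x, train_y, cv=5,
                            scoring=model_metrics).mean()
    # gp_minimize minimizes, so return the negated score
    return -score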

# Run the parameter search with scikit-optimize (skopt)
from skopt import gp_minimize
from skopt import space

# Define the parameter space
param_space = [
    space.Integer(1, 20, name='max_depth'),
    space.Integer(2, 50, name='n_estimators'),
    space.Integer(2, 100, name='max_leaf_nodes')
]
res = gp_minimize(objective, param_space, n_calls=100, n_random_starts=10)

# Extract the best parameters
best_params = {
    "max_depth": int(res.x[0]),
    "n_estimators":int(res.x[1]),
    "max_leaf_nodes":int(res.x[2])
}

print(best_params)

AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
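
The error above comes from inside scikit-optimize 0.9.0 rather than from the notebook code: that release still refers to `np.int`, an alias that NumPy removed in version 1.24. One possible workaround, sketched below, is to restore the alias before any skopt function is called. This is a stopgap that assumes the installed scikit-optimize needs nothing more than the alias; installing a NumPy version older than 1.24, or a scikit-optimize release that no longer uses `np.int` if one is available for your environment, avoids the patch altogether.

```python
import numpy as np

# Assumption: the installed scikit-optimize still references np.int.
# Restore the removed alias so those references resolve to the builtin int.
if not hasattr(np, "int"):
    np.int = int

print(np.int is int)
```

Run this cell before the one that calls `gp_minimize`.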