# mlr3: Hyperparameter Tuning

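- Model tuning
- Tuning hyperparameters
    - Method 1: train the model with `TuningInstanceSingleCrit` and a `Tuner`
    - Method 2: train the model with `AutoTuner`
    - Ways to specify hyperparameters
    - Parameter dependencies
- Nested resampling
    - Running nested resampling
    - Evaluating the model
- Applying the hyperparameters to the model
    - Hyperband tuning
- Feature selection
    - Filters
    - Computing scores
    - Computing variable importance
    - Wrapper methods
    - Automatic selection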

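## Model tuning
When you are not satisfied with your model's performance, you can try to improve it by tuning its hyperparameters or by switching to a learner better suited to your data. This post covers these operations.

This chapter covers three topics:

### Hyperparameter tuning

Every machine learning model ships with default hyperparameters, but the defaults do not adapt to the data and often do not give the best performance. Tuning by hand rarely finds the best configuration either. mlr3 provides automatic tuning strategies; to use them you need to specify a search space (`search_space`), an optimization algorithm (the tuning method), an evaluation method (the resampling strategy), and a performance measure.

### Feature selection

This is done mainly with the `mlr3filters` and `mlr3fselect` packages.

### Nested resampling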

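## Tuning hyperparameters
Many people jokingly compare hyperparameter tuning to "alchemy", and that is not far off: quite often the tuned result is no better than the defaults. It is like a video game where you pull off a flurry of impressive moves and then check the scoreboard to find you are 0 and 5.

Tuning must be grounded in an understanding of the algorithm and the data; it is not random fiddling.

We use the well-known Pima Indians diabetes dataset for the demonstration. First, create the task: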

In [1]:
library(mlr3verse)
task <- tsk("pima")
print(task)


Loading required package: mlr3



<TaskClassif:pima> (768 x 9): Pima Indian Diabetes
* Target: diabetes
* Properties: twoclass
* Features (8):
  - dbl (8): age, glucose, insulin, mass, pedigree, pregnant, pressure,
    triceps


In [3]:
ls(task)


In [5]:
task$data()


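| diabetes `<fct>` | age `<dbl>` | glucose `<dbl>` | insulin `<dbl>` | mass `<dbl>` | pedigree `<dbl>` | pregnant `<dbl>` | pressure `<dbl>` | triceps `<dbl>` |
|---|---|---|---|---|---|---|---|---|
| pos | 50 | 148 | NA | 33.6 | 0.627 | 6 | 72 | 35 |
| neg | 31 | 85 | NA | 26.6 | 0.351 | 1 | 66 | 29 |
| pos | 32 | 183 | NA | 23.3 | 0.672 | 8 | 64 | NA |
| neg | 21 | 89 | 94 | 28.1 | 0.167 | 1 | 66 | 23 |
| pos | 33 | 137 | 168 | 43.1 | 2.288 | 0 | 40 | 35 |
| neg | 30 | 116 | NA | 25.6 | 0.201 | 5 | 74 | NA |
| pos | 26 | 78 | 88 | 31.0 | 0.248 | 3 | 50 | 32 |
| neg | 29 | 115 | NA | 35.3 | 0.134 | 10 | NA | NA |
| pos | 53 | 197 | 543 | 30.5 | 0.158 | 2 | 70 | 45 |
| pos | 54 | 125 | NA | NA | 0.232 | 8 | 96 | NA |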


## Choose a learner and inspect its hyperparameters

In [6]:
learner <- lrn("classif.rpart")
learner$param_set


<ParamSet(10)>
                id    class lower upper nlevels        default  value
            <char>   <char> <num> <num>   <num>         <list> <list>
 1:             cp ParamDbl     0     1     Inf           0.01       
 2:     keep_model ParamLgl    NA    NA       2          FALSE       
 3:     maxcompete ParamInt     0   Inf     Inf              4       
 4:       maxdepth ParamInt     1    30      30             30       
 5:   maxsurrogate ParamInt     0   Inf     Inf              5       
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[0]>       
 7:       minsplit ParamInt     1   Inf     Inf             20       
 8: surrogatestyle ParamInt     0     1       2              0       
 9:   usesurrogate ParamInt     0     2       3              2       
10:           xval ParamInt     0   Inf     Inf             10      0

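Here we tune the complexity parameter `cp` and the minimum-split parameter `minsplit` (the minimum number of observations a node must contain before a split is attempted), and define the range for each: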

In [7]:
search_space <- ps(
  cp = p_dbl(lower = 0.001, upper = 0.1),
  minsplit = p_int(lower = 1, upper = 10)
)
search_space


<ParamSet(2)>
         id    class lower upper nlevels        default  value
     <char>   <char> <num> <num>   <num>         <list> <list>
1:       cp ParamDbl 0.001   0.1     Inf <NoDefault[0]>       
2: minsplit ParamInt 1.000  10.0      10 <NoDefault[0]>       

## Choose the resampling method and performance measure

In [8]:
hout <- rsmp("holdout", ratio = 0.7)
measure <- msr("classif.ce")


There are two ways to run the tuning.

### Method 1: Train the model with `TuningInstanceSingleCrit` and a `Tuner`

In [9]:
library(mlr3tuning)


Loading required package: paradox



In [10]:
evals20 <- trm("evals", n_evals = 20) # when to stop tuning


In [15]:
class(evals20)


In [17]:
ls(evals20)


In [24]:
evals20$param_set


<ParamSet(2)>
        id    class lower upper nlevels        default  value
    <char>   <char> <int> <num>   <num>         <list> <list>
1: n_evals ParamInt     0   Inf     Inf <NoDefault[0]>     20
2:       k ParamInt     0   Inf     Inf <NoDefault[0]>      0

In [25]:
# bundle everything into a tuning instance
instance <- TuningInstanceSingleCrit$new(
    task = task,
    learner = learner,
    resampling = hout,
    measure = measure,
    terminator = evals20,
    search_space = search_space
)


TuningInstanceSingleCrit is deprecated. Use TuningInstanceBatchSingleCrit instead.
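The deprecation message above points to `TuningInstanceBatchSingleCrit`. The following is a minimal sketch of the same setup built with the `ti()` helper, assuming a recent `mlr3tuning` release (1.0.0 or later) in which `ti()` returns the batch instance. The name `instance_new` is only illustrative and is not used elsewhere in this post.

```r
library(mlr3verse)

task <- tsk("pima")
learner <- lrn("classif.rpart")
search_space <- ps(
  cp = p_dbl(lower = 0.001, upper = 0.1),
  minsplit = p_int(lower = 1, upper = 10)
)

# ti() builds a tuning instance; with a single measure it is expected
# to return a single-criterion batch instance (assumption: mlr3tuning >= 1.0.0)
instance_new <- ti(
  task = task,
  learner = learner,
  resampling = rsmp("holdout", ratio = 0.7),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 20),
  search_space = search_space
)
class(instance_new)
```

The resulting object can be passed to `tuner$optimize()` in the same way as the instance created with `$new()`.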



In [26]:
instance


<TuningInstanceSingleCrit>
* State:  Not optimized
* Objective: <ObjectiveTuningBatch:classif.rpart_on_pima>
* Search Space:
         id    class lower upper nlevels
     <char>   <char> <num> <num>   <num>
1:       cp ParamDbl 0.001   0.1     Inf
2: minsplit ParamInt 1.000  10.0      10
* Terminator: <TerminatorEvals>

In [27]:
ls(instance)


In [29]:
instance$label


mlr3 offers five ways to decide when to stop tuning:

- Terminate after a given time
- Terminate after a given number of iterations
- Terminate after a specific performance has been reached
- Terminate when tuning does not find a better configuration for a given number of iterations
- A combination of the above in ALL or ANY fashion
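The stopping rules in the list map onto terminator keys passed to `trm()`. A minimal sketch follows, assuming the keys `run_time`, `evals`, `perf_reached`, `stagnation` and `combo` provided by `bbotk`; the concrete numbers (60 seconds, level 0.2, 5 iterations) are arbitrary illustration values.

```r
library(mlr3verse)

# one terminator per stopping rule
t_time  <- trm("run_time", secs = 60)        # stop after 60 seconds
t_evals <- trm("evals", n_evals = 20)        # stop after 20 evaluations
t_perf  <- trm("perf_reached", level = 0.2)  # stop once the measure reaches 0.2
t_stag  <- trm("stagnation", iters = 5, threshold = 1e-5)  # stop when the last 5 evaluations bring no improvement

# combination: any = TRUE stops as soon as one terminator fires,
# any = FALSE would require all of them to fire
t_any <- trm("combo", terminators = list(t_time, t_evals), any = TRUE)
```

Any of these objects can be used wherever `evals20` is passed as the `terminator` argument.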

Next, choose the search method. `mlr3tuning` currently supports the following:
- Grid search
- Random search
- Generalized simulated annealing
- Non-linear optimization
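To see which search methods are registered in your installation, you can list the keys of the `mlr_tuners` dictionary. This is a small sketch; the exact set of keys depends on the installed package versions, and some tuners (for example simulated annealing and non-linear optimization) need additional packages before they can run.

```r
library(mlr3verse)

# every key printed here can be passed to tnr()
tuner_keys <- mlr_tuners$keys()
print(tuner_keys)
```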

In [30]:
# use grid search here
tuner <- tnr("grid_search", resolution = 5)


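Now we run the tuning. The grid resolution is 5 and there are 2 hyperparameters, so the full grid has 5 * 5 = 25 configurations. Because the terminator was set to `n_evals = 20`, the search actually stops after 20 configurations have been evaluated.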
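The arithmetic can be checked with a few lines of base R. This is only an illustration of how a resolution-5 grid over the two ranges can be laid out, not the internal code of the grid search tuner; the integer rounding of `minsplit` is an assumption made here for the sketch.

```r
# 5 evenly spaced candidate values per hyperparameter
cp_values <- seq(0.001, 0.1, length.out = 5)
minsplit_values <- round(seq(1, 10, length.out = 5))  # integer parameter

# full grid: every cp value combined with every minsplit value
grid <- expand.grid(cp = cp_values, minsplit = minsplit_values)
n_grid <- nrow(grid)

# budget allowed by the terminator
n_evals <- 20
n_skipped <- n_grid - n_evals

c(grid_size = n_grid, evaluated = n_evals, skipped = n_skipped)
```

Five of the 25 grid points are therefore never evaluated, so the reported optimum is the best of the 20 that were.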

In [31]:
# lgr::get_logger("mlr3")$set_threshold("warn")
# lgr::get_logger("bbotk")$set_threshold("warn")   # reduce console output

tuner$optimize(instance)


INFO  [10:22:13.338] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerBatchGridSearch>' and '<TerminatorEvals> [n_evals=20, k=0]'
INFO  [10:22:13.366] [bbotk] Evaluating 1 configuration(s)
INFO  [10:22:13.377] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:22:13.402] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:22:13.428] [mlr3] Finished benchmark
INFO  [10:22:13.451] [bbotk] Result of batch 1:
INFO  [10:22:13.453] [bbotk]  0.001        3  0.2913043        0      0            0.016
INFO  [10:22:13.453] [bbotk]                                 uhash
INFO  [10:22:13.453] [bbotk]  9340f7cd-9d85-4513-973f-68c1a84ccd34
INFO  [10:22:13.455] [bbotk] Evaluating 1 configuration(s)
INFO  [10:22:13.460] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:22:13.464] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:22:13.475] [mlr3] Finished benchmark
INFO  [10:22:13.496] [bbotk] Result of batch 2:
INFO

| cp `<dbl>` | minsplit `<int>` | learner_param_vals `<list>` | x_domain `<list>` | classif.ce `<dbl>` |
|---|---|---|---|---|
| 0.02575 | 1 | 0.00000, 0.02575, 1.00000 | 0.02575, 1.00000 | 0.2304348 |


Inspect the tuned hyperparameters:

In [35]:
instance$result


| cp `<dbl>` | minsplit `<int>` | learner_param_vals `<list>` | x_domain `<list>` | classif.ce `<dbl>` |
|---|---|---|---|---|
| 0.02575 | 1 | 0.00000, 0.02575, 1.00000 | 0.02575, 1.00000 | 0.2304348 |


In [34]:
instance$result$cp


In [33]:
instance$result_learner_param_vals


Check the model performance:

In [36]:
instance$result_y


Inspect the result of every evaluation; there are only 20:

In [37]:
instance$archive


<ArchiveBatchTuning> with 20 evaluations
    <num>    <int>      <num>    <int>    <int>  <int>
 1: 0.001        3       0.29        1        0      0
 2: 0.051        3       0.28        2        0      0
 3: 0.075        8       0.28        3        0      0
 4: 0.026        1       0.23        4        0      0
 5: 0.026        6       0.23        5        0      0
 6: 0.100        3       0.28        6        0      0
 7: 0.075       10       0.28        7        0      0
 8: 0.026        8       0.23        8        0      0
 9: 0.051       10       0.28        9        0      0
10: 0.001        1       0.29       10        0      0
11: 0.001        6       0.29       11        0      0
12: 0.100       10       0.28       12        0      0
13: 0.026       10       0.23       13        0      0
14: 0.100        6       0.28       14        0      0
15: 0.075        6       0.28       15        0      0
16: 0.051        6       0.28       16        0      0
17: 0.100        8      

Next, set the tuned hyperparameters on the learner and retrain it on the data:

In [38]:
learner$param_set$values <- instance$result_learner_param_vals
learner$train(task)


The trained model can now be used for prediction with `learner$predict()`:

In [40]:
learner$predict(task)


<PredictionClassif> for 768 observations:
    row_ids truth response
          1   pos      pos
          2   neg      neg
          3   pos      neg
---                       
        766   neg      neg
        767   pos      neg
        768   neg      neg
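The prediction above is made on the same rows the model was trained on, so any score computed from it will be optimistic. Below is a minimal sketch of scoring on held-out rows instead, assuming `partition()` from `mlr3` for the split. The hyperparameter values are copied from the `instance$result` output shown earlier; the 70/30 ratio and the seed are arbitrary choices for illustration.

```r
library(mlr3verse)

set.seed(1)
task <- tsk("pima")

# tuned values reported by instance$result above
learner <- lrn("classif.rpart", cp = 0.02575, minsplit = 1)

# split the row ids into a training set and a test set
split <- partition(task, ratio = 0.7)

learner$train(task, row_ids = split$train)
prediction <- learner$predict(task, row_ids = split$test)

# classification error on rows the model has not seen
test_error <- prediction$score(msr("classif.ce"))
test_error
```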

In [41]:
ls(learner)


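The steps above are somewhat verbose and not as concise or easy to follow as `tidymodels`. When I was first learning I could never remember them; a later release finally added a shorthand: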

In [44]:
instance <- tune(
  tuner = tnr("grid_search"),
  task = task,
  learner = learner,
  resampling = hout,
  measures = measure,
  search_space = search_space,
  term_evals = 25
)


INFO  [10:32:44.363] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerBatchGridSearch>' and '<TerminatorEvals> [n_evals=25, k=0]'
INFO  [10:32:44.371] [bbotk] Evaluating 1 configuration(s)
INFO  [10:32:44.377] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:32:44.381] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:32:44.392] [mlr3] Finished benchmark
INFO  [10:32:44.411] [bbotk] Result of batch 1:
INFO  [10:32:44.414] [bbotk]  0.001       10  0.2391304        0      0            0.007
INFO  [10:32:44.414] [bbotk]                                 uhash
INFO  [10:32:44.414] [bbotk]  63635c65-3dc5-4f9a-bb45-0453e27e4682
INFO  [10:32:44.416] [bbotk] Evaluating 1 configuration(s)
INFO  [10:32:44.421] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:32:44.425] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:32:44.436] [mlr3] Finished benchmark
INFO  [10:32:44.457] [bbotk] Result of batch 2:
INFO

In [45]:
instance$result_learner_param_vals


In [46]:
instance$result_y


In [47]:
learner$param_set$values <- instance$result_learner_param_vals
learner$train(task)


mlr3 also supports tuning against several performance measures at once:

In [48]:
measures <- msrs(c(
    "classif.ce",
    "time_train"
)) # several performance measures

evals20 <- trm("evals", n_evals = 20)

instance <- TuningInstanceMultiCrit$new(
    task = task,
    learner = learner,
    resampling = hout,
    measures = measures,
    search_space = search_space,
    terminator = evals20
)

tuner$optimize(instance)


TuningInstanceMultiCrit is deprecated. Use TuningInstanceBatchMultiCrit instead.



INFO  [10:33:54.775] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerBatchGridSearch>' and '<TerminatorEvals> [n_evals=20, k=0]'
INFO  [10:33:54.786] [bbotk] Evaluating 1 configuration(s)
INFO  [10:33:54.791] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:33:54.795] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:33:54.809] [mlr3] Finished benchmark
INFO  [10:33:54.840] [bbotk] Result of batch 1:
INFO  [10:33:54.843] [bbotk]  0.02575        8   0.273913      0.005        0      0            0.009
INFO  [10:33:54.843] [bbotk]                                 uhash
INFO  [10:33:54.843] [bbotk]  80e93081-e308-4157-8b83-00a8437ddefb
INFO  [10:33:54.846] [bbotk] Evaluating 1 configuration(s)
INFO  [10:33:54.852] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:33:54.857] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:33:54.871] [mlr3] Finished benchmark
INFO  [10:33:54.904] [bbotk] Result of 

| cp `<dbl>` | minsplit `<int>` | learner_param_vals `<list>` | x_domain `<list>` | classif.ce `<dbl>` | time_train `<dbl>` |
|---|---|---|---|---|---|
| 0.1 | 6 | 0.1, 6.0, 0.0 | 0.1, 6.0 | 0.273913 | 0.003 |
| 0.001 | 8 | 0.001, 8.000, 0.000 | 0.001, 8.000 | 0.2652174 | 0.004 |


Inspect the results:

In [50]:
instance$result_learner_param_vals


In [51]:
instance$result_y

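### Method 2: Train the model with `AutoTuner`

This approach combines tuning and applying the tuned hyperparameters to the model in a single step, but you still need to set up all the required pieces beforehand.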

In [52]:
task <- tsk("pima") # create the task

learner <- lrn("classif.rpart") # choose the learner

search_space <- ps(
    cp = p_dbl(0.001, 0.1),
    minsplit = p_int(1, 10)
) # define the search space

terminator <- trm("evals", n_evals = 10) # stopping criterion

tuner <- tnr("random_search") # search method

resampling <- rsmp("holdout") # resampling method

measure <- msr("classif.acc") # performance measure

# build the self-tuning learner
at <- AutoTuner$new(
    learner = learner,
    resampling = resampling,
    search_space = search_space,
    measure = measure,
    tuner = tuner,
    terminator = terminator
)


Automatically select the best hyperparameters and fit the model to the data:

In [53]:
at$train(task)


INFO  [10:37:18.978] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerBatchRandomSearch>' and '<TerminatorEvals> [n_evals=10, k=0]'
INFO  [10:37:18.996] [bbotk] Evaluating 1 configuration(s)
INFO  [10:37:19.002] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:37:19.006] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:37:19.017] [mlr3] Finished benchmark
INFO  [10:37:19.038] [bbotk] Result of batch 1:
INFO  [10:37:19.041] [bbotk]  0.0839448       10   0.6835938        0      0            0.006
INFO  [10:37:19.041] [bbotk]                                 uhash
INFO  [10:37:19.041] [bbotk]  26d2d7ec-4960-460e-90db-18cf5b4ac7b5
INFO  [10:37:19.049] [bbotk] Evaluating 1 configuration(s)
INFO  [10:37:19.056] [mlr3] Running benchmark with 1 resampling iterations
INFO  [10:37:19.061] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:37:19.074] [mlr3] Finished benchmark
INFO  [10:37:19.095] [bbotk] Result of batch 

In [54]:
at$predict(task)


<PredictionClassif> for 768 observations:
    row_ids truth response
          1   pos      pos
          2   neg      neg
          3   pos      neg
---                       
        766   neg      neg
        767   pos      neg
        768   neg      neg

This method also has a shorthand:

In [55]:
auto_learner <- auto_tuner(
  learner = learner,
  resampling = resampling,
  measure = measure,
  search_space = search_space,
  tuner = tnr("random_search", batch_size = 2),
  term_evals = 10
)

auto_learner$train(task)


INFO  [10:38:23.192] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerBatchRandomSearch>' and '<TerminatorEvals> [n_evals=10, k=0]'
INFO  [10:38:23.211] [bbotk] Evaluating 2 configuration(s)
INFO  [10:38:23.218] [mlr3] Running benchmark with 2 resampling iterations
INFO  [10:38:23.224] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:38:23.240] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
INFO  [10:38:23.253] [mlr3] Finished benchmark
INFO  [10:38:23.290] [bbotk] Result of batch 1:
INFO  [10:38:23.292] [bbotk]  0.07240712        8   0.7421875        0      0            0.008
INFO  [10:38:23.292] [bbotk]  0.06820181        4   0.7421875        0      0            0.008
INFO  [10:38:23.292] [bbotk]                                 uhash
INFO  [10:38:23.292] [bbotk]  cb871e89-1e70-4c21-ac18-5e72a59bcd10
INFO  [10:38:23.292] [bbotk]  7fabe98d-ac4d-4e94-80c6-f9df35799f19
INFO  [10:38:23.299] [bbotk] Evaluating 2 configuration(s)
INFO

In [56]:
auto_learner$predict(task)


<PredictionClassif> for 768 observations:
    row_ids truth response
          1   pos      pos
          2   neg      neg
          3   pos      neg
---                       
        766   neg      neg
        767   pos      neg
        768   neg      neg

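## Ways to specify hyperparameters

Defining the search space separately each time can feel tedious, so mlr3 also lets you declare tuning ranges directly on the learner when you create it.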



In [57]:
# set the tuning range on the learner itself
learner <- lrn("classif.svm")
learner$param_set$values$kernel <- "polynomial"
learner$param_set$values$degree <- to_tune(lower = 1, upper = 3)

print(learner$param_set$search_space())


“Package 'e1071' required but not installed for Learner 'classif.svm'”


<ParamSet(1)>
       id    class lower upper nlevels        default  value
   <char>   <char> <num> <num>   <num>         <list> <list>
1: degree ParamInt     1     3       3 <NoDefault[0]>       


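This has its own drawback, though: you need to know the algorithm well and remember every hyperparameter along with its spelling in mlr3, which is clearly not easy. So I still recommend the first approach of defining the search space separately; if you forget a name you can always look up the learner's hyperparameters.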
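If you do want to use `to_tune()`, the spelling problem can be softened by listing the parameter names first. This is a minimal sketch using the `classif.rpart` learner from earlier in this post (chosen here because it needs no extra package); the ranges simply repeat the ones used in the earlier search space.

```r
library(mlr3verse)

learner <- lrn("classif.rpart")

# look up the exact parameter names the learner accepts
learner$param_set$ids()

# mark two of them for tuning directly on the learner
learner$param_set$values$cp <- to_tune(lower = 0.001, upper = 0.1)
learner$param_set$values$minsplit <- to_tune(lower = 1, upper = 10)

# the search space is derived from the to_tune() tokens
tune_space <- learner$param_set$search_space()
tune_space
```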

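## Parameter dependencies

Some hyperparameters only take effect under certain conditions. In a support vector machine (SVM), for example, `degree` only matters when `kernel` is `polynomial`. mlr3 lets you declare such dependencies.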

In [63]:
# library(data.table)
search_space <- ps(
    cost = p_dbl(-1, 1,
        trafo = function(x) 10^x
    ), # a transformation can be applied
    kernel = p_fct(c(
        "polynomial",
        "radial"
    )),
    degree = p_int(1, 3,
        depends = kernel == "polynomial"
    ) # declare the parameter dependency
)


In [64]:
generate_design_grid(search_space, 3)


<Design> with 12 rows:
     cost     kernel degree
    <num>     <char>  <int>
 1:    -1 polynomial      1
 2:    -1 polynomial      2
 3:    -1 polynomial      3
 4:    -1     radial     NA
 5:     0 polynomial      1
 6:     0 polynomial      2
 7:     0 polynomial      3
 8:     0     radial     NA
 9:     1 polynomial      1
10:     1 polynomial      2
11:     1 polynomial      3
12:     1     radial     NA

In [65]:
data.table::rbindlist(generate_design_grid(search_space, 3)$transpose(),
    fill = TRUE
)


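| cost `<dbl>` | kernel `<chr>` | degree `<int>` |
|---|---|---|
| 0.1 | polynomial | 1 |
| 0.1 | polynomial | 2 |
| 0.1 | polynomial | 3 |
| 0.1 | radial | NA |
| 1.0 | polynomial | 1 |
| 1.0 | polynomial | 2 |
| 1.0 | polynomial | 3 |
| 1.0 | radial | NA |
| 10.0 | polynomial | 1 |
| 10.0 | polynomial | 2 |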
10.0,polynomial,2.0


With these settings in place, the later steps handle the dependency automatically and do not raise errors.
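The `cost` column differs between the two outputs above because of `trafo`: the design grid is laid out on the search scale (-1, 0, 1), and the transformation is applied only when the values are handed to the learner. The base R sketch below reproduces that step by hand; the names `cost_raw` and `cost_used` are illustrative only.

```r
# grid points on the search scale for cost = p_dbl(-1, 1) at resolution 3
cost_raw <- seq(-1, 1, length.out = 3)

# same transformation as the trafo in the search space
trafo <- function(x) 10^x
cost_used <- trafo(cost_raw)

data.frame(cost_raw = cost_raw, cost_used = cost_used)
```

Searching on the exponent and transforming afterwards spreads the candidates evenly across orders of magnitude (0.1, 1, 10) rather than evenly across the raw range.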