# Hyperparameter Optimization with Bayesian Optimization

<div class="alert alert-block alert-success">  
 <b>Version:</b> v0.1 <b>Date:</b> 2020-06-09
  
This notebook records an implementation strategy for Bayesian hyperparameter search.
    
</div>

<div class="alert alert-block alert-info">
<b>💡:</b> 

- **Dependencies**: Fastai v2 (0.0.18), BayesianOptimization
- **Dataset**: [ADULT_SAMPLE](http://files.fast.ai/data/examples/adult_sample.tgz)
</div>

## Data Preparation

In [1]:
from fastai2.tabular.all import *

In [2]:
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

In [4]:
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
y_names = 'salary'
y_block = CategoryBlock()
splits = RandomSplitter()(range_of(df))

In [5]:
to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,
                   y_names=y_names, y_block=y_block, splits=splits)

In [6]:
dls = to.dataloaders(bs=512)

## Configuring the Optimizer and Search Strategy

In [None]:
!pip install bayesian-optimization -q

In [3]:
from bayes_opt import BayesianOptimization

In [7]:
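def fit_with(lr:float, wd:float, dp:float):
  # Create a Learner. Note that `dp` is used as the embedding dropout (`embed_p`)
  # and `wd` as the dropout of the linear layers (`ps`).
  config = tabular_config(embed_p=dp, ps=wd)
  learn = tabular_learner(dls, layers=[200,100], metrics=accuracy, config=config)
  
  # Train for 3 epochs without the progress bar
  with learn.no_bar():
    learn.fit_one_cycle(3, lr)
    
  # validate() returns [loss, accuracy]; return the accuracy as the target to maximize
  acc = float(learn.validate()[1])
  
  return acc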

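Let's extend this function so that it also tunes the network architecture (the number of hidden layers and the size of each one) alongside the learning rate and the two dropout settings (`dp` for the embeddings, `wd` for the linear layers):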

In [8]:
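def fit_with(lr:float, wd:float, dp:float, n_layers:float, layer_1:float, layer_2:float, layer_3:float):

  # Log the raw values proposed by the optimizer
  print(lr, wd, dp)

  # The optimizer only proposes floats, so truncate them to integer layer sizes
  if int(n_layers) == 2:
    layers = [int(layer_1), int(layer_2)]
  elif int(n_layers) == 3:
    layers = [int(layer_1), int(layer_2), int(layer_3)]
  else:
    layers = [int(layer_1)]
  config = tabular_config(embed_p=float(dp),
                          ps=float(wd))
  learn = tabular_learner(dls, layers=layers, metrics=accuracy, config=config)

  # Enter both context managers; `a and b` would only enter the second one
  with learn.no_bar(), learn.no_logging():
    learn.fit(5, lr=float(lr))

  # validate() returns [loss, accuracy]; the accuracy is the target to maximize
  acc = float(learn.validate()[1])

  return acc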
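
Because the optimizer only proposes float values, the integer-valued choices have to be recovered inside the objective function. As a minimal sketch, the layer-selection branch above can be isolated into a small helper that is easy to check without training a model. The name `build_layers` is hypothetical and used only for illustration; it is not part of fastai or bayes_opt.

```python
def build_layers(n_layers: float, layer_1: float, layer_2: float, layer_3: float) -> list:
    """Map float-valued proposals to a list of integer hidden-layer sizes."""
    sizes = [int(layer_1), int(layer_2), int(layer_3)]
    depth = int(n_layers)
    # Mirror the else-branch of fit_with: anything other than 2 or 3 means one layer
    if depth not in (2, 3):
        depth = 1
    return sizes[:depth]


print(build_layers(2.71, 197.2, 126.1, 264.7))  # [197, 126]
print(build_layers(1.19, 158.0, 100.1, 744.2))  # [158]
```

With such a helper, the body of `fit_with` could call `build_layers(n_layers, layer_1, layer_2, layer_3)` in place of the `if`/`elif`/`else` chain.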

Let's try it out. First we declare the search bounds for each hyperparameter:

In [9]:
hps = {'lr': (1e-05, 1e-01),
       'wd': (4e-4, 0.4),
       'dp': (0.01, 0.5),
       'n_layers': (1, 3),
       'layer_1': (50, 200),
       'layer_2': (100, 1000),
       'layer_3': (200, 2000)}
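
One consequence of combining these bounds with `int()` truncation is worth checking: a value drawn from `(1, 3)` only truncates to 3 when it is exactly 3.0. The sketch below uses the standard library's `random.uniform` as a stand-in for how the optimizer proposes points, which is an assumption made purely for illustration. The wider upper bound of `3.999` is likewise a hypothetical alternative, not something the run recorded in this notebook used.

```python
import random
from collections import Counter

random.seed(0)

# Bounds as declared in `hps`: the value 3 is reached only at exactly 3.0
counts = Counter(int(random.uniform(1, 3)) for _ in range(10_000))

# Hypothetical wider bound so that truncation can yield 1, 2, or 3
wide_counts = Counter(int(random.uniform(1, 3.999)) for _ in range(10_000))

print(sorted(counts.items()))
print(sorted(wide_counts.items()))
```

If three-layer networks should be explored as often as the others, widening the bound like this (or rounding instead of truncating) is one option to consider.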

And now we build the optimizer:

In [10]:
optim = BayesianOptimization(
    f = fit_with, # our fit function
    pbounds = hps, # our hyper parameters to tune
    verbose = 2, # 2 prints every step, 1 prints only when a new maximum is observed, 0 is silent
    random_state=1
)

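And now we can search! `maximize` first evaluates `init_points` random configurations (5 by default) and then runs `n_iter=10` guided steps, which is why the log below has 15 rows.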

In [11]:
%time optim.maximize(n_iter=10)

|   iter    |  target   |    dp     |  layer_1  |  layer_2  |  layer_3  |    lr     | n_layers  |    wd     |
-------------------------------------------------------------------------------------------------------------
0.014684121522803134 0.07482958046651729 0.21434078230426126
|  1        |  0.8415   |  0.2143   |  158.0    |  100.1    |  744.2    |  0.01468  |  1.185    |  0.07483  |
0.06852509784467198 0.3512957275818218 0.1793247562510934
|  2        |  0.8391   |  0.1793   |  109.5    |  584.9    |  954.6    |  0.06853  |  1.409    |  0.3513   |
0.014047289990137426 0.32037752964274446 0.02341992066698382
|  3        |  0.8383   |  0.02342  |  150.6    |  475.6    |  1.206e+0 |  0.01405  |  1.396    |  0.3204   |
0.0894617202837497 0.016006291379859792 0.4844481721025048
|  4        |  0.8395   |  0.4844   |  97.01    |  723.1    |  1.778e+0 |  0.08946  |  1.17     |  0.01601  |
0.0957893741197487 0.27687409473460917 0.09321690558663875
|  5        |  0.8426   |  0.09322  |  181.7    |  188.5    |  958.0    |  0.09579  |  2.066    |  0.2769   |
0.06191147756969865 0.37690994180463505 0.13594244704069394
|  6        |  0.8345   |  0.1359   |  58.72    |  121.5    |  1.958e+0 |  0.06191  |  1.277    |  0.3769   |
0.03866826261417955 0.16855031040289803 0.4601228621079202
|  7        |  0.8395   |  0.4601   |  199.1    |  977.2    |  223.7    |  0.03867  |  1.036    |  0.1686   |
0.0243344917906167 0.3026399240336963 0.10573854076050722
|  8        |  0.8398   |  0.1057   |  180.7    |  272.5    |  977.7    |  0.02433  |  1.126    |  0.3026   |
0.01926217459495932 0.09147910397966807 0.12524115091042773
|  9        |  0.8391   |  0.1252   |  57.32    |  188.9    |  206.0    |  0.01926  |  2.35     |  0.09148  |
0.018285009041055747 0.0030016178873130826 0.023091032083187666
|  10       |  0.8417   |  0.02309  |  182.3    |  997.6    |  1.941e+0 |  0.01829  |  2.989    |  0.003002 |
0.016475919398227245 0.02899527328407459 0.35565211893540777
|  11       |  0.8437   |  0.3557   |  197.2    |  126.1    |  264.7    |  0.01648  |  2.714    |  0.029    |
0.0758754095037137 0.26350352166402036 0.10013698396219092
|  12       |  0.8371   |  0.1001   |  195.5    |  166.4    |  236.0    |  0.07588  |  1.276    |  0.2635   |
0.08706685388671734 0.2815475542319707 0.0263005243239511
|  13       |  0.8401   |  0.0263   |  199.1    |  990.5    |  1.991e+0 |  0.08707  |  2.941    |  0.2815   |
0.05628262681087503 0.1085257299138718 0.26772478114338355
|  14       |  0.8404   |  0.2677   |  152.2    |  105.5    |  755.5    |  0.05628  |  2.52     |  0.1085   |
0.0604172099015805 0.11754051701232375 0.4265112940023884
|  15       |  0.8391   |  0.4265   |  198.0    |  995.7    |  1.118e+0 |  0.06042  |  1.485    |  0.1175   |
CPU times: user 1min 2s, sys: 37.4 s, total: 1min 39s
Wall time: 51.7 s


We can grab the best results:

In [12]:
print(optim.max)

{'target': 0.8436732292175293, 'params': {'dp': 0.35565211893540777, 'layer_1': 197.225429657967, 'layer_2': 126.14266384474438, 'layer_3': 264.66132182791586, 'lr': 0.016475919398227245, 'n_layers': 2.7135113787445775, 'wd': 0.02899527328407459}}


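And with a few conversions (truncating the float values with `int`, as `fit_with` does) we see:

* The best number of layers was 2
* The first layer has a size of 197
* The second layer has a size of 126

The remaining hyperparameters are `lr` ≈ 0.0165, `wd` ≈ 0.029, and `dp` ≈ 0.356.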
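
The conversion from the raw optimizer output to a concrete configuration can be sketched as follows. The dictionary is copied from the printed `optim.max` above, so the snippet runs without retraining. The variable names `best`, `depth`, and `layers` are illustrative choices.

```python
# Values copied from the printed `optim.max` result
best = {'target': 0.8436732292175293,
        'params': {'dp': 0.35565211893540777,
                   'layer_1': 197.225429657967,
                   'layer_2': 126.14266384474438,
                   'layer_3': 264.66132182791586,
                   'lr': 0.016475919398227245,
                   'n_layers': 2.7135113787445775,
                   'wd': 0.02899527328407459}}

params = best['params']

# Apply the same truncation that fit_with applies before building the model
depth = int(params['n_layers'])
if depth not in (2, 3):
    depth = 1
layers = [int(params[f'layer_{i}']) for i in range(1, depth + 1)]

print(depth)   # 2
print(layers)  # [197, 126]
print(round(params['lr'], 4), round(params['wd'], 3), round(params['dp'], 3))
```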