# Hyperparameter Tuning with the HParams Dashboard

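When building machine learning models, you need to choose various hyperparameters, such as the dropout rate in a layer or the learning rate. These decisions affect model metrics, such as accuracy. An important step in the machine learning workflow is therefore to identify the best hyperparameters for your problem, which often involves experimentation. This process is known as "hyperparameter optimization" or "hyperparameter tuning".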

The HParams dashboard in TensorBoard provides several tools to help identify the best experiment or the most promising sets of hyperparameters.

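This tutorial focuses on the following steps:

- Experiment setup and the HParams summary
- Adapt TensorFlow runs to log hyperparameters and metrics
- Start runs and log them all under one parent directory
- Visualize the results in TensorBoard's HParams dashboard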

Note: The HParams summary APIs and dashboard UI are in a preview stage and will change over time.

Start by installing TF 2.0 and loading the TensorBoard notebook extension:

In [1]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [2]:
# Clear any logs from previous runs
!rm -rf ./logs/ 

Import TensorFlow and the TensorBoard HParams plugin:

In [3]:
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

Download the Fashion-MNIST dataset and scale it:

In [5]:
fashion_mnist = tf.keras.datasets.fashion_mnist

(x_train, y_train),(x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# 1. Experiment setup and the HParams experiment summary

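Experiment with three hyperparameters in the model:

- Number of units in the first Dense layer
- Dropout rate in the dropout layer
- Optimizer

List the values to try, and log the experiment configuration to TensorBoard. This step is optional: you can provide domain information to enable more precise filtering of hyperparameters in the UI, and you can specify which metrics should be displayed.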

In [7]:
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))

METRIC_ACCURACY = 'accuracy'

with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
    )

If you choose to skip this step, you can use a string literal wherever you would otherwise use an `HParam` value: for example, `hparams['dropout']` instead of `hparams[HP_DROPOUT]`.

# 2. Adapt TensorFlow runs to log hyperparameters and metrics

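The model will be quite simple: two Dense layers with a Dropout layer between them. The training code will look familiar, although the hyperparameters are no longer hardcoded. Instead, they are provided in an `hparams` dictionary and used throughout the training function: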

In [8]:
def train_test_model(hparams):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ])
    model.compile(
        optimizer=hparams[HP_OPTIMIZER],
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )

    model.fit(x_train, y_train, epochs=1)  # Run with 1 epoch to speed things up for demo purposes
    _, accuracy = model.evaluate(x_test, y_test)
    return accuracy

For each run, log an hparams summary with the hyperparameters and the final accuracy:

In [9]:
def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

When training Keras models, you can use callbacks instead of writing these summaries directly:

```
model.fit(
    ...,
    callbacks=[
        tf.keras.callbacks.TensorBoard(logdir),  # log metrics
        hp.KerasCallback(logdir, hparams),  # log hparams
    ],
)
```


# 3. Start runs and log them all under one parent directory

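You can now try multiple experiments, training each one with a different set of hyperparameters.

For simplicity, use a grid search: try all combinations of the discrete parameters and only the lower and upper bounds of the real-valued parameter. For more complex scenarios, it might be more effective to choose each hyperparameter value randomly (this is called a random search). There are also more advanced methods that can be used.

Run a few experiments, which will take a few minutes: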
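
As a rough illustration of the difference between the two strategies, the sketch below enumerates a grid and samples a random search over the same three hyperparameters. It is plain Python and independent of the HParams API; the names `NUM_UNITS`, `DROPOUT_RANGE`, `OPTIMIZERS`, `grid_search_space`, and `random_search_space` are hypothetical and are not part of the tutorial or of TensorBoard.

```python
import itertools
import random

# Hypothetical search space mirroring the tutorial's three hyperparameters.
NUM_UNITS = [16, 32]
DROPOUT_RANGE = (0.1, 0.2)  # lower and upper bound of the real interval
OPTIMIZERS = ["adam", "sgd"]


def grid_search_space():
    """Yield every combination, using only the two dropout bounds."""
    for num_units, dropout, optimizer in itertools.product(
        NUM_UNITS, DROPOUT_RANGE, OPTIMIZERS
    ):
        yield {"num_units": num_units, "dropout": dropout, "optimizer": optimizer}


def random_search_space(num_trials, seed=0):
    """Yield num_trials independently sampled configurations."""
    rng = random.Random(seed)  # seeded so the sampled trials are reproducible
    low, high = DROPOUT_RANGE
    for _ in range(num_trials):
        yield {
            "num_units": rng.choice(NUM_UNITS),
            "dropout": rng.uniform(low, high),  # any value in the interval
            "optimizer": rng.choice(OPTIMIZERS),
        }


grid_trials = list(grid_search_space())
random_trials = list(random_search_space(num_trials=5))
print(len(grid_trials))  # 2 * 2 * 2 = 8 combinations
```

The grid only ever tries the two dropout bounds, while the random search can land anywhere inside the interval. Each dictionary here uses string keys for brevity; to pass one to a training function, its keys would need to match the keys that function reads.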

In [10]:
session_num = 0

for num_units in HP_NUM_UNITS.domain.values:
    # Only the two bounds of the real-valued dropout interval are tried
    for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for optimizer in HP_OPTIMIZER.domain.values:
            hparams = {
                HP_NUM_UNITS: num_units,
                HP_DROPOUT: dropout_rate,
                HP_OPTIMIZER: optimizer,
            }
            run_name = "run-%d" % session_num
            print('--- Starting trial: %s' % run_name)
            print({h.name: hparams[h] for h in hparams})
            run('logs/hparam_tuning/' + run_name, hparams)
            session_num += 1

--- Starting trial: run-0
{'num_units': 16, 'dropout': 0.1, 'optimizer': 'adam'}
--- Starting trial: run-1
{'num_units': 16, 'dropout': 0.1, 'optimizer': 'sgd'}
--- Starting trial: run-2
{'num_units': 16, 'dropout': 0.2, 'optimizer': 'adam'}
--- Starting trial: run-3
{'num_units': 16, 'dropout': 0.2, 'optimizer': 'sgd'}
--- Starting trial: run-4
{'num_units': 32, 'dropout': 0.1, 'optimizer': 'adam'}
--- Starting trial: run-5
{'num_units': 32, 'dropout': 0.1, 'optimizer': 'sgd'}
--- Starting trial: run-6
{'num_units': 32, 'dropout': 0.2, 'optimizer': 'adam'}
--- Starting trial: run-7
{'num_units': 32, 'dropout': 0.2, 'optimizer': 'sgd'}

# 4. Visualize the results in TensorBoard's HParams plugin

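The HParams dashboard can now be opened. Start TensorBoard and click "HParams" at the top.

The left pane of the dashboard provides filtering capabilities that are active across all the views in the HParams dashboard:

- Filter which hyperparameters/metrics are shown in the dashboard
- Filter which hyperparameter/metric values are shown in the dashboard
- Filter on run status (running, success, ...)
- Sort by hyperparameter/metric in the table view
- Set the number of session groups to show (useful for performance when there are many experiments)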

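The HParams dashboard has three different views, each with useful information:

- The Table View lists the runs, their hyperparameters, and their metrics.
- The Parallel Coordinates View shows each run as a line going through an axis for each hyperparameter and metric. Click and drag the mouse on any axis to mark a region, which highlights only the runs that pass through it. This can be useful for identifying which groups of hyperparameters are most important. The axes themselves can be reordered by dragging them.
- The Scatter Plot View shows plots comparing each hyperparameter/metric with each metric. This can help identify correlations. Click and drag to select a region in a specific plot and highlight those sessions across the other plots.

You can click a table row, a parallel coordinates line, or a scatter plot marker to see a plot of the metrics as a function of training steps for that session (although in this tutorial only one step is used for each run).

To further explore the capabilities of the HParams dashboard, download a set of pregenerated logs with more experiments: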

In [11]:
%%bash
wget -q 'https://storage.googleapis.com/download.tensorflow.org/tensorboard/hparams_demo_logs.zip'
unzip -q hparams_demo_logs.zip -d logs/hparam_demo

View these logs in TensorBoard:

In [12]:
%tensorboard --logdir logs/hparam_demo