# Get started with TensorBoard

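In machine learning, to improve something you often need to be able to measure it. TensorBoard is a tool that provides the measurements and visualizations needed during the machine learning workflow. It enables tracking experiment metrics such as loss and accuracy, visualizing the model graph, projecting embeddings to a lower-dimensional space, and much more.

This quickstart shows how to get started with TensorBoard quickly. The remaining guides on this website provide more details on specific capabilities, many of which are not covered here.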

In [1]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [2]:
import tensorflow as tf
import datetime

In [3]:
# Clear any logs from previous runs
!rm -rf ./logs/ 

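Using the MNIST dataset as the example, normalize the data and write a function that creates a simple Keras model for classifying the images into 10 classes.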

In [4]:
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def create_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

## Using TensorBoard with Keras Model.fit()

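When training with Keras's Model.fit(), adding the tf.keras.callbacks.TensorBoard callback ensures that logs are created and stored. Additionally, enable histogram computation every epoch with `histogram_freq=1` (this is off by default).

Place the logs in a timestamped subdirectory to allow easy selection of different training runs.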
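
As a minimal sketch of why this timestamp format is convenient: names produced by `strftime("%Y%m%d-%H%M%S")` sort lexicographically in chronological order, so the most recent run is simply the largest name. The helpers `make_run_dir` and `latest_run` are hypothetical names used only for illustration; they are not part of TensorBoard or Keras.

```python
import datetime
from pathlib import PurePosixPath

def make_run_dir(root, now=None):
    """Return root/<YYYYmmdd-HHMMSS> for the given (or current) time."""
    now = now or datetime.datetime.now()
    return PurePosixPath(root) / now.strftime("%Y%m%d-%H%M%S")

def latest_run(run_names):
    """Pick the newest run; works because the names sort chronologically."""
    return max(run_names)

run_dir = make_run_dir("logs/fit", datetime.datetime(2022, 3, 21, 9, 34, 51))
print(run_dir)  # logs/fit/20220321-093451
print(latest_run(["20220321-093451", "20220320-235959"]))  # 20220321-093451
```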

In [5]:
model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(x=x_train, 
          y=y_train, 
          epochs=5, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback])

2022-03-21 09:34:51.589092: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-03-21 09:34:51.589423: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-03-21 09:34:51.693392: I tensorflow/core/profiler/lib/profiler_session.cc:131] Profiler session initializing.
2022-03-21 09:34:51.693400: I tensorflow/core/profiler/lib/profiler_session.cc:146] Profiler session started.
2022-03-21 09:34:51.693460: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session tear down.


Metal device set to: Apple M1


2022-03-21 09:34:52.298402: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2022-03-21 09:34:52.299006: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-03-21 09:34:52.409697: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


Epoch 1/5
  33/1875 [..............................] - ETA: 9s - loss: 1.1729 - accuracy: 0.6525 

2022-03-21 09:34:53.263247: I tensorflow/core/profiler/lib/profiler_session.cc:131] Profiler session initializing.
2022-03-21 09:34:53.263255: I tensorflow/core/profiler/lib/profiler_session.cc:146] Profiler session started.
2022-03-21 09:34:53.268718: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data.
2022-03-21 09:34:53.269358: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session tear down.
2022-03-21 09:34:53.270442: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/fit/20220321-093451/train/plugins/profile/2022_03_21_09_34_53

2022-03-21 09:34:53.270844: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for trace.json.gz to logs/fit/20220321-093451/train/plugins/profile/2022_03_21_09_34_53/Shawns.local.trace.json.gz
2022-03-21 09:34:53.271851: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/fit/20220321-093451/train/plugin



2022-03-21 09:35:00.600243: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x298160d30>

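Start TensorBoard through the command line or within a notebook. The two interfaces are generally the same. In notebooks, use the %tensorboard line magic. On the command line, run the same command without the "%".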

In [6]:
%tensorboard --logdir logs/fit
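
To make the relationship between the two forms concrete, the sketch below converts a notebook magic line into the argument list of the equivalent shell command. `magic_to_argv` is a hypothetical helper written for this illustration, not a TensorBoard API.

```python
import shlex

def magic_to_argv(magic_line):
    """Turn '%tensorboard --logdir X' into ['tensorboard', '--logdir', 'X']."""
    line = magic_line.strip()
    if line.startswith("%"):
        line = line[1:]  # the command line form drops the leading "%"
    return shlex.split(line)

print(magic_to_argv("%tensorboard --logdir logs/fit"))
# ['tensorboard', '--logdir', 'logs/fit']
```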

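A brief overview of the dashboards shown (tabs in the top navigation bar):

- The Scalars dashboard shows how the loss and metrics change with every epoch. You can also use it to track training speed, learning rate, and other scalar values.
- The Graphs dashboard helps you visualize your model. In this case, the Keras graph of layers is shown, which can help you ensure it is built correctly.
- The Distributions and Histograms dashboards show the distribution of a tensor over time. This can be useful for visualizing weights and biases and verifying that they are changing in the expected way.

Additional TensorBoard plugins are enabled automatically when you log other types of data. For example, the Keras TensorBoard callback also lets you log images and embeddings. You can see which other plugins are available in TensorBoard by clicking the "inactive" dropdown towards the top right.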

## Using TensorBoard with other methods

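When training with methods such as `tf.GradientTape()`, use `tf.summary` to log the required information.

Use the same dataset as above, but convert it to `tf.data.Dataset` to take advantage of batching capabilities: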

In [8]:
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

train_dataset = train_dataset.shuffle(60000).batch(64)
test_dataset = test_dataset.batch(64)
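
As a quick worked check of what batching by 64 produces, the sketch below computes the number of batches and the size of the final batch. It assumes the standard MNIST split of 60,000 training and 10,000 test images, and that the final partial batch is kept (the default for `batch()`). `batch_layout` is a hypothetical helper, not a `tf.data` function.

```python
import math

def batch_layout(num_examples, batch_size):
    """Return (number of batches, size of the final batch) when no batch is dropped."""
    num_batches = math.ceil(num_examples / batch_size)
    last_batch = num_examples - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(batch_layout(60000, 64))  # (938, 32)
print(batch_layout(10000, 64))  # (157, 16)
```

The same arithmetic with a batch size of 32 gives 1875 batches, which matches the `1875` steps per epoch shown in the earlier `Model.fit()` output.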

The training code follows the advanced quickstart tutorial, but shows how to log metrics to TensorBoard. Choose the loss and optimizer:

In [10]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

Create stateful metrics that can be used to accumulate values during training and logged at any point:

In [11]:
# Define our metrics
train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32)
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('train_accuracy')
test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32)
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('test_accuracy')
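
To illustrate what "stateful" means here, the following toy class mimics the accumulate / read / reset cycle in plain Python. It is only a conceptual stand-in, not the Keras implementation; the class name `RunningMean`, its method names, and the choice to return 0.0 when empty are assumptions made for this sketch.

```python
class RunningMean:
    """Toy stand-in for a stateful mean metric (not the Keras implementation)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # Accumulate one more observation, e.g. the loss of one batch.
        self.total += value
        self.count += 1

    def result(self):
        # The mean of everything seen since the last reset.
        return self.total / self.count if self.count else 0.0

    def reset(self):
        # Forget accumulated state, e.g. at the end of an epoch.
        self.total = 0.0
        self.count = 0

metric = RunningMean()
for batch_loss in [0.5, 0.25, 0.75]:
    metric.update(batch_loss)
print(metric.result())  # 0.5
metric.reset()
print(metric.result())  # 0.0
```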

Define the training and test functions:

In [13]:
def train_step(model, optimizer, x_train, y_train):
    with tf.GradientTape() as tape:
        predictions = model(x_train, training=True)
        loss = loss_object(y_train, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    train_loss(loss)
    train_accuracy(y_train, predictions)

def test_step(model, x_test, y_test):
    # Run in inference mode so that Dropout is disabled during evaluation
    predictions = model(x_test, training=False)
    loss = loss_object(y_test, predictions)

    test_loss(loss)
    test_accuracy(y_test, predictions)

Set up summary writers to write the summaries to disk in a different logs directory:

In [14]:
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
train_log_dir = 'logs/gradient_tape/' + current_time + '/train'
test_log_dir = 'logs/gradient_tape/' + current_time + '/test'
train_summary_writer = tf.summary.create_file_writer(train_log_dir)
test_summary_writer = tf.summary.create_file_writer(test_log_dir)

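Start training. Use `tf.summary.scalar()` to log metrics (loss and accuracy) during training and testing within the scope of the summary writers, so that the summaries are written to disk. You control which metrics to log and how often. Other `tf.summary` functions enable logging other types of data.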

In [15]:
model = create_model() # reset our model

EPOCHS = 5

for epoch in range(EPOCHS):
    # Use batch-specific names so the full x_train / y_train arrays are not overwritten
    for (x_batch, y_batch) in train_dataset:
        train_step(model, optimizer, x_batch, y_batch)
    with train_summary_writer.as_default():
        tf.summary.scalar('loss', train_loss.result(), step=epoch)
        tf.summary.scalar('accuracy', train_accuracy.result(), step=epoch)

    # Use batch-specific names so the full x_test / y_test arrays are not overwritten
    for (x_batch, y_batch) in test_dataset:
        test_step(model, x_batch, y_batch)
    with test_summary_writer.as_default():
        tf.summary.scalar('loss', test_loss.result(), step=epoch)
        tf.summary.scalar('accuracy', test_accuracy.result(), step=epoch)

    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch+1,
                          train_loss.result(),
                          train_accuracy.result()*100,
                          test_loss.result(),
                          test_accuracy.result()*100))

    # Reset metrics every epoch
    # (reset_state() replaces the deprecated reset_states() in TF 2.5 and later)
    train_loss.reset_state()
    test_loss.reset_state()
    train_accuracy.reset_state()
    test_accuracy.reset_state()

Epoch 1, Loss: 0.2440771609544754, Accuracy: 92.81666564941406, Test Loss: 0.12053029239177704, Test Accuracy: 96.5
Epoch 2, Loss: 0.10263629257678986, Accuracy: 96.87833404541016, Test Loss: 0.0857343003153801, Test Accuracy: 97.38999938964844
Epoch 3, Loss: 0.07240449637174606, Accuracy: 97.7750015258789, Test Loss: 0.07199572771787643, Test Accuracy: 97.74000549316406
Epoch 4, Loss: 0.05271098390221596, Accuracy: 98.36666870117188, Test Loss: 0.07082069665193558, Test Accuracy: 97.7800064086914
Epoch 5, Loss: 0.04215889424085617, Accuracy: 98.6866683959961, Test Loss: 0.06146417558193207, Test Accuracy: 97.97000122070312
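
The loop above logs once per epoch. To log more often, for example every N training steps, one option is a small gate like the sketch below. It is written in plain Python so the step-selection logic is easy to see; `log_scalars` and its `write` callback are hypothetical names. In a real training loop, `write` could wrap `tf.summary.scalar(name, value, step=step)` inside the writer's `as_default()` scope.

```python
def log_scalars(step, values, write, every=100):
    """Call write(name, value, step) for each metric, but only every `every` steps."""
    if step % every != 0:
        return False
    for name, value in values.items():
        write(name, value, step)
    return True

# Record which steps would be written, using a list in place of a summary writer.
recorded = []
for step in range(250):
    log_scalars(step, {"loss": 1.0 / (step + 1)},
                lambda name, value, s: recorded.append((name, s)))
print([s for _, s in recorded])  # [0, 100, 200]
```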


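Open TensorBoard again, this time pointing it at the new log directory. TensorBoard could also have been started beforehand to monitor training while it progresses.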

In [16]:
%tensorboard --logdir logs/gradient_tape

That's it! You have now seen how to use TensorBoard both through the Keras callback and through tf.summary for more custom scenarios.

## TensorBoard.dev: Host and share your ML experiment results

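TensorBoard.dev is a free public service that enables you to upload your TensorBoard logs and get a permalink that can be shared with everyone in academic papers, blog posts, social media, and so on. This can enable better reproducibility and collaboration.

To use TensorBoard.dev, run the following command: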


In [None]:
!tensorboard dev upload \
  --logdir logs/fit \
  --name "(optional) My latest experiment" \
  --description "(optional) Simple comparison of several hyperparameters" \
  --one_shot


***** TensorBoard Uploader *****

This will upload your TensorBoard logs to https://tensorboard.dev/ from
the following directory:

logs/fit

This TensorBoard will be visible to everyone. Do not upload sensitive
data.

Your use of this service is subject to Google's Terms of Service
<https://policies.google.com/terms> and Privacy Policy
<https://policies.google.com/privacy>, and TensorBoard.dev's Terms of Service
<https://tensorboard.dev/policy/terms/>.

This notice will not be shown again while you are logged into the uploader.
To log out, run `tensorboard dev auth revoke`.

Continue? (yes/NO) 