# Examining the TensorFlow Graph

## Overview

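TensorBoard's **Graphs dashboard** is a powerful tool for examining your TensorFlow model. You can quickly view a conceptual graph of your model's structure and check that it matches your intended design. You can also view an op-level graph to see how TensorFlow understands your program. Examining the op-level graph can give you insight into how to change your model. For example, you can redesign your model if training is progressing more slowly than expected.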

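This tutorial gives a quick overview of how to generate graph diagnostic data and visualize it in TensorBoard's Graphs dashboard. You will define and train a simple Keras Sequential model for the Fashion-MNIST dataset and learn how to log and examine the model graph. You will also use a tracing API to generate graph data for functions created with `tf.function`.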

## Setup

In [1]:
# Load the TensorBoard notebook extension.
%load_ext tensorboard

In [2]:
from datetime import datetime
from packaging import version

import tensorflow as tf
from tensorflow import keras

print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >= 2, \
    "This notebook requires TensorFlow 2.0 or above."

TensorFlow version:  2.6.0


In [3]:
import tensorboard
tensorboard.__version__

'2.6.0'

In [4]:
# Clear any logs from previous runs
!rm -rf ./logs/ 

## Define a Keras model

In this example, the classifier is a simple four-layer Sequential model.

In [6]:
# Define the model.
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

Metal device set to: Apple M1


2022-03-22 09:34:43.620959: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-03-22 09:34:43.621111: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
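
As a rough cross-check of what the graph should contain, the trainable parameter counts of the layers defined above can be worked out by hand. The sketch below assumes the usual fully connected layout (one weight per input/unit pair plus one bias per unit); the helper name `dense_params` is hypothetical and not part of Keras.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer (hypothetical helper)."""
    return n_inputs * n_units + n_units

# Flatten turns each 28x28 image into 784 values and has no parameters.
flat_size = 28 * 28
# Dense(32) receives the 784 flattened values.
hidden = dense_params(flat_size, 32)
# Dense(10) receives the 32 hidden activations; Dropout adds no parameters.
output = dense_params(32, 10)
total = hidden + output

print(hidden, output, total)  # 25120 330 25450
```

These figures should agree with what `model.summary()` reports for this model.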


Download and prepare the training data.

In [8]:
(train_images, train_labels), _ = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


## Train the model and log data

Before training, define the Keras TensorBoard callback and specify the log directory. Passing this callback to `Model.fit()` ensures that graph data is logged for visualization in TensorBoard.

In [9]:
# Define the Keras TensorBoard callback.
logdir="logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

# Train the model.
model.fit(
    train_images,
    train_labels, 
    batch_size=64,
    epochs=5, 
    callbacks=[tensorboard_callback])

2022-03-22 09:36:48.311144: I tensorflow/core/profiler/lib/profiler_session.cc:131] Profiler session initializing.
2022-03-22 09:36:48.311168: I tensorflow/core/profiler/lib/profiler_session.cc:146] Profiler session started.
2022-03-22 09:36:48.311834: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session tear down.
2022-03-22 09:36:49.035495: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2022-03-22 09:36:49.035970: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-03-22 09:36:49.133932: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


Epoch 1/5
 33/938 [>.............................] - ETA: 4s - loss: 1.7715 - accuracy: 0.3655

2022-03-22 09:36:49.454897: I tensorflow/core/profiler/lib/profiler_session.cc:131] Profiler session initializing.
2022-03-22 09:36:49.454904: I tensorflow/core/profiler/lib/profiler_session.cc:146] Profiler session started.
2022-03-22 09:36:49.461907: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data.
2022-03-22 09:36:49.462593: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session tear down.
2022-03-22 09:36:49.463736: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/fit/20220322-093648/train/plugins/profile/2022_03_22_09_36_49

2022-03-22 09:36:49.464160: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for trace.json.gz to logs/fit/20220322-093648/train/plugins/profile/2022_03_22_09_36_49/Shawns.local.trace.json.gz
2022-03-22 09:36:49.465286: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/fit/20220322-093648/train/plugin

Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x28e801f10>
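
The callback stores its data as event files under the log directory; the `save_profile` log lines above show the `logs/fit/<timestamp>/train/` layout. The following is a minimal sketch for checking what was written, assuming event file names start with `events.out.tfevents` (TensorBoard's usual prefix). The helper name `find_event_files` is made up for illustration.

```python
from pathlib import Path

def find_event_files(logdir):
    """Return event files found anywhere under logdir, sorted by path.

    Hypothetical helper; returns an empty list if logdir does not exist.
    """
    root = Path(logdir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("events.out.tfevents*") if p.is_file()
    )

# List every event file below ./logs together with its size.
for path in find_event_files("logs"):
    print(path, path.stat().st_size, "bytes")
```

If the list is empty after training, TensorBoard will have nothing to display for that run, so this is a quick sanity check before launching the UI.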

## Op-level graph

Start TensorBoard and wait a few seconds for the UI to load. Select the Graphs dashboard by clicking "Graphs" at the top.

In [11]:
%tensorboard --logdir logs

You can also optionally use TensorBoard.dev to create a hosted, shareable experiment.

In [12]:
!tensorboard dev upload \
  --logdir logs \
  --name "Sample op-level graph" \
  --one_shot


***** TensorBoard Uploader *****

This will upload your TensorBoard logs to https://tensorboard.dev/ from
the following directory:

logs

This TensorBoard will be visible to everyone. Do not upload sensitive
data.

Your use of this service is subject to Google's Terms of Service
<https://policies.google.com/terms> and Privacy Policy
<https://policies.google.com/privacy>, and TensorBoard.dev's Terms of Service
<https://tensorboard.dev/policy/terms/>.

This notice will not be shown again while you are logged into the uploader.
To log out, run `tensorboard dev auth revoke`.

Continue? (yes/NO) ^C
Traceback (most recent call last):
  File "/Users/shawnd/miniforge3/envs/keras/bin/tensorboard", line 8, in <module>
    sys.exit(run_main())
  File "/Users/shawnd/miniforge3/envs/keras/lib/python3.8/site-packages/tensorboard/main.py", line 46, in run_main
    app.run(tensorboard.main, flags_parser=tensorboard.configure)
  File "/Users/shawnd/miniforge3/envs/keras/lib/python3.8/site-packages/abs

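By default, TensorBoard displays the **op-level graph**. (On the left, you can see the "Default" tag selected.) Note that the graph is inverted: data flows from bottom to top, so it is upside down compared to the code. However, you can see that the graph closely matches the Keras model definition, with extra edges to other computation nodes.

Graphs are often very large, so you can manipulate the graph visualization:

- Scroll to zoom in and out
- Drag to pan
- Double-click to toggle node expansion (a node can be a container for other nodes)

You can also see metadata by clicking a node. This lets you see inputs, outputs, shapes, and other details.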

## Conceptual graph

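In addition to the execution graph, TensorBoard also displays a **conceptual graph**. This is a view of just the Keras model. It can be useful if you are reusing a saved model and want to examine or validate its structure.

To see the conceptual graph, select the "keras" tag. In this example, you will see a collapsed **Sequential** node. Double-click the node to see the model's structure.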
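
The same structure can also be read from code. The sketch below is one possible text outline of a model, not something TensorBoard produces. It relies only on the Keras attributes `model.layers`, `layer.name`, and `layer.count_params()`; the helper name `layer_outline` is hypothetical.

```python
def layer_outline(model):
    """Return one 'index: ClassName name (N params)' line per layer.

    Hypothetical helper; works on any object exposing `layers`, where each
    layer has a `name` attribute and a `count_params()` method.
    """
    lines = []
    for index, layer in enumerate(model.layers):
        lines.append(
            f"{index}: {type(layer).__name__} {layer.name} "
            f"({layer.count_params()} params)"
        )
    return lines

# Usage, assuming `model` is the Sequential model defined earlier:
# print("\n".join(layer_outline(model)))
```

Comparing this outline with the expanded **Sequential** node is a simple way to confirm that a reloaded model has the layers you expect.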

## Graphs of tf.functions

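The examples so far have described graphs of Keras models, where the graphs are created by defining Keras layers and calling `Model.fit()`.

You may encounter situations where you need to use the `tf.function` annotation to "autograph" a Python computation function, that is, to transform it into a high-performance TensorFlow graph. For these situations, you can use the **TensorFlow Summary Trace API** to log autographed functions for visualization in TensorBoard.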

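To use the Summary Trace API:

- Define and annotate a function with `tf.function`
- Call `tf.summary.trace_on()` immediately before the function call
- Add profile information (memory, CPU time) to the graph by passing `profiler=True`
- With a summary file writer, call `tf.summary.trace_export()` to save the log data

You can then use TensorBoard to see how your function behaves.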

In [14]:
# The function to be traced.
@tf.function
def my_func(x, y):
  # A simple hand-rolled layer.
  return tf.nn.relu(tf.matmul(x, y))

# Set up logging.
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = 'logs/func/%s' % stamp
writer = tf.summary.create_file_writer(logdir)

# Sample data for your function.
x = tf.random.uniform((3, 3))
y = tf.random.uniform((3, 3))

# Bracket the function call with
# tf.summary.trace_on() and tf.summary.trace_export().
tf.summary.trace_on(graph=True, profiler=True)

# Call only one tf.function when tracing.
z = my_func(x, y)
with writer.as_default():
    tf.summary.trace_export(
        name="my_func_trace",
        step=0,
        profiler_outdir=logdir)

Instructions for updating:
use `tf.profiler.experimental.start` instead.
Instructions for updating:
use `tf.profiler.experimental.stop` instead.
Instructions for updating:
`tf.python.eager.profiler` has deprecated, use `tf.profiler` instead.
Instructions for updating:
`tf.python.eager.profiler` has deprecated, use `tf.profiler` instead.


2022-03-22 09:57:06.764813: I tensorflow/core/profiler/lib/profiler_session.cc:131] Profiler session initializing.
2022-03-22 09:57:06.764837: I tensorflow/core/profiler/lib/profiler_session.cc:146] Profiler session started.
2022-03-22 09:57:06.804841: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2022-03-22 09:57:06.877904: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data.
2022-03-22 09:57:06.878126: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session tear down.


In [15]:
%tensorboard --logdir logs/func

You can now see the structure of your function as understood by TensorBoard. Click the "Profile" button to see CPU and memory statistics.