##### Copyright 2018 The TensorFlow Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Custom layers

<table class="tfo-notebook-buttons" align="left">
  <td>     <a target="_blank" href="https://tensorflow.google.cn/tutorials/customization/custom_layers"><img src="https://tensorflow.google.cn/images/tf_logo_32px.png">View on TensorFlow.org</a>   </td>
  <td><a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/customization/custom_layers.ipynb"><img src="https://tensorflow.google.cn/images/colab_logo_32px.png">Run in Google Colab</a></td>
  <td><a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/customization/custom_layers.ipynb"><img src="https://tensorflow.google.cn/images/GitHub-Mark-32px.png">View source on GitHub</a></td>
  <td><a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/tutorials/customization/custom_layers.ipynb"><img src="https://tensorflow.google.cn/images/download_logo_32px.png">Download notebook</a></td>
</table>

We recommend using `tf.keras` as the high-level API for building neural networks. That said, most TensorFlow APIs can also be used with eager execution.


In [2]:
import tensorflow as tf



In [3]:
print(tf.config.list_physical_devices('GPU'))

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]



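## Layers: common sets of useful operations

Most of the time, when writing code for machine learning models, you want to operate at a higher level of abstraction than individual operations and individual variables.

Many machine learning models can be expressed as the composition and stacking of relatively simple layers. TensorFlow provides a set of many common layers, as well as easy ways to write your own application-specific layers, either from scratch or as a composition of existing layers.

TensorFlow includes the full [Keras](https://keras.io) API in the `tf.keras` package, and the Keras layers are very useful when building your own models.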


In [4]:
# In the tf.keras.layers package, layers are objects. To construct a layer,
# simply construct the object. Most layers take as a first argument the number
# of output dimensions / channels.
layer = tf.keras.layers.Dense(100)
# The number of input dimensions is often unnecessary, as it can be inferred
# the first time the layer is used, but it can be provided if you want to
# specify it manually, which is useful in some complex models.
layer = tf.keras.layers.Dense(10, input_shape=(None, 5))

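The full list of pre-existing layers can be found in [the documentation](https://tensorflow.google.cn/api_docs/python/tf/keras/layers). It includes Dense (a fully-connected layer), Conv2D, LSTM, BatchNormalization, Dropout, and many others.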

In [5]:
# To use a layer, simply call it.
layer(tf.zeros([10, 5]))


<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>
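
The comments above note that the number of input dimensions can be inferred the first time a layer is used. The following sketch illustrates that deferred creation with a fresh `Dense` layer. The names `lazy_layer` and `lazy_output` are made up for this illustration, and the `built` attribute is assumed to behave as it does in current `tf.keras` releases.

```python
import tensorflow as tf

# No input shape is given, so the layer cannot create its weights yet.
lazy_layer = tf.keras.layers.Dense(3)
print(lazy_layer.built)

# The first call sees inputs of shape [2, 4], so a [4, 3] kernel is created.
lazy_output = lazy_layer(tf.zeros([2, 4]))
print(lazy_layer.built)
print(lazy_layer.kernel.shape)
print(lazy_output.shape)
```

Deferring weight creation this way means the same layer definition can be reused for inputs of different widths without restating the input size.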

In [6]:
# Layers have many useful methods. For example, you can inspect all variables
# in a layer using `layer.variables` and trainable variables using
# `layer.trainable_variables`. In this case a fully-connected layer
# will have variables for weights and biases.
layer.variables

[<tf.Variable 'dense_1/kernel:0' shape=(5, 10) dtype=float32, numpy=
 array([[ 0.55698603,  0.58896476, -0.6049905 ,  0.00156385, -0.239591  ,
          0.4902969 ,  0.61334175, -0.33264387,  0.29372823,  0.3794902 ],
        [ 0.4828195 , -0.418388  ,  0.4927191 ,  0.13703132,  0.43512815,
          0.33730084, -0.43961972, -0.0734669 ,  0.24264324, -0.0063588 ],
        [ 0.0689221 ,  0.3667304 ,  0.16115516, -0.07370722, -0.07517862,
          0.4931944 ,  0.06025457, -0.06185615,  0.37100464,  0.31360435],
        [-0.00900364, -0.48481154,  0.2088232 ,  0.12310654,  0.34100783,
          0.600704  , -0.23766118, -0.08601677,  0.3517822 , -0.05646873],
        [-0.1324678 ,  0.25147134,  0.20383465,  0.38305265,  0.37521905,
          0.53801125, -0.01732343, -0.36453006,  0.61426026,  0.49124628]],
       dtype=float32)>,
 <tf.Variable 'dense_1/bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>]

In [7]:
# The variables are also accessible through nice accessors
layer.kernel, layer.bias

(<tf.Variable 'dense_1/kernel:0' shape=(5, 10) dtype=float32, numpy=
 array([[ 0.55698603,  0.58896476, -0.6049905 ,  0.00156385, -0.239591  ,
          0.4902969 ,  0.61334175, -0.33264387,  0.29372823,  0.3794902 ],
        [ 0.4828195 , -0.418388  ,  0.4927191 ,  0.13703132,  0.43512815,
          0.33730084, -0.43961972, -0.0734669 ,  0.24264324, -0.0063588 ],
        [ 0.0689221 ,  0.3667304 ,  0.16115516, -0.07370722, -0.07517862,
          0.4931944 ,  0.06025457, -0.06185615,  0.37100464,  0.31360435],
        [-0.00900364, -0.48481154,  0.2088232 ,  0.12310654,  0.34100783,
          0.600704  , -0.23766118, -0.08601677,  0.3517822 , -0.05646873],
        [-0.1324678 ,  0.25147134,  0.20383465,  0.38305265,  0.37521905,
          0.53801125, -0.01732343, -0.36453006,  0.61426026,  0.49124628]],
       dtype=float32)>,
 <tf.Variable 'dense_1/bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>)

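## Implementing custom layers

The best way to implement your own layer is to extend the `tf.keras.layers.Layer` class and implement:

1. `__init__`, where you can do all input-independent initialization
2. `build`, where you know the shapes of the input tensors and can do the rest of the initialization
3. `call`, where you do the forward computation

Note that you don't have to wait until `build` is called to create your variables; you can also create them in `__init__`. However, the advantage of creating them in `build` is that it enables late variable creation based on the shape of the inputs the layer will operate on. Creating variables in `__init__`, on the other hand, means that the shapes required to create the variables must be specified explicitly.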

In [8]:
class MyDenseLayer(tf.keras.layers.Layer):
  def __init__(self, num_outputs):
    super(MyDenseLayer, self).__init__()
    self.num_outputs = num_outputs

  def build(self, input_shape):
    self.kernel = self.add_weight(name="kernel",
                                  shape=[int(input_shape[-1]),
                                         self.num_outputs])

  def call(self, inputs):
    return tf.matmul(inputs, self.kernel)

layer = MyDenseLayer(10)

In [9]:
_ = layer(tf.zeros([10, 5])) # Calling the layer `.builds` it.
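
To make the "calling the layer builds it" behavior concrete, here is a framework-free toy written in plain Python. It is only a sketch of the general build-on-first-call pattern, not Keras's actual implementation; `ToyLayer` and everything inside it are hypothetical names invented for this illustration.

```python
class ToyLayer:
  """Plain-Python stand-in for the build-on-first-call pattern."""

  def __init__(self, num_outputs):
    # Input-independent setup only: the input size is not known yet.
    self.num_outputs = num_outputs
    self.built = False
    self.kernel = None

  def build(self, input_shape):
    # Shape-dependent setup: a zero kernel of size [input_dim, num_outputs].
    self.kernel = [[0.0] * self.num_outputs for _ in range(input_shape[-1])]

  def call(self, inputs):
    # Matrix multiply: inputs [batch, in] times kernel [in, out].
    return [[sum(x * w for x, w in zip(row, column))
             for column in zip(*self.kernel)]
            for row in inputs]

  def __call__(self, inputs):
    # Build lazily on the first call, then run the forward computation.
    if not self.built:
      self.build((len(inputs), len(inputs[0])))
      self.built = True
    return self.call(inputs)


toy = ToyLayer(3)
outputs = toy([[1.0, 2.0], [3.0, 4.0]])
print(len(toy.kernel), len(toy.kernel[0]))
```

The kernel only comes into existence once the first input reveals the input dimension, which is the same ordering that `__init__`, `build`, and `call` follow in the Keras layer above.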

In [10]:
print([var.name for var in layer.trainable_variables])

['my_dense_layer/kernel:0']
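
For contrast with `MyDenseLayer`, the following sketch creates the variable in `__init__` instead of `build`. As the text above points out, this requires the input dimension to be passed explicitly. The class name `InitDenseLayer` and its `num_inputs` argument are hypothetical and exist only for this illustration.

```python
import tensorflow as tf


class InitDenseLayer(tf.keras.layers.Layer):
  def __init__(self, num_inputs, num_outputs):
    super().__init__()
    # The full kernel shape must be known here, before any input is seen.
    self.kernel = self.add_weight(name="kernel",
                                  shape=[num_inputs, num_outputs])

  def call(self, inputs):
    return tf.matmul(inputs, self.kernel)


init_layer = InitDenseLayer(5, 10)
# The variable already exists, even though the layer was never called.
print(len(init_layer.trainable_variables))
result = init_layer(tf.zeros([10, 5]))
print(result.shape)
```

The trade-off is the one described above: the variable is available immediately, but the layer can no longer adapt to whatever input width it happens to receive.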


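Overall, code is easier to read and maintain if it uses standard layers whenever possible, because other readers will already be familiar with their behavior. If you want to use a layer that is not present in `tf.keras.layers`, consider filing a [GitHub issue](http://github.com/tensorflow/tensorflow/issues/new) or, even better, sending us a pull request!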

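## Models: composing layers

Many interesting layer-like things in machine learning models are implemented by composing existing layers. For example, each residual block in a ResNet is a composition of convolutions, batch normalizations, and a shortcut. Layers can be nested inside other layers.

Typically you inherit from `keras.Model` when you need model methods such as `Model.fit`, `Model.evaluate`, and `Model.save` (see [Custom Keras layers and models](https://tensorflow.google.cn/guide/keras/custom_layers_and_models) for details).

One other feature provided by `keras.Model` (but not by `keras.layers.Layer`) is that, in addition to tracking variables, a `keras.Model` also tracks its internal layers, making them easier to inspect.

For example, here is a ResNet block: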

In [11]:
class ResnetIdentityBlock(tf.keras.Model):
  def __init__(self, kernel_size, filters):
    super(ResnetIdentityBlock, self).__init__(name='')
    filters1, filters2, filters3 = filters

    self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))
    self.bn2a = tf.keras.layers.BatchNormalization()

    self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')
    self.bn2b = tf.keras.layers.BatchNormalization()

    self.conv2c = tf.keras.layers.Conv2D(filters3, (1, 1))
    self.bn2c = tf.keras.layers.BatchNormalization()

  def call(self, input_tensor, training=False):
    x = self.conv2a(input_tensor)
    x = self.bn2a(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2b(x)
    x = self.bn2b(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2c(x)
    x = self.bn2c(x, training=training)

    x += input_tensor
    return tf.nn.relu(x)


block = ResnetIdentityBlock(1, [1, 2, 3])

In [12]:
_ = block(tf.zeros([1, 2, 3, 3])) 



In [13]:
block.layers

[<keras.src.layers.convolutional.conv2d.Conv2D at 0x7fa4a2dbfee0>,
 <keras.src.layers.normalization.batch_normalization.BatchNormalization at 0x7fa5983bb8b0>,
 <keras.src.layers.convolutional.conv2d.Conv2D at 0x7fa4a028be50>,
 <keras.src.layers.normalization.batch_normalization.BatchNormalization at 0x7fa4a028bee0>,
 <keras.src.layers.convolutional.conv2d.Conv2D at 0x7fa4a02598b0>,
 <keras.src.layers.normalization.batch_normalization.BatchNormalization at 0x7fa4a0259940>]

In [14]:
len(block.variables)

18

In [15]:
block.summary()

Model: ""
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             multiple                  4         
                                                                 
 batch_normalization (Batch  multiple                  4         
 Normalization)                                                  
                                                                 
 conv2d_1 (Conv2D)           multiple                  4         
                                                                 
 batch_normalization_1 (Bat  multiple                  8         
 chNormalization)                                                
                                                                 
 conv2d_2 (Conv2D)           multiple                  9         
                                                                 
 batch_normalization_2 (Bat  multiple                  12        
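
The variable count of 18 and the `Param #` column above can be checked by hand. The sketch below redoes that arithmetic in plain Python, assuming the standard parameter layout: a `Conv2D` layer holds a kernel and a bias, and a `BatchNormalization` layer holds four per-channel vectors (gamma, beta, moving mean, moving variance). The helper names `conv2d_params` and `batch_norm_params` are made up for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
  # One weight per kernel position, input channel and filter,
  # plus one bias per filter.
  return kernel_h * kernel_w * in_channels * filters + filters


def batch_norm_params(channels):
  # gamma, beta, moving mean and moving variance: four values per channel.
  return 4 * channels


# ResnetIdentityBlock(1, [1, 2, 3]) called on an input with 3 channels:
# all three convolutions use a 1x1 kernel.
in_channels = 3
param_counts = []
for filters in [1, 2, 3]:
  param_counts.append(conv2d_params(1, 1, in_channels, filters))
  param_counts.append(batch_norm_params(filters))
  in_channels = filters

print(param_counts)  # [4, 4, 4, 8, 9, 12]

# Three Conv2D layers with 2 variables each, three BatchNormalization
# layers with 4 variables each.
num_variables = 3 * 2 + 3 * 4
print(num_variables)  # 18
```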


Much of the time, however, models that compose many layers simply call one layer after another. This can be done in very little code using `tf.keras.Sequential`:

In [16]:
my_seq = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1),
                                                    input_shape=(
                                                        None, None, 3)),
                             tf.keras.layers.BatchNormalization(),
                             tf.keras.layers.Conv2D(2, 1,
                                                    padding='same'),
                             tf.keras.layers.BatchNormalization(),
                             tf.keras.layers.Conv2D(3, (1, 1)),
                             tf.keras.layers.BatchNormalization()])
my_seq(tf.zeros([1, 2, 3, 3]))

<tf.Tensor: shape=(1, 2, 3, 3), dtype=float32, numpy=
array([[[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]]], dtype=float32)>

In [17]:
my_seq.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, None, None, 1)     4         
                                                                 
 batch_normalization_3 (Bat  (None, None, None, 1)     4         
 chNormalization)                                                
                                                                 
 conv2d_4 (Conv2D)           (None, None, None, 2)     4         
                                                                 
 batch_normalization_4 (Bat  (None, None, None, 2)     8         
 chNormalization)                                                
                                                                 
 conv2d_5 (Conv2D)           (None, None, None, 3)     9         
                                                                 
 batch_normalization_5 (Bat  (None, None, None, 3)     1

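# Next steps

Now you can go back to the previous notebook and adapt the linear regression example to use layers and models so that it is better structured.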