# Making new Layers and Models via subclassing

## Setup

In [2]:
import tensorflow as tf
from tensorflow import keras

## The Layer class: the combination of state (weights) and some computation

One of the central abstractions in Keras is the Layer class. A layer encapsulates both a state (the layer's "weights") and a transformation from inputs to outputs (a "call", the layer's forward pass).

Here's a densely-connected layer. It has a state: the variables w and b.

In [3]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype="float32"),
            trainable=True,
        )
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype="float32"), trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

# Note
The class above has two methods.

1. __init__

1st, it calls the constructor of the parent class (keras.layers.Layer) via super(), so the layer is set up properly before anything is attached to it.

2nd, it creates an initializer for w with tf.random_normal_initializer(); calling the initializer with a shape and a dtype returns a tensor of random initial values.

3rd, it wraps those initial values in a tf.Variable, which gives w mutable state, and marks it as trainable.

4th, it creates the initial values of b, all zeros, with tf.zeros_initializer().

5th, it wraps b in a tf.Variable as well. Like w, it is trainable.

2. call

1st, it multiplies inputs by w with tf.matmul() (a matrix product).

2nd, it adds b to the product.

3rd, it returns the result.

**To know more about the above functions, see:**

[super()](https://rhettinger.wordpress.com/2011/05/26/super-considered-super/)

[tf.random_normal_initializer()](https://www.tensorflow.org/api_docs/python/tf/random_normal_initializer)

[tf.Variable()](https://www.tensorflow.org/api_docs/python/tf/Variable)

[tf.zeros_initializer()](https://www.tensorflow.org/api_docs/python/tf/zeros_initializer)

[tf.matmul()](https://www.tensorflow.org/api_docs/python/tf/linalg/matmul)
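
The relationship between an initializer's output and a variable can be checked directly. The following is a minimal sketch, assuming TensorFlow 2.x running eagerly; the names `initial` and `w` are only illustrative.

```python
import tensorflow as tf

w_init = tf.random_normal_initializer()
# Calling the initializer already returns a (read-only) tensor.
initial = w_init(shape=(2, 4), dtype="float32")
# tf.Variable wraps that tensor in mutable state.
w = tf.Variable(initial_value=initial, trainable=True)

print(tf.is_tensor(initial))       # is the initializer output a tensor?
print(isinstance(w, tf.Variable))  # is w a variable?

# Unlike a plain tensor, a variable can be updated in place.
w.assign_add(tf.ones((2, 4)))
diff = w.numpy() - initial.numpy()
```

The in-place update is what an optimizer relies on during training, which is why weights are stored as variables rather than as plain tensors.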

You would use a layer by calling it on some tensor input(s), much like a Python function.

In [4]:
x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
y = linear_layer(x)
print(y)

tf.Tensor(
[[-0.07335697 -0.00262715  0.00929411  0.03786761]
 [-0.07335697 -0.00262715  0.00929411  0.03786761]], shape=(2, 4), dtype=float32)


Note that the weights w and b are automatically tracked by the layer upon being set as layer attributes (so you can inspect them through the layer's weights property):

In [5]:
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

Note you also have access to a quicker shortcut for adding a weight to a layer: the add_weight() method:

In [6]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
y = linear_layer(x)
print(y)

tf.Tensor(
[[0.03659176 0.05299353 0.00349442 0.02452748]
 [0.03659176 0.05299353 0.00349442 0.02452748]], shape=(2, 4), dtype=float32)
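
To see that the layer really computes `inputs @ w + b`, the result can be compared with the same arithmetic done in NumPy. This is a sketch that redefines the `add_weight()` version of `Linear` so it runs on its own; `expected` is an illustrative name.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super().__init__()
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


x = tf.ones((2, 2))
layer = Linear(units=4, input_dim=2)
y = layer(x)

# Same computation with plain NumPy arrays.
expected = x.numpy() @ layer.w.numpy() + layer.b.numpy()
print(np.allclose(y.numpy(), expected, atol=1e-6))
```

Both rows of `y` are identical because both input rows are identical and the bias starts at zero.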


## Layers can have non-trainable weights

Besides trainable weights, you can add non-trainable weights to a layer as well. Such weights are not updated by backpropagation when you train the layer.

In [7]:
# Add and use a non-trainable weight
class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False)

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total


x = tf.ones((2, 2))
my_sum = ComputeSum(2)
y = my_sum(x)
print(y.numpy())
y = my_sum(x)
print(y.numpy())

[2. 2.]
[4. 4.]


It's part of layer.weights, but it gets categorized as a non-trainable weight:

In [8]:
print("weights:", len(my_sum.weights))
print("non-trainable weights:", len(my_sum.non_trainable_weights))

# It's not included in the trainable weights:
print("trainable_weights:", my_sum.trainable_weights)

weights: 1
non-trainable weights: 1
trainable_weights: []


# Note
In the Sequential model part, we learned how to freeze a layer or a model. "Freezing" simply means marking the weights as non-trainable.
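
The effect of freezing can be observed on a built-in layer. A minimal sketch, assuming a `Dense` layer with 3 input features; the variable names are illustrative.

```python
import tensorflow as tf
from tensorflow import keras

dense = keras.layers.Dense(4)
_ = dense(tf.ones((1, 3)))  # first call builds the kernel (3, 4) and the bias (4,)

before = (len(dense.trainable_weights), len(dense.non_trainable_weights))

dense.trainable = False  # "freeze" the layer
after = (len(dense.trainable_weights), len(dense.non_trainable_weights))

print("before freezing:", before)
print("after freezing:", after)
```

The weights themselves are unchanged; they only move from the trainable list to the non-trainable list, so an optimizer that iterates over `trainable_weights` no longer touches them.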

## Best practice: deferring weight creation until the shape of the inputs is known

Our Linear layer above took an input_dim argument that was used to compute the shape of the weights w and b in \__init__():

In [9]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

In many cases, you may not know in advance the size of your inputs, and you would like to lazily create weights when that value becomes known, some time after instantiating the layer.

In the Keras API, we recommend creating layer weights in the build(self, input_shape) method of your layer. Like this:

In [10]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

The \__call__() method of your layer will automatically run build() the first time it is called. You now have a layer that's lazy and thus easier to use:

In [11]:
# At instantiation, we don't know on what inputs this is going to get called
linear_layer = Linear(32)

# The layer's weights are created dynamically the first time the layer is called
y = linear_layer(x)
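
The lazy behaviour can be made visible by inspecting the layer before and after its first call. This sketch uses a renamed copy of the layer (`LazyLinear`, an illustrative name) so that it runs on its own.

```python
import tensorflow as tf
from tensorflow import keras


class LazyLinear(keras.layers.Layer):
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # The last input dimension is only known here, not in __init__.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


layer = LazyLinear(8)
built_before = layer.built
weights_before = len(layer.weights)

y = layer(tf.ones((3, 5)))  # first call triggers build((3, 5))

print(built_before, weights_before)
print(layer.built, tuple(layer.w.shape), tuple(layer.b.shape))
```

The same layer class can therefore be reused on inputs of any feature size without passing that size to the constructor.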

## Layers are recursively composable

If you assign a Layer instance as an attribute of another Layer, the outer layer will start tracking the weights of the inner layer.

We recommend creating such sublayers in the \__init__() method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).

In [31]:
# Let's assume we are reusing the Linear class
# with a `build` method that we defined above.

class MLPBlock(keras.layers.Layer):
    def __init__(self):
        super(MLPBlock, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(1)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)


mlp = MLPBlock()
y = mlp(tf.ones(shape=(3, 64)))  # The first call to the `mlp` will create the weights
print("weights:", len(mlp.weights))
print("trainable weights:", len(mlp.trainable_weights))

weights: 6
trainable weights: 6
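
The six tracked weights are one kernel and one bias per sublayer. A sketch of the same idea with built-in `Dense` sublayers (the `Block` class is illustrative, not part of the original example) also lets us check the total parameter count by hand.

```python
import math

import tensorflow as tf
from tensorflow import keras


class Block(keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.dense_1 = keras.layers.Dense(32, activation="relu")
        self.dense_2 = keras.layers.Dense(32, activation="relu")
        self.dense_3 = keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense_3(self.dense_2(self.dense_1(inputs)))


block = Block()
_ = block(tf.ones((3, 64)))  # builds all three sublayers

n_weights = len(block.weights)
n_params = sum(math.prod(tuple(w.shape)) for w in block.weights)

# By hand: (64*32 + 32) + (32*32 + 32) + (32*1 + 1)
expected_params = (64 * 32 + 32) + (32 * 32 + 32) + (32 * 1 + 1)
print(n_weights, n_params, expected_params)
```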


## The add_loss() method

When writing the call() method of a layer, you can create loss tensors that you will want to use later, when writing your training loop. This is doable by calling self.add_loss(value):

In [13]:
# A layer that creates an activity regularization loss
class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs

These losses (including those created by any inner layer) can be retrieved via layer.losses. This property is reset at the start of every \__call__() to the top-level layer, so that layer.losses always contains the loss values created during the last forward pass.

In [14]:
class OuterLayer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.activity_reg = ActivityRegularizationLayer(1e-2)

    def call(self, inputs):
        return self.activity_reg(inputs)


layer = OuterLayer()
assert len(layer.losses) == 0  # No losses yet since the layer has never been called

_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # We created one loss value

# `layer.losses` gets reset at the start of each __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # This is the loss created during the call above
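
The value stored in `layer.losses` is simply whatever was passed to `add_loss()`. A small sketch, assuming an all-ones input of shape (2, 3) so the expected value is easy to compute by hand:

```python
import tensorflow as tf
from tensorflow import keras


class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super().__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs


layer = ActivityRegularizationLayer(rate=1e-2)
_ = layer(tf.ones((2, 3)))

# The input sums to 6, so the recorded loss should be 0.01 * 6.
loss_value = float(layer.losses[0])
print(loss_value)
```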

In addition, the losses property also contains regularization losses created for the weights of any inner layer:

In [15]:
class OuterLayerWithKernelRegularizer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayerWithKernelRegularizer, self).__init__()
        self.dense = keras.layers.Dense(
            32, kernel_regularizer=tf.keras.regularizers.l2(1e-3)
        )

    def call(self, inputs):
        return self.dense(inputs)


layer = OuterLayerWithKernelRegularizer()
_ = layer(tf.zeros((1, 1)))

# This is `1e-3 * sum(layer.dense.kernel ** 2)`,
# created by the `kernel_regularizer` above.
print(layer.losses)

[<tf.Tensor: shape=(), dtype=float32, numpy=0.0018029165>]
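
The formula in the comment above can be verified numerically. This sketch applies the regularizer to a standalone `Dense` layer and recomputes `1e-3 * sum(kernel ** 2)` with NumPy; `reported` and `manual` are illustrative names.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

dense = keras.layers.Dense(32, kernel_regularizer=keras.regularizers.l2(1e-3))
_ = dense(tf.zeros((1, 1)))  # builds a kernel of shape (1, 32)

reported = float(dense.losses[0])
manual = 1e-3 * float(np.sum(np.square(dense.kernel.numpy())))
print(reported, manual)
```

The exact number differs from run to run because the kernel is randomly initialized, but the two values should agree up to floating-point rounding.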


These losses are meant to be taken into account when writing training loops, like this:

In [None]:
# Instantiate an optimizer and a loss function.
# `model` and `train_dataset` are assumed to be defined elsewhere.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Iterate over the batches of a dataset.
for x_batch_train, y_batch_train in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch_train)  # Logits for this minibatch
        # Loss value for this minibatch
        loss_value = loss_fn(y_batch_train, logits)
        # Add extra losses created during this forward pass:
        loss_value += sum(model.losses)

    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

For a detailed guide about writing training loops, see the [guide to writing a training loop from scratch.](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch/)
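
A complete, runnable version of such a loop might look like the sketch below. The data is random and exists only to make the loop executable; the model architecture, batch size, and variable names are assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Toy data: 64 samples, 8 features, 3 classes (random, for illustration only).
x = np.random.random((64, 8)).astype("float32")
y = np.random.randint(0, 3, size=(64,))
train_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

# The first layer carries a weight regularizer, so model.losses is not empty.
model = keras.Sequential(
    [
        keras.layers.Dense(
            16, activation="relu", kernel_regularizer=keras.regularizers.l2(1e-3)
        ),
        keras.layers.Dense(3),
    ]
)

optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for x_batch, y_batch in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch, training=True)
        loss_value = loss_fn(y_batch, logits)
        # Add the regularization loss collected during this forward pass.
        loss_value += sum(model.losses)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

print("last batch loss:", float(loss_value))
```

Reading `model.losses` inside the `GradientTape` context matters: the regularization term then contributes to the gradients instead of only to the reported loss.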

These losses also work seamlessly with fit() (they get automatically summed and added to the main loss, if any):

In [17]:
import numpy as np

inputs = keras.Input(shape=(3,))
outputs = ActivityRegularizationLayer()(inputs)
model = keras.Model(inputs, outputs)

# If there is a loss passed in `compile`, the regularization
# losses get added to it
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.random((2, 3)), np.random.random((2, 3)))

# It's also possible not to pass any loss in `compile`,
# since the model already has a loss to minimize, via the `add_loss`
# call during the forward pass!
model.compile(optimizer="adam")
model.fit(np.random.random((2, 3)), np.random.random((2, 3)))



<tensorflow.python.keras.callbacks.History at 0x21a405a2be0>

## The add_metric() method

Similarly to add_loss(), layers also have an add_metric() method for tracking the moving average of a quantity during training.

Consider the following layer: a "logistic endpoint" layer. It takes as inputs predictions and targets, it computes a loss which it tracks via add_loss(), and it computes an accuracy scalar, which it tracks via add_metric().

In [32]:
class LogisticEndpoint(keras.layers.Layer):
    def __init__(self, name=None):
        super(LogisticEndpoint, self).__init__(name=name)
        self.loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
        self.accuracy_fn = keras.metrics.BinaryAccuracy()

    def call(self, targets, logits, sample_weights=None):
        # Compute the training-time loss value and add it
        # to the layer using `self.add_loss()`.
        loss = self.loss_fn(targets, logits, sample_weights)
        self.add_loss(loss)

        # Log accuracy as a metric and add it
        # to the layer using `self.add_metric()`.
        acc = self.accuracy_fn(targets, logits, sample_weights)
        self.add_metric(acc, name="accuracy")

        # Return the inference-time prediction tensor (for `.predict()`).
        return tf.nn.softmax(logits)

Metrics tracked in this way are accessible via layer.metrics:

In [20]:
layer = LogisticEndpoint()

targets = tf.ones((2, 2))
logits = tf.ones((2, 2))
y = layer(targets, logits)

print("layer.metrics:", layer.metrics)
print("current accuracy value:", float(layer.metrics[0].result()))

layer.metrics: [<tensorflow.python.keras.metrics.BinaryAccuracy object at 0x0000021A35C6C160>]
current accuracy value: 1.0
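
The metric object is stateful: each call accumulates into a running average rather than replacing the previous value. A sketch using `keras.metrics.BinaryAccuracy` on its own, with hand-picked values and the default threshold of 0.5:

```python
from tensorflow import keras

metric = keras.metrics.BinaryAccuracy()  # predictions above 0.5 count as class 1

# Batch 1: thresholded predictions are 1, 0, 0, 1 against targets 1, 1, 0, 0.
metric.update_state([[1], [1], [0], [0]], [[0.9], [0.4], [0.2], [0.6]])
after_first = float(metric.result())  # 2 correct out of 4

# Batch 2: both predictions are correct.
metric.update_state([[1], [0]], [[0.8], [0.1]])
after_second = float(metric.result())  # 4 correct out of 6 seen so far

print(after_first, after_second)
```

This also explains the value 1.0 in the output above: the inputs there are all ones, and 1 is above the 0.5 threshold, so every prediction matches its target.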


Just like for add_loss(), these metrics are tracked by fit():

In [21]:
inputs = keras.Input(shape=(3,), name="inputs")
targets = keras.Input(shape=(10,), name="targets")
logits = keras.layers.Dense(10)(inputs)
predictions = LogisticEndpoint(name="predictions")(targets, logits)

model = keras.Model(inputs=[inputs, targets], outputs=predictions)
model.compile(optimizer="adam")

data = {
    "inputs": np.random.random((3, 3)),
    "targets": np.random.random((3, 10)),
}
model.fit(data)



<tensorflow.python.keras.callbacks.History at 0x21a405ba0a0>

# Note
For a Model, the optimizer, loss, and metrics can be set with compile(), and extra losses and metrics can also be added with add_loss()/add_metric().

A Layer has no compile() method (it is not a model), so add_loss()/add_metric() are the way for a layer to contribute losses and metrics. Also note that in a real subclassed model, the code above is only one piece of the whole: the layer just computes the loss and the metric from its inputs (targets and predictions); it does not train anything by itself.

## You can optionally enable serialization on your layers

If you need your custom layers to be serializable as part of a Functional model, you can optionally implement a get_config() method:

In [22]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        return {"units": self.units}


# Now you can recreate the layer from its config:
layer = Linear(64)
config = layer.get_config()
print(config)
new_layer = Linear.from_config(config)

{'units': 64}


Note that the \__init__() method of the base Layer class takes some keyword arguments, in particular a name and a dtype. It's good practice to pass these arguments to the parent class in \__init__() and to include them in the layer config:

In [23]:
class Linear(keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        config = super(Linear, self).get_config()  # Start from the parent class's config
        config.update({"units": self.units})
        return config


layer = Linear(64)
config = layer.get_config()
print(config)
new_layer = Linear.from_config(config)

{'name': 'linear_8', 'trainable': True, 'dtype': 'float32', 'units': 64}


Serializing means converting a model or layer into plain data (its config); deserializing means rebuilding the model or layer from that data.

If you need more flexibility when deserializing the layer from its config, you can also override the from_config() class method. This is the base implementation of from_config():

In [24]:
# Defined as a class method on the base Layer class.
@classmethod
def from_config(cls, config):
    return cls(**config)

To learn more about serialization and saving, see the [complete guide to saving and serializing models.](https://www.tensorflow.org/guide/keras/save_and_serialize/)
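
One situation where overriding from_config() helps is when a stored config does not match the constructor signature exactly. The sketch below is hypothetical: the `Scale` layer and the legacy key name `"scale"` are invented for illustration and do not come from Keras.

```python
from tensorflow import keras


class Scale(keras.layers.Layer):
    """Hypothetical layer that multiplies its inputs by a constant factor."""

    def __init__(self, factor=1.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        config = super().get_config()
        config.update({"factor": self.factor})
        return config

    @classmethod
    def from_config(cls, config):
        config = dict(config)  # do not mutate the caller's dict
        # Accept a (hypothetical) older key name and map it to the new one.
        if "scale" in config:
            config["factor"] = config.pop("scale")
        return cls(**config)


restored = Scale.from_config({"name": "scale_layer", "scale": 3.0})
print(restored.name, restored.factor)
print(restored.get_config()["factor"])
```

Without the override, the base implementation would pass `scale=3.0` straight to the constructor, which the parent class would not recognize as a valid keyword argument.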

# Note
To recreate the same layer, the only argument we need from our own get_config() is units.

The parent class's get_config() records the arguments that the base Layer knows about (name, trainable, dtype), but it has no way of knowing about constructor arguments introduced by a subclass, such as units. That is why we override get_config() and add units to the parent's config.

## Privileged training argument in the call() method

Some layers, in particular the BatchNormalization layer and the Dropout layer, have different behaviors during training and inference. For such layers, it is standard practice to expose a training (boolean) argument in the call() method.

By exposing this argument in call(), you enable the built-in training and evaluation loops (e.g. fit()) to correctly use the layer in training and inference.

In [34]:
class CustomDropout(keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super(CustomDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs
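
The two code paths can be exercised by passing the training flag explicitly. A sketch with an all-ones input and a rate of 0.5, chosen so that surviving elements are scaled to exactly 2.0 (dropout rescales kept values by 1 / (1 - rate)):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


class CustomDropout(keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs


layer = CustomDropout(rate=0.5)
x = tf.ones((4, 4))

out_inference = layer(x, training=False).numpy()  # inputs pass through unchanged
out_training = layer(x, training=True).numpy()    # elements are either 0.0 or 2.0

print(out_inference)
print(out_training)
```

Which elements are zeroed is random, so the training-mode output changes from call to call.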

## Privileged mask argument in the call() method

The other privileged argument supported by call() is the mask argument.

You will find it in all Keras RNN layers. A mask is a boolean tensor (one boolean value per timestep in the input) used to skip certain input timesteps when processing timeseries data.

Keras will automatically pass the correct mask argument to \__call__() for layers that support it, when a mask is generated by a prior layer. Mask-generating layers are the Embedding layer configured with mask_zero=True, and the Masking layer.

To learn more about masking and how to write masking-enabled layers, please check out the guide ["understanding padding and masking".](https://www.tensorflow.org/guide/keras/masking_and_padding/)
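
What a generated mask looks like can be inspected with the Embedding layer's compute_mask() method. A sketch with two padded sequences of token ids, where 0 is the padding value (the ids themselves are arbitrary):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

embedding = keras.layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)

# Two sequences padded with zeros to length 4.
token_ids = tf.constant([[3, 7, 0, 0], [5, 0, 0, 0]])

mask = np.array(embedding.compute_mask(token_ids))
print(mask)
```

Each `False` entry marks a padded timestep that a mask-aware downstream layer, such as an RNN layer, is expected to skip.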

## The Model class

In general, you will use the Layer class to define inner computation blocks, and will use the Model class to define the outer model: the object you will train.

For instance, in a ResNet50 model, you would have several ResNet blocks subclassing Layer, and a single Model encompassing the entire ResNet50 network.



The Model class has the same API as Layer, with the following differences:

1. It exposes built-in training, evaluation, and prediction loops (model.fit(), model.evaluate(), model.predict()).
2. It exposes the list of its inner layers, via the model.layers property.
3. It exposes saving and serialization APIs (save(), save_weights()...)


Effectively, the Layer class corresponds to what we refer to in the literature as a "layer" (as in "convolution layer" or "recurrent layer") or as a "block" (as in "ResNet block" or "Inception block").

Meanwhile, the Model class corresponds to what is referred to in the literature as a "model" (as in "deep learning model") or as a "network" (as in "deep neural network").

So if you're wondering, "should I use the Layer class or the Model class?", ask yourself: will I need to call fit() on it? Will I need to call save() on it? If so, go with Model. If not (either because your class is just a block in a bigger system, or because you are writing training and saving code yourself), use Layer.

For instance, we could take a mini-ResNet built from ResNet blocks (a ResNetBlock layer is assumed to exist; it is not defined in this notebook), and use it to build a Model that we could train with fit(), and that we could save with save():

In [None]:
# `ResNetBlock`, `dataset` and `filepath` are placeholders that are
# not defined in this notebook.
class ResNet(tf.keras.Model):

    def __init__(self, num_classes=1000):
        super(ResNet, self).__init__()
        self.block_1 = ResNetBlock()
        self.block_2 = ResNetBlock()
        self.global_pool = keras.layers.GlobalAveragePooling2D()
        self.classifier = keras.layers.Dense(num_classes)

    def call(self, inputs):
        x = self.block_1(inputs)
        x = self.block_2(x)
        x = self.global_pool(x)
        return self.classifier(x)


resnet = ResNet()
dataset = ...
# Note: compile() must be called on the model before fit().
resnet.fit(dataset, epochs=10)
resnet.save(filepath)
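
Since the cell above relies on placeholders, here is a smaller sketch of the same pattern that can actually be run. The `Block` and `TinyNet` classes, the layer sizes, and the random data are all assumptions made for illustration; `Block` merely stands in for a real building block such as a ResNet block.

```python
import numpy as np
from tensorflow import keras


class Block(keras.layers.Layer):
    """Stand-in for a real building block (e.g. a ResNet block)."""

    def __init__(self, units):
        super().__init__()
        self.dense = keras.layers.Dense(units, activation="relu")

    def call(self, inputs):
        return self.dense(inputs)


class TinyNet(keras.Model):
    def __init__(self, num_classes=3):
        super().__init__()
        self.block_1 = Block(16)
        self.block_2 = Block(16)
        self.classifier = keras.layers.Dense(num_classes)

    def call(self, inputs):
        x = self.block_1(inputs)
        x = self.block_2(x)
        return self.classifier(x)


model = TinyNet()
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Random toy data: 32 samples, 8 features, 3 classes.
x = np.random.random((32, 8)).astype("float32")
y = np.random.randint(0, 3, size=(32,))
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0)

print([layer.name for layer in model.layers])
print(history.history["loss"])
```

The blocks are plain Layer subclasses, and only the outermost object is a Model, because it is the only one on which fit() is called.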

## Putting it all together: an end-to-end example

Here's what you've learned so far:

1. A Layer encapsulates a state (created in \__init__() or build()) and some computation (defined in call()).
2. Layers can be recursively nested to create new, bigger computation blocks.
3. Layers can create and track losses (typically regularization losses) as well as metrics, via add_loss() and add_metric().
4. The outer container, the thing you want to train, is a Model. A Model is just like a Layer, but with added training and serialization utilities.

Let's put all of these things together into an end-to-end example: we're going to implement a Variational AutoEncoder (VAE). We'll train it on MNIST digits.

Our VAE will be a subclass of Model, built as a nested composition of layers that subclass Layer. It will feature a regularization loss (KL divergence).

In [27]:
from tensorflow.keras import layers


class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon


class Encoder(layers.Layer):
    """Maps MNIST digits to a triplet (z_mean, z_log_var, z)."""

    def __init__(self, latent_dim=32, intermediate_dim=64, name="encoder", **kwargs):
        super(Encoder, self).__init__(name=name, **kwargs)
        self.dense_proj = layers.Dense(intermediate_dim, activation="relu")
        self.dense_mean = layers.Dense(latent_dim)
        self.dense_log_var = layers.Dense(latent_dim)
        self.sampling = Sampling()

    def call(self, inputs):
        x = self.dense_proj(inputs)
        z_mean = self.dense_mean(x)
        z_log_var = self.dense_log_var(x)
        z = self.sampling((z_mean, z_log_var))
        return z_mean, z_log_var, z


class Decoder(layers.Layer):
    """Converts z, the encoded digit vector, back into a readable digit."""

    def __init__(self, original_dim, intermediate_dim=64, name="decoder", **kwargs):
        super(Decoder, self).__init__(name=name, **kwargs)
        self.dense_proj = layers.Dense(intermediate_dim, activation="relu")
        self.dense_output = layers.Dense(original_dim, activation="sigmoid")

    def call(self, inputs):
        x = self.dense_proj(inputs)
        return self.dense_output(x)


class VariationalAutoEncoder(keras.Model):
    """Combines the encoder and decoder into an end-to-end model for training."""

    def __init__(
        self,
        original_dim,
        intermediate_dim=64,
        latent_dim=32,
        name="autoencoder",
        **kwargs
    ):
        super(VariationalAutoEncoder, self).__init__(name=name, **kwargs)
        self.original_dim = original_dim
        self.encoder = Encoder(latent_dim=latent_dim, intermediate_dim=intermediate_dim)
        self.decoder = Decoder(original_dim, intermediate_dim=intermediate_dim)

    def call(self, inputs):
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)
        # Add KL divergence regularization loss.
        kl_loss = -0.5 * tf.reduce_mean(
            z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1
        )
        self.add_loss(kl_loss)
        return reconstructed
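
The expression inside `kl_loss` is the closed-form KL divergence between a diagonal Gaussian N(z_mean, exp(z_log_var)) and the standard normal N(0, 1), evaluated per latent dimension. The NumPy sketch below checks two easy cases; the function name is illustrative.

```python
import numpy as np


def kl_to_standard_normal(z_mean, z_log_var):
    """Per-dimension KL( N(z_mean, exp(z_log_var)) || N(0, 1) )."""
    return -0.5 * (1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var))


# Identical distributions: the divergence is zero.
kl_same = float(kl_to_standard_normal(np.array(0.0), np.array(0.0)))

# Unit variance, mean shifted by 1: the divergence is 0.5 * mean**2.
kl_shifted = float(kl_to_standard_normal(np.array(1.0), np.array(0.0)))

print(kl_same, kl_shifted)
```

The model code above averages this quantity over the batch and the latent dimensions with `tf.reduce_mean` instead of summing over dimensions, which only changes the weight of the KL term relative to the reconstruction loss.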

Let's write a simple training loop on MNIST:

In [28]:
original_dim = 784
vae = VariationalAutoEncoder(original_dim, 64, 32)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
mse_loss_fn = tf.keras.losses.MeanSquaredError()

loss_metric = tf.keras.metrics.Mean()

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype("float32") / 255

train_dataset = tf.data.Dataset.from_tensor_slices(x_train)
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

epochs = 2

# Iterate over epochs.
for epoch in range(epochs):
    print("Start of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, x_batch_train in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            reconstructed = vae(x_batch_train)
            # Compute reconstruction loss
            loss = mse_loss_fn(x_batch_train, reconstructed)
            loss += sum(vae.losses)  # Add KLD regularization loss

        grads = tape.gradient(loss, vae.trainable_weights)
        optimizer.apply_gradients(zip(grads, vae.trainable_weights))

        loss_metric(loss)

        if step % 100 == 0:
            print("step %d: mean loss = %.4f" % (step, loss_metric.result()))

Start of epoch 0
step 0: mean loss = 0.3265
step 100: mean loss = 0.1250
step 200: mean loss = 0.0989
step 300: mean loss = 0.0890
step 400: mean loss = 0.0841
step 500: mean loss = 0.0808
step 600: mean loss = 0.0786
step 700: mean loss = 0.0770
step 800: mean loss = 0.0759
step 900: mean loss = 0.0749
Start of epoch 1
step 0: mean loss = 0.0746
step 100: mean loss = 0.0739
step 200: mean loss = 0.0735
step 300: mean loss = 0.0730
step 400: mean loss = 0.0727
step 500: mean loss = 0.0723
step 600: mean loss = 0.0720
step 700: mean loss = 0.0717
step 800: mean loss = 0.0714
step 900: mean loss = 0.0712


Note that since the VAE is subclassing Model, it features built-in training loops. So you could also have trained it like this:

In [29]:
vae = VariationalAutoEncoder(784, 64, 32)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

vae.compile(optimizer, loss=tf.keras.losses.MeanSquaredError())
vae.fit(x_train, x_train, epochs=2, batch_size=64)

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x21a41a14c70>

# Note
Just as when building a model with the Functional API, building a model by subclassing is like laying bricks.

First, we need bricks: layers, either built-in ones or custom Layer subclasses (in the example above, the bricks are the Sampling, Encoder, and Decoder layers).

Second, we need a wall made of bricks: the model. In a subclassed model, the bricks are assembled in \__init__().

Third, we define the forward pass in call(). Extra losses and metrics can be attached there with add_loss() and add_metric(); the optimizer and the main loss are supplied later, either through compile() or in a custom training loop.

Finally, we train, evaluate, or predict. A subclassed model supports fit() just like a Functional model, and we can also write the training loop ourselves (I think calling fit() directly is the better way: it is quicker and less error-prone).

## Beyond object-oriented development: the Functional API

Was this example too much object-oriented development for you? You can also build models using the Functional API. Importantly, choosing one style or another does not prevent you from leveraging components written in the other style: you can always mix-and-match.

For instance, the Functional API example below reuses the same Sampling layer we defined in the example above:

In [30]:
original_dim = 784
intermediate_dim = 64
latent_dim = 32

# Define encoder model.
original_inputs = tf.keras.Input(shape=(original_dim,), name="encoder_input")
x = layers.Dense(intermediate_dim, activation="relu")(original_inputs)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()((z_mean, z_log_var))
encoder = tf.keras.Model(inputs=original_inputs, outputs=z, name="encoder")

# Define decoder model.
latent_inputs = tf.keras.Input(shape=(latent_dim,), name="z_sampling")
x = layers.Dense(intermediate_dim, activation="relu")(latent_inputs)
outputs = layers.Dense(original_dim, activation="sigmoid")(x)
decoder = tf.keras.Model(inputs=latent_inputs, outputs=outputs, name="decoder")

# Define VAE model.
outputs = decoder(z)
vae = tf.keras.Model(inputs=original_inputs, outputs=outputs, name="vae")

# Add KL divergence regularization loss.
kl_loss = -0.5 * tf.reduce_mean(z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1)
vae.add_loss(kl_loss)

# Train.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
vae.compile(optimizer, loss=tf.keras.losses.MeanSquaredError())
vae.fit(x_train, x_train, epochs=3, batch_size=64)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x21a41ec2d90>

For more information, make sure to read [the Functional API guide](https://www.tensorflow.org/guide/keras/functional/).