##### Copyright 2019 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Train and evaluate with Keras


<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/guide/keras/train_and_evaluate"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/keras/train_and_evaluate.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/docs/blob/master/site/en/guide/keras/train_and_evaluate.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/site/en/guide/keras/train_and_evaluate.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>


This guide covers training, evaluating, and predicting (running inference) with models in TensorFlow 2.0, in two broad situations:

- When using the built-in APIs for training and validation (such as `model.fit()`, `model.evaluate()`, and `model.predict()`). This is covered in the section **"Using built-in training & evaluation loops"**.
- When writing custom loops from scratch using eager execution and the `GradientTape` object. This is covered in the section **"Writing your own training & evaluation loops from scratch"**.

In general, whether you use the built-in loops or write your own, model training and evaluation work in strictly the same way across every kind of Keras model: Sequential models, models built with the Functional API, and models written from scratch via model subclassing.

This guide doesn't cover distributed training.

## Setup

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import numpy as np

## Part I: Using built-in training & evaluation loops

When passing data to the built-in training loops of a model, you should use either **NumPy arrays** (if your data is small and fits in memory) or **tf.data Dataset** objects. In the next few paragraphs, we'll use the MNIST dataset as NumPy arrays to demonstrate how to use [optimizers](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E4%BC%98%E5%8C%96%E5%99%A8-optimizer), [losses](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E6%8D%9F%E5%A4%B1-loss), and [metrics](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E6%8C%87%E6%A0%87-metric).

### API overview: a first end-to-end example

Let's consider the following model (here we build it with the Functional API, but it could just as well be a Sequential model or a subclassed model):

In [2]:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

Here's what the typical end-to-end workflow looks like, consisting of training, validation on a [holdout set](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E7%BB%B4%E6%8C%81%E6%95%B0%E6%8D%AE-holdout-data) generated from the original training data, and finally evaluation on the test data.

Load a toy dataset for the sake of this example:

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

Specify the training configuration (optimizer, loss, metrics):

In [4]:
model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              # List of metrics to monitor
              metrics=['sparse_categorical_accuracy'])

Train the model by slicing the data into ["batches"](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E6%89%B9%E6%AC%A1-batch) of size ["batch_size"](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E6%89%B9%E6%AC%A1%E5%A4%A7%E5%B0%8F-batch-size), and repeatedly iterating over the entire dataset for a given number of ["epochs"](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E5%91%A8%E6%9C%9F-epoch):

In [5]:
print('# Fit model on training data')
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=3,
                    # We pass some validation for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))

print('\nhistory dict:', history.history)

# Fit model on training data
Train on 50000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

history dict: {'loss': [0.34615352845668795, 0.15926761425971986, 0.11734869359135627], 'sparse_categorical_accuracy': [0.9017, 0.95216, 0.9643], 'val_loss': [0.19667511559724807, 0.1273672592371702, 0.11695939368903636], 'val_sparse_categorical_accuracy': [0.9405, 0.9647, 0.9632]}


The returned `history` object holds a record of the loss values and metric values during training.
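As a minimal sketch of how that record can be used, the snippet below picks the epoch with the lowest validation loss. It assumes `history_dict` is a plain dictionary shaped like `history.history`; the values are copied from the output printed above, and the name `best_epoch` is introduced here only for illustration.

```python
# Values copied from the `history.history` output printed above.
history_dict = {
    'loss': [0.34615352845668795, 0.15926761425971986, 0.11734869359135627],
    'val_loss': [0.19667511559724807, 0.1273672592371702, 0.11695939368903636],
}

# Each list holds one value per epoch, so the list index is the 0-based epoch.
val_loss = history_dict['val_loss']
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])

print('best epoch (1-based):', best_epoch + 1)
print('validation loss at that epoch:', val_loss[best_epoch])
```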

In [6]:
# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)

# Generate predictions (raw logits, the output of the last layer)
# on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions: ', predictions)
print('predictions shape:', predictions.shape)


# Evaluate on test data
test loss, test acc: [0.11589125097990036, 0.9635]

# Generate predictions for 3 samples
predictions:  [[ -9.204995    -7.9397      -1.3675845   -1.2221596   -9.813595
   -4.2968235  -18.740984     8.768244    -2.6839263   -3.3358111 ]
 [ -5.959761    -2.8102214    6.0991287   -0.94537205 -15.433664
   -3.9262004   -3.8570764   -9.060004    -2.4071803  -16.32756   ]
 [ -7.9607277    4.9811935   -3.7635896   -4.96414     -3.8687143
   -4.852813    -5.8724465   -0.8015147   -2.8880491   -4.893906  ]]
predictions shape: (3, 10)
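Because the last `Dense` layer has no activation and the loss is configured with `from_logits=True`, the values above are raw logits rather than probabilities. As a minimal, framework-free sketch (the helper `softmax` below is written here for illustration and is not part of the guide), the first row can be turned into probabilities as follows. In TensorFlow the same transformation is available as `tf.nn.softmax`.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# First row of the `predictions` array printed above.
logits = [-9.204995, -7.9397, -1.3675845, -1.2221596, -9.813595,
          -4.2968235, -18.740984, 8.768244, -2.6839263, -3.3358111]

probs = softmax(logits)
# The predicted class is the index of the largest probability.
predicted_class = max(range(len(probs)), key=lambda i: probs[i])
print('predicted class:', predicted_class)
print('probabilities sum to:', round(sum(probs), 6))
```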


### Specifying a loss, metrics, and an optimizer

To train a model with `fit`, you need to specify a loss function, an optimizer, and, optionally, some metrics to monitor.

You pass these to the model as arguments to the `compile()` method:

In [7]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[keras.metrics.sparse_categorical_accuracy])

The `metrics` argument should be a list; your model can have any number of metrics.

If your model has multiple outputs, you can specify different losses and metrics for each output, and you can modulate the contribution of each output to the total loss of the model. You will find more details about this in the section **"Passing data to multi-input, multi-output models"**.

Note that if you're satisfied with the default settings, in many cases the optimizer, loss, and metrics can be specified via string identifiers as a shortcut:

In [8]:
model.compile(optimizer='rmsprop',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['sparse_categorical_accuracy'])

For later reuse, let's put our model definition and compile step in functions; we will call them several times across different examples in this guide.

In [9]:
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])
    return model

#### Many built-in optimizers, losses, and metrics are available

In general, you won't have to create your own losses, metrics, or optimizers from scratch, because what you need is likely already part of the Keras API:

Optimizers:
- `SGD()` (with or without momentum)
- `RMSprop()`
- `Adam()`
- etc.

Losses:
- `MeanSquaredError()`
- `KLDivergence()`
- `CosineSimilarity()`
- etc.

Metrics:
- `AUC()`
- `Precision()`
- `Recall()`
- etc.

#### Custom losses

There are two ways to provide custom losses with Keras. The first is to create a function that accepts the inputs `y_true` and `y_pred`. The following example shows a loss function that computes the mean absolute error between the real data and the predictions:

In [10]:
def basic_loss_function(y_true, y_pred):
    return tf.math.reduce_mean(tf.abs(y_true - y_pred))

model.compile(optimizer=keras.optimizers.Adam(),
              loss=basic_loss_function)

model.fit(x_train, y_train, batch_size=64, epochs=3)

Train on 50000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffb96271710>
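As a quick, framework-free check of the quantity `basic_loss_function` computes, the sketch below evaluates the mean absolute error on a few made-up numbers of matching shape. The helper `mean_absolute_error` and the sample values are written here purely for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all elements."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)

y_true = [1.0, 2.0, 3.0]  # made-up targets
y_pred = [1.5, 2.0, 1.0]  # made-up predictions
mae = mean_absolute_error(y_true, y_pred)
print('mean absolute error:', mae)
```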

If you need a loss function that takes parameters besides `y_true` and `y_pred`, you can subclass the `tf.keras.losses.Loss` class and implement the following two methods:

* `__init__(self)`: accepts the parameters to use during the call of your loss function.
* `call(self, y_true, y_pred)`: uses the targets (`y_true`) and the model predictions (`y_pred`) to compute the model's loss.

Parameters passed into `__init__()` can be used during `call()` when calculating the loss.

The following example shows how to implement a `WeightedBinaryCrossEntropy` loss that computes a binary cross-entropy loss in which the loss for the positive class, or for the function as a whole, can be scaled by a scalar.

In [11]:
class WeightedBinaryCrossEntropy(keras.losses.Loss):
    """
    Args:
      pos_weight: Scalar to affect the positive labels of the loss function.
      weight: Scalar to affect the entirety of the loss function.
      from_logits: Whether to compute loss from logits or the probability.
      reduction: Type of tf.keras.losses.Reduction to apply to loss.
      name: Name of the loss function.
    """
    def __init__(self, pos_weight, weight, from_logits=False,
                 reduction=keras.losses.Reduction.AUTO,
                 name='weighted_binary_crossentropy'):
        super().__init__(reduction=reduction, name=name)
        self.pos_weight = pos_weight
        self.weight = weight
        self.from_logits = from_logits

    def call(self, y_true, y_pred):
        ce = tf.losses.binary_crossentropy(
            y_true, y_pred, from_logits=self.from_logits)[:,None]
        ce = self.weight * (ce*(1-y_true) + self.pos_weight*ce*(y_true))
        return ce
    
help(tf.losses.binary_crossentropy)

Help on function binary_crossentropy in module tensorflow.python.keras.losses:

binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)
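To make the weighting in `call()` concrete, here is a minimal pure-Python sketch of the same arithmetic for a single sample, assuming probabilities as input (`from_logits=False`). The helper names `mean_bce` and `weighted_bce` are defined here for illustration only, and the sketch leaves out the clipping that a real implementation applies to avoid `log(0)`.

```python
import math

def mean_bce(y_true, y_pred):
    """Binary cross-entropy averaged over the last axis (one sample)."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def weighted_bce(y_true, y_pred, pos_weight, weight):
    """Mirror of the arithmetic in `WeightedBinaryCrossEntropy.call`."""
    ce = mean_bce(y_true, y_pred)  # one scalar per sample
    # Negative labels keep `ce`; positive labels are scaled by `pos_weight`.
    return [weight * (ce * (1 - t) + pos_weight * ce * t) for t in y_true]

y_true = [0.0, 1.0]  # made-up one-hot style targets
y_pred = [0.2, 0.9]  # made-up predicted probabilities
ce = mean_bce(y_true, y_pred)
per_class = weighted_bce(y_true, y_pred, pos_weight=0.5, weight=2)
print('mean cross-entropy:', ce)
print('weighted per-class loss:', per_class)
```

With `weight=2` and `pos_weight=0.5`, the entry for the negative label is doubled while the entry for the positive label ends up equal to the unweighted cross-entropy.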



This is a binary loss, but the dataset has 10 classes, so we apply the loss as if the model were making an independent binary prediction for each class. To do that, start by creating [one-hot vectors](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E7%8B%AC%E7%83%AD%E7%BC%96%E7%A0%81-one-hot-encoding) from the class indices:

In [12]:
one_hot_y_train = tf.one_hot(y_train.astype(np.int32), depth=10)
help(tf.one_hot)

Help on function one_hot in module tensorflow.python.ops.array_ops:

one_hot(indices, depth, on_value=None, off_value=None, axis=None, dtype=None, name=None)
    Returns a one-hot tensor.
    
    The locations represented by indices in `indices` take value `on_value`,
    while all other locations take value `off_value`.
    
    `on_value` and `off_value` must have matching data types. If `dtype` is also
    provided, they must be the same data type as specified by `dtype`.
    
    If `on_value` is not provided, it will default to the value `1` with type
    `dtype`
    
    If `off_value` is not provided, it will default to the value `0` with type
    `dtype`
    
    If the input `indices` is rank `N`, the output will have rank `N+1`. The
    new axis is created at dimension `axis` (default: the new axis is appended
    at the end).
    
    If `indices` is a scalar the output shape will be a vector of length `depth`
    
    If `indices` is a vector of length `features`, the outp
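As a minimal, framework-free sketch of what one-hot encoding produces for a vector of class indices (the helper `one_hot` and the sample labels below are written for illustration; the notebook itself uses `tf.one_hot`):

```python
def one_hot(indices, depth):
    """Return one row per index, with 1.0 at that index and 0.0 elsewhere."""
    return [[1.0 if i == idx else 0.0 for i in range(depth)]
            for idx in indices]

labels = [3, 0, 9]  # hypothetical class indices for three samples
rows = one_hot(labels, depth=10)
for label, row in zip(labels, rows):
    print(label, row)
```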

Now use those one-hot vectors and the custom loss to train a model:

In [13]:
model = get_uncompiled_model()

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=WeightedBinaryCrossEntropy(
        pos_weight=0.5, weight = 2, from_logits=True)
)

model.fit(x_train, one_hot_y_train, batch_size=64, epochs=5)

Train on 50000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7ffb96288710>

#### Custom metrics

If you need a metric that isn't part of the API, you can easily create a custom metric by subclassing the `Metric` class. You will need to implement 4 methods:

- `__init__(self)`, in which you create the state variables for your metric.
- `update_state(self, y_true, y_pred, sample_weight=None)`, which uses the targets `y_true` and the model predictions `y_pred` to update the state variables.
- `result(self)`, which uses the state variables to compute the final result.
- `reset_states(self)`, which reinitializes the state of the metric.

State update and result computation are kept separate (in `update_state()` and `result()`, respectively) because in some cases computing the result may be very expensive and would only be done periodically.
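The split between accumulating state and computing a result can be sketched without any framework. The class below is a hypothetical plain-Python stand-in (it does not subclass `keras.metrics.Metric`, and the name `RunningAccuracy` is made up for this sketch) that follows the same four-method pattern:

```python
class RunningAccuracy:
    """Plain-Python illustration of the four-method metric pattern."""

    def __init__(self):
        # State variables, accumulated across batches.
        self.correct = 0
        self.total = 0

    def update_state(self, y_true, y_pred):
        # Cheap bookkeeping, done once per batch.
        self.correct += sum(1 for t, p in zip(y_true, y_pred) if t == p)
        self.total += len(y_true)

    def result(self):
        # Final computation, done only when a value is needed.
        return self.correct / self.total if self.total else 0.0

    def reset_states(self):
        # Called between epochs so that each epoch starts from zero.
        self.correct = 0
        self.total = 0

metric = RunningAccuracy()
metric.update_state([1, 0, 2], [1, 1, 2])  # batch 1: 2 of 3 correct
metric.update_state([0, 0], [0, 0])        # batch 2: 2 of 2 correct
print('accuracy so far:', metric.result())
metric.reset_states()
print('after reset:', metric.result())
```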

Here's a simple example showing how to implement a `CategoricalTruePositives` metric that counts how many samples were correctly classified as belonging to a given class:

In [14]:
class CategoricalTruePositives(keras.metrics.Metric):

    def __init__(self, name='categorical_true_positives', **kwargs):
        super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
        self.true_positives = self.add_weight(name='tp', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
        values = tf.cast(y_true, 'int32') == tf.cast(y_pred, 'int32')
        values = tf.cast(values, 'float32')
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, 'float32')
            values = tf.multiply(values, sample_weight)
        self.true_positives.assign_add(tf.reduce_sum(values))

    def result(self):
        return self.true_positives

    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.true_positives.assign(0.)

In [15]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[CategoricalTruePositives()])
model.fit(x_train, y_train,
          batch_size=64,
          epochs=3)

Train on 50000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffba4203e90>

#### Handling losses and metrics that don't fit the standard signature

The overwhelming majority of losses and metrics can be computed from `y_true` and `y_pred`, where `y_pred` is an output of your model. But not all of them. For instance, a regularization loss may only require the activation of a layer (there are no targets in this case), and this activation may not be a model output.

In such cases, you can call `self.add_loss(loss_value)` from inside the `call` method of a custom layer. Here's a simple example that adds activity regularization (note that activity regularization is built into all Keras layers; this layer only serves as a concrete example):

In [16]:
class ActivityRegularizationLayer(layers.Layer):

    def call(self, inputs):
        self.add_loss(tf.reduce_sum(inputs) * 0.1)
        return inputs  # Pass-through layer.

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The displayed loss will be much higher than before
# due to the regularization component.
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)

Train on 50000 samples


<tensorflow.python.keras.callbacks.History at 0x7ffb97108710>
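A minimal sketch of why the reported loss grows: values registered through `add_loss` are added to the loss passed to `compile()`. The numbers and the names `data_loss` and `activations` below are made up for illustration and do not come from the training run above.

```python
# Hypothetical ReLU activations for a batch of 2 samples and 3 units.
activations = [[0.0, 1.5, 2.0],
               [0.5, 0.0, 1.0]]

# Same formula as in `ActivityRegularizationLayer.call`: 0.1 * sum(inputs).
regularization_loss = 0.1 * sum(sum(row) for row in activations)

data_loss = 0.35  # hypothetical cross-entropy value for the same batch
total_loss = data_loss + regularization_loss

print('regularization term:', regularization_loss)
print('total loss reported during training:', total_loss)
```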

You can do the same for logging metric values:

In [17]:
class MetricLoggingLayer(layers.Layer):

    def call(self, inputs):
        # The `aggregation` argument defines
        # how to aggregate the per-batch values
        # over each epoch:
        # in this case we simply average them.
        self.add_metric(keras.backend.std(inputs),
                        name='std_of_activation',
                        aggregation='mean')
        return inputs  # Pass-through layer.


inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert std logging as a layer.
x = MetricLoggingLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)

Train on 50000 samples


<tensorflow.python.keras.callbacks.History at 0x7ffb9752e0d0>

In the [Functional API](functional.ipynb), you can also call `model.add_loss(loss_tensor)` or `model.add_metric(metric_tensor, name, aggregation)`.

Here's a simple example:

In [18]:
inputs = keras.Input(shape=(784,), name='digits')
x1 = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = layers.Dense(10, name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

model.add_loss(tf.reduce_sum(x1) * 0.1)

model.add_metric(keras.backend.std(x1),
                 name='std_of_activation',
                 aggregation='mean')

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)

Train on 50000 samples


<tensorflow.python.keras.callbacks.History at 0x7ffb97c9e690>

#### Automatically setting apart a validation holdout set

In the first end-to-end example you saw, we used the `validation_data` argument to pass a tuple of NumPy arrays `(x_val, y_val)` to the model for evaluating a validation loss and validation metrics at the end of each epoch.

Here's another option: the `validation_split` argument lets you automatically reserve part of your training data for validation. Its value is the fraction of the data to be reserved for validation, so it should be a number greater than 0 and less than 1. For instance, `validation_split=0.2` means "use 20% of the data for validation", and `validation_split=0.6` means "use 60% of the data for validation".

The validation set is formed by *taking the last x% of the samples of the arrays received by the `fit` call, before any shuffling*.

You can only use `validation_split` when training with NumPy data.
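The "last x%, before any shuffling" rule can be sketched on a plain Python list. The helper `split_last_fraction` is hypothetical and only illustrates the documented behaviour; in particular, the rounding it uses is a simplification of this sketch, not necessarily what Keras does internally.

```python
def split_last_fraction(samples, validation_split):
    """Hold out the last `validation_split` fraction, without shuffling."""
    num_val = round(len(samples) * validation_split)  # sketch-level rounding
    split_at = len(samples) - num_val
    return samples[:split_at], samples[split_at:]

samples = list(range(10))  # stand-in for an ordered training array
train_part, val_part = split_last_fraction(samples, validation_split=0.2)
print('train:', train_part)
print('validation:', val_part)
```

Because the tail of the array is taken as-is, data that is sorted (for example by class) should be shuffled before calling `fit` with `validation_split`.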

In [19]:
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=1, steps_per_epoch=1)

Train on 40000 samples, validate on 10000 samples
   64/40000 [..............................] - ETA: 5:06 - loss: 2.3046 - sparse_categorical_accuracy: 0.1719 - val_loss: 2.1804 - val_sparse_categorical_accuracy: 0.2044

<tensorflow.python.keras.callbacks.History at 0x7ffb98399b10>

### Training & evaluation from tf.data Datasets

In the past few paragraphs, you've seen how to handle losses, metrics, and optimizers, and how to use the `validation_data` and `validation_split` arguments in `fit` when your data is passed as NumPy arrays.

Let's now take a look at the case where your data comes in the form of a tf.data Dataset.

The tf.data API is a set of utilities in TensorFlow 2.0 for loading and preprocessing data in a way that's fast and scalable.

For a complete guide to creating Datasets, see [the tf.data documentation](https://www.tensorflow.org/guide/data).

You can pass a Dataset instance directly to the methods `fit()`, `evaluate()`, and `predict()`:

In [20]:
model = get_compiled_model()

# First, let's create a training Dataset instance.
# For the sake of our example, we'll use the same MNIST data as before.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Now we get a test dataset.
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(64)

# Since the dataset already takes care of batching,
# we don't pass a `batch_size` argument.
model.fit(train_dataset, epochs=3)

# You can also evaluate or predict on a dataset.
print('\n# Evaluate')
result = model.evaluate(test_dataset)
dict(zip(model.metrics_names, result))

print(type(train_dataset), type(test_dataset))

Train for 782 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3

# Evaluate
<class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'> <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>


Note that the Dataset is reset at the end of each epoch, so it can be reused for the next epoch.

If you want to run training only on a specific number of batches from this Dataset, you can pass the `steps_per_epoch` argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.

If you do this, the dataset is not reset at the end of each epoch; instead, we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely looping dataset).
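The "not reset, keep drawing the next batches" behaviour can be sketched with an ordinary Python iterator. The helper `run_epochs` below is hypothetical and stands in for the training loop; the list of integers stands in for a dataset of five batches.

```python
import itertools

def run_epochs(batches, epochs, steps_per_epoch):
    """Draw `steps_per_epoch` batches per epoch from ONE shared iterator."""
    iterator = iter(batches)  # created once, never reset between epochs
    seen = []
    for _ in range(epochs):
        seen.append(list(itertools.islice(iterator, steps_per_epoch)))
    return seen

batches = [0, 1, 2, 3, 4]  # stand-in for a finite dataset of 5 batches

# A finite source runs dry: the third epoch only gets one batch.
finite_run = run_epochs(batches, epochs=3, steps_per_epoch=2)
print(finite_run)

# An endlessly repeating source never runs out.
looping_run = run_epochs(itertools.cycle(batches), epochs=3, steps_per_epoch=2)
print(looping_run)
```

With tf.data, an endlessly repeating dataset is typically obtained by calling `repeat()` on the dataset.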

In [21]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Use only 100 batches per epoch (that's 64 * 100 samples)
model.fit(train_dataset.take(100), epochs=3)

help(model.fit)
help(train_dataset.take)

Train for 100 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3
Help on method fit in module tensorflow.python.keras.engine.training:

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False, **kwargs) method of tensorflow.python.keras.engine.training.Model instance
    Trains the model for a fixed number of epochs (iterations on a dataset).
    
    Arguments:
        x: Input data. It could be:
          - A Numpy array (or array-like), or a list of arrays
            (in case the model has multiple inputs).
          - A TensorFlow tensor, or a list of tensors
            (in case the model has multiple inputs).
          - A dict mapping input names to the corresponding array/tensors,
            if the model has named inputs.
          - A `tf.dat


Help on method take in module tensorflow.python.data.ops.dataset_ops:

take(count) method of tensorflow.python.data.ops.dataset_ops.BatchDataset instance
    Creates a `Dataset` with at most `count` elements from this dataset.
    
    >>> dataset = tf.data.Dataset.range(10)
    >>> dataset = dataset.take(3)
    >>> list(dataset.as_numpy_iterator())
    [0, 1, 2]
    
    Args:
      count: A `tf.int64` scalar `tf.Tensor`, representing the number of
        elements of this dataset that should be taken to form the new dataset.
        If `count` is -1, or if `count` is greater than the size of this
        dataset, the new dataset will contain all elements of this dataset.
    
    Returns:
      Dataset: A `Dataset`.



#### Using a validation dataset

You can pass a Dataset instance as the `validation_data` argument in `fit`:

In [22]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3, validation_data=val_dataset)

Train for 782 steps, validate for 157 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffb7e806d50>

At the end of each epoch, the model will iterate over the validation Dataset and compute the validation loss and validation metrics.
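The step counts in the log above follow from the batch size: the last, partial batch still counts as a step. A minimal check of that arithmetic, assuming the 50,000/10,000 split made earlier in this guide and a batch size of 64:

```python
import math

batch_size = 64
train_samples = 50000  # training samples left after the holdout split
val_samples = 10000    # samples reserved for validation

# A trailing partial batch is still one step, hence the ceiling.
train_steps = math.ceil(train_samples / batch_size)
val_steps = math.ceil(val_samples / batch_size)
print('train steps per epoch:', train_steps)
print('validation steps per epoch:', val_steps)
```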

If you want to run validation only on a specific number of batches from this Dataset, you can pass the `validation_steps` argument, which specifies how many validation steps the model should run with the validation Dataset before interrupting validation and moving on to the next epoch:

In [23]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3,
          # Only run validation using the first 10 batches of the dataset
          # using the `validation_steps` argument
          validation_data=val_dataset, validation_steps=10)

Train for 782 steps, validate for 10 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffb97520dd0>


Note that the Dataset is reset at the end of each epoch, so it can be reused for the next epoch.

If you want to run training on only a specific number of batches from this Dataset, pass the `steps_per_epoch` argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.

If you do this, the dataset is not reset at the end of each epoch; training simply keeps drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely looping dataset).
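
To make the "runs out of data" behavior concrete, here is a pure-Python sketch that uses plain iterators as a stand-in for a Dataset. The function `run_epochs` and the numbers are hypothetical; in `tf.data`, the infinitely looping case corresponds to calling `dataset.repeat()`:

```python
import itertools

num_batches = 782      # batches in one pass over the data (assumed)
steps_per_epoch = 300  # hypothetical value


def run_epochs(batch_iter, steps_per_epoch, epochs):
    """Draw steps_per_epoch batches per epoch from ONE shared iterator.

    Returns how many epochs completed before the iterator was exhausted.
    """
    completed = 0
    for _ in range(epochs):
        drawn = list(itertools.islice(batch_iter, steps_per_epoch))
        if len(drawn) < steps_per_epoch:
            break  # the data ran out in the middle of this epoch
        completed += 1
    return completed


finite_result = run_epochs(iter(range(num_batches)), steps_per_epoch, epochs=5)
looping_result = run_epochs(itertools.cycle(range(num_batches)), steps_per_epoch, epochs=5)
print(finite_result, looping_result)  # 2 5
```

The finite iterator supplies two full epochs (600 batches) and then only 182 of the 300 batches needed for the third, while the looping iterator never runs dry.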

Note that the validation Dataset is reset after each use, so you always evaluate on the same samples from epoch to epoch.

The `validation_split` argument (which generates a holdout set from the training data) is not supported when training from Dataset objects, because this feature requires the ability to index the samples of the dataset, which is not possible in general with the Dataset API.
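
One possible workaround, sketched here under the assumption that the data still fits in Numpy arrays, is to slice off the holdout set yourself before building the Datasets. The array names and sizes are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical stand-in data: 1,000 samples with 784 features each.
x = np.random.random_sample(size=(1000, 784))
y = np.random.randint(0, 10, size=(1000,))

val_fraction = 0.2
split_at = len(x) - int(len(x) * val_fraction)

# Keep the first 80% for training and hold out the last 20%.
x_train_part, y_train_part = x[:split_at], y[:split_at]
x_holdout, y_holdout = x[split_at:], y[split_at:]
print(x_train_part.shape, x_holdout.shape)  # (800, 784) (200, 784)
```

Each pair of arrays can then be wrapped separately with `tf.data.Dataset.from_tensor_slices` and passed to `fit` as the training data and `validation_data`.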

### Other input formats supported

Besides Numpy arrays and TensorFlow Datasets, you can train a Keras model using Pandas dataframes, or Python generators that yield batches.

In general, we recommend Numpy input data if your data is small and fits in memory, and Datasets otherwise.
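
As an illustration of the generator option, here is a minimal sketch of a Python generator that yields `(inputs, targets)` batches. The function name `batch_generator` and the random data are hypothetical:

```python
import numpy as np


def batch_generator(x, y, batch_size):
    """Yield (inputs, targets) batches forever, reshuffling on every pass."""
    num_samples = len(x)
    while True:
        order = np.random.permutation(num_samples)
        for start in range(0, num_samples, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]


x = np.random.random_sample(size=(100, 784))
y = np.random.randint(0, 10, size=(100,))

gen = batch_generator(x, y, batch_size=32)
x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)  # (32, 784) (32,)
```

Because this generator never stops, a training call that consumes it needs to be told how many batches make up one epoch, for example through `steps_per_epoch`.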

### Using sample weighting and class weighting

Besides input data and target data, you can pass sample weights or class weights to a model when using `fit`:

- When training from Numpy data: via the `sample_weight` and `class_weight` arguments.
- When training from Datasets: by having the Dataset return a tuple `(input_batch, target_batch, sample_weight_batch)`.

A "sample weights" array is an array of numbers that specify how much weight each sample in a batch should have when computing the total loss. It is commonly used in imbalanced classification problems (the idea being to give more weight to rarely seen classes). When the weights are only ones and zeros, the array acts as a *mask* for the loss function, entirely discarding the contribution of certain samples to the total loss.
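
The masking effect can be seen with a few made-up numbers. This sketch only shows the element-wise weighting; how the weighted values are then reduced to a single number depends on the loss configuration and is left out:

```python
import numpy as np

# Hypothetical per-sample loss values for a batch of four samples.
per_sample_loss = np.array([0.5, 2.0, 1.0, 4.0])

# Weights of 0 drop a sample entirely; weights of 1 keep it unchanged.
mask = np.array([1.0, 0.0, 1.0, 0.0])

weighted = per_sample_loss * mask
print(weighted.tolist())      # [0.5, 0.0, 1.0, 0.0]
print(float(weighted.sum()))  # 1.5
```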

A "class weights" dict is a more specific instance of the same concept: it maps class indices to the sample weight that should be used for samples belonging to that class. For instance, if class "0" has half as many samples as class "1" in your data, you could use `class_weight={0: 1., 1: 0.5}`.
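
The relationship between the two can be sketched by expanding a class-weight dict into a per-sample array by hand. The labels below are hypothetical, chosen so that class 0 has half as many samples as class 1:

```python
import numpy as np

# Hypothetical labels: class 0 appears twice, class 1 appears four times.
labels = np.array([0, 1, 1, 0, 1, 1])
class_weight = {0: 1.0, 1: 0.5}

# Look up the weight of each sample's class.
sample_weight = np.array([class_weight[int(label)] for label in labels])
print(sample_weight.tolist())  # [1.0, 0.5, 0.5, 1.0, 0.5, 0.5]

# With these weights, both classes contribute the same total weight.
total_per_class = {c: float(sample_weight[labels == c].sum()) for c in class_weight}
print(total_per_class)  # {0: 2.0, 1: 2.0}
```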

Here's a Numpy example where we use class weights or sample weights to give more importance to the correct classification of class #5 (the digit "5" in the MNIST dataset).

In [24]:
import numpy as np

class_weight = {0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,
                # Set weight "2" for class "5",
                # making this class 2x more important
                5: 2.,
                6: 1., 7: 1., 8: 1., 9: 1.}
print('Fit with class weight')
model.fit(x_train, y_train,
          class_weight=class_weight,
          batch_size=64,
          epochs=4)

Fit with class weight
Train on 50000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0x7ffb7bc1c910>

In [25]:
# Here's the same example using `sample_weight` instead:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
print('\nFit with sample weight')

model = get_compiled_model()
model.fit(x_train, y_train,
          sample_weight=sample_weight,
          batch_size=64,
          epochs=4)


Fit with sample weight
Train on 50000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0x7ffb98c03e90>

Here's a matching Dataset example:

In [26]:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train, sample_weight))

# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model = get_compiled_model()
model.fit(train_dataset, epochs=3)

Train for 782 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffb7e917810>

### Passing data to multi-input, multi-output models

In the previous examples, we considered a model with a single input (a tensor of shape `(784,)`) and a single output (a prediction tensor of shape `(10,)`). But what about models that have multiple inputs or outputs?

Consider the following model, which has an image input of shape `(32, 32, 3)` (that's `(height, width, channels)`) and a timeseries input of shape `(None, 10)` (that's `(timesteps, features)`). The model computes two outputs from the combination of these inputs: a "score" (of shape `(1,)`) and a probability distribution over five classes (of shape `(5,)`).


In [27]:
from tensorflow import keras
from tensorflow.keras import layers

image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)

x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)

x = layers.concatenate([x1, x2])

score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, name='class_output')(x)

model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])

Let's plot this model, so you can clearly see what we're doing here (note that the shapes shown in the plot are batch shapes, rather than per-sample shapes).

In [28]:
keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)

Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.


At compilation time, we can specify different losses for different outputs by passing the loss functions as a list:

In [29]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy(from_logits=True)])

If we passed only a single loss function to the model, the same loss function would be applied to every output, which is not appropriate here.

Likewise for metrics:

In [30]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy(from_logits=True)],
    metrics=[[keras.metrics.MeanAbsolutePercentageError(),
              keras.metrics.MeanAbsoluteError()],
             [keras.metrics.CategoricalAccuracy()]])

Since we gave names to our output layers, we could also specify per-output losses and metrics via a dict:

In [31]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]})

We recommend using explicit names and dicts if you have more than 2 outputs.

You can give different weights to different output-specific losses using the `loss_weights` argument. For instance, you might want to privilege the "score" loss in our example by giving it 2x the importance of the class loss:

In [32]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]},
    loss_weights={'score_output': 2., 'class_output': 1.})
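
With `loss_weights`, the quantity being minimized is the weighted sum of the per-output losses. A small worked example with made-up loss values (the numbers are hypothetical, not taken from a real run):

```python
# Hypothetical per-output loss values for one batch.
output_losses = {'score_output': 0.25, 'class_output': 1.5}
loss_weights = {'score_output': 2.0, 'class_output': 1.0}

# Weighted sum over outputs: 2.0 * 0.25 + 1.0 * 1.5
total_loss = sum(loss_weights[name] * value
                 for name, value in output_losses.items())
print(total_loss)  # 2.0
```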

You can also choose not to compute a loss for certain outputs, if those outputs are meant for prediction but not for training:

In [33]:
# List loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[None, keras.losses.CategoricalCrossentropy(from_logits=True)])

# Or dict loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'class_output':keras.losses.CategoricalCrossentropy(from_logits=True)})



Passing data to a multi-input or multi-output model in `fit` works in a similar way to specifying a loss function in `compile`: you can pass *lists of Numpy arrays (with a 1:1 mapping to the outputs that received a loss function)* or *dicts mapping input and output names to Numpy arrays of training data*.

In [34]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy(from_logits=True)])

# Generate dummy Numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))

# Fit on lists
model.fit([img_data, ts_data], [score_targets, class_targets],
          batch_size=32,
          epochs=3)

# Alternatively, fit on dicts
model.fit({'img_input': img_data, 'ts_input': ts_data},
          {'score_output': score_targets, 'class_output': class_targets},
          batch_size=32,
          epochs=3)

Train on 100 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Train on 100 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffb7f3305d0>

Here's the Dataset use case: as with Numpy arrays, the Dataset should return a tuple of dicts.

In [35]:
train_dataset = tf.data.Dataset.from_tensor_slices(
    ({'img_input': img_data, 'ts_input': ts_data},
     {'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model.fit(train_dataset, epochs=3)

Train for 2 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffb7e1dc290>

### Using callbacks

Callbacks in Keras are objects that are called at different points during training (at the start of an epoch, at the end of a batch, at the end of an epoch, etc.) and which can be used to implement behaviors such as:

- Doing validation at different points during training (beyond the built-in per-epoch validation)
- [Checkpointing](https://developers.google.cn/machine-learning/glossary/?hl=zh-CN#%E6%A3%80%E6%9F%A5%E7%82%B9-checkpoint) the model at regular intervals or when it exceeds a certain accuracy threshold
- Changing the learning rate of the model when training seems to be plateauing
- Fine-tuning the top layers when training seems to be plateauing
- Sending email or instant message notifications when training ends or when a certain performance threshold is exceeded
- Etc.

Callbacks can be passed as a list to your call to `fit`:

In [36]:
model = get_compiled_model()

callbacks = [
    keras.callbacks.EarlyStopping(
        # Stop training when `val_loss` is no longer improving
        monitor='val_loss',
        # "no longer improving" being defined as "no better than 1e-2 less"
        # (i.e. `val_loss` is not at least 1e-2 below the best value seen so far)
        min_delta=1e-2,
        # "no longer improving" being further defined as "for at least 2 epochs"
        patience=2,
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=20,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 00006: early stopping


<tensorflow.python.keras.callbacks.History at 0x7ffb7ff35e10>
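
To see how `min_delta` and `patience` interact, here is a simplified pure-Python sketch of the stopping rule. It is an illustration of the idea, not the Keras implementation; the function name and the loss history are hypothetical:

```python
def early_stopping_epoch(val_losses, min_delta=1e-2, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improved by more than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # Not enough improvement: count this epoch against the patience budget.
            wait += 1
            if wait >= patience:
                return epoch
    return None


history = [0.50, 0.40, 0.35, 0.345, 0.348, 0.30]
stop_epoch = early_stopping_epoch(history)
print(stop_epoch)  # 5
```

Epochs 4 and 5 both fail to beat the best value (0.35) by at least 0.01, so training stops at epoch 5 and never reaches the better value in epoch 6. A larger `patience` would trade extra epochs for a chance to recover from such short plateaus.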

#### Many built-in callbacks are available

- `ModelCheckpoint`: periodically saves the model.
- `EarlyStopping`: stops training when training is no longer improving the validation metrics.
- `TensorBoard`: periodically writes model logs that can be visualized in TensorBoard (more details in the section "Visualization").
- `CSVLogger`: streams loss and metrics data to a CSV file.
- etc.



#### Writing your own callback

You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.

Here's a simple example that saves a list of per-batch loss values during training:

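```python
from tensorflow import keras


class LossHistory(keras.callbacks.Callback):
    """Records the loss reported at the end of every training batch."""

    def on_train_begin(self, logs=None):
        # Start each training run with an empty history.
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # `logs` holds the values reported for this batch; it may be None.
        self.losses.append((logs or {}).get('loss'))
```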

### Checkpointing models

When you're training a model on relatively large datasets, it's crucial to save checkpoints of your model at frequent intervals.

The easiest way to achieve this is with the `ModelCheckpoint` callback:

In [None]:
model = get_compiled_model()

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath='mymodel_{epoch}',
        # Path where to save the model
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        save_best_only=True,
        monitor='val_loss',
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=3,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)

You can also write your own callback for saving and restoring models.

For a complete guide on serialization and saving, see [Guide to Saving and Serializing Models](./save_and_serialize.ipynb).

### Using learning rate schedules

A common pattern when training deep learning models is to gradually reduce the learning rate as training progresses. This is generally known as "learning rate decay".

The learning rate decay schedule can be static (fixed in advance, as a function of the current epoch or the current batch index), or dynamic (responding to the current behavior of the model, in particular the validation loss).

#### Passing a schedule to an optimizer

You can easily use a static learning rate decay schedule by passing a schedule object as the `learning_rate` argument in your optimizer:


In [38]:
initial_learning_rate = 0.1
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)

Several built-in schedules are available: `ExponentialDecay`, `PiecewiseConstantDecay`, `PolynomialDecay`, and `InverseTimeDecay`.
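
For intuition, the exponential schedule can be written out by hand as `initial_learning_rate * decay_rate ** (step / decay_steps)`, with the exponent rounded down when `staircase=True`. The plain-Python function below is a sketch of that formula using the same parameter values as the cell above, not the Keras class itself:

```python
import math


def exponential_decay(step, initial_learning_rate=0.1, decay_steps=100000,
                      decay_rate=0.96, staircase=True):
    """Learning rate at a given optimizer step under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Decay in discrete jumps, once every `decay_steps` steps.
        exponent = math.floor(exponent)
    return initial_learning_rate * decay_rate ** exponent


for step in (0, 50000, 100000, 250000):
    print(step, exponential_decay(step))
```

With `staircase=True` the rate stays at 0.1 for the first 100,000 steps, drops to 0.096, and is 0.1 * 0.96^2 = 0.09216 at step 250,000.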

#### Using callbacks to implement a dynamic learning rate schedule

A dynamic learning rate schedule (for instance, decreasing the learning rate when the validation loss is no longer improving) cannot be achieved with these schedule objects, since the optimizer does not have access to validation metrics.

However, callbacks do have access to all metrics, including validation metrics. You can thus achieve this pattern by using a callback that modifies the current learning rate on the optimizer. In fact, this is even built in as the `ReduceLROnPlateau` callback.
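
The idea behind such a callback can be sketched in a few lines of plain Python. This is a deliberately simplified illustration (it omits options such as a cooldown period or a minimum improvement threshold), and the function name and loss values are hypothetical:

```python
def reduce_on_plateau(val_losses, lr=0.1, factor=0.5, patience=2, min_lr=1e-4):
    """Return the learning rate in effect after each epoch."""
    best = float('inf')
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule


lr_history = reduce_on_plateau([1.0, 0.8, 0.8, 0.81, 0.7, 0.7, 0.7])
print(lr_history)
```

The rate is halved after the two stagnant epochs 3 and 4, and again after epochs 6 and 7. The built-in `ReduceLROnPlateau` callback exposes comparable `monitor`, `factor`, `patience`, and `min_lr` arguments.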

### Visualizing loss and metrics during training

The best way to keep an eye on your model during training is to use [TensorBoard](https://www.tensorflow.org/tensorboard), a browser-based application that you can run locally and that provides you with:

- Live plots of the loss and metrics for training and evaluation
- (optionally) Visualizations of the histograms of your layer activations
- (optionally) 3D visualizations of the embedding spaces learned by your `Embedding` layers

If you have installed TensorFlow with pip, you should be able to launch TensorBoard from the command line:

```
tensorboard --logdir=/full_path_to_your_logs
```

#### Using the TensorBoard callback

The easiest way to use TensorBoard with a Keras model and the `fit` method is the `TensorBoard` callback.

In the simplest case, just specify where you want the callback to write logs, and you're good to go:

```python
tensorboard_cbk = keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')
model.fit(dataset, epochs=10, callbacks=[tensorboard_cbk])
```

The `TensorBoard` callback has many useful options, including whether to log embeddings, histograms, and how often to write logs:

```python
keras.callbacks.TensorBoard(
    log_dir='/full_path_to_your_logs',
    histogram_freq=0,  # How often to log histogram visualizations
    embeddings_freq=0,  # How often to log embedding visualizations
    update_freq='epoch')  # How often to write logs (default: once per epoch)
```



## Part II: Writing your own training & evaluation loops from scratch

If you want lower-level control over your training & evaluation loops than what `fit()` and `evaluate()` provide, you should write your own. It's actually pretty simple! But you should be ready to do a lot more debugging on your own.

### Using the GradientTape: a first end-to-end example

Calling a model inside a `GradientTape` scope enables you to retrieve the gradients of a loss value with respect to the trainable weights of the layers. Using an optimizer instance, you can use these gradients to update these variables (which you can retrieve using `model.trainable_weights`).
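
Before applying this to a full model, it may help to see the same mechanics on a single variable. In this minimal sketch the "loss" is just `w * w`, and the update is applied by hand with `assign_sub` instead of an optimizer; the variable name and learning rate are arbitrary choices for illustration:

```python
import tensorflow as tf

w = tf.Variable(3.0)

# Operations on `w` inside the scope are recorded on the tape.
with tf.GradientTape() as tape:
    loss = w * w

# Gradient of the loss with respect to w: d(w^2)/dw = 2w = 6.0
grad = tape.gradient(loss, w)

# One manual gradient descent step: w <- w - learning_rate * grad
learning_rate = 0.1
w.assign_sub(learning_rate * grad)

print(float(grad), float(w.numpy()))
```

An optimizer's `apply_gradients` plays the role of the manual update here, applied to every trainable weight of the model at once.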

Let's reuse our initial MNIST model from Part I, and train it using mini-batch gradient descent with a custom training loop.

In [39]:
# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

Run a training loop for a few epochs:

In [40]:
epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the loss with respect to the trainable variables.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                'Training loss (for one batch) at step %s: %s' % 
                (step, float(loss_value))
            )
            print('Seen so far: %s samples' % ((step + 1) * 64))

Start of epoch 0
Training loss (for one batch) at step 0: 2.318831443786621
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.2238218784332275
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.1913812160491943
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.0636324882507324
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.003420829772949
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.873820185661316
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.9026579856872559
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.688047170639038
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.6979117393493652
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.5427498817443848
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.5922415256500244
Seen so far: 256

### Low-level handling of metrics

Let's add metrics to the mix. You can readily reuse the built-in metrics (or custom ones you wrote) in such training loops written from scratch. Here's the flow:

- Instantiate the metric at the start of the loop
- Call `metric.update_state()` after each batch
- Call `metric.result()` when you need to display the current value of the metric
- Call `metric.reset_states()` when you need to clear the state of the metric (typically at the end of an epoch)
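
The flow above can be illustrated with a tiny pure-Python stand-in that mimics the three method names. `RunningAccuracy` is a hypothetical class written for this sketch, not a Keras metric; also note that, as far as I know, newer TensorFlow releases spell the last method `reset_state()`, so check your installed version:

```python
class RunningAccuracy:
    """Accumulates accuracy across batches, like a stateful metric."""

    def __init__(self):
        self.reset_states()

    def update_state(self, y_true, y_pred):
        # Fold one batch into the running totals.
        for target, prediction in zip(y_true, y_pred):
            self.correct += int(target == prediction)
            self.total += 1

    def result(self):
        # Accuracy over everything seen since the last reset.
        return self.correct / self.total if self.total else 0.0

    def reset_states(self):
        self.correct = 0
        self.total = 0


metric = RunningAccuracy()
metric.update_state([1, 0, 1], [1, 1, 1])  # batch 1: 2 of 3 correct
metric.update_state([0, 0], [0, 0])        # batch 2: 2 of 2 correct
epoch_accuracy = metric.result()           # 4 of 5 correct overall
metric.reset_states()                      # start the next epoch from scratch
after_reset = metric.result()
print(epoch_accuracy, after_reset)         # 0.8 0.0
```

The point of the stateful design is that `result()` reflects all batches since the last reset, not just the most recent one.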

Let's use this knowledge to compute `SparseCategoricalAccuracy` on validation data at the end of each epoch:

In [41]:
# Get model
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

Run a training loop for a few epochs:

In [42]:
epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Update training metric.
        train_acc_metric(y_batch_train, logits)

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                'Training loss (for one batch) at step %s: %s' % 
                (step, float(loss_value))
            )
            print('Seen so far: %s samples' % ((step + 1) * 64))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print('Training acc over epoch: %s' % (float(train_acc),))
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val)
        # Update val metrics
        val_acc_metric(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print('Validation acc: %s' % (float(val_acc),))

Start of epoch 0
Training loss (for one batch) at step 0: 2.322082996368408
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.1729562282562256
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.1619951725006104
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.1269915103912354
Seen so far: 38464 samples
Training acc over epoch: 0.289000004529953
Validation acc: 0.45239999890327454
Start of epoch 1
Training loss (for one batch) at step 0: 2.0694031715393066
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.9236276149749756
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.8952229022979736
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.7053300142288208
Seen so far: 38464 samples
Training acc over epoch: 0.5364800095558167
Validation acc: 0.6445000171661377
Start of epoch 2
Training loss (for one batch) at step 0: 1.6053411960601807
Seen so far: 64 samples
Traini

### Low-level handling of extra losses

You saw in the previous section that a layer can add regularization losses by calling `self.add_loss(value)` in its `call` method.

In the general case, you will want to take these losses into account in your training loops (unless you've written the model yourself and you already know that it creates no such losses).

Recall this example from the previous section, featuring a layer that creates a regularization loss:


In [43]:
class ActivityRegularizationLayer(layers.Layer):

    def call(self, inputs):
        self.add_loss(1e-2 * tf.reduce_sum(inputs))
        return inputs

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

When you call a model, like this:

```python
logits = model(x_train)
```

the losses it creates during the forward pass are added to the `model.losses` attribute:

In [44]:
logits = model(x_train[:64])
print(model.losses)

[<tf.Tensor: shape=(), dtype=float32, numpy=7.7824626>]


The tracked losses are first cleared at the start of the model `__call__`, so you will only see the losses created during this one forward pass. For instance, calling the model repeatedly and then querying `losses` only displays the latest losses, created during the last call:

In [45]:
logits = model(x_train[:64])
logits = model(x_train[64: 128])
logits = model(x_train[128: 192])
print(model.losses)

[<tf.Tensor: shape=(), dtype=float32, numpy=7.685816>]
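
The clearing behavior described above can be mimicked with a small framework-free sketch. `ToyLayer` is a hypothetical stand-in written for illustration only; it is not Keras code and makes no claim about how Keras implements loss tracking internally. It only assumes what the text states: tracked losses are cleared at the start of `__call__`, and `call` may record new ones through `add_loss`.

```python
class ToyLayer:
    """Hypothetical, framework-free stand-in for a layer that tracks extra losses."""

    def __init__(self):
        self.losses = []

    def add_loss(self, value):
        # Record an extra loss produced during the current forward pass.
        self.losses.append(value)

    def call(self, inputs):
        # Mirrors the activity regularizer above: 1e-2 * sum of the inputs.
        self.add_loss(1e-2 * sum(inputs))
        return inputs

    def __call__(self, inputs):
        # Clear losses left over from earlier calls before running the forward pass.
        self.losses.clear()
        return self.call(inputs)


layer = ToyLayer()
layer([1.0, 2.0, 3.0])
layer([10.0, 20.0])
layer([100.0])

# Only the loss from the last call is still tracked.
print(len(layer.losses), layer.losses)
```

Because the list is cleared on every call, a training loop has to read the losses right after the forward pass they belong to, before the model is called again.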


To take these losses into account during training, modify your training loop to add `sum(model.losses)` to your total loss:

In [46]:
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)

            # Add extra losses created during this forward pass:
            loss_value += sum(model.losses)

        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                'Training loss (for one batch) at step %s: %s' % 
                (step, float(loss_value))
            )
            print('Seen so far: %s samples' % ((step + 1) * 64))

Start of epoch 0
Training loss (for one batch) at step 0: 9.751727104187012
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.4815170764923096
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.39096736907959
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.354135274887085
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.3341073989868164
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.332087278366089
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.324887275695801
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.30796480178833
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 2.3236911296844482
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.313704490661621
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.3197813034057617
Seen so far: 25664 samples
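
To see why adding the extra losses to the total matters, here is a minimal framework-free sketch with a single weight and a hand-derived gradient. Everything in it is an assumption made for illustration: the `train` function, the one-sample dataset, the squared-error main loss, and the `extra_loss_weight` parameter are not part of the guide. The list `extra_losses` plays the role of `model.losses`.

```python
def train(extra_loss_weight, steps=200, learning_rate=0.1):
    """Gradient descent on one weight `w` for a single sample (x, y)."""
    x, y = 1.0, 2.0
    w = 0.0
    total_loss = None
    for _ in range(steps):
        # Forward pass: one "activation" and a squared-error main loss.
        activation = w * x
        main_loss = (activation - y) ** 2

        # Extra losses created during this forward pass (stand-in for model.losses).
        extra_losses = [extra_loss_weight * activation]
        total_loss = main_loss + sum(extra_losses)

        # d(total_loss)/dw, derived by hand for this toy problem.
        grad = 2.0 * (activation - y) * x + extra_loss_weight * x
        w -= learning_rate * grad
    return w, total_loss


w_plain, _ = train(extra_loss_weight=0.0)
w_reg, _ = train(extra_loss_weight=0.5)
print(w_plain, w_reg)
```

With the extra term included, the weight settles at a smaller value than the one that minimizes the main loss alone, which is the effect the regularization loss is meant to have. If the extra losses were left out of the total, the optimizer would never see them.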

That was the last piece of the puzzle! You've reached the end of this guide.

Now you know how to use the built-in training loops and how to write your own from scratch.
