模型进度可在训练期间和之后保存。这意味着，您可以从上次暂停的地方继续训练模型，避免训练时间过长。此外，可以保存意味着您可以分享模型，而他人可以对您的工作成果进行再创作发布研究模型和相关技术时，大部分机器学习从业者会分享以下内容：

- 用于创建模型的代码，以及
- 模型的训练权重或参数

分享此类数据有助于他人了解模型的工作原理并尝试使用新数据自行尝试模型。

> 注意：请谨慎使用不可信的代码 - TensorFlow模型就是代码。有关详情，请参阅安全地使用TensorFlow。

#### 选项
您可以通过多种不同的方法保存TensorFlow模型，具体取决于您使用的API。本指南使用的是tf.keras，它是一种用于在TensorFlow中构建和训练模型的高阶API。要了解其他方法，请参阅TensorFlow 保存和恢复指南或在Eager中保存。

#### 设置
###### 安装和导入
安装并导入TensorFlow和依赖项：

##### 获取示例数据集
我们将使用MNIST数据集训练模型，以演示如何保存权重。要加快演示运行速度，请仅使用前1000个样本：

In [1]:
from __future__ import absolute_import,division,print_function

import os
import tensorflow as tf 
from tensorflow import keras

tf.__version__

'1.13.1'

In [2]:
(train_images,train_labels),(test_images,test_labels) = tf.keras.datasets.mnist.load_data()

train_labels = train_labels[:1000]
test_labels = test_labels[:1000]

train_images = train_images[:1000].reshape(-1,28*28) /255.0
test_images = train_images[:1000].reshape(-1,28*28) /255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


#### 定义模型
我们来构建一个简单的模型，以演示如何保存和加载权重。

In [7]:
# Returns a short sequential model
def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Dense(512,activation=tf.nn.relu,input_shape=(784,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10,activation=tf.nn.softmax)
    ])
    
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                 loss=tf.keras.losses.sparse_categorical_crossentropy,
                 metrics=['accuracy'])
    return model

# Create a basic modelcreate_model
model = create_model()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________


#### 在训练期间保存检查点
主要用例是，在训练期间或训练结束时自动保存检查点。这样一来，您便可以使用经过训练的模型，而无需重新训练该模型，或从上次暂停的地方继续训练，以防训练过程中断。

`tf.keras.callbacks.ModelCheckpoint` 是执行此任务的回调。该回调需要几个参数来配置检查点。

检查点回调用法
训练模型，并将`ModelCheckpoint`回调传递给该模型：

In [20]:
checkpoint_path = "./model/training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

#Create checkpoint callback
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                verbose=1)

model = create_model()
model.fit(train_images,train_labels,epochs=10,
         validation_data=(test_images,test_labels),
         callbacks=[cp_callback])

Train on 1000 samples, validate on 1000 samples
Epoch 1/10
Epoch 00001: saving model to ./model/training_1/cp.ckpt

Consider using a TensorFlow optimizer from `tf.train`.
Epoch 2/10
Epoch 00002: saving model to ./model/training_1/cp.ckpt

Consider using a TensorFlow optimizer from `tf.train`.
Epoch 3/10
Epoch 00003: saving model to ./model/training_1/cp.ckpt

Consider using a TensorFlow optimizer from `tf.train`.
Epoch 4/10
Epoch 00004: saving model to ./model/training_1/cp.ckpt

Consider using a TensorFlow optimizer from `tf.train`.
Epoch 5/10
Epoch 00005: saving model to ./model/training_1/cp.ckpt

Consider using a TensorFlow optimizer from `tf.train`.
Epoch 6/10
Epoch 00006: saving model to ./model/training_1/cp.ckpt

Consider using a TensorFlow optimizer from `tf.train`.
Epoch 7/10
Epoch 00007: saving model to ./model/training_1/cp.ckpt

Consider using a TensorFlow optimizer from `tf.train`.
Epoch 8/10
Epoch 00008: saving model to ./model/training_1/cp.ckpt

Consider using a Tensor

<tensorflow.python.keras.callbacks.History at 0x1c8d988fa58>

上述代码将创建一个TensorFlow检查点文件集合，这些文件在每个周期结束时更新：

创建一个未经训练的全新模型。仅通过权重恢复模型时，您必须有一个与原始模型架构相同的模型。由于模型架构相同，因此我们可以分享权重（尽管是不同的模型实例）。

现在，重新构建一个未经训练的全新模型，并用测试集对其进行评估。未训练模型的表现有很大的偶然性（准确率约为10％）：

In [21]:
model = create_model()

loss,acc = model.evaluate(test_images,test_labels)
print("Untrained model,accuracy:{:5.2f}%".format(100*acc))

Untrained model,accuracy:10.70%


In [22]:
model.load_weights(checkpoint_path)
loss,acc = model.evaluate(test_images,test_labels)
print("Untrained model,accuracy:{:5.2f}%".format(100*acc))

Untrained model,accuracy:13.20%


#### 检查点回调选项
该回调提供了多个选项，用于为生成的检查点提供独一无二的名称，以及调整检查点创建频率。

训练一个新模型，每隔5个周期保存一次检查点并设置唯一名称：

In [23]:
# inclue the epoch in the file name (uses 'str.format')
checkpoint_path = "./model/training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(
                checkpoint_path,verbose=1,save_weights_only=True,
                #Save weights, every 5-epochs.
                period=5)

model = create_model()
model.fit(train_images,train_labels,
         epochs= 50,callbacks=[cp_callback],
         validation_data=(test_images,test_labels),
         verbose=0)


Epoch 00005: saving model to ./model/training_2/cp-0005.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00010: saving model to ./model/training_2/cp-0010.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00015: saving model to ./model/training_2/cp-0015.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00020: saving model to ./model/training_2/cp-0020.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00025: saving model to ./model/training_2/cp-0025.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00030: saving model to ./model/training_2/cp-0030.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00035: saving model to ./model/training_2/cp-0035.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00040: saving model to ./model/training_2/cp-0040.ckpt

Consider using a TensorFlow optimizer from `tf.train`.

Epoch 00045: saving model to ./model/training_2/cp-0045

<tensorflow.python.keras.callbacks.History at 0x1c8c11cdb00>

In [24]:
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest

'./model/training_2\\cp-0050.ckpt'

要进行测试，请重置模型并加载最新的检查点：

In [25]:
model = create_model()
model.load_weights(latest)
loss,acc = model.evaluate(test_images,test_labels)
print("Restored model,accuracy: {:5.2f}%".format(100*acc))

Restored model,accuracy: 13.20%
