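## Overview
### 1. This notebook trains and tests a class-based CNN model on the MNIST dataset and then, starting from the saved model, (1) evaluates the saved model on the test and validation sets and (2) resumes training from the saved model.
### 2. The CNN model uses dropout and a dynamic (exponentially decaying) learning rate. The best accuracy of the trained model on the validation set is 0.9762.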

# I. Import the required libraries

In [3]:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from cnn_inference_by_class import ImageCnn

# II. Load the data

In [5]:
mnist = input_data.read_data_sets('MNIST_data',one_hot=True)

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
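
To see why the training cells later in this notebook reshape each batch before feeding it, here is a minimal NumPy-only sketch. It uses a random stand-in array (`fake_batch`, a hypothetical name) rather than real MNIST pixels, and assumes, as the notebook's `data_reshape` helper does, that the model expects input shaped `(N, 28, 28, 1)`.

```python
import numpy as np

# Stand-in for one batch from mnist.train.next_batch(500):
# 500 flattened images with 784 values each (random data, not real MNIST).
fake_batch = np.random.rand(500, 784).astype(np.float32)

# -1 lets NumPy infer the batch dimension; the trailing 1 is the single
# grayscale channel.
reshaped = np.reshape(fake_batch, (-1, 28, 28, 1))

print(fake_batch.shape)   # (500, 784)
print(reshaped.shape)     # (500, 28, 28, 1)

# Reshaping only changes how the data is indexed: row r, column c of image i
# is element r * 28 + c of the original flat vector.
i, r, c = 3, 5, 7
print(reshaped[i, r, c, 0] == fake_batch[i, r * 28 + c])
```

The pixel values themselves are untouched; only the indexing changes, which is why the reshape can be applied to each batch right before it is fed.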


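# III. First training run

## 1. Build the model

### The CNN's forward pass, loss function, and accuracy are all implemented in the ImageCnn() class, so later cells only need to reference them.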

In [6]:
tf.reset_default_graph()
cnn = ImageCnn()
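
The `cnn.accuracy` tensor used throughout this notebook is defined inside `cnn_inference_by_class`, whose source is not shown here. A common definition for one-hot classification, assumed for this sketch, is the fraction of samples whose highest-scoring class matches the label. The NumPy function below (`one_hot_accuracy`, a hypothetical name) illustrates that computation on a tiny hand-made example; the real class may differ.

```python
import numpy as np

def one_hot_accuracy(logits, one_hot_labels):
    """Fraction of rows where the arg-max class equals the labelled class."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Four samples, three classes (made-up scores, for illustration only).
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 0.1, 1.5],
                   [0.9, 1.1, 0.4],
                   [0.3, 0.2, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0],
                   [1, 0, 0]])

print(one_hot_accuracy(logits, labels))   # 0.75 (third sample is misclassified)
```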

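## 2. Define the optimizer

### The optimization objective of the CNN, i.e. the loss function, is already implemented in the ImageCnn() class and can be used directly, so only the optimizer needs to be defined here.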

In [7]:
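# Fixed learning rate: when first building a model, start with a fixed learning rate and tune it later
# learning_rate = 0.01 
global_step = tf.Variable(0,trainable=False)       # Must start at 0; otherwise the step suffix in the saved checkpoint names is inaccurate

# Tune the learning rate with exponential decay (a large learning rate early in training that shrinks as training proceeds)
# Drawback of exponential decay: as the exponent grows without bound, the learning rate approaches 0
learning_rate_basic = 1.5            # Base learning rate
decay_rate = 0.9                     # Decay factor
decay_steps = 500                    # Decay steps: the rate is multiplied by decay_rate over every 500 steps (the maximum number of iterations is about 250000)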

learning_rate = tf.train.exponential_decay(learning_rate=learning_rate_basic,
                                          global_step=global_step,
                                          decay_steps=decay_steps,
                                          decay_rate=decay_rate)
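
For intuition about the schedule configured above, the following plain-Python sketch evaluates the documented formula of `tf.train.exponential_decay` with its default `staircase=False`, namely `learning_rate * decay_rate ** (global_step / decay_steps)`. The helper name `decayed_learning_rate` is made up for this illustration; it mirrors the formula only and does not call TensorFlow.

```python
def decayed_learning_rate(step, base=1.5, decay_rate=0.9, decay_steps=500):
    """Continuous exponential decay (the staircase=False behaviour)."""
    return base * decay_rate ** (step / decay_steps)

# Print the learning rate at a few representative steps.
for step in (0, 500, 1000, 3000, 250000):
    print(step, decayed_learning_rate(step))
```

With these settings the rate starts at 1.5, is roughly 0.797 after the 3000 iterations of the first training run, and is on the order of 1e-23 by step 250000, which illustrates the drawback noted in the comment above.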

In [15]:
optimizer = tf.train.AdadeltaOptimizer(learning_rate).minimize(cnn.loss,global_step=global_step)

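## 3. Iterative training (updating the learnable parameters) and saving the model

### The following operations are added to the training loop:
#### 1. Save the model every time progress is displayed
#### 2. Evaluate on the test set every time progress is displayed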
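
The save schedule described above, combined with the `max_to_keep=2` setting used in the training cell, determines which checkpoint files exist at the end of training. The following pure-Python sketch simulates that bookkeeping without TensorFlow. It assumes `global_step` starts at 0 and increases by one per iteration, and it relies on two documented `tf.train.Saver` behaviours: passing `global_step` to `save` appends `-<step>` to the file name, and `max_to_keep` retains only the most recent checkpoints. The function name `simulate_checkpoints` is hypothetical.

```python
from collections import deque

def simulate_checkpoints(total_iters, display_every, max_to_keep, start_step=0):
    """Return the checkpoint names that would remain after training."""
    kept = deque(maxlen=max_to_keep)   # oldest entries fall off automatically
    global_step = start_step
    for it in range(total_iters):
        global_step += 1               # one optimizer step per iteration
        if (it + 1) % display_every == 0:
            kept.append("cnn_model_by_class.ckpt-{}".format(global_step))
    return list(kept)

print(simulate_checkpoints(3000, 100, 2))
# ['cnn_model_by_class.ckpt-2900', 'cnn_model_by_class.ckpt-3000']
```

This is consistent with the `ckpt-3000` restore message that appears when training is resumed later in this notebook.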

In [10]:
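training_epochs = 3000                  # 3000 training iterations (one mini-batch each)
display_peochs = 100                    # How often to display progress (the displayed content is user-defined)
batch_size = 500                        # Number of samples per iteration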

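# Each image is stored as a 1*784 row vector and has to be turned back into a 28*28 matrix; this helper does that
def data_reshape(data):
    return np.reshape(data,(-1,28,28,1))        # Do not use tf.reshape() here: it returns a Tensor, which cannot be passed as a feed_dict value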

# Saver used to write checkpoints
saver_1 = tf.train.Saver(max_to_keep=2)
saverdir = 'model/cnn_model_by_class/'                   # Directory where the model is saved

# Start a session and begin training
with tf.Session() as sess_1:
    sess_1.run(tf.global_variables_initializer())              # Initialize all variables
    
    for epoch in range(training_epochs):
        train_x,train_y = mnist.train.next_batch(batch_size)
        training_feed_dict = {
            cnn.input_x: data_reshape(train_x),
            cnn.input_y: train_y,
            cnn.dropout_keep_prob: 0.8
        }
        _,training_loss = sess_1.run([optimizer,cnn.loss],training_feed_dict)
        
        if (epoch+1) % display_peochs == 0:
            saver_1.save(sess_1,saverdir+'cnn_model_by_class.ckpt',global_step=global_step)
            
            test_x, test_y = mnist.test.images,mnist.test.labels                  # Fetch the test set
            test_feed_dict = {
                cnn.input_x:data_reshape(test_x),
                cnn.input_y:test_y,
                cnn.dropout_keep_prob:1.0
            }
            test_accuracy = sess_1.run(cnn.accuracy,test_feed_dict) # Accuracy of the current model on the test set
            
            print("After {} epochs, loss on training data is {}, accuracy on test is {}".format(epoch+1,training_loss,test_accuracy))
        

After 100 epochs, loss on training data is 2.1094846725463867, accuracy on test is 0.7879999876022339
After 200 epochs, loss on training data is 1.024778127670288, accuracy on test is 0.8489000201225281
After 300 epochs, loss on training data is 0.9430018067359924, accuracy on test is 0.8773000240325928
After 400 epochs, loss on training data is 0.63228440284729, accuracy on test is 0.9041000008583069
After 500 epochs, loss on training data is 0.6035710573196411, accuracy on test is 0.9204000234603882
After 600 epochs, loss on training data is 0.42371881008148193, accuracy on test is 0.9282000064849854
After 700 epochs, loss on training data is 0.3792960047721863, accuracy on test is 0.9301999807357788
After 800 epochs, loss on training data is 0.30904126167297363, accuracy on test is 0.944100022315979
After 900 epochs, loss on training data is 0.19652630388736725, accuracy on test is 0.9491999745368958
After 1000 epochs, loss on training data is 0.2855652868747711, accuracy on test is

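# IV. Re-evaluate on the test set from the saved model

#### When evaluating or predicting with a saved model, the model structure must match the one used during training. Here this is achieved by simply reusing cnn = ImageCnn().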

In [14]:
saver_2 = tf.train.Saver(max_to_keep=2)          # Create another Saver
saverdir = 'model/cnn_model_by_class/'           # Directory holding the saved model
newest_model = tf.train.latest_checkpoint(saverdir)

# The data has to be reshaped here as well
def data_reshape(data):
    return np.reshape(data,(-1,28,28,1))        # Do not use tf.reshape() here, otherwise an error is raised

# Start a new session
with tf.Session() as sess_2:
    saver_2.restore(sess_2,newest_model)        # Restore the saved model into the current graph
    
    test_x, test_y = mnist.test.images, mnist.test.labels
    test_feed_dict = {
                cnn.input_x:data_reshape(test_x),
                cnn.input_y:test_y,
                cnn.dropout_keep_prob:1.0
            }
    test_accuracy = sess_2.run(cnn.accuracy,test_feed_dict) # Accuracy of the trained model on the test set
    
    validation_x, validation_y = mnist.validation.images, mnist.validation.labels
    validation_feed_dict = {
                cnn.input_x:data_reshape(validation_x),
                cnn.input_y:validation_y,
                cnn.dropout_keep_prob:1.0
            }
    validation_accuracy = sess_2.run(cnn.accuracy,validation_feed_dict) # Accuracy of the trained model on the validation set
            
    print("Accuracy on test data is {}, accuracy on validation data is {}".format(test_accuracy,validation_accuracy))

INFO:tensorflow:Restoring parameters from model/cnn_model_by_class/cnn_model_by_class.ckpt-3300
Accuracy on test data is 0.9750999808311462, accuracy on validation data is 0.9761999845504761
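
Both evaluation feeds above set `cnn.dropout_keep_prob` to 1.0, whereas the training cells use 0.8. The internals of `ImageCnn` are not shown in this notebook. Assuming it applies standard inverted dropout (the behaviour of `tf.nn.dropout`, which zeroes each activation with probability `1 - keep_prob` and scales the survivors by `1 / keep_prob`), a NumPy sketch shows why 1.0 is the appropriate value for evaluation. The function `inverted_dropout` is illustrative only.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((4, 5))

train_out = inverted_dropout(activations, 0.8, rng)   # entries are 0.0 or 1.25
eval_out = inverted_dropout(activations, 1.0, rng)    # nothing is dropped

print(train_out)
print(np.array_equal(eval_out, activations))          # True
```

With `keep_prob=1.0` the layer is an identity, so the measured accuracy is deterministic for a given checkpoint.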


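# V. Resume training from the saved model

#### When resuming training from a saved model, the model structure must also match that of the first training run. Here this is achieved by simply reusing cnn = ImageCnn().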

In [13]:
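retraining_epochs = 3000                  # Number of iterations for the resumed training
retraining_display_peochs = 100           # How often to display progress during the resumed training
retraining_batch_size = 500               # Number of samples per iteration during the resumed training

# As in the first training run, the raw data has to be reshaped
def data_reshape(data):
    return np.reshape(data,(-1,28,28,1))

# Create a new Saver instance, used both to restore the earlier model and to save the newly trained one
saver_3 = tf.train.Saver(max_to_keep=2)
saverdir = 'model/cnn_model_by_class/'
saved_model = tf.train.latest_checkpoint(saverdir)

# Start a new session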
with tf.Session() as sess_3:
    saver_3.restore(sess_3,saved_model)
    
    for epoch in range(retraining_epochs):
        train_x, train_y = mnist.train.next_batch(retraining_batch_size)
        retraining_feed_dict = {
            cnn.input_x: data_reshape(train_x),
            cnn.input_y: train_y,
            cnn.dropout_keep_prob: 0.8
        }
        _,retraining_loss = sess_3.run([optimizer,cnn.loss],retraining_feed_dict)
        
        if (epoch+1) % retraining_display_peochs == 0:
            test_x, test_y = mnist.test.images,mnist.test.labels                  # Fetch the test set
            test_feed_dict = {
                cnn.input_x:data_reshape(test_x),
                cnn.input_y:test_y,
                cnn.dropout_keep_prob:1.0
            }
            test_accuracy = sess_3.run(cnn.accuracy,test_feed_dict) # Accuracy of the current model on the test set

            saver_3.save(sess_3,saverdir+'cnn_model_by_class.ckpt',global_step=global_step)
            print("After {} epochs, loss on training data is {}, accuracy on test is {}".format(epoch+1,retraining_loss,test_accuracy))


INFO:tensorflow:Restoring parameters from model/cnn_model_by_class/cnn_model_by_class.ckpt-3000
After 100 epochs, loss on training data is 0.09418675303459167, accuracy on test is 0.9754999876022339
After 200 epochs, loss on training data is 0.13629505038261414, accuracy on test is 0.9739000201225281
After 300 epochs, loss on training data is 0.06830572336912155, accuracy on test is 0.9750999808311462
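
Because `tf.train.Saver()` saves all saveable variables by default, the non-trainable `global_step` is stored in the checkpoint together with the weights. Restoring `ckpt-3000` therefore resumes the step counter, and with it the decayed learning rate, rather than restarting from 0. The following plain-Python sketch works through that arithmetic with the hyperparameters from this notebook; the helper names are hypothetical and no TensorFlow is involved.

```python
def learning_rate_at(step, base=1.5, decay_rate=0.9, decay_steps=500):
    # Same formula as tf.train.exponential_decay with staircase=False.
    return base * decay_rate ** (step / decay_steps)

def resumed_checkpoint_steps(restored_step, iters_run, display_every=100):
    # Step suffixes of the checkpoints written after resuming.
    return [restored_step + i
            for i in range(display_every, iters_run + 1, display_every)]

restored_step = 3000
print(learning_rate_at(restored_step))                # learning rate when training resumes
print(resumed_checkpoint_steps(restored_step, 300))   # [3100, 3200, 3300]
```

After the 300 resumed iterations shown above, the newest checkpoint would carry the suffix 3300. That matches the `ckpt-3300` restore message printed by the evaluation cell earlier in this notebook, whose execution count (`In [14]`) suggests it was run after this cell (`In [13]`).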
