# Revisiting the MNIST example

The figure below shows how the network, its layers, the loss function, and the optimizer relate to each other:

![Relationships between neural network components](./神经网络各个组件关系.png)


This section implements MNIST neural network training in plain TensorFlow and compares it with the Keras version.


In [35]:
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential([
    layers.Dense(512, activation='relu'),  # fully connected layer with 512 units
    layers.Dense(10, activation='softmax')  # fully connected output layer with 10 units
])

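This is where the Keras definition of the network starts: two chained Dense layers. The first has an output size of 512 and the second 10; the first uses the relu activation and the second uses softmax.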
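
To make the two activations concrete, here is a minimal NumPy sketch of what relu and softmax compute. The helper names `relu` and `softmax` are written for this illustration only (they are not the TensorFlow implementations), and NumPy stands in for TensorFlow.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0): negative values become 0
    return np.maximum(x, 0.0)

def softmax(scores):
    # Subtract the row maximum first for numerical stability
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    # Normalize so every row sums to 1 and can be read as probabilities
    return exps / np.sum(exps, axis=-1, keepdims=True)

hidden = relu(np.array([[-1.0, 0.5, 2.0]]))
probs = softmax(np.array([[2.0, 1.0, 0.0]]))
print(hidden)
print(probs, probs.sum())
```

The softmax output is why the last layer has 10 units: each unit ends up holding the predicted probability of one digit class.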


In [36]:
import tensorflow as tf
import numpy as np


class NaiveDense:  # a basic Dense layer
    def __init__(self, input_size, output_size, activation):
        self.activation = activation

        # Weight matrix W of shape (input_size, output_size), randomly initialized
        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)

        # Bias vector b of shape (output_size,), initialized to zeros
        b_shape = (output_size, )
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    def __call__(self, inputs):
        # Forward pass: apply the activation to matmul(inputs, W) + b
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    @property
    def weights(self):  # return the layer's weights
        return [self.W, self.b]

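In TensorFlow we first implement a Dense layer ourselves. A Dense layer computes `output = activation(dot(input, W) + b)`.

- `__init__`: takes the input size, output size, and activation function.
  - Initializes W randomly and b to zeros.
- `__call__`: computes the layer's output.
- `weights`: exposes the weights as a property so they are easy to read and update.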
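
As a rough illustration of the computation that `NaiveDense.__call__` performs, the following NumPy sketch runs one dense forward pass on hand-picked numbers. `dense_forward` and `relu` are hypothetical helpers for this example, and NumPy is used in place of TensorFlow so the arithmetic is easy to follow.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def dense_forward(inputs, W, b, activation):
    # inputs: (batch, input_size), W: (input_size, output_size), b: (output_size,)
    return activation(inputs @ W + b)

inputs = np.array([[1.0, -2.0, 3.0]])  # one sample with 3 features
W = np.array([[1.0, -1.0],
              [0.5, 0.5],
              [0.0, 1.0]])
b = np.array([0.5, -5.0])

output = dense_forward(inputs, W, b, relu)
print(output.shape)  # (1, 2)
print(output)        # [[0.5 0. ]]
```

Here `inputs @ W` is `[0, 1]`, adding `b` gives `[0.5, -4]`, and relu clips the negative entry to 0.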


In [37]:
class NaiveSequential:  # chains layers together
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):  # call the layers in order
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):  # collect the weights of all layers
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

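With the Dense layer defined, we still need to connect the layers so that each one's output feeds the next one's input. That is the job of NaiveSequential.

- `__init__`: stores the list of layers.
- `__call__`: passes the input through every layer's `__call__` in turn and returns the final output of the network.
- `weights`: collects the weights of all layers.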
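
The chaining idea does not depend on TensorFlow at all. The sketch below uses a hypothetical `TinySequential` class and two toy functions (`double`, `add_one`) as stand-ins for layers, to show that the order of the list is the order of application.

```python
class TinySequential:
    """Hypothetical stand-in for NaiveSequential that chains plain callables."""

    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:  # feed each output into the next layer
            x = layer(x)
        return x

def double(x):
    return x * 2

def add_one(x):
    return x + 1

chain = TinySequential([double, add_one])
result = chain(3)  # add_one(double(3))
print(result)      # 7
```

Swapping the two entries would give `double(add_one(3))`, which is 8, so the list order matters.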


In [38]:
model2 = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
])

assert len(model2.weights) == 4

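At last we have a definition equivalent to the `models.Sequential` one.

- Two Dense layers: the first takes 28*28 inputs and outputs 512, the second takes 512 inputs and outputs 10.
- The first layer uses the `relu` activation and the second uses `softmax`.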
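
The `assert` in the cell checks that there are 4 weight tensors (one W and one b per layer). As a small worked calculation, the sketch below counts the individual trainable numbers those tensors hold for the layer sizes used here. `count_parameters` is a hypothetical helper written for this note.

```python
def count_parameters(layer_sizes):
    # Each dense layer holds input_size * output_size weights plus output_size biases
    total = 0
    for input_size, output_size in layer_sizes:
        total += input_size * output_size + output_size
    return total

layer_sizes = [(28 * 28, 512), (512, 10)]
total_params = count_parameters(layer_sizes)
print(total_params)  # 407050
```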


In [39]:
model.compile(  # compile the model
    optimizer='rmsprop',  # optimizer
    loss='sparse_categorical_crossentropy',  # loss function
    metrics=['accuracy'])  # metric to monitor

This is the training setup. Only after writing the TensorFlow version by hand did I realize how many details Keras wraps up.


In [40]:
from tensorflow.keras import optimizers

optimizer = optimizers.SGD(learning_rate=1e-3)

def update_weights(gradients, weights):  # update the weights
    optimizer.apply_gradients(zip(gradients, weights))

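# learning_rate = 1e-3

# def update_weights(gradients, weights):
#     for g, w in zip(gradients, weights):
#         w.assign_sub(g * learning_rate)

Updating the weights means moving them in the direction opposite to the gradient. How far they move depends on the specific optimizer.

The commented-out code above shows the simplest possible weight update: each step moves every weight against its gradient by `g * learning_rate`.

- `learning_rate` is the learning rate, set to 1e-3 here.

This simplest update rule does not work particularly well, so the original book uses a built-in optimizer (`optimizers.SGD`) directly.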
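
To see the "move against the gradient" rule in isolation, here is a minimal pure-Python sketch on a toy one-dimensional loss, `(w - 3)^2`, whose minimum is at `w = 3`. The loss, the helper names `gradient` and `sgd_step`, and the learning rate of 0.1 are all assumptions chosen for illustration; they are not taken from the notebook.

```python
def gradient(w):
    # Derivative of the toy loss (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

def sgd_step(w, g, learning_rate):
    # Plain gradient descent: step against the gradient, scaled by the learning rate
    return w - learning_rate * g

w = 0.0
for _ in range(50):
    w = sgd_step(w, gradient(w), learning_rate=0.1)

print(w)  # close to 3.0
```

Each step shrinks the distance to the minimum by a constant factor, so `w` approaches 3 without ever being told where the minimum is.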


In [41]:
def one_training_step(model, images_batch, labels_batch):  # one training step
    with tf.GradientTape() as tape:
        predictions = model(images_batch)  # compute the predictions
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)  # compute the loss of each sample
        average_loss = tf.reduce_mean(per_sample_losses)  # average the loss over the batch
    gradients = tape.gradient(average_loss, model.weights)  # gradients of the loss w.r.t. the weights
    update_weights(gradients, model.weights)  # update the weights
    return average_loss  # return the average loss

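Now we finally get to one concrete training step.

- The inputs are the model, this batch's images, and this batch's labels.
- Compute the loss:
  - First get the model's predictions.
  - Call the tf library to compute the loss of each sample.
  - Compute the average loss over the batch.
- Compute the gradients.
- Update the weights.
- Return the average loss.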
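
The per-sample loss in the step above can be reproduced by hand for a tiny batch. The NumPy sketch below assumes the predictions are already probabilities (as a softmax layer produces) and omits the numerical safeguards a library implementation would normally add. `sparse_crossentropy_by_hand` is a hypothetical name, not a TensorFlow function.

```python
import numpy as np

def sparse_crossentropy_by_hand(labels, predictions):
    # Pick, for each sample, the probability predicted for its true class
    true_class_probs = predictions[np.arange(len(labels)), labels]
    # The loss is the negative log of that probability
    return -np.log(true_class_probs)

labels = np.array([0, 2])                    # integer class labels
predictions = np.array([[0.5, 0.25, 0.25],   # sample 0: true class gets 0.5
                        [0.1, 0.1, 0.8]])    # sample 1: true class gets 0.8

per_sample_losses = sparse_crossentropy_by_hand(labels, predictions)
average_loss = per_sample_losses.mean()
print(per_sample_losses)
print(average_loss)
```

The more probability the model puts on the correct class, the closer the loss gets to 0, which is why the second sample has the smaller loss.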

In [42]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28))  # reshape the training set into a 60000 x 784 array
train_images = train_images.astype('float32') / 255  # convert to float32 and scale values to the range 0 to 1
test_images = test_images.reshape((10000, 28 * 28))  # reshape the test set into a 10000 x 784 array
test_images = test_images.astype('float32') / 255  # convert to float32 and scale values to the range 0 to 1
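
The same reshape-and-scale preprocessing can be tried without downloading MNIST. The sketch below uses two randomly generated 28x28 arrays as stand-ins for real images; `fake_images` is made-up data for illustration only.

```python
import numpy as np

# Two fake 28x28 grayscale "images" with uint8 pixel values in 0..255
fake_images = np.random.randint(0, 256, size=(2, 28, 28), dtype=np.uint8)

flat = fake_images.reshape((2, 28 * 28))  # one row of 784 pixels per image
scaled = flat.astype('float32') / 255     # float32 values between 0 and 1

print(scaled.shape, scaled.dtype)
```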

In [43]:
class BatchGenerator:  # serves the MNIST data one training batch at a time
    def __init__(self, images, labels, batch_size=128):
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size

    def next(self):
        images = self.images[self.index:self.index + self.batch_size]
        labels = self.labels[self.index:self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels

Data preparation is the same for Keras and TensorFlow. The TensorFlow version additionally needs the BatchGenerator helper to iterate over batches.
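
The slicing logic of BatchGenerator is easier to follow on a tiny sequence. The sketch below is a hypothetical miniature, `TinyBatchGenerator`, that serves a single list instead of separate images and labels.

```python
class TinyBatchGenerator:
    """Hypothetical miniature of BatchGenerator for any sliceable sequence."""

    def __init__(self, items, batch_size):
        self.index = 0
        self.items = items
        self.batch_size = batch_size

    def next(self):
        # Take the next slice and advance the cursor by one batch
        batch = self.items[self.index:self.index + self.batch_size]
        self.index += self.batch_size
        return batch

generator = TinyBatchGenerator(list(range(10)), batch_size=4)
batches = [generator.next() for _ in range(3)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because slicing past the end of a sequence just returns what is left, the last batch is shorter when the data size is not a multiple of the batch size.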


In [44]:
model.fit(train_images, train_labels,
          epochs=5, batch_size=128)  # train for 5 epochs with 128 samples per batch

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x19b32b6eca0>

Keras starts training the model.


In [45]:
def fit(model, images, labels, epochs, batch_size=128):  # training loop
    for epoch_counter in range(epochs):  # run `epochs` passes over the data
        print('Epoch %d' % epoch_counter)  # print the current epoch
        batch_generator = BatchGenerator(images, labels, batch_size)  # batch source for this epoch
        for batch_counter in range(len(images) // batch_size):
            images_batch, labels_batch = batch_generator.next()  # data for this step
            loss = one_training_step(model, images_batch, labels_batch)  # one training step
            if batch_counter % 100 == 0:
                print('loss at batch %d: %.2f' % (batch_counter, loss))  # print the loss

fit(model2, train_images, train_labels, epochs=10, batch_size=128)

Epoch 0
loss at batch 0: 5.57
loss at batch 100: 2.24
loss at batch 200: 2.18
loss at batch 300: 2.09
loss at batch 400: 2.25
Epoch 1
loss at batch 0: 1.93
loss at batch 100: 1.88
loss at batch 200: 1.80
loss at batch 300: 1.71
loss at batch 400: 1.85
Epoch 2
loss at batch 0: 1.59
loss at batch 100: 1.58
loss at batch 200: 1.47
loss at batch 300: 1.42
loss at batch 400: 1.52
Epoch 3
loss at batch 0: 1.32
loss at batch 100: 1.34
loss at batch 200: 1.21
loss at batch 300: 1.20
loss at batch 400: 1.28
Epoch 4
loss at batch 0: 1.11
loss at batch 100: 1.16
loss at batch 200: 1.01
loss at batch 300: 1.04
loss at batch 400: 1.11
Epoch 5
loss at batch 0: 0.97
loss at batch 100: 1.02
loss at batch 200: 0.87
loss at batch 300: 0.92
loss at batch 400: 0.99
Epoch 6
loss at batch 0: 0.86
loss at batch 100: 0.91
loss at batch 200: 0.77
loss at batch 300: 0.83
loss at batch 400: 0.90
Epoch 7
loss at batch 0: 0.77
loss at batch 100: 0.83
loss at batch 200: 0.70
loss at batch 300: 0.76
loss at batch 40

`fit` mostly just calls the functions defined above; see the comments for the flow. Training starts.
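
A little arithmetic explains the shape of the log above. The sketch below works out how many steps the loop runs per epoch for 60000 samples and a batch size of 128, how many samples the integer division leaves out, and which batch indices get printed. The variable names are chosen for this note.

```python
num_samples = 60000
batch_size = 128

steps_per_epoch = num_samples // batch_size  # full batches only
skipped_samples = num_samples % batch_size   # samples left over each epoch
logged_batches = [i for i in range(steps_per_epoch) if i % 100 == 0]

print(steps_per_epoch)  # 468
print(skipped_samples)  # 96
print(logged_batches)   # [0, 100, 200, 300, 400]
```

This matches the five "loss at batch" lines printed per epoch, and shows that the last 96 training samples are never used by this loop.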


In [46]:
test_loss, test_acc = model.evaluate(test_images, test_labels)  # evaluate on the test set
test_acc  # test-set accuracy



0.9787999987602234

In [47]:
predictions = model2(test_images)
predictions = predictions.numpy()
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")

accuracy: 0.84
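
The accuracy computation in the cell above can be checked by hand on a few made-up predictions. The arrays below are invented for illustration and have nothing to do with the real MNIST results.

```python
import numpy as np

# Predicted class probabilities for 4 samples and 3 classes (made-up values)
predictions = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.2, 0.5, 0.3]])
true_labels = np.array([1, 0, 1, 1])

predicted_labels = np.argmax(predictions, axis=1)  # most probable class per row
matches = predicted_labels == true_labels          # boolean array of hits
accuracy = matches.mean()                          # fraction of True values

print(predicted_labels)  # [1 0 2 1]
print(accuracy)          # 0.75
```

Taking the mean of a boolean array counts `True` as 1 and `False` as 0, so it directly gives the fraction of correct predictions.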


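These are the test-set accuracies of the TensorFlow model and the Keras model: 0.84 versus about 0.98. Thank goodness for Keras, which spares us from dealing with so many details.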
