## 正则化

避免过拟合。

> 如何避免过拟合

- 早期停止
- L1和L2正则



In [35]:
import tensorflow as tf
import numpy as np
import matplotlib
from matplotlib import pyplot as plt

import os
# 使用gpu
# (useful when running multiple experiments in parallel, on different GPUs):
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# 随机种子:
random_seed = 42

# 定制日志格式:
log_begin_red, log_begin_blue, log_begin_green = '\033[91m', '\033[94m', '\033[92m'
log_begin_bold, log_begin_underline = '\033[1m', '\033[4m'
log_end_format = '\033[0m'


# 准备数据

我们再次使用 [MNIST](http://yann.lecun.com/exdb/mnist) 数据[$^1$](#ref) 作为演示。
因此，我们按照之前笔记本中的方法准备数据:

In [36]:
num_classes = 10
img_rows, img_cols, img_ch = 28, 28, 1
input_shape = (img_rows, img_cols, img_ch)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

x_train = x_train.reshape(x_train.shape[0], *input_shape)
x_test = x_test.reshape(x_test.shape[0], *input_shape)

这一次，为了突出正则化的优势，我们将通过人为 __减少可用于训练集数量__ 来增加识别任务的难度

In [37]:
# ... 200 training samples instead of 60,000...
x_train, y_train = x_train[:200], y_train[:200]

print('Training data: {}'.format(x_train.shape))
print('Testing data: {}'.format(x_test.shape))


Training data: (200, 28, 28, 1)
Testing data: (10000, 28, 28, 1)


## 训练一个带正则化的模型

根据本章介绍的代码，我们将首先演示如何实施和应用规则化。

然后，我们将展示如何直接使用Keras API使用标准正则化解决方案来训练模型 (*L1/L2*, *失活*, *批量标准*), 
用于比较效果.   

我们将使用 *LeNet-5*[$^2$](#ref) 和 MNIST 举例说明.

In [38]:
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import (Input, Activation, Dense, Flatten, Conv2D,
                                     MaxPooling2D, Dropout, BatchNormalization)

epochs = 200
batch_size = 32


### 手动处理正则化损失

为了演示如何将正则化损耗添加到任何层，我们将从我们在本书和之前的[note](./note2_实现第一个CNN.ipynb)中介绍的简单卷积层开始

In [39]:
@tf.function
def conv_layer(x, kernels, bias, s):
    z = tf.nn.conv2d(x, kernels, strides=[1, s, s, 1], padding='VALID')
    # Finally, applying the bias and activation function (e.g. ReLU):
    return tf.nn.relu(z + bias)

class SimpleConvolutionLayer(tf.keras.layers.Layer):

    def __init__(self, num_kernels=32, kernel_size=(3, 3), stride=1):
        """
        初始化 layer.
        :param num_kernels:     卷积核数量
        :param kernel_size:     核尺寸 (H x W)
        :param stride:          步长
        """
        # Then we assign the parameters:
        super().__init__()
        self.num_kernels = num_kernels
        self.kernel_size = kernel_size
        self.stride = stride

    def build(self, input_shape):
        """
        构建 layer, 根据输入形状初始化其参数.
        不过，第一次使用该层时，将在内部调用该函数，它也可以手动调用.
        :param input_shape: 输入图层将接收的形状(e.g. B x H x W x C)
        """

        #  获取 通道数量:
        num_input_ch = input_shape[-1]

       # 重新调整核的形状
        kernels_shape = (*self.kernel_size, num_input_ch, self.num_kernels)

        # 我们使用从Glorot分布中选取的值初始化过滤器:
        glorot_init = tf.initializers.GlorotUniform()

        self.kernels = self.add_weight(
            name='kernels', shape=kernels_shape, initializer=glorot_init,
            trainable=True)  # 可训练的变量

        # 使用B:
        self.bias = self.add_weight(
            name='bias', shape=(self.num_kernels,),
            initializer='random_normal', trainable=True)

    def call(self, inputs):
        """
        调用层并对输入张量执行其操作
        :param inputs:  Input tensor
        :return:        Output tensor
        """
        return conv_layer(inputs, self.kernels, self.bias, self.stride)

    def get_config(self):
        """
        辅助函数返回定义的层和参数信息.
        :return:        Dictionary containing the layer's configuration
        """
        return {'num_kernels': self.num_kernels,
                'kernel_size': self.kernel_size,
                'strides': self.strides,
                'use_bias': self.use_bias}


我们将扩展这个layer类来添加  内核/偏置数 正则化。 

书中所示，使用`Layer.add_loss()`实现：

In [40]:
from functools import partial


def l2_reg(coef=1e-2):
    """
    返回一个函数，该函数计算给定张量的加权L2范数.
    (this is basically a reimplementation of f.keras.regularizers.l2())
    :param coef:    系数-标准权重
    :return:        Loss function (损失函数)
    """
    return lambda x: tf.reduce_sum(x ** 2) * coef


class ConvWithRegularizers(SimpleConvolutionLayer):
    """
    带正则的卷积层
    """
    def __init__(self, num_kernels=32, kernel_size=(3, 3), stride=1,
                 kernel_regularizer=l2_reg(), bias_regularizer=None):
        """ 
        Initialize the layer.
        :param num_kernels:        卷积核的数量
        :param kernel_size:        核尺寸 (H x W)
        :param stride:             Vertical/horizontal 步长
        :param kernel_regularizer: (opt.) 核的损失函数
        :param bias_regularizer:   (opt.) bias 的损失函数
        """
        super().__init__(num_kernels, kernel_size, stride)
        self.kernel_regularizer = kernel_regularizer
        self.bias_regularizer = bias_regularizer

    def build(self, input_shape):
        """
        构建 layer, 初始化其组件.
        """
        super().build(input_shape)
        # Attaching the regularization losses to the variables.
        if self.kernel_regularizer is not None:
            self.add_loss(partial(self.kernel_regularizer, self.kernels))
        if self.bias_regularizer is not None:
            self.add_loss(partial(self.bias_regularizer, self.bias))


初始化这一层，正则化器 将 作为属性传递到这一层，   
无论何时，都可以获得这些正则化器的损失值，只需调用层的属性 `.losses`


In [41]:
conv = ConvWithRegularizers(num_kernels=32, kernel_size=(3, 3), stride=1,
                            kernel_regularizer=l2_reg(1.), bias_regularizer=l2_reg(1.))

conv.build(input_shape=tf.TensorShape((None, 28, 28, 1)))

# Fetching the layer's losses:
reg_losses = conv.losses
print('核参数和偏差参数的正则化损失: {}'.format(
    [loss.numpy() for loss in reg_losses]))

# 与核张量和偏张量的L2范数比较:
kernel_norm, bias_norm = tf.reduce_sum(
    conv.kernels ** 2).numpy(), tf.reduce_sum(conv.bias ** 2).numpy()
print('核参数和偏差参数的L2参数: {}'.format(
    [kernel_norm, bias_norm]))


核参数和偏差参数的正则化损失: [1.9108772, 0.07420379]
核参数和偏差参数的L2参数: [1.9108772, 0.07420379]


The neat thing with the property `.losses` is that it also list the losses attached to all the layers and  models composing an instance. For example:

In [42]:
model = Sequential([
    Input(shape=input_shape),
    ConvWithRegularizers(kernel_regularizer=l2_reg(1.),
                         bias_regularizer=l2_reg(1.)),
    ConvWithRegularizers(kernel_regularizer=l2_reg(1.),
                         bias_regularizer=l2_reg(1.)),
    ConvWithRegularizers(kernel_regularizer=l2_reg(1.),
                         bias_regularizer=l2_reg(1.))
])

print('Losses 函数附加到模型以及他们的layers :\n\r{} ({} losses)'.format(
    [loss.numpy() for loss in model.losses], len(model.losses)))


Losses 函数附加到模型以及他们的layers :
[2.0546362, 0.06137651, 32.179474, 0.07327281, 32.16109, 0.105349846] (6 losses)


In [43]:
class LeNet5(Model):  # `Model` has the same API as `Layer` + extends it

    def __init__(self, num_classes,
                 kernel_regularizer=l2_reg(), bias_regularizer=l2_reg()):
        # Create the model and its layers:
        super(LeNet5, self).__init__()
        self.conv1 = ConvWithRegularizers(
            6, kernel_size=(5, 5),
            kernel_regularizer=kernel_regularizer, bias_regularizer=bias_regularizer)
        self.conv2 = ConvWithRegularizers(
            16, kernel_size=(5, 5),
            kernel_regularizer=kernel_regularizer, bias_regularizer=bias_regularizer)
        self.max_pool = MaxPooling2D(pool_size=(2, 2))
        self.flatten = Flatten()
        self.dense1 = Dense(120, activation='relu')
        self.dense2 = Dense(84, activation='relu')
        self.dense3 = Dense(num_classes, activation='softmax')

    def call(self, x):  # Apply the layers in order to process the inputs
        x = self.max_pool(self.conv1(x))  # 1st block
        x = self.max_pool(self.conv2(x))  # 2nd block
        x = self.flatten(x)
        x = self.dense3(self.dense2(self.dense1(x)))  # dense layers
        return x


In [44]:
optimizer = tf.optimizers.SGD()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).batch(batch_size)
log_string_template = 'Epoch {0:3}/{1}: main loss = {5}{2:5.3f}{6}; ' + \
                      'reg loss = {5}{3:5.3f}{6}; val acc = {5}{4:5.3f}%{6}'


def train_classifier_on_mnist(model, log_frequency=10):

    avg_main_loss = tf.keras.metrics.Mean(
        name='avg_main_loss', dtype=tf.float32)
    avg_reg_loss = tf.keras.metrics.Mean(name='avg_reg_loss', dtype=tf.float32)

    print("Training: {}start{}".format(log_begin_red, log_end_format))
    for epoch in range(epochs):
        for (batch_images, batch_gts) in dataset:    # For each batch of this epoch

            with tf.GradientTape() as grad_tape:     # Tell TF to tape the gradients
                y = model(batch_images)              # Feed forward
                main_loss = tf.losses.sparse_categorical_crossentropy(
                    batch_gts, y)                    # Compute loss
                # List and add other losses
                reg_loss = sum(model.losses)
                loss = main_loss + reg_loss

            # Get the gradients of combined losses and back-propagate:
            grads = grad_tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))

            # Keep track of losses for display:
            avg_main_loss.update_state(main_loss)
            avg_reg_loss.update_state(reg_loss)

        # Log some metrics
        if epoch % log_frequency == 0 or epoch == (epochs - 1):
            # Validate, computing the accuracy on test data:
            acc = tf.reduce_mean(tf.metrics.sparse_categorical_accuracy(
                tf.constant(y_test), model(x_test))).numpy() * 100

            main_loss = avg_main_loss.result()
            reg_loss = avg_reg_loss.result()

            print(log_string_template.format(
                epoch, epochs, main_loss, reg_loss, acc, log_begin_blue, log_end_format))

        avg_main_loss.reset_states()
        avg_reg_loss.reset_states()
    print("Training: {}end{}".format(log_begin_green, log_end_format))
    return model


model = LeNet5(10, kernel_regularizer=l2_reg(), bias_regularizer=l2_reg())
model = train_classifier_on_mnist(model, log_frequency=10)


Training: [91mstart[0m
Epoch   0/200: main loss = [94m2.260[0m; reg loss = [94m0.106[0m; val acc = [94m25.310%[0m
Epoch  10/200: main loss = [94m0.805[0m; reg loss = [94m0.151[0m; val acc = [94m69.140%[0m
Epoch  20/200: main loss = [94m0.007[0m; reg loss = [94m0.103[0m; val acc = [94m82.550%[0m
Epoch  30/200: main loss = [94m0.006[0m; reg loss = [94m0.063[0m; val acc = [94m82.940%[0m
Epoch  40/200: main loss = [94m0.006[0m; reg loss = [94m0.045[0m; val acc = [94m83.110%[0m
Epoch  50/200: main loss = [94m0.006[0m; reg loss = [94m0.036[0m; val acc = [94m83.400%[0m
Epoch  60/200: main loss = [94m0.005[0m; reg loss = [94m0.031[0m; val acc = [94m83.600%[0m
Epoch  70/200: main loss = [94m0.005[0m; reg loss = [94m0.028[0m; val acc = [94m83.580%[0m
Epoch  80/200: main loss = [94m0.004[0m; reg loss = [94m0.025[0m; val acc = [94m83.680%[0m
Epoch  90/200: main loss = [94m0.004[0m; reg loss = [94m0.023[0m; val acc = [94m83.790%[0m
Epoch

有趣的是，首先，正则化损失增加，而分类损失减少。
由于后者的值一开始要高得多，网络基本上专注于最小化它，而不管它的内核/偏差值是多少。
一旦分类下降到足够低的水平，那么规则化损失也开始被考虑在内。

让我们将正则化网络的精度与没有这些项的网络进行比较：

In [45]:
model = LeNet5(10, kernel_regularizer=None, bias_regularizer=None)
model = train_classifier_on_mnist(model, log_frequency=50)


Training: [91mstart[0m
Epoch   0/200: main loss = [94m2.250[0m; reg loss = [94m0.000[0m; val acc = [94m23.310%[0m
Epoch  50/200: main loss = [94m0.000[0m; reg loss = [94m0.000[0m; val acc = [94m83.120%[0m
Epoch 100/200: main loss = [94m0.000[0m; reg loss = [94m0.000[0m; val acc = [94m83.150%[0m
Epoch 150/200: main loss = [94m0.000[0m; reg loss = [94m0.000[0m; val acc = [94m83.250%[0m
Epoch 199/200: main loss = [94m0.000[0m; reg loss = [94m0.000[0m; val acc = [94m83.330%[0m
Training: [92mend[0m


测试集的精度提高了很少%，这是不容忽视的！

使用 **`add_loss()`** 和 **`.losses`**  方法， 损失是这个实验的主要差异 , 因为它们可以用于更复杂的模型, 例如，当我们想要应用特定于层的损耗时.



## 应用各种预先实现的正则化方法

除了L1/L2正则化，本章还介绍了其他方法


完全切换到KerasAPI，我们将试验这些方法，并快速比较它们对我们的玩具用例的影响。


为此，让我们创建另一个_LeNet-5_工厂函数（这次使用顺序API。只是为了说明区别）



In [46]:
def lenet(name='lenet', input_shape=input_shape,
          use_dropout=False, use_batchnorm=False, regularizer=None):
    """
    Create a LeNet-5 Keras model, with optional regularization schemes.
    :param name:           Name for the model
    :param input_shape:    Model's input shape
    :param use_dropout:    Flag to add Dropout layers after key layers
    :param use_batchnorm:  Flag to add BatchNormalization layers after key layers
    :param regularizer:    Regularization function to be applied to layers' kernels
    :return:               LeNet-5 Keras model
    """

    layers = []

    # 1st block:
    layers += [Conv2D(6, kernel_size=(5, 5), padding='same',
                      input_shape=input_shape, kernel_regularizer=regularizer)]
    if use_batchnorm:
        layers += [BatchNormalization()]
    layers += [Activation('relu'),
               MaxPooling2D(pool_size=(2, 2))]
    if use_dropout:
        layers += [Dropout(0.25)]

    # 2nd block:
    layers += [
        Conv2D(16, kernel_size=(5, 5), kernel_regularizer=regularizer)]
    if use_batchnorm:
        layers += [BatchNormalization()]
    layers += [Activation('relu'),
               MaxPooling2D(pool_size=(2, 2))]
    if use_dropout:
        layers += [Dropout(0.25)]

    # Dense layers:
    layers += [Flatten()]

    layers += [Dense(120, kernel_regularizer=regularizer)]
    if use_batchnorm:
        layers += [BatchNormalization()]
    layers += [Activation('relu')]
    if use_dropout:
        layers += [Dropout(0.25)]

    layers += [Dense(84, kernel_regularizer=regularizer)]
    layers += [Activation('relu')]

    layers += [Dense(num_classes, activation='softmax')]

    model = Sequential(layers, name=name)
    return model


为了展示流星的 优化器（Tensorflow和Keras中提供）对训练的影响，

我们将创建几个类似的LeNet实例，并使用不同的正则化技术组合[$^{3,4,5}$]（#ref）对每个实例进行培训。

In [49]:
configurations = {
    'none':         {'use_dropout': False, 'use_batchnorm': False, 'regularizer': None},
    'l1':           {'use_dropout': False, 'use_batchnorm': False, 'regularizer': tf.keras.regularizers.l1(0.01)},
    'l2':           {'use_dropout': False, 'use_batchnorm': False, 'regularizer': tf.keras.regularizers.l2(0.01)},
    'dropout':      {'use_dropout': True,  'use_batchnorm': False, 'regularizer': None},
    'bn':           {'use_dropout': False, 'use_batchnorm': True,  'regularizer': None},
    # 'l1+dropout':   {'use_dropout': False, 'use_batchnorm': True,  'regularizer': tf.keras.regularizers.l1(0.01)},
    'l1+bn':        {'use_dropout': False, 'use_batchnorm': True,  'regularizer': tf.keras.regularizers.l1(0.01)},
    'l1+dropout+bn': {'use_dropout': False, 'use_batchnorm': True,  'regularizer': tf.keras.regularizers.l1(0.01)}
    # ...
}


对于我们正在考虑的每个正则化配置，
我们将实例化一个新的LeNet模型并使用它进行训练。我们将保存他们的训练“历史记录”（包含续联期间的损失和指标历史记录），以供比较*（此过程需要时间，尤其是在CPU上！）

In [48]:
history_per_instance = dict()

print("Experiment: {0}start{1} (training logs = off)".format(
    log_begin_red, log_end_format))
for config_name in configurations:
    # Resetting the seeds (for random number generation), to reduce the impact of randomness on the comparison:
    tf.random.set_seed(random_seed)
    np.random.seed(random_seed)
    # Creating the model:
    model = lenet("lenet_{}".format(config_name),
                  **configurations[config_name])
    model.compile(optimizer='sgd',
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # Launching the training (we set `verbose=0`, so the training won't generate any logs):
    print("\t> Training with {0}: {1}start{2}".format(
        config_name, log_begin_red, log_end_format))
    history = model.fit(x_train, y_train,
                        batch_size=32, epochs=300, validation_data=(x_test, y_test),
                        verbose=0)
    history_per_instance[config_name] = history
    print('\t> Training with {0}: {1}done{2}.'.format(
        config_name, log_begin_green, log_end_format))
print("Experiment: {0}done{1}".format(log_begin_green, log_end_format))


Experiment: [91mstart[0m (training logs = off)
	> Training with none: [91mstart[0m
	> Training with none: [92mdone[0m.
	> Training with l1: [91mstart[0m
	> Training with l1: [92mdone[0m.
	> Training with l2: [91mstart[0m
	> Training with l2: [92mdone[0m.
	> Training with dropout: [91mstart[0m
	> Training with dropout: [92mdone[0m.
	> Training with bn: [91mstart[0m
	> Training with bn: [92mdone[0m.
	> Training with l1+dropout: [91mstart[0m


ValueError: in user code:

    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/keras/engine/training.py:853 train_function  *
        return step_function(self, iterator)
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/keras/engine/training.py:842 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:1286 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2849 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:3632 _call_for_each_replica
        return fn(*args, **kwargs)
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/keras/engine/training.py:835 run_step  **
        outputs = model.train_step(data)
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/keras/engine/training.py:787 train_step
        y_pred = self(x, training=True)
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/keras/engine/base_layer.py:1028 __call__
        with tf.name_scope(name_scope):
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:6729 __enter__
        scope_name = scope.__enter__()
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/contextlib.py:81 __enter__
        return next(self.gen)
    /Users/theone/anaconda3/envs/learn-tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:4279 name_scope
        raise ValueError("'%s' is not a valid scope name" % name)

    ValueError: 'lenet_l1+dropout/' is not a valid scope name
