## Chapter 7: Network Optimization and Regularization

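### 1.<span style="color:red">(Required)</span>
    In Chapter 5 (Part 1), "Convolutional Neural Networks: Theory", we implemented handwritten digit recognition with the LeNet network. In this exercise we repeat that experiment, take the original result as the baseline, and apply the network optimization methods from this chapter to tune the model and try to improve its accuracy.
    a) Baseline: provide the code and reproduce the baseline result of 0.895.
    b) Try to improve the model's accuracy by changing the learning rate or batch size, switching optimizers, training for more epochs, or adding learning rate decay, learning rate warmup, and similar strategies.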

#### Data preparation

In [1]:
import json
import gzip

# Load the dataset, keep a small subset of each split, and print the split sizes
with gzip.open('./mnist.json.gz') as f:
    train_set, dev_set, test_set = json.load(f)
train_images, train_labels = train_set[0][:1000], train_set[1][:1000]
dev_images, dev_labels = dev_set[0][:200], dev_set[1][:200]
test_images, test_labels = test_set[0][:200], test_set[1][:200]
train_set, dev_set, test_set = [train_images, train_labels], [dev_images, dev_labels], [test_images, test_labels]
print('Length of train/dev/test set:{}/{}/{}'.format(len(train_set[0]), len(dev_set[0]), len(test_set[0])))

Length of train/dev/test set:1000/200/200


In [2]:
from paddle.vision.transforms import Compose, Resize, Normalize

# Preprocessing: resize to 32x32, then map pixel values from [0, 255] to [-1, 1]
transforms = Compose([Resize(32), Normalize(mean=[127.5], std=[127.5], data_format='CHW')])
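
The normalization step computes `(x - mean) / std` for every pixel. A minimal plain-Python sketch of that arithmetic (the helper `normalize` is illustrative and is not part of Paddle) shows why `mean=127.5, std=127.5` maps 8-bit pixel values onto `[-1, 1]`:

```python
def normalize(pixels, mean=127.5, std=127.5):
    """Apply (x - mean) / std to each value, as a normalization transform does."""
    return [(p - mean) / std for p in pixels]


# Darkest, middle, and brightest 8-bit pixel values.
print(normalize([0, 127.5, 255]))  # [-1.0, 0.0, 1.0]
```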

In [3]:
import random
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import paddle.io as io

class MNIST_dataset(io.Dataset):
    def __init__(self, dataset, transforms, mode='train'):
        self.mode = mode
        self.transforms = transforms
        self.dataset = dataset

    def __getitem__(self, idx):
        # Fetch the image and its label
        image, label = self.dataset[0][idx], self.dataset[1][idx]
        image, label = np.array(image).astype('float32'), int(label)
        image = np.reshape(image, [28,28])
        image = Image.fromarray(image.astype('uint8'), mode='L')
        image = self.transforms(image)

        return image, label

    def __len__(self):
        return len(self.dataset[0])



In [4]:
# Fix the random seed
random.seed(0)
# Build the MNIST datasets
train_dataset = MNIST_dataset(dataset=train_set, transforms=transforms, mode='train')
test_dataset = MNIST_dataset(dataset=test_set, transforms=transforms, mode='test')
dev_dataset = MNIST_dataset(dataset=dev_set, transforms=transforms, mode='dev')

#### 1. a). Reproduce the baseline

In [5]:
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class Paddle_LeNet(nn.Layer):
    def __init__(self, in_channels, num_classes=10):
        super(Paddle_LeNet, self).__init__()
        self.conv1 = nn.Conv2D(in_channels=in_channels, out_channels=6, kernel_size=5)
        self.pool2 = nn.MaxPool2D(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2D(in_channels=6, out_channels=16, kernel_size=5)
        self.pool4 = nn.AvgPool2D(kernel_size=2, stride=2)
        self.conv5 = nn.Conv2D(in_channels=16, out_channels=120, kernel_size=5)
        self.linear6 = nn.Linear(in_features=120, out_features=84)
        self.linear7 = nn.Linear(in_features=84, out_features=num_classes)

    def forward(self, x):
        output = F.relu(self.conv1(x))
        output = self.pool2(output)
        output = F.relu(self.conv3(output))
        output = self.pool4(output)
        output = F.relu(self.conv5(output))
        output = paddle.squeeze(output, axis=[2,3])
        output = F.relu(self.linear6(output))
        output = self.linear7(output)
        return output
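
The `paddle.squeeze(output, axis=[2,3])` call in `forward` relies on the feature map being 1x1 after `conv5`. The sketch below checks this with the standard output-size formula for convolution and pooling layers; the helper names `conv_out` and `lenet_feature_sizes` are illustrative only. It also suggests why the 28x28 images are resized to 32x32 first.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_feature_sizes(input_size=32):
    """Track the feature-map side length through the LeNet layers above."""
    layers = [
        ("conv1", 5, 1),
        ("pool2", 2, 2),
        ("conv3", 5, 1),
        ("pool4", 2, 2),
        ("conv5", 5, 1),
    ]
    sizes = [input_size]
    for _name, kernel, stride in layers:
        sizes.append(conv_out(sizes[-1], kernel, stride))
    return sizes


print(lenet_feature_sizes(32))  # [32, 28, 14, 10, 5, 1]
print(lenet_feature_sizes(28))  # [28, 24, 12, 8, 4, 0]
```

With a 32x32 input the last convolution produces a 1x1 map of 120 channels, which matches `in_features=120` in `linear6`. With a 28x28 input the 5x5 kernel of `conv5` no longer fits the 4x4 map, which the formula reports as size 0.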

In [6]:
import paddle.optimizer as opt
from nndl import RunnerV3, metric

paddle.seed(100)
# Learning rate
lr = 0.1

# Batch size
batch_size = 64

# Build the data loaders
train_loader = io.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = io.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = io.DataLoader(test_dataset, batch_size=batch_size)

# Define the LeNet network
model = Paddle_LeNet(in_channels=1, num_classes=10)
# Define the optimizer
optimizer = opt.SGD(learning_rate=lr, parameters=model.parameters())
# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = metric.Accuracy(is_logist=True)
# Instantiate RunnerV3 with the training configuration
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
runner.train(train_loader, dev_loader, num_epochs=10, log_steps=log_steps, 
                eval_steps=eval_steps, save_path="best_baseline.pdparams")

W0727 10:55:07.290863  5050 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0727 10:55:07.295270  5050 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.


[Train] epoch: 0/10, step: 0/160, loss: 2.77382
[Train] epoch: 0/10, step: 15/160, loss: 1.44361
[Evaluate]  dev score: 0.41500, dev loss: 1.72289
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.41500
[Train] epoch: 1/10, step: 30/160, loss: 0.72785
[Evaluate]  dev score: 0.79500, dev loss: 0.56832
[Evaluate] best accuracy performence has been updated: 0.41500 --> 0.79500
[Train] epoch: 2/10, step: 45/160, loss: 0.58829
[Evaluate]  dev score: 0.79500, dev loss: 0.52878
[Train] epoch: 3/10, step: 60/160, loss: 0.49997
[Evaluate]  dev score: 0.88000, dev loss: 0.36518
[Evaluate] best accuracy performence has been updated: 0.79500 --> 0.88000
[Train] epoch: 4/10, step: 75/160, loss: 0.13955
[Evaluate]  dev score: 0.89500, dev loss: 0.26394
[Evaluate] best accuracy performence has been updated: 0.88000 --> 0.89500
[Train] epoch: 5/10, step: 90/160, loss: 0.11211
[Evaluate]  dev score: 0.91500, dev loss: 0.20057
[Evaluate] best accuracy performence has been updated: 0.8

In [7]:
# Load the best model
runner.load_model('best_baseline.pdparams')
# Evaluate on the test set
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))

[Test] accuracy/loss: 0.9150/0.2239


The number of epochs is set to 10.
The reproduced baseline reaches a test accuracy of **0.9150**.
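
The `step: 0/160` counter in the log above follows from the dataset size, the batch size, and the number of epochs. A small sketch of that arithmetic (the function name `total_steps` is illustrative, and it assumes the last partial batch is kept rather than dropped):

```python
import math


def total_steps(num_samples, batch_size, num_epochs):
    """Number of optimizer steps when the last partial batch is kept."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# Baseline: 1000 training images, batch size 64, 10 epochs.
print(total_steps(1000, 64, 10))   # 160
# Tuned run in part b): batch size 512, 30 epochs.
print(total_steps(1000, 512, 30))  # 60
```

Because a larger batch size means fewer steps per epoch, evaluating every 15 steps covers about one epoch in the baseline but about seven epochs when the batch size is 512.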

#### 1. b). Strategies to improve model accuracy

In [8]:
import paddle.optimizer as opt
from nndl import RunnerV3, metric

paddle.seed(100)
# Learning rate
lr = 0.01

# Batch size
batch_size = 512

# Build the data loaders
train_loader = io.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = io.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = io.DataLoader(test_dataset, batch_size=batch_size)

# Define the LeNet network
model = Paddle_LeNet(in_channels=1, num_classes=10)

# Define the optimizer (Adam instead of the baseline SGD)
# optimizer = opt.SGD(learning_rate=lr, parameters=model.parameters())
optimizer = paddle.optimizer.Adam(learning_rate=lr, parameters=model.parameters())

# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = metric.Accuracy(is_logist=True)
# Instantiate RunnerV3 with the training configuration
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
runner.train(train_loader, dev_loader, num_epochs=30, log_steps=log_steps, 
                eval_steps=eval_steps, save_path="best_tuned.pdparams")

[Train] epoch: 0/30, step: 0/60, loss: 2.62919
[Train] epoch: 7/30, step: 15/60, loss: 0.41444
[Evaluate]  dev score: 0.84000, dev loss: 0.40736
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.84000
[Train] epoch: 15/30, step: 30/60, loss: 0.11065
[Evaluate]  dev score: 0.88000, dev loss: 0.27252
[Evaluate] best accuracy performence has been updated: 0.84000 --> 0.88000
[Train] epoch: 22/30, step: 45/60, loss: 0.02630
[Evaluate]  dev score: 0.92500, dev loss: 0.21843
[Evaluate] best accuracy performence has been updated: 0.88000 --> 0.92500
[Evaluate]  dev score: 0.93000, dev loss: 0.19045
[Evaluate] best accuracy performence has been updated: 0.92500 --> 0.93000
[Train] Training done!


In [9]:
# Load the best model
runner.load_model('best_tuned.pdparams')
# Evaluate on the test set
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))

[Test] accuracy/loss: 0.9400/0.1236


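The initial learning rate is lowered from 0.1 to 0.01, the batch size is raised to 512, the number of epochs is set to 30, and the optimizer is switched to Adam.
The model reaches a test accuracy of **0.9400**.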
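
Learning rate warmup and decay, both listed in the task, are not used in the run above. As a hedged illustration of what such a schedule computes, the sketch below combines linear warmup with cosine decay in plain Python. The function `lr_at_step` and the values `warmup_steps=10` and `total_steps=60` are assumptions chosen for illustration, not settings from this experiment. Paddle ships ready-made schedulers under `paddle.optimizer.lr`.

```python
import math


def lr_at_step(step, base_lr=0.01, warmup_steps=10, total_steps=60, min_lr=0.0):
    """Linear warmup to base_lr, followed by cosine decay towards min_lr."""
    if step < warmup_steps:
        # Warmup: grow linearly and reach base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Decay: progress runs from 0 (end of warmup) to 1 (end of training).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


for step in (0, 5, 10, 35, 60):
    print(step, round(lr_at_step(step), 5))
```

Warmup keeps the first updates small while the adaptive statistics of the optimizer are still noisy, and the decay phase lets the model settle with progressively smaller steps.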


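### 2.<span style="color:red">(Optional & Bonus)</span>

    In the course we covered the Adam optimizer and its implementation, and saw that adding ℓ2 regularization to the loss function can mitigate overfitting. However, the adaptive learning rate in Adam undermines ℓ2 regularization. The AdamW optimizer was proposed to solve this problem.
    You can read the paper DECOUPLED WEIGHT DECAY REGULARIZATION to learn about AdamW in detail, and use the paddle.optimizer.AdamW API to train the LeNet network on the MNIST dataset with the AdamW optimizer.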


In [10]:
import paddle.optimizer as opt
from nndl import RunnerV3, metric

paddle.seed(100)
# Learning rate
lr = 0.01

# Batch size
batch_size = 512

# Build the data loaders
train_loader = io.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = io.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = io.DataLoader(test_dataset, batch_size=batch_size)

# Define the LeNet network
model = Paddle_LeNet(in_channels=1, num_classes=10)

# Define the optimizer (AdamW; weight_decay is left at its default)
# optimizer = opt.SGD(learning_rate=lr, parameters=model.parameters())
optimizer = paddle.optimizer.AdamW(learning_rate=lr, parameters=model.parameters())

# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = metric.Accuracy(is_logist=True)
# Instantiate RunnerV3 with the training configuration
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
runner.train(train_loader, dev_loader, num_epochs=30, log_steps=log_steps, 
                eval_steps=eval_steps, save_path="best_adamw.pdparams")

[Train] epoch: 0/30, step: 0/60, loss: 2.58380
[Train] epoch: 7/30, step: 15/60, loss: 0.44040
[Evaluate]  dev score: 0.88000, dev loss: 0.32249
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.88000
[Train] epoch: 15/30, step: 30/60, loss: 0.09892
[Evaluate]  dev score: 0.92000, dev loss: 0.20775
[Evaluate] best accuracy performence has been updated: 0.88000 --> 0.92000
[Train] epoch: 22/30, step: 45/60, loss: 0.02301
[Evaluate]  dev score: 0.92500, dev loss: 0.25228
[Evaluate] best accuracy performence has been updated: 0.92000 --> 0.92500
[Evaluate]  dev score: 0.95000, dev loss: 0.22901
[Evaluate] best accuracy performence has been updated: 0.92500 --> 0.95000
[Train] Training done!


In [11]:
# Load the best model
runner.load_model('best_adamw.pdparams')
# Evaluate on the test set
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))

[Test] accuracy/loss: 0.9650/0.1126


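The initial learning rate and batch size are the same as in the Adam run above; only the optimizer is changed to AdamW. The model reaches a test accuracy of **0.9650**.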
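
To make the difference between the two optimizers concrete, here is a minimal scalar sketch of one update step. It is not Paddle's implementation; the function `adam_step` and its arguments `l2` and `decoupled_wd` are names chosen for illustration. With `l2 > 0` the penalty is added to the gradient and then rescaled by Adam's adaptive denominator. With `decoupled_wd > 0` the weight is shrunk directly, as in AdamW.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, decoupled_wd=0.0):
    """One Adam update on a scalar weight, with optional L2 or decoupled decay."""
    g = grad + l2 * w                       # Adam + L2: penalty enters the gradient
    m = beta1 * m + (1 - beta1) * g         # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)            # bias correction
    v_hat = v / (1 - beta2 ** t)
    adaptive = lr * m_hat / (math.sqrt(v_hat) + eps)
    decay = lr * decoupled_wd * w           # AdamW: decay bypasses the rescaling
    return w - adaptive - decay, m, v


# Zero data gradient, so only the regularization moves the weight.
for w0 in (1.0, 100.0):
    w_l2, _, _ = adam_step(w0, 0.0, 0.0, 0.0, t=1, l2=0.1)
    w_wd, _, _ = adam_step(w0, 0.0, 0.0, 0.0, t=1, decoupled_wd=0.1)
    print(w0, "Adam+L2 shrink:", w0 - w_l2, "AdamW shrink:", w0 - w_wd)
```

In this toy setting Adam with L2 shrinks both weights by roughly `lr`, regardless of their size, because the penalty gradient is normalized away. The decoupled decay shrinks each weight in proportion to its magnitude, which is the behavior weight decay is meant to have.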

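### 3.<span style="color:red">(Optional & Short answer & Bonus)</span>

	Xiao Ming realized the mistake he had made when collecting the dataset for the pet shop's cat-and-dog recognition system. After fixing it, he started training the network and found that the loss on the training data keeps decreasing while the loss on the validation data first decreases and then increases. What is this phenomenon, and how can it be mitigated?

+ The model is overfitting during training. Regularization methods such as $\ell_1$ and $\ell_2$ regularization or early stopping can reduce the overfitting.

+ The training and validation sets may follow very different distributions. Shuffling the data, re-splitting the dataset, or augmenting the data may help.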
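
Early stopping, mentioned in the first answer, can be sketched in a few lines. The function `early_stopping_epoch`, the `patience` value, and the loss values below are made up for illustration; they only mimic the "decrease, then increase" pattern described in the question.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative validation losses that fall and then rise again.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.70]
print(early_stopping_epoch(losses, patience=3))  # (6, 3)
```

Training stops at epoch 6, after three epochs without improvement, and the checkpoint from epoch 3 is the one to keep.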