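# I. Environment Setup
<font size=4>This tutorial was written against PaddlePaddle 2.3.0. If your environment uses a different version, first install PaddlePaddle 2.3.0 by following the instructions on the official website.</font>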

In [9]:
import paddle
import paddle.nn.functional as F
from paddle.nn import Conv2D,MaxPool2D,Linear,Dropout
from paddle.vision.transforms import ToTensor
import numpy as np
import matplotlib.pyplot as plt

print(paddle.__version__)

2.3.2


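# II. Loading the Dataset
<font size=4>This example uses the API provided by PaddlePaddle to download the dataset and prepare the data iterators for the training task that follows. The CIFAR-10 dataset consists of 60,000 color images of size 32 × 32: 50,000 images form the training set and the remaining 10,000 form the test set. The images are divided into 10 classes, and the goal is to train a model that classifies them correctly.</font>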
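
<font size=4>The code cell that follows passes `ToTensor()` as the dataset transform. As a rough pure-Python sketch of what that transform does to a single image, assuming `uint8` input and the default `CHW` output layout, an H × W × C array of 0-255 values becomes a C × H × W array of floats in [0, 1]. The helper name `to_chw_float` is hypothetical and not part of Paddle.</font>

```python
def to_chw_float(image_hwc):
    """Rearrange an H x W x C nested list of 0-255 ints into C x H x W floats in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[h][w][c] / 255.0 for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A tiny 2 x 2 RGB "image" standing in for a 32 x 32 CIFAR-10 picture
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

chw = to_chw_float(tiny_image)
print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
print(chw[0])  # the red channel as a 2 x 2 grid
```

<font size=4>This is why the networks below declare `in_channels=3` on their first convolution: each sample reaches the model with the channel axis first.</font>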

In [10]:
transform = ToTensor()
cifar10_train = paddle.vision.datasets.Cifar10(mode='train',
                                               transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test',
                                              transform=transform)

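# III. Building the Networks
<font size=4>Next, we use PaddlePaddle to define a classification network made up of three 2-D convolution layers (Conv2D), each followed by a ReLU activation, two 2-D pooling layers (MaxPool2D), and two linear layers (the MyNet class in the code cell below). The network maps a 32 × 32 image with 3 color channels to 10 outputs, one for each of the 10 classes.</font>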

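## 1. LeNet
<font size=4>LeNet broadly consists of three convolution layers that extract features and two fully connected layers that perform the classification. (Image from the internet.)</font>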
![](https://ai-studio-static-online.cdn.bcebos.com/04528378411f443bb85eb70e96777c00df52c048f8d740fc98450cdc9febc548)
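<font size=4>The convolution and fully connected layers use the Sigmoid activation function. Two pooling layers are inserted between the three convolution layers to shrink the feature maps, so that the later convolution layers extract features at a larger scale. The pooling layers use max pooling. The original output layer consisted of Euclidean radial basis function units; here it is replaced by softmax output units, with one output per class. LeNet was originally designed to classify handwritten digit images with an input size of 32 × 32.</font>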
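
<font size=4>The `in_features` values of the first linear layer in the `MyNet` and `LeNet` classes below (1024 and 120*5*5) follow from how each convolution and pooling layer shrinks a 32 × 32 input. The sketch below traces that arithmetic with the standard output-size formula, assuming floor rounding and no dilation. The helper names are illustrative and not part of Paddle.</font>

```python
def conv_or_pool_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_size(input_size, layers, last_channels):
    """Trace a square input through (kernel, stride, padding) layers, then flatten."""
    size = input_size
    for kernel, stride, padding in layers:
        size = conv_or_pool_out(size, kernel, stride, padding)
    return last_channels * size * size


# (kernel, stride, padding) for conv1, pool1, conv2, pool2, conv3
mynet_layers = [(3, 1, 0), (2, 2, 0), (3, 1, 0), (2, 2, 0), (3, 1, 0)]
lenet_layers = [(5, 1, 0), (2, 2, 0), (5, 1, 0), (2, 2, 0), (1, 1, 0)]

mynet_features = flattened_size(32, mynet_layers, last_channels=64)
lenet_features = flattened_size(32, lenet_layers, last_channels=120)
print(mynet_features)  # 32 -> 30 -> 15 -> 13 -> 6 -> 4, so 64 * 4 * 4
print(lenet_features)  # 32 -> 28 -> 14 -> 10 -> 5 -> 5, so 120 * 5 * 5
```

<font size=4>If the input size or any kernel size changes, the same trace gives the new value to pass as `in_features`.</font>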

In [11]:
class MyNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        super(MyNet, self).__init__()

        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3))
        self.pool1 = paddle.nn.MaxPool2D(kernel_size=2, stride=2)

        self.conv2 = paddle.nn.Conv2D(in_channels=32, out_channels=64, kernel_size=(3,3))
        self.pool2 = paddle.nn.MaxPool2D(kernel_size=2, stride=2)

        self.conv3 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3,3))

        self.flatten = paddle.nn.Flatten()

        self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
        self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = F.relu(x)

        x = self.flatten(x)
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        return x

class LeNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        super(LeNet, self).__init__()
        self.conv1 = Conv2D(in_channels=3, out_channels=6, kernel_size=5)
        self.pool1 = MaxPool2D(kernel_size=2, stride=2)
        self.conv2 = Conv2D(in_channels=6, out_channels=16,  kernel_size=5)
        self.pool2 = MaxPool2D(kernel_size=2, stride=2)
        self.conv3 = Conv2D(in_channels=16, out_channels=120, kernel_size=1)
        
        self.flatten = paddle.nn.Flatten()
       
        self.fc1 = Linear(in_features=120*5*5,  out_features=84)
        self.fc2 = Linear(in_features=84, out_features=num_classes)
    
    # Forward pass
    def forward(self, x):
        x = self.conv1(x)
        x = F.sigmoid(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = F.sigmoid(x)
        x = self.pool2(x)
        x = self.conv3(x)
        x = F.sigmoid(x)
        x = self.flatten(x)
        # Flatten turns the 120 feature maps of size 5 x 5 into one 3000-element vector per sample
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        return x


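## 2. AlexNet
<font size=4>Thanks to advances in hardware (such as the use of GPUs) and a range of algorithmic improvements, AlexNet won the 2012 ImageNet image classification competition by a wide margin over the runner-up. The result brought deep learning back to center stage and is of major historical significance. AlexNet consists mainly of 5 convolution layers and 3 fully connected layers. (Image from the internet.)</font>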
![](https://ai-studio-static-online.cdn.bcebos.com/400d29e800094f61a09a90ae38e3caf47fa590097e914ef79318eafe53530f68)

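<font size=4>AlexNet uses the ReLU activation function (Rectified Linear Unit) in place of the Sigmoid activation used in LeNet. ReLU's one-sided suppression gives the neurons sparse activations and lets gradients propagate well to the earlier layers during backpropagation.
Overlapping pooling, with a window size (3) larger than the stride (2), is used after convolution layers to reduce overfitting.
Dropout layers between the fully connected layers randomly drop a fraction of the neurons to reduce overfitting.
AlexNet also introduced GPU-accelerated training, mini-batch stochastic gradient descent with momentum, and data augmentation. (GPU acceleration and mini-batching are now standard practice. To compare the models on equal terms, this project uses no data augmentation.)</font>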
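
<font size=4>To make the Dropout step concrete, here is a minimal pure-Python sketch of "inverted" dropout: during training each value is zeroed with probability `p` and the survivors are scaled by 1 / (1 - p), while at evaluation time the input passes through unchanged. This is assumed to mirror the default behavior of `paddle.nn.Dropout`; the function itself is illustrative and not Paddle's implementation.</font>

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout on a flat list of activations."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Evaluation mode: leave the activations untouched
        return list(values)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - p)
    # Keep a value when the random draw is at least p, otherwise drop it to zero
    return [v * keep_scale if rng.random() >= p else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(activations, p=0.5, training=False)
print(train_out)  # each entry is either 0.0 or twice the input
print(eval_out)   # identical to the input
```

<font size=4>This difference between the two modes is why the training loop later switches between `model.train()` and `model.eval()`.</font>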

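<font size=4>AlexNet expects 224-pixel input images, so running it unchanged on the 32-pixel CIFAR-10 images is impractical. The original structure needs some adjustments, mainly the following three:<br></br>
1. The network is widened: for example, conv1 originally outputs 64 feature maps and now outputs 96.<br></br>
2. The two max pooling layers after conv2 and conv5 are replaced with average pooling layers.<br></br>
3. The output sizes of fc1 and fc2 are both changed to 4096.<br></br>
The modified network structure is shown in the figure below. Note that the code cell that follows departs from this description: it keeps 64 output channels in conv1, uses max pooling throughout, and sets the fc1 and fc2 output sizes to 2048 and 1024.</font>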
![](https://ai-studio-static-online.cdn.bcebos.com/fe9013ebd2264c169d698079ea982ef9829ada29f72f4472a0d2c383cccd87a3)


In [12]:
class AlexNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        super(AlexNet, self).__init__()

        self.conv1 = Conv2D(3, 64, 3, stride=1, padding=1)
        self.pool1 = MaxPool2D(kernel_size=2, stride=2)
        self.conv2 = Conv2D(64, 192, 3, stride=1, padding=1)
        self.pool2 = MaxPool2D(kernel_size=2, stride=2)
        self.conv3 = Conv2D(192, 384, 3, stride=1, padding=1)
        self.conv4 = Conv2D(384, 256, 3, stride=1, padding=1)
        self.conv5 = Conv2D(256, 256, 3, stride=1, padding=1)
        self.pool5 = MaxPool2D(kernel_size=2, stride=2)
        self.flatten = paddle.nn.Flatten()
        self.fc1 = Linear(256 * 4 * 4, 2048)
        self.drop_out1 = Dropout(p=0.5)
        self.fc2 = Linear(2048, 1024)
        self.drop_out2 = Dropout(p=0.5)
        self.fc3 = Linear(1024, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool2(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = self.conv5(x)
        x = F.relu(x)
        x = self.pool5(x)        
        x = self.flatten(x)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.drop_out1(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.drop_out2(x)
        x = self.fc3(x)
        return x

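## 3. VGG
<font size=4>"VGG" stands for the Visual Geometry Group at the University of Oxford. The VGG model stacks the network up to 19 layers in a modular fashion to improve performance. (Image from the internet.)</font>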
![](https://ai-studio-static-online.cdn.bcebos.com/ed1c70900e054aa1b45f3846667610d3d76e3a7b4f3c40098990d18c195585a4)
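<font size=4>The authors of VGG showed that a deep network with small (3 × 3) convolution kernels outperforms a shallow network with large kernels, so every larger kernel is replaced with 3 × 3 kernels. Because the network is deep, weight initialization matters: Xavier uniform initialization is one option, and a poor initialization can hinder learning. VGG comes in several depths, including 13, 16, and 19 layers. Deep variants such as VGG16 and VGG19 can be trained in stages: first train VGG13, then freeze the earlier layers and fine-tune the later ones.</font>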
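
<font size=4>One way to see the appeal of small kernels: a stack of n 3 × 3 stride-1 convolutions covers the same (2n + 1) × (2n + 1) input window as one large kernel, but with fewer weights and more nonlinearities in between. The sketch below compares weight counts (biases ignored) under the assumption that every layer has the same number of input and output channels. The function names are illustrative.</font>

```python
def stacked_3x3_receptive_field(num_layers):
    """Side length of the input window seen by num_layers stacked 3x3, stride-1 convolutions."""
    return 2 * num_layers + 1


def conv_weight_count(kernel, in_channels, out_channels):
    """Number of weights in one square convolution layer, ignoring biases."""
    return kernel * kernel * in_channels * out_channels


channels = 256
comparison = {}
for num_layers in (2, 3):
    field = stacked_3x3_receptive_field(num_layers)
    stacked = num_layers * conv_weight_count(3, channels, channels)
    single = conv_weight_count(field, channels, channels)
    comparison[num_layers] = (field, stacked, single)
    print(f"{num_layers} x (3x3): {stacked} weights vs one {field}x{field}: {single} weights")
```

<font size=4>With equal channel counts the stacked version needs 9n weights per channel pair against (2n + 1)² for the single large kernel, so the saving grows with depth.</font>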

In [13]:
class VGG(paddle.nn.Layer):
    def __init__(self, num_classes=1,layer=13):
        super(VGG, self).__init__()

        self.layer = layer

        self.pool = MaxPool2D(kernel_size=2, stride=2)
        self.drop_out = Dropout(p=0.5)

        self.conv1 = Conv2D(3, 64, 3, padding=1)
        self.conv2 = Conv2D(64, 64, 3, padding=1)

        self.conv3 = Conv2D(64, 128, 3, padding=1)
        self.conv4 = Conv2D(128, 128, 3, padding=1)

        self.conv5 = Conv2D(128, 256, 3, padding=1)
        self.conv6 = Conv2D(256, 256, 3, padding=1)
        self.conv7 = Conv2D(256, 256, 3, padding=1)
        self.conv8 = Conv2D(256, 256, 3, padding=1)

        self.conv9 = Conv2D(256, 512, 3, padding=1)
        self.conv10 = Conv2D(512, 512, 3, padding=1)
        self.conv11 = Conv2D(512, 512, 3, padding=1)
        self.conv12 = Conv2D(512, 512, 3, padding=1)

        self.conv13 = Conv2D(512, 512, 3, padding=1)
        self.conv14 = Conv2D(512, 512, 3, padding=1)
        self.conv15 = Conv2D(512, 512, 3, padding=1)
        self.conv16 = Conv2D(512, 512, 3, padding=1)

        self.flatten = paddle.nn.Flatten()

        self.fc1 = Linear(in_features=512, out_features=4096)
        self.fc2 = Linear(in_features=4096, out_features=4096)
        self.fc3 = Linear(in_features=4096, out_features=num_classes)

    # Forward pass of the network
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x)

        x = F.relu(self.conv3(x))
        # x = F.relu(x)
        x = F.relu(self.conv4(x))
        x = self.pool(x)

        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        if self.layer >= 16:
            x = F.relu(self.conv7(x))
        if self.layer >= 19:
            x = F.relu(self.conv8(x))
        x = self.pool(x)

        x = F.relu(self.conv9(x))
        x = F.relu(self.conv10(x))
        if self.layer >= 16:
            x = F.relu(self.conv11(x))
        if self.layer >= 19:
            x = F.relu(self.conv12(x))
        x = self.pool(x)

        x = F.relu(self.conv13(x))
        x = F.relu(self.conv14(x))
        if self.layer >= 16:
            x = F.relu(self.conv15(x))
        if self.layer >= 19:
            x = F.relu(self.conv16(x))
        x = self.pool(x)
        x = self.flatten(x)
        # With 32 x 32 inputs, the five 2 x 2 poolings leave a 1 x 1 map, so Flatten yields 512 features
        # (not 512 * 7 * 7 as with 224 x 224 inputs)
        x = F.relu(self.fc1(x))
        x = self.drop_out(x)
        x = F.relu(self.fc2(x))
        x = self.drop_out(x)
        x = self.fc3(x)
        return x


# IV. Model Training & Prediction
<font size=4>Next, a loop trains the model. It uses the paddle.optimizer.Adam optimizer for optimization, F.cross_entropy to compute the loss, and paddle.io.DataLoader to load the data and assemble batches.</font>
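
<font size=4>As a rough illustration of the two quantities the loop reports, the sketch below computes softmax cross-entropy from raw logits and top-1 accuracy in plain Python. It assumes mean reduction over the batch and is not Paddle's implementation; the function names are illustrative.</font>

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits, using log-sum-exp for numerical stability."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


def batch_loss_and_accuracy(batch_logits, labels):
    """Mean cross-entropy and top-1 accuracy over a batch."""
    losses = [softmax_cross_entropy(row, y) for row, y in zip(batch_logits, labels)]
    correct = sum(1 for row, y in zip(batch_logits, labels) if row.index(max(row)) == y)
    return sum(losses) / len(losses), correct / len(labels)


# A model that knows nothing assigns equal logits to all 10 classes
uniform_loss = softmax_cross_entropy([0.0] * 10, label=3)
print(uniform_loss)  # equals ln(10)

toy_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_labels = [0, 2]
toy_loss, toy_acc = batch_loss_and_accuracy(toy_logits, toy_labels)
print(toy_loss, toy_acc)
```

<font size=4>The uniform case gives ln(10) ≈ 2.30, which is a useful sanity check: a freshly initialized 10-class model should start with a loss near that value, as the first logged loss of about 2.36 in the training output below does.</font>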

In [6]:
epoch_num = 10
batch_size = 100
learning_rate = 0.001
val_acc_history = []
val_loss_history = []
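# Use the first GPU when this Paddle build supports CUDA; otherwise fall back to the CPU
paddle.device.set_device('gpu:0' if paddle.is_compiled_with_cuda() else 'cpu')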
def train(model):
    print('start training ... ')
    # turn into training mode
    model.train()

    opt = paddle.optimizer.Adam(learning_rate=learning_rate,
                                parameters=model.parameters())

    train_loader = paddle.io.DataLoader(cifar10_train,
                                        shuffle=True,
                                        batch_size=batch_size)

    valid_loader = paddle.io.DataLoader(cifar10_test, batch_size=batch_size)
    
    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_loader()):
            x_data = data[0]
            y_data = paddle.to_tensor(data[1])
            y_data = paddle.unsqueeze(y_data, 1)

            logits = model(x_data)
            loss = F.cross_entropy(logits, y_data)

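            # 50000 training images / batch_size 100 = 500 batches per epoch,
            # so this condition logs only the first batch of each epoch
            if batch_id % 1000 == 0: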
                print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, loss.numpy()))
            loss.backward()
            opt.step()
            opt.clear_grad()
        # evaluate model after one epoch
        model.eval()
        accuracies = []
        losses = []
        # gradients are not needed during evaluation
        with paddle.no_grad():
            for batch_id, data in enumerate(valid_loader()):
                x_data = data[0]
                y_data = paddle.to_tensor(data[1])
                y_data = paddle.unsqueeze(y_data, 1)

                logits = model(x_data)
                loss = F.cross_entropy(logits, y_data)
                acc = paddle.metric.accuracy(logits, y_data)
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())

        avg_acc, avg_loss = np.mean(accuracies), np.mean(losses)
        print("[validation] accuracy/loss: {}/{}".format(avg_acc, avg_loss))
        val_acc_history.append(avg_acc)
        val_loss_history.append(avg_loss)
        model.train()

model = VGG(num_classes=10)
train(model)

start training ... 
epoch: 0, batch_id: 0, loss is: [2.3600495]


In [7]:
plt.plot(val_acc_history, label = 'validation accuracy')

plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0, 1])
plt.legend(loc='lower right')