## Chapter 5: Convolutional Neural Networks

### 1. The chapter implements the residual unit used in ResNet18/34. Study the structure of the residual unit used in ResNet50 and implement it. <span style="color:red">(Required)</span>

![](https://ai-studio-static-online.cdn.bcebos.com/f8aa25f5ba0d44bfb9889da1e0fe4e7bfbdaf52930ae49e588f50e288d1eb1fe)

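The basic module of the 18- and 34-layer networks is called BasicBlock and contains two convolutions. The basic module of the 50-, 101- and 152-layer networks is called Bottleneck and contains three convolutions.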
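
To see why the deeper networks switch to the bottleneck design, the following sketch compares the convolution weight counts of the two block types at a width of 256 channels. The channel sizes and the helper name `conv_weights` are illustrative assumptions, and batch-norm parameters are ignored.

```python
def conv_weights(c_in, c_out, k):
    """Weight count of a bias-free k x k convolution."""
    return c_in * c_out * k * k

# BasicBlock-style unit: two 3x3 convolutions at 256 channels
basic = conv_weights(256, 256, 3) + conv_weights(256, 256, 3)

# Bottleneck unit: 1x1 reduce to 64, 3x3 at 64, 1x1 expand back to 256
bottleneck = (conv_weights(256, 64, 1)
              + conv_weights(64, 64, 3)
              + conv_weights(64, 256, 1))

print(basic, bottleneck)  # 1179648 69632
```

At this width the bottleneck unit needs roughly one seventeenth of the weights, which is what makes 50 or more layers affordable.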

In [1]:
import paddle
import paddle.nn as nn
from paddle.vision import transforms

class Bottleneck(nn.Layer):
    """Bottleneck residual unit: 1x1 reduce, 3x3, 1x1 expand, plus a shortcut."""
    def __init__(self, in_channels, filters, stride=1, is_downsample=False):
        super(Bottleneck, self).__init__()
        filter1, filter2, filter3 = filters
        # 1x1 convolution reduces the channel count
        self.conv1 = nn.Conv2D(in_channels, filter1, kernel_size=1, stride=1, bias_attr=False)
        self.bn1 = nn.BatchNorm2D(filter1)
        # The 3x3 convolution carries the stride, matching paddle.vision.models.resnet50
        # (the original ResNet paper strides the first 1x1 convolution instead)
        self.conv2 = nn.Conv2D(filter1, filter2, kernel_size=3, stride=stride, padding=1, bias_attr=False)
        self.bn2 = nn.BatchNorm2D(filter2)
        # 1x1 convolution expands to the output channel count
        self.conv3 = nn.Conv2D(filter2, filter3, kernel_size=1, stride=1, bias_attr=False)
        self.bn3 = nn.BatchNorm2D(filter3)
        self.relu = nn.ReLU()
        self.is_downsample = is_downsample
        if is_downsample:
            # Projection shortcut: a strided 1x1 convolution so the shortcut
            # has the same shape as the main path
            self.downsample = nn.Sequential(
                nn.Conv2D(in_channels, filter3, kernel_size=1, stride=stride, bias_attr=False),
                nn.BatchNorm2D(filter3))
 
    def forward(self, X):
        X_shortcut = X
        X = self.conv1(X)
        X = self.bn1(X)
        X = self.relu(X)
 
        X = self.conv2(X)
        X = self.bn2(X)
        X = self.relu(X)
 
        X = self.conv3(X)
        X = self.bn3(X)
 
        if self.is_downsample:
            X_shortcut = self.downsample(X_shortcut)
 
        X = X + X_shortcut
        X = self.relu(X)
        return X

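### 2. Research the structure of the ResNet50 network, implement ResNet50 using the residual unit from Exercise 1, and count its parameters and computation following the method introduced in this chapter. <span style="color:red">(Optional, bonus)</span>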

In [2]:
Layers = [3, 4, 6, 3]

class ResNet50Model(nn.Layer):
    def __init__(self):
        super(ResNet50Model, self).__init__()
        # Stem: 7x7 convolution and 3x3 max pooling, each halving the resolution
        self.conv1 = nn.Conv2D(3, 64, kernel_size=7, stride=2, padding=3, bias_attr=False)
        self.bn1 = nn.BatchNorm2D(num_features=64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
 
        # Four stages of 3, 4, 6 and 3 bottleneck units; stages 2 to 4 halve the resolution
        self.layer1 = self._make_layer(64, (64, 64, 256), Layers[0])
        self.layer2 = self._make_layer(256, (128, 128, 512), Layers[1], 2)
        self.layer3 = self._make_layer(512, (256, 256, 1024), Layers[2], 2)
        self.layer4 = self._make_layer(1024, (512, 512, 2048), Layers[3], 2)
        # Classification head: global average pooling and a 1000-class linear layer
        self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
        self.fc = nn.Linear(2048, 1000)
 
    def forward(self, input):
        # print("--ResNetModel_1--forward--input.shape={}".format(input.shape))
        X = self.conv1(input)
        X = self.bn1(X)
        X = self.relu(X)
        X = self.maxpool(X)
        X = self.layer1(X)
        X = self.layer2(X)
        X = self.layer3(X)
        X = self.layer4(X)
 
        X = self.avgpool(X)
        X = paddle.flatten(X, 1)
        X = self.fc(X)
        return X
 
    def _make_layer(self, in_channels, filters, blocks, stride = 1):
        layers = []
        block_one = Bottleneck(in_channels, filters, stride=stride, is_downsample=True)
        layers.append(block_one)
        for i in range(1, blocks):
            layers.append(Bottleneck(filters[2], filters, stride=1, is_downsample=False))
 
        return nn.Sequential(*layers)
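
The feature-map sizes produced by this network can be checked without running it. The sketch below applies the standard output-size formula for convolution and pooling to the stem and the four stages. The helper names `out_size` and `trace_sizes` are illustrative, and only the side length of a square input is tracked.

```python
def out_size(n, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1

def trace_sizes(n):
    """Side length of the feature map after the stem and after each stage."""
    sizes = {}
    n = out_size(n, kernel=7, stride=2, padding=3)  # conv1
    sizes["conv1"] = n
    n = out_size(n, kernel=3, stride=2, padding=1)  # maxpool
    sizes["maxpool"] = n
    sizes["layer1"] = n                             # every unit has stride 1
    for name in ("layer2", "layer3", "layer4"):
        # The first unit of the stage has stride 2; its 1x1 shortcut
        # convolution gives the output size the main path must match.
        n = out_size(n, kernel=1, stride=2, padding=0)
        sizes[name] = n
    return sizes

print(trace_sizes(64))
print(trace_sizes(32))
```

For a 64×64 input the sizes are 32, 16, 16, 8, 4 and 2. For a 32×32 input the last stage is down to a single 1×1 position per channel.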

In [3]:
model = ResNet50Model()
params_info = paddle.summary(model, (1080, 3, 64, 64))
print(params_info)

W0727 09:42:59.449671  5595 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0727 09:42:59.453424  5595 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.


--------------------------------------------------------------------------------
   Layer (type)          Input Shape          Output Shape         Param #    
     Conv2D-1        [[1080, 3, 64, 64]]   [1080, 64, 32, 32]       9,408     
   BatchNorm2D-1    [[1080, 64, 32, 32]]   [1080, 64, 32, 32]        256      
      ReLU-1        [[1080, 64, 32, 32]]   [1080, 64, 32, 32]         0       
    MaxPool2D-1     [[1080, 64, 32, 32]]   [1080, 64, 16, 16]         0       
     Conv2D-2       [[1080, 64, 16, 16]]   [1080, 64, 16, 16]       4,096     
   BatchNorm2D-2    [[1080, 64, 16, 16]]   [1080, 64, 16, 16]        256      
      ReLU-2        [[1080, 256, 16, 16]] [1080, 256, 16, 16]         0       
     Conv2D-3       [[1080, 64, 16, 16]]   [1080, 64, 16, 16]      36,864     
   BatchNorm2D-3    [[1080, 64, 16, 16]]   [1080, 64, 16, 16]        256      
     Conv2D-4       [[1080, 64, 16, 16]]  [1080, 256, 16, 16]      16,384     
   BatchNorm2D-4    [[1080, 256, 16, 16]] [1080, 2

The summary reports **25610152** parameters in total, of which **25503912** are trainable.
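
The total can be cross-checked by hand. The sketch below recomputes it from the layer shapes alone, assuming bias-free convolutions, four BatchNorm tensors per channel (weight, bias, running mean, running variance) and a 1000-class linear layer. The helper names are illustrative and are not Paddle APIs.

```python
def conv_weights(c_in, c_out, k):
    """Weight count of a bias-free k x k convolution."""
    return c_in * c_out * k * k

def bottleneck_counts(c_in, filters, downsample):
    """Return (convolution weights, BatchNorm channels) of one bottleneck unit."""
    f1, f2, f3 = filters
    conv = conv_weights(c_in, f1, 1) + conv_weights(f1, f2, 3) + conv_weights(f2, f3, 1)
    bn_channels = f1 + f2 + f3
    if downsample:
        # Projection shortcut: 1x1 convolution followed by BatchNorm
        conv += conv_weights(c_in, f3, 1)
        bn_channels += f3
    return conv, bn_channels

def resnet50_counts(num_classes=1000):
    """Return (convolution weights, BatchNorm channels, linear parameters)."""
    conv = conv_weights(3, 64, 7)  # stem convolution
    bn_channels = 64               # stem BatchNorm
    stages = [(64, (64, 64, 256), 3), (256, (128, 128, 512), 4),
              (512, (256, 256, 1024), 6), (1024, (512, 512, 2048), 3)]
    for c_in, filters, blocks in stages:
        for i in range(blocks):
            # Only the first unit of a stage changes the channel count
            unit_in = c_in if i == 0 else filters[2]
            c, b = bottleneck_counts(unit_in, filters, downsample=(i == 0))
            conv += c
            bn_channels += b
    fc = 2048 * num_classes + num_classes  # weight plus bias
    return conv, bn_channels, fc

conv, bn_channels, fc = resnet50_counts()
total = conv + 4 * bn_channels + fc
print(conv, bn_channels, fc, total)
```

Under these assumptions the sum reproduces the total of 25610152. Subtracting all four BatchNorm tensors (4 × 26560 = 106240) gives 25503912, which is consistent with the trainable figure if the summary counts the BatchNorm tensors as non-trainable.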

### 3. Align the network with the ResNet50 in the Paddle HAPI, following the procedure introduced in this chapter. <span style="color:red">(Optional, bonus)</span>

In [4]:
# ! wget http://ai-atest.bj.bcebos.com/cifar-10-python.tar.gz
# ! mkdir /home/aistudio/datasets/
# ! tar -xvf ./cifar-10-python.tar.gz -C /home/aistudio/datasets/

In [5]:
import numpy as np
from paddle.vision.models import resnet50

import warnings
warnings.filterwarnings("ignore")

hapi_resnet50_model = resnet50()
my_resnet50_model = ResNet50Model()

# ResNet50Model uses the same sublayer names as the HAPI network (conv1, bn1,
# layer1 to layer4, downsample, fc), so the HAPI weights can be loaded
# directly, without renaming any keys.
my_resnet50_model.set_state_dict(hapi_resnet50_model.state_dict())

# Compare in eval mode. In training mode a single 32x32 image reaches layer4 as
# a 1x1 feature map; batch normalization over one value per channel outputs
# only its bias term, which hides any mismatch between the weights.
hapi_resnet50_model.eval()
my_resnet50_model.eval()

# Random test input, [N, C, H, W]
inputs = np.random.randn(1, 3, 32, 32).astype('float32')
x = paddle.to_tensor(inputs)

output = my_resnet50_model(x)
hapi_out = hapi_resnet50_model(x)

# Largest absolute difference between the two outputs.
# The outputs agree only if Bottleneck applies its stride in the same
# convolution as the HAPI block (the 3x3 one).
max_diff = paddle.max(paddle.abs(output - hapi_out))
print(max_diff)

Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=False,
       [0.])


The output shows that the custom network is essentially aligned with the resnet50 in Paddle.
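
Since warnings are suppressed in the cell above, a weight whose name fails to match could go unnoticed, so it is worth comparing parameter names and shapes before comparing outputs. The following is a minimal sketch of such a check. `compare_state_keys` is a hypothetical helper, not a Paddle API, and the two dictionaries are toy stand-ins for real state dicts.

```python
def compare_state_keys(source, target):
    """Compare two {name: shape} mappings.

    Returns the names the target expects but the source lacks, the names
    only the source has, and the shared names whose shapes differ.
    """
    missing = sorted(k for k in target if k not in source)
    unexpected = sorted(k for k in source if k not in target)
    mismatched = sorted(k for k in target
                        if k in source and tuple(source[k]) != tuple(target[k]))
    return missing, unexpected, mismatched

# Toy shapes standing in for
#   {k: tuple(v.shape) for k, v in model.state_dict().items()}
source = {"conv1.weight": (64, 3, 7, 7), "net.7.weight": (2048, 1000)}
target = {"conv1.weight": (64, 3, 7, 7), "fc.weight": (2048, 1000)}

missing, unexpected, mismatched = compare_state_keys(source, target)
print(missing)     # ['fc.weight']
print(unexpected)  # ['net.7.weight']
print(mismatched)  # []
```

All three lists should be empty before the output comparison is taken as evidence of alignment.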

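### 4. During the summer vacation, Xiao Ming built a cat-and-dog recognition system for a pet shop. To obtain more training data, he photographed a large number of cats and dogs on the street, but the resulting model did not perform well when used in the pet shop. What is the main reason? <span style="color:red">(Optional, short answer, bonus)</span>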

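The training data and the deployment data come from different scenes. Street photos of cats and dogs are likely to contain many pedestrians and other street objects, so training the recognition model on them directly introduces a large amount of noise. In the pet shop, by contrast, the input images usually contain only a few other objects in the background.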