# ResNets

## 什么是深度残差网络？

原始论文 https://arxiv.org/pdf/1512.03385.pdf 



当我们使用深层网络的时候，它们能够模拟很深层次的复杂函数，同时也可以从不同的抽象层次上提取特征。这一点会比浅层网络强很多。

但是天下没有免费的午餐，深层网络有优点自然也有缺点，那就是——梯度消失(在做反向传播的时候，因为每一步都要乘以一个权重矩阵，这样就容易导致传播到第一层的时候，梯度接近于0)

但是在Resnet中 "shortcut" 的存在允许梯度直接反向传播到前一层，这样就可以帮我们缓解梯度消失的现象。

![ResNetsBlock](./images/shortcut.png)



ResNet是由很多小的block组成,每个block的组成如下图所示：

![ResNetsBlock](./images/block.png)



ResNet34的结构图如下所示：

![ResNets](./images/resNets.jpg)



所以，我们只需要在实现block上，做一些小技巧，就可以了。


上文中的图，[参考 DeepLearning.ai 的课程4](https://www.coursera.org/learn/convolutional-neural-networks/home/welcome)

In [10]:
import torch
from torch import nn
from torch.nn import functional as F

# 我们这里以 ResNets34 为例子

# 先实现一个Block
class Block(nn.Module):
    def __init__(self, in_channel, out_channel, filter_size=3, stride=1, padding=1, short_cut=None):
        super(Block, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, filter_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channel, out_channel, filter_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channel)
        )
        self.shortcut = short_cut
    
    def forward(self, x):
        out = self.block(x)
        x_short_cut = x if self.shortcut is None else self.shortcut(x)
        out = out + x_short_cut
        return F.relu(out)

# 开始实现 ResNets34
class ResNet34(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet34, self).__init__()
        self.model_name = 'resnet34'

        # 最开始的几层
        self.pre = nn.Sequential(
                nn.Conv2d(3, 64, 7, 2, 3, bias=False),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(3, 2, 1))
        
        # 从论文的图中，可以看到，我们有3，4，6，3个block
        self.layer1 = self._make_layer(64, 128, 3)
        self.layer2 = self._make_layer(128, 256, 4, stride=2)
        self.layer3 = self._make_layer(256, 512, 6, stride=2)
        self.layer4 = self._make_layer(512, 512, 3, stride=2)

        # 分类用的全连接
        self.fc = nn.Linear(512, num_classes)
    
    def _make_layer(self,  in_channel, out_channel, block_num, stride=1):
        shortcut = nn.Sequential(
                nn.Conv2d(in_channel,out_channel,1,stride, bias=False),
                nn.BatchNorm2d(out_channel))
        
        layers = []
        layers.append(Block(in_channel, out_channel, stride, shortcut))
        
        for i in range(1, block_num):
            layers.append(Block(out_channel, out_channel))
        return nn.Sequential(*layers)
        
    def forward(self, x):
        x = self.pre(x)
        
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = F.avg_pool2d(x, 7)
        x = x.view(x.size(0), -1)
        return self.fc(x)

In [11]:
resnet = ResNet34()
print(resnet)

ResNet34 (
  (pre): Sequential (
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU (inplace)
    (3): MaxPool2d (size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1))
  )
  (layer1): Sequential (
    (0): Block (
      (block): Sequential (
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(Sequential (
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
        ), Sequential (
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
        )), padding=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
        (2): ReLU (inplace)
        (3): Conv2d(128, 128, kernel_size=(1, 1), stride=(Sequential (
          (0): Conv2d(64, 128, kernel_size