# DenseNet
[Densely Connected Convolutional Networks (CVPR 2017 Best Paper)](https://arxiv.org/abs/1608.06993)

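[DenseNet visualization](https://zhuanlan.zhihu.com/p/141178215)
## Recap: ResNet
The Taylor expansion decomposes a (sufficiently smooth) function into terms of increasingly higher order. Expanded at 0 (the Maclaurin series), it reads:
$$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \cdots$$
ResNet instead decomposes a function into a simple linear term and a more complex nonlinear term:
$$f(x) = x + g(x)$$
The nonlinear term $g(x)$ can be viewed as a combination of the polynomial terms in the Taylor expansion.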
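
To make the decomposition $f(x) = x + g(x)$ concrete, here is a minimal sketch of a residual connection in PyTorch. The `Residual` wrapper and the particular choice of `g` are illustrative assumptions, not the block used in any specific ResNet variant; the only requirement is that `g` preserves the shape of its input so the addition is defined.

```python
import torch
from torch import nn


class Residual(nn.Module):
    """Hypothetical wrapper: returns x + g(x) for a shape-preserving module g."""

    def __init__(self, g: nn.Module):
        super().__init__()
        self.g = g

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity (linear) term plus the learned nonlinear term.
        return x + self.g(x)


# Assumed g: BN -> ReLU -> 3x3 conv that keeps channels and resolution unchanged.
g = nn.Sequential(
    nn.BatchNorm2d(3),
    nn.ReLU(),
    nn.Conv2d(3, 3, kernel_size=3, padding=1),
)
res = Residual(g)
x = torch.randn(4, 3, 8, 8)
res_out = res(x)
print(res_out.shape)  # torch.Size([4, 3, 8, 8])
```

Because the two terms are added, the output has exactly the same shape as the input.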

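## The DenseNet idea
DenseNet extends $f$ so that it carries more than two parts:
$$x \to \left[x, f_1(x), f_2([x, f_1(x)]), \cdots\right]$$
The key difference from ResNet is that DenseNet concatenates each layer's output with its input along the channel dimension, whereas ResNet adds them.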
<div align = center> <img src ='./img/densenet-block.svg'></img></div>
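The last layer of DenseNet is therefore densely connected to all preceding layers, and the terms of this expansion are finally combined by a fully connected layer.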
<div align = center> <img src ='./img/densenet.svg'></img></div>
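
The difference between ResNet's addition and DenseNet's concatenation shows up directly in the tensor shapes. The sketch below uses a random tensor `y` as a stand-in for a layer's output; it is only meant to illustrate the shape behavior, not a trained layer.

```python
import torch

x = torch.randn(4, 3, 8, 8)  # input feature map
y = torch.randn(4, 3, 8, 8)  # stand-in for a layer output with a matching shape

added = x + y                             # ResNet-style: channels stay at 3
concatenated = torch.cat((x, y), dim=1)   # DenseNet-style: channels stack to 3 + 3 = 6

print(added.shape)         # torch.Size([4, 3, 8, 8])
print(concatenated.shape)  # torch.Size([4, 6, 8, 8])

# Concatenation keeps the input untouched in the first channels.
print(torch.equal(concatenated[:, :3], x))  # True
```

Addition requires both tensors to have the same number of channels, while concatenation only requires the batch size and spatial dimensions to match.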

## DenseBlock
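DenseBlock builds on a modified Residual Block that applies batch normalization first, then the activation, then the convolution. Each DenseBlock consists of several convolution blocks that all have the same number of output channels. During the forward pass, the input and output of each convolution block are concatenated along the channel dimension. The number of output channels per convolution block therefore controls how fast the channel count grows relative to the input, which is why it is called the growth rate.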
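
The channel bookkeeping described above can be checked with plain arithmetic. The helper below is a hypothetical sketch (the name `dense_block_channels` is not part of any library): convolution block `i` receives the original input plus the outputs of the `i` blocks before it.

```python
def dense_block_channels(num_convs, in_channels, growth_rate):
    """Return (input channels seen by each conv block, output channels of the DenseBlock)."""
    per_block_inputs = [in_channels + i * growth_rate for i in range(num_convs)]
    out_channels = in_channels + num_convs * growth_rate
    return per_block_inputs, out_channels


inputs, out = dense_block_channels(num_convs=2, in_channels=3, growth_rate=10)
print(inputs, out)  # [3, 13] 23
```

With 2 convolution blocks, 3 input channels, and a growth rate of 10, the block ends with 3 + 2 * 10 = 23 channels.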

In [2]:
import torch
from torch import nn

# Modified Residual block: BN -> ReLU -> 3x3 convolution
def conv_block(in_channels,out_channels):
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(),
        nn.Conv2d(in_channels,out_channels,kernel_size=3,padding=1)
    )

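# A DenseBlock keeps the spatial resolution; only the number of channels changes.
# The output of every conv block is concatenated with its input.
class DenseBlock(nn.Module):
    def __init__(self,num_convs,in_channels,out_channels):
        super(DenseBlock,self).__init__()
        layer = []
        for i in range(num_convs):
            # Block i sees the original input plus the outputs of the i previous blocks
            layer.append(conv_block(out_channels*i + in_channels,out_channels))
        self.net = nn.Sequential(*layer)

    def forward(self,X):
        for block in self.net:
            Y = block(X)
            # Concatenate input and output along the channel dimension
            X = torch.cat((X,Y),dim = 1)
        return X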

In [3]:
# Output channels: 3 + 2 * 10 = 23
blk = DenseBlock(2, 3, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape

torch.Size([4, 23, 8, 8])

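## Transition layer
Every dense block increases the number of channels, so stacking too many of them makes the model overly complex. A transition layer reduces the channel count to keep that complexity in check.
It uses a $1\times 1$ convolution to reduce the number of channels and a $2 \times 2$ average pooling layer with stride 2 to halve the height and width.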


In [4]:
# Transition layer: halves the resolution and changes the number of channels
def transition_block(in_channels,out_channels):
    return nn.Sequential(nn.BatchNorm2d(in_channels),
                         nn.ReLU(),
                         nn.Conv2d(in_channels,out_channels,kernel_size=1),
                         nn.AvgPool2d(kernel_size=2,stride=2))

In [5]:
trans_block = transition_block(23,10)
trans_block(Y).shape

torch.Size([4, 10, 4, 4])
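
As a sanity check on the transition layer, the sketch below works out its output shape and learnable parameter count by hand. It assumes the layout BN -> ReLU -> $1\times 1$ convolution (with bias) -> $2 \times 2$ average pooling with stride 2; both helper names are hypothetical.

```python
def transition_output_shape(in_shape, out_channels):
    """Output shape (N, C, H, W) of the assumed transition layer."""
    n, _, h, w = in_shape
    # The 1x1 conv sets the channel count; 2x2 pooling with stride 2 halves H and W.
    return (n, out_channels, h // 2, w // 2)


def transition_param_count(in_channels, out_channels):
    """Learnable parameters: BN scale and shift plus 1x1 conv weights and bias."""
    bn_params = 2 * in_channels
    conv_params = in_channels * out_channels + out_channels
    return bn_params + conv_params


print(transition_output_shape((4, 23, 8, 8), 10))  # (4, 10, 4, 4)
print(transition_param_count(23, 10))              # 286
```

The $1\times 1$ convolution is cheap because its weight count scales only with the product of input and output channels, with no spatial kernel factor.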

## DenseNet网络结构
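The overall architecture resembles ResNet, with the Residual Blocks replaced by Dense Blocks. Following ResNet18, we can use 4 DenseBlocks with 4 convolution layers each. The code below sets the number of output channels of every convolution layer (`growth_rate`) to 64, so each DenseBlock adds $4 \times 64 = 256$ channels.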

A transition layer is inserted between consecutive DenseBlocks to halve both the resolution and the number of channels.
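
The interplay between DenseBlocks (which add channels) and transition layers (which halve them) is easier to follow when traced stage by stage. The function below is a hypothetical helper, not part of the model; it assumes a stem with 64 output channels and a growth rate of 64, the values set in the implementation in this section.

```python
def densenet_channel_schedule(num_convs, growth_rate, num_channels):
    """Trace the channel count after every DenseBlock and transition layer."""
    schedule = []
    for i, num_conv in enumerate(num_convs):
        # Each DenseBlock adds num_conv * growth_rate channels.
        num_channels += num_conv * growth_rate
        schedule.append((f"dense_block_{i + 1}", num_channels))
        # Every DenseBlock except the last is followed by a halving transition layer.
        if i != len(num_convs) - 1:
            num_channels //= 2
            schedule.append((f"transition_{i + 1}", num_channels))
    return schedule


schedule = densenet_channel_schedule([4, 4, 4, 4], growth_rate=64, num_channels=64)
for stage, channels in schedule:
    print(stage, channels)
# dense_block_1 320
# transition_1 160
# dense_block_2 416
# transition_2 208
# dense_block_3 464
# transition_3 232
# dense_block_4 488
```

Under these assumptions the final DenseBlock outputs 488 channels, which is the width seen by the classification head.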

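Compared with ResNet, DenseNet has fewer learnable parameters, because each convolution produces only a few output channels and the wide feature maps come from concatenating those outputs. It uses more GPU memory, however, because the output of every convolution layer in a DenseBlock must be kept until the end of the block.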

In [6]:
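from torchsummary import summary  # third-party package (pip install torchsummary)
# Module-level hyperparameters; DenseNet.__init__ reads num_convs and growth_rate directly
num_convs = [4,4,4,4]   # convolution blocks per DenseBlock
growth_rate = 64        # output channels of each convolution block
num_channels = 64       # output channels of the stem convolution
num_classes = 10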
class DenseNet(nn.Module):
    def __init__(self,in_channels,num_channels,nums_class):
        super(DenseNet,self).__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_channels,num_channels,kernel_size=7, padding = 3, stride = 2),
                                  nn.BatchNorm2d(num_channels),
                                  nn.ReLU(),
                                  nn.MaxPool2d(kernel_size=3,stride=2,padding=1))
        dense_blocks = []
        for i, num_conv in enumerate(num_convs):
            dense_blocks.append(DenseBlock(num_conv,num_channels,growth_rate))
            num_channels+=num_conv*growth_rate
            # Every DenseBlock except the last is followed by a transition layer
            # (// is floor division)
            if i != (len(num_convs) - 1):
                dense_blocks.append(transition_block(num_channels,num_channels//2))
                num_channels = num_channels//2
        # num_channels is now the channel count after the last DenseBlock
        self.dense = nn.Sequential(*dense_blocks)
        self.fc = nn.Sequential(nn.BatchNorm2d(num_channels),
                                nn.ReLU(),
                                nn.AdaptiveAvgPool2d((1,1)),
                                nn.Flatten(),
                                nn.Linear(num_channels,nums_class))
    
    def forward(self,X):
        X = self.fc(self.dense(self.conv(X)))
        return X

net = DenseNet(3,num_channels,num_classes)

summary(net,(3,224,224),device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,472
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
       BatchNorm2d-5           [-1, 64, 56, 56]             128
              ReLU-6           [-1, 64, 56, 56]               0
            Conv2d-7           [-1, 64, 56, 56]          36,928
       BatchNorm2d-8          [-1, 128, 56, 56]             256
              ReLU-9          [-1, 128, 56, 56]               0
           Conv2d-10           [-1, 64, 56, 56]          73,792
      BatchNorm2d-11          [-1, 192, 56, 56]             384
             ReLU-12          [-1, 192, 56, 56]               0
           Conv2d-13           [-1, 64, 56, 56]         110,656
      BatchNorm2d-14          [-1, 256,