# DenseNet
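### DenseNet borrows the core idea of ResNet. The key difference is that ResNet adds a block's input to its output, whereas DenseNet concatenates them along the channel dimension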

 * The basic unit of DenseNet is the dense block, which uses the improved ResNet ordering 'BN + activation + convolution'
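
To make the difference between the two connection styles concrete, here is a minimal pure-Python sketch. It is not part of the original notebook: feature maps are modelled as plain lists of channels, and the helper names `residual_connect` and `dense_connect` are hypothetical.

```python
# Toy feature maps: a list of channels, each channel a flat list of values.
x = [[1.0, 2.0], [3.0, 4.0]]        # block input: 2 channels
y = [[10.0, 20.0], [30.0, 40.0]]    # block output: 2 channels


def residual_connect(x, y):
    """ResNet style: element-wise sum, the channel count is unchanged."""
    return [[a + b for a, b in zip(cx, cy)] for cx, cy in zip(x, y)]


def dense_connect(x, y):
    """DenseNet style: concatenate along the channel axis, channel counts add."""
    return x + y


res = residual_connect(x, y)
den = dense_connect(x, y)
print(len(res), res)  # 2 [[11.0, 22.0], [33.0, 44.0]]
print(len(den), den)  # 4 [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
```

Addition requires the input and output to have the same number of channels, while concatenation only requires matching height and width.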

In [1]:
import sys 
sys.path.append('../')

In [2]:
import gluonbook as gb
from mxnet import gluon, init, nd
from mxnet.gluon import nn
import mxnet as mx


In [3]:
def conv_block(num_channels):
    # BN -> ReLU -> 3x3 convolution; padding=1 keeps the height and width unchanged
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),
            nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk

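* A dense block consists of several conv_block units, each with the same number of output channels. In the forward pass, the input and output of each conv_block are concatenated along the channel dimension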

In [4]:
class DenseBlock(nn.Block):
    def __init__(self,num_convs,num_channels,**kwargs):
        super(DenseBlock,self).__init__(**kwargs)
        self.net = nn.Sequential()    
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))
            
    def forward(self,X):
        for blk in self.net:
            Y = blk(X)
            X = nd.concat(X,Y,dim=1)  # concatenate input and output along the channel dimension
        return X

In [5]:
blk = DenseBlock(2,10)
blk.initialize(ctx=mx.gpu(),force_reinit=True)
X =nd.random.uniform(shape=(4,3,8,8),ctx=mx.gpu())
Y = blk(X)
Y.shape

(4, 23, 8, 8)
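
The 23 channels in the output shape follow from simple bookkeeping: every conv_block appends `num_channels` new channels to whatever it received. The sketch below reproduces that arithmetic without MXNet; the function name `dense_block_out_channels` is a hypothetical helper, not something from the notebook or from Gluon.

```python
def dense_block_out_channels(in_channels, num_convs, growth_rate):
    """Return (input channels seen by each conv_block, final channel count)."""
    seen = []
    channels = in_channels
    for _ in range(num_convs):
        seen.append(channels)      # this conv_block reads all channels so far
        channels += growth_rate    # and its output is concatenated on top
    return seen, channels


seen, out_channels = dense_block_out_channels(in_channels=3, num_convs=2, growth_rate=10)
print(seen)          # [3, 13]
print(out_channels)  # 23
```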

## Transition layer

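* Every dense block increases the number of channels, so stacking too many of them makes the model overly complex. A transition layer keeps this in check: a 1\*1 convolution reduces the number of channels, and an average pooling layer with stride 2 halves the height and width, which lowers the model complexity further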
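
As an illustration of the pooling half of the transition layer, the following pure-Python sketch applies 2x2 average pooling with stride 2 to a single channel. It is a toy stand-in rather than the Gluon layer, and the name `avg_pool_2x2` is hypothetical.

```python
def avg_pool_2x2(channel):
    """Average-pool one channel (a list of rows) with a 2x2 window and stride 2."""
    h, w = len(channel), len(channel[0])
    return [
        [
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


channel = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = avg_pool_2x2(channel)
print(pooled)  # [[3.5, 5.5], [11.5, 13.5]]
```

The 4x4 input becomes 2x2, so each transition layer cuts the number of spatial positions to a quarter.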

In [6]:
def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),nn.Activation('relu'),
            nn.Conv2D(num_channels,kernel_size=1),
            nn.AvgPool2D(pool_size=2,strides=2)
            )
    return blk

In [7]:
blk = transition_block(10)
blk.initialize(ctx= mx.gpu())
blk(Y).shape

(4, 10, 4, 4)
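
The drop from 23 to 10 channels comes from the 1x1 convolution, which at every pixel forms a weighted sum across the input channels. The sketch below shows that operation in plain Python under simplifying assumptions (no bias, channels stored as flat pixel lists); `conv1x1` is a hypothetical helper, not a Gluon function.

```python
def conv1x1(x, weight):
    """Apply a bias-free 1x1 convolution.

    x:      list of C_in channels, each a flat list of pixel values
    weight: C_out rows, each holding C_in coefficients
    """
    num_pixels = len(x[0])
    return [
        [sum(w_c * x[c][p] for c, w_c in enumerate(row)) for p in range(num_pixels)]
        for row in weight
    ]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 input channels, 2 pixels each
weight = [[1.0, 0.0, 1.0]]                # 1 output channel
y = conv1x1(x, weight)
print(len(y), y)  # 1 [[6.0, 8.0]]
```

The number of rows in `weight` sets the output channel count, while the spatial size is untouched.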

## DenseNet model

In [8]:
DenseNet = nn.Sequential()

In [9]:
DenseNet.add(
             nn.Conv2D(channels=32,kernel_size=7,strides=2,padding=3),
             nn.BatchNorm(),nn.Activation('relu'),
             nn.MaxPool2D(pool_size=3,strides=2,padding=1)
            )
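
To see how much this stem shrinks the input, the sketch below applies the usual output-size formula `(size + 2*padding - kernel) // stride + 1` to both layers. It assumes floor rounding for both the convolution and the pooling layer, and a 96-pixel input as used in the training cell; `out_size` is a hypothetical helper.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer, assuming floor rounding."""
    return (size + 2 * padding - kernel) // stride + 1


size = 96  # images are resized to 96x96 before training
after_conv = out_size(size, kernel=7, stride=2, padding=3)
after_pool = out_size(after_conv, kernel=3, stride=2, padding=1)
print(after_conv, after_pool)  # 48 24
```

Under these assumptions the first dense block receives 32 channels of 24x24 feature maps.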

* Next, add four dense blocks, with a transition layer after each one except the last

In [10]:
num_channels,growth_rate = 32,16
num_convs_in_dense_blocks = [2,2,2,2]

for i,num_convs in enumerate(num_convs_in_dense_blocks):
    DenseNet.add(DenseBlock(num_convs,growth_rate))
    
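    # Number of output channels of the dense block just added
    num_channels += num_convs*growth_rate  # each dense block adds 2*16=32 channels
    
    # Between dense blocks, add a transition layer that halves the channel count,
    # and keep num_channels in sync with what the next dense block actually receives
    if i!=len(num_convs_in_dense_blocks)-1:
        num_channels //= 2
        DenseNet.add(transition_block(num_channels))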
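
The channel bookkeeping in this loop can be checked without building any layers. The sketch below assumes that each transition layer halves the channel count and that the running total is updated to match; the `trace` list is added here purely for illustration.

```python
num_channels, growth_rate = 32, 16
num_convs_in_dense_blocks = [2, 2, 2, 2]

trace = [num_channels]  # channel count after the stem
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    num_channels += num_convs * growth_rate   # dense block: +32 channels
    trace.append(num_channels)
    if i != len(num_convs_in_dense_blocks) - 1:
        num_channels //= 2                    # transition layer: halve
        trace.append(num_channels)

print(trace)  # [32, 64, 32, 64, 32, 64, 32, 64]
```

With these settings the width alternates between 32 and 64 channels, and the final dense block hands 64 channels to the classification head.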

In [11]:
DenseNet.add(nn.BatchNorm(),nn.Activation('relu'),nn.GlobalAvgPool2D(),nn.Dense(10))

## Training the model

In [12]:
lr, num_epochs, batch_size, ctx = 0.1,10, 64, gb.try_gpu()
DenseNet.initialize(ctx=ctx, init=init.Xavier(),force_reinit=True)
trainer = gluon.Trainer(DenseNet.collect_params(), 'sgd', {'learning_rate': lr})
train_iter, test_iter = gb.load_data_fashion_mnist(batch_size, resize=96)
gb.train_ch5(DenseNet, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)

training on gpu(0)
epoch 1, loss 0.4823, train acc 0.834, test acc 0.887, time 42.1 sec
epoch 2, loss 0.3015, train acc 0.890, test acc 0.881, time 40.0 sec
epoch 3, loss 0.2609, train acc 0.906, test acc 0.888, time 39.8 sec
epoch 4, loss 0.2370, train acc 0.914, test acc 0.918, time 40.3 sec
epoch 5, loss 0.2166, train acc 0.921, test acc 0.877, time 39.5 sec


## With a smaller batch size, training is comparatively slower

In [13]:
num_epochs = 10
gb.train_ch5(DenseNet, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)

training on gpu(0)
epoch 1, loss 0.3164, train acc 0.886, test acc 0.897, time 41.3 sec
epoch 2, loss 0.2403, train acc 0.911, test acc 0.914, time 42.8 sec
epoch 3, loss 0.2161, train acc 0.922, test acc 0.886, time 40.1 sec
epoch 4, loss 0.1981, train acc 0.929, test acc 0.907, time 42.6 sec
epoch 5, loss 0.1837, train acc 0.933, test acc 0.915, time 39.5 sec
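
One way to reason about the batch-size remark is to count parameter updates per epoch. The sketch below assumes the 60,000-image Fashion-MNIST training set and that the final partial batch is kept; the batch size of 256 is only a comparison point, not a setting used in this notebook. Actual wall-clock time also depends on the hardware.

```python
import math

NUM_TRAIN_IMAGES = 60000  # size of the Fashion-MNIST training set


def updates_per_epoch(batch_size, num_examples=NUM_TRAIN_IMAGES):
    """Number of mini-batches (and hence SGD updates) in one epoch."""
    return math.ceil(num_examples / batch_size)


small = updates_per_epoch(64)    # the batch size used above
large = updates_per_epoch(256)   # hypothetical larger batch for comparison
print(small, large)  # 938 235
```

A batch size of 64 means roughly four times as many forward and backward passes per epoch as 256, each doing less work and so using the GPU less efficiently.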
