## DenseNet：稠密连接的卷积神经网络
ResNet的跨层连接思想影响了接下来的众多工作。这里我们介绍其中的一个：DenseNet。下图展示了这两个的主要区别：

![image.png](http://zh.gluon.ai/_images/densenet.svg)

可以看到DenseNet里来自跳层的输出不是通过加法（+）而是拼接（concat）来跟目前层的输出合并。因为是拼接，所以底层的输出会保留的进入上面所有层。这是为什么叫“稠密连接”的原因

### 稠密块（Dense Block）
我们先来定义一个稠密连接块。DenseNet的卷积块使用ResNet改进版本的``BN->Relu->Conv``。每个卷积的输出通道数被称之为``growth_rate``，这是因为假设输出为``in_channels``，而且有``layers``层，那么输出的通道数就是``in_channels+growth_rate*layers``。

In [1]:
import numpy as np
import tensorflow as tf
slim = tf.contrib.slim

def conv_block(input, channels, is_training=False, scope='conv'):
    with tf.variable_scope(scope):
        bn = slim.batch_norm(input, is_training=is_training)
        relu = tf.nn.relu(bn)
        conv = slim.conv2d(relu, channels, [3, 3], padding='SAME')
        return conv

def DenseBlock(input, num_layers, growth_rate, is_training=False, scope='dense'):
    with tf.variable_scope(scope):
        for i in range(num_layers):
            conv = conv_block(input, growth_rate, is_training=is_training, scope='block'+str(i))
            input = tf.concat([input, conv], -1)
        return input 


x = np.random.uniform(size=(4,8,8,3)).astype(np.float32)
print DenseBlock(x, 2, 10)

Tensor("dense/concat_1:0", shape=(4, 8, 8, 23), dtype=float32)


### 过渡块（Transition Block）
因为使用拼接的缘故，每经过一次拼接输出通道数可能会激增。为了控制模型复杂度，这里引入一个过渡块，它不仅把输入的长宽减半，同时也使用$1×1$卷积来改变通道数。

In [2]:
def transition_block(input, channels, is_training=False, scope='transition_block'):
    with tf.variable_scope(scope):
        bn = slim.batch_norm(input, is_training=is_training)
        relu = tf.nn.relu(bn)
        conv = slim.conv2d(relu, channels, [1, 1])
        return slim.avg_pool2d(conv, [2, 2], 2)

print transition_block(x, 10)

Tensor("transition_block/AvgPool2D/AvgPool:0", shape=(4, 4, 4, 10), dtype=float32)


### DenseNet
DenseNet的主体就是交替串联稠密块和过渡块。它使用全局的`growth_rate`使得配置更加简单。过渡层每次都将通道数减半。下面定义一个121层的DenseNet。

In [3]:
init_channels = 64
growth_rate = 32
block_layers = [6, 12, 24, 16]
num_classes = 10

def dense_net(input, is_training=False, scope='densenet'):
    with tf.variable_scope(scope):
        # first block
        input = tf.pad(input, [[0,0],[3,3],[3,3],[0,0]])
        conv1 = slim.conv2d(input, init_channels, [7, 7], stride=2, scope='conv1')
        bn1 = slim.batch_norm(conv1, is_training=is_training)
        relu1 = tf.nn.relu(bn1)
        relu1 = tf.pad(relu1, [[0,0],[1,1],[1,1],[0,0]])
        pool1 = slim.max_pool2d(relu1, [3,3], 2)
        
        denseinput = pool1
        
        # dense blocks
        channels = init_channels
        for i, layers in enumerate(block_layers):
            denseoutput = DenseBlock(denseinput, layers, channels, is_training=is_training, scope='dense'+str(i))
            channels += layers * growth_rate
            print 'num_channels: ' + str(channels)
            if i != len(block_layers)-1:
                denseoutput =  transition_block(denseoutput, channels/2, is_training=is_training, scope='transition_block'+str(i))
            denseinput = denseoutput
        
        # last block
        bn_last = slim.batch_norm(denseoutput, is_training=is_training)
        relu_last = tf.nn.relu(bn_last)
        pool_last = slim.avg_pool2d(relu_last, [1,1])
        return slim.fully_connected(slim.flatten(pool_last), num_classes, activation_fn=None, scope='fc')



### 获取数据并训练
因为这里我们使用了比较深的网络，所以我们进一步把输入减少到$32×32$来训练。

In [None]:
import sys
import numpy as np

sys.path.append('../../utils')
import utils

data_dir = '../../data/fashion_mnist'
train_images, train_labels, test_images, test_labels = utils.load_data_fashion_mnist(data_dir, one_hot=True)
print train_images.shape
print test_images.shape

from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet
train_dataset = DataSet(train_images, train_labels, one_hot=True)
test_dataset = DataSet(test_images, test_labels, one_hot=True)

Extracting ../../data/fashion_mnist/train-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/train-labels-idx1-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-labels-idx1-ubyte.gz
(60000, 28, 28, 1)
(10000, 28, 28, 1)


In [None]:
learning_rate = 1e-1
max_steps = 1000
batch_size = 8
height = width = 28
num_channels = 1
num_outputs = 10

tf.reset_default_graph()

input_placeholder = tf.placeholder(tf.float32, [None, height, width, num_channels])
resize_input = tf.image.resize_images(input_placeholder, [32, 32])
gt_placeholder = tf.placeholder(tf.int64, [None, num_outputs])
is_training = tf.placeholder(tf.bool)


logits = dense_net(resize_input, is_training)
loss = tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=gt_placeholder)
acc = utils.accuracy(logits, gt_placeholder)

test_images_reshape = np.reshape(np.squeeze(test_images), (test_images.shape[0], height, width, 1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

train_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
test_acc = []


for step in range(max_steps):
    data, label = train_dataset.next_batch(batch_size)
    data = np.reshape(data, (batch_size, height, width, num_channels))
    feed_dict = {input_placeholder: data, gt_placeholder: label, is_training: True}
    loss_, acc_, _ = sess.run([loss, acc, train_op], feed_dict=feed_dict)
    print("Batch %d, Loss: %f, Train acc %f " % (step, loss_, acc_))
        
for i in range(100):
    test_data, test_label = test_dataset.next_batch(100)
    test_data = np.reshape(test_data, (100, height, width, num_channels))
    test_loss_, test_acc_ = sess.run([loss, acc], feed_dict={input_placeholder: test_data, gt_placeholder: test_label, is_training: False})
    test_acc.append(test_acc_)
print ("Test Loss: %f, Test acc %f " % (np.mean(test_loss_), np.mean(test_acc_)))


num_channels: 256
num_channels: 640
num_channels: 1408
num_channels: 1920
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

