## ResNet：深度残差网络
当大家还在惊叹GoogLeNet用结构化的连接纳入了大量卷积层的时候，微软亚洲研究院的研究员已经在设计更深但结构更简单的网络ResNet。他们凭借这个网络在2015年的Imagenet竞赛中大获全胜。

ResNet有效的解决了深度卷积神经网络难训练的问题。这是因为在误差反传的过程中，梯度通常变得越来越小，从而权重的更新量也变小。这个导致远离损失函数的层训练缓慢，随着层数的增加这个现象更加明显。之前有两种常用方案来尝试解决这个问题：

按层训练。先训练靠近数据的层，然后慢慢的增加后面的层。但效果不是特别好，而且比较麻烦。
使用更宽的层（增加输出通道）而不是更深来增加模型复杂度。但更宽的模型经常不如更深的效果好。
ResNet通过增加跨层的连接来解决梯度逐层回传时变小的问题。虽然这个想法之前就提出过了，但ResNet真正的把效果做好了。

下图演示了一个跨层的连接。

![image.png](http://zh.gluon.ai/_images/residual.svg)

最底下那层的输入不仅仅是输出给了中间层，而且其与中间层结果相加进入最上层。这样在梯度反传时，最上层梯度可以直接跳过中间层传到最下层，从而避免最下层梯度过小情况。

为什么叫做残差网络呢？我们可以将上面示意图里的结构拆成两个网络的和，一个一层，一个两层，最下面层是共享的。


在训练过程中，左边的网络因为更简单所以更容易训练。这个小网络没有拟合到的部分，或者说残差，则被右边的网络抓取住。所以直观上来说，即使加深网络，跨层连接仍然可以使得底层网络可以充分的训练，从而不会让训练更难。
![image.png](http://zh.gluon.ai/_images/residual2.svg)

### Residual块
ResNet沿用了VGG的那种全用$3×3$卷积，但在卷积和池化层之间加入了批量归一层来加速训练。每次跨层连接跨过两层卷积。这里我们定义一个这样的残差块。注意到如果输入的通道数和输出不一样时（`same_shape=False`），我们使用一个额外的$1×1$卷积来做通道变化，同时使用`strides=2`来把长宽减半。



In [1]:
import tensorflow as tf
slim = tf.contrib.slim

def Residual(input, num_channels, same_shape=True, scope='residual',is_training=False):
    with tf.variable_scope(scope):
        # path 1
        stride = 1 if same_shape else 2
        conv1_1 = slim.conv2d(input, num_channels, [3, 3], stride, scope='conv1_1', activation_fn=None)
        bn1_1 = slim.batch_norm(conv1_1, is_training=is_training)
        relu1_1 = tf.nn.relu(bn1_1)
        conv1_2 = slim.conv2d(relu1_1, num_channels, [3, 3], stride=1, scope='conv1_2', activation_fn=None)
        bn1_2 = slim.batch_norm(conv1_2, is_training=is_training)

        # path 2
        if not same_shape:
            conv2_1 = slim.conv2d(input, num_channels, [1, 1], stride, scope='conv2_1', activation_fn=None)
        else:
            conv2_1 = input
        
        return tf.nn.relu(conv2_1+bn1_2)
    

输入输出通道相同：


In [2]:
import numpy as np

x = np.random.uniform(size=(4, 6, 6, 3)).astype(np.float32)
print Residual(x, 3, scope='test1')


Tensor("test1/Relu_1:0", shape=(4, 6, 6, 3), dtype=float32)


输入输出通道不同：



In [3]:
print Residual(x, 8, same_shape=False, scope='test2')

Tensor("test2/Relu_1:0", shape=(4, 3, 3, 8), dtype=float32)


### 构建ResNet
类似GoogLeNet主体是由Inception块串联而成，ResNet的主体部分串联多个Residual块。下面我们定义18层的ResNet。另外注意到一点是，这里我们没用池化层来减小数据长宽，而是通过有通道变化的Residual块里面的使用`strides=2`的卷积层。



In [4]:
def ResNet(input, num_classes, scope='resnet', is_training=False):
    with tf.variable_scope(scope):
        # block 1
        b1 = slim.conv2d(input, 64, [7, 7], 2, scope='block1')
        print b1
        
        # block 2
        b2_pool = slim.max_pool2d(b1, [3, 3], 2, scope='block2_pool')
        b2_res_1 = Residual(b2_pool, 64, is_training=is_training, scope='b2_res_1')
        b2_res_2 = Residual(b2_res_1, 64, is_training=is_training, scope='b2_res_2')
        print b2_res_2
        
        # block 3
        b3_res_1 = Residual(b2_res_2, 128, same_shape=False, is_training=is_training, scope='b3_res_1')
        b3_res_2 = Residual(b3_res_1, 128, is_training=is_training, scope='b3_res_2')
        print b3_res_2
        
        
        # block 4
        b4_res_1 = Residual(b3_res_2, 256, same_shape=False, is_training=is_training, scope='b4_res_1')
        b4_res_2 = Residual(b4_res_1, 256, is_training=is_training, scope='b4_res_2')
        print b4_res_2
        
        # block 5
        b5_res_1 = Residual(b4_res_2, 512, same_shape=False, is_training=is_training, scope='b5_res_1')
        b5_res_2 = Residual(b5_res_1, 512, is_training=is_training, scope='b5_res_2')
        print b5_res_2
        
        # block 6
        b6_pool = slim.avg_pool2d(b5_res_2, [3, 3], scope='b6_avg_pool')
        return tf.squeeze(slim.fully_connected(b6_pool, num_classes, scope='fc', activation_fn = None))
    

这里演示数据在块之间的形状变化：



In [5]:
x = np.random.uniform(size=(4, 96, 96, 3)).astype(np.float32)
y = ResNet(x, 10)
print y.shape

Tensor("resnet/block1/Relu:0", shape=(4, 48, 48, 64), dtype=float32)
Tensor("resnet/b2_res_2/Relu_1:0", shape=(4, 23, 23, 64), dtype=float32)
Tensor("resnet/b3_res_2/Relu_1:0", shape=(4, 12, 12, 128), dtype=float32)
Tensor("resnet/b4_res_2/Relu_1:0", shape=(4, 6, 6, 256), dtype=float32)
Tensor("resnet/b5_res_2/Relu_1:0", shape=(4, 3, 3, 512), dtype=float32)
(4, 10)


### 获取数据并训练
跟前面类似，但因为有批量归一化，所以使用了较大的学习率。

In [6]:
import sys
import numpy as np

sys.path.append('../../utils')
import utils

data_dir = '../../data/fashion_mnist'
train_images, train_labels, test_images, test_labels = utils.load_data_fashion_mnist(data_dir, one_hot=True)
print train_images.shape
print test_images.shape

from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet
train_dataset = DataSet(train_images, train_labels, one_hot=True)
test_dataset = DataSet(test_images, test_labels, one_hot=True)

Extracting ../../data/fashion_mnist/train-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/train-labels-idx1-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-labels-idx1-ubyte.gz
(60000, 28, 28, 1)
(10000, 28, 28, 1)


In [7]:
learning_rate = 1e-1
max_steps = 1000
batch_size = 64
height = width = 28
num_channels = 1
num_outputs = 10

tf.reset_default_graph()

input_placeholder = tf.placeholder(tf.float32, [None, height, width, num_channels])
resize_input = tf.image.resize_images(input_placeholder, [96, 96])
gt_placeholder = tf.placeholder(tf.int64, [None, num_outputs])
is_training = tf.placeholder(tf.bool)


logits = ResNet(resize_input, num_outputs)

loss = tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=gt_placeholder)
acc = utils.accuracy(logits, gt_placeholder)

test_images_reshape = np.reshape(np.squeeze(test_images), (test_images.shape[0], height, width, 1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

train_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
test_acc = []


for step in range(max_steps):
    data, label = train_dataset.next_batch(batch_size)
    data = np.reshape(data, (batch_size, height, width, num_channels))
    feed_dict = {input_placeholder: data, gt_placeholder: label, is_training: True}
    loss_, acc_, _ = sess.run([loss, acc, train_op], feed_dict=feed_dict)
    print("Batch %d, Loss: %f, Train acc %f " % (step, loss_, acc_))
        
for i in range(100):
    test_data, test_label = test_dataset.next_batch(100)
    test_data = np.reshape(test_data, (100, height, width, num_channels))
    test_loss_, test_acc_ = sess.run([loss, acc], feed_dict={input_placeholder: test_data, gt_placeholder: test_label, is_training: False})
    test_acc.append(test_acc_)
print ("Test Loss: %f, Test acc %f " % (np.mean(test_loss_), np.mean(test_acc_)))


Tensor("resnet/block1/Relu:0", shape=(?, 48, 48, 64), dtype=float32)
Tensor("resnet/b2_res_2/Relu_1:0", shape=(?, 23, 23, 64), dtype=float32)
Tensor("resnet/b3_res_2/Relu_1:0", shape=(?, 12, 12, 128), dtype=float32)
Tensor("resnet/b4_res_2/Relu_1:0", shape=(?, 6, 6, 256), dtype=float32)
Tensor("resnet/b5_res_2/Relu_1:0", shape=(?, 3, 3, 512), dtype=float32)
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Batch 0, Loss: 2.305954, Train acc 0.078125 
Batch 1, Loss: 2.291968, Train acc 0.109375 
Batch 2, Loss: 2.282601, Train acc 0.078125 
Batch 3, Loss: 2.253130, Train acc 0.125000 
Batch 4, Loss: 2.269571, Train acc 0.125000 
Batch 5, Loss: 2.247785, Train acc 0.187500 
Batch 6, Loss: 2.306907, Train acc 0.156250 
Batch 7, Loss: 2.293811, Train acc 0.093750 
Batch 8, Loss: 2.261243, Train acc 0.078125 
Batch 9, Loss: 2.236231, Train acc 0.296875 
B

Batch 160, Loss: 2.308737, Train acc 0.093750 
Batch 161, Loss: 2.302434, Train acc 0.109375 
Batch 162, Loss: 2.304134, Train acc 0.093750 
Batch 163, Loss: 2.292995, Train acc 0.125000 
Batch 164, Loss: 2.300699, Train acc 0.187500 
Batch 165, Loss: 2.290703, Train acc 0.187500 
Batch 166, Loss: 2.291524, Train acc 0.156250 
Batch 167, Loss: 2.312494, Train acc 0.093750 
Batch 168, Loss: 2.285382, Train acc 0.156250 
Batch 169, Loss: 2.311693, Train acc 0.062500 
Batch 170, Loss: 2.310385, Train acc 0.078125 
Batch 171, Loss: 2.299496, Train acc 0.093750 
Batch 172, Loss: 2.291495, Train acc 0.125000 
Batch 173, Loss: 2.310758, Train acc 0.078125 
Batch 174, Loss: 2.306971, Train acc 0.062500 
Batch 175, Loss: 2.308156, Train acc 0.078125 
Batch 176, Loss: 2.298414, Train acc 0.171875 
Batch 177, Loss: 2.308212, Train acc 0.109375 
Batch 178, Loss: 2.317395, Train acc 0.046875 
Batch 179, Loss: 2.307318, Train acc 0.046875 
Batch 180, Loss: 2.287111, Train acc 0.171875 
Batch 181, Lo

Batch 335, Loss: 1.424173, Train acc 0.484375 
Batch 336, Loss: 2.047081, Train acc 0.296875 
Batch 337, Loss: 3.000291, Train acc 0.062500 
Batch 338, Loss: 3.263195, Train acc 0.171875 
Batch 339, Loss: 2.402110, Train acc 0.125000 
Batch 340, Loss: 2.353728, Train acc 0.062500 
Batch 341, Loss: 2.244970, Train acc 0.218750 
Batch 342, Loss: 2.267758, Train acc 0.187500 
Batch 343, Loss: 2.230744, Train acc 0.187500 
Batch 344, Loss: 2.250272, Train acc 0.140625 
Batch 345, Loss: 2.262671, Train acc 0.109375 
Batch 346, Loss: 2.164300, Train acc 0.125000 
Batch 347, Loss: 2.125915, Train acc 0.250000 
Batch 348, Loss: 2.089604, Train acc 0.109375 
Batch 349, Loss: 2.064756, Train acc 0.203125 
Batch 350, Loss: 1.905453, Train acc 0.312500 
Batch 351, Loss: 1.704512, Train acc 0.375000 
Batch 352, Loss: 1.732079, Train acc 0.250000 
Batch 353, Loss: 1.551878, Train acc 0.437500 
Batch 354, Loss: 1.979902, Train acc 0.187500 
Batch 355, Loss: 3.036088, Train acc 0.218750 
Batch 356, Lo

KeyboardInterrupt: 