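# ResNet (Residual Network)
If we add new layers to a neural network and train it thoroughly, can the resulting model only get better at reducing the training error? In theory, the solution space of the original model is a subspace of the solution space of the new model. In other words, if the newly added layers can be trained into the identity mapping $f(x) = x$, the new model will be just as effective as the original one. Since the new model may also find a better solution for fitting the training set, adding layers seems to make it easier to reduce the training error. In practice, however, the training error often rises rather than falls once too many layers are added. The problem persists even when the numerical stability provided by batch normalization makes deep models easier to train. To address it, Kaiming He et al. proposed the residual network (ResNet). It won the ImageNet image recognition challenge in 2015 and has had a profound influence on the design of later deep neural networks.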

## Residual block
![avatar](./Pictures/resnet.svg)
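Let us focus on a local part of a neural network. As shown in the figure, let the input be $\boldsymbol{x}$, and suppose the ideal mapping we want to learn is $f(\boldsymbol{x})$, which is fed to the activation function at the top of the figure. The part inside the dashed box on the left must fit the mapping $f(\boldsymbol{x})$ directly, whereas the part inside the dashed box on the right only needs to fit the residual mapping $f(\boldsymbol{x})-\boldsymbol{x}$ relative to the identity mapping. Residual mappings are often easier to optimize in practice. Take the identity mapping mentioned at the start of this section as the ideal mapping $f(\boldsymbol{x})$: we only need to drive the weights and biases of the upper weighted operation (for example, an affine layer) inside the right-hand dashed box to 0, and $f(\boldsymbol{x})$ becomes the identity mapping. In practice, when the ideal mapping $f(\boldsymbol{x})$ is very close to the identity mapping, the residual mapping also readily captures small deviations from the identity. The right-hand diagram is also the basic building block of ResNet, the residual block. In a residual block, the input can propagate forward faster through the cross-layer data path.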
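
The zero-weight argument can be checked with a minimal pure-Python sketch of the right-hand block, using a single affine layer as the residual branch. The helper names `affine` and `residual_block` are hypothetical and are not part of the TensorFlow code later in this section. The input is assumed to be non-negative (as it would be after a preceding ReLU), so the final ReLU leaves it unchanged.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def affine(W, b, x):
    """Return W @ x + b for a list-of-lists matrix W."""
    return [sum(w * a for w, a in zip(row, x)) + bi for row, bi in zip(W, b)]

def residual_block(W, b, x):
    """Compute relu(g(x) + x), where g is the affine residual branch."""
    g = affine(W, b, x)
    return relu([gi + xi for gi, xi in zip(g, x)])

x = [0.5, 2.0, 0.0]                      # assumed non-negative input
zero_W = [[0.0] * 3 for _ in range(3)]   # residual weights learned as 0
zero_b = [0.0] * 3                       # residual biases learned as 0
out = residual_block(zero_W, zero_b, x)
print(out)  # [0.5, 2.0, 0.0]
```

With the residual branch at zero the block passes its input through unchanged, which is why an identity target is easy for the right-hand design to reach.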

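ResNet follows VGG's design of using $3\times 3$ convolutional layers throughout. A residual block first has two $3\times 3$ convolutional layers with the same number of output channels. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function. The input then skips these two convolutions and is added directly before the final ReLU activation. This design requires the output of the two convolutional layers to have the same shape as the input so that the two can be added. To change the number of channels, an additional $1\times 1$ convolutional layer is introduced to transform the input into the required shape before the addition.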
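
The shape requirement can be illustrated with a small shape calculator in plain Python. It is only a sketch: `conv_out_size` and `residual_shapes` are hypothetical helpers, and the padding choices (`'same'` for the $3\times 3$ convolutions, no padding for the $1\times 1$ convolution) are assumptions taken from the `Residual` class in this section.

```python
import math

def conv_out_size(n, kernel, stride, padding):
    """Output height or width of a convolution along one dimension."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # "valid": no padding

def residual_shapes(shape, out_channels, stride, use_1x1conv):
    """Return (main path shape, shortcut shape) just before the addition."""
    h, w, _ = shape
    # Main path: two 3x3 convolutions; only the first one uses the stride
    yh = conv_out_size(conv_out_size(h, 3, stride, "same"), 3, 1, "same")
    yw = conv_out_size(conv_out_size(w, 3, stride, "same"), 3, 1, "same")
    main = (yh, yw, out_channels)
    if use_1x1conv:
        # Shortcut: a 1x1 convolution with the same stride and channel count
        shortcut = (conv_out_size(h, 1, stride, "valid"),
                    conv_out_size(w, 1, stride, "valid"),
                    out_channels)
    else:
        shortcut = shape
    return main, shortcut

print(residual_shapes((6, 6, 3), 3, 1, False))  # ((6, 6, 3), (6, 6, 3))
print(residual_shapes((6, 6, 3), 6, 2, True))   # ((3, 3, 6), (3, 3, 6))
print(residual_shapes((6, 6, 3), 6, 2, False))  # ((3, 3, 6), (6, 6, 3))
```

In the last case the two shapes differ, so the addition would fail. That is the situation the $1\times 1$ convolution on the shortcut resolves.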

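## ResNet model
The first two layers of ResNet are the same as in GoogLeNet, introduced earlier: a $7\times 7$ convolutional layer with 64 output channels and a stride of 2, followed by a $3\times 3$ max pooling layer with a stride of 2. The difference is that ResNet adds a batch normalization layer after each convolutional layer.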

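ResNet then stacks four modules built from residual blocks. The first module keeps the number of channels equal to the number of input channels. Because a max pooling layer with a stride of 2 has already been applied, there is no need to reduce the height and width. Each subsequent module doubles the number of channels of the previous module in its first residual block and halves the height and width.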
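
The resulting schedule of feature-map sizes can be traced with a few lines of plain Python. This is a sketch under stated assumptions: `trace_resnet_shapes` is a hypothetical helper, every strided layer is assumed to use `'same'` padding (so each halves the size, rounding up), and the 224-pixel input and the channel counts 64, 128, 256, 512 are the values used by the model code in this section.

```python
import math

def trace_resnet_shapes(size, channels=(64, 128, 256, 512)):
    """List (name, height/width, channels) after the stem and each module."""
    trace = []
    size = math.ceil(size / 2)      # 7x7 convolution, stride 2
    trace.append(("conv 7x7", size, 64))
    size = math.ceil(size / 2)      # 3x3 max pooling, stride 2
    trace.append(("max pool", size, 64))
    for i, c in enumerate(channels):
        if i > 0:                   # modules after the first halve height and width
            size = math.ceil(size / 2)
        trace.append((f"module {i + 1}", size, c))
    return trace

for name, size, c in trace_resnet_shapes(224):
    print(f"{name}: {size}x{size}x{c}")
# conv 7x7: 112x112x64
# max pool: 56x56x64
# module 1: 56x56x64
# module 2: 28x28x128
# module 3: 14x14x256
# module 4: 7x7x512
```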


```python
import tensorflow as tf
from tensorflow.keras import layers, models, losses, activations
import numpy as np
import pandas as pd
import plotly as py
import plotly.graph_objects as go

print('Tensorflow version:', tf.__version__)
print('Numpy version:', np.__version__)
print('Pandas version:', pd.__version__)
print('Plotly version:', py.__version__)
```

```
Tensorflow version: 2.2.0
Numpy version: 1.18.1
Pandas version: 1.0.1
Plotly version: 4.8.1
```


```python
class Residual(tf.keras.Model):
    """Residual block: two 3x3 convolutions plus a shortcut connection."""

    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.conv1 = layers.Conv2D(num_channels,
                                   padding='same',
                                   kernel_size=3,
                                   strides=strides)
        self.conv2 = layers.Conv2D(num_channels, kernel_size=3, padding='same')
        if use_1x1conv:
            # 1x1 convolution that reshapes the input to match the main path
            self.conv3 = layers.Conv2D(num_channels,
                                       kernel_size=1,
                                       strides=strides)
        else:
            self.conv3 = None
        self.bn1 = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()

    def call(self, X):
        Y = activations.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        # Add the shortcut before the final ReLU
        return activations.relu(Y + X)
```

```python
class ResnetBlock(tf.keras.layers.Layer):
    """A module made of `num_residuals` residual blocks."""

    def __init__(self, num_channels, num_residuals, first_block=False, **kwargs):
        super(ResnetBlock, self).__init__(**kwargs)
        self.listLayers = []
        for i in range(num_residuals):
            if i == 0 and not first_block:
                # First residual block of a later module:
                # change the channel count and halve height and width
                self.listLayers.append(Residual(num_channels, use_1x1conv=True, strides=2))
            else:
                self.listLayers.append(Residual(num_channels))

    def call(self, X):
        # Apply the residual blocks in order
        for layer in self.listLayers:
            X = layer(X)
        return X
```

```python
class ResNet(tf.keras.Model):
    def __init__(self, num_blocks, **kwargs):
        super(ResNet, self).__init__(**kwargs)
        # Stem: 7x7 convolution, batch normalization, ReLU, 3x3 max pooling
        self.conv = layers.Conv2D(64, kernel_size=7, strides=2, padding='same')
        self.bn = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        self.mp = layers.MaxPool2D(pool_size=3, strides=2, padding='same')
        # Four modules of residual blocks; channels double from module to module
        self.resnet_block1 = ResnetBlock(64, num_blocks[0], first_block=True)
        self.resnet_block2 = ResnetBlock(128, num_blocks[1])
        self.resnet_block3 = ResnetBlock(256, num_blocks[2])
        self.resnet_block4 = ResnetBlock(512, num_blocks[3])
        # Head: global average pooling and a 10-class softmax classifier
        self.gap = layers.GlobalAvgPool2D()
        self.fc = layers.Dense(units=10, activation=tf.keras.activations.softmax)

    def call(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.mp(x)
        x = self.resnet_block1(x)
        x = self.resnet_block2(x)
        x = self.resnet_block3(x)
        x = self.resnet_block4(x)
        x = self.gap(x)
        x = self.fc(x)
        return x

    def see_output_shape(self):
        # Push a random 224x224 single-channel image through each layer in turn
        X = tf.random.uniform(shape=(1, 224, 224, 1))
        for layer in self.layers:
            X = layer(X)
            print(layer.name, 'output shape:\t', X.shape)

    def train_resnet(self, mynet):
        # Fashion-MNIST: 60000 training and 10000 test images of size 28x28
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
        x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
        x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255

        mynet.compile(loss='sparse_categorical_crossentropy',
                      optimizer=tf.keras.optimizers.Adam(),
                      metrics=['accuracy'])

        with tf.device('/gpu:0'):
            history = mynet.fit(x_train, y_train,
                                batch_size=64,
                                epochs=5,
                                validation_split=0.2)

        test_scores = mynet.evaluate(x_test, y_test, verbose=2)
        return mynet
```

```python
if __name__ == '__main__':
    mynet = ResNet([2, 2, 2, 2])
    net = mynet.train_resnet(mynet)
# accuracy: 0.9117
```
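
The configuration `[2, 2, 2, 2]` is the one commonly known as ResNet-18. As a quick check of where the number comes from, the sketch below counts the stem convolution, the $3\times 3$ convolutions inside the residual blocks, and the final fully connected layer; the $1\times 1$ shortcut convolutions are not counted. The helper `count_weighted_layers` is hypothetical and is not part of the model code above.

```python
def count_weighted_layers(num_blocks, convs_per_residual=2):
    """Count stem conv + 3x3 convs in all residual blocks + final dense layer."""
    stem = 1                                              # 7x7 convolution
    residual_convs = sum(num_blocks) * convs_per_residual
    dense = 1                                             # final classifier
    return stem + residual_convs + dense

print(count_weighted_layers([2, 2, 2, 2]))  # 18
```

Changing the entries of `num_blocks` gives deeper or shallower variants built from the same residual block.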