# Residual Networks

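You will learn how to build very deep convolutional networks using residual networks (ResNets). In theory, very deep networks can represent very complex functions; in practice, they are hard to train. Residual networks were proposed by [He et al.](https://arxiv.org/pdf/1512.03385.pdf) to address this.

**In this assignment, you will:**
- Implement the basic building blocks of ResNets.
- Put these building blocks together to implement and train a state-of-the-art neural network for image classification.

This assignment will be done in Keras.

First, run the cell below to load the required packages.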

In [1]:
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

import numpy as np
import tensorflow as tf
from keras import layers
from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from keras.models import Model, load_model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
import pydot
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from resnets_utils import *
from keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow

%matplotlib inline

import keras.backend as K
K.set_image_data_format('channels_last')
K.set_learning_phase(1)

Using TensorFlow backend.


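## 1 - The problem of very deep neural networks

The main benefit of a very deep network is that it can represent very complex functions and extract features at many different levels. However, a major obstacle to using deeper networks is that vanishing gradients arise during training, which makes gradient descent very slow.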

<img src="images/vanishing_grad_kiank.png" style="width:450px;height:220px;">
<caption><center> <u> <font color='purple'> **Figure 1** </u><font color='purple'>  : **Vanishing gradient** <br> In the first few layers, the speed of learning drops very rapidly as the number of iterations increases</center></caption>
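A toy calculation illustrates the effect. The sketch below assumes, purely for illustration, that every layer scales the backpropagated gradient by the same factor of 0.5; the function name and the factor are hypothetical and not part of the assignment. Under that assumption the gradient reaching the first layer shrinks exponentially with depth.

```python
def gradient_magnitudes(depth, factor=0.5):
    """Gradient magnitude at each layer of a toy chain of `depth` layers.

    Toy assumption: each layer multiplies the backpropagated gradient by the
    same scalar `factor`, so layer l receives factor ** (depth - l).
    """
    grads = [1.0]                      # gradient at the output
    for _ in range(depth):
        grads.append(grads[-1] * factor)
    grads.reverse()                    # index 0 is now the earliest layer
    return grads


for depth in (5, 20, 50):
    first_layer = gradient_magnitudes(depth)[0]
    print(f"depth={depth:2d}  gradient at first layer = {first_layer:.3e}")
```

Real networks do not scale gradients by a constant, but the same multiplicative structure is what makes the early layers learn slowly.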

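To solve this problem, we will build a residual network!

## 2 - Building a residual network

In a residual network, a "shortcut" or "skip connection" allows the gradient to be backpropagated directly to earlier layers, as shown below: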

<img src="images/skip_connection_kiank.png" style="width:650px;height:200px;">
<caption><center> <u> <font color='purple'> **Figure 2** </u><font color='purple'>  : A ResNet block showing a  **skip-connection** <br> </center></caption>

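The image on the left shows the "main path" through the network. The image on the right adds a shortcut to the main path. By stacking these residual blocks on top of each other, you can form a very deep network.

The shortcut also makes it easy for each residual block to learn an identity function. This means you can stack many additional residual blocks without harming performance on the training set.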
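The identity claim can be checked on a toy block. In the sketch below (plain Python with scalar weights; all names are hypothetical and this is not the Keras implementation), the last layer of the main path has its weight and bias set to zero, so the block simply returns its non-negative input.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(values, weight, bias):
    """Toy 'layer': every unit is scaled by `weight` and shifted by `bias`."""
    return [weight * v + bias for v in values]


def residual_block(a, w1, b1, w2, b2):
    """Two-layer main path plus a skip connection: relu(F(a) + a)."""
    main = linear(relu(linear(a, w1, b1)), w2, b2)       # main path F(a)
    return relu([m + s for m, s in zip(main, a)])        # add shortcut, then ReLU


a = [0.0, 1.5, 3.0]   # activations leaving a previous ReLU are non-negative
out = residual_block(a, w1=0.7, b1=0.1, w2=0.0, b2=0.0)
print(out)            # prints [0.0, 1.5, 3.0]
```

A plain two-layer stack would have to learn weights that reproduce its input exactly; here the block only has to drive the main path toward zero.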

There are two types of residual block, depending mainly on whether the input and output dimensions are the same.

### 2.1 - The identity block 

The identity block is the standard block used in ResNets. It applies when the input activation (say $a^{[l]}$) has the same dimension as the output activation (say $a^{[l+2]}$).

<img src="images/idblock2_kiank.png" style="width:650px;height:150px;">
<caption><center> <u> <font color='purple'> **Figure 3** </u><font color='purple'>  : **Identity block. ** Skip connection that skips over two layers</center></caption>

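The upper, curved path is the "shortcut path" and the lower, straight path is the "main path". In the diagram above, the CONV2D and ReLU steps of each layer are shown explicitly. To speed up training, each layer also includes a normalization step (BatchNorm). Don't worry about implementing it: Keras provides BatchNorm, and calling it takes a single line of code.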
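As a rough picture of what the BatchNorm step computes, the sketch below normalizes the values of one channel over a batch in plain Python. It is a simplification under stated assumptions: training-mode statistics only (no moving averages), scalar `gamma` and `beta`, and `eps=1e-3` chosen to mirror the Keras default. The helper name `batch_norm_1d` is made up for this example.

```python
import math


def batch_norm_1d(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one channel's values to zero mean and (almost) unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(variance + eps)   # eps avoids division by zero
    return [scale * (v - mean) + beta for v in values]


channel = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm_1d(channel)
print([round(v, 3) for v in normalized])
```

With `axis = 3` and channels-last data, Keras applies this kind of normalization separately to each channel.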

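In practice, we will build a slightly more powerful version of the identity block, in which the skip connection skips over 3 hidden layers, as shown below: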

<img src="images/idblock3_kiank.png" style="width:650px;height:150px;">
<caption><center> <u> <font color='purple'> **Figure 4** </u><font color='purple'>  : **Identity block. ** Skip connection that skips over three layers</center></caption>

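Here are the individual steps.

First component of main path:
- The first CONV2D has $F_1$ filters of shape (1,1) and a stride of (1,1). Its padding is "valid" and its name should be `conv_name_base + '2a'`. Use 0 as the seed for the random initialization.
- The first BatchNorm normalizes the channels axis. Its name should be `bn_name_base + '2a'`.
- Then apply the ReLU activation function.

Second component of main path:
- The second CONV2D has $F_2$ filters of shape $(f,f)$ and a stride of (1,1). Its padding is "same" and its name should be `conv_name_base + '2b'`. Use 0 as the seed for the random initialization.
- The second BatchNorm normalizes the channels axis. Its name should be `bn_name_base + '2b'`.
- Then apply the ReLU activation function.

Third component of main path:
- The third CONV2D has $F_3$ filters of shape (1,1) and a stride of (1,1). Its padding is "valid" and its name should be `conv_name_base + '2c'`. Use 0 as the seed for the random initialization.
- The third BatchNorm normalizes the channels axis. Its name should be `bn_name_base + '2c'`. Note that there is no ReLU activation function in this component.

Final step:
- Add the shortcut (the saved input) to the output of the main path.
- Then apply the ReLU activation function.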
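To get a feel for the size of the block described by these steps, the sketch below counts its parameters. It assumes each convolution has one bias per filter (the Keras `Conv2D` default) and that each BatchNorm holds four values per channel (gamma, beta, moving mean, moving variance). The helper names are hypothetical; the example sizes (256 input channels, filters [64, 64, 256], f = 3) are the ones used for the Stage 2 identity blocks in section 3.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * out_channels + out_channels


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels


def identity_block_params(in_channels, f, filters):
    F1, F2, F3 = filters
    if F3 != in_channels:
        raise ValueError("F3 must equal the input channels so the shortcut can be added")
    convs = (conv2d_params(1, in_channels, F1)     # 1x1 conv, '2a'
             + conv2d_params(f, F1, F2)            # fxf conv, '2b'
             + conv2d_params(1, F2, F3))           # 1x1 conv, '2c'
    norms = batchnorm_params(F1) + batchnorm_params(F2) + batchnorm_params(F3)
    return convs + norms


print(identity_block_params(256, f=3, filters=[64, 64, 256]))   # prints 71552
print(conv2d_params(3, 256, 256))                               # prints 590080
```

The second number is a single 3x3 convolution applied directly to 256 channels: the 1x1 convolutions around the $(f,f)$ convolution keep the whole three-layer block far smaller than that.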

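**Exercise**:
Implement the ResNet identity block. The first component of the main path is already implemented; complete the rest.
- To implement the Conv2D step: [see reference](https://keras.io/layers/convolutional/#conv2d)
- To implement BatchNorm: [see reference](https://faroit.github.io/keras-docs/1.2.2/layers/normalization/)
- For the activation, use `Activation('relu')(X)`
- To add the value passed forward by the shortcut: [see reference](https://keras.io/layers/merge/#add)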

In [2]:
# GRADED FUNCTION: identity_block

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 4
    
    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    
    Returns:
    X -- output of the identity block, tensor of shape (m, n_H, n_W, n_C)
    """
    
    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value. You'll need this later to add back to the main path. 
    X_shortcut = X
    
    # First component of main path
    X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)
    
    ### START CODE HERE ###
    
    # Second component of main path (≈3 lines)
    X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)
    
    ### END CODE HERE ###
    
    return X

In [3]:
tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = identity_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))

out = [0.19716814 0.         1.3561226  2.1713073  0.         1.3324987 ]


**Expected Output**:

<table>
    <tr>
        <td>
            **out**
        </td>
        <td>
           [ 0.94822985  0.          1.16101444  2.747859    0.          1.36677003]
        </td>
    </tr>

</table>

### 2.2 - The convolutional block

The "convolutional block" is the other type of residual block. It is used when the input and output dimensions do not match.

<img src="images/convblock_kiank.png" style="width:650px;height:150px;">
<caption><center> <u> <font color='purple'> **Figure 4** </u><font color='purple'>  : **Convolutional block ** </center></caption>

 
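The CONV2D layer in the shortcut path resizes the input $x$ to a different dimension, so that the dimensions match when the shortcut value is added back to the main path.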

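The details of the convolutional block are as follows.

First component of main path:
- The first CONV2D has $F_1$ filters of shape (1,1) and a stride of (s,s). Its padding is "valid" and its name should be `conv_name_base + '2a'`.
- The first BatchNorm normalizes the channels axis. Its name should be `bn_name_base + '2a'`.
- Then apply the ReLU activation function.

Second component of main path:
- The second CONV2D has $F_2$ filters of shape $(f,f)$ and a stride of (1,1). Its padding is "same" and its name should be `conv_name_base + '2b'`.
- The second BatchNorm normalizes the channels axis. Its name should be `bn_name_base + '2b'`.
- Then apply the ReLU activation function.

Third component of main path:
- The third CONV2D has $F_3$ filters of shape (1,1) and a stride of (1,1). Its padding is "valid" and its name should be `conv_name_base + '2c'`.
- The third BatchNorm normalizes the channels axis. Its name should be `bn_name_base + '2c'`. There is no ReLU activation function in this component.

Shortcut path:
- The CONV2D has $F_3$ filters of shape (1,1) and a stride of (s,s). Its padding is "valid" and its name should be `conv_name_base + '1'`.
- The BatchNorm normalizes the channels axis. Its name should be `bn_name_base + '1'`.

Final step:
- Add the shortcut value to the output of the main path.
- Then apply the ReLU activation function.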
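The steps above only work if the main path and the shortcut path end with the same shape. The sketch below tracks shapes with the standard output-size rules for "valid" and "same" padding; the helper names are hypothetical, and the example input of (15, 15, 256) with filters [128, 128, 512] is an assumption chosen to resemble a Stage 3 convolutional block.

```python
import math


def conv2d_shape(shape, filters, kernel, stride, padding):
    """Output (height, width, channels) of a Conv2D on a channels-last input."""
    height, width, _ = shape
    if padding == "same":
        out_h, out_w = math.ceil(height / stride), math.ceil(width / stride)
    elif padding == "valid":
        out_h = (height - kernel) // stride + 1
        out_w = (width - kernel) // stride + 1
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    return (out_h, out_w, filters)


def convolutional_block_shapes(shape, f, filters, s=2):
    """Return the final shapes of the main path and of the shortcut path."""
    F1, F2, F3 = filters
    main = conv2d_shape(shape, F1, 1, s, "valid")      # '2a': 1x1, stride s
    main = conv2d_shape(main, F2, f, 1, "same")        # '2b': fxf, stride 1
    main = conv2d_shape(main, F3, 1, 1, "valid")       # '2c': 1x1, stride 1
    shortcut = conv2d_shape(shape, F3, 1, s, "valid")  # '1':  1x1, stride s
    return main, shortcut


main, shortcut = convolutional_block_shapes((15, 15, 256), f=3, filters=[128, 128, 512], s=2)
print(main, shortcut)   # prints (8, 8, 512) (8, 8, 512)
```

Both paths apply stride `s` exactly once and both end with $F_3$ filters, which is why the addition is well defined.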
   
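**Exercise**:
Implement the convolutional block. The first component of the main path is already implemented; complete the rest. As before, always use 0 as the seed for the random initialization to ensure consistency with the grader.
- [Addition Hint](https://keras.io/layers/merge/#add)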

In [4]:
# GRADED FUNCTION: convolutional_block

def convolutional_block(X, f, filters, stage, block, s = 2):
    """
    Implementation of the convolutional block as defined in Figure 4
    
    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    s -- Integer, specifying the stride to be used
    
    Returns:
    X -- output of the convolutional block, tensor of shape (m, n_H, n_W, n_C)
    """
    
    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value
    X_shortcut = X


    ##### MAIN PATH #####
    # First component of main path 
    X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)
    
    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(F2, (f, f), strides = (1, 1), name = conv_name_base + '2b',padding='same', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(F3, (1, 1), strides = (1, 1), name = conv_name_base + '2c',padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)

    ##### SHORTCUT PATH #### (≈2 lines)
    X_shortcut = Conv2D(F3, (1, 1), strides = (s, s), name = conv_name_base + '1',padding='valid', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis = 3, name = bn_name_base + '1')(X_shortcut)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)
    
    ### END CODE HERE ###
    
    return X

In [5]:
tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = convolutional_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))

out = [0.09018461 1.2348977  0.46822017 0.0367176  0.         0.655166  ]


**Expected Output**:

<table>
    <tr>
        <td>
            **out**
        </td>
        <td>
           [ 0.09018463  1.23489773  0.46822017  0.0367176   0.          0.65516603]
        </td>
    </tr>

</table>

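## 3 - Building your first ResNet model (50 layers)

The following figure describes the architecture of this neural network in detail. "ID BLOCK" in the diagram stands for the standard identity block, and "ID BLOCK X3" means that three identity blocks are stacked together.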

<img src="images/resnet_kiank.png" style="width:850px;height:150px;">
<caption><center> <u> <font color='purple'> **Figure 5** </u><font color='purple'>  : **ResNet-50 model** </center></caption>
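One way to see where the "50" comes from is to count the layers that carry weights, using the block counts from the stage list in this section. The sketch assumes the usual counting convention, which is not stated in the assignment: the first convolution, the three main-path convolutions of every block, and the final dense layer are counted, while the shortcut convolutions are not.

```python
# Identity blocks per stage; each stage also starts with one convolutional block.
identity_blocks_per_stage = {2: 2, 3: 3, 4: 5, 5: 2}
CONVS_PER_BLOCK = 3   # main-path convolutions '2a', '2b', '2c'


def count_weight_layers(identity_blocks):
    blocks = sum(1 + n for n in identity_blocks.values())   # conv block + identity blocks
    return 1 + blocks * CONVS_PER_BLOCK + 1                 # conv1 + blocks + dense


print(count_weight_layers(identity_blocks_per_stage))       # prints 50
```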

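The details of this 50-layer network are:
- Zero-padding pads the input with a pad of (3,3).
- Stage 1:
    - The 2D convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is "conv1".
    - BatchNorm is applied, followed by a ReLU activation.
    - MaxPooling uses a (3,3) window and a (2,2) stride.
- Stage 2:
    - The convolutional block uses three sets of filters of size [64,64,256], with "f" = 3, "s" = 1, and block = "a".
    - The 2 identity blocks use three sets of filters of size [64,64,256], with "f" = 3 and blocks "b" and "c".
- Stage 3:
    - The convolutional block uses three sets of filters of size [128,128,512], with "f" = 3, "s" = 2, and block = "a".
    - The 3 identity blocks use three sets of filters of size [128,128,512], with "f" = 3 and blocks "b", "c", and "d".
- Stage 4:
    - The convolutional block uses three sets of filters of size [256, 256, 1024], with "f" = 3, "s" = 2, and block = "a".
    - The 5 identity blocks use three sets of filters of size [256, 256, 1024], with "f" = 3 and blocks "b", "c", "d", "e", and "f".
- Stage 5:
    - The convolutional block uses three sets of filters of size [512, 512, 2048], with "f" = 3, "s" = 2, and block = "a".
    - The 2 identity blocks use three sets of filters of size [512, 512, 2048], with "f" = 3 and blocks "b" and "c".
- The 2D average pooling uses a window of shape (2,2) and its name is "avg_pool".
- After flattening, the fully connected (Dense) layer reduces its input to the number of classes using a softmax activation. Its name should be `'fc' + str(classes)`.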
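The list above can be turned into a quick shape trace for a 64x64x3 input. The sketch below is plain Python, not Keras; it assumes "valid" padding for `conv1` and for both pooling layers, and a stride equal to the window size for the average pooling (the Keras defaults). Within each stage only the convolutional block changes the shape, so the identity blocks are omitted. The function names are hypothetical.

```python
def valid_out(size, window, stride):
    """Output length of a 'valid' convolution or pooling along one dimension."""
    return (size - window) // stride + 1


def resnet50_shape_trace(height=64, width=64):
    trace = []
    h, w = height + 6, width + 6                      # zero-padding adds 3 pixels per side
    trace.append(("zero_pad", (h, w, 3)))
    h, w = valid_out(h, 7, 2), valid_out(w, 7, 2)     # conv1: 7x7, stride 2
    trace.append(("conv1", (h, w, 64)))
    h, w = valid_out(h, 3, 2), valid_out(w, 3, 2)     # max pooling: 3x3, stride 2
    trace.append(("max_pool", (h, w, 64)))
    for stage, stride, channels in [(2, 1, 256), (3, 2, 512), (4, 2, 1024), (5, 2, 2048)]:
        # The stage's convolutional block applies a 1x1 'valid' conv with stride s;
        # the channel count after the block is F3.
        h, w = valid_out(h, 1, stride), valid_out(w, 1, stride)
        trace.append((f"stage{stage}", (h, w, channels)))
    h, w = valid_out(h, 2, 2), valid_out(w, 2, 2)     # average pooling: 2x2 window
    trace.append(("avg_pool", (h, w, 2048)))
    trace.append(("flatten", (h * w * 2048,)))
    return trace


for name, shape in resnet50_shape_trace():
    print(f"{name:9s} {shape}")
```

For the 64x64 input the spatial size goes 70, 32, 15, 15, 8, 4, 2, 1, so the dense layer receives 2048 features.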

**Exercise**:
Implement the 50-layer ResNet described in the figure above. Stages 1 and 2 are already implemented; please implement the rest.

You will need this function:
- Average pooling: [see reference](https://keras.io/layers/pooling/#averagepooling2d)

Here are the other functions used in the code below:
- Conv2D: [see reference](https://keras.io/layers/convolutional/#conv2d)
- BatchNorm: [see reference](https://keras.io/layers/normalization/#batchnormalization) (axis: Integer, the axis that should be normalized (typically the features axis))
- Zero padding: [see reference](https://keras.io/layers/convolutional/#zeropadding2d)
- Max pooling: [see reference](https://keras.io/layers/pooling/#maxpooling2d)
- Fully connected layer: [see reference](https://keras.io/layers/core/#dense)
- Addition: [see reference](https://keras.io/layers/merge/#add)

In [6]:
# GRADED FUNCTION: ResNet50

def ResNet50(input_shape = (64, 64, 3), classes = 6):
    """
    Implementation of the popular ResNet50 with the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """
    
    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    
    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)
    
    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    ### START CODE HERE ###

    # Stage 3 (≈4 lines)
    # The convolutional block uses three set of filters of size [128,128,512], "f" is 3, "s" is 2 and the block is "a".
    # The 3 identity blocks use three set of filters of size [128,128,512], "f" is 3 and the blocks are "b", "c" and "d".
    X = convolutional_block(X, f = 3, filters=[128,128,512], stage = 3, block='a', s = 2)
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='b')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='c')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='d')

    # Stage 4 (≈6 lines)
    # The convolutional block uses three set of filters of size [256, 256, 1024], "f" is 3, "s" is 2 and the block is "a".
    # The 5 identity blocks use three set of filters of size [256, 256, 1024], "f" is 3 and the blocks are "b", "c", "d", "e" and "f".
    X = convolutional_block(X, f = 3, filters=[256, 256, 1024], block='a', stage=4, s = 2)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='b', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='c', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='d', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='e', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='f', stage=4)

    # Stage 5 (≈3 lines)
    # The convolutional block uses three set of filters of size [512, 512, 2048], "f" is 3, "s" is 2 and the block is "a".
    # The 2 identity blocks use three sets of filters of size [512, 512, 2048], "f" is 3 and the blocks are "b" and "c".
    X = convolutional_block(X, f = 3, filters=[512, 512, 2048], stage=5, block='a', s = 2)

    # Stage 5 identity blocks use [512, 512, 2048], the same sizes as the convolutional block above
    # and as the reference ResNet-50; [256, 256, 2048] fails grading.
    X = identity_block(X, f = 3, filters=[512, 512, 2048], stage=5, block='b')
    X = identity_block(X, f = 3, filters=[512, 512, 2048], stage=5, block='c')

    # AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
    # The 2D Average Pooling uses a window of shape (2,2) and its name is "avg_pool".
    X = AveragePooling2D(pool_size=(2,2), name='avg_pool')(X)
    
    ### END CODE HERE ###

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = glorot_uniform(seed=0))(X)
    
    
    # Create model
    model = Model(inputs = X_input, outputs = X, name='ResNet50')

    return model

Run the following code to build the model.

In [7]:
model = ResNet50(input_shape = (64, 64, 3), classes = 6)

Before training the model, you need to configure the learning process by compiling the model.

In [8]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
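The `categorical_crossentropy` loss chosen here compares a one-hot label with the softmax output of the last layer. The following sketch shows the per-example computation in plain Python; the function names and the clipping constant `eps` are assumptions made for this illustration, not Keras internals.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class (y_true is one-hot)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# An untrained 6-class model that outputs equal logits is maximally unsure.
probs = softmax([0.0] * 6)
loss = categorical_crossentropy([0, 0, 1, 0, 0, 0], probs)
print(round(loss, 4))   # prints 1.7918, i.e. ln(6)
```

A loss near ln(6) at the start of training therefore means the model is doing no better than guessing among the six classes.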


The model is now ready. The next step is to load a training set and train it.

Load the SIGNS dataset.

<img src="images/signs_data_kiank.png" style="width:450px;height:250px;">
<caption><center> <u> <font color='purple'> **Figure 6** </u><font color='purple'>  : **SIGNS dataset** </center></caption>


In [9]:
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# Normalize image vectors
X_train = X_train_orig/255.
X_test = X_test_orig/255.

# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T

print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)
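`convert_to_one_hot` comes from `resnets_utils`, which is not shown here. As a sketch of the idea only, the hypothetical helper `one_hot` below (not that function's actual implementation) turns integer class labels into rows with a single 1, which is the (examples, classes) layout that the printed `Y_train` shape of (1080, 6) reflects.

```python
def one_hot(labels, num_classes):
    """Return one row per label, with a 1 in the column of that label's class."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


encoded = one_hot([0, 5, 2], num_classes=6)
for row in encoded:
    print(row)
```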


Run the following cell to train the model for 50 epochs with a batch size of 32. On a CPU, each epoch takes about 5 minutes.

In [10]:
model.fit(X_train, Y_train, epochs = 50, batch_size = 32)

Epoch 1/50


InvalidArgumentError: Tensor input_1:0, specified in either feed_devices or fetch_devices was not found in the Graph

In [None]:
model.save('ResNet50.h5')

In [None]:
model = load_model('ResNet50.h5') 

In [None]:
preds = model.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))
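The accuracy reported for a one-hot classification problem is the fraction of examples whose highest-probability class matches the label. A minimal plain-Python sketch of that computation follows; the helper names and the small two-class example are made up for illustration.

```python
def argmax(values):
    """Index of the largest value (the first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of examples where the predicted class equals the true class."""
    correct = sum(argmax(t) == argmax(p)
                  for t, p in zip(y_true_one_hot, y_pred_probs))
    return correct / len(y_true_one_hot)


labels = [[1, 0], [0, 1], [1, 0]]
predictions = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
print(accuracy(labels, predictions))   # 2 of the 3 examples are classified correctly
```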


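ResNet50 is a powerful model for image classification when it is trained for an adequate number of iterations. We hope you can use what you have learned and apply it to your own classification problem to achieve state-of-the-art accuracy.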


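## 4 - Test on your own image (optional)

You can also take a picture of your own and see the model's output. To do that:
1. Add your image to the "images" folder in this Jupyter Notebook's directory.
2. Write your image's name in the following code.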

In [None]:
img_path = 'images/my_image.jpg'
img = image.load_img(img_path, target_size=(64, 64))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = x/255
print('Input image shape:', x.shape)
my_image = scipy.misc.imread(img_path)
imshow(my_image)
print("class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ")
print(model.predict(x))

You can also print a summary of your model by running the following code.

In [None]:
model.summary()


Finally, run the code below to visualize your ResNet50. You can also download a .png picture of your model by going to "File -> Open...-> model.png".

In [None]:
plot_model(model, to_file='model.png')
SVG(model_to_dot(model).create(prog='dot', format='svg'))

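### References

The residual network algorithm was proposed by He et al. in 2015. The implementation here also took significant inspiration from, and follows the structure given in, Francois Chollet's GitHub repository: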

- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun - [Deep Residual Learning for Image Recognition (2015)](https://arxiv.org/abs/1512.03385)
- Francois Chollet's github repository: https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py
