# 7.6.1. Function Classes

Thus, only if larger function classes contain the smaller ones are we guaranteed that increasing them strictly increases the expressive power of the network. For deep neural networks, if we can train the newly added layer into an identity function $f(\mathbf{x}) = \mathbf{x}$, the new model will be as effective as the original model. Because the new model may find a better solution to fit the training dataset, the added layer might make it easier to reduce training errors.

This is the question that He et al. considered when working on very deep computer vision models [He et al., 2016a]. At the heart of their proposed residual network (ResNet) is the idea that every additional layer should more easily contain the identity function as one of its elements. These considerations are rather profound but they led to a surprisingly simple solution, a residual block. With it, ResNet won the ImageNet Large Scale Visual Recognition Challenge in 2015. The design had a profound influence on how to build deep neural networks.

# 7.6.2. Residual Blocks

![image.png](attachment:image.png)

Regular block: must learn the mapping f(x) directly.

Residual block: only needs to learn the residual mapping f(x) - x.

If the identity mapping f(x) = x is the desired underlying mapping, the residual mapping is easier to learn: we only need to push the weights and biases of the upper weight layer within the dotted-line box to zero.

The solid line carries the layer input x to the addition operator. This path is called a residual connection (or shortcut connection).

Inputs can forward propagate faster through residual connections across layers.
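To make the "push the upper layer to zero" argument concrete, here is a minimal sketch. `TinyResidual` is a hypothetical toy block invented for illustration (two linear layers instead of convolutions, and no activation after the addition), so it is not the block used later in this section. It only shows that zeroing the upper layer's weights and biases turns the block into the identity.

```python
import torch
from torch import nn

class TinyResidual(nn.Module):
    """Hypothetical toy block: output = g(x) + x, with g made of two linear layers."""
    def __init__(self, dim):
        super().__init__()
        self.lower = nn.Linear(dim, dim)
        self.upper = nn.Linear(dim, dim)  # plays the role of the "upper weight layer"

    def forward(self, x):
        g = self.upper(torch.relu(self.lower(x)))  # residual mapping g(x) = f(x) - x
        return g + x                               # the shortcut adds the input back

torch.manual_seed(0)
block = TinyResidual(4)
x = torch.randn(2, 4)

# Zero only the upper layer's weights and biases; the lower layer stays random.
with torch.no_grad():
    block.upper.weight.zero_()
    block.upper.bias.zero_()

is_identity = torch.equal(block(x), x)
print(is_identity)
```

A regular block without the shortcut would instead have to learn weights that reproduce x exactly, which is a much more specific target than "all zeros".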

### Bottleneck of ResNet

Convolution parameters = kernel size x kernel size x input channels x output channels

 

The core of the bottleneck is the 1x1 convolution. (It is also called a pointwise convolution. Depthwise separable convolutions use it on the same principle, so it is worth understanding well.)

The number of parameters of a 1x1 convolution is 1 x 1 x input channels x output channels.

Because a 1x1 convolution is usually cheap to compute, it is used to reduce or increase the number of feature maps (output channels).
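The parameter formula above can be checked with a few lines of arithmetic. The sketch below compares two plain 3x3 convolutions with a bottleneck that reduces the channels with a 1x1 convolution, applies one 3x3 convolution, and expands again with a 1x1 convolution. The widths 256 and 64 and the helper `conv_params` are assumptions chosen for illustration, and biases are ignored.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weight count of a 2D convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels

# Two plain 3x3 convolutions operating on 256 channels.
plain = 2 * conv_params(3, 256, 256)

# Bottleneck: 1x1 reduce to 64 channels, 3x3 at 64 channels, 1x1 expand back to 256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(plain, bottleneck)
```

Under these assumed widths the bottleneck needs roughly one seventeenth of the weights, because the expensive 3x3 convolution runs on far fewer channels.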



Source: https://coding-yoon.tistory.com/116?category=825914 [코딩이 재밌다!]

![image-2.png](attachment:image-2.png)

![image.png](attachment:image.png)

In [20]:
"""
ResNet follows VGG's full 3x3 convolutional layer design.
A residual block has two 3x3 convolutional layers with the same number of channels.

Each convolutional layer is followed by batch normalization and a ReLU.
The input is added directly just before the final ReLU.
=> The output of the convolutional layers must have exactly the same shape as the input.

To change the number of channels, use an additional 1x1 convolutional layer
to transform the input into the desired shape.
"""

import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

class Residual(nn.Module):  #@save
    """The Residual block of ResNet."""
    def __init__(self, input_channels, num_channels, use_1x1conv=False,
                 strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3,
                               padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3,
                               padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

![image.png](attachment:image.png)

Figure: ResNet block with and without a 1x1 convolution.

use_1x1conv=False: the input is added to the output before the final ReLU nonlinearity is applied.

use_1x1conv=True: a 1x1 convolution is applied to the input before the addition, which adjusts the number of channels and the resolution to match the output.
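The following sketch shows why the projection is needed, using bare layers rather than the `Residual` class. The names `main` and `shortcut` are illustrative only. When the main branch changes the channel count and uses stride 2, the raw input can no longer be added to its output, and a 1x1 convolution with the same stride makes the shapes agree.

```python
import torch
from torch import nn

X = torch.rand(4, 3, 6, 6)

# Main branch: changes channels 3 -> 6 and halves height/width via stride 2.
main = nn.Conv2d(3, 6, kernel_size=3, padding=1, stride=2)
Y = main(X)

# Without a projection the shortcut cannot be added: the shapes differ.
try:
    Y + X
    add_failed = False
except RuntimeError:
    add_failed = True

# A 1x1 convolution with the same stride projects X to Y's shape.
shortcut = nn.Conv2d(3, 6, kernel_size=1, stride=2)
out_shape = tuple((Y + shortcut(X)).shape)
print(add_failed, out_shape)
```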

In [21]:
blk = Residual(3, 3)
X = torch.rand(4, 3, 6, 6)
Y = blk(X)
Y.shape

torch.Size([4, 3, 6, 6])

In [22]:
blk = Residual(3, 6, use_1x1conv=True, strides=2)
blk(X).shape

torch.Size([4, 6, 3, 3])

In [23]:
blk = Residual(3, 3)
X = torch.rand(6, 3, 8, 8)
Y = blk(X)
Y.shape

torch.Size([6, 3, 8, 8])

The first two layers are similar to those of GoogLeNet: a 7x7 convolutional layer with 64 output channels and stride 2 is followed by a 3x3 max-pooling layer with stride 2.

The difference is the batch normalization layer added after each convolutional layer in ResNet.

In [25]:
b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

GoogLeNet uses four modules made up of Inception blocks, whereas ResNet uses four modules made up of residual blocks.

The number of channels in the first module is the same as the number of input channels.
Since a max-pooling layer with stride 2 has already been used, it is not necessary to reduce the height and width again.

In the first residual block of each subsequent module, the number of channels is doubled compared with the previous module, and the height and width are halved.

Note that the first module receives special processing in the code that follows.

In [26]:
def resnet_block(input_channels, num_channels, num_residuals,
                 first_block=False):
    """Build one ResNet module as a list of residual blocks."""
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            # First block of a later module: halve height/width, change channels.
            blk.append(Residual(input_channels, num_channels,
                                use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk

In [27]:
# Next, build each module from residual blocks (two per module here).

b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))

net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(), nn.Linear(512, 10))

There are 4 convolutional layers in each module (excluding the 1×1 convolutional layer). Together with the first 7×7 convolutional layer and the final fully-connected layer, there are 18 layers in total. Therefore, this model is commonly known as ResNet-18. By configuring different numbers of channels and residual blocks in the module, we can create different ResNet models, such as the deeper 152-layer ResNet-152. Although the main architecture of ResNet is similar to that of GoogLeNet, ResNet's structure is simpler and easier to modify. All these factors have resulted in the rapid and widespread use of ResNet. Fig. 7.6.4 depicts the full ResNet-18.

![image.png](attachment:image.png)
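The layer count in the paragraph above is simple arithmetic, sketched here with a hypothetical helper `resnet_depth`. It assumes two 3x3 convolutions per residual block, as in the `Residual` class, and leaves out the 1x1 shortcut convolutions, following the convention in the text.

```python
def resnet_depth(blocks_per_module, convs_per_block=2):
    """Count weight layers: first conv + convs in residual blocks + final linear layer.

    1x1 shortcut convolutions are not counted.
    """
    return 1 + convs_per_block * sum(blocks_per_module) + 1

print(resnet_depth([2, 2, 2, 2]))  # two residual blocks per module, as in this notebook
print(resnet_depth([3, 4, 6, 3]))  # block counts used by ResNet-34
```

Deeper variants such as ResNet-152 use bottleneck blocks with three convolutions each, so this two-convolution count does not apply to them unchanged.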

In [29]:
''' 
Let's see how the input shape changes across the modules of ResNet.
As in all the previous architectures, 
the resolution decreases while the number of channels increases 
up until the point 
where a global average pooling layer aggregates all features.
'''

X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)


Sequential output shape:	 torch.Size([1, 64, 56, 56])
Sequential output shape:	 torch.Size([1, 64, 56, 56])
Sequential output shape:	 torch.Size([1, 128, 28, 28])
Sequential output shape:	 torch.Size([1, 256, 14, 14])
Sequential output shape:	 torch.Size([1, 512, 7, 7])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 512, 1, 1])
Flatten output shape:	 torch.Size([1, 512])
Linear output shape:	 torch.Size([1, 10])
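The spatial sizes printed above follow from the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below recomputes them in plain Python. The helper `out_size` is an illustrative name, and the kernel, stride, and padding values are read off the layer definitions in this notebook.

```python
def out_size(n, kernel, stride, padding):
    """Output height/width of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

n = 224
sizes = []
n = out_size(n, kernel=7, stride=2, padding=3)  # 7x7 conv in b1
n = out_size(n, kernel=3, stride=2, padding=1)  # 3x3 max pooling in b1
sizes.append(n)                                 # after b1
sizes.append(n)                                 # b2 keeps the resolution (stride 1)
for _ in range(3):                              # b3, b4, b5 each start with stride 2
    n = out_size(n, kernel=3, stride=2, padding=1)
    sizes.append(n)
print(sizes)
```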


# Summary

Nested function classes are desirable. Learning an additional layer in deep neural networks as an identity function (though this is an extreme case) should be made easy.

The residual mapping can learn the identity function more easily, for example by pushing the parameters in the weight layer to zero.

We can train an effective deep neural network by having residual blocks. Inputs can forward propagate faster through the residual connections across layers.

ResNet had a major influence on the design of subsequent deep neural networks, both convolutional and sequential in nature.