# 7.4.1. Inception Blocks

GoogLeNet proposed a structure that combined the strengths of NiN and the paradigm of repeated blocks.

One focus of the paper was the question of which convolution kernel sizes are best.

Previous popular networks employed choices as small as 1x1 and as large as 11x11.

The paper's insight was that it can be advantageous to employ a combination of variously-sized kernels.

These notes cover a slightly simplified version of GoogLeNet: a few ad-hoc features that were added to stabilize training are omitted, since they are unnecessary now that better training algorithms are available.

![image.png](attachment:image.png)

The basic convolutional block in GoogLeNet is called an Inception block.

The Inception block consists of four parallel paths. 

The first three paths use convolutional layers with window sizes of 1x1, 3x3, and 5x5 to extract information at different spatial sizes.

The two middle paths first perform a 1x1 convolution on the input to reduce the number of channels, which reduces the model's complexity.
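To make the saving concrete, here is a rough parameter count for the 5x5 path. It is only a sketch: the channel sizes (192 input channels, reduced to 16, then expanded to 32) are taken from the first Inception block used later in these notes, and `conv2d_params` is a hypothetical helper introduced for illustration that counts the weights and biases of a standard 2D convolution.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a standard 2D convolution (hypothetical helper)."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Assumed sizes: 192 input channels, 16 intermediate channels, 32 output channels
direct = conv2d_params(192, 32, 5)                              # 5x5 applied directly
reduced = conv2d_params(192, 16, 1) + conv2d_params(16, 32, 5)  # 1x1, then 5x5

print(direct, reduced, round(direct / reduced, 2))
```

Under these assumptions the 1x1 reduction cuts the parameters of this path by roughly a factor of ten.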

The fourth path uses a 3x3 max pooling layer, followed by a 1x1 convolutional layer to change the number of channels.

Finally, the outputs of the four paths are concatenated along the channel dimension and form the block's output.
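Concatenation along the channel dimension only works if all four paths produce the same height and width. The sketch below checks this with the standard output-size formula for convolution and pooling, assuming stride 1 and the padding values used in the implementation that follows in these notes. `conv_out_size` is a hypothetical helper, and the input sizes 28, 14, and 7 are arbitrary examples.

```python
def conv_out_size(size, kernel_size, padding=0, stride=1):
    """Output height/width of a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (kernel_size, padding) of the layer with the largest window on each path;
# the 1x1 convolutions on paths 2-4 never change the spatial size.
paths = {
    "path 1: 1x1 conv": (1, 0),
    "path 2: 3x3 conv": (3, 1),
    "path 3: 5x5 conv": (5, 2),
    "path 4: 3x3 max pool": (3, 1),
}

for size in (28, 14, 7):  # arbitrary example input sizes
    out_sizes = {name: conv_out_size(size, k, p) for name, (k, p) in paths.items()}
    print(size, out_sizes)
```

Every path returns the input size unchanged, so the four outputs differ only in their number of channels.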

The hyperparameters of the Inception block that are commonly tuned are the numbers of output channels per layer.
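Because the paths are concatenated, the block's total output channels are the sum of the final layer's channels on each path. The following sketch computes that sum and the per-path ratio for the two Inception configurations used in the third module later in these notes. The helper names `inception_out_channels` and `channel_ratio` are hypothetical, and `c2` and `c3` are assumed to be `(reduce, out)` pairs as in the `Inception` class.

```python
from functools import reduce
from math import gcd


def inception_out_channels(c1, c2, c3, c4):
    """Per-path and total output channels; c2 and c3 are (reduce, out) pairs."""
    per_path = (c1, c2[1], c3[1], c4)
    return per_path, sum(per_path)


def channel_ratio(per_path):
    """Reduce the per-path channel counts to their simplest integer ratio."""
    divisor = reduce(gcd, per_path)
    return tuple(c // divisor for c in per_path)


first, first_total = inception_out_channels(64, (96, 128), (16, 32), 32)
second, second_total = inception_out_channels(128, (128, 192), (32, 96), 64)

print(first_total, channel_ratio(first))    # 256 (2, 4, 1, 1)
print(second_total, channel_ratio(second))  # 480 (4, 6, 3, 2)
```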

-> Filters of different sizes can capture information about spatial details at different scales.

In [16]:
# Create Inception Block
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

class Inception(nn.Module):
    
    # c1--c4 are the numbers of output channels of each path;
    # c2 and c3 are (reduction channels, output channels) pairs
    def __init__(self, in_channels, c1, c2, c3, c4, **kwargs):
        super(Inception, self).__init__(**kwargs)
        # Path 1 is a single 1 x 1 convolutional layer
        self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        # Path 2 is a 1 x 1 convolutional layer followed by a 3 x 3
        # convolutional layer
        self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # Path 3 is a 1 x 1 convolutional layer followed by a 5 x 5
        # convolutional layer
        self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # Path 4 is a 3 x 3 maximum pooling layer followed by a 1 x 1
        # convolutional layer
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        # Concatenate the outputs on the channel dimension
        return torch.cat((p1, p2, p3, p4), dim=1)   

# 7.4.2. GoogLeNet Model

![image.png](attachment:image.png)

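GoogLeNet uses a stack of 9 Inception blocks, followed by global average pooling at the end.

The max pooling layers between the Inception blocks reduce the dimensionality.

The first module is similar to AlexNet and LeNet, and the use of a stack of blocks is similar to VGG. In addition, global average pooling is used at the end in place of a stack of fully connected layers, so only a single linear output layer remains.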

In [17]:
# Split the network into blocks b1-b5 at the MaxPool layers

b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
                   nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
b3 = nn.Sequential(Inception(192, 64, (96, 128), (16, 32), 32),
                   # Output channels of the 1st Inception block: 64+128+32+32 = 256 => 2:4:1:1
                   Inception(256, 128, (128, 192), (32, 96), 64),
                   # Output channels of the 2nd Inception block: 128+192+96+64 = 480 => 4:6:3:2
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
b4 = nn.Sequential(Inception(480, 192, (96, 208), (16, 48), 64),
                   # Output channels of the 1st Inception block: 192+208+48+64 = 512
                   Inception(512, 160, (112, 224), (24, 64), 64),
                   # Output channels of the 2nd Inception block: 160+224+64+64 = 512
                   Inception(512, 128, (128, 256), (24, 64), 64),
                   Inception(512, 112, (144, 288), (32, 64), 64),
                   Inception(528, 256, (160, 320), (32, 128), 128),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
b5 = nn.Sequential(Inception(832, 256, (160, 320), (32, 128), 128),
                   Inception(832, 384, (192, 384), (48, 128), 128),
                   nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten())

net = nn.Sequential(b1, b2, b3, b4, b5, nn.Linear(1024, 10))

X = torch.rand(size=(1, 1, 96, 96))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)

Sequential output shape:	 torch.Size([1, 64, 24, 24])
Sequential output shape:	 torch.Size([1, 192, 12, 12])
Sequential output shape:	 torch.Size([1, 480, 6, 6])
Sequential output shape:	 torch.Size([1, 832, 3, 3])
Sequential output shape:	 torch.Size([1, 1024])
Linear output shape:	 torch.Size([1, 10])
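The printed shapes can be reproduced by hand. The sketch below traces channels and spatial size through b1 to b5 for a 96x96 input in plain Python, without PyTorch. It assumes the kernel, stride, and padding values from the network definition above, and lists only the per-path output channels of each Inception block, since the reduction channels do not affect the output shape. `out_size` and `stages` are hypothetical names introduced for illustration.

```python
def out_size(size, kernel_size, stride, padding):
    """Output height/width of a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (in_channels, per-path output channels) for every Inception block, by stage
stages = {
    "b3": [(192, (64, 128, 32, 32)), (256, (128, 192, 96, 64))],
    "b4": [(480, (192, 208, 48, 64)), (512, (160, 224, 64, 64)),
           (512, (128, 256, 64, 64)), (512, (112, 288, 64, 64)),
           (528, (256, 320, 128, 128))],
    "b5": [(832, (256, 320, 128, 128)), (832, (384, 384, 128, 128))],
}

size = out_size(96, 7, 2, 3)    # b1: 7x7 conv, stride 2, padding 3
size = out_size(size, 3, 2, 1)  # b1: 3x3 max pool, stride 2, padding 1
shapes = [("b1", 64, size)]
size = out_size(size, 3, 2, 1)  # b2: only the max pool changes the size
channels = 192
shapes.append(("b2", channels, size))

for name, blocks in stages.items():
    for in_channels, per_path in blocks:
        # Each block must accept exactly what the previous layer produced
        assert in_channels == channels, (name, in_channels, channels)
        channels = sum(per_path)
    # b3 and b4 end with a stride-2 max pool; b5 ends with global average pooling
    size = out_size(size, 3, 2, 1) if name != "b5" else 1
    shapes.append((name, channels, size))

for name, channels, size in shapes:
    print(name, channels, size)
```

The trace gives 64x24x24, 192x12x12, 480x6x6, 832x3x3, and finally 1024 features, matching the output above. The inner assertion also confirms that each Inception block's `in_channels` equals the concatenated output of the block before it.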


# 7.4.4. Summary

The Inception block is equivalent to a subnetwork with four paths.

- It extracts information in parallel through convolutional layers of different window shapes and a max pooling layer.
- 1x1 convolutions reduce the channel dimensionality on a per-pixel level.
- Max pooling reduces the resolution.

In the end, GoogLeNet connects multiple well-designed Inception blocks with other layers in series. The ratios of the channel counts were obtained through experiments.

It achieves similar test accuracy with lower computational complexity!