# GoogLeNet

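## Main Contributions
Continues the idea of building a network by chaining blocks, and introduces the Inception block, which runs several kernel sizes in parallel so that the network can **select the most suitable convolution kernel size** itself.

Continues the NiN idea of using $1 \times 1$ convolutions and global average pooling in place of fully connected layers.

Uses auxiliary classifiers to make training more stable.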
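
The auxiliary classifiers are not part of the code in this section. As a rough sketch of what such a head could look like, the module below follows the layout described in the GoogLeNet paper ($5 \times 5$ average pooling with stride 3, a $1 \times 1$ convolution with 128 filters, a 1024-unit fully connected layer, 70% dropout, and a linear classifier). The class name `AuxClassifier`, the 512-channel $14 \times 14$ input, and the 10 output classes are assumptions chosen for illustration.

```python
import torch
from torch import nn
from torch.nn import functional as F


class AuxClassifier(nn.Module):
    """Hypothetical auxiliary head; the name and sizes are illustrative."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        # 5x5 average pooling with stride 3 maps a 14x14 feature map to 4x4
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)
        # 1x1 convolution reduces the channel count to 128
        self.conv = nn.Conv2d(in_channels, 128, kernel_size=1)
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.dropout = nn.Dropout(p=0.7)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(self.pool(x)))
        x = torch.flatten(x, start_dim=1)
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)


# Assumed input: a batch of two 512-channel, 14x14 intermediate feature maps
aux = AuxClassifier(in_channels=512, num_classes=10)
features = torch.rand(2, 512, 14, 14)
logits = aux(features)
print(logits.shape)  # torch.Size([2, 10])
```

In the paper, the auxiliary losses are added to the main loss with a weight of 0.3 during training, and the auxiliary heads are discarded at inference time.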

## Model Architecture
### 1. Inception Block
<div align =center><img src = './img/inception.svg'> </img> </div>

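As the figure above shows, an Inception block consists of four parallel paths. From left to right, the first three paths use convolutional layers with kernel sizes $1 \times 1$, $3 \times 3$, and $5 \times 5$. The two middle paths first apply a $1 \times 1$ convolution to the input to reduce the number of channels, which lowers the cost of the larger convolution that follows. The fourth path applies a $3 \times 3$ max-pooling layer and then a $1 \times 1$ convolution to change the number of channels.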
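
To make the saving from the $1 \times 1$ reduction concrete, the sketch below counts parameters for a $5 \times 5$ path with and without it. The channel numbers (192 in, 16 after the reduction, 32 out) are those of the first Inception block in the model defined later in this section; the helper `conv_params` is a hypothetical name introduced only for this illustration.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square 2D convolution (illustrative helper)."""
    return in_channels * out_channels * kernel_size ** 2 + out_channels


in_channels, reduced, out_channels = 192, 16, 32

# A 5x5 convolution applied directly to the 192-channel input
direct = conv_params(in_channels, out_channels, 5)

# A 1x1 reduction to 16 channels followed by the 5x5 convolution
bottleneck = conv_params(in_channels, reduced, 1) + conv_params(reduced, out_channels, 5)

print(direct)      # 153632
print(bottleneck)  # 15920
```

Under these assumptions the reduced path needs roughly a tenth of the parameters of the direct $5 \times 5$ convolution.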

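All four paths **use appropriate padding so that their outputs have the same height and width as the input** (the channel counts may differ). Finally, **the outputs of the four paths are concatenated along the channel dimension** to form the output of the Inception block.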
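
Why the chosen padding preserves the spatial size can be seen from the standard output-size formula for a convolution or pooling layer. The symbols $n$ (input size), $k$ (kernel size), $p$ (padding), and $s$ (stride) are introduced here only for this derivation:

$$
n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1,
\qquad
s = 1,\; p = \frac{k - 1}{2}
\;\Rightarrow\;
n_{\text{out}} = n + (k - 1) - k + 1 = n .
$$

With stride 1, the pairs $(k, p) = (1, 0)$, $(3, 1)$, and $(5, 2)$ used by the convolutions, and $(3, 1)$ used by the max-pooling layer, all satisfy $p = (k - 1)/2$, so every path returns a feature map of the same height and width as its input.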

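The hyperparameters of an Inception block are the numbers of output channels of its layers. Because the block looks at the image with several kernel sizes at once, it can pick up image details at different scales.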

In [1]:
import torch
from torch import nn
from torch.nn import functional as F

class Inception(nn.Module):
    def __init__(self, in_channels, c1, c2, c3, c4):
        super().__init__()
        # Path 1: a single 1x1 convolution
        self.path1 = nn.Conv2d(in_channels, c1, kernel_size=1)

        # Path 2: 1x1 convolution to reduce channels, then a 3x3 convolution
        self.path2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
        self.path2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)

        # Path 3: 1x1 convolution to reduce channels, then a 5x5 convolution
        self.path3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
        self.path3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)

        # Path 4: 3x3 max-pooling, then a 1x1 convolution
        self.path4_1 = nn.MaxPool2d(3, stride=1, padding=1)
        self.path4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

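    def forward(self, X):
        x1 = F.relu(self.path1(X))

        # Apply ReLU after every convolution, including the 1x1 reductions
        x2 = F.relu(self.path2_2(F.relu(self.path2_1(X))))

        x3 = F.relu(self.path3_2(F.relu(self.path3_1(X))))

        x4 = F.relu(self.path4_2(self.path4_1(X)))

        # Concatenate the four outputs along the channel dimension
        return torch.cat((x1, x2, x3, x4), dim=1)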

In [2]:
# Sanity-check the Inception block
X = torch.rand(size = (1,10,32,32),dtype=torch.float32)
arch = [10, 15,[7,15],[5,15],15]
inception = Inception(arch[0],arch[1],arch[2],arch[3],arch[4])

y = inception(X)
print(X.shape,y.shape)

torch.Size([1, 10, 32, 32]) torch.Size([1, 60, 32, 32])


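### 2. GoogLeNet Model
GoogLeNet stacks nine Inception blocks followed by global average pooling. Max-pooling layers between groups of Inception blocks reduce the spatial resolution (height and width). The implementation in this section does not include the auxiliary classifiers.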
<div align = center> <img src = ./img/inception-full.svg> </img></div>

The number of output channels of each convolution inside the Inception blocks was determined through experiments on the ImageNet dataset.
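
As a quick consistency check on these numbers, the sketch below traces the feature-map size and the channel count through the network for a $224 \times 224$ input. The layer settings and the nine `(in_channels, c1, c2, c3, c4)` tuples are copied from the `GoogleNet` definition in this section; the helper names `out_size` and `inception_out_channels` are hypothetical and exist only for this illustration.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel_size) // stride + 1


def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four paths of an Inception block."""
    return c1 + c2[1] + c3[1] + c4


# Stride-2 layers in network order: (kernel_size, stride, padding).
# The first entry is the 7x7 convolution, the rest are max-pooling layers.
downsampling = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
sizes = [224]
for k, s, p in downsampling:
    sizes.append(out_size(sizes[-1], k, s, p))
print(sizes)  # [224, 112, 56, 28, 14, 7]

inception_configs = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (96, 192), (16, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),
    (832, 384, (192, 384), (48, 128), 128),
]

channels = 192  # output channels of the layers before the first Inception block
outputs = []
for in_ch, c1, c2, c3, c4 in inception_configs:
    # Max-pooling keeps the channel count, so blocks must chain exactly
    assert in_ch == channels, "in_channels must equal the previous output"
    channels = inception_out_channels(c1, c2, c3, c4)
    outputs.append(channels)
print(outputs)  # [256, 480, 512, 512, 512, 528, 832, 832, 1024]
```

The final value of 1024 is the input width of the last linear layer, and the $7 \times 7$ map is what global average pooling collapses to $1 \times 1$.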

In [3]:

class GoogleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Stem: 7x7 convolution on a single-channel input, then max-pooling
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # 1x1 convolution, 3x3 convolution up to 192 channels, then max-pooling
        self.block2 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Two Inception blocks (192 -> 256 -> 480 channels), then max-pooling
        self.block3 = nn.Sequential(
            Inception(192, 64, (96, 128), (16, 32), 32),
            Inception(256, 128, (96, 192), (16, 96), 64),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Five Inception blocks (480 -> 832 channels), then max-pooling
        self.block4 = nn.Sequential(
            Inception(480, 192, (96, 208), (16, 48), 64),
            Inception(512, 160, (112, 224), (24, 64), 64),
            Inception(512, 128, (128, 256), (24, 64), 64),
            Inception(512, 112, (144, 288), (32, 64), 64),
            Inception(528, 256, (160, 320), (32, 128), 128),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Two Inception blocks (832 -> 832 -> 1024 channels)
        self.block5 = nn.Sequential(
            Inception(832, 256, (160, 320), (32, 128), 128),
            Inception(832, 384, (192, 384), (48, 128), 128),
        )
        # Global average pooling followed by a 10-class linear layer
        self.dense = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(1024, 10),
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = self.dense(x)

        return x

In [6]:
from torchsummary import summary

net = GoogleNet()
X = torch.rand(size = (1,1,224,224),dtype = torch.float32)
y = net(X)
print(X.shape,y.shape)

summary(net,(1,224,224),batch_size=64,device ="cpu")

torch.Size([1, 1, 224, 224]) torch.Size([1, 10])
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [64, 64, 112, 112]           3,200
              ReLU-2         [64, 64, 112, 112]               0
         MaxPool2d-3           [64, 64, 56, 56]               0
            Conv2d-4           [64, 64, 56, 56]           4,160
              ReLU-5           [64, 64, 56, 56]               0
            Conv2d-6          [64, 192, 56, 56]         110,784
              ReLU-7          [64, 192, 56, 56]               0
         MaxPool2d-8          [64, 192, 28, 28]               0
            Conv2d-9           [64, 64, 28, 28]          12,352
           Conv2d-10           [64, 96, 28, 28]          18,528
           Conv2d-11          [64, 128, 28, 28]         110,720
           Conv2d-12           [64, 16, 28, 28]           3,088
           Conv2d-13           [64, 32, 28, 28]       