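## Networks with Parallel Concatenations (GoogLeNet)
In the 2014 ImageNet image recognition challenge, a network architecture named GoogLeNet stood out. Although its name pays homage to LeNet, little of LeNet remains visible in its structure. GoogLeNet adopted the network-in-network idea from NiN and improved on it substantially. Researchers revised GoogLeNet several times in the following years; this section introduces the first version of this model family.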

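### Inception Blocks
The basic convolutional block in GoogLeNet is called the Inception block, named after the film *Inception*. Compared with the NiN block introduced in the previous section, this block has a more complex structure, as shown in the figure below.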
![](../img/5.9_inception.svg)
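An Inception block has $4$ parallel paths. The first $3$ paths use convolutional layers with window sizes of $1\times 1$, $3\times 3$, and $5\times 5$ to extract information at different spatial scales. The middle $2$ paths first apply a $1\times 1$ convolution to the input to reduce the number of input channels and thus lower model complexity. The fourth path uses a $3\times 3$ max pooling layer followed by a $1\times 1$ convolutional layer to change the number of channels. All $4$ paths use **appropriate padding so that the output has the same height and width as the input**. Finally, the outputs of all paths are concatenated along the channel dimension and passed to the next layer.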
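
To see why these padding choices preserve the height and width, recall the usual output-size formula for a convolution or pooling window, $\lfloor (n + 2p - k)/s \rfloor + 1$. The following is a minimal pure-Python check, not part of the original notebook; `out_size` is a hypothetical helper name, and the kernel and padding pairs are the ones used by the four paths with stride $1$.

```python
def out_size(n, kernel, padding=0, stride=1):
    """Spatial output size of a conv/pool layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# (kernel, padding) for each window used in the Inception block, all with stride 1
windows = {
    "1x1 conv": (1, 0),
    "3x3 conv": (3, 1),
    "5x5 conv": (5, 2),
    "3x3 max pool": (3, 1),
}

n = 28  # any input height or width works here
sizes = {name: out_size(n, k, p) for name, (k, p) in windows.items()}
print(sizes)  # every entry equals n
```

In each case the padding is $(k-1)/2$, which is exactly what cancels the shrinkage caused by a window of size $k$ at stride $1$.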

The **tunable hyperparameters of an Inception block are the numbers of output channels of each layer**, which is how we control model complexity.
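
As a rough illustration of how these channel counts drive complexity, the sketch below counts weights and biases per path for one illustrative configuration (192 input channels; per-path channels 64, (96, 128), (16, 32) and 32). `conv_params` is a hypothetical helper, not a library function. It also compares the $5\times 5$ path with a single $5\times 5$ convolution applied directly to all input channels.

```python
def conv_params(in_c, out_c, k):
    """Number of weights plus biases in a k x k convolution layer."""
    return in_c * out_c * k * k + out_c

in_c, c1, c2, c3, c4 = 192, 64, (96, 128), (16, 32), 32

path1 = conv_params(in_c, c1, 1)
path2 = conv_params(in_c, c2[0], 1) + conv_params(c2[0], c2[1], 3)
path3 = conv_params(in_c, c3[0], 1) + conv_params(c3[0], c3[1], 5)
path4 = conv_params(in_c, c4, 1)  # max pooling itself has no parameters
total = path1 + path2 + path3 + path4

# A 5x5 convolution straight from 192 to 32 channels, without the 1x1 reduction
direct_5x5 = conv_params(in_c, c3[1], 5)

print("total:", total)
print("5x5 path with reduction:", path3, "without:", direct_5x5)
```

Under these assumptions the $1\times 1$ reduction cuts the parameters of the $5\times 5$ path to roughly a tenth of the direct version.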

In [1]:
import time
import torch
from torch import nn, optim
import torch.nn.functional as F
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

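class Inception(nn.Module):
    # c1 and c4 are output channel counts; c2 and c3 are
    # (reduced channels, output channels) pairs for paths 2 and 3
    def __init__(self, in_c, c1, c2, c3, c4):
        super(Inception, self).__init__()
        # Path 1: a single 1 x 1 convolution
        self.p1_1 = nn.Conv2d(in_c, c1, kernel_size=1)
        # Path 2: 1 x 1 convolution followed by a 3 x 3 convolution
        self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, stride=1, padding=1)
        # Path 3: 1 x 1 convolution followed by a 5 x 5 convolution
        self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, stride=1, padding=2)
        # Path 4: 3 x 3 max pooling followed by a 1 x 1 convolution
        self.p4_1 = nn.MaxPool2d(3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_c, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        # Concatenate the four outputs along the channel dimension
        return torch.cat((p1, p2, p3, p4), dim=1)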
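
The concatenation at the end of `forward` can be checked in isolation. The sketch below uses zero tensors as stand-ins for the four path outputs; the channel counts 64, 128, 32 and 32 and the $28\times 28$ spatial size are illustrative assumptions.

```python
import torch

# Stand-ins for the four path outputs: same batch size, height and width,
# but different channel counts (illustrative values).
p1 = torch.zeros(2, 64, 28, 28)
p2 = torch.zeros(2, 128, 28, 28)
p3 = torch.zeros(2, 32, 28, 28)
p4 = torch.zeros(2, 32, 28, 28)

out = torch.cat((p1, p2, p3, p4), dim=1)  # dim=1 is the channel dimension
print(out.shape)  # torch.Size([2, 256, 28, 28])
```

`torch.cat` requires all other dimensions to match, which is why every path must preserve the input height and width.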

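### The GoogLeNet Model
Like VGG, GoogLeNet uses $5$ modules (blocks) in its main convolutional part, with a $3\times 3$ max pooling layer of stride $2$ between modules to reduce the output height and width. The first module uses a $7\times 7$ convolutional layer with $64$ channels.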

In [2]:
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.ReLU(),
    nn.MaxPool2d(3, stride=2, padding=1)
)

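The second module uses $2$ convolutional layers: first a $1\times 1$ convolutional layer with $64$ channels, then a $3\times 3$ convolutional layer that triples the number of channels. It corresponds to the second path in the Inception block.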

In [3]:
b2 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=1),
    nn.ReLU(),  # without an activation, the two convolutions compose into one linear map
    nn.Conv2d(64, 64*3, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(3, stride=2, padding=1)
)

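The third module chains $2$ complete Inception blocks. The first Inception block has $64+128+32+32=256$ output channels, with the $4$ paths in the ratio $64:128:32:32=2:4:1:1$. Its second and third paths first reduce the number of input channels to $96/192=1/2$ and $16/192=1/12$ of the original, respectively, before the second convolutional layer. The second Inception block increases the number of output channels to $128+192+96+64=480$, with the paths in the ratio $128:192:96:64=4:6:3:2$. Its second and third paths first reduce the number of input channels to $128/256=1/2$ and $32/256=1/8$, respectively.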

In [4]:
b3 = nn.Sequential(
    Inception(192, 64, (96, 128), (16, 32), 32),
    Inception(256, 128, (128, 192), (32, 96), 64),
    nn.MaxPool2d(3, stride=2, padding=1)
)
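
The channel arithmetic for the third module can be reproduced with a few lines of standard-library Python. This is only a sketch for checking the numbers; `simplest_ratio` is a hypothetical helper name.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def simplest_ratio(values):
    """Divide all values by their greatest common divisor."""
    g = reduce(gcd, values)
    return tuple(v // g for v in values)

# (in_c, c1, c2, c3, c4) for the two Inception blocks of the third module
blocks = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (128, 192), (32, 96), 64),
]

for in_c, c1, c2, c3, c4 in blocks:
    outs = (c1, c2[1], c3[1], c4)
    print("output channels:", sum(outs), "ratio:", simplest_ratio(outs))
    # Fraction of the input channels kept by the 1x1 reduction on paths 2 and 3
    print("reduction:", Fraction(c2[0], in_c), Fraction(c3[0], in_c))
```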

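The fourth module is more complex. It chains $5$ Inception blocks, whose output channel counts are $192+208+48+64=512$, $160+224+64+64=512$, $128+256+64+64=512$, $112+288+64+64=528$, and $256+320+128+128=832$. The channel allocation across paths resembles that of the third module: the second path, containing the $3\times 3$ convolutional layer, outputs the most channels, followed by the first path with only a $1\times 1$ convolutional layer, then the third path with the $5\times 5$ convolutional layer and the fourth path with the $3\times 3$ max pooling layer. The second and third paths both reduce the number of channels proportionally first, and these proportions differ slightly from one Inception block to the next.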

In [5]:
b4 = nn.Sequential(
    Inception(480, 192, (96, 208), (16, 48), 64),
    Inception(512, 160, (112, 224), (24, 64), 64),
    Inception(512, 128, (128, 256), (24, 64), 64),
    Inception(512, 112, (144, 288), (32, 64), 64),
    Inception(528, 256, (160, 320), (32, 128), 128),
    nn.MaxPool2d(3, stride=2, padding=1)
)

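The fifth module has two Inception blocks with $256+320+128+128=832$ and $384+384+128+128=1024$ output channels. The channel allocation for each path follows the same idea as in the third and fourth modules and differs only in the specific values. Note that the fifth module is followed directly by the output layer. Like NiN, this module uses a global average pooling layer to reduce the height and width of each channel to $1$. Finally, we reshape the output into a two-dimensional array and feed it into a fully connected layer whose number of outputs equals the number of label classes.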
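
The global average pooling and output stage described above can be sketched with built-in PyTorch modules. The source relies on helper layers from `d2lzh_pytorch` whose implementation is not shown here, so `nn.AdaptiveAvgPool2d(1)` and `nn.Flatten()` are assumed stand-ins, and the $3\times 3$ input size is only an example.

```python
import torch
from torch import nn

# Assumed stand-in for global average pooling + flatten + output layer
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # average each channel down to 1 x 1
    nn.Flatten(),              # (batch, 1024, 1, 1) -> (batch, 1024)
    nn.Linear(1024, 10),       # 10 label classes
)

x = torch.rand(4, 1024, 3, 3)  # e.g. the output of the last Inception block
pooled = head[0](x)
# Global average pooling equals the mean over height and width
assert torch.allclose(pooled.flatten(1), x.mean(dim=(2, 3)))
y = head(x)
print(y.shape)
```

Because the pooling averages over whatever spatial size arrives, the fully connected layer depends only on the channel count, not on the input resolution.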

In [6]:
b5 = nn.Sequential(
    Inception(832, 256, (160, 320), (32, 128), 128),
    Inception(832, 384, (192, 384), (48, 128), 128),
    d2l.GlobalAvgPool2d()
)
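
One way to sanity-check the configurations in modules three to five is to confirm that each Inception block's input channel count equals the previous block's concatenated output. The sketch below copies the tuples from the code above; `out_channels` is a hypothetical helper name.

```python
# (in_c, c1, c2, c3, c4) for every Inception block, in network order
configs = [
    (192, 64, (96, 128), (16, 32), 32),      # module 3
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),     # module 4
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),  # module 5
    (832, 384, (192, 384), (48, 128), 128),
]

def out_channels(cfg):
    """Channels after concatenating the four paths."""
    _, c1, c2, c3, c4 = cfg
    return c1 + c2[1] + c3[1] + c4

# Pooling layers between modules do not change the channel count,
# so each block's input must equal the previous block's output.
for prev, nxt in zip(configs, configs[1:]):
    assert out_channels(prev) == nxt[0], (prev, nxt)

channels = [out_channels(cfg) for cfg in configs]
print(channels)
```

The last value, 1024, is the input size the final fully connected layer has to accept.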

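The GoogLeNet model is computationally expensive, and its channel counts are not as easy to modify as those of VGG. In this section we reduce the input height and width from $224$ to $96$ to simplify computation. The following shows how the output shape changes from module to module.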

In [7]:
net = nn.Sequential(
    b1, b2, b3, b4, b5,
    d2l.FlattenLayer(),
    nn.Linear(1024, 10)
)
X = torch.rand(1, 1, 96, 96)
for blk in net.children():
    X = blk(X)
    print('output shape:', X.shape)

output shape: torch.Size([1, 64, 24, 24])
output shape: torch.Size([1, 192, 12, 12])
output shape: torch.Size([1, 480, 6, 6])
output shape: torch.Size([1, 832, 3, 3])
output shape: torch.Size([1, 1024, 1, 1])
output shape: torch.Size([1, 1024])
output shape: torch.Size([1, 10])
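
The spatial sizes in the printed shapes follow from the stride-2 layers alone, since the Inception blocks keep height and width unchanged. Below is a small pure-Python trace under that reading; `out_size` and `trace_sizes` are hypothetical helper names.

```python
def out_size(n, kernel, padding, stride):
    """Spatial output size of a conv/pool layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def trace_sizes(n):
    """Height/width after each of the five modules for an n x n input."""
    n = out_size(n, kernel=7, padding=3, stride=2)      # 7x7 conv in module 1
    sizes = []
    for _ in range(4):                                  # max pooling ending modules 1 to 4
        n = out_size(n, kernel=3, padding=1, stride=2)
        sizes.append(n)
    sizes.append(1)                                     # global average pooling in module 5
    return sizes

print(trace_sizes(96))   # [24, 12, 6, 3, 1]
print(trace_sizes(224))  # [56, 28, 14, 7, 1]
```

With the reduced $96\times 96$ input, the last Inception block sees only $3\times 3$ feature maps, compared with $7\times 7$ for a $224\times 224$ input.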


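We train the GoogLeNet model on images whose height and width are both $96$ pixels. The training images again come from the Fashion-MNIST dataset.

### Acquiring Data and Training the Model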

In [8]:
batch_size = 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)

lr, epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, epochs)

training on: cpu
step 1, train_acc: 0.0938
step 2, train_acc: 0.0781
...
step 36, train_acc: 0.1009
...
step 297, train_acc: 0.4437
...
step 331, train_acc: 0.4745
...
step 122, train_acc: 0.8246
...
step 156, train_acc: 0.8277
...
step 415, train_acc: 0.8370
...
step 449, train_acc: 0.8383
...
step 240, train_acc: 0.8622
...
step 274, train_acc: 0.8635
...
step 64, train_acc: 0.8724
...
step 100, train_acc: 0.8751
...
step 358, train_acc: 0.8815
...
step 392, train_acc: 0.8820
...
step 183, train_acc: 0.8929
...
step 217, train_acc: 0.8926
...
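
When reading the log, it helps to know how many steps one pass over the data takes. The step counter appears to restart within the log, which would be consistent with one count per epoch. The sketch below assumes the 60,000-image Fashion-MNIST training set and a data loader that keeps the final partial batch.

```python
import math

num_train = 60000   # Fashion-MNIST training set size
batch_size = 128

# Assumes the final, smaller batch is kept rather than dropped
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)  # 469 96
```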

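### Summary

+ An Inception block is effectively a subnetwork with 4 paths. It extracts information in parallel through convolutional layers with different window shapes and a max pooling layer, and uses 1×1 convolutional layers to reduce the number of channels and thus lower model complexity.
+ GoogLeNet chains several carefully designed Inception blocks together with other layers. The channel allocation ratios in the Inception blocks were obtained through extensive experiments on the ImageNet dataset.
+ GoogLeNet and its successors were for a time among the most efficient models on ImageNet: at similar test accuracy, their computational complexity is often lower.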