## InceptionV2
GoogLeNet is also known as InceptionV1. Building on it, researchers at Google went on to develop InceptionV2.

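It differs from GoogLeNet in two ways. First, it replaces each 5×5 convolution with two stacked 3×3 convolutions, which reduces the amount of computation. Second, it adopts Batch Normalization, which was a novel technique at the time; this helps the network learn useful features more easily and indirectly reduces overfitting.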
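
To make the saving concrete, the sketch below counts the parameters of one 5×5 convolution against two stacked 3×3 convolutions, which cover the same 5×5 receptive field. The helper `conv_params` and the channel count of 32 are assumptions chosen only for illustration; they do not come from the notebook.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

# Hypothetical channel count, used for both input and output.
channels = 32

one_5x5 = conv_params(5, channels, channels)
two_3x3 = 2 * conv_params(3, channels, channels)

print("one 5x5 conv :", one_5x5)
print("two 3x3 convs:", two_3x3)
print("ratio        :", round(two_3x3 / one_5x5, 2))
```

With equal input and output channels, the stacked pair needs roughly 18/25 of the weights of the single 5×5 layer, and the same ratio applies to the multiply-accumulate count per output position.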

### 1. Import the required modules

In [1]:
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

### 2. Load the dataset

Here we use the MNIST dataset that ships with TensorFlow.

In [2]:
# All four variables are NumPy arrays.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(train_images.shape)
print(train_labels.shape)
print(test_images.shape)
print(test_labels.shape)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)


### 3. Preprocess the data

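Pad each 28×28 image with zeros to 32×32 so that it matches the network input. Reshape each image into a 3-D tensor (height, width, channel), as the convolutional layers expect. Convert the labels to one-hot encoding so that the categorical cross-entropy loss can be used during training.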

In [3]:
train_images_32 = np.zeros((60000, 32, 32), dtype=train_images.dtype)
test_images_32 = np.zeros((10000, 32, 32), dtype=test_images.dtype)

start_row = (32 - 28) // 2
start_col = (32 - 28) // 2
for i in range(60000):
  train_images_32[i][start_row:start_row+28, start_col:start_col+28] = train_images[i]
for i in range(10000):
  test_images_32[i][start_row:start_row+28, start_col:start_col+28] = test_images[i]

train_images_32 = train_images_32.reshape((60000, 32, 32, 1)).astype('float32') / 255
test_images_32 = test_images_32.reshape((10000, 32, 32, 1)).astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

print(train_images_32.shape)
print(test_images_32.shape)
print(train_labels.shape)
print(test_labels.shape)

(60000, 32, 32, 1)
(10000, 32, 32, 1)
(60000, 10)
(10000, 10)
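
The same three steps can also be written without explicit loops. The sketch below is an alternative formulation, not the notebook's own code: it runs on a tiny random batch (hypothetical data standing in for MNIST), pads with `np.pad`, and builds the one-hot labels by indexing an identity matrix.

```python
import numpy as np

# Hypothetical stand-in batch: 3 random 28x28 uint8 images and their labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9])

# Pad 2 zero pixels on each side of both spatial axes: 28 -> 32.
pad = (32 - 28) // 2
padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="constant")

# Add a trailing channel axis and scale pixel values to [0, 1].
batch = padded[..., np.newaxis].astype("float32") / 255

# One-hot encode: row i of the identity matrix encodes class i.
one_hot = np.eye(10, dtype="float32")[labels]

print(batch.shape)
print(one_hot.shape)
```

The centre 28×28 region of each padded image is the original image, and the 2-pixel border is zero, which matches what the slicing loops above produce.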


### 4. 搭建神经网络

In [4]:
# Inception module with depth concatenation
def inception_module(x, filters):
  # Branch 1: 1x1 convolution
  conv1_1_1_1 = layers.Conv2D(filters[0], (1, 1), padding='same', activation='relu')(x)
  # Branch 2: 1x1 reduction followed by a 3x3 convolution
  conv1_1_1_2 = layers.Conv2D(filters[1], (1, 1), padding='same', activation='relu')(x)
  conv3_3_1 = layers.Conv2D(filters[2], (3, 3), padding='same', activation='relu')(conv1_1_1_2)
  # Branch 3: 1x1 reduction followed by two 3x3 convolutions (in place of one 5x5)
  conv1_1_1_3 = layers.Conv2D(filters[3], (1, 1), padding='same', activation='relu')(x)
  conv5_5_1 = layers.Conv2D(filters[4], (3, 3), padding='same', activation='relu')(conv1_1_1_3)
  conv5_5_2 = layers.Conv2D(filters[4], (3, 3), padding='same', activation='relu')(conv5_5_1)
  # Branch 4: 3x3 max pooling followed by a 1x1 projection
  maxpool = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
  conv1_1_1_4 = layers.Conv2D(filters[5], (1, 1), padding='same', activation='relu')(maxpool)

  # Batch-normalize the output of each branch
  conv1_1_1_1 = layers.BatchNormalization()(conv1_1_1_1)
  conv3_3_1 = layers.BatchNormalization()(conv3_3_1)
  conv5_5_2 = layers.BatchNormalization()(conv5_5_2)
  conv1_1_1_4 = layers.BatchNormalization()(conv1_1_1_4)

  # Depth concatenation along the channel axis (a Keras layer, so it works in the functional API)
  inception = layers.Concatenate(axis=-1)([conv1_1_1_1, conv3_3_1, conv5_5_2, conv1_1_1_4])

  return inception

# Build the network
def GoogLeNet_model(input_shape=(32, 32, 1), num_classes=10):
  input_tensor = layers.Input(shape=input_shape)
  # x = layers.Conv2D(64, (7, 7), padding='same', activation='relu', strides=(2, 2))(input_tensor)
  # x = layers.MaxPooling2D((3, 3), padding='same', strides=(2, 2))(x)
  # x = layers.Lambda(lambda x: tf.nn.local_response_normalization(x))(x)
  x = layers.Conv2D(64, (1, 1), padding='same', activation='relu', strides=(1, 1))(input_tensor)
  x = layers.Conv2D(192, (3, 3), padding='same', activation='relu', strides=(1, 1))(x)
  x = layers.Lambda(lambda x: tf.nn.local_response_normalization(x))(x)
  x = layers.MaxPooling2D((3, 3), padding='same', strides=(2, 2))(x)

  x = inception_module(x, [64, 96, 128, 16, 32, 32])
  x = inception_module(x, [128, 128, 192, 32, 96, 64])
  x = layers.MaxPooling2D((3, 3), padding='same', strides=(2, 2))(x)
  x = inception_module(x, [192, 96, 208, 16, 48, 64])
  # x = inception_module(x, [160, 112, 224, 24, 64, 64])
  # x = inception_module(x, [128, 128, 256, 24, 64, 64])
  # x = inception_module(x, [112, 144, 288, 32, 64, 64])
  x = inception_module(x, [256, 160, 320, 32, 128, 128])

  x = layers.MaxPooling2D((3, 3), padding='same', strides=(2, 2))(x)
  x = inception_module(x, [256, 160, 320, 32, 128, 128])
  x = inception_module(x, [384, 192, 384, 48, 128, 128])
  x = layers.AveragePooling2D((2,2), padding='valid', strides=(1, 1))(x)
  x = layers.Dropout(0.4)(x)
  x = layers.Flatten()(x)
  x = layers.Dense(512, activation='relu')(x)
  output = layers.Dense(num_classes, activation='softmax')(x)

  model = models.Model(inputs=input_tensor, outputs=output)
  return model
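
As a sanity check on the architecture above, the following sketch traces the spatial size and channel count through the network by hand, using the standard output-size rules for `padding='same'` and `padding='valid'`. The helper names are hypothetical and exist only for this illustration.

```python
import math

def same_size(size, stride):
    """Spatial size after a padding='same' layer: ceil(size / stride)."""
    return math.ceil(size / stride)

def valid_size(size, window, stride):
    """Spatial size after a padding='valid' pooling layer."""
    return (size - window) // stride + 1

def inception_channels(filters):
    """Channels after concatenating the four branch outputs."""
    return filters[0] + filters[2] + filters[4] + filters[5]

size = 32
size = same_size(size, 2)      # max pool after the stem
size = same_size(size, 2)      # max pool after the first two modules
size = same_size(size, 2)      # max pool after the next two modules
size = valid_size(size, 2, 1)  # final 2x2 average pool, stride 1

channels = inception_channels([384, 192, 384, 48, 128, 128])
flattened = size * size * channels

print("final spatial size:", size)
print("final channels    :", channels)
print("flattened features:", flattened)
```

Only `filters[0]`, `filters[2]`, `filters[4]`, and `filters[5]` contribute to the output depth, because `filters[1]` and `filters[3]` are the intermediate 1×1 reductions inside their branches.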

### 5. Compile the model

In [6]:
model = GoogLeNet_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
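
For intuition about the loss chosen here, the sketch below evaluates the categorical cross-entropy formula for a single sample in plain Python. It illustrates the formula only and is not how Keras implements it internally; the 3-class target and the two probability vectors are made-up values.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy for one sample: -sum(t_i * log(p_i))."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_target, predicted_probs)
        if t > 0  # terms with t == 0 contribute nothing
    )

# Hypothetical 3-class example: the true class is index 2.
target = [0, 0, 1]
confident = [0.05, 0.05, 0.90]
unsure = [0.40, 0.30, 0.30]

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)

print("confident prediction loss:", round(loss_confident, 4))
print("unsure prediction loss   :", round(loss_unsure, 4))
```

With a one-hot target the sum reduces to the negative log of the probability assigned to the true class, so the loss shrinks toward zero as that probability approaches 1.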

In [7]:
start_time = time.time()
model.fit(train_images_32, train_labels, epochs=5, batch_size=64, validation_split=0.2)
end_time = time.time()
print("Training Time:", end_time - start_time, "seconds")

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training Time: 572.3281483650208 seconds


In [8]:
test_loss, test_acc = model.evaluate(test_images_32, test_labels)
print(f'Test accuracy: {test_acc}')

Test accuracy: 0.9857000112533569
