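## Deeper Convolutional Neural Networks: GoogLeNet
In the 2014 ImageNet competition, researchers at Google took a large lead with a new network architecture. The network, called GoogLeNet, pays tribute to LeNet in its name, but its structure bears little resemblance to LeNet. It broke with the established practice of building a convolutional neural network by chaining a series of layers one after another. The figure below is the visualization of GoogLeNet from the paper.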

![image.png](http://zh.gluon.ai/_images/googlenet.png)

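### Defining Inception
The figure contains several blocks, each made of four parallel convolutional paths. Such a block is usually called Inception; it builds on the idea of Network in Network and improves on it considerably. Let's first see how to define the Inception block shown in the figure below.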

![image.png](http://zh.gluon.ai/_images/inception.svg)


In [1]:
import tensorflow as tf
slim = tf.contrib.slim

def Inception(inputs, num_channel_1_1, num_channel_2_1, num_channel_2_3, num_channel_3_1, num_channel_3_5, num_channel_4_1, scope='inception'):
    """Inception block: four parallel paths concatenated along the channel axis (NHWC)."""
    with tf.variable_scope(scope):
        # Path 1: a single 1x1 convolution.
        output_path1 = slim.conv2d(inputs, num_channel_1_1, [1, 1], scope='conv1_1')
        # Path 2: 1x1 convolution to reduce channels, then a 3x3 convolution.
        output_path2_1 = slim.conv2d(inputs, num_channel_2_1, [1, 1], scope='conv2_1')
        output_path2_2 = slim.conv2d(output_path2_1, num_channel_2_3, [3, 3], scope='conv2_2')
        # Path 3: 1x1 convolution to reduce channels, then a 5x5 convolution.
        output_path_3_1 = slim.conv2d(inputs, num_channel_3_1, [1, 1], scope='conv3_1')
        output_path_3_2 = slim.conv2d(output_path_3_1, num_channel_3_5, [5, 5], scope='conv3_2')
        # Path 4: 3x3 max pooling (stride 1, 'SAME' padding), then a 1x1 convolution.
        output_path_4_1 = slim.max_pool2d(inputs, [3, 3], 1, padding='SAME', scope='max_pool4_1')
        output_path_4_2 = slim.conv2d(output_path_4_1, num_channel_4_1, [1, 1], scope='conv4_2')
        # slim.conv2d pads with 'SAME' by default, so every path keeps the input height and width.
        return tf.concat([output_path1, output_path2_2, output_path_3_2, output_path_4_2], -1)

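As the code shows, an Inception block has four parallel paths:

- A single $1×1$ convolution.
- A $1×1$ convolution followed by a $3×3$ convolution. The former usually has fewer channels than the input, which reduces the computation of the latter. The latter is padded by 1 on each side (slim's default `'SAME'` padding does this in the code above) so that the output height and width match the input.
- The same as the second path, but with a $5×5$ convolution in place of the $3×3$ one.
- Similar to the first path, but with a max pooling layer before the convolution.

Finally, the outputs of the four paths are concatenated along the channel dimension.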
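
To make the saving from the $1×1$ reduction concrete, the sketch below counts multiply-accumulate operations for the $5×5$ path with and without the reduction. The helper `conv_macs` is a hypothetical name introduced here for illustration. The sizes (an 11×11 feature map with 192 input channels, reduced to 16 channels and then expanded to 32) are borrowed from the first Inception block of the GoogLeNet defined further down this page. Biases are ignored.

```python
def conv_macs(height, width, kernel, in_channels, out_channels):
    """Multiply-accumulates of a stride-1, size-preserving convolution (bias ignored)."""
    return height * width * kernel * kernel * in_channels * out_channels

h = w = 11                          # spatial size of the feature map (assumed example)
c_in, c_mid, c_out = 192, 16, 32    # input, reduced, and output channels

# A 5x5 convolution applied directly to all 192 input channels.
direct = conv_macs(h, w, 5, c_in, c_out)
# A 1x1 reduction to 16 channels followed by the 5x5 convolution.
reduced = conv_macs(h, w, 1, c_in, c_mid) + conv_macs(h, w, 5, c_mid, c_out)

print("direct: %d, reduced: %d, ratio: %.1f" % (direct, reduced, direct / float(reduced)))
```

Under these assumptions the direct path needs roughly 18.6 million multiply-accumulates and the reduced path roughly 1.9 million, which is why the $1×1$ convolution is placed in front of the larger kernels.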

Let's test it:

In [2]:
import numpy as np
x = np.random.uniform(size=(32,64,64,3)).astype(np.float32)

print(Inception(x, 64, 96, 128, 16, 32, 32))

Tensor("inception/concat:0", shape=(32, 64, 64, 256), dtype=float32)
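
The 256 channels in the printed shape are the sum of the output channels of the four paths (64 + 128 + 32 + 32). A minimal NumPy sketch of the same channel-wise concatenation follows. It uses zero arrays as stand-ins for the path outputs, and a smaller batch and spatial size than the test above to keep it light:

```python
import numpy as np

# Output channels of paths 1-4 for Inception(x, 64, 96, 128, 16, 32, 32).
path_channels = [64, 128, 32, 32]

# Stand-ins for the four path outputs in NHWC layout; only the last axis differs.
paths = [np.zeros((2, 8, 8, c), dtype=np.float32) for c in path_channels]

# Concatenate along the channel (last) axis, as tf.concat(..., -1) does.
merged = np.concatenate(paths, axis=-1)
print(merged.shape)  # (2, 8, 8, 256)
```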


### Defining GoogLeNet
GoogLeNet chains several Inception blocks together. Note that the original paper uses multiple outputs; for simplicity we use a single output here.

In [3]:
def GoogleNet(inputs, num_classes, scope='googlenet'):
    with tf.variable_scope(scope):
        # block 1: 7x7 convolution (explicit padding of 3) and 3x3 max pooling, both with stride 2
        b1_pad = tf.pad(inputs, [[0,0],[3,3],[3,3],[0,0]])
        b1_conv = slim.conv2d(b1_pad, 64, [7, 7], stride=2, padding='VALID', scope='block1_conv')
        b1_pool = slim.max_pool2d(b1_conv, [3, 3], stride=2, padding='VALID', scope='block1_pool')
        print(b1_pool.shape)

        # block 2: 1x1 convolution, 3x3 convolution (explicit padding of 1), 3x3 max pooling
        b2_conv1 = slim.conv2d(b1_pool, 64, [1, 1], scope='block2_conv1')
        b2_conv1_pad = tf.pad(b2_conv1, [[0,0],[1,1],[1,1],[0,0]])
        b2_conv2 = slim.conv2d(b2_conv1_pad, 192, [3, 3], scope='block2_conv2', padding='VALID')
        b2_pool = slim.max_pool2d(b2_conv2, [3, 3], stride=2, scope='block2_pool')
        print(b2_pool.shape)

        # block 3: two Inception blocks, then 3x3 max pooling
        b3_inception_1 = Inception(b2_pool, 64, 96, 128, 16, 32, 32, scope='block3_inception1')
        b3_inception_2 = Inception(b3_inception_1, 128, 128, 192, 32, 96, 64, scope='block3_inception2')
        b3_pool = slim.max_pool2d(b3_inception_2, [3, 3], stride=2, scope='block3_pool')
        print(b3_pool.shape)

        # block 4: five Inception blocks, then 3x3 max pooling
        b4_inception_1 = Inception(b3_pool, 192, 96, 208, 16, 48, 64, scope='block4_inception1')
        b4_inception_2 = Inception(b4_inception_1, 160, 112, 224, 24, 64, 64, scope='block4_inception2')
        b4_inception_3 = Inception(b4_inception_2, 128, 128, 256, 24, 64, 64, scope='block4_inception3')
        b4_inception_4 = Inception(b4_inception_3, 112, 144, 288, 32, 64, 64, scope='block4_inception4')
        b4_inception_5 = Inception(b4_inception_4, 256, 160, 320, 32, 128, 128, scope='block4_inception5')
        b4_pool = slim.max_pool2d(b4_inception_5, [3, 3], stride=2, scope='block4_pool')
        print(b4_pool.shape)

        # block 5: two Inception blocks, then 2x2 max pooling (slim's default stride is 2)
        b5_inception_1 = Inception(b4_pool, 256, 160, 320, 32, 128, 128, scope='block5_inception1')
        b5_inception_2 = Inception(b5_inception_1, 384, 192, 384, 48, 128, 128, scope='block5_inception2')
        b5_pool = slim.max_pool2d(b5_inception_2, [2, 2], scope='block5_pool')
        print(b5_pool.shape)

        # block 6: flatten and map to class scores (no activation, these are logits)
        b6 = slim.flatten(b5_pool)
        return slim.fully_connected(b6, num_classes, scope='fc', activation_fn=None)

In [4]:
x = np.random.uniform(size=(4, 96, 96, 3)).astype(np.float32)
y = GoogleNet(x, 10)

(4, 23, 23, 64)
(4, 11, 11, 192)
(4, 5, 5, 480)
(4, 2, 2, 832)
(4, 1, 1, 1024)
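
The spatial sizes printed above (23, 11, 5, 2, 1) can be reproduced by hand with the usual output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$ for input size $n$, kernel $k$, stride $s$ and padding $p$. The sketch below traces a 96-pixel side through the network. The helper `out_size` is a hypothetical name introduced here, and the trace assumes slim's defaults for `max_pool2d` (stride 2, `'VALID'` padding) and that the Inception blocks leave the spatial size unchanged:

```python
def out_size(size, kernel, stride=1, pad=0):
    """Output length of a convolution or pooling window along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1

size = 96
trace = []

size = out_size(size, 7, stride=2, pad=3)  # block 1: 7x7 conv, stride 2, padding 3
size = out_size(size, 3, stride=2)         # block 1: 3x3 max pool, stride 2
trace.append(size)

size = out_size(size, 3, stride=1, pad=1)  # block 2: 3x3 conv, padding 1 (size unchanged)
size = out_size(size, 3, stride=2)         # block 2: 3x3 max pool, stride 2
trace.append(size)

size = out_size(size, 3, stride=2)         # block 3: 3x3 max pool after the Inception blocks
trace.append(size)

size = out_size(size, 3, stride=2)         # block 4: 3x3 max pool after the Inception blocks
trace.append(size)

size = out_size(size, 2, stride=2)         # block 5: 2x2 max pool, stride 2
trace.append(size)

print(trace)  # [23, 11, 5, 2, 1]
```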


### Getting the Data and Training
As with VGG, we use a smaller input of $96×96$ to speed up computation.


In [5]:
import sys
import numpy as np

sys.path.append('../../utils')
import utils

data_dir = '../../data/fashion_mnist'
train_images, train_labels, test_images, test_labels = utils.load_data_fashion_mnist(data_dir, one_hot=True)
print(train_images.shape)
print(test_images.shape)

from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet
train_dataset = DataSet(train_images, train_labels, one_hot=True)
test_dataset = DataSet(test_images, test_labels, one_hot=True)

Extracting ../../data/fashion_mnist/train-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/train-labels-idx1-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-labels-idx1-ubyte.gz
(60000, 28, 28, 1)
(10000, 28, 28, 1)


In [6]:
learning_rate = 1e-1
max_steps = 1000
batch_size = 64
height = width = 28
num_channels = 1
num_outputs = 10

tf.reset_default_graph()

input_placeholder = tf.placeholder(tf.float32, [None, height, width, num_channels])
resize_input = tf.image.resize_images(input_placeholder, [96, 96])
gt_placeholder = tf.placeholder(tf.int64, [None, num_outputs])
is_training = tf.placeholder(tf.bool)


logits = GoogleNet(resize_input, num_outputs)

loss = tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=gt_placeholder)
acc = utils.accuracy(logits, gt_placeholder)

test_images_reshape = np.reshape(np.squeeze(test_images), (test_images.shape[0], height, width, 1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

train_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
test_acc = []


for step in range(max_steps):
    data, label = train_dataset.next_batch(batch_size)
    data = np.reshape(data, (batch_size, height, width, num_channels))
    feed_dict = {input_placeholder: data, gt_placeholder: label, is_training: True}
    loss_, acc_, _ = sess.run([loss, acc, train_op], feed_dict=feed_dict)
    print("Batch %d, Loss: %f, Train acc %f " % (step, loss_, acc_))
        
# Evaluate on 100 test batches and average over all of them,
# rather than reporting only the values of the last batch.
test_loss = []
for i in range(100):
    test_data, test_label = test_dataset.next_batch(100)
    test_data = np.reshape(test_data, (100, height, width, num_channels))
    test_loss_, test_acc_ = sess.run([loss, acc], feed_dict={input_placeholder: test_data, gt_placeholder: test_label, is_training: False})
    test_loss.append(test_loss_)
    test_acc.append(test_acc_)
print("Test Loss: %f, Test acc %f " % (np.mean(test_loss), np.mean(test_acc)))


(?, 23, 23, 64)
(?, 11, 11, 192)
(?, 5, 5, 480)
(?, 2, 2, 832)
(?, 1, 1, 1024)
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Batch 0, Loss: 2.303140, Train acc 0.109375 
Batch 1, Loss: 2.304469, Train acc 0.093750 
Batch 2, Loss: 2.301654, Train acc 0.125000 
Batch 3, Loss: 2.301726, Train acc 0.062500 
Batch 4, Loss: 2.306557, Train acc 0.125000 
Batch 5, Loss: 2.303081, Train acc 0.093750 
Batch 6, Loss: 2.299683, Train acc 0.062500 
Batch 7, Loss: 2.298925, Train acc 0.203125 
Batch 8, Loss: 2.306689, Train acc 0.093750 
Batch 9, Loss: 2.303386, Train acc 0.078125 
Batch 10, Loss: 2.303192, Train acc 0.062500 
Batch 11, Loss: 2.303641, Train acc 0.109375 
Batch 12, Loss: 2.300117, Train acc 0.109375 
Batch 13, Loss: 2.300389, Train acc 0.140625 
Batch 14, Loss: 2.303990, Train acc 0.093750 
Batch 15, Loss: 2.305586, Train acc 0.109375 
Batch 

Batch 166, Loss: 2.283997, Train acc 0.046875 
Batch 167, Loss: 2.269356, Train acc 0.156250 
Batch 168, Loss: 2.256281, Train acc 0.140625 
Batch 169, Loss: 2.234455, Train acc 0.234375 
Batch 170, Loss: 2.235799, Train acc 0.156250 
Batch 171, Loss: 2.171879, Train acc 0.437500 
Batch 172, Loss: 2.139521, Train acc 0.281250 
Batch 173, Loss: 2.125352, Train acc 0.218750 
Batch 174, Loss: 1.948970, Train acc 0.281250 
Batch 175, Loss: 1.970015, Train acc 0.234375 
Batch 176, Loss: 4.040786, Train acc 0.015625 
Batch 177, Loss: 2.304593, Train acc 0.046875 
Batch 178, Loss: 2.300604, Train acc 0.125000 
Batch 179, Loss: 2.291907, Train acc 0.187500 
Batch 180, Loss: 2.307759, Train acc 0.109375 
Batch 181, Loss: 2.307994, Train acc 0.109375 
Batch 182, Loss: 2.304214, Train acc 0.062500 
Batch 183, Loss: 2.291765, Train acc 0.093750 
Batch 184, Loss: 2.301954, Train acc 0.093750 
Batch 185, Loss: 2.285004, Train acc 0.125000 
Batch 186, Loss: 2.277195, Train acc 0.093750 
Batch 187, Lo

Batch 341, Loss: 1.161675, Train acc 0.531250 
Batch 342, Loss: 1.177228, Train acc 0.421875 
Batch 343, Loss: 1.522959, Train acc 0.390625 
Batch 344, Loss: 1.284880, Train acc 0.453125 
Batch 345, Loss: 1.101187, Train acc 0.578125 
Batch 346, Loss: 1.422398, Train acc 0.468750 
Batch 347, Loss: 2.260985, Train acc 0.281250 
Batch 348, Loss: 1.850880, Train acc 0.234375 
Batch 349, Loss: 1.735105, Train acc 0.375000 
Batch 350, Loss: 1.713033, Train acc 0.296875 
Batch 351, Loss: 1.590950, Train acc 0.390625 
Batch 352, Loss: 1.633594, Train acc 0.359375 
Batch 353, Loss: 1.540712, Train acc 0.468750 
Batch 354, Loss: 1.462834, Train acc 0.375000 
Batch 355, Loss: 1.405450, Train acc 0.437500 
Batch 356, Loss: 1.429829, Train acc 0.437500 
Batch 357, Loss: 1.885025, Train acc 0.250000 
Batch 358, Loss: 2.481964, Train acc 0.046875 
Batch 359, Loss: 2.035939, Train acc 0.406250 
Batch 360, Loss: 1.857112, Train acc 0.296875 
Batch 361, Loss: 1.659104, Train acc 0.328125 
Batch 362, Lo

Batch 516, Loss: 0.711783, Train acc 0.750000 
Batch 517, Loss: 0.794502, Train acc 0.750000 
Batch 518, Loss: 0.864151, Train acc 0.609375 
Batch 519, Loss: 0.903185, Train acc 0.562500 
Batch 520, Loss: 0.798798, Train acc 0.703125 
Batch 521, Loss: 0.816985, Train acc 0.656250 
Batch 522, Loss: 0.942076, Train acc 0.593750 
Batch 523, Loss: 0.918943, Train acc 0.625000 
Batch 524, Loss: 0.922899, Train acc 0.718750 
Batch 525, Loss: 0.972572, Train acc 0.593750 
Batch 526, Loss: 1.076686, Train acc 0.546875 
Batch 527, Loss: 0.606504, Train acc 0.781250 
Batch 528, Loss: 0.872513, Train acc 0.671875 
Batch 529, Loss: 0.859680, Train acc 0.640625 
Batch 530, Loss: 0.682548, Train acc 0.718750 
Batch 531, Loss: 0.659631, Train acc 0.765625 
Batch 532, Loss: 0.634728, Train acc 0.734375 
Batch 533, Loss: 0.907780, Train acc 0.625000 
Batch 534, Loss: 0.871456, Train acc 0.671875 
Batch 535, Loss: 0.650903, Train acc 0.703125 
Batch 536, Loss: 0.743913, Train acc 0.750000 
Batch 537, Lo

KeyboardInterrupt: 
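
The loss of about 2.303 in the first batches of the log is what a classifier assigning equal probability to all 10 classes would score, since $-\ln(1/10) \approx 2.3026$. The sketch below checks this with a small NumPy implementation of softmax cross-entropy. The function `softmax_cross_entropy` here is an illustrative stand-in written for this example, not the `tf.losses.softmax_cross_entropy` used above, and the labels are arbitrary:

```python
import numpy as np

def softmax_cross_entropy(logits, onehot):
    """Mean softmax cross-entropy over a batch, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(onehot * log_probs).sum(axis=1).mean()

num_classes = 10
logits = np.zeros((4, num_classes))          # all-zero logits give a uniform prediction
onehot = np.eye(num_classes)[[0, 3, 5, 9]]   # arbitrary one-hot labels

loss = softmax_cross_entropy(logits, onehot)
print("%.6f" % loss)  # 2.302585
```

Seen this way, the early batches in the log show a network that has not yet learned anything, and the later drop below 1.0 shows it moving away from the uniform prediction.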