## Convolutional Neural Networks: From Scratch
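In the previous tutorials we flattened each input image into a vector before feeding it to the network. This has two drawbacks:

- Pixels that are close together in the image can end up far apart in the vector, so the model has a hard time capturing their spatial relationship.
- For large images the model can become very large. For example, with a $256×256×3$ photo as input (still far smaller than what a phone camera takes) and an output layer of 1000 units, this single layer has about 197 million weights, close to 0.8 GB if stored as 32-bit floats.

This section introduces convolutional neural networks, which address both problems effectively.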
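
The size estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the weights are stored as 32-bit floats (4 bytes each), which the text does not state; the variable names are illustrative only.

```python
# Dense layer applied to a flattened 256x256x3 image with 1000 output units.
height, width, channels = 256, 256, 3
num_outputs = 1000
bytes_per_weight = 4  # assumption: float32 storage

num_inputs = height * width * channels     # length of the flattened image
num_weights = num_inputs * num_outputs     # one weight per (input, output) pair
size_gb = num_weights * bytes_per_weight / 1e9

print(num_inputs, num_weights, round(size_gb, 2))  # 196608 196608000 0.79
```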

### Convolutional neural networks
A convolutional neural network is a neural network built mainly from convolutional layers.

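### Convolutional layers
A convolutional layer resembles the fully connected layers from earlier, but instead of a plain matrix multiplication between input and weights, it applies a convolution that acts on one window at a time. The figure below shows a $4×4$ input matrix and a $3×3$ weight producing a $2×2$ result. At each step we take a window the same size as the weight, multiply it elementwise with the weight, and sum the products. Following convolution terminology, this weight is usually called a kernel or filter.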

![image.png](http://zh.gluon.ai/_images/no_padding_no_strides.gif)

(Image credit: vdumoulin@github)

We use `tf.nn.conv2d` to demonstrate this.

In [1]:
import numpy as np
import tensorflow as tf

# kernel layout: height x width x input_channels x output_channels
w = tf.constant(np.arange(4).reshape((2,2,1,1)), dtype=tf.float32)
b = tf.constant(np.array(1), dtype=tf.float32)
# data layout: batch_size x height x width x input_channels
data = tf.constant(np.arange(9).reshape(1,3,3,1), dtype=tf.float32)
 
output = tf.nn.conv2d(data, w, strides = [1,1,1,1], padding ='VALID')
output += b
sess = tf.InteractiveSession()
out = sess.run(output)

print('input:', data.eval(), '\n\nweight:', w.eval(), '\n\nbias:', b.eval(), '\n\noutput:', out)


('input:', array([[[[0.],
         [1.],
         [2.]],

        [[3.],
         [4.],
         [5.]],

        [[6.],
         [7.],
         [8.]]]], dtype=float32), '\n\nweight:', array([[[[0.]],

        [[1.]]],


       [[[2.]],

        [[3.]]]], dtype=float32), '\n\nbias:', 1.0, '\n\noutput:', array([[[[20.],
         [26.]],

        [[38.],
         [44.]]]], dtype=float32))
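
To make the sliding-window computation explicit, here is a minimal NumPy sketch that reproduces the numbers above for the single-channel case. The helper name `conv2d_single` is hypothetical and not part of TensorFlow; like `tf.nn.conv2d`, it does not flip the kernel (strictly speaking this is a cross-correlation).

```python
import numpy as np

def conv2d_single(x, k, bias=0.0):
    """Slide a 2-D kernel over a 2-D input (no padding, stride 1)."""
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i:i + kh, j:j + kw]         # window the same size as the kernel
            out[i, j] = np.sum(window * k) + bias   # elementwise product, then sum
    return out

x = np.arange(9, dtype=np.float32).reshape(3, 3)
k = np.arange(4, dtype=np.float32).reshape(2, 2)
result = conv2d_single(x, k, bias=1.0)
print(result)  # [[20. 26.] [38. 44.]]
```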


We can control how the window moves (the stride) and how the input is padded at its borders. The figure below illustrates stride=2 and pad=1.
![image.png](http://zh.gluon.ai/_images/padding_strides.gif)


In [2]:
data_pad = tf.pad(data, [[0,0],[1,1],[1,1],[0,0]])
output = tf.nn.conv2d(data_pad, w, strides=(1,2,2,1), padding='VALID')
output += b

print('input:', data_pad.eval(), '\n\nweight:', w.eval(), '\n\nbias:', b.eval(), '\n\noutput:', output.eval())

('input:', array([[[[0.],
         [0.],
         [0.],
         [0.],
         [0.]],

        [[0.],
         [0.],
         [1.],
         [2.],
         [0.]],

        [[0.],
         [3.],
         [4.],
         [5.],
         [0.]],

        [[0.],
         [6.],
         [7.],
         [8.],
         [0.]],

        [[0.],
         [0.],
         [0.],
         [0.],
         [0.]]]], dtype=float32), '\n\nweight:', array([[[[0.]],

        [[1.]]],


       [[[2.]],

        [[3.]]]], dtype=float32), '\n\nbias:', 1.0, '\n\noutput:', array([[[[ 1.],
         [ 9.]],

        [[22.],
         [44.]]]], dtype=float32))
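
The output size along each spatial axis follows from the input size, kernel size, stride, and padding. The sketch below uses the standard formula for a convolution without implicit padding; the function name `conv_output_size` is a hypothetical helper introduced here for illustration.

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Output length along one axis: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

# First example: 3x3 input, 2x2 kernel, stride 1, no padding.
size_first = conv_output_size(3, 2)
# Second example: same input padded by 1 on each side, stride 2.
size_second = conv_output_size(3, 2, stride=2, pad=1)
print(size_first, size_second)  # 2 2
```

Both examples therefore produce a $2×2$ output, which matches the shapes printed above.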


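When the input has multiple channels, each channel has its own kernel slice. We convolve each channel with its slice, sum the results across channels, and add the bias once. With the layouts used in the code here (batch×height×width×channels for data, height×width×input_channels×output_channels for the kernel):

$conv(data, w, b) = \sum_i conv(data[:,:,:,i], w[:,:,i,:], 0) + b$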

In [3]:
w = tf.constant(np.arange(8).reshape((2,2,2,1)), dtype=tf.float32)
data = tf.constant(np.arange(18).reshape(1,3,3,2), dtype=tf.float32)

output = tf.nn.conv2d(data, w, strides = [1,1,1,1], padding ='VALID')
output += b
print('input:', data.eval(), '\n\nweight:', w.eval(), '\n\nbias:', b.eval(), '\n\noutput:', output.eval())

('input:', array([[[[ 0.,  1.],
         [ 2.,  3.],
         [ 4.,  5.]],

        [[ 6.,  7.],
         [ 8.,  9.],
         [10., 11.]],

        [[12., 13.],
         [14., 15.],
         [16., 17.]]]], dtype=float32), '\n\nweight:', array([[[[0.],
         [1.]],

        [[2.],
         [3.]]],


       [[[4.],
         [5.]],

        [[6.],
         [7.]]]], dtype=float32), '\n\nbias:', 1.0, '\n\noutput:', array([[[[185.],
         [241.]],

        [[353.],
         [409.]]]], dtype=float32))
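
The per-channel sum can be verified directly in NumPy. This is a sketch under the same layout as the cell above, with the batch axis and the single output channel dropped for brevity; `conv2d_single` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def conv2d_single(x, k):
    """Single-channel sliding-window product (no padding, stride 1, no bias)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

data = np.arange(18, dtype=np.float32).reshape(3, 3, 2)  # height x width x in_channels
w = np.arange(8, dtype=np.float32).reshape(2, 2, 2)      # kh x kw x in_channels
bias = 1.0

# Convolve each input channel with its own kernel slice, sum, then add the bias once.
summed = sum(conv2d_single(data[:, :, c], w[:, :, c]) for c in range(2)) + bias
print(summed)  # [[185. 241.] [353. 409.]]
```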


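When the output needs multiple channels, each output channel has its own kernel and bias, and the convolution is computed separately for each output channel:

$conv(data, w, b)[:,:,:,i] = conv(data, w[:,:,:,i], b[i])$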

In [4]:
w = tf.constant(np.arange(16).reshape((2,2,2,2)), dtype=tf.float32)
data = tf.constant(np.arange(18).reshape(1,3,3,2), dtype=tf.float32)
b = tf.constant(np.array([1,2]), dtype=tf.float32)

output = tf.nn.conv2d(data, w, strides = [1,1,1,1], padding ='VALID')
output += b
print('input:', data.eval(), '\n\nweight:', w.eval(), '\n\nbias:', b.eval(), '\n\noutput:', output.eval())

('input:', array([[[[ 0.,  1.],
         [ 2.,  3.],
         [ 4.,  5.]],

        [[ 6.,  7.],
         [ 8.,  9.],
         [10., 11.]],

        [[12., 13.],
         [14., 15.],
         [16., 17.]]]], dtype=float32), '\n\nweight:', array([[[[ 0.,  1.],
         [ 2.,  3.]],

        [[ 4.,  5.],
         [ 6.,  7.]]],


       [[[ 8.,  9.],
         [10., 11.]],

        [[12., 13.],
         [14., 15.]]]], dtype=float32), '\n\nbias:', array([1., 2.], dtype=float32), '\n\noutput:', array([[[[369., 406.],
         [481., 534.]],

        [[705., 790.],
         [817., 918.]]]], dtype=float32))
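
The same result can be reproduced with an explicit loop over output channels. This is a minimal sketch with the batch axis dropped; `conv2d_multi` is a hypothetical helper written for this illustration.

```python
import numpy as np

def conv2d_multi(x, w, b):
    """x: (H, W, C_in); w: (kh, kw, C_in, C_out); b: (C_out,). No padding, stride 1."""
    kh, kw, _, c_out = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, c_out), dtype=np.float32)
    for o in range(c_out):              # each output channel uses its own kernel and bias
        for i in range(out_h):
            for j in range(out_w):
                window = x[i:i + kh, j:j + kw, :]
                out[i, j, o] = np.sum(window * w[:, :, :, o]) + b[o]
    return out

data = np.arange(18, dtype=np.float32).reshape(3, 3, 2)
w = np.arange(16, dtype=np.float32).reshape(2, 2, 2, 2)
b = np.array([1.0, 2.0], dtype=np.float32)
multi = conv2d_multi(data, w, b)
print(multi[:, :, 0])  # [[369. 481.] [705. 817.]]
print(multi[:, :, 1])  # [[406. 534.] [790. 918.]]
```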


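### Pooling layers
Because a convolutional layer acts on one window at a time, its output is sensitive to position. Pooling layers mitigate this. Like convolution, pooling looks at one small window at a time, then outputs either the largest element in the window (max pooling) or the average of its elements (average pooling).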

In [5]:
data = tf.constant(np.arange(18).reshape(1,3,3,2), dtype=tf.float32)
max_pool = tf.nn.max_pool(data, [1,2,2,1], [1,1,1,1], padding='VALID')
avg_pool = tf.nn.avg_pool(data, [1,2,2,1], [1,1,1,1], padding='VALID')

print('data:', data.eval(), '\n\nmax pooling:', max_pool.eval(), '\n\navg pooling:', avg_pool.eval())

('data:', array([[[[ 0.,  1.],
         [ 2.,  3.],
         [ 4.,  5.]],

        [[ 6.,  7.],
         [ 8.,  9.],
         [10., 11.]],

        [[12., 13.],
         [14., 15.],
         [16., 17.]]]], dtype=float32), '\n\nmax pooling:', array([[[[ 8.,  9.],
         [10., 11.]],

        [[14., 15.],
         [16., 17.]]]], dtype=float32), '\n\navg pooling:', array([[[[ 4.,  5.],
         [ 6.,  7.]],

        [[10., 11.],
         [12., 13.]]]], dtype=float32))
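
Pooling can be written out in NumPy in the same sliding-window style. The sketch below assumes no padding and pools each channel independently; `pool2d` and its `mode` argument are hypothetical names introduced here, not TensorFlow API.

```python
import numpy as np

def pool2d(x, size=2, stride=1, mode="max"):
    """x: (H, W, C). Reduce each size x size window per channel, no padding."""
    reduce_fn = np.max if mode == "max" else np.mean
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w, x.shape[2]), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = x[r:r + size, c:c + size, :]
            out[i, j, :] = reduce_fn(window, axis=(0, 1))  # reduce over the window only
    return out

data = np.arange(18, dtype=np.float32).reshape(3, 3, 2)
max_out = pool2d(data, mode="max")
avg_out = pool2d(data, mode="avg")
print(max_out[:, :, 0])  # [[ 8. 10.] [14. 16.]]
print(avg_out[:, :, 0])  # [[ 4.  6.] [10. 12.]]
```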


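We can now start building a model from these layers.

### Getting the data
We continue to use FashionMNIST (hopefully you are not completely tired of this dataset yet).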



In [6]:
import sys

sys.path.append('../utils')
import utils

data_dir = '../data/fashion_mnist'
train_images, train_labels, test_images, test_labels = utils.load_data_fashion_mnist(data_dir, one_hot=True)
print(train_images.shape)
print(train_labels.shape)

from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet
train_dataset = DataSet(train_images, train_labels, one_hot=True)

Extracting ../data/fashion_mnist/train-images-idx3-ubyte.gz
Extracting ../data/fashion_mnist/train-labels-idx1-ubyte.gz
Extracting ../data/fashion_mnist/t10k-images-idx3-ubyte.gz
Extracting ../data/fashion_mnist/t10k-labels-idx1-ubyte.gz
(60000, 28, 28, 1)
(60000, 10)


### Defining the model

We use LeNet, a common choice for MNIST. It has two convolutional layers followed by two fully connected layers.

In [7]:
import tensorflow as tf
with tf.name_scope('cnn'):
    # output channels = 20, kernel = (5,5)
    W1 = tf.Variable(tf.truncated_normal([5,5,1,20], mean=0.0, stddev=0.01, seed=None, dtype=tf.float32))
    b1 = tf.Variable(tf.constant(0.0, shape=[20]))
    # output channels = 50, kernel = (3,3)
    W2 = tf.Variable(tf.truncated_normal([3,3,20,50], mean=0.0, stddev=0.01, seed=None, dtype=tf.float32))
    b2 = tf.Variable(tf.constant(0.0, shape=[50]))
    # output dim = 128
    W3 = tf.Variable(tf.truncated_normal([1250, 128], mean=0.0, stddev=0.01, seed=None, dtype=tf.float32))
    #W3 = tf.Variable(tf.random_normal([784, 128], mean=0.0, stddev=0.01, seed=None, dtype=tf.float32))
    b3 = tf.Variable(tf.constant(0.0, shape=[128]))
    # output dim = 10
    W4 = tf.Variable(tf.truncated_normal([128, 10], mean=0.0, stddev=0.01, seed=None, dtype=tf.float32))
    b4 = tf.Variable(tf.constant(0.0, shape=[10]))    

    params = [W1, b1, W2, b2, W3, b3, W4, b4]
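
The first dimension of `W3` (1250) follows from the spatial sizes after each layer. The sketch below traces them, assuming the `VALID` convolutions and the 2×2, stride-2 max pooling used in the `net` function of this tutorial; `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(n, k, stride=1):
    """Output length along one axis without padding."""
    return (n - k) // stride + 1

size = 28                                    # FashionMNIST images are 28x28
size = conv_output_size(size, 5)             # 5x5 convolution -> 24
size = conv_output_size(size, 2, stride=2)   # 2x2 max pooling, stride 2 -> 12
size = conv_output_size(size, 3)             # 3x3 convolution -> 10
size = conv_output_size(size, 2, stride=2)   # 2x2 max pooling, stride 2 -> 5

flat_dim = size * size * 50                  # 50 output channels from the second conv
print(size, flat_dim)  # 5 1250
```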


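A convolution block is typically "convolution layer, activation layer, pooling layer". Its output is then flattened into a 2D matrix and passed to the fully connected layers that follow.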



In [8]:
def net(X, verbose=False):
    # First convolution block: conv -> ReLU -> 2x2 max pooling
    # Note: b1 is defined above but is not added here.
    h1_conv = tf.nn.conv2d(X, W1, strides=[1,1,1,1], padding='VALID')
    h1_activation = tf.nn.relu(h1_conv)
    h1 = tf.nn.max_pool(h1_activation, [1,2,2,1], [1,2,2,1], padding='VALID')
    # Second convolution block (b2 is likewise not added here)
    h2_conv = tf.nn.conv2d(h1, W2, strides=[1,1,1,1], padding='VALID')
    h2_activation = tf.nn.relu(h2_conv)
    h2 = tf.nn.max_pool(h2_activation, [1,2,2,1], [1,2,2,1], padding='VALID')
    # Flatten to (batch_size, features) for the fully connected layers
    h2 = tf.layers.flatten(h2)
    # First fully connected layer
    h3_linear = tf.matmul(h2, W3) + b3
    h3 = tf.nn.relu(h3_linear)
    # Second fully connected layer (outputs logits)
    h4_linear = tf.matmul(h3, W4) + b4
    if verbose:
        print('1st conv block:', h1.get_shape().as_list())
        print('2nd conv block:', h2.get_shape().as_list())
        print('1st dense:', h3.get_shape().as_list())
        print('2nd dense:', h4_linear.get_shape().as_list())
        print('output:', h4_linear)
    return h4_linear, h2_activation

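Let's test it, printing the shapes of the intermediate results (you could of course print the values directly) and the final output.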



In [9]:
data, label = train_dataset.next_batch(64)
data = tf.reshape(data, [64, 28, 28, 1])
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
out = net(data, verbose=True)
# net returns a (logits, h2_activation) tuple; use out[0].eval() to print the logits

('1st conv block:', [64, 12, 12, 20])
('2nd conv block:', [64, 1250])
('1st dense:', [64, 128])
('2nd dense:', [64, 10])
('output:', <tf.Tensor 'add_5:0' shape=(64, 10) dtype=float32>)


### Training
Nothing here differs from the previous tutorials.
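
As a reminder of what the loss in the training cell computes, here is a minimal NumPy sketch of softmax cross-entropy averaged over a batch. The function below is an illustration written for this note, not the TensorFlow implementation, and the all-zero logits are an assumed stand-in for a network whose weights start near zero.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(onehot_labels * log_probs).sum(axis=1).mean())

# With all-zero logits every class gets probability 1/10, so the loss is ln(10).
logits = np.zeros((4, 10))
labels = np.eye(10)[[0, 3, 5, 9]]
initial_loss = softmax_cross_entropy(logits, labels)
print(round(initial_loss, 4))  # 2.3026
```

This is close to the first loss value in the training log (2.302582), which is what one would expect from ten classes and small initial weights.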

In [10]:
import numpy as np

learning_rate = 1e-3
max_steps = 1000
batch_size = 256
height = width = 28
num_channels = 1
num_outputs = 10

input_placeholder = tf.placeholder(tf.float32, [None, height, width, num_channels])
#input_placeholder = tf.placeholder(tf.float32, [None, height*width*num_channels])

gt_placeholder = tf.placeholder(tf.int64, [None, num_outputs])
logits, h2 = net(input_placeholder)
loss = tf.losses.softmax_cross_entropy(logits=logits,  onehot_labels=gt_placeholder)
acc = utils.accuracy(logits, gt_placeholder)
test_images_reshape = np.reshape(np.squeeze(test_images), (test_images.shape[0], height, width, num_channels))
    
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)

for step in range(max_steps):
    data, label = train_dataset.next_batch(batch_size)
    data = np.reshape(data, (batch_size, height, width, num_channels))
    feed_dict = {input_placeholder: data, gt_placeholder: label}
    h2_, loss_, acc_, _ = sess.run([h2, loss, acc, train_op], feed_dict=feed_dict)
    if step % 10 == 0:
        print("Batch %d, Loss: %f, Train acc %f " % (step, loss_, acc_))

test_loss_, test_acc_ = sess.run([loss, acc], feed_dict={input_placeholder: test_images_reshape / 255.0, gt_placeholder: test_labels})
print ("Test Loss: %f, Test acc %f " % (test_loss_, test_acc_))


Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Batch 0, Loss: 2.302582, Train acc 0.164062 
Batch 10, Loss: 2.276734, Train acc 0.312500 
Batch 20, Loss: 1.795223, Train acc 0.414062 
Batch 30, Loss: 1.199801, Train acc 0.492188 
Batch 40, Loss: 1.144416, Train acc 0.574219 
Batch 50, Loss: 1.097209, Train acc 0.550781 
Batch 60, Loss: 0.937172, Train acc 0.617188 
Batch 70, Loss: 0.934170, Train acc 0.601562 
Batch 80, Loss: 0.938392, Train acc 0.683594 
Batch 90, Loss: 0.908565, Train acc 0.675781 
Batch 100, Loss: 0.718502, Train acc 0.730469 
Batch 110, Loss: 0.777633, Train acc 0.691406 
Batch 120, Loss: 0.788591, Train acc 0.699219 
Batch 130, Loss: 0.820860, Train acc 0.718750 
Batch 140, Loss: 0.622196, Train acc 0.765625 
Batch 150, Loss: 0.796487, Train acc 0.687500 
Batch 160, Loss: 0.850881, Train acc 0.691406 
Batch 170, Loss: 0.7160