# Convolutional Neural Network Example

Build a convolutional neural network with TensorFlow's low-level 1.x API (placeholders and sessions).

- Author: Aymeric Damien
- Project: https://github.com/aymericdamien/TensorFlow-Examples/

## CNN Overview

![CNN](http://personal.ie.cuhk.edu.hk/~ccloy/project_target_code/images/fig3.png)

## MNIST Dataset Overview

This example uses the MNIST handwritten digits dataset, which contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with pixel values from 0 to 1. For simplicity, each image has been flattened and converted to a 1-D NumPy array of 784 features (28*28).

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

More info: http://yann.lecun.com/exdb/mnist/
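
As a rough illustration of the data layout described above (not the loader's actual code), the sketch below flattens a 28x28 image into 784 features and one-hot encodes a label. The helper names `flatten_image` and `one_hot` are hypothetical, and the image is a made-up blank array rather than real MNIST data.

```python
def flatten_image(image):
    """Flatten a 2-D list of pixel rows into one 1-D list (row-major order)."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


# Hypothetical stand-in for one MNIST image: 28 rows of 28 values in [0, 1].
image = [[0.0] * 28 for _ in range(28)]
image[3][5] = 1.0  # mark one pixel so its flattened position can be checked

features = flatten_image(image)
print(len(features))        # 784
print(features.index(1.0))  # 3 * 28 + 5 = 89
print(one_hot(6))           # 1.0 at index 6, zeros elsewhere
```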

In [1]:
from __future__ import division, print_function, absolute_import

import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
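# Note: one_hot=True returns each label as a one-hot vector, e.g. 0000001000 (the digit 6)
# This loader ships with the TensorFlow tutorials and only covers MNIST; for
# other data you still need to know how to preprocess a dataset yourself.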
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [5]:
# Training Parameters
learning_rate = 0.001
num_steps = 500
batch_size = 128
display_step = 10

# Network Parameters
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
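# Shape of X: each input instance (one image) is a flattened vector of 784 values.
# The number of images per batch is not known in advance, so use [None, num_input].
X = tf.placeholder(tf.float32, [None, num_input])
# Y holds the labels: one one-hot vector of length 10 per image
Y = tf.placeholder(tf.float32, [None, num_classes])
# Values fed in at run time use placeholders. keep_prob is a scalar placeholder
# so that training and evaluation can feed different values.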
keep_prob = tf.placeholder(tf.float32) # dropout (keep probability)
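
The `dropout = 0.75` setting and the `keep_prob` placeholder are both keep probabilities. A minimal pure-Python sketch of what inverted dropout does with such a value follows. It mirrors the documented behavior of `tf.nn.dropout` in TensorFlow 1.x (each kept unit is scaled by `1 / keep_prob`), but `dropout_sketch` itself is a hypothetical helper, not a TensorFlow function.

```python
import random


def dropout_sketch(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob;
    otherwise replace it with 0.0."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000

train_out = dropout_sketch(activations, 0.75, rng)  # training-time setting
eval_out = dropout_sketch(activations, 1.0, rng)    # evaluation-time setting

kept_fraction = sum(1 for v in train_out if v != 0.0) / len(train_out)
mean_train = sum(train_out) / len(train_out)

print(kept_fraction)            # close to 0.75
print(mean_train)               # close to 1.0, because kept units are scaled up
print(eval_out == activations)  # keep_prob = 1.0 leaves the values unchanged
```

Because kept units are scaled up during training, the expected activation stays the same, so nothing needs rescaling when `keep_prob` is set to 1.0 for evaluation.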

In [6]:
# Create some wrappers for simplicity
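# Wrapping common ops such as conv2d and max-pool in helper functions is a good habit.
# Open question: how do ops wrapped this way appear in TensorBoard? Worth checking.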
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
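    # tf.nn.conv2d takes four required arguments: input, filter, strides, padding.
    # strides is a list of length 4, one stride per input dimension:
    # [batch, height, width, channel]. [1, strides, strides, 1] never skips an
    # image or a channel and moves the kernel `strides` pixels at a time in each
    # spatial direction; the default of 1 slides it one pixel per step.
    # padding: 'SAME' zero-pads the borders, 'VALID' adds no padding. With 'SAME'
    # the output size is ceil(input / stride), so it matches the input only when
    # the stride is 1. Here the 2x2 max-pooling does the shrinking:
    # 28x28 -> 14x14 -> 7x7.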
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    # Add the bias after the convolution; b holds one value per output channel
    x = tf.nn.bias_add(x, b)
    # The biased result is the pre-activation; ReLU adds the non-linearity and leaves the shape unchanged
    return tf.nn.relu(x)


def maxpool2d(x, k=2):
    # MaxPool2D wrapper
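    # Max pooling takes the maximum over each k x k window to drop locally
    # redundant information. The stride usually equals ksize so that windows do
    # not overlap, which again raises the padding question; 'SAME' is used here.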
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
    # Reshape to match picture format [Height x Width x Channel]
    # The input tensor becomes 4-D: [Batch Size, Height, Width, Channel]
    # Convolution needs the standard 2-D image layout; -1 lets TensorFlow infer
    # the batch size, and the images are grayscale, so channel = 1
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    
    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
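    # wd1 is a tf.Variable: get_shape() returns its shape, as_list() converts it
    # to a Python list, and element 0 is the fully connected layer's input size,
    # 7*7*64. The -1 keeps the batch dimension, so each row is one image.
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    # fc1 is [batch_size, 7*7*64] and wd1 is [7*7*64, 1024], so the matrix product
    # is [batch_size, 1024]; the bias of shape [1024] is broadcast across the batch.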
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    # In this model dropout is applied only to the fully connected layer, just before the output layer:
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output, class prediction
    # After the activated fully connected layer, the output layer applies a linear
    # Wx + b to produce 10 scores per image; out has shape [batch_size, 10].
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
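
The shape bookkeeping in `conv_net` can be checked with a few lines of arithmetic. This sketch assumes TensorFlow's documented rule for `'SAME'` padding, output size = ceil(input size / stride). `same_out` is a hypothetical helper name, and 64 is the channel count from the `7*7*64` comment above.

```python
import math


def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


size = 28
size = same_out(size, 1)  # first convolution, stride 1: size unchanged
size = same_out(size, 2)  # first max-pool, k = 2
after_pool1 = size
size = same_out(size, 1)  # second convolution, stride 1: size unchanged
size = same_out(size, 2)  # second max-pool, k = 2
after_pool2 = size

fc_inputs = after_pool2 * after_pool2 * 64  # flattened input to the dense layer
print(after_pool1, after_pool2, fc_inputs)  # 14 7 3136
```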

In [7]:
# Store layers weight & bias
# Keeping all weights and biases in dictionaries makes them easy to manage; a pattern worth reusing
weights = {
    # 5x5 conv, 1 input, 32 outputs
    # Filter shape is [filter_height, filter_width, in_channels, out_channels]
    # Weights and biases are Variables, initialized here from tf.random_normal
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, num_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}

# Construct model
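# logits has shape [batch_size, 10]: raw, unnormalized scores with no softmax applied yet
logits = conv_net(X, weights, biases, keep_prob)
# prediction is the softmax output (probabilities such as 0.112, 0.721); the argmax is taken later, in the evaluation step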
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
# As its name says, softmax_cross_entropy_with_logits expects the unnormalized
# logits (10 scores per image) and applies softmax internally.
# labels is the Y placeholder: the ground-truth one-hot vectors from the dataset.
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
# Three steps: define the loss, define the optimizer, then train_op = optimizer.minimize(loss_op)
train_op = optimizer.minimize(loss_op)


# Evaluate model
# argmax over axis 1 works along each row, axis 0 down each column; each row is one image, so use axis 1
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
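
To make the loss and accuracy ops above more concrete, here is a pure-Python sketch of the same computations on a made-up batch of two examples with three classes. It is an illustration under those assumptions, not TensorFlow's implementation; `softmax`, `cross_entropy`, and `argmax` are hypothetical helpers.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up logits and one-hot labels: the first example is classified
# correctly, the second is not.
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

losses = [cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)  # counterpart of tf.reduce_mean over the batch

correct = [argmax(softmax(l)) == argmax(y)
           for l, y in zip(batch_logits, batch_labels)]
accuracy_value = sum(correct) / len(correct)

print(mean_loss)
print(accuracy_value)  # 0.5: one of the two predictions matches its label
```

Softmax preserves the ordering of the scores, so taking the argmax of `prediction` or of `logits` selects the same class.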

In [12]:
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        # Running train_op executes everything it depends on: forward pass, loss, and weight update
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, keep_prob: dropout})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            # keep_prob is 1.0 here: dropout is disabled for evaluation and used
            # only during training, where it improves robustness
            loss, acc = sess.run(
                [loss_op, accuracy],
                feed_dict={X: batch_x, Y: batch_y, keep_prob: 1.0})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
        
    print("Optimization Finished!")

    # Calculate accuracy for 256 MNIST test images
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                      Y: mnist.test.labels[:256],
                                      keep_prob: 1.0}))


Step 1, Minibatch Loss= 107670.4609, Training Accuracy= 0.094
Step 10, Minibatch Loss= 37669.4297, Training Accuracy= 0.312
Step 20, Minibatch Loss= 14097.5498, Training Accuracy= 0.555
Step 30, Minibatch Loss= 10744.9844, Training Accuracy= 0.555
Step 40, Minibatch Loss= 5663.2852, Training Accuracy= 0.711
Step 50, Minibatch Loss= 5769.4585, Training Accuracy= 0.742
Step 60, Minibatch Loss= 4217.8574, Training Accuracy= 0.812
Step 70, Minibatch Loss= 4569.0835, Training Accuracy= 0.758
Step 80, Minibatch Loss= 2117.9468, Training Accuracy= 0.898
Step 90, Minibatch Loss= 2355.5264, Training Accuracy= 0.891
Step 100, Minibatch Loss= 2790.5195, Training Accuracy= 0.859
Step 110, Minibatch Loss= 1342.5957, Training Accuracy= 0.914
Step 120, Minibatch Loss= 1118.8083, Training Accuracy= 0.930
Step 130, Minibatch Loss= 2471.0046, Training Accuracy= 0.875
Step 140, Minibatch Loss= 990.0022, Training Accuracy= 0.945
Step 150, Minibatch Loss= 2667.6196, Training Accuracy= 0.875
Step 160, Minib

(128, 10)