# cheungdaven/cheungdaven.github.io


## TensorFlow Tutorial 2: Deep MNIST Using a CNN

* content
{:toc}

### A Brief Introduction to CNNs

#### Convolution

A convolution first defines a kernel matrix and then slides it step by step over the whole image, computing a value at each position. In the figures below, the kernel matrix is [[1,0,1],[0,1,0],[1,0,1]]. (The kernel's shape is set through the weight W; for example, the first layer's weight shape [5,5,1,32] means each kernel is a 5×5×1 matrix.) The kernel is moved across the original image one step at a time (the step size is set by the stride parameter), and the mapping for the image below proceeds as follows:

* original matrix ![before](https://ujwlkarn.files.wordpress.com/2016/07/screen-shot-2016-07-24-at-11-25-13-pm.png?w=127&h=115)
* kernel matrix ![kernel](https://ujwlkarn.files.wordpress.com/2016/07/screen-shot-2016-07-24-at-11-25-24-pm.png?w=74&h=64)
* after convolution ![after](https://ujwlkarn.files.wordpress.com/2016/07/convolution_schematic.gif?w=268&h=196)
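The sliding-window computation above can be sketched in plain NumPy. This is a minimal illustration with 'valid' padding and stride 1, not the TensorFlow implementation, using the same 5×5 image and 3×3 kernel as the figures:

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image`, summing elementwise products at each step."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# the 5x5 image and 3x3 kernel from the figures above
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
# matches the convolved feature in the animation: [[4,3,4],[2,4,3],[2,3,4]]
print(conv2d_valid(image, kernel))
```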

#### Pooling

![pool](http://cs231n.github.io/assets/cnn/maxpool.jpeg)
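Max pooling keeps only the largest value in each window, as the figure shows; a minimal sketch of the 2×2, stride-2 case used in this tutorial:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window."""
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = feature_map[i * stride:i * stride + size,
                                    j * stride:j * stride + size].max()
    return out

# the 4x4 example from the figure above
fm = np.array([[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]])
print(max_pool(fm))  # the maxima of the four 2x2 windows: 6, 8, 3, 4
```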

#### Architecture

![1](http://img.blog.csdn.net/20161210213342253?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvemhhbmdzaHVhaXpheGlh/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)

> 1. zero-padding the 28x28x1 image to 32x32x1
> 2. applying 5x5x32 convolution to get 28x28x32
> 3. max-pooling down to 14x14x32
> 4. zero-padding the 14x14x32 to 18x18x32
> 5. applying 5x5x32x64 convolution to get 14x14x64
> 6. max-pooling down to 7x7x64
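The shape bookkeeping in the steps above can be checked in a few lines, using the 'SAME'-padding rule (output size = ceil(input / stride)); a sketch, not TensorFlow code:

```python
import math

def same_conv_out(size, stride=1):
    # 'SAME' padding: output spatial size = ceil(input / stride)
    return math.ceil(size / stride)

def pool_out(size, stride=2):
    return math.ceil(size / stride)

size, channels = 28, 1
size = same_conv_out(size); channels = 32  # conv1 (5x5x1x32)  -> 28x28x32
size = pool_out(size)                      # pool1 (2x2)       -> 14x14x32
size = same_conv_out(size); channels = 64  # conv2 (5x5x32x64) -> 14x14x64
size = pool_out(size)                      # pool2 (2x2)       -> 7x7x64
print(size, size, channels)                # 7 7 64

flat = size * size * channels
print(flat)  # 3136 = 7*7*64, the input width of the fully connected layer
```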

### 代码实现

{% highlight python %}
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import argparse
import sys

FLAGS = None


def weight_variable(shape):
    # small random noise breaks symmetry between units
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    # a slightly positive bias helps avoid "dead" ReLU neurons
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    # computes a 2-D convolution given 4-D input and filter tensors
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2_2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


def main(_):
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

    x = tf.placeholder(tf.float32, [None, 784])

    # simple softmax regression (from the previous tutorial; unused below)
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.matmul(x, W) + b  # matmul means matrix multiplication

    # define the loss and optimizer
    y_ = tf.placeholder(tf.float32, [None, 10])

    # first layer
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])

    x_image = tf.reshape(x, [-1, 28, 28, 1])

    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2_2(h_conv1)

    # second layer
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2_2(h_conv2)

    # densely connected layer
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # dropout
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # readout layer
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])

    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    sess = tf.InteractiveSession()
    # train
    sess.run(tf.global_variables_initializer())

    for i in range(20000):
        batch = mnist.train.next_batch(100)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(
                feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print("test accuracy %g" % accuracy.eval(
        feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='dataset',
                        help="Directory for storing data")
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

{% endhighlight %}

* What do the weight shapes mean?

The second convolution's weights have shape [5,5,32,64]. Note that each kernel in the second convolution has 32 channels, not 1: every filter sums over all 32 input channels, so the output is 14×14×64, not 14×14×2048.
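A small NumPy sketch (an illustration, not the TensorFlow kernel) makes this concrete: each of the 64 filters collapses height, width, and all 32 input channels of a patch into a single scalar, so 64 filters produce exactly 64 output channels at each position:

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5, 32))        # one 5x5 patch across 32 channels
filters = rng.standard_normal((5, 5, 32, 64))  # the [5,5,32,64] weights

# each filter sums over height, width AND input channels -> one scalar per filter
out = np.einsum('hwc,hwcf->f', patch, filters)
print(out.shape)  # (64,): one value per filter, hence 64 output channels
```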

* Why does strides have 4 dimensions?

The 4 dimensions are [batch, height, width, channels]. In most cases strides = [1, stride, stride, 1]. The first 1 controls whether samples in the batch are skipped: for instance, if we train on 100 samples at a time, a value of 1 visits samples 1, 2, 3, ..., 100, while a value of 2 would visit 1, 3, 5, .... The last number controls how many channels to skip at a time, but since images usually have only a few channels, there is no need to skip any.
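Under 'SAME' padding, TensorFlow computes the output spatial size as ceil(input / stride), so only the two middle stride values affect the 28×28 feature map; a small sketch:

```python
import math

def same_out(size, stride):
    # output spatial size under 'SAME' padding
    return math.ceil(size / stride)

for stride in (1, 2):
    strides = [1, stride, stride, 1]  # [batch, height, width, channels]
    out = same_out(28, stride)
    print(strides, '->', out, 'x', out)
# [1, 1, 1, 1] -> 28 x 28
# [1, 2, 2, 1] -> 14 x 14
```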

[question](http://stackoverflow.com/questions/34642595/tensorflow-strides-argument)

* Explanation of conv2d

[conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard8/tf.nn.conv2d.md)
* What does the -1 mean when reshaping, e.g. [-1,28,28,1]?

-1 is a placeholder that says "adjust as necessary to match the size needed for the full tensor." It's a way of making the code independent of the input batch size, so that you can change your pipeline and not have to adjust the batch size everywhere in the code. In other words, the dimension adapts automatically, which makes the code more flexible when the pipeline changes.
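The batch-size independence described above is easy to see with NumPy's reshape, which uses the same -1 convention:

```python
import numpy as np

batch = np.zeros((100, 784))             # a batch of 100 flattened MNIST images
images = batch.reshape(-1, 28, 28, 1)    # -1 is inferred from the other dims
print(images.shape)                      # (100, 28, 28, 1)

smaller = np.zeros((50, 784))            # change the batch size...
print(smaller.reshape(-1, 28, 28, 1).shape)  # (50, 28, 28, 1): no code change needed
```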

* Why set the initial values with weight_variable and bias_variable instead of defaulting everything to 0?

If all weights start at 0, every neuron in a layer computes the same output and receives the same gradient, so the neurons can never learn different features; initializing with small random noise breaks this symmetry. In addition, the activation used here is ReLU,

$$ f(x) = \max\{0, x\} $$

which outputs 0 (and passes no gradient) for negative inputs, so a slightly positive initial bias (0.1) helps avoid "dead" neurons at the start of training.
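A minimal NumPy sketch (a hypothetical toy layer, not the tutorial's network) of why all-zero initialization fails with ReLU:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)  # f(x) = max{0, x}

x = np.array([1.0, 2.0, 3.0])  # one toy input

# all-zero weights: every hidden unit computes the same thing (here: 0),
# so every unit also receives the same gradient and they never differentiate
W_zero = np.zeros((3, 4))
print(relu(x @ W_zero))  # [0. 0. 0. 0.] -- all units identical (and dead)

# small random weights break the symmetry; a small positive bias keeps
# the ReLU units in their active region at the start of training
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (3, 4))
b = np.full(4, 0.1)
print(relu(x @ W + b))   # generally distinct activations per unit
```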