# Convolutional Neural Networks

Let's adapt the model we built yesterday by replacing its fully connected hidden layer with a convolutional one.

## Recap

As a first simple model, we implemented a neural network performing softmax regression, which is a straightforward way to do classification. It looked like this:

![alt text](images/softmax-regression-scalargraph.png "Softmax Regression graph")

Then, we expanded the model to include a hidden layer, and tried to look at our model from a _flowing tensors_ perspective.

![alt text](images/fc.png "The flow of tensors")

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
import tensorflow as tf

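def conv_weights(filters_size, channels_size, name):
    # Concatenate spatial size and channel counts into the 4D filter shape,
    # e.g. [3, 3] + [1, 32] -> [3, 3, 1, 32].
    shape = filters_size + channels_size
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1), name=name)

def conv(x, W, stride, name):
    # Use the same stride along both spatial dimensions; batch and channel strides stay at 1.
    strides_shape = [1, stride, stride, 1]
    return tf.nn.conv2d(x, W, strides_shape, padding='SAME', name=name)

def pool(x, size, stride, name):
    # Pool only over the spatial dimensions, never across the batch or the channels.
    pool_shape = [1] + size + [1]
    strides_shape = [1, stride, stride, 1]
    return tf.nn.max_pool(x, pool_shape, strides_shape, padding='SAME', name=name)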

TensorFlow's `conv2d` function requires the weights to be a 4D tensor of shape `[filter_height, filter_width, channel_in, channel_out]`, where:

- `filter_height` and `filter_width` are the dimensions of our convolutional filters
- `channel_in` is the number of input channels, be it 1 for grayscale, 3 for RGB, or any other number
- `channel_out` can be seen as the number of filters we are applying to our input, or as the number of feature maps we'll generate as output
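As a quick sanity check on the shape described above, the following sketch mirrors the list concatenation done in `conv_weights` in plain Python, without TensorFlow. The helper names `filter_shape` and `parameter_count` are hypothetical, introduced here only for illustration.

```python
from functools import reduce
from operator import mul

def filter_shape(filters_size, channels_size):
    # Hypothetical helper: the same list concatenation that conv_weights performs.
    return filters_size + channels_size

def parameter_count(shape):
    # Number of trainable weights is the product of all four dimensions.
    return reduce(mul, shape, 1)

shape = filter_shape([3, 3], [1, 32])
print(shape)                   # [3, 3, 1, 32]
print(parameter_count(shape))  # 288
```

A 3x3 filter over a single input channel producing 32 feature maps therefore needs only 288 weights, plus one bias per feature map.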

`conv2d` also requires the input to be a 4D tensor with dimension `[batch_size, input_height, input_width, channel_in]`.
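The MNIST images used in this notebook are flat vectors of 28 * 28 = 784 values, so each one has to be rearranged into this layout before it can be convolved. Below is a minimal plain-Python sketch of that rearrangement for a single image, assuming row-major ordering (the order `tf.reshape` preserves). `pixel_position` and `to_image` are hypothetical helpers, not TensorFlow functions.

```python
HEIGHT, WIDTH = 28, 28

def pixel_position(flat_index, width=WIDTH):
    # Row-major layout: consecutive flat indices walk along a row first.
    return divmod(flat_index, width)

def to_image(flat_pixels, height=HEIGHT, width=WIDTH):
    # Rebuild a [height][width][1] nested list from a flat list of pixels.
    assert len(flat_pixels) == height * width
    return [[[flat_pixels[r * width + c]] for c in range(width)]
            for r in range(height)]

flat = list(range(HEIGHT * WIDTH))  # stand-in for one flattened image
image = to_image(flat)
print(len(image), len(image[0]), len(image[0][0]))  # 28 28 1
print(pixel_position(30))                           # (1, 2)
```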

The strides argument required by `conv2d` is a list of four integers, one stride for each input dimension (`[batch_size, input_height, input_width, channel_in]`). Since a stride makes sense only along the height and width of your input, not along the batch or channel dimensions, it will almost always be of the form `[1, stride, stride, 1]`.

![alt text](images/convolution.png "Convolution")
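The stride also determines the spatial size of the output. With `padding='SAME'`, as used in the `conv` and `pool` helpers, TensorFlow produces an output of size `ceil(input_size / stride)` along each spatial dimension. The sketch below applies that rule in plain Python; `same_output_size` is a hypothetical helper name, and the strides are the ones used by the model in this notebook.

```python
import math

def same_output_size(input_size, stride):
    # With padding='SAME', the output size depends only on input size and stride.
    return math.ceil(input_size / stride)

size = 28                                # MNIST images are 28x28
size = same_output_size(size, stride=2)  # convolution with stride 2
print(size)                              # 14
size = same_output_size(size, stride=2)  # max pooling with stride 2
print(size)                              # 7
print(size * size * 32)                  # 1568
```

This is where the `7 * 7 * 32` flattened size fed to the fully connected layer comes from: 32 feature maps of 7x7 values each.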

In [3]:
import math

BATCH_SIZE = 100

x = tf.placeholder(tf.float32, [None, 28 * 28], name='input_images')
y_ = tf.placeholder(tf.float32, [None, 10], name='labels')

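# Reshape the flat 784-pixel vectors into [batch, height, width, channels].
input_images = tf.reshape(x, [-1, 28, 28, 1])
# Layer 1: 3x3 convolution from 1 input channel to 32 feature maps.
W1 = conv_weights([3, 3], [1, 32], 'L1_weights')
b1 = tf.Variable(tf.constant(0.1, shape=[32]), name='L1_biases')
# Stride 2 with 'SAME' padding halves the spatial size: 28x28 -> 14x14.
c1 = conv(input_images, W1, stride=2, name='L1_conv')
h1 = tf.nn.relu(tf.nn.bias_add(c1, b1), name='L1_ReLU')
# 2x2 max pooling with stride 2 halves it again: 14x14 -> 7x7.
p1 = pool(h1, size=[2, 2], stride=2, name='L1_pool')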

tf.summary.image('input', input_images)
tf.summary.histogram('L1_weights', W1)

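# Layer 2: fully connected output layer, from 7 * 7 * 32 flattened features to 10 class scores.
W2 = tf.Variable(tf.truncated_normal([7 * 7 * 32, 10], stddev=1 / math.sqrt(7 * 7 * 32)), name='L2_weights')
b2 = tf.Variable(tf.constant(0.1, shape=[10]), name='L2_biases')
# Flatten each image's pooled feature maps into a single row vector.
p1_flat = tf.reshape(p1, [-1, 7 * 7 * 32])
logits = tf.matmul(p1_flat, W2) + b2
y = tf.nn.softmax(logits, name='softmax')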

tf.summary.histogram('L2_weights', W2)
tf.summary.histogram('logits', logits)
tf.summary.histogram('output', y)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]), name='cross_entropy')
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
tf.summary.scalar('loss', cross_entropy)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1), name='correct_prediction')
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='train_accuracy')
tf.summary.scalar('accuracy', accuracy)

init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
merged_summaries = tf.summary.merge_all()
summary_writer = tf.summary.FileWriter('./summary/conv', sess.graph)
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE)
    _, summary = sess.run([train_step, merged_summaries], feed_dict={x: batch_xs, y_: batch_ys})
    summary_writer.add_summary(summary, i)

print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.969
