# Convolutional Neural Networks

## Convolutional Layer
A Convolutional Neural Network (CNN) is a special type of feedforward neural network originally
employed in the field of computer vision. Its design is inspired by the visual cortex, the part of the
animal brain that processes visual input. The visual cortex contains many cells that are responsible
for detecting light in small, overlapping sub-regions of the visual field, which are called receptive
fields. These cells act as local filters over the input space. A CNN consists of multiple convolutional
layers, each of which performs a function analogous to that of the cells in the visual cortex.
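
To make the idea of a local filter concrete, here is a minimal NumPy sketch of a single filter sliding over a small image. The function name `conv2d_valid`, the 5x5 test image, and the 3x3 vertical-line filter are illustrative assumptions, not part of the notebook. Like the convolution ops in deep learning frameworks, the sketch computes a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and return the feature map."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field of output neuron (i, j)
            patch = image[i * stride:i * stride + k_h,
                          j * stride:j * stride + k_w]
            # Weighted sum of the receptive field with the filter weights
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 5x5 image containing a vertical line in the middle column
image = np.zeros((5, 5))
image[:, 2] = 1.0

# Hypothetical 3x3 filter that responds to vertical lines
vertical_filter = np.zeros((3, 3))
vertical_filter[:, 1] = 1.0

feature_map = conv2d_valid(image, vertical_filter)
print(feature_map)
```

Each output value depends only on a 3x3 patch of the input, and the response is strongest where the filter is centered on the line.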

A neuron's weights can be represented as a small image the size of the receptive field. The arguments of a 2D convolution operation in TensorFlow have the following meaning:

- `filters` is the set of filters to apply (a 4D tensor, like the input).
- `strides` is a four-element 1D array, where the two central elements are the vertical and horizontal strides (s_h and s_w). The first and last elements must currently be equal to 1. They may one day be used to specify a batch stride (to skip some instances) and a channel stride (to skip some of the previous layer's feature maps or channels).
- `padding` must be either `"VALID"` or `"SAME"`:
    - If set to `"VALID"`, the convolutional layer does not use zero padding, and may ignore some rows and columns at the bottom and right of the input image, depending on the stride, as shown in Figure 13-7 (for simplicity, only the horizontal dimension is shown here, but of course the same logic applies to the vertical dimension).
    - If set to `"SAME"`, the convolutional layer uses zero padding if necessary. In this case, the number of output neurons is equal to the number of input neurons divided by the stride, rounded up (in this example, ceil(13 / 5) = 3). Then zeros are added as evenly as possible around the inputs.
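
The arithmetic behind the two padding modes can be sketched in a few lines of plain Python. The helper `conv_output_size` is hypothetical, and the filter width of 6 is an assumption chosen for illustration (the text only gives the input width 13 and the stride 5). Placing the extra padding column at the end when the total is odd is also an assumption about the convention in use.

```python
import math

def conv_output_size(n_in, kernel, stride, padding):
    """Return (n_out, pad_before, pad_after) along one dimension."""
    if padding == "SAME":
        # Output size depends only on the input size and the stride
        n_out = math.ceil(n_in / stride)
        # Zeros needed so that the last filter position fits
        total_pad = max((n_out - 1) * stride + kernel - n_in, 0)
        pad_before = total_pad // 2
        pad_after = total_pad - pad_before  # extra zero goes at the end (assumed)
    elif padding == "VALID":
        # No padding: only positions where the filter fits entirely
        n_out = (n_in - kernel) // stride + 1
        pad_before = pad_after = 0
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    return n_out, pad_before, pad_after

# Input width 13 and stride 5 as in the text; filter width 6 is assumed
same = conv_output_size(13, 6, 5, "SAME")
valid = conv_output_size(13, 6, 5, "VALID")
# Inputs at the right edge that "VALID" never looks at
ignored = 13 - ((valid[0] - 1) * 5 + 6)
print(same, valid, ignored)
```

Under these assumptions, `"SAME"` yields 3 outputs with 3 zeros of padding split 1 and 2, while `"VALID"` yields 2 outputs and ignores the last 2 input columns.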
## Pooling Layer
The goal of pooling layers is to subsample (i.e., shrink) the input image in order to
reduce the computational load, the memory usage, and the number of parameters
(thereby limiting the risk of overfitting). Reducing the input image size also makes
the neural network tolerate a little bit of image shift (location invariance).
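
A minimal NumPy sketch of 2x2 max pooling with stride 2 illustrates both effects: the feature map shrinks by a factor of 2 in each dimension, and a one-pixel shift inside a pooling window leaves the output unchanged. The function `max_pool_2x2` and the small test arrays are illustrative assumptions.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array (odd edges are cropped)."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    # Group pixels into non-overlapping 2x2 blocks, then take each block's max
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 2, 0, 0],
               [3, 4, 0, 5],
               [0, 0, 6, 0],
               [7, 0, 0, 0]])
pooled = max_pool_2x2(fm)
print(pooled)

# A single bright pixel, and the same pixel shifted one column to the right
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[0, 1] = 1.0
same_after_pooling = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
print(same_after_pooling)
```

The invariance only holds for shifts that stay inside one pooling window. A shift that crosses a window boundary does change the pooled output.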

<img src=conv1.png>
<img src=conv2.png>
<img src=conv3.png>
<img src=conv5.png>

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/")

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [28]:
# Parameters
learning_rate = 0.001
training_iters = 200000
batch_size = 150
display_step = 10
n_epochs = 5

# Network Parameters
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
tf.reset_default_graph()
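# tf Graph input: flattened images and integer class labels
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.int64, [None])

# Reshape each flat 784-vector to [batch, height, width, channels]
X = tf.reshape(x, shape=[-1, 28, 28, 1])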
conv1 = tf.layers.conv2d(
      inputs=X,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
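# Two 2x2 poolings reduce 28x28 to 7x7; conv2 outputs 64 feature maps
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
fc1 = tf.contrib.layers.fully_connected(inputs=pool2_flat, num_outputs=128, activation_fn=tf.nn.relu)
# Output layer: raw class scores (softmax is applied inside the loss)
logits = tf.contrib.layers.fully_connected(inputs=fc1, num_outputs=n_classes, activation_fn=None)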

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(logits, 1), y)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
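
The `7 * 7 * 64` in the flattening step follows from the layer settings in the cell above. The sketch below traces the spatial size through the network and counts trainable parameters in plain Python. It assumes stride-1 `"same"` convolutions (which keep the spatial size), unpadded 2x2 pooling with stride 2, and a bias term in every layer. The helper `pool_size_out` is a hypothetical name.

```python
def pool_size_out(size, pool=2, stride=2):
    """Spatial size after pooling without padding."""
    return (size - pool) // stride + 1

size = 28                     # MNIST images are 28x28
# conv1: 5x5 kernel, "same" padding, stride 1 keeps 28x28
size = pool_size_out(size)    # pool1: 28 -> 14
# conv2: 5x5 kernel, "same" padding, stride 1 keeps 14x14
size = pool_size_out(size)    # pool2: 14 -> 7
flat_features = size * size * 64

# Parameter counts: weights plus biases for each layer
conv1_params = 5 * 5 * 1 * 32 + 32
conv2_params = 5 * 5 * 32 * 64 + 64
fc1_params = flat_features * 128 + 128
logits_params = 128 * 10 + 10
total_params = conv1_params + conv2_params + fc1_params + logits_params

print(size, flat_features, total_params)
```

Most of the parameters sit in the first fully connected layer, which is why shrinking the feature maps with pooling before flattening matters for model size.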

In [29]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={x: X_batch, y: y_batch})
        # Evaluate once per epoch: last training batch and first 1000 test images
        acc_train = accuracy.eval(feed_dict={x: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={x: mnist.test.images[:1000], y: mnist.test.labels[:1000]})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

0 Train accuracy: 1.0 Test accuracy: 0.986
1 Train accuracy: 1.0 Test accuracy: 0.988
2 Train accuracy: 1.0 Test accuracy: 0.986
3 Train accuracy: 0.986667 Test accuracy: 0.991
4 Train accuracy: 1.0 Test accuracy: 0.992
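
For readers who want to see what the loss and accuracy nodes compute, here is a NumPy sketch of the same quantities: softmax cross-entropy with integer (sparse) labels, and the fraction of correct argmax predictions. The function names and the small logits array are illustrative assumptions and are not taken from the TensorFlow implementation.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer labels."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each example
    return -log_probs[np.arange(len(labels)), labels]

def accuracy_score(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    return np.mean(np.argmax(logits, axis=1) == labels)

# Hypothetical scores for 3 examples and 3 classes
example_logits = np.array([[2.0, 0.5, 0.1],
                           [0.2, 0.1, 3.0],
                           [1.0, 1.5, 0.3]])
example_labels = np.array([0, 2, 0])

losses = sparse_softmax_cross_entropy(example_logits, example_labels)
mean_loss = losses.mean()
acc = accuracy_score(example_logits, example_labels)
print(mean_loss, acc)
```

Note that the training accuracy printed by the notebook is measured on a single batch of 150 images, so it fluctuates more than the test accuracy measured on 1000 images.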
