<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Convolutional-Neural-Networks" data-toc-modified-id="Convolutional-Neural-Networks-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Convolutional Neural Networks</a></span><ul class="toc-item"><li><span><a href="#Network-Architecture" data-toc-modified-id="Network-Architecture-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Network Architecture</a></span></li><li><span><a href="#Data-Layer" data-toc-modified-id="Data-Layer-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Data Layer</a></span></li><li><span><a href="#Convolution,-Max-Pooling,-and-ReLu" data-toc-modified-id="Convolution,-Max-Pooling,-and-ReLu-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Convolution, Max Pooling, and ReLu</a></span></li><li><span><a href="#Fully-Connected-Layer" data-toc-modified-id="Fully-Connected-Layer-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Fully Connected Layer</a></span></li><li><span><a href="#Cost-and-Accuracy" data-toc-modified-id="Cost-and-Accuracy-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Cost and Accuracy</a></span></li><li><span><a href="#Configuring-the-Network" data-toc-modified-id="Configuring-the-Network-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Configuring the Network</a></span></li></ul></li></ul></div>

# Convolutional Neural Networks
Today we'll configure and train our first convolutional neural network! We'll train it on a classic computer vision data set: MNIST. The MNIST data is a collection of 28x28 grayscale (i.e. single-channel) images of handwritten digits (0-9), and we'll train our network to classify them. TensorFlow gives us access to the MNIST dataset, and has preprocessed the images so that the pixel values are between 0 and 1 rather than 0 and 255.

*Run the cell below to import TensorFlow and the MNIST data set.*

In [None]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

data = input_data.read_data_sets('data/MNIST/', one_hot=True)

The `input_data.read_data_sets` function has downloaded the MNIST data and stored it under a new directory, `./data/MNIST/`. The variable we saved, `data`, contains training, validation, and test data. You can access the training images at `data.train.images` and the corresponding labels at `data.train.labels`. The data object also has a method for producing mini-batches, `data.train.next_batch`, which returns the next batch of images together with their labels.

Note that when we imported our dataset, we passed the kwarg `one_hot`. This specifies that our labels should be given to us in a "one-hot" encoding. Instead of labeling each image with an integer from 0 to 9 representing which digit it is, we label our images with a vector with ten entries. One position in the vector should be a 1, while the rest of the entries should be 0. For instance, in a one-hot encoding, we would label an image of the digit 0 with the vector:

$$\begin{bmatrix}1&0&0&0&0&0&0&0&0&0\end{bmatrix},$$

and the digit 5 would be labeled with the vector:

$$\begin{bmatrix}0&0&0&0&0&1&0&0&0&0\end{bmatrix}.$$
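
To make the encoding concrete, here is a minimal pure-Python sketch of it. The helper names `one_hot` and `decode` are made up for this illustration and are not part of TensorFlow; `read_data_sets` already does the encoding for us when `one_hot=True`.

```python
def one_hot(digit, n_classes=10):
    """Return a list of length n_classes with a 1 at index `digit` and 0 elsewhere."""
    if not 0 <= digit < n_classes:
        raise ValueError("digit must be in [0, n_classes)")
    return [1 if i == digit else 0 for i in range(n_classes)]


def decode(vector):
    """Recover the integer class: the index of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])


label_five = one_hot(5)
print(label_five)          # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(decode(label_five))  # 5
```

Decoding with "index of the largest entry" is the same idea as `tf.argmax`, which we will use later to turn one-hot labels back into class integers.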

## Network Architecture
We will create a network with two convolutional layers. After each convolution, we will apply 2x2 max pooling, and then a ReLu activation. After the two convolutional layers, we will use a fully connected layer with one neuron for each class of digit (i.e. each neuron in the fc layer is meant to detect the presence of a particular digit). We then apply a softmax activation, which turns the output of the final layer into a probability distribution (i.e. it preserves the ordering of the activations but rescales them so that every value is positive and the sum of all activations in the layer is 1). Refer to the diagram below:

<img src="images/network-architecture.png" style="height: 300px;"/>
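
As a rough illustration of what softmax does to the final layer's output, here is a small pure-Python sketch. The function name and the sample values are invented for this example; in the network itself we will rely on `tf.nn.softmax`.

```python
import math


def softmax(logits):
    """Exponentiate each value and normalize so the results sum to 1."""
    # Subtracting the max does not change the result but avoids overflow in exp.
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # roughly [0.66, 0.24, 0.10]
print(sum(probs))  # 1.0 up to floating point rounding
```

The largest input still gets the largest probability, but the outputs are not simply the inputs divided by their sum: the exponential exaggerates the gaps between them.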

## Data Layer

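The data layer is configured below. Since the images are given to us as one-dimensional vectors of 784 pixel values, we will reshape the raw input into the 4D shape that convolution expects: [`n_images`, `height`, `width`, `n_channels`]. Also note that if the size of a tensor in a certain dimension is unknown when you define a placeholder, you may pass `None` for that dimension; in `tf.reshape`, the corresponding "infer this size" value is `-1`.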

In [None]:
with tf.name_scope('data_layer'):
    X = tf.placeholder(dtype=tf.float32, shape=[None, 28*28], name="raw_input")
    X_images = tf.reshape(X, shape=[-1,28,28,1])
    y = tf.placeholder(dtype=tf.float32, shape=[None, 10], name="y")
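
To see what the reshape does to a single image, here is a pure-Python sketch under the assumption that pixels are stored row by row (row-major order, which is how `tf.reshape` lays out values). The helper `reshape_to_image` is hypothetical and only mimics the result for one image.

```python
def reshape_to_image(flat, height=28, width=28):
    """Turn a flat list of height*width pixels into nested lists of shape [height, width, 1]."""
    if len(flat) != height * width:
        raise ValueError("expected %d pixels, got %d" % (height * width, len(flat)))
    # Pixel k of the flat vector lands at row k // width, column k % width.
    return [[[flat[r * width + c]] for c in range(width)] for r in range(height)]


flat = list(range(28 * 28))      # stand-in for one flattened image
image = reshape_to_image(flat)
print(len(image), len(image[0]), len(image[0][0]))  # 28 28 1
print(image[1][2][0])                               # 30
```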

## Convolution, Max Pooling, and ReLu
Below, we will write a function that we will use to create the two Conv + Max Pool + ReLu layers.

* input:
    1. `input_tensor`: a 4D tensor with shape [`n_images`, `height`, `width`, `n_input_channels`]
    1. `n_input_channels`: the number of input channels for each image (a gray-scale image has a single channel, while an rgb image has three)
    1. `filter_size`: the size of the convolutional filters we will use
    1. `n_filters`: the number of filters/depth of the convolutional layer. This will correspond to the number of channels that are output from this layer.
    1. `name`: a name to create a name_scope for the layer.
    
* output:
    1. `(relu, weights, bias)`: where `relu` is the tensorflow activation operation, and `weights` and `bias` are the tensorflow variables representing the weights/biases used in the convolutional layer.
    
Within a name scope corresponding to the parameter `name`, your function should create the following operations:

1. `weights`: a tensorflow variable with shape [filter_size, filter_size, n_input_channels, n_filters]. Pick an appropriate initializer.
1. `bias`: a variable with shape [n_filters]. Pick an appropriate initializer.
1. `conv`: a `tf.nn.conv2d` operation, which relies on `input_tensor` and `weights`. Strides should be 1 in each dimension, and padding should be "SAME".
1. `conv_and_bias`: the result of adding `bias` to `conv`.
1. `max_pool`: a `tf.nn.max_pool` layer applied to `conv_and_bias`. The kernel size should be [1,2,2,1], and stride should also be [1,2,2,1].
1. `relu`: finally, apply a `tf.nn.relu` activation to `max_pool`. Your function should return a tuple consisting of (`relu`, `weights`, `bias`).
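
Before writing the layer, it can help to work out the shapes it should produce. The sketch below is plain Python arithmetic, not TensorFlow code, and the helper names are made up for this illustration. It assumes the standard output-size rules: a "SAME" convolution yields `ceil(size / stride)`, and a 2x2 pool with stride 2 halves each spatial dimension (for sizes 28 and 14, "SAME" and "VALID" pooling give the same result).

```python
import math


def same_conv_output(size, stride=1):
    """Spatial size after a convolution with "SAME" padding."""
    return math.ceil(size / stride)


def pool_output(size, window=2, stride=2):
    """Spatial size after pooling with no padding."""
    return (size - window) // stride + 1


def layer_output_shape(height, width, n_filters):
    """Shape of one image after conv ("SAME", stride 1) followed by 2x2 max pooling."""
    h = pool_output(same_conv_output(height))
    w = pool_output(same_conv_output(width))
    return (h, w, n_filters)


def n_parameters(filter_size, n_input_channels, n_filters):
    """Number of trainable values in `weights` plus `bias`."""
    return filter_size * filter_size * n_input_channels * n_filters + n_filters


shape1 = layer_output_shape(28, 28, n_filters=10)
shape2 = layer_output_shape(shape1[0], shape1[1], n_filters=10)
print(shape1)                  # (14, 14, 10)
print(shape2)                  # (7, 7, 10)
print(n_parameters(5, 1, 10))  # 260
```

ReLu is applied element-wise, so it does not change the shape. With the default arguments of the function below, the second layer should therefore hand a [`n_images`, 7, 7, 10] tensor to the rest of the network.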

In [None]:
def conv_w_pool_and_relu(input_tensor, n_input_channels = 1, filter_size = 5, n_filters = 10,
                         name = "conv_layer"):
    pass

## Fully Connected Layer
At the end of our network, we will use a fully connected layer. We want one neuron for each class our data can belong to. That way, we will train each neuron in the layer to recognize digits that belong to their class.

* input:
    1. `input_tensor`: This will be a tensor which comes from the output of one of our convolutional layers. As such, its shape will be [`n_images`, `height`, `width`, `n_input_channels`] 
    1. `name`: a name to create a name_scope for the layer.
   
Within a name scope corresponding to the parameter `name`, your function should create the following operations:

* `flattened`: Your function should first flatten the `input_tensor` using `tf.reshape` so that it has dimensions [`n_images`, `height * width * n_input_channels`].
* `weights`: a `tf.Variable` with appropriate dimensions for a fully connected layer with 10 neurons.
* `bias`: a `tf.Variable` with appropriate dimensions for the bias of a layer with 10 neurons.
* `logits`: the result of applying weights to the `flattened` input and adding on the bias.
* `y_pred`: the result of applying `tf.nn.softmax` to `logits`.

Your function should return the tuple `(logits, y_pred)`.
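
To pin down what "appropriate dimensions" means, here is a small pure-Python sketch of a fully connected layer. It is not the TensorFlow solution: `dense` is a hypothetical helper, and the tiny 3-input, 2-neuron example is invented so the arithmetic is easy to check by hand. It assumes each image arrives as a 7x7x10 feature map, which follows from a 28x28 input passing through two "SAME" convolutions with 10 filters, each followed by 2x2 pooling with stride 2.

```python
def dense(flattened, weights, bias):
    """Compute logits[j] = sum_i flattened[i] * weights[i][j] + bias[j] for one image."""
    n_out = len(bias)
    return [
        sum(x * weights[i][j] for i, x in enumerate(flattened)) + bias[j]
        for j in range(n_out)
    ]


# Sizes for our network (assuming 7x7x10 feature maps, see the note above).
n_inputs = 7 * 7 * 10        # length of each flattened image
weights_shape = (n_inputs, 10)
bias_shape = (10,)
print(weights_shape, bias_shape)  # (490, 10) (10,)

# A tiny hand-checkable example: 3 inputs, 2 neurons.
flattened = [1.0, 2.0, 3.0]
weights = [[1, 0],
           [0, 1],
           [1, -1]]
bias = [0.5, -0.5]
print(dense(flattened, weights, bias))  # [4.5, -1.5]
```

In TensorFlow the same computation for a whole batch is a matrix multiplication of the [`n_images`, 490] flattened tensor with the [490, 10] weights, plus the bias.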

In [None]:
def fc_layer(input_tensor, name):
    pass

## Cost and Accuracy
Now we'll develop the operations necessary to assess the performance of our network. We will use cross entropy as our loss function, and will also create an accuracy operation that calculates the fraction of images (a value between 0 and 1) that are correctly classified.

* input:
    1. `logits`: the raw output of the fully connected layer without softmax applied
    1. `y_pred`: predictions made by the fully connected layer
    1. `y`: the actual labels for a batch of images
    1. `name`: a name to create a name_scope for the layer.
   
Within a name scope corresponding to the parameter `name`, your function should create the following operations:

* `cross_ent`: Use `tf.nn.softmax_cross_entropy_with_logits` to create a node that computes the cross entropy for each image in the batch.
* `cost`: Use `tf.reduce_mean` to compute the mean cross entropy for images in the batch.
* `y_cls` and `y_pred_cls`: Use `tf.argmax` to turn the one-hot encoded tensors `y` and `y_pred` into tensors of integers (where the integers represent classes).
* `correct_predictions`: Use `tf.cast` and `tf.equal` to create a tensor of type `tf.float32` with shape [`n_images`]. The value of the $i^{th}$ entry should be 0 if the $i^{th}$ entries of `y_cls` and `y_pred_cls` don't match, and should be 1 if they do match.
* `accuracy`: Use `tf.reduce_mean` on `correct_predictions` to create a node that tracks the fraction of images that are correctly classified by our model.

Your function should return the tuple `(cost, accuracy)`.
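
The quantities above can be sketched in plain Python to check your understanding before building the TensorFlow graph. Everything here is illustrative: the helper names and the two-image batch are invented, and unlike `tf.nn.softmax_cross_entropy_with_logits` (which takes the raw `logits`), this sketch takes probabilities that have already been through softmax.

```python
import math


def cross_entropy(one_hot_label, probs, eps=1e-12):
    """Cross entropy for one image: -sum(label * log(prob)); eps guards against log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot_label, probs))


def argmax(vector):
    """Index of the largest entry, i.e. the class as an integer."""
    return max(range(len(vector)), key=lambda i: vector[i])


def mean_cost_and_accuracy(labels, predictions):
    """Mean cross entropy and fraction of correct predictions over a batch."""
    losses = [cross_entropy(y, p) for y, p in zip(labels, predictions)]
    cost = sum(losses) / len(losses)
    correct = [1.0 if argmax(y) == argmax(p) else 0.0
               for y, p in zip(labels, predictions)]
    accuracy = sum(correct) / len(correct)
    return cost, accuracy


# Two images, three classes: the first prediction is right, the second is wrong.
labels = [[1, 0, 0], [0, 1, 0]]
predictions = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
cost, accuracy = mean_cost_and_accuracy(labels, predictions)
print(cost)      # roughly 0.78
print(accuracy)  # 0.5
```

Note that the wrong prediction contributes much more to the cost than the right one, which is exactly the signal the optimizer will use.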

In [None]:
def cost_and_accuracy(logits, y_pred, y, name):
    pass

## Configuring the Network
In the cell below, we configure the forward pass and cost/accuracy calculation portions of our network, using the functions we wrote above.

The last operation required to get our network functioning properly is an optimizer! Create a variable, `optimizer`, using the `tf.train.AdamOptimizer` with a `learning_rate` of `1e-4` to optimize the `cost` node.

In [None]:
conv1, weights1, bias1 = conv_w_pool_and_relu(X_images, name="conv1")
conv2, weights2, bias2 = conv_w_pool_and_relu(conv1, n_input_channels=10, name="conv2")
logits, y_pred = fc_layer(conv2, name="fc1")
cost, accuracy = cost_and_accuracy(logits, y_pred, y, name="cost_and_accuracy")
# Create optimizer below!
optimizer = ...

In the following cell, let's create summary operations to track some important stats as our model trains.

* Create histogram summaries for `weights1`, `bias1`, `weights2`, and `bias2`
* Create scalar summaries for `cost` and `accuracy`
* Use `tf.summary.merge_all` to create a master summary operation, and save the result to `summary_op`.
* Create a `train_writer` to write our summaries related to training to the directory `tensorboard/training`.
* Create a `validation_writer` to write summaries related to validation data to the directory `tensorboard/validation`.

Now we're ready to start a session!

* Create/run a global variable initializer
* Create a loop that will iterate through the number of epochs we are training for
    * Within that, create a loop to iterate through the number of batches per epoch
        * Use `data.train.next_batch(batch_size)` and save the resulting `X_batch` and `y_batch`
        * Create a feed dict for placeholders `X` and `y` - we will pass these placeholders `X_batch` and `y_batch`
        * Run the `optimizer` and `summary_op` nodes
        * Use the `train_writer` to write the resulting summary
    * After each epoch, run the `summary_op` node, passing a feed_dict that holds the validation data (`data.validation.images` and `data.validation.labels`)
    * Use the `validation_writer` to write the resulting summary
    * Use the saver to save the model to the directory `model`. Make sure to use a `global_step` corresponding to the current epoch
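
The control flow described above can be sketched without TensorFlow. The generator below is a hypothetical helper that only yields which action happens when; the real loop would call `sess.run`, the writers, and the saver at those points. The figure of 55,000 training images is an assumption about the default split produced by `read_data_sets`; in the notebook you can read the actual value from `data.train.num_examples`.

```python
def training_schedule(n_train, batch_size, n_epochs):
    """Yield (action, epoch, step) tuples in the order the training loop performs them."""
    batches_per_epoch = n_train // batch_size
    for epoch in range(n_epochs):
        for batch in range(batches_per_epoch):
            # One optimizer step plus one training summary per mini-batch.
            step = epoch * batches_per_epoch + batch
            yield ("train_batch", epoch, step)
        # Once per epoch: validation summary, then a checkpoint tagged with the epoch.
        yield ("validate_and_save", epoch, epoch)


# Assumed training-set size of 55,000 with the batch size used in this notebook.
print(55000 // 100)  # 550 batches per epoch

# A tiny schedule that is easy to read: 10 images, batches of 5, 2 epochs.
for action in training_schedule(n_train=10, batch_size=5, n_epochs=2):
    print(action)
```

Passing a running step counter to the training writer and the epoch number to the validation writer and saver is one reasonable choice, not the only one; what matters is that each writer receives a step value that increases over time.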

In [None]:
saver = tf.train.Saver()
n_epochs = 100
batch_size = 100

with tf.Session() as sess:
    pass