## Deep MNIST for Experts

These notes follow the TensorFlow tutorial [Deep MNIST for Experts](https://www.tensorflow.org/get_started/mnist/pros). The steps are:

* Create a softmax regression function that is a model for recognizing MNIST digits, based on looking at every pixel in the image
* Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples (and run our first TensorFlow session to do so)
* Check the model's accuracy with our test data
* Build, train, and test a multilayer convolutional neural network to improve the results

## Load MNIST Data

Here mnist is a lightweight class which stores the training, validation, and testing sets as NumPy arrays. It also provides a function for iterating through data minibatches, which we will use below.
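
As a rough picture of what iterating through minibatches means, the sketch below slices NumPy arrays into shuffled batches. The helper `iterate_minibatches` and the toy arrays are hypothetical and written only for illustration; this is not how `mnist.train.next_batch` is implemented.

```python
import numpy as np

def iterate_minibatches(images, labels, batch_size, rng):
    """Yield (images, labels) batches covering one shuffled pass over the data.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

# Toy stand-in for MNIST: 10 "images" of 784 pixels with one-hot labels.
rng = np.random.default_rng(0)
toy_images = rng.random((10, 784), dtype=np.float32)
toy_labels = np.eye(10, dtype=np.float32)

batches = list(iterate_minibatches(toy_images, toy_labels, batch_size=4, rng=rng))
batch_shapes = [(xb.shape, yb.shape) for xb, yb in batches]
print(batch_shapes)
```

The last batch is smaller than the others because 10 is not a multiple of 4.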

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


## Start TensorFlow InteractiveSession

In [2]:
import tensorflow as tf

# InteractiveSession makes TensorFlow more flexible about how you structure your code:
# it lets you interleave operations that build the computation graph with ones that run it.

sess = tf.InteractiveSession()

## Build a Softmax Regression Model

In this section we will build a softmax regression model with a single linear layer. In the next section, we will extend this to the case of softmax regression with a multilayer convolutional network.

In [3]:
# Here x and y_ aren't specific values. Rather, they are each a placeholder -- 
# a value that we'll input when we ask TensorFlow to run a computation.

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

The input images x will consist of a 2d tensor of floating point numbers. 

Here we assign it a shape of [None, 784]:
* 784 is the dimensionality of a single flattened 28 by 28 pixel MNIST image
* None indicates that the first dimension, corresponding to the batch size, can be of any size

The target output classes y_ will also consist of a 2d tensor, where each row is a one-hot 10-dimensional vector indicating which digit class (zero through nine) the corresponding MNIST image belongs to.

The shape argument to placeholder is optional, but it allows TensorFlow to automatically catch bugs stemming from inconsistent tensor shapes.
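
To make the two shapes concrete, here is a small NumPy sketch of a batch that would fit the `x` and `y_` placeholders. The three digit labels and the all-zero images are made-up values chosen only for illustration.

```python
import numpy as np

# Hypothetical digit labels for a batch of three images.
digits = np.array([3, 0, 9])

# Row i of the 10x10 identity matrix is the one-hot vector for digit i.
one_hot = np.eye(10, dtype=np.float32)[digits]

# A matching batch of flattened 28x28 images (all zeros here).
images = np.zeros((len(digits), 28 * 28), dtype=np.float32)

print(images.shape)   # (3, 784)
print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # 1.0 at index 3, zeros elsewhere
```

The leading dimension (3 here) is the one declared as None, so any batch size would fit.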



## Variables

We now define the weights W and biases b for our model.

A Variable is a value that lives in TensorFlow's computation graph. It can be used and even modified by the computation.

In [4]:
W = tf.Variable(tf.zeros([784, 10])) #784 input features and 10 outputs
b = tf.Variable(tf.zeros([10])) #10 classes

In [5]:
# Initialize Variables within session

sess.run(tf.global_variables_initializer())

## Predicted Class and Loss Function

In [6]:
# We multiply the vectorized input images x by the weight matrix W and add the bias b.

y = tf.matmul(x,W) + b

In [7]:
# Loss function is the cross-entropy between the target and the softmax activation 
# function applied to the model's prediction

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
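
For intuition about what this loss computes, the following NumPy sketch evaluates the softmax cross-entropy by hand on two made-up examples. The function name `softmax_cross_entropy` and the toy values are assumptions for illustration; the TensorFlow op is the one actually used in this notebook.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

per_example = softmax_cross_entropy(logits, labels)
mean_loss = per_example.mean()  # plays the role of tf.reduce_mean
print(per_example, mean_loss)
```

The second example has all-zero logits, so its loss is log(3), the loss of a uniform guess over three classes.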

## Train the Model

In [8]:
# Use steepest gradient descent, with a step length of 0.5, to descend the cross entropy.

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

The previous line added new operations to the computation graph:
* Compute gradients 
* Compute parameter update steps
* Apply update steps to the parameters

The returned operation train_step, when run, will apply the gradient descent updates to the parameters. Training the model can therefore be accomplished by repeatedly running train_step.
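
The three steps above can be sketched by hand for softmax regression. This is a minimal NumPy illustration on a toy 3-class problem, with hypothetical helper names; TensorFlow derives the gradients automatically, so none of this code is needed in the notebook itself.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(x, y_true, W, b):
    probs = softmax(x @ W + b)
    return -np.mean(np.sum(y_true * np.log(probs), axis=1))

def gradient_descent_step(x, y_true, W, b, learning_rate):
    """One update of W and b; returns new arrays."""
    n = x.shape[0]
    probs = softmax(x @ W + b)
    # Gradient of the mean cross-entropy with respect to the logits.
    grad_logits = (probs - y_true) / n
    grad_W = x.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)
    return W - learning_rate * grad_W, b - learning_rate * grad_b

# Toy problem: 3 examples, 3 features, 3 classes.
x = np.eye(3)
y_true = np.eye(3)
W = np.zeros((3, 3))
b = np.zeros(3)

loss_before = mean_cross_entropy(x, y_true, W, b)
W, b = gradient_descent_step(x, y_true, W, b, learning_rate=0.5)
loss_after = mean_cross_entropy(x, y_true, W, b)
print(loss_before, loss_after)
```

On this toy problem a single step with the same step length of 0.5 already lowers the loss below its starting value of log(3).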

In [9]:
# Run train_step 1000 times, loading 100 training examples in each iteration

for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})

## Evaluate the Model
How well did the model do?

In [10]:
# tf.argmax(y,1) is the label our model thinks is most likely for each input 
# tf.argmax(y_,1) is the true label
# tf.equal checks whether our prediction matches the truth

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

In [11]:
# Cast booleans to floating point numbers and then take the mean

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
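
The same argmax, compare, cast, and mean pipeline can be traced on a handful of values. The logits and labels below are made up for illustration.

```python
import numpy as np

# Hypothetical logits for four examples over three classes.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3],
                   [0.9, 0.8, 0.1],
                   [0.1, 0.2, 3.0]])
labels = np.eye(3)[[0, 1, 1, 2]]  # one-hot true classes

correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(np.float32).mean()
print(correct)   # [ True  True False  True]
print(accuracy)  # 0.75
```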

In [12]:
# evaluate our accuracy on the test data

print(accuracy.eval(feed_dict={x: mnist.test.images, y_:mnist.test.labels}))

0.9165


## Build a Multilayer Convolutional Network

Getting 92% accuracy on MNIST is bad. It's almost embarrassingly bad. In this section, we'll fix that, jumping from a very simple model to something moderately sophisticated: a small convolutional neural network. This will get us to around 99.2% accuracy -- not state of the art, but respectable.

## Weight Initialization

To create this model, we're going to need to create a lot of weights and biases. 

One should generally initialize weights with a small amount of noise for symmetry breaking, and to prevent 0 gradients. 

Since we're using ReLU neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid "dead neurons". 

Instead of doing this repeatedly while we build the model, let's create two handy functions to do it for us.

In [13]:
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
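
As a sketch of what these initial values look like, the NumPy code below mimics a truncated normal by redrawing any sample that lands more than two standard deviations from the mean, which is the behavior the TensorFlow documentation describes for `tf.truncated_normal`. The function here is a hypothetical stand-in, not TensorFlow's implementation.

```python
import numpy as np

def truncated_normal(shape, stddev, rng):
    """Draw N(0, stddev^2) samples, redrawing any beyond two standard deviations."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

rng = np.random.default_rng(0)
weights = truncated_normal((5, 5, 1, 32), stddev=0.1, rng=rng)
biases = np.full((32,), 0.1)  # slightly positive, as for the ReLU biases

print(weights.shape, np.abs(weights).max() <= 0.2)
print(biases[:3])
```

With stddev=0.1, every weight ends up in [-0.2, 0.2]: small, nonzero, and different from its neighbors, which is what symmetry breaking needs.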

## Convolution and Pooling

TensorFlow also gives us a lot of flexibility in convolution and pooling operations. 

* How do we handle the boundaries? 
* What is our stride size? 

In [14]:
# Our convolutions use a stride of one and are zero padded so that the output is the same size as the input.
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

# Pooling is plain old max pooling over 2x2 blocks. 
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1],
                         strides=[1,2,2,1], padding= 'SAME')
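
To see what max pooling over 2x2 blocks does to a single channel, here is a minimal NumPy sketch. The function `max_pool_2x2_numpy` is a hypothetical stand-in written for this note: it handles only one 2-D array with even height and width, and it is not how `tf.nn.max_pool` is implemented.

```python
import numpy as np

def max_pool_2x2_numpy(image):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = image.shape
    # Split rows and columns into pairs, then take the max within each 2x2 block.
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.array([[1, 2, 0, 1],
                  [3, 4, 1, 0],
                  [0, 1, 5, 6],
                  [2, 0, 7, 8]])
pooled = max_pool_2x2_numpy(image)
print(pooled)
```

Each output value is the largest entry of one non-overlapping 2x2 block, so a 4x4 input becomes 2x2.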

## First Convolutional Layer

In [15]:
# Compute 32 features for each 5x5 patch.
# The first two dimensions are the patch size, the next is the number of input channels,
# and the last is the number of output channels.
W_conv1 = weight_variable([5, 5, 1, 32])

# bias vector with a component for each output channel
b_conv1 = bias_variable([32])

In [16]:
# To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions 
# corresponding to image width and height, and the final dimension corresponding to the number of color channels.

x_image = tf.reshape(x, [-1, 28, 28, 1])

In [17]:
# We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

# The max_pool_2x2 method will reduce the image size to 14x14.
h_pool1 = max_pool_2x2(h_conv1)

## Second Convolutional Layer

In [18]:
# In order to build a deep network, we stack several layers of this type. 
# The second layer will have 64 features for each 5x5 patch.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

## Densely Connected Layer

In [19]:
# Now that the image size has been reduced to 7x7, add a fully-connected layer with 1024 neurons to process the entire image
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

# Reshape the tensor from the pooling layer into a batch of vectors, multiply by weight matrix, add bias, apply relu
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
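
The 7 * 7 * 64 in the reshape follows from the layer shapes. The short sketch below traces the spatial size through the network, assuming the usual rule that a 'SAME'-padded op produces ceil(input_size / stride) outputs along each spatial dimension. The helper name is hypothetical.

```python
import math

def same_padding_output_size(input_size, stride):
    """Spatial output size of a 'SAME'-padded convolution or pooling op."""
    return math.ceil(input_size / stride)

size = 28
size = same_padding_output_size(size, stride=1)  # conv1 keeps 28
size = same_padding_output_size(size, stride=2)  # pool1 halves to 14
size = same_padding_output_size(size, stride=1)  # conv2 keeps 14
size = same_padding_output_size(size, stride=2)  # pool2 halves to 7

flattened = size * size * 64      # 64 feature maps of 7x7
fc1_weights = flattened * 1024    # entries in W_fc1
print(size, flattened, fc1_weights)  # 7 3136 3211264
```

Most of the model's parameters therefore sit in W_fc1, not in the convolutional filters.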

## Dropout

In [20]:
# Reduce overfitting by applying dropout before the readout layer

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
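
A minimal NumPy sketch of the idea behind dropout with a keep probability: each activation is kept with probability keep_prob and the survivors are scaled by 1 / keep_prob, so the expected sum is unchanged. The function below is a hypothetical illustration, not the implementation of `tf.nn.dropout`.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Keep each element with probability keep_prob and scale survivors by 1/keep_prob."""
    mask = rng.random(activations.shape) < keep_prob
    return np.where(mask, activations / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((2, 6))

train_out = dropout(activations, keep_prob=0.5, rng=rng)  # entries are 0.0 or 2.0
eval_out = dropout(activations, keep_prob=1.0, rng=rng)   # unchanged
print(train_out)
print(eval_out)
```

Making keep_prob a placeholder is what allows dropout to be turned on during training (0.5) and off during evaluation (1.0).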

## Readout Layer

In [21]:
# Add the final readout layer, mapping the 1024 features to 10 class logits

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

## Train and Evaluate the Model

In [None]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))

train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
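  for i in range(20000):
    batch = mnist.train.next_batch(50)
    # Every 100 steps, report accuracy on the current batch with dropout disabled
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    # Train with dropout: each unit in h_fc1 is kept with probability 0.5
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})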

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))