### Here I will construct a deep convolutional MNIST classifier.

- Build a softmax regression model that recognizes MNIST digits by looking at every pixel in the image
- Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples (and run our first TensorFlow session to do so)
- Check the model's accuracy on the test data
- Build, train, and test a multilayer convolutional neural network to improve the results

In [30]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


- mnist is a lightweight class which stores the training, validation, and testing sets as NumPy arrays. It also provides a function for iterating through data minibatches, which we will use below.
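
To make the minibatch idea concrete, here is a plain-Python sketch of iterating over a dataset in shuffled minibatches. The helper `iterate_minibatches` and the toy data are hypothetical illustrations, not how `mnist.train.next_batch` is actually implemented; in particular, the real `next_batch` may handle the last, partial batch of an epoch differently.

```python
import random

def iterate_minibatches(images, labels, batch_size, seed=0):
    """Yield (image_batch, label_batch) pairs covering one shuffled epoch.

    Hypothetical helper: a plain-Python approximation of the idea behind
    mnist.train.next_batch, not its actual implementation.
    """
    assert len(images) == len(labels)
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)  # shuffle once per epoch
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # Images and labels are gathered with the same indices, so pairs stay aligned.
        yield [images[i] for i in idx], [labels[i] for i in idx]

# Toy data: 10 "images" of 4 pixels each, with integer labels.
images = [[float(i)] * 4 for i in range(10)]
labels = [i % 3 for i in range(10)]
batches = list(iterate_minibatches(images, labels, batch_size=4))
print([len(b[0]) for b in batches])  # [4, 4, 2]
```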

In [31]:
import tensorflow as tf
sess = tf.InteractiveSession()

- In this section we will build a softmax regression model with a single linear layer. In the next section, we will extend this to the case of softmax regression with a multilayer convolutional network.

### Softmax Regression Model
### Placeholders

In [32]:
# We start building the computation graph by creating nodes for the input images and target output classes.
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
# Here x and y_ aren't specific values. Rather, each is a placeholder:
# a value that we'll feed in when we ask TensorFlow to run a computation.

# The input images x form a 2D tensor of floating-point numbers with shape [None, 784]:
# 784 is the dimensionality of a single flattened 28x28 MNIST image, and None means
# the first (batch size) dimension can be of any size.
# The target output classes y_ also form a 2D tensor,
# where each row is a one-hot 10-dimensional vector indicating which digit class
# (zero through nine) the corresponding MNIST image belongs to.

# The shape argument to placeholder is optional, but it allows TensorFlow to automatically
# catch bugs stemming from inconsistent tensor shapes.
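
For concreteness, a sketch in plain Python (lists standing in for tensors) of the two encodings described above: flattening a 28x28 image into one 784-value row of `x`, and turning a digit label into a one-hot row of `y_`. The `one_hot` helper is hypothetical; `read_data_sets(..., one_hot=True)` already returns labels in this form, so the notebook never needs it.

```python
def one_hot(label, num_classes=10):
    """Return a one-hot list: 1.0 at index `label`, 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

# One row of x: a 28x28 image flattened row by row into 784 values.
image = [[0.0] * 28 for _ in range(28)]  # a blank 28x28 image
flat = [px for row in image for px in row]
print(len(flat))  # 784

# A batch of three labels becomes a 3 x 10 "2D tensor" (list of rows),
# matching the [None, 10] shape of the y_ placeholder.
batch_labels = [7, 2, 1]
y_batch = [one_hot(d) for d in batch_labels]
print(y_batch[0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```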

### Variables

In [33]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
#We pass the initial value for each parameter in the call to tf.Variable. 
#In this case, we initialize both W and b as tensors full of zeros. 
#W is a 784x10 matrix (because we have 784 input features and 10 outputs) and b is a 10-dimensional vector (because we have 10 classes).

In [34]:
# Before Variables can be used within a session, they must be initialized using that session.
# This step takes the initial values that have already been specified (here, tensors full of zeros)
# and assigns them to each Variable. This can be done for all Variables at once:
sess.run(tf.global_variables_initializer())

### Predicted Class and Loss Function
- We now implement our regression model.
- It takes only one line: we multiply the vectorized input images x by the weight matrix W and add the bias b.

In [35]:
y = tf.matmul(x,W) + b
# y has shape [None, 10]: one unnormalized score (logit) per class for each image.
# Our loss function is the cross-entropy between the target and the softmax activation function
# applied to the model's prediction.
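
The matrix product above can be sketched in plain Python with tiny, made-up dimensions (3 features and 2 classes instead of 784 and 10). The `linear` helper and the numbers are illustrative assumptions; the point is the shape rule: an [N, D] input times a [D, K] weight matrix gives [N, K] logits, and the length-K bias is added to every row.

```python
def linear(x, W, b):
    """Compute x @ W + b for x of shape [N, D], W of shape [D, K], b of length K."""
    D, K = len(W), len(b)
    out = []
    for row in x:
        assert len(row) == D, "each input row must have D features"
        # Logit k is the dot product of the row with column k of W, plus b[k].
        out.append([sum(row[d] * W[d][k] for d in range(D)) + b[k] for k in range(K)])
    return out

# Tiny stand-in: 2 examples with 3 features, 2 classes.
x = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
W = [[0.5, -1.0],
     [1.0, 0.0],
     [0.0, 2.0]]
b = [0.1, -0.1]
print(linear(x, W, b))
```

With `W` and `b` initialized to zeros, as in the notebook, every logit starts at exactly 0 regardless of the input.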

In [36]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
# tf.nn.softmax_cross_entropy_with_logits internally applies the softmax to the model's unnormalized
# prediction and sums the cross-entropy terms across all classes for each example;
# tf.reduce_mean then averages these per-example sums over the batch.
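
The quantity `cross_entropy` computes can be written out by hand. This is a plain-Python sketch of the math, assuming one-hot labels; it is not TensorFlow's implementation. It uses the standard log-sum-exp trick (subtracting the largest logit before exponentiating), which is the usual reason to prefer a fused softmax-plus-cross-entropy op over computing the softmax and the log separately. The helper names are hypothetical.

```python
import math

def log_softmax(logits):
    """log(softmax(logits)), computed stably by subtracting the max logit first."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]

def mean_softmax_cross_entropy(labels, logits):
    """Batch average of -sum_k labels[k] * log(softmax(logits)[k])."""
    per_example = [
        -sum(t * lp for t, lp in zip(y_true, log_softmax(z)))
        for y_true, z in zip(labels, logits)
    ]
    return sum(per_example) / len(per_example)

# With all-zero logits (what the zero-initialized W and b produce), every class
# gets probability 1/10, so the loss for any one-hot label is ln(10).
labels = [[0.0] * 10 for _ in range(2)]
labels[0][3] = 1.0
labels[1][8] = 1.0
logits = [[0.0] * 10 for _ in range(2)]
print(mean_softmax_cross_entropy(labels, logits))  # ≈ 2.302585 (= ln 10)
```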

### Train the Model

In [37]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
# We minimize cross_entropy with gradient descent, using a learning rate (step size) of 0.5.
# What TensorFlow actually did in this single line was add new operations to the computation graph:
# ones that compute gradients, compute parameter update steps, and apply those updates to the parameters.
# The returned operation train_step, when run, applies the gradient descent updates to the parameters,
# so training the model amounts to running train_step repeatedly.
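
TensorFlow derives the gradients automatically from the graph. For softmax regression they also have a simple closed form: the gradient of the mean cross-entropy with respect to the logits is (softmax(y) - y_) / N for a batch of N examples. The sketch below, in plain Python with a made-up four-example dataset, uses that formula to run the same kind of update `train_step` performs, with the notebook's learning rate of 0.5. The function `loss_and_step` is a hypothetical helper, not TensorFlow code.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_and_step(x, y_true, W, b, lr):
    """One plain gradient-descent step on mean softmax cross-entropy.

    Uses dLoss/dlogits = (softmax(logits) - labels) / N.
    Returns the loss *before* the update; W and b are updated in place.
    """
    N, D, K = len(x), len(W), len(b)
    grad_W = [[0.0] * K for _ in range(D)]
    grad_b = [0.0] * K
    loss = 0.0
    for row, t in zip(x, y_true):
        logits = [sum(row[d] * W[d][k] for d in range(D)) + b[k] for k in range(K)]
        p = softmax(logits)
        loss -= sum(tk * math.log(pk) for tk, pk in zip(t, p) if tk > 0) / N
        for k in range(K):
            g = (p[k] - t[k]) / N  # gradient w.r.t. logit k for this example
            grad_b[k] += g
            for d in range(D):
                grad_W[d][k] += row[d] * g
    # Apply the update: parameter -= learning_rate * gradient.
    for d in range(D):
        for k in range(K):
            W[d][k] -= lr * grad_W[d][k]
    for k in range(K):
        b[k] -= lr * grad_b[k]
    return loss

# Toy problem: 2 features, 2 classes, zero-initialized like the notebook's W and b.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
y_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
losses = [loss_and_step(x, y_true, W, b, lr=0.5) for _ in range(50)]
print(round(losses[0], 4), losses[-1] < losses[0])  # 0.6931 True
```

The first loss is ln 2 because zero parameters give each of the two classes probability 1/2, mirroring the ln 10 starting point of the 10-class model.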

In [39]:
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
#We load 100 training examples in each training iteration. 
#We then run the train_step operation, using feed_dict to replace the placeholder tensors x and y_ with the training examples. 
#Note that you can replace any tensor in your computation graph using feed_dict -- it's not restricted to just placeholders.

### Evaluate the Model

In [40]:
#First we'll figure out where we predicted the correct label. 
#tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. 
#For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, 
#while tf.argmax(y_,1) is the true label. We can use tf.equal to check if our prediction matches the truth.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

In [42]:
# That gives us a tensor of booleans.
# To determine what fraction are correct, we cast them to floating-point numbers and then take the mean.
# For example, [True, False, True, True] would become [1,0,1,1], whose mean is 0.75.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
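
The evaluation pipeline (argmax, compare, cast, mean) can be mirrored in plain Python. The helpers below are hypothetical stand-ins for `tf.argmax`, `tf.equal`, `tf.cast` and `tf.reduce_mean`, applied to made-up logits chosen so that three of four predictions are right, reproducing the 0.75 from the example above.

```python
def argmax(row):
    """Index of the largest entry (this sketch returns the first one on ties)."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(logits, one_hot_labels):
    # Compare predicted and true class indices, like tf.equal on two tf.argmax results.
    correct = [argmax(p) == argmax(t) for p, t in zip(logits, one_hot_labels)]
    # Cast booleans to 0.0/1.0 and average, like tf.cast followed by tf.reduce_mean.
    return sum(float(c) for c in correct) / len(correct)

logits = [[0.1, 2.0, 0.3],   # predicts 1, truth 1
          [1.5, 0.2, 0.1],   # predicts 0, truth 0
          [0.0, 0.1, 3.0],   # predicts 2, truth 2
          [0.2, 0.1, 0.0]]   # predicts 0, truth 1
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(accuracy(logits, labels))  # 0.75
```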

In [43]:
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9201


### Multilayer Convolutional Network

In [47]:
#Weight Initialization
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
# To build this model, we need to create a lot of weights and biases.
# Weights should generally be initialized with a small amount of noise for symmetry breaking and to prevent zero gradients.
# Since we're using ReLU neurons, it is also good practice to initialize the biases with a slightly positive value
# to avoid "dead neurons".
# Instead of doing this repeatedly while we build the model, we create two handy functions to do it for us.
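
A plain-Python sketch of the truncated-normal initialization used by `weight_variable`: samples are drawn from a normal distribution with standard deviation 0.1, and any sample more than two standard deviations from the mean is discarded and re-drawn, which is how the TensorFlow documentation describes `tf.truncated_normal`. The `truncated_normal` helper is hypothetical and uses Python's `random` module rather than TensorFlow's generator.

```python
import random

def truncated_normal(n, stddev=0.1, mean=0.0, seed=0):
    """Draw n samples from N(mean, stddev^2), re-drawing any sample that lies
    more than 2 standard deviations from the mean."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:  # keep only values inside the truncation window
            out.append(v)
    return out

weights = truncated_normal(1000)  # small, distinct values break symmetry between units
biases = [0.1] * 32               # constant, slightly positive bias, as in bias_variable
print(max(abs(w) for w in weights) <= 0.2)  # True
```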

#### Convolution and Pooling
- TensorFlow also gives us a lot of flexibility in convolution and pooling operations. 
- How do we handle the boundaries? What is our stride size?
- Our convolutions use a stride of one and are zero-padded so that the output is the same size as the input.
- Our pooling is plain old max pooling over 2x2 blocks
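
To illustrate how zero padding keeps the output the same size as the input, here is a single-channel, plain-Python sketch of a stride-1 'SAME' convolution. It assumes an odd-sized kernel (so the padding is symmetric) and, like `tf.nn.conv2d`, computes a cross-correlation without flipping the kernel; the real op additionally sums over input channels and produces several output channels. The helper name is hypothetical.

```python
def conv2d_same(image, kernel):
    """Stride-1 2D cross-correlation with zero padding so the output has the
    same height and width as the input (single channel, odd-sized kernel)."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    assert kh % 2 == 1 and kw % 2 == 1, "sketch assumes an odd-sized kernel"
    ph, pw = kh // 2, kw // 2  # padding on each side
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for a in range(kh):
                for c in range(kw):
                    r, s = i + a - ph, j + c - pw
                    if 0 <= r < H and 0 <= s < W:  # cells outside the image count as zero
                        acc += image[r][s] * kernel[a][c]
            out[i][j] = acc
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1.0] * 3 for _ in range(3)]  # 3x3 box filter
print(conv2d_same(image, kernel))
# [[12.0, 21.0, 16.0], [27.0, 45.0, 33.0], [24.0, 39.0, 28.0]]
```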

In [48]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
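
Similarly, a single-channel sketch of what `max_pool_2x2` computes: the maximum of each non-overlapping 2x2 block. With 'SAME' padding and stride 2 the spatial output size is ceil(n / 2); for an odd size, this sketch simply takes the max over the cells of the final partial window that exist. Applied twice, this takes a 28x28 image to 14x14 and then 7x7. The helper names (`max_pool_2x2_plain`, `same_output_size`) are hypothetical.

```python
def max_pool_2x2_plain(image):
    """2x2 max pooling with stride 2 on one channel; output size is ceil(n / 2)."""
    H, W = len(image), len(image[0])
    out = []
    for i in range(0, H, 2):
        row = []
        for j in range(0, W, 2):
            # Cells of this 2x2 window that fall inside the image.
            window = [image[r][s]
                      for r in range(i, min(i + 2, H))
                      for s in range(j, min(j + 2, W))]
            row.append(max(window))
        out.append(row)
    return out

def same_output_size(n, stride):
    """Spatial output size under 'SAME' padding: ceil(n / stride)."""
    return -(-n // stride)

image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 10, 13, 14],
         [11, 12, 15, 16]]
print(max_pool_2x2_plain(image))  # [[4, 8], [12, 16]]
print([28, same_output_size(28, 2), same_output_size(14, 2)])  # [28, 14, 7]
```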