# Evaluate our Expert Model to classify and show a random MNIST image from the original test set

TensorFlow is a library for numerical computation.
Here we use it to build a deep convolutional MNIST classifier.

In [1]:
from tensorflow.examples.tutorials.mnist import input_data

The mnist object stores the training, validation, and testing sets as NumPy arrays. It also provides a function for iterating through data minibatches.

In [2]:
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


All the computations are executed in a backend process. An IPython notebook takes on the role of the front end, which lets us write a driver script that creates the computational graph and then runs it in the backend. The common usage pattern is:
    1. create a graph
    2. launch it in a session.
The connection to the TensorFlow C++ backend is called a session.

Here, however, we will use the InteractiveSession class instead, which allows you to interleave operations that build a computation graph with ones that run the graph.

In [3]:
import tensorflow as tf
sess = tf.InteractiveSession()

## Build a softmax regression model with a single linear layer.

### Placeholders (inputs)

In [4]:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

The tensors x and y_ are placeholders -- values that we'll input when we ask TensorFlow to run a computation.
The shape argument to placeholder is optional, but it allows TensorFlow to automatically catch bugs stemming from inconsistent tensor shapes.

### Variables (model parameters)

In [5]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

In [6]:
sess.run(tf.initialize_all_variables())

### Predicted Class and Cost Function

Multiply the vectorized input images x by the weight matrix W, add the bias b, and compute the softmax probabilities that are assigned to each class.

In [7]:
y = tf.nn.softmax(tf.matmul(x,W) + b)
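
The same forward pass can be sketched in plain NumPy to make the shapes concrete. This is only an illustration, not TensorFlow's implementation: the names `softmax`, `x_batch`, `W_np`, and `b_np` are hypothetical, and the input is random data rather than MNIST.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row-wise maximum leaves the result unchanged
    # but avoids overflow when exponentiating large logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x_batch = rng.random((4, 784))   # 4 hypothetical flattened images
W_np = np.zeros((784, 10))       # zero initialization, as for W above
b_np = np.zeros(10)

y_np = softmax(x_batch @ W_np + b_np)
print(y_np.shape)                # (4, 10)
print(y_np[0])                   # all-zero parameters give every class probability 0.1
```

Each row of the result is a probability distribution over the 10 classes, so every row sums to 1.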

In [8]:
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

Note that reduce_sum sums all the element-wise products of the two arrays, over every class and every example in the batch.
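
A small worked example may help here. The NumPy sketch below uses a made-up batch of 2 examples and 3 classes; `y_true` plays the role of y_ and `y_pred` the role of y.

```python
import numpy as np

y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])      # one-hot labels
y_pred = np.array([[0.5, 0.25, 0.25],
                   [0.25, 0.5, 0.25]])    # predicted probabilities

per_element = y_true * np.log(y_pred)     # nonzero only at the true class
cross_entropy_np = -per_element.sum()     # sum over classes and over the batch
print(cross_entropy_np)                   # equals 2 * ln(2)
```

Because the products are summed rather than averaged, the value of the loss grows with the batch size.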

### Train the Model

In [9]:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Train the model, one minibatch at a time.
 - the batch size is chosen here (50)
 - feed_dict replaces the placeholder tensors x and y_ with the training examples.
 
 Note: feed_dict can replace any tensor in the computation graph -- it's not restricted to just placeholders.

In [10]:
for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
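
To show what a single training step does, here is a minimal NumPy sketch of one gradient descent update for softmax regression. It is an illustration under stated assumptions, not what TensorFlow runs internally: the batch is a tiny random toy problem (5 examples, 4 features, 3 classes), and all names are hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def summed_cross_entropy(x, y_true, W, b):
    return -np.sum(y_true * np.log(softmax(x @ W + b)))

rng = np.random.default_rng(0)
x_batch = rng.random((5, 4))              # toy stand-in for batch[0]
y_batch = np.eye(3)[[0, 1, 2, 0, 1]]      # toy stand-in for batch[1] (one-hot)
W_np, b_np = np.zeros((4, 3)), np.zeros(3)
learning_rate = 0.01

loss_before = summed_cross_entropy(x_batch, y_batch, W_np, b_np)
# For softmax followed by summed cross-entropy, d(loss)/d(logits) = y - y_.
delta = softmax(x_batch @ W_np + b_np) - y_batch
grad_W = x_batch.T @ delta
grad_b = delta.sum(axis=0)
W_np -= learning_rate * grad_W
b_np -= learning_rate * grad_b
loss_after = summed_cross_entropy(x_batch, y_batch, W_np, b_np)
print(loss_before, loss_after)
```

On this toy batch the loss after the update is lower than before it, which is what repeated calls to the training step rely on.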

### Evaluate the Model

tf.argmax gives the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the true label.

In [11]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

The idea is to determine what fraction of the predictions are correct: we cast the booleans to floating point numbers and then take the mean. For example, [True, False, True, True] would become [1,0,1,1], whose mean is 0.75.

In [12]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
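
The two steps above (argmax comparison, then the mean of the cast booleans) can be reproduced in NumPy. The predictions and labels below are made up so that the result matches the [True, False, True, True] example.

```python
import numpy as np

y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.1, 0.8]])     # hypothetical model outputs
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 0, 1]])           # hypothetical one-hot labels

predicted = np.argmax(y_pred, axis=1)    # [0, 2, 1, 2]
actual = np.argmax(y_true, axis=1)       # [0, 1, 1, 2]
correct = predicted == actual            # [True, False, True, True]
accuracy_np = correct.astype(np.float32).mean()
print(accuracy_np)                       # 0.75
```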

Run our accuracy formula on the test data.

In [13]:
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9092


## Build a Multilayer Convolutional Network

### Weight Initialization

We're going to need to create a lot of weights and biases. 
- Weights are initialized with a small amount of noise for symmetry breaking. 
- Since we're using ReLU neurons, we chose a slightly positive initial bias to avoid "dead neurons." 

In [14]:
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
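
For intuition, the initialization can be imitated in NumPy. tf.truncated_normal redraws values that fall more than two standard deviations from the mean; the sketch below does the same with a simple resampling loop. The function name `truncated_normal` and its `rng` parameter are hypothetical helpers for this illustration.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, rng=None):
    """Sample N(0, stddev^2), redrawing any value beyond 2 standard deviations."""
    rng = np.random.default_rng() if rng is None else rng
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

weights = truncated_normal((5, 5, 1, 32), rng=np.random.default_rng(0))
biases = np.full((32,), 0.1)             # slightly positive bias, as in bias_variable
print(weights.shape, np.abs(weights).max() <= 0.2)
```

The noise breaks the symmetry between filters, while the bound of two standard deviations keeps any single weight from starting out unusually large.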

### Convolution and Pooling

The convolutions use a stride of one and are zero padded so that the output is the same size as the input.

In [15]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
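
To see why a stride of one with zero padding preserves the image size, here is a deliberately slow NumPy sketch for a single-channel image and a single filter. It assumes an odd filter size and uses a made-up averaging filter; `conv2d_same` is a hypothetical helper, not the TensorFlow op. Like tf.nn.conv2d, it slides the filter without flipping it.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1, zero-padded sliding-window filter on one single-channel image."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2          # correct for odd kernel sizes
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(28 * 28, dtype=float).reshape(28, 28)
kernel = np.ones((5, 5)) / 25.0              # hypothetical 5x5 averaging filter
result = conv2d_same(image, kernel)
print(result.shape)                          # (28, 28)
```

Near the border the window overlaps the zero padding, so border outputs are computed from fewer real pixels than interior ones.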

The subsampling is max pooling over 2x2 blocks.

In [16]:
def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
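
A small NumPy sketch of 2x2 max pooling with stride 2 on a single feature map, assuming even height and width. The helper name `max_pool_2x2_np` is hypothetical.

```python
import numpy as np

def max_pool_2x2_np(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 2, 5, 6],
                        [3, 4, 7, 8],
                        [9, 8, 1, 0],
                        [7, 6, 3, 2]])
pooled = max_pool_2x2_np(feature_map)
print(pooled)          # [[4 8]
                       #  [9 3]]
```

Applied to a 28x28 feature map, the same function returns a 14x14 one.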

The first convolutional layer will compute 32 features for each 5x5 patch.

The weight tensor will have a shape of [5, 5, 1, 32]. 
The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. 
We will also have a bias vector with a component for each output channel.

In [17]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
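
As a quick check of what these shapes imply, the number of trainable parameters in this first layer can be counted directly (plain Python arithmetic, with hypothetical variable names):

```python
patch_height, patch_width = 5, 5
in_channels, out_channels = 1, 32

weight_shape = [patch_height, patch_width, in_channels, out_channels]
n_weights = patch_height * patch_width * in_channels * out_channels   # 800
n_biases = out_channels                                               # 32
print(weight_shape, n_weights + n_biases)                             # [5, 5, 1, 32] 832
```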

The inputs are images of 28x28 pixels flattened to vectors of 784 values. In order to apply the convolution with a scanning window (filter) of 5x5 pixels,
we need to reshape the flattened array back to its original 2-d bitmap shape. The conv function expects x in the form of a 4d tensor, with the second and third dimensions corresponding to image height and width, and the final dimension corresponding to the number of color channels.

In [18]:
x_image = tf.reshape(x, [-1,28,28,1])
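
The same reshape in NumPy, on a made-up batch of 3 flattened images, shows how the -1 dimension is inferred and where each flat pixel ends up:

```python
import numpy as np

batch = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)   # 3 hypothetical flattened images
images = batch.reshape(-1, 28, 28, 1)                           # -1 lets the batch size be inferred
print(images.shape)                                             # (3, 28, 28, 1)

# Row-major order: flat pixel k of an image lands at row k // 28, column k % 28.
assert images[0, 1, 2, 0] == batch[0, 1 * 28 + 2]
```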

We then convolve x_image with the weight tensor.

In [19]:
conv = conv2d(x_image, W_conv1)

Add the bias and apply the ReLU nonlinearity.

In [20]:
h_conv1 = tf.nn.relu(conv + b_conv1)
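# tf.shape builds an op, so this prints the op rather than the dimensions;
# h_conv1.get_shape() reports the static shape instead.
print(tf.shape(h_conv1))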

Tensor("Shape:0", shape=(4,), dtype=int32)


Subsampling maps each 2x2 block to a single value, so the image is reduced from 28x28 to 14x14.

In [21]:
h_pool1 = max_pool_2x2(h_conv1)
print(tf.shape(h_pool1))

Tensor("Shape_1:0", shape=(4,), dtype=int32)


### Second Convolutional Layer
We stack several layers of this type. The second layer will have 64 features for each 5x5 patch.

The weight tensor will have a shape of [5, 5, 32, 64]. The first two dimensions are the patch size, the next is the number of input channels (we created 32 channels in the first convolutional layer), and the last is the number of output channels (now we increase it from 32 to 64). We will also have a bias vector with a component for each output channel.

In [22]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
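
The shape of one image as it passes through both convolution and pooling stages can be traced with a few lines of plain Python. The sketch assumes the 'SAME' padding rule, under which the spatial output size is the input size divided by the stride, rounded up; `same_output_size` is a hypothetical helper.

```python
def same_output_size(size, stride):
    # With 'SAME' padding the spatial output size is ceil(size / stride).
    return (size + stride - 1) // stride

height, width, channels = 28, 28, 1
shapes = [(height, width, channels)]
for out_channels in (32, 64):
    # Convolution: stride 1 keeps the spatial size, channels become out_channels.
    height, width = same_output_size(height, 1), same_output_size(width, 1)
    shapes.append((height, width, out_channels))
    # 2x2 max pooling: stride 2 halves the spatial size.
    height, width = same_output_size(height, 2), same_output_size(width, 2)
    shapes.append((height, width, out_channels))

print(shapes)
# [(28, 28, 1), (28, 28, 32), (14, 14, 32), (14, 14, 64), (7, 7, 64)]
```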

### Densely Connected Layer
We add a fully-connected layer with 1024 neurons.
The image size has been reduced to 7x7, with a depth of 64.
To feed the images into the fully connected layer, we need to flatten them, reshaping each image into a flat vector of length 7x7x64. This gives us a batch of vectors.

Once the data is in the proper format, we multiply by a weight matrix. The weight matrix post-multiplies the batch of flattened image vectors; it has 7x7x64 rows (one for each value in the flattened image) and 1024 columns (one for each neuron in this layer). After post-multiplying the image batch by this weight matrix, we get vectors of size 1024; each vector represents how a given input image excites each of the 1024 neurons of this layer. To these vectors we add the bias vector, which also has 1024 components.
Then we apply a ReLU.

In [23]:
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_fc1_logits = tf.matmul(h_pool2_flat, W_fc1) + b_fc1

h_fc1 = tf.nn.relu(h_fc1_logits)
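
The matrix shapes involved can be checked with a NumPy sketch on a made-up batch of 2 pooled feature maps. The random values and the names ending in `_np` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
h_pool2_np = rng.random((2, 7, 7, 64))            # hypothetical batch of 2 pooled outputs
flat = h_pool2_np.reshape(-1, 7 * 7 * 64)         # (2, 3136)

W_np = rng.normal(0.0, 0.1, size=(7 * 7 * 64, 1024))
b_np = np.full(1024, 0.1)

pre_activation = flat @ W_np + b_np               # (2, 3136) x (3136, 1024) -> (2, 1024)
h_fc1_np = np.maximum(pre_activation, 0.0)        # ReLU
print(flat.shape, h_fc1_np.shape)                 # (2, 3136) (2, 1024)
```

The bias vector of length 1024 is broadcast across the batch, so the same bias is added to every row.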


### Dropout
To reduce overfitting, we will apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This allows us to turn dropout on during training, and turn it off during testing. TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling.

In [24]:
# keep_prob is a float placeholder (the probability of keeping a unit); its value is specified when the graph is run
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
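
The masking and scaling described above can be sketched in NumPy. This is a minimal illustration of the idea, assuming each kept unit is scaled by 1 / keep_prob; the `dropout` function here is a hypothetical helper, not the TensorFlow op.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and scale the survivors
    by 1 / keep_prob so the expected value of each unit is unchanged."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 100))
dropped = dropout(h, 0.5, rng)      # training: about half the units are zeroed
kept = dropout(h, 1.0, rng)         # evaluation: nothing is dropped
print(dropped.mean(), kept.mean())
```

With keep_prob set to 0.5 the surviving units are doubled, so the mean activation stays close to 1; with keep_prob set to 1.0 the input passes through unchanged.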

### Readout Layer
Finally, we add a softmax layer.
Notice that the weight matrix converts from 1024 inputs to 10 outputs: it has 1024 rows and 10 columns, and it post-multiplies the batch of 1024-element vectors that comes from the previous layer.

In [25]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)


## Train and Evaluate the Model
How well does this model do? To train and evaluate it we will use code that is nearly identical to that for the simple one-layer softmax network above. The differences are that we will replace the gradient descent optimizer with the more sophisticated ADAM optimizer, we will include the additional parameter keep_prob in feed_dict to control the dropout rate, and we will log the training accuracy every 20th iteration.
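
For readers curious about what distinguishes ADAM from plain gradient descent, here is a minimal NumPy sketch of the published update rule applied to a toy quadratic. It is not TensorFlow's implementation, which may differ in details such as how epsilon is applied; the function name `adam_minimize`, the default hyperparameters, and the toy objective are assumptions for illustration.

```python
import numpy as np

def adam_minimize(grad_fn, theta, steps, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    m = np.zeros_like(theta)   # running mean of gradients
    v = np.zeros_like(theta)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)        # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy objective f(theta) = sum(theta^2), whose gradient is 2 * theta.
theta0 = np.array([1.0, -2.0])
theta = adam_minimize(lambda th: 2 * th, theta0, steps=100, lr=1e-2)
print(theta)
```

Each coordinate gets its own effective step size, scaled by the running estimate of its squared gradient, which is why a small nominal learning rate such as 1e-4 can still make steady progress.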

In the training loop, the saver.save() method is called once, at step 100, to write a checkpoint file to the training directory with the current values of all the trainable variables.


In [26]:
saver = tf.train.Saver()

In [29]:
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())
for i in range(200):
  batch = mnist.train.next_batch(50)
  if i%20 == 0:
    train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  if i == 100:
    saver.save(sess, "train/mnist-red", global_step=i)
    
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))


step 0, training accuracy 0.1
step 20, training accuracy 0.44
step 40, training accuracy 0.78
step 60, training accuracy 0.74
step 80, training accuracy 0.86
step 100, training accuracy 0.88
step 120, training accuracy 0.92
step 140, training accuracy 0.94
step 160, training accuracy 0.84
step 180, training accuracy 0.96
test accuracy 0.9115


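With a much longer training run, the final test set accuracy should be approximately 99.2%.
Here I observe 0.91 after only 200 training steps (minibatches of 50 images), which is well under one pass over the training set.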

### Restore all session variables
Restore all of the session's variables to a previous checkpoint.

Note that this doesn't change any of the current Python variables: the loop counter i, for example, keeps its last value.

In [31]:
saver.restore(sess, "train/mnist-red-100")

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

print(i)

test accuracy 0.8629
199
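
The behaviour seen above (model values rolled back to step 100 while the Python variable i stays at 199) can be mimicked with a toy sketch in plain Python. The dictionary `params` stands in for the session's variables and copy.deepcopy stands in for writing and reading a checkpoint file; none of this is the Saver API.

```python
import copy

params = {"w": 0.0}          # stands in for the session's variables
checkpoint = None

for i in range(200):
    if i == 100:
        checkpoint = copy.deepcopy(params)   # analogous to saver.save(..., global_step=i)
    params["w"] += 1.0                       # analogous to one training step

params = copy.deepcopy(checkpoint)           # analogous to saver.restore(...)
print(params["w"], i)                        # 100.0 199
```

The restored value reflects the 100 updates made before the checkpoint was taken, which is consistent with the lower test accuracy reported after restoring.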
