## MNIST

MNIST is a simple computer vision dataset. It consists of images of handwritten digits. Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We will call the images "xs" and the labels "ys". Both the training set and the test set contain xs and ys; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers.

We can flatten this array into a vector of 28x28 = 784 numbers. It doesn't matter how we flatten the array, as long as we're consistent between images. From this perspective, the MNIST images are just a bunch of points in a 784-dimensional vector space, with a very rich structure (warning: computationally intensive visualizations).
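To make the flattening concrete, here is a minimal sketch in plain Python. The toy image and the helper name `flatten_row_major` are made up for illustration, and row-major order is just one consistent choice among many.

```python
# Toy stand-in for one image: a 28x28 grid of pixel intensities,
# all zeros except one marked pixel. Stored as a list of rows.
HEIGHT, WIDTH = 28, 28
image = [[0.0] * WIDTH for _ in range(HEIGHT)]
image[2][5] = 1.0  # row 2, column 5

def flatten_row_major(img):
    """Concatenate the rows of a 2D grid into one flat list (hypothetical helper)."""
    return [pixel for row in img for pixel in row]

flat = flatten_row_major(image)
print(len(flat))        # 784
print(flat.index(1.0))  # 61, i.e. 2 * 28 + 5
```

Any other ordering would work equally well, provided every image is flattened the same way.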

Flattening the data throws away information about the 2D structure of the image. Isn't that bad? Well, the best computer vision methods do exploit this structure, and we will in later tutorials. But the simple method we will be using here, a softmax regression, won't.

The result is that mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension indexes the images and the second dimension indexes the pixels in each image. Each entry in the tensor is the pixel intensity between 0 and 1, for a particular pixel in a particular image.

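We're going to want our labels as "one-hot vectors". A one-hot vector is 0 in most dimensions and 1 in a single dimension; here, the digit n is represented as a vector that is 1 in the nth dimension. Consequently, mnist.train.labels is a [55000, 10] array of floats.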
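As a small illustration of the encoding, the sketch below builds the 10-element vector for a single digit: 1 at the label's index and 0 everywhere else. The helper `one_hot` is a made-up name, not part of the MNIST loader.

```python
def one_hot(label, num_classes=10):
    """Return a list that is 1.0 at index `label` and 0.0 elsewhere (hypothetical helper)."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(one_hot(3))
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```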

We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independent of the input.

To do efficient numerical computing in Python, we typically use libraries like NumPy that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python every operation. This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where there can be a high cost to transferring data.

TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. (Approaches like this can be seen in a few machine learning libraries.)

In [2]:
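# Declare the tensors.
# There are 2 types: (1) inputs, fed in at run time, and (2) variables, learned during training
import tensorflow as tf
# input: any number of flattened 784-pixel images (None = batch size left open)
x = tf.placeholder(tf.float32, [None, 784])
# learned parameters, initialized to zeros
W = tf.Variable(tf.zeros([784, 10]), name="W")
b = tf.Variable(tf.zeros([10]), name="b")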

Notice that W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the different classes. This should work for any number of input image vectors.

In [3]:
scores = tf.matmul(x, W) + b
y = tf.nn.softmax(scores)

Note that tf.matmul(x, W) is flipped from when we multiplied them in our equation, where we had Wx. This is a small trick to deal with x being a 2D tensor holding multiple inputs, one per row.
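To see the shapes at work without TensorFlow, here is a plain-Python sketch of the same computation on toy sizes (2 inputs, 3 features, 4 classes instead of 784 and 10). The helpers `matmul` and `softmax` are written out by hand for illustration and are not the TensorFlow implementations.

```python
import math

def matmul(a, b):
    """Multiply an [n, k] matrix by a [k, m] matrix, both given as lists of rows."""
    return [[sum(a_row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for a_row in a]

def softmax(row):
    """Exponentiate one row of scores and normalize it so it sums to 1."""
    top = max(row)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - top) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

x = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]               # shape [2, 3]: one input per row
W = [[0.0] * 4 for _ in range(3)]   # shape [3, 4], all zeros like the initial W
b = [0.0] * 4                       # shape [4]

# x times W gives shape [2, 4]; b is added to every row.
scores = [[s + b_j for s, b_j in zip(row, b)] for row in matmul(x, W)]
y = [softmax(row) for row in scores]

print(len(y), len(y[0]))  # 2 4
print(y[0])               # [0.25, 0.25, 0.25, 0.25]
```

With all-zero weights every class gets the same score, so the softmax output is uniform; training is what moves it away from that.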

## Training



The cross-entropy measures how inefficient our predictions are for describing the truth.

In [4]:
# this is a placeholder in the formula to enter the correct answers
# at runtime
y_ = tf.placeholder(tf.float32, [None, 10])   # y_ will contain the input labels, None is going to be N (size of data)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

Note that this isn't just the cross-entropy of the truth with a single prediction, but the sum of the cross-entropies for all the images we looked at. In this example, we have 100 images in each batch: how well we are doing on 100 data points is a much better description of how good our model is than a single data point.
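A worked example may help here. The sketch below sums -y_ * log(y) over a toy batch of two examples in plain Python; the function name and the numbers are made up for illustration. It skips entries whose label is 0, since they contribute nothing to the sum, which also sidesteps taking log(0).

```python
import math

def cross_entropy_sum(labels, predictions):
    """Sum of -y_ * log(y) over every example and class in the batch."""
    return -sum(t * math.log(p)
                for label_row, pred_row in zip(labels, predictions)
                for t, p in zip(label_row, pred_row)
                if t > 0)  # zero-label entries add nothing to the sum

labels = [[0.0, 1.0, 0.0],
          [1.0, 0.0, 0.0]]          # one-hot truth for two examples
predictions = [[0.1, 0.8, 0.1],
               [0.5, 0.25, 0.25]]   # predicted probabilities

loss = cross_entropy_sum(labels, predictions)
print(round(loss, 4))  # 0.9163, i.e. -(log 0.8 + log 0.5)
```

Only the probability assigned to the correct class matters for each example, and the total grows as those probabilities shrink.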

In [5]:
train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cross_entropy)

In this case, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm with a learning rate of 0.005.
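The update rule itself is short enough to sketch in plain Python. The example below is not the softmax model: it minimizes a made-up one-variable function, f(w) = (w - 3)^2, using the same learning rate as the In [5] cell (0.005), just to show each step moving w a little way against the gradient.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly move w a small step against the gradient."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Gradient of the toy function f(w) = (w - 3)^2, which has its minimum at w = 3.
grad = lambda w: 2.0 * (w - 3.0)

one_step = gradient_descent(grad, w0=0.0, learning_rate=0.005, steps=1)
many_steps = gradient_descent(grad, w0=0.0, learning_rate=0.005, steps=2000)

print(round(one_step, 6))    # 0.03
print(round(many_steps, 6))  # 3.0
```

TensorFlow does the same thing for W and b, except that it derives the gradients of cross_entropy automatically from the graph.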

In [6]:
# Note: later TensorFlow 1.x releases rename this to tf.global_variables_initializer()
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

Train 2000 times. At each step of the loop, we get a "batch" of one hundred random data points from our training set. We run train_step, feeding in the batch data to replace the placeholders.

In [7]:
for i in range(2000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
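To illustrate what drawing a random batch means, here is a plain-Python sketch. The helper `random_batch` and the toy data are made up for illustration; this is not how mnist.train.next_batch is implemented, only the general idea of sampling inputs and labels together.

```python
import random

def random_batch(xs, ys, batch_size, rng):
    """Pick `batch_size` distinct (x, y) pairs uniformly at random (hypothetical helper)."""
    indices = rng.sample(range(len(xs)), batch_size)
    return [xs[i] for i in indices], [ys[i] for i in indices]

rng = random.Random(0)       # fixed seed so the sketch is repeatable
xs = list(range(1000))       # toy "images": each is just its own index
ys = [i % 10 for i in xs]    # toy "labels"

batch_xs, batch_ys = random_batch(xs, ys, 100, rng)
print(len(batch_xs), len(batch_ys))  # 100 100
```

The important property is that each x stays paired with its own y, so the labels fed into y_ match the images fed into x.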

## Evaluating 

tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. 

In [8]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

That gives us a list of booleans. To determine what fraction are correct, we cast to floating point numbers and then take the mean. For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.

In [9]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
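The same argmax, compare, and average steps can be sketched in plain Python on a toy batch of four examples with two classes. The helper names and numbers are made up for illustration.

```python
def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy_of(predictions, labels):
    """Fraction of rows where the predicted class equals the true class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(1.0 for c in correct if c) / len(correct)

predictions = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels      = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

# Per-example results are [True, False, True, True].
print(accuracy_of(predictions, labels))  # 0.75
```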

#### Note the feed_dict clause used to input the test values.
To understand how this gets computed, it is useful to imagine the computation DAG. The accuracy node, defined in In [9], needs y and y_. The y_ placeholder from In [4] is fed directly, and to get y the session feeds x into In [3], which turns x into y. It then jumps to In [8], skipping the whole training part of the DAG (In [5] to In [7]), because those nodes are not on the computational path of accuracy.
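The idea that only the nodes on the path to the requested output get evaluated can be shown with a toy graph in plain Python. Everything here (the `run` function, the node names, the arithmetic) is made up for illustration and is far simpler than what TensorFlow does internally.

```python
def run(graph, target, feed_dict):
    """Evaluate `target`, computing only the nodes it depends on."""
    cache = dict(feed_dict)  # fed values count as already computed
    evaluated = []

    def visit(name):
        if name not in cache:
            func, deps = graph[name]
            cache[name] = func(*[visit(d) for d in deps])
            evaluated.append(name)
        return cache[name]

    return visit(target), evaluated

# Each node maps to (function, list of input node names).
graph = {
    "y": (lambda x: x * 2, ["x"]),
    "loss": (lambda y, y_: (y - y_) ** 2, ["y", "y_"]),
    "train_step": (lambda loss: "update weights", ["loss"]),
    "accuracy": (lambda y, y_: 1.0 if y == y_ else 0.0, ["y", "y_"]),
}

value, evaluated = run(graph, "accuracy", {"x": 2, "y_": 4})
print(value)      # 1.0
print(evaluated)  # ['y', 'accuracy']
```

Asking for "accuracy" never touches "loss" or "train_step", which mirrors how evaluating accuracy in the session leaves the training nodes alone.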

In [10]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9181


The best models can get to over 99.7% accuracy!


## Saving the Model

In [11]:
modelpath = "model/train/"
modelname = 'mnist-regr'

saver = tf.train.Saver()
saver.save(sess, modelpath + modelname, global_step=i+1)

'model/train/mnist-regr-2000'