The MNIST data is hosted on Yann LeCun's website. If you are copying and pasting in the code from this tutorial, start here with these two lines of code which will download and read in the data automatically:

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True) 

Extracting ../MNIST_data/train-images-idx3-ubyte.gz
Extracting ../MNIST_data/train-labels-idx1-ubyte.gz
Extracting ../MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ../MNIST_data/t10k-labels-idx1-ubyte.gz
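The `one_hot=True` argument makes each label a 10-element vector rather than a single digit. The plain-Python sketch below shows what that encoding is assumed to look like: a 1.0 at the digit's index and 0.0 everywhere else. The helper name `one_hot` is hypothetical and is not part of the `input_data` module.

```python
# Hypothetical helper: one-hot encoding of a digit label for 10 classes,
# assumed to match what read_data_sets(..., one_hot=True) produces.
NUM_CLASSES = 10

def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range: {}".format(label))
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```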


Import the TensorFlow library:

In [3]:
import tensorflow as tf

Set the training and model parameters:

In [4]:
learning_rate = 0.001
traning_epochs = 25
batch_size = 100
display_step = 1

num_input = 784  # MNIST data input (img shape: 28*28)
num_classes = 10  # MNIST total classes (0-9 digits)

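TensorFlow describes a computation as a graph of interacting operations, which we build by manipulating symbolic variables. Let's create placeholders for the input images, x, and their one-hot labels, y: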

In [5]:
x = tf.placeholder(tf.float32, [None, num_input])
y = tf.placeholder(tf.float32, [None, num_classes])

x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784]. (Here None means that a dimension can be of any length.)
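To make the [None, 784] shape concrete, here is a plain-Python sketch of flattening one 28x28 image into a 784-element row. The row-by-row ordering and the helper name `flatten` are assumptions for illustration; the images returned by `read_data_sets` already arrive flattened.

```python
# Hypothetical sketch: flatten one 28x28 image (a list of rows) into the
# 784-element vector that fills one row of the placeholder x.
HEIGHT, WIDTH = 28, 28

def flatten(image):
    """Concatenate the rows of a HEIGHT x WIDTH image into one flat list."""
    assert len(image) == HEIGHT and all(len(row) == WIDTH for row in image)
    return [pixel for row in image for pixel in row]

# Dummy image whose pixel value encodes its (row, column) position.
image = [[r * WIDTH + c for c in range(WIDTH)] for r in range(HEIGHT)]
flat = flatten(image)
print(len(flat))  # 784

# A batch of N flattened images is an N x 784 matrix; None in the
# placeholder shape leaves N unspecified.
batch = [flat, flat, flat]
print(len(batch), len(batch[0]))  # 3 784
```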
We also need the weights and biases for our model. We could imagine treating these as additional inputs, but TensorFlow has a better way to handle them: Variable. A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations; it can be used and even modified by the computation. In machine learning applications, the model parameters are generally stored as Variables.


In [6]:
W = tf.Variable(tf.zeros([num_input, num_classes]))
b = tf.Variable(tf.zeros([num_classes]))

We create these Variables by giving tf.Variable the initial value of the Variable: in this case, we initialize both W and b as tensors full of zeros. Since we are going to learn W and b, it doesn't matter very much what they initially are.

Notice that W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the different classes. b has a shape of [10] so we can add it to each row of that output.
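A plain-Python sketch of that shape arithmetic, assuming toy values for x, W and b (not the trained parameters): an [N, 784] batch times a [784, 10] matrix gives [N, 10], and the length-10 bias is added to every row, which is the broadcasting that `+ b` performs in TensorFlow. The helpers `matmul` and `add_bias` are hypothetical.

```python
def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix, both stored as lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    m = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(m)]
            for row in a]

def add_bias(mat, bias):
    """Add a length-m bias vector to every row of an n x m matrix."""
    assert all(len(row) == len(bias) for row in mat)
    return [[v + bj for v, bj in zip(row, bias)] for row in mat]

N, NUM_INPUT, NUM_CLASSES = 2, 784, 10
x = [[1.0] * NUM_INPUT for _ in range(N)]              # shape [N, 784]
W = [[0.001] * NUM_CLASSES for _ in range(NUM_INPUT)]  # shape [784, 10]
b = [0.5] * NUM_CLASSES                                # shape [10]

evidence = add_bias(matmul(x, W), b)                   # shape [N, 10]
print(len(evidence), len(evidence[0]))  # 2 10
```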

We can now implement our model. It only takes one line to define it!

In [7]:
activation = tf.nn.softmax(tf.matmul(x, W) + b)
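tf.nn.softmax turns each row of evidence into a probability distribution: exponentiate every entry, then normalize so the row sums to 1. The plain-Python sketch below illustrates that formula and is not TensorFlow's implementation; subtracting the row maximum first is a common trick that avoids overflow without changing the result. It also shows that all-zero evidence, which is what the zero-initialized W and b produce for every image, maps to a uniform 0.1 per class.

```python
import math

def softmax(logits):
    """Exponentiate and normalize; shifting by max(logits) leaves the result unchanged."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]

# Adding the same constant to every logit does not change the output.
shifted = softmax([1002.0, 1001.0, 1000.1])

# All-zero evidence gives a uniform distribution over the 10 classes.
uniform = softmax([0.0] * 10)
print(uniform[0])  # 0.1
```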

In order to train our model, we need to define what it means for the model to be good. Well, actually, in machine learning we typically define what it means for a model to be bad. We call this the cost, or the loss, and it represents how far off our model is from our desired outcome. We try to minimize that error, and the smaller the error margin, the better our model is.
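The cost used in this tutorial is the cross-entropy. Writing $\hat{y}^{(n)}$ for the model's output (activation) on the n-th of the N images in a batch and $y^{(n)}$ for its one-hot label, the quantity the cost cell computes is:

```latex
\hat{y}^{(n)} = \operatorname{softmax}\left(x^{(n)} W + b\right), \qquad
\text{cost} = \frac{1}{N} \sum_{n=1}^{N} \left( -\sum_{j=1}^{10} y^{(n)}_j \log \hat{y}^{(n)}_j \right)
```

Because $y^{(n)}$ is one-hot, the inner sum reduces to $-\log \hat{y}^{(n)}_c$, where $c$ is the correct class, so the cost is small when the model assigns high probability to the right digit.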

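To implement cross-entropy, we compare the model's predicted distribution (activation) with the correct one-hot labels, which we feed through the placeholder y defined earlier. We then ask a gradient descent optimizer to minimize the resulting cost: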

In [8]:
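# Cross-entropy: for each example -sum_j y_j * log(activation_j), averaged over the batch
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(activation), axis=1))
# Each run of this op performs one gradient descent step on cost
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)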
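A plain-Python sketch of the same cost, assuming batches stored as lists of rows (the helper name `cross_entropy` is hypothetical). Terms with a zero label are skipped, following the convention 0 · log 0 = 0. The TensorFlow expression instead evaluates y * tf.log(activation) for every class, so a predicted probability that underflows to exactly 0 can make the cost NaN or infinite; tf.nn.softmax_cross_entropy_with_logits, which works from the evidence before the softmax, is the usual numerically safer alternative.

```python
import math

def cross_entropy(labels, probs):
    """Mean over the batch of -sum_j y_j * log(p_j), for one-hot labels."""
    per_example = []
    for y_row, p_row in zip(labels, probs):
        # Skip zero-label terms (0 * log 0 is taken to be 0).
        loss = -sum(y * math.log(p) for y, p in zip(y_row, p_row) if y > 0)
        per_example.append(loss)
    return sum(per_example) / len(per_example)

labels = [[0, 1, 0], [1, 0, 0]]
probs = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]
print(round(cross_entropy(labels, probs), 4))  # 0.5249

# With zero-initialized W and b every prediction is uniform (0.1 per class),
# so the starting cost is -log(0.1) = ln(10), whatever the true digit is.
one_hot_label = [[0] * 9 + [1]]
print(round(cross_entropy(one_hot_label, [[0.1] * 10]), 4))  # 2.3026
```

That starting value of about 2.3026 sits above the first-epoch average of 2.0453 in the training log, consistent with a model that begins from uniform guesses and improves during its first pass over the data.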

Now that we know what we want our model to do, it's very easy to have TensorFlow train it to do so. Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the loss you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the loss.
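For this particular model the gradient that backpropagation computes has a simple closed form: the derivative of the cross-entropy with respect to the evidence z = xW + b is softmax(z) - y, and by the chain rule the gradient with respect to W_ij for one example is x_i (ŷ_j - y_j). The plain-Python sketch below (all helper names hypothetical) checks the first formula against central finite differences for a single three-class example. It illustrates the math, not how TensorFlow computes gradients internally.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, y):
    """Cross-entropy of softmax(z) against a one-hot label y."""
    return -sum(yj * math.log(pj) for yj, pj in zip(y, softmax(z)) if yj > 0)

def analytic_grad(z, y):
    """d loss / d z_j = softmax(z)_j - y_j."""
    return [pj - yj for pj, yj in zip(softmax(z), y)]

def numeric_grad(z, y, eps=1e-6):
    """Central finite-difference estimate of d loss / d z_j."""
    grad = []
    for j in range(len(z)):
        up = list(z)
        up[j] += eps
        down = list(z)
        down[j] -= eps
        grad.append((loss(up, y) - loss(down, y)) / (2 * eps))
    return grad

z = [0.5, -1.0, 2.0]   # toy evidence for one example
y = [0.0, 1.0, 0.0]    # one-hot label: the true class is 1
print([round(g, 4) for g in analytic_grad(z, y)])
print([round(g, 4) for g in numeric_grad(z, y)])
```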

In [9]:
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
sess = tf.Session()

In this case, we ask TensorFlow to minimize cost (the cross-entropy) using the gradient descent algorithm with a learning rate of 0.001 (learning_rate). Gradient descent is a simple procedure in which TensorFlow shifts each variable a little in the direction that reduces the cost. TensorFlow also provides many other optimization algorithms, and switching is a one-line change: for example, replacing tf.train.GradientDescentOptimizer with tf.train.AdamOptimizer.
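The update rule itself is just w <- w - learning_rate * d(cost)/dw. A minimal plain-Python sketch on a one-parameter toy cost, (w - 3)^2, whose gradient is 2(w - 3), shows each step shifting w a little toward the minimum. The toy cost and the learning rate of 0.1 are assumptions chosen so the example converges quickly; they are unrelated to the MNIST model's learning_rate of 0.001.

```python
def cost(w):
    """Toy cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy cost: 2 * (w - 3)."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1
w = 0.0
history = [cost(w)]
for step in range(100):
    w -= learning_rate * grad(w)   # one gradient descent step
    history.append(cost(w))

print(round(w, 6))  # 3.0
```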

What TensorFlow actually does here, behind the scenes, is to add new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, does a step of gradient descent training, slightly tweaking your variables to reduce the loss.

The session was already created above with tf.Session(). Before training, we also create a Saver so the trained parameters can be checkpointed to disk:

In [10]:
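import os

ckpt_path = "model/model_bin.ckpt"
# Saver.save can fail if the checkpoint directory is missing, so create it up front
os.makedirs(os.path.dirname(ckpt_path), exist_ok=True)
saver = tf.train.Saver()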

In [11]:
# Run the initializer
sess.run(init)

for epoch in range(traning_epochs):
    avg_cost = 0.
    total_batch = int(mnist.train.num_examples/batch_size)
    
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Run one optimization step (backprop) and fetch this batch's cost in the same call
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
        avg_cost += c / total_batch

    if epoch % display_step == 0:
        # Report the average cost over this epoch and checkpoint the model
        print("Epochs= {}, Cost= {:.7f}".format(epoch + 1, avg_cost))
        saver.save(sess, ckpt_path)

print("Optimization Finished!")

Epochs= 1, Cost= 2.0453071
Epochs= 2, Cost= 1.6545017
Epochs= 3, Cost= 1.3944961
Epochs= 4, Cost= 1.2171844
Epochs= 5, Cost= 1.0914223
Epochs= 6, Cost= 0.9984175
Epochs= 7, Cost= 0.9271074
Epochs= 8, Cost= 0.8707204
Epochs= 9, Cost= 0.8249756
Epochs= 10, Cost= 0.7870945
Epochs= 11, Cost= 0.7551323
Epochs= 12, Cost= 0.7278076
Epochs= 13, Cost= 0.7041127
Epochs= 14, Cost= 0.6833417
Epochs= 15, Cost= 0.6649725
Epochs= 16, Cost= 0.6485916
Epochs= 17, Cost= 0.6338693
Epochs= 18, Cost= 0.6205691
Epochs= 19, Cost= 0.6084738
Epochs= 20, Cost= 0.5974181
Epochs= 21, Cost= 0.5872606
Epochs= 22, Cost= 0.5778929
Epochs= 23, Cost= 0.5692398
Epochs= 24, Cost= 0.5611885
Epochs= 25, Cost= 0.5536907
Optimization Finished!
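In the training loop, each batch's cost is divided by total_batch before being added, so after the inner loop avg_cost is the mean batch cost for the epoch. With the default split of 55,000 training images and batch_size = 100, total_batch is int(55000 / 100) = 550. A quick plain-Python check of that accumulation, using made-up batch costs rather than real MNIST results:

```python
# Hypothetical per-batch costs for one short epoch (made-up numbers).
batch_costs = [2.3, 2.1, 1.9, 1.7]
total_batch = len(batch_costs)

avg_cost = 0.0
for c in batch_costs:
    avg_cost += c / total_batch   # same accumulation as the training loop

print(round(avg_cost, 6))  # 2.0

# Batches per epoch for the tutorial's settings (55,000 training images).
print(int(55000 / 100))  # 550
```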


First, figure out where we predicted the correct label. tf.argmax is an extremely useful function that gives the index of the highest entry in a tensor along some axis. For example, tf.argmax(activation, 1) is the label our model thinks is most likely for each input, while tf.argmax(y, 1) is the correct label. We can use tf.equal to check whether each prediction matches the truth; this gives a tensor of booleans, and casting it to floats with tf.cast and averaging with tf.reduce_mean yields the fraction of test images classified correctly.

In [12]:
correct_prediction = tf.equal(tf.argmax(activation, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
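The argmax, equal, cast and mean pipeline can be mirrored in plain Python to see what the accuracy number means. The probabilities and labels below are made-up toy values, the helper names are hypothetical, and this sketch breaks ties by taking the first index, which tf.argmax does not guarantee:

```python
def argmax(row):
    """Index of the largest entry; this sketch returns the first index on ties."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(probs, labels):
    """Fraction of rows where argmax(prediction) == argmax(one-hot label)."""
    correct = [argmax(p) == argmax(y) for p, y in zip(probs, labels)]
    # True/False -> 1.0/0.0, then average: the tf.cast + tf.reduce_mean step.
    return sum(float(c) for c in correct) / len(correct)

probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
print(accuracy(probs, labels))  # 0.5
```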