In [1]:
from tensorflow.examples.tutorials.mnist import input_data

In [2]:
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [3]:
import tensorflow as tf

In [4]:
# Each row of x is one flattened 28x28 image (784 pixels); None lets the batch size vary.
x = tf.placeholder(tf.float32, [None, 784])

We also need the weights and biases for our model. A **Variable** is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine learning applications, the model parameters are generally **Variables**.

In [5]:
# W maps the 784 pixel values to 10 class scores; b adds one bias per class.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

In [6]:
# Softmax regression: one row of class probabilities per input image.
y = tf.nn.softmax(tf.matmul(x, W) + b)
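To make the model concrete, here is a minimal plain-Python sketch of what `tf.nn.softmax(tf.matmul(x, W) + b)` computes for a batch. It assumes a hypothetical toy model with 3 "pixels" and 2 classes instead of 784 pixels and 10 classes; the helper names are illustrative, not TensorFlow APIs.

```python
import math


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both as lists of lists)."""
    return [[sum(a_row[t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for a_row in a]


def softmax(row):
    """Exponentiate and normalize one row of scores so it sums to 1."""
    m = max(row)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_regression(x, W, b):
    """y = softmax(x W + b), applied row by row (one row per image)."""
    logits = matmul(x, W)
    return [softmax([z + bj for z, bj in zip(row, b)]) for row in logits]


# Hypothetical toy sizes: 2 images, 3 "pixels", 2 classes.
x = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.0]]
W = [[0.0] * 2 for _ in range(3)]  # all zeros, like tf.zeros([784, 10])
b = [0.0, 0.0]
y = softmax_regression(x, W, b)
print(y)  # [[0.5, 0.5], [0.5, 0.5]]
```

Because `W` and `b` start at zero, every class initially receives the same probability; training is what moves the predictions away from this uniform starting point.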

# Training

## Cross-entropy

To implement cross-entropy, we first need to add a new placeholder to feed in the correct answers:

In [7]:
y_ = tf.placeholder(tf.float32, [None, 10])

In [8]:
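# Caution: tf.log(0) is -inf; tf.nn.softmax_cross_entropy_with_logits is a numerically stable alternative.
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))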

First, `tf.log` computes the logarithm of each element of `y`.
Next, we multiply each element of `y_` with the corresponding element of `tf.log(y)`.
Finally, `tf.reduce_sum` adds up all the elements of the resulting tensor, and the leading minus sign negates the sum.

Note that this isn't just the cross-entropy of the truth with a single prediction, but the sum of the cross-entropies for all the images we looked at. In this example, we have 100 images in each batch: how well we are doing on 100 data points is a much better description of how good our model is than a single data point.
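The three steps above, and the fact that the result is summed over the whole batch, can be mirrored in plain Python. This is a sketch on a hypothetical batch of two images with three classes; `cross_entropy` here is an illustrative helper, not TensorFlow's op.

```python
import math


def cross_entropy(y_true, y_pred):
    """Plain-Python mirror of -tf.reduce_sum(y_ * tf.log(y)).

    y_true: one-hot labels, one row per image.
    y_pred: predicted probabilities with the same shape.
    """
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        # Element-wise y_ * log(y); with one-hot rows only the true class contributes.
        total += sum(t * math.log(p) for t, p in zip(true_row, pred_row))
    return -total  # summed over every class *and* every image in the batch


# Hypothetical batch: image 0 is class 1, image 1 is class 0.
labels = [[0, 1, 0],
          [1, 0, 0]]
preds = [[0.1, 0.8, 0.1],
         [0.5, 0.25, 0.25]]
loss = cross_entropy(labels, preds)
print(loss)  # -(log 0.8 + log 0.5), roughly 0.916
```

Unlike `tf.log`, Python's `math.log(0)` raises an error instead of returning `-inf`, so this sketch avoids zero probabilities in its example.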

Now that we know what we want our model to do, it's very easy to have TensorFlow train it to do so. Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the cost you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the cost.

In [9]:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

In this case, we ask TensorFlow to minimize `cross_entropy` using the **gradient descent algorithm** with a learning rate of 0.01. Gradient descent is a simple procedure: TensorFlow shifts each variable a little bit in the direction that reduces the cost. TensorFlow also provides many other optimization algorithms, and switching to one is as simple as changing one line.
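"Shift each variable a little bit in the direction that reduces the cost" is the update rule theta <- theta - learning_rate * gradient. A minimal sketch on a hypothetical one-variable cost (theta - 3)^2, whose gradient is 2 * (theta - 3), using the same learning rate of 0.01 and 1000 steps as this notebook:

```python
def gradient_descent(grad, theta, learning_rate=0.01, steps=1000):
    """Repeatedly move theta a small step against the gradient of the cost."""
    for _ in range(steps):
        theta -= learning_rate * grad(theta)
    return theta


def cost(theta):
    """Hypothetical cost, minimized at theta = 3."""
    return (theta - 3.0) ** 2


def grad(theta):
    """Derivative of the cost above."""
    return 2.0 * (theta - 3.0)


theta = gradient_descent(grad, theta=0.0)
print(theta, cost(theta))
```

Each step shrinks the distance to the minimum by a factor of 0.98 here; a larger learning rate moves faster but can overshoot if it is too large.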

Behind the scenes, what TensorFlow actually does when you call `minimize` is add new operations to your graph that implement backpropagation and gradient descent.

Then it gives you back a single operation which, when run, does one step of gradient descent, slightly tweaking your variables to reduce the cost.
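For intuition, the step that `train_step` performs for this model can be written out by hand. This plain-Python sketch is not TensorFlow's implementation (TensorFlow derives these gradients automatically); it uses the known result that, for softmax followed by cross-entropy, the gradient with respect to each logit is `y - y_`. The toy data and helper names are hypothetical.

```python
import math


def softmax(row):
    """Normalize one row of scores into probabilities (max subtracted for stability)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, W, b):
    """y = softmax(x W + b), one row of probabilities per image."""
    return [softmax([sum(xi * W[i][k] for i, xi in enumerate(row)) + b[k]
                     for k in range(len(b))])
            for row in x]


def summed_cross_entropy(x, y_true, W, b):
    """-sum(y_ * log(y)) over every image and class."""
    y = forward(x, W, b)
    return -sum(t * math.log(p)
                for t_row, p_row in zip(y_true, y)
                for t, p in zip(t_row, p_row))


def train_step(x, y_true, W, b, learning_rate=0.01):
    """One hand-derived gradient-descent step; updates W and b in place."""
    y = forward(x, W, b)
    # d(loss)/d(logit_k) = y_k - y_true_k for each image.
    delta = [[p - t for p, t in zip(p_row, t_row)] for p_row, t_row in zip(y, y_true)]
    for i in range(len(W)):
        for k in range(len(b)):
            # d(loss)/dW[i][k] sums x[n][i] * delta[n][k] over the batch.
            W[i][k] -= learning_rate * sum(row[i] * d[k] for row, d in zip(x, delta))
    for k in range(len(b)):
        b[k] -= learning_rate * sum(d[k] for d in delta)


# Hypothetical toy problem: 2 images, 2 "pixels", 2 classes.
x_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [[1, 0], [0, 1]]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]

before = summed_cross_entropy(x_toy, y_toy, W, b)
for _ in range(100):
    train_step(x_toy, y_toy, W, b)
after = summed_cross_entropy(x_toy, y_toy, W, b)
print(before, after)  # the loss goes down
```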

Now our model is set up to train. One last thing before we launch it: we have to add an operation to initialize the variables we created:

In [10]:
init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()

We can now launch the model in a `Session`, and run the operation that initializes the variables:

In [11]:
sess = tf.Session()
sess.run(init)

Let's train -- we'll run the training step 1000 times!

In [12]:
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

At each step of the loop:

* We get a "batch" of 100 random data points from our training set.
* We run `train_step`, feeding in the batch's data to replace the placeholders `x` and `y_`.


Using small batches of random data is called *stochastic training* -- in this case, stochastic gradient descent. Ideally, we'd like to use all our data for every step of training because that would give us a better sense of what we should be doing, but that's expensive. So, instead, we use a subset every time. Doing this is cheap and has much of the same benefit.
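One common way to draw random mini-batches is to shuffle the example indices once per pass over the data and then slice them into batches. The sketch below uses a hypothetical `minibatches` helper; it is not the implementation of `mnist.train.next_batch`, and dropping the incomplete final batch of each pass is a design choice of this sketch.

```python
import random


def minibatches(images, labels, batch_size, seed=0):
    """Yield random (batch_xs, batch_ys) pairs forever, reshuffling once per pass."""
    rng = random.Random(seed)
    indices = list(range(len(images)))
    while True:
        rng.shuffle(indices)
        # Slice the shuffled indices into full batches; leftovers are skipped.
        for start in range(0, len(indices) - batch_size + 1, batch_size):
            batch = indices[start:start + batch_size]
            yield [images[i] for i in batch], [labels[i] for i in batch]


# Hypothetical data: 10 "images" whose label is 10 times their value.
imgs = list(range(10))
labs = [i * 10 for i in imgs]
gen = minibatches(imgs, labs, batch_size=3)
batch_xs, batch_ys = next(gen)
print(batch_xs, batch_ys)
```

A training loop could then take `for _, (batch_xs, batch_ys) in zip(range(1000), gen):` in place of calling `next_batch`.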

# Evaluating our model

How well does our model do?

`tf.argmax` gives you the index of the highest entry in a tensor along some axis. For example, `tf.argmax(y, 1)` is the label our model thinks is most likely for each input, while `tf.argmax(y_, 1)` is the correct label. We can use `tf.equal` to check whether our prediction matches the truth.

In [13]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

That gives us a list of booleans. To determine what fraction are correct, we cast them to floating-point numbers and then take the mean. For example, `[True, False, True, True]` would become `[1, 0, 1, 1]`, whose mean is `0.75`.
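The argmax, compare, cast, and mean steps can be reproduced in plain Python. This sketch uses hypothetical predictions chosen so that the comparison yields exactly `[True, False, True, True]`; the `argmax` and `accuracy` helpers are illustrative, not TensorFlow functions.

```python
def argmax(row):
    """Index of the largest entry in a row (the first one on ties, in this sketch)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(y_pred, y_true):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(y_pred, y_true)]  # booleans
    as_floats = [float(c) for c in correct]  # True -> 1.0, False -> 0.0
    return sum(as_floats) / len(as_floats)


# Hypothetical batch of 4 predictions over 2 classes.
preds = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [[0, 1], [0, 1], [0, 1], [1, 0]]
print(accuracy(preds, labels))  # 0.75
```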

In [14]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Finally, we ask for our accuracy on our test data.

In [15]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9172
