We follow the TensorFlow tutorial on using softmax regression to classify MNIST digits. First, we load the dataset.

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


The input image is a vector $\vec{\mathbf{x}}$ with $784=28\times 28$ entries. Our predictive model first applies an affine transformation to the input, $\vec{\mathbf{x}}\to \vec{\mathbf{y}}=W\vec{\mathbf{x}}+b$, and then applies the softmax function to predict the probability
$$p_i=\operatorname{Softmax}(\vec{\mathbf{y}})_i:=\frac{\exp(y_i)}{\sum_j \exp(y_j)}$$
that the image $\vec{\mathbf{x}}$ shows digit $i$. Our task is to determine $W$ and $b$, and we will use TensorFlow to do this.
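
To make the softmax formula concrete, here is a minimal sketch in plain Python, independent of TensorFlow. The scores passed in are made-up values for illustration, and subtracting the maximum score is a common numerical-stability trick that is not part of the formula above (it does not change the result).

```python
import math

def softmax(y):
    """Return softmax probabilities for a list of scores y."""
    # Subtracting the maximum leaves the result unchanged
    # but keeps exp() from overflowing for large scores.
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes (illustration only).
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # largest score gets the largest probability
print(sum(probs))  # probabilities sum to 1 (up to rounding)
```

The outputs are all positive and sum to one, which is what lets us read $p_i$ as a probability.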

In [3]:
import tensorflow as tf

We create a placeholder for $\vec{\mathbf{x}}$ (its value is fed in each time we run the graph, so it has no state to keep), and variables for $W$ and $b$ (these keep their state between runs, which is what lets training update them).

In [6]:
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

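Now it's time to implement our regression model. Since `x` holds a batch of images as row vectors, the code computes $xW+b$ rather than $W\vec{\mathbf{x}}+b$: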

In [7]:
y = tf.nn.softmax(tf.matmul(x, W) + b)

We will train our model to determine the optimal values of $W$ and $b$. To do this, we need to define what we mean by "optimal". We will use the following cross-entropy error function, where $y'$ is the true label distribution and $y$ is the predicted one:
$$H_{y'}(y)=-\sum_i y'_i\log(y_i)$$
First, we create a new placeholder to hold the correct answers:

In [8]:
y_=tf.placeholder(tf.float32,[None,10])
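
Because the dataset was loaded with `one_hot=True`, each correct answer fed into this placeholder is a vector of 10 entries rather than a single digit. A minimal sketch of that encoding in plain Python follows; the helper name `one_hot` is hypothetical and is not a TensorFlow function.

```python
def one_hot(digit, num_classes=10):
    """Encode a digit as a vector with a 1.0 at its index and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec

label = one_hot(3)
print(label)  # a 10-entry vector with a single 1.0 at index 3
```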

Then we define the cross-entropy op, which sums over the 10 classes for each example and averages the result over the batch.

In [9]:
cross_entropy=-tf.reduce_mean(tf.reduce_sum(y_*tf.log(y),reduction_indices=[1]))
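
To see what this op computes, here is a rough equivalent in plain Python: the negative sum of $y'_i\log(y_i)$ over the classes of each example, then the mean over the batch. The function names and the sample values are assumptions made for illustration. Unlike the TensorFlow expression, this sketch skips classes whose label is 0, which avoids evaluating `log(0)`.

```python
import math

def cross_entropy(y_true, y_pred):
    """Cross entropy -sum_i y'_i * log(y_i) for a single example."""
    # Terms with a zero label contribute nothing, so we skip them;
    # this also avoids calling log() on a zero probability.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def mean_cross_entropy(batch_true, batch_pred):
    """Average the per-example cross entropy over a batch."""
    losses = [cross_entropy(t, p) for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)

# Hypothetical batch of two examples with three classes each.
labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
preds = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]
print(mean_cross_entropy(labels, preds))
```

With one-hot labels, the loss for an example reduces to minus the log of the probability assigned to the correct class, so confident correct predictions give a loss near zero.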

Next, we create a new op that uses gradient descent with a learning rate of 0.5 to update the parameters $W$ and $b$:

In [10]:
train_step=tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
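
TensorFlow derives the gradients automatically, but it can help to see one update written out by hand. The sketch below is a plain-Python illustration on a toy problem (2 features, 3 classes, a single example), not what TensorFlow runs internally; all names and values are assumptions. It relies on the standard result that, for softmax followed by cross entropy with labels summing to 1, the gradient with respect to the scores is $p - y'$.

```python
import math

def softmax(y):
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    # Scores y_k = sum_j x_j * W[j][k] + b[k], then softmax.
    scores = [sum(x[j] * W[j][k] for j in range(len(x))) + b[k]
              for k in range(len(b))]
    return softmax(scores)

def sgd_step(x, y_true, W, b, learning_rate=0.5):
    """Apply one gradient descent update for a single example, in place."""
    p = predict(x, W, b)
    # Gradient of the cross entropy with respect to the scores.
    delta = [p[k] - y_true[k] for k in range(len(b))]
    for j in range(len(x)):
        for k in range(len(b)):
            W[j][k] -= learning_rate * x[j] * delta[k]
    for k in range(len(b)):
        b[k] -= learning_rate * delta[k]

# Toy problem: parameters start at zero, as in the notebook.
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3
x, y_true = [1.0, 2.0], [0.0, 1.0, 0.0]

before = predict(x, W, b)[1]  # probability of the correct class
sgd_step(x, y_true, W, b)
after = predict(x, W, b)[1]
print(before, after)  # the correct class becomes more likely
```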

We then create an op that initializes all the variables, and run it in a session:

In [11]:
init = tf.initialize_all_variables()

In [12]:
sess=tf.InteractiveSession()
sess.run(init)

Now let's run it! We start with 100 iterations and a batch size of 10 (since I have a slow CPU).

In [14]:
for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(10)
    sess.run(train_step,feed_dict={x: batch_xs, y_: batch_ys})

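Finally, let's evaluate our model. We first check, for each example, whether the predicted digit matches the true label, and then average the results: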

In [15]:
correct_prediction=tf.equal(tf.argmax(y_,1),tf.argmax(y,1))

In [17]:
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
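
The two ops above amount to "take the most likely class, compare with the label, and average". A small plain-Python sketch of the same computation follows; the helper names and the sample data are made up for illustration.

```python
def argmax(values):
    """Index of the largest entry, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(labels, predictions):
    """Fraction of examples whose predicted class matches the label."""
    correct = [argmax(t) == argmax(p) for t, p in zip(labels, predictions)]
    # Cast booleans to floats before averaging, mirroring tf.cast.
    return sum(1.0 if c else 0.0 for c in correct) / len(correct)

# Hypothetical one-hot labels and predicted probabilities.
labels = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
preds = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
acc = accuracy(labels, preds)
print(acc)  # two of the three predictions are correct
```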

Now let's compute the accuracy on the test set:

In [18]:
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.8306
