# Using TensorFlow to train a neural network to recognize digits from the MNIST digit dataset

The MNIST data is hosted on Yann LeCun's website.

#### Importing the data


In [36]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


#### About the data:
- The MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation). This split matters: in machine learning we need separate data that we don't learn from, so that we can check that what we've learned actually generalizes.
- Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We'll call the images "x" and the labels "y". Both the training set and the test set contain images and their corresponding labels; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.

- Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers.

##### We can flatten this array into a vector of 28x28 = 784 numbers.
- The result is that mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.
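The flattening step can be sketched in plain Python. This is only an illustration: the 3x3 `image` below is made-up data standing in for a real 28x28 digit, and the names `image`, `width`, and `flat` are not part of the notebook.

```python
# A made-up 3x3 "image" stands in for a real 28x28 MNIST digit.
image = [
    [0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0],
]

# Flatten row by row: pixel (row, col) lands at index row * width + col.
width = len(image[0])
flat = [pixel for row in image for pixel in row]

print(len(flat))            # number of pixels in the flattened vector
print(flat[1 * width + 1])  # the centre pixel of the toy image
```

The same row-by-row layout applied to a 28x28 image gives a vector of length 784.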

##### Each image in MNIST has a corresponding label, a number between 0 and 9 representing the digit drawn in the image.

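Because the data was loaded with one_hot=True, each label is stored as a one-hot vector: a vector that is 0 everywhere except for a 1 at the index of the digit. The digit 3 would be [0,0,0,1,0,0,0,0,0,0]. Consequently, mnist.train.labels is a [55000, 10] array of floats.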
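The label encoding above can be sketched in a few lines of plain Python. The helpers `one_hot` and `decode` are hypothetical names introduced here for illustration; they are not TensorFlow functions.

```python
def one_hot(digit, num_classes=10):
    """Return a list that is 1.0 at position `digit` and 0.0 elsewhere."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in the range [0, num_classes)")
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec


def decode(vec):
    """Recover the digit as the index of the largest entry."""
    return vec.index(max(vec))


print(one_hot(3))
print(decode(one_hot(3)))
```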

### Making the model

In [37]:
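import tensorflow as tf
# x holds a batch of flattened images; None lets the batch size vary.
x = tf.placeholder(tf.float32, [None, 784])
# Weights and biases start at zero and are learned during training.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))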


Softmax regression is a natural, simple model. If you want to assign probabilities to an object being one of several different things, softmax is the thing to do, because softmax gives us a list of values between 0 and 1 that add up to 1.

$ \text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)} $ 

$ y_i = \text{softmax}(z)_i, \quad z_i = \sum_j W_{i,j} x_j + b_i $

In [38]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
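To make the softmax formula concrete, here is a minimal plain-Python sketch. The function name `softmax` and the input scores are chosen for illustration, and subtracting the maximum is a common numerical-stability trick that is assumed here; it is not something the notebook does explicitly.

```python
import math


def softmax(values):
    """Map a list of real scores to positive numbers that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # larger scores get larger probabilities
print(sum(probs))  # the probabilities add up to 1 (up to rounding)
```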


### Training

Loss represents how far off our model is from our desired outcome. We try to minimize that error, and the smaller the error margin, the better our model is. One very common function to determine the loss of a model is called "cross-entropy." 

$ H_{y'}(y) = -\sum_i y'_i \log(y_i) $
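As a small worked example of this formula, the sketch below computes the cross-entropy for a single data point in plain Python. Here `true_dist` plays the role of $y'$ and `predicted` the role of $y$; the function name and the toy three-class distributions are assumptions made for illustration.

```python
import math


def cross_entropy(true_dist, predicted):
    """Compute -sum_i y'_i * log(y_i) for one data point."""
    # Terms where y'_i is 0 contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(true_dist, predicted) if t > 0)


label = [0.0, 0.0, 1.0]               # one-hot: the true class is index 2
confident_right = [0.05, 0.05, 0.90]  # most probability on the true class
confident_wrong = [0.90, 0.05, 0.05]  # most probability on a wrong class

print(cross_entropy(label, confident_right))
print(cross_entropy(label, confident_wrong))
```

With a one-hot label the sum reduces to the negative log of the probability assigned to the true class, so the confident wrong prediction gets the larger loss.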

To implement cross-entropy, we first add a new placeholder for the correct answers (the true labels):

In [39]:
y_ = tf.placeholder(tf.float32, [None, 10])

Then we can implement the cross-entropy function:  $ -\sum y'\log(y) $

In [40]:
# Sum over the 10 classes for each example, then average over the batch.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))

We can ask TensorFlow to minimize cross_entropy using the gradient descent algorithm with a learning rate of 0.5:

In [41]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
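To illustrate what a gradient descent step does, here is a rough sketch on a toy one-dimensional problem rather than the MNIST loss. The function $f(w) = (w - 3)^2$, the helper `gradient_descent`, and the learning rate of 0.1 are all assumptions chosen to keep the example small.

```python
def gradient_descent(grad, w, learning_rate, steps):
    """Repeatedly move w a small step against the gradient."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def grad(w):
    """Derivative of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent(grad, w=0.0, learning_rate=0.1, steps=100)
print(w_final)  # approaches the minimum of the toy loss at w = 3
```

In the notebook, TensorFlow computes the gradients of cross_entropy with respect to W and b automatically and applies the same kind of update to both.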


We can now **launch the model** in an InteractiveSession and create an operation to initialize the variables we created:

In [42]:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

**Train** We'll run the training step 1000 times!
- At each step of the loop, we get a "batch" of one hundred random data points from our training set. We run train_step, feeding in the batch data to replace the placeholders.
- Using small batches of random data is called stochastic training. Using all our data for every step of training is computationally expensive, so each step uses a different subset instead.


In [51]:
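for _ in range(1000):
    # Draw the next mini-batch of 100 images and their one-hot labels.
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # Run one gradient descent step on this batch.
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})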
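The idea of drawing a random mini-batch can be sketched in plain Python as follows. The helper `sample_batch` and the toy data are hypothetical, and this is not necessarily how `next_batch` is implemented internally; it only shows images and labels being sampled together so that each pair stays aligned.

```python
import random


def sample_batch(xs, ys, batch_size, rng):
    """Pick batch_size distinct examples, keeping inputs and labels paired."""
    indices = rng.sample(range(len(xs)), batch_size)
    return [xs[i] for i in indices], [ys[i] for i in indices]


# Toy dataset: 1000 one-feature "images" whose label is the feature mod 10.
xs = [[float(i)] for i in range(1000)]
ys = [i % 10 for i in range(1000)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch_xs, batch_ys = sample_batch(xs, ys, 100, rng)
print(len(batch_xs), len(batch_ys))
```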

### Evaluating the Model

tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check if our prediction matches the truth.

In [53]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy: ", sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))


Accuracy:  0.9235
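The accuracy computation above compares the argmax of each prediction with the argmax of its one-hot label, then averages the matches. A minimal plain-Python sketch of the same idea, using made-up three-class predictions and the hypothetical helpers `argmax` and `accuracy`, might look like this:

```python
def argmax(values):
    """Index of the largest entry (the first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of rows where the predicted class equals the true class."""
    matches = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(matches) / len(matches)


predictions = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.2, 0.5, 0.3],  # predicts class 1
]
labels = [
    [0, 1, 0],  # true class 1
    [1, 0, 0],  # true class 0
    [0, 1, 0],  # true class 1 (the prediction above is wrong)
    [0, 1, 0],  # true class 1
]

print(accuracy(predictions, labels))  # 3 of 4 correct: 0.75
```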
