# MNIST For ML Beginners
Copied from https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/beginners/index.html

In [2]:
# Import helper functions to download and extract the dataset
import tensorflow.examples.tutorials.mnist.input_data as input_data

# Download and extract the dataset
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


## The dataset

The dataset consists of images of handwritten digits, each with a label indicating which digit it shows.

### Images

The set is divided into three parts:

1. mnist.train: 55,000 points of training data
2. mnist.test: 10,000 points of test data
3. mnist.validation: 5,000 points of validation data

The images are 28x28 pixels, but each one is flattened into a single row of 784 values (28 x 28 = 784). For the training set, these rows are stored in a tensor (an n-dimensional array) with the shape [55000, 784].

 - The first dimension indexes the images
 - The second dimension indexes the pixels
 - The pixels have an intensity between 0 and 1 (1 is black)
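
The flattening described above can be sketched in plain Python. The tiny 3x3 "image" and the `flatten` helper are made up for illustration and are not part of the tutorial's code; MNIST does the same thing with 28x28 images.

```python
def flatten(image):
    """Concatenate the rows of a 2-D image into one flat list."""
    return [pixel for row in image for pixel in row]

# A hypothetical 3x3 image with intensities between 0 and 1.
tiny_image = [
    [0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0],
]

flat = flatten(tiny_image)
print(len(flat))  # 3 * 3 values, just as a 28x28 image gives 28 * 28 = 784

# Pixel (row, col) of the original image ends up at index row * width + col.
width = len(tiny_image[0])
print(flat[1 * width + 1])  # the centre pixel
```

Flattening throws away the 2-D layout of the image, which is fine here because softmax regression treats every pixel as an independent input.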

### Labels

Each label is a digit between 0 and 9.

"One-hot vectors" will be used to represent the labels. For the training set, they are stored in a tensor of shape [55000, 10] that contains floats.

The idea is that exactly one entry (the one whose index corresponds to the digit) is 1 and all the others are 0. For example, the digit 3 is represented as [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].

Note: "dimension" here refers to each index 0 to 9 in the label vector. Think of a label as a vector in a ten-dimensional space, where each value is its length along one axis.
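
A minimal sketch of one-hot encoding in plain Python. The helper names `one_hot` and `decode` are assumptions made for this example; in the notebook the encoding is done by `read_data_sets(..., one_hot=True)`.

```python
def one_hot(digit, num_classes=10):
    """Return a list of floats with 1.0 at index `digit` and 0.0 elsewhere."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in range(num_classes)")
    vector = [0.0] * num_classes
    vector[digit] = 1.0
    return vector

def decode(vector):
    """Recover the digit: the index of the largest entry."""
    return vector.index(max(vector))

label = one_hot(3)
print(label)
print(decode(label))
```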


## Softmax Regressions

Every image in the MNIST dataset is a digit. We want the model to give us, for each of the ten digits, the probability that the image shows that digit (that is, how confident the model is).

A softmax regression is performed in two steps:

1. The evidence of the input being in certain classes is added up
2. The evidence is converted to probabilities

The evidence in this case is a weighted sum of the pixel intensities. That is:

 - The weight is negative if a high intensity in that pixel is evidence against the image being in that class
 - The weight is positive if a high intensity in that pixel is evidence for the image being in that class

The weighted intensities of all the pixels in the image are then summed, and a bias $b_i$ is added, to form the evidence for class $i$ given an input $x$:

$$\mathrm{evidence}_i = \sum_j W_{i,j}x_j+b_i$$
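
Both steps can be sketched in plain Python. The second step uses the softmax function, $\mathrm{softmax}(x)_i = \exp(x_i) / \sum_j \exp(x_j)$, which makes every value positive and makes them sum to 1. The toy weights, biases and pixels below are invented for illustration, and `W` is indexed as `W[class][pixel]` to match the formula above (the TensorFlow code later in this notebook stores it the other way round).

```python
import math

def evidence(weights, biases, pixels):
    """evidence_i = sum_j W[i][j] * x[j] + b[i], one value per class."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, pixels)) + b_i
        for row, b_i in zip(weights, biases)
    ]

def softmax(values):
    """Exponentiate and normalise so the results are positive and sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy problem: 3 classes, 4 pixels.
W = [
    [1.0, -1.0, 0.0, 0.5],
    [-0.5, 2.0, 0.0, -1.0],
    [0.0, 0.0, 1.0, 1.0],
]
b = [0.1, 0.0, -0.1]
x = [0.0, 1.0, 0.0, 0.5]

ev = evidence(W, b, x)
probs = softmax(ev)
print(ev)     # one evidence value per class
print(probs)  # the same ranking, expressed as probabilities
```

The class with the most evidence also gets the highest probability, since the exponential preserves the ordering.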



## Implementing the Regression

TensorFlow defines a graph of interacting operations that runs entirely outside Python. This avoids the overhead of switching back and forth between Python and the code that does the heavy numerical work for every single operation.

In [3]:
import tensorflow as tf

# Create a placeholder for the input data. A placeholder does not contain a value yet;
# it is a symbolic input whose value is supplied (through feed_dict) when the computation is run.
# The first dimension is None because the number of images can be of any length
x = tf.placeholder(tf.float32, [None, 784])

## Variables
A `Variable` is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation.

The weights and biases will be implemented using `Variables`.

Note that the Variables below do not yet contain any learned values: we only define their shapes and give them zeros as initial values.

In [7]:
# Weight matrix: one column of 784 pixel weights for each of the 10 classes
W = tf.Variable(tf.zeros([784, 10]))
# Bias vector
b = tf.Variable(tf.zeros([10]))

## Implementing the model

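We must now define how the values that we have defined relate to each other.

The model that we are using (softmax regression, which can be seen as a neural network with a single layer) takes one line, because the softmax function is already defined in TensorFlow as `tf.nn.softmax`.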

In [8]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
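
To see why `W` has the shape [784, 10], it helps to follow the shapes through `tf.matmul(x, W) + b`. The sketch below does the same arithmetic in plain Python on made-up toy sizes; `matmul` and `add_bias` are helper names chosen for this example, not TensorFlow functions.

```python
def matmul(a, b):
    """Multiply an [n, k] matrix by a [k, m] matrix, giving an [n, m] matrix."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][m] for k in range(inner)) for m in range(cols)]
        for row in a
    ]

def add_bias(rows, bias):
    """Add the same bias vector to every row, as broadcasting does for `+ b`."""
    return [[value + b_m for value, b_m in zip(row, bias)] for row in rows]

# Hypothetical toy sizes: 2 images, 3 pixels, 2 classes.
x = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]   # shape [2, 3], like [None, 784]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]        # shape [3, 2], like [784, 10]
b = [0.1, -0.1]         # shape [2], like [10]

evidence = add_bias(matmul(x, W), b)
print(len(evidence), len(evidence[0]))  # one row per image, one column per class
print(evidence)
```

Putting the images first (`x` times `W` rather than `W` times `x`) lets a whole batch of images be processed at once; the softmax is then applied to each row separately.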

## Training

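When training a model, one must first decide what it means for the model to be "good" or, more commonly in machine learning, what it means for it to be bad.

This is done by defining a mathematical function called the "cost" or "loss", which is then minimized. Here the loss is the cross-entropy between the predicted probabilities `y` and the correct one-hot labels `y_`.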


In [9]:
# Placeholder for the correct answers
y_ = tf.placeholder(tf.float32, [None, 10])

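# Cross-entropy between the correct labels y_ and the predictions y,
# summed over all classes and all images in the batch
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

# Training step: one gradient descent update (learning rate 0.01) that lowers the cross-entropy
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)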
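
For a single image the cross-entropy is $-\sum_i y'_i \log(y_i)$, where $y'$ is the one-hot label and $y$ the predicted distribution. The plain-Python sketch below shows how it behaves; the three predictions are invented for illustration. Unlike the TensorFlow line above, it skips classes whose label is 0 so that a predicted probability of exactly 0 for a wrong class cannot cause a `log(0)` error.

```python
import math

def cross_entropy(true_dist, predicted):
    """-sum_i y'_i * log(y_i) for one example."""
    # Terms with y'_i == 0 contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(true_dist, predicted) if t > 0)

label = [0.0, 1.0, 0.0]  # one-hot: the true class is 1

# Hypothetical model outputs for the same image.
confident_right = [0.05, 0.90, 0.05]
unsure = [0.30, 0.40, 0.30]
confident_wrong = [0.90, 0.05, 0.05]

loss_right = cross_entropy(label, confident_right)
loss_unsure = cross_entropy(label, unsure)
loss_wrong = cross_entropy(label, confident_wrong)
print(loss_right, loss_unsure, loss_wrong)
```

Because the label is one-hot, the loss reduces to minus the log of the probability given to the correct class: it is close to 0 when the model is confidently right and grows quickly when it is confidently wrong.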

In [10]:
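# Training
# Create the op that initializes the Variables, start a session and run the op
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

# Run 1000 training steps, each on a batch of 100 examples from the training set
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  # Feed the batch into the placeholders and apply one gradient descent update
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})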
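
What one training step does can be sketched by hand in plain Python. TensorFlow works out the gradients itself from the graph; the sketch below instead assumes the standard result that, for softmax followed by cross-entropy, the gradient with respect to the evidence is `y - y_`. The toy data (2 pixels, 2 classes), the learning rate and all function names are assumptions made for this example.

```python
import math

def softmax(values):
    shift = max(values)  # avoids overflow, does not change the result
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """y = softmax(x W + b), with W indexed as W[pixel][class]."""
    ev = [sum(x[j] * W[j][i] for j in range(len(x))) + b[i] for i in range(len(b))]
    return softmax(ev)

def loss(W, b, batch):
    """Cross-entropy summed over a batch of (pixels, one_hot_label) pairs."""
    total = 0.0
    for x, label in batch:
        y = predict(W, b, x)
        total -= sum(t * math.log(p) for t, p in zip(label, y) if t > 0)
    return total

def train_step(W, b, batch, learning_rate):
    """One gradient descent update of W and b, in place."""
    grad_W = [[0.0] * len(b) for _ in W]
    grad_b = [0.0] * len(b)
    for x, label in batch:
        y = predict(W, b, x)
        for i in range(len(b)):
            delta = y[i] - label[i]  # gradient w.r.t. the evidence of class i
            grad_b[i] += delta
            for j in range(len(x)):
                grad_W[j][i] += x[j] * delta
    for i in range(len(b)):
        b[i] -= learning_rate * grad_b[i]
        for j in range(len(W)):
            W[j][i] -= learning_rate * grad_W[j][i]

# Hypothetical toy data: class 0 when the first pixel is bright, class 1 when the second is.
batch = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.9, 0.1], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
    ([0.2, 0.8], [0.0, 1.0]),
]
W = [[0.0, 0.0], [0.0, 0.0]]  # zeros, like tf.zeros in the notebook
b = [0.0, 0.0]

loss_before = loss(W, b, batch)
for _ in range(100):
    train_step(W, b, batch, learning_rate=0.1)
loss_after = loss(W, b, batch)
print(loss_before, loss_after)
```

The notebook's loop differs in that it draws a new random batch of 100 images for every step, which is known as stochastic gradient descent.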

In [12]:
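# Evaluation
# A prediction is correct when the most probable digit equals the labelled digit
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
# Cast the booleans to floats and average them to get the fraction of correct predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))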

0.9148
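
The evaluation above boils down to comparing argmaxes and averaging. A minimal plain-Python sketch of the same calculation follows; the predictions and labels are invented for illustration and `argmax` and `accuracy` are helper names chosen for this example.

```python
def argmax(values):
    """Index of the largest entry, like tf.argmax applied to one row."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the labelled class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    # Casting the booleans to floats and averaging gives the fraction of True values.
    return sum(float(c) for c in correct) / len(correct)

# Hypothetical model outputs for four images and three classes.
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]
labels = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

print(accuracy(predictions, labels))  # 3 of the 4 predictions are correct: 0.75
```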
