## MNIST for ML Beginners
This notebook works through MNIST for ML Beginners using TensorFlow with Databricks and TensorFrames.

## Cluster set-up

TensorFrames is available as a Spark Package. To use it on your cluster, create a new library with the Source option "Maven Coordinate", using "Search Spark Packages and Maven Central" to find "spark-deep-learning". Then [attach the library to a cluster](https://docs.databricks.com/user-guide/libraries.html). To run this notebook, also create and attach the following libraries: 
* via PyPI: tensorflow
* via Spark Packages: tensorframes

The latest version of TensorFrames is compatible with Spark versions 2.0 or higher and works with any instance type (CPU or GPU).

### MNIST for ML Beginners
This set of cells is based on TensorFlow's [MNIST for ML Beginners](https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html).

The purpose of this notebook is to use TensorFrames and Neural Networks to **automate the identification of handwritten digits** from the [MNIST Database of Handwritten Digits](http://yann.lecun.com/exdb/mnist/). These handwritten digits come from the National Institute of Standards and Technology (NIST) Special Database 3 (Census Bureau employees) and Special Database 1 (high-school students).

![](https://www.tensorflow.org/versions/r0.9/images/MNIST.png)

### Import the Dataset
The MNIST dataset consists of:
* `mnist.train`: 55,000 data points of training data
* `mnist.test`: 10,000 data points of test data
* `mnist.validation`: 5,000 data points of validation data

In [5]:
# Import MNIST digit images data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


### What is the image?
Within this dataset, each 28px x 28px image (a 2D structure) has been flattened into an array of size 784. For this tutorial, we're using a simple algorithm, `softmax regression`, which does not make use of the 2D structure, so flattening the image costs this model nothing. `.images` contains the [x, 784] matrix representing the digits, while `.labels` contains the `One-Hot Vector` representing the actual number.

For example, `mnist.train.images[25138,:]` is the array of 784 pixel intensities for a handwritten digit `9`, as indicated by `mnist.train.labels[25138,:]`.
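
To make the one-hot encoding concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The helper names `one_hot` and `decode` are hypothetical and are not part of the MNIST loader; they only illustrate how a digit maps to a 10-element vector and back.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at index `digit` and 0.0 everywhere else."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in [0, num_classes)")
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]


def decode(vector):
    """Recover the digit: the index of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])


label = one_hot(9)
print(label)          # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
print(decode(label))  # 9
```

Decoding by "index of the largest entry" is the same idea that `tf.argmax` applies later when the model is evaluated.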

In [7]:
# One-Hot Vector for row 25138, representing the number 9
#   The digit n is represented as a vector that is 1 at index n and 0 everywhere else.
mnist.train.labels[25138,:]

In [8]:
# This is the extracted array for xs = 25138 from the training matrix
mnist.train.images[25138,:]

But because the output is printed in 5 columns, it's really hard to see that this is the number **9**.

If you were to take this 784-element array, convert it back to a 28 x 28 matrix, and add a color scale (the higher the number, the darker the value), you would get this matrix:

![](https://dennyglee.files.wordpress.com/2016/06/unflattened-digit-9-small.png)

Here, you can access the [full-size version](https://dennyglee.files.wordpress.com/2016/06/unflattened-digit-9-full.png) of this image.
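
The "unflattening" step can be sketched in plain Python as follows. This is only an illustration: `unflatten` and `render` are hypothetical helpers, and the `flat` list is a made-up stand-in for `mnist.train.images[25138,:]` (a single vertical stroke), not the real digit.

```python
SIDE = 28  # MNIST images are 28 x 28 pixels


def unflatten(pixels, side=SIDE):
    """Split a flat list of side*side values into `side` rows of `side` values."""
    if len(pixels) != side * side:
        raise ValueError("expected %d values, got %d" % (side * side, len(pixels)))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def render(grid, threshold=0.5):
    """Draw the grid as text: '#' for dark pixels, '.' for light ones."""
    return "\n".join("".join("#" if v > threshold else "." for v in row) for row in grid)


# Hypothetical stand-in for a real image: a vertical stroke in column 14.
flat = [1.0 if i % SIDE == 14 else 0.0 for i in range(SIDE * SIDE)]
grid = unflatten(flat)
print(len(grid), len(grid[0]))  # 28 28
print(render(grid))
```

With the real data loaded, passing `list(mnist.train.images[25138,:])` in place of `flat` should draw a rough text version of the digit.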

## Digit Prediction

For this notebook on MNIST digit prediction, we will use the softmax regression model.
For more information, please reference the [softmax regression](https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html#softmax-regressions) analysis.

We know that every image in MNIST is of a handwritten digit between zero and nine. So there are only ten possible things that a given image can be. We want to be able to look at an image and give the probabilities for it being each digit. For example, our model might look at a picture of a nine and be 80% sure it's a nine, but give a 5% chance to it being an eight (because of the top loop) and a bit of probability to all the others because it isn't 100% sure.

This is a classic case where a softmax regression is a natural, simple model. Softmax regression is a generalization of logistic regression to the case where we want to handle multiple classes. If you want to assign probabilities to an object being one of several different things, softmax is the thing to do, because softmax gives us a list of values between 0 and 1 that add up to 1. Even later on, when we train more sophisticated models, the final step will be a layer of softmax.  

A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that evidence into probabilities.

To tally up the evidence that a given image is in a particular class, we do a weighted sum of the pixel intensities. The weight is negative if that pixel having a high intensity is evidence against the image being in that class, and positive if it is evidence in favor.

![](https://raw.githubusercontent.com/joyq2016/demo/master/softmaxtext.png)

But it's often more helpful to think of softmax the first way: exponentiating its inputs and then normalizing them. The exponentiation means that one more unit of evidence increases the weight given to any hypothesis multiplicatively. And conversely, having one less unit of evidence means that a hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero or negative weight. Softmax then normalizes these weights, so that they add up to one, forming a valid probability distribution.

Source: https://www.tensorflow.org/get_started/mnist/beginners
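
The two steps described above (tally the evidence, then convert it into probabilities) can be sketched in plain Python. The 3-pixel "image", the weights and the biases below are made-up values chosen only to show the mechanics; a real MNIST model would use 784 pixels and 10 classes.

```python
import math


def evidence(pixels, weights, bias):
    """Weighted sum of pixel intensities plus a bias, for one class."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def softmax(scores):
    """Exponentiate each score, then normalize so the results sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-pixel "image" and 3 classes.
pixels = [0.0, 1.0, 0.5]
weights = [[1.0, -1.0, 0.0], [0.0, 2.0, 1.0], [-1.0, 0.5, 0.0]]
biases = [0.0, 0.1, -0.1]

scores = [evidence(pixels, w, b) for w, b in zip(weights, biases)]
probs = softmax(scores)
print(scores)             # evidence per class
print(probs, sum(probs))  # probabilities that sum to 1
```

Because of the exponentiation, one extra unit of evidence multiplies a class's unnormalized weight by e, and no class ever ends up with a zero or negative probability.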

#### Let's look at an example below:

![](https://raw.githubusercontent.com/joyq2016/demo/master/softmax.png)

Source: https://cs231n.github.io/linear-classify/


For more information about how softmax regression works, good references include:
- http://www.kdnuggets.com/2016/07/softmax-regression-related-logistic-regression.html
- https://www.tensorflow.org/get_started/mnist/beginners
- Deep dive: http://neuralnetworksanddeeplearning.com/chap3.html#softmax

#### Implementing the softmax regression model

In [13]:
# Import TensorFlow
import tensorflow as tf

# Create the `x` placeholder
#   It can hold any number of MNIST images, each flattened into a 784-dimensional vector
#   This is represented as a 2-D tensor of floating-point numbers, with shape [None, 784]
x = tf.placeholder(tf.float32, [None, 784])

# Set the weights (`W`) and biases (`b`) for our model
#   Use a Variable - a modifiable tensor that lives in TensorFlow's graph of interacting operations
#   We initialize them with `zeros`
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Implement the softmax regression model
y = tf.nn.softmax(tf.matmul(x, W) + b)

#### Training the model
Use the `cross-entropy` cost function to measure how far the model's predictions are from the correct answers; training tries to make this cost as small as possible. For more information, please reference [MNIST for Beginners > Training](https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html#training).
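
As a small worked example of the cost, here is the cross-entropy for a single image, written in plain Python as the negative sum of `y_ * log(y)`. The two prediction vectors are hypothetical; they are not outputs of the model in this notebook.

```python
import math


def cross_entropy(true_dist, predicted_dist):
    """Return -sum(y_true * log(y_pred)), skipping terms where y_true is 0."""
    return -sum(t * math.log(p) for t, p in zip(true_dist, predicted_dist) if t > 0)


# One-hot label for the digit 9.
label = [0.0] * 9 + [1.0]

# Two hypothetical predictions: one confident and right, one hedging.
confident = [0.02] * 8 + [0.04, 0.80]
hedging = [0.1] * 10

print(cross_entropy(label, confident))  # -log(0.80), about 0.223
print(cross_entropy(label, hedging))    # -log(0.10), about 2.303
```

The more probability the model puts on the correct digit, the smaller the cost, which is why minimizing it pushes the model toward correct, confident predictions.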

In [15]:
# Create the `y_` placeholder to input the correct answers (one-hot labels)
y_ = tf.placeholder(tf.float32, [None, 10])

# Implement the `cross-entropy` cost function
#   `softmax_cross_entropy_with_logits` applies softmax internally, so it must be given
#   the unscaled scores (logits), not the softmax output `y`
#cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=tf.matmul(x, W) + b))

# Train using back-propagation (gradient descent optimizer with a learning rate of 0.5)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Launch the model
sess = tf.InteractiveSession()

# Initialize the variables
tf.global_variables_initializer().run()

# Let's train -- running the training step 1000 times on batches of 100 images
#   Equivalent to: sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch_xs, y_: batch_ys})
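
To show what a single gradient descent step does, here is a rough plain-Python sketch on a toy problem (2 features, 3 classes, one example). It is not how TensorFlow implements the optimizer; the helper names are hypothetical, and the weights are stored as one row per class (`W[k]`), which is the transpose of the `[784, 10]` layout used in the notebook.

```python
import math


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, W, b):
    """W[k] holds the weights of class k; returns class probabilities."""
    return softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(b))])


def loss(x, label, W, b):
    """Cross-entropy for one example whose true class is `label`."""
    return -math.log(predict(x, W, b)[label])


def sgd_step(x, label, W, b, learning_rate=0.5):
    """One gradient descent update, in place, for a single example."""
    probs = predict(x, W, b)
    for k in range(len(b)):
        # Gradient of the cross-entropy with respect to score k: p_k - y_k.
        grad = probs[k] - (1.0 if k == label else 0.0)
        for i in range(len(x)):
            W[k][i] -= learning_rate * grad * x[i]
        b[k] -= learning_rate * grad


x, label = [1.0, 2.0], 0
W = [[0.0, 0.0] for _ in range(3)]  # zero-initialized, as in the notebook
b = [0.0, 0.0, 0.0]

before = loss(x, label, W, b)  # log(3), about 1.099, since all classes start equal
sgd_step(x, label, W, b)
after = loss(x, label, W, b)
print(before, after)           # the loss drops after the update
```

The training loop in the notebook repeats this kind of update 1000 times, each time averaging over a batch of 100 images instead of using a single example.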

#### Evaluating the model
To evaluate the model we use the ```tf.argmax(...)``` method: ```tf.argmax(y,1)``` returns, for each image, the index of the largest entry in the predicted vector ```y```, which is the predicted digit. We do the same for the target ```y_```. Using the ```tf.equal(...)``` method we then compare the two to see whether each prediction is correct.

This returns a list of booleans of the form ```[True, False, True, True]```. We then cast it to floats to get (for this example) ```[1,0,1,1]```. The ```tf.reduce_mean(...)``` method takes the list of floats and returns their average, here 0.75. Finally, we feed TensorFlow the test data: when calling the ```.eval(...)``` method we pass the test images ```mnist.test.images``` as ```x``` and the test labels ```mnist.test.labels``` as ```y_```.

Only when ```.eval(...)``` is called does TensorFlow run through all the test images, check whether each was predicted correctly, and return the average accuracy.
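
The same argmax, compare, cast and average pipeline can be sketched in plain Python. The predictions and labels below are made-up 3-class values chosen to reproduce the `[True, False, True, True]` example; the helper names are hypothetical.

```python
def argmax(vector):
    """Index of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])


def accuracy(predictions, labels):
    """Fraction of rows where the predicted class matches the labelled class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    as_floats = [1.0 if c else 0.0 for c in correct]
    return correct, sum(as_floats) / len(as_floats)


# Hypothetical 3-class predictions and one-hot labels for four images.
predictions = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8], [0.2, 0.6, 0.2]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

correct, score = accuracy(predictions, labels)
print(correct)  # [True, False, True, True]
print(score)    # 0.75
```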

In [17]:
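# Compare the predicted digit (largest entry of `y`) with the true digit (largest entry of `y_`)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
# Cast the booleans to floats and average them to get the accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Evaluate the accuracy on the test set
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))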
#print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))