# Testing out TensorFlow
Source: https://www.tensorflow.org/versions/r0.10/tutorials/mnist/pros/index.html

## Simple logistic regression

In [27]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [28]:
import tensorflow as tf

# Set up an interactive session.  When not using IPython, you should use a regular (non-interactive) session.

sess = tf.InteractiveSession()

In [29]:
# Create placeholders to hold input and target

x = tf.placeholder(tf.float32, shape=[None, 784]) # Input: pixels of image (28*28 = 784)
y_ = tf.placeholder(tf.float32, shape=[None, 10]) # Target: one-hot encoding of the digit (0-9)

In [30]:
# Create variables to hold model weights and bias.  They are initialized to 0s.

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

In [31]:
# TensorFlow doesn't actually run anything until you tell it to...
sess.run(tf.initialize_all_variables())

In [32]:
# Define output function as softmax (the multi-class generalization of the sigmoid)
y = tf.nn.softmax(tf.matmul(x,W) + b)

In [33]:
# Define error function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
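
To make the two formulas above concrete, here is a minimal NumPy sketch of softmax and the mean cross-entropy. It is an illustration, not TensorFlow's implementation: the function names and the toy `logits` and `targets` arrays are made up for this example, and the max-subtraction step is a common stability trick that the cell above does not use.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating; this does not change
    # the result but avoids overflow for large logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    # Mean over the batch of -sum_k targets[k] * log(probs[k]).
    return np.mean(-np.sum(targets * np.log(probs), axis=1))

# Toy batch (hypothetical values): 2 examples, 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = cross_entropy(probs, targets)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```

Because `W` and `b` start at zero, every logit is initially 0, the softmax output is uniform over the 10 digits, and the loss before training is log(10), roughly 2.30.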

In [34]:
# Create object to represent training step.  Will train when executed.

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In [35]:
# Train 1000 batches of 100 examples

for i in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
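
For intuition about what each `train_step.run` call does to `W` and `b`, the sketch below performs the same kind of update by hand in NumPy for a softmax regression model. It is an assumption-laden illustration: the data is synthetic (not MNIST), and `sgd_step` and `mean_cross_entropy` are names invented here. TensorFlow derives the gradients automatically; this sketch uses the closed-form gradient for softmax followed by cross-entropy.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(W, b, x_batch, y_batch):
    probs = softmax(x_batch @ W + b)
    return np.mean(-np.sum(y_batch * np.log(probs), axis=1))

def sgd_step(W, b, x_batch, y_batch, learning_rate=0.5):
    """One gradient descent step on the mean cross-entropy of softmax(xW + b)."""
    n = x_batch.shape[0]
    probs = softmax(x_batch @ W + b)
    # For softmax followed by cross-entropy, d(loss)/d(logits) = (probs - targets) / n.
    grad_logits = (probs - y_batch) / n
    W_new = W - learning_rate * (x_batch.T @ grad_logits)
    b_new = b - learning_rate * grad_logits.sum(axis=0)
    return W_new, b_new

# Synthetic stand-in data (hypothetical, not MNIST): 100 examples, 4 features, 3 classes.
rng = np.random.default_rng(0)
x_batch = rng.normal(size=(100, 4))
true_W = rng.normal(size=(4, 3))
y_batch = np.eye(3)[np.argmax(x_batch @ true_W, axis=1)]

W = np.zeros((4, 3))
b = np.zeros(3)
loss_before = mean_cross_entropy(W, b, x_batch, y_batch)
for _ in range(50):
    W, b = sgd_step(W, b, x_batch, y_batch)
loss_after = mean_cross_entropy(W, b, x_batch, y_batch)
print(loss_before, loss_after)
```

The loop in the notebook differs in one respect: it draws a fresh batch of 100 examples on every iteration, whereas this sketch reuses a single batch.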

In [36]:
# Evaluate results.  Again, I have to define the workflow and then evaluate.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9197


That is too low! Let's try a convolutional neural network (CNN).

## CNN

In [37]:
# Weight initialization functions

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

### Convolution and Pooling
Convolution means the same thing as in other contexts, with one caveat: `tf.nn.conv2d` does not flip the filter, so strictly speaking it computes a cross-correlation.  `W` is the filter, whose values are learned during training.  `strides` specifies the step size, with 4 values corresponding to the dimensions of the 4D input `[batch, height, width, channel]`.  In this case, `[1, 1, 1, 1]` means that the filter is applied at every pixel position of every image.  `SAME` uses zero padding so that the output height and width equal the input size divided by the stride (rounded up); with a stride of 1, the output is the same size as the input. [Convolution Docs](https://www.tensorflow.org/versions/r0.10/api_docs/python/nn.html#convolution)

Pooling aggregates the values in windows of adjacent pixels, which reduces the size of the input for the next step.  `ksize` specifies the size of the window in each dimension and uses the same convention as `strides`.  [Pooling Docs](https://www.tensorflow.org/versions/r0.10/api_docs/python/nn.html#pooling)


In [38]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
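
As a rough illustration of what these two helpers compute, here is a NumPy sketch for a single-channel 2-D image. It is a simplification, not the TensorFlow implementation: `conv2d_same` and `max_pool_2x2_numpy` are names invented for this example, they ignore the batch and channel dimensions, and the 3x3 box filter is an arbitrary choice.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1 'SAME' cross-correlation of a 2-D image with an odd-sized 2-D kernel."""
    kh, kw = kernel.shape
    # Zero-pad so that the output has the same height and width as the input.
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # The kernel is applied without flipping (cross-correlation).
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2_numpy(image):
    """Max over non-overlapping 2x2 windows; assumes even height and width."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
box = np.ones((3, 3))

convolved = conv2d_same(image, box)   # shape stays (4, 4)
pooled = max_pool_2x2_numpy(image)    # shape shrinks to (2, 2)
print(convolved.shape, pooled.shape)
print(pooled)
```

The convolution keeps the 4x4 size because of the zero padding, while the pooling step halves each spatial dimension.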

### First Convolution Layer
We can now implement our first layer. It will consist of convolution, followed by max pooling. The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of `[5, 5, 1, 32]`. The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We will also have a bias vector with a component for each output channel.

In [39]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

To apply the layer, we first reshape `x` to a 4D tensor, with the second and third dimensions corresponding to image height and width, and the final dimension corresponding to the number of color channels.

In [40]:
x_image = tf.reshape(x, [-1,28,28,1])
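
The same reshape can be tried in NumPy to see where each flat pixel ends up. This is a sketch with made-up data: `flat` is a stand-in for a batch of two 784-pixel vectors whose values are simply their own indices, and it assumes the flattened images are stored row by row.

```python
import numpy as np

# Two fake "images" whose pixel values are their flat indices (0..783 and 784..1567).
flat = np.arange(2 * 784).reshape(2, 784)

# -1 lets the batch dimension be inferred, as in the TensorFlow call.
images = flat.reshape(-1, 28, 28, 1)

print(images.shape)        # (2, 28, 28, 1)
print(images[0, 0, 27, 0]) # last pixel of the first row
print(images[0, 1, 0, 0])  # first pixel of the second row
```

Flat index 27 lands at row 0, column 27, and flat index 28 wraps to row 1, column 0.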

We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.

In [41]:
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

### Second Convolutional Layer
In order to build a deep network, we stack several layers of this type. The second layer will have 64 features for each 5x5 patch.

In [42]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

### Densely Connected Layer

Now that the image size has been reduced to 7x7, we add a fully-connected layer with 1024 neurons to allow processing on the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.

In [43]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

#### Dropout

To reduce overfitting, we will apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This allows us to turn dropout on during training, and turn it off during testing. TensorFlow's `tf.nn.dropout` op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling.

In [44]:
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
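
The masking and scaling described above can be sketched in NumPy as follows. This is an illustration of the general "inverted dropout" idea rather than TensorFlow's code: the `dropout` function, the random generator, and the all-ones `activations` array are assumptions made for this example.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    # Keep each element with probability keep_prob, zero it otherwise,
    # then scale the survivors by 1 / keep_prob so the expected value is unchanged.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))

dropped = dropout(activations, 0.5, rng)    # training: about half the values are zeroed
unchanged = dropout(activations, 1.0, rng)  # testing: keep_prob = 1.0 is a no-op

print(np.unique(dropped))  # surviving values are scaled up to 2.0
print(dropped.mean())      # stays close to the original mean of 1.0
```

This is why feeding `keep_prob: 1.0` at evaluation time is enough to disable dropout, with no separate rescaling step.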

### Readout Layer

Finally, we add a softmax layer, just like for the one-layer softmax regression above.

In [45]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
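
As a sanity check on the shapes used in the layers above, the following plain-Python sketch traces the image size through both convolution and pooling stages and counts the trainable parameters. The helper `same_output_size` and the `layers` dictionary are introduced here for illustration only; the shapes themselves are copied from the cells above.

```python
import math

def same_output_size(size, stride):
    # With 'SAME' padding, the output length is ceil(size / stride).
    return math.ceil(size / stride)

side = 28
for _ in range(2):
    side = same_output_size(side, 1)  # 5x5 convolution, stride 1: size unchanged
    side = same_output_size(side, 2)  # 2x2 max pooling, stride 2: size halved

flat_size = side * side * 64  # 64 feature maps after the second convolution

# Weight shape and bias length for each layer, as defined in the notebook.
layers = {
    "conv1": ([5, 5, 1, 32], 32),
    "conv2": ([5, 5, 32, 64], 64),
    "fc1": ([flat_size, 1024], 1024),
    "fc2": ([1024, 10], 10),
}
param_counts = {name: math.prod(shape) + bias
                for name, (shape, bias) in layers.items()}
total_params = sum(param_counts.values())

print(side, flat_size)  # 7 3136
print(param_counts)
print(total_params)     # 3274634
```

Almost all of the roughly 3.3 million parameters sit in the first fully-connected layer, which is one reason dropout is applied to its output.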

## Train and Evaluate the Model

How well does this model do? To train and evaluate it, we will use code that is nearly identical to that for the simple one-layer softmax network above.

The differences are that:

* We will replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer.

* We will include the additional parameter keep_prob in feed_dict to control the dropout rate.

* We will add logging to every 1000th iteration in the training process.

Feel free to go ahead and run this code, but it does 20,000 training iterations and may take a while (possibly up to half an hour), depending on your processor.
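
For readers curious about how the ADAM optimizer mentioned in the list above differs from plain gradient descent, here is a minimal NumPy sketch of its update rule as described in the original Adam paper. It is not TensorFlow's implementation, which may differ in details; `adam_step` is a name invented here, the quadratic objective is a toy problem, and the learning rate of 0.01 is chosen only so the toy converges quickly.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (hypothetical): minimize f(theta) = sum(theta ** 2), whose gradient is 2 * theta.
start = np.array([1.0, -2.0])
theta = start.copy()
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)

print(theta)  # much closer to the minimum at 0 than the starting point
```

Each parameter gets its own effective step size, scaled by the recent magnitude of its gradient, which is why a small base rate such as the `1e-4` used in the next cell can still make steady progress.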

In [46]:
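# Loss: mean cross-entropy between the one-hot targets and the CNN's softmax output
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
# Adam with a learning rate of 1e-4 replaces plain gradient descent
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Initialize all variables, including the new CNN weights and Adam's internal state
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%1000 == 0:
    # Report accuracy on the current batch with dropout disabled (keep_prob = 1.0)
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  # Train with dropout enabled: each neuron's output is kept with probability 0.5
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})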

step 0, training accuracy 0.1
step 1000, training accuracy 0.96
step 2000, training accuracy 0.94
step 3000, training accuracy 0.94
step 4000, training accuracy 0.98
step 5000, training accuracy 1
step 6000, training accuracy 1
step 7000, training accuracy 1
step 8000, training accuracy 1
step 9000, training accuracy 1
step 10000, training accuracy 1
step 11000, training accuracy 1
step 12000, training accuracy 1
step 13000, training accuracy 1
step 14000, training accuracy 1
step 15000, training accuracy 1
step 16000, training accuracy 1
step 17000, training accuracy 1
step 18000, training accuracy 1
step 19000, training accuracy 1


In [47]:
# Evaluate model accuracy
print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

test accuracy 0.9928


The final test set accuracy after running this code should be approximately 99.2%; the run above reached 99.28%.