Convolutional Neural Networks with TensorFlow
===
**Most cutting edge machine learning technology**

"Not sure we can make an improvement on 97.2%

In [1]:
import pandas as pd
import numpy  as np

# Image Tools
from matplotlib import pyplot as plt
from PIL        import Image

# filesystem tools that allow for file manipulation
import os
from glob import glob

# Machine Learning Tools
import tensorflow as tf

%matplotlib inline

  return f(*args, **kwds)


In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist  = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


Remember that this image set had been flattened. Convolutional Neural Networks want 2D inputs.
- They walk the convolution kernel over the entire image

**We need to reshape our original data set from flattened into 'real' images**

In [3]:
mnist.train.images.reshape(-1, 28,28,1).shape

(55000, 28, 28, 1)

# Insights from Below:

1. It seems that the **big** difference between a 'canonical' neural network and a 'convolutional' neural network
is that in the forward propogation, or prediction, function (here call 'model'), a canonical NN uses `tf.matmul` or matrix multiplication (a.k.a. the dot product) to 'mix' the weights with the layers.  **But**, the 'convolutional' NN uses a 2D convolution to 'mix' the weights with the layers.
    - this is a **big**, **big** deal; it's a major difference between the two methods and also creates a completely different layout for the interpretation of how the network interacts with the data, layers, and predictions.
 
2. Softmax creates a logit layer(?)
    - Mike Bernico said "and this will return to logit layer" when doing the `tf.nn.softmax` operation

# Part 1: Configure and Generate the Graph

In [4]:
with tf.device('/gpu:0'):
    graph = tf.Graph()
    with graph.as_default():
        # Variables: Hyper Parameters
        batch_size    = 128   # Mini batch size for SGD (Stochastic Gradient Descent)
        num_hidden    = 1024  # Depth of the hidden layer
        keep_prob     = 0.5   # Dropout keep probability (50%)
        num_labels    = 10    # One hot encoding ==> 10 labels; one for each of the 10 classes (0,1)
        patch_size    = 5     # THIS IS NEW
        num_channels  = 1     # Grayscale:1, RGB:1 THIS IS NEW
        depth         = 16    # THIS IS NEW
        image_size    = 28
        
        def weight_variable(shape):
            # There are going to be so many weight variables, that is it frugle
            #   to create a function that returns the initialized tensors for you
            # Function takes in `shape` and returns `tf.Variable` with 
            #  `tf.truncated_normal` config
            stddev0 = 0.1
            initial = tf.truncated_normal(shape, stddev=stddev0)
            return tf.Variable(initial)
        
        def bias_variable(shape):
            # There are going to be so many bias variables, that is it frugle
            #   to create a function that returns the initialized tensors for you
            # Function takes in `shape` and returns `tf.Variable` with `tf.constant` config
            b0      = 0.1
            initial = tf.constant(b0, shape=shape) ## ** Why is this `tf.constant` instead of `tf.zeros`
            return tf.Variable(initial)
        
        def conv2d(x,W):
            # Build 2D Convolutional layers for the network
            # We are going to need so many of them, it's better to adapt 
            #   a function to initialize them
            strides0  = [1,1,1,1]
            return tf.nn.conv2d(x, W, strides=strides0, padding='SAME')
        
        def max_pool_2x2(x):
            # Build 2D Max Pooling layers for the network
            # We are going to need so many of them, it's better to adapt 
            #   a function to initialize them
            ksize0  = [1,2,2,1]
            strides1= [1,2,2,1]
            return tf.nn.max_pool(x, ksize=ksize0, strides=strides1, padding='SAME')
        
        # Input Data. For the training data, we use a placeholder that will be fed
        #   at run time with a training minibatch.
        #
        # *** GETTING THE MATRICES RIGHT IS PROBABLY THE HARDEST PART ABOUT USING ConvNets
        tf_train_dataset  = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
        tf_train_labels   = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        
        # Have to reshape the data set to be the correct size for 2D images in the 2D convolutions
        tf_valid_dataset  = tf.constant(mnist.validation.images.reshape(-1, 28,28,1))
        tf_test_dataset   = tf.constant(mnist.test.images.reshape(-1, 28,28,1))
        
        keep_prob         = tf.placeholder(tf.float32) # last time it was `tf.placeholder('float')`
        
        ## Initialize the Variables for all of the Weights and Biases
        
        # Two Convolutionaly ("conv") layers: 2 weights, 2 biases
        W_conv1 = weight_variable([5,5,1,32])
        b_conv1 = bias_variable([32])
        W_conv2 = weight_variable([5,5,32,64])
        b_conv2 = bias_variable([64])
        
        # Two fully connect ("fc") layers (max pooling?): 2 weights, 2 biases
        W_fc1   = weight_variable([7*7*64, 1024])
        b_fc1   = bias_variable([1024])
        W_fc2   = weight_variable([1024, 10])
        b_fc2   = bias_variable([10])
        
        def model(data):
            """ Assembles the NN """
            h_conv1 = tf.nn.relu(conv2d(data, W_conv1) + b_conv1)
            h_pool1 = max_pool_2x2(h_conv1)
            h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
            h_pool2 = max_pool_2x2(h_conv2)
            
            # We have to flatten the weights in order to 'mix' them with the more 
            #   'canonical', fully connected layers
            h_pool2_flat  = tf.reshape(h_pool2, [-1, 7*7*64]) 
            
            # Also acts like a hidden layer with ReLU activation
            h_fc1   = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
            
            # Now process that through the dropout 'layer' or 'filter'
            h_fc1_drop  = tf.nn.dropout(h_fc1, keep_prob)
            
            # Logit layer, return the softmax of the matrix multiplication
            #  ** GUESS ** softmax == logit layer?
            return tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
        
        # Training Computations
        logits  = model(tf_train_dataset) # prediction from `tf_train_dataset` placeholder
        loss    = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
        ### ****************************************** ###
        ### Why is there no regularized_loss term?
        ### Instructor "removed the regularization because it just gets too complicated with this many layers
        ### ****************************************** ###
        
        # Optimizer: Stochastic Gradient Descent (SGD)
        SGD_step_size = 0.5
        optimizer     = tf.train.GradientDescentOptimizer(SGD_step_size).minimize(loss)
        
        ### ******************************************************************************** ###
        ### I can't believe that I forgot the ACTIVATE the minimize with `.minimize(loss)`   ###
        ### ******************************************************************************** ###
        
        # Predictions for the training, validation, and test data
        train_prediction  = tf.nn.softmax(logits) # WHY are we doing a SOFTMAX on the SOFTMAX?
        valid_prediction  = tf.nn.softmax(model(tf_valid_dataset))
        test_prediction   = tf.nn.softmax(model(tf_test_dataset))
        
        # Initialize Saver
        saver = tf.train.Saver()

In [5]:
def accuracy(predictions, labels):
    return 100.0*np.sum(np.argmax(predictions, 1) == np.argmax(labels,1)) / predictions.shape[0]

In [6]:
from time import time
num_steps     = 2000
print_step    = 100
time_add      = 0

on_keep_prob  = 0.5
off_keep_prob = 1.0

start0 = time()
with tf.Session(graph=graph, config=tf.ConfigProto(log_device_placement=True)) as session:
    tf.initialize_all_variables().run()
    print("Initialized at %.2f" % time())
    
    for step in range(num_steps):
        start1 = time()
        # Generate a minibatch
        batch_data, batch_labels  = mnist.train.next_batch(batch_size)
        
        # Reshape this batch
        batch_data  = batch_data.reshape(-1,28,28,1)
        
        # Prepare a dictionary telling the session where to feed the minibatch
        # The `keys` for the dictionary is the placeholder node of the graph to be fed,
        # and the `values` are the numpy array to feed to it
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, keep_prob : on_keep_prob}
        
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        time_add += (time() - start1)
        if step % print_step == 0:
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            valid_predNow = valid_prediction.eval(feed_dict={keep_prob:off_keep_prob})
            print("Validation accuracy: %.1f%%" % accuracy(valid_predNow, mnist.validation.labels.astype(float)))
            print("This operation took %.2f seconds" % time_add)
            time_add  = 0
    
    save_path = saver.save(session, "./conv_net_mnist_model.ckpt")
    print("Model saved in file: %s" % save_path)
    
    test_predNow = test_prediction.eval(feed_dict={keep_prob:off_keep_prob})
    print("Test accuracy: %.1f%%" % accuracy(test_predNow, mnist.test.labels.astype(float)))
    print("This operation took %.2f seconds" % (time() - start0))

Instructions for updating:
Use `tf.global_variables_initializer` instead.
Initialized at 1511385435.79
Minibatch loss at step 0: 2.348978
Minibatch accuracy: 9.4%
Validation accuracy: 9.9%
This operation took 0.34 seconds
Minibatch loss at step 100: 2.343959
Minibatch accuracy: 11.7%
Validation accuracy: 11.0%
This operation took 20.89 seconds
Minibatch loss at step 200: 2.359588
Minibatch accuracy: 10.2%
Validation accuracy: 11.0%
This operation took 22.92 seconds
Minibatch loss at step 300: 2.359588
Minibatch accuracy: 10.2%
Validation accuracy: 11.0%
This operation took 23.21 seconds
Minibatch loss at step 400: 2.343963
Minibatch accuracy: 11.7%
Validation accuracy: 11.0%
This operation took 23.49 seconds
Minibatch loss at step 500: 2.359588
Minibatch accuracy: 10.2%
Validation accuracy: 11.0%
This operation took 23.38 seconds
Minibatch loss at step 600: 2.336151
Minibatch accuracy: 12.5%
Validation accuracy: 11.0%
This operation took 23.95 seconds
Minibatch loss at step 700: 2.1056

# Convolutional Neural Networks Results

**BLOWN AWAY Part 1:** Mike Bernico's code achieved **95.9%** Validation Accuracy in just **500** steps!!

**BLOWN AWAY Part 2:** Mike Bernico's code achieved **98.9%** Validation Accuracy after 8000 steps!!

**BLOWN AWAY Part 3:** Mike Bernico's code achieved **99.0%** Test Accuracy after 8000 steps (and 10 minutes)!!

1. OMG this is much much slower; but that was definitely expected!
2. My accuracy step takes longer than if 500 iterations
    - I must not be using multiprocessing
    - Just to compute the validation accuracy took about 5 minutes D:
3. Instructor says "the world record for MNIST is 99.77% accuracy"
    - He thinks that 99.0% is not close enough to be excited about, but I am blown away!
    - Maybe some hyperparameter manipulation could do it
4. World record was set using a ConvNN like this one.
    - They didn't use ONE, they used an ensemble of 34 convolutional NN's
    - Just like DROPOUT, more than one is always better than just one
    - **Ensembles are a way to usually improve a model**
5. ConvNN are really good at traditional (trivial) problems like MNIST, but they **revolutionary** at real image recognition
    - They can do much more amazing things when you want to recognize is it a cat or a dog
        - is it a broken car or a fixed car
        - is it a person in the cross walk or a light
    - **All** of the self-driving car technology revolve around what we've done with ConvNets
6. History of this lesson:
    - Started with multinomial logistic regression
    - Created a multilayer perceptron to solve the MLR (above)
    - Now we've implemented the convolutional neural networks