# Creating a deep learning Architecture

*Author*: Frank Fichtenmueller <br>
*Goal*: Sample implementation to learn about the syntax of TensorFlow<br>
*Date*: 12/05/2017

<hr>
By stacking multiple layers, the goal is to let the network learn 2-D spatial features that improve the accuracy of the prediction.

Building on top of [2017-05-12-ff-NeuralNetwork](http://localhost:8891/notebooks/Model_Implementations/2017-05-12-ff-NeuralNetwork.ipynb), we now exploit the 2-D layout of the pictures by using a 'convolutional neural network' to compress and learn spatial features, which should help increase accuracy on the harder-to-decipher parts of the data.

Architecture: <br>
- A convolutional layer learns filters on small spatial patches of the image and, over time, generalizes them into feature maps that respond to parts of specific digit shapes.
- A pooling layer then downsamples these feature maps into a smaller representation. Pooling has no trainable parameters; it creates a bottleneck that keeps the model from overfitting to specifics and improves generalization.
- [convolution, pooling] is repeated twice. The second combination learns patterns in the arrangement of the features found by the first, and therefore more abstract patterns.
- The output is then fed into a fully connected layer, whose weights and biases combine the individual features towards classification results.
- 10 output neurons, one per digit, are passed through a softmax function for multi-class classification, which sharpens the separation between high- and low-valued predictions.
- Finally, the 'loss function' measures the prediction error, and backpropagation uses its gradients to adjust the weights and bias terms, starting at the output layer and passing derivatives back through each earlier layer in turn.
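As a rough sanity check of the architecture above, the tensor shapes can be traced by hand. The sketch below assumes 28x28 greyscale inputs, 'SAME' padding, stride-1 convolutions with 32 and 64 features, and 2x2 pooling with stride 2, as used in the cells of this notebook. The helper `same_out` is a hypothetical name introduced only for illustration.

```python
import math

def same_out(size, stride):
    # With 'SAME' padding the output size depends only on the stride
    return math.ceil(size / stride)

h = w = 28
c = 1
shapes = [(h, w, c)]
for features in (32, 64):
    # Convolution: stride 1 keeps the spatial size and sets the feature count
    h, w, c = same_out(h, 1), same_out(w, 1), features
    shapes.append((h, w, c))
    # Pooling: stride 2 halves the spatial size
    h, w = same_out(h, 2), same_out(w, 2)
    shapes.append((h, w, c))

# Number of values handed to the fully connected layer
flat = h * w * c
for shape in shapes:
    print(shape)
print(flat)  # 3136
```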

Reduce Overfitting: <br>
- Our model has enough degrees of freedom to learn all relevant features within our training data perfectly. The risk of overfitting to sample specifics is therefore high.
- We use 'dropout' on the fully connected layer to force the classifier to learn distributed submodels on the same data and not rely too much on the presence of specific features (nodes).
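To make the idea concrete, here is a minimal pure-Python sketch of dropout on a list of activations. It assumes the 'inverted dropout' convention, where kept values are scaled by 1/keep_prob so that their expected sum is unchanged. The function name `dropout` and the sample values are illustrative only.

```python
import random

def dropout(activations, keep_prob, rng):
    # Keep each unit with probability keep_prob and rescale it by
    # 1/keep_prob; dropped units become 0.0
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
acts = [1.0] * 10000
dropped = dropout(acts, 0.5, rng)

# Roughly half of the units survive, and the mean stays close to 1.0
kept_fraction = sum(1 for d in dropped if d != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

# With keep_prob = 1.0 (as used for evaluation) nothing is dropped
unchanged = dropout([0.3, 0.7], 1.0, rng)
```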

In [3]:
import tensorflow as tf


In [2]:
# Get Data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [1]:
# Define path to TensorBoard log files
logPath = "./tb_logs/"

# Define a function that collects statistics for TensorBoard
def variable_summaries(var):
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var-mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)
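For intuition, the scalar statistics that `variable_summaries` logs can be reproduced on a plain list. This is a hedged sketch rather than TensorFlow code, and the helper name `summarize` is hypothetical. Note that the standard deviation is the population form (dividing by N), matching the `reduce_mean` of squared deviations above.

```python
import math

def summarize(values):
    mean = sum(values) / len(values)
    # Population standard deviation: mean of squared deviations, then sqrt
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {'mean': mean, 'stddev': stddev,
            'max': max(values), 'min': min(values)}

stats = summarize([1.0, 2.0, 3.0, 4.0])
```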

In [4]:
sess = tf.InteractiveSession()

In [6]:
with tf.name_scope('MNIST_Input'):
    # Define the placeholders for MNIST input data
    x = tf.placeholder(tf.float32, shape=[None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])

with tf.name_scope('Input_Reshape'):
    # Reshape each flattened 784-vector into a 28x28x1 image (a 4-tensor including the batch dimension)
    x_image = tf.reshape(x, [-1, 28,28,1], name='x_image')
    tf.summary.image('input_img', x_image, 5)
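The reshape relies on row-major ordering: pixel `i` of the flattened vector lands at row `i // 28`, column `i % 28`. A minimal pure-Python sketch for a single image, using a made-up ramp image as input (`to_image` is a hypothetical helper):

```python
def to_image(flat, height=28, width=28):
    # Split the flat pixel list into rows, then wrap each pixel in a
    # one-element list to model the single greyscale channel
    assert len(flat) == height * width
    return [[[flat[r * width + c]] for c in range(width)]
            for r in range(height)]

flat = list(range(784))  # hypothetical image whose pixel value equals its index
image = to_image(flat)
```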

In [7]:
# We use 'ReLU' as the activation function, so we initialize the weights with
# small random values to break symmetry, and the biases with a small positive
# value so that the ReLU units do not start out inactive

def weight_variable(shape, name=None):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name=None):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)
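`tf.truncated_normal` draws from a normal distribution but re-draws any value that falls more than two standard deviations from the mean. A rough pure-Python sketch of that behaviour (the function `truncated_normal` below is a stand-in for illustration, not the TensorFlow implementation):

```python
import random

def truncated_normal(n, stddev, rng, mean=0.0):
    values = []
    while len(values) < n:
        v = rng.gauss(mean, stddev)
        # Reject and re-draw samples beyond two standard deviations
        if abs(v - mean) <= 2 * stddev:
            values.append(v)
    return values

rng = random.Random(42)
weights = truncated_normal(1000, 0.1, rng)
```

With `stddev=0.1`, every initial weight therefore lies within [-0.2, 0.2].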

We isolate the creation of the convolution and pooling layers, so that we can easily set parameters on the whole network in a single place. 

- Convolution layers set the stride and the padding
- Max pooling sets the kernel size, which determines the size of the window we pool together, and its stride

In [8]:
# Create functions to set up convolution and pooling layers for us
def conv2d(x, W, name=None):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME', name=name)

def max_pool_2x2(x, name=None):
    return tf.nn.max_pool(x, ksize=[1,2,2,1],
                         strides=[1,2,2,1], padding='SAME', name=name)
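To illustrate what 2x2 max pooling with stride 2 does, here is a small pure-Python sketch on a single-channel 4x4 grid. It assumes even input dimensions, so no padding is needed. The function `max_pool_2x2_py` and the sample grid are illustrative only.

```python
def max_pool_2x2_py(grid):
    # Take the maximum of each non-overlapping 2x2 window
    return [[max(grid[r][c], grid[r][c + 1],
                 grid[r + 1][c], grid[r + 1][c + 1])
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]

grid = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool_2x2_py(grid)
print(pooled)  # [[4, 2], [2, 8]]
```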

## Defining the Layers of the Neural Network

We initialize the layers and implement the architectural definitions by setting parameters to the model layers.

### 1. Convolutional Layer

Given our decision to convolve over patches of 5x5 pixels, we choose to learn 32 individual features (filters) per patch. Each feature has its own set of weights and an individual bias term.

- Therefore we create a 4-tensor weight variable 'W_conv1': [5,5,1,32]
    - 5x5 patch size
    - 1 input channel (for greyscale)
    - 32 output features
- A 1-tensor bias variable 'b_conv1': [32]
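The shapes above determine how many trainable values the layer holds. A quick check, assuming one bias per output feature as listed above (`conv_params` is a hypothetical helper):

```python
def conv_params(patch_h, patch_w, in_channels, out_features):
    # One weight per patch position, input channel and output feature
    weights = patch_h * patch_w * in_channels * out_features
    # One bias per output feature
    biases = out_features
    return weights + biases

conv1_params = conv_params(5, 5, 1, 32)
print(conv1_params)  # 832
```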

In [9]:
with tf.name_scope('Conv1'):
    with tf.name_scope('weights'):
        W_conv1 = weight_variable([5,5,1,32], name='weight')
        variable_summaries(W_conv1)
    with tf.name_scope('bias'):
        b_conv1 = bias_variable([32], name='bias')
        variable_summaries(b_conv1)

    # Do convolution on images, add bias and push through RELU activation
    conv1_wx_b = conv2d(x_image, W_conv1, name='conv2d') + b_conv1
    tf.summary.histogram('conv1_wx_b', conv1_wx_b)
    h_conv1 = tf.nn.relu(conv1_wx_b, name='relu')
    tf.summary.histogram('h_conv1', h_conv1)
    # Take results and run them through max_pool
    h_pool1 = max_pool_2x2(h_conv1, name='pool')
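For intuition about what `conv2d` followed by `relu` computes, the sketch below applies a single filter to a single-channel image in pure Python. It assumes stride 1, zero 'SAME' padding and an odd-sized square kernel. Like `tf.nn.conv2d`, it does not flip the kernel (cross-correlation). The names `conv2d_same` and `relu` and the sample values are illustrative stand-ins.

```python
def conv2d_same(image, kernel, bias=0.0):
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[bias] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    # Positions outside the image count as zero padding
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += image[rr][cc] * kernel[i][j]
    return out

def relu(grid):
    # Negative responses are clipped to zero
    return [[max(0.0, v) for v in row] for row in grid]

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
# Hypothetical 3x3 kernel: centre pixel minus the pixel to its right
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, -1.0],
          [0.0, 0.0, 0.0]]
activated = relu(conv2d_same(image, kernel))
print(activated)  # [[0.0, 0.0, 3.0], [0.0, 0.0, 6.0], [0.0, 0.0, 9.0]]
```

The output keeps the 3x3 size of the input, which is the effect of 'SAME' padding at stride 1.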

### 2. Convolutional Layer

This layer processes the output of layer 1 in 5x5 patches and returns 64 features, each with its own weights and bias term.

- Therefore we create a 4-tensor weight variable 'W_conv2': [5,5,32,64]
    - 5x5 patch size
    - 32 input channels (features from layer one)
    - 64 output features
- A 1-tensor bias variable 'b_conv2': [64]
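One way to see why the second convolution can pick up more abstract patterns is to track how much of the original image each unit can 'see' (its receptive field). A minimal sketch, assuming the 5x5 stride-1 convolutions and 2x2 stride-2 pooling used in this notebook. The function name and the layer list are illustrative.

```python
def receptive_fields(layers):
    rf, jump = 1, 1
    sizes = []
    for name, kernel, stride in layers:
        # Each layer widens the field by (kernel - 1) steps of the current jump
        rf += (kernel - 1) * jump
        # Strides multiply the distance between neighbouring units
        jump *= stride
        sizes.append((name, rf))
    return sizes

layers = [('conv1', 5, 1), ('pool1', 2, 2), ('conv2', 5, 1), ('pool2', 2, 2)]
fields = receptive_fields(layers)
print(fields)  # [('conv1', 5), ('pool1', 6), ('conv2', 14), ('pool2', 16)]
```

Under these assumptions, a unit after the second pooling step covers a 16x16 region of the 28x28 input, compared with 5x5 after the first convolution.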

In [2]:
with tf.name_scope('Conv2'):
    # Process the 32 features from Conv1 in 5x5 patches. Return 64 features
    with tf.name_scope('weights'):
        W_conv2 = weight_variable([5,5,32,64], name='weight')
        variable_summaries(W_conv2)
    with tf.name_scope('bias'):
        b_conv2 = bias_variable([64], name='bias')
        variable_summaries(b_conv2)
    
    # Do convolution on the output of layer 1. Pool results
    conv2_wx_b = conv2d(h_pool1, W_conv2, name='conv2d') + b_conv2
    tf.summary.histogram('conv2_wx_b', conv2_wx_b)
    h_conv2 = tf.nn.relu(conv2_wx_b, name='relu')
    tf.summary.histogram('h_conv2', h_conv2)
    h_pool2 = max_pool_2x2(h_conv2, name='pool')


### 3. Implement a fully connected Layer

This layer receives a 7x7 representation of the images and combines its features in 1024 neurons. Its output is later mapped to 10 class scores to classify the labels 0-9.

- Input is a 7x7 image with 64 features per position, flattened to 7*7*64 = 3136 values
- The fully connected layer has 1024 neurons in total
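The sizes in this layer follow from the two pooling steps: each halves the 28x28 image, giving 14x14 and then 7x7. A short arithmetic check (variable names are illustrative):

```python
side = 28
for _ in range(2):
    # Each 2x2 pooling step with stride 2 halves the side length
    side //= 2

# 64 features per position, flattened into one vector per image
flat_inputs = side * side * 64
# One weight per (input, neuron) pair plus one bias per neuron
fc_params = flat_inputs * 1024 + 1024
print(side, flat_inputs, fc_params)  # 7 3136 3212288
```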

In [11]:
with tf.name_scope('FC'):
    # Implementing the Fully Connected Layer
    W_fc1 = weight_variable([7*7*64, 1024], name='weight')
    b_fc1 = bias_variable([1024], name='bias')

    # Connect output of pooling layer 2 as input to full connected layer
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1, name='relu')

Because this very powerful model can easily overfit the comparatively small dataset we train it on, we apply 'dropout' to the fully connected layer before passing the results to the classification output.

In [12]:
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

### 4. Implementing the 'Readout Layer'

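This layer takes the values from the fully connected layer and computes one unnormalized score (logit) per class. The softmax that turns these scores into class probabilities is applied inside the loss function.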

In [13]:
with tf.name_scope('Readout'):
    # Implementing the Layer
    W_fc2 = weight_variable([1024, 10], name='weight')
    b_fc2 = bias_variable([10], name='bias')

    # Defining the model
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
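`y_conv` holds raw scores (logits). A softmax turns such scores into probabilities. The sketch below is a plain-Python version that subtracts the maximum first for numerical stability. The logit values are made up.

```python
import math

def softmax(logits):
    # Subtracting the maximum leaves the result unchanged but avoids overflow
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# The predicted class is the index of the largest probability
predicted = probs.index(max(probs))
```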

Implementing the 'loss function', whose gradients drive backpropagation

In [14]:
with tf.name_scope('cross_entropy'):
    # Loss measurement
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=y_conv, labels=y_))

with tf.name_scope('loss_optimizer'):
    # loss optimization
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
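For a single example, the loss above reduces to the negative log of the softmax probability assigned to the true class. A hedged pure-Python sketch with one-hot labels (the function name and values are illustrative; the TensorFlow op works on whole batches):

```python
import math

def softmax_cross_entropy(logits, one_hot):
    # log-sum-exp with the maximum subtracted for numerical stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    # -sum(y * log_softmax(z)); only the true class contributes
    return -sum(y * (z - log_sum) for z, y in zip(logits, one_hot))

label = [1.0] + [0.0] * 9
# Equal scores for all 10 classes: the loss equals log(10)
uniform_loss = softmax_cross_entropy([0.0] * 10, label)
# A large score on the correct class: the loss is close to zero
confident_loss = softmax_cross_entropy([10.0] + [0.0] * 9, label)
```

An untrained network that spreads its probability evenly over the 10 digits would start with a loss near log(10), about 2.30.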

Defining the accuracy Calculations

In [15]:
with tf.name_scope('Accuracy'):
    # What is correct?
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_,1))
    # How accurate
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
# Add functions to collect summary on the accuracy changes
tf.summary.scalar('cross_entropy_scl', cross_entropy)
tf.summary.scalar('training_accuracy', accuracy)
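The accuracy node compares the index of the largest score with the index of the 1 in the one-hot label and averages the matches. The same logic in plain Python, on made-up scores (`argmax` and `batch_accuracy` are illustrative helpers):

```python
def argmax(values):
    # Index of the largest entry
    return max(range(len(values)), key=lambda i: values[i])

def batch_accuracy(scores, labels):
    # 1.0 for every example whose predicted index matches the label index
    hits = [1.0 if argmax(s) == argmax(l) else 0.0
            for s, l in zip(scores, labels)]
    return sum(hits) / len(hits)

scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
acc = batch_accuracy(scores, labels)
print(acc)  # 0.75
```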

In [None]:
# TB - merge summaries
summarize_all = tf.summary.merge_all()

In [16]:
# Initialize all of the variables
sess.run(tf.global_variables_initializer())

# TB - Write the default graph out to view its structure
tbWriter = tf.summary.FileWriter(logPath, sess.graph)

Training the model

In [17]:
# Set variables to control the training iterations
import time
num_steps = 1000
display_every = 100

# Training Loop
start_time = time.time()
end_time = time.time()

for i in range(num_steps):
    batch = mnist.train.next_batch(50)
    _, summary = sess.run([train_step, summarize_all], feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    
    # Periodic status display
    if i%display_every == 0:
        train_accuracy = accuracy.eval(feed_dict= {
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        end_time = time.time()
        print("step {0}, elapsed time {1: .2f} seconds, training accuracy {2: .3f}%".
              format(i, end_time-start_time, train_accuracy* 100))
        #Write summary to log
        tbWriter.add_summary(summary, i)

step 0, elapsed time  0.37 seconds, training accuracy  4.000%
step 100, elapsed time  17.90 seconds, training accuracy  90.000%
step 200, elapsed time  35.29 seconds, training accuracy  94.000%
step 300, elapsed time  52.54 seconds, training accuracy  90.000%
step 400, elapsed time  69.83 seconds, training accuracy  90.000%
step 500, elapsed time  87.16 seconds, training accuracy  92.000%
step 600, elapsed time  104.43 seconds, training accuracy  94.000%
step 700, elapsed time  121.65 seconds, training accuracy  96.000%
step 800, elapsed time  138.92 seconds, training accuracy  96.000%
step 900, elapsed time  156.18 seconds, training accuracy  98.000%


In [None]:
# Display summary
end_time = time.time()
print('Total training time for {0} batches: {1:.2f} seconds'.format(i+1, end_time-start_time))

# Accuracy on the test set
print("Test accuracy {0:.3f}%".format(accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0
})*100))

In [None]:
!tensorboard --logdir tb_logs