<h1 style="color:Blue;">CIFAR-10 image classification with TensorFlow<br>
    Part 1 - Training and evaluation</h1>
<br>
<b>
This Jupyter notebook explains how to build a miniVGGNet CNN that classifies the CIFAR-10 dataset using the TensorFlow layers API, and then how to train and evaluate it. The complete Python code (cifar10_train.py and miniVGGNet.py) can be found in this GitHub repo.
<br><br>
The MiniVGGNet CNN was developed by Adrian Rosebrock and looks like this:</b>

![title](img/minivggnet.png)

<b>First, we import the necessary Python packages.</b>

In [None]:
import os
import sys
import shutil
import numpy as np

import tensorflow as tf

<b>Now we create some directories for the TensorBoard event logs and the TensorFlow checkpoints. If the directories already exist, we delete them and recreate them so that we are always starting from scratch.</b>

In [None]:
SCRIPT_DIR = os.getcwd()

INFER_GRAPH = 'inference_graph.pb'
CHKPT_FILE = 'float_model.ckpt'

CHKPT_DIR = os.path.join(SCRIPT_DIR, 'chkpts')
TB_LOG_DIR = os.path.join(SCRIPT_DIR, 'tb_logs')
CHKPT_PATH = os.path.join(CHKPT_DIR, CHKPT_FILE)


if os.path.exists(TB_LOG_DIR):
    shutil.rmtree(TB_LOG_DIR)
os.makedirs(TB_LOG_DIR)
print("Directory", TB_LOG_DIR, "created")


if os.path.exists(CHKPT_DIR):
    shutil.rmtree(CHKPT_DIR)
os.makedirs(CHKPT_DIR)
print("Directory", CHKPT_DIR, "created")
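
The delete-and-recreate pattern described above can be wrapped in a small helper. The sketch below is only an illustration: `reset_dir` is a hypothetical name that does not appear in the original scripts, and it runs against a temporary directory so that nothing real is deleted.

```python
import os
import shutil
import tempfile


def reset_dir(path):
    """Delete `path` if it exists, then recreate it empty (hypothetical helper)."""
    if os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path


# Work inside a throwaway location so the demonstration is harmless.
base = tempfile.mkdtemp()
log_dir = reset_dir(os.path.join(base, 'tb_logs'))

# Leave a stale file behind, then reset again: the directory comes back empty.
open(os.path.join(log_dir, 'old_event_file'), 'w').close()
reset_dir(log_dir)
print(os.listdir(log_dir))
```

Because the old contents are removed on every run, TensorBoard never mixes event files from different training runs.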

<h2 style="color:Blue;">Data Wrangling</h2>

<b>Now we download the CIFAR-10 dataset. TensorFlow includes the Keras library, which has a built-in function to do the job for us. The dataset comes already split into 50k images and labels for training and 10k images and labels for testing. We then hold back the last 5k training images as a validation set, which leaves 45k images for training.
<br>
The 'images' are actually NumPy arrays whose elements are 8-bit unsigned integers. We scale the image data to the range 0 to 1.0 by dividing by 255.0. The labels are also integers, so we one-hot encode them using the `to_categorical()` function.</b>

In [None]:
# CIFAR10 dataset has 60k images. Training set is 50k, test set is 10k.
# Each image is 32x32 pixels with 3 color channels (RGB), 8 bits per channel
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Scale image data from range 0:255 to range 0:1
# Also converts train & test image data to float from uint8
x_train = (x_train/255.0).astype(np.float32)
x_test = (x_test/255.0).astype(np.float32)

# take 5000 images & labels from the train dataset to create a validation set of 'unseen' images
x_valid = x_train[45000:]
y_valid = y_train[45000:]

# train dataset reduced to 45000 images
x_train = x_train[:45000]
y_train = y_train[:45000]


# one-hot encode the labels
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
y_valid = tf.keras.utils.to_categorical(y_valid, num_classes=10)
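
To make the scaling and one-hot encoding steps concrete, here is a minimal NumPy sketch on a few made-up values. It assumes, as with the arrays returned by `load_data()`, that the labels arrive as a column of integers with shape (N, 1). The identity-matrix lookup is just one way to reproduce the effect of `to_categorical()` and is not how Keras implements it.

```python
import numpy as np

# Made-up stand-ins for a few uint8 pixel values and integer class labels.
pixels = np.array([0, 128, 255], dtype=np.uint8)
labels = np.array([[3], [0], [9]], dtype=np.uint8)

# Scale 0..255 to 0.0..1.0 and convert to float32.
scaled = (pixels / 255.0).astype(np.float32)

# One-hot encode: row i of the identity matrix is the encoding of class i.
num_classes = 10
one_hot = np.eye(num_classes, dtype=np.float32)[labels.ravel()]

print(scaled)
print(one_hot.shape)
```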

<b>Set up the learning rate for the optimizer, the batch size and the number of epochs. We will only run for 3 epochs to keep the training time to a minimum. Be aware that real-world machine learning models might need hundreds of epochs to train properly.
<br>
We also calculate the number of batches (steps) per epoch, which is just the size of the training dataset divided by the batch size.</b>

In [None]:
LEARN_RATE = 0.001
EPOCHS = 3
BATCHSIZE = 50


# calculate total number of batches per epoch
total_batches = int(len(x_train)/BATCHSIZE)
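
As a quick check of the arithmetic, the sketch below plugs in the numbers used in this notebook (45,000 training images after the validation split, a batch size of 50 and 3 epochs). The variable names are illustrative only.

```python
# Values taken from the cells above.
num_train_images = 45000   # training images left after the validation split
batch_size = 50
epochs = 3

# Floor division: a final partial batch, if there were one, would be dropped.
batches_per_epoch = num_train_images // batch_size
leftover_images = num_train_images % batch_size
total_steps = epochs * batches_per_epoch

print(batches_per_epoch, leftover_images, total_steps)
```

With these values the division is exact, so no training images are skipped.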

<h2 style="color:Blue;">The Computational Graph</h2>

<h4 style="color:Blue;">Define placeholders</h4>

__The placeholders are tensors which are used for inputting data when a session is run.__
* __The `images_in` placeholder takes in the 32 x 32 pixel RGB images (actually NumPy arrays) and so has shape [None,32,32,3].__
* __The `labels` placeholder takes in the one-hot encoded labels, so has shape [None,10] and data type int32.__

__The `train` and `drop` placeholders control the behavior of the batch normalization and dropout layers.__
* __`train` is a boolean placeholder which puts the batch norm and dropout layers in training mode during training only. It defaults to False.__
* __`drop` sets the dropout rate of the dropout layers: 0.25 during training, and 0.0 (the default) during validation and testing.__

In [None]:
images_in = tf.placeholder(tf.float32, shape=[None,32,32,3], name='images_in')
labels = tf.placeholder(tf.int32, shape=[None,10], name='labels')
train = tf.placeholder_with_default(False, shape=None, name='train')
drop = tf.placeholder_with_default(0.0, shape=None, name='drop')

<h4 style="color:Blue;">Define the actual CNN</h4>
<br>
<b>
The miniVGGNet structure is described in the miniVGGNet.py script in the 'netmodel' folder and looks like this:
</b>

In [None]:
def miniVGGNet(inputs, is_training, drop_rate):
    net = tf.layers.conv2d(inputs=inputs, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
    net = tf.layers.batch_normalization(inputs=net, training=is_training)
    net = tf.layers.conv2d(inputs=net, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
    net = tf.layers.batch_normalization(inputs=net, training=is_training)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)
    net = tf.layers.dropout(inputs=net, rate=drop_rate, training=is_training)

    net = tf.layers.conv2d(inputs=net, filters=64, kernel_size=3, padding='same', activation=tf.nn.relu)
    net = tf.layers.batch_normalization(inputs=net, training=is_training)
    net = tf.layers.conv2d(inputs=net, filters=64, kernel_size=3, padding='same', activation=tf.nn.relu)
    net = tf.layers.batch_normalization(inputs=net, training=is_training)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)
    net = tf.layers.dropout(inputs=net, rate=drop_rate, training=is_training)

    net = tf.layers.flatten(inputs=net)
    net = tf.layers.dense(inputs=net, units=512, activation=tf.nn.relu)
    net = tf.layers.batch_normalization(inputs=net, training=is_training)
    net = tf.layers.dropout(inputs=net, rate=drop_rate, training=is_training)

    logits = tf.layers.dense(inputs=net, units=10, activation=None)
    return logits

<b>It is a series of layers, so we use the `tf.layers` API. In our main training script (cifar10_train.py) we just call the miniVGGNet function like this:</b> 

In [None]:
logits = miniVGGNet(inputs=images_in, is_training=train, drop_rate=drop)
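
It can help to trace how the tensor shape changes through the network. The pure-Python sketch below follows the layer sequence of the `miniVGGNet` function for a single 32x32x3 image and counts the weights and biases of the convolution and dense layers. The helper names are hypothetical, and batch normalization parameters are deliberately left out of the count. Batch norm and dropout do not change the shape, so they are skipped.

```python
def conv_same(shape, filters, kernel=3):
    """'same' convolution: height and width unchanged, channels become `filters`."""
    h, w, c = shape
    params = kernel * kernel * c * filters + filters  # weights + biases
    return (h, w, filters), params


def max_pool(shape, size=2):
    """2x2 max pooling with stride 2 halves height and width."""
    h, w, c = shape
    return (h // size, w // size, c)


shape = (32, 32, 3)
param_counts = []

# Two blocks of (conv, conv, pool), with 32 and then 64 filters.
for filters in (32, 64):
    for _ in range(2):
        shape, params = conv_same(shape, filters)
        param_counts.append(params)
    shape = max_pool(shape)
    print('after block with', filters, 'filters:', shape)

# Flatten, then the 512-unit dense layer and the 10-unit output layer.
flat_units = shape[0] * shape[1] * shape[2]
for units_in, units_out in ((flat_units, 512), (512, 10)):
    param_counts.append(units_in * units_out + units_out)

print('flattened units:', flat_units)
print('conv + dense parameters:', sum(param_counts))
```

Most of the parameters sit in the first dense layer, which is typical for small VGG-style networks.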

<h4 style="color:Blue;">Define loss, accuracy and optimizer</h4>

<b>The loss function is a softmax cross entropy function for classification which accepts labels in one-hot format (which explains why we one-hot encoded the labels earlier). The training optimizer is Adam (Adaptive Moment Estimation).</b>

In [None]:
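# softmax cross entropy loss function
# (with its default reduction, tf.losses.softmax_cross_entropy already returns a
#  scalar averaged over the batch, so the outer reduce_mean leaves it unchanged)
loss = tf.reduce_mean(tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=labels))

# Adam optimizer - minimize the loss
# The batch norm layers register their moving-average updates in UPDATE_OPS,
# so those ops must run together with each training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)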
optimizer = tf.train.AdamOptimizer(learning_rate=LEARN_RATE, name='Adam')
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
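
For readers who want to see what a softmax cross entropy loss computes, here is a minimal NumPy sketch. It illustrates the formula only and is not TensorFlow's implementation. The function name and test values are made up for the example.

```python
import numpy as np


def softmax_cross_entropy(logits, onehot_labels):
    """Mean softmax cross entropy over a batch (illustrative NumPy version)."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Only the log-probability of the true class survives the one-hot mask.
    per_example = -(onehot_labels * log_probs).sum(axis=1)
    return per_example.mean()


# With all-zero logits every class has probability 1/10, so the loss is ln(10).
logits = np.zeros((4, 10))
onehot = np.eye(10)[[0, 3, 5, 9]]
loss_value = softmax_cross_entropy(logits, onehot)
print(loss_value)
```

A freshly initialized 10-class network usually starts with a loss near ln(10), about 2.3, which is a handy sanity check on the first few training steps.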

<b>We will calculate the accuracy of our network during training as the mean of the correct predictions.</b>

In [None]:
# Check to see if the prediction matches the label
correct_prediction = tf.equal(tf.argmax(logits, 1, output_type=tf.int32), tf.argmax(labels, 1, output_type=tf.int32))

# Calculate accuracy as mean of the correct predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

<b>TensorFlow provides built-in functions for calculating the top-k accuracies.</b>

In [None]:
in_top5 = tf.nn.in_top_k(predictions=logits, targets=tf.argmax(labels, 1), k=5)
in_top1 = tf.nn.in_top_k(predictions=logits, targets=tf.argmax(labels, 1), k=1)
top5_acc = tf.reduce_mean(tf.cast(in_top5, tf.float32))
top1_acc = tf.reduce_mean(tf.cast(in_top1, tf.float32))
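
The idea behind top-k accuracy can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the TensorFlow implementation: the function name is made up, and ties are counted in favor of the target class. A prediction counts as correct when fewer than k classes score strictly higher than the true class.

```python
import numpy as np


def top_k_accuracy(logits, targets, k):
    """Fraction of rows whose target class is among the k highest scores."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    target_scores = logits[np.arange(len(targets)), targets]
    # Count the classes that score strictly higher than the target class.
    num_higher = (logits > target_scores[:, None]).sum(axis=1)
    return float((num_higher < k).mean())


# Three made-up examples with three classes each.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
true_classes = [1, 1, 0]

top1 = top_k_accuracy(scores, true_classes, k=1)
top2 = top_k_accuracy(scores, true_classes, k=2)
print(top1, top2)
```

Top-1 accuracy is the same quantity as the argmax-based accuracy defined earlier whenever there are no ties between class scores.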

<b>We will collect the loss and accuracy data for displaying in TensorBoard along with the images that are fed into the 'images_in' placeholder.</b>

In [None]:
# TensorBoard data collection
tf.summary.scalar('cross_entropy_loss', loss)
tf.summary.scalar('accuracy', accuracy)
tf.summary.image('input_images', images_in)

<b>We define an instance of a saver object which will be used inside our session to save the trained model checkpoint.</b>

In [None]:
# set up saver object
saver = tf.train.Saver()

![title](img/graph.png)

<h2 style="color:Blue;">The Session</h2>

<b>Inside the session, we initialize all the variables then loop through the number of epochs, sending the training data into the `images_in` and `labels` placeholders.

When we exit the training loop, the final top-1 and top-5 accuracies are calculated on the validation set, and then the weights and biases of the trained model are saved as a checkpoint.
</b>

In [None]:
with tf.Session() as sess:

    sess.run(tf.initializers.global_variables())
    
    # TensorBoard writer
    writer = tf.summary.FileWriter(TB_LOG_DIR, sess.graph)
    tb_summary = tf.summary.merge_all()

    # Training phase with training data
    print ('******************************')
    print ('TRAINING STARTED..')
    print ('******************************\n')
    for epoch in range(EPOCHS):
        print ("Epoch", epoch+1, "/", EPOCHS)

        # process all batches
        for i in range(total_batches):
            
            # fetch a batch from training dataset
            x_train_batch, y_train_batch = x_train[i*BATCHSIZE:i*BATCHSIZE+BATCHSIZE], y_train[i*BATCHSIZE:i*BATCHSIZE+BATCHSIZE]

            # Display training accuracy on the current batch every 100 batches
            if i % 100 == 0:
                acc = sess.run(accuracy, feed_dict={images_in: x_train_batch, labels: y_train_batch})
                print (' Step: {:4d}  Training accuracy: {:1.4f}'.format(i,acc))

            # Run graph for optimization  - i.e. do the training
            _, s = sess.run([train_op, tb_summary], feed_dict={images_in: x_train_batch, labels: y_train_batch, train: True, drop: 0.25})
            writer.add_summary(s, (epoch*total_batches + i))


    print("\nTRAINING FINISHED\n")
    print ('******************************')
    writer.flush()
    writer.close()


    # Validation phase with the first 1000 images of the validation dataset
    # calculate top-1 and top-5 accuracy with 'unseen' data
    print ('******************************')
    print("VALIDATION")
    print ('******************************\n')
    t5_acc,t1_acc = sess.run([top5_acc,top1_acc], feed_dict={images_in: x_valid[:1000], labels: y_valid[:1000]})
    print (' Top 1 accuracy with validation set: {:1.4f}'.format(t1_acc))
    print (' Top 5 accuracy with validation set: {:1.4f}'.format(t5_acc))

    # save post-training checkpoint
    # this saves all the parameters of the trained network
    save_path = saver.save(sess, CHKPT_PATH)
    print('\nSaved checkpoint to %s' % save_path)
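
The batch slicing and the step index passed to the TensorBoard writer in the training loop can be checked in isolation. The sketch below reproduces only that bookkeeping, with no TensorFlow involved. `iter_batches` is a hypothetical helper name, and the sizes are the ones used in this notebook.

```python
def iter_batches(num_examples, batch_size):
    """Yield (batch_index, start, stop) for each full batch, in order."""
    for i in range(num_examples // batch_size):
        start = i * batch_size
        yield i, start, start + batch_size


num_examples, batch_size, epochs = 45000, 50, 3
total_batches = num_examples // batch_size

global_steps = []
for epoch in range(epochs):
    for i, start, stop in iter_batches(num_examples, batch_size):
        # Same step index as the one passed to writer.add_summary() above.
        global_steps.append(epoch * total_batches + i)

print(global_steps[0], global_steps[-1], len(global_steps))
```

Note that the loop visits the training data in the same order every epoch. Shuffling the data between epochs is a common refinement, but it is not part of the original script.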
    

<h2 style="color:Blue;">Create Inference Graph for use with DNNDK</h2>

<b>The computational graph we have defined above cannot be used with DNNDK. It includes the `train` and `drop` placeholders which feed varying values into the CNN function's `is_training` and `drop_rate` arguments. We need to create a new graph which has the `is_training` and `drop_rate` arguments of our CNN tied to static values: </b>

In [None]:
with tf.Graph().as_default():

  # define placeholders for the input data
  x_1 = tf.placeholder(tf.float32, shape=[None,32,32,3], name='images_in')

  # call the miniVGGNet function with is_training=False & dropout rate=0
  logits_1 = miniVGGNet(x_1, is_training=False, drop_rate=0.0)

  tf.train.write_graph(tf.get_default_graph().as_graph_def(), CHKPT_DIR, INFER_GRAPH, as_text=False)
  print('Saved binary inference graph to %s' % os.path.join(CHKPT_DIR,INFER_GRAPH))
