# 02.a: Tensorboard Tutorial with a Fully Connected Network on the MNIST Dataset

This tutorial will guide you through using Tensorboard. Tensorboard is a utility for visualizing data and how it behaves during training. In this tutorial, you will see what you can use Tensorboard for when training a neural network.

Please refer to this excellent [article](https://www.datacamp.com/community/tutorials/tensorboard-tutorial) for more information. In this course we only discuss the first part of it.

## There is only one TODO in this Notebook:
- TODO#1: (as always) read the code and comments from beginning to end.

In [1]:
# Make sure that you have all these libraries available to run the code successfully
from pandas_datareader import data
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import urllib.request, json 
import os
import numpy as np
import tensorflow as tf # This code has been tested with TensorFlow 1.6
from tensorflow.examples.tutorials.mnist import input_data


In [2]:
def accuracy(predictions,labels):
    '''
    Accuracy of a given set of predictions of size (N x n_classes) and
    labels of size (N x n_classes)
    '''
    return np.sum(np.argmax(predictions,axis=1)==np.argmax(labels,axis=1))*100.0/labels.shape[0]
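
A quick check of `accuracy` on a hand-made batch may help confirm what it returns. The arrays below are made-up illustrative values, not MNIST data, and the function is repeated so the snippet runs on its own:

```python
import numpy as np

def accuracy(predictions, labels):
    """Percentage of rows whose argmax matches between predictions and one-hot labels."""
    return np.sum(np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)) * 100.0 / labels.shape[0]

# Four made-up samples over three classes (illustrative values only)
example_predictions = np.array([[0.7, 0.2, 0.1],
                                [0.1, 0.8, 0.1],
                                [0.3, 0.3, 0.4],
                                [0.6, 0.3, 0.1]])
example_labels = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1],
                           [0, 1, 0]])  # the last sample is misclassified

example_accuracy = accuracy(example_predictions, example_labels)
print(example_accuracy)  # 75.0
```

Three of the four rows agree on the most probable class, so the function returns a percentage rather than a fraction.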

### Define Inputs, Outputs, Weights and Biases

First you define a `batch_size`, the number of samples you process in a single optimization, validation or testing step. Then you define `layer_ids`, which gives an identifier for each layer of the neural network you will be defining. You can then define `layer_sizes`. Note that `len(layer_sizes)` should be `len(layer_ids)+1`, because `layer_sizes` includes the size of the input at the beginning. MNIST images are 28x28, which becomes 784 when flattened to a single dimension. Then you can define the input and label placeholders that you will later use to train the model. Finally you define two TensorFlow variables for each layer (that is, `weights` and `bias`).

You can use variable scoping (more information [here](https://www.tensorflow.org/programmers_guide/variables)) so that the variables will be nicely named and will be much easier to access later.
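
The relationship between `layer_ids` and `layer_sizes` can be checked in plain Python before building the graph. This is a minimal sketch using the values from the cell that follows; the names `weight_shapes` and `n_parameters` are introduced here for illustration only and do not appear in the notebook:

```python
layer_ids = ['hidden1', 'hidden2', 'hidden3', 'hidden4', 'hidden5', 'out']
layer_sizes = [784, 500, 400, 300, 200, 100, 10]

# layer_sizes has one more entry than layer_ids because it starts with the input size
assert len(layer_sizes) == len(layer_ids) + 1

# Layer idx maps layer_sizes[idx] inputs to layer_sizes[idx + 1] outputs
weight_shapes = {lid: (layer_sizes[idx], layer_sizes[idx + 1])
                 for idx, lid in enumerate(layer_ids)}

# Each layer holds a weight matrix plus one bias per output unit
n_parameters = sum(n_in * n_out + n_out for n_in, n_out in weight_shapes.values())

for lid, shape in weight_shapes.items():
    print(lid, shape)
print('total parameters:', n_parameters)
```

Each printed shape corresponds to the `shape` argument passed to `tf.get_variable('weights', ...)` for that layer.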

In [3]:
batch_size = 100
layer_ids = ['hidden1','hidden2','hidden3','hidden4','hidden5','out']
layer_sizes = [784, 500, 400, 300, 200, 100, 10]

tf.reset_default_graph()

# Inputs and Labels
train_inputs = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[0]], name='train_inputs')
train_labels = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[-1]], name='train_labels')

# Weight and Bias definitions
for idx, lid in enumerate(layer_ids):
    
    with tf.variable_scope(lid):
        w = tf.get_variable('weights',shape=[layer_sizes[idx], layer_sizes[idx+1]], 
                            initializer=tf.truncated_normal_initializer(stddev=0.05))
        b = tf.get_variable('bias',shape= [layer_sizes[idx+1]], 
                            initializer=tf.random_uniform_initializer(-0.1,0.1))
        

### Calculate Logits, Predictions, Loss and Optimization

With the input/output placeholders, weights and biases of each layer defined, you can now define the computation of the logits of the neural network. Logits are the unnormalized values produced at the last layer of the neural network. When normalized with softmax, you call them predictions. This involves iterating through each layer in the neural network and computing `tf.matmul(h,w) +b`. You also need to apply an activation function as `tf.nn.relu(tf.matmul(h,w) +b)`, for all layers except for the last layer.

Next you define the loss function that is used to optimize the neural network. In this example, you can use the cross entropy loss, which often delivers better results in classification problems than the mean squared error.

Finally you will need to define an optimizer that takes in the loss and updates the weights of the neural network in the direction that minimizes the loss.
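
To make the logits, predictions and loss concrete, here is a rough NumPy sketch of the same forward computation. It is not the notebook's TensorFlow graph: the helper names (`softmax`, `cross_entropy`, `forward`), the shortened layer stack and the random inputs are all assumptions made for illustration, and the optimizer step is left out.

```python
import numpy as np

def softmax(logits):
    """Normalize each row of logits into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, one_hot_labels):
    """Mean cross entropy between softmax(logits) and one-hot labels."""
    probs = softmax(logits)
    return float(np.mean(-np.sum(one_hot_labels * np.log(probs), axis=1)))

def forward(x, params):
    """Apply ReLU on every layer except the last, which returns raw logits."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return h @ w + b

rng = np.random.default_rng(0)
sizes = [784, 500, 10]  # a shortened stack, for illustration only
# Plain normal initialization here; the notebook uses a truncated normal
params = [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.random((4, 784))            # four made-up "images"
labels = np.eye(10)[[3, 1, 4, 1]]   # made-up one-hot labels
logits = forward(x, params)
predictions = softmax(logits)
loss = cross_entropy(logits, labels)

# With all-zero logits the prediction is uniform, so the loss equals ln(10)
uniform_loss = cross_entropy(np.zeros((4, 10)), labels)
print(predictions.sum(axis=1), loss, uniform_loss)
```

The last line illustrates a useful reference point: a 10-class model that predicts every class with equal probability has a cross entropy of ln(10), roughly 2.3026.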

In [4]:
# Calculating Logits
h = train_inputs
for lid in layer_ids:
    with tf.variable_scope(lid,reuse=True):
        w, b = tf.get_variable('weights'), tf.get_variable('bias')
        if lid != 'out':
            h = tf.nn.relu(tf.matmul(h,w)+b,name=lid+'_output')
        else:
            h = tf.nn.xw_plus_b(h,w,b,name=lid+'_output')

tf_predictions = tf.nn.softmax(h, name='predictions')
# Calculating Loss
tf_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=train_labels, logits=h),name='loss')

# Optimizer 
tf_learning_rate = tf.placeholder(tf.float32, shape=None, name='learning_rate')
optimizer = tf.train.MomentumOptimizer(tf_learning_rate,momentum=0.9)
grads_and_vars = optimizer.compute_gradients(tf_loss)
tf_loss_minimize = optimizer.minimize(tf_loss)


### Defining Tensorboard Summaries

Here you can define the `tf.summary` objects. `tf.summary` objects are the type of entities understood by the Tensorboard. This means that whatever value you'd like to be displayed on the Tensorboard, you should encapsulate it as a `tf.summary` object. There are several different types of summaries. Here as you are visualizing only scalars, you can define `tf.summary.scalar` objects. Furthermore, you can use `tf.name_scope` to group scalars on the Tensorboard. That is, scalars having the same name scope will be displayed on the same row on the Tensorboard. Here you define three different summaries.

* `tf_loss_summary` : You feed in a value by means of a placeholder, whenever you need to publish this to the Tensorboard
* `tf_accuracy_summary` : You feed in a value by means of a placeholder, whenever you need to publish this to the Tensorboard
* `tf_gradnorm_summary` : This calculates the root mean square of the gradients of the `weights` of `hidden5`, the last hidden layer of your neural network (that is, the l2 norm scaled by the number of entries). The gradient norm is a good indicator of whether the weights of the neural network are being properly updated. A gradient norm that is too small can indicate the *vanishing gradient* phenomenon, and one that is too large can indicate the *exploding gradient* phenomenon.
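
The summary cell computes the gradient statistic as `tf.sqrt(tf.reduce_mean(g**2))`. A small NumPy sketch, using made-up gradient values, shows how that quantity relates to the plain l2 norm:

```python
import numpy as np

# Made-up gradient values for illustration only
g = np.array([[3.0, 4.0],
              [0.0, 0.0]])

l2_norm = np.sqrt(np.sum(g ** 2))   # Euclidean (l2) norm over all entries
rms = np.sqrt(np.mean(g ** 2))      # NumPy analogue of tf.sqrt(tf.reduce_mean(g**2))

print(l2_norm)                      # 5.0
print(rms)                          # 2.5
print(l2_norm / np.sqrt(g.size))    # 2.5
```

The two differ only by a factor of the square root of the number of entries, so either one tracks vanishing or exploding gradients equally well; the mean-based version is simply less sensitive to the size of the weight matrix.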

In [5]:
# Name scope allows you to group various summaries together
# Summaries having the same name_scope will be displayed on the same row on the Tensorboard
with tf.name_scope('performance'):
    # Summaries that need to be displayed on the Tensorboard
    # Whenever you need to record the loss, feed the mean loss to this placeholder
    tf_loss_ph = tf.placeholder(tf.float32,shape=None,name='loss_summary') 
    # Create a scalar summary object for the loss so Tensorboard knows how to display it
    tf_loss_summary = tf.summary.scalar('loss', tf_loss_ph)

    # Whenever you need to record the accuracy, feed the mean test accuracy to this placeholder
    tf_accuracy_ph = tf.placeholder(tf.float32,shape=None, name='accuracy_summary') 
    # Create a scalar summary object for the accuracy so Tensorboard knows how to display it
    tf_accuracy_summary = tf.summary.scalar('accuracy', tf_accuracy_ph)

# Gradient norm summary
for g,v in grads_and_vars:
    if 'hidden5' in v.name and 'weights' in v.name:
        with tf.name_scope('gradients'):
            tf_last_grad_norm = tf.sqrt(tf.reduce_mean(g**2))
            tf_gradnorm_summary = tf.summary.scalar('grad_norm', tf_last_grad_norm)
            break
# Merge the performance summaries (loss and accuracy) together
performance_summaries = tf.summary.merge([tf_loss_summary,tf_accuracy_summary])


### Executing the neural network model: Loading Data, Training, Validation and Testing

In the code below you do the following. First you create a session, in which you execute the operations you defined above. Then you create a folder for saving summary data. Next you create a summary writer, `summ_writer`. You can now initialize all variables. This is followed by loading the MNIST dataset.

Then, for each epoch, you iterate over each batch in the training data (that is, each iteration). If it is the first iteration of the epoch, you execute `tf_gradnorm_summary` and write the result to the event file with the summary writer. In every iteration you execute the model optimization and calculate the loss. After you go through the full training dataset for a single epoch, you calculate the average training loss.

You follow a similar treatment for the validation dataset. Specifically, you calculate the validation accuracy for each batch in the validation data, and then calculate the average validation accuracy over the full validation set.

Finally, the testing phase is executed. Here you calculate the test accuracy for each batch in the test data, and then the average test accuracy over the full test set. At the very end of each epoch you execute `performance_summaries` and write them to the event file with the summary writer.
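
The control flow described above can be sketched without TensorFlow. In this sketch `run_training`, `train_step` and `test_step` are hypothetical stand-ins for the `session.run` calls, and the returned list stands in for what the summary writer would receive. Validation is omitted because the notebook prints it but does not log it.

```python
def run_training(n_epochs, n_train, n_test, batch_size, train_step, test_step):
    """Return the (tag, step, value) records a summary writer would receive."""
    records = []
    for epoch in range(n_epochs):
        losses = []
        for i in range(n_train // batch_size):
            loss, grad_norm = train_step()
            if i == 0:
                # Log the gradient norm once per epoch to avoid cluttering the plot
                records.append(('grad_norm', epoch, grad_norm))
            losses.append(loss)

        accuracies = [test_step() for _ in range(n_test // batch_size)]

        # Log epoch-level averages, using the epoch number as the step
        records.append(('loss', epoch, sum(losses) / len(losses)))
        records.append(('accuracy', epoch, sum(accuracies) / len(accuracies)))
    return records

# Constant stand-in steps, just to show which records are produced
records = run_training(n_epochs=2, n_train=300, n_test=200, batch_size=100,
                       train_step=lambda: (2.0, 0.5), test_step=lambda: 90.0)
for record in records:
    print(record)
```

Each epoch contributes one `grad_norm` point and one point each for `loss` and `accuracy`, which is why every curve on the Tensorboard has as many points as there are epochs.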

In [6]:

image_size = 28
n_channels = 1
n_classes = 10
n_train = 55000
n_valid = 5000
n_test = 10000
n_epochs = 25

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9 # making sure Tensorflow doesn't overflow the GPU

session = tf.InteractiveSession(config=config)

if not os.path.exists('summaries'):
    os.mkdir('summaries')
if not os.path.exists(os.path.join('summaries','first')):
    os.mkdir(os.path.join('summaries','first'))

summ_writer = tf.summary.FileWriter(os.path.join('summaries','first'), session.graph)

tf.global_variables_initializer().run()

accuracy_per_epoch = []
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)


for epoch in range(n_epochs):
    loss_per_epoch = []
    for i in range(n_train//batch_size):
        
        # =================================== Training for one step ========================================
        batch = mnist_data.train.next_batch(batch_size)    # Get one batch of training data
        if i == 0:
            # Only for the first iteration of each epoch, get the summary data
            # Otherwise, it can clutter the visualization
            l,_,gn_summ = session.run([tf_loss,tf_loss_minimize,tf_gradnorm_summary],
                                      feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),
                                                 train_labels: batch[1],
                                                tf_learning_rate: 0.0001})
            summ_writer.add_summary(gn_summ, epoch)
        else:
            # Optimize with training data
            l,_ = session.run([tf_loss,tf_loss_minimize],
                              feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),
                                         train_labels: batch[1],
                                         tf_learning_rate: 0.0001})
        loss_per_epoch.append(l)
        
    print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch)))    
    avg_loss = np.mean(loss_per_epoch)
    
    # ====================== Calculate the Validation Accuracy ==========================
    valid_accuracy_per_epoch = []
    for i in range(n_valid//batch_size):
        valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)
        valid_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})
        valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))
        
    mean_v_acc = np.mean(valid_accuracy_per_epoch)
    print('\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,np.mean(valid_accuracy_per_epoch)))
    
    # ===================== Calculate the Test Accuracy ===============================
    accuracy_per_epoch = []
    for i in range(n_test//batch_size):
        test_images, test_labels = mnist_data.test.next_batch(batch_size)
        test_batch_predictions = session.run(
            tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)}
        )
        accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))
        
    print('\tAverage Test Accuracy in epoch %d: %.5f\n'%(epoch,np.mean(accuracy_per_epoch)))
    avg_test_accuracy = np.mean(accuracy_per_epoch)
    
    # Execute the summaries defined above
    summ = session.run(performance_summaries, feed_dict={tf_loss_ph:avg_loss, tf_accuracy_ph:avg_test_accuracy})

    # Write the obtained summaries to the file, so it can be displayed in the Tensorboard
    summ_writer.add_summary(summ, epoch)
    
session.close()

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Average loss in epoch 0: 2.30306
	Average Valid Accuracy in epoch 0: 11.26000
	Average Test Accuracy in epoch 0: 11.35000

Average loss in epoch 1: 2.30125
	Average Valid Accuracy in epoch 1: 11.26000
	Average Test Accuracy in epoch 1: 11.35000

Average loss in epoch 2: 2.29951
	Average Valid Accurac

In [8]:
logs_path = "../summaries/first/"

In [None]:
# !tensorboard --logdir=$logs_path --port 8890
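
Before launching Tensorboard, it can help to confirm that the log directory actually contains event files, since a wrong relative path gives an empty dashboard. This is a minimal sketch, assuming the default `events.out.tfevents.*` file naming used by the summary writer; `find_event_files` is a hypothetical helper, not part of TensorFlow:

```python
import glob
import os

def find_event_files(logdir):
    """Return the sorted event files found directly inside logdir."""
    return sorted(glob.glob(os.path.join(logdir, 'events.out.tfevents.*')))

# The training cell writes to 'summaries/first' relative to the working directory
event_files = find_event_files(os.path.join('summaries', 'first'))
print(event_files if event_files else 'No event files found; check the path')
```

If the list is empty, the value of `logs_path` probably does not point at the directory the training cell wrote to.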

# Conclusions:
After this Notebook, you should know:
- How to build a fully connected neural network for an image classification task.
- How to use Tensorboard to keep track of and monitor the learning process.