## TensorBoard for Visualization

Let's take up the same task as defined in Recitation 2. We'll be training a Neural Network to classify whether a point $(x_1, x_2)$ lies inside a circle of radius $1$ or not. For more details on the task, please revisit Recitation 2.

To activate TensorBoard on a program, we add this line after building the graph, right before running the train loop.

```
writer = tf.summary.FileWriter("./logs")
```


This line creates a writer object that writes event files to the "./logs" directory. This is the directory that TensorBoard will search for event files to display. We'll look at how to use TensorBoard with both TensorFlow and PyTorch. Let's start with **TensorFlow**.

**PS:** There are better ways to use the **summary** API in TensorFlow. For the sake of using the same method in both TensorFlow and PyTorch, we'll stick with this method. Look into [**SummarySaverHook**](https://www.tensorflow.org/api_docs/python/tf/train/SummarySaverHook) or come to office hours.


In [None]:
import tensorflow as tf
import numpy as np
import torch, os
import torch.nn as nn

Similar to Recitation 2, we first sample points in polar coordinates, drawing the radius and the angle uniformly so that every point lies within a circle of radius 2 centered at the origin, i.e. $(0,0)$.

In [None]:
def sample_points(n):
    """
    :param n: Total number of data-points
    :return: A tuple (X,y) where X -> [n,2] and y -> [n]
    """    
    radius = np.random.uniform(low=0,high=2,size=n).reshape(-1,1) # uniform radius between 0 and 2
    angle = np.random.uniform(low=0,high=2*np.pi,size=n).reshape(-1,1) # uniform angle
    x1 = radius*np.cos(angle)
    x2=radius*np.sin(angle)
    y = (radius<1).astype(int).reshape(-1)
    x = np.concatenate([x1,x2],axis=1)
    return x,y

In [None]:
# Generating the data

X_tr, y_tr = sample_points(10000)
X_val,y_val = sample_points(500)

print(X_tr.shape, y_tr.shape)
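
As a quick sanity check on the sampling scheme, here is a minimal NumPy sketch (not part of the original recitation code) that redraws points the same way and inspects the class balance. Because the radius is drawn uniformly from [0, 2), roughly half of the points should receive label 1. The seed and the names `rng` and `y_from_xy` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 10000

# Same scheme as sample_points: uniform radius in [0, 2), uniform angle in [0, 2*pi)
radius = rng.uniform(low=0, high=2, size=n)
angle = rng.uniform(low=0, high=2 * np.pi, size=n)
x = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)

# Label 1 if the point lies inside the unit circle
y = (radius < 1).astype(int)

# Recompute the label from the Cartesian coordinates as a cross-check
y_from_xy = (np.linalg.norm(x, axis=1) < 1).astype(int)

print("shape:", x.shape)
print("fraction inside unit circle:", y.mean())
print("agreement between both labelings:", np.mean(y == y_from_xy))
```

Note that a uniform radius is not the same as sampling uniformly over the area of the disk: points are denser near the origin, which is why the two classes come out roughly balanced.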

Let's plot some scalars. To visualize the same metric on different data sets, you can create separate **tf.summary.FileWriter()** objects and point them at subdirectories of the folder that you use as the **log directory** for TensorBoard.

In [None]:
# Initialize the FileWriter for the "training" routine
# (a "validation" writer would be created the same way, with its own subdirectory)
train_writer_tf = tf.summary.FileWriter("./logs/train")


def build_graph(n_units=12, n_layers=2, weight_init=tf.glorot_uniform_initializer(),
    bias_init=None, activation=tf.nn.relu, learning_rate=1e-3
   ):
    X = tf.placeholder(dtype=tf.float32, shape=[None,2], name="X")
    y = tf.placeholder(dtype=tf.int64, shape=[None], name="y")
    gs = tf.train.get_or_create_global_step()
    
    with tf.variable_scope("network", reuse=tf.AUTO_REUSE):
        net = X
        for layer in range(n_layers):
            net = tf.layers.dense(net, units=n_units, name="LAYER-{}".format(layer+1), activation=activation,
                                 kernel_initializer=weight_init, bias_initializer=bias_init
                                 )
        logits = tf.layers.dense(net, units=2, name="LAYER-Last", activation=None,
                                 kernel_initializer=weight_init, bias_initializer=bias_init
                                )
        
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=y)
        acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits,1), y),tf.float32))

        opt = tf.train.AdamOptimizer(learning_rate=learning_rate)
        train = opt.minimize(loss, global_step=gs)
        
        # Evaluating the gradients to log in TB 
        grads = opt.compute_gradients(loss)
        for grad in grads: tf.summary.histogram("{}-grad".format(grad[1].name), grad[0])
        
    
    # Add "loss" and "acc" as scalar summaries
    tf.summary.scalar("loss", tf.reduce_mean(loss))
    tf.summary.scalar("accuracy", acc)
    
    # Collect all trainable variables
    all_weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
    for weight in all_weights: tf.summary.histogram(weight.name, weight)
        
        
    # Merge all summaries into a single op
    summary = tf.summary.merge_all()
    
    return {
        "X": X, "y": y, "train": train, "loss": tf.reduce_mean(loss), "acc": acc, 
        "gs": gs, "summ": summary
    }

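We created *train_writer_tf* for collecting training summaries; a writer for validation summaries can be created the same way with its own path. Note that the log paths we use share the same parent directory (`./logs`), so TensorBoard shows each subdirectory as a separate run when it is pointed at that parent. The paths can also be unrelated; it doesn't really matter as long as TensorBoard is pointed at the right directory.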

To add a scalar value to our logs, we use [*tf.summary.scalar()*](https://www.tensorflow.org/api_docs/python/tf/summary/scalar), passing a name for the summary and the tensor to log. We then merge all the summaries into a single op (to avoid the hassle of running each one separately), which we *run* using a session. Running this op alone does not write anything to a file; it only returns the serialized summary. We then pass that result to a FileWriter object through its [*add_summary()*](https://www.tensorflow.org/api_docs/python/tf/summary/FileWriter#add_summary) method and *flush* it to write the event to disk.

Similar to plotting a scalar, we use [*tf.summary.histogram()*](https://www.tensorflow.org/api_docs/python/tf/summary/histogram) to plot a histogram. We first collect all the trainable variables and sequentially add them as summaries.

In [None]:
def start_scalar_training(epochs=100):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(epochs):
            # *train* operands
            gs, _, summary = sess.run([ops["gs"],ops["train"], ops["summ"]], {ops["X"]:X_tr, ops["y"]:y_tr})
            train_writer_tf.add_summary(summary, epoch) # Logging the summary in the event file
            train_writer_tf.flush() # Write to disk
            
# Reset graph
tf.reset_default_graph()

# Build the graph and start training
ops = build_graph(weight_init=tf.random_uniform_initializer(minval=-0.01,maxval=0.01))
start_scalar_training(epochs=200)

Once training has started, go to your terminal and start TensorBoard. You need to provide the directory where the event files are logged as the **logdir** argument: `tensorboard --logdir=./logs`. By default, TensorBoard starts up on port 6006.

With different weight initialization techniques, the gradient updates change, which can lead to faster convergence.

In [None]:
# Random uniform initialization of biases and using Glorot initialization for Weight matrices.
train_writer_tf = tf.summary.FileWriter("./logs/train_init")

# Reset graph
tf.reset_default_graph()

# Build the graph and start training
ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2))
start_scalar_training(epochs=200)
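
To build some intuition for why the initialization range matters, the following NumPy sketch pushes random inputs through two ReLU layers and compares the spread of the final activations under a small uniform initialization (the ±0.01 range used earlier) and under Glorot uniform initialization. It is an illustrative stand-in for the TensorFlow graph, not the same code: the helper names, the bias-free layers, and the seed are assumptions.

```python
import numpy as np

def small_uniform(rng, fan_in, fan_out):
    # Same range as tf.random_uniform_initializer(minval=-0.01, maxval=0.01)
    return rng.uniform(-0.01, 0.01, size=(fan_in, fan_out))

def glorot_uniform(rng, fan_in, fan_out):
    # Glorot/Xavier uniform: limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def forward_std(init, n_layers=2, n_units=12, seed=0):
    """Standard deviation of the last hidden layer's activations (no biases)."""
    rng = np.random.default_rng(seed)
    h = rng.uniform(-2, 2, size=(1000, 2))  # random 2-D inputs
    fan_in = 2
    for _ in range(n_layers):
        W = init(rng, fan_in, n_units)
        h = np.maximum(h @ W, 0.0)  # dense layer followed by ReLU
        fan_in = n_units
    return h.std()

std_small = forward_std(small_uniform)
std_glorot = forward_std(glorot_uniform)
print("small uniform init :", std_small)
print("Glorot uniform init:", std_glorot)
```

With the tiny weights the signal shrinks at every layer, so the activations (and the gradients that flow back through them) start out very small; this is the kind of difference the gradient histograms in TensorBoard make visible.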

## Using Different Activation functions

Let's see and understand how each of these activation functions performs.

- Sigmoid
    * Outputs values between 0 and 1.
    * A Sigmoid layer saturates easily. For inputs of large magnitude (very negative or very positive) the output flattens out near 0 or 1, so the gradient is close to zero and effectively no information passes through the neuron.
    
- Tanh
    * Outputs values between -1 and 1. It is also zero-centered, so it does not have the problem of all-positive or all-negative gradients.
    * Better than Sigmoid, but the problem of saturation persists.

- ReLU
    * Converges quickly, since it is a threshold-based activation and does not saturate for positive inputs.
    * Neurons can die off. A large weight update during backpropagation can set the weights in such a way (they become negative) that the neuron never fires for any data point. It is therefore important to use lower learning rates with ReLU.
    * Leaky ReLU addresses this problem by using a small non-zero slope for `x < 0`.
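
The saturation behaviour described in the list can be checked numerically. The sketch below, written in plain NumPy as an illustration rather than taken from the recitation, evaluates the derivative of each activation at a few inputs. The sample inputs are arbitrary, and the Leaky ReLU slope of 0.2 is assumed to match the default `alpha` of `tf.nn.leaky_relu`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 for x = 0

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # peaks at 1 for x = 0

def d_relu(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

def d_leaky_relu(x, alpha=0.2):
    return np.where(x > 0, 1.0, alpha)  # never exactly zero

x = np.array([-10.0, -1.0, 0.5, 10.0])
for name, grad in [("sigmoid", d_sigmoid), ("tanh", d_tanh),
                   ("relu", d_relu), ("leaky_relu", d_leaky_relu)]:
    print("{:>10}: {}".format(name, grad(x)))
```

The sigmoid and tanh derivatives vanish at both extremes, the ReLU derivative is exactly zero for every negative input, and Leaky ReLU keeps a small gradient there.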
    
    
**TRY IT OUT**

Use `He Initialization` and `Xavier Initialization` with all three activation functions. See which performs better and try to find out why.
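
For reference, the two schemes are commonly defined as follows: He (normal) initialization draws weights from a normal distribution with standard deviation `sqrt(2 / fan_in)`, and Xavier (Glorot) uniform initialization draws from a uniform distribution on `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. The NumPy sketch below implements both so that you can compare the resulting weight scales; the function names, layer sizes, and seed are hypothetical.

```python
import numpy as np

def he_normal(rng, fan_in, fan_out):
    # He initialization (normal variant): std = sqrt(2 / fan_in)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def xavier_uniform(rng, fan_in, fan_out):
    # Xavier/Glorot initialization (uniform variant)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 512  # illustrative layer sizes

W_he = he_normal(rng, fan_in, fan_out)
W_xavier = xavier_uniform(rng, fan_in, fan_out)

print("He std    :", W_he.std(), "target:", np.sqrt(2.0 / fan_in))
print("Xavier std:", W_xavier.std(), "target:", np.sqrt(2.0 / (fan_in + fan_out)))
```

For a square layer, He initialization produces weights with twice the variance of Xavier initialization, which compensates for ReLU zeroing out roughly half of its inputs.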

In [None]:
# Logging for all Activations
# 0.1
train_writer_tf = tf.summary.FileWriter("./logs/train_sigmoid")

# Reset graph
tf.reset_default_graph()

# Build the graph and start training
ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2), activation=tf.nn.sigmoid)
start_scalar_training(epochs=200)

# 0.2
train_writer_tf = tf.summary.FileWriter("./logs/train_tanh")

# Reset graph
tf.reset_default_graph()

# Build the graph and start training
ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2), activation=tf.nn.tanh)
start_scalar_training(epochs=200)


# 0.3
train_writer_tf = tf.summary.FileWriter("./logs/train_lrelu")

# Reset graph
tf.reset_default_graph()

# Build the graph and start training
ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2), activation=tf.nn.leaky_relu)
start_scalar_training(epochs=200)

## Plotting Images
Along with scalars and histograms, you can also use TensorBoard to visualize *images*. Visualizing images is particularly helpful if you want to understand which training images are causing your loss to deviate from its normal (hopefully decreasing) path. It also helps you evaluate your model's performance in a classification task by plotting confusion matrices and visualizing which classes are difficult for your model to understand.

In [None]:
def build_another_graph(n_units=12, n_layers=2):
    ## Same as before ----------> 
    X = tf.placeholder(dtype=tf.float32, shape=[None,2], name="X")
    y = tf.placeholder(dtype=tf.int64, shape=[None], name="y")
    gs = tf.train.get_or_create_global_step()
    
    with tf.variable_scope("network", reuse=tf.AUTO_REUSE):
        net = X
        for layer in range(n_layers):
            net = tf.layers.dense(net, units=n_units, name="LAYER-{}".format(layer+1), activation=tf.nn.relu)
        logits = tf.layers.dense(net, units=2, name="LAYER-Last", activation=None)

        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=y)
        acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits,1), y),tf.float32))

        opt = tf.train.AdamOptimizer(learning_rate=1e-3)
        train = opt.minimize(loss, global_step=gs)
    ## <----------
    confusion = tf.confusion_matrix(y, tf.argmax(tf.nn.softmax(logits), axis=1), num_classes=2, name='confusion')
    # reshape the matrix as a 4D image
    confusion_image = tf.reshape( tf.cast(confusion, tf.float32), [1, 2, 2, 1])
    tf.summary.image('confusion', confusion_image)
    
    # Merge all summaries into a single op
    summary = tf.summary.merge_all()
    
    return {
        "X": X, "y": y, "train": train, "loss": tf.reduce_mean(loss), "acc": acc, 
        "gs": gs, "summ": summary
    }

In [None]:
def start_image_training(epochs=50):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(epochs):
            # *train* operands
            gs, _, summary = sess.run([ops["gs"],ops["train"], ops["summ"]], {ops["X"]:X_tr, ops["y"]:y_tr})
            train_writer_tf.add_summary(summary, epoch) # Logging the summary in the event file
            train_writer_tf.flush() # Write to disk
            
train_writer_tf = tf.summary.FileWriter("./logs/train_confusion")
            
# Reset graph
tf.reset_default_graph()

# Build the graph and start training
ops = build_another_graph()
start_image_training()

To plot the number of samples in each cell when plotting the confusion matrix, we can either print it on the Python console or use [*tf.summary.text()*](https://www.tensorflow.org/api_docs/python/tf/summary/text)  
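
To make the cell counts concrete, here is a small NumPy sketch of the same computation outside the graph. It assumes the convention used by `tf.confusion_matrix`, where rows index the true labels and columns index the predictions; the helper name and the toy labels are made up for illustration.

```python
import numpy as np

def confusion_matrix(labels, predictions, num_classes=2):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (labels, predictions), 1)  # count each (label, prediction) pair
    return cm

labels = np.array([0, 0, 1, 1, 1, 0])       # toy ground truth
predictions = np.array([0, 1, 1, 1, 0, 0])  # toy model output

cm = confusion_matrix(labels, predictions)
print(cm)

# Same reshape as in the graph: [batch, height, width, channels]
confusion_image = cm.astype(np.float32).reshape(1, 2, 2, 1)
print(confusion_image.shape)
```

The diagonal holds the correctly classified samples, so a brighter diagonal in the TensorBoard image means better predictions.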

### Using TensorBoard in PyTorch

Plotting in PyTorch is a bit different from TF, because PyTorch does not use TensorFlow ops for its calculations. Hence, we need to wrap NumPy values in TensorFlow summary objects ourselves to set up logging. To do so, we make use of the `tf.Summary` class instead of the `tf.summary` calls.

Here's an example of that conversion. This Logger class has been taken from a [GitHub gist](https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514).

In [None]:
import matplotlib.pyplot as plt
import numpy as np

class Logger(object):
    """Logging in tensorboard without tensorflow ops."""

    def __init__(self, log_dir):
        self.writer = tf.summary.FileWriter(log_dir)

    def log_scalar(self, tag, value, step):
        """Log a scalar variable.
        Parameter
        ----------
        tag : Name of the scalar
        value : value itself
        step :  training iteration
        """
        # Notice we're using the Summary "class" instead of the "tf.summary" public API.
        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
        self.writer.add_summary(summary, step)

    def log_histogram(self, tag, values, step, bins=1000):
        """Logs the histogram of a list/vector of values."""
        # Convert to a numpy array
        values = np.array(values)
        
        # Create histogram using numpy        
        counts, bin_edges = np.histogram(values, bins=bins)

        # Fill fields of histogram proto
        hist = tf.HistogramProto()
        hist.min = float(np.min(values))
        hist.max = float(np.max(values))
        hist.num = int(np.prod(values.shape))
        hist.sum = float(np.sum(values))
        hist.sum_squares = float(np.sum(values**2))

        # Requires equal number as bins, where the first goes from -DBL_MAX to bin_edges[1]
        # See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/summary.proto#L30
        # Thus, we drop the start of the first bin
        bin_edges = bin_edges[1:]

        # Add bin edges and counts
        for edge in bin_edges:
            hist.bucket_limit.append(edge)
        for c in counts:
            hist.bucket.append(c)

        # Create and write Summary
        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, histo=hist)])
        self.writer.add_summary(summary, step)
        self.writer.flush()
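
The bucket bookkeeping in `log_histogram` can be checked in isolation. `np.histogram` returns one more edge than counts, so dropping the first edge leaves exactly one upper limit per bucket. The sample values and the small number of bins below are made up for illustration.

```python
import numpy as np

values = np.array([0.1, 0.2, 0.2, 0.9, 1.5])  # toy data

# np.histogram returns `bins` counts and `bins + 1` edges
counts, bin_edges = np.histogram(values, bins=4)
print("counts   :", counts)
print("all edges:", bin_edges)

# Keep only the upper edge of every bucket, as the Logger does
bucket_limits = bin_edges[1:]
print("limits   :", bucket_limits)

# The remaining summary statistics stored in the histogram proto
stats = {
    "min": float(np.min(values)),
    "max": float(np.max(values)),
    "num": int(np.prod(values.shape)),
    "sum": float(np.sum(values)),
    "sum_squares": float(np.sum(values ** 2)),
}
print(stats)
```

Each entry of `counts` is paired with the entry of `bucket_limits` at the same index, which is the pairing that `hist.bucket` and `hist.bucket_limit` rely on.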

We use the same model code as in Recitation 2.

In [None]:
def generate_single_hidden_MLP(n_hidden_neurons):
    return nn.Sequential(nn.Linear(2,n_hidden_neurons),nn.ReLU(),nn.Linear(n_hidden_neurons,2))

trainx = torch.from_numpy(X_tr).float()
valx = torch.from_numpy(X_val).float()
trainy = torch.from_numpy(y_tr)
valy = torch.from_numpy(y_val)

tLog, vLog = Logger("./logs/train_pytorch"), Logger("./logs/val_pytorch")

model1 = generate_single_hidden_MLP(6)
print(trainx.type(),trainy.type())


In [None]:
def training_routine(net,dataset,n_iters,gpu):
    # organize the data
    train_data,train_labels,val_data,val_labels = dataset
    
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(),lr=0.01)
    
    # use the flag
    if gpu:
        train_data,train_labels = train_data.cuda(),train_labels.cuda()
        val_data,val_labels = val_data.cuda(),val_labels.cuda()
        net = net.cuda() # the network parameters also need to be on the gpu !
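    for i in range(n_iters):
        # clear the gradients of the previous step first, so that the
        # gradients of this step are still available for logging below
        optimizer.zero_grad()
        # forward pass
        train_output = net(train_data)
        train_loss = criterion(train_output,train_labels)
        # backward pass and optimization
        train_loss.backward()
        optimizer.step()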
        
        # Once every 100 iterations, log values
        if i%100==0:
            # compute the accuracy of the prediction
            train_prediction = train_output.cpu().detach().argmax(dim=1)
            # labels are moved to the CPU as well, so this also works when gpu=True
            train_accuracy = (train_prediction.numpy()==train_labels.cpu().numpy()).mean() 
            # Now for the validation set
            val_output = net(val_data)
            val_loss = criterion(val_output,val_labels)
            # compute the accuracy of the prediction
            val_prediction = val_output.cpu().detach().argmax(dim=1)
            val_accuracy = (val_prediction.numpy()==val_labels.cpu().numpy()).mean() 
            
            # 1. Log scalar values (scalar summary) as plain Python floats
            tr_info = { 'loss': train_loss.item(), 'accuracy': float(train_accuracy) }
            val_info = { 'loss': val_loss.item(), 'accuracy': float(val_accuracy) }

            for tag, value in tr_info.items():
                tLog.log_scalar(tag, value, i+1)
            # validation metrics go to their own writer, so TensorBoard shows them as a separate run
            for tag, value in val_info.items():
                vLog.log_scalar(tag, value, i+1)

            # 2. Log values and gradients of the parameters (histogram summary)
            for tag, value in net.named_parameters():
                tag = tag.replace('.', '/')
                tLog.log_histogram(tag, value.data.cpu().numpy(), i+1)
                tLog.log_histogram(tag+'/grad', value.grad.data.cpu().numpy(), i+1)            
    
    net = net.cpu()

In [None]:
dataset = trainx,trainy,valx,valy
gpu = False
gpu = gpu and torch.cuda.is_available() # to know if you actually can use the GPU

training_routine(model1,dataset,1000,gpu)