# Monitoring models (TensorBoard)

Monitoring is a crucial part of training a model. Continuous monitoring helps ensure that training is functioning as intended. It can also reveal changes that would improve model performance and execution time. Here, we will see how to use TensorBoard to continuously monitor a model, profile it, and visualize various data types such as images and text.

<table align="left">
    <td>
        <a target="_blank" href="https://colab.research.google.com/github/thushv89/manning_tf2_in_action/blob/master/Ch09/13.1_Tensorboard.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
    </td>
</table>



# Important check before running TensorBoard

To make sure all the features of TensorBoard work, install the `libcupti` library. On Linux you can install it using `sudo apt-get install libcupti-dev`.
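
One quick way to check from Python whether the library is visible to the dynamic linker is sketched below. It uses only the standard library. The helper name `has_shared_library` is made up for this illustration, and the assumption that the library can be found under the short name `cupti` may not hold on every system (for example, when CUDA lives in a non-standard location).

```python
from ctypes.util import find_library


def has_shared_library(short_name):
    """Return True if the dynamic linker can locate a library such as 'cupti'."""
    # find_library returns a path or file name on success and None otherwise
    return find_library(short_name) is not None


# 'cupti' is an assumed short name for libcupti; adjust it for your setup
if has_shared_library("cupti"):
    print("libcupti was found")
else:
    print("libcupti was not found; GPU profiling in TensorBoard may not work")
```

A negative result does not prove the library is missing; it only means it was not found on the default search path.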

In [1]:
import random

import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds
%load_ext tensorboard

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before the GPUs have been initialized
        print("Couldn't set memory_growth: {}".format(e))
    
def fix_random_seed(seed):
    """ Setting the random seed of various libraries """
    try:
        np.random.seed(seed)
    except NameError:
        print("Warning: Numpy is not imported. Setting the seed for Numpy failed.")
    try:
        tf.random.set_seed(seed)
    except NameError:
        print("Warning: TensorFlow is not imported. Setting the seed for TensorFlow failed.")
    try:
        random.seed(seed)
    except NameError:
        print("Warning: random module is not imported. Setting the seed for random failed.")

# Fixing the random seed
random_seed=4321
fix_random_seed(random_seed)



# Visualizing Image Data on the TensorBoard

## Importing the Fashion-MNIST dataset

In [2]:
# Construct a tf.data.Dataset
fashion_ds = tfds.load('fashion_mnist')

print(fashion_ds)

{'test': <PrefetchDataset shapes: {image: (28, 28, 1), label: ()}, types: {image: tf.uint8, label: tf.int64}>, 'train': <PrefetchDataset shapes: {image: (28, 28, 1), label: ()}, types: {image: tf.uint8, label: tf.int64}>}


## Create training/validation/testing data

In [3]:
def get_train_valid_test_datasets(fashion_ds, batch_size, flatten_images=False):
    
    # Get the training dataset, shuffle it, and output a tuple of (image, label) 
    train_ds = fashion_ds["train"].shuffle(batch_size*20).map(lambda xy: (xy["image"], tf.reshape(xy["label"], [-1])))
    # Get the testing dataset, and output a tuple of (image, label)
    test_ds = fashion_ds["test"].map(lambda xy: (xy["image"], tf.reshape(xy["label"], [-1])))
    
    if flatten_images:
        # Flatten the images to a 1D vector for fully-connected networks
        train_ds = train_ds.map(lambda x,y: (tf.reshape(x, [-1]), y))
        test_ds = test_ds.map(lambda x,y: (tf.reshape(x, [-1]), y))
    
    # Use the first 10,000 examples as the validation dataset
    valid_ds = train_ds.take(10000).batch(batch_size)
    # Use the remaining examples as the training dataset
    train_ds = train_ds.skip(10000).batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
    
    return train_ds, valid_ds, test_ds
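
As a quick sanity check on the split above, the number of batches per epoch can be worked out by hand. The sketch below assumes the standard Fashion-MNIST training split of 60,000 examples; the helper name `num_batches` is introduced only for this illustration.

```python
import math

TRAIN_EXAMPLES = 60_000   # size of the Fashion-MNIST training split
VALID_EXAMPLES = 10_000   # examples set aside by take(10000)


def num_batches(num_examples, batch_size):
    """Number of batches produced by .batch(batch_size) without drop_remainder."""
    return math.ceil(num_examples / batch_size)


batch_size = 64
valid_batches = num_batches(VALID_EXAMPLES, batch_size)
train_batches = num_batches(TRAIN_EXAMPLES - VALID_EXAMPLES, batch_size)
print(valid_batches, train_batches)
```

With a batch size of 64 this gives 157 validation batches and 782 training batches per epoch, the last batch in each case being smaller than 64.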

## Using `tf.summary` to visualize images on TensorBoard

In [27]:
# Defining the ID to Label map
id2label_map = {
    0: "T-shirt/top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot"
}

print("Writing to the tensorboard")

!rm -rf ./logs/data/train

image_logdir = "./logs/data/train"
image_writer = tf.summary.create_file_writer(image_logdir)

# Write each of the first 10 images under its category name
with image_writer.as_default():
    for data in fashion_ds["train"].batch(1).take(10):
        label = int(data["label"].numpy()[0])
        tf.summary.image(id2label_map[label], data["image"], max_outputs=20, step=0)

# Write a batch of 20 images at once
with image_writer.as_default():
    for data in fashion_ds["train"].batch(20).take(1):
        tf.summary.image("A training data batch", data["image"], max_outputs=20, step=0)

print('\tDone')

Writing to the tensorboard
	Done


# Spinning up the TensorBoard
 
Here we use the `%tensorboard` magic command in a Jupyter notebook. It displays the same dashboard inline that you would see if you opened TensorBoard in a browser tab. If you call the command several times with the same `logdir`, the existing TensorBoard instance is reused. If the directories differ, a new TensorBoard instance is spun up.
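
TensorBoard shows every directory under `--logdir` that contains event files as a separate run. The sketch below lists those runs with the standard library only, which can help when the dashboard looks empty. The helper name `find_runs` is hypothetical, and the check assumes event files keep their usual `events.out.tfevents` name prefix.

```python
import os


def find_runs(logdir):
    """Return sorted run names (paths relative to logdir) that hold event files."""
    runs = []
    for root, _dirs, files in os.walk(logdir):
        # Assumption: event file names contain 'events.out.tfevents'
        if any("events.out.tfevents" in name for name in files):
            runs.append(os.path.relpath(root, logdir))
    return sorted(runs)


# Prints an empty list if ./logs does not exist or holds no event files yet
print(find_runs("./logs"))
```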

In [11]:
%tensorboard --logdir ./logs --port 6006

---
# Open [TensorBoard](http://localhost:6006) in the browser
---

# Checking models on TensorBoard

Here we will compare two models: a fully connected network and a convolutional neural network (CNN). To compare them we will use the Fashion-MNIST dataset.

## Monitoring the performance of the fully-connected network

Here we analyse the training and validation performance of the fully-connected network. We will track loss and accuracy of the model.

### Fully-connected network

Here we define a fully connected network with 3 layers. 

In [29]:
from tensorflow.keras import layers, models


dense_model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax')
])

dense_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])


## Training the model

In [30]:
!rm -rf ./logs/dense
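
Deleting the log directory keeps the dashboard clean, but it also discards earlier runs. A common alternative is to give every run its own timestamped subdirectory, sketched below. The helper name `make_run_dir` and the directory layout are assumptions made for this illustration, not something the notebook itself uses.

```python
import os
from datetime import datetime


def make_run_dir(base_dir, model_name, now=None):
    """Build a log directory path of the form <base_dir>/<model_name>/<timestamp>."""
    now = now or datetime.now()
    return os.path.join(base_dir, model_name, now.strftime("%Y%m%d-%H%M%S"))


log_dir = make_run_dir("./logs", "dense")
print(log_dir)
```

The resulting path could then be passed as `log_dir` to `tf.keras.callbacks.TensorBoard`, so that successive training runs appear side by side in TensorBoard.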

In [31]:
batch_size = 64
tr_ds, v_ds, ts_ds = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=True)

# Defining the tensorboard callback, it will log information to the defined log_dir directory
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="./logs/dense", profile_batch=0)

# Train the model
dense_model.fit(tr_ds, validation_data=v_ds, epochs=10, callbacks=[tb_callback])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f77690d63c8>

If TensorBoard is not running, rerun the `%tensorboard --logdir ./logs --port 6006` cell from the "Spinning up the TensorBoard" section.

## Monitoring and profiling the performance of the CNN

In [33]:
conv_model = models.Sequential([
    layers.Conv2D(filters=32, kernel_size=(5,5), strides=(2,2), padding='same', activation='relu', input_shape=(28,28,1)),
    layers.Conv2D(filters=16, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

conv_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])
conv_model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 14, 14, 32)        832       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 16)        4624      
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                31370     
Total params: 36,826
Trainable params: 36,826
Non-trainable params: 0
_________________________________________________________________
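
The parameter counts and output shapes in the summary above can be reproduced by hand, which is a useful check when reading model summaries. The sketch below uses the usual formulas for convolutional and dense layers with biases; the helper names are introduced only for this illustration.

```python
import math


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return (in_units + 1) * out_units


def same_padding_output(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


side = same_padding_output(28, 2)      # first conv, stride 2: 28 -> 14
side = same_padding_output(side, 1)    # second conv, stride 1: stays 14
flat_units = side * side * 16          # Flatten: 14 * 14 * 16

counts = [
    conv2d_params(5, 5, 1, 32),        # conv2d
    conv2d_params(3, 3, 32, 16),       # conv2d_1
    dense_params(flat_units, 10),      # dense
]
print(flat_units, counts, sum(counts))
```

The three counts are 832, 4,624, and 31,370, which add up to the 36,826 trainable parameters reported in the summary.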


## Training the model

In [34]:
!rm -rf ./logs/conv

batch_size = 64
tr_ds, v_ds, ts_ds = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=False)

# This TensorBoard callback does the following:
# 1. Logs loss and accuracy
# 2. Profiles the model's memory/time usage for batches 370 to 410
# 3. Logs weight histograms every 5 epochs (histogram_freq=5)
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="./logs/conv", profile_batch=[370, 410], histogram_freq=5)

conv_model.fit(tr_ds, validation_data=v_ds, epochs=10, callbacks=[tb_callback])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f776906b780>

---
# Open [TensorBoard](http://localhost:6006) in the browser
---

# TensorBoard with custom training loops

Here we train two models, one with and one without batch normalization. We then analyze the mean and standard deviation of the absolute weights of the second layer. 

In [12]:
from tensorflow.keras import layers, models
import tensorflow.keras.backend as K

K.clear_session()

dense_model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),    
    layers.Dense(256, activation='relu', name='log_layer'),    
    layers.Dense(10, activation='softmax')
])

dense_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])

dense_model_bn = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),
    layers.BatchNormalization(),
    layers.Dense(256, activation='relu', name='log_layer_bn'),
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])

dense_model_bn.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])

## Training the model

In [15]:
!rm -rf ./logs/weights_exp

def train_model(model, dataset, log_dir, log_layer_name, epochs):
    """Train the model batch by batch and log weight statistics of one layer."""

    # Define the writer
    writer = tf.summary.create_file_writer(log_dir)

    # Global step counter shared across epochs
    step = 0
    # Open the writer
    with writer.as_default():
        # For every epoch
        for e in range(epochs):
            print("Training epoch {}".format(e+1))
            # For every iteration in the epoch (use the dataset passed in,
            # not the global tr_ds)
            for batch in dataset:
                # Train with one batch
                model.train_on_batch(*batch)
                # Get the weights of the layer: [0] - weights / [1] - bias
                w = model.get_layer(log_layer_name).get_weights()[0]

                # Log mean and std of absolute weights
                tf.summary.scalar("mean_weights", np.mean(np.abs(w)), step=step)
                tf.summary.scalar("std_weights", np.std(np.abs(w)), step=step)

                # Flush to the disk from the buffer
                writer.flush()

                step += 1
            print('\tDone')

    print("Training completed\n")
    
batch_size = 64
tr_ds, _, _ = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=True)
train_model(dense_model, tr_ds, './logs/weights_exp/standard', "log_layer", 5)

tr_ds, _, _ = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=True)
train_model(dense_model_bn, tr_ds, './logs/weights_exp/bn', "log_layer_bn", 5)

Training epoch 1
	Done
Training epoch 2
	Done
Training epoch 3
	Done
Training epoch 4
	Done
Training epoch 5
	Done
Training completed

Training epoch 1
	Done
Training epoch 2
	Done
Training epoch 3
	Done
Training epoch 4
	Done
Training epoch 5
	Done
Training completed



# Visualizing word vectors on TensorBoard

In [8]:
import os
import requests
import zipfile

if not os.path.exists(os.path.join('data','glove.6B.zip')):
    
    print("Downloading")
    url = "http://nlp.stanford.edu/data/glove.6B.zip"
    # Get the file from web
    r = requests.get(url)

    if not os.path.exists('data'):
        os.mkdir('data')
    
    # Write to a file
    with open(os.path.join('data','glove.6B.zip'), 'wb') as f:
        f.write(r.content)
    print("\tDone")
    
else:
    print("The zip file already exists.")
    
if not os.path.exists(os.path.join('data', 'glove.6B.50d.txt')):
    print("Extracting data")
    with zipfile.ZipFile(os.path.join('data','glove.6B.zip'), 'r') as zip_ref:
        zip_ref.extractall('data')
    print("\tDone")
else:
    print("The extracted data already exists")

The zip file already exists.
The extracted data already exists


## Getting the most common words in the IMDB movie review dataset

In [15]:
import numpy as np
import pandas as pd

review_ds = tfds.load('imdb_reviews')
train_review_ds = review_ds["train"]

# Collect the lower-cased text of every training review
corpus = []
for data in train_review_ds:
    txt = data["text"].numpy().decode('utf-8').lower()
    corpus.append(txt)

In [16]:
from collections import Counter

corpus = " ".join(corpus)

cnt = Counter(corpus.split())
print(cnt.most_common(100))

most_common_words = [w for w,_ in cnt.most_common(5000)]

[('the', 322198), ('a', 159953), ('and', 158572), ('of', 144462), ('to', 133967), ('is', 104171), ('in', 90527), ('i', 70480), ('this', 69714), ('that', 66292), ('it', 65505), ('/><br', 50935), ('was', 47024), ('as', 45102), ('for', 42843), ('with', 42729), ('but', 39764), ('on', 31619), ('movie', 30887), ('his', 29059), ('are', 28743), ('not', 28597), ('film', 27777), ('you', 27564), ('have', 27344), ('he', 26177), ('be', 25691), ('at', 22731), ('one', 22480), ('by', 21976), ('an', 21240), ('they', 20624), ('from', 19934), ('all', 19740), ('who', 19407), ('like', 18779), ('so', 18099), ('just', 17309), ('or', 16769), ('has', 16570), ('her', 16540), ('about', 16486), ("it's", 15970), ('some', 15280), ('if', 15189), ('out', 14510), ('what', 14055), ('very', 13633), ('when', 13609), ('more', 13170), ('there', 13094), ('she', 12234), ('would', 12027), ('even', 12010), ('good', 11926), ('my', 11766), ('only', 11566), ('their', 11317), ('no', 11273), ('really', 11065), ('had', 11042), ('whi
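
The token `/><br` in the output above is a leftover of HTML line breaks (`<br />`) in the reviews, because the text is split on whitespace only. A minimal sketch of a slightly more careful tokenizer follows. The regular expressions, the helper name `tokenize`, and the two sample reviews are assumptions made for this illustration; they are not part of the original pipeline and would change the word counts shown above.

```python
import re
from collections import Counter

# Assumption: HTML tags such as <br /> and surrounding punctuation are dropped
TAG_PATTERN = re.compile(r"<[^>]+>")
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text, replace HTML tags with spaces, and extract word tokens."""
    text = TAG_PATTERN.sub(" ", text.lower())
    return TOKEN_PATTERN.findall(text)


# Made-up sample reviews, used only to exercise the tokenizer
sample_reviews = [
    "A great movie.<br /><br />The cast was great!",
    "Not a great film, but it's watchable.",
]
counts = Counter(token for review in sample_reviews for token in tokenize(review))
print(counts.most_common(3))
```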

## Read GloVe vectors and filter the most common words

In [17]:
# Note: `error_bad_lines` was deprecated in pandas 1.3 in favor of `on_bad_lines`
df = pd.read_csv(os.path.join('data', 'glove.6B.50d.txt'), header=None, index_col=0, sep=None, error_bad_lines=False, encoding='utf-8')
df.head()

Skipping line 9: field larger than field limit (131072)


| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| the | 0.418 | 0.24968 | -0.41242 | 0.1217 | 0.34527 | -0.044457 | -0.49688 | -0.17862 | -0.00066 | -0.6566 | ... | -0.29871 | -0.15749 | -0.34758 | -0.045637 | -0.44251 | 0.18785 | 0.002785 | -0.18411 | -0.11514 | -0.78581 |
| `,` | 0.013441 | 0.23682 | -0.16899 | 0.40951 | 0.63812 | 0.47709 | -0.42852 | -0.55641 | -0.364 | -0.23938 | ... | -0.080262 | 0.63003 | 0.32111 | -0.46765 | 0.22786 | 0.36034 | -0.37818 | -0.56657 | 0.044691 | 0.30392 |
| `.` | 0.15164 | 0.30177 | -0.16763 | 0.17684 | 0.31719 | 0.33973 | -0.43478 | -0.31086 | -0.44999 | -0.29486 | ... | -6.4e-05 | 0.068987 | 0.087939 | -0.10285 | -0.13931 | 0.22314 | -0.080803 | -0.35652 | 0.016413 | 0.10216 |
| of | 0.70853 | 0.57088 | -0.4716 | 0.18048 | 0.54449 | 0.72603 | 0.18157 | -0.52393 | 0.10381 | -0.17566 | ... | -0.34727 | 0.28483 | 0.075693 | -0.062178 | -0.38988 | 0.22902 | -0.21617 | -0.22562 | -0.093918 | -0.80375 |
| to | 0.68047 | -0.039263 | 0.30186 | -0.17792 | 0.42962 | 0.032246 | -0.41376 | 0.13228 | -0.29847 | -0.085253 | ... | -0.094375 | 0.018324 | 0.21048 | -0.03088 | -0.19722 | 0.082279 | -0.09434 | -0.073297 | -0.064699 | -0.26044 |
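
The "Skipping line" message above suggests that the CSV parser stumbles over some vocabulary entries; one plausible cause is that tokens such as a bare quote character are interpreted as CSV quoting. Because each line of a GloVe text file is a word followed by space-separated numbers, the file can also be parsed without a CSV reader. The sketch below shows this on two made-up 3-dimensional rows; the helper name `read_glove` is hypothetical, and it returns a plain dictionary rather than the DataFrame used in this notebook.

```python
import io


def read_glove(lines, wanted_words=None):
    """Parse GloVe text lines into a dict mapping word -> list of floats."""
    vectors = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip("\n").split(" ")
        word, values = parts[0], parts[1:]
        if wanted_words is not None and word not in wanted_words:
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


# Two made-up rows standing in for data/glove.6B.50d.txt
sample_file = io.StringIO('the 0.418 0.24968 -0.41242\n" 0.1 0.2 0.3\n')
vectors = read_glove(sample_file, wanted_words={"the", '"'})
print(vectors)
```

For the real file, an open file handle could be passed in place of `sample_file`, with `wanted_words` set to the most common words collected earlier.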


In [18]:
print("Full size of Glove: {}".format(df.shape[0]))
df_common = df.loc[df.index.isin(most_common_words)]
print("Size after only considering the most common words: {}".format(df_common.shape))

Full size of Glove: 399694
Size after only considering the most common words: (3595, 50)


## Writing the word vectors in order to be projected on TensorBoard

In [19]:
from tensorboard.plugins import projector

log_dir=os.path.join('logs', 'embeddings')
# Save the vectors we want to analyse as a variable. The rows are in the
# same order as the words written to metadata.tsv below.
weights = tf.Variable(df_common.values)
print(weights.shape)
# Create a checkpoint from the embedding; the filename and key are
# the name of the tensor.
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

with open(os.path.join(log_dir, 'metadata.tsv'), 'w') as f:
    for w in df_common.index:
        f.write(w+'\n')
        
# Set up config
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
# The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`
#embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = 'metadata.tsv'
projector.visualize_embeddings(log_dir, config)


(3595, 50)
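
The metadata file written above has a single column and therefore no header row. To attach more information to each point, such as the word frequency, the projector expects a tab-separated file whose first row names the columns. The sketch below illustrates that layout under this assumption; the helper name `write_metadata` is hypothetical, and the two counts are taken from the word frequencies printed earlier.

```python
import io


def write_metadata(handle, words, counts):
    """Write projector metadata with a header row and one row per word."""
    # A header row is needed once there is more than one column
    handle.write("word\tcount\n")
    for word in words:
        # Rows must follow the same order as the rows of the embedding matrix
        handle.write("{}\t{}\n".format(word, counts.get(word, 0)))


buffer = io.StringIO()
write_metadata(buffer, ["the", "movie"], {"the": 322198, "movie": 30887})
print(buffer.getvalue())
```

In the notebook, `df_common.index` and the `cnt` counter could play the roles of `words` and `counts`, with a file opened at `metadata.tsv` in place of the in-memory buffer.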


In [None]:
# Visualizing/highlighting word vectors
# (?:fred|larry|mrs\.|mr\.|michelle|sea|denzel|beach|comedy|theater|idiotic|sadistic|marvelous|loving|gorg|bus|truck|lugosi)
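
Before using a pattern like the one above to highlight words, it can be checked offline against a vocabulary. The sketch below uses a shortened version of the pattern and a made-up word list. Using `re.search`, which matches anywhere inside a word, is an assumption about how the pattern is applied and may differ from the projector's own matching.

```python
import re

# Shortened version of the pattern above, for illustration only
pattern = re.compile(r"(?:fred|larry|mrs\.|mr\.|comedy|theater|marvelous)")

# Made-up sample vocabulary
vocabulary = ["fred", "mr.", "mrs.", "comedy", "comedian", "theaters", "movie"]

# Keep every word that contains a match anywhere in the string
matches = [word for word in vocabulary if pattern.search(word)]
print(matches)
```

Note that `theaters` is selected because it contains `theater`, while `comedian` is not because it does not contain `comedy`.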

In [20]:
%tensorboard --logdir logs/embeddings/ --port 6007

2021-05-23 07:37:31.354360: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.4.1 at http://localhost:6006/ (Press CTRL+C to quit)
^C


---
# Open [TensorBoard for Word Vectors](http://localhost:6007) in the browser
---