---

# <center>★ AI / ML Project - TensorBoard Profiling ★
#### <center> ***Domain: Applied DL Research***

<center><img src="0.png" style="width: 800px;"/>

---

### Description:

There is a common business saying that you can’t improve what you don’t measure. This is true in machine learning as well. There are various tools for measuring the performance of a deep learning model: Neptune AI, MLflow, Weights and Biases, Guild AI, just to mention a few. In this piece, we’ll focus on TensorFlow’s open-source visualization toolkit TensorBoard. 

TensorBoard lets you track metrics such as accuracy and log loss on the training or validation set. As we shall see in this piece, it also provides several other tools for machine learning experimentation, and it is fairly easy to use.

Here are some things we can explore with TensorBoard:
* Visualizing images in TensorBoard
* Checking model weights and biases on TensorBoard
* Visualizing the model’s architecture
* Sending a visual of the confusion matrix to TensorBoard
* Profiling your application so as to see its performance, and
* Using TensorBoard with Keras, PyTorch, and XGBoost


### Acknowledgement: 
Tensorflow.org, Neptune.ai, Coursera, Google.

### Objective:
- Optimize TensorFlow performance using the Profiler

---

# <center> Plan of Action:

**We aim to solve the problem statement by following a plan of action. Here are the necessary steps:**
1. Data Exploration
2. Data Pre-processing
3. Model Training
4. Model Optimization

---

# <center>1. Data Exploration

In [1]:
!pip install -U tensorboard_plugin_profile



In [2]:
# Importing the basic libraries

import tensorflow as tf
from datetime import datetime
import tensorflow_datasets as tfds

In [3]:
# Check for GPU accelerator

device_name = tf.test.gpu_device_name()
if not device_name:
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [4]:
#Loading the dataset

(ds_train, ds_test), ds_info = tfds.load('mnist',
                                         split=['train', 'test'],
                                         shuffle_files=True,
                                         as_supervised=True,
                                         with_info=True,)

---

# <center>2. Data Preprocessing

In [5]:
# Defining a custom function to normalise the images

def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

In [6]:
#Preprocessing training dataset

ds_train = ds_train.map(normalize_img)
ds_train = ds_train.batch(128)

In [7]:
# Preprocessing testing dataset

ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(128)

---

# <center>3. Model Training

In [8]:
# Building Deep Learning Model Architecture

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10)
])

In [9]:
# Compiling the Model

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
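
The final `Dense(10)` layer has no activation, so the model outputs raw scores (logits), and `from_logits=True` tells the loss to apply the softmax itself. The block below is a minimal plain-Python sketch of that computation for a single example. It is not Keras's implementation, and the helper names and the logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_crossentropy_from_logits(logits, label):
    """Negative log-probability of the true class, given an integer label."""
    return -math.log(softmax(logits)[label])

# Hypothetical logits for a 10-class problem: an untrained model that is
# equally unsure about every digit.
uniform_logits = [0.0] * 10
loss = sparse_crossentropy_from_logits(uniform_logits, label=3)
print(round(loss, 4))  # 2.3026, i.e. ln(10)
```

A loss near ln(10) at the start of training is therefore what a 10-class model with no preference would be expected to produce.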

In [10]:
# Create a TensorBoard callback

logs = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")

tboard_callback = tf.keras.callbacks.TensorBoard(log_dir = logs,
                                                 histogram_freq = 1,
                                                 profile_batch = '500,520')


# Training the model

history = model.fit(ds_train,
                    epochs=2,
                    validation_data=ds_test,
                    callbacks = [tboard_callback])

Epoch 1/2
Epoch 2/2
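
A quick sanity check on `profile_batch = '500,520'`: with a batch size of 128, one epoch is shorter than 500 batches, so the profiled range can only be reached in the second epoch. The sketch below works this out under two assumptions that are not stated in the notebook: the MNIST training split has 60,000 examples, and the callback counts batches across the whole `fit` call rather than restarting at each epoch. The `locate` helper is hypothetical, and the exact off-by-one convention may differ between TensorFlow versions.

```python
import math

# Assumed values (see the note above).
NUM_TRAIN_EXAMPLES = 60_000
BATCH_SIZE = 128
PROFILE_START, PROFILE_STOP = 500, 520

steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / BATCH_SIZE)

def locate(global_batch, steps_per_epoch):
    """Map a 1-based global batch number to (epoch, batch within epoch)."""
    epoch, offset = divmod(global_batch - 1, steps_per_epoch)
    return epoch + 1, offset + 1

print(steps_per_epoch)                         # 469
print(locate(PROFILE_START, steps_per_epoch))  # (2, 31)
print(locate(PROFILE_STOP, steps_per_epoch))   # (2, 51)
```

Under these assumptions, training for a single epoch would never reach batch 500, and the Profile tab would have nothing to show.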


In [11]:
# Load the TensorBoard notebook extension.
%load_ext tensorboard

In [12]:
# Launch TensorBoard and navigate to the Profile tab to view performance profile
%tensorboard --logdir=logs

Reusing TensorBoard on port 6006 (pid 2752), started 5:18:01 ago. (Use '!kill 2752' to kill it.)

The Profile tab opens the Overview page, which shows a high-level summary of your model's performance. Looking at the Step-time Graph on the right, you can see that the model is highly input bound (i.e., it spends a lot of time in the data input pipeline). The Overview page also recommends potential next steps for optimizing your model's performance.

![](1.png)

Use the Trace Viewer to locate the performance bottlenecks in your input pipeline. The image below is a snapshot of the performance profile.

![](2.png)

Looking at the event traces, you can see that the GPU is inactive while the tf_data_iterator_get_next op is running on the CPU. This op is responsible for processing the input data and sending it to the GPU for training. As a general rule of thumb, it is a good idea to always keep the device (GPU/TPU) active.

---

# <center> 4. Model Optimization
<center> (using TensorBoard Profiler)

In [13]:
# Reloading the raw dataset so the optimized pipeline starts from unprocessed data

(ds_train, ds_test), ds_info = tfds.load('mnist',
                                         split=['train', 'test'],
                                         shuffle_files=True,
                                         as_supervised=True,
                                         with_info=True,)

Let us use the tf.data API to optimize the input pipeline. 
In this case, let's cache the training dataset and prefetch the data to ensure that there is always data available for the GPU to process.
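
Why prefetching helps can be sketched with a toy timeline model in plain Python. This is not how `tf.data` is implemented. It assumes a background producer that prepares batches continuously and a buffer that never fills up, and the millisecond values are hypothetical.

```python
def total_time(prepare_ms, train_ms, steps, prefetch):
    """Total wall time for `steps` training steps in a toy timeline model."""
    if not prefetch:
        # Without prefetching, the device waits for each batch to be prepared.
        return steps * (prepare_ms + train_ms)

    # With prefetching, preparation of later batches overlaps with training.
    train_end = 0
    for i in range(steps):
        batch_ready = (i + 1) * prepare_ms         # producer never pauses
        train_start = max(batch_ready, train_end)  # wait for data or device
        train_end = train_start + train_ms
    return train_end

# Hypothetical costs: 1 ms to prepare a batch, 2 ms to train on it.
print(total_time(1, 2, steps=4, prefetch=False))  # 12
print(total_time(1, 2, steps=4, prefetch=True))   # 9

# If preparation is the slower stage, the device still waits part of the time.
print(total_time(3, 2, steps=4, prefetch=False))  # 20
print(total_time(3, 2, steps=4, prefetch=True))   # 14
```

In this model, prefetching can hide preparation time only up to the cost of the slower stage, which is why the notebook also makes preparation itself cheaper with a parallel `map` and `cache`.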

In [14]:
#Preprocessing training dataset

ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.batch(128)
ds_train = ds_train.cache()
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

In [15]:
# Preprocessing testing dataset

ds_test = ds_test.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)
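
The benefit of `cache()` across epochs can be illustrated with a small plain-Python stand-in. The `CachedMap` class below is hypothetical and much simpler than `tf.data`: it only shows that the mapped function runs during the first pass and that later passes replay the stored results.

```python
class CachedMap:
    """Applies `fn` to each item on the first pass, then replays stored results."""

    def __init__(self, source, fn):
        self.source = list(source)
        self.fn = fn
        self.fn_calls = 0   # how many times the mapped function has run
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            results = []
            for item in self.source:
                self.fn_calls += 1
                results.append(self.fn(item))
            self._cache = results
        return iter(self._cache)

# Hypothetical pixel values, normalized the same way as `normalize_img`.
pixels = [0, 51, 255]
pipeline = CachedMap(pixels, lambda p: p / 255.0)

epoch_1 = list(pipeline)
epoch_2 = list(pipeline)
print(pipeline.fn_calls)  # 3: the second epoch did not call the function again
```

Because the notebook calls `cache()` after `batch(128)`, what gets stored in its pipeline is whole normalized batches rather than individual images.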

In [16]:
# Retraining with the optimized input pipeline (the model continues from its current weights)

history = model.fit(ds_train,
                    epochs=2,
                    validation_data=ds_test,
                    callbacks = [tboard_callback])

Epoch 1/2
Epoch 2/2


In [17]:
%tensorboard --logdir=logs

Reusing TensorBoard on port 6006 (pid 2752), started 5:18:07 ago. (Use '!kill 2752' to kill it.)


![](3.png)

From the Overview page, you can see that the Average Step time has reduced as has the Input Step time. The Step-time Graph also indicates that the model is no longer highly input bound. Open the Trace Viewer to examine the trace events with the optimized input pipeline.

![](4.png)

The Trace Viewer shows that the tf_data_iterator_get_next op executes much faster. The GPU therefore gets a steady stream of data to perform training and achieves much better utilization through model training.

In [18]:
#<<<--------------------------------------THE END---------------------------------------->>>