# Chapter 14: TensorBoard: Big Brother of TensorFlow

## 1️⃣ Chapter Overview

Deep learning models are often referred to as "Black Boxes" because their internal decision-making processes are complex and opaque. Furthermore, training these models is time-consuming and prone to silent failures (e.g., dead neurons, vanishing gradients).

This chapter introduces **TensorBoard**, TensorFlow's built-in visualization toolkit. TensorBoard acts as a window into the black box, allowing us to visualize datasets, track training metrics (loss/accuracy) in real time, inspect internal weights (histograms), profile performance bottlenecks, and visualize high-dimensional embeddings.

**Key Machine Learning Concepts:**
* **Monitoring:** Tracking scalar metrics (Loss, Accuracy) over time to detect overfitting/underfitting.
* **Histograms:** Visualizing the distribution of weights and biases to detect saturation or dead neurons.
* **Dimensionality Reduction:** Using PCA/t-SNE to visualize high-dimensional word vectors in 3D space.
* **Profiling:** Identifying whether training is limited by the input pipeline (CPU-bound) or by model computation (GPU-bound).

**Practical Skills:**
* Using `tf.keras.callbacks.TensorBoard` for automatic logging.
* Using `tf.summary` for custom logging in custom training loops.
* Visualizing Image Data on the TensorBoard dashboard.
* Setting up the **Embedding Projector** to visualize Word2Vec/GloVe embeddings.

## 2️⃣ Theoretical Explanation

### 2.1 How TensorBoard Works
TensorBoard does not read data directly from your program's memory. Instead, it relies on **Event Files** (logs).

1.  **Summary Writer:** In your TensorFlow code, you create a summary writer (in TensorFlow 2, with `tf.summary.create_file_writer`) that points to a specific directory (e.g., `logs/run1`).
2.  **Writing Events:** During training, you push data (scalars, images, histograms) to this writer.
3.  **TensorBoard Server:** You launch a separate process (the TensorBoard server) that watches the log directory. It parses the event files and renders them as interactive web pages.
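
The three steps above can be sketched in a few lines. This is a minimal sketch, assuming TensorFlow 2.x; the temporary directory and the tag name `demo/value` are made up for illustration.

```python
import glob
import os
import tempfile

import tensorflow as tf

# Step 1: a summary writer pointing at a run directory (temporary, for the demo).
log_dir = os.path.join(tempfile.mkdtemp(), "run1")
writer = tf.summary.create_file_writer(log_dir)

# Step 2: push a few scalar events to the writer.
with writer.as_default():
    for step in range(5):
        tf.summary.scalar("demo/value", 1.0 / (step + 1), step=step)
writer.flush()  # make sure buffered events reach the disk

# Step 3 would be `tensorboard --logdir <log_dir>`; here we only list the files.
event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(event_files)
```

The files listed at the end are the event files that the TensorBoard server parses.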

### 2.2 Key Visualizations

1.  **Scalars:** Line charts tracking values that change over time (Epochs/Steps). Crucial for Loss and Accuracy.
2.  **Images:** Lets you view the actual image data fed into the model. Useful for verifying data augmentation (e.g., whether a random rotation or flip was applied as intended).
3.  **Histograms:** Plots showing how the distribution of tensor values (weights/biases) changes over time. They help identify:
    * *Vanishing Gradients:* A layer's weights barely change between epochs (if gradients are logged, their values concentrate near 0).
    * *Dead ReLU:* Pre-activation values stay negative, so the unit always outputs 0.
4.  **Embeddings (Projector):** Projects high-dimensional vectors (e.g., 128D word vectors) into 3D space using algorithms like PCA or t-SNE.
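
To make the projection idea concrete, here is a minimal sketch of plain PCA via SVD in NumPy. It is not the Projector's own implementation, and the random vectors only stand in for real 128D word embeddings.

```python
import numpy as np


def pca_project(vectors, n_components=3):
    """Project row vectors onto their first principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Placeholder data: 200 random "word vectors" with 128 dimensions each.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 128))

projected = pca_project(embeddings)
print(projected.shape)  # (200, 3)
```

Each row of `projected` is the 3D coordinate of one vector. t-SNE, by contrast, is nonlinear and iterative, so it cannot be written as a single matrix product like this.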

### 2.3 Profiling
Deep learning training pipelines are complex. Bottlenecks can occur in:

* **Data Pipeline:** The GPU sits idle because the CPU takes too long to load/augment images.
* **Kernel Launch:** The CPU is slow to dispatch operations to the GPU.
* **Device Compute:** The model's operations themselves dominate the step time.

The TensorBoard Profiler breaks down the execution time into these components, helping you optimize the pipeline.
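
As a rough illustration of how to read such a breakdown, the sketch below picks the component that takes the largest share of a step. The timings are hypothetical numbers chosen for the example, not measurements, and `dominant_bottleneck` is a helper invented here rather than part of any profiler API.

```python
def dominant_bottleneck(breakdown_ms):
    """Return (component, share) for the largest part of a step's time."""
    total = sum(breakdown_ms.values())
    name = max(breakdown_ms, key=breakdown_ms.get)
    return name, breakdown_ms[name] / total


# Hypothetical per-step timings in milliseconds (not measured values).
step_breakdown = {
    "data_pipeline": 60.0,
    "kernel_launch": 10.0,
    "device_compute": 30.0,
}

component, share = dominant_bottleneck(step_breakdown)
print(f"{component}: {share:.0%} of step time")  # data_pipeline: 60% of step time
```

With numbers like these, the first thing to optimize would be the input pipeline rather than the model.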

## 3️⃣ Setup and Imports

We need to load the TensorBoard notebook extension so that the dashboard can be displayed inside the notebook.

In [None]:
%load_ext tensorboard

import tensorflow as tf
from tensorflow.keras import layers, models
import tensorflow_datasets as tfds
import datetime
import os
import numpy as np

# Create the log directory if it does not already exist (existing logs are kept)
os.makedirs('logs', exist_ok=True)

## 4️⃣ Section 1: Visualizing Data with TensorBoard

Before training, we should inspect our data. We will load the **Fashion MNIST** dataset and log a batch of images to TensorBoard.

### 4.1 Data Loading Pipeline

In [None]:
# Load Fashion MNIST
dataset, info = tfds.load('fashion_mnist', with_info=True, as_supervised=True)
train_ds = dataset['train']
test_ds = dataset['test']

# Map Class IDs to Names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

train_ds = train_ds.map(normalize_img).shuffle(1000).batch(32)
test_ds = test_ds.map(normalize_img).batch(32)

### 4.2 Logging Images
We use `tf.summary.image` to write image data to the logs.

In [None]:
# Create a log directory with timestamp
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
log_dir = os.path.join('logs', 'image_viz', current_time)
file_writer = tf.summary.create_file_writer(log_dir)

# Get a single batch of images
images, labels = next(iter(train_ds))

# tf.summary.image expects a rank-4 tensor: (batch, height, width, channels).
# A Fashion MNIST batch is already (32, 28, 28, 1), so no reshape is needed.

with file_writer.as_default():
    # Log the first 5 images
    # step=0 indicates this is the initial state
    tf.summary.image("Training data", images, max_outputs=5, step=0)

print(f"Images logged to {log_dir}")
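
When images arrive in another layout, for example as flattened rows of 0-255 pixel values, they need to be brought into that rank-4 form first. The helper below is a minimal NumPy sketch; the name `to_image_batch` and the random input are assumptions made for illustration.

```python
import numpy as np


def to_image_batch(flat_images, height=28, width=28):
    """Reshape (N, H*W) pixel rows into (N, H, W, 1) floats in [0, 1]."""
    batch = np.asarray(flat_images, dtype=np.float32)
    batch = batch.reshape(-1, height, width, 1)
    if batch.max() > 1.0:  # assume the input used the 0-255 pixel range
        batch = batch / 255.0
    return batch


# Placeholder input: 5 random "images" flattened to 784 values each.
flat = np.random.default_rng(0).integers(0, 256, size=(5, 784))
batch = to_image_batch(flat)
print(batch.shape)  # (5, 28, 28, 1)
```

The resulting array can be passed to `tf.summary.image` in the same way as `images` above.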

## 5️⃣ Section 2: Monitoring Model Training

We will build a simple CNN and use `tf.keras.callbacks.TensorBoard` to automatically log metrics (Loss, Accuracy) and weights (Histograms).

**Key Argument:** `histogram_freq=1` tells Keras to compute histograms of weights every epoch.

In [None]:
def create_model():
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    return model

model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Define TensorBoard Callback
log_dir = os.path.join("logs", "fit", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir, 
    histogram_freq=1 # Log weight histograms every epoch
)

# Train
model.fit(train_ds, 
          epochs=3, 
          validation_data=test_ds, 
          callbacks=[tensorboard_callback])
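
The same curves that TensorBoard plots are also available in code: `model.fit` returns a `History` object whose `.history` dict holds per-epoch lists such as `loss` and `val_loss`. The sketch below applies one simple overfitting heuristic to such a dict. The function name, the `patience` rule, and the numbers are all assumptions for illustration, not output from the model above.

```python
def first_overfit_epoch(history, patience=2):
    """Return the first 0-based epoch of a run of `patience` epochs in which
    val_loss rises while loss keeps falling, or None if there is no such run."""
    loss, val_loss = history["loss"], history["val_loss"]
    streak = 0
    for epoch in range(1, len(loss)):
        diverging = (val_loss[epoch] > val_loss[epoch - 1]
                     and loss[epoch] < loss[epoch - 1])
        if diverging:
            streak += 1
            if streak == patience:
                return epoch - patience + 1
        else:
            streak = 0
    return None


# Hypothetical curves shaped like a typical overfitting run.
example_history = {
    "loss":     [0.60, 0.45, 0.38, 0.33, 0.29, 0.26],
    "val_loss": [0.58, 0.47, 0.44, 0.46, 0.49, 0.53],
}
print(first_overfit_epoch(example_history))  # 3
```

In the Scalars tab, this corresponds to the point where the validation curve turns upward while the training curve keeps descending.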

### 5.1 Viewing TensorBoard
To view the dashboard from a notebook, run the following magic command in a cell.

**Note:** In notebook environments such as Jupyter, this renders the dashboard inline. Elsewhere, run `tensorboard --logdir logs` from your terminal and open the URL it prints.

```python
%tensorboard --logdir logs
```
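
TensorBoard treats every subdirectory that contains event files as a separate run. Before launching it, a quick check of what it will find can save confusion. The helper below is a small sketch; `find_runs` is a name invented here, and the demo uses an empty placeholder file rather than a real event file.

```python
import os
import tempfile


def find_runs(root):
    """Return sorted run directories (relative to root) that hold event files."""
    runs = []
    for dirpath, _, filenames in os.walk(root):
        if any(name.startswith("events.out.tfevents") for name in filenames):
            runs.append(os.path.relpath(dirpath, root))
    return sorted(runs)


# Demo on a throwaway directory containing one fake run.
demo_root = tempfile.mkdtemp()
run_dir = os.path.join(demo_root, "fit", "run1")
os.makedirs(run_dir)
open(os.path.join(run_dir, "events.out.tfevents.0.demo"), "w").close()

print(find_runs(demo_root))
```

Calling `find_runs("logs")` in this notebook would list the `image_viz`, `fit`, and other run directories created so far.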

## 6️⃣ Section 3: Custom Logging with `tf.summary`

Sometimes the Keras callback isn't enough. You might want to log custom metrics that it does not cover (e.g., the mean value of gradients, or the learning rate schedule) from inside a custom training loop.

Here, we write a custom loop and manually log the loss and the **mean weight** of the first layer.

In [None]:
# Define a separate writer for custom metrics
custom_log_dir = os.path.join("logs", "custom", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
summary_writer = tf.summary.create_file_writer(custom_log_dir)

model = create_model()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

epochs = 3
for epoch in range(epochs):
    print(f"\nStart of epoch {epoch}")
    
    for step, (x_batch_train, y_batch_train) in enumerate(train_ds):
        with tf.GradientTape() as tape:
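            # The model ends in softmax, so it returns probabilities (not logits),
            # which matches the loss default of from_logits=False.
            probs = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, probs)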
            
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        
        # --- Custom Logging ---
        # Log every 200 steps
        if step % 200 == 0:
            with summary_writer.as_default():
                # 1. Log scalar Loss
                tf.summary.scalar('custom_loss', loss_value, step=optimizer.iterations)
                
                # 2. Log mean weight of first layer
                # (To check if weights are exploding or vanishing)
                w = model.layers[0].weights[0]
                mean_w = tf.reduce_mean(w)
                tf.summary.scalar('weight_mean_l0', mean_w, step=optimizer.iterations)
                
    print(f"Epoch {epoch} done.")
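
Gradient statistics can be logged the same way. The sketch below logs the global gradient norm for a tiny hand-made problem so that the value is easy to verify. It assumes TensorFlow 2.x; the variable, the input, and the tag `grad_global_norm` are illustrative.

```python
import os
import tempfile

import tensorflow as tf

writer = tf.summary.create_file_writer(
    os.path.join(tempfile.mkdtemp(), "grad_demo"))

# Toy problem: loss = (x @ w)^2 with x = [3, 4] and w = [1, 2]^T.
w = tf.Variable([[1.0], [2.0]])
x = tf.constant([[3.0, 4.0]])

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.matmul(x, w) ** 2)

grads = tape.gradient(loss, [w])           # d(loss)/dw = 2 * 11 * x^T = [66, 88]^T
grad_norm = tf.linalg.global_norm(grads)   # sqrt(66^2 + 88^2) = 110

with writer.as_default():
    tf.summary.scalar("grad_global_norm", grad_norm, step=0)
writer.flush()

print(float(grad_norm))
```

In the training loop above, the same two lines (`tf.linalg.global_norm(grads)` followed by `tf.summary.scalar`) could sit next to the `custom_loss` logging.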

## 7️⃣ Section 4: Profiling Performance

The TensorBoard **Profiler** helps identify whether your input pipeline is slow (CPU-bound) or your model operations are slow (GPU-bound).

To use it, we simply add the `profile_batch` argument to the callback. It defines which batches to profile (e.g., batches 50 to 60).

*Note: Profiling often requires specific GPU drivers and the CUPTI library installed on the host machine.*

In [None]:
log_dir = os.path.join("logs", "profile", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir, 
    profile_batch='50,60' # Profile batches 50 to 60
)

# We would then fit the model as usual:
# model.fit(train_ds, epochs=1, callbacks=[tensorboard_callback])
print("Profiler configured. Check the 'Profile' tab in TensorBoard after running fit.")
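
If the profile shows that the input pipeline dominates, the usual `tf.data` remedies are parallel mapping, caching, and prefetching. The pipeline below is a minimal sketch on placeholder arrays of zeros (not Fashion MNIST), assuming TensorFlow 2.4 or later for `tf.data.AUTOTUNE`.

```python
import numpy as np
import tensorflow as tf

# Placeholder data with the same shapes as Fashion MNIST samples.
images = np.zeros((256, 28, 28, 1), dtype=np.uint8)
labels = np.zeros((256,), dtype=np.int64)


def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255.0, label


ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)  # parallel preprocessing
    .cache()                      # keep preprocessed samples in memory after epoch 1
    .shuffle(256)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)   # prepare the next batch while the model trains
)

x, y = next(iter(ds))
print(x.shape, y.shape)
```

Whether each step actually helps depends on the dataset and hardware, so it is worth profiling again after the change.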

## 8️⃣ Section 5: Visualizing Embeddings (Projector)

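The Embedding Projector lets us check whether a model has learned semantic relationships between words. Pretrained vectors such as **GloVe** are a typical input; to keep this example self-contained, the code below uses randomly initialized dummy embeddings in their place, so the resulting plot will show no meaningful structure.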

**Logic:**
1. Save the weights of the embedding layer to a checkpoint file.
2. Save the vocabulary (metadata) to a TSV file.
3. Configure a `projector_config.pbtxt` linking the two.

In [None]:
from tensorboard.plugins import projector

# 1. Create dummy embeddings (Simulating GloVe for demonstration)
vocab_size = 1000
embedding_dim = 50
dummy_weights = tf.Variable(tf.random.normal([vocab_size, embedding_dim]))
dummy_vocab = [f"word_{i}" for i in range(vocab_size)]

# 2. Setup Log Directory
log_dir = os.path.join('logs', 'embeddings')
os.makedirs(log_dir, exist_ok=True)

# 3. Save Weights (Checkpoint)
checkpoint = tf.train.Checkpoint(embedding=dummy_weights)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

# 4. Save Metadata (TSV)
with open(os.path.join(log_dir, 'metadata.tsv'), 'w') as f:
    for word in dummy_vocab:
        f.write(f"{word}\n")

# 5. Configure Projector
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = 'metadata.tsv'

projector.visualize_embeddings(log_dir, config)

print(f"Embeddings ready. Run TensorBoard pointing to {log_dir} and check 'Projector' tab.")
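
The metadata file above has a single column, so it needs no header. If you add more columns (for example, a category used to color points), the Projector reads the first row as column names. The sketch below writes such a file; the words and the `group` column are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical vocabulary with an invented "group" label per word.
rows = [("word_0", "even"), ("word_1", "odd"), ("word_2", "even")]

metadata_path = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
with open(metadata_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["word", "group"])  # header row, needed with more than one column
    writer.writerows(rows)              # one row per embedding, in the same order

with open(metadata_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

The data rows must appear in the same order as the rows of the embedding matrix, since the Projector matches them by position.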

## 9️⃣ Chapter Summary

* **TensorBoard** is indispensable for debugging deep learning models.
* **Scalars Tab:** Use it to spot overfitting (validation loss rising or flattening while training loss keeps falling).
* **Images Tab:** Use it to sanity check your data pipeline inputs.
* **Histograms Tab:** Use it to monitor weight health (look for distributions that keep evolving during training; be wary of ones that collapse into a narrow spike or stop changing).
* **Profile Tab:** Use it to identify if you need to optimize your `tf.data` pipeline (prefetching/caching) or your model ops.
* **Projector Tab:** Use it to visualize high-dimensional embeddings in 3D space using PCA/t-SNE.