<a href="https://colab.research.google.com/github/CIA-Oceanix/DLOA2023/blob/main/lectures/notebooks/introduction_to_tensorboard_students.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to TensorBoard

[TensorBoard](https://www.tensorflow.org/tensorboard) is an interactive tool for visualising training metrics, model graphs and other logged data.

In this notebook, we will learn the basics of TensorBoard, using the MNIST dataset as an example.


## Table of contents

1. Data
2. Models
3. Learning & logging
4. Sharing your board

A few useful modules:

In [None]:
from pprint import pprint  # pretty print

import matplotlib.pyplot as plt
import torch
from torch.utils.tensorboard import SummaryWriter
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor


And some global parameters:

In [None]:
batch_size = 100
n_epochs = 5
device = 'cuda' if torch.cuda.is_available() else 'cpu'

print(f'Currently using {device}.')

## 1. Data

Define the MNIST train and test datasets:

In [None]:
train_data = datasets.MNIST(
    root='./data/',  # Root directory where the dataset is stored
    train=True,
    transform=ToTensor(),  # Convert the images to tensors
    download=True,  # Download the dataset if necessary
)

In [None]:
test_data = datasets.MNIST(
    root='./data/',
    train=False,
    transform=ToTensor(),
    download=True,
)

In [None]:
# Some details about the dataset
print(f'    Number of classes: {len(train_data.classes)}')
print(f'      Data dimensions: {train_data.data[0].shape}')
print(f'Size of the train set: {len(train_data)}')
print(f' Size of the test set: {len(test_data)}\n')

print('Classes\n-------')
pprint(train_data.classes)

In [None]:
# Show a sample
plt.imshow(train_data.data[0])

In [None]:
# Split train data into train and validation data
_train, _validation = torch.utils.data.random_split(train_data, [.8, .2])

In [None]:
# Data loaders
train_loader = torch.utils.data.DataLoader(_train, batch_size, shuffle=True)  # Reshuffle the training samples at every epoch
validation_loader = torch.utils.data.DataLoader(_validation, batch_size)
test_loader = torch.utils.data.DataLoader(test_data, batch_size)

## 2. Models

We are going to compare two simple models and use TensorBoard to see how they behave.

The models are:

- a simple **multilayer perceptron** (`MLP`);
- a simple **convolutional network** (`ConvNet`).

In [None]:
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.layers = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(28*28, 512),
            torch.nn.ReLU(),
            torch.nn.Linear(512, 512),
            torch.nn.ReLU(),
            torch.nn.Linear(512, 10),
        )

    def forward(self, X):
        return self.layers(X)

In [None]:
class ConvNet(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.layers = torch.nn.Sequential(
            torch.nn.Conv2d(1, 5, 3, padding=1),
            torch.nn.BatchNorm2d(5),
            torch.nn.ReLU(),
            torch.nn.Conv2d(5, 1, 3, padding=1),
            torch.nn.ReLU(),

            torch.nn.Flatten(start_dim=1),
            torch.nn.Linear(28*28, 10),
        )

    def forward(self, X):
        return self.layers(X)
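
Before comparing the two models in TensorBoard, it can help to check that both map a batch of MNIST-sized images to 10 logits and to see how different their sizes are. The sketch below is only an illustration: `mlp_layers` and `convnet_layers` are stand-in `Sequential` stacks that mirror the layers of the `MLP` and `ConvNet` classes above, and `count_parameters` and `dummy_images` are hypothetical names that are not used elsewhere in the notebook.

```python
import torch

# Stand-ins mirroring the layer stacks of the MLP and ConvNet classes (assumption:
# same layers and sizes as defined above, rebuilt here to keep the cell self-contained)
mlp_layers = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
convnet_layers = torch.nn.Sequential(
    torch.nn.Conv2d(1, 5, 3, padding=1),
    torch.nn.BatchNorm2d(5),
    torch.nn.ReLU(),
    torch.nn.Conv2d(5, 1, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(start_dim=1),
    torch.nn.Linear(28 * 28, 10),
)


def count_parameters(module):
    """Return the number of trainable parameters of a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# A dummy batch of 4 single-channel 28x28 images
dummy_images = torch.zeros(4, 1, 28, 28)

output_shapes = {}
parameter_counts = {}
for name, layers in [('MLP', mlp_layers), ('ConvNet', convnet_layers)]:
    layers.eval()  # Use the running statistics in BatchNorm2d
    with torch.no_grad():
        logits = layers(dummy_images)
    output_shapes[name] = tuple(logits.shape)
    parameter_counts[name] = count_parameters(layers)
    print(f'{name}: output shape {output_shapes[name]}, '
          f'{parameter_counts[name]} trainable parameters')
```

The large gap in parameter counts is worth keeping in mind when reading the loss and accuracy curves of the two runs.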

## 3. Learning & logging

Before training, remove any old runs and start a new board.

We first need to instantiate a `SummaryWriter`, which keeps track of everything we log.

We will run an experiment with the ConvNet first.

In [None]:
%rm -rf ./runs/

In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs

### PyTorch

To add data to TensorBoard, you need an instance of the class [`SummaryWriter`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter). You can specify where the data will be stored with the `log_dir` parameter.

In [None]:
tb_convnet = SummaryWriter(log_dir='./runs/convnet')

We will first learn to add scalar values (`int`, `float`). Metrics, for example, are scalar values.

To add a scalar, you need the method [`SummaryWriter.add_scalar`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_scalar):

```python
SummaryWriter.add_scalar(tag, value, global_step)
```

Where:
- `tag` is the name under which the scalar is stored (for example "Loss"). You can add several scalars with the same tag to track how the value evolves over `global_step`. You can use a path-like tag to categorise your scalars (for example "Loss/Training"). Each `tag` corresponds to **one** graph;
- `value` is the scalar you want to save;
- (optional) `global_step` is the index $n$ of the iteration at which the scalar is recorded (for example, the $n$-th epoch).
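
As a minimal sketch of how tags and `global_step` fit together, the cell below logs two synthetic curves under the same `Demo/` category. Everything in it is illustrative: the values are made up, and `demo_dir`, `demo_writer` and the `Demo/...` tags are hypothetical names. The writer logs to a temporary directory so that the `./runs` folder used in this notebook is left untouched.

```python
import math
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Throwaway log directory (assumption: kept apart from ./runs on purpose)
demo_dir = tempfile.mkdtemp(prefix='tb_scalar_demo_')
demo_writer = SummaryWriter(log_dir=demo_dir)

# Synthetic values standing in for real metrics
for step in range(10):
    fake_loss = math.exp(-0.3 * step)
    # Same tag at every step: the points end up on a single graph
    demo_writer.add_scalar('Demo/Loss', fake_loss, step)
    # Different tag: a second graph, grouped under the same "Demo" category
    demo_writer.add_scalar('Demo/Accuracy', 1.0 - fake_loss, step)

demo_writer.close()

# The writer stores its data in event files inside the log directory
event_files = [f for f in os.listdir(demo_dir) if 'tfevents' in f]
print(f'{len(event_files)} event file(s) written to {demo_dir}')
```

To look at these curves, TensorBoard would have to be pointed at `demo_dir` instead of `runs`.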

Let's write the training and testing processes and add some scalars:

In [None]:
def train(dataloader, model, loss_fn, optimizer, epoch, tb):
    n_samples = len(dataloader.dataset)
    n_batches = len(dataloader)
    model.train()

    train_loss, correct = 0, 0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        train_loss += loss.item()  # Accumulate a plain float, not a tensor attached to the graph
        correct += (pred.argmax(1) == y).type(torch.float).sum().item()

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            current = (batch + 1) * len(X)
            print(f'loss: {loss.item():>7f}  [{current:>5d}/{n_samples:>5d}]')
    train_loss /= n_batches  # Average loss over the batches of this epoch
    correct /= n_samples  # Fraction of correctly classified samples

    tb.add_scalar('Loss/Train', train_loss, epoch)  # /!\
    tb.add_scalar('Accuracy/Train', correct, epoch)  # /!\

In [None]:
@torch.no_grad()
def validation(dataloader, model, loss_fn, epoch, tb):
    n_samples = len(dataloader.dataset)
    n_batches = len(dataloader)
    model.eval()

    validation_loss, val_correct = 0, 0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        pred = model(X)
        validation_loss += loss_fn(pred, y).item()
        val_correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    validation_loss /= n_batches
    val_correct /= n_samples
    print(f'Validation Error:\nAccuracy: {(100*val_correct):>0.1f}%, Avg loss: {validation_loss:>8f}\n')

**>>> In the `validation` function, add the loss and accuracy values to TensorBoard.**

And a final function that handles training and validation:

In [None]:
def run(model, loss_fn, optimizer, scheduler, tb):
    for t in range(n_epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train(train_loader, model, loss_fn, optimizer, t, tb)
        validation(validation_loader, model, loss_fn, t, tb)

        tb.add_scalar('Learning Rate', scheduler.get_last_lr()[0], t)  # /!\
        scheduler.step()
    print("Done!")

Please be aware that:

1. The learning rate is normally a hyperparameter, and there is a dedicated method for tracking hyperparameters so that metrics can be mapped to them. See [`SummaryWriter.add_hparams`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_hparams) for more details;
2. We haven't called `train`, `validation` or `run` yet, so no scalars have been added to TensorBoard so far.
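
For reference, here is a hedged sketch of what hyperparameter tracking with `add_hparams` could look like. It is not part of the experiment in this notebook: the hyperparameter values and accuracies in `fake_trials` are invented for illustration, and `hparams_dir` and the `trial_<i>` sub-directories are hypothetical names. Each trial is given its own writer so that it shows up as a separate row in the HPARAMS tab.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Throwaway log directory (assumption: kept apart from ./runs on purpose)
hparams_dir = tempfile.mkdtemp(prefix='tb_hparams_demo_')

# Made-up (hyperparameters, metrics) pairs, for illustration only
fake_trials = [
    ({'lr': 1e-2, 'batch_size': 100}, {'hparam/accuracy': 0.90}),
    ({'lr': 1e-3, 'batch_size': 100}, {'hparam/accuracy': 0.85}),
    ({'lr': 1e-3, 'batch_size': 50}, {'hparam/accuracy': 0.87}),
]

for i, (hparams, metrics) in enumerate(fake_trials):
    trial_dir = os.path.join(hparams_dir, f'trial_{i}')
    writer = SummaryWriter(log_dir=trial_dir)
    # First argument: dictionary of hyperparameters,
    # second argument: dictionary of the metrics obtained with them
    writer.add_hparams(hparams, metrics)
    writer.close()

# Count the event files written under the demo directory
n_event_files = sum(
    1
    for _, _, files in os.walk(hparams_dir)
    for f in files
    if 'tfevents' in f
)
print(f'{n_event_files} event file(s) written under {hparams_dir}')
```

In a real experiment, the metrics dictionary would hold the final validation loss or accuracy of each run rather than hand-written numbers.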


Let's instantiate the model:

In [None]:
convnet = ConvNet().to(device)
convnet

TensorBoard can also show a graph of your model, which is very useful for visualising a complex architecture. We use the method [`SummaryWriter.add_graph`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_graph) for that:

```python
SummaryWriter.add_graph(model, data)
```

Where:
- `model` is your PyTorch module;
- `data` is an example input that is fed through the model so that TensorBoard can trace and visualise it.

Let's add our ConvNet in TensorBoard:

In [None]:
sample, _ = next(iter(train_loader))
tb_convnet.add_graph(convnet, sample.to(device))  # /!\ the sample must be on the same device as the model

**---> 👀 See TensorBoard**

You can also add images to the board with [`SummaryWriter.add_image`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_image):

```python
SummaryWriter.add_image(tag, img, global_step)
```
Where:
- `tag` is the same as before;
- `img` is a **tensor** (or a **numpy.ndarray**);
- (optional) `global_step` is the same as before.

Let's add a sample of our dataset (a whole batch) to the board:

In [None]:
grid = torchvision.utils.make_grid(sample)
tb_convnet.add_image("Batch", grid, global_step=None)  # /!\

**---> 👀 See TensorBoard**

**>>> Modify the previous cell so that we can see all the batches of the training set in TensorBoard.**
[ Pro tip: loop over `train_loader` with the function `enumerate`. ]

**---> 👀 See TensorBoard**

Then we can define the loss function (`loss_fn`), the optimizer and a learning rate scheduler before running an experiment:

In [None]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(convnet.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_epochs)

run(
    model=convnet,
    loss_fn=loss_fn,
    optimizer=optimizer,
    scheduler=scheduler,
    tb=tb_convnet,
)

**---> 👀 See TensorBoard**
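
The `Learning Rate` graph should show a decaying cosine curve. As a rough check of what to expect, the sketch below replays the same schedule on a dummy parameter and compares it with the closed-form cosine annealing formula for a minimum learning rate of 0. `demo_epochs`, `base_lr`, `dummy_param` and the other names are hypothetical and chosen to mirror `n_epochs` and `lr` above.

```python
import math

import torch

demo_epochs = 5  # Mirrors n_epochs (assumption)
base_lr = 1e-3   # Mirrors the lr given to SGD (assumption)

# A dummy parameter is enough to build an optimizer and its scheduler
dummy_param = torch.nn.Parameter(torch.zeros(1))
demo_optimizer = torch.optim.SGD([dummy_param], lr=base_lr)
demo_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    demo_optimizer, T_max=demo_epochs
)

# Same order as in `run`: read the learning rate, then step the scheduler
logged_lrs = []
for epoch in range(demo_epochs):
    logged_lrs.append(demo_scheduler.get_last_lr()[0])
    demo_optimizer.step()  # No gradients here, so the parameter is left unchanged
    demo_scheduler.step()

# Closed form of cosine annealing with a minimum learning rate of 0
closed_form = [
    0.5 * base_lr * (1 + math.cos(math.pi * t / demo_epochs))
    for t in range(demo_epochs)
]

for epoch, (logged, expected) in enumerate(zip(logged_lrs, closed_form)):
    print(f'epoch {epoch}: scheduler {logged:.6e} | formula {expected:.6e}')
```

The first logged value is the initial learning rate, and the values decrease towards zero as the epoch index approaches `T_max`.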

Let's add another model (MLP) to our board in order to compare it with the previous one (ConvNet).

**>>> Complete the following cell.**

In [None]:
# Instantiate a new board in `./runs/mlp`
tb_mlp = ...  # /!\

# Instantiate the MLP module (and send it to `device`)
mlp = ...  # /!\

# Loss function, optimizer & LR scheduler
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(mlp.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_epochs)

# Complete the following:
run(
    model=mlp,  # /!\
    loss_fn=loss_fn,
    optimizer=optimizer,
    scheduler=scheduler,
    tb=tb_mlp,  # /!\
)

Let's not forget to close the board:

In [None]:
tb_convnet.close()
tb_mlp.close()
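
An alternative to calling `close` by hand is to use the writer in a `with` block. To the best of my knowledge, `SummaryWriter` supports the context manager protocol and closes itself when the block ends. The sketch below is a minimal illustration with made-up values; `ctx_dir`, `ctx_writer` and the `Demo/Value` tag are hypothetical names, and the data goes to a temporary directory rather than to `./runs`.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Throwaway log directory (assumption: kept apart from ./runs on purpose)
ctx_dir = tempfile.mkdtemp(prefix='tb_context_demo_')

with SummaryWriter(log_dir=ctx_dir) as ctx_writer:
    for step in range(3):
        ctx_writer.add_scalar('Demo/Value', 0.5 * step, step)
# Leaving the block closes the writer, which flushes pending events to disk

ctx_event_files = [f for f in os.listdir(ctx_dir) if 'tfevents' in f]
print(f'{len(ctx_event_files)} event file(s) written to {ctx_dir}')
```

This pattern is convenient when a run may be interrupted by an exception, since the writer is closed in that case as well.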

## 4. Sharing your board

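You can host your board online on [tensorboard.dev](https://tensorboard.dev) in order to share it with other people. Note that the tensorboard.dev service was reportedly discontinued at the start of 2024, so the upload below may no longer be available.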

**The uploaded data is public and visible to anyone: do not share sensitive data or results!**

In [None]:
!tensorboard dev upload --logdir runs --name "my board" --description "a test board"

## Other

- [TensorBoard official website](https://www.tensorflow.org/tensorboard);
- [PyTorch reference documentation for TensorBoard](https://pytorch.org/docs/stable/tensorboard.html);
- [How to use TensorBoard with PyTorch](https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html);
- [A tutorial on TensorBoard with PyTorch](https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html);
- [Another guide to use TensorBoard](https://towardsdatascience.com/a-complete-guide-to-using-tensorboard-with-pytorch-53cb2301e8c3);
- [TensorBoard dev](https://tensorboard.dev).