In [None]:
from pathlib import Path
import shutil
import sys

from lib.utils.tb_to_df import tb_to_df
from lib.utils.pseudo_log import create_pseudo_logs

# Extra: Analyzing TensorBoard Logs

TensorBoard is a very nice tool to visualize the progress of your training. It is not always easy to thoroughly compare multiple runs, however. For that, you need access to the raw logging data. To help you with that, we have created a function `tb_to_df` that converts all the TensorBoard logging data present in a given directory into a [`pandas` `DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). `pandas` is a Python library that provides tools to analyze tabular data, which it represents as `DataFrame` objects. In some way, `pandas` is like Excel for Python (but much better, of course 😉).

Unfortunately, a `pandas` tutorial is beyond the scope of this course. Nevertheless, as it is a very powerful and popular tool, investing some time in learning to work with `pandas` is really worth it. But don't worry, you won't need any `pandas` know-how for this notebook.

## Generate Some Dummy Logs

For the sake of this example, we have written a small function that writes some dummy logs to a directory.

In [None]:
# The name of the log root containing the dummy logs
log_root = 'runs_dummy'

# Clean up the directory if it exists
if Path(log_root).exists():
    shutil.rmtree(log_root)

# Write some dummy logs to the directory
create_pseudo_logs(log_root)

## Converting TensorBoard Logs to `pandas`

Once you have a directory that contains the logs of a couple of training runs, you pass the name of this directory to `tb_to_df()`, like so:

```python
from lib.utils.tb_to_df import tb_to_df

log_root = 'runs_???'
df = tb_to_df(log_root)
```

Then, `df` will contain all the logs stored in the given directory. Let's try this for the dummy logs we have created above.

In [None]:
df = tb_to_df(log_root)

As you can see when executing the cell below, `df` contains a tabular data structure. Each row corresponds to a logging step. For each step of a certain training run, the `DataFrame` contains the logged scalars (`metric1` and `metric2` in our example) at that step and the time at which each of the scalars was logged (`wall_time (metric1)` and `wall_time (metric2)` in our example).

Apart from the logged scalars, the `DataFrame` contains the values of the hyperparameters of the run. These values are extracted from the run name, which is formatted like `hparam1(value1)_hparam2(value2)_`. This is the format we use in all our notebooks, so this hyperparameter parsing should work out of the box. Finally, the column `run_name` contains the name of the run to which the logged data belongs.
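To make the run name format more concrete, here is a minimal sketch of how such a name could be split into hyperparameters. The helper `parse_run_name` and its regular expression are hypothetical and written only for illustration; this is not necessarily how `tb_to_df` does it internally.

```python
import re

# Hypothetical pattern (an assumption, not taken from `tb_to_df`):
# a name at the start of the string or after an underscore,
# followed by a value in parentheses.
HPARAM_PATTERN = re.compile(r"(?:^|_)(\w+?)\(([^()]*)\)")


def parse_run_name(run_name):
    """Return a dict that maps hyperparameter names to their values (as strings)."""
    return dict(HPARAM_PATTERN.findall(run_name))


print(parse_run_name("hparam1(0.1)_hparam2(32)_"))
```

Note that this sketch keeps all values as strings; converting them to numbers would be an extra step.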

In [None]:
df

## Analyzing a `DataFrame`

As mentioned above, we won't dive into the details of `DataFrame`s, but feel free to explore it yourself. With the [`seaborn`](https://seaborn.pydata.org/) plotting library, you can create some compelling data visualizations from a `DataFrame`.
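As a small taste of what such an analysis can look like, the sketch below ranks runs by the last logged value of `metric1`. It uses a tiny hand-made `DataFrame` instead of the real output of `tb_to_df`. The column name `step` and the hyperparameter `lr` are assumptions made for this example, so check the columns of your own `df` before adapting it.

```python
import pandas as pd

# Toy stand-in for the output of `tb_to_df`.
# The `step` column and the `lr` hyperparameter are assumed for illustration.
df_toy = pd.DataFrame({
    "run_name": ["lr(0.1)_"] * 3 + ["lr(0.01)_"] * 3,
    "lr": [0.1] * 3 + [0.01] * 3,
    "step": [0, 1, 2] * 2,
    "metric1": [1.0, 0.6, 0.5, 1.0, 0.8, 0.3],
})

# Keep only the last logged row of each run
final_rows = df_toy.sort_values("step").groupby("run_name").tail(1)

# Rank the runs by their final value of `metric1` (lowest first)
summary = (
    final_rows.sort_values("metric1")[["run_name", "lr", "metric1"]]
    .reset_index(drop=True)
)
print(summary)
```

The same pattern works for any logged scalar: replace `metric1` with the column you care about, and pass `ascending=False` to `sort_values` if higher values are better.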

But again, there really is *no obligation to analyze your data with `pandas`*. If there's another software package that you feel comfortable with, please use that one! There's probably an easy way to convert the `DataFrame` into a format that your preferred software package supports. For example, you can save the `DataFrame` as an **Excel sheet** with the following line of code.

> **NOTE**: You might get the error: `ModuleNotFoundError: No module named 'openpyxl'`. Simply open a new cell and run `!pip install openpyxl` to install it.

In [None]:
df.to_excel('my_results.xlsx', index=False)

Now, in the directory of this notebook, you should see a new file called `my_results.xlsx`. In Jupyter Notebook, you can tick the checkbox next to it and then click the `Download` button at the top of the file list.

Alternatively, you can save the `DataFrame` as a **CSV file** with the following line of code.

In [None]:
df.to_csv('my_results.csv', index=False)

For other supported formats, see [the `pandas` documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html).