# Visualizing GPU Resource Utilization with PyNVML and Bokeh

- **Author:** Rick Zamora (email: rzamora@nvidia.com)
- **Last Update:** 5/15/2019

**Note:** *In the 5/15/2019 live demonstration, the "GPU-Utilization" and "GPU-Resources" applications were used as JupyterLab extensions. The integration of the PyNVML-Bokeh server was finished by [Ben Zaitlen](https://github.com/quasiben) just before the live demonstration (thanks Ben!). In the future, this notebook will be updated to reflect the JupyterLab integration.*

### Introduction

This notebook provides a brief demonstration of how to visualize GPU metrics using PyNVML and Bokeh. The demonstration has three general goals:

1. Introduce/discuss the PyNVML python bindings for the [NVIDIA Management Library (NVML)](https://developer.nvidia.com/nvidia-management-library-nvml)
2. Discuss a specific example of NVML-Bokeh integration for GPU-metric visualization
3. Provide a simple benchmark (using Dask) to visualize multi-GPU resource utilization

### Base Environment Setup

In order to visualize GPU utilization for this demo, we start by creating a base conda environment with [RAPIDS](https://rapids.ai/) and [Jupyter](https://jupyter.org/) packages:
```
conda create --name bokeh-pynvml \
    -c defaults -c nvidia -c rapidsai \
    -c pytorch -c numba -c conda-forge \
    cudf=0.7 cuml=0.7 python=3.7 cudatoolkit=9.2 \
    nodejs jupyterlab dask dask-cudf dask-cuda bokeh -y
conda activate bokeh-pynvml
```

Note that I am personally using a DGX machine with eight NVIDIA V100 GPUs to write this notebook (`Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-135-generic x86_64)`).

Before or after activating our base conda environment, we should also choose a specific root-directory location for this demo:
```
export demo_home='/home/nfs/rzamora/workspace/pynvml-bokeh-demo'
mkdir $demo_home; cd $demo_home
```

### Python Bindings for the NVIDIA Management Library (PyNVML)

PyNVML is a python wrapper for the [NVIDIA Management Library (NVML)](https://developer.nvidia.com/nvidia-management-library-nvml), which is a C-based API for monitoring and managing various states of NVIDIA GPU devices. NVML is used directly by the better-known [NVIDIA System Management Interface](https://developer.nvidia.com/nvidia-system-management-interface) (`nvidia-smi`). According to the NVIDIA developer site, NVML provides access to the following queryable states (in addition to modifiable states not discussed here):

- **ECC error counts**: Both correctable single bit and detectable double bit errors are reported. Error counts are provided for both the current boot cycle and for the lifetime of the GPU.
- **GPU utilization**: Current utilization rates are reported for both the compute resources of the GPU and the memory interface.
- **Active compute process**: The list of active processes running on the GPU is reported, along with the corresponding process name/id and allocated GPU memory.
- **Clocks and PState**: Max and current clock rates are reported for several important clock domains, as well as the current GPU performance state.
- **Temperature and fan speed**: The current core GPU temperature is reported, along with fan speeds for non-passive products.
- **Power management**: For supported products, the current board power draw and power limits are reported.
- **Identification**: Various dynamic and static information is reported, including board serial numbers, PCI device ids, VBIOS/Inforom version numbers and product names.

Although several different python wrappers for NVML currently exist, we will use the [PyNVML](https://github.com/gpuopenanalytics/pynvml) package hosted by GoAi on GitHub. This version of PyNVML uses `ctypes` to wrap most of the NVML C API.  For this demo, we will focus on a small subset of the API needed to query real-time GPU-resource utilization:

- `nvmlInit()`: Initialize an NVML profiling session
- `nvmlShutdown()`: Finalize an NVML profiling session
- `nvmlDeviceGetCount()`: Get the number of available GPU devices
- `nvmlDeviceGetHandleByIndex()`: Get a handle for a device (given an integer index)
- `nvmlDeviceGetMemoryInfo()`: Get a memory-info object (given a device handle)
- `nvmlDeviceGetUtilizationRates()`: Get a utilization-rate object (given a device handle)
- `nvmlDeviceGetPcieThroughput()`: Get a PCIe-throughput value (given a device handle and a TX/RX counter type)

In the current version of PyNVML, the python function names are usually chosen to exactly match the C API. For example, to query the current GPU-utilization rate on every available device, the code would look something like this:

```
In [1]: from pynvml import *
In [2]: nvmlInit()
In [3]: ngpus = nvmlDeviceGetCount()
In [4]: for i in range(ngpus):
   ...:     handle = nvmlDeviceGetHandleByIndex(i)
   ...:     gpu_util = nvmlDeviceGetUtilizationRates(handle).gpu
   ...:     print('GPU %d Utilization = %d%%' % (i, gpu_util))
   ...:
GPU 0 Utilization = 43%
GPU 1 Utilization = 0%
GPU 2 Utilization = 15%
GPU 3 Utilization = 0%
GPU 4 Utilization = 36%
GPU 5 Utilization = 0%
GPU 6 Utilization = 0%
GPU 7 Utilization = 11%
```
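The loop above only reads the `.gpu` field. The sketch below extends it to the other calls in the list (memory info and PCIe throughput) and pairs `nvmlInit()` with `nvmlShutdown()` in a `try`/`finally`, so the NVML session is closed even if a query fails. It is a minimal sketch, not code from the demo repository: `gpu_snapshot` is a hypothetical helper, it assumes PyNVML's `NVML_PCIE_UTIL_TX_BYTES`/`NVML_PCIE_UTIL_RX_BYTES` counter constants, and it assumes NVML's documented units (bytes for memory, KB/s for PCIe throughput). The `pynvml` module is passed in as an argument so the function can be exercised without a GPU.

```python
def gpu_snapshot(nvml):
    """Return one dict of metrics per GPU (hypothetical helper).

    `nvml` is the pynvml module, passed in rather than imported here so the
    function can be tried against a stand-in object on a machine without GPUs.
    """
    nvml.nvmlInit()
    try:
        snapshot = []
        for i in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(i)
            util = nvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu/.memory in percent
            mem = nvml.nvmlDeviceGetMemoryInfo(handle)  # .used/.total in bytes
            # PCIe throughput is queried separately for the TX and RX directions (KB/s)
            tx = nvml.nvmlDeviceGetPcieThroughput(handle, nvml.NVML_PCIE_UTIL_TX_BYTES)
            rx = nvml.nvmlDeviceGetPcieThroughput(handle, nvml.NVML_PCIE_UTIL_RX_BYTES)
            snapshot.append({
                "gpu": i,
                "compute_pct": util.gpu,
                "mem_used_gib": mem.used / 2**30,
                "mem_total_gib": mem.total / 2**30,
                "pcie_tx_kbs": tx,
                "pcie_rx_kbs": rx,
            })
        return snapshot
    finally:
        nvml.nvmlShutdown()  # always end the NVML session, even on error
```

On a GPU machine, `import pynvml` followed by `for row in gpu_snapshot(pynvml): print(row)` prints one dictionary per device.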

Of course, if there is nothing currently running on any of the GPUs, all devices will show 0% utilization. In this demo, we will use simple python code (like in the above example) to query GPU metrics in real time.  To install [PyNVML](https://github.com/gpuopenanalytics/pynvml) from source:
```
git clone https://github.com/gpuopenanalytics/pynvml.git
cd pynvml
pip install -e .
```

Note that this version of PyNVML is also hosted on [PyPI](https://pypi.org/project/pynvml/) and [Conda Forge](https://anaconda.org/conda-forge/pynvml), so you can alternatively use `pip install pynvml` or `conda install -c conda-forge pynvml` without cloning the repository.  For example, here is a screenshot of the PyPI page for the PyNVML package I am using:

![PyPI page for the pynvml package](https://raw.githubusercontent.com/rjzamora/notebooks/master/pynvml-bokeh-files/pypi-ss.png)


### A PyNVML Bokeh-Server Example

Although it is pretty cool that we can use python to query the current state of our NVIDIA GPUs, what we really want in practice is an intuitive visualization of the most important metrics.  In order for the visualization to *paint* a complete/useful picture of the system, the NVML data will need to be automatically updated in real time. 

The good news is that the `server` module within the [Bokeh](https://bokeh.pydata.org/en/latest/) python library provides the perfect solution for this task!  In fact, the process of building programmatic bokeh servers is already nicely outlined in a [great blog post by Matt Rocklin](http://matthewrocklin.com/blog/work/2017/06/28/simple-bokeh-server) (thanks Matt!). 
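As a minimal sketch of that programmatic pattern (not the code from the fork used below): each application is a function that receives a fresh Bokeh `Document`, and a dictionary of URL routes is passed to `bokeh.server.server.Server`. The `ticks_app` and `main` names are hypothetical, and the app just streams a sawtooth signal so the page visibly updates.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.server.server import Server


def ticks_app(doc):
    """Hypothetical app: stream a sawtooth signal so the page visibly updates."""
    source = ColumnDataSource({"x": [0], "y": [0]})
    fig = figure(title="Ticks", sizing_mode="stretch_both")
    fig.line(source=source, x="x", y="y")
    doc.add_root(fig)

    def cb():
        x = source.data["x"][-1] + 1
        # Append one point and keep at most the last 100 points
        source.stream({"x": [x], "y": [x % 10]}, rollover=100)

    doc.add_periodic_callback(cb, 200)  # call cb every 200 ms


def main(port):
    # Each route maps a URL path to a function that populates a new Document
    routes = {"/Ticks": ticks_app}
    server = Server(routes, port=port)
    server.start()
    server.io_loop.start()  # blocks until the process is stopped
```

Calling `main(5000)` would serve the app at `http://localhost:5000/Ticks`. If you browse from another machine, you may also need to pass `allow_websocket_origin` to `Server`.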

For this demo, we will use a fork of the [`jupyterlab-bokeh-server`](https://github.com/ian-r-rose/jupyterlab-bokeh-server) repository, developed by [Ian Rose](https://github.com/ian-r-rose) and [Matt Rocklin](https://github.com/mrocklin). Within this fork, the `pynvml` branch is based on the `system-resources` branch of the upstream repository. The `system-resources` branch is a great reference because it already includes code for visualizing CPU resource utilization (see the *Code Details* section for further implementation details).

#### Downloading the Bokeh-Server Code

To access the code for NVML-metric visualization, clone the `pynvml` branch of [`rjzamora/jupyterlab-bokeh-server`](https://github.com/rjzamora/jupyterlab-bokeh-server):

```
cd $demo_home
git clone https://github.com/rjzamora/jupyterlab-bokeh-server.git -b pynvml
```

#### Running the PyNVML Bokeh Server

Despite the existence of `jupyterlab` within the name of the repository used for this demo, the server is not yet integrated as a jupyterlab extension.  For now, we need to run the `jupyterlab_bokeh_server/server.py` script directly. For example:

```
python $demo_home/jupyterlab-bokeh-server/jupyterlab_bokeh_server/server.py 5000 > server.out 2>&1 &
```

After the bokeh server is launched, you can navigate to `http://<IP>:5000` in your web browser. If everything is working correctly, you will see the following menu page:

![Bokeh server menu page](https://raw.githubusercontent.com/rjzamora/notebooks/master/pynvml-bokeh-files/bokeh-app-ss.png)

##### GPU-Utilization Bar Plot

If you click on the **GPU-Utilization** link listed in the main menu, you will see a bar chart of the current GPU compute utilization (the y-axis spans 0-100%). While running the Dask benchmark (discussed below), I saw the following output in a single snapshot (your snapshot may show more or less utilization):

![GPU-Utilization bar plot](https://raw.githubusercontent.com/rjzamora/notebooks/master/pynvml-bokeh-files/gpu-utilization-ss.png)

##### GPU-Resources Stacked Line Plot

If you click on the **GPU-Resources** link listed in the main menu, you will see a figure with four stacked line plots: 

- **GPU Utilization (per Device) [%]**: Plot of the GPU-**compute** utilization for each device. Each GPU is plotted with a different color, and the units are percentage.
- **Memory Utilization (per Device)**: Plot of the GPU-**memory** utilization for each device. Each GPU is plotted with a different color, and the units are GiB.
- **Total Utilization [%]**: Plot of the **total** GPU **memory** and **compute** utilization. Units are percentage.
- **Total PCI Throughput [MB/s]**: Plot of the **total** PCIe **TX** and **RX** data throughput. Units are MB/s.

For example, when running the dask benchmark (discussed below), I see the following output:

![GPU-Resources stacked line plots](https://raw.githubusercontent.com/rjzamora/notebooks/master/pynvml-bokeh-files/gpu-resources-ss.png)
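The arithmetic behind the four GPU-Resources panels can be sketched as a small reduction over per-device samples. This is a minimal sketch under stated assumptions, not the fork's implementation: `summarize` and its dictionary keys are hypothetical, "total compute" is assumed to be the mean over devices, "total memory" is assumed to be memory in use as a share of all device memory, and PCIe values are assumed to arrive in KB/s (NVML's unit) and are converted to MB/s with 1 MB = 1024 KB.

```python
def summarize(samples):
    """Reduce per-device samples to the quantities shown in the four panels.

    Each sample is a dict with the hypothetical keys "compute_pct" (percent),
    "mem_used"/"mem_total" (bytes), and "tx_kbs"/"rx_kbs" (KB/s).
    """
    n = len(samples)
    used = sum(s["mem_used"] for s in samples)
    total = sum(s["mem_total"] for s in samples)
    return {
        # Panel 1: per-device compute utilization [%]
        "compute_pct": [s["compute_pct"] for s in samples],
        # Panel 2: per-device memory in use [GiB]
        "mem_gib": [s["mem_used"] / 2**30 for s in samples],
        # Panel 3 (assumed definitions): mean compute over devices, and
        # memory in use as a share of all device memory [%]
        "total_compute_pct": sum(s["compute_pct"] for s in samples) / n,
        "total_mem_pct": 100.0 * used / total,
        # Panel 4: summed PCIe TX/RX, converted from KB/s to MB/s
        "tx_mbs": sum(s["tx_kbs"] for s in samples) / 1024,
        "rx_mbs": sum(s["rx_kbs"] for s in samples) / 1024,
    }
```

For example, two 16 GiB devices at 40% and 60% compute with 1 GiB and 3 GiB in use give a total compute of 50.0% and a total memory utilization of 12.5%.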


#### Code Details

The pyNVML-specific code needed for this demo can be found in the `jupyterlab_bokeh_server/server.py` and `jupyterlab_bokeh_server/nvml_apps.py` files of my `jupyterlab-bokeh-server` fork. In `server.py`, the only significant change to the upstream repository is the addition of new `gpu`, `gpu_resource_timeline`, and `pci` bokeh applications (which are all defined in `nvml_apps.py`):

```
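try:
    import nvml_apps
    routes = {
        "/CPU-Utilization": cpu,
        "/Machine-Resources": resource_timeline,
        "/GPU-Utilization": nvml_apps.gpu,
        "/GPU-Resources": nvml_apps.gpu_resource_timeline,
        "/PCI-Throughput": nvml_apps.pci,
    }
except ImportError:
    # Illustrative fallback (not copied from server.py): serve only the
    # CPU apps when nvml_apps/pynvml cannot be imported
    routes = {
        "/CPU-Utilization": cpu,
        "/Machine-Resources": resource_timeline,
    }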
```

In order for the server to continually refresh the PyNVML data used by the Bokeh applications, each plot draws its data from a Bokeh `ColumnDataSource`. The source's `data` dictionary can be updated in place, and each application registers a dedicated callback function (`cb`) with `doc.add_periodic_callback`, which re-queries NVML and pushes the new values into the source at a fixed interval. For example, the `gpu` application is defined like this:

```
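import pynvml
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.palettes import all_palettes
from bokeh.plotting import figure

# Module-level NVML setup assumed by gpu() below
pynvml.nvmlInit()
ngpus = pynvml.nvmlDeviceGetCount()
gpu_handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(ngpus)]

def gpu(doc):
    fig = figure(title="GPU Usage", sizing_mode="stretch_both", y_range=[0, 100])

    # One bar per device: bar i spans [i, i + 0.8] on the x-axis
    gpu = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in gpu_handles]
    left = list(range(len(gpu)))
    right = [l + 0.8 for l in left]
    source = ColumnDataSource({"left": left, "right": right, "gpu": gpu})
    # Color each bar by its utilization (0-100%)
    mapper = LinearColorMapper(palette=all_palettes['RdYlBu'][4], low=0, high=100)

    fig.quad(
        source=source, left="left", right="right", bottom=0, top="gpu", color={"field": "gpu", "transform": mapper}
    )

    doc.title = "GPU Utilization [%]"
    doc.add_root(fig)

    def cb():
        # Re-query NVML and replace the bar heights sent to the browser
        source.data.update({"gpu": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in gpu_handles]})

    doc.add_periodic_callback(cb, 200)  # refresh every 200 ms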
```

Note that the real-time refresh of the PyNVML GPU-utilization data happens in the `source.data.update()` call inside `cb`, which Bokeh runs every 200 ms.
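The `gpu` application replaces its bar heights on every refresh. A timeline view such as **GPU-Resources** instead needs to keep a history, which `ColumnDataSource.stream` supports by appending rows and discarding the oldest ones past a `rollover` limit. The sketch below shows that pattern for a single per-device utilization panel; it is not the fork's `gpu_resource_timeline` code, and `make_timeline_app` and its `sample` argument (a zero-argument callable returning one utilization value per GPU) are hypothetical.

```python
import time

from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure


def make_timeline_app(sample, ngpus, interval_ms=1000, rollover=600):
    """Build a Bokeh app that plots per-GPU utilization over time.

    `sample` is a hypothetical zero-argument callable returning `ngpus`
    utilization percentages, e.g.
    lambda: [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in gpu_handles]
    """
    keys = ["gpu%d" % i for i in range(ngpus)]

    def app(doc):
        # Start with empty columns; the callback appends one row per tick
        source = ColumnDataSource({"time": [], **{k: [] for k in keys}})
        fig = figure(title="GPU Utilization (per Device) [%]", x_axis_type="datetime",
                     y_range=(0, 100), sizing_mode="stretch_both")
        for i, k in enumerate(keys):
            fig.line(source=source, x="time", y=k, color=Category10[10][i % 10])
        doc.add_root(fig)

        def cb():
            # Datetime axes expect milliseconds since the epoch
            row = {"time": [time.time() * 1000]}
            row.update({k: [v] for k, v in zip(keys, sample())})
            # Append one row; drop the oldest once `rollover` rows are stored
            source.stream(row, rollover=rollover)

        doc.add_periodic_callback(cb, interval_ms)

    return app
```

The returned function can be registered like any other route, for example `{"/GPU-Timeline": make_timeline_app(sample, ngpus)}`. Streaming sends only the new row to the browser, whereas replacing whole columns would resend the full history on every tick.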

### Sample GPU Benchmark

If you have followed this notebook so far, there is a decent chance that you saw some pretty boring plots for the GPU activity on your own system (unless you happened to be running a GPU-intensive application at the time). In case you don't have a decent GPU benchmark on hand, I am including a code snippet from the [join-indexed](https://github.com/mrocklin/dask-gpu-benchmarks/blob/master/join-indexed.ipynb) example in the [dask-gpu-benchmarks](https://github.com/mrocklin/dask-gpu-benchmarks) repository:

```
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
import dask
import dask.datasets  # provides dask.datasets.timeseries
import cudf

# Start one Dask worker per visible GPU (diagnostics_port is deprecated in
# newer Dask releases, which use dashboard_address instead)
cluster = LocalCUDACluster(diagnostics_port=9000)
client = Client(cluster)
client  # in Jupyter, this displays a summary of the cluster

# Build two random time-series dataframes (pandas partitions on the CPU)
left = dask.datasets.timeseries(
    '2000', '2001', 
    dtypes={'id': int, 'x': float, 'y': float},
    freq='10ms',
    partition_freq='2d',
)
left.index = left.index.astype(int)
left = left.persist()

right = dask.datasets.timeseries(
    '2000', '2001', 
    dtypes={'z': float},
    freq='100ms',
    partition_freq='5d',
)
right.index = right.index.astype(int)
right = right.persist()

# Convert each partition to a cudf DataFrame on the GPU
gleft = left.map_partitions(cudf.from_pandas)
gright = right.map_partitions(cudf.from_pandas)
gleft, gright = dask.persist(gleft, gright)  # persist data in device memory

out = gleft.merge(gright, left_index=True, right_index=True, how='inner')  # this is lazy
out = out.persist()
%time _ = wait(out)
```

In the original run, `LocalCUDACluster` emitted the warning `"diagnostics_port has been deprecated. "`, and the `%time` call reported:

```
CPU times: user 13.9 s, sys: 1.76 s, total: 15.6 s
Wall time: 2min 14s
```

Since this example is designed to use every available GPU device, it is a great fit for this demonstration.  If you happen to have other GPU benchmarks that also produce interesting PyNVML visualizations, please do share :)

Thanks for reading!