# NVTX Support

In this notebook you will learn how to:

* Filter kernels based on NVTX ranges
* Extract the NVTX push/pop stack for a given kernel
* Extract NVTX event attributes for a given kernel

## Setup

First, import NVIDIA Nsight Compute's Python Report Interface (PRI) as `ncu_report`
and load an `ncu-rep` report file with `load_report`.

In [None]:
import ncu_report

report_file_path = "../sample_reports/manual_nvtx.ncu-rep"
report = ncu_report.load_report(report_file_path)

Next, use the subscript operator to get the first `IRange` object of `report`, which holds the profiled kernels:

In [None]:
kernels = report[0]
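If you are not sure which range contains the kernels you care about, a quick overview of the whole report can help. The following is a minimal sketch that walks every range and records each action's name. It assumes the PRI's `num_ranges()`, `num_actions()` and `IAction.name()` member functions together with the subscript access used above; the helper name `list_actions` is hypothetical.

```python
def list_actions(report):
    """Return (range_index, action_index, kernel_name) for every action.

    Assumes IContext.num_ranges(), IRange.num_actions() and IAction.name()
    from the PRI, plus subscript access on the report and its ranges.
    """
    rows = []
    for range_idx in range(report.num_ranges()):
        current_range = report[range_idx]
        for action_idx in range(current_range.num_actions()):
            # Each IAction corresponds to one profiled kernel launch
            rows.append((range_idx, action_idx, current_range[action_idx].name()))
    return rows
```

Calling `for row in list_actions(report): print(row)` prints one line per profiled kernel.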

## Kernel Filtering based on NVTX Ranges

The NVIDIA Tools Extension (NVTX) is an application programming interface that enables users to annotate
applications with events and resource names in order to customize their profiling experience
when using NVIDIA's developer tools. A full reference of all NVTX features can be found in the
[CUDA Toolkit Documentation](https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvtx).

This notebook and the accompanying report are based on the introductory example from
[this Technical Blog post](https://developer.nvidia.com/blog/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx/), which uses Push/Pop Ranges.

As a first example, let's look at how you can use the PRI to filter kernel results based on their NVTX ranges.
You can do that by calling `IRange.actions_by_nvtx(includes, excludes)` on `kernels` where `includes`
and `excludes` are lists of strings that specify which NVTX Ranges to include and exclude, respectively.

Including all kernels inside the `"run_test"` range while excluding all kernels in the `"check_results"` range
looks like this:

In [None]:
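# Push/Pop Range filters use "/" as the delimiter between range names
includes = ["run_test/"]
excludes = ["check_results/"]
# actions_by_nvtx returns indices; resolve each one to its IAction object
filtered_kernels = [
    kernels[index] for index in kernels.actions_by_nvtx(includes, excludes)
]

for kernel in filtered_kernels:
    print(kernel)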

Note that `actions_by_nvtx` returns a `tuple` of `int`s, each of which specifies the `index` of a valid
`IAction` object in `kernels`.
You can query these objects by using the subscript operator on the `IRange` object, i.e. `kernels[index]`.
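Since the index-to-object step is needed every time you filter, it can be wrapped in a small helper. This is a sketch; the name `filter_by_nvtx` is hypothetical, and it assumes that `actions_by_nvtx` accepts an empty list when no excludes are needed.

```python
def filter_by_nvtx(kernels, includes, excludes=()):
    """Return the IAction objects in `kernels` that pass the NVTX filter.

    `actions_by_nvtx` returns a tuple of indices, so each index is resolved
    back to its IAction with the subscript operator on the IRange.
    """
    indices = kernels.actions_by_nvtx(list(includes), list(excludes))
    return [kernels[index] for index in indices]
```

For example, `filter_by_nvtx(kernels, ["run_test/"], ["check_results/"])` should return the same objects as `filtered_kernels` above.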

Because these are Push/Pop Ranges, you need to use `/` as the delimiter between range names, which is why each filter above ends in `/`.
The filtering scheme follows the same rules as `--nvtx-include`/`--nvtx-exclude` with `ncu` on
the command line. You can find the full specification in the
[online documentation](https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvtx-filtering).
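When filter strings are built programmatically, a small helper avoids delimiter mistakes. The sketch below assumes that nested Push/Pop Ranges are written outermost first as `outer/inner/`, following the `"run_test/"` pattern above; please confirm the exact nesting rules (including wildcards) in the linked filtering documentation. The helper name `push_pop_filter` is hypothetical.

```python
def push_pop_filter(*range_names):
    """Build a Push/Pop Range filter string such as "run_test/" or "outer/inner/".

    Names are joined with "/" and a trailing "/" is appended. Names that
    themselves contain "/" are rejected, since "/" is the delimiter.
    """
    if not range_names:
        raise ValueError("at least one range name is required")
    for name in range_names:
        if "/" in name:
            raise ValueError(f"range name {name!r} contains the '/' delimiter")
    return "/".join(range_names) + "/"
```

With it, the filters above can be written as `includes = [push_pop_filter("run_test")]` and `excludes = [push_pop_filter("check_results")]`.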

## Extracting NVTX Call Stack

Having filtered the profiled kernels based on their NVTX ranges, you might want to look at each kernel's
NVTX information individually.
As a first step, you can look at the NVTX call stack. This is particularly useful for exploring a kernel's
execution hierarchy in an unfamiliar code base.

To extract NVTX information from an `IAction` object, first obtain its `INvtxState` object
with `IAction.nvtx_state()`. Its member function `domains()` returns a `tuple` of the IDs of all valid
NVTX Domains for the given `kernel`:

In [None]:
for kernel in filtered_kernels:
    nvtx_state = kernel.nvtx_state()
    print(f"{kernel}: {nvtx_state.domains()}")
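For reports with many kernels, it can be handier to collect the domain IDs into a dictionary keyed by kernel name. This sketch uses only `name()`, `nvtx_state()` and `domains()`; it assumes domain IDs are integers, and the helper name `domains_by_kernel` is hypothetical. Kernels launched several times share a name, so their IDs are merged.

```python
def domains_by_kernel(actions):
    """Map each kernel name to a sorted tuple of the NVTX domain IDs seen for it.

    Launches of the same kernel share a name, so their domain IDs are merged
    without duplicates.
    """
    result = {}
    for action in actions:
        name = action.name()
        ids = set(result.get(name, ()))
        ids.update(action.nvtx_state().domains())
        result[name] = tuple(sorted(ids))
    return result
```

Calling `domains_by_kernel(filtered_kernels)` on this report should only show domain ID `0` for every kernel.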

Since the example application only uses the _default domain_, the only valid domain ID for `filtered_kernels` is `0`.
With this information, you can now extract an `INvtxDomainInfo` object with the subscript operator, `INvtxState[0]`.
Extracting the `domain_info` for the `daxpy_kernel` looks like this:

In [None]:
# filtered_kernels[1] is the daxpy_kernel; domain ID 0 is the default domain
domain_info = filtered_kernels[1].nvtx_state()[0]

You can now use `name()`, `push_pop_ranges()` and `start_end_ranges()` to query the NVTX Domain name,
the Push/Pop Range stack, and the Start/End Ranges of the `daxpy_kernel`, respectively.

In [None]:
print(
    f"Domain name: {domain_info.name()},\n"
    f"Push/Pop Ranges: {domain_info.push_pop_ranges()},\n"
    f"Start/End Ranges: {domain_info.start_end_ranges()}"
)

The output shows that the application indeed uses the _default domain_ for the `daxpy_kernel`
and that no Start/End Ranges were annotated.
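A flat tuple is compact, but an indented view makes the nesting easier to read. This minimal sketch assumes that `push_pop_ranges()` returns the range names as strings ordered from the outermost to the innermost range; the helper name `format_push_pop_stack` is hypothetical.

```python
def format_push_pop_stack(range_names):
    """Render a Push/Pop Range stack as an indented tree, one level per range.

    Assumes the names are ordered from the outermost to the innermost range.
    """
    return "\n".join("  " * depth + name for depth, name in enumerate(range_names))
```

Usage: `print(format_push_pop_stack(domain_info.push_pop_ranges()))`.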

## Extracting NVTX Event Attributes

Besides the `tuple` of all NVTX Ranges, `domain_info` also gives access to
the NVTX Event Attributes of a specific NVTX Range, using `push_pop_range(index)`
for Push/Pop Ranges or `start_end_range(index)` for Start/End Ranges.

Querying the `daxpy` Range for the `daxpy_kernel`, which has index `1` in the output of `push_pop_ranges()` above,
looks like this:

In [None]:
nvtx_range = domain_info.push_pop_range(1)

The returned `nvtx_range` is an `INvtxRange` object whose member functions retrieve the NVTX Event Attributes,
including `message()`, `category()`, `color()`, and functions for the Payload Attribute such as `payload_type()`:

In [None]:
print(
    f"Range: {nvtx_range.name()}\n"
    f"Has NVTX Event Attributes? {'Yes' if nvtx_range.has_attributes() else 'No'}\n"
    f"Category: {nvtx_range.category()}\n"
    f"Color: {hex(nvtx_range.color())}\n"
    f"Payload Type: {nvtx_range.payload_type()}\n"
    f"Message: {nvtx_range.message()}"
)
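To inspect every Push/Pop Range on the stack rather than a single one, you can loop over the indices of `push_pop_ranges()`. The sketch below uses only the `INvtxDomainInfo` and `INvtxRange` member functions shown in this notebook; the helper name `push_pop_attributes` is hypothetical, and reporting `None` for ranges without attributes is a precautionary design choice, not documented PRI behavior.

```python
def push_pop_attributes(domain_info):
    """Collect name, category, color and message for every Push/Pop Range.

    Each INvtxRange is fetched with push_pop_range(index). Attribute fields
    are set to None for ranges where has_attributes() is false.
    """
    rows = []
    for index in range(len(domain_info.push_pop_ranges())):
        nvtx_range = domain_info.push_pop_range(index)
        has_attrs = nvtx_range.has_attributes()
        rows.append({
            "name": nvtx_range.name(),
            "category": nvtx_range.category() if has_attrs else None,
            "color": nvtx_range.color() if has_attrs else None,
            "message": nvtx_range.message() if has_attrs else None,
        })
    return rows
```

For example, `for row in push_pop_attributes(domain_info): print(row)` prints one dictionary per range on the `daxpy_kernel`'s stack.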

You can learn more about NVTX Event Attributes in the
[CUDA Toolkit Documentation](https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvtx-event-attribute-struct).
The full API of the `INvtxRange` class is documented in Nsight Compute's
[online documentation](https://docs.nvidia.com/nsight-compute/NvRulesAPI/annotated.html#classNV_1_1Rules_1_1INvtxRange).
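In NVTX, a color annotated with `colorType = NVTX_COLOR_ARGB` is a 32-bit ARGB value. Assuming `color()` returns that raw value (as the `hex()` output above suggests), the small helpers below split it into its channels or format it as a web-style `#RRGGBB` string. Both helper names are hypothetical.

```python
def split_argb(color):
    """Split a 32-bit ARGB color into (alpha, red, green, blue) bytes."""
    return (
        (color >> 24) & 0xFF,  # alpha
        (color >> 16) & 0xFF,  # red
        (color >> 8) & 0xFF,   # green
        color & 0xFF,          # blue
    )


def argb_to_rgb_hex(color):
    """Format a 32-bit ARGB color as "#RRGGBB", dropping the alpha channel."""
    _, red, green, blue = split_argb(color)
    return f"#{red:02X}{green:02X}{blue:02X}"
```

For instance, `argb_to_rgb_hex(nvtx_range.color())` gives a string that can be passed directly to most plotting libraries.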