# Source-Correlated Metrics

In this notebook you will learn how to:

* Find and traverse source-correlated metrics in an NVIDIA Nsight Compute report
* Associate individual instances of source-correlated metrics with their corresponding SASS/PTX instructions
* Associate these SASS/PTX instructions with the original CUDA-C/C++ code

## Setup

First, import NVIDIA Nsight Compute's Python Report Interface (PRI) as `ncu_report`
and load an `ncu-rep` report file with `load_report`:

In [None]:
import ncu_report

report_file_path = "../sample_reports/mergeSort.ncu-rep"
report = ncu_report.load_report(report_file_path)

For later use, unpack the profiling results of the first kernel and create a list of all metrics it contains:

In [None]:
kernel = report[0][0]
metrics = [kernel[name] for name in kernel]  # a list, so it can be iterated more than once

## Identifying Source-Correlated Metrics

Source-correlated metrics are PC sampling metrics that can be associated with precise locations in the binary.
Inspecting these metrics for conspicuous values might give you a hint where performance optimization efforts should
be focused.

The source-correlated metrics available in your report depend on the _Metric Set_ or _Metrics Sections_
chosen when collecting the profiling data. To find all source-correlated metrics in a given kernel, call
`has_correlation_ids()` on every `IMetric` object the `IAction` contains.
This evaluates to `True` whenever the metric is correlated with at least one location in the binary.

In [None]:
for metric in metrics:
    if metric.has_correlation_ids():
        print(f"{metric} is source-correlated in {metric.num_instances()} places")
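If you want to reuse this filter, it can be wrapped in a small helper. The following is a minimal sketch, not part of the PRI: `source_correlated` is a hypothetical helper name, and `StubMetric` is a stand-in class used only so the example runs without a report file. It relies solely on `has_correlation_ids()` and `num_instances()` as used in this notebook.

```python
from dataclasses import dataclass


@dataclass
class StubMetric:
    """Hypothetical stand-in for an IMetric object, for demonstration only."""
    metric_name: str
    instances: int
    correlated: bool

    def has_correlation_ids(self) -> bool:
        return self.correlated

    def num_instances(self) -> int:
        return self.instances


def source_correlated(metrics):
    """Return the source-correlated metrics, those with the most instances first."""
    correlated = [m for m in metrics if m.has_correlation_ids()]
    return sorted(correlated, key=lambda m: m.num_instances(), reverse=True)


# Made-up sample data; with a real report, pass the metrics collected in the Setup section.
stub_metrics = [
    StubMetric("metric_a", 0, False),
    StubMetric("metric_b", 12, True),
    StubMetric("metric_c", 40, True),
]

for m in source_correlated(stub_metrics):
    print(f"{m.metric_name}: {m.num_instances()} instances")
```

Sorting by instance count is only one possible ordering; it puts the metrics with the widest coverage of the binary first.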

## Traversing Source-Correlated Metrics along SASS/PTX Instructions

Having found a relevant source-correlated metric, you can use `IMetric.correlation_ids()` to
find all locations with correlation information. In this case, correlation IDs represent addresses in the binary.
You will use them shortly to find the SASS/PTX instructions associated with the source-correlated metric.

Note that `correlation_ids()` returns a new `IMetric` object that contains the addresses as _instance values_.
The number of correlations of the source-correlated metric equals the number of instances in this new
`IMetric` object, named `addresses` in the code below. You can query this number with `num_instances()`.

As an example, you can look at uncoalesced accesses to shared memory, which show up as excessive L1 wavefronts:

In [None]:
metric_name = "derived__memory_l1_wavefronts_shared_excessive"
metric = kernel[metric_name]
addresses = metric.correlation_ids()
num_correlations = addresses.num_instances()

Now, you can look at the _instance values_ of the source-correlated metric, that is, the individual values for each
correlation location. You can do this by using `metric.value(index)` for each instance. Likewise, you can now obtain
the address for each location using `addresses.value(index)`.

With the help of these addresses you can look up the SASS/PTX instructions associated with each correlation location
using `IAction.sass_by_pc(address)` and `IAction.ptx_by_pc(address)`, respectively:

In [None]:
print(f"All correlations for {metric_name}; total value = {metric.value():,}")

for index in range(num_correlations):
    instance_value = metric.value(index)
    address = addresses.value(index)
    print(
        f"[@{address}]  "
        f"instance value: {instance_value:<11,}"
        f"SASS code: {kernel.sass_by_pc(address).strip():<30} "
        f"PTX code: {kernel.ptx_by_pc(address)} "
    )

Looking at every correlation location like this can reveal which instructions offer room for improvement (in this
example, by reducing uncoalesced shared memory accesses) and which do not.
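When a metric has many correlation locations, it can help to look at the largest instance values first. The sketch below is one way to do that. `top_locations` is a hypothetical helper, and the sample values and addresses are made up so the example runs without a report file.

```python
def top_locations(instance_values, instance_addresses, count=3):
    """Pair each instance value with its address and return the `count` largest."""
    pairs = zip(instance_values, instance_addresses)
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)[:count]


# Made-up sample data standing in for metric.value(i) and addresses.value(i).
sample_values = [0, 128, 16, 512]
sample_addresses = [0x10, 0x20, 0x30, 0x40]

for value, address in top_locations(sample_values, sample_addresses, count=2):
    print(f"[@{address:#x}]  instance value: {value:,}")
```

With a real report, the two input lists could be built from `metric.value(index)` and `addresses.value(index)` for each `index` in `range(num_correlations)`.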

## Associating SASS/PTX Instructions with High-Level Source Code

Now that you know the instance values of your source-correlated metric, as well as their associated SASS instructions,
you might want to find their respective locations in the source code, too.

In order to do that, you can once again use the `addresses`. First, however, you need to import the contents of the
relevant source files. Note that this will only succeed if the profiled application was built with the
`--generate-line-info` (`-lineinfo`) `nvcc` compiler flag.

Source file contents can be obtained with `IAction.source_files()`. You can additionally convert the output to a `dict` to
get a mapping from source file paths to source file contents:

In [None]:
sources = dict(kernel.source_files())

If the report does not contain the source file contents (but does contain the line information and source file names),
you could try to read the files from the local file system.
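A local-file fallback could look like the following sketch. `fill_missing_sources` is a hypothetical helper, not part of the PRI. It assumes that a missing source shows up either as an absent key or as an empty string in the mapping, which you should verify against your own report, and that the recorded paths are valid on the machine running the notebook.

```python
import tempfile
from pathlib import Path


def fill_missing_sources(sources, file_paths):
    """Return a copy of `sources` where absent or empty entries are read from disk."""
    completed = dict(sources)
    for file_path in file_paths:
        if completed.get(file_path):
            continue  # content is already available
        candidate = Path(file_path)
        if candidate.is_file():
            # Replace undecodable bytes instead of failing on unusual encodings.
            completed[file_path] = candidate.read_text(errors="replace")
    return completed


# Demonstration with a temporary file standing in for a CUDA source file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = str(Path(tmp_dir) / "demo.cu")
    Path(demo_path).write_text("__global__ void k() {}\n")
    demo_sources = fill_missing_sources(
        {demo_path: ""}, [demo_path, "/no/such/file.cu"]
    )

print(sorted(Path(p).name for p in demo_sources))
```

Paths that cannot be found locally are left out of the result, so later lookups should be prepared for missing keys.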

Next, you can use each `address` to obtain an `ISourceInfo` object using `IAction.source_info(address)`.
`ISourceInfo` objects have two member functions: `file_name()` and `line()`. These can be used to
get the path to the relevant source file and the line number of the correlation location, respectively.

With this, you can build up a `dict` that maps from `file_path` to a list of `CorrelationInfo`s.
You can use the latter to store the `value`, `line` and `address` for each instance of the source-correlated
metric.

In [None]:
from collections import defaultdict, namedtuple

CorrelationInfo = namedtuple('CorrelationInfo', ['value', 'line', 'address'])
high_level_correlations = defaultdict(list)

for index in range(num_correlations):
    value = metric.value(index)
    address = addresses.value(index)

    source_info = kernel.source_info(address)
    file_path = source_info.file_name()
    line = source_info.line()

    high_level_correlations[file_path].append(CorrelationInfo(value, line, address))

To extract a single `line` from a string representing the contents of a file, you need a small helper.
For illustrative purposes, a simple implementation is enough:

In [None]:
def get_line_from_file(line: int, file_content: str) -> str:
    return file_content.splitlines()[line-1]  # line numbering uses 1-based indexing
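Note that the simple version raises an `IndexError` for line numbers past the end of the file, and for a line number of 0 it silently returns the last line, because the index becomes -1. A more defensive variant is sketched below. `get_line_or_placeholder` and its `placeholder` parameter are hypothetical names introduced for this example.

```python
def get_line_or_placeholder(
    line: int, file_content: str, placeholder: str = "<unavailable>"
) -> str:
    """Return the 1-based `line` of `file_content`, or `placeholder` if out of range."""
    lines = file_content.splitlines()
    if 1 <= line <= len(lines):
        return lines[line - 1]
    return placeholder


sample_content = "first\nsecond\nthird"
print(get_line_or_placeholder(2, sample_content))
print(get_line_or_placeholder(0, sample_content))
```

Returning a placeholder keeps a report loop running when a single location has no usable line number, at the cost of hiding the problem unless you look for the placeholder in the output.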

Now you can look at the extracted data! Since you might care about places for potential improvement only, you can ignore zero instance values:

In [None]:
from pathlib import Path

for path in high_level_correlations:
    print(f"{Path(path).name}:")

    for info in high_level_correlations[path]:
        if info.value > 0:
            print(
                f"  [line {info.line:<3}]  "
                f"value = {info.value:<10,}  "
                f"CUDA-C: {get_line_from_file(info.line, sources[path]).strip():<48}"
                f"SASS: {kernel.sass_by_pc(info.address).strip()}"
            )
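
Several SASS instructions can map to the same source line, so the listing above may show one line more than once. To get a per-line view, you could sum the instance values for each line. The following is a minimal sketch: `sum_by_line` is a hypothetical helper, and the sample entries are made up so the example runs without a report file.

```python
from collections import defaultdict, namedtuple

# Same shape as the CorrelationInfo tuple used earlier in this notebook.
CorrelationInfo = namedtuple("CorrelationInfo", ["value", "line", "address"])


def sum_by_line(correlation_infos):
    """Sum the instance values of all entries that share a source line."""
    totals = defaultdict(int)
    for info in correlation_infos:
        totals[info.line] += info.value
    return dict(totals)


# Made-up sample data: two instructions on line 42, one on line 50.
sample_infos = [
    CorrelationInfo(10, 42, 0x100),
    CorrelationInfo(5, 42, 0x110),
    CorrelationInfo(7, 50, 0x120),
]

line_totals = sum_by_line(sample_infos)
for line, total in sorted(line_totals.items(), key=lambda item: item[1], reverse=True):
    print(f"  [line {line:<3}]  total value = {total:,}")
```

With real data, you could call `sum_by_line(high_level_correlations[path])` for each `path`. Whether a sum is meaningful depends on the metric; for ratio-like metrics another aggregation may be more appropriate.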