# Opcode-Instanced Metrics

In this notebook you will learn how to:

* Find opcode-instanced metrics in an NVIDIA Nsight Compute report
* Traverse the individual instances of such a metric along with their SASS opcodes

## Setup

First, import NVIDIA Nsight Compute's Python Report Interface (PRI) as `ncu_report`
and load an `ncu-rep` report file with `load_report`:

In [None]:
import ncu_report

report_file_path = "../sample_reports/mergeSort.ncu-rep"
report = ncu_report.load_report(report_file_path)

For later use, unpack the profiling results of the first kernel and create a list of all metrics it contains:

In [None]:
kernel = report[0][0]
metrics = [kernel[name] for name in kernel]

## Identifying Opcode-Instanced Metrics

Opcode-instanced metrics are metrics that contain multiple values (so-called _instances_),
each of which is associated with a SASS operation code, or _opcode_.
They may, for example, help you understand the fraction of a particular type of SASS
instruction relative to the whole instruction mixture executed in your code.

They always carry the substring "opcode" in their metric name and are thus easy to find:

In [None]:
for metric in metrics:
    if "opcode" in str(metric):
        print(
            f"{metric} contains instance values for "
            f"{metric.num_instances()} opcodes"
        )

You can also find a list of all opcode-instanced metrics together with a description in the
[Metric Reference](https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#metrics-reference).
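
If you only need the metric names, the substring check can also be applied to the names directly. The following is a minimal sketch that works on plain strings, so it runs without a report file. The helper `opcode_metric_names` and the list `example_names` are hypothetical; with a real report you would pass the names obtained by iterating over `kernel` instead.

```python
from typing import Iterable, List


def opcode_metric_names(names: Iterable[str]) -> List[str]:
    """Return, in sorted order, all names that contain the substring 'opcode'."""
    return sorted(name for name in names if "opcode" in name)


# Illustrative stand-in for the metric names of a kernel.
# Only the first name is taken from this notebook; the others are made up.
example_names = [
    "sass__thread_inst_executed_true_per_opcode_with_modifier_all",
    "example__metric_without_instances",
    "example__inst_executed_per_opcode",
]

print(opcode_metric_names(example_names))
```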

## Traversing Opcode-Instanced Metrics along with their Opcodes

To traverse an opcode-instanced metric, call the member function `value(index)` of its
`IMetric` object once for each instance, where `index` runs from `0` to
`num_instances()-1`.

To obtain the opcode that corresponds to each instance, first call `correlation_ids()`
on the opcode-instanced metric. This returns a new `IMetric` object whose instance
values are of type `str`, each representing the opcode of the original metric's instance
with the same index.
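
The index correspondence described above can be sketched as a small generator that yields `(opcode, value)` pairs. To keep the example runnable without a report file, it uses `FakeMetric`, a hypothetical stand-in that mimics only the three calls named in the text (`num_instances()`, `value(index)` and `correlation_ids()`); it is not part of `ncu_report`, and the opcodes and counts are made up.

```python
from typing import Any, Iterator, List, Optional, Tuple


def iter_opcode_instances(metric: Any) -> Iterator[Tuple[str, Any]]:
    """Yield (opcode, value) pairs, matching both metrics by instance index."""
    opcodes = metric.correlation_ids()
    for index in range(metric.num_instances()):
        yield opcodes.value(index), metric.value(index)


class FakeMetric:
    """Hypothetical stand-in that mimics only the calls used above."""

    def __init__(self, values: List[Any], ids: Optional["FakeMetric"] = None) -> None:
        self._values = values
        self._ids = ids

    def num_instances(self) -> int:
        return len(self._values)

    def value(self, index: int) -> Any:
        return self._values[index]

    def correlation_ids(self) -> Optional["FakeMetric"]:
        return self._ids


# Made-up instance values and opcodes for illustration.
fake = FakeMetric([120, 30], ids=FakeMetric(["IADD3", "LDG.E"]))
pairs = list(iter_opcode_instances(fake))
print(pairs)  # [('IADD3', 120), ('LDG.E', 30)]
```

Because the function only relies on these three calls, it should accept a real opcode-instanced metric in the same way.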

You can use the opcodes as keys in a `dict` with the instances of the original metric
as values. Here's a helper function to construct such a `dict`:

In [None]:
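def values_per_opcode(metric: ncu_report.IMetric) -> dict[str, int]:
    """Map each opcode to the instance value of `metric` with the same index."""
    values = {}
    # The correlation IDs hold one opcode string per instance of `metric`
    opcodes = metric.correlation_ids()
    num_values = metric.num_instances()

    for index in range(num_values):
        opcode = opcodes.value(index)
        values[opcode] = metric.value(index)

    return values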

Now, you can look at all SASS instructions executed at thread-level within the `kernel`:

In [None]:
metric_name = "sass__thread_inst_executed_true_per_opcode_with_modifier_all"
metric = kernel[metric_name]
all_instructions = metric.value()

With `values_per_opcode(metric)` you can traverse all instance values along with their
respective opcodes. You can also calculate the percentage of each instruction type
with respect to all instructions:

In [None]:
print(
    f"All instructions executed at thread-level: {all_instructions:,.0f}, "
    f"number of opcodes: {metric.num_instances()}"
)

for opcode, value in values_per_opcode(metric).items():
    fraction = value / all_instructions * 100
    print(f"   \033[1m{opcode}:\033[0m {value:,} ({fraction:.3f}%)")
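
For a coarser view, the per-opcode values can be summed by base opcode and ranked. The sketch below works on a plain `dict` such as the one returned by `values_per_opcode`. It assumes that modifiers are separated from the base opcode by a `.` (as in `IMAD.MOV.U32`); check this against the opcode strings in your own report. The helper names and the example counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_base_opcode(values: Dict[str, float]) -> Dict[str, float]:
    """Sum the values of all opcodes that share the text before the first '.'."""
    grouped: Dict[str, float] = defaultdict(float)
    for opcode, value in values.items():
        base = opcode.split(".", 1)[0]  # assumption: modifiers follow a '.'
        grouped[base] += value
    return dict(grouped)


def top_opcodes(values: Dict[str, float], n: int = 3) -> List[Tuple[str, float, float]]:
    """Return the n largest entries as (opcode, value, percent of the sum)."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [
        (opcode, value, 100.0 * value / total if total else 0.0)
        for opcode, value in ranked[:n]
    ]


# Made-up counts for illustration; pass values_per_opcode(metric) for real data.
example = {"IMAD.MOV.U32": 400, "IMAD.WIDE": 100, "LDG.E": 300, "BRA": 200}
grouped = group_by_base_opcode(example)

for opcode, value, percent in top_opcodes(grouped, n=2):
    print(f"{opcode}: {value:,.0f} ({percent:.1f}%)")
```

Note that the percentages here are relative to the sum of the values passed in, not to the aggregate returned by `metric.value()`.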

Analyzing instruction compositions like this and comparing them against your expectations might help you discover performance
bugs in your kernels.
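
One way to make such a comparison repeatable is to write the expectations down as upper bounds on the share of selected opcodes and check them automatically. This is a minimal sketch under that assumption; `unexpected_opcodes`, the thresholds and the counts are all hypothetical and would need to be adapted to your kernel.

```python
from typing import Dict, List


def unexpected_opcodes(counts: Dict[str, float], max_share: Dict[str, float]) -> List[str]:
    """Return the opcodes whose share of all counted instructions exceeds its limit."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        opcode
        for opcode, limit in max_share.items()
        if counts.get(opcode, 0) / total > limit
    ]


# Made-up counts for a kernel that is meant to compute in single precision.
counts = {"FADD": 600, "F2F.F64.F32": 250, "DADD": 150}
# Hypothetical expectations: at most 5% double-precision adds, 30% conversions.
expectations = {"DADD": 0.05, "F2F.F64.F32": 0.30}

print(unexpected_opcodes(counts, expectations))  # ['DADD']
```

In this made-up case, the double-precision adds make up 15% of the instructions, which exceeds the 5% limit and would be worth investigating.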

As a next step, you can look at `Source_correlated_metrics.ipynb` to learn how to obtain the disassembled SASS
instructions (which include the opcodes), or read about the
[`IAction.sass_by_pc` API](https://docs.nvidia.com/nsight-compute/2022.3/NvRulesAPI/annotated.html#classNV_1_1Rules_1_1IAction_1a1fc608333aefe67f0559ab03094acb4).