perf mem/c2c: Add load store event mappings for AMD
The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record'
with mem load/store events. IBS tagged load/store sample provides most
of the information needed for these tools. Wire in the "ibs_op//" event
as mem-ldst event for AMD.

There are some limitations, though. Only load/store micro-ops provide
mem/c2c information, but IBS has no way to choose a particular type of
micro-op to tag. As a result, many non-LS micro-ops get tagged and
appear as N/A in the perf report. Also, IBS is an uncore pmu from the
kernel's point of view[1] and does not support per-process monitoring,
so perf mem/c2c on AMD is currently supported in per-cpu mode only.

Example:

  $ sudo perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit           10  HitM
  Remote Cache (2 hops) hit            6  N/A
  Uncached hit                         4  N/A
  $
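
To put a number on how many tagged micro-ops end up as N/A, the sample
counts in the report above can be summed and compared. The following is
only a small illustrative sketch: the helper names (total_samples,
sample_share) are hypothetical and not part of perf; only the counts are
taken from the report.

```c
#include <stddef.h>

/* Sample counts copied row by row from the 'perf mem report' output above. */
static const unsigned long sample_counts[] = {
	700620,	/* N/A (non-load/store micro-ops) */
	126675,	/* L1 hit */
	424,	/* L2 hit */
	664,	/* L3 hit, HitM */
	10,	/* L3 hit */
	2,	/* Local RAM hit */
	8558,	/* Remote RAM (1 hop) hit */
	3,	/* Remote Cache (1 hop) hit */
	2,	/* Remote Cache (1 hop) hit, HitM */
	10,	/* Remote Cache (2 hops) hit, HitM */
	6,	/* Remote Cache (2 hops) hit */
	4,	/* Uncached hit */
};

#define NUM_ROWS (sizeof(sample_counts) / sizeof(sample_counts[0]))

/* Hypothetical helper: sum of all rows. */
static unsigned long total_samples(const unsigned long *counts, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += counts[i];
	return total;
}

/* Hypothetical helper: share of row 'idx' in the total, in the range [0, 1]. */
static double sample_share(const unsigned long *counts, size_t n, size_t idx)
{
	unsigned long total = total_samples(counts, n);

	if (idx >= n || total == 0)
		return 0.0;
	return (double)counts[idx] / (double)total;
}
```

With these counts the rows add up to the 836978 samples reported by perf
record, and the N/A row (index 0) accounts for roughly 84% of them.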

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-6-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria authored and acmel committed Oct 6, 2022
1 parent 4173cc0 commit f7b58cb
Showing 3 changed files with 41 additions and 7 deletions.
14 changes: 10 additions & 4 deletions tools/perf/Documentation/perf-c2c.txt
@@ -19,9 +19,10 @@ C2C stands for Cache To Cache.
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.

-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).

 These events provide:
 - memory address of the access
@@ -49,7 +50,8 @@ RECORD OPTIONS

 -l::
 --ldlat::
-Configure mem-loads latency. (x86 only)
+Configure mem-loads latency. Supported on Intel and Arm64 processors
+only. Ignored on other archs.

 -k::
 --all-kernel::
@@ -135,11 +137,15 @@ Following perf record options are configured by default:
 -W,-d,--phys-data,--sample-cpu

 Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:

 cpu/mem-loads,ldlat=30/P
 cpu/mem-stores/P

+following on AMD:
+
+ibs_op//
+
 and following on PowerPC:

 cpu/mem-loads/
3 changes: 2 additions & 1 deletion tools/perf/Documentation/perf-mem.txt
@@ -85,7 +85,8 @@ RECORD OPTIONS
 Be more verbose (show counter open errors, etc)

 --ldlat <n>::
-Specify desired latency for loads event. (x86 only)
+Specify desired latency for loads event. Supported on Intel and Arm64
+processors only. Ignored on other archs.

 In addition, for report all perf report options are valid, and for record
 all perf record options.
31 changes: 29 additions & 2 deletions tools/perf/arch/x86/util/mem-events.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "util/pmu.h"
+#include "util/env.h"
 #include "map_symbol.h"
 #include "mem-events.h"
+#include "linux/string.h"

 static char mem_loads_name[100];
 static bool mem_loads_name__init;
@@ -12,18 +14,43 @@ static char mem_stores_name[100];

 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads"),
 	E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores"),
 	E(NULL, NULL, NULL),
 };

+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+	E(NULL, NULL, NULL),
+	E(NULL, NULL, NULL),
+	E("mem-ldst", "ibs_op//", "ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+	struct perf_env env = { .total_mem = 0, };
+
+	perf_env__cpuid(&env);
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		return 1;
+	return -1;
+}
+
 struct perf_mem_event *perf_mem_events__ptr(int i)
 {
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
 	if (i >= PERF_MEM_EVENTS__MAX)
 		return NULL;

-	return &perf_mem_events[i];
+	if (!is_amd)
+		is_amd = perf_mem_is_amd_cpu();
+
+	if (is_amd == 1)
+		return &perf_mem_events_amd[i];
+
+	return &perf_mem_events_intel[i];
 }

 bool is_mem_loads_aux_event(struct evsel *leader)
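
The table selection in perf_mem_events__ptr() relies on a tri-state
static (0: not probed yet, 1: AMD, -1: not AMD), so the vendor is probed
at most once. The standalone sketch below mirrors that pattern outside of
perf. It is an illustration under assumptions, not the perf code: the
cpuid string is a hypothetical stand-in for what perf_env__cpuid() fills
in, and every name here (EVENTS_MAX, mem_event, fake_cpuid, detect_amd,
events_ptr, detect_calls) is invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PERF_MEM_EVENTS__MAX (load, store, load-store). */
enum { EVENTS_MAX = 3 };

/* Reduced, hypothetical version of struct perf_mem_event. */
struct mem_event {
	const char *tag;
	const char *name;
};

static struct mem_event events_intel[EVENTS_MAX] = {
	{ "ldlat-loads",  "%s/mem-loads,ldlat=%u/P" },
	{ "ldlat-stores", "%s/mem-stores/P" },
	{ NULL, NULL },
};

static struct mem_event events_amd[EVENTS_MAX] = {
	{ NULL, NULL },
	{ NULL, NULL },
	{ "mem-ldst", "ibs_op//" },
};

/* Assumed cpuid string; a real tool would query the CPU instead. */
static const char *fake_cpuid = "AuthenticAMD,25,1,1";

/* Counts how often the (potentially expensive) probe actually runs. */
static int detect_calls;

/* Returns 1 for AMD and -1 otherwise; never 0, so 0 can mean "not probed". */
static int detect_amd(void)
{
	static const char vendor[] = "AuthenticAMD";

	detect_calls++;
	if (fake_cpuid && strncmp(fake_cpuid, vendor, strlen(vendor)) == 0)
		return 1;
	return -1;
}

/* Picks the per-vendor table entry; the vendor probe runs on first use only. */
static struct mem_event *events_ptr(int i)
{
	/* 0: Uninitialized, 1: Yes, -1: No */
	static int is_amd;

	if (i < 0 || i >= EVENTS_MAX)
		return NULL;

	if (!is_amd)
		is_amd = detect_amd();

	return is_amd == 1 ? &events_amd[i] : &events_intel[i];
}
```

Because the probe returns -1 rather than 0 for "not AMD", a negative
result is cached as well, and repeated calls to events_ptr() do not probe
the vendor again. Unlike the function in the diff, the sketch also rejects
negative indices.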