Thomas Roehl edited this page Nov 7, 2023 · 1 revision

Architecture specific notes for AMD K19 (Zen4)

Official documentation

Performance groups

AMD Zen4 Performance groups

Events

The input file for the events on AMD Zen4 can be found here.

Counters

Core-local counters

Fixed-purpose counters

The AMD® Zen4 microarchitecture provides three fixed-purpose counters: retired instructions, actual CPU core clock (APERF: increments in proportion to the actual number of core clock cycles while the core is in C0), and maximum CPU core clock (MPERF: incremented by hardware at the P0 frequency while the core is in C0).

Counters
Counter name Event name
FIXC0 INST_RETIRED_ANY (removed due to bad counts)
FIXC1 ACTUAL_CPU_CLOCK or APERF
FIXC2 MAX_CPU_CLOCK or MPERF
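
Because ACTUAL_CPU_CLOCK (APERF) advances with the real core clock and MAX_CPU_CLOCK (MPERF) advances at the P0 frequency, the ratio of their deltas over an interval gives an estimate of the average core frequency. The following is a minimal sketch of that arithmetic, not LIKWID's implementation; the function name and the `p0_frequency_hz` parameter are hypothetical, and counter wraparound is ignored.

```python
def effective_frequency_hz(aperf_start, aperf_end,
                           mperf_start, mperf_end,
                           p0_frequency_hz):
    """Estimate the average core frequency in C0 over one interval.

    aperf_*: readings of FIXC1 (ACTUAL_CPU_CLOCK / APERF)
    mperf_*: readings of FIXC2 (MAX_CPU_CLOCK / MPERF)
    p0_frequency_hz: P0 frequency of the core (assumed to be known)
    """
    delta_aperf = aperf_end - aperf_start
    delta_mperf = mperf_end - mperf_start
    if delta_mperf <= 0:
        # No MPERF progress means the core was not in C0 (or the counter wrapped).
        raise ValueError("MPERF did not advance during the interval")
    # APERF/MPERF is the ratio of actual cycles to P0-frequency cycles.
    return p0_frequency_hz * delta_aperf / delta_mperf


if __name__ == "__main__":
    # Hypothetical readings: the core ran 1.5x faster than P0 on average.
    print(effective_frequency_hz(0, 3_000_000, 0, 2_000_000, 2.0e9))
```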

General-purpose counters

The AMD® Zen4 microarchitecture provides 6 general-purpose counters, each consisting of a config and a counter register.

Counters
Counter name Event name
PMC0 *
PMC1 *
PMC2 *
PMC3 *
PMC4 *
PMC5 *
Available Options
| Option | Argument | Description | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| kernel | N | Set bit 17 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | The value for threshold can range between 0x0 and 0x7F |
| invert | N | Set bit 23 in config register | |
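
To make the bit positions in the options table concrete, the sketch below ORs the corresponding bits into a value. It is only an illustration of the table, not LIKWID's code; the function and constant names are hypothetical, and the event select and unit mask fields of the config register are left out.

```python
# Bit positions taken from the options table above.
KERNEL_BIT = 17
EDGEDETECT_BIT = 18
INVERT_BIT = 23
THRESHOLD_SHIFT = 24   # threshold occupies bits 24-31
THRESHOLD_MAX = 0x7F   # upper limit stated in the table's comment column


def option_bits(edgedetect=False, kernel=False, invert=False, threshold=0x0):
    """Return the config register bits selected by the given options."""
    if not 0x0 <= threshold <= THRESHOLD_MAX:
        raise ValueError("threshold must be between 0x0 and 0x7F")
    bits = 0
    if edgedetect:
        bits |= 1 << EDGEDETECT_BIT
    if kernel:
        bits |= 1 << KERNEL_BIT
    if invert:
        bits |= 1 << INVERT_BIT
    bits |= threshold << THRESHOLD_SHIFT
    return bits


if __name__ == "__main__":
    print(hex(option_bits(edgedetect=True, threshold=0x2)))
```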
Special handling for events

If you want to measure an event that can increment by more than 15 in one cycle, use the MERGE event on the next odd counter, for example RETIRED_SSE_AVX_FLOPS_ALL:PMC0,MERGE:PMC1.
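
The pairing rule can be sketched as a small helper that builds such an event string. The helper name is hypothetical, and it assumes (based on the wording "next odd counter") that the event itself is placed on an even-numbered counter PMC0, PMC2 or PMC4.

```python
def merged_event_set(event, counter_index):
    """Build 'EVENT:PMCn,MERGE:PMCn+1' for an even counter index n."""
    # Assumption: the measured event sits on an even PMC and MERGE on the
    # odd PMC directly after it; Zen4 has PMC0 to PMC5.
    if counter_index not in (0, 2, 4):
        raise ValueError("counter_index must be 0, 2 or 4")
    return f"{event}:PMC{counter_index},MERGE:PMC{counter_index + 1}"


if __name__ == "__main__":
    print(merged_event_set("RETIRED_SSE_AVX_FLOPS_ALL", 0))
```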

Cache-wide counters

L3 general-purpose counters

The AMD® Zen4 microarchitecture provides 6 general-purpose counters for measuring L3 cache events. They consist of a config and a counter register. The counters are related to a shared L3 cache, hence you get only one value per L3 cache.

Counters
Counter name Event name
CPMC0 *
CPMC1 *
CPMC2 *
CPMC3 *
CPMC4 *
CPMC5 *
Available Options
| Option | Argument | Description | Comment |
|---|---|---|---|
| tid | 8 bit hex value | Set bits 56 to 63 in config register | Selects whether the accesses of an attached thread should be counted. Default all threads: 0x3 |
| cid | 3 bit hex value | Set bits 42 to 45 in config register | Selects which core should be counted. If not specified, the all-cores flag (bit 47) is set |
| slice | 4 bit hex value | Set bits 48 to 51 in config register | Selects which L3 slice should be counted. If not specified, the all-slices flag (bit 46) is set |
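
The defaults described for the L3 options (all threads, all cores, all slices) can be illustrated with a short sketch. It only mirrors the bit positions and argument widths listed for the options and is not LIKWID's implementation; the function name and parameter names are hypothetical.

```python
TID_SHIFT = 56        # tid: 8 bit value starting at bit 56
CID_SHIFT = 42        # cid: 3 bit value starting at bit 42
SLICE_SHIFT = 48      # slice: 4 bit value starting at bit 48
ALL_CORES_BIT = 47    # set when no cid is given
ALL_SLICES_BIT = 46   # set when no slice is given


def l3_option_bits(tid=0x3, cid=None, slice_id=None):
    """Return the L3 config register bits for the given options."""
    if not 0x0 <= tid <= 0xFF:
        raise ValueError("tid must fit in 8 bits")
    bits = tid << TID_SHIFT
    if cid is None:
        bits |= 1 << ALL_CORES_BIT       # count accesses from all cores
    elif 0x0 <= cid <= 0x7:
        bits |= cid << CID_SHIFT
    else:
        raise ValueError("cid must fit in 3 bits")
    if slice_id is None:
        bits |= 1 << ALL_SLICES_BIT      # count accesses to all slices
    elif 0x0 <= slice_id <= 0xF:
        bits |= slice_id << SLICE_SHIFT
    else:
        raise ValueError("slice must fit in 4 bits")
    return bits


if __name__ == "__main__":
    # Defaults: tid 0x3 plus the all-cores and all-slices flags.
    print(hex(l3_option_bits()))
```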

Socket-wide counters

Energy counters

The AMD® Zen4 microarchitecture provides 2 energy counters (RAPL) for CPU core and L3 slice energy. Keep in mind that the CPU core counter returns one value per CPU core and the L3 slice counter one value per L3 slice (aka CCD or similar).

Counters
Counter name Event name
PWR0 RAPL_CORE_ENERGY
PWR1 RAPL_DRAM_ENERGY

Data fabric counters

The AMD® Zen4 microarchitecture provides 4 data fabric (DF) counters per socket.

Counters
Counter name Event name
DFC0 *
DFC1 *
DFC2 *
DFC3 *

UMC counters

There are 64 config/counter pairs publicly documented under the names Core::X86::Msr::UMC_PerfMonCtl and Core::X86::Msr::UMC_PerfMonCntr, but no events are documented for them, so support is currently dropped. If you require it and know of publicly documented events, open an issue.
