IcelakeSP

Thomas Gruber edited this page Jun 25, 2021 · 6 revisions

Architecture specific notes for Intel® Icelake SP

With the IcelakeSP architecture, Intel introduces a generic lookup and configuration mechanism for the Uncore units, called the "PMON Discovery mechanism" in the Uncore monitoring reference guide. The guide mentions it a few times but does not describe it in detail. The main idea seems to be to expose the configuration of the performance monitoring units and their counters, called PMON blocks, at specific memory addresses in a machine-readable form. Part of each unit-specific configuration is the "Global Status bit", which is required to map units to bit offsets in the global overflow status register. Since the mechanism is not documented and the bit offsets are unknown, LIKWID does not use the global overflow status register on IcelakeSP, only the unit-local overflow registers. To detect overflows, LIKWID therefore reads the unit-local overflow register of every unit that is part of the event set.

For architectures before IcelakeSP, the unit-to-bit-offset mapping is fixed and therefore documented in the respective Uncore monitoring reference guides. There, LIKWID reads the global status register once to find out which units overflowed and then reads only those units' local overflow status registers to identify the overflowed counters. This commonly requires fewer read operations.
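The difference between the two overflow detection strategies can be illustrated with a small simulation. This is only a sketch, not LIKWID code: the `Unit` class and both functions are hypothetical, registers are modelled as plain integers, and the global bit offsets are made up. It merely counts how many register reads each strategy needs.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical model of one Uncore PMON unit."""
    name: str
    global_bit: int        # assumed bit offset in the global status register
    local_status: int = 0  # unit-local overflow register, one bit per counter


def detect_local_only(units):
    """IcelakeSP strategy: read every unit-local overflow register."""
    reads = 0
    overflowed = {}
    for unit in units:
        reads += 1
        if unit.local_status:
            overflowed[unit.name] = unit.local_status
    return overflowed, reads


def detect_with_global(units):
    """Older strategy: one global read, then only the flagged units."""
    # Model the hardware: one global bit is set per overflowed unit.
    global_status = 0
    for unit in units:
        if unit.local_status:
            global_status |= 1 << unit.global_bit
    reads = 1  # the single read of the global status register
    overflowed = {}
    for unit in units:
        if global_status & (1 << unit.global_bit):
            reads += 1
            overflowed[unit.name] = unit.local_status
    return overflowed, reads


units = [Unit(f"CBOX{i}", global_bit=i) for i in range(8)]
units[3].local_status = 0b0100  # counter 2 of CBOX3 overflowed

print(detect_local_only(units))   # ({'CBOX3': 4}, 8)
print(detect_with_global(units))  # ({'CBOX3': 4}, 2)
```

With eight units in the event set and a single overflow, the local-only strategy needs eight reads while the global-first strategy needs two.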

Performance groups

Intel® Icelake SP Performance groups

Events

The input file for the events on Intel® Icelake SP can be found here.

Counters

Core-local counters

Fixed-purpose counters

Since the Core2 microarchitecture, Intel® provides a set of fixed-purpose counters. Each can measure only one specific event.

Counters
Counter name Event name
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
Available Options
| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| anythread | N | Set bit 2+(index*4) in config register | |
| kernel | N | Set bit (index*4) in config register | |
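The bit positions in the table depend on the counter index, because all fixed-purpose counters share one config register with four bits per counter. A minimal sketch of that arithmetic follows; the function name is hypothetical and only the two option bits from the table are modelled, not any other bits of the register.

```python
def fixed_counter_option_bits(index, kernel=False, anythread=False):
    """Return the option bits for fixed counter FIXC<index>."""
    if not 0 <= index <= 3:
        raise ValueError("this page lists FIXC0 to FIXC3 only")
    value = 0
    if kernel:
        value |= 1 << (index * 4)      # kernel: bit (index*4)
    if anythread:
        value |= 1 << (2 + index * 4)  # anythread: bit 2+(index*4)
    return value


print(hex(fixed_counter_option_bits(1, kernel=True)))                  # 0x10
print(hex(fixed_counter_option_bits(2, kernel=True, anythread=True)))  # 0x500
```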

Performance metric counters

With the Intel® Icelake microarchitecture a new class of core-local counters was introduced, the so-called perf-metrics. They reflect the first level of the Top-down Microarchitecture Analysis tree.

Counters
Counter name Event name
TMA0 RETIRING
TMA1 BAD_SPECULATION
TMA2 FRONTEND_BOUND
TMA3 BACKEND_BOUND

The events return the fraction of slots used by the event.
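Since the four events are fractions of the total slots, they can be combined with the TOPDOWN_SLOTS count of FIXC3 to obtain absolute slot numbers. The sketch below assumes the fractions are given in the range 0 to 1; the function name and the sample values are purely illustrative.

```python
def slots_per_category(topdown_slots, fractions):
    """Convert first-level TMA fractions into absolute slot counts."""
    return {name: frac * topdown_slots for name, frac in fractions.items()}


# Illustrative values, not measured data.
sample = {
    "RETIRING": 0.5,
    "BAD_SPECULATION": 0.125,
    "FRONTEND_BOUND": 0.125,
    "BACKEND_BOUND": 0.25,
}
slots = slots_per_category(1_000_000, sample)
print(slots["BACKEND_BOUND"])  # 250000.0
```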

General-purpose counters

The Intel® IcelakeSP microarchitecture provides 4 general-purpose counters per hardware thread, each consisting of a config and a counter register. Without HyperThreading, 8 counters are available.

Counters
Counter name Event name
PMC0 *
PMC1 *
PMC2 *
PMC3 *
PMC4 * (only available without HyperThreading)
PMC5 * (only available without HyperThreading)
PMC6 * (only available without HyperThreading)
PMC7 * (only available without HyperThreading)
Available Options
| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| kernel | N | Set bit 17 in config register | |
| anythread | N | Set bit 21 in config register | The anythread option is deprecated! Please check the documentation on how to use it on Icelake |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
| in_transaction | N | Set bit 32 in config register | Only available if Intel® Transactional Synchronization Extensions are available |
| in_transaction_aborted | N | Set bit 33 in config register | Only counter PMC2 and only if Intel® Transactional Synchronization Extensions are available |
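To make the option table concrete, the following sketch assembles the option bits of a general-purpose config register value. It uses only the bit positions listed above; the function and dictionary names are hypothetical, and event select, umask and enable bits are deliberately left out.

```python
PMC_OPTION_BITS = {
    "kernel": 17,
    "edgedetect": 18,
    "anythread": 21,
    "invert": 23,
    "in_transaction": 32,
    "in_transaction_aborted": 33,
}


def pmc_option_bits(threshold=None, **flags):
    """Combine PMC options into the bits they set in the config register."""
    value = 0
    for name, enabled in flags.items():
        if name not in PMC_OPTION_BITS:
            raise KeyError(f"unknown option: {name}")
        if enabled:
            value |= 1 << PMC_OPTION_BITS[name]
    if threshold is not None:
        if not 0 <= threshold <= 0xFF:
            raise ValueError("threshold is an 8 bit value")
        value |= threshold << 24  # threshold occupies bits 24-31
    return value


print(hex(pmc_option_bits(edgedetect=True, threshold=0x2)))  # 0x2040000
print(hex(pmc_option_bits(in_transaction=True)))             # 0x100000000
```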

Thermal counter

The Intel® IcelakeSP microarchitecture provides one register for the current core temperature.

Counters
Counter name Event name
TMP0 TEMP_CORE

Core voltage counter

The Intel® IcelakeSP microarchitecture provides one register for the current core voltage.

Counters
Counter name Event name
VTG0 VOLTAGE_CORE

Socket-wide counters

Energy counters

The Intel® IcelakeSP microarchitecture provides measurements of the current energy consumption through the RAPL interface.

Counters
Counter name Event name
PWR0 PWR_PKG_ENERGY
PWR1 PWR_PP0_ENERGY
PWR2 PWR_PP1_ENERGY (*)
PWR3 PWR_DRAM_ENERGY
PWR4 PWR_PLATFORM_ENERGY (+)

(*) Commonly not supported (+) Often returns zeros
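Energy counters are cumulative, so a measurement is the difference between two readings. The sketch below shows that arithmetic under two stated assumptions: the raw counter is 32 bits wide and wraps around, and one count equals 2^-ESU joules, where ESU is the energy unit exponent reported by the RAPL interface. The function names and sample values are hypothetical.

```python
def energy_joules(raw_start, raw_stop, energy_unit_exp, counter_bits=32):
    """Energy between two raw readings, tolerating one counter wrap-around."""
    delta = (raw_stop - raw_start) % (1 << counter_bits)
    return delta * 2.0 ** -energy_unit_exp


def average_power_watts(raw_start, raw_stop, energy_unit_exp, seconds):
    """Average power over the measurement interval."""
    return energy_joules(raw_start, raw_stop, energy_unit_exp) / seconds


# The counter wrapped between the two readings: the delta is 0x20 counts.
print(energy_joules(0xFFFFFFF0, 0x10, energy_unit_exp=14))  # 0.001953125
```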

Uncore management fixed-purpose counter

The Intel® Icelake X microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for Intel® Xeon® Processor Scalable Memory Family
In this capacity, the UBox acts as the central unit for a variety of functions:

  • The master for reading and writing physically distributed registers across using the Message Channel.
  • The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
  • The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® UPI bus lock).

The single fixed-purpose counter counts the clock ticks of the uncore clock source. The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.

Counter
Counter name Event name
UBOXFIX UNCORE_CLOCK

Uncore management general-purpose counters

The Intel® Icelake X microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for Intel® Xeon® Processor Scalable Memory Family
In this capacity, the UBox acts as the central unit for a variety of functions:

  • The master for reading and writing physically distributed registers across using the Message Channel.
  • The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
  • The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® UPI bus lock).

The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.

Counter
Counter name Event name
UBOX0 *
UBOX1 *
Available Options
| Option | Argument | Operation | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |

Last Level cache counters

The Intel® Icelake X microarchitecture provides measurements of the LLC coherency engine in the uncore. The description from Intel®:
The LLC coherence engine and Home agent (CHA) merges the caching agent and home agent (HA) responsibilities of the chip into a single block. In its capacity as a caching agent the CHA manages the interface between the core the IIO devices and the last level cache (LLC). In its capacity as a home agent the CHA manages the interface between the LLC and the rest of the UPI coherent fabric as well as the on die memory controller.
The LLC hardware performance counters are exposed to the operating system through the MSR interface. The maximum number of supported coherency engines for the Intel® Icelake X microarchitecture is 40. Your system may not have all CBOXes; LIKWID skips the unavailable ones in the setup phase. The name CBOX originates from the Nehalem EX uncore monitoring.

Counters
Counter name Event name
CBOX<0-39>C0 *
CBOX<0-39>C1 *
CBOX<0-39>C2 *
CBOX<0-39>C3 *
Available Options
| Option | Argument | Operation | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| match0 | 28 bit hex value | Set bits 32-57 in config register named UmaskExt | Only for events LLC_LOOKUP, TOR_INSERTS and TOR_OCCUPANCY. Check uncore documentation for explanations |
| state | 8 bit hex value | Set bits 8-15 in config register, similar to umask | Only for event LLC_LOOKUP. LLC F: 0x80, LLC M: 0x40, LLC E: 0x20, LLC S: 0x10, SF H: 0x08, SF E: 0x04, SF S: 0x02, LLC I: 0x01 |

Special handling for events

The event LLC_LOOKUP uses the umask field for the state specification. Further filters are in a field called UmaskExt. These filters can be addressed with the MATCH0 option, so LLC_LOOKUP_I:CBOX0C0:MATCH0=0x8 counts cache lines in I (=invalid) state filtered by RFOs (MATCH0=0x8). Most LLC_LOOKUP events use umask/state 0xFF for all states. To select multiple states, use the STATE option, e.g. STATE=0xE for all SF states.
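The STATE and MATCH0 values are plain bit masks, so they can be composed programmatically. The sketch below uses the state values from the options table and the event string from the paragraph above; the helper names and the dictionary keys are hypothetical and not part of LIKWID.

```python
# State bits for LLC_LOOKUP as listed in the CBOX options table.
LLC_LOOKUP_STATES = {
    "LLC_F": 0x80, "LLC_M": 0x40, "LLC_E": 0x20, "LLC_S": 0x10,
    "SF_H": 0x08, "SF_E": 0x04, "SF_S": 0x02, "LLC_I": 0x01,
}


def state_mask(*states):
    """OR together the selected cache line states."""
    mask = 0
    for state in states:
        mask |= LLC_LOOKUP_STATES[state]
    return mask


def event_string(event, counter, **options):
    """Build an EVENT:COUNTER:OPTION=value string with hex option values."""
    parts = [event, counter]
    parts += [f"{name.upper()}={value:#x}" for name, value in options.items()]
    return ":".join(parts)


print(hex(state_mask("SF_H", "SF_E", "SF_S")))                  # 0xe
print(event_string("LLC_LOOKUP_I", "CBOX0C0", match0=1 << 3))   # LLC_LOOKUP_I:CBOX0C0:MATCH0=0x8
```

Here `1 << 3` yields the MATCH0=0x8 value that the text associates with RFOs.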

Bits for the MATCH0 option for the LLC_LOOKUP event:

Bit offset Description
0 Data Reads - local or remote. Includes prefetches
1 All write transactions to the LLC - including writebacks to LLC and uncacheable write transactions. Does not include evict cleans or invalidates
2 Flush or Invalidates
3 RFOs - local or remote. Includes prefetches
4 Code Reads - local or remote. Includes prefetches
5 Any local or remote transaction to the LLC. Includes prefetches
6 Any local transaction to LLC, including prefetches from Core
7 Any local prefetch to LLC from an LLC
8 Any local prefetch to LLC from Core
9 Snoop transactions to the LLC from a remote agent
10 Non-snoop transactions to the LLC from a remote agent
11 Transactions to locally homed addresses
12 Transactions to remotely homed addresses

The events TOR_INSERTS and TOR_OCCUPANCY also use the UmaskExt field but with different width and meaning:

Bit offset Description
0 Just entries that Hit the LLC - Bit offset 0 is XORed with bit offset 1. No filtering applied if both bits are either 0 or 1
1 Just entries that Missed the LLC - Bit offset 1 is XORed with bit offset 0. No filtering applied if both bits are either 0 or 1
2 Filter on requests to memory mapped to DDR
3 Filter on requests to memory mapped to PMM
4 Filter on requests to memory mapped to HBM
5 Filter on requests to memory mapped to MMCFG space
6 Filter on requests to memory mapped to MMIO space
7 Match on Remote Node Target - Bit offset 7 is XORed with bit offset 8. No filtering applied if both bits are either 0 or 1
8 Match on Local Node Target - Bit offset 8 is XORed with bit offset 7. No filtering applied if both bits are either 0 or 1
9 Filter by Opcodes
10 Filter by PreMorphed Opcodes
11-21 Match on Opcode - 11b IDI Opcode w/top 2b 0x3 - Check IcelakeSP Uncore documentation
22 Just Match on Near Memory Cacheable Accesses - Bit offset 22 is XORed with bit offset 23. No filtering applied if both bits are either 0 or 1
23 Just Match on Non Near Memory Cacheable Accesses - Bit offset 23 is XORed with bit offset 22. No filtering applied if both bits are either 0 or 1
24 Match on Non-Coherent Requests
25 Match on ISOC Requests

The events WRITE_NO_CREDITS and READ_NO_CREDITS use the UmaskExt field as a real extension of the default umask field. Each bit corresponds to a memory controller (0-13). The first eight are covered by the bits in umask. The other six bits are in the UmaskExt field, addressable with the MATCH0 option.

The event LLC_VICTIMS uses the MATCH0 option to differentiate between 'local only' and 'remote only' victims. If nothing is set, 'all' are counted. There are only two settings: MATCH0=0x20 for 'local only' and MATCH0=0x80 for 'remote only'.

Power control unit fixed-purpose counters

The Intel® Icelake X microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the Ice Lake die, responsible for distributing power to core/uncore components and thermal management. It runs in firmware on an internal micro-controller and coordinates the socket’s power states.
Note: Power management is not completely centralized. Many units employ their own power saving features. Events that provide information about those features are captured in the PMON blocks of those units. For example, Intel® UPI Link Power saving states and Memory CKE statistics are captured in the Intel® UPI Perfmon and IMC Perfmon respectively.

The PCU offers four fixed-purpose counters to retrieve the cycles CPU cores stay in the states C3, C6, P3 and P6. The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.

Counters
Counter name Event name
WBOX0FIX CORES_IN_C3
WBOX1FIX CORES_IN_C6
WBOX2FIX CORES_IN_P3
WBOX3FIX CORES_IN_P6

Power control unit general-purpose counters

The Intel® Icelake X microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the Ice Lake die, responsible for distributing power to core/uncore components and thermal management. It runs in firmware on an internal micro-controller and coordinates the socket’s power states.
Note: Power management is not completely centralized. Many units employ their own power saving features. Events that provide information about those features are captured in the PMON blocks of those units. For example, Intel® UPI Link Power saving states and Memory CKE statistics are captured in the Intel® UPI Perfmon and IMC Perfmon respectively.
The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.

Counters
Counter name Event name
WBOX0 *
WBOX1 *
WBOX2 *
WBOX3 *
Available Options
| Option | Argument | Operation | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |
| occ_edgedetect | N | Set bit 31 in config register | |
| occ_invert | N | Set bit 30 in config register | |

Memory controller fixed-purpose counters

The Intel® Icelake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Ice Lake integrated Memory Controller provides the interface to DRAM and communicates to the rest of the Uncore through the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, memory access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.
The integrated Memory Controllers performance counters are exposed to the operating system through MMIO interfaces. Each memory controller provides a set of fixed counters.

Counters
Counter name Event name
MBOX<0-7>FIX MBOX_CLOCKTICKS

Memory controller general-purpose counters

The Intel® Icelake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Ice Lake integrated Memory Controller provides the interface to DRAM and communicates to the rest of the Uncore through the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, memory access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.

The integrated Memory Controllers performance counters are exposed to the operating system through MMIO interfaces. Icelake supports up to 8 channels of DDR4 with 2 channels per memory controller.

Counters
Counter name Event name
MBOX<0-7>C0 *
MBOX<0-7>C1 *
MBOX<0-7>C2 *
MBOX<0-7>C3 *
Available Options
| Option | Argument | Operation | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |

Memory controller free-running counters

The Intel® Icelake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Ice Lake integrated Memory Controller provides the interface to DRAM and communicates to the rest of the Uncore through the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, memory access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.

Besides the general-purpose counters for each memory channel, Intel Icelake X provides free-running counters per memory controller.

Counters
Counter name Event name
MDEV<0-3>C0 DDR_READ_BYTES
MDEV<0-3>C1 DDR_WRITE_BYTES
MDEV<0-3>C2 PMM_READ_BYTES
MDEV<0-3>C3 PMM_WRITE_BYTES
MDEV<0-3>C4 IMC_DEV_CLOCKTICKS
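Because each memory controller serves two memory channels, every free-running MDEV device corresponds to a pair of MBOX units. A minimal sketch of that index arithmetic, with hypothetical helper names that are not part of LIKWID:

```python
CHANNELS_PER_CONTROLLER = 2
NUM_MBOX = 8
NUM_MDEV = NUM_MBOX // CHANNELS_PER_CONTROLLER


def mdev_for_mbox(mbox_index):
    """Index of the MDEV device that covers the given MBOX channel."""
    if not 0 <= mbox_index < NUM_MBOX:
        raise ValueError("MBOX index must be in 0-7")
    return mbox_index // CHANNELS_PER_CONTROLLER


def mboxes_for_mdev(mdev_index):
    """Indices of the MBOX channels covered by the given MDEV device."""
    if not 0 <= mdev_index < NUM_MDEV:
        raise ValueError("MDEV index must be in 0-3")
    first = mdev_index * CHANNELS_PER_CONTROLLER
    return list(range(first, first + CHANNELS_PER_CONTROLLER))


print(mdev_for_mbox(5))    # 2
print(mboxes_for_mdev(3))  # [6, 7]
```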

Mapping between MBOX<0-7>C<0-3> and MDEV<0-3>C<0-4>:

MDEV MBOX
MDEV0 MBOX<0-1>
MDEV1 MBOX<2-3>
MDEV2 MBOX<4-5>
MDEV3 MBOX<6-7>

UPI Link Layer counters

Counters
Counter name Event name
QBOX<0-2>C0 *
QBOX<0-2>C1 *
QBOX<0-2>C2 *
QBOX<0-2>C3 *
Available Options
| Option | Argument | Operation | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |

M3UPI counters

Counters
Counter name Event name
SBOX<0-2>C0 *
SBOX<0-2>C1 *
SBOX<0-2>C2 *
SBOX<0-2>C3 *
Available Options
| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |

IIO general-purpose counters

Counters
Counter name Event name
TCBOX<0-5>C0 *
TCBOX<0-5>C1 *
TCBOX<0-5>C2 *
TCBOX<0-5>C3 *
Available Options
| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 12 bit hex value | Set bits 24-35 in config register | |
| mask0 | 8 bit hex mask | Channel mask filter, sets bits 36-43 in config register | Check Intel® Xeon® Processor Scalable Family Uncore Reference Manual for bit fields. |
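The IIO config register uses wider fields than the other units, and the threshold field crosses the 32 bit boundary. The following sketch packs the four options from the table into one value, with range checks derived from the stated field widths. The function name is hypothetical, and event select and umask bits are not modelled.

```python
def iio_option_bits(edgedetect=False, invert=False, threshold=0, mask0=0):
    """Combine IIO options into the bits they set in the config register."""
    if not 0 <= threshold <= 0xFFF:
        raise ValueError("threshold is a 12 bit value")
    if not 0 <= mask0 <= 0xFF:
        raise ValueError("mask0 is an 8 bit mask")
    value = 0
    if edgedetect:
        value |= 1 << 18
    if invert:
        value |= 1 << 23
    value |= threshold << 24  # bits 24-35
    value |= mask0 << 36      # bits 36-43, channel mask filter
    return value


print(hex(iio_option_bits(threshold=0x1, mask0=0xFF)))  # 0xff001000000
```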

IIO fixed-purpose counters

Counters
Counter name Event name
IBOX<0-5>PORT0 IIO_BANDWIDTH_IN_PORT0
IBOX<0-5>PORT1 IIO_BANDWIDTH_IN_PORT1
IBOX<0-5>PORT2 IIO_BANDWIDTH_IN_PORT2
IBOX<0-5>PORT3 IIO_BANDWIDTH_IN_PORT3
IBOX<0-5>PORT4 IIO_BANDWIDTH_IN_PORT4
IBOX<0-5>PORT5 IIO_BANDWIDTH_IN_PORT5
IBOX<0-5>PORT6 IIO_BANDWIDTH_IN_PORT6
IBOX<0-5>PORT7 IIO_BANDWIDTH_IN_PORT7

IRP general-purpose counters

Counters
Counter name Event name
IBOX<0-5>C0 *
IBOX<0-5>C1 *
Available Options
| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |

Mesh-2-Memory general-purpose counters

Counters
Counter name Event name
M2M<0-3>C0 *
M2M<0-3>C1 *
M2M<0-3>C2 *
M2M<0-3>C3 *
Available Options
| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |

PCIe general-purpose counters

Counters
Counter name Event name
PBOX<0-5>C0 *
PBOX<0-5>C1 *
PBOX<0-5>C2 *
PBOX<0-5>C3 *
Available Options
| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| invert | N | Set bit 23 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |