
Conversation

@Avenger-285714

upstream commits:
GNR:
6d64273,perf/x86/intel/uncore: Support more units on Granite Rapids,2025-01-10 18:16:50,Kan Liang kan.liang@linux.intel.com,v6.14-rc1
3f710be,perf/x86/intel/uncore: Clean up func_id,2025-01-10 18:16:50,Kan Liang kan.liang@linux.intel.com,v6.14-rc1

Clearwater Forest:
CWF events:
e415c14,perf vendor events: Add Clearwaterforest events,2025-02-12 19:54:38,Ian Rogers irogers@google.com,v6.15-rc1,v6.15-rc1

CWF uncore:
fca24bf,perf/x86/intel/uncore: Support customized MMIO map size,2025-07-09 13:40:19,Kan Liang kan.liang@linux.intel.com,v6.17-rc1,v6.17-rc1
cf002da,perf/x86/intel/uncore: Support MSR portal for discovery tables,2025-07-09 13:40:19,Kan Liang kan.liang@linux.intel.com,v6.17-rc1,v6.17-rc1
b6ccddd,perf/x86/intel/uncore: Add Clearwater Forest support,2024-12-17 17:47:23,Kan Liang kan.liang@linux.intel.com,v6.13-rc5,v6.13-rc5
9828a1c,perf/x86/intel/uncore: Switch to new Intel CPU model defines,2024-04-29 10:30:39,Tony Luck tony.luck@intel.com,v6.10-rc1

CWF core:
3e830f6,perf/x86: Optimize the is_x86_event,2025-04-25 14:55:22,Kan Liang kan.liang@linux.intel.com,v6.16-rc1,v6.16-rc1
efd4485,perf/x86/intel: Check the X86 leader for ACR group,2025-04-25 14:55:22,Kan Liang kan.liang@linux.intel.com,v6.16-rc1,v6.16-rc1
e9988ad,perf/x86/intel: Check the X86 leader for pebs_counter_event_group,2025-04-25 14:55:19,Kan Liang kan.liang@linux.intel.com,v6.15-rc5,v6.15-rc5
75aea4b,perf/x86/intel: Only check the group flag for X86 leader,2025-04-25 14:55:19,Kan Liang kan.liang@linux.intel.com,v6.15-rc5,v6.15-rc5
25c623f,perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUs,2025-04-17 14:21:24,Dapeng Mi dapeng1.mi@linux.intel.com,v6.16-rc1,v6.16-rc1
48d66c8,perf/x86/intel: Add PMU support for Clearwater Forest,2025-04-17 14:21:23,Dapeng Mi dapeng1.mi@linux.intel.com,v6.16-rc1,v6.16-rc1
a5f5e12,perf/x86/intel: Don't clear perf metrics overflow bit unconditionally,2025-04-17 14:19:07,Dapeng Mi dapeng1.mi@linux.intel.com,v6.15-rc3,v6.15-rc3
ec980e4,perf/x86/intel: Support auto counter reload,2025-04-08 20:55:49,Kan Liang kan.liang@linux.intel.com,v6.16-rc1,v6.16-rc1
1856c6c,perf/x86/intel: Add CPUID enumeration for the auto counter reload,2025-04-08 20:55:49,Kan Liang kan.liang@linux.intel.com,v6.16-rc1,v6.16-rc1
c9449c8,perf: Extend the bit width of the arch-specific flag,2025-04-08 20:55:49,Kan Liang kan.liang@linux.intel.com,v6.16-rc1,v6.16-rc1
0a65579,perf/x86/intel: Track the num of events needs late setup,2025-04-08 20:55:48,Kan Liang kan.liang@linux.intel.com,v6.16-rc1,v6.16-rc1
4dfe323,perf/x86: Add dynamic constraint,2025-04-08 20:55:48,Kan Liang kan.liang@linux.intel.com,v6.16-rc1,v6.16-rc1
47a973f,perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF,2025-02-08 15:47:25,Kan Liang kan.liang@linux.intel.com,v6.14-rc3
e02e9b0,perf/x86/intel: Support PEBS counters snapshotting,2025-02-05 10:29:45,Kan Liang kan.liang@linux.intel.com,v6.15-rc1,v6.15-rc1
0e45818,perf/x86/intel: Support RDPMC metrics clear mode,2024-12-20 15:31:22,Kan Liang kan.liang@linux.intel.com,v6.14-rc1,v6.14-rc1
b8c3a25,perf/x86/intel/ds: Add PEBS format 6,2024-12-17 17:47:23,Kan Liang kan.liang@linux.intel.com,v6.13-rc5,v6.13-rc5
ae55e30,perf/x86/intel/ds: Simplify the PEBS records processing for adaptive PEBS,2024-12-02 12:01:34,Kan Liang kan.liang@linux.intel.com,v6.14-rc1,v6.14-rc1
3c00ed3,perf/x86/intel/ds: Factor out functions for PEBS records processing,2024-12-02 12:01:34,Kan Liang kan.liang@linux.intel.com,v6.14-rc1,v6.14-rc1
7087bfb,perf/x86/intel/ds: Clarify adaptive PEBS processing,2024-12-02 12:01:34,Kan Liang kan.liang@linux.intel.com,v6.14-rc1,v6.14-rc1
149fd47,perf/x86/intel: Support Perfmon MSRs aliasing,2024-07-04 16:00:40,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
dce0c74,perf/x86/intel: Support PERFEVTSEL extension,2024-07-04 16:00:40,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
e8fb5d6,perf/x86: Add config_mask to represent EVENTSEL bitmask,2024-07-04 16:00:39,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
608f697,perf/x86/intel: Support new data source for Lunar Lake,2024-07-04 16:00:38,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
0902624,perf/x86/intel: Rename model-specific pebs_latency_data functions,2024-07-04 16:00:38,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
a932aa0,perf/x86: Add Lunar Lake and Arrow Lake support,2024-07-04 16:00:37,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
722e42e,perf/x86: Support counter mask,2024-07-04 16:00:36,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
a23eb2f,perf/x86/intel: Support the PEBS event mask,2024-07-04 16:00:36,Kan Liang kan.liang@linux.intel.com,v6.11-rc1,v6.11-rc1
d142df1,perf/x86/intel: Switch to new Intel CPU model defines,2024-05-28 10:59:02,Tony Luck tony.luck@intel.com,v6.11-rc1

TEST RESULT - all pass

uncore test
ls /sys/devices/*
/sys/devices/uncore_b2cmi_0:
/sys/devices/uncore_b2cmi_1:
/sys/devices/uncore_b2cmi_10:
/sys/devices/uncore_b2cmi_11:
/sys/devices/uncore_b2cmi_2:
/sys/devices/uncore_b2cmi_3:
/sys/devices/uncore_b2cmi_4:
/sys/devices/uncore_b2cmi_5:
/sys/devices/uncore_b2cmi_6:
/sys/devices/uncore_b2cmi_7:
/sys/devices/uncore_b2cmi_8:
/sys/devices/uncore_b2cmi_9:
/sys/devices/uncore_b2cxl_0:
/sys/devices/uncore_b2cxl_1:
/sys/devices/uncore_b2cxl_10:
/sys/devices/uncore_b2cxl_11:
/sys/devices/uncore_b2cxl_12:
/sys/devices/uncore_b2cxl_13:
/sys/devices/uncore_b2cxl_14:
/sys/devices/uncore_b2cxl_15:
/sys/devices/uncore_b2cxl_2:
/sys/devices/uncore_b2cxl_3:
/sys/devices/uncore_b2cxl_4:
/sys/devices/uncore_b2cxl_5:
/sys/devices/uncore_b2cxl_6:
/sys/devices/uncore_b2cxl_7:
/sys/devices/uncore_b2cxl_8:
/sys/devices/uncore_b2cxl_9:
/sys/devices/uncore_b2hot_0:
/sys/devices/uncore_b2hot_1:
/sys/devices/uncore_b2hot_10:
/sys/devices/uncore_b2hot_11:
/sys/devices/uncore_b2hot_12:
/sys/devices/uncore_b2hot_13:
/sys/devices/uncore_b2hot_14:
/sys/devices/uncore_b2hot_15:
/sys/devices/uncore_b2hot_16:
/sys/devices/uncore_b2hot_17:
/sys/devices/uncore_b2hot_18:
/sys/devices/uncore_b2hot_19:
/sys/devices/uncore_b2hot_2:
/sys/devices/uncore_b2hot_3:
/sys/devices/uncore_b2hot_4:
/sys/devices/uncore_b2hot_5:
/sys/devices/uncore_b2hot_6:
/sys/devices/uncore_b2hot_7:
/sys/devices/uncore_b2hot_8:
/sys/devices/uncore_b2hot_9:
/sys/devices/uncore_b2upi_0:
/sys/devices/uncore_b2upi_1:
/sys/devices/uncore_b2upi_2:
/sys/devices/uncore_b2upi_3:
/sys/devices/uncore_b2upi_4:
/sys/devices/uncore_b2upi_5:
/sys/devices/uncore_cha_0:
/sys/devices/uncore_cha_1:
/sys/devices/uncore_cha_10:
/sys/devices/uncore_cha_11:
/sys/devices/uncore_cha_12:
/sys/devices/uncore_cha_13:
/sys/devices/uncore_cha_14:
/sys/devices/uncore_cha_15:
/sys/devices/uncore_cha_16:
/sys/devices/uncore_cha_17:
/sys/devices/uncore_cha_18:
/sys/devices/uncore_cha_19:
/sys/devices/uncore_cha_2:
/sys/devices/uncore_cha_20:
/sys/devices/uncore_cha_21:
/sys/devices/uncore_cha_22:
/sys/devices/uncore_cha_23:
/sys/devices/uncore_cha_24:
/sys/devices/uncore_cha_25:
/sys/devices/uncore_cha_26:
/sys/devices/uncore_cha_27:
/sys/devices/uncore_cha_28:
/sys/devices/uncore_cha_29:
/sys/devices/uncore_cha_3:
/sys/devices/uncore_cha_30:
/sys/devices/uncore_cha_31:
/sys/devices/uncore_cha_32:
/sys/devices/uncore_cha_33:
/sys/devices/uncore_cha_34:
/sys/devices/uncore_cha_35:
/sys/devices/uncore_cha_36:
/sys/devices/uncore_cha_37:
/sys/devices/uncore_cha_38:
/sys/devices/uncore_cha_39:
/sys/devices/uncore_cha_4:
/sys/devices/uncore_cha_40:
/sys/devices/uncore_cha_41:
/sys/devices/uncore_cha_42:
/sys/devices/uncore_cha_43:
/sys/devices/uncore_cha_44:
/sys/devices/uncore_cha_45:
/sys/devices/uncore_cha_46:
/sys/devices/uncore_cha_47:
/sys/devices/uncore_cha_48:
/sys/devices/uncore_cha_49:
/sys/devices/uncore_cha_5:
/sys/devices/uncore_cha_50:
/sys/devices/uncore_cha_51:
/sys/devices/uncore_cha_52:
/sys/devices/uncore_cha_53:
/sys/devices/uncore_cha_54:
/sys/devices/uncore_cha_55:
/sys/devices/uncore_cha_56:
/sys/devices/uncore_cha_57:
/sys/devices/uncore_cha_58:
/sys/devices/uncore_cha_59:
/sys/devices/uncore_cha_6:
/sys/devices/uncore_cha_60:
/sys/devices/uncore_cha_61:
/sys/devices/uncore_cha_62:
/sys/devices/uncore_cha_63:
/sys/devices/uncore_cha_64:
/sys/devices/uncore_cha_65:
/sys/devices/uncore_cha_7:
/sys/devices/uncore_cha_8:
/sys/devices/uncore_cha_9:
/sys/devices/uncore_cxlcm_16:
/sys/devices/uncore_cxlcm_18:
/sys/devices/uncore_cxlcm_2:
/sys/devices/uncore_cxlcm_4:
/sys/devices/uncore_cxlcm_6:
/sys/devices/uncore_cxlcm_8:
/sys/devices/uncore_cxldp_17:
/sys/devices/uncore_cxldp_19:
/sys/devices/uncore_cxldp_3:
/sys/devices/uncore_cxldp_5:
/sys/devices/uncore_cxldp_7:
/sys/devices/uncore_cxldp_9:
/sys/devices/uncore_iio_1:
/sys/devices/uncore_iio_11:
/sys/devices/uncore_iio_12:
/sys/devices/uncore_iio_14:
/sys/devices/uncore_iio_2:
/sys/devices/uncore_iio_3:
/sys/devices/uncore_iio_4:
/sys/devices/uncore_iio_5:
/sys/devices/uncore_iio_6:
/sys/devices/uncore_iio_9:
/sys/devices/uncore_iio_free_running_0:
/sys/devices/uncore_iio_free_running_1:
/sys/devices/uncore_iio_free_running_10:
/sys/devices/uncore_iio_free_running_11:
/sys/devices/uncore_iio_free_running_12:
/sys/devices/uncore_iio_free_running_13:
/sys/devices/uncore_iio_free_running_14:
/sys/devices/uncore_iio_free_running_2:
/sys/devices/uncore_iio_free_running_3:
/sys/devices/uncore_iio_free_running_4:
/sys/devices/uncore_iio_free_running_5:
/sys/devices/uncore_iio_free_running_6:
/sys/devices/uncore_iio_free_running_7:
/sys/devices/uncore_iio_free_running_8:
/sys/devices/uncore_iio_free_running_9:
/sys/devices/uncore_imc:
/sys/devices/uncore_irp_1:
/sys/devices/uncore_irp_11:
/sys/devices/uncore_irp_12:
/sys/devices/uncore_irp_14:
/sys/devices/uncore_irp_2:
/sys/devices/uncore_irp_3:
/sys/devices/uncore_irp_4:
/sys/devices/uncore_irp_5:
/sys/devices/uncore_irp_6:
/sys/devices/uncore_irp_9:
/sys/devices/uncore_mdf_sbo_0:
/sys/devices/uncore_mdf_sbo_1:
/sys/devices/uncore_mdf_sbo_10:
/sys/devices/uncore_mdf_sbo_11:
/sys/devices/uncore_mdf_sbo_12:
/sys/devices/uncore_mdf_sbo_13:
/sys/devices/uncore_mdf_sbo_14:
/sys/devices/uncore_mdf_sbo_15:
/sys/devices/uncore_mdf_sbo_16:
/sys/devices/uncore_mdf_sbo_17:
/sys/devices/uncore_mdf_sbo_18:
/sys/devices/uncore_mdf_sbo_19:
/sys/devices/uncore_mdf_sbo_2:
/sys/devices/uncore_mdf_sbo_20:
/sys/devices/uncore_mdf_sbo_21:
/sys/devices/uncore_mdf_sbo_22:
/sys/devices/uncore_mdf_sbo_23:
/sys/devices/uncore_mdf_sbo_24:
/sys/devices/uncore_mdf_sbo_25:
/sys/devices/uncore_mdf_sbo_26:
/sys/devices/uncore_mdf_sbo_27:
/sys/devices/uncore_mdf_sbo_28:
/sys/devices/uncore_mdf_sbo_29:
/sys/devices/uncore_mdf_sbo_3:
/sys/devices/uncore_mdf_sbo_30:
/sys/devices/uncore_mdf_sbo_31:
/sys/devices/uncore_mdf_sbo_32:
/sys/devices/uncore_mdf_sbo_33:
/sys/devices/uncore_mdf_sbo_34:
/sys/devices/uncore_mdf_sbo_35:
/sys/devices/uncore_mdf_sbo_36:
/sys/devices/uncore_mdf_sbo_37:
/sys/devices/uncore_mdf_sbo_38:
/sys/devices/uncore_mdf_sbo_39:
/sys/devices/uncore_mdf_sbo_4:
/sys/devices/uncore_mdf_sbo_40:
/sys/devices/uncore_mdf_sbo_41:
/sys/devices/uncore_mdf_sbo_42:
/sys/devices/uncore_mdf_sbo_43:
/sys/devices/uncore_mdf_sbo_44:
/sys/devices/uncore_mdf_sbo_45:
/sys/devices/uncore_mdf_sbo_46:
/sys/devices/uncore_mdf_sbo_47:
/sys/devices/uncore_mdf_sbo_48:
/sys/devices/uncore_mdf_sbo_49:
/sys/devices/uncore_mdf_sbo_5:
/sys/devices/uncore_mdf_sbo_50:
/sys/devices/uncore_mdf_sbo_51:
/sys/devices/uncore_mdf_sbo_52:
/sys/devices/uncore_mdf_sbo_53:
/sys/devices/uncore_mdf_sbo_54:
/sys/devices/uncore_mdf_sbo_55:
/sys/devices/uncore_mdf_sbo_56:
/sys/devices/uncore_mdf_sbo_57:
/sys/devices/uncore_mdf_sbo_58:
/sys/devices/uncore_mdf_sbo_59:
/sys/devices/uncore_mdf_sbo_6:
/sys/devices/uncore_mdf_sbo_60:
/sys/devices/uncore_mdf_sbo_61:
/sys/devices/uncore_mdf_sbo_62:
/sys/devices/uncore_mdf_sbo_63:
/sys/devices/uncore_mdf_sbo_64:
/sys/devices/uncore_mdf_sbo_65:
/sys/devices/uncore_mdf_sbo_66:
/sys/devices/uncore_mdf_sbo_67:
/sys/devices/uncore_mdf_sbo_68:
/sys/devices/uncore_mdf_sbo_69:
/sys/devices/uncore_mdf_sbo_7:
/sys/devices/uncore_mdf_sbo_70:
/sys/devices/uncore_mdf_sbo_71:
/sys/devices/uncore_mdf_sbo_72:
/sys/devices/uncore_mdf_sbo_73:
/sys/devices/uncore_mdf_sbo_74:
/sys/devices/uncore_mdf_sbo_75:
/sys/devices/uncore_mdf_sbo_76:
/sys/devices/uncore_mdf_sbo_77:
/sys/devices/uncore_mdf_sbo_78:
/sys/devices/uncore_mdf_sbo_79:
/sys/devices/uncore_mdf_sbo_8:
/sys/devices/uncore_mdf_sbo_9:
/sys/devices/uncore_pciex16_2:
/sys/devices/uncore_pciex16_3:
/sys/devices/uncore_pciex16_8:
/sys/devices/uncore_pciex16_9:
/sys/devices/uncore_pciex8:
/sys/devices/uncore_pcu_0:
/sys/devices/uncore_pcu_1:
/sys/devices/uncore_pcu_2:
/sys/devices/uncore_pcu_3:
/sys/devices/uncore_pcu_4:
/sys/devices/uncore_ubox:
/sys/devices/uncore_upi_0:
/sys/devices/uncore_upi_1:
/sys/devices/uncore_upi_2:
/sys/devices/uncore_upi_3:
/sys/devices/uncore_upi_4:
/sys/devices/uncore_upi_5:
./perf stat -e uncore_upi/event=0x1/,uncore_cha/event=0x1/,uncore_imc/event=0x1/ -a sleep 1

Performance counter stats for 'system wide':

10,432,220,068 uncore_upi/event=0x1/
75,878,604,600 uncore_cha/event=0x1/
840,576,348 uncore_imc/event=0x1/

1.002999881 seconds time elapsed
core test
./perf stat -a sleep 1

Performance counter stats for 'system wide':

392,913.63 msec cpu-clock                        #  384.673 CPUs utilized
     1,345      context-switches                 #    3.423 /sec
       386      cpu-migrations                   #    0.982 /sec
        85      page-faults                      #    0.216 /sec

   816,854,365      cycles                           #    0.002 GHz
   214,062,498      instructions                     #    0.26  insn per cycle
    45,335,327      branches                         #  115.382 K/sec
       992,884      branch-misses                    #    2.19% of all branches

1.021422273 seconds time elapsed
./perf record -e instructions -Iax,bx -b -c 100000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.040 MB perf.data (17 samples) ]
./perf record -e branches -Iax,bx -b -c 10000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.053 MB perf.data (39 samples) ]

event test
./perf stat -e LONGEST_LAT_CACHE.MISS,LONGEST_LAT_CACHE.REFERENCE -a sleep 1

Performance counter stats for 'system wide':

   770,872      LONGEST_LAT_CACHE.MISS
10,448,146      LONGEST_LAT_CACHE.REFERENCE

1.019260018 seconds time elapsed
./perf record -e instructions -Iax,bx -b -c 100000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.036 MB perf.data (17 samples) ]
./perf record -e branches -Iax,bx -b -c 10000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.054 MB perf.data (39 samples) ]

Kan Liang and others added 8 commits December 6, 2025 16:07
commit 3f710be upstream.

The below warning may be triggered on GNR when the PCIE uncore units are
exposed.

WARNING: CPU: 4 PID: 1 at arch/x86/events/intel/uncore.c:1169 uncore_pci_pmu_register+0x158/0x190

The current uncore driver assumes that all the devices in the same PMU
have the exact same devfn. It's true for the previous platforms. But it
doesn't work for the new PCIE uncore units on GNR.

The assumption doesn't make sense. There is no reason to limit the
devices from the same PMU to the same devfn. Also, the current code just
throws the warning, but still registers the device. The WARN_ON_ONCE()
should be removed.

The func_id is used by the later event_init() to check if an event->pmu
has valid devices. For cpu and mmio uncore PMUs, they are always valid.
For pci uncore PMUs, it's set when the PMU is registered. It can be
replaced by the pmu->registered. Clean up the func_id.

Intel-SIG: commit 3f710be perf/x86/intel/uncore: Clean up func_id.
PMU GNR support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Eric Hu <eric.hu@intel.com>
Link: https://lkml.kernel.org/r/20250108143017.1793781-1-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 6d64273 upstream.

The same CXL PMON support is also available on GNR. Apply
spr_uncore_cxlcm and spr_uncore_cxldp to GNR as well.

The other units were broken on early HW samples, so they were ignored in
the early enabling patch. The issue has been fixed and verified on the
later production HW. Add UPI, B2UPI, B2HOT, PCIEX16 and PCIEX8 for GNR.

Intel-SIG: commit 6d64273 perf/x86/intel/uncore: Support more units on Granite Rapids.
PMU GNR support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Eric Hu <eric.hu@intel.com>
Link: https://lkml.kernel.org/r/20250108143017.1793781-2-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit e415c14 upstream.

Add events v1.00.

Bring in the events from:
https://github.com/intel/perfmon/tree/main/CWF/events

Co-developed-by: Caleb Biggers <caleb.biggers@intel.com>
Intel-SIG: commit e415c14 perf vendor events: Add Clearwaterforest events.
PMU Clearwater Forest support

Signed-off-by: Caleb Biggers <caleb.biggers@intel.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250211213031.114209-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit b6ccddd upstream.

From the perspective of the uncore PMU, Clearwater Forest is the
same as the previous Sierra Forest. The only difference is the event
list, which will be supported in the perf tool later.

Intel-SIG: commit b6ccddd perf/x86/intel/uncore: Add Clearwater Forest support.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241211161146.235253-1-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit cf002da upstream.

Starting from Panther Lake, the discovery table mechanism is also
supported on client platforms. The difference is that the portal of the
global discovery table is retrieved from an MSR.

The layout of the discovery tables is the same as on the server
platforms. Factor out __parse_discovery_table() to parse the discovery
tables.

The uncore PMON is die scoped. The discovery tables need to be parsed
for each die.

Intel-SIG: commit cf002da perf/x86/intel/uncore: Support MSR portal for discovery tables.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250707201750.616527-2-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit fca24bf upstream.

For a server platform, the MMIO map size is always 0x4000. However, a
client platform may have a smaller map size.

Make the map size customizable.

Intel-SIG: commit fca24bf perf/x86/intel/uncore: Support customized MMIO map size.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250707201750.616527-3-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit a23eb2f upstream.

The current perf assumes that the counters that support PEBS are
contiguous, but that is no longer guaranteed once the new CPUID leaf
0x23 is introduced. The counters are enumerated with a counter mask,
and there may be holes in the counter mask on future platforms or in a
virtualization environment.

Store the PEBS event mask rather than the maximum number of PEBS
counters in the x86 PMU structures.

Intel-SIG: commit a23eb2f perf/x86/intel: Support the PEBS event mask.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-2-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
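The shift from "maximum number of PEBS counters" to a PEBS event mask can be sketched in userspace C as follows. This is a minimal illustration of the idea only; the structure and helper names here are hypothetical, not the kernel's actual ones:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: record which counters support PEBS as a bitmask,
 * so holes in the counter space are representable (unlike a plain
 * "max number of PEBS counters" integer). */
struct pmu_caps {
    uint64_t pebs_events_mask;  /* bit i set => counter i supports PEBS */
};

static int counter_supports_pebs(const struct pmu_caps *caps, unsigned int idx)
{
    return (int)((caps->pebs_events_mask >> idx) & 1);
}

static unsigned int pebs_counter_count(const struct pmu_caps *caps)
{
    /* the count is now derived from the mask, not stored separately */
    return (unsigned int)__builtin_popcountll(caps->pebs_events_mask);
}
```

With a mask like 0xB (counters 0, 1, and 3), counter 2 is correctly reported as not supporting PEBS, which a contiguous-count representation cannot express.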
commit d142df1 upstream.

New CPU #defines encode vendor and family as well as model.

Intel-SIG: commit d142df1 perf/x86/intel: Switch to new Intel CPU model defines.
PMU Clearwater Forest support

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240520224620.9480-32-tony.luck%40intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
Copilot AI review requested due to automatic review settings December 6, 2025 08:46

@sourcery-ai sourcery-ai bot left a comment


Sorry @Avenger-285714, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery


Copilot AI left a comment


Pull request overview

This pull request backports critical Intel PMU infrastructure improvements and adds Clearwater Forest platform support to the Linux 6.6-y stable kernel. The changes refactor the x86 PMU subsystem to use bitmask-based counter representation instead of simple counter counts, enabling more flexible hardware configurations and advanced features like Auto Counter Reload (ACR) and PEBS counter snapshotting.

Key Changes:

  • Architectural refactoring: Converted counter tracking from integer counts to 64-bit bitmasks for general-purpose and fixed counters
  • Added support for Intel Clearwater Forest (Darkmont) and Lunar Lake platforms with new PMU capabilities
  • Implemented ACR (Auto Counter Reload) and PEBS counter snapshotting features for v6+ PMU architectures
  • Added support for MSR aliasing, extended EVENTSEL fields (EQ, UMASK2), and dynamic constraints

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
arch/x86/events/perf_event.h Core header changes: counter masks, ACR/PEBS support structures, helper functions
arch/x86/events/core.c Core counter handling refactored from counts to bitmasks
arch/x86/events/intel/core.c Intel PMU driver: ACR support, new platforms (Lunarlake, Clearwater Forest), event constraints
arch/x86/events/intel/ds.c PEBS format 6 support, counter snapshotting, latency data handling for new platforms
arch/x86/events/intel/uncore*.c Uncore driver updates: MSR portal support, GNR unit additions
arch/x86/events/zhaoxin/core.c CRITICAL BUG: Typo in macro name (ENMASK_ULL → GENMASK_ULL)
arch/x86/events/amd/core.c AMD driver updated to use bitmask representation
arch/x86/include/asm/perf_event.h New MSR definitions, PEBS/ACR constants, data structure updates
tools/perf/pmu-events/arch/x86/clearwaterforest/*.json Performance monitoring event definitions for Clearwater Forest



x86_pmu.version = version;
x86_pmu.num_counters = eax.split.num_counters;
x86_pmu.cntr_mask64 = ENMASK_ULL(eax.split.num_counters - 1, 0);

Copilot AI Dec 6, 2025


Typo in macro name: ENMASK_ULL should be GENMASK_ULL.

Suggested change
x86_pmu.cntr_mask64 = ENMASK_ULL(eax.split.num_counters - 1, 0);
x86_pmu.cntr_mask64 = GENMASK_ULL(eax.split.num_counters - 1, 0);

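For reference, the kernel's GENMASK_ULL(h, l) (from include/linux/bits.h) produces a 64-bit value with bits l through h set, which is why GENMASK_ULL(num_counters - 1, 0) yields a mask covering counters 0 through num_counters - 1. A userspace sketch of the same semantics:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel's GENMASK_ULL(h, l):
 * a 64-bit value with bits l..h (inclusive) set.
 * Assumes 0 <= l <= h <= 63. */
static uint64_t genmask_ull(unsigned int h, unsigned int l)
{
    return (~0ULL >> (63 - h)) & (~0ULL << l);
}
```

So with eax.split.num_counters == 8, genmask_ull(7, 0) is 0xFF: a counter mask with bits 0 through 7 set.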
Kan Liang added 16 commits December 9, 2025 00:14
commit 722e42e upstream.

The current perf assumes that both GP and fixed counters are contiguous.
But it's not guaranteed on newer Intel platforms or in a virtualization
environment.

Use the counter mask to replace the number of counters for both GP and
the fixed counters. For the other archs or older platforms which don't
support a counter mask, use GENMASK_ULL(num_counter - 1, 0) as a
replacement. There is no functional change for them.

The interface to KVM is not changed. The number of counters is still
passed to KVM. It can be updated later separately.

Intel-SIG: commit 722e42e perf/x86: Support counter mask.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-3-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
Signed-off-by: WangYuli <wangyuli@aosc.io>
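The fallback described above, deriving a mask from a plain counter count and then walking set bits instead of assuming the counters are contiguous, can be illustrated with a small sketch (names here are hypothetical, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Platforms without an architectural counter mask derive one from a
 * plain count: GENMASK_ULL(num_counter - 1, 0) equivalent. */
static uint64_t cntr_mask_from_count(unsigned int num_counters)
{
    return (num_counters >= 64) ? ~0ULL : ((1ULL << num_counters) - 1);
}

/* Loops over counters then walk set bits, tolerating holes in the
 * mask instead of iterating 0..num_counters-1. */
static unsigned int count_available(uint64_t mask)
{
    unsigned int n = 0;
    for (uint64_t m = mask; m; m &= m - 1)  /* clear lowest set bit */
        n++;
    return n;
}
```

A mask such as 0xF0F (counters 0-3 and 8-11) has a hole at counters 4-7 yet still reports eight available counters, which is exactly the case the old "number of counters" representation could not handle.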
commit a932aa0 upstream.

From the PMU's perspective, Lunar Lake and Arrow Lake are similar to the
previous-generation Meteor Lake. Both are hybrid platforms, with e-cores
and p-cores.

The key differences include:
- The e-core supports 3 new fixed counters
- The p-core supports an updated PEBS Data Source format
- More GP counters (Updated event constraint table)
- New Architectural performance monitoring V6
  (New Perfmon MSRs aliasing, umask2, eq).
- New PEBS format V6 (Counters Snapshotting group)
- New RDPMC metrics clear mode

The legacy features, the 3 new fixed counters, and the updated event
constraint table are enabled in this patch.

The new PEBS data source format, the architectural performance
monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode
are supported in the following patches.

Intel-SIG: commit a932aa0 perf/x86: Add Lunar Lake and Arrow Lake support.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-4-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 0902624 upstream.

The model-specific pebs_latency_data functions of ADL and MTL use
"small" as a postfix to indicate the e-core. The postfix is too generic
for a model-specific function: it cannot be directly mapped to a
specific uarch, which would facilitate development and maintenance.
Use the abbreviation of the uarch to rename the model-specific
functions.

Intel-SIG: commit 0902624 perf/x86/intel: Rename model-specific pebs_latency_data functions.
PMU Clearwater Forest support

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-5-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 608f697 upstream.

A new PEBS data source format is introduced for the p-core of Lunar
Lake. The data source field is extended to 8 bits with new encodings.

A new layout is introduced into the union intel_x86_pebs_dse.
Introduce the lnl_latency_data() to parse the new format.
Enlarge the pebs_data_source[] accordingly to include new encodings.

Only the mem load and the mem store events can generate the data source.
Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and
INTEL_HYBRID_STLAT_CONSTRAINT to mark them.

Add two new bits for the new cache-related data src, L2_MHB and MSC.
The L2_MHB is short for L2 Miss Handling Buffer, which is similar to
LFB (Line Fill Buffer), but to track the L2 Cache misses.
The MSC stands for the memory-side cache.

Intel-SIG: commit 608f697 perf/x86/intel: Support new data source for Lunar Lake.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-6-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit e8fb5d6 upstream.

Different vendors may support different fields in the EVENTSEL MSR. For
example, Intel introduces the new umask2 and eq fields in the EVENTSEL
MSR starting from Perfmon version 6. However, a fixed mask,
X86_RAW_EVENT_MASK, is used to filter attr.config.

Introduce a new config_mask to record the real supported EVENTSEL
bitmask.
Only apply it to the existing code now. No functional change.

Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Intel-SIG: commit e8fb5d6 perf/x86: Add config_mask to represent EVENTSEL bitmask.
PMU Clearwater Forest support

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-7-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
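The config_mask idea can be sketched as a per-PMU validity check. This is a hedged illustration with hypothetical names; the real kernel code filters attr.config differently in its details:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: each PMU records the EVENTSEL bits it actually
 * supports, instead of every PMU being filtered through one fixed
 * X86_RAW_EVENT_MASK.  PMUs that support extra fields (e.g. umask2/eq)
 * simply carry a wider mask. */
struct pmu_sketch {
    uint64_t config_mask;  /* bits this PMU's EVENTSEL supports */
};

static int config_is_valid(const struct pmu_sketch *pmu, uint64_t config)
{
    /* reject configs that set bits outside the supported mask */
    return (config & ~pmu->config_mask) == 0;
}
```

The benefit is that widening support for a new EVENTSEL field is a per-PMU data change (extending config_mask) rather than a change to a global constant shared by all vendors.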
commit dce0c74 upstream.

Two new fields (the unit mask2 and the equal flag) are added in the
IA32_PERFEVTSELx MSRs. They can be enumerated via CPUID.23H.0.EBX.

Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout
of the PERFEVTSEL.
Expose the new formats into sysfs if they are available. The umask
extension reuses the same format attr name "umask" as the previous
umask. Add umask2_show to determine/display the correct format
for the current machine.

Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Intel-SIG: commit dce0c74 perf/x86/intel: Support PERFEVTSEL extension.
PMU Clearwater Forest support

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-8-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 149fd47 upstream.

The architectural performance monitoring V6 supports a new range of
counters' MSRs in the 19xxH address range. They include all the GP
counter MSRs, the GP control MSRs, and the fixed counter MSRs.

The step between each sibling counter is 4. Add intel_pmu_addr_offset()
to calculate the correct offset.

Add fixedctr in struct x86_pmu to store the address of the fixed counter
0. It can be used to calculate the rest of the fixed counters.

The MSR address of the fixed counter control is not changed.

Intel-SIG: commit 149fd47 perf/x86/intel: Support Perfmon MSRs aliasing.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 7087bfb upstream.

Modify the pebs_basic and pebs_meminfo structs to make the bitfields
more explicit to ease readability of the code.

Co-developed-by: Stephane Eranian <eranian@google.com>
Intel-SIG: commit 7087bfb perf/x86/intel/ds: Clarify adaptive PEBS processing.
PMU Clearwater Forest support

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119135504.1463839-3-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 3c00ed3 upstream.

Factor out functions to process normal and the last PEBS records, which
can be shared with the later patch.

Move the event updating related code (intel_pmu_save_and_restart())
to the end, where all samples have been processed.
For the current usage, it doesn't matter when perf updates event counts
and resets the counter, because all counters are stopped when the PEBS
buffer is drained.
Drop the return of the !intel_pmu_save_and_restart(event) check,
because it can never happen. The intel_pmu_save_and_restart(event)
only returns 0 when !hwc->event_base or the period_left > 0.
- The !hwc->event_base is impossible for the PEBS event, since the PEBS
  event is only available on GP and fixed counters, which always have
  a valid hwc->event_base.
- The check only happens for the case of non-AUTO_RELOAD and single
  PEBS, which implies that the event must be overflowed. The period_left
  must always be <= 0 for an overflowed event after the
  x86_pmu_update().

Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Intel-SIG: commit 3c00ed3 perf/x86/intel/ds: Factor out functions for PEBS records processing.
PMU Clearwater Forest support

Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119135504.1463839-4-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
…PEBS

commit ae55e30 upstream.

The current code may iterate all the PEBS records in the DS area several
times. The first loop is to find all active events and calculate the
available records for each event. Then iterate the whole buffer again
and again to process available records until all active events are
processed.

The algorithm is inherited from the old generations. The old PEBS
hardware does not deal well with the situation when events happen near
each other. SW has to drop the error records. Multiple iterations are
required.

The hardware limit has been addressed on newer platforms with adaptive
PEBS. A simple one-iteration algorithm is introduced.

With the patch, the samples are output in record order rather than
event order. It doesn't impact the post-processing. The perf tool always
sorts the records by time before presenting them to the end user.

In an NMI, the last record has to be specially handled. Add a last[]
variable to track the last unprocessed record of each event.

Test:

11 PEBS events are used in the perf test. Only the basic information is
collected.
perf record -e instructions:up,...,instructions:up -c 2000003 benchmark

ftrace is used to record the duration of the
intel_pmu_drain_pebs_icl().

The average duration reduced from 62.04us to 57.94us.

A small improvement can be observed with the new algorithm.
Also, the implementation becomes simpler and more straightforward.

Intel-SIG: commit ae55e30 perf/x86/intel/ds: Simplify the PEBS records processing for adaptive PEBS.
PMU Clearwater Forest support

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20241119135504.1463839-5-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit b8c3a25 upstream.

The only difference between formats 5 and 6 is the new counters
snapshotting group. Without the following counters snapshotting enabling
patches, it's impossible to utilize the feature in a PEBS record, so
it's safe to share the same code path with format 5.

Add format 6, so the end user can at least utilize the legacy PEBS
features.

Fixes: a932aa0 ("perf/x86: Add Lunar Lake and Arrow Lake support")
Intel-SIG: commit b8c3a25 perf/x86/intel/ds: Add PEBS format 6.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241216204505.748363-1-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 0e45818 upstream.

The new RDPMC enhancement, metrics clear mode, clears the
PERF_METRICS-related resources as well as the fixed-function performance
monitoring counter 3 after the read is performed. It is available for
ring 3. The feature is enumerated by the
IA32_PERF_CAPABILITIES.RDPMC_CLEAR_METRICS[bit 19]. To enable the
feature, the IA32_FIXED_CTR_CTRL.METRICS_CLEAR_EN[bit 14] must be set.

Two ways were considered to enable the feature.
- Expose a knob in the sysfs globally. One user may affect the
  measurement of other users when changing the knob. The solution is
  dropped.
- Introduce a new event format, metrics_clear, for the slots event to
  disable/enable the feature only for the current process. Users can
  utilize the feature as needed.
The latter solution is implemented in the patch.

The current KVM doesn't support the perf metrics yet. For
virtualization, the feature can be enabled later separately.

Intel-SIG: commit 0e45818 perf/x86/intel: Support RDPMC metrics clear mode.
PMU Clearwater Forest support

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20241211160318.235056-1-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit e02e9b0 upstream.

The counters snapshotting is a new adaptive PEBS extension, which can
capture programmable counters, fixed-function counters, and performance
metrics in a PEBS record. The feature is available in the PEBS format
V6.

The target counters can be configured in the new fields of MSR_PEBS_CFG.
Then the PEBS HW will generate the bit mask of counters (Counters Group
Header) followed by the content of all the requested counters into a
PEBS record.

The current Linux perf sample read feature can read all events in the
group when any event in the group is overflowed. But the rdpmc in the
NMI/overflow handler has a small gap from overflow. Also, there is some
overhead for each rdpmc read. The counters snapshotting feature can be
used as an accurate and low-overhead replacement.

Extend intel_update_topdown_event() to accept the value from PEBS
records.

Add a new PEBS_CNTR flag to indicate a sample read group that utilizes
the counters snapshotting feature. When the group is scheduled, the
PEBS configuration can be updated accordingly.

To prevent the case that a PEBS record value might be in the past
relative to what is already in the event, perf always stops the PMU and
drains the PEBS buffer before updating the corresponding event->count.

Intel-SIG: commit e02e9b0 perf/x86/intel: Support PEBS counters snapshotting.
PMU Clearwater Forest support

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250121152303.3128733-4-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 4dfe323 upstream.

More and more features require a dynamic event constraint, e.g., branch
counter logging, auto counter reload, Arch PEBS, etc.

Add a generic flag, PMU_FL_DYN_CONSTRAINT, to indicate the case. It
avoids adding individual flags to intel_cpuc_prepare() one by one.

Add a variable dyn_constraint in the struct hw_perf_event to track the
dynamic constraint of the event. Apply it if it's updated.

Apply the generic dynamic constraint for branch counter logging.
Many features on and after V6 require a dynamic constraint, so
unconditionally set the flag for V6+.

Intel-SIG: commit 4dfe323 perf/x86: Add dynamic constraint.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-2-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 0a65579 upstream.

When a machine supports PEBS v6, perf unconditionally searches the
cpuc->event_list[] for every event and checks whether the late setup is
required, which is unnecessary.

The late setup is only required for special events, e.g., events that
support the counters snapshotting feature. Add n_late_setup to track the
number of events that need the late setup.

Other features, e.g., the auto counter reload feature, require the late
setup as well. Add a wrapper, intel_pmu_pebs_late_setup, for the events
that support the counters snapshotting feature.

Intel-SIG: commit 0a65579 perf/x86/intel: Track the num of events needs late setup.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-3-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit c9449c8 upstream.

The auto counter reload feature requires an event flag to indicate an
auto counter reload group, which can only be scheduled on specific
counters that are enumerated in CPUID. However, hw_perf_event.flags has
run out of bits on X86.

Two solutions were considered to address the issue.
- Currently, 20 bits are reserved for the architecture-specific flags.
  Only the bit 31 is used for the generic flag. There is still plenty
  of space left. Reserve 8 more bits for the arch-specific flags.
- Add a new X86 specific hw_perf_event.flags1 to support more flags.

The former is implemented. Enough room is still left in the global
generic flag.

Intel-SIG: commit c9449c8 perf: Extend the bit width of the arch-specific flag.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-4-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
Kan Liang and others added 8 commits December 9, 2025 00:14
commit 1856c6c upstream.

The counters that support the auto counter reload feature can be
enumerated in the CPUID Leaf 0x23 sub-leaf 0x2.

Add acr_cntr_mask to store the mask of counters which are reloadable.
Add acr_cause_mask to store the mask of counters which can cause reload.
Since the e-core and p-core may have different numbers of counters,
track the masks in the struct x86_hybrid_pmu as well.

Intel-SIG: commit 1856c6c perf/x86/intel: Add CPUID enumeration for the auto counter reload.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-5-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit ec980e4 upstream.

The relative rates among two or more events are useful for performance
analysis, e.g., a high branch miss rate may indicate a performance
issue. Usually, the samples with a relative rate that exceeds some
threshold are more useful. However, the traditional sampling takes
samples of events separately. To get the relative rates among two or
more events, a high sample rate is required, which can bring high
overhead. Many samples taken in the non-hotspot area are also dropped
(useless) in the post-process.

The auto counter reload (ACR) feature takes samples when the relative
rate of two or more events exceeds some threshold, which provides the
fine-grained information at a low cost.
To support the feature, two sets of MSRs are introduced. For a given
counter IA32_PMC_GPn_CTR/IA32_PMC_FXm_CTR, bit fields in the
IA32_PMC_GPn_CFG_B/IA32_PMC_FXm_CFG_B MSR indicate which counter(s)
can cause a reload of that counter. The reload value is stored in the
IA32_PMC_GPn_CFG_C/IA32_PMC_FXm_CFG_C.
The details can be found at Intel SDM (085), Volume 3, 21.9.11 Auto
Counter Reload.

In the hw_config(), an ACR event is specially configured, because the
cause/reloadable counter mask has to be applied to the dyn_constraint.
Besides the HW limits (e.g., no support for perf metrics, PDist, etc.),
a SW limit is applied as well: ACR events in a group must be contiguous.
It facilitates the later conversion from the event idx to the counter
idx. Otherwise, the intel_pmu_acr_late_setup() has to traverse the whole
event list again to find the "cause" event.
Also, add a new flag PERF_X86_EVENT_ACR to indicate an ACR group; it is
set on the group leader.

The late setup() is also required for an ACR group. It converts the
event idx to the counter idx and saves it in hw.config1.

The ACR configuration MSRs are only updated in the enable_event().
The disable_event() doesn't clear the ACR CFG register.
Add acr_cfg_b/acr_cfg_c in the struct cpu_hw_events to cache the MSR
values. It can avoid an MSR write if the value is not changed.

Expose an acr_mask to the sysfs. The perf tool can utilize the new
format to configure the relation of events in the group. The bit
sequence of the acr_mask follows the events enabled order of the group.

Example:

Here is a snippet of mispredict.c. Since the array holds random
numbers, jumps are random and often mispredicted.
The mispredicted rate depends on the compared value.

For the Loop1, ~11% of all branches are mispredicted.
For the Loop2, ~21% of all branches are mispredicted.

main()
{
...
        for (i = 0; i < N; i++)
                data[i] = rand() % 256;
...
        /* Loop 1 */
        for (k = 0; k < 50; k++)
                for (i = 0; i < N; i++)
                        if (data[i] >= 64)
                                sum += data[i];
...

...
        /* Loop 2 */
        for (k = 0; k < 50; k++)
                for (i = 0; i < N; i++)
                        if (data[i] >= 128)
                                sum += data[i];
...
}

Usually, code with a high branch miss rate means bad performance.
To understand the branch miss rate of the code, the traditional method
usually samples both the branches and branch-misses events. E.g.,
perf record -e "{cpu_atom/branch-misses/ppu, cpu_atom/branch-instructions/u}"
               -c 1000000 -- ./mispredict

[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.925 MB perf.data (5106 samples) ]
The 5106 samples are from both events and spread in both Loops.
In the post-process stage, a user can know that the Loop 2 has a 21%
branch miss rate. Then they can focus on the samples of branch-misses
events for the Loop 2.

With this patch, the user can generate the samples only when the branch
miss rate > 20%. For example,
perf record -e "{cpu_atom/branch-misses,period=200000,acr_mask=0x2/ppu,
                 cpu_atom/branch-instructions,period=1000000,acr_mask=0x3/u}"
                -- ./mispredict

(Two different periods are applied to branch-misses and
branch-instructions. The ratio is set to 20%.
If the branch-instructions is overflowed first, the branch-miss
rate < 20%. No samples should be generated. All counters should be
automatically reloaded.
If the branch-misses is overflowed first, the branch-miss rate > 20%.
A sample triggered by the branch-misses event should be
generated. Just the counter of the branch-instructions should be
automatically reloaded.

The branch-misses event should only be automatically reloaded when
the branch-instructions is overflowed. So the "cause" event is the
branch-instructions event. The acr_mask is set to 0x2, since the
event index in the group of branch-instructions is 1.

The branch-instructions event is automatically reloaded no matter which
events are overflowed. So the "cause" events are the branch-misses
and the branch-instructions event. The acr_mask should be set to 0x3.)

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.098 MB perf.data (2498 samples) ]

 $perf report

Percent       │154:   movl    $0x0,-0x14(%rbp)
              │     ↓ jmp     1af
              │     for (i = j; i < N; i++)
              │15d:   mov     -0x10(%rbp),%eax
              │       mov     %eax,-0x18(%rbp)
              │     ↓ jmp     1a2
              │     if (data[i] >= 128)
              │165:   mov     -0x18(%rbp),%eax
              │       cltq
              │       lea     0x0(,%rax,4),%rdx
              │       mov     -0x8(%rbp),%rax
              │       add     %rdx,%rax
              │       mov     (%rax),%eax
              │    ┌──cmp     $0x7f,%eax
100.00   0.00 │    ├──jle     19e
              │    │sum += data[i];

The 2498 samples are all from the branch-misses events for the Loop 2.

The number of samples and overhead is significantly reduced without
losing any information.

Intel-SIG: commit ec980e4 perf/x86/intel: Support auto counter reload.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-6-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit a5f5e12 upstream.

The below code would always unconditionally clear other status bits like
perf metrics overflow bit once PEBS buffer overflows:

        status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;

This is incorrect. The perf metrics overflow bit should be cleared only
when fixed counter 3 is in the PEBS counter group. Otherwise, a perf
metrics overflow could be missed.

Closes: https://lore.kernel.org/all/20250225110012.GK31462@noisy.programming.kicks-ass.net/
Fixes: 7b2c05a ("perf/x86/intel: Generic support for hardware TopDown metrics")
Intel-SIG: commit a5f5e12 perf/x86/intel: Don't clear perf metrics overflow bit unconditionally.
PMU Clearwater Forest support

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250415104135.318169-1-dapeng1.mi@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 48d66c8 upstream.

From the PMU's perspective, Clearwater Forest is similar to the previous
generation Sierra Forest.

The key differences are the ARCH PEBS feature and the 3 newly added
fixed counters for topdown L1 metrics events.

ARCH PEBS is supported by the following patches. This patch provides
support for the basic perfmon features and the 3 newly added fixed
counters.

Intel-SIG: commit 48d66c8 perf/x86/intel: Add PMU support for Clearwater Forest.
PMU Clearwater Forest support

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-3-dapeng1.mi@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 25c623f upstream.

CPUID archPerfmonExt (0x23) leaves are supported to enumerate CPU
level's PMU capabilities on non-hybrid processors as well.

This patch adds support for parsing archPerfmonExt leaves on non-hybrid
processors. Architectural PEBS leverages archPerfmonExt sub-leaves 0x4
and 0x5 to enumerate the PEBS capabilities as well. This patch is a
precursor of the subsequent arch-PEBS enabling patches.

Intel-SIG: commit 25c623f perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUs.
PMU Clearwater Forest support

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-4-dapeng1.mi@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit e9988ad upstream.

The PEBS counters snapshotting group also requires a group flag in the
leader. The leader must be an X86 event.

Fixes: e02e9b0 ("perf/x86/intel: Support PEBS counters snapshotting")
Intel-SIG: commit e9988ad perf/x86/intel: Check the X86 leader for pebs_counter_event_group.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250424134718.311934-3-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit efd4485 upstream.

The auto counter reload group also requires a group flag in the leader.
The leader must be an X86 event.

Intel-SIG: commit efd4485 perf/x86/intel: Check the X86 leader for ACR group.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250424134718.311934-4-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
commit 3e830f6 upstream.

The current is_x86_event() has to go through the hybrid_pmus list to
find the matching pmu, then check whether it's an X86 PMU and an X86
event. That's not necessary.

The X86 PMU has a unique type ID on a non-hybrid machine, and a unique
capability type. They are good enough to do the check.

Intel-SIG: commit 3e830f6 perf/x86: Optimize the is_x86_event.
PMU Clearwater Forest support

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250424134718.311934-5-kan.liang@linux.intel.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>

Copilot AI left a comment


Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.



@opsiff
Member

opsiff commented Dec 9, 2025

Some fix patches have not been merged:
perf/x86/intel: Fix segfault with PEBS-via-PT with sample_freq
fix: perf/x86: Support counter mask

commit aa5d2ca
Author: Kan Liang kan.liang@linux.intel.com
Date: Mon Dec 16 08:02:52 2024 -0800

perf/x86/intel: Fix bitmask of OCR and FRONTEND events for LNC

fix:perf/x86: Add Lunar Lake and Arrow Lake support

commit 782cffe
Author: Kan Liang kan.liang@linux.intel.com
Date: Wed Feb 19 06:10:05 2025 -0800

perf/x86/intel: Fix event constraints for LNC

fix:perf/x86: Add Lunar Lake and Arrow Lake support
commit 0ba6502
Author: Dapeng Mi dapeng1.mi@linux.intel.com
Date: Tue Oct 28 14:42:14 2025 +0800

perf/x86/intel: Fix KASAN global-out-of-bounds warning

fix:perf/x86/intel: Rename model-specific pebs_latency_data functions

commit 7da9960
Author: Kan Liang kan.liang@linux.intel.com
Date: Thu Apr 24 06:47:18 2025 -0700

perf/x86/intel/ds: Fix counter backwards of non-precise events counters-snapshotting

A counter going backwards may be observed in the PMI handler when
counters-snapshotting some non-precise events in freq mode.

For the non-precise events, it's possible the counters-snapshotting
records a positive value for an overflowed PEBS event. The HW
auto-reload mechanism then resets the counter to 0 immediately, because
pebs_event_reset is cleared in freq mode, which doesn't set
PERF_X86_EVENT_AUTO_RELOAD.
In the PMI handler, 0 will be read rather than the positive value
recorded in the counters-snapshotting record.

The counters-snapshotting case has to be specially handled. Since the
event value has been updated when processing the counters-snapshotting
record, it only needs to set the new period for the counter via
x86_pmu_set_period().

Fixes: e02e9b0374c3 ("perf/x86/intel: Support PEBS counters snapshotting")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250424134718.311934-6-kan.liang@linux.intel.com

commit 43796f3
Author: Dapeng Mi dapeng1.mi@linux.intel.com
Date: Wed Aug 20 10:30:27 2025 +0800

perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error

When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
 access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.

[   55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
[   55.611280] Call Trace:
[   55.611282]  <TASK>
[   55.611284]  ? intel_pmu_config_acr+0x87/0x160
[   55.611289]  intel_pmu_enable_acr+0x6d/0x80
[   55.611291]  intel_pmu_enable_event+0xce/0x460
[   55.611293]  x86_pmu_start+0x78/0xb0
[   55.611297]  x86_pmu_enable+0x218/0x3a0
[   55.611300]  ? x86_pmu_enable+0x121/0x3a0
[   55.611302]  perf_pmu_enable+0x40/0x50
[   55.611307]  ctx_resched+0x19d/0x220
[   55.611309]  __perf_install_in_context+0x284/0x2f0
[   55.611311]  ? __pfx_remote_function+0x10/0x10
[   55.611314]  remote_function+0x52/0x70
[   55.611317]  ? __pfx_remote_function+0x10/0x10
[   55.611319]  generic_exec_single+0x84/0x150
[   55.611323]  smp_call_function_single+0xc5/0x1a0
[   55.611326]  ? __pfx_remote_function+0x10/0x10
[   55.611329]  perf_install_in_context+0xd1/0x1e0
[   55.611331]  ? __pfx___perf_install_in_context+0x10/0x10
[   55.611333]  __do_sys_perf_event_open+0xa76/0x1040
[   55.611336]  __x64_sys_perf_event_open+0x26/0x30
[   55.611337]  x64_sys_call+0x1d8e/0x20c0
[   55.611339]  do_syscall_64+0x4f/0x120
[   55.611343]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

On PTL, GP counters 0 and 1 don't support the auto counter reload
feature, thus a #GP is triggered when trying to write 1 to bit 0 of the
CFG_B MSR.

commit 86aa94c
Author: Dapeng Mi dapeng1.mi@linux.intel.com
Date: Thu May 29 08:02:36 2025 +0000

perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()

The MSR offset calculations in intel_pmu_config_acr() are buggy.

To calculate fixed counter MSR addresses in intel_pmu_config_acr(),
INTEL_PMC_IDX_FIXED is subtracted from the HW counter index "idx".

Kan Liang and others added 6 commits December 12, 2025 19:04
The released OCR and FRONTEND events utilized more bits on Lunar Lake
p-core. The corresponding mask in the extra_regs has to be extended to
unblock the extra bits.

Add a dedicated intel_lnc_extra_regs.

Fixes: a932aa0 ("perf/x86: Add Lunar Lake and Arrow Lake support")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20241216160252.430858-1-kan.liang@linux.intel.com
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
According to the latest event list, update the event constraint tables
for Lion Cove core.

The general rule (event codes < 0x90 are restricted to counters 0-3)
has been removed. There is no restriction for most of the performance
monitoring events.

Fixes: a932aa0 ("perf/x86: Add Lunar Lake and Arrow Lake support")
Reported-by: Amiri Khalil <amiri.khalil@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250219141005.2446823-1-kan.liang@linux.intel.com
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
When running "perf mem record" command on CWF, the below KASAN
global-out-of-bounds warning is seen.

  ==================================================================
  BUG: KASAN: global-out-of-bounds in cmt_latency_data+0x176/0x1b0
  Read of size 4 at addr ffffffffb721d000 by task dtlb/9850

  Call Trace:

   kasan_report+0xb8/0xf0
   cmt_latency_data+0x176/0x1b0
   setup_arch_pebs_sample_data+0xf49/0x2560
   intel_pmu_drain_arch_pebs+0x577/0xb00
   handle_pmi_common+0x6c4/0xc80

The issue is caused by the code below in __grt_latency_data(), which
tries to access the x86_hybrid_pmu structure that doesn't exist on a
non-hybrid platform like CWF.

        WARN_ON_ONCE(hybrid_pmu(event->pmu)->pmu_type == hybrid_big)

So add an is_hybrid() check before calling this WARN_ON_ONCE to fix the
global-out-of-bounds access issue.

Fixes: 0902624 ("perf/x86/intel: Rename model-specific pebs_latency_data functions")
Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251028064214.1451968-1-dapeng1.mi@linux.intel.com
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
…rs-snapshotting

A counter going backwards may be observed in the PMI handler when
counters-snapshotting some non-precise events in freq mode.

For the non-precise events, it's possible the counters-snapshotting
records a positive value for an overflowed PEBS event. The HW
auto-reload mechanism then resets the counter to 0 immediately, because
pebs_event_reset is cleared in freq mode, which doesn't set
PERF_X86_EVENT_AUTO_RELOAD.
In the PMI handler, 0 will be read rather than the positive value
recorded in the counters-snapshotting record.

The counters-snapshotting case has to be specially handled. Since the
event value has been updated when processing the counters-snapshotting
record, it only needs to set the new period for the counter via
x86_pmu_set_period().

Fixes: e02e9b0 ("perf/x86/intel: Support PEBS counters snapshotting")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250424134718.311934-6-kan.liang@linux.intel.com
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
When running perf_fuzzer on PTL, the "unchecked MSR access error" below
is sometimes seen when accessing the IA32_PMC_x_CFG_B MSRs.

[   55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
[   55.611280] Call Trace:
[   55.611282]  <TASK>
[   55.611284]  ? intel_pmu_config_acr+0x87/0x160
[   55.611289]  intel_pmu_enable_acr+0x6d/0x80
[   55.611291]  intel_pmu_enable_event+0xce/0x460
[   55.611293]  x86_pmu_start+0x78/0xb0
[   55.611297]  x86_pmu_enable+0x218/0x3a0
[   55.611300]  ? x86_pmu_enable+0x121/0x3a0
[   55.611302]  perf_pmu_enable+0x40/0x50
[   55.611307]  ctx_resched+0x19d/0x220
[   55.611309]  __perf_install_in_context+0x284/0x2f0
[   55.611311]  ? __pfx_remote_function+0x10/0x10
[   55.611314]  remote_function+0x52/0x70
[   55.611317]  ? __pfx_remote_function+0x10/0x10
[   55.611319]  generic_exec_single+0x84/0x150
[   55.611323]  smp_call_function_single+0xc5/0x1a0
[   55.611326]  ? __pfx_remote_function+0x10/0x10
[   55.611329]  perf_install_in_context+0xd1/0x1e0
[   55.611331]  ? __pfx___perf_install_in_context+0x10/0x10
[   55.611333]  __do_sys_perf_event_open+0xa76/0x1040
[   55.611336]  __x64_sys_perf_event_open+0x26/0x30
[   55.611337]  x64_sys_call+0x1d8e/0x20c0
[   55.611339]  do_syscall_64+0x4f/0x120
[   55.611343]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

On PTL, GP counters 0 and 1 don't support the auto counter reload
feature, so writing 1 to bit 0 of the CFG_B MSR, which would enable
auto counter reload on GP counter 0, triggers a #GP.

The root cause is that the check of the auto counter reload (ACR)
counter mask from user space in the intel_pmu_acr_late_setup() helper
is incorrect. It allows an invalid ACR counter mask from user space to
be set into hw.config1 and then written into the CFG_B MSRs, which
triggers the MSR access warning.

E.g., a user may create a perf event with an ACR counter mask
(config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.

The correct check condition should be "i + idx >= cpuc->n_events"
instead of "i + idx > cpuc->n_events" (it looks like a typo).
Otherwise, the counter mask is traversed one event too far and an
invalid "cpuc->assign[1]" bit (bit 0) is set into hw.config1, causing
the MSR access error.

Besides, also check whether the events corresponding to the ACR
counter mask are ACR events; if not, filter those bits out of the
counter mask. If an event is not an ACR event, it could be scheduled
onto a HW counter which doesn't support ACR, so it's invalid to
include its counter index in the ACR counter mask.

Furthermore, remove the WARN_ON_ONCE(), since it's easily triggered by
any invalid user-supplied ACR counter mask and its message could
mislead users.

Fixes: ec980e4 ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250820023032.17128-3-dapeng1.mi@linux.intel.com
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
…fig_acr()

The MSR offset calculations in intel_pmu_config_acr() are buggy.

To calculate fixed counter MSR addresses in intel_pmu_config_acr(),
the HW counter index "idx" is subtracted by INTEL_PMC_IDX_FIXED.

This causes the ACR mask values of fixed counters to be saved to the
positions of GP counters in acr_cfg_b[], e.g. the ACR counter mask of
fixed counter 0 should be saved to acr_cfg_b[32], but it's saved to
acr_cfg_b[0] instead.

Fix this issue.

[ mingo: Clarified & improved the changelog. ]

Fixes: ec980e4 ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250529080236.2552247-2-dapeng1.mi@linux.intel.com
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
@Avenger-285714
Member Author

Some fix patches have not been merged:

perf/x86/intel: Fix segfault with PEBS-via-PT with sample_freq

fix: perf/x86: Support counter mask

commit aa5d2ca
Author: Kan Liang <kan.liang@linux.intel.com>
Date: Mon Dec 16 08:02:52 2024 -0800

perf/x86/intel: Fix bitmask of OCR and FRONTEND events for LNC

fix: perf/x86: Add Lunar Lake and Arrow Lake support

commit 782cffe
Author: Kan Liang <kan.liang@linux.intel.com>
Date: Wed Feb 19 06:10:05 2025 -0800

perf/x86/intel: Fix event constraints for LNC

fix: perf/x86: Add Lunar Lake and Arrow Lake support

commit 0ba6502
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Date: Tue Oct 28 14:42:14 2025 +0800

perf/x86/intel: Fix KASAN global-out-of-bounds warning

fix: perf/x86/intel: Rename model-specific pebs_latency_data functions

commit 7da9960
Author: Kan Liang <kan.liang@linux.intel.com>
Date: Thu Apr 24 06:47:18 2025 -0700

perf/x86/intel/ds: Fix counter backwards of non-precise events counters-snapshotting


commit 43796f3
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Date: Wed Aug 20 10:30:27 2025 +0800

perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error


commit 86aa94c
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Date: Thu May 29 08:02:36 2025 +0000

perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()


done

@deepin-ci-robot

deepin pr auto review

Let me analyze this git diff. It is a large change, mainly improving the performance monitoring unit (PMU) on the x86 architecture. I will look at it from the perspectives of logic, code quality, performance, and security.

  1. Logic improvements:
  • A bitmask (cntr_mask64) replaces the counter-count representation (num_counters), which is more flexible
  • New PEBS (precise event-based sampling) and ACR (auto counter reload) feature support is added
  • The event-constraint handling is improved, with support for dynamic constraints
  • PMU handling on hybrid architectures is unified
  2. Code quality improvements:
  • More error checks and boundary-condition handling
  • Improved code structure, with related functionality factored into separate functions
  • More explanatory comments
  • Clearer variable naming
  3. Performance improvements:
  • Optimized counter access, using bit operations instead of loops
  • More efficient PEBS buffer handling
  • A counter-snapshotting feature that reduces unnecessary reads
  4. Security improvements:
  • Checks for invalid configurations
  • Improved pairing checks for resource allocation and release
  • Guards for special cases

Specific suggestions:

  1. For the new PEBS and ACR features, add more documentation on their usage and caveats.

  2. On performance-critical paths, such as the event-handling functions, further optimize memory access patterns.

  3. For hybrid-architecture handling, add more unit tests to ensure correctness across the different architectures.

  4. Add more runtime checks, especially when handling user-supplied configuration parameters.

  5. For the new MSR (Model Specific Register) accesses, add more error handling and recovery mechanisms.

  6. In event-constraint handling, add more debug information to ease diagnosis.

Overall these changes are positive: they improve the flexibility and extensibility of the PMU subsystem as well as the maintainability of the code. Thorough testing is recommended before merging, especially across different hardware configurations.

@opsiff opsiff merged commit ab0f9b7 into deepin-community:linux-6.6.y Dec 15, 2025
11 checks passed
@deepin-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opsiff


opsiff pushed a commit to opsiff/UOS-kernel that referenced this pull request Dec 16, 2025
Intel inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/ICZHEB
CVE: NA

--------------------------------

The following upstream commits introduced 2 fields (config1 and
dyn_constraint) in struct hw_perf_event, which breaks kABI.

	ec980e4 ("perf/x86/intel: Support auto counter reload")
	4dfe323 ("perf/x86: Add dynamic constraint")

To fix this kABI breakage, we introduce struct hw_perf_event_ext, and
use one KABI_RESERVE field in struct perf_event as pointer to this
struct hw_perf_event_ext. This is viable because hw_perf_event is
always embedded in struct perf_event, so we can always access
hw_perf_event_ext from perf_event when needed.

We also create a kmem_cache for struct hw_perf_event_ext.

Another kABI change is caused by the following commit:

	0e102ce ("KVM: x86/pmu: Change ambiguous _mask suffix to _rsvd in kvm_pmu")

But the fix is trivial.

Fixes: ec980e4 ("perf/x86/intel: Support auto counter reload")
Fixes: 4dfe323 ("perf/x86: Add dynamic constraint")
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
Link: deepin-community#1356
[Backport: drop arch/x86/include/asm/kvm_host.h for no rename it]
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
lanlanxiyiji pushed a commit that referenced this pull request Dec 16, 2025