Commit 69e7649

pilotAlpal authored and herbertx committed
crypto: qat - add support for device telemetry
Expose through debugfs device telemetry data for QAT GEN4 devices.

This allows gathering metrics about the performance and the utilization of a device. In particular, statistics on (1) the utilization of the PCIe channel, (2) address translation, when SVA is enabled, and (3) the internal engines for crypto and data compression.

If telemetry is supported by the firmware, the driver allocates a DMA region and a circular buffer. When telemetry is enabled, through the `control` attribute in debugfs, the driver sends the `TL_START` command to the firmware via the admin interface. This triggers the device to periodically gather telemetry data from hardware registers and write it into the DMA memory region. The device writes into the shared region every second.

The driver, every 500ms, snapshots the DMA shared region into the circular buffer. This is then used to compute basic metrics (min/max/average) on each counter, every time the `device_data` attribute is queried.

Telemetry counters are exposed through debugfs in the folder /sys/kernel/debug/qat_<device>_<BDF>/telemetry.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
1 parent 7f06679 commit 69e7649
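
The commit message describes snapshots being stored in a circular buffer and reduced to min/max/average when `device_data` is read. The following is a minimal user-space sketch of that idea for a single counter, not the driver's code: `struct tl_hist`, `tl_hist_push()` and `tl_hist_stats()` are hypothetical names, and the buffer capacity is assumed to match the value written to `control` (at most 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_MAX 4 /* assumed upper bound, matching the largest `control` value */

/* Hypothetical model of one counter's history buffer. */
struct tl_hist {
	uint64_t samples[HIST_MAX];
	size_t size;  /* capacity chosen at enable time (1..HIST_MAX) */
	size_t count; /* number of valid samples, at most `size` */
	size_t head;  /* slot the next sample is written to */
};

struct tl_stats {
	uint64_t cur, min, max, avg;
};

static void tl_hist_init(struct tl_hist *h, size_t size)
{
	assert(size >= 1 && size <= HIST_MAX);
	h->size = size;
	h->count = 0;
	h->head = 0;
}

/* Store a snapshot, overwriting the oldest one once the buffer is full. */
static void tl_hist_push(struct tl_hist *h, uint64_t value)
{
	h->samples[h->head] = value;
	h->head = (h->head + 1) % h->size;
	if (h->count < h->size)
		h->count++;
}

/* Compute current/min/max/average over the samples stored so far. */
static struct tl_stats tl_hist_stats(const struct tl_hist *h)
{
	struct tl_stats s = { 0, UINT64_MAX, 0, 0 };
	uint64_t sum = 0;
	size_t i;

	assert(h->count > 0);
	for (i = 0; i < h->count; i++) {
		uint64_t v = h->samples[i];

		if (v < s.min)
			s.min = v;
		if (v > s.max)
			s.max = v;
		sum += v;
	}
	/* The newest sample sits in the slot just before `head`. */
	s.cur = h->samples[(h->head + h->size - 1) % h->size];
	s.avg = sum / h->count;
	return s;
}
```

With a capacity of 3, pushing 10, 40, 20 and then 60 overwrites the oldest sample (10), so the statistics are computed over 40, 20 and 60 only.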

13 files changed, 1339 additions and 0 deletions
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+What:		/sys/kernel/debug/qat_<device>_<BDF>/telemetry/control
+Date:		March 2024
+KernelVersion:	6.8
+Contact:	qat-linux@intel.com
+Description:	(RW) Enables/disables the reporting of telemetry metrics.
+
+		Allowed values to write:
+		========================
+		* 0: disable telemetry
+		* 1: enable telemetry
+		* 2, 3, 4: enable telemetry and calculate minimum, maximum
+		  and average for each counter over 2, 3 or 4 samples
+
+		Returned values:
+		================
+		* 1-4: telemetry is enabled and running
+		* 0: telemetry is disabled
+
+		Example.
+
+		Writing '3' to this file starts the collection of
+		telemetry metrics. Samples are collected every second and
+		stored in a circular buffer of size 3. These values are then
+		used to calculate the minimum, maximum and average for each
+		counter. After enabling, counters can be retrieved through
+		the ``device_data`` file::
+
+		  echo 3 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
+
+		Writing '0' to this file stops the collection of telemetry
+		metrics::
+
+		  echo 0 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
+
+		This attribute is only available for qat_4xxx devices.
+
+What:		/sys/kernel/debug/qat_<device>_<BDF>/telemetry/device_data
+Date:		March 2024
+KernelVersion:	6.8
+Contact:	qat-linux@intel.com
+Description:	(RO) Reports device telemetry counters.
+		Reads report metrics about performance and utilization of
+		a QAT device:
+
+		======================= ========================================
+		Field                   Description
+		======================= ========================================
+		sample_cnt              number of acquisitions of telemetry data
+		                        from the device. Reads are performed
+		                        every 1000 ms.
+		pci_trans_cnt           number of PCIe partial transactions
+		max_rd_lat              maximum logged read latency [ns] (could
+		                        be any read operation)
+		rd_lat_acc_avg          average read latency [ns]
+		max_gp_lat              max get to put latency [ns] (only takes
+		                        samples for AE0)
+		gp_lat_acc_avg          average get to put latency [ns]
+		bw_in                   PCIe, write bandwidth [Mbps]
+		bw_out                  PCIe, read bandwidth [Mbps]
+		at_page_req_lat_avg     Address Translator(AT), average page
+		                        request latency [ns]
+		at_trans_lat_avg        AT, average page translation latency [ns]
+		at_max_tlb_used         AT, maximum uTLB used
+		util_cpr<N>             utilization of Compression slice N [%]
+		exec_cpr<N>             execution count of Compression slice N
+		util_xlt<N>             utilization of Translator slice N [%]
+		exec_xlt<N>             execution count of Translator slice N
+		util_dcpr<N>            utilization of Decompression slice N [%]
+		exec_dcpr<N>            execution count of Decompression slice N
+		util_pke<N>             utilization of PKE N [%]
+		exec_pke<N>             execution count of PKE N
+		util_ucs<N>             utilization of UCS slice N [%]
+		exec_ucs<N>             execution count of UCS slice N
+		util_wat<N>             utilization of Wireless Authentication
+		                        slice N [%]
+		exec_wat<N>             execution count of Wireless Authentication
+		                        slice N
+		util_wcp<N>             utilization of Wireless Cipher slice N [%]
+		exec_wcp<N>             execution count of Wireless Cipher slice N
+		util_cph<N>             utilization of Cipher slice N [%]
+		exec_cph<N>             execution count of Cipher slice N
+		util_ath<N>             utilization of Authentication slice N [%]
+		exec_ath<N>             execution count of Authentication slice N
+		======================= ========================================
+
+		The telemetry report file can be read with the following command::
+
+		  cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/device_data
+
+		If ``control`` is set to 1, only the current values of the
+		counters are displayed::
+
+		  <counter_name> <current>
+
+		If ``control`` is 2, 3 or 4, counters are displayed in the
+		following format::
+
+		  <counter_name> <current> <min> <max> <avg>
+
+		If a device lacks a specific accelerator, the corresponding
+		attribute is not reported.
+
+		This attribute is only available for qat_4xxx devices.
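
The ABI text above documents two line layouts for `device_data`: `<counter_name> <current>` when `control` is 1, and `<counter_name> <current> <min> <max> <avg>` when it is 2, 3 or 4. A user-space tool could accept both layouts roughly as sketched below. This is an illustration only: `struct tl_line` and `tl_parse_line()` are hypothetical names, and the sketch assumes every value is printed as an unsigned decimal integer and that counter names fit in 31 characters, neither of which is stated in the ABI description.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed line of `device_data`. */
struct tl_line {
	char name[32];
	uint64_t cur, min, max, avg;
	int has_stats; /* 1 when the min/max/avg columns were present */
};

/*
 * Parse one line in either documented layout.
 * Returns 0 on success, -1 if the line matches neither layout.
 * Assumption: all values are unsigned decimal integers.
 */
static int tl_parse_line(const char *line, struct tl_line *out)
{
	int n;

	memset(out, 0, sizeof(*out));
	n = sscanf(line, "%31s %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64,
		   out->name, &out->cur, &out->min, &out->max, &out->avg);
	if (n == 5) {
		out->has_stats = 1;
		return 0;
	}
	if (n == 2)
		return 0;
	return -1;
}
```

Checking the `sscanf()` return value against exactly 2 or 5 keeps the two layouts distinct and rejects truncated lines instead of silently reporting zeros.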

drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@
 #include <adf_gen4_pm.h>
 #include <adf_gen4_ras.h>
 #include <adf_gen4_timer.h>
+#include <adf_gen4_tl.h>
 #include "adf_420xx_hw_data.h"
 #include "icp_qat_hw.h"

@@ -543,6 +544,7 @@ void adf_init_hw_data_420xx(struct adf_hw_device_data *hw_data, u32 dev_id)
 	adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
 	adf_gen4_init_dc_ops(&hw_data->dc_ops);
 	adf_gen4_init_ras_ops(&hw_data->ras_ops);
+	adf_gen4_init_tl_data(&hw_data->tl_data);
 	adf_init_rl_data(&hw_data->rl_data);
 }

drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@
 #include <adf_gen4_pm.h>
 #include "adf_gen4_ras.h"
 #include <adf_gen4_timer.h>
+#include <adf_gen4_tl.h>
 #include "adf_4xxx_hw_data.h"
 #include "icp_qat_hw.h"

@@ -453,6 +454,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data, u32 dev_id)
 	adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
 	adf_gen4_init_dc_ops(&hw_data->dc_ops);
 	adf_gen4_init_ras_ops(&hw_data->ras_ops);
+	adf_gen4_init_tl_data(&hw_data->tl_data);
 	adf_init_rl_data(&hw_data->rl_data);
 }

drivers/crypto/intel/qat/qat_common/Makefile

Lines changed: 3 additions & 0 deletions
@@ -41,9 +41,12 @@ intel_qat-$(CONFIG_DEBUG_FS) += adf_transport_debug.o \
 				adf_fw_counters.o \
 				adf_cnv_dbgfs.o \
 				adf_gen4_pm_debugfs.o \
+				adf_gen4_tl.o \
 				adf_heartbeat.o \
 				adf_heartbeat_dbgfs.o \
 				adf_pm_dbgfs.o \
+				adf_telemetry.o \
+				adf_tl_debugfs.o \
 				adf_dbgfs.o

 intel_qat-$(CONFIG_PCI_IOV) += adf_sriov.o adf_vf_isr.o adf_pfvf_utils.o \

drivers/crypto/intel/qat/qat_common/adf_accel_devices.h

Lines changed: 4 additions & 0 deletions
@@ -11,6 +11,7 @@
 #include <linux/types.h>
 #include "adf_cfg_common.h"
 #include "adf_rl.h"
+#include "adf_telemetry.h"
 #include "adf_pfvf_msg.h"

 #define ADF_DH895XCC_DEVICE_NAME "dh895xcc"
@@ -254,6 +255,7 @@ struct adf_hw_device_data {
 	struct adf_ras_ops ras_ops;
 	struct adf_dev_err_mask dev_err_mask;
 	struct adf_rl_hw_data rl_data;
+	struct adf_tl_hw_data tl_data;
 	const char *fw_name;
 	const char *fw_mmp_name;
 	u32 fuses;
@@ -308,6 +310,7 @@ struct adf_hw_device_data {
 #define GET_CSR_OPS(accel_dev) (&(accel_dev)->hw_device->csr_ops)
 #define GET_PFVF_OPS(accel_dev) (&(accel_dev)->hw_device->pfvf_ops)
 #define GET_DC_OPS(accel_dev) (&(accel_dev)->hw_device->dc_ops)
+#define GET_TL_DATA(accel_dev) GET_HW_DATA(accel_dev)->tl_data
 #define accel_to_pci_dev(accel_ptr) accel_ptr->accel_pci_dev.pci_dev

 struct adf_admin_comms;
@@ -356,6 +359,7 @@ struct adf_accel_dev {
 	struct adf_cfg_device_data *cfg;
 	struct adf_fw_loader_data *fw_loader;
 	struct adf_admin_comms *admin;
+	struct adf_telemetry *telemetry;
 	struct adf_dc_data *dc_data;
 	struct adf_pm power_management;
 	struct list_head crypto_list;

drivers/crypto/intel/qat/qat_common/adf_dbgfs.c

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,7 @@
 #include "adf_fw_counters.h"
 #include "adf_heartbeat_dbgfs.h"
 #include "adf_pm_dbgfs.h"
+#include "adf_tl_debugfs.h"

 /**
  * adf_dbgfs_init() - add persistent debugfs entries
@@ -66,6 +67,7 @@ void adf_dbgfs_add(struct adf_accel_dev *accel_dev)
 		adf_heartbeat_dbgfs_add(accel_dev);
 		adf_pm_dbgfs_add(accel_dev);
 		adf_cnv_dbgfs_add(accel_dev);
+		adf_tl_dbgfs_add(accel_dev);
 	}
 }

@@ -79,6 +81,7 @@ void adf_dbgfs_rm(struct adf_accel_dev *accel_dev)
 		return;

 	if (!accel_dev->is_vf) {
+		adf_tl_dbgfs_rm(accel_dev);
 		adf_cnv_dbgfs_rm(accel_dev);
 		adf_pm_dbgfs_rm(accel_dev);
 		adf_heartbeat_dbgfs_rm(accel_dev);

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023 Intel Corporation. */
+#include <linux/export.h>
+#include <linux/kernel.h>
+
+#include "adf_gen4_tl.h"
+#include "adf_telemetry.h"
+#include "adf_tl_debugfs.h"
+
+#define ADF_GEN4_TL_DEV_REG_OFF(reg) ADF_TL_DEV_REG_OFF(reg, gen4)
+
+#define ADF_GEN4_TL_SL_UTIL_COUNTER(_name) \
+	ADF_TL_COUNTER("util_" #_name, \
+		       ADF_TL_SIMPLE_COUNT, \
+		       ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_util, gen4))
+
+#define ADF_GEN4_TL_SL_EXEC_COUNTER(_name) \
+	ADF_TL_COUNTER("exec_" #_name, \
+		       ADF_TL_SIMPLE_COUNT, \
+		       ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_exec_cnt, gen4))
+
+/* Device level counters. */
+static const struct adf_tl_dbg_counter dev_counters[] = {
+	/* PCIe partial transactions. */
+	ADF_TL_COUNTER(PCI_TRANS_CNT_NAME, ADF_TL_SIMPLE_COUNT,
+		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_pci_trans_cnt)),
+	/* Max read latency[ns]. */
+	ADF_TL_COUNTER(MAX_RD_LAT_NAME, ADF_TL_COUNTER_NS,
+		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_max)),
+	/* Read latency average[ns]. */
+	ADF_TL_COUNTER_LATENCY(RD_LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_acc),
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_cmpl_cnt)),
+	/* Max get to put latency[ns]. */
+	ADF_TL_COUNTER(MAX_LAT_NAME, ADF_TL_COUNTER_NS,
+		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_max)),
+	/* Get to put latency average[ns]. */
+	ADF_TL_COUNTER_LATENCY(LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_acc),
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_ae_put_cnt)),
+	/* PCIe write bandwidth[Mbps]. */
+	ADF_TL_COUNTER(BW_IN_NAME, ADF_TL_COUNTER_MBPS,
+		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_in)),
+	/* PCIe read bandwidth[Mbps]. */
+	ADF_TL_COUNTER(BW_OUT_NAME, ADF_TL_COUNTER_MBPS,
+		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_out)),
+	/* Page request latency average[ns]. */
+	ADF_TL_COUNTER_LATENCY(PAGE_REQ_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_lat_acc),
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_cnt)),
+	/* Page translation latency average[ns]. */
+	ADF_TL_COUNTER_LATENCY(AT_TRANS_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_acc),
+			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_cnt)),
+	/* Maximum uTLB used. */
+	ADF_TL_COUNTER(AT_MAX_UTLB_USED_NAME, ADF_TL_SIMPLE_COUNT,
+		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_max_tlb_used)),
+};
+
+/* Slice utilization counters. */
+static const struct adf_tl_dbg_counter sl_util_counters[ADF_TL_SL_CNT_COUNT] = {
+	/* Compression slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(cpr),
+	/* Translator slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(xlt),
+	/* Decompression slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(dcpr),
+	/* PKE utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(pke),
+	/* Wireless Authentication slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(wat),
+	/* Wireless Cipher slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(wcp),
+	/* UCS slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(ucs),
+	/* Cipher slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(cph),
+	/* Authentication slice utilization. */
+	ADF_GEN4_TL_SL_UTIL_COUNTER(ath),
+};
+
+/* Slice execution counters. */
+static const struct adf_tl_dbg_counter sl_exec_counters[ADF_TL_SL_CNT_COUNT] = {
+	/* Compression slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(cpr),
+	/* Translator slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(xlt),
+	/* Decompression slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(dcpr),
+	/* PKE execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(pke),
+	/* Wireless Authentication slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(wat),
+	/* Wireless Cipher slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(wcp),
+	/* UCS slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(ucs),
+	/* Cipher slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(cph),
+	/* Authentication slice execution count. */
+	ADF_GEN4_TL_SL_EXEC_COUNTER(ath),
+};
+
+void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
+{
+	tl_data->layout_sz = ADF_GEN4_TL_LAYOUT_SZ;
+	tl_data->slice_reg_sz = ADF_GEN4_TL_SLICE_REG_SZ;
+	tl_data->num_hbuff = ADF_GEN4_TL_NUM_HIST_BUFFS;
+	tl_data->msg_cnt_off = ADF_GEN4_TL_MSG_CNT_OFF;
+	tl_data->cpp_ns_per_cycle = ADF_GEN4_CPP_NS_PER_CYCLE;
+	tl_data->bw_units_to_bytes = ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES;
+
+	tl_data->dev_counters = dev_counters;
+	tl_data->num_dev_counters = ARRAY_SIZE(dev_counters);
+	tl_data->sl_util_counters = sl_util_counters;
+	tl_data->sl_exec_counters = sl_exec_counters;
+}
+EXPORT_SYMBOL_GPL(adf_gen4_init_tl_data);
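
The counter tables in the file above pair a name with a type and one or two register offsets, and the latency entries name both an accumulator and a count register. The definitions of `ADF_TL_COUNTER` and related macros are not part of this excerpt, so the following is only a user-space sketch of the general pattern under stated assumptions: every `demo_*`/`DEMO_*` name and the register layout are hypothetical, and the averaging rule (accumulated cycles times nanoseconds per cycle, divided by the sample count) is an assumption about how an "average latency [ns]" counter could be derived, not the driver's confirmed formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register layout; field names and order are illustrative. */
struct demo_layout {
	uint32_t lat_acc;  /* accumulated latency, in cycles */
	uint32_t lat_cnt;  /* number of accumulated samples */
	uint32_t util_cpr; /* utilization of one slice */
};

enum demo_type { DEMO_SIMPLE_COUNT, DEMO_NS_AVG };

struct demo_counter {
	const char *name;
	enum demo_type type;
	size_t off1; /* offset of the value (or accumulator) */
	size_t off2; /* offset of the divisor, used by DEMO_NS_AVG only */
};

#define DEMO_REG_OFF(_field) offsetof(struct demo_layout, _field)
#define DEMO_COUNTER(_name, _type, _off) \
	{ .name = _name, .type = _type, .off1 = (_off) }
#define DEMO_COUNTER_LATENCY(_name, _type, _acc, _cnt) \
	{ .name = _name, .type = _type, .off1 = (_acc), .off2 = (_cnt) }
/* "util_" #_name concatenates string literals, as the GEN4 macros do. */
#define DEMO_UTIL_COUNTER(_name) \
	DEMO_COUNTER("util_" #_name, DEMO_SIMPLE_COUNT, \
		     DEMO_REG_OFF(util_##_name))

static const struct demo_counter demo_counters[] = {
	DEMO_COUNTER_LATENCY("rd_lat_acc_avg", DEMO_NS_AVG,
			     DEMO_REG_OFF(lat_acc), DEMO_REG_OFF(lat_cnt)),
	DEMO_UTIL_COUNTER(cpr),
};

/* Read a 32-bit register from a snapshot at the given byte offset. */
static uint32_t demo_reg(const void *snapshot, size_t off)
{
	uint32_t v;

	memcpy(&v, (const unsigned char *)snapshot + off, sizeof(v));
	return v;
}

/* Turn a raw snapshot into the value reported for one counter. */
static uint64_t demo_value(const struct demo_counter *c, const void *snapshot,
			   uint64_t ns_per_cycle)
{
	uint64_t num = demo_reg(snapshot, c->off1);
	uint64_t den;

	if (c->type != DEMO_NS_AVG)
		return num;
	den = demo_reg(snapshot, c->off2);
	if (!den)
		return 0; /* no samples: report 0 rather than divide by zero */
	return num * ns_per_cycle / den;
}
```

Describing counters as data (name, type, offsets) means one generic routine can format any of them, and supporting another device generation would mostly be a matter of supplying a different table and layout.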
