Commit eb52707

pilotAlpal authored and herbertx committed
crypto: qat - add support for ring pair level telemetry
Expose through debugfs ring pair telemetry data for QAT GEN4 devices.

This allows to gather metrics about the PCIe channel and device TLB for
a selected ring pair. It is possible to monitor maximum 4 ring pairs at
the time per device.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
1 parent 69e7649 commit eb52707

11 files changed, +449 -5 lines changed

Documentation/ABI/testing/debugfs-driver-qat_telemetry

Lines changed: 125 additions & 0 deletions
@@ -101,3 +101,128 @@ Description: (RO) Reports device telemetry counters.
 		attribute is not reported.

 		This attribute is only available for qat_4xxx devices.
+
+What:		/sys/kernel/debug/qat_<device>_<BDF>/telemetry/rp_<A/B/C/D>_data
+Date:		March 2024
+KernelVersion:	6.8
+Contact:	qat-linux@intel.com
+Description:	(RW) Selects up to 4 Ring Pairs (RP) to monitor, one per file,
+		and report telemetry counters related to each.
+
+		Allowed values to write:
+		========================
+		* 0 to ``<num_rps - 1>``:
+		  Ring pair to be monitored. The value of ``num_rps`` can be
+		  retrieved through ``/sys/bus/pci/devices/<BDF>/qat/num_rps``.
+		  See Documentation/ABI/testing/sysfs-driver-qat.
+
+		Reads report metrics about performance and utilization of
+		the selected RP:
+
+		======================= ========================================
+		Field                   Description
+		======================= ========================================
+		sample_cnt              number of acquisitions of telemetry data
+		                        from the device. Reads are performed
+		                        every 1000 ms
+		rp_num                  RP number associated with slot <A/B/C/D>
+		service_type            service associated to the RP
+		pci_trans_cnt           number of PCIe partial transactions
+		gp_lat_acc_avg          average get to put latency [ns]
+		bw_in                   PCIe, write bandwidth [Mbps]
+		bw_out                  PCIe, read bandwidth [Mbps]
+		at_glob_devtlb_hit      Message descriptor DevTLB hit rate
+		at_glob_devtlb_miss     Message descriptor DevTLB miss rate
+		tl_at_payld_devtlb_hit  Payload DevTLB hit rate
+		tl_at_payld_devtlb_miss Payload DevTLB miss rate
+		======================= ========================================
+
+		Example.
+
+		Writing the value '32' to the file ``rp_C_data`` starts the
+		collection of telemetry metrics for ring pair 32::
+
+		  echo 32 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/rp_C_data
+
+		Once a ring pair is selected, statistics can be read accessing
+		the file::
+
+		  cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/rp_C_data
+
+		If ``control`` is set to 1, only the current values of the
+		counters are displayed::
+
+		  <counter_name> <current>
+
+		If ``control`` is 2, 3 or 4, counters are displayed in the
+		following format::
+
+		  <counter_name> <current> <min> <max> <avg>
+
+
+		On QAT GEN4 devices there are 64 RPs on a PF, so the allowed
+		values are 0..63. This number is absolute to the device.
+		If Virtual Functions (VF) are used, the ring pair number can
+		be derived from the Bus, Device, Function of the VF:
+
+		============ ====== ====== ====== ======
+		PCI BDF/VF   RP0    RP1    RP2    RP3
+		============ ====== ====== ====== ======
+		0000:6b:0.1  RP 0   RP 1   RP 2   RP 3
+		0000:6b:0.2  RP 4   RP 5   RP 6   RP 7
+		0000:6b:0.3  RP 8   RP 9   RP 10  RP 11
+		0000:6b:0.4  RP 12  RP 13  RP 14  RP 15
+		0000:6b:0.5  RP 16  RP 17  RP 18  RP 19
+		0000:6b:0.6  RP 20  RP 21  RP 22  RP 23
+		0000:6b:0.7  RP 24  RP 25  RP 26  RP 27
+		0000:6b:1.0  RP 28  RP 29  RP 30  RP 31
+		0000:6b:1.1  RP 32  RP 33  RP 34  RP 35
+		0000:6b:1.2  RP 36  RP 37  RP 38  RP 39
+		0000:6b:1.3  RP 40  RP 41  RP 42  RP 43
+		0000:6b:1.4  RP 44  RP 45  RP 46  RP 47
+		0000:6b:1.5  RP 48  RP 49  RP 50  RP 51
+		0000:6b:1.6  RP 52  RP 53  RP 54  RP 55
+		0000:6b:1.7  RP 56  RP 57  RP 58  RP 59
+		0000:6b:2.0  RP 60  RP 61  RP 62  RP 63
+		============ ====== ====== ====== ======
+
+		The mapping is only valid for the BDFs of VFs on the host.
+
+
+		The service provided on a ring-pair varies depending on the
+		configuration. The configuration for a given device can be
+		queried and set using ``cfg_services``.
+		See Documentation/ABI/testing/sysfs-driver-qat for details.
+
+		The following table reports how ring pairs are mapped to VFs
+		on the PF 0000:6b:0.0 configured for `sym;asym` or `asym;sym`:
+
+		=========== ============ =========== ============ ===========
+		PCI BDF/VF  RP0/service  RP1/service RP2/service  RP3/service
+		=========== ============ =========== ============ ===========
+		0000:6b:0.1 RP 0 asym    RP 1 sym    RP 2 asym    RP 3 sym
+		0000:6b:0.2 RP 4 asym    RP 5 sym    RP 6 asym    RP 7 sym
+		0000:6b:0.3 RP 8 asym    RP 9 sym    RP10 asym    RP11 sym
+		...         ...          ...         ...          ...
+		=========== ============ =========== ============ ===========
+
+		All VFs follow the same pattern.
+
+
+		The following table reports how ring pairs are mapped to VFs on
+		the PF 0000:6b:0.0 configured for `dc`:
+
+		=========== ============ =========== ============ ===========
+		PCI BDF/VF  RP0/service  RP1/service RP2/service  RP3/service
+		=========== ============ =========== ============ ===========
+		0000:6b:0.1 RP 0 dc      RP 1 dc     RP 2 dc      RP 3 dc
+		0000:6b:0.2 RP 4 dc      RP 5 dc     RP 6 dc      RP 7 dc
+		0000:6b:0.3 RP 8 dc      RP 9 dc     RP10 dc      RP11 dc
+		...         ...          ...         ...          ...
+		=========== ============ =========== ============ ===========
+
+		The mapping of a RP to a service can be retrieved using
+		``rp2srv`` from sysfs.
+		See Documentation/ABI/testing/sysfs-driver-qat for details.
+
+		This attribute is only available for qat_4xxx devices.
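
The VF table in the ABI text above follows a regular pattern: each VF owns four consecutive ring pairs, and VFs are numbered in BDF order starting at function 1 of device 0. The sketch below turns that pattern into arithmetic. It is inferred from the table only, not taken from the driver; `vf_first_rp`, `RPS_PER_VF` and `MAX_RPS` are hypothetical names used for illustration.

```c
#define RPS_PER_VF 4	/* four ring pairs per VF, as in the table */
#define MAX_RPS    64	/* ring pairs on a GEN4 PF, as stated in the text */

/*
 * Return the absolute number of the first ring pair owned by a VF,
 * given the device and function fields of its host BDF.
 * Returns -1 if the pair does not name a VF covered by the table.
 */
int vf_first_rp(unsigned int dev, unsigned int fn)
{
	unsigned int slot;

	if (dev > 31 || fn > 7)		/* outside the PCI dev.fn range */
		return -1;

	slot = dev * 8 + fn;		/* 8 functions per device number */
	if (slot == 0)			/* <bus>:0.0 is the PF itself */
		return -1;

	if ((slot - 1) * RPS_PER_VF >= MAX_RPS)
		return -1;		/* past the last VF in the table */

	return (int)((slot - 1) * RPS_PER_VF);
}
```

For the VF at 0000:6b:1.1 this gives 32, so its four ring pairs are 32 to 35, matching the `echo 32` example in the documentation. As the text notes, the mapping only holds for VF BDFs as seen on the host.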

drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c

Lines changed: 1 addition & 0 deletions
@@ -520,6 +520,7 @@ void adf_init_hw_data_420xx(struct adf_hw_device_data *hw_data, u32 dev_id)
 	hw_data->init_device = adf_gen4_init_device;
 	hw_data->reset_device = adf_reset_flr;
 	hw_data->admin_ae_mask = ADF_420XX_ADMIN_AE_MASK;
+	hw_data->num_rps = ADF_GEN4_MAX_RPS;
 	hw_data->fw_name = ADF_420XX_FW;
 	hw_data->fw_mmp_name = ADF_420XX_MMP;
 	hw_data->uof_get_name = uof_get_name_420xx;

drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c

Lines changed: 1 addition & 0 deletions
@@ -421,6 +421,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data, u32 dev_id)
 	hw_data->init_device = adf_gen4_init_device;
 	hw_data->reset_device = adf_reset_flr;
 	hw_data->admin_ae_mask = ADF_4XXX_ADMIN_AE_MASK;
+	hw_data->num_rps = ADF_GEN4_MAX_RPS;
 	switch (dev_id) {
 	case ADF_402XX_PCI_DEVICE_ID:
 		hw_data->fw_name = ADF_402XX_FW;

drivers/crypto/intel/qat/qat_common/adf_accel_devices.h

Lines changed: 1 addition & 0 deletions
@@ -278,6 +278,7 @@ struct adf_hw_device_data {
 	u8 num_logical_accel;
 	u8 num_engines;
 	u32 num_hb_ctrs;
+	u8 num_rps;
 };

 /* CSR write macro */

drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@
 #define ADF_GEN4_MSIX_RTTABLE_OFFSET(i)	(0x409000 + ((i) * 0x04))

 /* Bank and ring configuration */
+#define ADF_GEN4_MAX_RPS		64
 #define ADF_GEN4_NUM_RINGS_PER_BANK	2
 #define ADF_GEN4_NUM_BANKS_PER_VF	4
 #define ADF_GEN4_ETR_MAX_BANKS		64

drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c

Lines changed: 35 additions & 0 deletions
@@ -9,6 +9,8 @@

 #define ADF_GEN4_TL_DEV_REG_OFF(reg) ADF_TL_DEV_REG_OFF(reg, gen4)

+#define ADF_GEN4_TL_RP_REG_OFF(reg) ADF_TL_RP_REG_OFF(reg, gen4)
+
 #define ADF_GEN4_TL_SL_UTIL_COUNTER(_name) \
 	ADF_TL_COUNTER("util_" #_name, \
 		ADF_TL_SIMPLE_COUNT, \
@@ -101,11 +103,42 @@ static const struct adf_tl_dbg_counter sl_exec_counters[ADF_TL_SL_CNT_COUNT] = {
 	ADF_GEN4_TL_SL_EXEC_COUNTER(ath),
 };

+/* Ring pair counters. */
+static const struct adf_tl_dbg_counter rp_counters[] = {
+	/* PCIe partial transactions. */
+	ADF_TL_COUNTER(PCI_TRANS_CNT_NAME, ADF_TL_SIMPLE_COUNT,
+		       ADF_GEN4_TL_RP_REG_OFF(reg_tl_pci_trans_cnt)),
+	/* Get to put latency average[ns]. */
+	ADF_TL_COUNTER_LATENCY(LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
+			       ADF_GEN4_TL_RP_REG_OFF(reg_tl_gp_lat_acc),
+			       ADF_GEN4_TL_RP_REG_OFF(reg_tl_ae_put_cnt)),
+	/* PCIe write bandwidth[Mbps]. */
+	ADF_TL_COUNTER(BW_IN_NAME, ADF_TL_COUNTER_MBPS,
+		       ADF_GEN4_TL_RP_REG_OFF(reg_tl_bw_in)),
+	/* PCIe read bandwidth[Mbps]. */
+	ADF_TL_COUNTER(BW_OUT_NAME, ADF_TL_COUNTER_MBPS,
+		       ADF_GEN4_TL_RP_REG_OFF(reg_tl_bw_out)),
+	/* Message descriptor DevTLB hit rate. */
+	ADF_TL_COUNTER(AT_GLOB_DTLB_HIT_NAME, ADF_TL_SIMPLE_COUNT,
+		       ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_glob_devtlb_hit)),
+	/* Message descriptor DevTLB miss rate. */
+	ADF_TL_COUNTER(AT_GLOB_DTLB_MISS_NAME, ADF_TL_SIMPLE_COUNT,
+		       ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_glob_devtlb_miss)),
+	/* Payload DevTLB hit rate. */
+	ADF_TL_COUNTER(AT_PAYLD_DTLB_HIT_NAME, ADF_TL_SIMPLE_COUNT,
+		       ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_payld_devtlb_hit)),
+	/* Payload DevTLB miss rate. */
+	ADF_TL_COUNTER(AT_PAYLD_DTLB_MISS_NAME, ADF_TL_SIMPLE_COUNT,
+		       ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_payld_devtlb_miss)),
+};
+
 void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
 {
 	tl_data->layout_sz = ADF_GEN4_TL_LAYOUT_SZ;
 	tl_data->slice_reg_sz = ADF_GEN4_TL_SLICE_REG_SZ;
+	tl_data->rp_reg_sz = ADF_GEN4_TL_RP_REG_SZ;
 	tl_data->num_hbuff = ADF_GEN4_TL_NUM_HIST_BUFFS;
+	tl_data->max_rp = ADF_GEN4_TL_MAX_RP_NUM;
 	tl_data->msg_cnt_off = ADF_GEN4_TL_MSG_CNT_OFF;
 	tl_data->cpp_ns_per_cycle = ADF_GEN4_CPP_NS_PER_CYCLE;
 	tl_data->bw_units_to_bytes = ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES;
@@ -114,5 +147,7 @@ void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
 	tl_data->num_dev_counters = ARRAY_SIZE(dev_counters);
 	tl_data->sl_util_counters = sl_util_counters;
 	tl_data->sl_exec_counters = sl_exec_counters;
+	tl_data->rp_counters = rp_counters;
+	tl_data->num_rp_counters = ARRAY_SIZE(rp_counters);
 }
 EXPORT_SYMBOL_GPL(adf_gen4_init_tl_data);
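
The latency counter above is the only one declared with two register offsets: the latency accumulator and the put count. The code that combines them is not part of this diff, so the following is only a sketch of what such a pairing usually implies, namely an average obtained by dividing the accumulator by the count and then scaling cycles to nanoseconds. The function name `gp_lat_avg_ns`, the assumption that the accumulator is in clock cycles, and the use of a per-cycle factor (a stand-in for `cpp_ns_per_cycle`) are all assumptions.

```c
#include <stdint.h>

/*
 * Average get-to-put latency in nanoseconds (illustrative only).
 * lat_acc:      accumulated latency, assumed to be in device clock cycles
 * put_cnt:      number of put operations counted over the same interval
 * ns_per_cycle: assumed scale factor, a stand-in for cpp_ns_per_cycle
 */
uint64_t gp_lat_avg_ns(uint64_t lat_acc, uint32_t put_cnt, uint8_t ns_per_cycle)
{
	if (put_cnt == 0)
		return 0;	/* no puts in the interval: avoid dividing by zero */

	return (lat_acc / put_cnt) * ns_per_cycle;
}
```

The zero check matters in practice: an idle ring pair reports no puts, and the average is then undefined rather than infinite.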

drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h

Lines changed: 39 additions & 2 deletions
@@ -21,6 +21,9 @@ struct adf_tl_hw_data;
 /* Max number of HW resources of one type. */
 #define ADF_GEN4_TL_MAX_SLICES_PER_TYPE 24

+/* Max number of simultaneously monitored ring pairs. */
+#define ADF_GEN4_TL_MAX_RP_NUM 4
+
 /**
  * struct adf_gen4_tl_slice_data_regs - HW slice data as populated by FW.
  * @reg_tm_slice_exec_cnt: Slice execution count.
@@ -92,18 +95,52 @@ struct adf_gen4_tl_device_data_regs {
 	struct adf_gen4_tl_slice_data_regs wcp_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
 };

+/**
+ * struct adf_gen4_tl_ring_pair_data_regs - This structure stores Ring Pair
+ * telemetry counter values as are being populated periodically by device.
+ * @reg_tl_gp_lat_acc: get-put latency accumulator
+ * @reserved: reserved
+ * @reg_tl_pci_trans_cnt: PCIe partial transactions
+ * @reg_tl_ae_put_cnt: Accelerator Engine put counts across all rings
+ * @reg_tl_bw_in: PCIe write bandwidth
+ * @reg_tl_bw_out: PCIe read bandwidth
+ * @reg_tl_at_glob_devtlb_hit: Message descriptor DevTLB hit rate
+ * @reg_tl_at_glob_devtlb_miss: Message descriptor DevTLB miss rate
+ * @reg_tl_at_payld_devtlb_hit: Payload DevTLB hit rate
+ * @reg_tl_at_payld_devtlb_miss: Payload DevTLB miss rate
+ * @reg_tl_re_cnt: ring empty time samples count
+ * @reserved1: reserved
+ */
+struct adf_gen4_tl_ring_pair_data_regs {
+	__u64 reg_tl_gp_lat_acc;
+	__u64 reserved;
+	__u32 reg_tl_pci_trans_cnt;
+	__u32 reg_tl_ae_put_cnt;
+	__u32 reg_tl_bw_in;
+	__u32 reg_tl_bw_out;
+	__u32 reg_tl_at_glob_devtlb_hit;
+	__u32 reg_tl_at_glob_devtlb_miss;
+	__u32 reg_tl_at_payld_devtlb_hit;
+	__u32 reg_tl_at_payld_devtlb_miss;
+	__u32 reg_tl_re_cnt;
+	__u32 reserved1;
+};
+
+#define ADF_GEN4_TL_RP_REG_SZ sizeof(struct adf_gen4_tl_ring_pair_data_regs)
+
 /**
  * struct adf_gen4_tl_layout - This structure represents entire telemetry
  * counters data: Device + 4 Ring Pairs as are being populated periodically
  * by device.
  * @tl_device_data_regs: structure of device telemetry registers
- * @reserved1: reserved
+ * @tl_ring_pairs_data_regs: array of ring pairs telemetry registers
  * @reg_tl_msg_cnt: telemetry messages counter
  * @reserved: reserved
  */
 struct adf_gen4_tl_layout {
 	struct adf_gen4_tl_device_data_regs tl_device_data_regs;
-	__u32 reserved1[14];
+	struct adf_gen4_tl_ring_pair_data_regs
+			tl_ring_pairs_data_regs[ADF_GEN4_TL_MAX_RP_NUM];
 	__u32 reg_tl_msg_cnt;
 	__u32 reserved;
 };
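
The per-ring-pair register block above has a fixed size, which is what `ADF_GEN4_TL_RP_REG_SZ` captures and what lets a slot be located by simple multiplication. The sketch below mirrors the struct in userspace types to make the arithmetic checkable: two 64-bit fields plus ten 32-bit fields give 56 bytes per record. `rp_regs_mirror`, `MAX_RP_SLOTS` and `rp_slot_offset` are illustrative names, not driver symbols, and the offset returned is relative to the start of the ring pair array, not to the whole layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct adf_gen4_tl_ring_pair_data_regs (illustrative). */
struct rp_regs_mirror {
	uint64_t reg_tl_gp_lat_acc;
	uint64_t reserved;
	uint32_t reg_tl_pci_trans_cnt;
	uint32_t reg_tl_ae_put_cnt;
	uint32_t reg_tl_bw_in;
	uint32_t reg_tl_bw_out;
	uint32_t reg_tl_at_glob_devtlb_hit;
	uint32_t reg_tl_at_glob_devtlb_miss;
	uint32_t reg_tl_at_payld_devtlb_hit;
	uint32_t reg_tl_at_payld_devtlb_miss;
	uint32_t reg_tl_re_cnt;
	uint32_t reserved1;
};

#define MAX_RP_SLOTS 4	/* mirrors ADF_GEN4_TL_MAX_RP_NUM */

/* 2 * 8 bytes + 10 * 4 bytes, with no padding needed. */
_Static_assert(sizeof(struct rp_regs_mirror) == 56,
	       "ring pair record is expected to be 56 bytes");

/*
 * Byte offset of a slot (0 for A up to 3 for D) within the ring pair
 * array, or (size_t)-1 if the slot index is out of range.
 */
size_t rp_slot_offset(unsigned int slot)
{
	if (slot >= MAX_RP_SLOTS)
		return (size_t)-1;

	return slot * sizeof(struct rp_regs_mirror);
}
```

A compile-time size check like this is a cheap way to catch accidental padding when a struct has to match a layout written by firmware.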

drivers/crypto/intel/qat/qat_common/adf_telemetry.c

Lines changed: 20 additions & 3 deletions
@@ -33,7 +33,9 @@ static int validate_tl_data(struct adf_tl_hw_data *tl_data)
 	if (!tl_data->dev_counters ||
 	    TL_IS_ZERO(tl_data->num_dev_counters) ||
 	    !tl_data->sl_util_counters ||
-	    !tl_data->sl_exec_counters)
+	    !tl_data->sl_exec_counters ||
+	    !tl_data->rp_counters ||
+	    TL_IS_ZERO(tl_data->num_rp_counters))
 		return -EOPNOTSUPP;

 	return 0;
@@ -53,11 +55,17 @@ static int adf_tl_alloc_mem(struct adf_accel_dev *accel_dev)
 	if (!telemetry)
 		return -ENOMEM;

+	telemetry->rp_num_indexes = kmalloc_array(tl_data->max_rp,
+						  sizeof(*telemetry->rp_num_indexes),
+						  GFP_KERNEL);
+	if (!telemetry->rp_num_indexes)
+		goto err_free_tl;
+
 	telemetry->regs_hist_buff = kmalloc_array(tl_data->num_hbuff,
 						  sizeof(*telemetry->regs_hist_buff),
 						  GFP_KERNEL);
 	if (!telemetry->regs_hist_buff)
-		goto err_free_tl;
+		goto err_free_rp_indexes;

 	telemetry->regs_data = dma_alloc_coherent(dev, regs_sz,
 						  &telemetry->regs_data_p,
@@ -86,6 +94,8 @@ static int adf_tl_alloc_mem(struct adf_accel_dev *accel_dev)

 err_free_regs_hist_buff:
 	kfree(telemetry->regs_hist_buff);
+err_free_rp_indexes:
+	kfree(telemetry->rp_num_indexes);
 err_free_tl:
 	kfree(telemetry);

@@ -107,6 +117,7 @@ static void adf_tl_free_mem(struct adf_accel_dev *accel_dev)
 			  telemetry->regs_data_p);

 	kfree(telemetry->regs_hist_buff);
+	kfree(telemetry->rp_num_indexes);
 	kfree(telemetry);
 	accel_dev->telemetry = NULL;
 }
@@ -196,7 +207,8 @@ int adf_tl_run(struct adf_accel_dev *accel_dev, int state)
 	int ret;

 	ret = adf_send_admin_tl_start(accel_dev, telemetry->regs_data_p,
-				      layout_sz, NULL, &telemetry->slice_cnt);
+				      layout_sz, telemetry->rp_num_indexes,
+				      &telemetry->slice_cnt);
 	if (ret) {
 		dev_err(dev, "failed to start telemetry\n");
 		return ret;
@@ -213,8 +225,10 @@ int adf_tl_run(struct adf_accel_dev *accel_dev, int state)
 int adf_tl_init(struct adf_accel_dev *accel_dev)
 {
 	struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+	u8 max_rp = GET_TL_DATA(accel_dev).max_rp;
 	struct device *dev = &GET_DEV(accel_dev);
 	struct adf_telemetry *telemetry;
+	unsigned int i;
 	int ret;

 	ret = validate_tl_data(tl_data);
@@ -234,6 +248,9 @@ int adf_tl_init(struct adf_accel_dev *accel_dev)
 	mutex_init(&telemetry->regs_hist_lock);
 	INIT_DELAYED_WORK(&telemetry->work_ctx, tl_work_handler);

+	for (i = 0; i < max_rp; i++)
+		telemetry->rp_num_indexes[i] = ADF_TL_RP_REGS_DISABLED;
+
 	return 0;
 }
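
The allocation changes above follow the usual kernel unwind pattern: each new allocation gets its own error label, and labels are ordered so that a failure frees only what was already allocated. The userspace sketch below reproduces that shape together with the initialisation of every slot to the "disabled" marker. It is an analogy, not driver code: `tl_state`, `tl_state_alloc` and `tl_state_free` are hypothetical names, and `calloc`/`free` stand in for `kmalloc_array`/`kfree`.

```c
#include <stdint.h>
#include <stdlib.h>

#define RP_SLOT_DISABLED 0xff	/* mirrors ADF_TL_RP_REGS_DISABLED */

/* Reduced stand-in for struct adf_telemetry (illustrative). */
struct tl_state {
	uint8_t *rp_num_indexes;	/* one entry per monitoring slot */
	void **regs_hist_buff;		/* history buffer pointers */
};

struct tl_state *tl_state_alloc(size_t max_rp, size_t num_hbuff)
{
	struct tl_state *t;
	size_t i;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;

	t->rp_num_indexes = calloc(max_rp, sizeof(*t->rp_num_indexes));
	if (!t->rp_num_indexes)
		goto err_free_t;

	t->regs_hist_buff = calloc(num_hbuff, sizeof(*t->regs_hist_buff));
	if (!t->regs_hist_buff)
		goto err_free_rp_indexes;

	/* No ring pair is selected until userspace writes to a slot file. */
	for (i = 0; i < max_rp; i++)
		t->rp_num_indexes[i] = RP_SLOT_DISABLED;

	return t;

	/* Unwind in reverse order of allocation. */
err_free_rp_indexes:
	free(t->rp_num_indexes);
err_free_t:
	free(t);
	return NULL;
}

void tl_state_free(struct tl_state *t)
{
	if (!t)
		return;
	free(t->regs_hist_buff);
	free(t->rp_num_indexes);
	free(t);
}
```

Note that the patch inserts the new label between the two existing ones, which is why the failure path of `regs_hist_buff` had to be retargeted from `err_free_tl` to `err_free_rp_indexes`.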

drivers/crypto/intel/qat/qat_common/adf_telemetry.h

Lines changed: 7 additions & 0 deletions
@@ -23,17 +23,23 @@ struct dentry;
 /* Interval within timer interrupt should be handled. Value in milliseconds. */
 #define ADF_TL_TIMER_INT_MS (ADF_TL_DATA_WR_INTERVAL_MS / 2)

+#define ADF_TL_RP_REGS_DISABLED (0xff)
+
 struct adf_tl_hw_data {
 	size_t layout_sz;
 	size_t slice_reg_sz;
+	size_t rp_reg_sz;
 	size_t msg_cnt_off;
 	const struct adf_tl_dbg_counter *dev_counters;
 	const struct adf_tl_dbg_counter *sl_util_counters;
 	const struct adf_tl_dbg_counter *sl_exec_counters;
+	const struct adf_tl_dbg_counter *rp_counters;
 	u8 num_hbuff;
 	u8 cpp_ns_per_cycle;
 	u8 bw_units_to_bytes;
 	u8 num_dev_counters;
+	u8 num_rp_counters;
+	u8 max_rp;
 };

 struct adf_telemetry {
@@ -50,6 +56,7 @@ struct adf_telemetry {
 	 */
 	void **regs_hist_buff;
 	struct dentry *dbg_dir;
+	u8 *rp_num_indexes;
 	/**
 	 * @regs_hist_lock: protects from race conditions between write and read
 	 * to the copies referenced by @regs_hist_buff
