Commit 753c860

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-30

We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).

The main changes are:

1) Add initial TX metadata implementation for AF_XDP with support in mlx5
   and stmmac drivers. Two types of offloads are supported right now, that
   is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
   stmmac implementation from Song Yoong Siang.

2) Change BPF verifier logic to validate global subprograms lazily instead
   of unconditionally before the main program, so they can be guarded using
   BPF CO-RE techniques, from Andrii Nakryiko.

3) Add BPF link_info support for uprobe multi link along with bpftool
   integration for the latter, from Jiri Olsa.

4) Use pkg-config in BPF selftests to determine ld flags which is
   in particular needed for linking statically, from Akihiko Odaki.

5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
  bpf/tests: Remove duplicate JSGT tests
  selftests/bpf: Add TX side to xdp_hw_metadata
  selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
  selftests/bpf: Add TX side to xdp_metadata
  selftests/bpf: Add csum helpers
  selftests/xsk: Support tx_metadata_len
  xsk: Add option to calculate TX checksum in SW
  xsk: Validate xsk_tx_metadata flags
  xsk: Document tx_metadata_len layout
  net: stmmac: Add Tx HWTS support to XDP ZC
  net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
  tools: ynl: Print xsk-features from the sample
  xsk: Add TX timestamp and TX checksum offload support
  xsk: Support tx_metadata_len
  selftests/bpf: Use pkg-config for libelf
  selftests/bpf: Override PKG_CONFIG for static builds
  selftests/bpf: Choose pkg-config for the target
  bpftool: Add support to display uprobe_multi links
  selftests/bpf: Add link_info test for uprobe_multi link
  selftests/bpf: Use bpf_link__destroy in fill_link_info tests
  ...
====================

Conflicts:

Documentation/netlink/specs/netdev.yaml:
  839ff60 ("net: page_pool: add nlspec for basic access to page pools")
  48eb03d ("xsk: Add TX timestamp and TX checksum offload support")

https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/

While at it also regen, tree is dirty after:
  48eb03d ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.

Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 parents 975f2d7 + f690ff9 commit 753c860

58 files changed, +1590 -158 lines changed

Documentation/netlink/specs/netdev.yaml

Lines changed: 18 additions & 1 deletion
@@ -45,7 +45,6 @@ definitions:
   -
     type: flags
     name: xdp-rx-metadata
-    render-max: true
     entries:
       -
         name: timestamp
@@ -55,6 +54,18 @@ definitions:
         name: hash
         doc:
           Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash().
+  -
+    type: flags
+    name: xsk-flags
+    entries:
+      -
+        name: tx-timestamp
+        doc:
+          HW timestamping egress packets is supported by the driver.
+      -
+        name: tx-checksum
+        doc:
+          L3 checksum HW offload is supported by the driver.

 attribute-sets:
   -
@@ -86,6 +97,11 @@ attribute-sets:
           See Documentation/networking/xdp-rx-metadata.rst for more details.
         type: u64
         enum: xdp-rx-metadata
+      -
+        name: xsk-features
+        doc: Bitmask of enabled AF_XDP features.
+        type: u64
+        enum: xsk-flags
   -
     name: page-pool
     attributes:
@@ -209,6 +225,7 @@ operations:
             - xdp-features
             - xdp-zc-max-segs
             - xdp-rx-metadata-features
+            - xsk-features
       dump:
         reply: *dev-all
     -

Documentation/networking/index.rst

Lines changed: 1 addition & 0 deletions
@@ -124,6 +124,7 @@ Contents:
    xfrm_sync
    xfrm_sysctl
    xdp-rx-metadata
+   xsk-tx-metadata

 .. only:: subproject and html

Documentation/networking/xdp-rx-metadata.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===============
 XDP RX Metadata
 ===============

Documentation/networking/xsk-tx-metadata.rst

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+==================
+AF_XDP TX Metadata
+==================
+
+This document describes how to enable offloads when transmitting packets
+via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
+metadata on the receive side.
+
+General Design
+==============
+
+The headroom for the metadata is reserved via ``tx_metadata_len`` in
+``struct xdp_umem_reg``. The metadata length is therefore the same for
+every socket that shares the same umem. The metadata layout is a fixed UAPI,
+refer to ``struct xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
+Thus, generally, the ``tx_metadata_len`` field above should contain
+``sizeof(struct xsk_tx_metadata)``.
+
+The headroom and the metadata itself should be located right before
+``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
+layout is as follows::
+
+           tx_metadata_len
+     /                         \
+    +-----------------+---------+----------------------------+
+    | xsk_tx_metadata | padding |          payload           |
+    +-----------------+---------+----------------------------+
+                                ^
+                                |
+                          xdp_desc->addr
+
+An AF_XDP application can request headrooms larger than ``sizeof(struct
+xsk_tx_metadata)``. The kernel will ignore the padding (and will still
+use ``xdp_desc->addr - tx_metadata_len`` to locate
+the ``xsk_tx_metadata``). For the frames that shouldn't carry
+any metadata (i.e., the ones that don't have ``XDP_TX_METADATA`` option),
+the metadata area is ignored by the kernel as well.
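
A minimal sketch of the address arithmetic described above, from the application's point of view. It is illustrative only: ``struct demo_tx_metadata`` is a hypothetical local stand-in for the UAPI structure (real code should include ``linux/if_xdp.h``), and ``demo_locate_tx_metadata()`` is not a kernel or libxdp function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the UAPI metadata layout. Real code should use
 * the definition from include/uapi/linux/if_xdp.h instead of this copy. */
struct demo_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

static_assert(sizeof(struct demo_tx_metadata) == 16,
	      "stand-in layout is expected to be 16 bytes");

/* Return a pointer to the metadata of the frame whose payload starts at
 * desc_addr (the value placed in xdp_desc->addr), i.e. the location
 * desc_addr - tx_metadata_len inside the umem. Returns NULL when the
 * reserved headroom cannot hold the metadata or would start before the
 * beginning of the umem. */
static struct demo_tx_metadata *
demo_locate_tx_metadata(void *umem_area, uint64_t desc_addr,
			uint32_t tx_metadata_len)
{
	if (tx_metadata_len < sizeof(struct demo_tx_metadata) ||
	    desc_addr < tx_metadata_len)
		return NULL;

	return (struct demo_tx_metadata *)
		((uint8_t *)umem_area + desc_addr - tx_metadata_len);
}
```

With a headroom of 32 bytes and a payload at offset 256, the metadata starts at offset 224 and the 16 bytes between the structure and the payload are padding. The sketch assumes the umem and the chosen addresses are 8-byte aligned.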
+
+The flags field enables the particular offload:
+
+- ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put the transmission
+  timestamp into the ``tx_timestamp`` field of ``struct xsk_tx_metadata``.
+- ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate the L4
+  checksum. ``csum_start`` specifies the byte offset where the checksumming
+  should start and ``csum_offset`` specifies the byte offset where the
+  device should store the computed checksum.
+
+Besides the flags above, in order to trigger the offloads, the first
+packet's ``struct xdp_desc`` descriptor should set the ``XDP_TX_METADATA``
+bit in the ``options`` field. Also note that in a multi-buffer packet
+only the first chunk should carry the metadata.
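
A hedged sketch of how an application might fill in a request for both offloads and mark the descriptor. All ``demo_``/``DEMO_`` names are hypothetical local stand-ins that mirror the UAPI names used above; their bit values are assumptions, so real code should take the definitions from ``linux/if_xdp.h``. The sketch also assumes that ``csum_start`` is relative to the start of the packet and that ``csum_offset`` is relative to ``csum_start``; check the comments in the UAPI header before relying on that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit values for the stand-ins; take the real ones from if_xdp.h. */
#define DEMO_TXMD_FLAGS_TIMESTAMP (1ULL << 0)
#define DEMO_TXMD_FLAGS_CHECKSUM  (1ULL << 1)
#define DEMO_TX_METADATA          (1U << 1)   /* xdp_desc options bit */

/* Hypothetical stand-in for struct xsk_tx_metadata. */
struct demo_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

/* Hypothetical stand-in for struct xdp_desc. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/* Request TX timestamp and L4 checksum offload for one packet.
 * l4_offset: where the L4 header starts (assumed relative to the packet).
 * csum_field_offset: where the checksum field sits (assumed relative to
 * l4_offset). */
static void demo_request_offloads(struct demo_tx_metadata *meta,
				  struct demo_desc *first_desc,
				  uint16_t l4_offset,
				  uint16_t csum_field_offset)
{
	assert(meta && first_desc);

	memset(meta, 0, sizeof(*meta));
	meta->flags = DEMO_TXMD_FLAGS_TIMESTAMP | DEMO_TXMD_FLAGS_CHECKSUM;
	meta->request.csum_start = l4_offset;
	meta->request.csum_offset = csum_field_offset;

	/* Only the first descriptor of a packet carries the metadata. */
	first_desc->options |= DEMO_TX_METADATA;
}
```

For a UDP packet over IPv4 without IP options, ``l4_offset`` would be 14 + 20 = 34 (Ethernet plus IPv4 header) and ``csum_field_offset`` would be 6, the position of the checksum field inside the UDP header.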
+
+Software TX Checksum
+====================
+
+For development and testing purposes it's possible to pass the
+``XDP_UMEM_TX_SW_CSUM`` flag to the ``XDP_UMEM_REG`` UMEM registration call.
+In this case, when running in ``XDP_COPY`` mode, the TX checksum
+is calculated on the CPU. Do not enable this option in production because
+it will negatively affect performance.
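
To give a feel for what "calculated on the CPU" costs, here is a generic Internet checksum (RFC 1071 style one's-complement sum) in plain C. This is not the kernel's software fallback, whose implementation is not shown in this diff; ``demo_inet_checksum()`` is a hypothetical helper and it ignores the TCP/UDP pseudo-header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over len bytes, treating the data as a
 * sequence of big-endian 16-bit words. An odd trailing byte is padded
 * with a zero low byte. The 32-bit accumulator is large enough for
 * any buffer that fits into a network frame. */
static uint16_t demo_inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Every payload byte is touched once per transmitted packet, which is the per-packet CPU work that a hardware checksum offload avoids.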
+
+Querying Device Capabilities
+============================
+
+Every device exports its offload capabilities via the netlink netdev family.
+Refer to the ``xsk-flags`` features bitmask in
+``Documentation/netlink/specs/netdev.yaml``.
+
+- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
+- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
+
+See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
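
Once the ``xsk-features`` attribute has been read over netlink, interpreting it is a plain bitmask test. The sketch below assumes the usual convention for ``flags`` enums in the netlink specs, where entries are assigned bits in declaration order (``tx-timestamp`` as bit 0, ``tx-checksum`` as bit 1). The ``DEMO_`` constants and ``demo_decode_xsk_features()`` are hypothetical; real code should use the generated netdev UAPI header and the YNL sample referenced above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, following the entry order of the xsk-flags
 * definition in netdev.yaml. */
#define DEMO_XSK_FLAGS_TX_TIMESTAMP (1ULL << 0)
#define DEMO_XSK_FLAGS_TX_CHECKSUM  (1ULL << 1)

static_assert(DEMO_XSK_FLAGS_TX_TIMESTAMP != DEMO_XSK_FLAGS_TX_CHECKSUM,
	      "capability bits must be distinct");

struct demo_tx_caps {
	bool timestamp;	/* XDP_TXMD_FLAGS_TIMESTAMP may be requested */
	bool checksum;	/* XDP_TXMD_FLAGS_CHECKSUM may be requested */
};

/* Translate the u64 xsk-features bitmask into per-offload booleans. */
static struct demo_tx_caps demo_decode_xsk_features(uint64_t xsk_features)
{
	struct demo_tx_caps caps = {
		.timestamp = (xsk_features & DEMO_XSK_FLAGS_TX_TIMESTAMP) != 0,
		.checksum = (xsk_features & DEMO_XSK_FLAGS_TX_CHECKSUM) != 0,
	};

	return caps;
}
```

An application would typically perform this check once at startup and only set the corresponding ``XDP_TXMD_FLAGS_*`` bits for offloads the device reports.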
+
+Example
+=======
+
+See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
+program that handles TX metadata. Also see https://github.com/fomichev/xskgen
+for a more bare-bones example.

drivers/net/ethernet/mellanox/mlx5/core/en.h

Lines changed: 3 additions & 1 deletion
@@ -484,10 +484,12 @@ struct mlx5e_xdp_info_fifo {

 struct mlx5e_xdpsq;
 struct mlx5e_xmit_data;
+struct xsk_tx_metadata;
 typedef int (*mlx5e_fp_xmit_xdp_frame_check)(struct mlx5e_xdpsq *);
 typedef bool (*mlx5e_fp_xmit_xdp_frame)(struct mlx5e_xdpsq *,
 					struct mlx5e_xmit_data *,
-					int);
+					int,
+					struct xsk_tx_metadata *);

 struct mlx5e_xdpsq {
 	/* data path */

drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c

Lines changed: 61 additions & 11 deletions
@@ -103,7 +103,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 		xdptxd->dma_addr = dma_addr;

 		if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
-					      mlx5e_xmit_xdp_frame, sq, xdptxd, 0)))
+					      mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL)))
 			return false;

 		/* xmit_mode == MLX5E_XDP_XMIT_MODE_FRAME */
@@ -145,7 +145,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 	xdptxd->dma_addr = dma_addr;

 	if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
-				      mlx5e_xmit_xdp_frame, sq, xdptxd, 0)))
+				      mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL)))
 		return false;

 	/* xmit_mode == MLX5E_XDP_XMIT_MODE_PAGE */
@@ -261,6 +261,37 @@ const struct xdp_metadata_ops mlx5e_xdp_metadata_ops = {
 	.xmo_rx_hash = mlx5e_xdp_rx_hash,
 };

+struct mlx5e_xsk_tx_complete {
+	struct mlx5_cqe64 *cqe;
+	struct mlx5e_cq *cq;
+};
+
+static u64 mlx5e_xsk_fill_timestamp(void *_priv)
+{
+	struct mlx5e_xsk_tx_complete *priv = _priv;
+	u64 ts;
+
+	ts = get_cqe_ts(priv->cqe);
+
+	if (mlx5_is_real_time_rq(priv->cq->mdev) || mlx5_is_real_time_sq(priv->cq->mdev))
+		return mlx5_real_time_cyc2time(&priv->cq->mdev->clock, ts);
+
+	return mlx5_timecounter_cyc2time(&priv->cq->mdev->clock, ts);
+}
+
+static void mlx5e_xsk_request_checksum(u16 csum_start, u16 csum_offset, void *priv)
+{
+	struct mlx5_wqe_eth_seg *eseg = priv;
+
+	/* HW/FW is doing parsing, so offsets are largely ignored. */
+	eseg->cs_flags |= MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
+}
+
+const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops = {
+	.tmo_fill_timestamp = mlx5e_xsk_fill_timestamp,
+	.tmo_request_checksum = mlx5e_xsk_request_checksum,
+};
+
 /* returns true if packet was consumed by xdp */
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq,
 		      struct bpf_prog *prog, struct mlx5e_xdp_buff *mxbuf)
@@ -398,11 +429,11 @@ INDIRECT_CALLABLE_SCOPE int mlx5e_xmit_xdp_frame_check_mpwqe(struct mlx5e_xdpsq

 INDIRECT_CALLABLE_SCOPE bool
 mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
-		     int check_result);
+		     int check_result, struct xsk_tx_metadata *meta);

 INDIRECT_CALLABLE_SCOPE bool
 mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
-			   int check_result)
+			   int check_result, struct xsk_tx_metadata *meta)
 {
 	struct mlx5e_tx_mpwqe *session = &sq->mpwqe;
 	struct mlx5e_xdpsq_stats *stats = sq->stats;
@@ -420,7 +451,7 @@ mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptx
 		 */
 		if (unlikely(sq->mpwqe.wqe))
 			mlx5e_xdp_mpwqe_complete(sq);
-		return mlx5e_xmit_xdp_frame(sq, xdptxd, 0);
+		return mlx5e_xmit_xdp_frame(sq, xdptxd, 0, meta);
 	}
 	if (!xdptxd->len) {
 		skb_frag_t *frag = &xdptxdf->sinfo->frags[0];
@@ -450,6 +481,7 @@ mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptx
 		 * and it's safe to complete it at any time.
 		 */
 		mlx5e_xdp_mpwqe_session_start(sq);
+		xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, &session->wqe->eth);
 	}

 	mlx5e_xdp_mpwqe_add_dseg(sq, p, stats);
@@ -480,7 +512,7 @@ INDIRECT_CALLABLE_SCOPE int mlx5e_xmit_xdp_frame_check(struct mlx5e_xdpsq *sq)

 INDIRECT_CALLABLE_SCOPE bool
 mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
-		     int check_result)
+		     int check_result, struct xsk_tx_metadata *meta)
 {
 	struct mlx5e_xmit_data_frags *xdptxdf =
 		container_of(xdptxd, struct mlx5e_xmit_data_frags, xd);
@@ -599,6 +631,8 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
 		sq->pc++;
 	}

+	xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, eseg);
+
 	sq->doorbell_cseg = cseg;

 	stats->xmit++;
@@ -608,7 +642,9 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
 static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				  struct mlx5e_xdp_wqe_info *wi,
 				  u32 *xsk_frames,
-				  struct xdp_frame_bulk *bq)
+				  struct xdp_frame_bulk *bq,
+				  struct mlx5e_cq *cq,
+				  struct mlx5_cqe64 *cqe)
 {
 	struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo;
 	u16 i;
@@ -668,10 +704,24 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,

 			break;
 		}
-		case MLX5E_XDP_XMIT_MODE_XSK:
+		case MLX5E_XDP_XMIT_MODE_XSK: {
 			/* AF_XDP send */
+			struct xsk_tx_metadata_compl *compl = NULL;
+			struct mlx5e_xsk_tx_complete priv = {
+				.cqe = cqe,
+				.cq = cq,
+			};
+
+			if (xp_tx_metadata_enabled(sq->xsk_pool)) {
+				xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
+				compl = &xdpi.xsk_meta;
+
+				xsk_tx_metadata_complete(compl, &mlx5e_xsk_tx_metadata_ops, &priv);
+			}
+
 			(*xsk_frames)++;
 			break;
+		}
 		default:
 			WARN_ON_ONCE(true);
 		}
@@ -720,7 +770,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)

 			sqcc += wi->num_wqebbs;

-			mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
+			mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, cq, cqe);
 		} while (!last_wqe);

 		if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) {
@@ -767,7 +817,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)

 		sq->cc += wi->num_wqebbs;

-		mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
+		mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, NULL, NULL);
 	}

 	xdp_flush_frame_bulk(&bq);
@@ -840,7 +890,7 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 		}

 		ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
-				      mlx5e_xmit_xdp_frame, sq, xdptxd, 0);
+				      mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL);
 		if (unlikely(!ret)) {
 			int j;

drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h

Lines changed: 8 additions & 3 deletions
@@ -33,6 +33,7 @@
 #define __MLX5_EN_XDP_H__

 #include <linux/indirect_call_wrapper.h>
+#include <net/xdp_sock.h>

 #include "en.h"
 #include "en/txrx.h"
@@ -82,7 +83,7 @@ enum mlx5e_xdp_xmit_mode {
  *    num, page_1, page_2, ... , page_num.
  *
  * MLX5E_XDP_XMIT_MODE_XSK:
- *    none.
+ *    frame.xsk_meta.
  */
 #define MLX5E_XDP_FIFO_ENTRIES2DS_MAX_RATIO 4

@@ -97,6 +98,7 @@ union mlx5e_xdp_info {
 		u8 num;
 		struct page *page;
 	} page;
+	struct xsk_tx_metadata_compl xsk_meta;
 };

 struct mlx5e_xsk_param;
@@ -112,13 +114,16 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 		   u32 flags);

 extern const struct xdp_metadata_ops mlx5e_xdp_metadata_ops;
+extern const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops;

 INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq,
 							  struct mlx5e_xmit_data *xdptxd,
-							  int check_result));
+							  int check_result,
+							  struct xsk_tx_metadata *meta));
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq,
 						    struct mlx5e_xmit_data *xdptxd,
-						    int check_result));
+						    int check_result,
+						    struct xsk_tx_metadata *meta));
 INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check_mpwqe(struct mlx5e_xdpsq *sq));
 INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check(struct mlx5e_xdpsq *sq));
