Skip to content

Commit f8cb38c

Browse files
committed
Merge branch 'net-depend-on-instance-lock-for-queue-related-netlink-ops'
Jakub Kicinski says: ==================== net: depend on instance lock for queue related netlink ops netdev-genl used to be protected by rtnl_lock. In previous release we already switched the queue management ops (for Rx zero-copy) to the instance lock. This series converts other ops to depend on the instance lock when possible. Unfortunately queue related state is hard to lock (unlike NAPI) as the process of switching the number of queues usually involves a large reconfiguration of the driver. The reconfig process has historically been under rtnl_lock, but for drivers which opt into ops locking it is also under the instance lock. Leverage that and conditionally take rtnl_lock or instance lock depending on the device capabilities. v1: https://lore.kernel.org/20250407190117.16528-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250408195956.412733-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 parents 420aabe + ce7b149 commit f8cb38c

File tree

13 files changed

+217
-66
lines changed

13 files changed

+217
-66
lines changed

Documentation/networking/netdevices.rst

Lines changed: 49 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -314,24 +314,22 @@ napi->poll:
314314
softirq
315315
will be called with interrupts disabled by netconsole.
316316

317-
struct netdev_queue_mgmt_ops synchronization rules
318-
==================================================
319-
320-
All queue management ndo callbacks are holding netdev instance lock.
321-
322-
RTNL and netdev instance lock
323-
=============================
317+
netdev instance lock
318+
====================
324319

325320
Historically, all networking control operations were protected by a single
326321
global lock known as ``rtnl_lock``. There is an ongoing effort to replace this
327322
global lock with separate locks for each network namespace. Additionally,
328323
properties of individual netdev are increasingly protected by per-netdev locks.
329324

330325
For device drivers that implement shaping or queue management APIs, all control
331-
operations will be performed under the netdev instance lock. Currently, this
332-
instance lock is acquired within the context of ``rtnl_lock``. The drivers
333-
can also explicitly request instance lock to be acquired via
334-
``request_ops_lock``. In the future, there will be an option for individual
326+
operations will be performed under the netdev instance lock.
327+
Drivers can also explicitly request instance lock to be held during ops
328+
by setting ``request_ops_lock`` to true. Code comments and docs refer
329+
to drivers which have ops called under the instance lock as "ops locked".
330+
See also the documentation of the ``lock`` member of struct net_device.
331+
332+
In the future, there will be an option for individual
335333
drivers to opt out of using ``rtnl_lock`` and instead perform their control
336334
operations directly under the netdev instance lock.
337335

@@ -343,8 +341,46 @@ there are two sets of interfaces: ``dev_xxx`` and ``netif_xxx`` (e.g.,
343341
acquiring the instance lock themselves, while the ``netif_xxx`` functions
344342
assume that the driver has already acquired the instance lock.
345343

344+
struct net_device_ops
345+
---------------------
346+
347+
``ndos`` are called without holding the instance lock for most drivers.
348+
349+
"Ops locked" drivers will have most of the ``ndos`` invoked under
350+
the instance lock.
351+
352+
struct ethtool_ops
353+
------------------
354+
355+
Similarly to ``ndos`` the instance lock is only held for select drivers.
356+
For "ops locked" drivers all ethtool ops without exceptions should
357+
be called under the instance lock.
358+
359+
struct netdev_stat_ops
360+
----------------------
361+
362+
"qstat" ops are invoked under the instance lock for "ops locked" drivers,
363+
and under rtnl_lock for all other drivers.
364+
365+
struct net_shaper_ops
366+
---------------------
367+
368+
All net shaper callbacks are invoked while holding the netdev instance
369+
lock. ``rtnl_lock`` may or may not be held.
370+
371+
Note that supporting net shapers automatically enables "ops locking".
372+
373+
struct netdev_queue_mgmt_ops
374+
----------------------------
375+
376+
All queue management callbacks are invoked while holding the netdev instance
377+
lock. ``rtnl_lock`` may or may not be held.
378+
379+
Note that supporting struct netdev_queue_mgmt_ops automatically enables
380+
"ops locking".
381+
346382
Notifiers and netdev instance lock
347-
==================================
383+
----------------------------------
348384

349385
For device drivers that implement shaping or queue management APIs,
350386
some of the notifiers (``enum netdev_cmd``) are running under the netdev
@@ -354,6 +390,7 @@ For devices with locked ops, currently only the following notifiers are
354390
running under the lock:
355391
* ``NETDEV_REGISTER``
356392
* ``NETDEV_UP``
393+
* ``NETDEV_XDP_FEAT_CHANGE``
357394

358395
The following notifiers are running without the lock:
359396
* ``NETDEV_UNREGISTER``

drivers/net/ethernet/google/gve/gve_main.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2185,7 +2185,7 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv)
21852185
xdp_features = 0;
21862186
}
21872187

2188-
xdp_set_features_flag(priv->dev, xdp_features);
2188+
xdp_set_features_flag_locked(priv->dev, xdp_features);
21892189
}
21902190

21912191
static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)

include/linux/netdevice.h

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -688,6 +688,7 @@ struct netdev_queue {
688688
/* Subordinate device that the queue has been assigned to */
689689
struct net_device *sb_dev;
690690
#ifdef CONFIG_XDP_SOCKETS
691+
/* "ops protected", see comment about net_device::lock */
691692
struct xsk_buff_pool *pool;
692693
#endif
693694

@@ -1952,6 +1953,7 @@ enum netdev_reg_state {
19521953
* @priv_destructor: Called from unregister
19531954
* @npinfo: XXX: need comments on this one
19541955
* @nd_net: Network namespace this network device is inside
1956+
* protected by @lock
19551957
*
19561958
* @ml_priv: Mid-layer private
19571959
* @ml_priv_type: Mid-layer private type
@@ -2359,6 +2361,9 @@ struct net_device {
23592361

23602362
bool dismantle;
23612363

2364+
/** @moving_ns: device is changing netns, protected by @lock */
2365+
bool moving_ns;
2366+
23622367
enum {
23632368
RTNL_LINK_INITIALIZED,
23642369
RTNL_LINK_INITIALIZING,
@@ -2521,7 +2526,7 @@ struct net_device {
25212526
* @net_shaper_hierarchy, @reg_state, @threaded
25222527
*
25232528
* Double protects:
2524-
* @up
2529+
* @up, @moving_ns, @nd_net, @xdp_flags
25252530
*
25262531
* Double ops protects:
25272532
* @real_num_rx_queues, @real_num_tx_queues

include/net/netdev_lock.h

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,22 @@ netdev_ops_assert_locked_or_invisible(const struct net_device *dev)
6464
netdev_ops_assert_locked(dev);
6565
}
6666

67+
static inline void netdev_lock_ops_compat(struct net_device *dev)
68+
{
69+
if (netdev_need_ops_lock(dev))
70+
netdev_lock(dev);
71+
else
72+
rtnl_lock();
73+
}
74+
75+
static inline void netdev_unlock_ops_compat(struct net_device *dev)
76+
{
77+
if (netdev_need_ops_lock(dev))
78+
netdev_unlock(dev);
79+
else
80+
rtnl_unlock();
81+
}
82+
6783
static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
6884
const struct lockdep_map *b)
6985
{

include/net/netdev_queues.h

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -85,9 +85,11 @@ struct netdev_queue_stats_tx {
8585
* for some of the events is not maintained, and reliable "total" cannot
8686
* be provided).
8787
*
88+
* Ops are called under the instance lock if netdev_need_ops_lock()
89+
* returns true, otherwise under rtnl_lock.
8890
* Device drivers can assume that when collecting total device stats,
8991
* the @get_base_stats and subsequent per-queue calls are performed
90-
* "atomically" (without releasing the rtnl_lock).
92+
* "atomically" (without releasing the relevant lock).
9193
*
9294
* Device drivers are encouraged to reset the per-queue statistics when
9395
* number of queues change. This is because the primary use case for

include/net/netdev_rx_queue.h

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,12 +20,12 @@ struct netdev_rx_queue {
2020
struct net_device *dev;
2121
netdevice_tracker dev_tracker;
2222

23+
/* All fields below are "ops protected",
24+
* see comment about net_device::lock
25+
*/
2326
#ifdef CONFIG_XDP_SOCKETS
2427
struct xsk_buff_pool *pool;
2528
#endif
26-
/* NAPI instance for the queue
27-
* "ops protected", see comment about net_device::lock
28-
*/
2929
struct napi_struct *napi;
3030
struct pp_memory_provider_params mp_params;
3131
} ____cacheline_aligned_in_smp;

include/net/xdp.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -616,6 +616,7 @@ struct xdp_metadata_ops {
616616
u32 bpf_xdp_metadata_kfunc_id(int id);
617617
bool bpf_dev_bound_kfunc_id(u32 btf_id);
618618
void xdp_set_features_flag(struct net_device *dev, xdp_features_t val);
619+
void xdp_set_features_flag_locked(struct net_device *dev, xdp_features_t val);
619620
void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg);
620621
void xdp_features_clear_redirect_target(struct net_device *dev);
621622
#else

net/core/dev.c

Lines changed: 69 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -828,7 +828,7 @@ netdev_napi_by_id_lock(struct net *net, unsigned int napi_id)
828828
dev_hold(dev);
829829
rcu_read_unlock();
830830

831-
dev = __netdev_put_lock(dev);
831+
dev = __netdev_put_lock(dev, net);
832832
if (!dev)
833833
return NULL;
834834

@@ -1039,10 +1039,11 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id)
10391039
* This helper is intended for locking net_device after it has been looked up
10401040
* using a lockless lookup helper. Lock prevents the instance from going away.
10411041
*/
1042-
struct net_device *__netdev_put_lock(struct net_device *dev)
1042+
struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net)
10431043
{
10441044
netdev_lock(dev);
1045-
if (dev->reg_state > NETREG_REGISTERED) {
1045+
if (dev->reg_state > NETREG_REGISTERED ||
1046+
dev->moving_ns || !net_eq(dev_net(dev), net)) {
10461047
netdev_unlock(dev);
10471048
dev_put(dev);
10481049
return NULL;
@@ -1051,6 +1052,20 @@ struct net_device *__netdev_put_lock(struct net_device *dev)
10511052
return dev;
10521053
}
10531054

1055+
static struct net_device *
1056+
__netdev_put_lock_ops_compat(struct net_device *dev, struct net *net)
1057+
{
1058+
netdev_lock_ops_compat(dev);
1059+
if (dev->reg_state > NETREG_REGISTERED ||
1060+
dev->moving_ns || !net_eq(dev_net(dev), net)) {
1061+
netdev_unlock_ops_compat(dev);
1062+
dev_put(dev);
1063+
return NULL;
1064+
}
1065+
dev_put(dev);
1066+
return dev;
1067+
}
1068+
10541069
/**
10551070
* netdev_get_by_index_lock() - find a device by its ifindex
10561071
* @net: the applicable net namespace
@@ -1070,7 +1085,19 @@ struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex)
10701085
if (!dev)
10711086
return NULL;
10721087

1073-
return __netdev_put_lock(dev);
1088+
return __netdev_put_lock(dev, net);
1089+
}
1090+
1091+
struct net_device *
1092+
netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex)
1093+
{
1094+
struct net_device *dev;
1095+
1096+
dev = dev_get_by_index(net, ifindex);
1097+
if (!dev)
1098+
return NULL;
1099+
1100+
return __netdev_put_lock_ops_compat(dev, net);
10741101
}
10751102

10761103
struct net_device *
@@ -1090,7 +1117,32 @@ netdev_xa_find_lock(struct net *net, struct net_device *dev,
10901117
dev_hold(dev);
10911118
rcu_read_unlock();
10921119

1093-
dev = __netdev_put_lock(dev);
1120+
dev = __netdev_put_lock(dev, net);
1121+
if (dev)
1122+
return dev;
1123+
1124+
(*index)++;
1125+
} while (true);
1126+
}
1127+
1128+
struct net_device *
1129+
netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
1130+
unsigned long *index)
1131+
{
1132+
if (dev)
1133+
netdev_unlock_ops_compat(dev);
1134+
1135+
do {
1136+
rcu_read_lock();
1137+
dev = xa_find(&net->dev_by_index, index, ULONG_MAX, XA_PRESENT);
1138+
if (!dev) {
1139+
rcu_read_unlock();
1140+
return NULL;
1141+
}
1142+
dev_hold(dev);
1143+
rcu_read_unlock();
1144+
1145+
dev = __netdev_put_lock_ops_compat(dev, net);
10941146
if (dev)
10951147
return dev;
10961148

@@ -12157,7 +12209,11 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
1215712209
netif_close(dev);
1215812210
/* And unlink it from device chain */
1215912211
unlist_netdevice(dev);
12160-
netdev_unlock_ops(dev);
12212+
12213+
if (!netdev_need_ops_lock(dev))
12214+
netdev_lock(dev);
12215+
dev->moving_ns = true;
12216+
netdev_unlock(dev);
1216112217

1216212218
synchronize_net();
1216312219

@@ -12193,7 +12249,9 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
1219312249
move_netdevice_notifiers_dev_net(dev, net);
1219412250

1219512251
/* Actually switch the network namespace */
12252+
netdev_lock(dev);
1219612253
dev_net_set(dev, net);
12254+
netdev_unlock(dev);
1219712255
dev->ifindex = new_ifindex;
1219812256

1219912257
if (new_name[0]) {
@@ -12219,7 +12277,11 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
1221912277
err = netdev_change_owner(dev, net_old, net);
1222012278
WARN_ON(err);
1222112279

12222-
netdev_lock_ops(dev);
12280+
netdev_lock(dev);
12281+
dev->moving_ns = false;
12282+
if (!netdev_need_ops_lock(dev))
12283+
netdev_unlock(dev);
12284+
1222312285
/* Add the device back in the hashes */
1222412286
list_netdevice(dev);
1222512287
/* Notify protocols, that a new device appeared. */

net/core/dev.h

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,7 @@ netdev_napi_by_id_lock(struct net *net, unsigned int napi_id);
3030
struct net_device *dev_get_by_napi_id(unsigned int napi_id);
3131

3232
struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex);
33-
struct net_device *__netdev_put_lock(struct net_device *dev);
33+
struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net);
3434
struct net_device *
3535
netdev_xa_find_lock(struct net *net, struct net_device *dev,
3636
unsigned long *index);
@@ -42,6 +42,21 @@ DEFINE_FREE(netdev_unlock, struct net_device *, if (_T) netdev_unlock(_T));
4242
(var_name = netdev_xa_find_lock(net, var_name, &ifindex)); \
4343
ifindex++)
4444

45+
struct net_device *
46+
netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex);
47+
struct net_device *
48+
netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
49+
unsigned long *index);
50+
51+
DEFINE_FREE(netdev_unlock_ops_compat, struct net_device *,
52+
if (_T) netdev_unlock_ops_compat(_T));
53+
54+
#define for_each_netdev_lock_ops_compat_scoped(net, var_name, ifindex) \
55+
for (struct net_device *var_name __free(netdev_unlock_ops_compat) = NULL; \
56+
(var_name = netdev_xa_find_lock_ops_compat(net, var_name, \
57+
&ifindex)); \
58+
ifindex++)
59+
4560
#ifdef CONFIG_PROC_FS
4661
int __init dev_proc_init(void);
4762
#else

net/core/lock_debug.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
2020
switch (cmd) {
2121
case NETDEV_REGISTER:
2222
case NETDEV_UP:
23+
case NETDEV_XDP_FEAT_CHANGE:
2324
netdev_ops_assert_locked(dev);
2425
fallthrough;
2526
case NETDEV_DOWN:
@@ -58,7 +59,6 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
5859
case NETDEV_OFFLOAD_XSTATS_DISABLE:
5960
case NETDEV_OFFLOAD_XSTATS_REPORT_USED:
6061
case NETDEV_OFFLOAD_XSTATS_REPORT_DELTA:
61-
case NETDEV_XDP_FEAT_CHANGE:
6262
ASSERT_RTNL();
6363
break;
6464

0 commit comments

Comments
 (0)