Commit 102b55e

Authored by Yunsheng Lin, committed by davem330
net: sched: fix tx action rescheduling issue during deactivation
Currently qdisc_run() checks the STATE_DEACTIVATED of a lockless qdisc before calling __qdisc_run(), which ultimately clears STATE_MISSED once all skbs are dequeued. If STATE_DEACTIVATED is set before STATE_MISSED is cleared, net_tx_action() may be rescheduled at the end of qdisc_run_end(), see below:

CPU0(net_tx_action)           CPU1(__dev_xmit_skb)      CPU2(dev_deactivate)
          .                           .                           .
          .                    set STATE_MISSED                   .
          .                   __netif_schedule()                  .
          .                           .               set STATE_DEACTIVATED
          .                           .                     qdisc_reset()
          .                           .                           .
          .<---------------           .                  synchronize_net()
clear __QDISC_STATE_SCHED  |          .                           .
          .                |          .                           .
          .                |          .                some_qdisc_is_busy()
          .                |          .                   return *false*
          .                |          .                           .
 test STATE_DEACTIVATED    |          .                           .
__qdisc_run() *not* called |          .                           .
          .                |          .                           .
 test STATE_MISSED         |          .                           .
__netif_schedule()---------|          .                           .
          .                           .                           .

__qdisc_run() is not called by net_tx_action() on CPU0 because CPU2 has set the STATE_DEACTIVATED flag during dev_deactivate(), and STATE_MISSED is only cleared in __qdisc_run(). __netif_schedule() is therefore called at the end of qdisc_run_end(), causing the tx action rescheduling problem.

qdisc_run() called by net_tx_action() runs in softirq context, which should have the same semantics as the qdisc_run() called by __dev_xmit_skb() under rcu_read_lock_bh(). And since there is a synchronize_net() between the STATE_DEACTIVATED flag being set and qdisc_reset()/some_qdisc_is_busy() in dev_deactivate(), we can safely bail out for a deactivated lockless qdisc in net_tx_action(), and qdisc_reset() will reset any skbs not yet dequeued.

So add rcu_read_lock() explicitly to protect qdisc_run() and do the STATE_DEACTIVATED check in net_tx_action() before calling qdisc_run_begin(). Another option is to do the check in qdisc_run_end(), but that would add unnecessary overhead for the non-tx_action case, because __dev_queue_xmit() will not see a qdisc with STATE_DEACTIVATED after synchronize_net(); such a qdisc can only be seen by net_tx_action() because of __netif_schedule().
The STATE_DEACTIVATED check in qdisc_run() was there to avoid a race between net_tx_action() and qdisc_reset(), see commit d518d2e ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc"). As the bailout added above for deactivated lockless qdiscs in net_tx_action() protects against that race without calling qdisc_run() at all, remove the STATE_DEACTIVATED check from qdisc_run().

After qdisc_reset() there are no skbs left in the qdisc to dequeue, so clear STATE_MISSED in dev_reset_queue() too.

Fixes: 6b3ba91 ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has avoided the endless rescheduling problem, but there may still be an unnecessary rescheduling, so adjust the commit log.
Signed-off-by: David S. Miller <davem@davemloft.net>
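The MISSED/DEACTIVATED protocol described above can be sketched as a minimal single-threaded userspace model with C11 atomics. This is not the kernel code: the struct fields and helpers here (run_count, resched_count, an atomic_bool standing in for q->seqlock) are hypothetical stand-ins, and it only illustrates why checking STATE_DEACTIVATED before qdisc_run_begin() keeps a stale MISSED bit from triggering a reschedule in qdisc_run_end().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum {
	QDISC_STATE_MISSED      = 1 << 0,
	QDISC_STATE_DEACTIVATED = 1 << 1,
};

struct qdisc {
	atomic_uint state;
	atomic_bool running;   /* stand-in for q->seqlock */
	int run_count;         /* times __qdisc_run() actually ran */
	int resched_count;     /* times __netif_schedule() would fire */
};

static bool qdisc_run_begin(struct qdisc *q)
{
	if (!atomic_exchange(&q->running, true))
		return true;
	/* owner is busy: record the miss so the owner reschedules us */
	atomic_fetch_or(&q->state, QDISC_STATE_MISSED);
	return false;
}

static void qdisc_run_end(struct qdisc *q)
{
	atomic_store(&q->running, false);
	/* a leftover MISSED bit here triggers __netif_schedule() */
	if (atomic_load(&q->state) & QDISC_STATE_MISSED)
		q->resched_count++;
}

static void __qdisc_run(struct qdisc *q)
{
	/* dequeuing everything clears MISSED in the real code */
	atomic_fetch_and(&q->state, ~(unsigned)QDISC_STATE_MISSED);
	q->run_count++;
}

/* net_tx_action() after this patch: bail out for a deactivated qdisc
 * before taking the run "lock", so its stale MISSED bit can never be
 * observed by qdisc_run_end() and cause a pointless reschedule.
 */
static void net_tx_action(struct qdisc *q)
{
	if (atomic_load(&q->state) & QDISC_STATE_DEACTIVATED)
		return;
	if (qdisc_run_begin(q)) {
		__qdisc_run(q);
		qdisc_run_end(q);
	}
}
```

In this model the pre-patch behavior would enter qdisc_run_begin(), skip __qdisc_run() on the DEACTIVATED test, and then see the still-set MISSED bit in qdisc_run_end(), which is exactly the rescheduling the bailout avoids.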
1 parent a90c57f commit 102b55e

File tree

3 files changed

+26
-11
lines changed


include/net/pkt_sched.h

Lines changed: 1 addition & 6 deletions

@@ -128,12 +128,7 @@ void __qdisc_run(struct Qdisc *q);

 static inline void qdisc_run(struct Qdisc *q)
 {
 	if (qdisc_run_begin(q)) {
-		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
-		 * to avoid racing with dev_qdisc_reset()
-		 */
-		if (!(q->flags & TCQ_F_NOLOCK) ||
-		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
-			__qdisc_run(q);
+		__qdisc_run(q);
 		qdisc_run_end(q);
 	}
 }

net/core/dev.c

Lines changed: 22 additions & 4 deletions

@@ -5025,25 +5025,43 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 		sd->output_queue_tailp = &sd->output_queue;
 		local_irq_enable();

+		rcu_read_lock();
+
 		while (head) {
 			struct Qdisc *q = head;
 			spinlock_t *root_lock = NULL;

 			head = head->next_sched;

-			if (!(q->flags & TCQ_F_NOLOCK)) {
-				root_lock = qdisc_lock(q);
-				spin_lock(root_lock);
-			}
 			/* We need to make sure head->next_sched is read
 			 * before clearing __QDISC_STATE_SCHED
 			 */
 			smp_mb__before_atomic();
+
+			if (!(q->flags & TCQ_F_NOLOCK)) {
+				root_lock = qdisc_lock(q);
+				spin_lock(root_lock);
+			} else if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED,
+						     &q->state))) {
+				/* There is a synchronize_net() between
+				 * STATE_DEACTIVATED flag being set and
+				 * qdisc_reset()/some_qdisc_is_busy() in
+				 * dev_deactivate(), so we can safely bail out
+				 * early here to avoid data race between
+				 * qdisc_deactivate() and some_qdisc_is_busy()
+				 * for lockless qdisc.
+				 */
+				clear_bit(__QDISC_STATE_SCHED, &q->state);
+				continue;
+			}
+
 			clear_bit(__QDISC_STATE_SCHED, &q->state);
 			qdisc_run(q);
 			if (root_lock)
 				spin_unlock(root_lock);
 		}
+
+		rcu_read_unlock();
 	}

 	xfrm_dev_backlog(sd);
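The reshaped net_tx_action() loop can be modeled as a short sketch. Assuming a simplified struct qdisc with only the fields used here (state, next_sched, and a hypothetical runs counter standing in for qdisc_run()), it shows the patched shape: next_sched is read before the SCHED bit is cleared, and a deactivated qdisc still gets its SCHED bit cleared but is never run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum {
	QDISC_STATE_SCHED       = 1 << 0,
	QDISC_STATE_DEACTIVATED = 1 << 1,
};

struct qdisc {
	atomic_uint state;
	struct qdisc *next_sched;
	int runs;   /* stand-in for qdisc_run(q) */
};

static void net_tx_action_walk(struct qdisc *head)
{
	while (head) {
		struct qdisc *q = head;

		/* read next_sched first; the kernel orders this against the
		 * clear_bit() below with smp_mb__before_atomic(), while the
		 * seq_cst atomics here give that ordering implicitly */
		head = q->next_sched;

		if (atomic_load(&q->state) & QDISC_STATE_DEACTIVATED) {
			/* early bailout for a deactivated lockless qdisc:
			 * drop the SCHED bit but never enter qdisc_run() */
			atomic_fetch_and(&q->state, ~(unsigned)QDISC_STATE_SCHED);
			continue;
		}

		atomic_fetch_and(&q->state, ~(unsigned)QDISC_STATE_SCHED);
		q->runs++;
	}
}
```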

net/sched/sch_generic.c

Lines changed: 3 additions & 1 deletion

@@ -1177,8 +1177,10 @@ static void dev_reset_queue(struct net_device *dev,
 	qdisc_reset(qdisc);

 	spin_unlock_bh(qdisc_lock(qdisc));
-	if (nolock)
+	if (nolock) {
+		clear_bit(__QDISC_STATE_MISSED, &qdisc->state);
 		spin_unlock_bh(&qdisc->seqlock);
+	}
 }

 static bool some_qdisc_is_busy(struct net_device *dev)
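The dev_reset_queue() change above can be sketched in the same userspace model. The fields here (a len counter standing in for the qdisc's queued skbs, a nolock flag for TCQ_F_NOLOCK) are hypothetical simplifications; the point is only that after the reset empties the queue, a leftover MISSED bit serves no purpose for a lockless qdisc and is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum {
	QDISC_STATE_MISSED = 1 << 0,
};

struct qdisc {
	atomic_uint state;
	bool nolock;   /* stand-in for TCQ_F_NOLOCK */
	int len;       /* stand-in for queued skbs */
};

/* After qdisc_reset() there is nothing left to dequeue, so a stale
 * MISSED bit could only cause a pointless reschedule later; clear it
 * for lockless qdiscs, mirroring the patched dev_reset_queue(). */
static void dev_reset_queue_sketch(struct qdisc *q)
{
	q->len = 0;   /* qdisc_reset() drops all queued skbs */
	if (q->nolock)
		atomic_fetch_and(&q->state, ~(unsigned)QDISC_STATE_MISSED);
}
```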
