
Commit 726e9e8

edumazet authored and kuba-moo committed
tcp: refine skb->ooo_okay setting
Enabling BIG TCP on a low-end platform apparently increased the chances of getting flows locked on one busy TX queue.

A similar problem was handled in commit 9b462d0 ("tcp: TCP Small Queues and strange attractors"), but that strategy worked for either bulk flows or 'large enough' RPCs. BIG TCP changed how large an RPC needed to be to enable the workaround: if an RPC fits in a single skb, TSQ never triggers.

The root cause of the problem is a busy TX queue with delayed TX completions.

This patch changes how we set skb->ooo_okay to detect the case where TX completion has not been done yet, but the incoming ACK was already processed and emptied the rtx queue.

Update the comment to explain the tricky details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230817182353.2523746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1 parent fc72039 commit 726e9e8


1 file changed, +14 -7 lines changed

net/ipv4/tcp_output.c

Lines changed: 14 additions & 7 deletions
@@ -1301,14 +1301,21 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 	}
 	tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
 
-	/* if no packet is in qdisc/device queue, then allow XPS to select
-	 * another queue. We can be called from tcp_tsq_handler()
-	 * which holds one reference to sk.
-	 *
-	 * TODO: Ideally, in-flight pure ACK packets should not matter here.
-	 * One way to get this would be to set skb->truesize = 2 on them.
+	/* We set skb->ooo_okay to one if this packet can select
+	 * a different TX queue than prior packets of this flow,
+	 * to avoid self inflicted reorders.
+	 * The 'other' queue decision is based on current cpu number
+	 * if XPS is enabled, or sk->sk_txhash otherwise.
+	 * We can switch to another (and better) queue if:
+	 * 1) No packet with payload is in qdisc/device queues.
+	 *    Delays in TX completion can defeat the test
+	 *    even if packets were already sent.
+	 * 2) Or rtx queue is empty.
+	 *    This mitigates above case if ACK packets for
+	 *    all prior packets were already processed.
 	 */
-	skb->ooo_okay = sk_wmem_alloc_get(sk) < SKB_TRUESIZE(1);
+	skb->ooo_okay = sk_wmem_alloc_get(sk) < SKB_TRUESIZE(1) ||
+			tcp_rtx_queue_empty(sk);
 
 	/* If we had to use memory reserve to allocate this skb,
 	 * this might cause drops if packet is looped back :
