bpf: fix skb pacing for traffic from pods #15324

Draft: wants to merge 2 commits into main

@borkmann borkmann (Member) commented Mar 11, 2021

See commit msg.

TODOs until final release:

  • Remove the EDT map if the feature is not used
  • Adapt the EDT drop horizon to 2s in general for all protocols (including UDP)
  • Finalize L7 support
  • Implement for NodePort backend Pods (double-check whether this is supported)
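The TODOs refer to an EDT (earliest departure time) map and a drop horizon. As a rough, hedged illustration of that idea only, the user-space sketch below assigns each packet of a flow a departure timestamp derived from a configured rate, refuses packets that would be scheduled further out than the horizon, and shows why a cleared timestamp defeats pacing in an fq-like "release when due" check. `struct edt_state`, `edt_schedule()` and `edt_due()` are hypothetical names; this is not Cilium's datapath code, and the due check is a simplification of what the fq qdisc actually does.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-flow state, standing in for one entry of an EDT map. */
struct edt_state {
	uint64_t rate_bytes_per_sec; /* configured pacing rate */
	uint64_t t_last_ns;          /* last departure time handed out */
	uint64_t horizon_ns;         /* refuse packets scheduled this far out */
};

/* Assign a departure time to a packet of len bytes seen at now_ns.
 * Returns false if the packet would land beyond the drop horizon, in
 * which case the caller is expected to drop it. */
static bool edt_schedule(struct edt_state *st, uint64_t now_ns,
			 uint32_t len, uint64_t *tstamp_ns)
{
	uint64_t delay_ns, t_next;

	/* Treat a zero rate as "unpaced" rather than dividing by zero. */
	if (st->rate_bytes_per_sec == 0) {
		*tstamp_ns = now_ns;
		return true;
	}
	delay_ns = (uint64_t)len * NSEC_PER_SEC / st->rate_bytes_per_sec;
	t_next = st->t_last_ns + delay_ns;

	/* Flow has been idle long enough: it may depart right away. */
	if (t_next <= now_ns) {
		st->t_last_ns = now_ns;
		*tstamp_ns = now_ns;
		return true;
	}
	/* Too far in the future: drop instead of queueing. */
	if (t_next - now_ns >= st->horizon_ns)
		return false;

	st->t_last_ns = t_next;
	*tstamp_ns = t_next;
	return true;
}

/* Simplified "release when due" check. A timestamp of 0, as left behind
 * when skb->tstamp is cleared, is always due, so the pacing is lost. */
static bool edt_due(uint64_t tstamp_ns, uint64_t now_ns)
{
	return tstamp_ns <= now_ns;
}
```

At 4Gbit/s (500,000,000 bytes/s), each 1500-byte packet pushes the flow's next departure time out by 3000 ns, so with a 2s horizon a flow would need roughly 2s worth of scheduled backlog before packets start being dropped.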

Related:

Builds on top of the kernel patches in [0].

Consider a socket that has SO_MAX_PACING_RATE set to 4Gbit/s and that
is part of a Pod. Pacing for this socket is currently broken because
skb->tstamp is cleared on redirect, even though fq in the host namespace
manages the socket's pacing. This fixes BBR and SO_MAX_PACING_RATE for
Pods.
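For context, a Linux application caps a socket's rate with the standard `SO_MAX_PACING_RATE` socket option, whose value is in bytes per second, so 4Gbit/s corresponds to 500,000,000. The sketch below assumes a Linux host; `bits_to_bytes_per_sec()` and `set_max_pacing_rate()` are hypothetical helper names, and the sketch deliberately passes a 32-bit value and rejects rates that do not fit (the kernel interprets `~0U` as "unlimited").

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

/* SO_MAX_PACING_RATE is expressed in bytes per second, not bits. */
static uint64_t bits_to_bytes_per_sec(uint64_t bits_per_sec)
{
	return bits_per_sec / 8;
}

/* Cap the pacing rate of socket fd at bits_per_sec. This sketch passes a
 * 32-bit value, so it rejects rates of roughly 34 Gbit/s and above.
 * Returns 0 on success, or -1 with errno set. */
static int set_max_pacing_rate(int fd, uint64_t bits_per_sec)
{
	uint64_t bytes = bits_to_bytes_per_sec(bits_per_sec);
	uint32_t rate;

	if (bytes >= UINT32_MAX) {
		errno = ERANGE;
		return -1;
	}
	rate = (uint32_t)bytes;
	return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE, &rate, sizeof(rate));
}
```

Calling `set_max_pacing_rate(fd, 4000000000ULL)` on a connected TCP socket inside a Pod would set up the scenario described above.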

Before (rates unstable and unpredictable):

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.04     655.52

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.07    1274.70

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.07    1519.32

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.06     849.96

After (stable at ~4Gbit/s):

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.01    3976.04

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.01    3961.40

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.01    3957.66

root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    40.01    3977.37
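To make the before/after comparison concrete, the small sketch below only computes the mean and the min/max of the eight throughput figures reported above; the samples are copied verbatim from the netperf output and nothing else is assumed. With these numbers the mean rises from about 1075 to about 3968 (10^6 bits/sec), while the spread between the slowest and fastest run narrows from roughly 864 to roughly 20.

```c
#include <stddef.h>

/* Throughput samples (10^6 bits/sec) copied from the netperf runs above. */
static const double before_mbps[] = { 655.52, 1274.70, 1519.32, 849.96 };
static const double after_mbps[]  = { 3976.04, 3961.40, 3957.66, 3977.37 };

struct summary {
	double mean;
	double min;
	double max;
};

/* Mean, minimum and maximum of n > 0 samples. */
static struct summary summarize(const double *v, size_t n)
{
	struct summary s = { 0.0, v[0], v[0] };

	for (size_t i = 0; i < n; i++) {
		s.mean += v[i];
		if (v[i] < s.min)
			s.min = v[i];
		if (v[i] > s.max)
			s.max = v[i];
	}
	s.mean /= (double)n;
	return s;
}
```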

  [0] https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/bpf.git/log/?h=pr/bpf-fix-pacing

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann borkmann added the sig/datapath, kind/performance, release-note/misc, and feature/bandwidth-manager labels Mar 11, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Mar 11, 2021
@maintainer-s-little-helper

Commit 0b6ecd9 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off label Mar 17, 2021
@aanm aanm added this to the 1.10.0 milestone Apr 16, 2021
@borkmann borkmann added the release-priority/best-effort label Apr 30, 2021
borkmann added a commit that referenced this pull request Apr 30, 2021
Consider a socket that has SO_MAX_PACING_RATE set to 4Gbit/s and that
is part of a Pod. Pacing for this socket is currently broken because
skb->tstamp is cleared on BPF redirect as well as on netns traversal,
even though fq in the host namespace manages the socket's pacing. As a
result, rates are unpredictable:

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.04     655.52

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.07    1274.70

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.07    1519.32

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.06     849.96

We are working on a kernel-side solution to retain skb->tstamp, which
would fix this issue and result in a stable 4Gbit/s rate for this
example. Once that is merged, we can re-enable BBR from the bandwidth
manager (BWM) side for those kernels (and fall back to cubic for those
that do not have it).

Related: #15324
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
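The commit message above plans to re-enable BBR only on kernels that retain skb->tstamp and to fall back to cubic elsewhere. A minimal per-socket sketch of that policy follows, relying only on the standard Linux `TCP_CONGESTION` socket option. `pick_congestion_control()`, `apply_congestion_control()` and the `kernel_keeps_tstamp` flag are hypothetical, how the kernel capability would be detected is left open, and this does not show how Cilium's bandwidth manager actually applies the setting.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical policy: prefer BBR only on kernels assumed to retain
 * skb->tstamp across netns traversal; otherwise use cubic. */
static const char *pick_congestion_control(bool kernel_keeps_tstamp)
{
	return kernel_keeps_tstamp ? "bbr" : "cubic";
}

/* Set the congestion control algorithm on one TCP socket. If BBR cannot
 * be selected (for example because it is unavailable or not permitted),
 * retry with cubic. Returns 0 on success, -1 with errno set otherwise. */
static int apply_congestion_control(int fd, bool kernel_keeps_tstamp)
{
	const char *cc = pick_congestion_control(kernel_keeps_tstamp);

	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) == 0)
		return 0;
	if (strcmp(cc, "cubic") == 0)
		return -1;
	cc = "cubic";
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc));
}
```

Falling back on any error from the BBR attempt, rather than only on a specific errno, keeps the sketch simple at the cost of hiding why BBR was rejected.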
borkmann added a commit that referenced this pull request Apr 30, 2021
brb pushed a commit that referenced this pull request May 7, 2021
[ upstream commit c233140 ]

Consider a socket that has SO_MAX_PACING_RATE set to 4Gbit/s and that
is part of a Pod. Pacing for this socket is currently broken because
skb->tstamp is cleared on BPF redirect as well as on netns traversal,
even though fq in the host namespace manages the socket's pacing. As a
result, rates are unpredictable:

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.04     655.52

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.07    1274.70

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.07    1519.32

  root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384  16384    40.06     849.96

We are working on a kernel-side solution to retain skb->tstamp, which
would fix this issue and result in a stable 4Gbit/s rate for this
example. Once that is merged, we can re-enable BBR from the bandwidth
manager (BWM) side for those kernels (and fall back to cubic for those
that do not have it).

Related: #15324
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb pushed a commit that referenced this pull request May 7, 2021
ti-mo pushed a commit that referenced this pull request May 10, 2021
aanm pushed a commit that referenced this pull request May 11, 2021
@borkmann borkmann added the pinned label May 25, 2021
@joestringer joestringer modified the milestones: 1.10.0, 1.11 Jun 28, 2021
@joestringer joestringer removed this from In progress in 1.10.0 Oct 25, 2021
@joestringer joestringer removed the release-priority/best-effort label Oct 25, 2021
Labels: dont-merge/needs-sign-off, feature/bandwidth-manager, kind/performance, pinned, release-note/misc, sig/datapath
5 participants