-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: fix skb pacing for traffic from pods #15324
Draft
borkmann
wants to merge
2
commits into
main
Choose a base branch
from
pr/fix-pod-pacing
base: main
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
On top of [0] patches. Consider a socket which has SO_MAX_PACING_RATE of 4Gbit/s & the socket being part of a Pod. This is currently broken given skb->tstamps are cleared on redirect even though fq in hostns manages the socket's pacing. This fixes BBR and SO_MAX_PACING_RATE for Pods. Before (rates un{stable,predictable}): root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.04 655.52 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1274.70 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1519.32 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.06 849.96 After, stable at 4Gbit/s: root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.01 3976.04 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.01 3961.40 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.01 3957.66 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.01 3977.37 [0] https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/bpf.git/log/?h=pr/bpf-fix-pacing Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann
added
sig/datapath
Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
kind/performance
There is a performance impact of this.
release-note/misc
This PR makes changes that have no direct user impact.
feature/bandwidth-manager
Impacts BPF bandwidth manager.
labels
Mar 11, 2021
jrfastab
approved these changes
Mar 11, 2021
Commit 0b6ecd9 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
maintainer-s-little-helper
bot
added
the
dont-merge/needs-sign-off
The author needs to add signoff to their commits before merge.
label
Mar 17, 2021
borkmann
added
the
release-priority/best-effort
The project for target version is not a hard requirement.
label
Apr 30, 2021
borkmann
added a commit
that referenced
this pull request
Apr 30, 2021
Consider a socket which has SO_MAX_PACING_RATE of 4Gbit/s and the socket being part of a Pod. This is currently broken given skb->tstamps are cleared on BPF redirect as well as netns traversal even though fq in hostns manages the socket's pacing. Rates would result being unpredictable: root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.04 655.52 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1274.70 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1519.32 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.06 849.96 We are working on a kernel side solution to retain skb->tstamps which would fix this issue and result in a stable 4Gbit/s rate for this example. Once that is merged we can reenable BBR from BWM side for those kernels (and fallback to cubic for those that do not have it). Related: #15324 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann
added a commit
that referenced
this pull request
Apr 30, 2021
Consider a socket which has SO_MAX_PACING_RATE of 4Gbit/s and the socket being part of a Pod. This is currently broken given skb->tstamps are cleared on BPF redirect as well as netns traversal even though fq in hostns manages the socket's pacing. Rates would result being unpredictable: root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.04 655.52 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1274.70 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1519.32 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.06 849.96 We are working on a kernel side solution to retain skb->tstamps which would fix this issue and result in a stable 4Gbit/s rate for this example. Once that is merged we can reenable BBR from BWM side for those kernels (and fallback to cubic for those that do not have it). Related: #15324 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
brb
pushed a commit
that referenced
this pull request
May 7, 2021
[ upstream commit c233140 ] Consider a socket which has SO_MAX_PACING_RATE of 4Gbit/s and the socket being part of a Pod. This is currently broken given skb->tstamps are cleared on BPF redirect as well as netns traversal even though fq in hostns manages the socket's pacing. Rates would result being unpredictable: root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.04 655.52 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1274.70 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1519.32 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.06 849.96 We are working on a kernel side solution to retain skb->tstamps which would fix this issue and result in a stable 4Gbit/s rate for this example. Once that is merged we can reenable BBR from BWM side for those kernels (and fallback to cubic for those that do not have it). Related: #15324 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb
pushed a commit
that referenced
this pull request
May 7, 2021
[ upstream commit c233140 ] Consider a socket which has SO_MAX_PACING_RATE of 4Gbit/s and the socket being part of a Pod. This is currently broken given skb->tstamps are cleared on BPF redirect as well as netns traversal even though fq in hostns manages the socket's pacing. Rates would result being unpredictable: root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.04 655.52 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1274.70 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1519.32 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.06 849.96 We are working on a kernel side solution to retain skb->tstamps which would fix this issue and result in a stable 4Gbit/s rate for this example. Once that is merged we can reenable BBR from BWM side for those kernels (and fallback to cubic for those that do not have it). Related: #15324 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
ti-mo
pushed a commit
that referenced
this pull request
May 10, 2021
[ upstream commit c233140 ] Consider a socket which has SO_MAX_PACING_RATE of 4Gbit/s and the socket being part of a Pod. This is currently broken given skb->tstamps are cleared on BPF redirect as well as netns traversal even though fq in hostns manages the socket's pacing. Rates would result being unpredictable: root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.04 655.52 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1274.70 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1519.32 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.06 849.96 We are working on a kernel side solution to retain skb->tstamps which would fix this issue and result in a stable 4Gbit/s rate for this example. Once that is merged we can reenable BBR from BWM side for those kernels (and fallback to cubic for those that do not have it). Related: #15324 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
aanm
pushed a commit
that referenced
this pull request
May 11, 2021
[ upstream commit c233140 ] Consider a socket which has SO_MAX_PACING_RATE of 4Gbit/s and the socket being part of a Pod. This is currently broken given skb->tstamps are cleared on BPF redirect as well as netns traversal even though fq in hostns manages the socket's pacing. Rates would result being unpredictable: root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.04 655.52 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1274.70 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.07 1519.32 root@apoc:~/go/src/github.com/cilium/cilium# netperf -H 10.217.1.19 -t TCP_STREAM -l40 -s2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.217.1.19 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 40.06 849.96 We are working on a kernel side solution to retain skb->tstamps which would fix this issue and result in a stable 4Gbit/s rate for this example. Once that is merged we can reenable BBR from BWM side for those kernels (and fallback to cubic for those that do not have it). Related: #15324 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
joestringer
removed
the
release-priority/best-effort
The project for target version is not a hard requirement.
label
Oct 25, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
dont-merge/needs-sign-off
The author needs to add signoff to their commits before merge.
feature/bandwidth-manager
Impacts BPF bandwidth manager.
kind/performance
There is a performance impact of this.
pinned
These issues are not marked stale by our issue bot.
release-note/misc
This PR makes changes that have no direct user impact.
sig/datapath
Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See commit msg.
TODOs till final release:
Related: