Network performance with Wireguard extremely poor #28413
Comments
Yeah, this is a show stopper I think? Configured my cluster as in #28387 and things work amazingly well until you try to send larger amounts of data, at which point it falls on its face. MTUs are identical to the OP: 9001 on everything except 8921 on cilium_wg0.
Send packets with MSS 8852:
Send packets with MSS=8853:
I am seeing the same issues as @mmerickel. IPv6 does not fragment in transit, and without working path MTU discovery there is no way for the link to find a valid MTU automatically, so connectivity just drops to the floor.
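If anyone wants to check whether path MTU discovery is working between two pods, a rough probe is to send non-fragmentable pings of increasing size and see where they stop getting through (the payload sizes and the peer address below are placeholders, not values from this issue):

# IPv6: forbid fragmentation and grow the payload until replies stop
ping -6 -M do -s 8800 <peer-pod-ip>
# IPv4 equivalent
ping -4 -M do -s 8800 <peer-pod-ip>
# or let tracepath report the discovered PMTU hop by hop
tracepath -6 <peer-pod-ip>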
We did test configuring this as well. Also, it's worth noting that things work great if you turn off WireGuard encryption! :-)
Hi folks,
Cilium values:
encryption:
  enabled: true
  type: wireguard
  nodeEncryption: false
tunnel: "vxlan"

Without Wireguard:
root@iperf3-deployment-5d6946cbd9-r7k2r:~# iperf3 -c 10.2.2.98 -p 12345
Connecting to host 10.2.2.98, port 12345
[ 4] local 10.2.3.7 port 39422 connected to 10.2.2.98 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 591 MBytes 4.95 Gbits/sec 0 2.14 MBytes
[ 4] 1.00-2.00 sec 589 MBytes 4.94 Gbits/sec 0 2.14 MBytes
[ 4] 2.00-3.00 sec 589 MBytes 4.94 Gbits/sec 49 2.24 MBytes
[ 4] 3.00-4.00 sec 588 MBytes 4.94 Gbits/sec 0 2.26 MBytes
[ 4] 4.00-5.00 sec 589 MBytes 4.94 Gbits/sec 0 2.33 MBytes
[ 4] 5.00-6.00 sec 589 MBytes 4.94 Gbits/sec 0 2.33 MBytes
[ 4] 6.00-7.00 sec 589 MBytes 4.94 Gbits/sec 0 2.61 MBytes
[ 4] 7.00-8.00 sec 589 MBytes 4.94 Gbits/sec 0 2.66 MBytes
[ 4] 8.00-9.00 sec 589 MBytes 4.94 Gbits/sec 0 2.66 MBytes
[ 4] 9.00-10.00 sec 589 MBytes 4.94 Gbits/sec 0 2.66 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 5.75 GBytes 4.94 Gbits/sec 49 sender
[ 4] 0.00-10.00 sec 5.75 GBytes 4.94 Gbits/sec receiver

Enabled Wireguard, restarted all Cilium and iperf3 pods, ensured the iperf3 pods were running on different nodes, and reran the tests.

With Wireguard:
root@iperf3-deployment-5c59548b57-76bcr:~# iperf3 -c 10.2.3.57 -p 12345
Connecting to host 10.2.3.57, port 12345
[ 4] local 10.2.2.103 port 53710 connected to 10.2.3.57 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 591 MBytes 4.95 Gbits/sec 7 1.91 MBytes
[ 4] 1.00-2.00 sec 589 MBytes 4.94 Gbits/sec 0 1.91 MBytes
[ 4] 2.00-3.00 sec 589 MBytes 4.94 Gbits/sec 0 1.95 MBytes
[ 4] 3.00-4.00 sec 588 MBytes 4.93 Gbits/sec 0 1.97 MBytes
[ 4] 4.00-5.00 sec 589 MBytes 4.94 Gbits/sec 56 2.24 MBytes
[ 4] 5.00-6.00 sec 589 MBytes 4.94 Gbits/sec 0 2.24 MBytes
[ 4] 6.00-7.00 sec 588 MBytes 4.93 Gbits/sec 0 2.24 MBytes
[ 4] 7.00-8.00 sec 589 MBytes 4.94 Gbits/sec 0 2.24 MBytes
[ 4] 8.00-9.00 sec 586 MBytes 4.92 Gbits/sec 15 2.25 MBytes
[ 4] 9.00-10.00 sec 589 MBytes 4.94 Gbits/sec 0 2.26 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 5.75 GBytes 4.94 Gbits/sec 78 sender
[ 4] 0.00-10.00 sec 5.74 GBytes 4.93 Gbits/sec receiver

Will try to reproduce it once again with a proper direct-routing, EKS VPC-CNI chained setup anytime soon.
@PhilipSchmid, it's hard to believe you're doing it right. If I create two t3a.xlarge instances with a recent Ubuntu LTS on AWS, the performance drop is still significant (no Cilium, just two nodes talking WireGuard to each other). I don't have the numbers here, but from what I remember the drop was from 4.6 Gbit/s to something like 3.5 Gbit/s. That's still orders of magnitude better :)
@roarvroom Maybe yes, but Wireguard is definitely activated
... and the pods are for sure not running on the same node. That would look like this:
Hence, I'm still looking into it.
Found something: As soon as I enable the beta Node-to-Node encryption as well (rollout restart of all Cilium agents), the performance drops significantly.
encryption:
  enabled: true
  type: wireguard
  nodeEncryption: true # <- this

root@iperf3-deployment-5c59548b57-w7mmb:~# iperf3 -c 10.2.3.57 -p 12345
Connecting to host 10.2.3.57, port 12345
[ 4] local 10.2.1.144 port 45620 connected to 10.2.3.57 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 17.7 MBytes 148 Mbits/sec 263 113 KBytes
[ 4] 1.00-2.00 sec 9.06 MBytes 76.0 Mbits/sec 187 43.3 KBytes
[ 4] 2.00-3.00 sec 9.35 MBytes 78.5 Mbits/sec 131 104 KBytes
[ 4] 3.00-4.00 sec 9.06 MBytes 76.0 Mbits/sec 179 26.0 KBytes
[ 4] 4.00-5.00 sec 8.47 MBytes 71.0 Mbits/sec 112 60.6 KBytes
[ 4] 5.00-6.00 sec 7.87 MBytes 66.1 Mbits/sec 101 60.6 KBytes
[ 4] 6.00-7.00 sec 9.83 MBytes 82.4 Mbits/sec 163 208 KBytes
[ 4] 7.00-8.00 sec 8.23 MBytes 69.0 Mbits/sec 162 52.0 KBytes
[ 4] 8.00-9.00 sec 8.47 MBytes 71.0 Mbits/sec 122 173 KBytes
[ 4] 9.00-10.00 sec 8.88 MBytes 74.5 Mbits/sec 151 86.6 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 96.9 MBytes 81.3 Mbits/sec 1571 sender
[ 4] 0.00-10.00 sec 94.8 MBytes 79.6 Mbits/sec receiver
iperf Done.

Disabling nodeEncryption again:

root@iperf3-deployment-d8f65bb65-fkqdd:~# iperf3 -c 10.2.3.241 -p 12345
Connecting to host 10.2.3.241, port 12345
[ 4] local 10.2.1.152 port 53434 connected to 10.2.3.241 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 592 MBytes 4.97 Gbits/sec 0 1.92 MBytes
[ 4] 1.00-2.00 sec 586 MBytes 4.92 Gbits/sec 0 2.24 MBytes
[ 4] 2.00-3.00 sec 589 MBytes 4.94 Gbits/sec 54 2.40 MBytes
[ 4] 3.00-4.00 sec 589 MBytes 4.94 Gbits/sec 0 2.40 MBytes
[ 4] 4.00-5.00 sec 582 MBytes 4.89 Gbits/sec 0 2.44 MBytes
[ 4] 5.00-6.00 sec 589 MBytes 4.94 Gbits/sec 0 2.44 MBytes
[ 4] 6.00-7.00 sec 585 MBytes 4.91 Gbits/sec 0 2.80 MBytes
[ 4] 7.00-8.00 sec 589 MBytes 4.94 Gbits/sec 0 2.93 MBytes
[ 4] 8.00-9.00 sec 588 MBytes 4.93 Gbits/sec 0 2.93 MBytes
[ 4] 9.00-10.00 sec 589 MBytes 4.94 Gbits/sec 0 2.93 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 5.74 GBytes 4.93 Gbits/sec 54 sender
[ 4] 0.00-10.00 sec 5.74 GBytes 4.93 Gbits/sec receiver

I guess in my case it only shows up after enabling Node-to-Node encryption because I'm running with a VXLAN overlay while you're running with direct routing. So don't get me wrong, I don't think it's directly related to Node-to-Node encryption itself, but rather to the similarities between running "Wireguard & direct routing" and "Wireguard & VXLAN & Node-to-Node encryption".
This issue (or a similar one) can also be seen on a Kind cluster.
Environment:
Cilium without Wireguard:
Iperf:
Wireguard enabled and Cilium restarted:
I also enabled node encryption at the end, but there was no significant change.
On the worker nodes, the Cilium interfaces:
It's not an issue with AWS; you could replicate the same setup with two hosts on bare metal. The problem is the way the WireGuard tunnel MTU is set lower while the pod veths are all set to the same value as the underlying eth0. What we need:
pod eth0 (mtu: 8921) -> veth (mtu: 8921) -> wg0 (mtu: 8921) -> eth0 (mtu: 9001)
so that we don't have packets being fragmented. This is especially important with IPv6, where packet fragmentation is not allowed to be done by routers in the path (unlike IPv4), so the packet simply gets dropped once it hits the interface with the smaller MTU.
OR the better fix: path MTU discovery really should work, and I am not sure why it doesn't, so that packets not destined for the WireGuard tunnel can use the full MTU of the interface they are going to get routed out of (eth0/9001).
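A quick way to sanity-check that chain (the interface names and the 8921/9001 numbers are the ones from this thread, not values Cilium guarantees):

# on the node: underlay NIC vs. WireGuard device
ip link show dev eth0 | grep -o 'mtu [0-9]*'
ip link show dev cilium_wg0 | grep -o 'mtu [0-9]*'
# all pod-facing veths on the node
ip -o link show type veth | grep -o 'mtu [0-9]*'
# inside a pod: the pod's own interface
kubectl exec -it <pod> -- ip link show dev eth0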
I haven't been able to track it down, but building on @archoversight's note about the ideal flow: things still do not work unless we drop the originating packet size even lower, to an MSS of 8852, as shown in the iperf3 output at #28413 (comment). There is some unexplained slop (69 bytes) being introduced on top of the 80 bytes required for bare-metal WireGuard-over-IPv6, which Cilium seems to be injecting into the path somewhere.
@michaltorbinski when you are doing your iperf3 testing, try setting the MSS explicitly. On IPv4, the reason you are seeing the massive slowdown is that the packets are being fragmented en route.
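For anyone reproducing this, iperf3's -M/--set-mss flag lets you pin the segment size and bisect the breaking point. The address and port are placeholders; 8852/8853 are the values reported earlier in this thread:

iperf3 -c 10.2.2.98 -p 12345 -M 8852   # reported to work
iperf3 -c 10.2.2.98 -p 12345 -M 8853   # one byte larger: reported to fall over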
Thank you for the suggestion @archoversight
@PhilipSchmid, regarding your "encryption: enabled" from "cilium status": during my journey with Cilium, I remember seeing a situation where Cilium reported encryption as enabled, but there were no _wg0 interfaces and "wg show" returned empty output. I only saw this once, but can you verify that as well? I would really be interested in seeing why your WireGuard-protected connection is not dropping performance by tens of percent.
This is really interesting... dropping the MSS should let you avoid fragmenting the packets entirely, thereby increasing throughput, since less work has to be done to chop the packets into multiple fragments.
Just dropping by with a quick observation: Cilium generally sets a larger MTU on the interface, then restricts the MTU via routes. For example, in an arbitrary pod on a test cluster I have lying around:
(I'm not saying that MTU isn't the issue, but it's worth noting.) Also, I observed issues with IPv6 fragmentation and filed an issue here: #25135
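To make the distinction concrete (output shapes only; the concrete numbers depend on the cluster and are not taken from this issue):

# inside a pod: the interface MTU can be large...
ip link show dev eth0      # e.g. "... mtu 9001 ..."
# ...while each route carries its own, smaller MTU
ip route show              # e.g. "default via 10.0.0.1 dev eth0 mtu 1450"

The kernel uses the per-route mtu when sizing packets toward that destination, so a large interface MTU by itself doesn't mean oversized packets actually leave the pod.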
This issue has been automatically marked as stale because it has not had recent activity.
not stale |
We're also experiencing this issue with Cilium v1.15.0-pre.3 (VXLAN, WireGuard, Node Encryption). There might be multiple issues at play:
For issue 2, I looked into the bpf code and, from what I can tell, the redirect in … The easiest, but not fully complete, solution is to set the MSS value on a route via …
What I still need to wrap my head around is the fact that the IP fragmentation of the WireGuard UDP packets costs 90% of the bandwidth.
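For reference, a minimal sketch of that route-based clamp, assuming the iproute2 advmss route attribute is what was meant (the gateway is a placeholder, and 8852 is the working MSS reported earlier in this thread, not a verified fix):

# inside the affected pod's network namespace
ip route change default via <pod-gateway> dev eth0 advmss 8852
ip route show default   # verify the advmss attribute is present

Note that advmss only influences the MSS advertised for TCP, so non-TCP traffic still relies on a correct interface or route MTU.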
This issue has been automatically marked as stale because it has not had recent activity.
/not-stale |
@mmerickel What's your cluster configuration? AWS CNI chaining? |
The entire config is in the issue description but yes it’s using CNI chaining with the aws-vpc-cni. |
Just relaying the relevant bits @learnitall said in a private thread:
Thanks for relaying, @brb. After this comment, I did some more investigating and found more context on why this happens. It's not that the AWS VPC CNI is forcing pods to use a pod MTU of 9001; it's that Cilium doesn't configure route MTUs in chaining mode.
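That's easy to see from inside a pod on such a setup, and a hypothetical manual stop-gap looks like the following (the 169.254.1.1 gateway is the link-local gateway the AWS VPC CNI typically installs, and 8921 is the WireGuard-safe MTU from this thread; this is not an endorsed fix):

# in chaining mode, no "mtu ..." attribute shows up on the pod's routes
ip route show
# hypothetical workaround: clamp the default route to the WireGuard-safe MTU
ip route change default via 169.254.1.1 dev eth0 mtu 8921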
Could you please try the following images https://github.com/cilium/cilium/actions/runs/8737485006/job/23975228008?pr=32047#step:4:19 (based on v1.15) and report back whether they resolve the performance issues?
Just referencing #32244 since it might be relevant for some people. That PR is included in Cilium v1.15.5, v1.14.11 and v1.13.16. |
It seems it does not solve the issue for the AWS CNI chaining case. The routes in the pod are still without an MTU set.
Yes, #32244 is specific to Azure and Alibaba Cloud IPAM. |
Is there an existing issue for this?
What happened?
Environment:
Cilium helm values:
Issue:
After installing Cilium with the above configuration, I observed very poor network performance (around 80 Mbit/s) when running an iperf test between two pods scheduled on different worker nodes, even though the network capacity is 4.6 Gbit/s (AWS t3a.2xlarge instances).
Upon further investigation, I found that even after setting the MTU to 8000 (my arbitrary value below 9001) in the Cilium configmap and restarting Cilium, the MTU for the container interfaces remained at 9001. Meanwhile, the MTU for the cilium_wg0 interface was correctly set to 7920, and the cilium_net@cilium_host and cilium_host@cilium_net interfaces were set to 8000.
After manually adjusting the MTU inside the containers, I found that setting it to anything below 8928 resulted in a significant improvement in network performance, achieving around 3.5 Gbit/s.
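For context, one way such a manual adjustment can be done (a sketch, not necessarily what was used here; the value just needs to sit below the ~8928 threshold reported above, and changing it requires NET_ADMIN):

# from inside the pod
ip link set dev eth0 mtu 8900
# or from the node, using the container's network namespace
nsenter -t <container-pid> -n ip link set dev eth0 mtu 8900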
Expected Behavior:
When adjusting the MTU in the Cilium configuration, it should be reflected on all relevant interfaces, including the container interfaces. Additionally, with the default MTU settings and WireGuard encryption enabled, the network performance should not be as poor as observed.
Steps to Reproduce:
Cilium Version
cilium-cli: v0.15.7 compiled with go1.21.0 on darwin/arm64
cilium image (default): v1.14.1
cilium image (stable): v1.14.2
cilium image (running): 1.14.2
Kernel Version
5.10.184-175.749.amzn2.x86_64
Kubernetes Version
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-02-22T13:32:21Z", GoVersion:"go1.20.1", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.4-eks-2d98532", GitCommit:"3d90c097c72493c2f1a9dd641e4a22d24d15be68", GitTreeState:"clean", BuildDate:"2023-07-28T16:51:44Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
cilium-sysdump-20231005-125744.zip
Relevant log output
No response
Anything else?
No response
Code of Conduct