Create pods on different nodes: poor throughput in Cilium compared to Calico #18169
Comments
@Charlottell It appears your kernel version is 5.4. Kernel 5.10 is only required to get some additional Cilium features which improve performance; performance should be at least as good as Calico's regardless of the kernel version. Could you share a Cilium sysdump of the cluster?
@pchaigno These files are for master1 and master2 respectively. If you need other files, please let me know; I am willing to share them. logs-cilium-t2s6r-cilium-agent-20211209-101524.log
@Charlottell If you could provide a full Cilium sysdump rather than just the cilium-agent log, that would be great; I think the Cilium CLI can generate one.
@vincentmli @pchaigno Using cilium sysdump produced a folder that was too large and the upload failed, so I split it into the following files and uploaded them separately. I also checked the TSO, LRO, and GRO status for both Cilium and Calico.
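Offload status like TSO/GRO/LRO is typically read from `ethtool -k <dev>`; a small sketch of parsing that output into a comparable form (the sample text below is illustrative, not taken from the sysdump):

```python
# Parse `ethtool -k` style output into a feature -> on/off map, so the
# settings on the Cilium and Calico nodes can be diffed side by side.
# SAMPLE is a made-up example, not the reporter's actual output.
SAMPLE = """\
Features for ens3:
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""

def offload_status(ethtool_output: str) -> dict:
    """Map feature name -> 'on'/'off' from `ethtool -k` output."""
    status = {}
    for line in ethtool_output.splitlines():
        if ":" in line and not line.startswith("Features"):
            name, _, value = line.partition(":")
            status[name.strip()] = value.split()[0]
    return status

print(offload_status(SAMPLE))
```

Running this against the output from each node makes it easy to spot a node where, say, GRO is off, which can heavily affect netperf TCP_STREAM results.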
What tunneling protocol are you using in Calico's case? If VXLAN, what UDP port is it using?
@Charlottell From the sysdump, it looks like you are running VMs in Alibaba Cloud and using a Cilium VXLAN tunnel, so the netperf traffic goes through the VXLAN tunnel. It is not clear where the bottleneck is based on the sysdump alone, though. For performance/throughput issues, I sometimes run the Linux perf tool.
@pchaigno @vincentmli Calico uses BGP. I tried to deploy Cilium without VXLAN; the result is as follows.
Since Calico uses the local network card ens3 and BGP, I also tried to deploy Cilium using the local network card and BGP.
Calico also supports eBPF, so I deployed and tested that as well.
There may be some parameter settings that I don't know about, leading to the gap between the eBPF datapath and Calico.
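One contributor to a tunnel-versus-native-routing gap is VXLAN encapsulation overhead, which can be estimated with simple arithmetic (the header sizes are standard; the 1500-byte MTU is an assumption about this environment):

```python
# Back-of-envelope estimate of the best-case goodput cost of VXLAN
# encapsulation versus native routing (BGP). The 1500-byte MTU is an
# assumption; header sizes are the standard ones.
MTU = 1500
IP, TCP = 20, 20
VXLAN_OVERHEAD = 20 + 8 + 8 + 14   # outer IP + UDP + VXLAN + inner Ethernet

native_payload = MTU - IP - TCP                    # 1460 bytes per frame
vxlan_payload = MTU - VXLAN_OVERHEAD - IP - TCP    # 1410 bytes per frame

print(f"best-case VXLAN goodput ratio: {vxlan_payload / native_payload:.3f}")
```

Note the best-case overhead is only about 3.5%, so a much larger measured gap points at something else, for example missing GRO/GSO support for encapsulated traffic on older kernels.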
@Charlottell I noticed you are running Cilium with debug on; I don't think that would make much difference for the datapath. I have no experience with Calico myself. Again, when I run out of ideas on a performance issue, I use the Linux perf tool I mentioned earlier to get a profile.
Can you run the test 3 separate times with each CNI, relaunching the pods, to show the variance between executions? Can you describe the environment you are running on, i.e. dual socket, 10 Gbps NIC, any tuning applied?
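Quantifying the requested run-to-run variance is straightforward; for example (the throughput numbers below are invented for illustration):

```python
# Summarize several netperf runs as mean throughput plus relative
# standard deviation, to show how stable each CNI's result is.
# The three figures are hypothetical, not measured values.
from statistics import mean, stdev

runs_gbps = [9.41, 9.02, 9.57]

m = mean(runs_gbps)
rsd = stdev(runs_gbps) / m * 100
print(f"mean {m:.2f} Gbit/s, RSD {rsd:.1f}%")
```

If the relative standard deviation is larger than the gap between the two CNIs, the comparison is not meaningful and more runs are needed.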
@vincentmli It took a long time, but the perf tool has now been installed successfully.
@Charlottell Thanks for the perf top output. It looks like Cilium is slightly higher than Calico for the same kernel functions; maybe it is the number of samples being different (251k vs 147k)? You could try something like …; another option is to try …
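To make the two perf top profiles comparable despite the different totals (251k vs 147k samples), the per-function sample counts can be normalized; a sketch with hypothetical counts:

```python
# Normalize per-function perf sample counts by the total number of
# samples, so profiles of different lengths can be compared directly.
# All counts below are hypothetical, for illustration only.
cilium = {"copy_user_enhanced_fast_string": 12500, "total": 251_000}
calico = {"copy_user_enhanced_fast_string": 7100, "total": 147_000}

for name, prof in (("cilium", cilium), ("calico", calico)):
    share = prof["copy_user_enhanced_fast_string"] / prof["total"] * 100
    print(f"{name}: {share:.2f}% of samples")
```

Comparing shares rather than raw counts removes the bias from one profile simply having been sampled for longer.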
@jtaleric I tested it more than three times, rebooting the pods each time. Because Cilium doesn't use iptables, I thought the iptables rules generated by Services might affect Calico's performance, so I created 1000, 3000, and 5000 Services and then tested again. I deployed 1000 Services in the following way:
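The deployment manifest itself was not included; a minimal sketch of how N ClusterIP Services could be generated for such a scale test (the service names and selector are hypothetical, not the reporter's actual manifest):

```python
# Emit N ClusterIP Service manifests, suitable for `kubectl apply -f -`.
# Names (svc-0, svc-1, ...) and the netperf-server selector are
# hypothetical; the reporter's actual manifest was not shown.
SERVICE_TMPL = """\
apiVersion: v1
kind: Service
metadata:
  name: svc-{i}
spec:
  selector:
    app: netperf-server
  ports:
  - port: 80
    targetPort: 8080
---
"""

def render_services(n: int) -> str:
    """Concatenate n Service manifests separated by YAML document markers."""
    return "".join(SERVICE_TMPL.format(i=i) for i in range(n))

manifest = render_services(1000)
print(manifest.count("kind: Service"))  # 1000
```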
Calico netperf:
Cilium netperf:
Deployed 3000 Services:
Calico netperf:
Cilium netperf:
Deployed 5000 Services:
Calico netperf:
Cilium netperf:
As for the other environment variables you mentioned, I don't know how to confirm them. I tested Calico and Cilium on the same virtual machines.
Since installing perf last time consumed a lot of resources on the virtual machine, I did not reinstall perf after rebuilding the virtual machine. I also used qperf to test. The Cilium parameters are set as follows:
When I used Cilium, I deleted all the iptables rules with the following command:
calico qperf results:
cilium qperf results:
@Charlottell |
@Charlottell Hm. For example, for VXLAN, the kernel might be missing https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89e5c58fc1e2857ccdaae506fb8bc5fed57ee063.
@Charlottell were you able to replicate the results shared in the previous reply? |
@Charlottell - Were you able to improve your performance? I'd love to know as we've hit a similar issue. |
Is there an existing issue for this?
What happened?
I used Helm to install Cilium and all the pods are running. I created two pods on different nodes.
Then I ran netperf from the netperf-server pod to test.
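The netperf invocation and output were not included; for illustration, a small helper that pulls the throughput figure out of typical netperf TCP_STREAM output (the sample text below is made up, not the reporter's result):

```python
# Extract the throughput column from a netperf TCP_STREAM report.
# SAMPLE is an illustrative example of the output format, not an
# actual measurement from this issue.
SAMPLE = """\
MIGRATED TCP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 10.0.1.23 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    9413.25
"""

def parse_throughput(output: str) -> float:
    """Return the 10^6 bits/sec figure (last field of the results line)."""
    last = [line for line in output.splitlines() if line.strip()][-1]
    return float(last.split()[-1])

print(parse_throughput(SAMPLE))  # 9413.25
```

Scripting the extraction like this makes it easy to collect many runs per CNI and compare their distributions rather than single numbers.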
I deployed Calico v3.14 on the same cluster, with all pods running, and tested with the same method as above. The result is as follows.
I don't know why the gap is so big; I tested many times and the results were similar. I also read this article, https://cilium.io/blog/2021/05/11/cni-benchmark, in which Cilium outperforms Calico.
Thanks for your consideration and feedback!
Cilium Version
Client: 1.10.5 b0836e8 2021-10-13T16:20:49-07:00 go version go1.16.9 linux/amd64
Daemon: 1.10.5 b0836e8 2021-10-13T16:20:49-07:00 go version go1.16.9 linux/amd64
Kernel Version
Linux master1 5.4.61-050461-generic #202008260931 SMP Wed Aug 26 09:34:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-20", GitCommit:"353ee0f1a502f841db8bd781235f68a67b379010", GitTreeState:"archive", BuildDate:"2021-11-10T02:48:13Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-20", GitCommit:"353ee0f1a502f841db8bd781235f68a67b379010", GitTreeState:"archive", BuildDate:"2021-11-10T02:48:13Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
No response
Relevant log output
No response
Anything else?
No response
Code of Conduct