
Pods created on different nodes: poor throughput in Cilium compared to Calico #18169

Closed
Charlottell opened this issue Dec 8, 2021 · 20 comments
Labels
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • kind/performance: There is a performance impact of this.
  • need-more-info: More information is required to further debug or fix the issue.
  • sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

Charlottell commented Dec 8, 2021

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I used Helm to install Cilium; all pods are running. I created two pods on different nodes.

netperf-client-master2-67bb94cf68-5cpcc   1/1     Running   0          29h   10.0.4.157   master2   <none>           <none>
netperf-server-747765686-9bjn9            1/1     Running   0          34h   10.0.0.123   master1   <none>           <none>

Then I ran netperf from the netperf-server pod to test throughput.

bash-5.1# netperf -H 10.0.4.157 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.4.157 (10.0.4.) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    1756.84

On the same cluster I deployed Calico v3.14; all pods are running. I tested with the same method as above, and the result is as follows:

bash-5.1# netperf -H 100.101.161.9 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.101.161.9 (100.101) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    6418.32

I don't know why the gap is so big. I tested many times and the results were similar. According to this article, https://cilium.io/blog/2021/05/11/cni-benchmark, Cilium should be faster than Calico.

Thanks for your consideration and feedback!

Cilium Version

Client: 1.10.5 b0836e8 2021-10-13T16:20:49-07:00 go version go1.16.9 linux/amd64
Daemon: 1.10.5 b0836e8 2021-10-13T16:20:49-07:00 go version go1.16.9 linux/amd64

Kernel Version

Linux master1 5.4.61-050461-generic #202008260931 SMP Wed Aug 26 09:34:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-20", GitCommit:"353ee0f1a502f841db8bd781235f68a67b379010", GitTreeState:"archive", BuildDate:"2021-11-10T02:48:13Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-20", GitCommit:"353ee0f1a502f841db8bd781235f68a67b379010", GitTreeState:"archive", BuildDate:"2021-11-10T02:48:13Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
Charlottell added the kind/bug and needs/triage labels on Dec 8, 2021.
vincentmli (Contributor) commented:

@Charlottell it appears your kernel version 5.4.61 does not meet the requirements in https://docs.cilium.io/en/latest/operations/performance/tuning/; the kernel needs to be >= 5.10.

pchaigno (Member) commented Dec 8, 2021

> @Charlottell it appears your kernel version 5.4.61 does not meet the requirements in https://docs.cilium.io/en/latest/operations/performance/tuning/; the kernel needs to be >= 5.10.

5.10 is only required to get some additional Cilium features which improve performance, but the performance should be at least as good as Calico's regardless of the kernel version.

@Charlottell Could you share a Cilium sysdump of the cluster?
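
For reference, a sysdump can typically be collected with a single command. A minimal sketch, assuming the Cilium CLI (cilium-cli) is installed and kubectl points at the affected cluster:

# Collect a cluster-wide sysdump; this writes a cilium-sysdump-<timestamp>.zip
# archive into the current directory.
cilium sysdump

# Alternative (assumed invocation): the standalone collector from github.com/cilium/cilium-sysdump.
curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip
python cilium-sysdump.zip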

pchaigno added the kind/community-report, kind/performance, and need-more-info labels on Dec 8, 2021.
Charlottell reopened this on Dec 9, 2021.
Charlottell (Author) commented:

> @Charlottell it appears your kernel version 5.4.61 does not meet the requirements in https://docs.cilium.io/en/latest/operations/performance/tuning/; the kernel needs to be >= 5.10.
>
> 5.10 is only required to get some additional Cilium features which improve performance, but the performance should be at least as good as Calico's regardless of the kernel version.
>
> @Charlottell Could you share a Cilium sysdump of the cluster?

@pchaigno These files are for master1 and master2 respectively. If you need other files, please let me know; I'm happy to share them.

logs-cilium-t2s6r-cilium-agent-20211209-101524.log
logs-cilium-t2s6r-mount-cgroup-20211209-101524.log
logs-cilium-vpzxs-cilium-agent-20211209-101524.log
logs-cilium-vpzxs-mount-cgroup-20211209-101524.log

vincentmli (Contributor) commented:

@Charlottell If you can provide a full Cilium sysdump rather than just the cilium-agent logs, that would be great. The Cilium CLI can collect one with cilium sysdump, or you can run https://github.com/cilium/cilium-sysdump to collect it. When I troubleshoot throughput issues, I usually take a sample capture with tcpdump during the test; from the capture you can tell whether TSO (and possibly LRO/GRO) is in play, since these settings can affect throughput. See the sketch below.
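
A minimal sketch of such a capture, assuming the node's physical NIC is ens3 as in the outputs later in this issue; segment lengths in the capture well above the 1500-byte MTU indicate that TSO/GRO are in effect:

# Capture a short sample on the node NIC while the netperf test is running.
tcpdump -ni ens3 -c 200 -w /tmp/ens3.pcap tcp
# Inspect the reported lengths; values far above the MTU are aggregated TSO/GRO segments.
tcpdump -nn -r /tmp/ens3.pcap | head -20
# Offload settings themselves can be checked (and toggled) with ethtool.
ethtool -k ens3 | grep -E 'tcp-segmentation-offload|generic-receive-offload|large-receive-offload'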

Charlottell (Author) commented:

> @Charlottell If you can provide a full Cilium sysdump rather than just the cilium-agent logs, that would be great. The Cilium CLI can collect one with cilium sysdump, or you can run https://github.com/cilium/cilium-sysdump to collect it. When I troubleshoot throughput issues, I usually take a sample capture with tcpdump during the test; from the capture you can tell whether TSO (and possibly LRO/GRO) is in play, since these settings can affect throughput.

@vincentmli @pchaigno I used cilium sysdump to collect the data, but the resulting archive was too big and the upload kept failing, so I split it into the following files and uploaded them separately.
cilium-sysdump-20211209-141103.zip
bugtool-cilium-9mfhf-20211209-141114.zip
bugtool-cilium-ddq52-20211209-141114.zip
bugtool-cilium-n8gj7-20211209-141114.zip
bugtool-cilium-s855p-20211209-141114.zip
bugtool-cilium-x9xht-20211209-141114.zip

I also checked the TSO, LRO, and GRO status for the Cilium and host interfaces:

ethtool -k cilium_host | grep -E "tcp-segmentation-offload|large-receive-offload|generic-receive-offload"
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
ethtool -k ens3 | grep -E "tcp-segmentation-offload|large-receive-offload|generic-receive-offload"
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off

pchaigno (Member) commented Dec 9, 2021

What tunneling protocol are you using in Calico's case? If VXLAN, what UDP port is it using?

vincentmli (Contributor) commented:

@Charlottell From the sysdump it looks like you are running VMs in Alibaba Cloud with Cilium's VXLAN tunnel, so the netperf traffic goes through the VXLAN tunnel. It is not clear where the bottleneck is based on the sysdump alone, though. For performance/throughput issues I sometimes run perf top just to get an idea of which kernel functions are the busiest under load; that can give a clue about where the bottleneck might be. See the perf tool documentation at https://www.brendangregg.com/perf.html. You can also run perf top under Calico so you can compare the output; a rough sketch follows.
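
A minimal sketch of that workflow, run on the node hosting the netperf server while the TCP_STREAM test is in progress:

# Live view of the hottest kernel and user functions, with call graphs.
perf top -g
# Or record a fixed-length, system-wide profile so the Cilium and Calico runs
# can be compared offline with perf report.
perf record -F 99 -a -g -- sleep 60
perf report --stdio | head -50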

Charlottell (Author) commented:

@pchaigno @vincentmli Calico uses BGP. I tried deploying Cilium without VXLAN; the result is as follows:

helm install  cilium cilium  --namespace kube-system --set tunnel=disabled --set autoDirectNodeRoutes=true --set kubeProxyReplacement=strict --set loadBalancer.mode=dsr --set nativeRoutingCIDR=10.0.0.0/8 --set ipam.operator.clusterPoolIPv4PodCIDR=10.0.0.0/8 --set ipam.operator.clusterPoolIPv4MaskSize=26  --set k8sServiceHost=192.168.122.111  --set k8sServicePort=6443 --debug
root@master1:~# kubectl get pod -owide
NAME                                      READY   STATUS              RESTARTS   AGE     IP           NODE      NOMINATED NODE   READINESS GATES
netperf-client-master2-67bb94cf68-xll4q   1/1     Running             0          83s     10.0.4.29    master2   <none>           <none>
netperf-server-747765686-6jctj            1/1     Running             0          2m14s   10.0.0.7     master1   <none>           <none>

root@master1:~# kubectl exec -it netperf-client-master2-67bb94cf68-xll4q bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# netperf -H 10.0.0.7 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.7 (10.0.0.) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    3964.70

Calico uses the local network card ens3 and BGP, so I also tried deploying Cilium with the local network card and BGP:

helm install  cilium cilium  --namespace kube-system --set tunnel=disabled --set devices=ens3 --set kubeProxyReplacement=strict --set loadBalancer.mode=dsr --set nativeRoutingCIDR=10.0.0.0/8 --set ipam.operator.clusterPoolIPv4PodCIDR=10.0.0.0/8 --set ipam.operator.clusterPoolIPv4MaskSize=26  --set k8sServiceHost=192.168.122.111  --set k8sServicePort=6443 --set bgp.enabled=true --set bgp.announce.loadbalancerIP=true --debug
root@master1:~# kubectl get pod -owide | grep net
netperf-client-master2-67bb94cf68-mrdq8   1/1     Running   0          61m     10.0.4.209   master2   <none>           <none>
netperf-server-747765686-x64gw            1/1     Running   0          61m     10.0.0.45    master1   <none>           <none>

root@master1:~# kubectl exec -it netperf-client-master2-67bb94cf68-mrdq8 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# netperf -H 10.0.0.45 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.45 (10.0.0.) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.29    3878.94

Calico also supports eBPF, so I deployed it in eBPF mode and tested:

kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  kubernetes_service_host: "192.168.122.111"
  kubernetes_service_port: "6443"

# Try out DSR mode
kubectl set env -n kube-system ds/calico-node FELIX_BPFExternalServiceMode="DSR"
# Disable kube-proxy
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'
kubectl set env -n kube-system ds/calico-node FELIX_BPFENABLED="true"

root@master1:~# kubectl exec -it netperf-client-master2-67bb94cf68-7d8fx bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# netperf -H 172.16.137.65 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.137.65 (172.16.) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    4057.95

There may be some parameter settings I don't know about that are causing the gap between Cilium and Calico.

vincentmli (Contributor) commented:

@Charlottell I noticed you are running Cilium with debug on; I don't think that makes much difference for the datapath. I have no experience with Calico myself. Again, when I run out of ideas on a performance issue, I use the Linux perf tool mentioned earlier and run perf top to list kernel function cycles. I'm not sure whether Linux perf can profile BPF programs; I recall Cilium itself has a performance-profiling debug option, though I'm not sure it would help compare Calico and Cilium.

jtaleric (Member) commented:

Can you run the test 3 separate times with each CNI, relaunching the pods to show the variance between executions?

Can you describe the environment you are running on, i.e. dual socket, 10 Gbps NIC, any tuning applied?
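
A hypothetical helper for such repeated runs; the Deployment name, pod label, and server IP below are taken from or inferred from the examples earlier in this issue and would need to be adjusted:

#!/bin/bash
# Relaunch the netperf client pod and re-run the TCP_STREAM test three times.
SERVER_IP=10.0.0.123              # netperf-server pod IP from the first comment
DEPLOY=netperf-client-master2     # client Deployment name (assumed from the pod names above)
for i in 1 2 3; do
  kubectl rollout restart deployment/$DEPLOY
  kubectl rollout status deployment/$DEPLOY
  CLIENT=$(kubectl get pod -l app=netperf-client -o jsonpath='{.items[0].metadata.name}')  # label assumed
  kubectl exec "$CLIENT" -- netperf -H "$SERVER_IP" -t TCP_STREAM -l 60
done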

Charlottell (Author) commented:

@vincentmli It took a long time, but the perf tool has been installed successfully.
This is the perf top result when I deployed Cilium:
(screenshot: perf top under Cilium)
This is the perf top result when I deployed Calico:
(screenshot: perf top under Calico)

vincentmli (Contributor) commented Dec 16, 2021

@Charlottell Thanks for the perf top output. It looks like Cilium is slightly higher than Calico for the same kernel functions; maybe that's because the number of samples differs (251k vs 147k)? You could try something like perf record -F 99 -g -- sleep 120 for 2 minutes under both Calico and Cilium (so you get a comparable number of samples), then run perf report to show the result. Anyway, I am not a perf expert; check https://www.brendangregg.com/perf.html, I can't make much sense out of the result :). It looks like both Cilium and Calico are going through iptables, since ipt_do_table showed up.

Another option is to run the test with iperf; sometimes different performance tools show different results, so it is worth confirming whether another tool reproduces the gap. A sketch with iperf3 follows.
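
A sketch of an equivalent iperf3 run between the two pods from the first comment, assuming the pod images ship iperf3:

# Start an iperf3 server in one pod (leave it running in this terminal)...
kubectl exec -it netperf-server-747765686-9bjn9 -- iperf3 -s
# ...and from the other pod drive a single 60-second TCP stream against its pod IP.
kubectl exec -it netperf-client-master2-67bb94cf68-5cpcc -- iperf3 -c 10.0.0.123 -t 60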

Charlottell (Author) commented:

> Can you run the test 3 separate times with each CNI, relaunching the pods to show the variance between executions?
>
> Can you describe the environment you are running on, i.e. dual socket, 10 Gbps NIC, any tuning applied?

@jtaleric I tested more than three times, relaunching the pods each time. Because Cilium doesn't use iptables, I suspected the iptables rules generated by Services might be affecting Calico's performance, so I created 1000, 3000, and 5000 Services and then tested again.

I deployed the 1000 Services in the following way:

kind: Pod
apiVersion: v1
metadata:
  name: test-nginx
  labels:
    app: nginx
spec:
  nodeSelector:
    kubernetes.io/hostname: master1
  containers:
  - name: test-nginx
    image: nginx:v1.17.5
  tolerations:
  - key: pro
    operator: Equal
    value: master
    effect: NoSchedule

The Services were created with this script:

#!/bin/bash
for i in {1..1000}; do  
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc-$i
spec:
  ports:
  - port: 6788
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
EOF
done

root@master1:~# kubectl get svc | grep nginx | wc -l
1000

root@master1:~# kubectl get svc -A | wc -l
1004

Calico netperf:

root@master1:~# kubectl exec -it netperf-server-747765686-45wn8 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# netperf -H 100.101.208.6 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.101.208.6 (100.101) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    6673.69

Cilium netperf:

root@master1:~# kubectl exec -it netperf-client-master2-67f545c4df-2756j bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# netperf -H 100.101.4.100 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.101.4.100 (100.101) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    5159.85

With 3000 Services deployed:

root@master1:~# kubectl get svc | grep nginx | wc -l
3000
root@master1:~# kubectl get svc -A | wc -l
3004

Calico netperf:

bash-5.1# netperf -H 100.101.161.16 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.101.161.16 (100.101) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    5768.29

Cilium netperf:

bash-5.1# netperf -H 100.101.4.37 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.101.4.37 (100.101) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    5375.56

With 5000 Services deployed:

root@master1:~# kubectl get svc | grep nginx | wc -l
5000
root@master1:~# kubectl get svc -A | wc -l
5004

Calico netperf:

bash-5.1# netperf -H 100.101.161.20 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.101.161.20 (100.101) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    5755.90

Cilium netperf:

bash-5.1# netperf -H 100.101.4.62 -t TCP_STREAM -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.101.4.62 (100.101) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01    5438.81

As for the other environment details you mentioned, I'm not sure how to confirm them. Calico and Cilium were tested on the same virtual machines.

Charlottell (Author) commented:

> @Charlottell Thanks for the perf top output. It looks like Cilium is slightly higher than Calico for the same kernel functions; maybe that's because the number of samples differs (251k vs 147k)? You could try something like perf record -F 99 -g -- sleep 120 for 2 minutes under both Calico and Cilium (so you get a comparable number of samples), then run perf report to show the result. Anyway, I am not a perf expert; check https://www.brendangregg.com/perf.html, I can't make much sense out of the result :). It looks like both Cilium and Calico are going through iptables, since ipt_do_table showed up.
>
> Another option is to run the test with iperf; sometimes different performance tools show different results, so it is worth confirming whether another tool reproduces the gap.

Since perf took up a lot of resources on the virtual machine last time, I did not reinstall it after I rebuilt the VM. I also tested with qperf.

The Cilium parameters are set as follows:

root@master1:~# kubectl get cm -nkube-system cilium-config -o yaml
apiVersion: v1
data:
  auto-direct-node-routes: "true"
  bpf-lb-external-clusterip: "false"
  bpf-lb-map-max: "65536"
  bpf-lb-mode: dsr
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  cgroup-root: /run/cilium/cgroupv2
  cilium-endpoint-gc-interval: 5m0s
  cluster-id: ""
  cluster-name: default
  cluster-pool-ipv4-cidr: 100.101.0.0/16
  cluster-pool-ipv4-mask-size: "24"
  custom-cni-conf: "false"
  debug: "false"
  devices: ens3
  disable-cnp-status-updates: "true"
  enable-auto-protect-node-port-range: "true"
  enable-bandwidth-manager: "true"
  enable-bpf-clock-probe: "true"
  enable-bpf-masquerade: "true"
  enable-endpoint-health-checking: "true"
  enable-health-check-nodeport: "true"
  enable-health-checking: "true"
  enable-hubble: "true"
  enable-ipv4: "true"
  enable-ipv4-masquerade: "true"
  enable-ipv6: "false"
  enable-ipv6-masquerade: "true"
  enable-l2-neigh-discovery: "true"
  enable-l7-proxy: "false"
  enable-local-redirect-policy: "false"
  enable-policy: default
  enable-remote-node-identity: "true"
  enable-session-affinity: "true"
  enable-well-known-identities: "false"
  enable-xt-socket-fallback: "true"
  hubble-disable-tls: "false"
  hubble-listen-address: :4244
  hubble-socket-path: /var/run/cilium/hubble.sock
  hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
  hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
  hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
  identity-allocation-mode: crd
  install-iptables-rules: "false"
  install-no-conntrack-iptables-rules: "false"
  ipam: cluster-pool
  kube-proxy-replacement: strict
  kube-proxy-replacement-healthz-bind-address: ""
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  mtu: "1500"
  native-routing-cidr: 100.101.0.0/16
  node-port-bind-protection: "true"
  operator-api-serve-addr: 127.0.0.1:9234
  preallocate-bpf-maps: "false"
  sidecar-istio-proxy-image: cilium/istio_proxy
  tunnel: disabled

When I used Cilium, I flushed all iptables rules with the following commands:

iptables -F -t raw
iptables -F -t mangle
iptables -F -t nat
iptables -F -t filter

Calico qperf results:

root@master1:~# kubectl exec -it qperf-client-master2-789ff8bddf-dxg52 /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # qperf -t 10 100.101.161.19 tcp_bw tcp_lat udp_bw udp_lat
tcp_bw:
    bw  =  702 MB/sec
tcp_lat:
    latency  =  77.5 us
udp_bw:
    send_bw  =  355 MB/sec
    recv_bw  =  145 MB/sec
udp_lat:
    latency  =  70.9 us

Cilium qperf results:

root@master1:~# kubectl exec -it qperf-client-master2-789ff8bddf-9vb7l sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # qperf -t 10 100.101.4.13 tcp_bw tcp_lat udp_bw udp_lat
tcp_bw:
    bw  =  597 MB/sec
tcp_lat:
    latency  =  169 us
udp_bw:
    send_bw  =  216 MB/sec
    recv_bw  =  100 MB/sec
udp_lat:
    latency  =  154 us

vincentmli (Contributor) commented:

@Charlottell If you can manage to produce a flamegraph like the one in https://cilium.io/blog/2021/05/11/cni-benchmark#flamegraph and compare it against that sample flamegraph, it might show some clues; see the sketch below.
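
A rough recipe for producing such a flamegraph, assuming perf is available on the node and Brendan Gregg's FlameGraph scripts are cloned to ~/FlameGraph:

# Profile the node system-wide while the netperf test is running.
perf record -F 99 -a -g -- sleep 60
perf script > out.perf
# Fold the stacks and render an SVG; repeat under Calico and compare the two SVGs.
~/FlameGraph/stackcollapse-perf.pl out.perf > out.folded
~/FlameGraph/flamegraph.pl out.folded > cilium-flamegraph.svg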

borkmann (Member) commented:

@Charlottell Hm, the 5.4.61 stable kernel is rather old. Did you try with a recent stable kernel?

For example, for VXLAN, it might be missing https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89e5c58fc1e2857ccdaae506fb8bc5fed57ee063.

jtaleric (Member) commented Jan 12, 2022

Just to provide some context here: this is what I am seeing on EKS.

Single TCP stream across two nodes, 5 iterations, 60-second runs, showing the average throughput observed. Y is Mbps, X is different packet sizes.

(chart: TCP Stream - Single Thread - 5 iterations)

aanm added the sig/datapath label and removed the needs/triage label on Jan 12, 2022.
aanm (Member) commented Jan 17, 2022

@Charlottell were you able to replicate the results shared in the previous reply?

zadunn commented Jan 24, 2022

@Charlottell - Were you able to improve your performance? I'd love to know as we've hit a similar issue.

lliu0947 commented:

Were you able to improve your performance? I'd love to know as we've hit a similar issue.
