EKS - Random connection reset by peer #21853
Labels
kind/bug
needs/triage
sig/datapath
stale
What happened?
Hello!
After running Cilium on EKS for 3 months, we have noticed random issues that correlate with network failures in Kubernetes. We are running Cilium alongside kube-proxy. The issue appears as a `Connection reset by peer` error. I looked for suggestions in https://docs.cilium.io/en/stable/operations/performance/tuning/ but only kube-proxy-less scenarios are described there.
It might be correlated with the OS conntrack, which reports failures when inserting packets (the `insert_failed` field). The NAT table size seems to be big enough. We tried to debug it on our own, but it seems that there are two layers of conntrack: one at the OS level and one in the Cilium CNI. I tried increasing the NAT table size, but there were no signs of `nf_conntrack: table full, dropping packets` in `dmesg`.
I would kindly ask for guidance on increasing Cilium reliability.
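For anyone trying to reproduce the OS-level observation, here is a minimal sketch of how the `insert_failed` counter could be totalled on a node. It assumes the kernel exposes `/proc/net/stat/nf_conntrack` in its usual layout (a header row of column names followed by one row of hexadecimal counters per CPU); the function name `sum_conntrack_counters` and the choice of counters are illustrative and not part of Cilium or any existing tool. It only covers the kernel conntrack table, not Cilium's own BPF conntrack maps.

```python
from pathlib import Path

# Assumed location of the per-CPU netfilter conntrack statistics.
STAT_PATH = Path("/proc/net/stat/nf_conntrack")


def sum_conntrack_counters(text, names=("insert_failed", "drop", "early_drop")):
    """Sum selected per-CPU counters from nf_conntrack stat text.

    The first non-empty line is treated as the header; every following
    line is one CPU's counters, written in hexadecimal.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    totals = {name: 0 for name in names if name in header}
    for row in rows:
        # Look columns up by name so a different column order still works.
        for name, value in zip(header, row):
            if name in totals:
                totals[name] += int(value, 16)
    return totals


if __name__ == "__main__":
    if STAT_PATH.exists():
        for name, total in sum_conntrack_counters(STAT_PATH.read_text()).items():
            print(f"{name}: {total}")
    else:
        print(f"{STAT_PATH} not found (is the nf_conntrack module loaded?)")
```

Running it twice a few minutes apart and comparing the totals shows whether `insert_failed` is still growing while the resets occur. `conntrack -S` from conntrack-tools reports the same counters per CPU.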
Cilium Version
v1.11.9
Kernel Version
uname -a
Linux ip-XXXX.eu-central-1.compute.internal 5.4.204-113.362.amzn2.x86_64 #1 SMP Wed Jul 13 21:34:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.10-eks-15b7512", GitCommit:"cd6399691d9b1fed9ec20c9c5e82f5993c3f42cb", GitTreeState:"clean", BuildDate:"2022-08-31T19:17:01Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
No response