clustermesh: traffic continues to be routed to unresponsive cluster #20729
Labels: kind/bug, needs/triage, sig/datapath, stale
Is there an existing issue for this?
What happened?
Two clusters are joined in a mesh, created with the kind tool (repo). The Cilium clustermesh connectivity tests passed. Both clusters have HTTP echo server deployments, exposed using a Kubernetes service with the io.cilium/global-service: "true" annotation. Clients run inside each cluster and send requests to the service, similar to how the x-wings access the rebel-base in the examples.
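For readers reproducing this, a minimal sketch of such a service follows. Only the annotation is taken from this report; the service name, namespace, labels, and ports are hypothetical placeholders, and the same manifest would need to be applied in both clusters, since Cilium matches global services by identical name and namespace.

```yaml
# Sketch only: name, namespace, selector, and ports are assumptions,
# not taken from the reporter's setup.
apiVersion: v1
kind: Service
metadata:
  name: echo            # hypothetical; must be identical in both clusters
  namespace: default    # hypothetical; must be identical in both clusters
  annotations:
    io.cilium/global-service: "true"   # annotation quoted in this report
spec:
  type: ClusterIP
  selector:
    app: echo           # hypothetical label on the echo server pods
  ports:
    - port: 80          # hypothetical service port
      targetPort: 8080  # hypothetical container port
```

With a service like this in both clusters, requests to it from either cluster are load-balanced across the backends of both, which is why an unresponsive remote cluster shows up as client timeouts.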
Cluster1 cilium status --verbose output
Cluster1 cilium-health status --probe output
Cluster2 cilium status --verbose output
Cluster2 cilium-health status --probe output
The problem is that if one of the clusters becomes unresponsive (docker pause on the cluster1 nodes in this case), Cilium continues routing traffic to that cluster. Clients start getting HTTP timeout errors.
Health probe is not working:
Cluster2 cilium status --verbose output after cluster1 outage
Cilium Version
Kernel Version
Linux 5.15.0-41-generic
Kubernetes Version
Sysdump
cilium-sysdump-20220801-111040.zip
Relevant log output
No response
Anything else?
No response
Code of Conduct