
Inconsistent view of nodes across agents #11532

Closed
diversario opened this issue May 14, 2020 · 7 comments · Fixed by #12989
Labels
area/health Relates to the cilium-health component kind/community-report This was reported by a user in the Cilium community, eg via Slack.

@diversario (Contributor)

Bug report

  • Cilium version (run cilium version)
Client: 1.7.3 952090308 2020-04-29T15:29:53-07:00 go version go1.13.10 linux/amd64
Daemon: 1.7.3 952090308 2020-04-29T15:29:53-07:00 go version go1.13.10 linux/amd64
  • Kernel version (run uname -a)
Linux ip-10-101-8-142 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, Mesos, ...)
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:24:23Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • Upload a system dump

Will be passed to a maintainer privately.

How to reproduce the issue

I don't know the exact steps, but what likely led to this was a constantly rolling cluster: nodes were continuously being replaced by kops rolling-update over the course of 7 days (we were running a resiliency test). After a couple of days I noticed that one agent was reporting cilium_unreachable_nodes=1. On investigation, I found that this agent was still listing a node that had been terminated more than 9 hours earlier, both in cilium-health status and in the node list:

root@ip-10-101-12-162:~# cilium-health status | grep -A4 'ip-10-101-20-58.ec2.internal'
  ip-10-101-20-58.ec2.internal:
    Endpoint connectivity to 100.67.34.21:
      ICMP to stack:   Connection timed out
      HTTP to agent:   Get http://100.67.34.21:4240/hello: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  ip-10-101-21-110.ec2.internal:
root@ip-10-101-12-162:~# cilium node list | grep '58'
ip-10-101-22-225.ec2.internal   10.101.22.225   100.67.58.0/24
aanm (Member) commented May 15, 2020

This seems a little bit similar to #11511. @seanpowell-f5, is the node with connectivity issues still alive in the cluster?

aanm added the area/health and kind/community-report labels on May 15, 2020
diversario (Contributor, Author) commented May 22, 2020

This seems to be related to high node churn. In a cluster that just went through large node churn (30 -> 700 -> 260 nodes) and large pod churn (3000 -> 27000 -> 3000 pods), many Cilium agents report more unreachable nodes than the actual in-cluster node count:

503 unreachable nodes [metrics screenshot omitted], but only 258 nodes in the cluster [screenshot omitted].

I went to get some more information about what the Cilium agents think:

➜ kubectl exec -it cilium-mtbjj -- bash
root@ip-10-101-8-221:~# cilium status
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.16 (v1.16.9) [linux/amd64]
Kubernetes APIs:        ["CustomResourceDefinition", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Partial   []
Cilium:                 Ok        OK
NodeMonitor:            Listening for events on 8 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok
IPAM:                   IPv4: 7/255 allocated from 100.65.218.0/24,
Controller Status:      34/34 healthy
Proxy Status:           OK, ip 100.65.218.10, 0 redirects active on ports 10000-20000
Cluster health:                              1/503 reachable   (2020-05-22T10:36:40Z)
  Name                                       IP                Reachable   Endpoints reachable
  ip-10-101-8-221.ec2.internal (localhost)   10.101.8.221      false       false
  ip-10-101-0-11.ec2.internal                10.101.0.11       false       false
  ip-10-101-0-112.ec2.internal               10.101.0.112      false       false
  ip-10-101-0-115.ec2.internal               10.101.0.115      false       false
  ip-10-101-0-120.ec2.internal               10.101.0.120      false       false
  ip-10-101-0-139.ec2.internal               10.101.0.139      false       false
  ip-10-101-0-140.ec2.internal               10.101.0.140      false       false
  ip-10-101-0-165.ec2.internal               10.101.0.165      false       false
  ip-10-101-0-166.ec2.internal               10.101.0.166      false       false
  ip-10-101-0-17.ec2.internal                10.101.0.17       false       false
  ip-10-101-0-179.ec2.internal               10.101.0.179      false       false
  ...
root@ip-10-101-8-221:~#
root@ip-10-101-8-221:~# cilium-health status | head -n 100
Probe time:   2020-05-22T10:36:40Z
Nodes:
  ip-10-101-8-221.ec2.internal (localhost):
    Host connectivity to 10.101.8.221:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.65.218.93:
      ICMP to stack:   Connection timed out
  ip-10-101-0-11.ec2.internal:
    Host connectivity to 10.101.0.11:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.112.39:
      ICMP to stack:   Connection timed out
  ip-10-101-0-112.ec2.internal:
    Host connectivity to 10.101.0.112:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.65.123.137:
      ICMP to stack:   Connection timed out
  ip-10-101-0-115.ec2.internal:
    Host connectivity to 10.101.0.115:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.65.91.161:
      ICMP to stack:   Connection timed out
  ip-10-101-0-120.ec2.internal:
    Host connectivity to 10.101.0.120:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.225.93:
      ICMP to stack:   Connection timed out
  ip-10-101-0-139.ec2.internal:
    Host connectivity to 10.101.0.139:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.99.41:
      ICMP to stack:   OK, RTT=18.827537ms
  ip-10-101-0-140.ec2.internal:
    Host connectivity to 10.101.0.140:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.40.88:
      ICMP to stack:   Connection timed out
  ip-10-101-0-165.ec2.internal:
    Host connectivity to 10.101.0.165:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.56.174:
      ICMP to stack:   Connection timed out
  ip-10-101-0-166.ec2.internal:
    Host connectivity to 10.101.0.166:
      ICMP to stack:   OK, RTT=19.466825ms
    Endpoint connectivity to 100.64.117.45:
      ICMP to stack:   Connection timed out
  ip-10-101-0-17.ec2.internal:
    Host connectivity to 10.101.0.17:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.65.26.64:
      ICMP to stack:   Connection timed out
  ip-10-101-0-179.ec2.internal:
    Host connectivity to 10.101.0.179:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.9.28:
      ICMP to stack:   Connection timed out
  ip-10-101-0-181.ec2.internal:
    Host connectivity to 10.101.0.181:
      ICMP to stack:   OK, RTT=17.876896ms
    Endpoint connectivity to 100.65.48.219:
      ICMP to stack:   OK, RTT=18.540522ms
  ip-10-101-0-183.ec2.internal:
    Host connectivity to 10.101.0.183:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.23.55:
      ICMP to stack:   Connection timed out
  ip-10-101-0-184.ec2.internal:
    Host connectivity to 10.101.0.184:
      ICMP to stack:   OK, RTT=18.836192ms
    Endpoint connectivity to 100.64.78.200:
      ICMP to stack:   OK, RTT=19.608865ms
...

Looking up instances listed in the above output:

root@ip-10-101-8-221:~# cilium node list | egrep 'ip-10-101-8-221.ec2.internal|ip-10-101-0-11.ec2.internal|ip-10-101-0-112.ec2.internal|ip-10-101-0-115.ec2.internal|ip-10-101-0-120.ec2.internal|ip-10-101-0-139.ec2.internal|ip-10-101-0-140.ec2.internal|ip-10-101-0-165.ec2.internal|ip-10-101-0-166.ec2.internal|ip-10-101-0-17.ec2.internal|ip-10-101-0-179.ec2.internal|ip-10-101-0-181.ec2.internal|ip-10-101-0-183.ec2.internal|ip-10-101-0-184.ec2.internal'
ip-10-101-0-17.ec2.internal     10.101.0.17     100.65.26.0/24
ip-10-101-0-181.ec2.internal    10.101.0.181    100.65.48.0/24
ip-10-101-0-183.ec2.internal    10.101.0.183    100.64.23.0/24
ip-10-101-8-221.ec2.internal    10.101.8.221    100.65.218.0/24
➜ kubectl get ciliumnode ip-10-101-8-221.ec2.internal ip-10-101-0-11.ec2.internal ip-10-101-0-112.ec2.internal ip-10-101-0-115.ec2.internal ip-10-101-0-120.ec2.internal ip-10-101-0-139.ec2.internal ip-10-101-0-140.ec2.internal ip-10-101-0-165.ec2.internal ip-10-101-0-166.ec2.internal ip-10-101-0-17.ec2.internal ip-10-101-0-179.ec2.internal ip-10-101-0-181.ec2.internal ip-10-101-0-183.ec2.internal ip-10-101-0-184.ec2.internal
NAME                           AGE
ip-10-101-8-221.ec2.internal   3h42m
ip-10-101-0-17.ec2.internal    3h40m
ip-10-101-0-181.ec2.internal   3h40m
ip-10-101-0-183.ec2.internal   4h4m
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-11.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-112.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-115.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-120.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-139.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-140.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-165.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-166.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-179.ec2.internal" not found
Error from server (NotFound): ciliumnodes.cilium.io "ip-10-101-0-184.ec2.internal" not found
➜ kubectl get node ip-10-101-8-221.ec2.internal ip-10-101-0-11.ec2.internal ip-10-101-0-112.ec2.internal ip-10-101-0-115.ec2.internal ip-10-101-0-120.ec2.internal ip-10-101-0-139.ec2.internal ip-10-101-0-140.ec2.internal ip-10-101-0-165.ec2.internal ip-10-101-0-166.ec2.internal ip-10-101-0-17.ec2.internal ip-10-101-0-179.ec2.internal ip-10-101-0-181.ec2.internal ip-10-101-0-183.ec2.internal ip-10-101-0-184.ec2.internal
NAME                           STATUS   ROLES   AGE     VERSION
ip-10-101-8-221.ec2.internal   Ready    node    3h41m   v1.16.9
ip-10-101-0-17.ec2.internal    Ready    node    4h1m    v1.16.9
ip-10-101-0-181.ec2.internal   Ready    node    4h1m    v1.16.9
ip-10-101-0-183.ec2.internal   Ready    node    4h3m    v1.16.9
Error from server (NotFound): nodes "ip-10-101-0-11.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-112.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-115.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-120.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-139.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-140.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-165.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-166.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-179.ec2.internal" not found
Error from server (NotFound): nodes "ip-10-101-0-184.ec2.internal" not found

The instances that do appear in ciliumnodes and in kubectl get node do indeed exist in EC2.

What's really bizarre here is that cilium-health status shows some dead instances as responding OK:

  ip-10-101-0-139.ec2.internal:
    Host connectivity to 10.101.0.139:
      ICMP to stack:   Connection timed out
    Endpoint connectivity to 100.64.99.41:
      ICMP to stack:   OK, RTT=18.827537ms

ip-10-101-0-139.ec2.internal was double-checked and confirmed to no longer exist.

Also:

root@ip-10-101-8-221:~# cilium-health status | grep '.ec2.internal' | wc -l
503
root@ip-10-101-8-221:~# cilium node list | wc -l
259

@diversario (Contributor, Author)

The agent from the previous post appears to have fixed itself [metrics screenshot omitted].

This reconciliation took ~3.5 hours.

stale bot commented Jul 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

The stale bot added the stale label on Jul 23, 2020.

trynity commented Jul 28, 2020

Both this and the other linked issue have been, or are about to be, marked as stale. Is there any further insight we could get on this, @aanm?

The stale bot removed the stale label on Jul 28, 2020.

aanm commented Jul 28, 2020

@trynity It would be helpful to have steps to replicate the issue.

tgraf commented Jul 29, 2020

@trynity What version of Cilium did you experience this with?

kkourt added a commit to kkourt/cilium that referenced this issue Aug 27, 2020
NodeAdd and NodeUpdate update the node state for clients so that they
can return the changes when a client requests them. If a node was added
and then updated, both its old and new versions would be on the added
list and its old version on the removed list. Instead, we can just
update the node in place on the added list.

Note that the setNodes() function in pkg/health/server/prober.go first
deletes the removed nodes and then adds the new ones, which means that
the old version of the node would be re-added and remain stale on the
health server.

This was found while investigating inconsistent health reports when
nodes are added to or removed from the cluster (e.g., cilium#11532), and
it seems to fix the inconsistencies observed in a small-scale test I did
to reproduce the issue.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
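
To make the mechanism described above concrete, here is a minimal, self-contained Go sketch. It is not the actual Cilium code: the node type, the keying scheme, the setNodes signature, and the example new IP are illustrative assumptions, but the delete-then-add ordering and the duplicated entry on the added list follow the behaviour the commit message describes.

package main

import "fmt"

// node is a simplified stand-in for the health server's per-node record;
// the real Cilium structures differ.
type node struct {
    name string
    ip   string
}

// key builds the map key. Keying by name+IP makes a leftover old version
// show up as a separate stale entry, which is the symptom reported above.
func key(n node) string { return n.name + "/" + n.ip }

// setNodes mimics the ordering described in the commit message: removed
// nodes are deleted first, then added nodes are inserted.
func setNodes(state map[string]node, added, removed []node) {
    for _, n := range removed {
        delete(state, key(n))
    }
    for _, n := range added {
        state[key(n)] = n
    }
}

func main() {
    oldN := node{name: "ip-10-101-0-139.ec2.internal", ip: "10.101.0.139"}
    newN := node{name: "ip-10-101-0-139.ec2.internal", ip: "10.101.0.200"} // hypothetical new IP

    // Buggy change set: after an update, both the old and new versions sit
    // on the added list, while only the old version is on the removed list.
    buggy := map[string]node{}
    setNodes(buggy, []node{oldN}, nil)                // initial add
    setNodes(buggy, []node{oldN, newN}, []node{oldN}) // update
    fmt.Printf("buggy state has %d entries\n", len(buggy)) // 2: the old entry is stale

    // With the described fix, the update replaces the node on the added
    // list, so only the new version is (re)inserted and nothing lingers.
    fixed := map[string]node{}
    setNodes(fixed, []node{oldN}, nil)
    setNodes(fixed, []node{newN}, []node{oldN})
    fmt.Printf("fixed state has %d entries\n", len(fixed)) // 1
}

Running the sketch prints 2 entries for the buggy change set and 1 for the fixed one, which mirrors how a terminated node's old entry could linger in cilium-health status while cilium node list had already dropped it.
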
joestringer pushed a commit that referenced this issue Aug 28, 2020
kaworu pushed a commit that referenced this issue Aug 28, 2020 (upstream commit 5550c0f)
joestringer pushed a commit that referenced this issue Aug 28, 2020 (upstream commit 5550c0f)
christarazi pushed a commit that referenced this issue Sep 3, 2020 (upstream commit 5550c0f)
joestringer pushed a commit that referenced this issue Sep 3, 2020 (upstream commit 5550c0f)