server: join node status info against liveness info #71033

Open
erikgrinaker opened this issue Oct 2, 2021 · 1 comment
Labels
A-kv-observability
A-kv-server: Relating to the KV-level RPC server
A-observability-inf
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-23.2-scale-testing: issues found during 23.2 scale testing
O-testcluster: Issues found or occurred on a test cluster, i.e. a long-running internal cluster

Comments

@erikgrinaker
Contributor

erikgrinaker commented Oct 2, 2021

The status server (e.g. via StatusServer.Nodes()) returns node status data regardless of whether a corresponding KV liveness entry exists for the node. As a result, it can return data for nodes that are not considered part of the cluster, namely any node that has a status entry but no liveness entry. We need to filter nodes by their liveness entries so that we don't return data about invalid/unknown nodes, which then show up in e.g. the DB Console. This has been seen on customer clusters.
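The filtering described above could look roughly like the sketch below. It is only an illustration: `NodeID`, `NodeStatus`, `Liveness`, and `filterByLiveness` are simplified, hypothetical stand-ins and not the actual CockroachDB types or status server code.

```go
package main

import "sort"

// NodeID, NodeStatus and Liveness are hypothetical, simplified stand-ins
// for the real types; they only carry what the sketch needs.
type NodeID int32

type NodeStatus struct {
	NodeID  NodeID
	Address string
}

type Liveness struct {
	NodeID NodeID
}

// filterByLiveness keeps only the status entries whose node also has a
// liveness entry, and returns them sorted by node ID. Status entries
// without a liveness entry are dropped as unknown nodes.
func filterByLiveness(statuses []NodeStatus, liveness []Liveness) []NodeStatus {
	known := make(map[NodeID]struct{}, len(liveness))
	for _, l := range liveness {
		known[l.NodeID] = struct{}{}
	}

	var out []NodeStatus
	for _, s := range statuses {
		if _, ok := known[s.NodeID]; ok {
			out = append(out, s)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].NodeID < out[j].NodeID })
	return out
}
```

With status entries for nodes 1, 3, and 9 and liveness entries for nodes 1, 2, and 3, this returns only the entries for nodes 1 and 3; the stale entry for node 9 is not reported.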

Jira issue: CRDB-10364

Epic CRDB-32131

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-server Relating to the KV-level RPC server A-kv-observability T-server-and-security DB Server & Security labels Oct 2, 2021
@erikgrinaker erikgrinaker added this to To do in DB Server & Security via automation Oct 2, 2021
@knz knz added this to Incoming in KV via automation Jun 16, 2022
@knz knz removed this from To do in DB Server & Security Jun 16, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jun 16, 2022
@knz knz added T-kv-observability and removed T-server-and-security DB Server & Security T-kv KV Team labels Jun 16, 2022
@nvanbenschoten nvanbenschoten moved this from Incoming to On Hold in KV Jul 18, 2022
@nvanbenschoten nvanbenschoten removed this from On Hold in KV Jul 18, 2022
@blathers-crl blathers-crl bot added this to Triage in Cluster Observability Mar 16, 2023
@j82w j82w moved this from Triage to Backlog in Cluster Observability Mar 30, 2023
@erikgrinaker
Contributor Author

The inverse is also true: if a node has a liveness entry but no status entry (e.g. because it hasn't fully joined the cluster yet), it won't show up in the status output, but it will still be considered part of the cluster and will e.g. block upgrades.

The liveness entry is the canonical record; the status entry is secondary.
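One way to read "liveness is canonical, status is secondary" is as a left join driven by the liveness entries. The sketch below assumes that interpretation; all names (`JoinedNode`, `joinOnLiveness`, and the simplified `NodeStatus` and `Liveness` types) are hypothetical and not taken from the CockroachDB codebase.

```go
package main

import "sort"

// Hypothetical, simplified stand-ins for the real types.
type NodeID int32

type NodeStatus struct {
	NodeID  NodeID
	Address string
}

type Liveness struct {
	NodeID NodeID
}

// JoinedNode is one row of the join. Every row has a liveness entry;
// Status is nil when the node has no status entry (for example, a node
// that has not fully joined yet).
type JoinedNode struct {
	Liveness Liveness
	Status   *NodeStatus
}

// joinOnLiveness returns one row per liveness entry, sorted by node ID,
// attaching the status entry when one exists. Status entries without a
// liveness entry are dropped, since liveness is treated as canonical.
func joinOnLiveness(liveness []Liveness, statuses []NodeStatus) []JoinedNode {
	byID := make(map[NodeID]NodeStatus, len(statuses))
	for _, s := range statuses {
		byID[s.NodeID] = s
	}

	out := make([]JoinedNode, 0, len(liveness))
	for _, l := range liveness {
		row := JoinedNode{Liveness: l}
		if s, ok := byID[l.NodeID]; ok {
			row.Status = &s // s is a fresh copy scoped to this if statement
		}
		out = append(out, row)
	}
	sort.Slice(out, func(i, j int) bool {
		return out[i].Liveness.NodeID < out[j].Liveness.NodeID
	})
	return out
}
```

A caller can then surface rows with a nil `Status` explicitly (for instance as "no status reported") rather than hiding nodes that still count as cluster members.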

@williamkulju williamkulju added O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster O-23.2-scale-testing issues found during 23.2 scale testing labels Nov 10, 2023
4 participants