Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server,kv: ensure decommission works even when there's just a status record #70863

Closed
knz opened this issue Sep 29, 2021 · 5 comments
Closed
Assignees
Labels
A-cli-admin CLI commands that pertain to controlling and configuring nodes A-kv-server Relating to the KV-level RPC server C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team T-server-and-security DB Server & Security

Comments

@knz
Copy link
Contributor

knz commented Sep 29, 2021

Some clusters have gotten into a state where there is a status record for a node but no liveness record.

When this happens, the node shows up in various places, but the node decommission command does not work (the deco process wants to know the liveness status before it touches the record).

We want to introduce a special case in decommission to "clean up" these stray node records.

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-server Relating to the KV-level RPC server A-cli-admin CLI commands that pertain to controlling and configuring nodes labels Sep 29, 2021
@knz knz added this to To do in DB Server & Security via automation Sep 29, 2021
@knz knz added this to Incoming in KV via automation Sep 29, 2021
@blathers-crl blathers-crl bot added T-server-and-security DB Server & Security T-kv KV Team labels Sep 29, 2021
@knz
Copy link
Contributor Author

knz commented Sep 29, 2021

Erik proposes a more general KV edit tool in #70863, however I think we should still do this.

@tbg
Copy link
Member

tbg commented Sep 29, 2021

This shouldn't be possible any more on current CRDB versions (21.1+) as the liveness record is written before a new node even learns its assigned NodeID (and before it can possibly write a status record). So I don't think it's very useful to let decommissioning handle this case, as this would cause the reverse problem: it could create liveness records with user-defined NodeIDs, and users shouldn't be able to do this. I would rather go with #70863.

@knz
Copy link
Contributor Author

knz commented Sep 29, 2021

The suggestion is to have deco remove the stray status record, not synthetize a new liveness record.

@tbg
Copy link
Member

tbg commented Sep 29, 2021

Thanks, that is a good idea.

@erikgrinaker erikgrinaker self-assigned this Sep 29, 2021
@lunevalex lunevalex moved this from Incoming to Current Milestone in KV Oct 1, 2021
@erikgrinaker
Copy link
Contributor

erikgrinaker commented Oct 2, 2021

There has been some pushback against this, as it would require relaxing the checks around decommissioning, so I'm going to close the issue. The current code should already make sure that status entries are handled properly on node creation and decommissioning, this state is a result of a bug in a previous version. I'm going to ship a custom binary to the affected users to clean up their status entry.

The reason this even comes up at all is because we have APIs that return node data based solely on the status entry, and these results are then shown in e.g. the DB Console. If these APIs joined/filtered the status data against the KV liveness data, such that only nodes with a liveness entry would be returned, users would not even know about this state at all. I opened #71033 to address this.

KV automation moved this from Current Milestone to Closed Oct 2, 2021
DB Server & Security automation moved this from To do to Done 21.2 Oct 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cli-admin CLI commands that pertain to controlling and configuring nodes A-kv-server Relating to the KV-level RPC server C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team T-server-and-security DB Server & Security
Projects
DB Server & Security
  
Done 21.2
Development

Successfully merging a pull request may close this issue.

3 participants