RFE: Start nodes without immediately accepting KV or SQL requests #70122

bobvawter · 2021-09-13T13:59:34Z (edited by cockroach-jira-scripts)

This RFE is motivated by a desire to reduce the risk of adding new nodes to production environments, especially those with non-trivial network configurations. Without presupposing an implementation, it would be useful for operators to be able to require that newly added nodes be explicitly activated after they have joined the RPC/gossip mesh, but before they begin accepting KV or SQL requests.

Many of our enterprise customers do not have the luxury of working in flat network topologies, where arbitrary in- or cross-region traffic is guaranteed to "just work". Consider this actual customer scenario:

- Kubernetes pod IPs are not directly reachable, so each pod must have its own dedicated Service, which necessitates the use of the --advertise-addr flag.
- Every network flow between a pair of IPs and/or regions must be accounted for by firewall rules, which are implemented by a separate team within the company.
- The teams that manage the CockroachDB cluster, the k8s configurations, and the network firewalls are disjoint, and coordination between them is high-latency.

These sorts of O(n) or O(n^2) configuration issues would ideally be handled in an automated, repeatable fashion, but that is not the reality in every environment. We have had customers suffer cluster dysfunction due to asymmetric network reachability that could not be tested for without actually launching a new CockroachDB node. Past discussions about a network-quality simulator have uniformly converged on "use CockroachDB itself".
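The O(n) and O(n^2) growth can be made concrete with a small Go sketch. It assumes that every node must reach every other node and that each direction of each node pair needs its own firewall rule; real rule sets may collapse flows (stateful firewalls, CIDR ranges) or multiply them (separate RPC and SQL ports). The function names are illustrative only.

```go
package main

import "fmt"

// directionalFlows returns the number of ordered (src, dst) node pairs in a
// full mesh of n nodes: the one-way flows a firewall team would have to
// account for. Assumption: one rule per direction per node pair.
func directionalFlows(n int) int {
	if n < 2 {
		return 0
	}
	return n * (n - 1)
}

// newFlowsForAddedNode returns how many additional one-way flows appear when
// a single node joins an existing n-node full mesh: the new node must reach
// each of the n existing nodes, and each of them must reach it (2n total).
func newFlowsForAddedNode(n int) int {
	return directionalFlows(n+1) - directionalFlows(n)
}

func main() {
	for _, n := range []int{3, 9, 30} {
		fmt.Printf("nodes=%d flows=%d added-by-next-node=%d\n",
			n, directionalFlows(n), newFlowsForAddedNode(n))
	}
}
```

Under these assumptions a full mesh needs n(n-1) flows, and each node added to an n-node cluster introduces 2n new ones, each an opportunity for a missing or one-sided rule.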

As a straw-man proposal, here is a possible set of ergonomics around an implementation:

- A new cluster setting, cluster.require_node_activation.
- When a new node is started with cockroach start, it will connect to existing nodes and obtain a node ID, but behave as though it were drained and will not be a valid target for rebalancing.
- Operators (human or otherwise) would be able to verify node functionality (e.g., examine the network latency data to verify that full-mesh communication with the newly added node is possible).
- The operator executes an explicit cockroach node activate # command at a time of their choosing.
- Once a node has been activated, it can never be deactivated, only drained and/or decommissioned.

Jira issue: CRDB-9952


github-actions · 2023-08-24T11:11:28Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

knz · 2023-08-24T11:18:54Z

still relevant

bobvawter added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Originated from a customer A-configurability Pertains to cluster settings, CLI flags, env vars etc labels Sep 13, 2021

bobvawter assigned knz Sep 13, 2021

knz added this to To do in DB Server & Security via automation Sep 13, 2021

knz removed their assignment Sep 13, 2021

blathers-crl bot added the T-server-and-security DB Server & Security label Sep 13, 2021

knz moved this from To do to Queued for roadmapping in DB Server & Security Sep 20, 2021

github-actions bot added the no-issue-activity label Aug 24, 2023

knz removed the no-issue-activity label Aug 24, 2023

lunevalex added the P-3 Issues/test failures with no fix SLA label Jan 23, 2024
