RFE: Start nodes without immediately accepting KV or SQL requests #70122
Labels
A-configurability
Pertains to cluster settings, CLI flags, env vars etc
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Originated from a customer
P-3
Issues/test failures with no fix SLA
T-server-and-security
DB Server & Security
This RFE is motivated by a desire to reduce the risk of adding new nodes to production environments, especially those with non-trivial network configurations. Without presupposing an implementation, it would be useful to be able to require newly-added nodes to be explicitly activated by the operator after they have joined the RPC/gossip mesh, but before they begin accepting KV or SQL requests.
Many of our enterprise customers do not have the luxury of working in flat network topologies, where arbitrary in- or cross-region traffic is guaranteed to "just work". Consider this actual customer scenario:
Services, necessitating the use of the --advertise-addr flags.
These sorts of O(n) or O(n^2) configuration issues would ideally be taken care of in an automated, repeatable fashion, but that is not a reality in all situations. We have had customers suffer cluster dysfunction due to asymmetric network reachability that could not be tested for without actually launching a new Cockroach node. Past discussions about a network-quality simulator have uniformly converged on "use CockroachDB itself".
As a straw-man proposal, here is a possible set of ergonomics around an implementation:
A cluster.require_node_activation cluster setting.
When a node is cockroach start-ed, it will connect to existing nodes, obtain a node ID, but behave as though it were drained and not a valid target for rebalancing.
The node stays in that state until a cockroach node activate # command is executed at a time of the operator's choosing.
Jira issue: CRDB-9952