release-22.2: kvserver: disable {split,replicate,mvccGC} queues until... #98803

irfansharif · 2023-03-16T19:28:24Z

Backport 1/1 commits from #98422.

/cc @cockroachdb/release

...subscribed to span configs. Do the same for the store rebalancer. We applied this treatment for the merge queue back in #78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation.

For the replicate queue this mean obtusely applying a replication factor of 3, regardless of configuration. This was possible typically post node restart before subscription was initially established. We saw this in roachtest: replicate/wide failed #98385. It was possible then for us to ignore configured voter/non-voter/lease constraints.
For the split queue, we wouldn't actually compute any split keys if unsubscribed, so the missing check was somewhat benign. But we would be obtusely applying the default range sizes [128MiB,512MiB], so for clusters configured with larger range sizes, this could lead to a burst of splitting post node-restart.
For the MVCC GC queue, it would mean applying the the statically configured default GC TTL and ignoring any set protected timestamps. The latter is best-effort protection but could result in internal operations relying on protection (like backups, changefeeds) failing informatively. For clusters configured with GC TTL greater than the default, post node-restart it could lead to a burst of MVCC GC activity and AOST queries failing to find expected data.
For the store rebalancer, ignoring span configs could result in violating lease preferences and voter constraints.

Fixes #98421.
Fixes #98385.

Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This only happened for a short window after nodes with existing replicas were restarted, and self-rectified within seconds. This manifested in a few ways:

If num_replicas was set to something other than 3, we would still add or remove replicas to get to 3x replication.
- If num_voters was set explicitly to get a mix of voting and non-voting replicas, it would be ignored. CockroachDB could possibly remove non-voting replicas.
If range_min_bytes or range_max_bytes were changed from 128 MiB and 512 MiB respectively, we would instead try to size ranges to be within [128 MiB, 512MiB]. This could appear as an excess amount of range splits or merges, as visible in the Replication Dashboard under "Range Operations".
If gc.ttlseconds was set to something other than 90000 seconds, we would still GC data only older than 90000s/25h. If the GC TTL was set to something larger than 25h, AOST queries going further back may now start failing. For GC TTLs less than the 25h default, clusters would observe increased disk usage due to more retained garbage.
If constraints, lease_preferences or voter_constraints were set, they would be ignored. Range data and leases would possibly be moved outside where prescribed. This issues only lasted a few seconds post node-restarts, and any zone config violations were rectified shortly after.

Release justification: Bug fix.

...subscribed to span configs. Do the same for the store rebalancer. We applied this treatment for the merge queue back in cockroachdb#78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation. - For the replicate queue this mean obtusely applying a replication factor of 3, regardless of configuration. This was possible typically post node restart before subscription was initially established. We saw this in cockroachdb#98385. It was possible then for us to ignore configured voter/non-voter/lease constraints. - For the split queue, we wouldn't actually compute any split keys if unsubscribed, so the missing check was somewhat benign. But we would be obtusely applying the default range sizes [128MiB,512MiB], so for clusters configured with larger range sizes, this could lead to a burst of splitting post node-restart. - For the MVCC GC queue, it would mean applying the the statically configured default GC TTL and ignoring any set protected timestamps. The latter is best-effort protection but could result in internal operations relying on protection (like backups, changefeeds) failing informatively. For clusters configured with GC TTL greater than the default, post node-restart it could lead to a burst of MVCC GC activity and AOST queries failing to find expected data. - For the store rebalancer, ignoring span configs could result in violating lease preferences and voter/non-voter constraints. Fixes cockroachdb#98421. Fixes cockroachdb#98385. While here, we also introduce the following non-public cluster settings to selectively enable/disable KV queues: - kv.mvcc_gc_queue.enabled - kv.split_queue.enabled - kv.replicate_queue.enabled Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This could only happen for a short window after nodes with existing replicas were restarted (measured in seconds), and self-rectified (also within seconds). This manifested in a few ways: - If num_replicas was set to something other than 3, we would still add or remove replicas to get to 3x replication. - If num_voters was set explicitly to get a mix of voting and non-voting replicas, it would be ignored. CockroachDB could possibly remove non-voting replicas. - If range_min_bytes or range_max_bytes were changed from 128 MiB and 512 MiB respectively, we would instead try to size ranges to be within [128 MiB, 512MiB]. This could appear as an excess amount of range splits or merges, as visible in the Replication Dashboard under "Range Operations". - If gc.ttlseconds was set to something other than 90000 seconds, we would still GC data only older than 90000s/25h. If the GC TTL was set to something larger than 25h, AOST queries going further back may now start failing. For GC TTLs less than the 25h default, clusters would observe increased disk usage due to more retained garbage. - If constraints, lease_preferences or voter_constraints were set, they would be ignored. Range data and leases would possibly be moved outside where prescribed. This issues lasted a few seconds post node-restarts, and any zone config violations were rectified shortly after.

blathers-crl · 2023-03-16T19:28:28Z

cockroach-teamcity · 2023-03-16T19:28:34Z

This change is

irfansharif requested review from arulajmani and kvoli March 16, 2023 19:28

irfansharif requested a review from a team as a code owner March 16, 2023 19:28

kvoli approved these changes Mar 17, 2023

View reviewed changes

irfansharif merged commit 4014ae2 into cockroachdb:release-22.2 Mar 17, 2023
6 checks passed

irfansharif deleted the backport22.2-98422 branch March 17, 2023 03:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release-22.2: kvserver: disable {split,replicate,mvccGC} queues until... #98803

release-22.2: kvserver: disable {split,replicate,mvccGC} queues until... #98803

irfansharif commented Mar 16, 2023 •

edited

blathers-crl bot commented Mar 16, 2023 •

edited by irfansharif

cockroach-teamcity commented Mar 16, 2023

release-22.2: kvserver: disable {split,replicate,mvccGC} queues until... #98803

release-22.2: kvserver: disable {split,replicate,mvccGC} queues until... #98803

Conversation

irfansharif commented Mar 16, 2023 • edited

blathers-crl bot commented Mar 16, 2023 • edited by irfansharif

cockroach-teamcity commented Mar 16, 2023

irfansharif commented Mar 16, 2023 •

edited

blathers-crl bot commented Mar 16, 2023 •

edited by irfansharif