Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2: kvserver: disable {split,replicate,mvccGC} queues until... #98803

Merged

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Mar 16, 2023

Backport 1/1 commits from #98422.

/cc @cockroachdb/release


...subscribed to span configs. Do the same for the store rebalancer. We applied this treatment for the merge queue back in #78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation.

  • For the replicate queue this mean obtusely applying a replication factor of 3, regardless of configuration. This was possible typically post node restart before subscription was initially established. We saw this in roachtest: replicate/wide failed #98385. It was possible then for us to ignore configured voter/non-voter/lease constraints.
  • For the split queue, we wouldn't actually compute any split keys if unsubscribed, so the missing check was somewhat benign. But we would be obtusely applying the default range sizes [128MiB,512MiB], so for clusters configured with larger range sizes, this could lead to a burst of splitting post node-restart.
  • For the MVCC GC queue, it would mean applying the the statically configured default GC TTL and ignoring any set protected timestamps. The latter is best-effort protection but could result in internal operations relying on protection (like backups, changefeeds) failing informatively. For clusters configured with GC TTL greater than the default, post node-restart it could lead to a burst of MVCC GC activity and AOST queries failing to find expected data.
  • For the store rebalancer, ignoring span configs could result in violating lease preferences and voter constraints.

Fixes #98421.
Fixes #98385.

Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This only happened for a short window after nodes with existing replicas were restarted, and self-rectified within seconds. This manifested in a few ways:

  • If num_replicas was set to something other than 3, we would still add or remove replicas to get to 3x replication.
    • If num_voters was set explicitly to get a mix of voting and non-voting replicas, it would be ignored. CockroachDB could possibly remove non-voting replicas.
  • If range_min_bytes or range_max_bytes were changed from 128 MiB and 512 MiB respectively, we would instead try to size ranges to be within [128 MiB, 512MiB]. This could appear as an excess amount of range splits or merges, as visible in the Replication Dashboard under "Range Operations".
  • If gc.ttlseconds was set to something other than 90000 seconds, we would still GC data only older than 90000s/25h. If the GC TTL was set to something larger than 25h, AOST queries going further back may now start failing. For GC TTLs less than the 25h default, clusters would observe increased disk usage due to more retained garbage.
  • If constraints, lease_preferences or voter_constraints were set, they would be ignored. Range data and leases would possibly be moved outside where prescribed. This issues only lasted a few seconds post node-restarts, and any zone config violations were rectified shortly after.

Release justification: Bug fix.

...subscribed to span configs. Do the same for the store
rebalancer. We applied this treatment for the merge queue back in cockroachdb#78122
since the fallback behavior, if not subscribed, is to use the statically
defined span config for every operation.

- For the replicate queue this mean obtusely applying a replication
  factor of 3, regardless of configuration. This was possible typically
  post node restart before subscription was initially established. We
  saw this in cockroachdb#98385. It was possible then for us to ignore configured
  voter/non-voter/lease constraints.
- For the split queue, we wouldn't actually compute any split keys if
  unsubscribed, so the missing check was somewhat benign. But we would
  be obtusely applying the default range sizes [128MiB,512MiB], so for
  clusters configured with larger range sizes, this could lead to a
  burst of splitting post node-restart.
- For the MVCC GC queue, it would mean applying the the statically
  configured default GC TTL and ignoring any set protected timestamps.
  The latter is best-effort protection but could result in internal
  operations relying on protection (like backups, changefeeds) failing
  informatively. For clusters configured with GC TTL greater than the
  default, post node-restart it could lead to a burst of MVCC GC
  activity and AOST queries failing to find expected data.
- For the store rebalancer, ignoring span configs could result in
  violating lease preferences and voter/non-voter constraints.

Fixes cockroachdb#98421.
Fixes cockroachdb#98385.

While here, we also introduce the following non-public cluster settings
to selectively enable/disable KV queues:
- kv.mvcc_gc_queue.enabled
- kv.split_queue.enabled
- kv.replicate_queue.enabled

Release note (bug fix): It was previously possible for CockroachDB to
not respect non-default zone configs. This could only happen for a short
window after nodes with existing replicas were restarted (measured in
seconds), and self-rectified (also within seconds). This manifested in a
few ways:
- If num_replicas was set to something other than 3, we would still
  add or remove replicas to get to 3x replication.
  - If num_voters was set explicitly to get a mix of voting and
    non-voting replicas, it would be ignored. CockroachDB could possibly
    remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and
  512 MiB respectively, we would instead try to size ranges to be within
  [128 MiB, 512MiB]. This could appear as an excess amount of range
  splits or merges, as visible in the Replication Dashboard under "Range
  Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we
  would still GC data only older than 90000s/25h. If the GC TTL was set
  to something larger than 25h, AOST queries going further back may now
  start failing. For GC TTLs less than the 25h default, clusters would
  observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they
  would be ignored. Range data and leases would possibly be moved
  outside where prescribed.
This issues lasted a few seconds post node-restarts, and any zone config
violations were rectified shortly after.
@irfansharif irfansharif requested a review from a team as a code owner March 16, 2023 19:28
@blathers-crl
Copy link

blathers-crl bot commented Mar 16, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif irfansharif merged commit 4014ae2 into cockroachdb:release-22.2 Mar 17, 2023
6 checks passed
@irfansharif irfansharif deleted the backport22.2-98422 branch March 17, 2023 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants