release-23.1: kvserver: propagate GCHint for partial deletions #110643
The GCHint is currently used for 3 purposes. Document the status quo before making changes. Epic: none Release note: none
This is a cosmetic commit. It replaces a lower-level GCHint "reset" method with a method that takes an effective GC threshold. Epic: none Release note: none
Previously, GCHint was used for 3 purposes:

1. Instruct GC to run at specific times. This is used by SQL table/index drop jobs, to limit the amount of time they wait on data GC.
2. Hint GC that a range is fully covered by range tombstones, so that GC can use ClearRange for faster deletes. This is useful for bulk data deletions as well.
3. Hint GC that a range was fully covered by range tombstones at some recent point. GC prioritizes going through such ranges, and processes them together, because this makes Pebble compaction more efficient.

Use case 1 was broken. If a range tombstone does not fully cover the Range keyspace, GCHint is not written. Consequently, for small table/index deletions (spanning less than a single range), or deletions that don't perfectly match range bounds, there will be some ranges with no hints. For these ranges, GC might be arbitrarily delayed, and the schema change jobs would be stuck.

In addition, GCHint propagation during merges was lossy. It could drop the hint if either the LHS or the RHS had no hint, or had some data.

This commit extends GCHint to fix use case 1. Two fields are added: min and max deletion timestamp. These fields are populated even for partial range deletions, and are propagated during splits and merges unconditionally. The choice of the min-max approach is explained in the comments on the corresponding GCHint fields.

Release note (bug fix): Fixed a bug that could occasionally cause schema change jobs (e.g. table/index drops) to appear stuck in state "waiting for MVCC GC" for much longer than expected. The fix only applies to future schema changes -- existing stuck jobs can be processed by manually force-enqueueing the relevant ranges in the MVCC GC queue under the DB Console's advanced debug page.

Epic: none
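To make the min/max idea concrete, here is a minimal sketch of how such a hint could be recorded for partial deletions and carried through splits and merges. It is not the actual `GCHint` code: the `Hint` type, its `Min`/`Max` fields, `Record`, `Split`, and `Merge` are hypothetical names, and timestamps are plain `int64` values rather than HLC timestamps.

```go
package main

import "fmt"

// Hint is a hypothetical, simplified stand-in for the extended GCHint.
// Timestamps are plain int64 values, and 0 means "no pending deletion".
type Hint struct {
	Min int64 // earliest deletion timestamp still awaiting GC
	Max int64 // latest deletion timestamp still awaiting GC
}

// Record notes a range deletion at ts, whether or not the tombstone covers
// the whole range. Non-positive timestamps are ignored.
func (h *Hint) Record(ts int64) {
	if ts <= 0 {
		return
	}
	if h.Min == 0 || ts < h.Min {
		h.Min = ts
	}
	if ts > h.Max {
		h.Max = ts
	}
}

// Split gives both halves the same hint, so neither side loses it.
func (h Hint) Split() (lhs, rhs Hint) { return h, h }

// Merge combines two hints unconditionally; an empty side contributes nothing.
func Merge(lhs, rhs Hint) Hint {
	out := lhs
	out.Record(rhs.Min)
	out.Record(rhs.Max)
	return out
}

func main() {
	var h Hint
	h.Record(100) // partial deletion: the hint is still written
	lhs, rhs := h.Split()
	rhs.Record(250)
	fmt.Printf("%+v\n", Merge(lhs, rhs))
}
```

Keeping only the two extremes bounds the size of the hint no matter how many deletions hit the range, at the cost of forgetting the timestamps in between.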
The test checks that all the GCHint invariants hold before/after the ScheduleGCFor operation, and that it returns true iff it modified the hint. The test helped to fix one bug in the method implementation. Epic: none Release note: none
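The shape of such a check might look like the sketch below. It runs against a simplified, hypothetical `Hint` type with `int64` timestamps, not the real `GCHint`; the invariants in `Valid` (both fields set or both unset, `Min <= Max`) are assumptions made for illustration.

```go
package main

import "fmt"

// Hint is a hypothetical, simplified stand-in for GCHint (0 means unset).
type Hint struct{ Min, Max int64 }

// ScheduleGCFor widens the hint to include ts and reports whether it changed.
func (h *Hint) ScheduleGCFor(ts int64) bool {
	if ts <= 0 {
		return false
	}
	changed := false
	if h.Min == 0 || ts < h.Min {
		h.Min, changed = ts, true
	}
	if ts > h.Max {
		h.Max, changed = ts, true
	}
	return changed
}

// Valid checks the assumed invariants: both fields set or both unset, and
// Min <= Max.
func (h Hint) Valid() bool {
	return (h.Min == 0) == (h.Max == 0) && h.Min <= h.Max
}

// CheckSchedule returns an error if an invariant breaks, or if the returned
// flag disagrees with whether the hint actually changed.
func CheckSchedule(h Hint, ts int64) error {
	if !h.Valid() {
		return fmt.Errorf("invalid input hint %+v", h)
	}
	before := h
	changed := h.ScheduleGCFor(ts)
	if !h.Valid() {
		return fmt.Errorf("invalid hint %+v after ts=%d", h, ts)
	}
	if changed != (h != before) {
		return fmt.Errorf("returned %t for %+v -> %+v", changed, before, h)
	}
	return nil
}

func main() {
	for _, h := range []Hint{{}, {5, 5}, {3, 9}} {
		for _, ts := range []int64{0, 1, 5, 7, 20} {
			if err := CheckSchedule(h, ts); err != nil {
				fmt.Println("FAIL:", err)
			}
		}
	}
}
```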
Epic: none Release note: none
The test checks that all the GCHint invariants hold before/after the Merge operation; that the Merge operation is commutative; and that Merge returns true iff it modified the hint. For now, the test only Merges an empty hint with a non-empty hint. Soon adding tests for non-empty + non-empty cases. The test helped to fix one bug in Merge commutativity. Epic: none Release note: none
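A commutativity check of this kind can be sketched as follows. The `Hint` type and its `Merge` method are hypothetical simplifications with `int64` timestamps, not the real `GCHint.Merge`, and unlike the test described above this sketch also exercises non-empty + non-empty pairs.

```go
package main

import "fmt"

// Hint is a hypothetical, simplified stand-in for GCHint (0 means unset).
type Hint struct{ Min, Max int64 }

// Merge folds rhs into h and reports whether h changed.
func (h *Hint) Merge(rhs Hint) bool {
	changed := false
	if rhs.Min != 0 && (h.Min == 0 || rhs.Min < h.Min) {
		h.Min, changed = rhs.Min, true
	}
	if rhs.Max > h.Max {
		h.Max, changed = rhs.Max, true
	}
	return changed
}

// CheckMerge verifies that merging b into a and a into b gives the same hint,
// and that each call's return value matches whether its receiver changed.
func CheckMerge(a, b Hint) error {
	ab, ba := a, b
	abChanged := ab.Merge(b)
	baChanged := ba.Merge(a)
	if ab != ba {
		return fmt.Errorf("not commutative: %+v vs %+v", ab, ba)
	}
	if abChanged != (ab != a) || baChanged != (ba != b) {
		return fmt.Errorf("wrong return value for %+v, %+v", a, b)
	}
	return nil
}

func main() {
	hints := []Hint{{}, {4, 4}, {2, 8}, {6, 12}}
	for _, a := range hints {
		for _, b := range hints {
			if err := CheckMerge(a, b); err != nil {
				fmt.Println("FAIL:", err)
			}
		}
	}
}
```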
2e99807 to 4fb239d
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
Let's let this bake on master for a week or two, and wait until the next 23.1 release is cut to get more branch baking time. |
Should we merge this now? The next release is in 3 weeks, which gives us a fair amount of baking time. Or we can wait for the next release to get more confidence. |
In the spirit of the upcoming backport policy changes, I'd also like to get @nvanbenschoten's high-level take on this. Happy to jump on a call to catch you up. |
I think @nvanbenschoten's instinct was right: there is a case where this will lead to assertion failures and node crashes, because in-memory and on-disk state diverge following snapshot application. Below, vNew signifies the new patch version with support for this GCHint, while vOld signifies the old version.
@pavelkalinnikov Can you verify this, and write up a |
Discussed with @erikgrinaker. For this scenario to occur, the |
Since it is late to enable this behaviour in 23.1 (risk of backwards incompatibility), hide it behind a default-off cluster setting. In 23.2, it will be enabled by default, and the cluster setting will be deprecated.

The new GCHint behaviour is likely backwards compatible, but we are hiding it behind a setting for extra safety. The safest moment to enable this cluster setting is when there is some confidence that the cluster binaries will not roll back to previous patch versions of 23.1. The risk exists only in mixed-version state in which some 23.1 binaries don't know the new GCHint fields, and some do.

Epic: none

Release note (ops change): introduce a default-off cluster setting `kv.gc.sticky_hint.enabled` which helps expedite garbage collection after range deletions, such as when a SQL table or index is dropped.
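The gating can be pictured with the sketch below. It is an illustration under assumptions, not the kvserver code: the `Hint` type, its field names, and `OnDeleteRange` are hypothetical, and the `stickyEnabled` parameter stands in for reading `kv.gc.sticky_hint.enabled`.

```go
package main

import "fmt"

// Hint is a hypothetical, simplified stand-in for GCHint (0 means unset).
type Hint struct {
	FullRangeDeleteTS int64 // set only when a tombstone covers the whole range
	Min, Max          int64 // the new fields, intended for any range deletion
}

// OnDeleteRange updates the hint after a range deletion at timestamp ts > 0.
// coversRange says whether the tombstone spans the entire range keyspace.
// stickyEnabled stands in for the kv.gc.sticky_hint.enabled cluster setting.
func OnDeleteRange(h *Hint, ts int64, coversRange, stickyEnabled bool) {
	if coversRange && ts > h.FullRangeDeleteTS {
		h.FullRangeDeleteTS = ts // behaviour that exists without the setting
	}
	if !stickyEnabled {
		return // assumed: with the setting off, the new fields are never written
	}
	if h.Min == 0 || ts < h.Min {
		h.Min = ts
	}
	if ts > h.Max {
		h.Max = ts
	}
}

func main() {
	var off, on Hint
	OnDeleteRange(&off, 100, false, false) // partial deletion, setting off
	OnDeleteRange(&on, 100, false, true)   // partial deletion, setting on
	fmt.Printf("off=%+v on=%+v\n", off, on)
}
```

With the setting off, a partial deletion leaves the hint untouched, which matches the pre-fix behaviour that binaries without the new fields expect.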
@erikgrinaker @nvanbenschoten I've added one commit from #112948 which introduces a default-off cluster setting for this change. See the PR description for details. |
The tests in CI failed because they assume the setting is on (on |
Thanks for getting this across the line.
It's unfortunate that we have to gate a bugfix on a cluster setting, but I suppose it's prudent given the small risk of replica state divergence in mixed-version clusters. The setting also can't be used after-the-fact to resolve an already stuck job, as it only applies to new schema deletions.
Have we run a (manual) test where a 23.1 build with this change persists some descs and then is downgraded to a 23.1 build without the change? |
Yes, we ran a bunch of manual tests, both with mixed-version binaries, and upgraded/downgraded binaries. We believe this should be safe even without the cluster setting, but have gated it on a default-off setting out of caution (and to conform with the new backport policy). |
When I tested this previously, I saw that the Would be good to verify this interaction after the introduction of the cluster setting. |
Epic: none Release note: none
This commit fixes TestStoreMergeGCHint for the 23.1 branch, which has the sticky GC hint setting off by default. It also adds a new test to check that the cluster setting changes the GC hint writing behaviour correctly. Epic: none Release note: none
8512738 to 3bb7709
Did an end-to-end test again (on 23.1 with this PR as is). First, with the setting off, I followed the same steps to reach a stuck schema change job state. With a TTL of 10 min, I waited 20 min, and tried manually bumping the job into the mvccGC queue. The queue refused to process this range (low priority, shouldQueue=false). Then I enabled the cluster setting and waited another 20 min. The job did not magically get unstuck (because we did not override the |
Backport on behalf of @pavelkalinnikov:
Please see individual PRs for details.
/cc @cockroachdb/release
This PR backports the new behaviour of `GCHint` introduced in #110078 to 23.1, and makes it conditional on a default-off cluster setting. Since 23.2, it will always be enabled by a version gate.

Since it is late to enable this behaviour in 23.1 (risk of incompatibility) or introduce a version gate to make it completely safe, hide it behind a default-off cluster setting. In 23.2, it will be enabled by default, and the cluster setting will be deprecated.

The safest moment to enable this cluster setting is when there is some confidence that the cluster binaries will not roll back to previous patch versions of 23.1. The risk exists only in mixed-version state in which some 23.1 binaries don't know the new `GCHint` fields, and some do.

Epic: none
Release note (bug fix): Fixed a bug that could occasionally cause schema change jobs (e.g. table/index drops) to appear stuck in state "waiting for MVCC GC" for much longer than expected. The fix only applies to future schema changes -- existing stuck jobs can be processed by manually force-enqueueing the relevant ranges in the MVCC GC queue under the DB Console's advanced debug page.
Release note (ops change): introduce a default-off cluster setting `kv.gc.sticky_hint.enabled` which helps expedite garbage collection after range deletions, such as when a SQL table or index is dropped.

Release justification: fixing bug affecting many users