release-23.1: kvserver: propagate GCHint for partial deletions #110643

blathers-crl · 2023-09-14T14:15:48Z

Backport on behalf of @pavelkalinnikov:

9/9 commits from "kvserver: propagate GCHint for partial deletions" (kvserver: propagate GCHint for partial deletions #110078)
1/3 commits from "kvserver: gate writing the new sticky GCHint" (kvserver: gate writing the new sticky GCHint #112948)

Please see individual PRs for details.

/cc @cockroachdb/release

This PR backports the new behaviour of GCHint introduced in #110078 to 23.1, and makes it conditional on a default-off cluster setting. Since 23.2, it will always enabled by a version gate.

Since it is late to enable this behaviour in 23.1 (risk of incompatibility) or introduce a version gate to make it completely safe, hide it behind a default-off cluster setting. In 23.2, it will be enabled by default, and the cluster setting will be deprecated.

The safest moment to enable this cluster setting is when there is some confidence that the cluster binaries will not rollback to previous patch versions of 23.1. The risk exists only in mixed-version state in which some 23.1 binaries don't know the new GCHint fields, and some do.

Epic: none

Release note (bug fix): Fixed a bug that could occasionally cause schema change jobs (e.g. table/index drops) to appear stuck in state "waiting for MVCC GC" for much longer than expected. The fix only applies to future schema changes -- existing stuck jobs can be processed by manually force-enqueueing the relevant ranges in the MVCC GC queue under the DB Console's advanced debug page.

Release note (ops change): introduce a default-off cluster setting kv.gc.sticky_hint.enabled which helps expediting garbage collection after range deletions, such as when a SQL table or index is dropped.

Release justification: fixing bug affecting many users

The GCHint is currently used for 3 purposes. Document the status quo before making changes. Epic: none Release note: none

This is a cosmetic commit. It replaces a lower-level GCHint "reset" method with a method that takes an effective GC threshold. Epic: none Release note: none

Previously, GCHint was used for 3 purposes: 1. Instruct GC to run at specific times. This is used by SQL table/index drop jobs, to limit the amount of time they wait on data GC. 2. Hint GC that a range is fully covered by range tombstones, so that GC can use ClearRange for faster deletes. This is useful for bulk data deletions as well. 3. Hint GC that a range was fully covered by range tombstones at some recent point. GC prioritizes going through such ranges, and processes them together, because this makes Pebble compaction more efficient. The use-case 1 was broken. If a range tombstone does not fully cover the Range keyspace, GCHint is not written. Correspondingly, for small table/index deletions (spanning less than a single range), or deletions that don't perfectly match range bounds, there will be some ranges with no hints. For these ranges, GC might be arbitrarily delayed, and the schema change jobs would be stuck. In addition, GCHint propagation during merges was lossy. It could drop the hint if either LHS or RHS doesn't have a hint, or has some data. This commit extends GCHint to fix the use-case 1. Two fields are added: min and max deletion timestamp. These fields are populated even for partial range deletions, and are propagated during splits and merges unconditionally. The choice of min-max approach is explained in the comments to the corresponding GCHint fields. Release note (bug fix): Fixed a bug that could occasionally cause schema change jobs (e.g. table/index drops) to appear stuck in state "waiting for MVCC GC" for much longer than expected. The fix only applies to future schema changes -- existing stuck jobs can be processed by manually force-enqueueing the relevant ranges in the MVCC GC queue under the DB Console's advanced debug page. Epic: none

The test checks that all the GCHint invariants hold before/after the ScheduleGCFor operation, and that it returns true iff it modified the hint. The test helped to fix one bug in the method implementation. Epic: none Release note: none

Epic: none Release note: none

The test checks that all the GCHint invariants hold before/after the Merge operation; that the Merge operation is commutative; and that Merge returns true iff it modified the hint. For now, the test only Merges an empty hint with a non-empty hint. Soon adding tests for non-empty + non-empty cases. The test helped to fix one bug in Merge commutativity. Epic: none Release note: none

blathers-crl · 2023-09-14T14:15:52Z

cockroach-teamcity · 2023-09-14T14:16:27Z

This change is

erikgrinaker · 2023-09-14T14:57:59Z

Let's let this bake on master for a week or two, and wait until the next 23.1 release is cut to get more branch baking time.

erikgrinaker · 2023-10-04T10:48:11Z

Should we merge this now? The next release is in 3 weeks, which gives us a fair amount of baking time. Or we can wait for the next release to get more confidence.

erikgrinaker · 2023-10-04T11:24:36Z

In the spirit of the upcoming backport policy changes, I'd also like to get @nvanbenschoten's high-level take on this. Happy to jump on a call to catch you up.

erikgrinaker · 2023-10-05T14:52:03Z

I think @nvanbenschoten's instinct was right, I think there is a case where this will lead to assertion failures and node crashes due to diverging in-memory and on-disk state following snapshot application.

Below, vNew signifies the new patch version with support for this GCHint, while vOld signifies the old version.

A vNew leaseholder n1 writes GCHint{GCTimestamp: foo}.
A vOld follower n2 applies GCHint{} to its in-memory state.
The vOld follower n2 becomes leaseholder.
The vOld leaseholder n2 sends a snapshot to a vNew follower n3.
n3 receives GCHint{} from SnapshotRequest.Header.State, but GCHint{GCTimestamp: foo} in the snapshot data.
n3 applies the snapshot and crashes.

@pavelkalinnikov Can you verify this, and write up a release-blocker issue? We'll need a version gate for this in 23.2, and/or try to come up with a way to handle this safely in 23.1. One option might be to ignore this field if empty during the assertion comparison, similarly to DeprecatedUsingAppliedStateKey.

pav-kv · 2023-10-05T16:30:57Z

Discussed with @erikgrinaker.

For this scenario to occur, the SnapshotRequest.Header.State from step 5 should feed into the in-memory Replica state in step 6. However, it doesn't seem to be the case, we're loading the in-memory Replica state from disk. It's still worth running this test to make sure.

Since it is late to enable this behaviour in 23.1 (risk of backwards incompatibility), hide it behind a default-off cluster setting. In 23.2, it will be enabled by default, and the cluster setting will be deprecated. The new GCHint behaviour is likely backwards compatible, but we are hiding it behind a setting for extra safety. The safest moment to enable this cluster setting is when there is some confidence that the cluster binaries will not rollback to previous patch versions of 23.1. The risk exists only in mixed-version state in which some 23.1 binaries don't know the new GCHint fields, and some do. Epic: none Release note (ops change): introduce a default-off cluster setting `kv.gc.sticky_hint.enabled` which helps expediting garbage collection after range deletions, such as when a SQL table or index is dropped.

pav-kv · 2023-10-26T09:48:45Z

@erikgrinaker @nvanbenschoten I've added one commit from #112948 which introduces a default-off cluster setting for this change. See the PR description for details.

pav-kv · 2023-10-26T10:22:27Z

The tests in CI failed because they assume the setting is on (on master). Need to fix the tests so that they are aware whether the setting is on/off.

erikgrinaker

Thanks for getting this across the line.

It's unfortunate that we have to gate a bugfix on a cluster setting, but I suppose it's prudent given the small risk of replica state divergence in mixed-version clusters. The setting also can't be used after-the-fact to resolve an already stuck job, as it only applies to new schema deletions.

pkg/kv/kvserver/batcheval/cmd_delete_range.go

dt · 2023-10-26T13:13:44Z

replica state divergence in mixed-version clusters

Have we run a (manual) test where a 23.1 build with this change persists some descs and then is downgraded to a 23.1 build without the change?

erikgrinaker · 2023-10-26T13:19:46Z

Have we run a (manual) test where a 23.1 build with this change persists some descs and then is downgraded to a 23.1 build without the change?

Yes, we ran a bunch of manual tests, both with mixed-version binaries, and upgraded/downgraded binaries. We believe this should be safe even without the cluster setting, but have gated it on a default-off setting out of caution (and to conform with the new backport policy).

pav-kv · 2023-10-27T10:55:26Z

@erikgrinaker

The setting also can't be used after-the-fact to resolve an already stuck job, as it only applies to new schema deletions.

When I tested this previously, I saw that the GCHint is overridden when the job is restarted. So the workaround should be: 1. enable the cluster setting, 2. restart the affected jobs (or the nodes). I tested this workflow to check.

Would be good to verify this interaction after the introduction of the cluster setting.

Epic: none Release note: none

This commit fixed TestStoreMergeGCHint for 23.1 branch which has the sticky GC hint setting off by default. This commit also adds a new test to check that the cluster setting changes the GC hint writing behaviour correctly. Epic: none Release note: none

pav-kv · 2023-10-27T13:51:23Z

Would be good to verify this interaction after the introduction of the cluster setting.

Did an end-to-end test again (on 23.1 with this PR as is).

First, with the setting off I did the same steps to reach a stuck schema change job state. With a TTL of 10 min, I waited 20 min, and tried manually bumping the job into the mvccGC queue. The queue refused to process this range (low priority, shouldQueue=false).

Then I enabled the cluster setting. Waited another 20 min. The job did not magically got unstuck (because we did not override the GCHint yet). However, after I restarted the schema change job, in 15 min it completed (meaning it wrote a new GCHint, and then some time later GC did pick it up).

pav-kv added 9 commits September 7, 2023 22:34

kvserver: document the usage of GCHint

a6bdd95

The GCHint is currently used for 3 purposes. Document the status quo before making changes. Epic: none Release note: none

kvserver: add GCHint.UpdateAfterGC method

96a0674

This is a cosmetic commit. It replaces a lower-level GCHint "reset" method with a method that takes an effective GC threshold. Epic: none Release note: none

roachpb: unit tests for GCHint.ScheduleGCFor

92bf80a

The test checks that all the GCHint invariants hold before/after the ScheduleGCFor operation, and that it returns true iff it modified the hint. The test helped to fix one bug in the method implementation. Epic: none Release note: none

roachpb: unit test for GCHint.UpdateAfterGC

fae2d68

Epic: none Release note: none

roachpb: test GCHint.ForwardLatestRangeDeleteTimestamp

c6ce6f6

Epic: none Release note: none

roachpb: more tests for GCHint.Merge

4fb239d

Epic: none Release note: none

kvserver: cover more cases in TestStoreMergeGCHint

c10ef3b

Epic: none Release note: none

blathers-crl bot requested a review from a team as a code owner September 14, 2023 14:15

blathers-crl bot force-pushed the blathers/backport-release-23.1-110078 branch 2 times, most recently from 2e99807 to 4fb239d Compare September 14, 2023 14:15

blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Sep 14, 2023

blathers-crl bot assigned pav-kv Sep 14, 2023

blathers-crl bot added the backport Label PR's that are backports to older release branches label Sep 14, 2023

blathers-crl bot requested review from erikgrinaker and pav-kv September 14, 2023 14:15

erikgrinaker requested a review from nvanbenschoten October 4, 2023 11:24

pav-kv mentioned this pull request Oct 24, 2023

kvserver: gate writing the new sticky GCHint #112948

Merged

erikgrinaker approved these changes Oct 26, 2023

View reviewed changes

pkg/kv/kvserver/batcheval/cmd_delete_range.go Outdated Show resolved Hide resolved

dt approved these changes Oct 26, 2023

View reviewed changes

pav-kv added 2 commits October 27, 2023 12:03

batcheval: drop the line irrelevant in 23.1

7c9aaef

Epic: none Release note: none

pav-kv force-pushed the blathers/backport-release-23.1-110078 branch from 8512738 to 3bb7709 Compare October 27, 2023 11:34

dt approved these changes Oct 27, 2023

View reviewed changes

pav-kv merged commit 7952853 into release-23.1 Oct 30, 2023
6 checks passed

pav-kv deleted the blathers/backport-release-23.1-110078 branch October 30, 2023 09:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release-23.1: kvserver: propagate GCHint for partial deletions #110643

release-23.1: kvserver: propagate GCHint for partial deletions #110643

blathers-crl bot commented Sep 14, 2023 •

edited by pav-kv

blathers-crl bot commented Sep 14, 2023 •

edited by pav-kv

cockroach-teamcity commented Sep 14, 2023

erikgrinaker commented Sep 14, 2023

erikgrinaker commented Oct 4, 2023 •

edited

erikgrinaker commented Oct 4, 2023 •

edited

erikgrinaker commented Oct 5, 2023 •

edited

pav-kv commented Oct 5, 2023

pav-kv commented Oct 26, 2023

pav-kv commented Oct 26, 2023

erikgrinaker left a comment

dt commented Oct 26, 2023

erikgrinaker commented Oct 26, 2023 •

edited

pav-kv commented Oct 27, 2023 •

edited

pav-kv commented Oct 27, 2023 •

edited

release-23.1: kvserver: propagate GCHint for partial deletions #110643

release-23.1: kvserver: propagate GCHint for partial deletions #110643

Conversation

blathers-crl bot commented Sep 14, 2023 • edited by pav-kv

blathers-crl bot commented Sep 14, 2023 • edited by pav-kv

cockroach-teamcity commented Sep 14, 2023

erikgrinaker commented Sep 14, 2023

erikgrinaker commented Oct 4, 2023 • edited

erikgrinaker commented Oct 4, 2023 • edited

erikgrinaker commented Oct 5, 2023 • edited

pav-kv commented Oct 5, 2023

pav-kv commented Oct 26, 2023

pav-kv commented Oct 26, 2023

erikgrinaker left a comment

Choose a reason for hiding this comment

dt commented Oct 26, 2023

erikgrinaker commented Oct 26, 2023 • edited

pav-kv commented Oct 27, 2023 • edited

pav-kv commented Oct 27, 2023 • edited

blathers-crl bot commented Sep 14, 2023 •

edited by pav-kv

blathers-crl bot commented Sep 14, 2023 •

edited by pav-kv

erikgrinaker commented Oct 4, 2023 •

edited

erikgrinaker commented Oct 4, 2023 •

edited

erikgrinaker commented Oct 5, 2023 •

edited

erikgrinaker commented Oct 26, 2023 •

edited

pav-kv commented Oct 27, 2023 •

edited

pav-kv commented Oct 27, 2023 •

edited