Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc: use clear range operations to remove multiple garbage keys across keys and versions #90830

Merged
merged 7 commits into from Dec 19, 2022

Conversation

aliher1911
Copy link
Contributor

@aliher1911 aliher1911 commented Oct 28, 2022

Summary:

GC Will track consecutive keys and issue GCClearRange requests to remove multiple keys at once if it can find more than configured number of keys. This covers multiple versions of the same key as well as multiple consecutive keys.

On the GC side it will count consecutive keys while building "points batches" e.g. batches containing MVCC keys to delete. Those batches are capped to fit into resulting raft entries once deletion is executed. GC could accumulate more than a single batch in memory in that case. Once necessary number of consecutive keys is found, GCClearRange is issued and pending batches are discarded. If non gc-able data is found, pending batches are GC'd and clear range tracking is restarted.
To protect GC from consuming too much memory for example if it collects a range with very large keys, it uses a configrable memory limit on how many key bytes could be held in pending batches. If this limit is reached before GC reaches minimum desired number of keys, oldest batch is GC'd and starting point for clear range is adjusted.

On the evaluation side, GC requests for ranges will obtain write latches removed range to prevent other requests writing in the middle of the range. However, if only a single key is touched i.e. multiple versions are deleted, then no latches are obtained as it is done when deleting individual versions as all the removed data is below the GC threshold.

GC exposes a metric queue.gc.info.numclearconsecutivekeys that shows how many times GCClearRange was used.
GC adds new system setting kv.gc.clear_range_min_keys that sets minimum number of keys eligible for GCClearRange.


batcheval: handle ClearSubRange request fields in GC requests

Handle ClearSubRange GC request fields to use GCClearRange storage
functions for ranges of keys that contain no visible data instead
of removing each key individually.
This PR only enables processing of request field, while actual GC
and storage functionality is added in previous PRs.

Release note: None


gc: add version gate to GC clear range requests

ClearRange fields of GCRequests are ignored by earlier versions,
this feature needs to be version gated to let GC work when talking
to a previous version if GC is run from a different node.

Release note: None


gc: add metrics for clear range requests in GC

This commit adds two metrics for clear range functionality in GC:

queue.gc.info.numclearconsecutivekeys is incremented every time GC sends
a ClearRange request for a bunch of consecutive keys to a replica

Release note (ops change): Metric queue.gc.info.numclearconsecutivekeys
added. It exposes number of clear range requests to remove consecutive
keys issued by mvcc garbage collection.


gc: GC use delete_range_keys for consecutive ranges of keys

When deleting large chunks of keys usually produced by range tombstones
GC will use point deletion for all versions found within replica key
range.
That kind of operations would generate unnecessary load on the storage
because each individual version must be deleted independently.
This patch adds an ability for GC to find ranges of consecutive keys
which are not interleaved with any newer data and create
delete_range_key GC requests to remove them.

Release note: None


rditer: helper to visit multiple replica key spans

IterateMVCCReplicaKeySpans helper to complement IterateplicaKeySpans
Existing tests moved from ReplicaMVCCDataIterator to
IterateMVCCReplicaKeySpans as former is being sunsetted and only a
single call site exists.

Release note: None


storage: add gc operation that uses clear range

Deleting multiple versions of same key or continuous ranges of
point keys with pebble tombstones could be inefficient when range
has lots of garbage.
This pr adds a storage gc operation that uses pebble range tombstones
to remove data.

Release note: None


gc: add delete range parameters to gc request

This commit adds clear_sub_range_key parameters to GC request.
clear_sub_range_keys is used by GC queue to remove chunks of
consecutive keys where no new data was written above the GC
threshold and storage can optimize deletions with range
tombstones.
To support new types of keys, GCer interface is also updated
to pass provided keys down to request.

Release note: None

Fixes: #84560, #61455

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aliher1911 aliher1911 force-pushed the clear_range_gc_7 branch 2 times, most recently from cac19b4 to 42e5b13 Compare October 28, 2022 23:18
@aliher1911 aliher1911 self-assigned this Oct 28, 2022
@aliher1911 aliher1911 force-pushed the clear_range_gc_7 branch 3 times, most recently from 549ddf1 to 98b1a41 Compare October 31, 2022 10:46
@aliher1911 aliher1911 changed the title gc: add delete range parameters to gc request gc: use clear range operations to remove multiple garbage keys across keys and versions Oct 31, 2022
@aliher1911 aliher1911 force-pushed the clear_range_gc_7 branch 4 times, most recently from 92bc9b3 to ed74dcc Compare October 31, 2022 15:33
@aliher1911 aliher1911 marked this pull request as ready for review October 31, 2022 17:17
@aliher1911 aliher1911 requested review from a team as code owners October 31, 2022 17:17
Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)


-- commits line 57 at r7:
can you include a release note for the new metric?

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing some comments after an initial high-level pass. I need to do a more thorough pass later, in particular of the GC batching code.

pkg/clusterversion/cockroach_versions.go Outdated Show resolved Hide resolved
pkg/roachpb/api.proto Outdated Show resolved Hide resolved
pkg/roachpb/api.proto Outdated Show resolved Hide resolved
pkg/storage/mvcc.go Outdated Show resolved Hide resolved
pkg/storage/mvcc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Show resolved Hide resolved
pkg/kv/kvserver/rditer/replica_data_iter_test.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/replica_data_iter_test.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/replica_data_iter_test.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/replica_data_iter_test.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc_random_test.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/gc/gc.go Outdated Show resolved Hide resolved
@aliher1911 aliher1911 force-pushed the clear_range_gc_7 branch 9 times, most recently from ffb73af to f80f540 Compare November 28, 2022 14:36
pkg/kv/kvserver/batcheval/cmd_gc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/batcheval/cmd_gc.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/batcheval/cmd_gc.go Outdated Show resolved Hide resolved
pkg/roachpb/api.proto Outdated Show resolved Hide resolved
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to do another pass over the batch logic, but I've re-reviewed the rest.

pkg/kv/kvserver/rditer/replica_data_iter_test.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/replica_data_iter_test.go Outdated Show resolved Hide resolved
pkg/storage/mvcc.go Outdated Show resolved Hide resolved
pkg/storage/mvcc.go Outdated Show resolved Hide resolved
pkg/storage/mvcc.go Outdated Show resolved Hide resolved
pkg/storage/mvcc.go Show resolved Hide resolved
pkg/storage/mvcc.go Outdated Show resolved Hide resolved
pkg/storage/mvcc.go Outdated Show resolved Hide resolved
>> at end:
data: "A"/10.000000000,0 -> /BYTES/B

# Check that we can't invoke gc_clear_range if we have live data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a couple of variants that check a live value and a point tombstone that are exactly on the GC threshold too (or do we have them already?).

Copy link
Contributor Author

@aliher1911 aliher1911 Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much sense for the full range deletion. I add one more test but overall we bail if there's anything not masked by range tombstone even if technically it should be collectable. E.g. point tombstone followed by range tombstone is not enough for fast path to succeed.

But I'll check if we can add more for non optimized case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation is that I want to make sure we have test coverage for this branch, and the corresponding branch in the full-range deletion:

https://github.com/cockroachdb/cockroach/pull/90830/files#diff-2a418618ad51e1cdef4b44240ef9ee4789e9f97f2d6d9734b67ea7fc94d5d9afR5452-R5457

I don't believe we currently have any test cases that hit it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think this condition is not needed at all. If key is garbage then it must be covered by newer value (its timestamp must be below coveredBy). And coveredBy timestamp is guaranteed to be at or below gcThreshold so we must be good.

// safe to continue because we bumped the GC
// thresholds. We may leave some inconsistent history
// behind, but nobody can read it.
log.Warningf(ctx, "failed to GC keys with clear range: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we discussed not GCing range keys if point key GC failed.

Comment on lines 666 to 669
// be greater or equal to the last key (lowest, collected last) of the first
// (collected first, containing highest keys) points batch. It is guaranteed
// that there are no garbage keys between this key and the start of first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be equal to the last key? Doesn't it have to be greater than it in order to cover it?

Also, this should be "no keys" rather than "no garbage keys", right? Because any live keys would be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is correct. It could be last when we start iterating, but we will always tighten it to the batch[0][0].Next() before sending.

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GC batcher LGTM, feel free to merge once comments have been addressed.

@erikgrinaker
Copy link
Contributor

PS: this can close out #61455 too.

This commit adds timestamp field to clear_range_key parameters
of GC request. New timestamp parameter is used by GC queue to
remove chunks of consecutive keys where no new data was written
above the GC threshold and storage can optimize deletions with
range tombstones.

Release note: None
Deleting multiple versions of same key or continuous ranges of
point keys with pebble tombstones could be inefficient when range
has lots of garbage.
This pr adds a storage gc operation that uses pebble range tombstones
to remove data.

Release note: None
IterateMVCCReplicaKeySpans helper to complement IterateplicaKeySpans
Existing tests moved from ReplicaMVCCDataIterator to
IterateMVCCReplicaKeySpans as former is being sunsetted and only a
single call site exists.

Release note: None
When deleting large chunks of keys usually produced by range tombstones
GC will use point deletion for all versions found within replica key
range.
That kind of operations would generate unnecessary load on the storage
because each individual version must be deleted independently.
This patch adds an ability for GC to find ranges of consecutive keys
which are not interleaved with any newer data and create
clear_range_key GC requests to remove them.

Release note: None
This commit adds metrics to GC ClearRange requests in GC. Metrics are
shared with with existing GC ClearRangeKey full range deletions since
they are peforming similar operation and exposed as
queue.gc.info.clearrangesuccess/queue.gc.info.clearrangefailed.

Release note (ops change): Metrics queue.gc.info.clearrangesuccess
queue.gc.info.clearrangefailed updated to include statistics about
GC operations that perform ClearRange on parts of range keyspace.
Previously those metrics only included requests to remove range data
completely when performing a schema change.
ClearRange fields of GCRequests are ignored by earlier versions,
this feature needs to be version gated to let GC work when talking
to a previous version if GC is run from a different node.

Release note: None
Handle ClearRange GC request fields to use GCClearRange storage
functions for ranges of keys that contain no visible data instead
of removing each key individually.
This PR only enables processing of request field, while actual GC
and storage functionality is added in previous PRs.

Release note: None
@aliher1911
Copy link
Contributor Author

bors r=erikgrinaker

@craig
Copy link
Contributor

craig bot commented Dec 19, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 19, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: use ClearMVCCRange when GC-ing multiple versions under range tombstones
4 participants