
storage: eagerly carry out chunked GC #24209

Merged 2 commits into cockroachdb:master from fix/chunked-gc on Apr 5, 2018

Conversation

tbg
Member

@tbg tbg commented Mar 26, 2018

The GC queue was already splitting up the set of keys to be deleted into
small parts to avoid overloading the Raft machinery. From there, it was
only a small step to let it dispatch the corresponding GCRequests eagerly.

The upshot is that now we don't have to hold all of the keys to be deleted
in memory any more.

With appropriate testing, I think this is a change suitable for inclusion
in 2.0.1. "Appropriate testing" includes a roachtest which creates a large
range (larger than available RAM) and fills it with GC'able data. (A
variant of the DROP test that increases the range size suitably should do
it).

Release note (performance improvement): lowered the amount of memory used
during garbage collection of old versions.
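
For illustration, here is a minimal Go sketch of the eager, chunked dispatch described above, including the ordering discussed in the review below (the thresholds are bumped in their own request before any keys are deleted). The `GCer` interface, `gcKey`/`thresholds` types, and the chunk-size constant are simplified stand-ins, not the actual definitions in pkg/storage/gc_queue.go:

```go
package gcsketch

import "context"

// gcKey stands in for roachpb.GCRequest_GCKey (simplified).
type gcKey struct{ Key []byte }

// thresholds stands in for the GC and txn-span thresholds (simplified).
type thresholds struct{ Key, Txn int64 }

// GCer loosely mirrors the interface this PR introduces: one implementation
// sends GCRequests to the replica, while a no-op implementation (NoopGCer)
// does nothing. The exact method set here is an assumption for illustration.
type GCer interface {
	SetGCThreshold(ctx context.Context, t thresholds) error
	GC(ctx context.Context, keys []gcKey) error
}

// maxKeysPerChunk bounds the buffered keys; the real code chunks by byte size.
const maxKeysPerChunk = 256

// runChunkedGC bumps the thresholds in a separate request first, then streams
// candidate keys and eagerly flushes each chunk, so the full key set never
// has to be held in memory.
func runChunkedGC(ctx context.Context, gcer GCer, t thresholds, candidates <-chan gcKey) error {
	if err := gcer.SetGCThreshold(ctx, t); err != nil {
		return err
	}
	var chunk []gcKey
	flush := func() error {
		if len(chunk) == 0 {
			return nil
		}
		if err := gcer.GC(ctx, chunk); err != nil {
			return err
		}
		chunk = nil // let the backing array be released
		return nil
	}
	for k := range candidates {
		chunk = append(chunk, k)
		if len(chunk) >= maxKeysPerChunk {
			if err := flush(); err != nil {
				return err
			}
		}
	}
	return flush() // final partial chunk
}
```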

@tbg tbg requested a review from a team as a code owner March 26, 2018 16:57
@tbg tbg requested a review from a team March 26, 2018 16:57
@cockroach-teamcity
Member

This change is Reviewable

@bdarnell
Contributor

creates a large range (larger than available RAM)

Shouldn't that be disallowed by the backpressure introduced by #21777?

This :lgtm: for 2.1 because it is a step towards increasing range size limits, but before we consider this for 2.0.1 we should do more investigation to figure out how we got such an oversized range to begin with.


Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 517 at r2 (raw file):

}

// NoopGCer implements storage.GCer by doing nothing.

Nit: we're in package storage so no need to repeat it in the doc comments.


pkg/storage/gc_queue.go, line 550 at r2 (raw file):

	var ba roachpb.BatchRequest

	// Technically not needed since we're talking directly to the Range.

s/Range/Replica/


pkg/storage/gc_queue.go, line 568 at r2 (raw file):

	req.Threshold = thresh.Key
	req.TxnSpanGCThreshold = thresh.Txn
	return r.send(ctx, req)

Comment that we specifically do not batch SetGCThreshold with keys to GC to minimize contention in the command queue.


Comments from Reviewable

@bdarnell
Contributor

And approved for 2.0.1 following discussion in #24215

@nvanbenschoten
Member

Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 538 at r2 (raw file):

	desc := r.repl.Desc()
	var template roachpb.GCRequest
	template.Key = desc.StartKey.AsRawKey()

Consider adding a TODO to constrain this header to only what's actually in the GCRequest. Now that we're chunking these requests, we can be a lot more CommandQueue friendly.


pkg/storage/gc_queue.go, line 555 at r2 (raw file):

	var union roachpb.RequestUnion
	union.MustSetInner(&req)
	ba.Requests = append(ba.Requests, union)

ba.Add?
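
To spell the suggestion out: assuming the roachpb API of that era, `BatchRequest.Add` wraps the request in a `RequestUnion` itself, so the manual construction in the diff collapses to a single call. A sketch of the reviewer's point, not the final code:

```go
package gcsketch

import "github.com/cockroachdb/cockroach/pkg/roachpb"

// addGCRequest illustrates the suggestion: BatchRequest.Add performs the
// MustSetInner/append dance internally, so the explicit RequestUnion in the
// original diff is unnecessary.
func addGCRequest(ba *roachpb.BatchRequest, req *roachpb.GCRequest) {
	// Equivalent to:
	//   var union roachpb.RequestUnion
	//   union.MustSetInner(req)
	//   ba.Requests = append(ba.Requests, union)
	ba.Add(req)
}
```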


pkg/storage/gc_queue.go, line 568 at r2 (raw file):

minimize contention in the command queue

How does this minimize contention? Just because SetGCThreshold will now be very quick? They both still declare a write over the entire range of keys.

On this note, what time do these GCRequests indicate to the CommandQueue that they're writing at? We should be able to move the requests into the past to the threshold timestamp such that #14342 ensures that future reads don't block on them. In fact, writes shouldn't need to block on them either, although I think there would be practical complications to allowing that.


pkg/storage/gc_queue.go, line 819 at r2 (raw file):

							// Allow releasing the memory backing batchGCKeys.
							iter.ResetAllocator()
							batchGCKeys = []roachpb.GCRequest_GCKey{}

batchGCKeys = nil
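
A small, self-contained illustration of why `nil` is the preferred reset (nothing here is CockroachDB-specific):

```go
package gcsketch

// resetAndReuse shows that assigning nil drops the reference to the old
// backing array (so it can be garbage collected, which is what the original
// comment is after) and still works with a subsequent append, which allocates
// a fresh backing array on demand. The empty-literal form also drops the old
// array, but nil is the idiomatic zero value and avoids the extra literal.
func resetAndReuse(keys []string) []string {
	keys = nil
	keys = append(keys, "new-key") // append handles a nil slice transparently
	return keys
}
```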


pkg/storage/gc_queue.go, line 836 at r2 (raw file):

	// From now on, all newly added keys are range-local.
	// TODO(tschottdorf): Might need to use two requests at some point since we

Does this TODO go away?


Comments from Reviewable

@tbg
Member Author

tbg commented Mar 26, 2018

Shouldn't that be disallowed by the backpressure introduced by #21777?

True (and explained in #24215). The comment also applies to scenarios in which we can run with range sizes larger than the available RAM, though, which I hope will be a real option in 2.1 or 2.2 (and which is what the roachtest will do, except that it waits for upreplication before growing the data set to avoid snapshot blow-out).


Review status: 1 of 4 files reviewed at latest revision, 7 unresolved discussions.


pkg/storage/gc_queue.go, line 517 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Nit: we're in package storage so no need to repeat it in the doc comments.

Forgot to update this when I moved this to storage, done.


pkg/storage/gc_queue.go, line 538 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider adding a TODO to constrain this header to only what's actually in the GCRequest. Now that we're chunking these requests, we can be a lot more CommandQueue friendly.

This is actually happening under the hood already; see the comment in the GCRequest key declaration: "// Intentionally don't call DefaultDeclareKeys: the key range in the header". We've argued for a while that we should be sending individual requests instead (see #7880), and then we wouldn't have to set this fake span (which we use to bounce the commands on split, out of paranoia) at all.
Let me know if you still think there should be a TODO.
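
To make the nonstandard declaration concrete, here is a hypothetical sketch; the types and the `declare` callback are invented for illustration, and the real hook lives in pkg/storage with a different signature. The idea it shows is the one above: rather than declaring the fake header span, the command declares only the keys the GCRequest actually carries.

```go
package gcsketch

// span and gcRequest are invented stand-ins for illustration only.
type span struct{ Key, EndKey []byte }

type gcRequest struct {
	HeaderSpan span   // covers the whole range; used only for routing
	Keys       []span // the keys actually being GC'ed in this chunk
}

// declareGCKeys sketches the idea behind not calling DefaultDeclareKeys:
// only the keys listed in the request are declared to the command queue,
// keeping the declared footprint proportional to the chunk, not the range.
func declareGCKeys(req gcRequest, declare func(span)) {
	for _, k := range req.Keys {
		declare(k)
	}
}
```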


pkg/storage/gc_queue.go, line 550 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/Range/Replica/

Done.


pkg/storage/gc_queue.go, line 555 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

ba.Add?

🤦‍♂️ Done


pkg/storage/gc_queue.go, line 568 at r2 (raw file):
See the other comment above. GCRequest declares a nonstandard set of keys.

On this note, what time do these GCRequests indicate to the CommandQueue that they're writing at?

They declare clock.Now() (see template()). You're right that we could run them at the larger of the two involved thresholds instead. That could actually be a nice win. And yeah, ditto for writes. Could we achieve this by having the requests declare synthetic keys only checked by other GCRequests? I.e., if there is a GCRequest for /table/a, we'd really declare /local/magic/table/a, so that GCRequests block on each other but other traffic doesn't. I don't like the magic of that, though.

Either way, we shouldn't introduce any of these things without a benchmark. Some queue-type workload roachtests are overdue anyway; those should be able to profit from these kinds of tweaks.

Filed #24237.
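
For concreteness, the synthetic-keys idea above could look roughly like the following. This is purely hypothetical: it was not implemented, and the prefix is invented. Each GCRequest would declare a shadow key under a reserved local prefix so that GCRequests serialize against each other without blocking ordinary traffic on the real key.

```go
package gcsketch

// magicGCPrefix is an invented, reserved prefix; nothing like it exists in
// the keyspace. It only illustrates the idea discussed above.
var magicGCPrefix = []byte("/local/magic")

// syntheticGCKey maps a user key (e.g. /table/a) to the shadow key
// (/local/magic/table/a) that a GCRequest would declare instead, so only
// other GCRequests collide with it in the command queue.
func syntheticGCKey(userKey []byte) []byte {
	out := make([]byte, 0, len(magicGCPrefix)+len(userKey))
	out = append(out, magicGCPrefix...)
	out = append(out, userKey...)
	return out
}
```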


pkg/storage/gc_queue.go, line 819 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

batchGCKeys = nil

Done.


pkg/storage/gc_queue.go, line 836 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this TODO go away?

I'm not seeing a TODO.


Comments from Reviewable

@nvanbenschoten
Member

Review status: 1 of 4 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 538 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This is actually happening under the hood already; see the comment in the GCRequest key declaration: "// Intentionally don't call DefaultDeclareKeys: the key range in the header". We've argued for a while that we should be sending individual requests instead (see #7880), and then we wouldn't have to set this fake span (which we use to bounce the commands on split, out of paranoia) at all.
Let me know if you still think there should be a TODO.

Ah ok, I didn't know we were doing that. Thanks for explaining. No need for a TODO.


pkg/storage/gc_queue.go, line 568 at r2 (raw file):
Thanks for filing!

Could we achieve this by having the requests declare synthetic keys only checked by other GCRequests?

That should work, but this has me curious about something else. If GCRequest is declared at clock.Now(), is it ignored by historical reads in the commandQueue? In that case, how does the synchronization work when we check requestCanProceed? Is it possible that a historical read passes this check but still finds that the keys it is trying to read have been GCed?


pkg/storage/gc_queue.go, line 836 at r2 (raw file):
The one that was removed. It still seems applicable.

// TODO(tschottdorf): Might need to use two requests at some point since we
// hard-coded the full non-local key range in the header, but that does
// not take into account the range-local keys. It will be OK as long as
// we send directly to the Replica, though.


Comments from Reviewable

@tbg
Member Author

tbg commented Apr 5, 2018

Review status: 1 of 4 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 568 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for filing!

Could we achieve this by having the requests declare synthetic keys only checked by other GCRequests?

That should work, but this has me curious about something else. If GCRequest is declared at clock.Now(), is it ignored by historical reads in the commandQueue? In that case, how does the synchronization work when we check that the requestCanProceed? Is it possible that a historical read passes this check but still finds that the keys it is trying to read have been GCed?

I think you might be right and we should fix it, but morally this won't happen. Note that we bump the thresholds in the first command that doesn't actually GC any keys. This means that in practice the bumped threshold is visible to all reads by the time we actually remove data. I think there's a scenario in which a read slips by the GC bump but gets stalled before opening a RocksDB snapshot, so that when it finally does, we have actually already removed keys? Setting ReplicatedEvalResult.BlockReads = true should fix that problem.

Filed https://github.com/cockroachdb/cockroach/issues/new.


pkg/storage/gc_queue.go, line 836 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The one that was removed. It still seems applicable.

// TODO(tschottdorf): Might need to use two requests at some point since we
// hard-coded the full non-local key range in the header, but that does
// not take into account the range-local keys. It will be OK as long as
// we send directly to the Replica, though.

I'm not sure about that TODO any more. GCRequest is a little weird in that its span doesn't mean much as long as it hits the right range. We use the whole range span, so anything that addresses into it is also covered.


Comments from Reviewable

tbg added 2 commits April 5, 2018 19:00
The GC queue was already splitting up the set of keys to be deleted into
small parts to avoid overloading the Raft machinery. From there, it was
only a small step to let it dispatch the corresponding GCRequests
eagerly.

The upshot is that now we don't have to hold all of the keys to be
deleted in memory any more.

With appropriate testing, I think this is a change suitable for
inclusion in 2.0.1. "Appropriate testing" includes a roachtest which
creates a large range (larger than available RAM) and fills it with
GC'able data. (A variant of the `DROP` test that increases the range
size suitably should do it). Such a test is filed as cockroachdb#24214.

Release note (performance improvement): lowered the amount of memory
used during garbage collection of old versions.
@tbg
Member Author

tbg commented Apr 5, 2018

bors r+

craig bot added a commit that referenced this pull request Apr 5, 2018
24209: storage: eagerly carry out chunked GC r=tschottdorf a=tschottdorf

The GC queue was already splitting up the set of keys to be deleted into
small parts to avoid overloading the Raft machinery. From there, it was
only a small step to let it dispatch the corresponding GCRequests eagerly.

The upshot is that now we don't have to hold all of the keys to be deleted
in memory any more.

With appropriate testing, I think this is a change suitable for inclusion
in 2.0.1. "Appropriate testing" includes a roachtest which creates a large
range (larger than available RAM) and fills it with GC'able data. (A
variant of the `DROP` test that increases the range size suitably should do
it).

Release note (performance improvement): lowered the amount of memory used
during garbage collection of old versions.
@craig
Contributor

craig bot commented Apr 5, 2018

Build succeeded

@craig craig bot merged commit 87e85eb into cockroachdb:master Apr 5, 2018
@tbg tbg deleted the fix/chunked-gc branch April 5, 2018 23:34
@tbg tbg restored the fix/chunked-gc branch April 16, 2018 15:24
@tbg tbg deleted the fix/chunked-gc branch May 8, 2018 15:07