storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862

ajwerner · 2020-01-10T00:56:33Z

This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: storage: rework RunGC so it no longer buffers keys and values in memory.

Fix the spanset Iterator such that SeekLT can seek to the end of the range.
Expose reverse iteration in rditer.ReplicaDataIterator.
Rearrange some code in gc_queue.go to put the RunGC() helpers below that function.
Remove lockableGCInfo because there was no need for it.
Move RunGC into a gc subpackage.
Rework RunGC to paginate versions of keys with reverse iteration.

Fixes #42531.

cockroach-teamcity · 2020-01-10T00:56:41Z

This change is

ajwerner · 2020-01-10T00:57:33Z

The last commit needs more testing and probably a benchmark but I wanted to put it up early because I suspect it's going to be at least a little bit controversial. On the whole I feel that it makes the code easier to read but I'd like some confirmation of that feeling.

ajwerner · 2020-01-13T02:31:39Z

Hey @tbg I hear you're coming back soon. This PR is certainly not ready to merge but I'd appreciate if you could give the 5th commit a look to judge the basic approach. I want to make sure the structure is reasonable before I invest in the testing and benchmarking to get this over the finish line.

tbg

Glad to have you in this area of the code! The changes look very good. I'm not too worried that things will have slowed down due to the reverse iteration, but others will have more informed opinions on that.
The comments I left are mostly smaller suggestions to clarify the code. I have held off on reviewing some core pieces of logic pending your next round of changes.

Reviewed 4 of 4 files at r1, 7 of 7 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 4 of 4 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

pkg/storage/batch_spanset_test.go, line 193 at r1 (raw file):

		t.Fatalf("expected valid iterator, err=%v", err)
	}
	// SeekLT the upper bound is invalid.

Lower bound, right? Span set has [c,g), and insideKey==c.

pkg/storage/gc_iterator.go, line 40 at r5 (raw file):

// this case, next is the value of an intent.
func (it *gcIterator) state() (
	cur, next *engine.MVCCKeyValue,

This is dangerously close to warranting a struct that is being returned. You're often returning mostly zero values, so it would actually help readability quite a bit if you introduced

type gcIterState struct {
  cur, next *engine.MVCCKeyValue
  isNewest, isIntent, ok bool
  err error

It would also give you a more convenient place for comments.
Is ok really needed? A comment explaining what it does would help.

I'll assume this method does the right thing for now, will revisit in post your next round of changes.

pkg/storage/engine/gc.go, line 36 at r5 (raw file):

}

// IsGarbage makes a determination whether a key (cur) is garbage. Next, if

Write 'next' here to make this easier to parse.

pkg/storage/engine/gc.go, line 37 at r5 (raw file):

non-nil,

s/next/newer/

pkg/storage/engine/gc.go, line 40 at r5 (raw file):

// metadata KV if cur is an intent). If isNewest is false, next must be non-nil.
// isNewest implies that this is the highest timestamp committed version for
// this key. If isNewest is true and next is non-nil, it is an intent.

Explain that there being an intent doesn't matter in the impl below. Conservatively we have to assume that the intent will get aborted, so we will be able to GC just the values that we could remove if there weren't an intent. Maybe this comment doesn't have to live in the method comment, but it would be helpful somewhere.

pkg/storage/engine/gc_test.go, line 13 at r5 (raw file):

package engine

import (

Do these tests need to be replaced?

pkg/storage/spanset/spanset.go, line 185 at r1 (raw file):
Can you add a comment for spanKeyExclusive? Something like

If spanKeyExclusive is set, the spans in access will be made open, i.e. [a,b) will be considered (a,b).

pkg/storage/spanset/spanset.go, line 194 at r1 (raw file):

		for _, cur := range s.spans[ac][scope] {
			if cur.Contains(span) &&
				(!spanKeyExclusive || cur.EndKey != nil && !cur.Key.Equal(span.Key)) ||

Is this EndKey check in the right place? I think this should be

!spanKeyExclusive || !cur.Key.Equal(span.Key) || (cur.EndKey != nil && cur.Key.Equal(span.EndKey))

pkg/storage/spanset/spanset.go, line 195 at r1 (raw file):

			if cur.Contains(span) &&
				(!spanKeyExclusive || cur.EndKey != nil && !cur.Key.Equal(span.Key)) ||
				spanKeyExclusive && cur.EndKey.Equal(span.Key) {

Isn't spanKeyExclusive always true if this expr is evaluated?

Having read the tests I think what's happening here is that cur = [k1, k2) is treated like (k1, k2]. I find the bool is not named intuitively for that behavior. Could "reverse" (along with a comment) be a better name?

pkg/storage/spanset/spanset.go, line 225 at r1 (raw file):

		for _, cur := range s.spans[ac][scope] {
			if (cur.Contains(span) &&
				(!spanKeyExclusive || (cur.EndKey != nil && !cur.Key.Equal(span.Key)))) ||

Similarly confused here

pkg/storage/spanset/spanset_test.go, line 224 at r1 (raw file):

	defer leaktest.AfterTest(t)()

	var ss SpanSet

random idea: call this bdGkq to make it self-describing and to make these tests easier to read (this test is small anyway, but maybe this works in general).

pkg/storage/spanset/spanset_test.go, line 235 at r1 (raw file):

	}
	for _, span := range allowed {
		if err := ss.checkAllowed(SpanReadOnly, span, true /* spanKeyExclusive */); err != nil {

Ok this explains why I'm confused. spanKeyExclusive makes the end key inclusive? I see how that's the right thing for the SeekLT usage but the naming is confusing.

petermattis

Glad to have you in this area of the code! The changes look very good. I'm not too worried that things will have slowed down due to the reverse iteration, but others will have more informed opinions on that.

Reverse iteration is definitely slower than forward iteration (25-50% depending on what/where you're measuring). But this can be well worth it if we are scanning less data due to using reverse iteration. Is that the case here? I haven't grok'd what is going on in this PR.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

tbg

No, I think we're scanning the exact same data and it's basically "all of the data". Where previously we'd iterate through all of the kv pairs in a range in forward order, we're now doing the reverse. There is an opportunity to "optimize" by seeking to the previous user key the moment current one's versions are no longer GC'able. Seek internally steps the iter a few times before resorting to the actual seek (as you well know). In this PR as written, we will manually step the iterator through every kv pair (the old code did that too, but not in reverse). Whenever we have lots of keys with >>10 non-gcable versions the actual Seek should be the better choice (and should be equivalent otherwise, due to the internal stepping optimization).
My intuition is that the slowdown of reverse iteration will not be felt much because GC is not a foreground operation and because it hasn't been performance optimized, so there will be other bottlenecks. But it seems like we should check.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

ajwerner · 2020-01-13T15:18:01Z

Thanks for giving it a look. I’ll move forward with writing benchmarks and testing.

ajwerner

Alright, I wrote some benchmarks and some tests. I also took this chance to carve off a new package.

The test coverage is no worse than it used to be because I left all of the old tests in place. I also wrote a test that generates a random dataset and compares the output of the old implementation to the new one. Perhaps a few more unit tests are in order and perhaps more coverage of the unchanged code related to abort spans and the local key range.

After some minor memory optimizations the benchmarks show a modest win and a dramatic reduction in memory usage.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)

pkg/storage/batch_spanset_test.go, line 193 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Lower bound, right? Span set has [c,g), and insideKey==c.

Right. Fixed.

pkg/storage/gc_iterator.go, line 40 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This is dangerously close to warranting a struct that is being returned. You're often returning mostly zero values, so it would actually help readability quite a bit if you introduced
type gcIterState struct {
  cur, next *engine.MVCCKeyValue
  isNewest, isIntent, ok bool
  err error
It would also give you a more convenient place for comments.
Is ok really needed? A comment explaining what it does would help.

I'll assume this method does the right thing for now, will revisit in post your next round of changes.

I did indeed create a struct. I also simplified the logic.

pkg/storage/spanset/spanset.go, line 185 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Can you add a comment for spanKeyExclusive? Something like

If spanKeyExclusive is set, the spans in access will be made open, i.e. [a,b) will be considered (a,b).

spans in access will be open at the start and closed at the end, i.e. [a,b) will be considered (a,b]

pkg/storage/spanset/spanset.go, line 195 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Isn't spanKeyExclusive always true if this expr is evaluated?

Having read the tests I think what's happening here is that cur = [k1, k2) is treated like (k1, k2]. I find the bool is not named intuitively for that behavior. Could "reverse" (along with a comment) be a better name?

I've renamed the variable and added commentary.

pkg/storage/spanset/spanset_test.go, line 224 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

random idea: call this bdGkq to make it self-describing and to make these tests easier to read (this test is small anyway, but maybe this works in general).

Done.

pkg/storage/spanset/spanset_test.go, line 235 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Ok this explains why I'm confused. spanKeyExclusive makes the end key inclusive? I see how that's the right thing for the SeekLT usage but the naming is confusing.

Hopefully it's less confusing now.

pkg/storage/engine/gc.go, line 36 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Write 'next' here to make this easier to parse.

Done.

pkg/storage/engine/gc.go, line 37 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

non-nil,

s/next/newer/

Done.

pkg/storage/engine/gc.go, line 40 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Explain that there being an intent doesn't matter in the impl below. Conservatively we have to assume that the intent will get aborted, so we will be able to GC just the values that we could remove if there weren't an intent. Maybe this comment doesn't have to live in the method comment, but it would be helpful somewhere.

Done.

pkg/storage/engine/gc_test.go, line 13 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Do these tests need to be replaced?

definitely. I retained them in the testing file which now contains all of this old code and exists to support the old implementation.

ajwerner

@tbg friendly ping

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)

tbg

Looks great! and thanks for all of the cleanup work. I might've missed some of the changes since my last review (Reviewable and/or I got a little confused with all the revisions), so if there's something in particular that I should give a closer look please let me know.

Reviewed 13 of 13 files at r7, 7 of 7 files at r8, 1 of 1 files at r9, 1 of 1 files at r10, 7 of 7 files at r11, 12 of 12 files at r12.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

pkg/storage/gc/gc_iterator_test.go, line 27 at r12 (raw file):

)

func TestGCIterator(t *testing.T) {

Bit of commentary as this test proceeds would be nice.

pkg/storage/gc/gc_old_test.go, line 32 at r12 (raw file):

)

// RunGCOld is an older implementation of Run. It is used for benchmarking and

Not that it matters much, but does it need to be exported?

pkg/storage/gc/gc.go, line 477 at r12 (raw file):

// multiple of the heartbeat timeout used by the coordinator.
//
// TODO(tschottdorf): this could be done in Replica.GC itself, but it's

This TODO can be removed (not worth a CI run though)

tbg

(and sorry for letting this sit - I've changed my GH workflow a bit and am still working out the kinks).

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

SeekLT is exclusive on the key being seeked. It should be allowed to SeekLT to the end of a span. This commit makes that possible. Release note: None

Prior to this commit, the ReplicaDataIterator only permitted forward iteration. This PR exposes the ability to iterate in reverse by adding a `Prev()` method as well as a constructor option to seek the iterator to the end of the data range. Release note: None

This commit is just code movement. It moves the helper functions of `RunGC()`, `processLocalKeyRange()` and `processAbortSpan()` from above to below. Release note: None

There was no concurrency or need for synchronization. Release note: None

This commit moves the logic of RunGC as well as the engine.GC struct into a separate subpackage. As of this commit that subpackage contains no testing other than what existed for the engine.GC code. The RunGC function is a reasonably well specified interface to separate the logic of scanning a range and collecting the garbage from the gcQueue. I was getting overwhelmed by testing boundaries and unit testing in general. Release note: None

This commit reworks the processing of replicated state underneath the gcQueue for the purpose of determining and sending GC requests. The primary intention of this commit is to remove the need to buffer all of the versions of a key in memory. As we learned in cockroachdb#42531, this bufferring can be extremely unfortunate when using sequence data types which are written to frequently. Prior to this commit, the code forward iterates through the range's data and eagerly reads all versions of the a key into memory. It then binary searches those versions to find the latest timestamp for the key which can be GC'd. It then reverse iterates through all of those versions to determine the latest version of the key which would put the current batch over its limit. This last step works to paginate the process of actually deleting the data for many versions of the same key. I suppose this pagination was added to ensure that write batches due to GC requests don't get too large. Unfortunately this logic was unable to paginate the loading of versions from the storage engine. In this new commit, the entire process of computing data to GC now uses reverse iteration; for each key we examine versions from oldest to newest. The commit adds a `gcIterator` which wraps this reverse iteration with some useful lookahead. During this GC process, at most two additional versions need to examined to determine whether a given version is garbage. While this approach relies on reverse iteration which is known to be less efficient than forward iteration, it offers the opportunity to avoid allocating memory for versions of a key which are not going to end up as a part of a GC request. This reduction in memory usage shows up in benchmarks (see below). The change retains the old implementation as a testing strategy and as a basis for the benchmarks. ``` name old time/op new time/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 924ns ± 8% 590ns ± 1% -36.13% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 976ns ± 2% 578ns ± 1% -40.75% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 944ns ± 0% 570ns ± 9% -39.63% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 903ns ± 0% 612ns ± 3% -32.29% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 994ns ± 9% 592ns ± 9% -40.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 669ns ± 4% 526ns ± 1% -21.34% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 624ns ± 0% 529ns ± 2% -15.16% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 636ns ± 4% 534ns ± 2% -16.04% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 612ns ± 1% 532ns ± 3% -13.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 638ns ± 2% 534ns ±10% -16.35% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 603ns ± 6% 527ns ± 8% -12.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 613ns ± 5% 517ns ± 6% -15.78% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 619ns ± 6% 534ns ± 4% -13.61% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 607ns ± 7% 520ns ± 2% -14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 599ns ± 4% 501ns ± 7% -16.36% (p=0.008 n=5+5) name old speed new speed delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 23.9MB/s ± 8% 37.3MB/s ± 1% +56.23% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 22.6MB/s ± 2% 38.1MB/s ± 1% +68.81% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 23.3MB/s ± 0% 38.7MB/s ± 9% +66.06% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 24.4MB/s ± 0% 36.0MB/s ± 3% +47.73% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 22.2MB/s ± 8% 37.3MB/s ± 9% +68.09% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 34.4MB/s ± 4% 43.7MB/s ± 1% +27.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 36.9MB/s ± 0% 43.4MB/s ± 2% +17.84% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 36.2MB/s ± 4% 43.1MB/s ± 2% +19.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 37.6MB/s ± 1% 43.3MB/s ± 3% +15.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.0MB/s ± 2% 43.2MB/s ±10% +19.87% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 36.5MB/s ± 5% 41.8MB/s ± 9% +14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 35.9MB/s ± 5% 42.7MB/s ± 6% +18.83% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 35.6MB/s ± 6% 41.2MB/s ± 4% +15.66% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 36.3MB/s ± 6% 42.3MB/s ± 2% +16.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.7MB/s ± 4% 44.0MB/s ± 7% +19.69% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 325B ± 0% 76B ± 0% -76.62% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 358B ± 0% 49B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 340B ± 0% 29B ± 0% -91.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 328B ± 0% 18B ± 0% -94.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 325B ± 0% 14B ± 0% -95.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 226B ± 0% 2B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 228B ± 0% 3B ± 0% -98.69% (p=0.000 n=5+4) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 226B ± 0% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 388B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 389B ± 1% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 390B ± 1% 0B -100.00% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) ``` Release note (bug fix): The GC process was improved to paginate the key versions of a key to fix OOM crashes which can occur when there are extremely large numbers of versions for a given key.

ajwerner

TFTR!

bors r=tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)

pkg/storage/gc/gc.go, line 477 at r12 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This TODO can be removed (not worth a CI run though)

Done.

pkg/storage/gc/gc_iterator_test.go, line 27 at r12 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Bit of commentary as this test proceeds would be nice.

Done.

pkg/storage/gc/gc_old_test.go, line 32 at r12 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Not that it matters much, but does it need to be exported?

When I first wrote it I made the benchmarks a package gc_test file, I ended up reverting that because symbol sharing with other testing infrastructure got annoying. Unexported.

43862: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC r=tbg a=ajwerner This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: `storage: rework RunGC so it no longer buffers keys and values in memory`. 1) Fix the spanset Iterator such that SeekLT can seek to the end of the range. 2) Expose reverse iteration in rditer.ReplicaDataIterator. 3) Rearrange some code in gc_queue.go to put the RunGC() helpers below that function. 4) Remove lockableGCInfo because there was no need for it. 5) Move RunGC into a `gc` subpackage. 5) Rework RunGC to paginate versions of keys with reverse iteration. Fixes #42531. 44054: storage/concurrency/lock: define lock Strength and Durability modes r=nvanbenschoten a=nvanbenschoten This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`. These definitions are currently unused but will be referenced soon by #43740 and #43775. The changes around SELECT FOR UPDATE in the optimizer (#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

craig · 2020-01-21T17:31:20Z

Build succeeded

GitHub CI (Cockroach)

ajwerner requested a review from a team as a code owner January 10, 2020 00:56

ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 4 times, most recently from a716961 to 8a78bbe Compare January 13, 2020 02:27

ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 2 times, most recently from b2b137a to 736a8a4 Compare January 13, 2020 05:11

tbg reviewed Jan 13, 2020

View reviewed changes

petermattis reviewed Jan 13, 2020

View reviewed changes

tbg reviewed Jan 13, 2020

View reviewed changes

ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch from 736a8a4 to 74c414a Compare January 16, 2020 06:59

ajwerner commented Jan 16, 2020

View reviewed changes

ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 2 times, most recently from bff3170 to 658b2f7 Compare January 16, 2020 07:12

ajwerner changed the title ~~[WIP] storage: enable reverse iteration in rditer, paginate versions during GC~~ storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC Jan 16, 2020

ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 3 times, most recently from b0059e0 to 55b4936 Compare January 16, 2020 16:14

ajwerner commented Jan 21, 2020

View reviewed changes

tbg approved these changes Jan 21, 2020

View reviewed changes

tbg reviewed Jan 21, 2020

View reviewed changes

ajwerner added 5 commits January 21, 2020 10:43

storage/spanset: allow SeekLT to seek to the end of a span

5176bac

SeekLT is exclusive on the key being seeked. It should be allowed to SeekLT to the end of a span. This commit makes that possible. Release note: None

storage: rearrange code in gc_queue.go

2ecb3dd

This commit is just code movement. It moves the helper functions of `RunGC()`, `processLocalKeyRange()` and `processAbortSpan()` from above to below. Release note: None

storage: remove lockableGCInfo

c3dd677

There was no concurrency or need for synchronization. Release note: None

ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch from 55b4936 to 5512030 Compare January 21, 2020 15:51

ajwerner commented Jan 21, 2020

View reviewed changes

craig bot merged commit 5512030 into cockroachdb:master Jan 21, 2020

ajwerner mentioned this pull request Jan 23, 2020

[DNM] release-19.2: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #44257

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862

storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862

ajwerner commented Jan 10, 2020 •

edited

cockroach-teamcity commented Jan 10, 2020

ajwerner commented Jan 10, 2020

ajwerner commented Jan 13, 2020

tbg left a comment

petermattis left a comment

tbg left a comment

ajwerner commented Jan 13, 2020

ajwerner left a comment

ajwerner left a comment

tbg left a comment

tbg left a comment

ajwerner left a comment

craig bot commented Jan 21, 2020

storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862

storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862

Conversation

ajwerner commented Jan 10, 2020 • edited

cockroach-teamcity commented Jan 10, 2020

ajwerner commented Jan 10, 2020

ajwerner commented Jan 13, 2020

tbg left a comment

Choose a reason for hiding this comment

petermattis left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

ajwerner commented Jan 13, 2020

ajwerner left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

craig bot commented Jan 21, 2020

Build succeeded

ajwerner commented Jan 10, 2020 •

edited