sql: shrink SampleReservoir capacity on memory exhaustion #65491
Conversation
I'm still working on testing, but this is probably RFAL.
I'll leave it to @rytaft to review it much closer, just a couple of quick comments from me.
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/rowexec/sampler.go, line 115 at r4 (raw file):
sampleSize, minSampleSize := int(spec.SampleSize), int(spec.MinSampleSize)
if minSampleSize < 1 || minSampleSize > sampleSize {
super nit: this logic is duplicated in two places. Might be worth extracting into a helper.
pkg/sql/stats/row_sampling.go, line 134 at r2 (raw file):
// for the additional memory used after copying inside copyRow.
if sr.memAcc != nil {
	if err := sr.memAcc.Grow(ctx, int64(len(rowCopy))*int64(rowCopy[0].Size())); err != nil {
This line seems suspicious. If we have a row of INT, BYTES, DECIMAL, we will account for 3 x INT. Then in copyRow our beforeSize will be rowCopy[i].Size, i.e. each of the datums separately.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/stats/row_sampling.go, line 134 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This line seems suspicious. If we have a row of INT, BYTES, DECIMAL, we will account for 3 x INT. Then in copyRow our beforeSize will be rowCopy[i].Size, i.e. each of the datums separately.
I believe at this point rowCopy[0].Size() will only be EncDatumOverhead because we have not yet set the values to anything, so this should be equivalent to the sum of Size() of each element of rowCopy. But this line confused me, too. I will change it to use rowCopy.Size().
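To make the accounting question concrete, here is a small self-contained Go sketch of the two accounting styles. The encDatum type and encDatumOverhead constant are illustrative stand-ins of my own, not CockroachDB's actual EncDatum API:

```go
package main

import "fmt"

// encDatumOverhead is a stand-in for the fixed per-datum overhead
// (EncDatumOverhead in the real code); the value here is illustrative.
const encDatumOverhead = 24

type encDatum struct {
	val []byte
}

// Size returns the fixed overhead plus the size of any set value.
func (d encDatum) Size() int { return encDatumOverhead + len(d.val) }

func main() {
	// Before copyRow runs, every datum in rowCopy is unset, so each
	// Size() is just the fixed overhead and the two styles agree.
	rowCopy := make([]encDatum, 3)
	perElem := 0
	for _, d := range rowCopy {
		perElem += d.Size()
	}
	fmt.Println(len(rowCopy)*rowCopy[0].Size() == perElem) // true

	// After the copy, datums can differ in size and the shorthand
	// len(rowCopy)*rowCopy[0].Size() is no longer the real total,
	// which is why summing per element is the clearer accounting.
	rowCopy[1].val = make([]byte, 100)
	perElem = 0
	for _, d := range rowCopy {
		perElem += d.Size()
	}
	fmt.Println(len(rowCopy)*rowCopy[0].Size() == perElem) // false
}
```

This is only meant to show why the two expressions coincide before the copy and diverge after it.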
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/stats/row_sampling.go, line 134 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
I believe at this point rowCopy[0].Size() will only be EncDatumOverhead because we have not yet set the values to anything, so this should be equivalent to the sum of Size() of each element of rowCopy. But this line confused me, too. I will change it to use rowCopy.Size().
Oh yeah, indeed, we haven't performed a copy yet, nvm then.
Very cool! Needs some tests, but definitely seems like it's on the right track.
Reviewed 6 of 6 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 7 of 7 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
pkg/sql/rowexec/sample_aggregator.go, line 287 at r1 (raw file):
// Inverted sample row.
// Retain the rows with the top ranks.
if capacity, err := row[s.numRowsCol].GetInt(); err == nil {
This could use an update to the comment explaining what's happening.
pkg/sql/rowexec/sample_aggregator.go, line 309 at r1 (raw file):
// Sample row.
// Retain the rows with the top ranks.
if capacity, err := row[s.numRowsCol].GetInt(); err == nil {
ditto
pkg/sql/rowexec/sampler_test.go, line 102 at r1 (raw file):
}
for i := 2; i < len(outTypes); i++ {
	if i != 3 /* num rows col */ && !row[i].IsNull() {
nit: I'd just make a const var called numRowsCol. While you're at it, it would be awesome if you could improve this test by making const vars for the other columns too.
pkg/sql/stats/row_sampling.go, line 82 at r1 (raw file):
}

func (sr *SampleReservoir) Cap() int {
nit: needs a comment
pkg/sql/stats/row_sampling.go, line 117 at r1 (raw file):
samp := heap.Pop(sr).(SampledRow)
if sr.memAcc != nil {
	sr.memAcc.Shrink(ctx, int64(samp.Row.Size()))
I'm not sure we can actually call shrink here without allocating a completely new array. Even though samples no longer references these rows, I don't think they can be garbage collected until all references to the underlying array are gone, at which point the entire array will be GC'd.
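For illustration, a minimal standalone Go sketch of the point being made here: reslicing keeps the old backing array alive, while copying into a freshly allocated slice lets the garbage collector reclaim the dropped rows. sampledRow and shrinkSamples are hypothetical names, not the PR's actual code:

```go
package main

import "fmt"

type sampledRow struct {
	rank uint64
	data []byte
}

// shrinkSamples truncates samples to k elements. Simply reslicing
// (samples[:k]) would keep the old backing array reachable, so the
// dropped rows' data could not be garbage collected until the whole
// array is unreferenced. Copying into a freshly allocated slice of
// exactly k elements releases the old array (and its rows) to the GC
// once no one else references it.
func shrinkSamples(samples []sampledRow, k int) []sampledRow {
	if k >= len(samples) {
		return samples
	}
	shrunk := make([]sampledRow, k)
	copy(shrunk, samples[:k])
	return shrunk
}

func main() {
	s := []sampledRow{{rank: 1}, {rank: 2}, {rank: 3}, {rank: 4}}
	s = shrinkSamples(s, 2)
	fmt.Println(len(s), cap(s)) // 2 2
}
```

The cap of the result equals k, confirming a new, smaller backing array was allocated.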
pkg/sql/stats/row_sampling.go, line 124 at r3 (raw file):
}

func (sr *SampleReservoir) trySampleRow(
Would be good to add a function comment here
pkg/sql/stats/row_sampling.go, line 151 at r3 (raw file):
if len(sr.samples) > 0 && rank < sr.samples[0].Rank {
	if err := sr.copyRow(ctx, evalCtx, sr.samples[0].Row, row); err != nil {
		// XXX At this point sr.samples[0].Row could have a mix of old and new
What's the XXX for?
pkg/sql/stats/row_sampling.go, line 176 at r3 (raw file):
}
// We've used too much memory. Remove the highest-rank sample and try again.
sr.Resize(ctx, len(sr.samples)-1)
I think we probably want to be more aggressive here -- otherwise we'll keep hitting the memory limit. Instead of just removing one sample at a time, maybe we should cut the number of samples in half (or reduce by some other fraction of the total). (This will also amortize the cost of creating a new array on each Resize, which as I mention above, is something I think you need to do to make this work correctly).
pkg/sql/stats/row_sampling.go, line 80 at r4 (raw file):
}
if minNumSamples < 1 {
	return errors.AssertionFailedf("minNumSamples (%d) < 1", numSamples)
numSamples -> minNumSamples
pkg/sql/stats/row_sampling.go, line 191 at r4 (raw file):
err := sr.trySampleRow(ctx, evalCtx, row, rank)
if err == nil || !sqlerrors.IsOutOfMemoryError(err) || len(sr.samples) <= 1 || cap(sr.samples) <= sr.minNumSamples {
Do you still need to check if len(sr.samples) <= 1? Isn't that handled by minNumSamples?
I think you also need to run
Well, still no test but I've:
- addressed the comments (thank you)
- added some logging
- refactored the retry in SampleReservoir
- fixed a spot I missed in sampleAggregator.generateHistogram
Test is coming, I swear, but pushing this to see what TeamCity thinks.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)
pkg/sql/rowexec/sample_aggregator.go, line 287 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This could use an update to the comment explaining what's happening.
OK, how's this?
pkg/sql/rowexec/sample_aggregator.go, line 309 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto
OK.
pkg/sql/rowexec/sampler.go, line 115 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: this logic is duplicated in two places. Might be worth extracting into a helper.
OK, I moved it into SampleReservoir.Init.
pkg/sql/rowexec/sampler_test.go, line 102 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: I'd just make a const var called numRowsCol. While you're at it, it would be awesome if you could improve this test by making const vars for the other columns too.
OK, this should be a little better now.
pkg/sql/stats/row_sampling.go, line 117 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'm not sure we can actually call shrink here without allocating a completely new array. Even though samples no longer references these rows, I don't think they can be garbage collected until all references to the underlying array are gone, at which point the entire array will be GC'd.
Oh crap, good catch! Done.
pkg/sql/stats/row_sampling.go, line 134 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Oh yeah, indeed, we haven't performed a copy yet, nvm then.
OK, this is now using rowCopy.Size().
pkg/sql/stats/row_sampling.go, line 124 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Would be good to add a function comment here
Done.
pkg/sql/stats/row_sampling.go, line 151 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
What's the XXX for?
This was a type of comment we used at a previous company. Changed it to WARNING.
pkg/sql/stats/row_sampling.go, line 176 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think we probably want to be more aggressive here -- otherwise we'll keep hitting the memory limit. Instead of just removing one sample at a time, maybe we should cut the number of samples in half (or reduce by some other fraction of the total). (This will also amortize the cost of creating a new array on each Resize, which as I mention above, is something I think you need to do to make this work correctly).
Yes, you're right. Done.
pkg/sql/stats/row_sampling.go, line 80 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
numSamples -> minNumSamples
Ah! Good catch! But I ended up changing it anyway. 😄
pkg/sql/stats/row_sampling.go, line 191 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Do you still need to check if len(sr.samples) <= 1? Isn't that handled by minNumSamples?
It is in almost every case, but in the (extremely unlikely) case where we hit a memory limit with only 0 or 1 row sampled the len check would matter. (The memory limit would have to be very very low for this to happen.)
Latest force push:
- New commit adding memory limit testing to row_sampling_test.go.
- This uncovered bugs in SampleReservoir.Pop and SampleReservoir.MaybeResize which were fixed.
- Also fixed some small compilation errors and a few comments.

I believe there will be one more commit and force push to add some testing around samplerProcessor and sampleAggregator, so you don't have to look yet (though you're welcome to).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)
Reviewed 4 of 8 files at r5, 2 of 2 files at r6, 5 of 8 files at r9, 1 of 2 files at r10, 3 of 3 files at r11.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/rowexec/sample_aggregator.go, line 287 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
OK, how's this?
Looks good
pkg/sql/rowexec/sample_aggregator.go, line 379 at r6 (raw file):
if sr.MaybeResize(ctx, int(capacity)) {
	log.Infof(
		ctx, "histogram samples reduced from %d to %d to match sample processor",
nit: sample -> sampler
pkg/sql/rowexec/sample_aggregator.go, line 401 at r11 (raw file):
} else if sr.Cap() != prevCapacity {
	log.Infof(
		ctx, "histogram samples reduced from %d to %d due to excessive memory utilization",
are we logging this same message multiple times? or do these all cover different cases?
pkg/sql/stats/row_sampling.go, line 60 at r11 (raw file):
// minNumSamples is the minimum K needed for sampling to be meaningful. If the
// reservoir capacity would fall below this, SampleRow will err instead of
// decreasing K further.
nit: Here and below throughout this file you reference "K", but there's no variable named K, and I think it just adds confusion. I'd just say "capacity" (if that's the intended meaning) instead of "K".
pkg/sql/stats/row_sampling_test.go, line 45 at r11 (raw file):
} else if sr.Cap() != prevCapacity {
	t.Logf(
		"samples reduced from %d to %d during SampleRow",
Instead of just logging, maybe assert that the number of samples was reduced by an expected amount?
Latest push:
- rebases on master tip
- adds some testing for samplerProcessor and sampleAggregator
- fixes a bug in SampleReservoir.retryMaybeResize that this testing caught
- no longer silently drops errors returned by SampleReservoir.Init
- tweaks some comments and commit messages slightly
I think this is RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)
pkg/sql/rowexec/sample_aggregator.go, line 401 at r11 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
are we logging this same message multiple times? or do these all cover different cases?
These cover two different cases:
- A child samplerProcessor ran out of memory, and this sampleAggregator has to shrink to match the capacity of that child.
- This sampleAggregator itself ran out of memory.
pkg/sql/stats/row_sampling.go, line 60 at r11 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: Here and below throughout this file you reference "K", but there's no variable named K, and I think it just adds confusion. I'd just say "capacity" (if that's the intended meaning) instead of "K".
"K" comes from the big comment above the definition of SampleReservoir. I've changed some of the comments so that they say "reservoir capacity (K)" instead of just "K", is that better?
pkg/sql/stats/row_sampling_test.go, line 45 at r11 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Instead of just logging, maybe assert that the number of samples was reduced by an expected amount?
It's a good point, but I'm worried that specifying an exact amount would cause the test to flake if memory usage changed slightly and the SampleReservoir shrunk more or less than it did before.
Sigh, sorry for one more push. As it turns out, SampleReservoir needs to support a capacity of 0 (samplerProcessor and sampleAggregator use this when histograms are disabled). So just a slight change to SampleReservoir.Init and SampleReservoir.retryMaybeResize is needed. This should make the tests happy.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)
This is getting really close!
It seems like there are a lot of commits -- can you squash any of them? Also make sure that each commit builds/passes tests individually.
Correct me if I'm wrong, but I don't see any tests that specifically verify the goal of this PR to degrade gracefully with tightening memory conditions. So instead of just verifying that we do or don't run out of memory in a certain test case, it would be good to check that in cases where we would have run out of memory previously, now we succeed, but the number of samples is reduced to an expected number (both in the sampler and the sampleAggregator). You can use make stress to make sure that you're not introducing flakes.
Reviewed 9 of 9 files at r12, 1 of 1 files at r13, 2 of 4 files at r14, 5 of 8 files at r15, 1 of 2 files at r16, 3 of 3 files at r17, 1 of 1 files at r18, 1 of 1 files at r20, 4 of 8 files at r24, 1 of 4 files at r25, 1 of 2 files at r26, 3 of 3 files at r27, 1 of 1 files at r28, 1 of 1 files at r30.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/rowexec/sample_aggregator_test.go, line 305 at r20 (raw file):
)
inputRows = [][]int{
would help to add a comment to explain what the following cases are testing
pkg/sql/rowexec/sample_aggregator_test.go, line 348 at r20 (raw file):
runSampleAggregator(
	t, server, sqlDB, kvDB, st, &evalCtx, 1<<15 /* memLimitBytes */, false, /* expectOutOfMemory */
	2 /* numSamples */, 4 /* aggNumSamples */, inputRows, expected,
are these supposed to be different?
pkg/sql/rowexec/sample_aggregator_test.go, line 152 at r30 (raw file):
spec := &execinfrapb.SampleAggregatorSpec{
	SampleSize:    aggNumSamples,
	MinSampleSize: 2,
you could pass this as a param too
pkg/sql/rowexec/sampler_test.go, line 208 at r20 (raw file):
runSampler(t, numRows, numSamples, numSamples/2, memLimitBytes, true /* expectOutOfMemory */)
runSampler(t, numRows, numSamples, numSamples/3, memLimitBytes, true /* expectOutOfMemory */)
runSampler(t, numRows, numSamples, numSamples/4, memLimitBytes, false /* expectOutOfMemory */)
what are these tests showing?
pkg/sql/stats/row_sampling.go, line 60 at r11 (raw file):
Previously, michae2 (Michael Erickson) wrote…
"K" comes from the big comment above the definition of SampleReservoir. I've changed some of the comments so that they say "reservoir capacity (K)" instead of just "K", is that better?
ah got it - thanks!
pkg/sql/stats/row_sampling.go, line 131 at r30 (raw file):
// introducing bias. Increasing capacity would cause later rows to be
// over-represented relative to earlier rows.)
func (sr *SampleReservoir) MaybeResize(ctx context.Context, k int) bool {
mention in the comment what the return param means (or just remove the return param -- is it even used?)
pkg/sql/stats/row_sampling.go, line 152 at r30 (raw file):
// retryMaybeResize tries to execute a memory-allocating operation, shrinking
// the capacity of the reservoir (K) as necessary until the operation succeeds
// or fails with a non-out-of-memory error.
nit: this implies it would never fail with an out of memory error, but that doesn't seem to be the case
pkg/sql/stats/row_sampling_test.go, line 45 at r11 (raw file):
Previously, michae2 (Michael Erickson) wrote…
It's a good point, but I'm worried that specifying an exact amount would cause the test to flake if memory usage changed slightly and the
SampleReservoir
shrunk more or less than it did before.
Maybe you could assert that it's within a range? I think it would be better to deal with flakes and tweak the test as needed than to have it silently fail. You can check for possible flakes by running make stress
on the test. If it runs a large number of times without failing it should be good to go.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rytaft, and @yuzefovich)
pkg/sql/rowexec/sample_aggregator_test.go, line 348 at r20 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
are these supposed to be different?
Here we're varying the memory limit, and below we're varying numSamples and aggNumSamples. But it's not really clear. Let me clean this up and add some comments.
pkg/sql/rowexec/sampler_test.go, line 208 at r20 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
what are these tests showing?
This is the closest to the testing you describe (though not exactly the same, and not as clear): it holds the memory limit fixed while decreasing minNumSamples, the required number of samples, until finding the point at which we can stay within the memory limit. minNumSamples = numSamples is equivalent to the old behavior (fail if we do not have enough memory for the requested capacity). From there we decrease the required number of samples until we're inside the memory limit.

But I think what you described (holding number of samples fixed while varying the memory limit) is a better test, so I will add that.
pkg/sql/stats/row_sampling_test.go, line 45 at r11 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Maybe you could assert that it's within a range? I think it would be better to deal with flakes and tweak the test as needed than to have it silently fail. You can check for possible flakes by running make stress on the test. If it runs a large number of times without failing it should be good to go.
Good points, you are right. I will do that.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rytaft, and @yuzefovich)
pkg/sql/rowexec/sampler_test.go, line 208 at r20 (raw file):
Previously, michae2 (Michael Erickson) wrote…
This is the closest to the testing you describe (though not exactly the same, and not as clear): it holds the memory limit fixed while decreasing minNumSamples, the required number of samples, until finding the point at which we can stay within the memory limit. minNumSamples = numSamples is equivalent to the old behavior (fail if we do not have enough memory for the requested capacity). From there we decrease the required number of samples until we're inside the memory limit. But I think what you described (holding number of samples fixed while varying the memory limit) is a better test, so I will add that.
This seems like a good test too, maybe just add a comment to explain
Ok, latest push squashes this all into three commits, and enhances the testing a bit, hopefully to good effect. 🙂
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)
pkg/sql/rowexec/sample_aggregator_test.go, line 305 at r20 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
would help to add a comment to explain what the following cases are testing
Done (down in the table of test cases).
pkg/sql/rowexec/sample_aggregator_test.go, line 152 at r30 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you could pass this as a param too
Done.
pkg/sql/rowexec/sampler_test.go, line 208 at r20 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This seems like a good test too, maybe just add a comment to explain
Done.
pkg/sql/stats/row_sampling.go, line 131 at r30 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
mention in the comment what the return param means (or just remove the return param -- is it even used?)
Done. (It is only used to simplify some logging in sample_aggregator.go.)
pkg/sql/stats/row_sampling.go, line 152 at r30 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: this implies it would never fail with an out of memory error, but that doesn't seem to be the case
You're right, I changed the comment a bit.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)
pkg/sql/rowexec/sample_aggregator_test.go, line 348 at r20 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Here we're varying the memory limit, and below we're varying numSamples and aggNumSamples. But it's not really clear. Let me clean this up and add some comments.
(Done.)
pkg/sql/stats/row_sampling_test.go, line 45 at r11 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Good points, you are right. I will do that.
I ended up checking it exactly, which does seem fine.
nit: I wouldn't call this a bug fix (in your release note). "performance improvement" seems more accurate.
Otherwise, nice work getting this over the finish line!
Let's let this bake on master for a few weeks before backporting to 21.1 and maybe 20.2.
Reviewed 11 of 11 files at r31, 10 of 10 files at r32, 4 of 4 files at r33.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
To avoid bias, sampleAggregator must always use a capacity <= the capacity of each child samplerProcessor feeding it. This was always implicitly true before, as every samplerProcessor and sampleAggregator used the same fixed capacity. But the next few commits will give samplerProcessors (and sampleAggregators) the ability to dynamically shrink capacity when out of memory, meaning we now sometimes need to resize sampleAggregator to keep this invariant true. (Resizing sampleAggregator really means resizing the underlying SampleReservoir, so most of the changes are there.)

This commit does not yet change any behavior, because the capacities of samplerProcessors are still static. Next few commits will change that.

Also factor the giant closure out of TestSampleAggregator into an external function, to make table-driven testing easier.

Release note: None
Fixes: cockroachdb#62206

Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. Fewer samples will result in a less accurate histogram, but less accurate is probably still better than no histogram at all, up to a point. If the number of samples falls below a minimum threshold, then give up and disable histogram collection, as we were doing originally, rather than using a wildly inaccurate histogram.

Also fix some memory accounting in SampleReservoir.copyRow.

Release note (performance improvement): continue to generate histograms when table statistics collection reaches memory limits, instead of disabling histogram generation.
SampleAggregator can also run out of memory when gathering the datums for histogram generation. Push this datum gathering down into SampleReservoir so that we can use the same dynamic capacity-shrinking strategy as used in SampleRow to reclaim some memory and try again.

Release note: None
TFTR! bors r+
Build succeeded.
sql: dynamically shrink sampleAggregator capacity as needed
To avoid bias, sampleAggregator must always use a capacity <= the
capacity of each samplerProcessor feeding it. This was always implicitly
true before, as every samplerProcessor and sampleAggregator used the
same fixed capacity. But the next few commits will give samplerProcessors
(and sampleAggregators) the ability to dynamically decrease capacity
when out of memory, meaning we now sometimes need to resize
sampleAggregator to keep this invariant true.
This commit does not yet change any behavior because the capacities of
samplerProcessors are still static. Next few commits will change that.
Release note: None
sql: fix memory accounting in SampleReservoir.copyRow
When overwriting the highest-rank sample with a new one, we might have
smaller datums than before, so we should account for memory use
shrinking as well as growing. (Of course, those datums are in a
datumAlloc, so we don't know exactly when they will be GC'd... but this
is the best accounting we can do without hooking into the garbage
collector.)
Release note: None
sql: shrink SampleReservoir capacity on memory exhaustion
Fixes: #62206
Instead of returning an error when out of memory, make
SampleReservoir.SampleRow dynamically reduce capacity and retry. This
will result in a lower-resolution histogram, but lower-resolution is
better than no histogram at all!
Release note (bug fix): fix histogram generation when table statistics
collection reaches memory limits.
sql: add minimum SampleReservoir capacity
Below a certain number of samples, the histograms would be so imprecise
that they are not worth generating. Add the ability to set this
threshold in SampleReservoir. This commit sets this threshold to the
max number of histogram buckets, which was picked because it seemed
to make sense and not for any mathematical or empirical reason.
Release note: None