release-20.2: sql: shrink SampleReservoir capacity on memory exhaustion #67059
Backport 3/3 commits from #65491.
/cc @cockroachdb/release
Backport note: the execinfrapb changes were removed from the second commit so that it can be backported without incrementing execinfrapb.Version. As a result, MinSampleSize is always a hardcoded 200 instead of being picked based on MaxHistogramBuckets. For most stats this behavior is identical, because MaxHistogramBuckets is usually 200. For stats on non-key columns MaxHistogramBuckets is 2, so collecting those stats will be slightly stricter about sample size than necessary. But it seems unlikely that we will hit the 64 MiB memory limit with fewer than 200 samples anyway.
sql, stats: shrink sampleAggregator capacity to match child capacity
To avoid bias, sampleAggregator must always use a capacity <= the
capacity of each child samplerProcessor feeding it. This was always
implicitly true before, as every samplerProcessor and sampleAggregator
used the same fixed capacity. But the next few commits will give
samplerProcessors (and sampleAggregators) the ability to dynamically
shrink capacity when out of memory, meaning we now sometimes need to
resize sampleAggregator to keep this invariant true.
(Resizing sampleAggregator really means resizing the underlying
SampleReservoir, so most of the changes are there.)
This commit does not yet change any behavior, because the capacities of
samplerProcessors are still static. The next few commits will change that.
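
The capacity invariant can be illustrated with a minimal, self-contained sketch. This is not the actual SampleReservoir code: `sampledRow`, `rankHeap`, `reservoir`, `offer`, `resize`, and `mergeChild` are hypothetical names, and the sketch assumes a rank-based reservoir that keeps the rows with the smallest random ranks.

```go
package main

import "container/heap"

// sampledRow is a hypothetical stand-in for a sampled row. Rank is the random
// priority assigned when the row was first sampled.
type sampledRow struct {
	Rank uint64
	Val  int
}

// rankHeap is a max-heap on Rank, so the row with the largest rank sits at
// index 0 and is the first to be evicted.
type rankHeap []sampledRow

func (h rankHeap) Len() int            { return len(h) }
func (h rankHeap) Less(i, j int) bool  { return h[i].Rank > h[j].Rank }
func (h rankHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *rankHeap) Push(x interface{}) { *h = append(*h, x.(sampledRow)) }
func (h *rankHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// reservoir keeps the `capacity` rows with the smallest ranks seen so far.
type reservoir struct {
	capacity int
	rows     rankHeap
}

func newReservoir(capacity int) *reservoir { return &reservoir{capacity: capacity} }

// offer keeps the row only if it is among the `capacity` smallest ranks.
func (r *reservoir) offer(row sampledRow) {
	if len(r.rows) < r.capacity {
		heap.Push(&r.rows, row)
		return
	}
	if r.capacity > 0 && row.Rank < r.rows[0].Rank {
		r.rows[0] = row
		heap.Fix(&r.rows, 0)
	}
}

// resize lowers the capacity to k and evicts the highest-ranked rows until
// the reservoir fits. It never grows the capacity.
func (r *reservoir) resize(k int) {
	if k < 0 {
		k = 0
	}
	if k >= r.capacity {
		return
	}
	r.capacity = k
	for len(r.rows) > k {
		heap.Pop(&r.rows)
	}
}

// mergeChild shrinks the aggregator to the child's capacity before consuming
// the child's samples, so the aggregator's capacity never exceeds that of any
// child that has fed it.
func mergeChild(agg, child *reservoir) {
	agg.resize(child.capacity)
	for _, row := range child.rows {
		agg.offer(row)
	}
}
```

In this sketch, merging a child that has shrunk to capacity 2 into an aggregator of capacity 4 leaves the aggregator holding only the 2 smallest ranks overall. Without the resize, the aggregator could keep rows from full-capacity children that the shrunken child would have been forced to discard, which skews the sample toward those children.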
Also factor the giant closure out of TestSampleAggregator into an
external function, to make table-driven testing easier.
Release note: None
sql, stats: shrink SampleReservoir capacity on memory exhaustion
Fixes: #62206
Instead of returning an error when out of memory, make
SampleReservoir.SampleRow dynamically reduce capacity and retry. Fewer
samples will result in a less accurate histogram, but less accurate is
probably still better than no histogram at all, up to a point.
If the number of samples falls below a minimum threshold, then give up
and disable histogram collection, as we were doing originally, rather
than using a wildly inaccurate histogram.
Also fix some memory accounting in SampleReservoir.copyRow.
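
A rough sketch of the shrink-and-retry behavior described above, under stated assumptions: `memAccount`, `row`, `reservoir`, `tryAdd`, and the two error values are hypothetical, memory is tracked as a simple byte counter, and halving the sample count on each retry is a choice made for this sketch rather than a claim about the real implementation.

```go
package main

import "errors"

var (
	errOutOfMemory   = errors.New("memory budget exceeded")
	errTooFewSamples = errors.New("too few samples; histogram collection disabled")
)

// memAccount is a hypothetical byte-counting memory account.
type memAccount struct{ used, limit int64 }

func (a *memAccount) grow(delta int64) error {
	if a.used+delta > a.limit {
		return errOutOfMemory
	}
	a.used += delta
	return nil
}

type row struct {
	rank uint64 // random priority; the smallest ranks are kept
	size int64  // bytes charged to the memory account
}

type reservoir struct {
	capacity, minSize int
	rows              []row
	acc               *memAccount
}

// maxRankIndex returns the index of the highest-ranked row, or -1 if empty.
func (r *reservoir) maxRankIndex() int {
	idx := -1
	for i := range r.rows {
		if idx < 0 || r.rows[i].rank > r.rows[idx].rank {
			idx = i
		}
	}
	return idx
}

// tryAdd makes one attempt to sample the row at the current capacity.
func (r *reservoir) tryAdd(in row) error {
	if len(r.rows) < r.capacity {
		if err := r.acc.grow(in.size); err != nil {
			return err
		}
		r.rows = append(r.rows, in)
		return nil
	}
	i := r.maxRankIndex()
	if i < 0 || in.rank >= r.rows[i].rank {
		return nil // rank too high: the row is not sampled
	}
	// Replacing a row: charge only the size difference.
	if err := r.acc.grow(in.size - r.rows[i].size); err != nil {
		return err
	}
	r.rows[i] = in
	return nil
}

// resize sets the capacity to k, evicting the highest-ranked rows and
// releasing their memory.
func (r *reservoir) resize(k int) {
	r.capacity = k
	for len(r.rows) > k {
		i := r.maxRankIndex()
		r.acc.used -= r.rows[i].size
		r.rows[i] = r.rows[len(r.rows)-1]
		r.rows = r.rows[:len(r.rows)-1]
	}
}

// sampleRow retries tryAdd, halving the number of samples after each
// out-of-memory error, and gives up once the result would fall below minSize.
func (r *reservoir) sampleRow(in row) error {
	for {
		err := r.tryAdd(in)
		if !errors.Is(err, errOutOfMemory) {
			return err
		}
		newCap := len(r.rows) / 2
		if newCap < r.minSize {
			return errTooFewSamples
		}
		r.resize(newCap)
	}
}
```

With a 100-byte limit, 10-byte rows, and a minimum of 2 samples, the sketch shrinks from 10 samples to 5 on the eleventh row and keeps sampling. With a minimum of 8 it returns `errTooFewSamples` instead, mirroring the fallback of disabling histogram collection.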
Release note (performance improvement): continue to generate histograms
when table statistics collection reaches memory limits, instead of
disabling histogram generation.
sql, stats: shrink sampleAggregator capacity before histogram generation
sampleAggregator can also run out of memory when gathering the datums
for histogram generation. Push this datum gathering down into
SampleReservoir so that we can use the same dynamic capacity-shrinking
strategy as used in SampleRow to reclaim some memory and try again.
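
One way to picture the push-down, as a sketch only: if datum gathering is a method on the reservoir, it can be wrapped in the same retry loop used for sampling. The names `retryShrinking` and `getValues`, the integer samples, and the `budget` parameter (a cap on how many values may be copied) are all assumptions made for illustration.

```go
package main

import "errors"

var errOutOfMemory = errors.New("memory budget exceeded")

// reservoir is a simplified stand-in: samples are plain ints, assumed to be
// ordered by ascending rank so that truncation drops the highest ranks.
type reservoir struct {
	samples    []int
	minSamples int
}

// retryShrinking runs op and, on an out-of-memory error, halves the samples
// and tries again. It returns the error unchanged once halving would drop
// below minSamples or there is nothing left to drop.
func (r *reservoir) retryShrinking(op func() error) error {
	for {
		err := op()
		if err == nil || !errors.Is(err, errOutOfMemory) {
			return err
		}
		half := len(r.samples) / 2
		if len(r.samples) == 0 || half < r.minSamples {
			return err
		}
		r.samples = r.samples[:half]
	}
}

// getValues copies the sampled values out for histogram construction. The
// copy is the step that can exceed the budget, so it runs inside the retry
// loop and sees the reduced sample set on each attempt.
func (r *reservoir) getValues(budget int) ([]int, error) {
	var out []int
	err := r.retryShrinking(func() error {
		if len(r.samples) > budget {
			return errOutOfMemory
		}
		out = append([]int(nil), r.samples...)
		return nil
	})
	return out, err
}
```

Because the gathering step lives next to the samples, the caller only sees either a smaller set of values or the original error, and does not need its own resizing logic.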
Release note: None