-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: table has no histogram statistics created #62206
Comments
Hi @mikeczabator, please add a C-ategory label to your issue. Check out the label system docs. While you're here, please consider adding an A- label to help keep our repository tidy. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
I'm closing this issue since we're working on it internally. We can reopen later if it makes sense to create a public issue. |
Reopening now that we know the issue was due to "disabling histogram collection due to excessive memory utilization". We should:
|
Fixes: cockroachdb#62206 Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. This will result in a lower-resolution histogram, but lower-resolution is better than no histogram at all! Release note (bug fix): fix histogram generation when table statistics collection reaches memory limits.
Fixes: cockroachdb#62206 Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. This will result in a lower-resolution histogram, but lower-resolution is better than no histogram at all! Release note (bug fix): fix histogram generation when table statistics collection reaches memory limits.
Fixes: cockroachdb#62206 Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. This will result in a less accurate histogram, but less accurate is probably still better than no histogram at all. Release note (bug fix): continue to generate histograms when table statistics collection reaches memory limits, instead of disabling histogram generation.
Fixes: cockroachdb#62206 Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. This will result in a less accurate histogram, but less accurate is probably still better than no histogram at all. Release note (bug fix): continue to generate histograms when table statistics collection reaches memory limits, instead of disabling histogram generation.
65491: sql: shrink SampleReservoir capacity on memory exhaustion r=michae2 a=michae2 **sql: dynamically shrink sampleAggregator capacity as needed** To avoid bias, sampleAggregator must always use a capacity <= the capacity of each samplerProcessor feeding it. This was always implicitly true before, as every samplerProcessor and sampleAggregator used the same fixed capacity. But the next few commits will give sampleProcessors (and sampleAggregators) the ability to dynamically decrease capacity when out of memory, meaning we now sometimes need to resize sampleAggregator to keep this invariant true. This commit does not yet change any behavior because the capacities of samplerProcessors are still static. Next few commits will change that. Release note: None --- **sql: fix memory accounting in SampleReservoir.copyRow** When overwriting the highest-rank sample with a new one, we might have *smaller* datums than before, so we should account for memory use shrinking as well as growing. (Of course, those datums are in a datumAlloc, so we don't know exactly when they will be GC'd... but this is the best accounting we can do without hooking into the garbage collector.) Release note: None --- **sql: shrink SampleReservoir capacity on memory exhaustion** Fixes: #62206 Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. This will result in a lower-resolution histogram, but lower-resolution is better than no histogram at all! Release note (bug fix): fix histogram generation when table statistics collection reaches memory limits. --- **sql: add minimum SampleReservoir capacity** Below a certain number of samples, the histograms would be so imprecise that they are not worth generating. Add the ability to set this threshold in SampleReservoir. This commit sets this threshold to the max number of histogram buckets, which was picked because it seemed to make sense and not for any mathematical or empirical reason. Release note: None Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Fixes: cockroachdb#62206 Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. Fewer samples will result in a less accurate histogram, but less accurate is probably still better than no histogram at all, up to a point. If the number of samples falls below a minimum threshold, then give up and disable histogram collection, as we were doing originally, rather than using a wildly inaccurate histogram. Also fix some memory accounting in SampleReservoir.copyRow. Release note (performance improvement): continue to generate histograms when table statistics collection reaches memory limits, instead of disabling histogram generation.
Fixes: cockroachdb#62206 Instead of returning an error when out of memory, make SampleReservoir.SampleRow dynamically reduce capacity and retry. Fewer samples will result in a less accurate histogram, but less accurate is probably still better than no histogram at all, up to a point. If the number of samples falls below a minimum threshold, then give up and disable histogram collection, as we were doing originally, rather than using a wildly inaccurate histogram. Also fix some memory accounting in SampleReservoir.copyRow. Release note (performance improvement): continue to generate histograms when table statistics collection reaches memory limits, instead of disabling histogram generation.
CRDB 20.2.5.
Table has no histogram statistics created for any column.
Table has ~100 columns, 5.4TB and an estimated 1,321,192,796 rows. This is the largest in the db.
Another large table with 2.1TB and 2,067,670,141 rows is showing histograms. I have not reviewed the remaining 40+ tables in this db, but I can if needed.
More details can be found in linked ticket and slack conversation.
gz#7914
The text was updated successfully, but these errors were encountered: