-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: allow number of histogram buckets collected for indexed columns to be configured #72418
Labels
A-sql-table-stats
Table statistics (and their automatic refresh).
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Comments
mgartner
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Nov 3, 2021
This depends on a system for making stats configuration per-table. |
michae2
added
the
A-sql-table-stats
Table statistics (and their automatic refresh).
label
Jan 3, 2023
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Mar 31, 2023
Informs cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new cluster settings that enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection: sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Apr 3, 2023
Informs cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new table storage parameters, sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count. These parameters can be used to override the cluster settings sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count at the table level. These settings enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Apr 3, 2023
Informs cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new table storage parameters, sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count. These parameters can be used to override the cluster settings sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count at the table level. These settings enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Apr 3, 2023
Informs cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new cluster settings that enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection: sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Apr 3, 2023
Fixes cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new table storage parameters, sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count. These parameters can be used to override the cluster settings sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count at the table level. These settings enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Apr 3, 2023
Informs cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new cluster settings that enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection: sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
craig bot
pushed a commit
that referenced
this issue
Apr 3, 2023
99858: screl: Add IndexID as a attr of UniqueWithoutIndex element r=Xiang-Gu a=Xiang-Gu Previously, ALTER TABLE stmt where we add column/drop column/alter PK and adding a unique without index is problematic in that the it can succeed even when there are duplicate values. We already had a dep rule that enforces the new primary index to be backfilled before we validate the constraint against it. Unfortunately, this rule is not enforced on unique without index constraint because IndexID was not a attr of it. This commit fixes this. Fixes #99281 Epic: None Release note (bug fix): Fixed a bug in v23.1 in the declarative schema changer where unique without index can be incorrectly added in tables with duplicate values if it's with a ADD/DROP COLUMN in one ALTER TABLE statement. 100357: sql: allow changing the number of histogram samples and buckets r=rytaft a=rytaft Informs #72418 Informs #97701 Release note (sql change): Added two new cluster settings that enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection: `sql.stats.histogram_samples.count` and `sql.stats.histogram_buckets.count`. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values. 100489: go.mod: bump Pebble to b84a7ec7d8dc r=RaduBerinde a=jbowens ``` b84a7ec7 db: populate return statistics for flushable ingests 5fd58365 objstorage: implement tracing 7f7451f2 db,record: add BatchCommitStats to measure total and component durations for commit 295aaab0 objstorage: implement basic refcounting ``` Epic: None Release note: None 100516: multiregionccl: reenable TestMrSystemDatabase r=ajwerner a=ajwerner I stressed this for a long time on many cores and it did not fail. Epic: none Fixes: #98039 Release note: None 100527: roachtest: skip multitenant/distsql for now r=yuzefovich a=yuzefovich Informs: #100260. Epic: None Release note: None Co-authored-by: Xiang Gu <xiang@cockroachlabs.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
rytaft
added a commit
that referenced
this issue
Apr 4, 2023
Informs #72418 Informs #97701 Release note (sql change): Added two new cluster settings that enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection: sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
craig bot
pushed a commit
that referenced
this issue
Apr 4, 2023
100479: sql: add table storage params to set histogram samples and buckets count r=rytaft a=rytaft Fixes #72418 Informs #97701 Release note (sql change): Added two new table storage parameters, `sql_stats_histogram_buckets_count` and `sql_stats_histogram_samples_count`. These parameters can be used to override the cluster settings `sql.stats.histogram_buckets.count` and `sql.stats.histogram_samples.count` at the table level. These settings enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values. Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Apr 4, 2023
Informs cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new cluster settings that enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection: sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Apr 4, 2023
Fixes cockroachdb#72418 Informs cockroachdb#97701 Release note (sql change): Added two new table storage parameters, sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count. These parameters can be used to override the cluster settings sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count at the table level. These settings enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-sql-table-stats
Table statistics (and their automatic refresh).
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
The issue described in #71828 could be mitigated by collecting more than 200 histogram buckets. By collecting more histogram buckets, we are more likely to have an even distribution of values within the histogram bucket range. We could add a setting (ideally per-table or per-column) for configuring the number of histogram buckets collected. However, this would likely be a temporary mitigation for #71828 because even with more histogram buckets, it'd still be possible that histogram upper bound values are not heavy-hitters and the range of values within buckets is no evenly distributed.
Jira issue: CRDB-11140
The text was updated successfully, but these errors were encountered: