Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: allow number of histogram buckets collected for indexed columns to be configured #72418

Closed
mgartner opened this issue Nov 3, 2021 · 1 comment · Fixed by #100479
Closed
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Nov 3, 2021

The issue described in #71828 could be mitigated by collecting more than 200 histogram buckets. By collecting more histogram buckets, we are more likely to have an even distribution of values within the histogram bucket range. We could add a setting (ideally per-table or per-column) for configuring the number of histogram buckets collected. However, this would likely be a temporary mitigation for #71828 because even with more histogram buckets, it'd still be possible that histogram upper bound values are not heavy-hitters and the range of values within buckets is no evenly distributed.

Jira issue: CRDB-11140

@mgartner mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Nov 3, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Nov 3, 2021
@RaduBerinde
Copy link
Member

This depends on a system for making stats configuration per-table.

@michae2 michae2 added the A-sql-table-stats Table statistics (and their automatic refresh). label Jan 3, 2023
rytaft added a commit to rytaft/cockroach that referenced this issue Mar 31, 2023
Informs cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new cluster settings that
enable users to change the number of histogram samples and buckets
collected when building histograms as part of table statistics collection:
sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count.
While the default values should work for most cases, it may be beneficial
to increase the number of samples and buckets for very large tables to
avoid creating a histogram that misses important values.
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 3, 2023
Informs cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new table storage parameters,
sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count.
These parameters can be used to override the cluster settings
sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count
at the table level. These settings enable users to change the number of
histogram samples and buckets collected when building histograms as part
of table statistics collection. While the default values should work for
most cases, it may be beneficial to increase the number of samples and
buckets for very large tables to avoid creating a histogram that misses
important values.
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 3, 2023
Informs cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new table storage parameters,
sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count.
These parameters can be used to override the cluster settings
sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count
at the table level. These settings enable users to change the number of
histogram samples and buckets collected when building histograms as part
of table statistics collection. While the default values should work for
most cases, it may be beneficial to increase the number of samples and
buckets for very large tables to avoid creating a histogram that misses
important values.
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 3, 2023
Informs cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new cluster settings that
enable users to change the number of histogram samples and buckets
collected when building histograms as part of table statistics collection:
sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count.
While the default values should work for most cases, it may be beneficial
to increase the number of samples and buckets for very large tables to
avoid creating a histogram that misses important values.
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 3, 2023
Fixes cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new table storage parameters,
sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count.
These parameters can be used to override the cluster settings
sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count
at the table level. These settings enable users to change the number of
histogram samples and buckets collected when building histograms as part
of table statistics collection. While the default values should work for
most cases, it may be beneficial to increase the number of samples and
buckets for very large tables to avoid creating a histogram that misses
important values.
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 3, 2023
Informs cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new cluster settings that
enable users to change the number of histogram samples and buckets
collected when building histograms as part of table statistics collection:
sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count.
While the default values should work for most cases, it may be beneficial
to increase the number of samples and buckets for very large tables to
avoid creating a histogram that misses important values.
craig bot pushed a commit that referenced this issue Apr 3, 2023
99858: screl: Add IndexID as a attr of UniqueWithoutIndex element r=Xiang-Gu a=Xiang-Gu

Previously, ALTER TABLE stmt where we add column/drop column/alter PK and adding a unique without index is problematic in that the it can succeed even when there are duplicate values. We already had a dep rule that enforces the new primary index to be backfilled before we validate the constraint against it. Unfortunately, this rule is not enforced on unique without index constraint because IndexID was not a attr of it. This commit fixes this.

Fixes #99281
Epic: None
Release note (bug fix): Fixed a bug in v23.1 in the declarative schema changer where unique without index can be incorrectly added in tables with duplicate values if it's with a ADD/DROP COLUMN in one ALTER TABLE statement.


100357: sql: allow changing the number of histogram samples and buckets r=rytaft a=rytaft

Informs #72418
Informs #97701

Release note (sql change): Added two new cluster settings that enable users to change the number of histogram samples and buckets collected when building histograms as part of table statistics collection: `sql.stats.histogram_samples.count` and `sql.stats.histogram_buckets.count`. While the default values should work for most cases, it may be beneficial to increase the number of samples and buckets for very large tables to avoid creating a histogram that misses important values.

100489: go.mod: bump Pebble to b84a7ec7d8dc r=RaduBerinde a=jbowens

```
b84a7ec7 db: populate return statistics for flushable ingests
5fd58365 objstorage: implement tracing
7f7451f2 db,record: add BatchCommitStats to measure total and component durations for commit
295aaab0 objstorage: implement basic refcounting
```

Epic: None
Release note: None

100516: multiregionccl: reenable TestMrSystemDatabase r=ajwerner a=ajwerner

I stressed this for a long time on many cores and it did not fail.

Epic: none

Fixes: #98039

Release note: None

100527: roachtest: skip multitenant/distsql for now r=yuzefovich a=yuzefovich

Informs: #100260.

Epic: None

Release note: None

Co-authored-by: Xiang Gu <xiang@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
rytaft added a commit that referenced this issue Apr 4, 2023
Informs #72418
Informs #97701

Release note (sql change): Added two new cluster settings that
enable users to change the number of histogram samples and buckets
collected when building histograms as part of table statistics collection:
sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count.
While the default values should work for most cases, it may be beneficial
to increase the number of samples and buckets for very large tables to
avoid creating a histogram that misses important values.
craig bot pushed a commit that referenced this issue Apr 4, 2023
100479: sql: add table storage params to set histogram samples and buckets count r=rytaft a=rytaft

Fixes #72418
Informs #97701

Release note (sql change): Added two new table storage parameters,
`sql_stats_histogram_buckets_count` and `sql_stats_histogram_samples_count`.
These parameters can be used to override the cluster settings
`sql.stats.histogram_buckets.count` and `sql.stats.histogram_samples.count`
at the table level. These settings enable users to change the number of
histogram samples and buckets collected when building histograms as part
of table statistics collection. While the default values should work for
most cases, it may be beneficial to increase the number of samples and
buckets for very large tables to avoid creating a histogram that misses
important values.


Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig craig bot closed this as completed in a4a51e0 Apr 4, 2023
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 4, 2023
Informs cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new cluster settings that
enable users to change the number of histogram samples and buckets
collected when building histograms as part of table statistics collection:
sql.stats.histogram_samples.count and sql.stats.histogram_buckets.count.
While the default values should work for most cases, it may be beneficial
to increase the number of samples and buckets for very large tables to
avoid creating a histogram that misses important values.
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 4, 2023
Fixes cockroachdb#72418
Informs cockroachdb#97701

Release note (sql change): Added two new table storage parameters,
sql_stats_histogram_buckets_count and sql_stats_histogram_samples_count.
These parameters can be used to override the cluster settings
sql.stats.histogram_buckets.count and sql.stats.histogram_samples.count
at the table level. These settings enable users to change the number of
histogram samples and buckets collected when building histograms as part
of table statistics collection. While the default values should work for
most cases, it may be beneficial to increase the number of samples and
buckets for very large tables to avoid creating a histogram that misses
important values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants