Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: store non-NULL histograms for empty tables #81793

Merged
merged 1 commit into from
May 25, 2022

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented May 24, 2022

We have been storing NULL histograms / using nil HistogramData for all
the following cases:

  1. regular stats, GenerateHistogram=false
  2. regular stats, GenerateHistogram=true, empty table
  3. regular stats, GenerateHistogram=true, all NULL values
  4. inverted stats, no inverted index
  5. inverted stats, yes inverted index, empty table
  6. inverted stats, yes inverted index, all NULL values

When predicting histograms for statistics forecasts, we need to
distinguish between case 1 and cases 2 and 3. In case 1 we cannot
predict histograms, but in cases 2 and 3 we can (and the emptiness of
the histogram is important).

So, for cases 2 and 3 we now store an empty histogram instead of NULL,
and correspondingly use an initialized HistogramData with 0-length
Buckets instead of nil HistogramData.

This also helps with testing statistics forecasts.

(The inability to distinguish cases 4-6 doesn't matter, because we
cannot predict histograms for inverted indexes anyway. I tried to change
cases 5 and 6 to be non-NULL for consistency but ran into some
problems, so I'll leave them as they are.)

Release note: None

@michae2 michae2 requested review from rytaft and msirek May 24, 2022 22:40
@michae2 michae2 requested a review from a team as a code owner May 24, 2022 22:40
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change!
:lgtm:

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: cool!

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @michae2)

@michae2
Copy link
Collaborator Author

michae2 commented May 25, 2022

TFTRs!

bors r=msirek,rytaft

@michae2
Copy link
Collaborator Author

michae2 commented May 25, 2022

Sigh.

bors r-

@craig
Copy link
Contributor

craig bot commented May 25, 2022

Canceled.

We have been storing NULL histograms / using nil HistogramData for all
the following cases:
1. regular stats, GenerateHistogram=false
2. regular stats, GenerateHistogram=true, empty table
3. regular stats, GenerateHistogram=true, all NULL values
4. inverted stats, no inverted index
5. inverted stats, yes inverted index, empty table
6. inverted stats, yes inverted index, all NULL values

When predicting histograms for statistics forecasts, we need to
distinguish between case 1 and cases 2 and 3. In case 1 we cannot
predict histograms, but in cases 2 and 3 we can (and the emptiness of
the histogram is important).

So, for cases 2 and 3 we now store an empty histogram instead of NULL,
and correspondingly use an initialized HistogramData with 0-length
Buckets instead of nil HistogramData.

This also helps with testing statistics forecasts.

(The inability to distinguish cases 4-6 doesn't matter, because we
cannot predict histograms for inverted indexes anyway. I tried to change
cases 5 and 6 to be non-NULL for consistency but ran into some
problems, so I'll leave them as they are.)

Release note: None
@michae2
Copy link
Collaborator Author

michae2 commented May 25, 2022

Let's try this again.

bors r=msirek,rytaft

@craig
Copy link
Contributor

craig bot commented May 25, 2022

Build succeeded:

@craig craig bot merged commit 4dc4279 into cockroachdb:master May 25, 2022
@blathers-crl
Copy link

blathers-crl bot commented May 25, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 82b5926 to blathers/backport-release-22.1-81793: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@michae2 michae2 deleted the histogram branch May 25, 2022 17:39
michae2 added a commit to michae2/cockroach that referenced this pull request Dec 2, 2022
`PrintTableStats` is used by several variants of `EXPLAIN` to print an
`ALTER TABLE INJECT STATISTICS` command with the current statistics for
a given table. At least two of these variants (the existing `EXPLAIN
(OPT, ENV)` and the upcoming `EXPLAIN ANALYZE (DEBUG, REDACT)`) want
those statistics with histograms removed.

We were removing histograms by simply deleting the `histo_buckets` JSON
element, leaving behind some other histogram-related elements
(`histo_col_type` and `histo_version`). As of cockroachdb#81793 the existence of
these elements without `histo_buckets` indicates an empty table with an
empty histogram, rather than a non-empty table without a histogram. To
indicate a non-empty table without a histogram, we have to also remove
`histo_version` and set `histo_col_type` to the empty string.

(The difference currently only matters for statistics forecasting.)

Epic: CRDB-19756

Release note: None
craig bot pushed a commit that referenced this pull request Dec 3, 2022
92719: sql: fix removal of histograms when printing table statistics r=msirek,rytaft a=michae2

`PrintTableStats` is used by several variants of `EXPLAIN` to print an `ALTER TABLE INJECT STATISTICS` command with the current statistics for a given table. At least two of these variants (the existing `EXPLAIN (OPT, ENV)` and the upcoming `EXPLAIN ANALYZE (DEBUG, REDACT)`) want those statistics with histograms removed.

We were removing histograms by simply deleting the `histo_buckets` JSON element, leaving behind some other histogram-related elements (`histo_col_type` and `histo_version`). As of #81793 the existence of these elements without `histo_buckets` indicates an empty table with an empty histogram, rather than a non-empty table without a histogram. To indicate a non-empty table without a histogram, we have to also remove `histo_version` and set `histo_col_type` to the empty string.

(The difference currently only matters for statistics forecasting.)

Epic: CRDB-19756

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants