release-23.2.6-rc: sql/stats: evict stats cache entry if user-defined types have changed #124854

@michae2 michae2 commented May 30, 2024

Backport 1/3 commits from #124603.

/cc @cockroachdb/release


sql/stats: evict stats cache entry if user-defined types have changed

When adding table statistics to the stats cache, we decode histogram upper bounds into datums. If the histogram column uses a user-defined type, we hydrate the type and use this to decode.

In statistics builder, these histogram upper bound datums are compared against datums in spans and constraints. The comparisons assume that the datums are of equivalent type, but if the user-defined type has changed sometime after loading the stats cache entry, this might not be true.

If the user-defined type has changed, we need to evict and re-load the stats cache entry so that we decode histogram datums with a freshly-hydrated type.

(We were already checking UDT versions when building the optTable in sql.(*optCatalog).dataSourceForTable, but the newly-built optTable used the existing table statistics instead of refreshing those, too.)
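The eviction rule described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `typeVersions`, `cacheEntry`, `statsCache`, `loadFunc`, and `sameVersions` are hypothetical names invented for illustration, and the real stats cache is considerably more involved. The sketch assumes each cache entry records the user-defined type versions that were used to decode its histogram bounds, and that a lookup with different versions evicts and reloads the entry.

```go
package main

import "fmt"

// typeVersions maps a user-defined type ID to a descriptor version.
// Hypothetical type for this sketch, not a CockroachDB type.
type typeVersions map[uint32]uint64

// cacheEntry holds decoded histogram upper bounds together with the UDT
// versions that were used to decode them.
type cacheEntry struct {
	udtVersions typeVersions
	upperBounds []string
}

// loadFunc stands in for reading statistics from storage and decoding the
// histogram upper bounds with freshly hydrated types.
type loadFunc func(tableID uint32, current typeVersions) []string

// statsCache is a toy cache keyed by table ID.
type statsCache struct {
	entries map[uint32]*cacheEntry
	load    loadFunc
	loads   int // number of loads and reloads, for illustration only
}

func newStatsCache(load loadFunc) *statsCache {
	return &statsCache{entries: make(map[uint32]*cacheEntry), load: load}
}

// sameVersions reports whether both maps contain exactly the same type IDs
// with the same versions.
func sameVersions(a, b typeVersions) bool {
	if len(a) != len(b) {
		return false
	}
	for id, v := range a {
		if bv, ok := b[id]; !ok || bv != v {
			return false
		}
	}
	return true
}

// get returns the cached bounds if they were decoded with the current UDT
// versions. Otherwise it evicts the stale entry and reloads it.
func (c *statsCache) get(tableID uint32, current typeVersions) []string {
	if e, ok := c.entries[tableID]; ok {
		if sameVersions(e.udtVersions, current) {
			return e.upperBounds
		}
		// A user-defined type changed after this entry was decoded: evict it.
		delete(c.entries, tableID)
	}
	// Snapshot the versions so later mutation by the caller cannot affect us.
	snapshot := make(typeVersions, len(current))
	for id, v := range current {
		snapshot[id] = v
	}
	c.loads++
	e := &cacheEntry{udtVersions: snapshot, upperBounds: c.load(tableID, current)}
	c.entries[tableID] = e
	return e.upperBounds
}

func main() {
	c := newStatsCache(func(tableID uint32, current typeVersions) []string {
		return []string{fmt.Sprintf("bounds decoded with type versions %v", current)}
	})
	c.get(1, typeVersions{100: 1}) // miss: load
	c.get(1, typeVersions{100: 1}) // hit: same type version
	c.get(1, typeVersions{100: 2}) // type changed: evict and reload
	fmt.Println("loads:", c.loads)
}
```

In this sketch the caller supplies the current type versions on every lookup, which keeps the staleness check local to the cache. Whether the actual change passes versions, hydrated types, or something else is not stated in this description.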

Fixes: #124181

Release note (bug fix): Fix a bug where a change to a user-defined type could cause queries against tables using that type to fail with an error message like:

histogram.go:694: span must be fully contained in the bucket

The change to the user-defined type could come directly from an ALTER TYPE statement, or indirectly from an ALTER DATABASE ADD REGION or DROP REGION statement (which implicitly change the crdb_internal_region type).

This bug has existed since UDTs were introduced in v20.2.


Release justification: fix for a serious production issue.

blathers-crl bot commented May 30, 2024

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious issues or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning area's TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.

If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

@blathers-crl blathers-crl bot added the backport label May 30, 2024
@michae2 michae2 force-pushed the backportrelease-23.2.6-rc-124603 branch from 2d14052 to 43fe2b0 Compare June 3, 2024 23:17
@michae2 michae2 requested review from mgartner and rafiss June 5, 2024 15:37
@michae2 michae2 marked this pull request as ready for review June 5, 2024 15:37
@michae2 michae2 requested a review from a team as a code owner June 5, 2024 15:37
michae2 commented Jun 5, 2024

I looked in depth at the Bazel Extended CI timeouts, and they seem to be the normal timeouts for stress-race, experienced frequently even before this PR. There is no deadlock like I feared.

@mgartner mgartner left a comment

:lgtm: Nicely done!

Reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rafiss)


pkg/sql/stats/stats_cache.go line 320 at r1 (raw file):

// getTableStatsFromCache is like GetTableStats but assumes that the table ID
// is safe to fetch statistics for: non-system, non-virtual, non-view, etc.

nit for follow-up PR: explain the udtCols argument.

@michae2 michae2 merged commit 212dffa into cockroachdb:release-23.2.6-rc Jun 6, 2024
5 of 6 checks passed
@michae2 michae2 deleted the backportrelease-23.2.6-rc-124603 branch June 6, 2024 18:38