release-24.1: sql/stats: evict stats cache entry if user-defined types have changed #124810

Merged Jun 6, 2024 (2 commits)

@blathers-crl blathers-crl bot commented May 29, 2024

Backport 1/3 commits from #124603 and 1/1 commits from #122386 on behalf of @michae2.

/cc @cockroachdb/release


sql/stats: evict stats cache entry if user-defined types have changed

When adding table statistics to the stats cache, we decode histogram upper bounds into datums. If the histogram column uses a user-defined type, we hydrate the type and use this to decode.

In statistics builder, these histogram upper bound datums are compared against datums in spans and constraints. The comparisons assume that the datums are of equivalent type, but if the user-defined type has changed sometime after loading the stats cache entry, this might not be true.

If the user-defined type has changed, we need to evict and re-load the stats cache entry so that we decode histogram datums with a freshly-hydrated type.

(We were already checking UDT versions when building the optTable in sql.(*optCatalog).dataSourceForTable, but the newly-built optTable used the existing table statistics instead of refreshing those, too.)

Fixes: #124181

Release note (bug fix): Fix a bug where a change to a user-defined type could cause queries against tables using that type to fail with an error message like:

histogram.go:694: span must be fully contained in the bucket

The change to the user-defined type could come directly from an ALTER TYPE statement, or indirectly from an ALTER DATABASE ADD REGION or DROP REGION statement (which implicitly changes the crdb_internal_region type).

This bug has existed since UDTs were introduced in v20.2.


multiregionccl: skip TestMRSystemDatabase under stress race

Closes #122363

Release note: None


Release justification: fix for a serious production issue.

@blathers-crl blathers-crl bot requested a review from a team as a code owner May 29, 2024 15:17
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-24.1-124603 branch from 1b30892 to 0d8219b Compare May 29, 2024 15:17
@blathers-crl blathers-crl bot requested review from a team as code owners May 29, 2024 15:17
@blathers-crl blathers-crl bot requested review from michae2 and removed request for a team May 29, 2024 15:17
@blathers-crl blathers-crl bot added the blathers-backport and O-robot labels May 29, 2024

blathers-crl bot (Author) commented May 29, 2024

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious issues or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning area's TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.

If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

@blathers-crl blathers-crl bot added the backport label May 29, 2024

michae2 (Collaborator) commented May 29, 2024

(Hold off on reviewing while I adjust this.)

@michae2 michae2 force-pushed the blathers/backport-release-24.1-124603 branch from 0d8219b to 9718899 Compare May 29, 2024 19:10
@michae2 michae2 marked this pull request as draft May 30, 2024 19:05

michae2 (Collaborator) commented May 30, 2024

Something is going wrong with:

./dev test pkg/ccl/multiregionccl --filter=TestMrSystemDatabase --ignore-cache --race --stress --count 8

I have a feeling we're thrashing the cache when changing the system database to multi-region.

This might be related to #122790 but I'm not sure yet.

michae2 (Collaborator) commented Jun 3, 2024

Disallowing statistics on system.namespace, system.descriptor, system.comments, and system.zones appears to fix this. I'm trying to understand why. This has something to do with loading descriptors and/or lease acquisition, possibly during ALTER DATABASE system SET PRIMARY REGION.

@michae2 michae2 force-pushed the blathers/backport-release-24.1-124603 branch from 9718899 to 78d6cda Compare June 3, 2024 23:35

michae2 (Collaborator) commented Jun 3, 2024

> Disallowing statistics on system.namespace, system.descriptor, system.comments, and system.zones appears to fix this. I'm trying to understand why. This has something to do with loading descriptors and/or lease acquisition, possibly during ALTER DATABASE system SET PRIMARY REGION.

Actually, this seems to be wrong. I think this is simply a flaky test, unrelated to this PR. We skipped it under stressrace on master, so let's do that in 24.1 as well.

@michae2 michae2 force-pushed the blathers/backport-release-24.1-124603 branch from 78d6cda to 1dfcb9a Compare June 4, 2024 04:21

yuzefovich (Member) commented

I usually ignore Extended CI being red on the backports since it seems to always be red on mine 🤷‍♂️

@michae2 michae2 requested review from mgartner and rafiss June 5, 2024 15:41
@michae2 michae2 marked this pull request as ready for review June 5, 2024 15:41

michae2 (Collaborator) commented Jun 5, 2024

Ok, this (and all the other backports) are RFAL.

@michae2 michae2 merged commit 7d93e89 into release-24.1 Jun 6, 2024
19 of 20 checks passed
@michae2 michae2 deleted the blathers/backport-release-24.1-124603 branch June 6, 2024 22:49
Labels
  • backport: Label PR's that are backports to older release branches
  • blathers-backport: This is a backport that Blathers created automatically.
  • O-robot: Originated from a bot.