Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: evict stats cache entry if user-defined types have changed #124603

Merged
merged 3 commits into from
May 29, 2024

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented May 23, 2024

sql: add defensive type check to sql.(*optTableStat).init

In statistics builder we assume that the datums decoded from histogram upper bounds are comparable with datums in spans and constraints. If the histogram column type were ever different from the table column type, however, this assumption would not hold.

This should never happen, because histograms are associated with a column ID, and ALTER TABLE should never re-use a column ID during ALTER TABLE ADD COLUMN or ALTER TABLE ALTER COLUMN TYPE. But just to be defensive, add a check to sql.(*optTableStat).init that skips over the TableStatistic if the histogram column type isn't equivalent to the table column type.

Also add this same typecheck to ALTER TABLE INJECT STATISTICS to guard against injecting histograms that don't match the column type. As part of this fix I'm removing the testcase added in #46552 which deliberately injects statistics with a histogram of a different type.

Informs: #124181

Release note: None


sql/stats: evict stats cache entry if user-defined types have changed

When adding table statistics to the stats cache, we decode histogram upper bounds into datums. If the histogram column uses a user-defined type, we hydrate the type and use this to decode.

In statistics builder, these histogram upper bound datums are compared against datums in spans and constraints. The comparisons assume that the datums are of equivalent type, but if the user-defined type has changed sometime after loading the stats cache entry, this might not be true.

If the user-defined type has changed, we need to evict and re-load the stats cache entry so that we decode histogram datums with a freshly-hydrated type.

(We were already checking UDT versions when building the optTable in sql.(*optCatalog).dataSourceForTable, but the newly-built optTable used the existing table statistics instead of refreshing those, too.)

Fixes: #124181

Release note (bug fix): Fix a bug where a change to a user-defined type could cause queries against tables using that type to fail with an error message like:

histogram.go:694: span must be fully contained in the bucket

The change to the user-defined type could come directly from an ALTER TYPE statement, or indirectly from an ALTER DATABASE ADD REGION or DROP REGION statement (which implicitly change the crdb_internal_region type).

This bug has existed since UDTs were introduced in v20.2.


sem/tree: check oid and version in tree.(*DEnum).CompareError

Make sure when we're comparing two enum datums that they are, in fact, the same enum type.

Informs: #124181

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2 michae2 changed the title Udthist sql/stats: evict stats cache entry if user-defined types have changed May 23, 2024
@michae2 michae2 marked this pull request as ready for review May 23, 2024 22:47
@michae2 michae2 requested review from a team as code owners May 23, 2024 22:47
@michae2 michae2 removed the request for review from a team May 23, 2024 22:47
@michae2 michae2 added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-23.2.6-rc labels May 23, 2024
@michae2
Copy link
Collaborator Author

michae2 commented May 23, 2024

Note that this only contains the fix for #124181 and not the fix for #67050 which we worked on during the collab session today. (I'll open a separate PR for that.)

I'm leaning toward not backporting the 3rd commit. WDYT?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job figuring this out! I only have minor nits.

For the third commit I wonder whether we should make it both stronger and backportable - namely, thoughts on making those two checks be CrdbTestBuild-only and make the errors AssertionFailed so that we could have a chance to catch bugs like this in randomized tests?

Reviewed 1 of 1 files at r1, 11 of 11 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)


pkg/sql/opt_catalog.go line 1798 at r1 (raw file):

		}
	}
	if stat.HistogramData != nil && len(stat.ColumnIDs) == 1 {

nit: the check for a single column is here because we don't generate histograms for multi-column stats? Should we reference #49698 around this defensive check to make sure the code is adjusted when we lift that restriction?


pkg/sql/stats/stats_cache.go line 636 at r2 (raw file):

// parseStats converts the given datums to a TableStatistic object. It might
// need to run a query to get user defined type metadata.

nit: mention in the comment that ColumnID is only specified if a single-column statistic is returned. Alternatively, it might be cleaner to not return the ColumnID and fetch it one layer above in getTableStatsFromDB.


pkg/ccl/logictestccl/testdata/logic_test/multi_region_stats line 73 at r2 (raw file):

statement ok
ALTER DATABASE db124181 ADD REGION "ca-central-1"

nit: should we also check statistics after having changed the enum?

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Once Yahor's comments are addressed. I agree it makes sense to either not backport the last commit, or hide it behind CrdbTestBuild.

Reviewed 1 of 1 files at r1, 11 of 11 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/opt_catalog.go line 1806 at r2 (raw file):

				// during an ALTER TABLE ALTER COLUMN TYPE statement, which we should
				// not. Still, never hurts to be defensive.
				return false, nil

[nit] Should we add a log message or something here, since this is unexpected behavior?

@michae2 michae2 force-pushed the udthist branch 2 times, most recently from 2f42da5 to 746092c Compare May 28, 2024 22:37
@michae2 michae2 requested a review from a team as a code owner May 28, 2024 22:37
@michae2 michae2 force-pushed the udthist branch 3 times, most recently from 31abd7b to 0bef5f4 Compare May 28, 2024 23:52
In statistics builder we assume that the datums decoded from histogram
upper bounds are comparable with datums in spans and constraints. If the
histogram column type were ever different from the table column type,
however, this assumption would not hold.

This should never happen, because histograms are associated with a
column ID, and ALTER TABLE should never re-use a column ID during ALTER
TABLE ADD COLUMN or ALTER TABLE ALTER COLUMN TYPE. But just to be
defensive, add a check to sql.(*optTableStat).init that skips over the
TableStatistic if the histogram column type isn't equivalent to the
table column type.

Also add this same typecheck to ALTER TABLE INJECT STATISTICS to guard
against injecting histograms that don't match the column type. As part
of this fix I'm removing the testcase added in cockroachdb#46552 which deliberately
injects statistics with a histogram of a different type.

Informs: cockroachdb#124181

Release note: None
When adding table statistics to the stats cache, we decode histogram
upper bounds into datums. If the histogram column uses a user-defined
type, we hydrate the type and use this to decode.

In statistics builder, these histogram upper bound datums are compared
against datums in spans and constraints. The comparisons assume that the
datums are of equivalent type, but if the user-defined type has changed
sometime after loading the stats cache entry, this might not be true.

If the user-defined type has changed, we need to evict and re-load the
stats cache entry so that we decode histogram datums with a freshly-
hydrated type.

(We were already checking UDT versions when building the optTable in
sql.(*optCatalog).dataSourceForTable, but the newly-built optTable used
the existing table statistics instead of refreshing those, too.)

Fixes: cockroachdb#124181

Release note (bug fix): Fix a bug where a change to a user-defined type
could cause queries against tables using that type to fail with an error
message like:

  "histogram.go:694: span must be fully contained in the bucket"

The change to the user-defined type could come directly from an ALTER
TYPE statement, or indirectly from an ALTER DATABASE ADD REGION or DROP
REGION statement (which implicitly change the crdb_internal_region
type).

This bug has existed since UDTs were introduced in v20.2.
Make sure when we're comparing two enum datums that they are, in fact,
the same enum type.

Informs: cockroachdb#124181

Release note: None
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews!

The first commit ran into a testcase added by #46552 which was purposefully injecting stats with a histogram of a different type. I added a similar type check to ALTER TABLE INJECT STATISTICS which prevents injection of stats with a histogram of a different type. The existence of #46552 made me nervous, so I'm not planning to backport the first commit.

For the third commit I wonder whether we should make it both stronger and backportable - namely, thoughts on making those two checks be CrdbTestBuild-only and make the errors AssertionFailed so that we could have a chance to catch bugs like this in randomized tests?

Thinking about this more, I changed the oid difference to a "normal" unsupported comparison, because that means we're comparing two different ENUMs, and the version difference to an assertion failure because it probably indicates failure to hydrate a type. I'm still leaning toward not backporting the third commit, either. So I'm only planning to backport the second commit (the bug fix).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @yuzefovich)


pkg/sql/opt_catalog.go line 1798 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: the check for a single column is here because we don't generate histograms for multi-column stats? Should we reference #49698 around this defensive check to make sure the code is adjusted when we lift that restriction?

Done.


pkg/sql/opt_catalog.go line 1806 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Should we add a log message or something here, since this is unexpected behavior?

Done.


pkg/sql/stats/stats_cache.go line 636 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: mention in the comment that ColumnID is only specified if a single-column statistic is returned. Alternatively, it might be cleaner to not return the ColumnID and fetch it one layer above in getTableStatsFromDB.

Good call, I moved it up to getTableStatsFromDB.


pkg/ccl/logictestccl/testdata/logic_test/multi_region_stats line 73 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we also check statistics after having changed the enum?

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Backporting only the second commit SGTM.

Reviewed 5 of 17 files at r4, 11 of 11 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)

@michae2
Copy link
Collaborator Author

michae2 commented May 29, 2024

Thanks!

bors r=DrewKimball,yuzefovich

@craig
Copy link
Contributor

craig bot commented May 29, 2024

Build failed:

@rickystewart
Copy link
Collaborator

bors r=DrewKimball,yuzefovich

@craig craig bot merged commit 76c744a into cockroachdb:master May 29, 2024
22 checks passed
Copy link

blathers-crl bot commented May 29, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from ef24bf7 to blathers/backport-release-23.1-124603: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from ef24bf7 to blathers/backport-release-23.2-124603: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


error setting reviewers, but backport branch blathers/backport-release-24.1-124603 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/124810/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 24.1.x failed. See errors above.


error creating merge commit from ef24bf7 to blathers/backport-release-23.2.6-rc-124603: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.6-rc failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: v23.1.12: span must be fully contained in the bucket
5 participants