sql/opt: fix null_ordered_last with DISTINCT queries#159154
sql/opt: fix null_ordered_last with DISTINCT queries#159154ajstorm wants to merge 1 commit intocockroachdb:masterfrom
Conversation
4ae0663 to
918c705
Compare
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajstorm)
pkg/sql/opt/optbuilder/distinct.go line 46 at r1 (raw file):
// and contain "col IS NULL" expressions that are safe to include // in DISTINCT grouping. if strings.HasPrefix(scopeCol.name.MetadataName(), "nulls_ordering_") {
Relying on the metadata name to determine if the column is a synthesized "nulls_ordering_" column is too brittle. It could cause incorrect behavior if a table has a column name with that same prefix. Should we store the set of synthesized "nulls_ordering_" column IDs in the builder? Does it need to be a map from col in col IS NULL to the synthesized column so we can ensure we are not grouping by two unrelated columns?
I'm also curious if it makes sense to instead project-away the synthesized column before constructing the distinct-on? I think that might lead to worse plans, but I'm not sure. My thinking is that by including both columns in the distinct-on grouping columns, the sort can be pulled above the distinct-on. This means that the sort will operate on fewer rows, because the distinct-on will eliminate some. Does that sound right? Is this something you considered?
pkg/sql/logictest/testdata/logic_test/distinct line 251 at r1 (raw file):
# Test DISTINCT with default null ordering query I SELECT DISTINCT i FROM t_nulls ORDER BY i
nit: since this was working fine before, it's not really a regression test case. but it does look like we are missing coverage. can you instead add a test case like SELECT DISTINCT y,z FROM xyz ORDER BY y,z after the test case SELECT DISTINCT y,z FROM xyz above.
pkg/sql/logictest/testdata/logic_test/distinct line 262 at r1 (raw file):
# This previously errored with: # "for SELECT DISTINCT, ORDER BY expressions must appear in select list"
nit: this comment isn't necessary.
pkg/sql/logictest/testdata/logic_test/distinct line 284 at r1 (raw file):
# Test DISTINCT with explicit NULLS FIRST query I SELECT DISTINCT i FROM t_nulls ORDER BY i NULLS FIRST
nit: remove this test, it is not necessary.
pkg/sql/opt/optbuilder/testdata/distinct line 331 at r1 (raw file):
# Test DISTINCT with null_ordered_last=false (default) build SELECT DISTINCT i FROM nulls_t ORDER BY i
nit: remove this test.
Addressed review comments Previously, setting null_ordered_last=true caused queries with DISTINCT to incorrectly error with "for SELECT DISTINCT, ORDER BY expressions must appear in select list". This happened because the optimizer internally adds a synthetic "col IS NULL" expression for NULL ordering, and the constructDistinct function was rejecting this synthetic column. The fix adds special case handling in constructDistinct to recognize and allow these synthetic null ordering columns by checking for the "nulls_ordering_" prefix in the column's metadata name. Fixes cockroachdb#158879 Release note (bug fix): Fixed a bug where setting null_ordered_last=true would cause SELECT DISTINCT queries with ORDER BY to incorrectly fail with an error. Epic: None 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Addressed Review CommentsAll review comments have been addressed. Here's a summary of the changes: Summary of Changes1.
|
918c705 to
96d3714
Compare
Fixes #158879
Summary
Root Cause
When
null_ordered_last=trueis set, CockroachDB internally adds a syntheticcol IS NULLcolumn to handle NULL ordering. TheconstructDistinctfunction was incorrectly rejecting this synthetic column because it was not part of the SELECT list, causing the error "for SELECT DISTINCT, ORDER BY expressions must appear in select list".Fix
Modified the
constructDistinctfunction to recognize and allow synthetic null ordering columns by checking for thenulls_ordering_prefix in the column metadata name. These columns are safe to include in the grouping because they are deterministic functions of columns already in the DISTINCT list.Files Changed
pkg/sql/opt/optbuilder/distinct.go- Added special case handling for null ordering columnspkg/sql/opt/optbuilder/testdata/distinct- Added optimizer test casespkg/sql/logictest/testdata/logic_test/distinct- Added logic testsTest Plan
This PR was auto-generated by crdb-issue-autosolver using Claude Code.
Please review carefully before approving.