New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: Convert InnerJoinApply + Zip to ProjectSet #31922

Merged
merged 1 commit into from Nov 13, 2018

Conversation

Projects
None yet
4 participants
@rytaft
Copy link
Contributor

rytaft commented Oct 26, 2018

This commit creates a new relational operator called ProjectSet,
which zips through a list of generators for every row of the input.

As a reminder, a functional zip over generators a,b,c returns tuples of
values from a,b,c picked "simultaneously". NULLs are used when a generator is
"shorter" than another. For example:

zip([1,2,3], ['a','b']) = [(1,'a'), (2,'b'), (3, null)]

ProjectSet corresponds to a relational operator project(R, a, b, c, ...)
which, for each row in R, produces all the rows produced by zip(a, b, c, ...)
with the values of R prefixed. Formally, this performs a lateral cross join
of R with zip(a,b,c).

This operator replaces Zip, which was a non-standard relational operator in
the optimizer. Zip had to be joined with its input via an InnerJoinApply cross
join, which caused some difficulties in creating decorrelation patterns. As a
result, certain queries were not decorrelated that should have been fairly
easy to decorrelate.

As part of this commit, the decorrelation patterns have been updated to
allow decorrelation of generator functions inside EXISTS.

Fixes #31706

Release note (sql change): Many queries containing a correlated EXISTS
subquery with a generator function can now be decorrelated and executed
successfully. Previously, these queries caused a decorrelation error.

@rytaft rytaft requested a review from cockroachdb/sql-opt-prs as a code owner Oct 26, 2018

@cockroach-teamcity

This comment has been minimized.

Copy link
Member

cockroach-teamcity commented Oct 26, 2018

This change is Reviewable

@andy-kimball
Copy link
Contributor

andy-kimball left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/memo/statistics_builder.go, line 1662 at r1 (raw file):

	reqZipCols := colSet.Difference(inputCols).Intersection(zipCols)
	if !reqZipCols.Empty() {
		zipDistinctCount := float64(1)

Suggest adding a comment here clarifying that zipDistinctCount and zipNullCount are totaled across all rows, not just zip rows (i.e. totals after cross-join is applied).


pkg/sql/opt/memo/statistics_builder.go, line 1673 at r1 (raw file):

					// null count based on the type of generator function and its parameters.
					zipDistinctCount *= unknownGeneratorRowCount * unknownGeneratorDistinctCountRatio
					zipNullCount += (s.RowCount * unknownNullCountRatio) * (1 - zipNullCount/s.RowCount)

Is there possibility of division by zero error here, if s.RowCount = 0?


pkg/sql/opt/norm/rules/decorrelate.opt, line 56 at r1 (raw file):

# ProjectSet only need to be executed once in total, instead of once for each
# input row.
[DecorrelateProjectSet, Normalize]

This should be TryDecorrelateProjectSet to be consistent with others. It's called that because there's no guarantee that this rule will result in decorrelation - it's just an attempt to do so.


pkg/sql/opt/norm/rules/decorrelate.opt, line 471 at r1 (raw file):

        (HasOuterCols $right) &
        (CanHaveZeroRows $right) & # Let EliminateExistsGroupBy match instead.
        (GroupBy | DistinctOn | Project | ProjectSet)

Did you add a test for this new case?


pkg/sql/opt/norm/rules/decorrelate.opt, line 543 at r1 (raw file):

            []
        )
        $zip

Did you fully think through cases where the $zip has functions with different cardinalities that are correlated with the $input? Do you have test cases for that?


pkg/sql/opt/norm/testdata/rules/decorrelate, line 89 at r1 (raw file):

# --------------------------------------------------

opt expect=DecorrelateProjectSet

Can you find a test case where this rule must be applied multiple times in succession?


pkg/sql/opt/norm/testdata/rules/prune_cols, line 1562 at r1 (raw file):

# --------------------------------------------------
opt expect=PruneProjectSetCols
SELECT a, b, generate_series(c, 10) FROM abcde

You need at least one other case to test case where zip depends upon input cols.

@RaduBerinde
Copy link
Member

RaduBerinde left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/memo/statistics_builder.go, line 1673 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Is there possibility of division by zero error here, if s.RowCount = 0?

It's better to write this as

zipNullCount = s.RowCount * unknownNullCountRatio + zipNullCount * (1 - unknownNullCountRatio)

(same below)

@rytaft rytaft force-pushed the rytaft:project-set branch from d893d3f to 3efedba Nov 1, 2018

@rytaft
Copy link
Contributor

rytaft left a comment

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/memo/statistics_builder.go, line 1662 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Suggest adding a comment here clarifying that zipDistinctCount and zipNullCount are totaled across all rows, not just zip rows (i.e. totals after cross-join is applied).

Done.


pkg/sql/opt/memo/statistics_builder.go, line 1673 at r1 (raw file):

Previously, RaduBerinde wrote…

It's better to write this as

zipNullCount = s.RowCount * unknownNullCountRatio + zipNullCount * (1 - unknownNullCountRatio)

(same below)

I simplified this expression but couldn't figure out how to do the same below. I added a check at the top of the function to see if s.RowCount == 0.


pkg/sql/opt/norm/rules/decorrelate.opt, line 56 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This should be TryDecorrelateProjectSet to be consistent with others. It's called that because there's no guarantee that this rule will result in decorrelation - it's just an attempt to do so.

There is a TryDecorrelateProjectSet rule below which is similar to the other TryDecorrelate rules. But this DecorrelateProjectSet rule is guaranteed to decorrelate the ProjectSet. (It's basically the equivalent of DecorrelateJoin for the cross join between the input and the Zip)


pkg/sql/opt/norm/rules/decorrelate.opt, line 471 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Did you add a test for this new case?

I've added a new test case now and also added rule expectations to the existing cases.


pkg/sql/opt/norm/rules/decorrelate.opt, line 543 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Did you fully think through cases where the $zip has functions with different cardinalities that are correlated with the $input? Do you have test cases for that?

I added some test cases. These cases are a bit difficult to test and only happen when a ROWS FROM clause is used inside a correlated subquery.


pkg/sql/opt/norm/testdata/rules/decorrelate, line 89 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Can you find a test case where this rule must be applied multiple times in succession?

Not sure how that would work... can you explain? I added another case where the Zip is correlated with the outer query.


pkg/sql/opt/norm/testdata/rules/prune_cols, line 1562 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

You need at least one other case to test case where zip depends upon input cols.

Done.

@andy-kimball
Copy link
Contributor

andy-kimball left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/norm/rules/decorrelate.opt, line 56 at r1 (raw file):

Previously, rytaft wrote…

There is a TryDecorrelateProjectSet rule below which is similar to the other TryDecorrelate rules. But this DecorrelateProjectSet rule is guaranteed to decorrelate the ProjectSet. (It's basically the equivalent of DecorrelateJoin for the cross join between the input and the Zip)

Ah, yes, I meant to remove this comment after I saw that, but submitted it accidentally.

@andy-kimball
Copy link
Contributor

andy-kimball left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/norm/testdata/rules/decorrelate, line 89 at r1 (raw file):

Previously, rytaft wrote…

Not sure how that would work... can you explain? I added another case where the Zip is correlated with the outer query.

i mean where there's a generate_series nested in another generate_series. I often find that "nesting" test cases highlight interesting issues.

@rytaft rytaft force-pushed the rytaft:project-set branch from 3efedba to eb7a294 Nov 5, 2018

@rytaft
Copy link
Contributor

rytaft left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/norm/testdata/rules/decorrelate, line 89 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

i mean where there's a generate_series nested in another generate_series. I often find that "nesting" test cases highlight interesting issues.

It's not possible to nest SRFs directly (returns an error unimplemented: nested set-returning functions), but I added a test case with a subquery containing another generate_series.

opt: Convert InnerJoinApply + Zip to ProjectSet
This commit creates a new relational operator called ProjectSet,
which zips through a list of generators for every row of the input.

As a reminder, a functional zip over generators a,b,c returns tuples of
values from a,b,c picked "simultaneously". NULLs are used when a generator is
"shorter" than another.  For example:

   zip([1,2,3], ['a','b']) = [(1,'a'), (2,'b'), (3, null)]

ProjectSet corresponds to a relational operator project(R, a, b, c, ...)
which, for each row in R, produces all the rows produced by zip(a, b, c, ...)
with the values of R prefixed. Formally, this performs a lateral cross join
of R with zip(a,b,c).

This operator replaces Zip, which was a non-standard relational operator in
the optimizer. Zip had to be joined with its input via an InnerJoinApply cross
join, which caused some difficulties in creating decorrelation patterns. As a
result, certain queries were not decorrelated that should have been fairly
easy to decorrelate.

As part of this commit, the decorrelation patterns have been updated to
allow decorrelation of generator functions inside EXISTS.

Fixes #31706

Release note (sql change): Many queries containing a correlated EXISTS
subquery with a generator function can now be decorrelated and executed
successfully. Previously, these queries caused a decorrelation error.

@rytaft rytaft force-pushed the rytaft:project-set branch from eb7a294 to d6a9221 Nov 5, 2018

@rytaft
Copy link
Contributor

rytaft left a comment

Gentle ping - do the last set of changes look ok? Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@andy-kimball
Copy link
Contributor

andy-kimball left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@rytaft

This comment has been minimized.

Copy link
Contributor

rytaft commented Nov 13, 2018

Thanks!

bors r+

craig bot pushed a commit that referenced this pull request Nov 13, 2018

Merge #31922
31922: opt: Convert InnerJoinApply + Zip to ProjectSet r=rytaft a=rytaft

This commit creates a new relational operator called ProjectSet,
which zips through a list of generators for every row of the input.

As a reminder, a functional zip over generators a,b,c returns tuples of
values from a,b,c picked "simultaneously". NULLs are used when a generator is
"shorter" than another.  For example:

   zip([1,2,3], ['a','b']) = [(1,'a'), (2,'b'), (3, null)]

ProjectSet corresponds to a relational operator project(R, a, b, c, ...)
which, for each row in R, produces all the rows produced by zip(a, b, c, ...)
with the values of R prefixed. Formally, this performs a lateral cross join
of R with zip(a,b,c).

This operator replaces Zip, which was a non-standard relational operator in
the optimizer. Zip had to be joined with its input via an InnerJoinApply cross
join, which caused some difficulties in creating decorrelation patterns. As a
result, certain queries were not decorrelated that should have been fairly
easy to decorrelate.

As part of this commit, the decorrelation patterns have been updated to
allow decorrelation of generator functions inside EXISTS.

Fixes #31706

Release note (sql change): Many queries containing a correlated EXISTS
subquery with a generator function can now be decorrelated and executed
successfully. Previously, these queries caused a decorrelation error.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig

This comment has been minimized.

Copy link

craig bot commented Nov 13, 2018

Build succeeded

@craig craig bot merged commit d6a9221 into cockroachdb:master Nov 13, 2018

3 checks passed

GitHub CI (Cockroach) TeamCity build finished
Details
bors Build succeeded
Details
license/cla Contributor License Agreement is signed.
Details

rytaft added a commit to rytaft/cockroach that referenced this pull request Nov 26, 2018

opt: add missing regression test for SRF with correlated EXISTS
I added a regression test for cockroachdb#31706 in the PR that was backported to
the 2.1 release branch to decorrelate a Zip with a correlated EXISTS
subquery (cockroachdb#32026). I forgot to include the same test in cockroachdb#31922, which
fixed the issue in the master branch and introduced the ProjectSet
operator. This commit adds that missing test.

Informs cockroachdb#31706

Release note: None

craig bot pushed a commit that referenced this pull request Nov 27, 2018

Merge #32619
32619: opt: add missing regression test for SRF with correlated EXISTS r=rytaft a=rytaft

I added a regression test for #31706 in the PR that was backported to
the 2.1 release branch to decorrelate a `Zip` with a correlated `EXISTS`
subquery (#32026). I forgot to include the same test in #31922, which
fixed the issue in the master branch and introduced the `ProjectSet`
operator. This commit adds that missing test.

Informs #31706

Release note: None

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment