New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: push limit into FK and self-joins in more cases #75582
Conversation
When calculating join multiplicity in the presence of a foreign key, it was previously possible for the incorrect base table to be selected from the right expression, in the case that the right expression contained a self-join. The base table selected was incorrect because it did not contain the equality column from the join filter. This prevented the optimizer from determining that these joins preserve all rows from the left side of the join, which could prevent further optimization of a query. Now, only the right equality columns are passed to `checkForeignKeyCase` which ensures that the correct base table is selected. This is safe because `verifyFiltersAreValidEqualities` has already checked that the right equality columns are unfiltered in the right expression, so the same checks in `checkForeignKeyCase` are redundant. Release note (performance improvement): The optimizer better optimizes queries that include both foreign key joins and self-joins.
b21ab05
to
f73b530
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/multiplicity_builder.go, line 415 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I think I'm missing a check that the leftCol and rightCol are unfiltered in the inputs of the Select expressions.
We're checking the right one (deriveUnfilteredCols
above). Do we need to check the left one? I don't think so, given that we're tolerating other filters on that side anyway.
pkg/sql/opt/memo/multiplicity_builder.go, line 333 at r3 (raw file):
leftColID, rightColID = rightColID, leftColID } if !leftNotNullCols.Contains(leftColID) || (!rightUnfilteredCols.Contains(rightColID) &&
[nit] it would be easier to read if we split this into two different if
s (and the first one can go inside the one above).
pkg/sql/opt/memo/multiplicity_builder.go, line 362 at r3 (raw file):
} // rightHasSingleFilterThatMatchesLeft returns true if:
I'm trying to figure out what's the higher level primitive here conceptually. I believe we want to establish that the filters on the left side are strictly stronger than (i.e. imply) the filters on the right side (allowing equality column remapping), no? So eg if we had leftCol < 10
and rightCol < 20
, that would still work?
pkg/sql/opt/memo/multiplicity_builder.go, line 367 at r3 (raw file):
// 2. right has a single filter on rightCol that matches a filter in left on // leftCol. // 3. rightCol is unfiltered in right's input.
[nit] this doesn't make it clear that the filter must be of = const
form currently
pkg/sql/opt/memo/multiplicity_builder.go, line 404 at r3 (raw file):
if foundFilterWithCol { // Ensure that this is the only filter on col. If there are // others, we can't be sure that col is held to this constant.
Why can't we be sure? If it's a top-level filter, it must be true for any rows produced. Or from another angle, why would it be ok for the left side to have arbitrary filters on other columns but not other filters on this column? (I realize this is kind of moot currently but maybe we'll extend the code in the future).
f73b530
to
2c873eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/multiplicity_builder.go, line 415 at r2 (raw file):
Previously, RaduBerinde wrote…
We're checking the right one (
deriveUnfilteredCols
above). Do we need to check the left one? I don't think so, given that we're tolerating other filters on that side anyway.
Correct. This comment was from before I had the check on right. While adding it I realized we don't need the same check for the left.
pkg/sql/opt/memo/multiplicity_builder.go, line 333 at r3 (raw file):
Previously, RaduBerinde wrote…
[nit] it would be easier to read if we split this into two different
if
s (and the first one can go inside the one above).
Done.
pkg/sql/opt/memo/multiplicity_builder.go, line 362 at r3 (raw file):
Previously, RaduBerinde wrote…
I'm trying to figure out what's the higher level primitive here conceptually. I believe we want to establish that the filters on the left side are strictly stronger than (i.e. imply) the filters on the right side (allowing equality column remapping), no? So eg if we had
leftCol < 10
andrightCol < 20
, that would still work?
I think "implication" is spot-on. Might be a great opportunity to reuse the partial index Implicator. I've extended the TODO to mention this.
pkg/sql/opt/memo/multiplicity_builder.go, line 367 at r3 (raw file):
Previously, RaduBerinde wrote…
[nit] this doesn't make it clear that the filter must be of
= const
form currently
Done.
pkg/sql/opt/memo/multiplicity_builder.go, line 404 at r3 (raw file):
Previously, RaduBerinde wrote…
Why can't we be sure? If it's a top-level filter, it must be true for any rows produced. Or from another angle, why would it be ok for the left side to have arbitrary filters on other columns but not other filters on this column? (I realize this is kind of moot currently but maybe we'll extend the code in the future).
Good point. I orginally added this before I had the check above to ensure the right filters have length 1. This was to prevent returning true when the right filters were something like (FiltersExpr [(rightCol=Const) (otherCol=Const OR rightCol > Const)])
. Now that only the left filters can have a length > 1, this isn't needed. I've removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/multiplicity_builder.go, line 297 at r4 (raw file):
// 3. All equalities contain a column from the left not-null columns. // 4. All equalities contain a column either in the right unfiltered columns // or a column that has an identical filter as its corresponding left
[nit] in the latter case there are also the conditions of having just one filter on the right side, and it being unfiltered inside that Select.
We could say it like this:
- All equalities x=y (or y=x) have x as a left non-null column and y as a right column, and either:
3a. y is an unfiltered column in the right expression, or
3b. both the left and right expressions are Selects; the right side filters imply the filters on the left side when replacing y with x; and y is an unfiltered column in the right Select's input.
I think this captures the conceptual requirement in an understandable way, even if our implementation covers only specific cases of "imply" (we can note that separately).
By the way, whenever I think of "replacing" a column with another as in the description above, I try to think whether there are any possible problems with composite columns. I don't think we have a problem here because equality is by definition composite-insensitive, but wanted to run it by you just in case.
pkg/sql/opt/memo/multiplicity_builder.go, line 374 at r4 (raw file):
// 4. The right Select has a single filter in the form rightCol=const where // the const value is the same as the const value in (2). //
[nit] this can also point to the comment for verifyFiltersAreValidEqualities for more info
2c873eb
to
058e04f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/multiplicity_builder.go, line 297 at r4 (raw file):
Updated. With the change to your suggestion that the left filters must imply the right filters, not the inverse. For example:
FK JOIN ON x = y
left filters: x > 0
right filters: y > 10 (imply the left filters)
Not valid: x in [0,10] will not match rows on the right
FK JOIN ON x = y
left filters: x > 10 (imply the right filters)
right filters: y > 0
Valid: all values of x will have a matching y on the right
I try to think whether there are any possible problems with composite columns.
I added a warning to the TODO about composite columns.
pkg/sql/opt/memo/multiplicity_builder.go, line 374 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] this can also point to the comment for verifyFiltersAreValidEqualities for more info
Done.
pkg/sql/opt/memo/multiplicity_builder.go, line 297 at r4 (raw file): Previously, mgartner (Marcus Gartner) wrote…
Minor typo in that first example. It should read "Not valid: x in (0, 10] will not match rows on the right." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/multiplicity_builder.go, line 343 at r5 (raw file):
} if !rightUnfilteredCols.Contains(rightColID) &&
[nit] Might be easier to follow this way:
switch {
case rightUnfilteredCols.Contains(rightColID):
// Condition 3a: the right column is unfiltered.
case rightHasSingleFilterThatMatchesLeft(left, right, leftColID, rightColID):
// Condition 3b: ...
default:
return ..
}
pkg/sql/opt/memo/multiplicity_builder.go, line 345 at r5 (raw file):
if !rightUnfilteredCols.Contains(rightColID) && !rightHasSingleFilterThatMatchesLeft(left, right, leftColID, rightColID) { // Condition #3a and #3b: The right column is filtered, and the left
[nit] unfiltered
Previously, a constant equality condition pushed into both sides of a foreign key join or self-join would prevent a limit from being pushed into the left side of the join. This was because the multiplicity builder could not determine that right filter would not remove any values also removed by the left filter. Without the join being labelled as left-preserving, the limit could not be pushed down. The multiplicity builder has been updated to recognize a few additional cases where left rows are preserved in foreign key joins and self-joins, allowing a limit to be pushed into the left side of the join. Currently, the multiplicity builder only recognizes cases where corresponding left and right columns are held equal to the same constant value. It is possible to extend this to more complex inequalities and boolean expressions, but this is left as a TODO for now. Fixes cockroachdb#74419 Release note (performance improvement): A LIMIT can now be pushed below a foreign key join or self-join in more cases, which may result in more efficient query plans.
058e04f
to
40390d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/multiplicity_builder.go, line 343 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] Might be easier to follow this way:
switch { case rightUnfilteredCols.Contains(rightColID): // Condition 3a: the right column is unfiltered. case rightHasSingleFilterThatMatchesLeft(left, right, leftColID, rightColID): // Condition 3b: ... default: return .. }
Good idea. Simpler to describe the positive cases in comments.
Thanks for all the great feedback - the complicated logic demands clear comments. bors r+ |
Build succeeded: |
opt: more accurate join multiplicity with FKs and self-joins
When calculating join multiplicity in the presence of a foreign key, it
was previously possible for the incorrect base table to be selected from
the right expression, in the case that the right expression contained a
self-join. The base table selected was incorrect because it did not
contain the equality column from the join filter. This prevented the
optimizer from determining that these joins preserve all rows from the
left side of the join, which could prevent further optimization of a
query.
Now, only the right equality columns are passed to
checkForeignKeyCase
which ensures that the correct base table is selected. This is safe
because
verifyFiltersAreValidEqualities
has already checked that theright equality columns are unfiltered in the right expression, so the
same checks in
checkForeignKeyCase
are redundant.Release note (performance improvement): The optimizer better optimizes
queries that include both foreign key joins and self-joins.
opt: push limit into FK and self-joins in more cases
Previously, a constant equality condition pushed into both sides of a
foreign key join or self-join would prevent a limit from being pushed
into the left side of the join. This was because the multiplicity
builder could not determine that right filter would not remove any
values also removed by the left filter. Without the join being labelled
as left-preserving, the limit could not be pushed down.
The multiplicity builder has been updated to recognize a few additional
cases where left rows are preserved in foreign key joins and self-joins,
allowing a limit to be pushed into the left side of the join. Currently,
the multiplicity builder only recognizes cases where corresponding left
and right columns are held equal to the same constant value. It is
possible to extend this to more complex inequalities and boolean
expressions, but this is left as a TODO for now.
Fixes #74419
Release note (performance improvement): A LIMIT can now be pushed below
a foreign key join or self-join in more cases, which may result in more
efficient query plans.