Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: push limit into FK and self-joins in more cases #75582

Merged
merged 2 commits into from Jan 31, 2022

Conversation

mgartner
Copy link
Collaborator

opt: more accurate join multiplicity with FKs and self-joins

When calculating join multiplicity in the presence of a foreign key, it
was previously possible for the incorrect base table to be selected from
the right expression, in the case that the right expression contained a
self-join. The base table selected was incorrect because it did not
contain the equality column from the join filter. This prevented the
optimizer from determining that these joins preserve all rows from the
left side of the join, which could prevent further optimization of a
query.

Now, only the right equality columns are passed to checkForeignKeyCase
which ensures that the correct base table is selected. This is safe
because verifyFiltersAreValidEqualities has already checked that the
right equality columns are unfiltered in the right expression, so the
same checks in checkForeignKeyCase are redundant.

Release note (performance improvement): The optimizer better optimizes
queries that include both foreign key joins and self-joins.

opt: push limit into FK and self-joins in more cases

Previously, a constant equality condition pushed into both sides of a
foreign key join or self-join would prevent a limit from being pushed
into the left side of the join. This was because the multiplicity
builder could not determine that right filter would not remove any
values also removed by the left filter. Without the join being labelled
as left-preserving, the limit could not be pushed down.

The multiplicity builder has been updated to recognize a few additional
cases where left rows are preserved in foreign key joins and self-joins,
allowing a limit to be pushed into the left side of the join. Currently,
the multiplicity builder only recognizes cases where corresponding left
and right columns are held equal to the same constant value. It is
possible to extend this to more complex inequalities and boolean
expressions, but this is left as a TODO for now.

Fixes #74419

Release note (performance improvement): A LIMIT can now be pushed below
a foreign key join or self-join in more cases, which may result in more
efficient query plans.

When calculating join multiplicity in the presence of a foreign key, it
was previously possible for the incorrect base table to be selected from
the right expression, in the case that the right expression contained a
self-join. The base table selected was incorrect because it did not
contain the equality column from the join filter. This prevented the
optimizer from determining that these joins preserve all rows from the
left side of the join, which could prevent further optimization of a
query.

Now, only the right equality columns are passed to `checkForeignKeyCase`
which ensures that the correct base table is selected. This is safe
because `verifyFiltersAreValidEqualities` has already checked that the
right equality columns are unfiltered in the right expression, so the
same checks in `checkForeignKeyCase` are redundant.

Release note (performance improvement): The optimizer better optimizes
queries that include both foreign key joins and self-joins.
@mgartner mgartner requested review from rytaft, RaduBerinde and a team January 26, 2022 22:23
@mgartner mgartner requested a review from a team as a code owner January 26, 2022 22:23
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner mgartner force-pushed the 74419-multiplicity branch 2 times, most recently from b21ab05 to f73b530 Compare January 27, 2022 22:22
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/memo/multiplicity_builder.go, line 415 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I think I'm missing a check that the leftCol and rightCol are unfiltered in the inputs of the Select expressions.

We're checking the right one (deriveUnfilteredCols above). Do we need to check the left one? I don't think so, given that we're tolerating other filters on that side anyway.


pkg/sql/opt/memo/multiplicity_builder.go, line 333 at r3 (raw file):

			leftColID, rightColID = rightColID, leftColID
		}
		if !leftNotNullCols.Contains(leftColID) || (!rightUnfilteredCols.Contains(rightColID) &&

[nit] it would be easier to read if we split this into two different ifs (and the first one can go inside the one above).


pkg/sql/opt/memo/multiplicity_builder.go, line 362 at r3 (raw file):

}

// rightHasSingleFilterThatMatchesLeft returns true if:

I'm trying to figure out what's the higher level primitive here conceptually. I believe we want to establish that the filters on the left side are strictly stronger than (i.e. imply) the filters on the right side (allowing equality column remapping), no? So eg if we had leftCol < 10 and rightCol < 20, that would still work?


pkg/sql/opt/memo/multiplicity_builder.go, line 367 at r3 (raw file):

//   2. right has a single filter on rightCol that matches a filter in left on
//      leftCol.
//   3. rightCol is unfiltered in right's input.

[nit] this doesn't make it clear that the filter must be of = const form currently


pkg/sql/opt/memo/multiplicity_builder.go, line 404 at r3 (raw file):

			if foundFilterWithCol {
				// Ensure that this is the only filter on col. If there are
				// others, we can't be sure that col is held to this constant.

Why can't we be sure? If it's a top-level filter, it must be true for any rows produced. Or from another angle, why would it be ok for the left side to have arbitrary filters on other columns but not other filters on this column? (I realize this is kind of moot currently but maybe we'll extend the code in the future).

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/memo/multiplicity_builder.go, line 415 at r2 (raw file):

Previously, RaduBerinde wrote…

We're checking the right one (deriveUnfilteredCols above). Do we need to check the left one? I don't think so, given that we're tolerating other filters on that side anyway.

Correct. This comment was from before I had the check on right. While adding it I realized we don't need the same check for the left.


pkg/sql/opt/memo/multiplicity_builder.go, line 333 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] it would be easier to read if we split this into two different ifs (and the first one can go inside the one above).

Done.


pkg/sql/opt/memo/multiplicity_builder.go, line 362 at r3 (raw file):

Previously, RaduBerinde wrote…

I'm trying to figure out what's the higher level primitive here conceptually. I believe we want to establish that the filters on the left side are strictly stronger than (i.e. imply) the filters on the right side (allowing equality column remapping), no? So eg if we had leftCol < 10 and rightCol < 20, that would still work?

I think "implication" is spot-on. Might be a great opportunity to reuse the partial index Implicator. I've extended the TODO to mention this.


pkg/sql/opt/memo/multiplicity_builder.go, line 367 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] this doesn't make it clear that the filter must be of = const form currently

Done.


pkg/sql/opt/memo/multiplicity_builder.go, line 404 at r3 (raw file):

Previously, RaduBerinde wrote…

Why can't we be sure? If it's a top-level filter, it must be true for any rows produced. Or from another angle, why would it be ok for the left side to have arbitrary filters on other columns but not other filters on this column? (I realize this is kind of moot currently but maybe we'll extend the code in the future).

Good point. I orginally added this before I had the check above to ensure the right filters have length 1. This was to prevent returning true when the right filters were something like (FiltersExpr [(rightCol=Const) (otherCol=Const OR rightCol > Const)]). Now that only the left filters can have a length > 1, this isn't needed. I've removed it.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/memo/multiplicity_builder.go, line 297 at r4 (raw file):

//   3. All equalities contain a column from the left not-null columns.
//   4. All equalities contain a column either in the right unfiltered columns
//   	or a column that has an identical filter as its corresponding left

[nit] in the latter case there are also the conditions of having just one filter on the right side, and it being unfiltered inside that Select.

We could say it like this:

  1. All equalities x=y (or y=x) have x as a left non-null column and y as a right column, and either:
    3a. y is an unfiltered column in the right expression, or
    3b. both the left and right expressions are Selects; the right side filters imply the filters on the left side when replacing y with x; and y is an unfiltered column in the right Select's input.

I think this captures the conceptual requirement in an understandable way, even if our implementation covers only specific cases of "imply" (we can note that separately).

By the way, whenever I think of "replacing" a column with another as in the description above, I try to think whether there are any possible problems with composite columns. I don't think we have a problem here because equality is by definition composite-insensitive, but wanted to run it by you just in case.


pkg/sql/opt/memo/multiplicity_builder.go, line 374 at r4 (raw file):

//   4. The right Select has a single filter in the form rightCol=const where
//      the const value is the same as the const value in (2).
//

[nit] this can also point to the comment for verifyFiltersAreValidEqualities for more info

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/memo/multiplicity_builder.go, line 297 at r4 (raw file):
Updated. With the change to your suggestion that the left filters must imply the right filters, not the inverse. For example:

FK JOIN ON x = y
left filters:  x > 0
right filters: y > 10 (imply the left filters)
Not valid: x in [0,10] will not match rows on the right

FK JOIN ON x = y
left filters:  x > 10 (imply the right filters)
right filters: y > 0
Valid: all values of x will have a matching y on the right

I try to think whether there are any possible problems with composite columns.

I added a warning to the TODO about composite columns.


pkg/sql/opt/memo/multiplicity_builder.go, line 374 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] this can also point to the comment for verifyFiltersAreValidEqualities for more info

Done.

@mgartner
Copy link
Collaborator Author


pkg/sql/opt/memo/multiplicity_builder.go, line 297 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Updated. With the change to your suggestion that the left filters must imply the right filters, not the inverse. For example:

FK JOIN ON x = y
left filters:  x > 0
right filters: y > 10 (imply the left filters)
Not valid: x in [0,10] will not match rows on the right

FK JOIN ON x = y
left filters:  x > 10 (imply the right filters)
right filters: y > 0
Valid: all values of x will have a matching y on the right

I try to think whether there are any possible problems with composite columns.

I added a warning to the TODO about composite columns.

Minor typo in that first example. It should read "Not valid: x in (0, 10] will not match rows on the right."

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/memo/multiplicity_builder.go, line 343 at r5 (raw file):

		}

		if !rightUnfilteredCols.Contains(rightColID) &&

[nit] Might be easier to follow this way:

switch {
case rightUnfilteredCols.Contains(rightColID):
  // Condition 3a: the right column is unfiltered.
case rightHasSingleFilterThatMatchesLeft(left, right, leftColID, rightColID):
  // Condition 3b: ...
default:
  return ..
}

pkg/sql/opt/memo/multiplicity_builder.go, line 345 at r5 (raw file):

		if !rightUnfilteredCols.Contains(rightColID) &&
			!rightHasSingleFilterThatMatchesLeft(left, right, leftColID, rightColID) {
			// Condition #3a and #3b: The right column is filtered, and the left

[nit] unfiltered

Previously, a constant equality condition pushed into both sides of a
foreign key join or self-join would prevent a limit from being pushed
into the left side of the join. This was because the multiplicity
builder could not determine that right filter would not remove any
values also removed by the left filter. Without the join being labelled
as left-preserving, the limit could not be pushed down.

The multiplicity builder has been updated to recognize a few additional
cases where left rows are preserved in foreign key joins and self-joins,
allowing a limit to be pushed into the left side of the join. Currently,
the multiplicity builder only recognizes cases where corresponding left
and right columns are held equal to the same constant value. It is
possible to extend this to more complex inequalities and boolean
expressions, but this is left as a TODO for now.

Fixes cockroachdb#74419

Release note (performance improvement): A LIMIT can now be pushed below
a foreign key join or self-join in more cases, which may result in more
efficient query plans.
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/memo/multiplicity_builder.go, line 343 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] Might be easier to follow this way:

switch {
case rightUnfilteredCols.Contains(rightColID):
  // Condition 3a: the right column is unfiltered.
case rightHasSingleFilterThatMatchesLeft(left, right, leftColID, rightColID):
  // Condition 3b: ...
default:
  return ..
}

Good idea. Simpler to describe the positive cases in comments.

@mgartner
Copy link
Collaborator Author

Thanks for all the great feedback - the complicated logic demands clear comments.

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 31, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: presence of foreign key breaks limit pushdown through join
3 participants