New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql,opt: add hint for STRAIGHT join #116013
Conversation
It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser. Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
a94961f
to
d040bf3
Compare
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
5a134d4
to
c918bdb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, this looks pretty good! I have one test suggestion. I will defer the rest of the review to Michael, though, since he's thought about this feature more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kevinmingtarja and @michae2)
pkg/sql/opt/optbuilder/join.go
line 106 at r1 (raw file):
// is inherently part of the join's logic. panic(pgerror.Newf(pgcode.Syntax, "%s can only be used with INNER joins", tree.AstStraight,
It would be great to have some logic tests that exercise this failure mode (and validate that the hint only works with INNER
). Adding a subtest to pkg/sql/logictest/testdata/logic_test/inner-join
seems like a good home for this kind of test, and it can use a lot of what you have already int the optbuilder test. Something like:
subtest straight_join
statement ok
CREATE TABLE t1 (x INT, PRIMARY KEY (x))
statement ok
CREATE TABLE t2 (x INT, y INT, PRIMARY KEY (x), INDEX idx_y (y))
statement ok
[INSERT some values into t1 and t2 here]
query I
SELECT * FROM t1 INNER STRAIGHT JOIN t2 ON t1.x = t2.y
----
[some output from the join here]
query I
SELECT * FROM t2 INNER STRAIGHT JOIN t1 ON t1.x = t2.y
----
[some output from the join here]
statement error pgcode: 42601 pq: STRAIGHT can only be used with INNER joins
SELECT * FROM t1 LEFT OUTER STRAIGHT JOIN t2 ON t1.x = t2.y
statement error pgcode: 42601 pq: STRAIGHT can only be used with INNER joins
SELECT * FROM t1 FULL STRAIGHT JOIN t2 ON t1.x = t2.y
subtest end
Previously, rharding6373 (Rachael Harding) wrote…
Outer joins can be reordered in some cases. For example, here's a test case for join ordering (apologies if the output isn't straightforward to read) that shows two different orderings considered for a query with two adjacent left joins: cockroach/pkg/sql/opt/xform/testdata/rules/join_order Lines 1544 to 1609 in b0e509b
I would expect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kevinmingtarja, @michae2, and @rharding6373)
pkg/sql/opt/memo/expr.go
line 527 at r1 (raw file):
// DisallowHashJoinStoreRight, DisallowLookupJoinIntoRight, // DisallowInvertedJoinIntoRight, and DisallowMergeJoin. AllowAllJoinsIntoRight = disallowAll ^ DisallowHashJoinStoreRight ^ DisallowLookupJoinIntoRight ^ DisallowInvertedJoinIntoRight ^ DisallowMergeJoin
The current logic here:
cockroach/pkg/sql/opt/xform/join_order_builder.go
Lines 408 to 413 in f342485
if !flags.Empty() || jb.joinCount > int(jb.evalCtx.SessionData().ReorderJoinsLimit) { | |
// If the join has flags or the join limit has been reached, we can't | |
// reorder. Simply treat the join as a base relation. | |
jb.addBaseRelation(t) | |
break | |
} |
Will prevent reordering if there is any join flag present. So I suggest creating a seperate join flag under PreferLookupJoinIntoRight
called something like DisallowReordering
and using just that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rharding6373)
pkg/sql/opt/memo/expr.go
line 527 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The current logic here:
cockroach/pkg/sql/opt/xform/join_order_builder.go
Lines 408 to 413 in f342485
if !flags.Empty() || jb.joinCount > int(jb.evalCtx.SessionData().ReorderJoinsLimit) { // If the join has flags or the join limit has been reached, we can't // reorder. Simply treat the join as a base relation. jb.addBaseRelation(t) break } Will prevent reordering if there is any join flag present. So I suggest creating a seperate join flag under
PreferLookupJoinIntoRight
called something likeDisallowReordering
and using just that.
Got it, that will be cleaner.
pkg/sql/opt/optbuilder/join.go
line 106 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Outer joins can be reordered in some cases. For example, here's a test case for join ordering (apologies if the output isn't straightforward to read) that shows two different orderings considered for a query with two adjacent left joins:
cockroach/pkg/sql/opt/xform/testdata/rules/join_order
Lines 1544 to 1609 in b0e509b
# Left-asscom property for left joins. reorderjoins format=hide-all SELECT * FROM bx LEFT JOIN cy ON b = c LEFT JOIN dz ON x = z ---- -------------------------------------------------------------------------------- Join Tree #1 -------------------------------------------------------------------------------- left-join (hash) ├── scan bx ├── scan cy └── filters └── b = c Vertexes A: scan bx B: scan cy Edges b = c [left, ses=AB, tes=AB, rules=()] Joining AB A B [left, refs=AB] Joins Considered: 1 -------------------------------------------------------------------------------- Join Tree #2 -------------------------------------------------------------------------------- left-join (hash) ├── left-join (hash) │ ├── scan bx │ ├── scan cy │ └── filters │ └── b = c ├── scan dz └── filters └── x = z Vertexes A: scan bx B: scan cy C: scan dz Edges b = c [left, ses=AB, tes=AB, rules=()] x = z [left, ses=AC, tes=AC, rules=()] Joining AB A B [left, refs=AB] Joining AC A C [left, refs=AC] Joining ABC AC B [left, refs=AB] AB C [left, refs=AC] Joins Considered: 4 ================================================================================ Final Plan ================================================================================ left-join (hash) ├── left-join (merge) │ ├── scan bx │ ├── scan cy │ └── filters (true) ├── scan dz └── filters └── x = z I would expect
l LEFT STRAIGHT JOIN r
to lock in the ordering of joiningr
tol
.
Ah I missed that, thanks for pointing it out @mgartner!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rharding6373)
pkg/sql/opt/optbuilder/join.go
line 106 at r1 (raw file):
Previously, kevinmingtarja (Kevin Mingtarja) wrote…
Ah I missed that, thanks for pointing it out @mgartner!
Hi, I have a follow up question. Does this mean we should disable the CommuteRightJoin
and CommuteLeftJoin
rule as well to prevent the inputs from being swapped?
Which means in CommuteJoinFlags(), we should probably add an OR on p.Flags.Has(memo.DisallowReordering)
right (line 586)?
cockroach/pkg/sql/opt/norm/join_funcs.go
Lines 582 to 588 in b0e509b
// CommuteJoinFlags returns a join private for the commuted join (where the left | |
// and right sides are swapped). It adjusts any join flags that are specific to | |
// one side. | |
func (c *CustomFuncs) CommuteJoinFlags(p *memo.JoinPrivate) *memo.JoinPrivate { | |
if p.Flags.Empty() { | |
return p | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rharding6373)
pkg/sql/opt/optbuilder/join.go
line 106 at r1 (raw file):
Previously, kevinmingtarja (Kevin Mingtarja) wrote…
Hi, I have a follow up question. Does this mean we should disable the
CommuteRightJoin
andCommuteLeftJoin
rule as well to prevent the inputs from being swapped?Which means in CommuteJoinFlags(), we should probably add an OR on
p.Flags.Has(memo.DisallowReordering)
right (line 586)?
cockroach/pkg/sql/opt/norm/join_funcs.go
Lines 582 to 588 in b0e509b
// CommuteJoinFlags returns a join private for the commuted join (where the left // and right sides are swapped). It adjusts any join flags that are specific to // one side. func (c *CustomFuncs) CommuteJoinFlags(p *memo.JoinPrivate) *memo.JoinPrivate { if p.Flags.Empty() { return p }
My understanding isn't the clearest yet, but I believe if we don't disable CommuteRightJoin
and CommuteLeftJoin
, it is still possible for the optimizer to convert a LEFT JOIN b
to b RIGHT JOIN a
(even if we set the DisallowReordering
flag), and we don't want this right?
But if we set the join flag as disallowAll ^ DisallowHashJoinStoreRight ^ DisallowLookupJoinIntoRight ^ DisallowInvertedJoinIntoRight ^ DisallowMergeJoin
(which is what I did originally), I think this won't happen as we are explicitly disallowing the joins that processes the right table first.
So even if in the memo, the reordering from commute left/right is "enumerated", it does not get explored as the optimizer will see from the join flags that those joins that processes the right table first are disallowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 11 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kevinmingtarja, @michae2, and @rharding6373)
pkg/sql/opt/optbuilder/join.go
line 106 at r1 (raw file):
Previously, kevinmingtarja (Kevin Mingtarja) wrote…
My understanding isn't the clearest yet, but I believe if we don't disable
CommuteRightJoin
andCommuteLeftJoin
, it is still possible for the optimizer to converta LEFT JOIN b
tob RIGHT JOIN a
(even if we set theDisallowReordering
flag), and we don't want this right?But if we set the join flag as
disallowAll ^ DisallowHashJoinStoreRight ^ DisallowLookupJoinIntoRight ^ DisallowInvertedJoinIntoRight ^ DisallowMergeJoin
(which is what I did originally), I think this won't happen as we are explicitly disallowing the joins that processes the right table first.So even if in the memo, the reordering from commute left/right is "enumerated", it does not get explored as the optimizer will see from the join flags that those joins that processes the right table first are disallowed.
Great find! Thanks for bringing up CommuteRightJoin
and CommuteLeftJoin
. I think you're absolutely correct, your original implementation will force the optimizer to pick the non-commuted joins, even though those rules are still applied and the commuted joins are explored. Let's stick with your original.
It may be more efficient to not apply/explore those rules at all, but there could be some complications there, so let's not do that for now (specifically, CommuteRightJoin
is a normalization rule that makes it simpler for other rules to pattern match on joins—they only need to match left joins, not right joins too. Disabling CommuteRightJoin
could lead to unexpectedly poor plans when rules dependent on it aren't applied).
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 1 at r1 (raw file):
# LogicTest: local
This is a good integration test, but I think this also deserves some more extensive and "unit-y" optimizer tests. I see two main things to test:
- That join reordering is not explored with the
STRAIGHT
keyword. I think this is a good place to test that: https://github.com/cockroachdb/cockroach/blob/b0e509b273235abc88d28a2dc0385c6b47c59c1b/pkg/sql/opt/xform/testdata/rules/join_order - That a really high cost is given to commuted joins so that non-commuted joins are picked as the best plan by the optimizer. I think this is a good place to test that:
cockroach/pkg/sql/opt/xform/testdata/coster/join
Lines 100 to 104 in 5d626ba
# Verify that if we force lookup join but one isn't possible, the hash join has # huge cost (this will result in an error if we try to execbuild the result). opt SELECT k, x FROM a INNER LOOKUP JOIN b ON k=x ----
In both places it'd be great to test INNER, LEFT, and RIGHT joins.
If you have any questions about how these tests work, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @kevinmingtarja! This is looking great!
Reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kevinmingtarja, @mgartner, and @rharding6373)
pkg/sql/opt/optbuilder/join.go
line 106 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Great find! Thanks for bringing up
CommuteRightJoin
andCommuteLeftJoin
. I think you're absolutely correct, your original implementation will force the optimizer to pick the non-commuted joins, even though those rules are still applied and the commuted joins are explored. Let's stick with your original.It may be more efficient to not apply/explore those rules at all, but there could be some complications there, so let's not do that for now (specifically,
CommuteRightJoin
is a normalization rule that makes it simpler for other rules to pattern match on joins—they only need to match left joins, not right joins too. DisablingCommuteRightJoin
could lead to unexpectedly poor plans when rules dependent on it aren't applied).
Just chiming in to agree that it would be nice to also support l LEFT STRAIGHT JOIN r
if possible!
IMO we should allow InnerJoin
, LeftOuterJoin
, LeftSemiJoin
, LeftAntiJoin
, IntersectAllJoin
, and ExceptAllJoin
here (though I don't think we can produce plans with those latter 4 join types and the STRAIGHT
hint yet).
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 73 at r1 (raw file):
# thus enforcing t1 to be on the right side of the join. query T EXPLAIN (VERBOSE) SELECT * FROM t2 INNER STRAIGHT JOIN t1 ON t1.x = t2.y
It would be interesting to add another testcase which naturally produces a hash join into the left side and see that we get a hash join into the right side instead. Maybe something like:
SELECT * FROM t1 INNER JOIN t2 ON t1.x + 1 = t2.y * 2;
409ae40
to
365ea77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rharding6373)
pkg/sql/opt/optbuilder/join.go
line 106 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Just chiming in to agree that it would be nice to also support
l LEFT STRAIGHT JOIN r
if possible!IMO we should allow
InnerJoin
,LeftOuterJoin
,LeftSemiJoin
,LeftAntiJoin
,IntersectAllJoin
, andExceptAllJoin
here (though I don't think we can produce plans with those latter 4 join types and theSTRAIGHT
hint yet).
Got it! I have removed the restriction on the join type, so now l LEFT STRAIGHT JOIN r
and l RIGHT STRAIGHT JOIN r
is possible, plus I've added test cases for those as well.
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 1 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This is a good integration test, but I think this also deserves some more extensive and "unit-y" optimizer tests. I see two main things to test:
- That join reordering is not explored with the
STRAIGHT
keyword. I think this is a good place to test that: https://github.com/cockroachdb/cockroach/blob/b0e509b273235abc88d28a2dc0385c6b47c59c1b/pkg/sql/opt/xform/testdata/rules/join_order- That a really high cost is given to commuted joins so that non-commuted joins are picked as the best plan by the optimizer. I think this is a good place to test that:
cockroach/pkg/sql/opt/xform/testdata/coster/join
Lines 100 to 104 in 5d626ba
# Verify that if we force lookup join but one isn't possible, the hash join has # huge cost (this will result in an error if we try to execbuild the result). opt SELECT k, x FROM a INNER LOOKUP JOIN b ON k=x ---- In both places it'd be great to test INNER, LEFT, and RIGHT joins.
If you have any questions about how these tests work, let me know.
Thanks for the pointers @mgartner ! This was super helpful, I've added test cases for INNER, LEFT, and RIGHT joins for (1), and for LEFT and RIGHT joins for (2).
For (2), I used exploretrace
to output the transformation step that involves the commuted join with huge cost, since opt
and memo
only shows the best plan. Let me know if this works or if there's a better way to capture this in the test!
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 73 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
It would be interesting to add another testcase which naturally produces a hash join into the left side and see that we get a hash join into the right side instead. Maybe something like:
SELECT * FROM t1 INNER JOIN t2 ON t1.x + 1 = t2.y * 2;
That's a good idea. I've added a test for that by adding a new unindexed column t2.z
and use that as the join key, which will naturally produce a hash join with t1
as the "build" side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! Great job @kevinmingtarja. I just have a few last suggestions about the execbuilder test.
@mgartner do you have any comments?
Reviewed 10 of 10 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kevinmingtarja, @mgartner, and @rharding6373)
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 16 at r2 (raw file):
"created_at": "2018-01-01 1:00:00.00000+00:00", "row_count": 100, "distinct_count": 100
Let's stick a "null_count": 0
in here, too, just to be sure.
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 28 at r2 (raw file):
"distinct_count": 10000 } ]'
Using "columns": ["y", "z"]
actually creates a single multi-column stat rather than the more usual two single-column stats. Let's revise this a little to something like:
statement ok
ALTER TABLE t2 INJECT STATISTICS '[
{
"columns": ["x"],
"created_at": "2018-01-01 1:00:00.00000+00:00",
"row_count": 10000,
"distinct_count": 10000,
"null_count": 0
},
{
"columns": ["y"],
"created_at": "2018-01-01 1:00:00.00000+00:00",
"row_count": 10000,
"distinct_count": 10000,
"null_count": 0
},
{
"columns": ["z"],
"created_at": "2018-01-01 1:00:00.00000+00:00",
"row_count": 10000,
"distinct_count": 10000,
"null_count": 0
}
]'
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 218 at r2 (raw file):
# Now, the best plan should no longer be picked, as we're forcing the join order. query T EXPLAIN (VERBOSE) SELECT * FROM t1 LEFT STRAIGHT JOIN t2 ON t1.x = t2.z
It would be nice to also confirm that EXPLAIN (VERBOSE) SELECT * FROM t1 RIGHT STRAIGHT JOIN t2 ON t1.x = t2.z
also produces the same plan except with a right outer hash join on top.
365ea77
to
9291950
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and suggestions @michae2 ! I've just pushed the fixes.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @michae2, and @rharding6373)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 11 files at r1, 9 of 10 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kevinmingtarja and @rharding6373)
pkg/sql/parser/sql.y
line 13795 at r3 (raw file):
// - STRAIGHT forces the join order, but not the algorithm; in other words, the // left table is always read before the right table. `STRAIGHT` can only // be used with INNER joins (though this is not enforced by the syntax).
The part about only being used with INNER joins is no longer correct.
nit: "in other words, the left table is always read before the right table" is not technically correct. I would just remove that part to avoid confusion.
pkg/sql/opt/exec/execbuilder/testdata/straight_join
line 1 at r1 (raw file):
For (2), I used
exploretrace
to output the transformation step that involves the commuted join with huge cost, sinceopt
andmemo
only shows the best plan. Let me know if this works or if there's a better way to capture this in the test!
Works for me. Nice job finding and making use of the exploretrace
test command!
pkg/sql/opt/xform/testdata/rules/join_order
line 3542 at r3 (raw file):
├── scan straight_join2 └── filters └── straight_join1.x = y
I think it'd be good to have some tests with 2 joins where both or only one of the joins has a STRAIGHT
hint.
Our current join hints (a INNER HASH JOIN b, a LEFT LOOKUP JOIN b, etc) fixes both the join order and the join algorithm. This commit adds the syntax and support for hinting the join order without hinting the join algorithm. This will be useful for the (few) cases where the optimizer processes the tables in a suboptimal order. Resolves: cockroachdb#115308 Release note (sql change): It is now possible to hint to the optimizer that it should plan a straight join by using the syntax `... INNER STRAIGHT JOIN ...`. If the hint is provided, the optimizer will now fix the join order as given in the query, even if it estimates that a different plan using join reordering would have a lower cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rharding6373)
pkg/sql/parser/sql.y
line 13795 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The part about only being used with INNER joins is no longer correct.
nit: "in other words, the left table is always read before the right table" is not technically correct. I would just remove that part to avoid confusion.
Done.
pkg/sql/opt/xform/testdata/rules/join_order
line 3542 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I think it'd be good to have some tests with 2 joins where both or only one of the joins has a
STRAIGHT
hint.
Done.
9291950
to
9c788cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Thanks again for contributing!
bors r+
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @kevinmingtarja and @rharding6373)
Build failed (retrying...): |
Build succeeded: |
Thanks for all the help and review folks! |
Our current join hints (a INNER HASH JOIN b, a LEFT LOOKUP JOIN b, etc) fixes both the join order and the join algorithm. This commit adds the syntax and support for hinting the join order without hinting the join algorithm. This will be useful for the (few) cases where the optimizer processes the tables in a suboptimal order.
Resolves: #115308
Release note (sql change): It is now possible to hint to the optimizer that it should plan a straight join by using the syntax
... INNER STRAIGHT JOIN ...
. If the hint is provided, the optimizer will now fix the join order as given in the query, even if it estimates that a different plan using join reordering would have a lower cost.