[FOLLOW-UP][SPARK-26065][SQL] Revert hint behavior in join reordering #23524
Test build #101103 has finished for PR 23524 at commit
Retest this please.
Test build #101110 has finished for PR 23524 at commit
Retest this please.
Test build #101115 has finished for PR 23524 at commit
Retest this please.
Test build #101121 has finished for PR 23524 at commit
This is based on an offline discussion with Maryann. We should revert to the original behavior. In the future, we can improve it by making hints transparent to join reordering and adding new hint types for specifying join orders.
LGTM. Thanks! Merged to master.
+1 for this. I was reading PR #23036 and wondering if that's okay.
## What changes were proposed in this pull request?

This is to fix a bug in apache#23036 that would cause a join hint to be applied on node it is not supposed to after join reordering. For example,

```
val join = df.join(df, "id")
val broadcasted = join.hint("broadcast")
val join2 = join.join(broadcasted, "id").join(broadcasted, "id")
```

There should only be 2 broadcast hints on `join2`, but after join reordering there would be 4. It is because the hint application in join reordering compares the attribute set for testing relation equivalency. Moreover, it could still be problematic even if the child relations were used in testing relation equivalency, due to the potential exprId conflict in nested self-join.

As a result, this PR simply reverts the join reorder hint behavior change introduced in apache#23036, which means if a join hint is present, the join node itself will not participate in the join reordering, while the sub-joins within its children still can.

## How was this patch tested?

Added new tests

Closes apache#23524 from maryannxue/query-hint-followup-2.

Authored-by: maryannxue <maryannxue@apache.org>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
…hints

## What changes were proposed in this pull request?

This is a fix for apache#23524, which did not stop cost-based join reorder when the CostBasedJoinReorder rule recurses down the tree and applies join reorder for nested joins with hints. The issue had not been detected by the existing tests because CBO is disabled by default.

## How was this patch tested?

Enabled CBO for JoinHintSuite.

Closes apache#23759 from maryannxue/spark-26840.

Lead-authored-by: maryannxue <maryannxue@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?
This is to fix a bug in #23036 that would cause a join hint to be applied to a node it is not supposed to apply to after join reordering. For example, given `val join = df.join(df, "id")`, `val broadcasted = join.hint("broadcast")`, and `val join2 = join.join(broadcasted, "id").join(broadcasted, "id")`, there should be only 2 broadcast hints on `join2`, but after join reordering there would be 4. This is because the hint application in join reordering compares attribute sets to test relation equivalence. Moreover, it could still be problematic even if the child relations were used to test relation equivalence, due to the potential exprId conflict in nested self-joins.
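To illustrate why matching on attribute sets can over-apply hints, here is a minimal toy sketch. It is not Spark's Catalyst code: `Rel`, `AttrSetMatchingSketch`, and `reapplyHints` are hypothetical names, and the sketch simply assumes that all three join inputs expose the same attribute set. It shows only the mechanism and does not attempt to reproduce the exact count of 4 mentioned above.

```scala
// Toy model only: hypothetical types, not Spark's Catalyst classes.
final case class Rel(label: String, attrs: Set[String], hinted: Boolean = false)

object AttrSetMatchingSketch {
  // Record hints as the attribute sets of the hinted relations, then
  // re-apply a hint to every relation whose attribute set matches one.
  def reapplyHints(rels: List[Rel]): List[Rel] = {
    val hintedAttrSets = rels.filter(_.hinted).map(_.attrs).toSet
    rels.map(r => r.copy(hinted = hintedAttrSets.contains(r.attrs)))
  }

  // Assumption: the unhinted input and both hinted inputs share one attribute set.
  val ids: Set[String] = Set("id")
  val inputs: List[Rel] = List(
    Rel("join", ids),
    Rel("broadcasted-1", ids, hinted = true),
    Rel("broadcasted-2", ids, hinted = true)
  )
}
```

In this toy, two inputs start out hinted, but after `reapplyHints` the unhinted `join` input is marked as well, because nothing distinguishes it from the hinted inputs except the attribute set.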
As a result, this PR simply reverts the join reorder hint behavior change introduced in #23036, which means if a join hint is present, the join node itself will not participate in the join reordering, while the sub-joins within its children still can.
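The reverted behavior can be pictured with a small toy model, given here as a sketch under stated assumptions rather than the actual rule implementation. `Plan`, `Leaf`, `Join`, and `HintBarrierSketch` are hypothetical names, and the hint is reduced to a boolean flag on the join node. An unhinted join is flattened into the set of items that may be reordered together, while a hinted join stays opaque to its parent and its children are searched for reorderable sub-joins independently.

```scala
// Toy plan model: hypothetical types, not Spark's Catalyst classes.
sealed trait Plan
final case class Leaf(name: String) extends Plan
final case class Join(left: Plan, right: Plan, hinted: Boolean = false) extends Plan

object HintBarrierSketch {
  // Items a reorder pass may permute when it starts at `plan`.
  // An unhinted join is flattened into its inputs; a hinted join
  // (or a leaf) is kept as a single opaque item.
  def reorderItems(plan: Plan): List[Plan] = plan match {
    case Join(l, r, false) => reorderItems(l) ++ reorderItems(r)
    case other             => List(other)
  }

  // All reorder groups in the tree. A hinted join contributes no group of
  // its own, but the sub-joins within its children are still visited.
  def reorderGroups(plan: Plan): List[List[Plan]] = plan match {
    case Leaf(_)          => Nil
    case Join(l, r, true) => reorderGroups(l) ++ reorderGroups(r)
    case j: Join =>
      val items = reorderItems(j)
      // Items are leaves or hinted joins, so this recursion terminates.
      items :: items.flatMap(reorderGroups)
  }
}
```

For example, with `inner = Join(Join(Leaf("a"), Leaf("b")), Leaf("c"))`, `hinted = Join(inner, Leaf("d"), hinted = true)`, and `root = Join(Join(hinted, Leaf("e")), Leaf("f"))`, the model yields two groups: the outer one containing `hinted`, `e`, and `f`, and a separate inner one containing `a`, `b`, and `c`. The hinted join itself is never split.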
How was this patch tested?
Added new tests