[SPARK-19994][SQL] Wrong outputOrdering for right/full outer smj #17331
Test build #74731 has finished for PR 17331 at commit
    }
  }

  test("EnsureRequirements for sort operator after left outer sort merge join") {
Are we just moving tests around?
I added three test cases for left/right/full outer join. For the other tests, I moved the sort-related tests together and extracted the common logic into a method and some private fields.
case RightOuter =>
  // For right outer join, values of the left key will be filled with nulls if it can't
  // match the value of the right key, so `nullOrdering` of the left key can't be guaranteed.
  // We should output right key order here.
This comment is misleading. The output ordering is mainly affected by how we implement SortMergeJoinExec.
// For left and right outer joins, the output is ordered by the streamed input's join keys.
OK, I'll use those comments from the original PR.
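To make the ordering argument in this thread concrete, here is a minimal sketch in plain Scala collections. It does not use Spark: `Joined`, `rightOuterJoin`, and `isSorted` are hypothetical names invented for illustration, and the toy join only mimics streaming the right side of a right outer join over inputs sorted by key.

```scala
object RightOuterOrderingSketch {
  // Hypothetical row model: the left key is None when no left row matched.
  final case class Joined(leftKey: Option[Int], rightKey: Int)

  // Toy right outer join: stream the sorted right keys and look up matches on the left.
  def rightOuterJoin(leftSorted: Seq[Int], rightSorted: Seq[Int]): Seq[Joined] =
    rightSorted.flatMap { r =>
      val matches = leftSorted.filter(_ == r)
      if (matches.isEmpty) Seq(Joined(None, r)) // unmatched right row: left key is null
      else matches.map(l => Joined(Some(l), r))
    }

  // Checks ascending order for nullable keys under the given null placement.
  def isSorted(keys: Seq[Option[Int]], nullsFirst: Boolean): Boolean =
    keys.zip(keys.drop(1)).forall {
      case (Some(a), Some(b)) => a <= b
      case (None, None)       => true
      case (None, Some(_))    => nullsFirst
      case (Some(_), None)    => !nullsFirst
    }

  // Left side has keys 1 and 3; right side has keys 1, 2 and 3.
  val output: Seq[Joined] = rightOuterJoin(Seq(1, 3), Seq(1, 2, 3))
}
```

In this sketch the right keys of `output` stay in ascending order, while the left keys contain a null between two non-null values, so they are sorted under neither nulls-first nor nulls-last placement. A full outer join would have the same problem on both sides, which is consistent with reporting no ordering for it.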
The bug was introduced when we merged https://github.com/apache/spark/pull/11743/files#diff-b669f8cf35f1d2d786582f4d8c49ed14
case FullOuter =>
  // Neither left key nor right key guarantees `nullOrdering` after full outer join.
  Nil
case _ =>
If possible, please use a whitelist of join types. Otherwise, we might forget to update this when adding new join types. The default case should then throw an exception.
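The whitelist suggestion could look roughly like the sketch below. This is a standalone illustration, not the code in `SortMergeJoinExec`: the `JoinType` hierarchy here is a simplified, hypothetical stand-in for Spark's, key orderings are modelled as plain strings, and `FutureJoinType` is an invented placeholder for a join type added later.

```scala
object OutputOrderingSketch {
  // Hypothetical, simplified stand-ins for Spark's join types.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftSemi extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType
  // Invented placeholder for a join type introduced after the match was written.
  case object FutureJoinType extends JoinType

  // Key orderings are modelled as column names purely for illustration.
  def outputOrdering(
      joinType: JoinType,
      leftKeys: Seq[String],
      rightKeys: Seq[String]): Seq[String] = joinType match {
    // For left and right outer joins, the output is ordered by the streamed input's join keys.
    case LeftOuter => leftKeys
    case RightOuter => rightKeys
    // Null rows can come from both inputs, so no ordering is reported.
    case FullOuter => Nil
    // Explicitly whitelisted join types that keep the left key order in this sketch.
    case Inner | LeftSemi => leftKeys
    // Anything not whitelisted fails loudly instead of silently reporting an ordering.
    case other =>
      throw new IllegalArgumentException(
        s"Sort merge join should not take $other as the join type")
  }
}
```

With an explicit whitelist, a newly added join type reaches the default case and raises an error, rather than silently inheriting the left key ordering.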
Test build #74809 has started for PR 17331 at commit
retest this please
Test build #74817 has finished for PR 17331 at commit
retest this please
Test build #74835 has finished for PR 17331 at commit
## What changes were proposed in this pull request?

For right outer join, values of the left key will be filled with nulls if it can't match the value of the right key, so `nullOrdering` of the left key can't be guaranteed. We should output right key order instead of left key order.

For full outer join, neither left key nor right key guarantees `nullOrdering`. We should not output any ordering.

In tests, besides adding three test cases for left/right/full outer sort merge join, this patch also reorganizes code in `PlannerSuite` by putting together tests for `Sort`, and also extracts common logic in Sort tests into a method.

## How was this patch tested?

Corresponding test cases are added.

Author: wangzhenhua <wangzhenhua@huawei.com>
Author: Zhenhua Wang <wzh_zju@163.com>

Closes #17331 from wzhfy/wrongOrdering.

(cherry picked from commit 965a5ab)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks, merging to master/2.1/2.0!
It seems this breaks the build in branch-2.0: https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.0-compile-maven-hadoop-2.7/lastBuild/console
@HyukjinKwon branch-2.0 doesn't have `InnerLike`; I'll fix this.
@wzhfy Thank you so much.
## What changes were proposed in this pull request?

For right outer join, values of the left key will be filled with nulls if it can't match the value of the right key, so `nullOrdering` of the left key can't be guaranteed. We should output right key order instead of left key order.

For full outer join, neither left key nor right key guarantees `nullOrdering`. We should not output any ordering.

In tests, besides adding three test cases for left/right/full outer sort merge join, this patch also reorganizes code in `PlannerSuite` by putting together tests for `Sort`, and also extracts common logic in Sort tests into a method.

## How was this patch tested?

Corresponding test cases are added.