New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-2518][SQL] Fix foldability of Substring expression. #1432
Conversation
LGTM |
QA tests have started for PR 1432. This patch merges cleanly. |
LGTM |
QA results for PR 1432: |
Merging in master and branch-1.0. Thanks. |
This is a follow-up of #1428. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1432 from ueshin/issues/SPARK-2518 and squashes the following commits: 37d1ace [Takuya UESHIN] Fix foldability of Substring expression. (cherry picked from commit cc965ee) Signed-off-by: Reynold Xin <rxin@apache.org>
This is a follow-up of apache#1428. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes apache#1432 from ueshin/issues/SPARK-2518 and squashes the following commits: 37d1ace [Takuya UESHIN] Fix foldability of Substring expression.
…id unnecessary sort operations ### What changes were proposed in this pull request? This pull request tries to normalize the SortOrder properly to prevent unnecessary sort operators. Currently the sameOrderExpressions are not normalized as part of AliasAwareOutputOrdering. Example: consider this join of three tables: """ |SELECT t2id, t3.id as t3id |FROM ( | SELECT t1.id as t1id, t2.id as t2id | FROM t1, t2 | WHERE t1.id = t2.id |) t12, t3 |WHERE t1id = t3.id """. The plan for this looks like: *(8) Project [t2id#1059L, id#1004L AS t3id#1060L] +- *(8) SortMergeJoin [t2id#1059L], [id#1004L], Inner :- *(5) Sort [t2id#1059L ASC NULLS FIRST ], false, 0 <----------------------------- : +- *(5) Project [id#1000L AS t2id#1059L] : +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner : :- *(2) Sort [id#996L ASC NULLS FIRST ], false, 0 : : +- Exchange hashpartitioning(id#996L, 5), true, [id=#1426] : : +- *(1) Range (0, 10, step=1, splits=2) : +- *(4) Sort [id#1000L ASC NULLS FIRST ], false, 0 : +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1432] : +- *(3) Range (0, 20, step=1, splits=2) +- *(7) Sort [id#1004L ASC NULLS FIRST ], false, 0 +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1443] +- *(6) Range (0, 30, step=1, splits=2) In this plan, the marked sort node could have been avoided as the data is already sorted on "t2.id" by the lower SortMergeJoin. ### Why are the changes needed? To remove unneeded Sort operators. ### Does this PR introduce any user-facing change? No ### How was this patch tested? New UT added. Closes #30302 from prakharjain09/SPARK-33400-sortorder. Authored-by: Prakhar Jain <prakharjain09@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This is a follow-up of #1428.