[SPARK-17142][SQL] Complex query triggers binding error in HashAggregateExec #14917
|
Test build #64774 has finished for PR 14917 at commit
|
|
Cool! Thanks for picking this up. cc @JoshRosen Could you make the description a bit more readable by using markdown formatting (mostly adding 1 or 3 backticks here and there)? |
|
I will try to get to this one ASAP. |
|
@hvanhovell I've updated the description following your advice, thank you for your time! |
|
@hvanhovell Could you review this PR when you have some time please? Thank you! |
case other => other :: Nil
}

private def collectGroupingExpressions(plan: LogicalPlan): ExpressionSet = plan match {
Let's move this into `apply` (you already have the relevant comment there).
|
LGTM. Merging to master. This is a good find. Thanks! |
|
We should actually make sure that |
|
@jiangxb1987 could you also open a backport against 2.0? Thanks! |
|
NVM. |
…ateExec

In the `ReorderAssociativeOperator` rule, we extract foldable expressions from Add/Multiply arithmetic and replace them with the evaluated literal. For example, `(a + 1) + (b + 2)` is optimized to `(a + b + 3)` by this rule.

For an aggregate operator, output expressions should be derived from `groupingExpressions`; the current implementation of the `ReorderAssociativeOperator` rule may break this promise. An instance could be:

```
SELECT
  ((t1.a + 1) + (t2.a + 2)) AS out_col
FROM
  testdata2 AS t1
INNER JOIN
  testdata2 AS t2
ON
  (t1.a = t2.a)
GROUP BY (t1.a + 1), (t2.a + 2)
```

`((t1.a + 1) + (t2.a + 2))` is optimized to `(t1.a + t2.a + 3)`, which cannot be derived from `ExpressionSet((t1.a + 1), (t2.a + 2))`.

Maybe we should improve the `ReorderAssociativeOperator` rule by adding a `GroupingExpressionSet` that keeps `Aggregate.groupingExpressions`, and respect these expressions during the optimization stage.

Add new test case in `ReorderAssociativeOperatorSuite`.

Author: jiangxingbo <jiangxb1987@gmail.com>

Closes apache#14917 from jiangxb1987/rao.
|
@hvanhovell Thank you! This bug was introduced in Spark 2.1.0; Spark 2.0 does not have the problem, so we probably don't need to open a backport against 2.0. |
|
@hvanhovell Do you mean we should check other optimize rules to ensure that |
|
No, I am saying that it might be a good idea to incorporate this rule into |
|
Sure! Will do it soon!
…ateExec

## What changes were proposed in this pull request?

In the `ReorderAssociativeOperator` rule, we extract foldable expressions from Add/Multiply arithmetic and replace them with the evaluated literal. For example, `(a + 1) + (b + 2)` is optimized to `(a + b + 3)` by this rule.

For an aggregate operator, output expressions should be derived from `groupingExpressions`; the current implementation of the `ReorderAssociativeOperator` rule may break this promise. An instance could be:

```
SELECT
  ((t1.a + 1) + (t2.a + 2)) AS out_col
FROM
  testdata2 AS t1
INNER JOIN
  testdata2 AS t2
ON
  (t1.a = t2.a)
GROUP BY (t1.a + 1), (t2.a + 2)
```

`((t1.a + 1) + (t2.a + 2))` is optimized to `(t1.a + t2.a + 3)`, which cannot be derived from `ExpressionSet((t1.a + 1), (t2.a + 2))`.

Maybe we should improve the `ReorderAssociativeOperator` rule by adding a `GroupingExpressionSet` that keeps `Aggregate.groupingExpressions`, and respect these expressions during the optimization stage.

## How was this patch tested?

Add new test case in `ReorderAssociativeOperatorSuite`.

Author: jiangxingbo <jiangxb1987@gmail.com>

Closes apache#14917 from jiangxb1987/rao.
|
@hvanhovell In |
What changes were proposed in this pull request?
In the `ReorderAssociativeOperator` rule, we extract foldable expressions from Add/Multiply arithmetic and replace them with the evaluated literal. For example, `(a + 1) + (b + 2)` is optimized to `(a + b + 3)` by this rule.
For an aggregate operator, output expressions should be derived from `groupingExpressions`; the current implementation of the `ReorderAssociativeOperator` rule may break this promise. An instance could be: `SELECT ((t1.a + 1) + (t2.a + 2)) AS out_col FROM testdata2 AS t1 INNER JOIN testdata2 AS t2 ON (t1.a = t2.a) GROUP BY (t1.a + 1), (t2.a + 2)`. Here `((t1.a + 1) + (t2.a + 2))` is optimized to `(t1.a + t2.a + 3)`, which cannot be derived from `ExpressionSet((t1.a + 1), (t2.a + 2))`.
Maybe we should improve the `ReorderAssociativeOperator` rule by adding a `GroupingExpressionSet` that keeps `Aggregate.groupingExpressions`, and respect these expressions during the optimization stage.
How was this patch tested?
Add new test case in `ReorderAssociativeOperatorSuite`.
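To make the problem and the proposed fix concrete, here is a minimal sketch on a toy expression tree. It is not Spark's Catalyst code: `Expr`, `Attr`, `Lit`, `Add`, `flattenAdd`, and `reorder` are hypothetical names invented for this illustration, and only addition over integers is modeled. The idea is that, while flattening a chain of additions, any sub-expression that appears in the grouping set is treated as an opaque leaf, so its literal is never pulled out and folded.

```scala
// Toy model only: these types are illustrative and are NOT Spark's Catalyst classes.
object ReorderSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Flatten a chain of Adds into its operands. Any expression contained in
  // `grouping` is kept whole instead of being split into its operands.
  def flattenAdd(e: Expr, grouping: Set[Expr]): Seq[Expr] = e match {
    case g if grouping.contains(g) => Seq(g)
    case Add(l, r)                 => flattenAdd(l, grouping) ++ flattenAdd(r, grouping)
    case other                     => Seq(other)
  }

  // Fold the literal operands of an Add chain into a single literal,
  // leaving grouping expressions untouched.
  def reorder(e: Expr, grouping: Set[Expr] = Set.empty): Expr = e match {
    case g if grouping.contains(g) => g
    case add: Add =>
      val (lits, rest) = flattenAdd(add, grouping).partition(_.isInstanceOf[Lit])
      if (lits.size > 1) {
        val folded = Lit(lits.collect { case Lit(v) => v }.sum)
        if (rest.isEmpty) folded
        else Add(rest.reduceLeft[Expr]((l, r) => Add(l, r)), folded)
      } else add
    case other => other
  }
}
```

With an empty grouping set, `reorder` rewrites `(t1.a + 1) + (t2.a + 2)` into `(t1.a + t2.a) + 3`, which is the shape that can no longer be derived from the grouping expressions. Passing `Set(Add(Attr("t1.a"), Lit(1)), Add(Attr("t2.a"), Lit(2)))` as the grouping set leaves the expression unchanged.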