[SPARK-17114][SQL] Fix aggregates grouped by literals with empty input #15101
hvanhovell wants to merge 3 commits into apache:master
@@ -1098,9 +1098,16 @@ object ReplaceExceptWithAntiJoin extends Rule[LogicalPlan] {
 */
object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan] {
is there an existing unit test suite for this? might be good to add a test case there too.
LGTM
Test build #65398 has finished for PR 15101 at commit
Test build #65406 has finished for PR 15101 at commit
// Do not rewrite the aggregate if we drop all grouping expressions, because this can
// change the return semantics when the input of the Aggregate is empty. See SPARK-17114
// for more information.
a
How about `a.copy(groupingExpressions = Seq(grouping.head))`? I think we can still remove some of the literal grouping expressions as long as we keep one of them.
Then it might be even better to replace it with something that is trivial to hash.
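The two options discussed in this thread can be sketched on a toy model. This is only an illustration under stated assumptions: `Expr`, `Lit`, `Col`, `Agg`, and both function names are hypothetical stand-ins rather than Catalyst classes, and the choice of `Lit(0)` as the cheap-to-hash key is an assumption, not something this thread specifies.

```scala
// Hypothetical stand-ins for Catalyst expressions and the Aggregate operator.
sealed trait Expr { def foldable: Boolean }
case class Lit(value: Any) extends Expr { val foldable = true }
case class Col(name: String) extends Expr { val foldable = false }
case class Agg(groupingExpressions: Seq[Expr])

object LiteralGroupingSketch {
  // Option A: drop literal keys, but leave the aggregate untouched
  // when that would remove every grouping expression.
  def keepWhenAllLiteral(a: Agg): Agg = {
    val nonLiteral = a.groupingExpressions.filterNot(_.foldable)
    if (nonLiteral.nonEmpty) a.copy(groupingExpressions = nonLiteral) else a
  }

  // Option B: when every key is a literal, keep the aggregate grouped
  // but collapse the keys to one literal that is trivial to hash.
  def collapseToOneLiteral(a: Agg): Agg = {
    val nonLiteral = a.groupingExpressions.filterNot(_.foldable)
    if (a.groupingExpressions.isEmpty) a // already ungrouped, nothing to do
    else if (nonLiteral.nonEmpty) a.copy(groupingExpressions = nonLiteral)
    else a.copy(groupingExpressions = Seq(Lit(0))) // assumed cheap key
  }
}
```

Both variants keep at least one grouping expression whenever the original aggregate had one, so a grouped aggregate never turns into an ungrouped one.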
Test build #65434 has finished for PR 15101 at commit
LGTM, pending jenkins.
retest this please
Test build #65435 has finished for PR 15101 at commit
(My bad on MiMa issue -- should be fixed in master, retesting ...)
Test build #3270 has finished for PR 15101 at commit
Merging to master/2.0. Thanks for the reviews.
## What changes were proposed in this pull request?

This PR fixes an issue with aggregates that have an empty input and use literals as their grouping keys. These aggregates are currently interpreted as aggregates **without** grouping keys, which triggers the ungrouped code path (which always returns a single row).

This PR fixes the `RemoveLiteralFromGroupExpressions` optimizer rule, which changes the semantics of the Aggregate by eliminating all literal grouping keys.

## How was this patch tested?

Added tests to `SQLQueryTestSuite`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #15101 from hvanhovell/SPARK-17114-3.

(cherry picked from commit d403562)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
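To make the semantic difference concrete, here is a minimal sketch on plain Scala collections rather than Spark. `EmptyInputAggregates`, `ungroupedCount`, and `groupedCount` are hypothetical names introduced only for illustration. The sketch models an ungrouped count as always producing one row and a grouped count as producing one row per distinct key.

```scala
// Toy model of aggregate semantics on plain collections; not Spark code.
object EmptyInputAggregates {
  // Ungrouped aggregate (like SELECT COUNT(*) FROM t): always exactly one row.
  def ungroupedCount[A](rows: Seq[A]): Seq[Long] = Seq(rows.size.toLong)

  // Grouped aggregate (like SELECT COUNT(*) FROM t GROUP BY key):
  // one row per distinct key, hence no rows at all for an empty input.
  def groupedCount[A, K](rows: Seq[A])(key: A => K): Seq[Long] =
    rows.groupBy(key).values.map(_.size.toLong).toSeq

  def main(args: Array[String]): Unit = {
    val empty = Seq.empty[Int]
    println(ungroupedCount(empty))       // a single row holding 0
    println(groupedCount(empty)(_ => 1)) // grouping by the literal 1: no rows
  }
}
```

Dropping the literal key turns the second call into the first, which is why the result changes from zero rows to one row when the input is empty.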