[SPARK-31663][SQL] Grouping sets with having clause returns the wrong result #28501

Closed
wants to merge 11 commits into from


xuanyuanking
Member

@xuanyuanking xuanyuanking commented May 11, 2020

What changes were proposed in this pull request?

  • Resolve the HAVING condition together with the expansion of the GROUPING SETS/CUBE/ROLLUP expressions in ResolveGroupingAnalytics:
    • Change the resolution direction of the operators to top-down.
    • Try resolving the filter condition as though it were in the aggregate clause, reusing the function in ResolveAggregateFunctions.
    • Push the aggregate expressions into the aggregate that contains the expanded operations.
  • Use UnresolvedHaving for all HAVING clauses.
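
To make the top-down point concrete, here is a minimal sketch on a toy plan model. It is not the code in this patch: the case classes below only mimic the shape of Catalyst nodes, and every name in it (`HavingSketch`, `conditionAggs`, `expand`, `resolveTopDown`, `resolveBottomUp`) is a hypothetical stand-in. It assumes the HAVING condition carries a list of the aggregate expressions it refers to, and shows that matching the parent first lets the HAVING node still see the unexpanded GROUPING SETS child.

```scala
// Toy stand-ins for logical plan nodes. These are NOT Spark's Catalyst classes;
// every name and field below is a simplification made up for illustration.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Expand(sets: Seq[Seq[String]], child: Plan) extends Plan
case class Aggregate(groupBy: Seq[String], aggs: Seq[String], child: Plan) extends Plan
case class GroupingSets(sets: Seq[Seq[String]], aggs: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
// `conditionAggs` lists the aggregate expressions the HAVING condition refers to.
case class UnresolvedHaving(condition: String, conditionAggs: Seq[String], child: Plan) extends Plan

object HavingSketch {
  // Expand GROUPING SETS into Aggregate-over-Expand, optionally adding the
  // aggregate expressions that a HAVING condition needs.
  def expand(g: GroupingSets, extraAggs: Seq[String] = Nil): Aggregate =
    Aggregate(g.sets.flatten.distinct, (g.aggs ++ extraAggs).distinct, Expand(g.sets, g.child))

  // Top-down: a node is matched before its children are rewritten, so the
  // HAVING node is resolved together with the GROUPING SETS expansion.
  def resolveTopDown(plan: Plan): Plan = plan match {
    case UnresolvedHaving(cond, condAggs, g: GroupingSets) =>
      Filter(cond, expand(g, condAggs))
    case g: GroupingSets => expand(g)
    case UnresolvedHaving(cond, condAggs, child) =>
      UnresolvedHaving(cond, condAggs, resolveTopDown(child))
    case Filter(cond, child) => Filter(cond, resolveTopDown(child))
    case other => other // leaves and already-expanded nodes are kept as-is in this toy
  }

  // Bottom-up, for contrast: the child is expanded first, so by the time the
  // HAVING node is visited the original grouping sets are no longer its child.
  def resolveBottomUp(plan: Plan): Plan = plan match {
    case UnresolvedHaving(cond, condAggs, child) =>
      UnresolvedHaving(cond, condAggs, resolveBottomUp(child))
    case g: GroupingSets => expand(g)
    case Filter(cond, child) => Filter(cond, resolveBottomUp(child))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical query shape: ... GROUP BY GROUPING SETS ((b), (a, b)) HAVING sum(a) > 1
    val query = UnresolvedHaving(
      "sum(a) > 1", Seq("sum(a)"),
      GroupingSets(Seq(Seq("b"), Seq("a", "b")), Seq("count(1)"), Relation("t")))
    println(resolveTopDown(query))  // Filter over the expanded Aggregate
    println(resolveBottomUp(query)) // HAVING left unresolved over the expanded Aggregate
  }
}
```

In the toy model, the bottom-up variant leaves an UnresolvedHaving node for a later rule that no longer sees the grouping sets, which is the kind of split resolution the bullets above aim to avoid.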

Why are the changes needed?

Correctness bug fix. See the demo and analysis in SPARK-31663.

Does this PR introduce any user-facing change?

Yes, correctness bug fix for HAVING with GROUPING SETS.

How was this patch tested?

New UTs added.

@SparkQA

SparkQA commented May 11, 2020

Test build #122510 has finished for PR 28501 at commit ced9568.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking xuanyuanking changed the title from "[WIP][SPARK-31663][SQL] Grouping sets with having clause returns the wrong result" to "[SPARK-31663][SQL] Grouping sets with having clause returns the wrong result" May 12, 2020
@SparkQA

SparkQA commented May 12, 2020

Test build #122547 has finished for PR 28501 at commit 6f618ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 12, 2020

Test build #122548 has finished for PR 28501 at commit b48eb75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedHaving(

@SparkQA

SparkQA commented May 12, 2020

Test build #122551 has finished for PR 28501 at commit 9d3dc8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 13, 2020

Test build #122572 has finished for PR 28501 at commit 47a3b33.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member Author

retest this please

@SparkQA

SparkQA commented May 13, 2020

Test build #122580 has finished for PR 28501 at commit 47a3b33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 14, 2020

Test build #122610 has finished for PR 28501 at commit 567ea7f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member Author

retest this please

case _ =>
Filter(predicate, plan)
}
UnresolvedHaving(predicate, plan)
Contributor

Does a global aggregate still work, e.g. UnresolvedHaving(Project(agg_func ...))?

Member Author

Yes, it still works: the UnresolvedHaving will be changed to a Filter in the rule ResolveAggregateFunctions.

@SparkQA

SparkQA commented May 14, 2020

Test build #122615 has finished for PR 28501 at commit 9b3dfb4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 14, 2020

Test build #122611 has finished for PR 28501 at commit 9b3dfb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor

holdenk commented May 14, 2020

I'm planning to cut RC2 for 2.4.6 tomorrow. Looking at the state of this PR, I'm assuming it won't make it into 2.4.6 and we can put it in 2.4.7. Does that sound OK to folks?

@xuanyuanking
Member Author

@holdenk Thanks for the heads-up, I'll address all the comments today. Yep, if it can't be merged before 2.4.6 is cut, let's put it in 2.4.7.

@cloud-fan
Contributor

It's a long-standing bug, so it doesn't block 2.4.6. I'll see if I can merge it today.

@SparkQA

SparkQA commented May 15, 2020

Test build #122648 has finished for PR 28501 at commit 3b48e38.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member Author

retest this please

@SparkQA

SparkQA commented May 15, 2020

Test build #5038 has started for PR 28501 at commit 1de0c75.

@SparkQA

SparkQA commented May 15, 2020

Test build #122657 has finished for PR 28501 at commit 3b48e38.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member Author

retest this please

@SparkQA

SparkQA commented May 15, 2020

Test build #122678 has finished for PR 28501 at commit 1de0c75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 15, 2020

Test build #122679 has finished for PR 28501 at commit 1de0c75.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.0/2.4!

@cloud-fan cloud-fan closed this in 86bd37f May 16, 2020
@dongjoon-hyun
Member

Thank you so much, @xuanyuanking and @cloud-fan .
cc @holdenk

cloud-fan pushed a commit to cloud-fan/spark that referenced this pull request May 16, 2020
… result

- Resolve the HAVING condition together with the expansion of the GROUPING SETS/CUBE/ROLLUP expressions in `ResolveGroupingAnalytics`:
    - Change the resolution direction of the operators to top-down.
    - Try resolving the filter condition as though it were in the aggregate clause, reusing the function in `ResolveAggregateFunctions`.
    - Push the aggregate expressions into the aggregate that contains the expanded operations.
- Use UnresolvedHaving for all HAVING clauses.

Correctness bug fix. See the demo and analysis in SPARK-31663.

Yes, correctness bug fix for HAVING with GROUPING SETS.

New UTs added.

Closes apache#28501 from xuanyuanking/SPARK-31663.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 86bd37f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@dongjoon-hyun
Member

Hi, @cloud-fan. This does not seem to be in 2.4 yet.

thanks, merging to master/3.0/2.4!

cloud-fan pushed a commit that referenced this pull request May 16, 2020
… result

- Resolve the HAVING condition together with the expansion of the GROUPING SETS/CUBE/ROLLUP expressions in `ResolveGroupingAnalytics`:
    - Change the resolution direction of the operators to top-down.
    - Try resolving the filter condition as though it were in the aggregate clause, reusing the function in `ResolveAggregateFunctions`.
    - Push the aggregate expressions into the aggregate that contains the expanded operations.
- Use UnresolvedHaving for all HAVING clauses.

Correctness bug fix. See the demo and analysis in SPARK-31663.

Yes, correctness bug fix for HAVING with GROUPING SETS.

New UTs added.

Closes #28501 from xuanyuanking/SPARK-31663.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 86bd37f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Contributor

Done. It had some conflicts, so I fixed them and ran the tests locally, which took some time.

@xuanyuanking xuanyuanking deleted the SPARK-31663 branch May 21, 2020 05:58