
[SPARK-24369][SQL] Correct handling for multiple distinct aggregations having the same argument set #21487

Closed
wants to merge 2 commits into from

Commits on Jun 3, 2018

  1. [SPARK-24369][SQL] Correct handling for multiple distinct aggregations having the same argument set
    
    ## What changes were proposed in this pull request?
    This PR fixes an issue that occurs when a query contains multiple distinct aggregations with the same argument set, e.g.,
    ```
    scala> :paste
    val df = sql(
      """SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
        | FROM (VALUES (1, 1), (2, 2), (2, 2)) t(x, y)
      """.stripMargin)
    df.show()

    java.lang.RuntimeException: You hit a query analyzer bug. Please report your query to Spark user mailing list.
    ```
    The root cause is that `RewriteDistinctAggregates` cannot detect multiple distinct aggregations when they share the same argument set (here, both `corr(DISTINCT x, y)` and `corr(DISTINCT y, x)` take the arguments {x, y}). This PR modifies `RewriteDistinctAggregates` so that it counts the number of aggregate expressions with `isDistinct=true`.
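    To make the root cause concrete, here is a minimal sketch in plain Scala with no Spark dependency. `AggCall`, `DistinctCheck`, `oldNeedsRewrite`, `newNeedsRewrite`, and `Repro` are hypothetical names for a simplified model, not Catalyst classes. The sketch assumes the old check grouped distinct aggregates by their argument sets and requested a rewrite only when there was more than one group; the new check counts aggregates with `isDistinct=true`, as the PR describes.

    ```scala
    // Simplified, hypothetical model of an aggregate call; this is NOT
    // Catalyst's AggregateExpression, just enough to illustrate the check.
    final case class AggCall(name: String, args: Seq[String], isDistinct: Boolean)

    object DistinctCheck {
      // Assumed pre-fix logic: group distinct aggregates by their argument *set*
      // and request a rewrite only if there is more than one group.
      // corr(DISTINCT x, y) and corr(DISTINCT y, x) both map to Set(x, y).
      def oldNeedsRewrite(aggs: Seq[AggCall]): Boolean =
        aggs.filter(_.isDistinct).groupBy(_.args.toSet).size > 1

      // Post-fix idea from the PR: count the aggregates with isDistinct = true,
      // independent of how their argument sets compare.
      def newNeedsRewrite(aggs: Seq[AggCall]): Boolean =
        aggs.count(_.isDistinct) > 1
    }

    object Repro {
      // The three aggregates from the query in the PR description.
      val aggs: Seq[AggCall] = Seq(
        AggCall("corr", Seq("x", "y"), isDistinct = true),
        AggCall("corr", Seq("y", "x"), isDistinct = true),
        AggCall("count", Seq("*"), isDistinct = false)
      )

      def main(args: Array[String]): Unit = {
        println(DistinctCheck.oldNeedsRewrite(aggs)) // false: one group, Set(x, y)
        println(DistinctCheck.newNeedsRewrite(aggs)) // true: two distinct aggregates
      }
    }
    ```

    With the repro's aggregates, the old check in this model sees a single group `Set(x, y)` and skips the rewrite, while the count-based check sees two distinct aggregates. The trade-off in the model is that the count-based check is conservative: it also returns true when two distinct aggregates have identical argument lists.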
    
    ## How was this patch tested?
    Added tests in `DataFrameAggregateSuite`.
    
    Author: Takeshi Yamamuro <yamamuro@apache.org>
    
    Closes apache#21443 from maropu/SPARK-24369.
    maropu authored and cloud-fan committed Jun 3, 2018
    6c4b295
  2. another fix

    cloud-fan committed Jun 3, 2018
    8386b42