[SPARK-10165][SQL] Await child resolution in ResolveFunctions #8371

marmbrus · 2015-08-21T23:39:11Z

Currently, we eagerly attempt to resolve functions, even before their children are resolved. However, this is not valid in cases where we need to know the types of the input arguments (e.g., when resolving Hive UDFs).
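To make the problem concrete, here is a minimal Scala sketch of why a function whose implementation depends on its argument types cannot be resolved before its arguments are. It is a toy model, not Catalyst. `Expr`, `UnresolvedFn`, `ResolvedFn`, the `my_len` UDF with one overload per argument type, and the two rule functions are all hypothetical. The function rule leaves a call untouched while any argument still lacks a type. It picks an overload only after the reference rule has supplied those types.

```scala
// Hypothetical toy model of analyzer resolution; not Catalyst's real API.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType

sealed trait Expr { def dataType: Option[DataType] } // None = not yet resolved
case class UnresolvedAttr(name: String) extends Expr { val dataType: Option[DataType] = None }
case class Attr(name: String, tpe: DataType) extends Expr { val dataType: Option[DataType] = Some(tpe) }
case class UnresolvedFn(name: String, args: Seq[Expr]) extends Expr { val dataType: Option[DataType] = None }
case class ResolvedFn(impl: String, args: Seq[Expr], tpe: DataType) extends Expr {
  val dataType: Option[DataType] = Some(tpe)
}

object FunctionRegistry {
  // A hypothetical UDF "my_len" whose implementation depends on the argument type,
  // so it cannot be looked up before its argument has a type.
  private val overloads: Map[(String, Seq[DataType]), (String, DataType)] = Map(
    ("my_len", Seq(StringType)) -> ("lenOfString", IntType),
    ("my_len", Seq(IntType)) -> ("lenOfDigits", IntType)
  )

  def lookup(name: String, argTypes: Seq[DataType]): Option[(String, DataType)] =
    overloads.get((name, argTypes))
}

object Rules {
  // Resolve column references against the input schema.
  def resolveReferences(e: Expr, input: Map[String, DataType]): Expr = e match {
    case UnresolvedAttr(n) => input.get(n).map(t => Attr(n, t)).getOrElse(e)
    case UnresolvedFn(n, args) => UnresolvedFn(n, args.map(resolveReferences(_, input)))
    case ResolvedFn(i, args, t) => ResolvedFn(i, args.map(resolveReferences(_, input)), t)
    case a: Attr => a
  }

  // Resolve function calls, but only once every argument has a known type.
  def resolveFunctions(e: Expr): Expr = e match {
    case UnresolvedFn(n, args) =>
      val newArgs = args.map(resolveFunctions)
      if (!newArgs.forall(_.dataType.isDefined)) UnresolvedFn(n, newArgs) // await children
      else FunctionRegistry.lookup(n, newArgs.flatMap(_.dataType)) match {
        case Some((impl, t)) => ResolvedFn(impl, newArgs, t)
        case None => UnresolvedFn(n, newArgs) // no overload for these argument types
      }
    case other => other
  }
}

val call: Expr = UnresolvedFn("my_len", Seq(UnresolvedAttr("name")))
val firstPass = Rules.resolveFunctions(call) // unchanged: `name` has no type yet
val withRefs = Rules.resolveReferences(firstPass, Map("name" -> StringType))
val secondPass = Rules.resolveFunctions(withRefs) // now picks the StringType overload
```

In the sketch, the first `resolveFunctions` pass is a no-op because `name` has no type yet. After `resolveReferences` has run, the second pass can choose `lenOfString`. If the rules are re-run until nothing changes, the order in which they fire stops mattering.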

As a fix, this PR delays function resolution until the function's children are resolved. This change also necessitates a change to the way we resolve aggregate expressions that are not in aggregate operators (e.g., in `HAVING` or `ORDER BY` clauses). Specifically, we can no longer assume that these misplaced functions will already be resolved, which is what previously allowed us to differentiate aggregate functions from normal functions. To compensate for this change, we now attempt to resolve these unresolved expressions in the context of the aggregate operator before checking whether any aggregate expressions are present.
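The aggregate-handling change can be pictured with another toy sketch. It is again hypothetical and not Catalyst's API. A `HAVING` condition such as `sum(x) > 10` arrives with `sum` still an `UnresolvedFn`, so checking the raw condition for aggregate functions cannot tell whether `sum` is an aggregate. Resolving the condition against the aggregate's input columns first turns `sum` into an `AggFn`, and the check then succeeds. The set of aggregate function names and the classification logic are assumptions made for illustration.

```scala
// Hypothetical toy model of resolving a HAVING condition; not Catalyst's real API.
sealed trait Expr { def resolved: Boolean }
case class UnresolvedAttr(name: String) extends Expr { val resolved = false }
case class Attr(name: String) extends Expr { val resolved = true }
case class Lit(value: Int) extends Expr { val resolved = true }
case class UnresolvedFn(name: String, args: Seq[Expr]) extends Expr { val resolved = false }
case class AggFn(name: String, arg: Expr) extends Expr { val resolved = arg.resolved }
case class ScalarFn(name: String, args: Seq[Expr]) extends Expr { val resolved = args.forall(_.resolved) }
case class GreaterThan(left: Expr, right: Expr) extends Expr { val resolved = left.resolved && right.resolved }

object HavingSketch {
  // Assumed set of aggregate function names, for illustration only.
  val aggregateFunctions: Set[String] = Set("sum", "count", "max", "min")

  // Resolve an expression against the columns available to the aggregate operator.
  def resolve(e: Expr, input: Set[String]): Expr = e match {
    case UnresolvedAttr(n) if input.contains(n) => Attr(n)
    case UnresolvedFn(n, args) =>
      val newArgs = args.map(resolve(_, input))
      if (!newArgs.forall(_.resolved)) UnresolvedFn(n, newArgs) // await children
      else if (aggregateFunctions.contains(n) && newArgs.size == 1) AggFn(n, newArgs.head)
      else ScalarFn(n, newArgs)
    case GreaterThan(l, r) => GreaterThan(resolve(l, input), resolve(r, input))
    case other => other
  }

  // Only resolved calls can be classified; an UnresolvedFn never counts as an aggregate.
  def containsAggregate(e: Expr): Boolean = e match {
    case _: AggFn => true
    case ScalarFn(_, args) => args.exists(containsAggregate)
    case UnresolvedFn(_, args) => args.exists(containsAggregate)
    case GreaterThan(l, r) => containsAggregate(l) || containsAggregate(r)
    case _ => false
  }
}

// HAVING sum(x) > 10 over an aggregate whose input has columns k and x.
val aggregateInput = Set("k", "x")
val havingCondition: Expr = GreaterThan(UnresolvedFn("sum", Seq(UnresolvedAttr("x"))), Lit(10))

val checkedTooEarly = HavingSketch.containsAggregate(havingCondition) // false
val resolvedCondition = HavingSketch.resolve(havingCondition, aggregateInput)
val checkedAfterResolving = HavingSketch.containsAggregate(resolvedCondition) // true
```

The sketch stops at the classification step. How the real rule rewrites the plan once it has found the aggregate calls is not modeled here.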

SparkQA · 2015-08-22T00:05:49Z

Test build #41397 has finished for PR 8371 at commit 5b9b4ae.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-08-22T06:30:01Z

I think this breaks having group resolution ...

SparkQA · 2015-08-24T03:24:34Z

Test build #41432 has finished for PR 8371 at commit cff8b1d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-24T07:07:38Z

Test build #41439 has finished for PR 8371 at commit 827688e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2015-08-24T20:11:29Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
+            val evaluatedOrderings: Seq[SortOrder] = sortOrder.zip(resolvedAggregateOrdering).map {
+              case (order, evaluated) => order.copy(child = evaluated.toAttribute)
+            }
+            val aggExprsWithHaving: Seq[NamedExpression] =
```

Wrong variable name `aggExprsWithHaving`?

SparkQA · 2015-08-24T21:38:11Z

Test build #41468 has finished for PR 8371 at commit f00e1e9.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-24T22:37:56Z

Test build #41470 has finished for PR 8371 at commit a52d305.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-25T00:33:53Z

Test build #41483 has finished for PR 8371 at commit ab80fad.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Currently, we eagerly attempt to resolve functions, even before their children are resolved. However, this is not valid in cases where we need to know the types of the input arguments (e.g., when resolving Hive UDFs). As a fix, this PR delays function resolution until the function's children are resolved. This change also necessitates a change to the way we resolve aggregate expressions that are not in aggregate operators (e.g., in `HAVING` or `ORDER BY` clauses). Specifically, we can no longer assume that these misplaced functions will already be resolved, which is what previously allowed us to differentiate aggregate functions from normal functions. To compensate for this change, we now attempt to resolve these unresolved expressions in the context of the aggregate operator before checking whether any aggregate expressions are present.

Author: Michael Armbrust <michael@databricks.com>

Closes #8371 from marmbrus/hiveUDFResolution.

(cherry picked from commit 2bf338c)
Signed-off-by: Michael Armbrust <michael@databricks.com>

Before #8371, there was a bug with `Sort` on `Aggregate`: we could not use an aggregate expression named `_aggOrdering`, and we could not use more than one ordering expression containing aggregate functions. The cause was that the aggregate expression in a `SortOrder` never got resolved, so we aliased it with `_aggOrdering` and called `toAttribute`, which gave us an `UnresolvedAttribute`. As a result, we were actually referencing the aggregate expression by name, not by exprId as we thought. If there was already an aggregate expression named `_aggOrdering`, or if more than one ordering expression contained aggregate functions, the names conflicted and the lookup by name failed. However, after #8371 was merged, the `SortOrder`s are guaranteed to be resolved and we always reference aggregate expressions by exprId. The bug no longer exists, and this PR adds regression tests for it.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8231 from cloud-fan/sort-agg.
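The name-versus-exprId distinction behind #8231 can be illustrated with a toy Scala sketch. All names here (`ExprId`, `NamedOutput`, `ByName`, `ById`, `lookup`) are hypothetical and only loosely mirror Catalyst's idea of referring to an expression by a unique id. Two ordering expressions aliased as `_aggOrdering` collide when looked up by name, but not when looked up by id.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical toy model of the #8231 bug; not Catalyst's real API.
case class ExprId(id: Long)
case class NamedOutput(name: String, exprId: ExprId, expr: String)

sealed trait Ref
case class ByName(name: String) extends Ref
case class ById(exprId: ExprId) extends Ref

object SortOnAggregateSketch {
  private val counter = new AtomicLong(0)
  def newExprId(): ExprId = ExprId(counter.incrementAndGet())

  // A reference resolves only if exactly one output column matches it.
  def lookup(ref: Ref, output: Seq[NamedOutput]): Either[String, NamedOutput] = {
    val matches = ref match {
      case ByName(n) => output.filter(_.name == n)
      case ById(id) => output.filter(_.exprId == id)
    }
    matches match {
      case Seq(single) => Right(single)
      case Seq() => Left(s"cannot resolve $ref")
      case _ => Left(s"ambiguous reference $ref")
    }
  }
}

// Two ordering expressions with aggregate functions, both aliased as _aggOrdering.
val orderings: Seq[NamedOutput] = Seq("max(a)", "min(b)").map { e =>
  NamedOutput("_aggOrdering", SortOnAggregateSketch.newExprId(), e)
}
val aggregateOutput: Seq[NamedOutput] =
  NamedOutput("k", SortOnAggregateSketch.newExprId(), "k") +: orderings

// Old behavior: referencing by name is ambiguous.
val byName = orderings.map(o => SortOnAggregateSketch.lookup(ByName(o.name), aggregateOutput))
// Behavior after #8371: referencing by exprId finds each column.
val byId = orderings.map(o => SortOnAggregateSketch.lookup(ById(o.exprId), aggregateOutput))
```

Every `ByName` lookup fails with an ambiguity error, while each `ById` lookup finds exactly one column. This matches the failure mode described above.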

marmbrus added 4 commits August 23, 2015 22:05

[SPARK-10165][SQL] Await child resolution in ResolveFunctions

5e93654

handle sorts

ca76997

remove duplicate rules

e5ea534

fix fallback

827688e

marmbrus force-pushed the hiveUDFResolution branch from cff8b1d to 827688e August 24, 2015 05:05

marmbrus added 2 commits August 24, 2015 12:34

handle induced ambiguity

34bc2b6

better comment

f00e1e9

yhuai reviewed Aug 24, 2015

marmbrus added 3 commits August 24, 2015 13:14

naming

614eb67

naming

a52d305

Merge remote-tracking branch 'origin/master' into hiveUDFResolution

b7e90b5

don't push down unless in grouping expressions

ab80fad

asfgit closed this in 2bf338c Aug 25, 2015

cloud-fan mentioned this pull request Aug 25, 2015

[SPARK-10034][SQL] add regression test for Sort on Aggregate #8231

Closed
