[SPARK-37802][SQL] Composite field name should work with Aggregate push down #35108

huaxingao · 2022-01-05T19:31:51Z

What changes were proposed in this pull request?

Currently, a composite field name such as `dept id` doesn't work with aggregate push down:

sql("SELECT COUNT(`dept id`) FROM h2.test.dept")

org.apache.spark.sql.catalyst.parser.ParseException: 
extraneous input 'id' expecting <EOF>(line 1, pos 5)

== SQL ==
dept id
-----^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:271)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:132)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:63)
	at org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:39)
	at org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:365)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.translateAggregate(DataSourceStrategy.scala:717)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.$anonfun$pushAggregates$1(PushDownUtils.scala:125)
	at scala.collection.immutable.List.flatMap(List.scala:366)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushAggregates(PushDownUtils.scala:125)

This PR fixes the problem.

Why are the changes needed?

bug fixing

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test

huaxingao · 2022-01-05T23:14:16Z

@cloud-fan Could you please take a look? Thanks!

Dintion · 2022-01-06T02:56:57Z

A Chinese field name has the same problem:

== SQL ==
缺陷编号
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:265)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:126)

huaxingao · 2022-01-06T04:31:48Z

Should be good now @Dintion

cloud-fan · 2022-01-06T05:07:05Z

sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala

@@ -349,7 +349,7 @@ private[sql] final case class FieldReference(parts: Seq[String]) extends NamedRe

 private[sql] object FieldReference {
  def apply(column: String): NamedReference = {
-    LogicalExpressions.parseReference(column)
+    LogicalExpressions.parseReference("`" + column + "`")


I think we need to fix the caller side. We shouldn't call FieldReference.apply(String) which parses the given string. We should call FieldReference(Seq(col_name)).
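
To make the distinction between the two construction paths concrete, here is a minimal self-contained sketch. `ToyFieldReference` and its `parse` method are hypothetical stand-ins, not Spark's `FieldReference` or its ANTLR-based parser. The toy parser only splits on dots, strips backticks around a part, and rejects whitespace in unquoted parts (it does not handle dots inside quotes), which is just enough to mimic the failure for `dept id`.

```scala
// Hypothetical stand-in for a field reference; not Spark's FieldReference.
final case class ToyFieldReference(parts: Seq[String])

object ToyFieldReference {
  // Plays the role of the parsing path, FieldReference.apply(String).
  // Assumption: a deliberately simplified multipart-identifier parser.
  def parse(column: String): Either[String, ToyFieldReference] = {
    val parsed: Seq[Either[String, String]] =
      column.split("\\.", -1).toSeq.map { p =>
        if (p.length >= 2 && p.startsWith("`") && p.endsWith("`")) {
          // Quoted part: strip the surrounding backticks.
          Right(p.substring(1, p.length - 1))
        } else if (p.isEmpty || p.exists(_.isWhitespace)) {
          // Unquoted part with whitespace (or empty): reject, like `dept id`.
          Left(s"cannot parse '$p' in '$column'")
        } else {
          Right(p)
        }
      }
    parsed.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None => Right(ToyFieldReference(parsed.collect { case Right(v) => v }))
    }
  }
}
```

In this sketch `ToyFieldReference.parse("dept id")` yields a `Left`, while `ToyFieldReference(Seq("dept id"))` never touches the parser, so the name is taken verbatim as a single part.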

cloud-fan · 2022-01-06T07:34:08Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala

@@ -706,21 +706,21 @@ object DataSourceStrategy
    if (agg.filter.isEmpty) {
      agg.aggregateFunction match {
        case aggregate.Min(PushableColumnWithoutNestedColumn(name)) =>
-          Some(new Min(FieldReference(name)))
+          Some(new Min(FieldReference(s"`$name`")))


We know it's a top-level column, so parsing it again is wasted work. The column name may also contain a backtick, which we would then need to escape.

A simpler solution is to skip the parsing: FieldReference(Seq(name)). We can even create a util method for it: FieldReference.column(name)
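
The escaping concern can be sketched in a few lines. `naiveQuote` mirrors the string interpolation used in the diff above, and `quote` is a hypothetical helper that assumes the common SQL convention of doubling embedded backticks. Neither is taken from the PR.

```scala
object QuotingSketch {
  // Mirrors s"`$name`" from the diff: wraps the name without escaping.
  def naiveQuote(name: String): String = s"`$name`"

  // Hypothetical helper: double any embedded backtick before wrapping,
  // so a quoting-aware parser would not see the identifier end early.
  def quote(name: String): String = "`" + name.replace("`", "``") + "`"
}
```

For a name such as ``a`b``, `naiveQuote` leaves the embedded backtick bare inside the quotes, whereas `quote` doubles it. Skipping the parse altogether, as suggested, removes the need for either helper.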

beliefer · 2022-01-06T08:30:18Z

@huaxingao Could you wait until #35101 is merged and then update it to use FieldReference.column(name)?

Dintion · 2022-01-06T08:50:38Z

@huaxingao I think the code at org.apache.spark.sql.execution.datasources.v2.PushDownUtils#pushAggregates#columnAsString has the same problem:

    def columnAsString(e: Expression): Option[FieldReference] = e match {
      case PushableColumnWithoutNestedColumn(name) =>
        Some(FieldReference(name).asInstanceOf[FieldReference])
      case _ => None
    }
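
As a rough illustration of how the same helper could avoid the parse here, the sketch below uses the non-parsing constructor discussed earlier in the thread. All types (`Expr`, `Column`, `Literal`, `FieldRef`) are hypothetical stand-ins for Spark's classes, and the actual change in the PR may differ.

```scala
object PushDownSketch {
  // Hypothetical stand-ins for Catalyst expressions; not Spark classes.
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Literal(value: Any) extends Expr

  // Hypothetical stand-in for FieldReference.
  final case class FieldRef(parts: Seq[String])

  object FieldRef {
    // Non-parsing constructor for a top-level column name.
    def column(name: String): FieldRef = FieldRef(Seq(name))
  }

  // Same shape as columnAsString above, but the name is never re-parsed.
  def columnAsString(e: Expr): Option[FieldRef] = e match {
    case Column(name) => Some(FieldRef.column(name))
    case _ => None
  }
}
```

With this shape, a column named `dept id` maps to a single-part reference, and non-column expressions still yield `None`.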

beliefer · 2022-01-07T01:16:04Z

sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala

@@ -351,6 +351,10 @@ private[sql] object FieldReference {
  def apply(column: String): NamedReference = {
    LogicalExpressions.parseReference(column)
  }
+
+  def column(name: String) : NamedReference = {
+    FieldReference(Seq(name))


Thank you for your work

cloud-fan · 2022-01-07T03:37:25Z

thanks, merging to master!

huaxingao · 2022-01-07T05:47:45Z

Thank you all!

Closes apache#35108 from huaxingao/composite_name.

Authored-by: Huaxin Gao <huaxin_gao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

github-actions bot added the SQL label Jan 5, 2022

huaxingao force-pushed the composite_name branch from b58c093 to 01ae878 Compare January 5, 2022 19:38

cloud-fan reviewed Jan 6, 2022

huaxingao mentioned this pull request Jan 6, 2022

[SPARK-37527][SQL] Translate more standard aggregate functions for pushdown #35101

Closed

cloud-fan reviewed Jan 6, 2022

cloud-fan approved these changes Jan 6, 2022

huaxingao added 6 commits January 6, 2022 14:35

[SPARK-37802][SQL] Composite field name should work with Aggregate push down

fcf9110

rebase

722a518

fix parsing problem for non-ascii

288250b

address comments

8fa9ee6

add FieldReference.column(name)

80fe97a

FieldReference(name) => FieldReference.column(name) in a few more places

c9361a5

huaxingao force-pushed the composite_name branch from 49348a9 to c9361a5 Compare January 7, 2022 00:20

beliefer reviewed Jan 7, 2022

cloud-fan closed this in cf193b9 Jan 7, 2022

huaxingao deleted the composite_name branch January 7, 2022 05:47
