[SPARK-37802][SQL] Composite field name should work with Aggregate push down #35108
Force-pushed from b58c093 to 01ae878.
@cloud-fan Could you please take a look? Thanks!
And Chinese field names have the same problem: == SQL ==
Should be good now @Dintion
@@ -349,7 +349,7 @@ private[sql] final case class FieldReference(parts: Seq[String]) extends NamedRe
 private[sql] object FieldReference {
   def apply(column: String): NamedReference = {
-    LogicalExpressions.parseReference(column)
+    LogicalExpressions.parseReference("`" + column + "`")
I think we need to fix the caller side. We shouldn't call FieldReference.apply(String), which parses the given string. We should call FieldReference(Seq(col_name)) instead.
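To make the reviewer's point concrete, here is a toy model in Scala. ToyFieldRef and its parse method are hypothetical stand-ins, not Spark's FieldReference or LogicalExpressions.parseReference; the parser only mimics two behaviors of a SQL identifier parser (splitting on dots and rejecting unquoted whitespace), which is enough to show why parsing an already-known column name can go wrong.

```scala
// Hypothetical stand-in for Spark's FieldReference; not the real class.
final case class ToyFieldRef(parts: Seq[String])

object ToyFieldRef {
  // Hypothetical, simplified parser: splits on '.' and rejects whitespace,
  // mimicking how a SQL identifier parser treats an unquoted name.
  def parse(s: String): ToyFieldRef = {
    val parts = s.split('.').toList
    if (parts.exists(_.exists(_.isWhitespace)))
      throw new IllegalArgumentException(s"cannot parse identifier: $s")
    ToyFieldRef(parts)
  }
}

object ParseVsDirect {
  def main(args: Array[String]): Unit = {
    // Parsing the raw column name fails on the space in "dept id" ...
    val parsed = scala.util.Try(ToyFieldRef.parse("dept id"))
    println(parsed.isFailure) // true
    // ... and silently splits a name that itself contains a dot.
    println(ToyFieldRef.parse("dept.id").parts) // List(dept, id)
    // Building the reference directly keeps the name as a single part.
    println(ToyFieldRef(Seq("dept id")).parts) // List(dept id)
  }
}
```

Constructing the reference from Seq(name) never hands the name to a parser, so neither failure mode can occur.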
@@ -706,21 +706,21 @@ object DataSourceStrategy
       if (agg.filter.isEmpty) {
         agg.aggregateFunction match {
           case aggregate.Min(PushableColumnWithoutNestedColumn(name)) =>
-            Some(new Min(FieldReference(name)))
+            Some(new Min(FieldReference(s"`$name`")))
We know it's a top-level column, so it's a waste to parse it again. The column name may also contain a backtick, which we would need to escape. A simpler solution is to skip the parsing: FieldReference(Seq(name)). We can even create a utility method for it: FieldReference.column(name)
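The options in this comment can be compared with a small toy model. Everything below is hypothetical and is not Spark's API: ToyFieldRef.parse is a simplified identifier parser in which '.' separates parts, backticks quote a part, and a doubled backtick inside a quoted part stands for one literal backtick (an assumed convention, modeled on common SQL quoting). The sketch shows that naive backtick wrapping breaks on a name containing a backtick, that escaping fixes it, and that a column helper shaped like the one proposed here avoids the problem entirely.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for Spark's FieldReference; not the real class.
final case class ToyFieldRef(parts: Seq[String])

object ToyFieldRef {
  // Hypothetical multipart-identifier parser: '.' separates parts, `...`
  // quotes a part, and `` inside a quoted part is one literal backtick.
  def parse(s: String): ToyFieldRef = {
    val parts = ListBuffer.empty[String]
    val cur = new StringBuilder
    var quoted = false
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (quoted && c == '`' && i + 1 < s.length && s.charAt(i + 1) == '`') {
        cur += '`'; i += 1 // escaped backtick inside a quoted part
      } else if (c == '`') {
        quoted = !quoted // open or close a quoted part
      } else if (!quoted && c == '.') {
        parts += cur.toString; cur.clear()
      } else if (!quoted && c.isWhitespace) {
        throw new IllegalArgumentException(s"cannot parse identifier: $s")
      } else cur += c
      i += 1
    }
    if (quoted) throw new IllegalArgumentException(s"unclosed backtick: $s")
    parts += cur.toString
    ToyFieldRef(parts.toList)
  }

  // Quote a single name, doubling any backtick it contains.
  def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  // Hypothetical helper in the spirit of FieldReference.column: no parsing.
  def column(name: String): ToyFieldRef = ToyFieldRef(Seq(name))
}

object QuotingDemo {
  def main(args: Array[String]): Unit = {
    val name = "a`b"
    // Wrapping in backticks without escaping breaks on the inner backtick.
    println(scala.util.Try(ToyFieldRef.parse(s"`$name`")).isFailure) // true
    // Escaping first makes the round trip work ...
    println(ToyFieldRef.parse(ToyFieldRef.quote(name)) == ToyFieldRef.column(name)) // true
    // ... but building the reference directly needs no quoting at all.
    println(ToyFieldRef.column(name).parts) // List(a`b)
  }
}
```

In this model, skipping the parse removes both the escaping logic and the redundant work of re-parsing a name that is already known to be a single top-level column.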
@huaxingao Could you wait until #35101 is merged and update with
@huaxingao I think the code at
```
def columnAsString(e: Expression): Option[FieldReference] = e match {
  case PushableColumnWithoutNestedColumn(name) =>
    Some(FieldReference(name).asInstanceOf[FieldReference])
  case _ => None
}
```
also has the same problem.
Force-pushed from 49348a9 to c9361a5.
@@ -351,6 +351,10 @@ private[sql] object FieldReference {
   def apply(column: String): NamedReference = {
     LogicalExpressions.parseReference(column)
   }
+
+  def column(name: String) : NamedReference = {
+    FieldReference(Seq(name))
Thank you for your work
Thanks, merging to master!
Thank you all!
…sh down

### What changes were proposed in this pull request?
Currently, composite field names such as dept id don't work with aggregate push down:

sql("SELECT COUNT(`dept id`) FROM h2.test.dept")
```
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'id' expecting <EOF>(line 1, pos 5)

== SQL ==
dept id
-----^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:271)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:132)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:63)
	at org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:39)
	at org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:365)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.translateAggregate(DataSourceStrategy.scala:717)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.$anonfun$pushAggregates$1(PushDownUtils.scala:125)
	at scala.collection.immutable.List.flatMap(List.scala:366)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushAggregates(PushDownUtils.scala:125)
```
This PR fixes the problem.

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New test

Closes apache#35108 from huaxingao/composite_name.

Authored-by: Huaxin Gao <huaxin_gao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Currently, composite field names such as dept id don't work with aggregate push down:
sql("SELECT COUNT(`dept id`) FROM h2.test.dept")
This PR fixes the problem.
Why are the changes needed?
Bug fix.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New test