Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46632][SQL] EquivalentExpressions addExprTree should allow all type of expressions #45894

Conversation

planga82
Copy link
Contributor

@planga82 planga82 commented Apr 5, 2024

What changes were proposed in this pull request?

When we try to add some expressions to equivalent expressions it fail

import org.apache.spark.sql.catalyst.dsl.expressions.DslExpression
import org.apache.spark.sql.catalyst.expressions.{EquivalentExpressions, If, Literal}
import org.apache.spark.sql.functions.col

val one = Literal(1.0)
val y = col("y").expr
val e1 = If(
  Literal(true),
  y * one * one + one * one * y,
  y * one * one + one * one * y
)
(new EquivalentExpressions).addExprTree(e1)

java.lang.IllegalStateException: Cannot update expression: (1.0 * 1.0) in map: Map(ExpressionEquals(('y * 1.0)) -> ExpressionStats(('y * 1.0))) with use count: -1
  at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprInMap(EquivalentExpressions.scala:85)
  at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree(EquivalentExpressions.scala:198)
  at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.$anonfun$updateExprTree$1(EquivalentExpressions.scala:200)
  at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.$anonfun$updateExprTree$1$adapted(EquivalentExpressions.scala:200)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
...

The main problem is when updateExprTree is used with useCount = 1 it iterates through the children of an expression in a certain order. When in updateExprInMap an equivalent (but not equal) expression is found, it does not iterate over its children. Therefore to be able to perform the inverse operation (useCount = -1) it is necessary to traverse the children in inverse order how they were added because the children of two equivalent expressions are not necessarily equal, as the example that we have (y + 1) + 1 is equivalent to (1 + 1) + y but (y + 1) is not equal to (1 + 1).

Why are the changes needed?

To fix an issue with some type of expressions

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit testing

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 5, 2024
@planga82
Copy link
Contributor Author

CC @viirya

@@ -193,7 +193,9 @@ class EquivalentExpressions(

if (!skip && !updateExprInMap(expr, map, useCount)) {
val uc = useCount.sign
childrenToRecurse(expr).foreach(updateExprTree(_, map, uc))
val children = childrenToRecurse(expr)
val updatedChildren = if (uc < 0) children.reverse else children
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverse does not seem to solve the problem, for example:

Seq((1, 2, 3), (2, 3, 4)).toDF("a", "b", "c")
      .selectExpr("case when a + b + c>3 then 1" +
        " when c + a + b>10 then 2 else 0 end as d").show()

@planga82
Copy link
Contributor Author

Thank you @zml1206 . I saw that you have proposed other solution. I'm going to close this PR

@planga82 planga82 closed this Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants