Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children #46135

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zml1206
Copy link
Contributor

@zml1206 zml1206 commented Apr 19, 2024

What changes were proposed in this pull request?

Remove unexpected exception thrown in EquivalentExpressions.updateExprInMap(). Equivalent expressions may contain different children, it should happen expression not in map and useCount is -1.
For example, before this PR will throw IllegalStateException

Seq((1, 2, 3), (2, 3, 4)).toDF("a", "b", "c")
      .selectExpr("case when a + b + c>3 then 1 when c + a + b>0 then 2 else 0 end as d").show()

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit test, before this PR will throw IllegalStateException: *** with use count: -1

Was this patch authored or co-authored using generative AI tooling?

No.

…fferent children throw IllegalStateException
@github-actions github-actions bot added the SQL label Apr 19, 2024
@zml1206 zml1206 changed the title Fix subexpression elimination when equivalent ternary expressions have different children [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children Apr 19, 2024
@zml1206
Copy link
Contributor Author

zml1206 commented Apr 19, 2024

cc @peter-toth @cloud-fan

@@ -494,6 +494,18 @@ class SubexpressionEliminationSuite extends SparkFunSuite with ExpressionEvalHel
checkShortcut(Or(equal, Literal(true)), 1)
checkShortcut(Not(And(equal, Literal(false))), 1)
}

test("Equivalent ternary expressions have different children") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is because ((1 + 2) + 3) semanticEquals to ((3 + 1) + 2) but their children are different. So when we compute the common elements between the above 2 here:

var localEquivalenceMap = mutable.HashMap.empty[ExpressionEquals, ExpressionStats]
updateExprTree(exprs.head, localEquivalenceMap)
exprs.tail.foreach { expr =>
val otherLocalEquivalenceMap = mutable.HashMap.empty[ExpressionEquals, ExpressionStats]
updateExprTree(expr, otherLocalEquivalenceMap)
localEquivalenceMap = localEquivalenceMap.filter { case (key, _) =>
otherLocalEquivalenceMap.contains(key)
}
}

only ((1 + 2) + 3) remains in localEquivalenceMap but it's children don't. So later when we want to remove ((1 + 2) + 3) we can't find its children...

I think we could fix the above code that computes localEquivalenceMap, but the change in PR is simpler and doesn't seem to do any harm.

@zml1206
Copy link
Contributor Author

zml1206 commented Apr 23, 2024

cc @cloud-fan Do you have time to help take a look? Thank you.

@zml1206
Copy link
Contributor Author

zml1206 commented May 10, 2024

@peter-toth Can this PR be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants