-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification #54112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…plification This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively. This is more efficient because: - We only apply the Not-specific simplifications (not the And/Or factor elimination logic) - We avoid the overhead of tree pattern matching and pruning checks - The recursion is more direct and focused Co-authored-by: Cursor <cursoragent@cursor.com>
JIRA Issue Information=== Improvement SPARK-54881 === This comment was automatically generated by GitHub Actions |
| case _ => not | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have 2 comments:
- Have found recursive calls to be less efficient, as compared to getting the recursive behaviour via call on an instance, where the instance itself is changing at every step.. like it is happening in transformDown
- Any future modification , say adding new case etc, would require analysis whether it should go in both the places ( original transform down, and new simplifyNot function, or only in once place... atleast this is what I feel.. though I may be wrong.
But if that identification would be straightforward and any hypothetical new case added would be needed only in one of the functions, then this distinction is good. Though recursive call on method will be a concern for complex expressions....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously it was O(n^2): actualExprTransformer is called in transformExpressionsUpWithPruning, and itself also calls transformDownWithPruning. We repeatedly transform the sub-expression tree.
I don't think recursion is a concern, as these transform APIs are all recursive under the hood.
|
I am not sure there will be any change in terms of number of nodes being
traversed , in new and current code.. my understanding is that the moment
while traversing down, the Not operation is subsumed, the traversal will
stop as the transform down is with pruning.. so if Not pattern is absent
from the subtree it will stop, the way recursion will stop . Isn't it?
…On Tue, Feb 3, 2026, 2:16 AM Wenchen Fan ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
<#54112 (comment)>:
>
case Not(Not(e)) => e
case Not(IsNull(e)) => IsNotNull(e)
case Not(IsNotNull(e)) => IsNull(e)
+
+ case _ => not
}
}
Previously it was O(n^2): actualExprTransformer is called in
transformExpressionsUpWithPruning, and itself also calls
transformDownWithPruning. We repeatedly transform the sub-expression tree.
I don't think recursion is a concern, as these transform APIs are all
recursive under the hood.
—
Reply to this email directly, view it on GitHub
<#54112 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC6XG2BQ654FG23MLRMINIL4KBYQVAVCNFSM6AAAAACTZIEGL6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTONBUGM3TANBXGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
@ahshahid checking Not pattern also traverse the tree. |
|
That is right, but if the Not is subsumed, the children would be unchanged,
and each child would have the flag mask already set, as they have undergone
transform up with pruning so the lazy flag buys must have set...this was
my understanding...
…On Tue, Feb 3, 2026, 3:05 AM Wenchen Fan ***@***.***> wrote:
*cloud-fan* left a comment (apache/spark#54112)
<#54112 (comment)>
@ahshahid <https://github.com/ahshahid> checking Not pattern also
traverse the tree.
—
Reply to this email directly, view it on GitHub
<#54112 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC6XG2BZ2SZI3BGQAW27P334KB6F7AVCNFSM6AAAAACTZIEGL6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTQNBQGY2DMNZYGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very similar what I suggested to amend the original PR: ahshahid#1, especially at this point: ahshahid@ab29d7b.
I think due to @ahshahid provided ruleId to the Not handling transformDownWithPruning(), unlikely it is ever O(n^2) in practice.
But I like the clean approach of this PR.
What changes were proposed in this pull request?
This is a followup to #53658. Instead of recursively calling the entire
actualExprTransformerto optimize new Not expressions created by De Morgan's Laws, this PR extracts asimplifyNotmethod that only handles Not simplification and calls itself recursively.Why are the changes needed?
This is more efficient because:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
Yes.
Made with Cursor