[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification #54112

cloud-fan · 2026-02-03T08:50:25Z

What changes were proposed in this pull request?

This is a followup to #53658. Instead of recursively calling the entire actualExprTransformer to optimize new Not expressions created by De Morgan's Laws, this PR extracts a simplifyNot method that only handles Not simplification and calls itself recursively.

Why are the changes needed?

This is more efficient because:

We only apply the Not-specific simplifications (not the And/Or factor elimination logic)
We avoid the overhead of tree pattern matching and pruning checks
The recursion is more direct and focused

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

Yes.

Made with Cursor

…plification This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively. This is more efficient because: - We only apply the Not-specific simplifications (not the And/Or factor elimination logic) - We avoid the overhead of tree pattern matching and pruning checks - The recursion is more direct and focused Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-02-03T08:50:35Z

JIRA Issue Information

=== Improvement SPARK-54881 ===
Summary: BooleanSimplification rule using transformExpressionsUp instead of transformExpressionsDown, is inefficient in some cases resulting in delayed idempotency
Assignee: Asif
Status: Resolved
Affected: ["4.0.2","3.5.8","4.1.2"]

This comment was automatically generated by GitHub Actions

cloud-fan · 2026-02-03T08:51:32Z

cc @ahshahid @peter-toth

ahshahid · 2026-02-03T09:58:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+    case _ => not
  }
 }



Have 2 comments:

Have found recursive calls to be less efficient, as compared to getting the recursive behaviour via call on an instance, where the instance itself is changing at every step.. like it is happening in transformDown

Any future modification , say adding new case etc, would require analysis whether it should go in both the places ( original transform down, and new simplifyNot function, or only in once place... atleast this is what I feel.. though I may be wrong.
But if that identification would be straightforward and any hypothetical new case added would be needed only in one of the functions, then this distinction is good. Though recursive call on method will be a concern for complex expressions....

Previously it was O(n^2): actualExprTransformer is called in transformExpressionsUpWithPruning, and itself also calls transformDownWithPruning. We repeatedly transform the sub-expression tree.

I don't think recursion is a concern, as these transform APIs are all recursive under the hood.

ahshahid · 2026-02-03T10:54:40Z

I am not sure there will be any change in terms of number of nodes being traversed , in new and current code.. my understanding is that the moment while traversing down, the Not operation is subsumed, the traversal will stop as the transform down is with pruning.. so if Not pattern is absent from the subtree it will stop, the way recursion will stop . Isn't it?

…

On Tue, Feb 3, 2026, 2:16 AM Wenchen Fan ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala <#54112 (comment)>: > case Not(Not(e)) => e case Not(IsNull(e)) => IsNotNull(e) case Not(IsNotNull(e)) => IsNull(e) + + case _ => not } } Previously it was O(n^2): actualExprTransformer is called in transformExpressionsUpWithPruning, and itself also calls transformDownWithPruning. We repeatedly transform the sub-expression tree. I don't think recursion is a concern, as these transform APIs are all recursive under the hood. — Reply to this email directly, view it on GitHub <#54112 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC6XG2BQ654FG23MLRMINIL4KBYQVAVCNFSM6AAAAACTZIEGL6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTONBUGM3TANBXGQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

cloud-fan · 2026-02-03T11:04:41Z

@ahshahid checking Not pattern also traverse the tree.

ahshahid · 2026-02-03T11:18:28Z

That is right, but if the Not is subsumed, the children would be unchanged, and each child would have the flag mask already set, as they have undergone transform up with pruning so the lazy flag buys must have set...this was my understanding...

…

On Tue, Feb 3, 2026, 3:05 AM Wenchen Fan ***@***.***> wrote: *cloud-fan* left a comment (apache/spark#54112) <#54112 (comment)> @ahshahid <https://github.com/ahshahid> checking Not pattern also traverse the tree. — Reply to this email directly, view it on GitHub <#54112 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC6XG2BZ2SZI3BGQAW27P334KB6F7AVCNFSM6AAAAACTZIEGL6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTQNBQGY2DMNZYGE> . You are receiving this because you were mentioned.Message ID: ***@***.***>

peter-toth

This is very similar what I suggested to amend the original PR: ahshahid#1, especially at this point: ahshahid@ab29d7b.

I think due to @ahshahid provided ruleId to the Not handling transformDownWithPruning(), unlikely it is ever O(n^2) in practice.
But I like the clean approach of this PR.

github-actions bot added the SQL label Feb 3, 2026

ahshahid reviewed Feb 3, 2026

View reviewed changes

peter-toth approved these changes Feb 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification #54112

[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification #54112

cloud-fan commented Feb 3, 2026

Uh oh!

github-actions bot commented Feb 3, 2026

Uh oh!

cloud-fan commented Feb 3, 2026

Uh oh!

ahshahid Feb 3, 2026

Uh oh!

cloud-fan Feb 3, 2026

Uh oh!

ahshahid commented Feb 3, 2026 via email

Uh oh!

cloud-fan commented Feb 3, 2026

Uh oh!

ahshahid commented Feb 3, 2026 via email

Uh oh!

peter-toth left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification #54112

Are you sure you want to change the base?

[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification #54112

Conversation

cloud-fan commented Feb 3, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

github-actions bot commented Feb 3, 2026

JIRA Issue Information

Uh oh!

cloud-fan commented Feb 3, 2026

Uh oh!

ahshahid Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

cloud-fan Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

ahshahid commented Feb 3, 2026 via email

Uh oh!

cloud-fan commented Feb 3, 2026

Uh oh!

ahshahid commented Feb 3, 2026 via email

Uh oh!

peter-toth left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

peter-toth left a comment •

edited

Loading