[SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b #37625

ayushi-agarwal · 2022-08-23T09:05:25Z

What changes were proposed in this pull request?

A new case is added to the BooleanSimplification rule to rewrite conditions of the form (a==b) || (a==null&&b==null) as a<=>b.

Why are the changes needed?

If a join condition has the form key1==key2 || (key1==null && key2==null), the join is executed as a Broadcast Nested Loop Join (BNLJ) because the condition does not qualify as an equi-join condition. BNLJ is slower than a sort merge join or a broadcast hash join. Rewriting the condition as key1<=>key2 lets the join run as a broadcast hash join or a sort merge join, which improves the performance of queries whose join condition matches this pattern.
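As a rough check of the rewrite, the sketch below models SQL three-valued logic with Option[Boolean] (None standing for NULL) and compares the original condition with null-safe equality on a few sample values. All names here are made up for illustration and are not Spark APIs. The comparison assumes the condition is used as a join or filter predicate, where a NULL result is treated as "no match".

```scala
object NullSafeEquivalence {
  // Hypothetical model of a SQL boolean: None stands for NULL.
  type Tri = Option[Boolean]

  // a = b under SQL semantics: NULL if either side is NULL.
  def sqlEq(a: Option[Int], b: Option[Int]): Tri =
    for (x <- a; y <- b) yield x == y

  def isNull(a: Option[Int]): Tri = Some(a.isEmpty)

  def and(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // (a == b) || (a == null && b == null)
  def original(a: Option[Int], b: Option[Int]): Tri =
    or(sqlEq(a, b), and(isNull(a), isNull(b)))

  // a <=> b: never NULL, true when both sides are NULL.
  def nullSafeEq(a: Option[Int], b: Option[Int]): Boolean = a == b

  // A join or filter keeps a row only when the predicate is true.
  def accepts(t: Tri): Boolean = t.contains(true)

  val samples: Seq[Option[Int]] = Seq(None, Some(1), Some(2))

  // True if both forms accept exactly the same (a, b) pairs.
  val agreeAsPredicate: Boolean =
    (for (a <- samples; b <- samples)
      yield accepts(original(a, b)) == nullSafeEq(a, b)).forall(identity)
}
```

When exactly one side is NULL, the original condition evaluates to NULL while a<=>b evaluates to false, so the two forms agree as predicates but not necessarily as raw values.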

Sample query:
val dfAns = df.join(df1, (df("v")===df1("x") or (isnull(df("v")) and isnull(df1("x")))), "leftanti")

Plan before change
OptimizedPlan:
Join LeftAnti, ((v#1 = x#15) || (isnull(v#1) && isnull(x#15)))
:- LocalRelation [g#0, v#1, o#2, x#3]
+- LocalRelation [x#15]

dfAns.queryExecution.executedPlan
*(1) BroadcastNestedLoopJoin BuildRight, LeftAnti, ((v#256 = x#270) || (isnull(v#256) && isnull(x#270)))
:- LocalTableScan [g#255, v#256, o#257, x#258]
+- BroadcastExchange IdentityBroadcastMode, [id=#91]
   +- LocalTableScan [x#270]

Plan after change
OptimizedPlan
Join LeftAnti, (v#29 <=> x#79)
:- LocalRelation [g#28, v#29, o#30, x#31]
+- LocalRelation [x#79]

ExecutedPlan
*(1) BroadcastHashJoin [coalesce(v#29, 0), isnull(v#29)], [coalesce(x#71, 0), isnull(x#71)], LeftAnti, BuildRight
:- LocalTableScan [g#28, v#29, o#30, x#31]
+- BroadcastExchange HashedRelationBroadcastMode(ArrayBuffer(coalesce(input[0, int, false], 0), isnull(input[0, int, false]))), [id=#57]
   +- LocalTableScan [x#71]
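For comparison, the same anti join can be written by hand with the null-safe equality operator. The sketch below assumes a local SparkSession and small made-up data (the values and the object name are hypothetical and not taken from the PR). It prints both physical plans and checks that the two conditions return the same rows.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.isnull

object NullSafeJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-safe-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data; Option columns are nullable.
    val df = Seq(("a", Option(1)), ("b", Option(2)), ("c", Option.empty[Int]))
      .toDF("g", "v")
    val df1 = Seq(Tuple1(Option(1)), Tuple1(Option.empty[Int])).toDF("x")

    // Condition in the form targeted by this PR.
    val verbose = df.join(
      df1,
      df("v") === df1("x") or (isnull(df("v")) and isnull(df1("x"))),
      "leftanti")

    // Hand-written null-safe equality.
    val nullSafe = df.join(df1, df("v") <=> df1("x"), "leftanti")

    // Print the physical plans so the chosen join strategies can be compared.
    verbose.explain()
    nullSafe.explain()

    // Both conditions should keep the same rows.
    assert(verbose.collect().toSet == nullSafe.collect().toSet)

    spark.stop()
  }
}
```

The join strategy shown by explain() depends on the Spark version and configuration, so the printed plans may differ from the ones quoted above.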

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests were added for the new case and run.

AmplabJenkins · 2022-08-23T15:27:41Z

Can one of the admins verify this patch?

ayushi-agarwal · 2022-08-24T07:53:28Z

Gentle ping @cloud-fan @srowen.
Could you please help verify this patch?

cloud-fan · 2022-08-24T09:50:17Z

cc @sigmod @wangyum

srowen · 2022-08-24

The optimization seems logically correct. I don't know this part of the code well enough to review the change itself. My only question is how common this type of condition is, but I can believe it shows up in join conditions.

cloud-fan · 2022-09-16T06:33:11Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

@@ -412,6 +412,16 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
          }
        }

+      case Or(EqualTo(l, r), And(IsNull(c1), IsNull(c2)))


Assume that we have a chain of predicates combined by OR: cond1 OR cond2 OR cond3 OR ... OR condN. I think we can merge condX and condY if they are EqualTo(l, r) and And(IsNull(l), IsNull(r)). This is more general than the current approach.

ayushi-agarwal · 2022-09-17

@cloud-fan I didn't generalize it because that would be tricky and would add complexity to the code. It is also probably less common for these two conditions to be separated by other expressions.
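To make the trade-off in this thread concrete, here is a minimal sketch of the generalization suggested above. It uses a toy expression tree rather than Catalyst's classes; the case classes only borrow Catalyst's names and are hypothetical stand-ins, and the code is not part of this PR. The idea is to flatten the OR chain, find every equality that has a matching pair of null checks anywhere in the chain, and merge the two into a null-safe equality.

```scala
object OrChainMergeSketch {
  // Toy expression tree; names mirror Catalyst but these are standalone classes.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class EqualTo(l: Expr, r: Expr) extends Expr
  case class EqualNullSafe(l: Expr, r: Expr) extends Expr
  case class IsNull(c: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Or(l: Expr, r: Expr) extends Expr

  // Flatten cond1 OR cond2 OR ... OR condN into a sequence of disjuncts.
  def disjuncts(e: Expr): Seq[Expr] = e match {
    case Or(l, r) => disjuncts(l) ++ disjuncts(r)
    case other    => Seq(other)
  }

  // Matches isnull(l) AND isnull(r), in either operand order.
  private def bothNull(l: Expr, r: Expr): Expr => Boolean = {
    case And(IsNull(a), IsNull(b)) => (a == l && b == r) || (a == r && b == l)
    case _                         => false
  }

  def mergeNullSafe(e: Expr): Expr = {
    val parts = disjuncts(e)
    // Null-check disjuncts that are absorbed by an equality in the same chain.
    val absorbed: Set[Expr] = parts.collect {
      case EqualTo(l, r) => parts.find(bothNull(l, r))
    }.flatten.toSet
    val rewritten = parts.filterNot(absorbed).map {
      case EqualTo(l, r) if parts.exists(bothNull(l, r)) => EqualNullSafe(l, r)
      case other                                         => other
    }
    // Rebuild a left-deep OR chain; rewritten is never empty because
    // a null check is only dropped when its equality stays in the chain.
    rewritten.reduceLeft[Expr]((acc, next) => Or(acc, next))
  }
}
```

For example, with attributes a, b and c, the chain (a = b) OR (c = a) OR (isnull(a) AND isnull(b)) becomes (a <=> b) OR (c = a), even though the two mergeable conditions are not adjacent. A real rule would also have to decide where such a rewrite is semantically safe, which is part of the added complexity mentioned in the reply.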

github-actions · 2022-12-27T00:19:26Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

ayushi-agarwal and others added 5 commits November 24, 2020 00:06

Merge pull request #1 from apache/master

5452f39

Merge from apache master

Merge pull request #2 from apache/master

c59f30f

Merge with master

Merge pull request #3 from apache/master

3ab19ea

Merge apache spark master

Merge branch 'apache:master' into master

de71793

Add rule and UTs for join condition simplification

70a9533

github-actions bot added the SQL label Aug 23, 2022

mskapilks approved these changes Aug 24, 2022

cloud-fan reviewed Aug 24, 2022

srowen reviewed Aug 24, 2022

Ayushi Agarwal added 2 commits August 25, 2022 14:23

move simplification to boolean simplification rule

47d6d55

remove rule from collection

15b100d

ayushi-agarwal changed the title ~~[SPARK-40177][SQL] Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b~~ [SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b Aug 25, 2022

ayushi-agarwal closed this Sep 14, 2022

ayushi-agarwal reopened this Sep 14, 2022

cloud-fan reviewed Sep 16, 2022

github-actions bot added the Stale label Dec 27, 2022

github-actions bot closed this Dec 28, 2022
