[SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b #37625

Closed
wants to merge 7 commits into from


ayushi-agarwal
Contributor

@ayushi-agarwal ayushi-agarwal commented Aug 23, 2022

What changes were proposed in this pull request?

A new case is added to Boolean simplification to convert conditions of the form (a==b) || (a==null&&b==null) to a<=>b.
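To make the intended equivalence concrete, here is a minimal sketch that models SQL three-valued logic in plain Scala, with `None` standing for NULL. The object and helper names are hypothetical and nothing here is Spark code; it only compares the two forms over a few sample values.

```scala
// Toy model of SQL three-valued logic: None stands for NULL.
// All names are illustrative assumptions, not Spark APIs.
object NullSafeEqualityCheck {
  type Tri = Option[Boolean]

  // a = b: NULL if either side is NULL.
  def equalTo(a: Option[Int], b: Option[Int]): Tri =
    for (x <- a; y <- b) yield x == y

  // a <=> b: never NULL; two NULLs compare as equal.
  def eqNullSafe(a: Option[Int], b: Option[Int]): Tri = Some(a == b)

  def isNull(a: Option[Int]): Tri = Some(a.isEmpty)

  def and(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // The original condition: (a = b) OR (isnull(a) AND isnull(b)).
  def original(a: Option[Int], b: Option[Int]): Tri =
    or(equalTo(a, b), and(isNull(a), isNull(b)))

  // A filter or join keeps a row only when the condition is exactly true.
  def passes(t: Tri): Boolean = t.contains(true)

  val samples: Seq[Option[Int]] = Seq(None, Some(1), Some(2))

  def main(args: Array[String]): Unit =
    for (a <- samples; b <- samples)
      println(s"a=$a b=$b original=${original(a, b)} nullSafe=${eqNullSafe(a, b)}")
}
```

In this model the two forms select the same rows when used as a filter or join condition. Their raw values differ only when exactly one side is NULL (NULL for the original form, false for `<=>`), and a filter or join treats both as a non-match.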

Why are the changes needed?

If the join condition has the form key1==key2 || (key1==null && key2==null), the join is executed as a Broadcast Nested Loop Join (BNLJ), because the condition does not qualify as an equi-join condition. BNLJ takes more time than a sort merge join or a broadcast hash join. The condition can be converted to key1<=>key2 so that the join executes as a broadcast hash join or a sort merge join. This improves the performance of queries whose join condition matches this pattern.

Sample query:
val dfAns = df.join(df1, (df("v")===df1("x") or (isnull(df("v")) and isnull(df1("x")))), "leftanti")

Plan before change
OptimizedPlan:
Join LeftAnti, ((v#1 = x#15) || (isnull(v#1) && isnull(x#15)))
:- LocalRelation [g#0, v#1, o#2, x#3]
+- LocalRelation [x#15]

dfAns.queryExecution.executedPlan
*(1) BroadcastNestedLoopJoin BuildRight, LeftAnti, ((v#256 = x#270) || (isnull(v#256) && isnull(x#270)))
:- LocalTableScan [g#255, v#256, o#257, x#258]
+- BroadcastExchange IdentityBroadcastMode, [id=#91]
   +- LocalTableScan [x#270]

Plan after change
OptimizedPlan
Join LeftAnti, (v#29 <=> x#79)
:- LocalRelation [g#28, v#29, o#30, x#31]
+- LocalRelation [x#79]

ExecutedPlan
*(1) BroadcastHashJoin [coalesce(v#29, 0), isnull(v#29)], [coalesce(x#71, 0), isnull(x#71)], LeftAnti, BuildRight
:- LocalTableScan [g#28, v#29, o#30, x#31]
+- BroadcastExchange HashedRelationBroadcastMode(ArrayBuffer(coalesce(input[0, int, false], 0), isnull(input[0, int, false]))), [id=#57]
   +- LocalTableScan [x#71]
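The executed plan above keys the hash join on the pair coalesce(v, 0), isnull(v). The sketch below illustrates why that pair works as a null-safe hash key, using plain Scala collections with `None` standing for NULL. It is a toy model under that assumption, not Spark code, and the object and method names are hypothetical.

```scala
// Toy sketch, not Spark code: None stands for NULL.
object NullSafeHashKeySketch {
  // Mirrors the key pair in the plan: (coalesce(v, 0), isnull(v)).
  // The second component keeps NULL apart from a real 0.
  def key(v: Option[Int]): (Int, Boolean) = (v.getOrElse(0), v.isEmpty)

  // Left anti join: keep left rows whose key has no match on the build side.
  def leftAnti(left: Seq[Option[Int]], right: Seq[Option[Int]]): Seq[Option[Int]] = {
    val built = right.map(key).toSet // plays the role of the broadcast hash relation
    left.filterNot(v => built.contains(key(v)))
  }

  def main(args: Array[String]): Unit = {
    val left  = Seq(Some(0), Some(1), None, Some(3))
    val right = Seq(None, Some(1))
    println(leftAnti(left, right))
  }
}
```

Because NULL and 0 map to different keys, a NULL on the left matches only a NULL on the right, which is the behavior `<=>` asks for, and the lookup stays a hash probe instead of a nested loop.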

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests run

@github-actions github-actions bot added the SQL label Aug 23, 2022
@AmplabJenkins

Can one of the admins verify this patch?

@ayushi-agarwal
Contributor Author

Gentle ping @cloud-fan @srowen
Could you please help verify this patch?

@cloud-fan
Contributor

cc @sigmod @wangyum

Member

@srowen srowen left a comment


The optimization seems logically correct. I don't know this part of the code well enough to review the code change. My only question is how common this type of condition is, but I could believe it shows up in join conditions.

@ayushi-agarwal ayushi-agarwal changed the title [SPARK-40177][SQL] Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b [SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b Aug 25, 2022
@@ -412,6 +412,16 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
}
}

case Or(EqualTo(l, r), And(IsNull(c1), IsNull(c2)))
Contributor


Assume that we have a chain of predicates combined by OR: cond1 OR cond2 OR cond3 OR ... OR condN. I think we can merge condX and condY if they are EqualTo(l, r) and And(IsNull(l), IsNull(r)). This is more general than the current approach.
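A rough sketch of the generalization suggested here, written against a toy expression tree rather than Catalyst. The case classes below only borrow Catalyst's names and are assumptions for illustration, as are `splitDisjuncts`, `isBothNull`, and `mergeNullSafe`. The sketch merges only the first matching pair it finds.

```scala
// Toy expression tree; these are NOT the Catalyst classes, only stand-ins with similar names.
object OrChainMergeSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class EqualTo(l: Expr, r: Expr) extends Expr
  case class EqualNullSafe(l: Expr, r: Expr) extends Expr
  case class IsNull(child: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Or(l: Expr, r: Expr) extends Expr

  // Flatten cond1 OR cond2 OR ... OR condN into a list of disjuncts.
  def splitDisjuncts(e: Expr): Seq[Expr] = e match {
    case Or(l, r) => splitDisjuncts(l) ++ splitDisjuncts(r)
    case other    => Seq(other)
  }

  // True when e is IsNull(l) AND IsNull(r), in either order.
  def isBothNull(e: Expr, l: Expr, r: Expr): Boolean = e match {
    case And(IsNull(a), IsNull(b)) => (a == l && b == r) || (a == r && b == l)
    case _                         => false
  }

  def mergeNullSafe(e: Expr): Expr = {
    val disjuncts = splitDisjuncts(e)
    // Find an EqualTo(l, r) whose matching both-null check also appears in the chain.
    val pair = disjuncts.collectFirst {
      case equality @ EqualTo(l, r) if disjuncts.exists(isBothNull(_, l, r)) => equality
    }
    pair match {
      case Some(equality @ EqualTo(l, r)) =>
        // Drop the two merged disjuncts and put the null-safe equality in their place.
        val rest = disjuncts.filterNot(d => d == equality || isBothNull(d, l, r))
        (EqualNullSafe(l, r) +: rest).reduceLeft[Expr](Or(_, _))
      case None => e
    }
  }

  def main(args: Array[String]): Unit = {
    val (a, b, c) = (Attr("a"), Attr("b"), Attr("c"))
    println(mergeNullSafe(Or(Or(EqualTo(a, b), IsNull(c)), And(IsNull(a), IsNull(b)))))
  }
}
```

The merged predicate is moved to the front of the chain, which is harmless for OR but does reorder the disjuncts; a real rule would also need to decide how to handle several candidate pairs.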

Contributor Author


@cloud-fan I didn't generalize it because it would be tricky and would add complexity to the code. It is also probably less common for these conditions to be separated by other expressions in between.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 27, 2022
@github-actions github-actions bot closed this Dec 28, 2022