[SPARK-14502][SQL] Add optimization for Binary Comparison Simplification #12267
dongjoon-hyun wants to merge 5 commits into apache:master from dongjoon-hyun:SPARK-14502
Test build #55404 has finished for PR 12267 at commit
Hi, @rxin.
cc @cloud-fan
I think this case doesn't need !a.nullable && !b.nullable
Right. I'll fix that.
Oh, @cloud-fan. When I looked at EqualNullSafe one more time, I realized that a EqualNullSafe b is null-safe, but a and b themselves are not. The following is a code snippet from EqualNullSafe.
    case class EqualNullSafe(left: Expression, right: Expression) extends BinaryComparison {
      ...
      override def nullable: Boolean = false
      ...
      override def eval(input: InternalRow): Any = {
        val input1 = left.eval(input)
        val input2 = right.eval(input)
        if (input1 == null && input2 == null) {
          true
        ...
In this case, I think we should keep !a.nullable && !b.nullable, because a and b still need to be evaluated. What do you think?
If a semanticEquals b, does that mean a and b are either both null or both not null?
Oh, actually semanticEquals just checks that both expressions are deterministic and compares their canonicalized forms.
    def semanticEquals(other: Expression): Boolean =
      deterministic && other.deterministic && canonicalized == other.canonicalized
The javadoc of semanticEquals says: returns true when two expressions will always compute the same result. So I think they should be both null or both not null if they are semanticEquals.
Correct. I missed that. Thank you so much. I'll fix that.
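To make the outcome of this thread concrete, here is a minimal sketch of the rule as discussed above, written against a toy expression tree. Everything in it (Expr, Attr, Lit, Rand, Cmp, simplify, sameNonNull) is a hypothetical stand-in and not the Catalyst API; structural equality of case classes stands in for the canonicalized comparison, and only the top-level expression is rewritten. It is an illustration under those assumptions, not the code in this PR.

```scala
object BinaryComparisonSketch {
  // Toy expression model (assumption: NOT Spark's Expression hierarchy).
  sealed trait Expr {
    def nullable: Boolean
    def deterministic: Boolean = true
  }
  case class Attr(name: String, nullable: Boolean) extends Expr
  case class Lit(value: Boolean) extends Expr { val nullable: Boolean = false }
  case class Rand(seed: Long) extends Expr {
    val nullable: Boolean = false
    override val deterministic: Boolean = false
  }

  // Binary comparisons: nullable if either side is, deterministic if both sides are.
  sealed trait Cmp extends Expr {
    def left: Expr
    def right: Expr
    def nullable: Boolean = left.nullable || right.nullable
    override def deterministic: Boolean = left.deterministic && right.deterministic
  }
  case class EqualTo(left: Expr, right: Expr) extends Cmp
  case class EqualNullSafe(left: Expr, right: Expr) extends Cmp {
    // Null-safe equality never evaluates to null.
    override def nullable: Boolean = false
  }
  case class LessThan(left: Expr, right: Expr) extends Cmp
  case class LessThanOrEqual(left: Expr, right: Expr) extends Cmp
  case class GreaterThan(left: Expr, right: Expr) extends Cmp
  case class GreaterThanOrEqual(left: Expr, right: Expr) extends Cmp

  // Stand-in for semanticEquals: both deterministic and structurally identical.
  def semanticEquals(a: Expr, b: Expr): Boolean =
    a.deterministic && b.deterministic && a == b

  // Guard shared by every case except the null-safe one.
  private def sameNonNull(a: Expr, b: Expr): Boolean =
    !a.nullable && !b.nullable && semanticEquals(a, b)

  def simplify(e: Expr): Expr = e match {
    // Null-safe equality needs no nullability check on the operands.
    case EqualNullSafe(a, b) if semanticEquals(a, b) => Lit(true)
    // Comparisons that hold for equal, non-null operands.
    case EqualTo(a, b) if sameNonNull(a, b) => Lit(true)
    case LessThanOrEqual(a, b) if sameNonNull(a, b) => Lit(true)
    case GreaterThanOrEqual(a, b) if sameNonNull(a, b) => Lit(true)
    // Strict inequalities can never hold for equal, non-null operands.
    case LessThan(a, b) if sameNonNull(a, b) => Lit(false)
    case GreaterThan(a, b) if sameNonNull(a, b) => Lit(false)
    // Anything else is preserved as-is.
    case other => other
  }
}
```

In this sketch, a nullable attribute is still rewritten under EqualNullSafe, while EqualTo over the same attribute is left alone, and Rand(0) compared with Rand(0) is never rewritten because the stand-in semanticEquals rejects non-deterministic operands.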
Thank you for the deep review, @cloud-fan. :)
- The e is not used.
- It's overkill to use a loop here for only 2 cases.
@cloud-fan, thank you so much for improving this PR.
So far, I have updated the following.
If I missed something, please let me know.
Test build #55505 has finished for PR 12267 at commit
Test build #55507 has finished for PR 12267 at commit
Test build #55511 has finished for PR 12267 at commit
The failure is unrelated to this PR, so I rebased onto master to trigger Jenkins again.
    val nonNullableRelation = LocalRelation('a.int.withNullability(false))

    test("Preserve nullable or non-deterministic exprs in general") {
      for (e <- Seq('a === 'a, 'a <= 'a, 'a >= 'a, 'a < 'a, 'a > 'a, Rand(0) === Rand(0))) {
We should not test non-deterministic expressions on a nullable relation, as a nullable relation stops this optimization for all kinds of expressions.
My bad. It smells like a bug. I will split the test case.
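As background on why a nullable relation blocks the rewrite: under SQL's three-valued logic, a = a evaluates to NULL (not true) when a is NULL, so folding it to a true literal would keep rows that a filter would otherwise drop. The sketch below models this with Option, where None plays the role of NULL; equalTo and equalNullSafe are hypothetical helper names for illustration, not Spark functions.

```scala
object NullSemanticsSketch {
  // None models SQL NULL; Some(v) models a non-null value.

  // Ordinary equality: the result is NULL if either operand is NULL.
  def equalTo(l: Option[Int], r: Option[Int]): Option[Boolean] =
    for (x <- l; y <- r) yield x == y

  // Null-safe equality: always produces a definite true or false.
  def equalNullSafe(l: Option[Int], r: Option[Int]): Boolean = (l, r) match {
    case (None, None)       => true
    case (Some(x), Some(y)) => x == y
    case _                  => false
  }
}
```

With these definitions, equalTo(None, None) is None while equalNullSafe(None, None) is true, which matches the reasoning that only the null-safe comparison can be folded without looking at operand nullability.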
…ALSE Filter tests.
Test build #55517 has finished for PR 12267 at commit
Test build #55520 has finished for PR 12267 at commit
LGTM, cc @davies
Merging this into master, thanks!
Thank you, @davies, @cloud-fan, and @rxin!
What changes were proposed in this pull request?
We can simplify binary comparisons with semantically-equal operands:
For example, the following example plan
will be optimized into the following.
How was this patch tested?
Pass the Jenkins tests, including the new BinaryComparisonSimplificationSuite.