Skip to content

[SPARK-14502][SQL] Add optimization for Binary Comparison Simplification#12267

Closed
dongjoon-hyun wants to merge 5 commits intoapache:masterfrom
dongjoon-hyun:SPARK-14502
Closed

[SPARK-14502][SQL] Add optimization for Binary Comparison Simplification#12267
dongjoon-hyun wants to merge 5 commits intoapache:masterfrom
dongjoon-hyun:SPARK-14502

Conversation

@dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

We can simplifies binary comparisons with semantically-equal operands:

  1. Replace '<=>' with 'true' literal.
  2. Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable.
  3. Replace '<' and '>' with 'false' literal if both operands are non-nullable.

For example, the following example plan

scala> sql("SELECT * FROM (SELECT explode(array(1,2,3)) a) T WHERE a BETWEEN a AND a+7").explain()
...
:  +- Filter ((a#59 >= a#59) && (a#59 <= (a#59 + 7)))
...

will be optimized into the following.

:  +- Filter (a#47 <= (a#47 + 7))

How was this patch tested?

Pass the Jenkins tests including new BinaryComparisonSimplificationSuite.

@SparkQA
Copy link

SparkQA commented Apr 9, 2016

Test build #55404 has finished for PR 12267 at commit 69fadfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @rxin .
Could you review this PR please?

@rxin
Copy link
Contributor

rxin commented Apr 10, 2016

cc @cloud-fan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case doesn't need !a.nullable && !b.nullable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I'll fix that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. @cloud-fan . When I looked at the EqualNullSafe one more time, I realized that that a EqualNullSafe b is null-safe but both a and b are not. The following is the code snippet of EqualNullSafe.

case class EqualNullSafe(left: Expression, right: Expression) extends BinaryComparison {
...
  override def nullable: Boolean = false
...
  override def eval(input: InternalRow): Any = {
    val input1 = left.eval(input)
    val input2 = right.eval(input)
    if (input1 == null && input2 == null) {
      true

In this case, I think we should keep !a.nullable && !b.nullable because we need to evaluate a and b indeed. How do you think about that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a semanticEquals b, does it mean a and b should be both null or both not null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually the semanticEquals just used deterministic and canonicalized string comparison.

  def semanticEquals(other: Expression): Boolean =
    deterministic && other.deterministic && canonicalized == other.canonicalized

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadoc of semanticEquals says: returns true when two expressions will always compute the same result. So I think they should be both null or both not null if they are semanticEquals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I missed that. Thank you so much. I'll fix that.

@dongjoon-hyun
Copy link
Member Author

Thank you for deep review, @cloud-fan . :)
I changed PR according to the one comment first.
For the others, I asked some questions to understand more correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the e is not used.
  2. it's a overkill to use loop here for only 2 cases.

@dongjoon-hyun
Copy link
Member Author

@cloud-fan . Thank you so much for improving this PR.
Overall, this PR starts to include one Non-Null case (EqualNullSafe) now.
May I change the classname from NonNullableBinaryComparisonSimplification to BinaryComparisonSimplification, too?
(I thought it's a little bit too long.)

@cloud-fan
Copy link
Contributor

BinaryComparisonSimplification LGTM

@dongjoon-hyun
Copy link
Member Author

Up to now, I updated the followings.

  1. Extends EqualNullSafe case to handle non-nullable operands.
  2. Adds a testcase Nullable Simplication Primitive.
  3. Changes the class name from NonNullableBinaryComparisonSimplification to BinaryComparisonSimplification.
  4. Fix test coverage by using for correctly and adding more testcases in Preserve nullable or non-deterministic exprs in general.
  5. Fix test coverage by replacing '||' to '&&' in testcase Expression Normalization.
  6. Simplify test relations

If I missed something, please let me know.

@SparkQA
Copy link

SparkQA commented Apr 11, 2016

Test build #55505 has finished for PR 12267 at commit 7ea4e29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-14502][SQL] Add optimization for Non-Nullable Binary Comparison Simplification [SPARK-14502][SQL] Add optimization for Binary Comparison Simplification Apr 11, 2016
@SparkQA
Copy link

SparkQA commented Apr 11, 2016

Test build #55507 has finished for PR 12267 at commit 86380de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 11, 2016

Test build #55511 has finished for PR 12267 at commit 4e9096d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

The failure is irrelevant to this PR. So, I rebased to the master in order to trigger Jenkins again.

[info] - event ordering *** FAILED *** (1 minute)
[info]   The code passed to failAfter did not complete within 1 minute. (ContinuousQueryListenerSuite.scala:150)

val nonNullableRelation = LocalRelation('a.int.withNullability(false))

test("Preserve nullable or non-deterministic exprs in general") {
for (e <- Seq('a === 'a, 'a <= 'a, 'a >= 'a, 'a < 'a, 'a > 'a, Rand(0) === Rand(0))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not test non-deterministic expressions on nullable relation, as nullable relation stops this optimization for all kind of expressions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. It smells a bug. I will split the testcase.

@SparkQA
Copy link

SparkQA commented Apr 11, 2016

Test build #55517 has finished for PR 12267 at commit bdc309c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 11, 2016

Test build #55520 has finished for PR 12267 at commit 9ca13be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, cc @davies

@davies
Copy link
Contributor

davies commented Apr 11, 2016

Merging this into master, thanks!

@dongjoon-hyun
Copy link
Member Author

Thank you, @davies , @cloud-fan , and @rxin !

@asfgit asfgit closed this in 5de2619 Apr 11, 2016
@dongjoon-hyun dongjoon-hyun deleted the SPARK-14502 branch May 12, 2016 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants