Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7142][SQL]: Minor enhancement to BooleanSimplification Optimizer rule #5700

Closed
wants to merge 4 commits into from

Conversation

saucam
Copy link

@saucam saucam commented Apr 25, 2015

Use these in the optimizer as well:

        A and (not(A) or B) => A and B
        not(A and B) => not(A) or not(B)
        not(A or B) => not(A) and not(B)

@SparkQA
Copy link

SparkQA commented Apr 25, 2015

Test build #30953 has finished for PR 5700 at commit 9e1f8dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class IDF extends Estimator[IDFModel] with IDFBase
  • This patch adds the following new dependencies:
    • tachyon-0.6.4.jar
    • tachyon-client-0.6.4.jar
  • This patch removes the following dependencies:
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar

// not(l || r) => not(l) && not(r)
case Or(l, r) => And(Not(l), Not(r))
// not(l && r) => not(l) or not(r)
case And(l, r) => Or(Not(l), Not(r))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How these 2 rules optimize execution?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for example the filter is not(Or(left, r)) , where r might be some filter on a partitioned column like part>=12 , in the present case this filter cannot be pushed down, since while evaluating we will encounter reference of partitioned column, whereas if this rule is applied we get And(not(l), part<12) and then not(l) might be pushed down since now splitting into conjunctive predicates is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case Not(Or(l, r))? Seems you miss the Not...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inside a case match :

 case not @ Not(exp) => exp match {
  ....
  ....
  case Or(l, r) => And(Not(l), Not(r))

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing Not(And(a, b)) to And(Not(a), Not(b)) may not always be an optimization. It do helps to push down filters for datasource, but I think #8200 is more reasonable since it only do this optimization in datasource part.

cc @marmbrus

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan could you please explain a bit more when and how converting to "And" may not be an optimization ? I was wondering would it actually result in any kind of performance hit ? Also could you tell how #8200 is more reasonable ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought this optimization should be only used in datasource as it may cause performance regression for normal cases(see these comments). However after an offline discussion with @marmbrus , this optimization is pretty standard and generally good, so let's keep it in Optimizer :)

@marmbrus
Copy link
Contributor

marmbrus commented Sep 3, 2015

Thanks for working on this! Can you please add test cases for these optimizations?

Yash Datta added 2 commits September 10, 2015 11:37
…, using these rules:

            A and (not(A) or B) => A and B
            not(A and B) => not(A) or not(B)
            not(A or B) => not(A) and not(B)
@saucam
Copy link
Author

saucam commented Sep 10, 2015

added test cases

@SparkQA
Copy link

SparkQA commented Sep 10, 2015

Test build #42249 has finished for PR 5700 at commit de2dbdc.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2015

Test build #42243 has finished for PR 5700 at commit 7ada093.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2015

Test build #42257 has finished for PR 5700 at commit 2dbfba6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

Thanks, merging to master.

@asfgit asfgit closed this in f892d92 Sep 10, 2015
@saucam
Copy link
Author

saucam commented Sep 10, 2015

thanks @marmbrus :)

case (l, Or(l1, r)) if (Not(l) fastEquals l1) => And(l, r)
case (l, Or(r, l1)) if (Not(l) fastEquals l1) => And(l, r)
case (Or(l, l1), r) if (l1 fastEquals Not(r)) => And(l, r)
case (Or(l1, l), r) if (l1 fastEquals Not(r)) => And(l, r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need fastEquals here? Not(l) and l1 will never be a same reference and we always fallback to normal equality check. I think just here == here is more reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants