Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17075][SQL][followup] Add Estimation of Constant Literal #17446

Closed
wants to merge 4 commits into from

Conversation

gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Mar 28, 2017

What changes were proposed in this pull request?

FalseLiteral and TrueLiteral should have been eliminated by optimizer rule BooleanSimplification, but null literals might be added by optimizer rule NullPropagation. For safety, our filter estimation should handle all the eligible literal cases.

Our optimizer rule BooleanSimplification is unable to remove the null literal in many cases. For example, a < 0 or null. Thus, we need to handle null literal in filter estimation.

Not can be pushed down below And and Or. Then, we could see two consecutive Not, which need to be collapsed into one. Because of the limited expression support for filter estimation, we just need to handle the case Not(null) for avoiding incorrect error due to the boolean operation on null. For details, see below matrix.

not NULL = NULL
NULL or false = NULL
NULL or true = true
NULL or NULL = NULL
NULL and false = false
NULL and true = NULL
NULL and NULL = NULL

How was this patch tested?

Added the test cases.

@gatorsmile
Copy link
Member Author

cc @ron8hu @cloud-fan @wzhfy

@gatorsmile
Copy link
Member Author

Ideally, our optimizer rule BooleanSimplification should be able to remove the null literal, like what we did for true and false literals. Here, I think we should just assume our filter estimation does not depend on any other optimizer rules.

@SparkQA
Copy link

SparkQA commented Mar 28, 2017

Test build #75289 has finished for PR 17446 at commit d770dbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 28, 2017

Test build #75291 has finished for PR 17446 at commit 1775ca7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ron8hu
Copy link
Contributor

ron8hu commented Mar 28, 2017

The logic is straightforward. LGTM.

@gatorsmile
Copy link
Member Author

Thank you! @ron8hu

*/
def evaluateLiteral(literal: Literal): Option[Double] = {
literal match {
case Literal(null, _) => Some(0.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handling null in filter estimation is not trivial, e.g. null and false returns false, null and true returns true. If we estimate cond && null, we will report 0 selectivity, which is wrong.

I think we should eliminate null literal in optimizer when it's involved in filter condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let me close it.

not NULL = NULL
NULL or false = NULL
NULL or true = true
NULL or NULL = NULL
NULL and false = false
NULL and true = NULL
NULL and NULL = NULL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait... It behaves correctly, right?

@gatorsmile gatorsmile closed this Mar 28, 2017
@gatorsmile gatorsmile reopened this Mar 28, 2017
@wzhfy
Copy link
Contributor

wzhfy commented Mar 28, 2017

Ideally, our optimizer rule BooleanSimplification should be able to remove the null literal, like what we did for true and false literals. Here, I think we should just assume our filter estimation does not depend on any other optimizer rules.

Sorry I may miss some context, but I prefer enhancing BooleanSimplification to deal with null literals. There's no need to complicate estimation logic for those deterministic literals which can be removed by optimizer.

@gatorsmile
Copy link
Member Author

gatorsmile commented Mar 28, 2017

@wzhfy BooleanSimplification is unable to get rid of null in many cases. Boolean operations on null is complex. We need more investigation.

@gatorsmile
Copy link
Member Author

gatorsmile commented Mar 28, 2017

Since this PR does not correctly handle the cases like Not(null), I close this PR at first.

@gatorsmile gatorsmile closed this Mar 28, 2017
@gatorsmile gatorsmile reopened this Mar 29, 2017
@gatorsmile
Copy link
Member Author

null should not be simply treated as a false literal in filter estimation. Based on the definition, Not(null) should return null. If we treat null as false, Not(null) will return 1.0, which is wrong in many cases.

@wzhfy
Copy link
Contributor

wzhfy commented Mar 29, 2017

LGTM

@SparkQA
Copy link

SparkQA commented Mar 29, 2017

Test build #75348 has finished for PR 17446 at commit 8ec57f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

Thanks! Merging to master.

@asfgit asfgit closed this in 5c8ef37 Mar 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants