[SPARK-17075][SQL][followup] Add Estimation of Constant Literal #17446

gatorsmile · 2017-03-28T00:53:51Z

What changes were proposed in this pull request?

FalseLiteral and TrueLiteral should already have been eliminated by the optimizer rule BooleanSimplification, but null literals may be introduced by the optimizer rule NullPropagation. To be safe, our filter estimation should handle all eligible literal cases.

Our optimizer rule BooleanSimplification is unable to remove the null literal in many cases, for example in the condition "a < 0 or null". Thus, we need to handle null literals in filter estimation.

Not can be pushed down below And and Or. After that, we could see two consecutive Not operators, which need to be collapsed. Because filter estimation supports only a limited set of expressions, we only need to handle the case Not(null) to avoid incorrect results caused by boolean operations on null. For details, see the matrix below.

not NULL = NULL
NULL or false = NULL
NULL or true = true
NULL or NULL = NULL
NULL and false = false
NULL and true = NULL
NULL and NULL = NULL
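
The matrix above is standard SQL three-valued logic. As a minimal sketch (not part of the patch), it can be modeled in Scala with Option[Boolean], where None stands in for NULL. The ThreeValued object and its method names are hypothetical and do not correspond to any Spark API.

```scala
// Sketch only: SQL three-valued logic, with None standing in for NULL.
// ThreeValued and its methods are hypothetical names, not Spark APIs.
object ThreeValued {
  type TV = Option[Boolean]

  // not NULL = NULL; otherwise plain negation.
  def not(a: TV): TV = a.map(!_)

  // true dominates Or; NULL wins over false.
  def or(a: TV, b: TV): TV = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // false dominates And; NULL wins over true.
  def and(a: TV, b: TV): TV = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }
}
```

Each row of the matrix corresponds to one call, for example ThreeValued.and(None, Some(false)) for "NULL and false".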

How was this patch tested?

Added test cases.

gatorsmile · 2017-03-28T00:54:17Z

cc @ron8hu @cloud-fan @wzhfy

gatorsmile · 2017-03-28T00:56:37Z

Ideally, our optimizer rule BooleanSimplification should be able to remove the null literal, like what we did for true and false literals. Here, I think we should just assume our filter estimation does not depend on any other optimizer rules.

SparkQA · 2017-03-28T03:03:02Z

Test build #75289 has finished for PR 17446 at commit d770dbb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-28T03:47:10Z

Test build #75291 has finished for PR 17446 at commit 1775ca7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

ron8hu · 2017-03-28T03:53:28Z

The logic is straightforward. LGTM.

gatorsmile · 2017-03-28T04:01:36Z

Thank you! @ron8hu

cloud-fan · 2017-03-28T05:29:45Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

+   */
+  def evaluateLiteral(literal: Literal): Option[Double] = {
+    literal match {
+      case Literal(null, _) => Some(0.0)


Handling null in filter estimation is not trivial, e.g. null and false returns false, null and true returns true. If we estimate cond && null, we will report a selectivity of 0, which is wrong.

I think we should eliminate null literal in optimizer when it's involved in filter condition.

gatorsmile · 2017-03-28

Yes, let me close it.

not NULL = NULL
NULL or false = NULL
NULL or true = true
NULL or NULL = NULL
NULL and false = false
NULL and true = NULL
NULL and NULL = NULL

Wait... It behaves correctly, right?
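
One way to see why the estimate can still be right for a top-level condition: a filter keeps a row only when the condition evaluates to true, so false and NULL both keep nothing. The sketch below assumes the common independence-based combination rules (product for And, inclusion-exclusion for Or). LiteralSelectivity and its methods are hypothetical names for illustration, not the code in FilterEstimation.scala.

```scala
// Sketch only: selectivity of constant conditions under the assumption that
// a filter keeps a row only when the condition is true.
// LiteralSelectivity is a hypothetical name, not a Spark class.
object LiteralSelectivity {
  // None models a NULL literal. Both false and NULL keep no rows.
  def literal(value: Option[Boolean]): Double = value match {
    case Some(true) => 1.0
    case _          => 0.0
  }

  // Assumed combination rules, treating the two conditions as independent.
  def and(s1: Double, s2: Double): Double = s1 * s2
  def or(s1: Double, s2: Double): Double = s1 + s2 - s1 * s2
}
```

Under these assumptions, "cond and NULL" estimates to 0, which agrees with the matrix: the result is either false or NULL, and neither passes a filter. Likewise, "NULL or true" estimates to 1.0.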

wzhfy · 2017-03-28T10:47:36Z

> Ideally, our optimizer rule BooleanSimplification should be able to remove the null literal, like what we did for true and false literals. Here, I think we should just assume our filter estimation does not depend on any other optimizer rules.

Sorry, I may be missing some context, but I prefer enhancing BooleanSimplification to deal with null literals. There is no need to complicate the estimation logic for deterministic literals that the optimizer can remove.

gatorsmile · 2017-03-28T19:41:21Z

@wzhfy BooleanSimplification is unable to get rid of null in many cases. Boolean operations on null are complex, and this needs more investigation.

gatorsmile · 2017-03-28T22:59:19Z

Since this PR does not correctly handle cases like Not(null), I will close it for now.

gatorsmile · 2017-03-29T07:39:28Z

null should not be simply treated as a false literal in filter estimation. Based on the definition, Not(null) should return null. If we treat null as false, Not(null) will return 1.0, which is wrong in many cases.
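
To make this concrete, here is a small sketch on a toy expression tree. The types Expr, NullLit, Pred, and Not are hypothetical stand-ins rather than Catalyst classes, and the null-aware variant is one possible way to handle Not(null), not necessarily what the merged change does.

```scala
// Toy model for illustration only; these types are hypothetical, not Catalyst classes.
object NotNullSketch {
  sealed trait Expr
  case object NullLit extends Expr                         // stands in for Literal(null, _)
  final case class Pred(selectivity: Double) extends Expr  // predicate with a known selectivity
  final case class Not(child: Expr) extends Expr

  // Naive estimator: null is treated like false, and Not flips any child.
  def naive(e: Expr): Double = e match {
    case NullLit => 0.0
    case Pred(s) => s
    case Not(c)  => 1.0 - naive(c)
  }

  // Null-aware estimator: Not(null) is still null, so it keeps no rows.
  def nullAware(e: Expr): Double = e match {
    case NullLit      => 0.0
    case Pred(s)      => s
    case Not(NullLit) => 0.0              // not NULL = NULL
    case Not(Not(c))  => nullAware(c)     // collapse consecutive Not operators first
    case Not(c)       => 1.0 - nullAware(c)
  }
}
```

With the naive version, Not(NullLit) estimates to 1.0, meaning every row passes, while the null-aware version estimates 0.0. Ordinary predicates are unaffected: Not(Pred(0.25)) gives 0.75 in both.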

wzhfy · 2017-03-29T09:33:12Z

LGTM

SparkQA · 2017-03-29T09:44:06Z

Test build #75348 has finished for PR 17446 at commit 8ec57f3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-03-29T19:42:55Z

Thanks! Merging to master.

gatorsmile added 2 commits March 27, 2017 17:47

fix.

8a0e8b0

Merge remote-tracking branch 'upstream/master' into constantFilterEstimation

d770dbb

revert

1775ca7

cloud-fan reviewed Mar 28, 2017

gatorsmile closed this Mar 28, 2017

gatorsmile reopened this Mar 28, 2017

gatorsmile closed this Mar 28, 2017

gatorsmile reopened this Mar 29, 2017

fix.

8ec57f3

asfgit closed this in 5c8ef37 Mar 29, 2017
