[SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateInSet produces wrong stats for STRING #21147

gatorsmile · 2018-04-25T04:00:20Z

What changes were proposed in this pull request?

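colStat.min and colStat.max are empty for string type. Thus, evaluateInSet should not return zero merely because colStat.min or colStat.max is empty; that check applies only to the types that carry min/max statistics (numeric, boolean, date, timestamp).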
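
To make the intent concrete, here is a minimal Scala sketch of the guard under stated assumptions. `SimpleColStat`, `numericInSet`, and `stringInSet` are hypothetical simplifications invented for illustration; they are not Spark's `ColumnStat` or the real `evaluateInSet`. The selectivity formula (matching values divided by distinct count, capped at 1.0) is likewise an assumed uniform-distribution estimate.

```scala
object InSetSketch {
  // Hypothetical stand-in for column statistics; min/max are optional.
  final case class SimpleColStat(
      distinctCount: BigInt,
      min: Option[Double],
      max: Option[Double])

  // Numeric-like columns: min/max are required to prune the IN-list,
  // so missing min/max (or zero distinct values) yields zero selectivity.
  def numericInSet(stat: SimpleColStat, values: Set[Double]): Double =
    (stat.min, stat.max) match {
      case (Some(lo), Some(hi)) if stat.distinctCount.signum > 0 =>
        val inRange = values.count(v => v >= lo && v <= hi)
        math.min(inRange.toDouble / stat.distinctCount.toDouble, 1.0)
      case _ => 0.0
    }

  // String-like columns: min/max are never populated, so only the
  // distinct count is consulted. Empty min/max must not force zero.
  def stringInSet(stat: SimpleColStat, values: Set[String]): Double =
    if (stat.distinctCount.signum <= 0) 0.0
    else {
      val newNdv = stat.distinctCount.min(BigInt(values.size))
      math.min(newNdv.toDouble / stat.distinctCount.toDouble, 1.0)
    }
}
```

In this sketch a string column with four distinct values and an IN-list of two values gets a non-zero estimate even though min and max are absent, which is the behavior the description asks for.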

How was this patch tested?

Added a test case.

gatorsmile · 2018-04-25T04:00:50Z

cc @cloud-fan @wzhfy

cloud-fan · 2018-04-25T04:25:56Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

    // use [min, max] to filter the original hSet
    dataType match {
      case _: NumericType | BooleanType | DateType | TimestampType =>
+        if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {


I think we always have max/min for integral type? cc @wzhfy

wzhfy · Apr 26, 2018

min/max could be None when the table is empty

mshtelma · Apr 26, 2018

min/max can be None if the column contains only null values. This is exactly the case for my query.
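
The two cases mentioned in this thread can be illustrated with a small Scala sketch. It assumes a nullable integer column modeled as `Seq[Option[Int]]`, with `None` standing in for SQL NULL; `minMax` is a hypothetical helper for illustration and not how Spark actually collects statistics.

```scala
object MinMaxSketch {
  // Compute min and max over the non-null values only.
  // If no non-null value exists, there is nothing to reduce,
  // so both results are None.
  def minMax(column: Seq[Option[Int]]): (Option[Int], Option[Int]) = {
    val nonNull = column.flatten
    (nonNull.reduceOption(_ min _), nonNull.reduceOption(_ max _))
  }
}
```

Both an empty table and a column that holds only nulls leave no values to reduce, so an estimator that reads min/max has to handle the empty case explicitly.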

SparkQA · 2018-04-25T05:53:19Z

Test build #89815 has finished for PR 21147 at commit 9672f92.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wzhfy · 2018-04-26T08:29:04Z

LGTM

wzhfy · 2018-04-26T10:53:13Z

retest this please

cloud-fan · 2018-04-26T11:09:17Z

Somehow I thought it had passed tests, and I have merged it to master... Anyway, this is a pretty safe change and I don't think it will break any tests. Let's see the test result later.

SparkQA · 2018-04-26T12:56:26Z

Test build #89884 has finished for PR 21147 at commit 9672f92.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-04-26T16:14:48Z

The failed HiveClientSuite is known to be flaky and should not be related to this PR.

fix

9672f92

cloud-fan reviewed Apr 25, 2018

gatorsmile mentioned this pull request Apr 25, 2018

[SPARK-23799][SQL] FilterEstimation.evaluateInSet produces devision by zero in a case of empty table with analyzed statistics #21052

Closed

asfgit closed this in ce2f919 Apr 26, 2018
