[SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateInSet produces wrong stats for STRING #21147

Closed
wants to merge 1 commit into from

gatorsmile
Member

What changes were proposed in this pull request?

colStat.min and colStat.max are empty for string type. Thus, evaluateInSet should not return zero for a STRING column just because either colStat.min or colStat.max is empty.
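
To illustrate the intent, here is a minimal, self-contained sketch of the guard being moved into the range-comparable branch. It is not Spark's actual code: `ColType`, `ColStatSketch`, `InSetSketch`, and the selectivity formula are hypothetical simplifications made up for this example.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
sealed trait ColType
case object NumericLike extends ColType // numeric, boolean, date, timestamp
case object StringLike extends ColType  // string: no min/max is collected

final case class ColStatSketch(
    distinctCount: Long,
    min: Option[Double],
    max: Option[Double])

object InSetSketch {
  /** Estimated selectivity of `col IN (hSet)`, as a fraction in [0, 1]. */
  def selectivity(colType: ColType, stat: ColStatSketch, hSet: Set[Any]): Double = {
    val ndv = stat.distinctCount
    colType match {
      case NumericLike =>
        // min/max are only consulted for range-comparable types.
        if (ndv == 0L || stat.min.isEmpty || stat.max.isEmpty) {
          0.0
        } else {
          // Keep only the literals that fall inside [min, max].
          val inRange = hSet.collect { case d: Double => d }
            .count(v => v >= stat.min.get && v <= stat.max.get)
          math.min(1.0, inRange.toDouble / ndv.toDouble)
        }
      case StringLike =>
        // Empty min/max is expected here, so it must not force a zero estimate.
        if (ndv == 0L) 0.0
        else math.min(1.0, hSet.size.toDouble / ndv.toDouble)
    }
  }
}
```

With this shape, a string column with four distinct values and no min/max still gets a non-zero estimate for a two-element IN list, while a numeric column with missing min/max is estimated at zero.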

How was this patch tested?

Added a test case.

@gatorsmile
Member Author

cc @cloud-fan @wzhfy

// use [min, max] to filter the original hSet
dataType match {
  case _: NumericType | BooleanType | DateType | TimestampType =>
    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) {
Contributor

I think we always have max/min for integral type? cc @wzhfy

Contributor

min/max could be None when the table is empty

min/max can be None if the column contains only null values. This is exactly the case for my query.
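
A small sketch of why both cases above leave min/max undefined, assuming nulls are modelled as `None`. The `MinMaxSketch` helper is hypothetical and is not how Spark collects column statistics; it only shows that an empty table and an all-null column both end up with no non-null values to take a minimum or maximum over.

```scala
object MinMaxSketch {
  /** Returns (min, max) over the non-null values, or (None, None) if there are none. */
  def minMax(column: Seq[Option[Int]]): (Option[Int], Option[Int]) = {
    val nonNull = column.flatten // drop the nulls
    if (nonNull.isEmpty) (None, None)
    else (Some(nonNull.min), Some(nonNull.max))
  }
}
```

Under this model, `MinMaxSketch.minMax(Seq(None, None))` and `MinMaxSketch.minMax(Seq.empty)` both yield `(None, None)`, which is the situation the `isEmpty` checks in the snippet under review guard against.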

@SparkQA

SparkQA commented Apr 25, 2018

Test build #89815 has finished for PR 21147 at commit 9672f92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Contributor

wzhfy commented Apr 26, 2018

LGTM

@wzhfy
Contributor

wzhfy commented Apr 26, 2018

retest this please

@cloud-fan
Contributor

Somehow I thought it had passed the tests, and I have already merged it to master... Anyway, this is a pretty safe change and I don't think it will break any tests. Let's see the test result later.

@asfgit asfgit closed this in ce2f919 Apr 26, 2018
@SparkQA

SparkQA commented Apr 26, 2018

Test build #89884 has finished for PR 21147 at commit 9672f92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

The failed HiveClientSuite is known to be flaky and should not be related to this PR.

5 participants