-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateInSet produces wrong stats for STRING #21147
Conversation
cc @cloud-fan @wzhfy |
// use [min, max] to filter the original hSet | ||
dataType match { | ||
case _: NumericType | BooleanType | DateType | TimestampType => | ||
if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we always have max/min for integral type? cc @wzhfy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min/max could be None when the table is empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min/max can be None if the column contains only null values. This is exactly the case for my query.
Test build #89815 has finished for PR 21147 at commit
|
LGTM |
retest this please |
somehow I thought it has passed tests and I has merged it to master... Anyway this is a pretty safe change and I don't think it will break any tests. Let's see the test result later. |
Test build #89884 has finished for PR 21147 at commit
|
The failed |
What changes were proposed in this pull request?
colStat.min
ANDcolStat.max
are empty for string type. Thus,evaluateInSet
should not return zero when eithercolStat.min
orcolStat.max
.How was this patch tested?
Added a test case.