New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet: Should not throw exception for filter contains non-reference term #8243
Conversation
|
||
@Override | ||
public <T> Boolean handleNonReference(Bound<T> term) { | ||
return ROWS_MIGHT_MATCH; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the column is not referenced, then the column is missing, right? How could it then match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here the NonReference
should be it is not a column reference term
. It can be a transform term, for example equal(year(ts), 10)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko, I think at this point the columns are bound and transforms have already been removed if they can be (e.g. if they can be translated to a normal predicate on a partition field). This is the case where the transform is left over because there is no corresponding partition field. We could implement future optimizations to run the filter, but returning all rows is good enough for now.
boolean shouldRead = | ||
new ParquetMetricsRowGroupFilter(SCHEMA, equal(truncate("required", 2), "some_value"), true) | ||
.shouldRead(parquetSchema, rowGroupMetadata); | ||
Assumptions.assumeThat(shouldRead) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to use an assertion rather than an assumption? If you use an assumption the test will be skipped if it fails, instead of failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good other than the assumption that I think should be an assertion. I think @Fokko also has some questions that I don't fully understand so I'll defer to him on merging this.
Thanks, @ConeyLiu! |
The parquet filter evaluator should not throw exceptions for those filters containing non-reference terms. For example, the filter evaluator apply on
year(ts) = 20
will throwValidationException
;