[SPARK-27698][SQL] Add new method convertibleFilters for getting pushed down filters in Parquet file reader #24597
Mark this as WIP until #24598 is merged.
Test build #105376 has finished for PR 24597 at commit
75fe737 to fa8f48c
fa8f48c to b22ea80
Test build #105527 has finished for PR 24597 at commit
Test build #105529 has finished for PR 24597 at commit
This is ready for review @dongjoon-hyun @wangyum @rdblue @cloud-fan
Thank you for pinging me, @gengliangwang. Shall we wait for one day? Currently, after SPARK-27699,
Looks fine to me.
Retest this please.
@dongjoon-hyun @cloud-fan Please review this, so that we can continue the migration of Parquet V2.
Test build #105613 has finished for PR 24597 at commit
Thanks, merging to master!
/**
 * Returns a map, which contains parquet field name and data type, if predicate push down applies.
 */
private def getFieldMap(dataType: MessageType): Map[String, ParquetField] = {
@gengliangwang, don't have to move codes around to make it easier to track ...
Looks fine, but can you clarify the relation between convertibleFilters and createFilters?
What changes were proposed in this pull request?
To return accurate pushed filters in Parquet file scan (#24327 (review)), we can process the original data source filters in the following way:
1. For "And" operators, split the conjunctive predicates and try converting each of them. After that:
   1.1 if partial predicate push down is allowed, return the convertible results;
   1.2 otherwise, return the whole predicate if it is convertible, or an empty result if it is not.
2. For "Or" operators, if both children can be pushed down, the predicate is partially or totally convertible; otherwise, return an empty result.
3. For other operators, partial push down is not possible:
   3.1 if the entire predicate is convertible, return it as is;
   3.2 otherwise, return an empty result.
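The rules above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the `Filter` tree, `ConvertibleFiltersSketch`, `isLeafConvertible`, and the `canPartialPushDown` flag are hypothetical stand-ins, and the sketch assumes that only `EqualTo` leaves are convertible. Treating `Not` as a place where partial push down is disallowed is likewise an assumption of the sketch, used to exercise rule 1.2.

```scala
// Hypothetical, simplified filter tree for illustration only.
// Spark's real filter classes are not used here.
object ConvertibleFiltersSketch {
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class StringContains(attribute: String, value: String) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter
  final case class Or(left: Filter, right: Filter) extends Filter
  final case class Not(child: Filter) extends Filter

  // Assumption: only EqualTo leaves can be converted to a Parquet predicate.
  private def isLeafConvertible(f: Filter): Boolean = f match {
    case _: EqualTo => true
    case _ => false
  }

  // Returns the part of `f` that can be pushed down, or None if nothing can.
  def convertible(f: Filter, canPartialPushDown: Boolean = true): Option[Filter] = f match {
    case And(l, r) =>
      (convertible(l, canPartialPushDown), convertible(r, canPartialPushDown)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        // One side only: keep it when partial push down is allowed.
        case (Some(a), None) if canPartialPushDown => Some(a)
        case (None, Some(b)) if canPartialPushDown => Some(b)
        case _ => None
      }
    case Or(l, r) =>
      // Both children must yield something, otherwise the result is empty.
      for {
        a <- convertible(l, canPartialPushDown)
        b <- convertible(r, canPartialPushDown)
      } yield Or(a, b)
    case Not(child) =>
      // Assumption: dropping a conjunct under a negation is unsafe,
      // so the child must be convertible as a whole.
      convertible(child, canPartialPushDown = false).map(Not(_))
    case leaf =>
      // Other operators: all or nothing.
      if (isLeafConvertible(leaf)) Some(leaf) else None
  }
}
```

Under these assumptions, `convertible(And(EqualTo("a", 1), StringContains("b", "x")))` yields `Some(EqualTo("a", 1))`, while wrapping the same conjunction in `Not` or replacing `And` with `Or` yields `None`.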
This PR also contains code refactoring. Currently ParquetFilters.createFilter accepts the parameter schema: MessageType and creates a field mapping for every input filter. We can make it a class member and avoid creating the nameToParquetField mapping for every input filter.
How was this patch tested?
Unit test