Collect lower/upper bounds for nested struct fields in ParquetMetrics #136
    Type currentType = schema.asStruct();

    while (pathIterator.hasNext()) {
      if (currentType == null || !currentType.isStructType()) return false;
Style: control flow should always use { and }.
Looks great to me other than one style problem. @prodeezy has also been working in this area, so I'd like to hear what he thinks, too.
@prodeezy it would be great to do an end-to-end test with your work to see that everything works as expected.
Thanks for this PR, @aokolnychyi. I ran an end-to-end test with this patch applied on the latest code in master.
Functionally, this patch works end to end along with the struct filter fix. Nice work!
Minor nit about the comment on allowing struct nesting. LGTM otherwise.
@@ -105,6 +107,22 @@ public static Metrics fromMetadata(ParquetMetadata metadata) {
        toBufferMap(fileSchema, lowerBounds), toBufferMap(fileSchema, upperBounds));
  }

  // we allow struct nesting, but not maps or arrays
The comment could be more descriptive: this check also precludes structs containing maps or arrays, and vice versa.
I think the wording here is okay. I'd rather merge now than wait for an update here. I'd be happy to merge a clarification PR though.
Merged. Thanks @aokolnychyi for fixing this and @prodeezy for the review!
FYI @prodeezy, I have restarted the work on apache/spark#22573 and will try to have it merged by Spark 3.0.
This PR enables collection of lower/upper bounds for nested struct fields in ParquetMetrics.
The test is pretty simple, as TestParquetMetrics already has a test for map/list elements as well as a test for all supported data types.
This resolves #78.
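For readers following the review, the struct-only path check discussed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the `Type` and `Kind` classes, `sampleSchema`, and `reachableThroughStructsOnly` are hypothetical stand-ins rather than Iceberg's API. It assumes a field qualifies only if every parent on its path is a struct, so a field nested under a list or map is rejected.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StructPathSketch {
  // Hypothetical, minimal type model; these are NOT Iceberg's classes.
  enum Kind { PRIMITIVE, STRUCT, LIST, MAP }

  static final class Type {
    final Kind kind;
    // Child fields by name; only populated for structs in this sketch.
    final Map<String, Type> fields = new LinkedHashMap<>();

    Type(Kind kind) {
      this.kind = kind;
    }

    Type field(String name, Type type) {
      fields.put(name, type);
      return this;
    }

    boolean isStructType() {
      return kind == Kind.STRUCT;
    }
  }

  // Returns true only if every parent on the path is a struct and the path resolves.
  // The type of the final (leaf) field is not inspected in this sketch.
  static boolean reachableThroughStructsOnly(Type schema, List<String> path) {
    Type currentType = schema;
    Iterator<String> pathIterator = path.iterator();

    while (pathIterator.hasNext()) {
      if (currentType == null || !currentType.isStructType()) {
        return false;
      }
      currentType = currentType.fields.get(pathIterator.next());
    }

    return currentType != null;
  }

  // Illustrative schema: struct<id, location: struct<x>, tags: list>
  static Type sampleSchema() {
    Type location = new Type(Kind.STRUCT).field("x", new Type(Kind.PRIMITIVE));
    return new Type(Kind.STRUCT)
        .field("id", new Type(Kind.PRIMITIVE))
        .field("location", location)
        .field("tags", new Type(Kind.LIST));
  }

  public static void main(String[] args) {
    Type schema = sampleSchema();
    // Nested struct field: every parent is a struct.
    System.out.println(reachableThroughStructsOnly(schema, Arrays.asList("location", "x"))); // true
    // Field under a list: the parent "tags" is not a struct.
    System.out.println(reachableThroughStructsOnly(schema, Arrays.asList("tags", "element"))); // false
  }
}
```

Because the loop stops at the first non-struct parent, the same check rejects both a struct nested inside a list or map and a list or map element nested inside a struct, which is the point raised in the review about the comment's wording.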