-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Respect page index policy option for ParquetObjectReader when it's not skip #8857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuqi-lucas -- this makes sense to me
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Thank you @alamb for quick review! |
etseidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok to me. I'm not up on all uses of this API so I wonder if down the road we can just get rid of the preloading options and require use of ArrowReaderOptions if one wants control over page index loading.
I think for now we should add a note to with_preload_column_index and with_preload_offset_index that settings in ArrowReaderOptions might override, as this is non-obvious.
Thank you @etseidl for review and good suggestion, i added comments in latest commit. And i also created a follow-up ticket, we may can get rid of the reloading options and require use of |
I agree the current state of the structures is confusing (nothing that you did @zhuqi-lucas). In my opinion part of the confusion stems from having I think those are separate concerns and in an ideal API they wouldn't be in the same object |
|
Thank you @zhuqi-lucas and @etseidl |
Which issue does this PR close?
When i try to wrapper ParquetObjectReader and implement our internal metadata cache, and i will pass the option to the inner ParquetObjectReader, but it does not respect the index policy option even it's not skip, and it always be false and will not load page index which i want to prefetch and cache.
cc @alamb
Rationale for this change
Support option with page index if it's not skip.
Are these changes tested?
Yes
Are there any user-facing changes?
No