Enable parquet reader v3 by default #88827
Workflow [PR], commit [9fb2c95] Summary: ❌
I'm curious: where can I find benchmarks for the v1 vs. v3 Parquet readers?
Probably nowhere yet.
Disabled PREWHERE in data lakes for now. To make it work, we'll have to change how ColumnMapper is used and how Delta Lake partition columns are added, @divanik:
The remaining failed tests are flaky.
I see one problem with both of these suggestions: we delegate complicated behaviour from the source to the format, one layer down. That could be OK if we supported only Parquet v3, but I think we want to preserve the normal behaviour with other formats and with the old Parquet reader, and in that case the task becomes more tedious.
Yep, the same transform would need to be implemented both inside the reader and as a separate IProcessor (for other formats). There'd be some awkward code duplication between the two (hopefully reusing code for the important parts), and likely some bugs caused by unintended differences in behavior between them. I don't have other ideas, though. Do you?
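To illustrate the "reusing code for the important parts" idea, here is a minimal Python sketch, not ClickHouse code. All names in it (`map_columns`, `InReaderPath`, `StandaloneStep`) are hypothetical, and a batch is modelled as a plain dict of column name to values. It assumes the transform consists of renaming physical columns to logical names and appending constant partition columns; the real transform may involve more than that.

```python
from typing import Any, Dict, List

# Stand-in for a chunk of columns: column name -> list of values.
Batch = Dict[str, List[Any]]


def map_columns(batch: Batch, rename: Dict[str, str],
                partition_values: Dict[str, Any]) -> Batch:
    """Shared logic: rename physical columns, then append constant partition columns."""
    num_rows = len(next(iter(batch.values()))) if batch else 0
    out: Batch = {rename.get(name, name): values for name, values in batch.items()}
    for name, value in partition_values.items():
        out[name] = [value] * num_rows
    return out


class InReaderPath:
    """Hypothetical reader that applies the mapping itself before returning rows."""

    def __init__(self, rename: Dict[str, str], partition_values: Dict[str, Any]):
        self.rename = rename
        self.partition_values = partition_values

    def read(self, raw_batch: Batch) -> Batch:
        return map_columns(raw_batch, self.rename, self.partition_values)


class StandaloneStep:
    """Hypothetical pipeline step placed after readers that do not map columns."""

    def __init__(self, rename: Dict[str, str], partition_values: Dict[str, Any]):
        self.rename = rename
        self.partition_values = partition_values

    def transform(self, batch: Batch) -> Batch:
        return map_columns(batch, self.rename, self.partition_values)


raw = {"col_1": [1, 2], "col_2": ["a", "b"]}
rename = {"col_1": "id", "col_2": "name"}
partitions = {"date": "2024-01-01"}

from_reader = InReaderPath(rename, partitions).read(raw)
from_step = StandaloneStep(rename, partitions).transform(raw)
```

Because both paths call the same helper, any behavioural difference between them would have to come from how each path prepares its inputs, which is a smaller surface to test than two independent implementations.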
The Upgrade check is broken in master:
…eld_id Antalya 25.8 Partially backport ClickHouse#88827
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Parquet reader v3 is enabled by default.