Refactor and nested types support for Parquet Reader #1314
Performance went down ~10%, so I need to investigate what's going on there.
CC @oerling
It seems that there is a bug: after all data has been
Could you please post a full example of what goes wrong in a new issue? I'm not sure I understand.
Ah, actually, disregard that; it seems that the behaviour of DuckDB has changed.
Now:
So, it used to return an empty chunk; now it returns a nullptr.
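For callers affected by this kind of change, a minimal sketch of a fetch loop that tolerates both end-of-data conventions might look like the following. The `Chunk` and `ChunkSource` types and the `CountRows` helper are hypothetical stand-ins written for this illustration; they are not DuckDB's actual classes or API.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result chunk; not DuckDB's actual DataChunk.
struct Chunk {
    std::vector<int> rows;
    std::size_t size() const { return rows.size(); }
};

// Hypothetical source that hands out chunks and then signals exhaustion
// either with nullptr or with an empty chunk, depending on the flag.
class ChunkSource {
public:
    ChunkSource(std::vector<std::vector<int>> batches, bool end_with_nullptr)
        : batches_(std::move(batches)), end_with_nullptr_(end_with_nullptr) {}

    std::unique_ptr<Chunk> Fetch() {
        if (next_ < batches_.size()) {
            auto chunk = std::make_unique<Chunk>();
            chunk->rows = batches_[next_++];
            return chunk;
        }
        if (end_with_nullptr_) {
            return nullptr;  // newer convention described above
        }
        return std::make_unique<Chunk>();  // older convention: empty chunk
    }

private:
    std::vector<std::vector<int>> batches_;
    std::size_t next_ = 0;
    bool end_with_nullptr_;
};

// Counts rows, treating both nullptr and an empty chunk as end of data.
std::size_t CountRows(ChunkSource &source) {
    std::size_t total = 0;
    while (true) {
        std::unique_ptr<Chunk> chunk = source.Fetch();
        if (!chunk || chunk->size() == 0) {
            break;
        }
        total += chunk->size();
    }
    return total;
}
```

Checking the pointer before the size keeps the loop correct under either convention, so client code does not need to know which version it is running against.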
This PR refactors and extends the Parquet reader. A major feature addition is support for nested types in Parquet files, which are mapped to DuckDB's STRUCT and LIST types. Under the hood, the Parquet reader now reads strings zero-copy, which should increase performance. While I was at it, I also added DATE support to Parquet, which should improve the TPC-H timings ^^
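To illustrate what zero-copy string reading can mean in general, here is a rough sketch that decodes PLAIN-encoded Parquet BYTE_ARRAY values (a 4-byte little-endian length followed by the bytes) into views that point straight into the page buffer. The `DecodeByteArrays` function is a hypothetical helper written for this example and requires C++17; it is not the code from this PR, and the reader's actual implementation may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

// Decodes `count` PLAIN-encoded BYTE_ARRAY values from `buffer`.
// Each value is a 4-byte little-endian length followed by that many bytes.
// The returned views point into `buffer`, so no string bytes are copied;
// `buffer` must outlive the returned views.
std::vector<std::string_view> DecodeByteArrays(std::string_view buffer, std::size_t count) {
    std::vector<std::string_view> values;
    values.reserve(count);
    std::size_t offset = 0;
    for (std::size_t i = 0; i < count; i++) {
        if (buffer.size() - offset < 4) {
            throw std::runtime_error("truncated length prefix");
        }
        // Assemble the length byte by byte to stay independent of host endianness.
        std::uint32_t length = 0;
        for (int b = 0; b < 4; b++) {
            auto byte = static_cast<unsigned char>(buffer[offset + b]);
            length |= static_cast<std::uint32_t>(byte) << (8 * b);
        }
        offset += 4;
        if (buffer.size() - offset < length) {
            throw std::runtime_error("truncated value");
        }
        values.push_back(buffer.substr(offset, length));  // a view, not a copy
        offset += length;
    }
    return values;
}
```

The trade-off of this approach is lifetime management: the views are only valid while the underlying buffer is kept alive, so whoever holds the strings must also hold a reference to the buffer.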