Parquet filter pushdown #23297
Filter pushdown was implemented for Hive (only) some time ago. The same is needed for file(...) and s3(...).
Reviving this because it's super useful: DuckDB has this, and it makes a huge difference for more selective queries on larger files.
Seconded. On SELECT queries with a WHERE clause, DuckDB vastly outperforms ClickHouse (10+ times faster).
Don't forget
For reference, you can see that ClickHouse is faster until the queries are selective enough that the file's min/max index can be used. In my experience I am always filtering, so I am forced to use DuckDB for the performance gains on Parquet files.
Don't forget to update ClickBench!
Limit reads from a Parquet file when filters exist, similar to https://drill.apache.org/docs/parquet-filter-pushdown/
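A minimal sketch of the min/max idea discussed above, assuming each row group carries per-column min/max statistics (as Parquet footers can): a row group is read only if its [min, max] range can overlap the filter range. The names `RowGroupStats` and `row_groups_to_read` are hypothetical and illustrate the general technique only; this is not ClickHouse's or DuckDB's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group min/max statistics for one column."""
    min_value: int
    max_value: int


def row_groups_to_read(stats: List[RowGroupStats],
                       low: Optional[int] = None,
                       high: Optional[int] = None) -> List[int]:
    """Return indices of row groups that may contain values in [low, high].

    A row group is skipped only when its [min, max] range cannot overlap
    the filter range, so no matching row is ever dropped.
    """
    selected = []
    for i, s in enumerate(stats):
        if low is not None and s.max_value < low:
            continue  # every value in this row group is below the filter
        if high is not None and s.min_value > high:
            continue  # every value in this row group is above the filter
        selected.append(i)
    return selected


if __name__ == "__main__":
    groups = [RowGroupStats(0, 99), RowGroupStats(100, 199), RowGroupStats(200, 299)]
    # A filter like "WHERE value BETWEEN 150 AND 180" only needs the middle row group.
    print(row_groups_to_read(groups, low=150, high=180))
```

The sketch assumes statistics are always present; a row group without statistics would have to be read unconditionally.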