Parquet filter pushdown #23297

Closed
filimonov opened this issue Apr 19, 2021 · 6 comments · Fixed by #52951

@filimonov
Contributor

Limit reads from a Parquet file when filters exist, similar to Apache Drill's Parquet filter pushdown: https://drill.apache.org/docs/parquet-filter-pushdown/

@filimonov
Contributor Author

That was implemented for Hive (only) some time ago. The same is needed for file(...) and s3(...).

#34631

@danthegoodman1

Reviving this because it's super useful. DuckDB has this, and it makes an insane difference for more selective queries on larger files.

@minguyen9988

minguyen9988 commented Jul 28, 2023

Seconding this. On SELECT queries with a WHERE clause, DuckDB vastly outperforms ClickHouse (10+ times).

@danthegoodman1

> That was implemented for Hive (only) some time ago. Same is needed for file(..) and s3(...)
>
> #34631

Don't forget url() (and the *Cluster versions!)

@danthegoodman1

danthegoodman1 commented Jul 28, 2023

> second it, on select with where query duckdb vastly outperform Clickhouse (10+ times).

For reference, you can see ClickHouse is faster until a query is selective enough that the file's min/max index can be used. In my experience I am always filtering, so I am forced to use DuckDB for its performance gains on Parquet files.
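
To make the min/max point concrete, here is a minimal sketch of how a reader could skip row groups using per-row-group statistics. It is only an illustration under stated assumptions: `RowGroupStats` and `row_groups_to_read` are hypothetical names, the statistics are made up, and this is not how ClickHouse or DuckDB actually implement it.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one column's statistics in a Parquet row group."""
    index: int
    min_value: int
    max_value: int
    num_rows: int


def row_groups_to_read(stats, lower, upper):
    """Return indexes of row groups whose [min, max] range overlaps [lower, upper].

    A row group whose range does not overlap the filter cannot contain
    matching rows, so it never has to be read or decoded.
    """
    return [s.index for s in stats if s.max_value >= lower and s.min_value <= upper]


# Made-up statistics for a file with four row groups of 1000 rows each.
stats = [
    RowGroupStats(0, 1, 100, 1000),
    RowGroupStats(1, 101, 200, 1000),
    RowGroupStats(2, 201, 300, 1000),
    RowGroupStats(3, 150, 400, 1000),
]

# Equivalent of: WHERE x BETWEEN 120 AND 180
selected = row_groups_to_read(stats, 120, 180)
rows_scanned = sum(s.num_rows for s in stats if s.index in selected)
print(selected, rows_scanned)  # [1, 3] 2000
```

With this filter only two of the four row groups are read. Without pushdown all 4000 rows would be decoded and then filtered, which is why the gap shows up only on selective queries.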

@danthegoodman1

Don’t forget to update ClickBench!

5 participants