Support filter for parquet data type #2126

Closed
liukun4515 opened this issue Jul 22, 2022 · 4 comments
Labels: parquet (Changes to the parquet crate)

Comments

@liukun4515
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, we want to filter a column in a Parquet file whose logical type is decimal or string.

In Parquet, this data is stored using the binary (BYTE_ARRAY) and FIXED_LEN_BYTE_ARRAY physical types.

parquet-rs doesn't have a filter system or a comparison system.
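A minimal sketch of why a type-aware comparison is needed for these physical types, assuming the Parquet encoding of DECIMAL unscaled values as big-endian two's complement and of strings as UTF-8. `decode_decimal`, `compare_decimal`, and `compare_string` are hypothetical helpers for illustration, not parquet-rs APIs.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: decodes the unscaled value of a DECIMAL stored as
/// BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY (big-endian two's complement) into an
/// i128. Assumes a non-empty input of at most 16 bytes.
fn decode_decimal(bytes: &[u8]) -> i128 {
    assert!(!bytes.is_empty() && bytes.len() <= 16, "unsupported decimal width");
    // Sign-extend: start from all ones when the most significant bit is set.
    let mut value: i128 = if bytes[0] & 0x80 != 0 { -1 } else { 0 };
    for &b in bytes {
        value = (value << 8) | i128::from(b);
    }
    value
}

/// Hypothetical helper: compares two decimals that share the same scale
/// via their unscaled integer values.
fn compare_decimal(a: &[u8], b: &[u8]) -> Ordering {
    decode_decimal(a).cmp(&decode_decimal(b))
}

/// Hypothetical helper: compares two UTF-8 strings stored as BYTE_ARRAY using
/// unsigned lexicographic byte order, which matches Unicode code point order.
fn compare_string(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

fn main() {
    // 1.23 and -4.56 with scale 2 have unscaled values 123 and -456.
    let pos = 123i32.to_be_bytes();
    let neg = (-456i32).to_be_bytes();
    // A raw byte comparison gets the sign wrong; decoding first does not.
    println!("raw bytes: {:?}", pos.as_slice().cmp(neg.as_slice()));
    println!("decoded:   {:?}", compare_decimal(&pos, &neg));
    println!("strings:   {:?}", compare_string(b"apple", b"banana"));
}
```

Comparing decimal bytes directly would order negative values after positive ones, which is why the sketch decodes the unscaled value first; for strings, unsigned byte order already agrees with code point order, so no decoding is needed.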

Describe the solution you'd like

Implement a filter system and a comparison system in parquet-rs.

liukun4515 added the enhancement label on Jul 22, 2022
@liukun4515
Contributor Author

Do you have any opinions or suggestions?
@tustvold @Ted-Jiang @alamb

liukun4515 self-assigned this on Jul 22, 2022
@tustvold
Contributor

Could you perhaps expand a bit on what API you are expecting for this? I'm not entirely sure what you mean by supporting a filter or comparison system within parquet-rs.

@alamb
Contributor

alamb commented Jul 22, 2022

Are you thinking about something like "apply some predicate on a parquet binary column and then only decode pages from other columns that might have matching positions" 🤔
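A minimal in-memory sketch of the idea in the comment above: evaluate a predicate on one column, then decode only those pages of another column whose row ranges contain a selected row. `select_rows` and `pages_to_decode` are hypothetical helpers, and the page layout is an assumed example; the sketch does not model how parquet-rs actually reads pages.

```rust
/// Hypothetical helper: evaluates a predicate over the values of the filter
/// column and returns the row positions that matched.
fn select_rows<T>(filter_col: &[T], pred: impl Fn(&T) -> bool) -> Vec<usize> {
    let mut rows = Vec::new();
    for (row, value) in filter_col.iter().enumerate() {
        if pred(value) {
            rows.push(row);
        }
    }
    rows
}

/// Hypothetical helper: given the half-open row range `[start, end)` covered by
/// each page of another column, returns the indices of pages that contain at
/// least one selected row. Only these pages would need to be decoded.
fn pages_to_decode(page_ranges: &[(usize, usize)], selected_rows: &[usize]) -> Vec<usize> {
    let mut pages = Vec::new();
    for (page, &(start, end)) in page_ranges.iter().enumerate() {
        if selected_rows.iter().any(|&row| row >= start && row < end) {
            pages.push(page);
        }
    }
    pages
}

fn main() {
    // Assumed string column used for filtering.
    let filter_col = ["a", "zz", "b", "c", "d", "e"];
    let selected = select_rows(&filter_col, |v| *v == "zz");
    // Assumed layout: the other column has three pages of two rows each.
    let pages = pages_to_decode(&[(0, 2), (2, 4), (4, 6)], &selected);
    println!("selected rows: {:?}, pages to decode: {:?}", selected, pages);
}
```

Here only row 1 matches, so only the first page of the other column would be decoded and the remaining two could be skipped.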

@liukun4515
Contributor Author

I have found that this is already solved in DataFusion: we convert the Parquet statistics to Arrow arrays and apply the filter predicate to them.

Thanks @alamb @tustvold, I will close this issue.
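A simplified, stdlib-only sketch of statistics-based pruning, assuming the per-row-group min/max values have already been decoded to unscaled `i128` decimals. `StatsColumns` and `Predicate` are hypothetical stand-ins for the Arrow arrays and predicate expressions DataFusion actually uses; this is not DataFusion's API.

```rust
/// Hypothetical columnar view of per-row-group statistics for one decimal
/// column: entry `i` holds the min/max unscaled value of row group `i`.
struct StatsColumns {
    mins: Vec<i128>,
    maxes: Vec<i128>,
}

/// Hypothetical, tiny subset of predicates for illustration only.
enum Predicate {
    Gt(i128), // column > value
    Eq(i128), // column = value
}

/// Returns one flag per row group: `false` means no row in that group can
/// satisfy the predicate, so the whole group can be skipped.
fn prune(stats: &StatsColumns, pred: &Predicate) -> Vec<bool> {
    stats
        .mins
        .iter()
        .zip(&stats.maxes)
        .map(|(&min, &max)| match *pred {
            Predicate::Gt(v) => max > v,
            Predicate::Eq(v) => min <= v && v <= max,
        })
        .collect()
}

fn main() {
    // Three row groups of a decimal column with scale 2:
    // [0.00, 0.99], [1.00, 4.99], [5.00, 9.00].
    let stats = StatsColumns { mins: vec![0, 100, 500], maxes: vec![99, 499, 900] };
    println!("{:?}", prune(&stats, &Predicate::Gt(450))); // [false, true, true]
    println!("{:?}", prune(&stats, &Predicate::Eq(50))); // [true, false, false]
}
```

The sketch ignores null counts and missing statistics; a real pruner has to treat a row group without statistics as possibly matching.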

alamb added the parquet label and removed the enhancement label on Aug 4, 2022
3 participants