Columnar encoding of Parquet statistics #9296

@Dandandan

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, when statistics are used, a lot of time can be spent decoding and summarizing them from the ValueStatistics / Statistics structs, which are large and inefficient.
In DataFusion this can sometimes take as much time as running the query itself (or more, if the query can be answered from statistics directly).

Describe the solution you'd like
We should consider decoding the statistics directly into a columnar format (values plus a null bitmap, if needed). This avoids having to convert them later, and may also speed up decoding a bit and reduce memory usage.
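To make the proposed layout concrete, here is a minimal sketch using only the standard library. All names (`StatColumn`, `ColumnarStats`, `may_contain_gt`) are hypothetical and are not part of the parquet crate; it assumes an `i64` column and uses a `Vec<bool>` as a stand-in for a packed null bitmap. A real implementation would presumably append into Arrow arrays while decoding the metadata.

```rust
/// Hypothetical columnar storage for one statistic (e.g. all mins) across
/// row groups: a dense values vector plus a validity mask.
#[derive(Debug, Default, PartialEq)]
pub struct StatColumn {
    pub values: Vec<i64>,
    /// One flag per row group; stand-in for a packed null bitmap.
    pub validity: Vec<bool>,
}

impl StatColumn {
    /// Append one row group's statistic; a missing value stores a
    /// placeholder 0 and clears the validity flag.
    pub fn push(&mut self, value: Option<i64>) {
        self.values.push(value.unwrap_or_default());
        self.validity.push(value.is_some());
    }
}

/// Hypothetical container filled directly during metadata decoding,
/// instead of building one large struct per row group first.
#[derive(Debug, Default, PartialEq)]
pub struct ColumnarStats {
    pub mins: StatColumn,
    pub maxes: StatColumn,
}

impl ColumnarStats {
    /// Record the min/max of the next row group.
    pub fn push(&mut self, min: Option<i64>, max: Option<i64>) {
        self.mins.push(min);
        self.maxes.push(max);
    }
}

/// Example consumer: for a predicate `col > threshold`, return one flag per
/// row group saying whether it may contain matches. Row groups without a
/// max statistic are conservatively kept.
pub fn may_contain_gt(stats: &ColumnarStats, threshold: i64) -> Vec<bool> {
    stats
        .maxes
        .values
        .iter()
        .zip(&stats.maxes.validity)
        .map(|(&max, &valid)| !valid || max > threshold)
        .collect()
}
```

With this layout, a consumer such as a pruning predicate scans contiguous vectors rather than matching on a per-row-group struct, which is the later conversion step this request aims to avoid.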

Describe alternatives you've considered

Additional context

Labels

enhancement, performance
