
Use Parquet statistics for count(*) #618

@Dandandan


Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, computing count(*) requires reading at least one column to determine the number of rows in the table. For sources like Parquet, however, the row count is already recorded in the file metadata, so we can use these statistics instead.
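To illustrate the difference, here is a minimal, self-contained sketch. `RowGroup` and `ParquetFile` are hypothetical stand-ins for Parquet footer metadata (the format records a row count per row group); they are not the `parquet` crate's API. Counting by scan has to materialize a column, while counting from metadata only sums one integer per row group.

```rust
/// Hypothetical stand-in for one Parquet row group. `num_rows` mirrors the
/// row count recorded in the file footer; `column` stands in for the decoded
/// values of a single column (nulls included).
struct RowGroup {
    num_rows: u64,
    column: Vec<Option<i32>>,
}

/// Hypothetical stand-in for a Parquet file: a list of row groups.
struct ParquetFile {
    row_groups: Vec<RowGroup>,
}

/// Current approach: materialize a column and count its entries.
/// count(*) counts rows, so null entries are counted too.
fn count_by_scan(files: &[ParquetFile]) -> u64 {
    files
        .iter()
        .flat_map(|f| f.row_groups.iter())
        .map(|rg| rg.column.len() as u64) // stands in for decoding the column
        .sum()
}

/// Proposed approach: sum the row counts already stored in the metadata,
/// without touching any column data.
fn count_from_metadata(files: &[ParquetFile]) -> u64 {
    files
        .iter()
        .flat_map(|f| f.row_groups.iter())
        .map(|rg| rg.num_rows)
        .sum()
}

/// Two hypothetical files with three row groups in total.
fn sample_files() -> Vec<ParquetFile> {
    vec![
        ParquetFile {
            row_groups: vec![
                RowGroup { num_rows: 3, column: vec![Some(1), None, Some(3)] },
                RowGroup { num_rows: 2, column: vec![Some(4), Some(5)] },
            ],
        },
        ParquetFile {
            row_groups: vec![RowGroup { num_rows: 1, column: vec![None] }],
        },
    ]
}

fn main() {
    let files = sample_files();
    // Both strategies agree; only the metadata path avoids reading column data.
    assert_eq!(count_by_scan(&files), count_from_metadata(&files));
    println!("count(*) = {}", count_from_metadata(&files)); // prints "count(*) = 6"
}
```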

Describe the solution you'd like
Create an optimization rule that uses the statistics to replace the count(*) expression with the statistics-based size of the table (the sum of the row counts).
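A minimal sketch of such a rule, assuming a toy plan representation: `Plan`, `RowCount`, `table_row_count` and `rewrite_count_star` are hypothetical names chosen for illustration, not DataFusion's actual optimizer API. The table size is the sum of the per-file row counts, and the rewrite only fires when that sum is exact.

```rust
/// Hypothetical row-count statistic for a table source.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RowCount {
    /// Known exactly, e.g. summed from Parquet footers.
    Exact(u64),
    /// Missing for at least one file, so it cannot replace a scan.
    Unknown,
}

/// Hypothetical, heavily simplified logical plan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Table scan annotated with the source's row-count statistic.
    Scan { num_rows: RowCount },
    /// count(*) with no GROUP BY over `input`.
    CountStar { input: Box<Plan> },
    /// Constant single-value result.
    Literal(u64),
}

/// Table size = sum of per-file row counts; Unknown if any file lacks one.
/// Summing an iterator of Option<u64> yields None as soon as one item is None.
fn table_row_count(per_file: &[Option<u64>]) -> RowCount {
    per_file
        .iter()
        .copied()
        .sum::<Option<u64>>()
        .map_or(RowCount::Unknown, RowCount::Exact)
}

/// Hypothetical rule: replace count(*) directly over a scan whose row count
/// is exact with a constant. Any other shape, including a scan with unknown
/// statistics, is returned unchanged.
fn rewrite_count_star(plan: Plan) -> Plan {
    match plan {
        Plan::CountStar { input } => match *input {
            Plan::Scan { num_rows: RowCount::Exact(n) } => Plan::Literal(n),
            other => Plan::CountStar { input: Box::new(other) },
        },
        other => other,
    }
}

fn main() {
    let num_rows = table_row_count(&[Some(3), Some(2), Some(1)]);
    let plan = Plan::CountStar { input: Box::new(Plan::Scan { num_rows }) };
    println!("{:?}", rewrite_count_star(plan)); // prints "Literal(6)"
}
```

Requiring an exact statistic is the key design choice here: if any file lacks a row count, or the statistics are only estimates, the rule has to fall back to the normal scan, otherwise count(*) would return a wrong result.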

Describe alternatives you've considered

Additional context

Labels: enhancement (New feature or request), performance (Make DataFusion faster)
