Addition of option to allow empty row groups in pyarrow #2396

@ghost

Description

While our use case is not common, I was able to find one related request from roughly a year ago. Could this be added as a feature?

https://issues.apache.org/jira/browse/PARQUET-1047

Motivation

We have an application where each row is associated with one of N contexts, though a minority of contexts may have no associated rows. When we encounter the nth context, we want to retrieve all of its associated rows. Row groups would provide a natural way to index the data, since the nth context could map directly to the nth row group.

Unfortunately, this is not currently possible, as pyarrow does not support writing empty row groups. If one writes a pyarrow.Table containing zero rows using pyarrow.parquet.ParquetWriter, it is omitted from the final file, which shifts the position of every later row group and breaks the context-to-row-group indexing.
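To make the indexing problem concrete, here is a minimal pure-Python sketch of where each context's rows end up if zero-row tables are dropped as described above. It does not call pyarrow; the function name `row_group_index_by_context` and the sample row counts are hypothetical, and it assumes exactly one row group is written per non-empty context.

```python
from typing import List, Optional


def row_group_index_by_context(rows_per_context: List[int]) -> List[Optional[int]]:
    """Return, for each context, the row group that would hold its rows.

    Assumption (taken from the behaviour described in this issue): a context
    with zero rows produces no row group, so later row groups move up.
    """
    mapping: List[Optional[int]] = []
    next_group = 0
    for n_rows in rows_per_context:
        if n_rows == 0:
            # No row group is written for an empty context.
            mapping.append(None)
        else:
            mapping.append(next_group)
            next_group += 1
    return mapping


# Hypothetical example: five contexts, two of them empty.
rows_per_context = [3, 0, 2, 0, 5]
mapping = row_group_index_by_context(rows_per_context)
print(mapping)  # [0, None, 1, None, 2]
```

With these counts, context 2 lands in row group 1 and context 4 in row group 2, so "nth context equals nth row group" no longer holds. Keeping a mapping like this next to the file is one possible workaround, but it is exactly the extra bookkeeping that an option to write empty row groups would avoid.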
