[C++] Change parquet::arrow::FileReader::ReadRowGroups to read into contiguous arrays #15869

asfimport · 2018-08-19T14:27:27Z

Instead of creating one chunk per RowGroup, we should read into a single, pre-allocated Array, at least for primitive types. This needs some new functionality in the record reader classes and thus should be done after apache/parquet-cpp#462 is merged.
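To make the difference concrete, here is a minimal sketch of the two strategies for a single int64 column. It uses plain `std::vector` rather than any Arrow or parquet-cpp types; `RowGroupValues`, `ReadChunked`, and `ReadContiguous` are hypothetical names for illustration only, and decoding is reduced to a copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the decoded values of one row group of an
// int64 column. This is not a parquet-cpp type.
using RowGroupValues = std::vector<int64_t>;

// Chunk-per-row-group strategy, schematically: every row group produces
// its own separately allocated output chunk.
std::vector<std::vector<int64_t>> ReadChunked(
    const std::vector<RowGroupValues>& row_groups) {
  std::vector<std::vector<int64_t>> chunks;
  chunks.reserve(row_groups.size());
  for (const auto& rg : row_groups) {
    chunks.push_back(rg);  // one chunk per row group
  }
  return chunks;
}

// Contiguous strategy, schematically: sum the row counts first, allocate
// the output once, then write each row group at its running offset.
std::vector<int64_t> ReadContiguous(
    const std::vector<RowGroupValues>& row_groups) {
  std::size_t total = 0;
  for (const auto& rg : row_groups) total += rg.size();

  std::vector<int64_t> out(total);  // single allocation for all row groups
  std::size_t offset = 0;
  for (const auto& rg : row_groups) {
    std::copy(rg.begin(), rg.end(),
              out.begin() + static_cast<std::ptrdiff_t>(offset));
    offset += rg.size();
  }
  assert(offset == total);
  return out;
}
```

The contiguous variant needs the total row count up front, which is why it is only straightforward for fixed-width primitive types.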

Reporter: Uwe Korn / @xhochy
Assignee: Wes McKinney / @wesm

Related issues:

[C++] Basic RowGroup filtering (relates to)

Note: This issue was originally created as ARROW-3774. Please see the migration documentation for further details.

asfimport · 2018-12-13T03:18:14Z

Wes McKinney / @wesm:
I'm not sure this is really useful. Closing.

asfimport · 2018-12-13T03:19:49Z

Wes McKinney / @wesm:
The main use case would be for pandas (where things need to be contiguous), but there memory will have to be copied in general when calling pyarrow.Table.to_pandas, so the benefits of this optimization would be minimal, if any. Producing large contiguous arrays could even be more expensive than the current behavior of creating chunked arrays.
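As a rough illustration of the copy argument, the sketch below flattens a chunked column into one contiguous buffer. It is a simplification that does not model what `pyarrow.Table.to_pandas` does internally; `Chunks` and `ToContiguous` are hypothetical names, not Arrow APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked int64 column. This is not
// arrow::ChunkedArray.
using Chunks = std::vector<std::vector<int64_t>>;

// Copies every chunk into one freshly allocated contiguous buffer, as a
// consumer that requires contiguous memory would have to do.
std::vector<int64_t> ToContiguous(const Chunks& chunks) {
  std::size_t total = 0;
  for (const auto& c : chunks) total += c.size();

  std::vector<int64_t> out;
  out.reserve(total);  // one allocation, sized from the chunk lengths
  for (const auto& c : chunks) {
    out.insert(out.end(), c.begin(), c.end());  // copy this chunk's values
  }
  assert(out.size() == total);
  return out;
}
```

In this sketch the number of values copied is the same whether the input arrives as one chunk or as many, so if the consumer copies regardless, a contiguous read saves little at that stage.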

asfimport closed this as completed Dec 13, 2018

asfimport assigned wesm Jan 10, 2023
