
[C++] Change parquet::arrow::FileReader::ReadRowGroups to read into contiguous arrays #15869

Closed
asfimport opened this issue Aug 19, 2018 · 2 comments

Comments

@asfimport

Instead of creating one chunk per RowGroup, we should, at least for primitive types, read into a single pre-allocated Array. This requires new functionality in the record reader classes and should therefore be done after apache/parquet-cpp#462 is merged.
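To make the proposal concrete, here is a minimal sketch in standard C++ only, assuming a single non-nullable int64 column. `FakeRowGroup`, `DecodeInto`, `ReadAsChunks`, and `ReadContiguous` are hypothetical names for illustration, not parquet-cpp or Arrow APIs; a real implementation would go through the record reader classes. The contiguous variant relies on each row group's row count being known before decoding (Parquet stores it in the row group metadata), so it can allocate once and decode every row group straight into its slice.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one encoded row group of a non-nullable int64
// column. "Decoding" is just a copy here; this is not a parquet-cpp type.
struct FakeRowGroup {
  std::vector<std::int64_t> values;
  // Parquet row group metadata records the row count up front.
  std::size_t num_rows() const { return values.size(); }
};

// Hypothetical decoder: writes all values of `rg` starting at `dest`.
void DecodeInto(const FakeRowGroup& rg, std::int64_t* dest) {
  std::copy(rg.values.begin(), rg.values.end(), dest);
}

// Current behavior: one separately allocated chunk per row group,
// analogous to the chunks of a chunked array.
std::vector<std::vector<std::int64_t>> ReadAsChunks(
    const std::vector<FakeRowGroup>& row_groups) {
  std::vector<std::vector<std::int64_t>> chunks;
  chunks.reserve(row_groups.size());
  for (const auto& rg : row_groups) {
    std::vector<std::int64_t> chunk(rg.num_rows());
    DecodeInto(rg, chunk.data());
    chunks.push_back(std::move(chunk));
  }
  return chunks;
}

// Proposed behavior: sum the row counts, allocate a single buffer once,
// and decode each row group directly into its slice of that buffer.
std::vector<std::int64_t> ReadContiguous(
    const std::vector<FakeRowGroup>& row_groups) {
  std::size_t total_rows = 0;
  for (const auto& rg : row_groups) total_rows += rg.num_rows();
  std::vector<std::int64_t> out(total_rows);
  std::size_t offset = 0;
  for (const auto& rg : row_groups) {
    DecodeInto(rg, out.data() + offset);
    offset += rg.num_rows();
  }
  return out;
}
```

Both variants write every value exactly once; they differ only in the number of allocations and in the final memory layout.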

Reporter: Uwe Korn / @xhochy
Assignee: Wes McKinney / @wesm

Related issues:

Note: This issue was originally created as ARROW-3774. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
I'm not sure this is really useful. Closing.

@asfimport

Wes McKinney / @wesm:
The main use case would be pandas (where data needs to be contiguous), but memory generally has to be copied anyway when calling pyarrow.Table.to_pandas, so the benefit of this optimization would be minimal, if any. Producing large contiguous arrays could even be more expensive than the current behavior of creating chunked arrays.
