
[C++] Enable fine-grained I/O (coalescing) in IPC reader #28430

Closed
asfimport opened this issue May 7, 2021 · 1 comment

asfimport commented May 7, 2021

ARROW-11772 enables I/O coalescing in the IPC reader, but the reader operates at the granularity of an entire record batch: even if you're loading only a few columns, the entire record batch is read. On a high-latency file system (e.g. S3), we may be able to get further performance improvements by traversing the schema and reading only the buffers we actually need. This can be combined with coalescing to reduce the number of I/O calls that need to be made.
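
A minimal standalone sketch of "read only the buffers we need, then coalesce", assuming the byte range of every buffer of every column in a record batch body is already known. `ReadRange`, `ColumnLayout`, `PlanReads` and the `hole_size_limit` knob are hypothetical names for illustration, not the Arrow C++ API. Ranges from the requested columns are sorted and merged whenever the gap between them is small, trading a few wasted bytes for fewer round trips.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: one contiguous byte range inside an IPC record batch body.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical layout: layout[i] holds the byte ranges of every buffer
// (validity, offsets, data, ...) belonging to top-level column i.
using ColumnLayout = std::vector<std::vector<ReadRange>>;

// Collect only the buffers of the requested columns, then merge ranges whose
// gap is at most hole_size_limit bytes, so a high-latency store such as S3
// sees fewer, larger requests.
std::vector<ReadRange> PlanReads(const ColumnLayout& layout,
                                 const std::vector<int>& columns,
                                 int64_t hole_size_limit) {
  assert(hole_size_limit >= 0);
  std::vector<ReadRange> ranges;
  for (int col : columns) {
    for (const ReadRange& r : layout.at(col)) {
      if (r.length > 0) ranges.push_back(r);  // empty buffers need no I/O
    }
  }
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) {
              return a.offset < b.offset;
            });

  std::vector<ReadRange> coalesced;
  for (const ReadRange& r : ranges) {
    if (!coalesced.empty()) {
      ReadRange& last = coalesced.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset - last_end <= hole_size_limit) {
        // Extend the previous request to cover the hole and this buffer.
        last.length = std::max(last_end, r.offset + r.length) - last.offset;
        continue;
      }
    }
    coalesced.push_back(r);
  }
  return coalesced;
}
```

For example, with three columns laid out as `{{0, 8}, {8, 64}}`, `{{72, 8}, {80, 4096}}` and `{{4176, 8}, {4184, 64}}`, calling `PlanReads(layout, {0, 2}, 16)` yields two requests, `{0, 72}` and `{4176, 72}`, skipping the large middle column entirely instead of fetching all 4248 bytes.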

(Maybe there are additional savings here: instead of traversing the schema every time to figure out the buffer layout, we could do that only once up front and then reuse the layout for subsequent batches?)
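
One way the "compute the layout once" idea could look, sketched under simplifying assumptions: the type model below (`Kind`, `Field`) is hypothetical and covers only a few cases, using the per-type buffer counts of the Arrow columnar format (fixed-width: validity + data; binary/utf8: validity + offsets + data; list: validity + offsets, then the child; struct: validity, then the children). Because buffers in a record batch message are listed in a fixed pre-order over the schema, the span of buffer indices for each top-level column is the same for every batch and can be cached; only the per-batch offsets and lengths still have to be read from each batch's metadata.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified type model (not Arrow's DataType).
enum class Kind { kFixedWidth, kBinary, kList, kStruct };

struct Field {
  Kind kind;
  std::vector<Field> children;  // one child for kList, any number for kStruct
};

// Buffers contributed by this node itself, excluding its children.
int OwnBuffers(Kind kind) {
  switch (kind) {
    case Kind::kFixedWidth: return 2;  // validity, data
    case Kind::kBinary:     return 3;  // validity, offsets, data
    case Kind::kList:       return 2;  // validity, offsets
    case Kind::kStruct:     return 1;  // validity
  }
  return 0;
}

// Total buffers for a field and all of its descendants (pre-order traversal).
int CountBuffers(const Field& f) {
  int n = OwnBuffers(f.kind);
  for (const Field& child : f.children) n += CountBuffers(child);
  return n;
}

// Half-open range [begin, end) of indices into a batch's flat buffer list.
struct BufferSpan {
  int begin;
  int end;
};

// Traverse the schema once; the result can be reused for every batch.
std::vector<BufferSpan> ComputeLayout(const std::vector<Field>& schema) {
  std::vector<BufferSpan> spans;
  int next = 0;
  for (const Field& f : schema) {
    const int n = CountBuffers(f);
    spans.push_back({next, next + n});
    next += n;
  }
  return spans;
}

// Per batch: map the selected top-level columns to buffer indices, without
// re-traversing the schema.
std::vector<int> BuffersForColumns(const std::vector<BufferSpan>& layout,
                                   const std::vector<int>& columns) {
  std::vector<int> indices;
  for (int col : columns) {
    const BufferSpan& span = layout.at(col);
    for (int i = span.begin; i < span.end; ++i) indices.push_back(i);
  }
  return indices;
}
```

For a schema of `int32`, `utf8`, `list<int32>` and `struct<int32, utf8>`, the cached spans are `[0, 2)`, `[2, 5)`, `[5, 9)` and `[9, 15)`, so selecting columns 1 and 3 touches only buffers 2 to 4 and 9 to 14.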

ArrayLoader already appears to perform this optimization, but it is handed an in-memory buffer in the first place, so no I/O is actually saved.

Reporter: David Li / @lidavidm
Assignee: Yue Ni / @niyue

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-12683. Please see the migration documentation for further details.
