flight_data_from_arrow_batch sends too much data #28075

Closed
asfimport opened this issue Apr 7, 2021 · 1 comment

Comments

@asfimport

Arrow arrays can share the same backing store, even if the array is just a "view" of a slice of another array.

Yet, when flight_data_from_arrow_batch encodes the arrays into a FlightData, it blindly copies the entire buffer ready to be sent over the wire.

Thus, for example, when DataFusion uses the arrow::compute::limit operator to return a few elements of an array, we still end up with the full (potentially large) array being sent over the wire.


Since encoding the array in a FlightData involves copying the data anyway, it would perhaps be beneficial to take the Array length into consideration and copy only the parts of the buffer that contain actual data.
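
To make the difference concrete, here is a minimal sketch using only the Rust standard library. It does not use the arrow crate: `ArrayView`, `limit`, `encode_full_buffer` and `encode_visible_only` are hypothetical names invented for illustration, and the "encoding" is just a little-endian byte copy rather than the real FlightData format. The sketch assumes a view is described by an offset and a length over a shared buffer.

```rust
use std::sync::Arc;

/// Toy stand-in for an Arrow array: a window onto a shared buffer.
/// Hypothetical type for illustration; this is not the arrow crate's API.
#[derive(Clone)]
struct ArrayView {
    buffer: Arc<Vec<i32>>, // backing store, shared between views
    offset: usize,         // index of the first visible element
    len: usize,            // number of visible elements
}

impl ArrayView {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        ArrayView { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy "limit": same backing buffer, shorter window.
    fn limit(&self, n: usize) -> Self {
        ArrayView {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset,
            len: self.len.min(n),
        }
    }
}

/// Copy a slice of i32 values into little-endian bytes.
fn to_bytes(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// The behavior described in this issue: the whole backing buffer is
/// copied, regardless of how small the view is.
fn encode_full_buffer(view: &ArrayView) -> Vec<u8> {
    to_bytes(&view.buffer[..])
}

/// The suggested behavior: copy only the elements the view exposes.
fn encode_visible_only(view: &ArrayView) -> Vec<u8> {
    to_bytes(&view.buffer[view.offset..view.offset + view.len])
}
```

In this toy model, a 1000-element array limited to 3 elements still produces 4000 bytes with `encode_full_buffer`, but only 12 bytes with `encode_visible_only`. Both functions copy the data once, so the trimmed version adds no extra copy.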

Reporter: Marko Mikulicic

Note: This issue was originally created as ARROW-12265. Please see the migration documentation for further details.

@asfimport

Andrew Lamb / @alamb:
Migrated to github: apache/arrow-rs#208
