ArrayData buffers are inconsistent across implementations #28039

Closed
asfimport opened this issue Apr 6, 2021 · 5 comments

@asfimport

ArrayData implementations seem to share similar structure fields across languages, but the way those fields are used is not consistent across implementations.

Example using ListArray's offsets buffer in the C++, Rust and JavaScript implementations:

 - C++: the offsets buffer is the second buffer (the validity bitmap is the first buffer, and buffers are laid out in a type-dependent way) https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_nested.cc#L189

 - Rust: the offsets buffer is the first buffer (the validity bitmap is not part of the collection, and buffers are laid out in a type-dependent way) https://github.com/apache/arrow/blob/master/rust/arrow/src/array/array_list.rs#L235

 - JavaScript: the offsets buffer is the first buffer (buffers have fixed positions):

(buffer = (buffers as Buffers<T>)[0]) && (this.valueOffsets = buffer);

Note that we have the same inconsistency for validity and data buffers.

This matters for my project because I would like to transport lists of buffers across technologies, and ArrayData seemed the easiest structure to transport.
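
To make the difference concrete, here is a minimal TypeScript sketch of the three conventions described above for a list array. All names (`ListBuffers`, `fromCppLayout`, `fromRustLayout`, `fromJsLayout`) are hypothetical and are not part of any Arrow library. The position of the validity buffer in the JavaScript layout (index 2) is an assumption and should be checked against the library version in use.

```typescript
// Hypothetical helper types and functions; none of these are real Arrow APIs.
type Bytes = Uint8Array;

// Neutral shape that names each buffer instead of relying on its index.
interface ListBuffers {
  validity: Bytes | null;
  offsets: Bytes;
}

// C++ convention as described in the issue: [validity, offsets].
function fromCppLayout(buffers: (Bytes | null)[]): ListBuffers {
  const offsets = buffers[1];
  if (!offsets) throw new Error("missing offsets buffer at index 1");
  return { validity: buffers[0] ?? null, offsets };
}

// Rust convention as described in the issue: [offsets], with the
// validity bitmap stored outside the buffer collection.
function fromRustLayout(buffers: Bytes[], validity: Bytes | null): ListBuffers {
  const offsets = buffers[0];
  if (!offsets) throw new Error("missing offsets buffer at index 0");
  return { validity, offsets };
}

// JavaScript convention as described in the issue: offsets at fixed index 0.
// ASSUMPTION: validity sits at fixed index 2.
function fromJsLayout(
  buffers: (Bytes | null | undefined)[],
  validityIndex = 2,
): ListBuffers {
  const offsets = buffers[0];
  if (!offsets) throw new Error("missing offsets buffer at index 0");
  return { validity: buffers[validityIndex] ?? null, offsets };
}

// Offsets [0, 2, 5] describe two lists of lengths 2 and 3;
// validity 0x03 marks both entries as valid.
const sampleOffsets = new Uint8Array(new Int32Array([0, 2, 5]).buffer);
const sampleValidity = new Uint8Array([0x03]);

const viaCpp = fromCppLayout([sampleValidity, sampleOffsets]);
const viaRust = fromRustLayout([sampleOffsets], sampleValidity);
const viaJs = fromJsLayout([sampleOffsets, undefined, sampleValidity]);
```

All three calls produce the same named structure, which is one way to keep application code independent of each implementation's internal buffer ordering.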

Reporter: Vincent Trumpff

Note: This issue was originally created as ARROW-12223. Please see the migration documentation for further details.

@asfimport

Andrew Lamb / @alamb:
Migrated to github: apache/arrow-rs#207

@asfimport

Vincent Trumpff:
Thanks for pushing on it!

Note that the inconsistency is not only between the Rust implementation and the others, but also between the C++ and JavaScript implementations.

@asfimport

Vincent Trumpff:
Reopening, as we still have a difference between C++ and JavaScript, unless I missed something?

Edit:

If the community agrees with this point, I would be happy to propose a PR on the C++ and JavaScript sources.

@asfimport

Micah Kornfield / @emkornfield:
As far as I know it has not been the goal of the project to have identical implementations of items not specified in the specification (IPC/File and FFI ABI).

@asfimport

Antoine Pitrou / @pitrou:
Late answer, but I don't think this is a bug. If you want to transport data between runtimes, there are two official solutions:

  • the Arrow IPC protocol
  • the Arrow C data interface (only for in-process data sharing, guaranteed zero-copy)
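
Both options rely on the buffer order fixed by the Arrow columnar format specification rather than on each library's internal ArrayData. As a rough, dependency-free illustration, the TypeScript sketch below encodes the order the specification lists for three physical layouts and arranges named buffers accordingly. `SPEC_BUFFER_ORDER`, `toSpecOrder` and the layout keys are hypothetical names, not Arrow API; in practice each library's IPC reader/writer or C data interface import/export does this work.

```typescript
// Hypothetical sketch; not part of any Arrow library.
type BufferName = "validity" | "offsets" | "data";
type Layout = "primitive" | "variableBinary" | "list";

// Buffer order per physical layout as listed in the Arrow columnar
// format specification (only three layouts shown here).
const SPEC_BUFFER_ORDER: Record<Layout, readonly BufferName[]> = {
  primitive: ["validity", "data"],
  variableBinary: ["validity", "offsets", "data"],
  list: ["validity", "offsets"], // list values live in a child array
};

type NamedBuffers = Partial<Record<BufferName, Uint8Array | null>>;

// Arrange named buffers in specification order.
// ASSUMPTION for this sketch: an absent buffer is represented as null.
function toSpecOrder(layout: Layout, named: NamedBuffers): (Uint8Array | null)[] {
  return SPEC_BUFFER_ORDER[layout].map((name) => named[name] ?? null);
}

// A list array with no validity bitmap: slot 0 is null, slot 1 holds offsets.
const listOffsets = new Uint8Array(new Int32Array([0, 2, 5]).buffer);
const specOrdered = toSpecOrder("list", { offsets: listOffsets });
```

Whatever order a given implementation uses internally, data exchanged through either official mechanism follows the specification's order, so the receiving side does not need to know where the buffers came from.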
