[Java] Java implementation of Arrow C data interface #28685

Closed
asfimport opened this issue Jun 4, 2021 · 11 comments

Comments

@asfimport

asfimport commented Jun 4, 2021

As mentioned in ARROW-7272 and in PR #10201, it would be valuable to have the Arrow C data interface implemented in Java, to provide a better inter-process data-sharing facility without depending on third-party serialization protocols.

Reporter: Hongze Zhang / @zhztheplayer
Assignee: Roee Shlomo / @roee88

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-12965. Please see the migration documentation for further details.

@asfimport

Roee Shlomo / @roee88:
FYI, we have a use case for interop between Arrow Java and Rust. I have started implementing the C data interface in https://github.com/roee88/arrow/tree/java-ffi/java (the ffi module).

@asfimport

Antoine Pitrou / @pitrou:
@roee88 Are you still interested to work on this? It would be nice to have the C data interface supported by the Arrow Java implementation at some point.

@asfimport

Roee Shlomo / @roee88:

> Are you still interested to work on this?

Yes I plan to contribute it. Nothing changed in the last two days :)

However, from a testing perspective, it seems the best I can do is some roundtrip tests (as in arrow-rs). A separate effort for C Data Interface integration tests is needed (not just for Java) to fully test such implementations. Are there any plans for this, or an existing design or discussion?

@asfimport

Antoine Pitrou / @pitrou:
Whoops, sorry, for some reason I thought your comment was much older than that :-)

Yes, roundtrip tests are a good start. I agree that integration tests would be desirable; there hasn't been any discussion about them for now.

@asfimport

Antoine Pitrou / @pitrou:
Note we have Python-based tests for an ad-hoc Python-Java bridge, which may give ideas for how to test the C data interface:
https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_jvm.py

@asfimport

Roee Shlomo / @roee88:
We noticed that for UnionArray the C++ implementation includes a validity buffer, although according to the specification it shouldn't be there. The arrow2 Rust implementation uses a workaround (its FFI code for Union explicitly adds an empty validity buffer). To me it sounds like a bug that should be resolved in arrow-cpp instead. @pitrou Should we try to work around it in Java too, or open a bug ticket?

@asfimport

Antoine Pitrou / @pitrou:
Hmm... I think the validity buffer is null, even though the pointer is there. Isn't that the case?

@asfimport

Roee Shlomo / @roee88:
According to the specification it shouldn't be there at all. https://arrow.apache.org/docs/format/Columnar.html#buffer-listing-for-each-layout
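
For illustration only (this is not code from the thread or from Arrow Java): a minimal sketch of the buffer listing in that table, plus a tolerant check of the kind the arrow2 workaround implies. The class `BufferLayouts`, its layout keys, and `leadingBuffersToSkip` are hypothetical names. The assumption that a producer's extra validity slot comes first is mine and is not stated above.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Arrow Java. */
final class BufferLayouts {
    // Buffers per layout, in the order given by the columnar format's
    // "Buffer listing for each layout" table. The keys are made-up labels.
    static final Map<String, List<String>> SPEC = Map.of(
        "primitive", List.of("validity", "data"),
        "variable_binary", List.of("validity", "offsets", "data"),
        "list", List.of("validity", "offsets"),
        "fixed_size_list", List.of("validity"),
        "struct", List.of("validity"),
        "sparse_union", List.of("type_ids"),
        "dense_union", List.of("type_ids", "offsets"),
        "null", List.<String>of());

    /** Number of buffers the specification lists for the given layout. */
    static int expectedBuffers(String layout) {
        List<String> buffers = SPEC.get(layout);
        if (buffers == null) {
            throw new IllegalArgumentException("unknown layout: " + layout);
        }
        return buffers.size();
    }

    /**
     * Returns how many leading buffers an importer would skip to line up
     * with the specification. Assumption: a union producer that adds a
     * validity slot puts it first, so one extra buffer is tolerated for
     * unions only. Any other mismatch is rejected.
     */
    static int leadingBuffersToSkip(String layout, int nBuffers) {
        int expected = expectedBuffers(layout);
        if (nBuffers == expected) {
            return 0;
        }
        if (layout.endsWith("_union") && nBuffers == expected + 1) {
            return 1;
        }
        throw new IllegalStateException(
            layout + ": expected " + expected + " buffers, got " + nBuffers);
    }
}
```

With these definitions, `BufferLayouts.leadingBuffersToSkip("dense_union", 3)` tolerates one extra slot, while the same surplus on `"struct"` is treated as an error. Whether an importer should be this lenient is exactly the question raised above.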

@asfimport

Antoine Pitrou / @pitrou:
Oh, you're right. Can you open a Jira then?

@asfimport

Roee Shlomo / @roee88:
ARROW-14179 opened

@asfimport

Kouhei Sutou / @kou:
Issue resolved by pull request 11067
#11067
