Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Endianness field not emitted in IPC stream #31221

Closed
asfimport opened this issue Feb 24, 2022 · 11 comments
Closed

[Java] Endianness field not emitted in IPC stream #31221

asfimport opened this issue Feb 24, 2022 · 11 comments
Assignees
Milestone

Comments

@asfimport
Copy link
Collaborator

It seems the Java IPC writer implementation does not emit the Endianness information at all (making it Little by default). This complicates interoperability with the C++ IPC reader, which does read this information and acts on it to decide whether it needs to byteswap the incoming data.

Reporter: Antoine Pitrou / @pitrou
Assignee: Kazuaki Ishizaki / @kiszk

PRs and other links:

Note: This issue was originally created as ARROW-15778. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
The offending code seems to be there:
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/types/pojo/Schema.java#L202-L213

This seems reasonably easy to fix (perhaps a one-line fix, though a test should ideally be added as well).

@emkornfield @kiszk

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
Also cc @lidavidm since this affects Flight.

@asfimport
Copy link
Collaborator Author

David Li / @lidavidm:
Flight doesn't do any endianness detection/negotiation anyways (it expects producer/consumer to set appropriate options) though we should eventually fix that.

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
I'm not sure any form of negotiation is needed? The way it works at the IPC level is that the writer emits data in whichever endianness it chooses (also setting the corresponding metadata field to the appropriate value) and the reader decides to byte-swap data is required. So it would work similarly at the Flight level.

@asfimport
Copy link
Collaborator Author

David Li / @lidavidm:
Ah, I misunderstood then, thanks for clarifying.

@asfimport
Copy link
Collaborator Author

Kazuaki Ishizaki / @kiszk:
@pitrou Thank you. In another issue, I am suspecting the endianness in schema. I will look at this.

@asfimport
Copy link
Collaborator Author

Kazuaki Ishizaki / @kiszk:
This fix is already created here. We confirmed this issue was solved by this fix.

The challenge is to create a test case. I realized that the latest arrow requires more native libraries for the build. It takes some time. So, I will create a test case with an older version arrow.

@asfimport
Copy link
Collaborator Author

Ravi Gummadi:
Thanks @kiszk  

@asfimport
Copy link
Collaborator Author

Bryan Cutler / @BryanCutler:
@kiszk since your patch looks to be pretty simple and low risk, what do you think about making a PR for it as is? If it passes current tests, we know it doesn't mess up anything for little endian. Then we could have a separate task to add a proper test with big endian machine, if that requires more effort?

@asfimport
Copy link
Collaborator Author

Kazuaki Ishizaki / @kiszk:
@BryanCutler Thank you for your suggestion. Sure, I will create a simple PR as is, and will open an follow-up issue to add a proper test.

@asfimport
Copy link
Collaborator Author

Bryan Cutler / @BryanCutler:
Issue resolved by pull request 12777
#12777

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants