[Python, Java] UnionArray round trip not working #17700
Comments
Wes McKinney / @wesm: See open PR #987 |
Philipp Moritz / @pcmoritz: If there is anything I can do to speed up us having Dense Union support in Java that interoperates with C++ let me know! |
Philipp Moritz / @pcmoritz: jshell> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
reader ==> org.apache.arrow.vector.ipc.ArrowStreamReader@55cf0d14
jshell> ByteArrayInputStream in = new ByteArrayInputStream(Files.readAllBytes(Paths.get("/Users/pcmoritz/arrow/python/union_array.arrow")));
in ==> java.io.ByteArrayInputStream@3b74ac8
jshell> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
reader ==> org.apache.arrow.vector.ipc.ArrowStreamReader@27adc16e
jshell> reader.loadNextBatch()
| java.lang.IndexOutOfBoundsException thrown:
| at Buffer.checkIndex (Buffer.java:675)
| at HeapByteBuffer.getInt (HeapByteBuffer.java:405)
| at Table.__string (Table.java:50)
| at KeyValue.key (KeyValue.java:21)
| at Field.convertField (Field.java:126)
| at Field.convertField (Field.java:118)
| at Schema.convertSchema (Schema.java:85)
| at MessageSerializer.deserializeSchema (MessageSerializer.java:112)
| at ArrowStreamReader.readSchema (ArrowStreamReader.java:128)
| at ArrowReader.initialize (ArrowReader.java:181)
| at ArrowReader.ensureInitialized (ArrowReader.java:172)
| at ArrowReader.prepareLoadNextBatch (ArrowReader.java:211)
| at ArrowStreamReader.loadNextBatch (ArrowStreamReader.java:103)
| at (#12:1) |
I'm currently working on making pyarrow.serialization data available from the Java side. One problem I ran into is that the Java implementation does not seem to be able to read UnionArrays generated from C++. To make this easily reproducible, I created a clean Python implementation for creating UnionArrays: #1216
The data is generated with the following script:
I attached the file generated by that script. Then when I run the following code in Java:
I get the following error:
It seems like Java is not picking up that the UnionArray is Dense instead of Sparse. After changing the default in java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, I get this:
but then reading doesn't work:
Any help with this is appreciated!
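For readers less familiar with the Dense versus Sparse distinction mentioned above, here is a minimal pure-Python sketch of how the two union layouts resolve a slot to a value, following the Arrow columnar format description. It does not use pyarrow or the Java classes, and the helper names (`read_dense`, `read_sparse`) are made up for illustration. It only shows why a reader that assumes the wrong mode cannot interpret the buffers; it is not a reproduction of the stack trace above.

```python
from typing import Any, List, Sequence


def read_dense(type_ids: Sequence[int],
               offsets: Sequence[int],
               children: Sequence[Sequence[Any]]) -> List[Any]:
    """Dense union: slot i lives at children[type_ids[i]][offsets[i]].

    Each child holds only its own values, so an offsets buffer is required.
    """
    return [children[t][o] for t, o in zip(type_ids, offsets)]


def read_sparse(type_ids: Sequence[int],
                children: Sequence[Sequence[Any]]) -> List[Any]:
    """Sparse union: slot i lives at children[type_ids[i]][i].

    Every child is as long as the union itself; there is no offsets buffer.
    """
    return [children[t][i] for i, t in enumerate(type_ids)]


# Logical values: [1, "a", 2, "b", 3]; child 0 holds ints, child 1 strings.
type_ids = [0, 1, 0, 1, 0]

# Dense layout: compact children plus per-slot offsets into them.
dense_offsets = [0, 0, 1, 1, 2]
dense_children = [[1, 2, 3], ["a", "b"]]

# Sparse layout: children padded to the full union length.
sparse_children = [
    [1, None, 2, None, 3],
    [None, "a", None, "b", None],
]

dense_values = read_dense(type_ids, dense_offsets, dense_children)
sparse_values = read_sparse(type_ids, sparse_children)
```

Both calls recover the same logical values. Passing `dense_children` to `read_sparse` instead walks past the end of the shorter child and raises an `IndexError`, which is the kind of mismatch to expect when the union mode is not carried through from writer to reader.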
Reporter: Philipp Moritz / @pcmoritz
Assignee: Ryan Murray / @rymurr
Related issues:
Original Issue Attachments:
PRs and other links:
Note: This issue was originally created as ARROW-1692. Please see the migration documentation for further details.