ARROW-5224: [Java] Add APIs for supporting directly serialize/deserialize ValueVector #4280
cc @BryanCutler, could you please take a look? Thanks.
I left some comments. I'm still not sure this is a change we want to officially support. @jacques-n or @siddharthteotia, could you comment?
java/vector/src/main/java/org/apache/arrow/vector/util/FieldVectorUtility.java
java/vector/src/test/java/org/apache/arrow/vector/util/TestFieldVectorUtility.java
I should be more specific. There shouldn't be a Java-only serialization format, as that is against the Arrow philosophy. If a new communication protocol is desired, it should get consensus from the broader community (via a mailing list discussion) that we want to support it across all languages.
@emkornfield Thanks very much for your comments. I know this does not conform to the Arrow standard format, as I described in ARROW-5224. The inspiration comes from our usage of the Java API, so we would like to give users another option for implementing this (not in MessageSerializer). I have updated this PR; please take a look. Thanks very much.
Note: https://issues.apache.org/jira/browse/ARROW-300 covers doing compression on buffers; maybe you can propose something there?
@emkornfield Got it, thanks for your kind reminder.
Related to ARROW-5224.
There is no API to directly serialize/deserialize a ValueVector. The only way to do this today is to put a single FieldVector in a VectorSchemaRoot and convert it to an ArrowRecordBatch; deserialization goes through the same steps in reverse.
Providing a utility class for this may be better. I know all serialization should follow the IPC format so that data can be shared between different Arrow implementations. But for users who only use the Java API and want to do further optimization, this should not be a problem, and we could offer them one more option.
This may benefit Java users who only use ValueVector rather than the IPC classes such as ArrowRecordBatch:
We could apply shuffle optimizations, such as compression and encoding algorithms for numerical types, which could greatly improve performance.
Serialize/deserialize using the actual size of the data in each buffer; the allocated buffer size is rounded up to a power of 2, which is typically larger than the data requires.
Reduce data conversion (VectorSchemaRoot, ArrowRecordBatch, etc.) to make the API more user-friendly.
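To make the first two points concrete, here is a minimal sketch using only the JDK, not the Arrow API and not the code in this PR. `TrimmedBufferCodec`, its method names, and the frame layout (a 4-byte length followed by a deflated payload) are all hypothetical and chosen only for illustration. It writes just the used prefix of an over-allocated buffer and compresses it before sending.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper (not part of Arrow): serializes only the used bytes of a buffer. */
final class TrimmedBufferCodec {

  /** Assumed frame layout for this sketch: [int usedLength][deflated payload]. */
  static byte[] serialize(ByteBuffer buffer, int usedLength) {
    // Copy only the used prefix, ignoring the power-of-2 padding behind it.
    byte[] used = new byte[usedLength];
    ByteBuffer view = buffer.duplicate();
    view.clear(); // position = 0, limit = capacity; the content is untouched
    view.get(used, 0, usedLength);

    // Compress the used bytes with the JDK's built-in DEFLATE implementation.
    Deflater deflater = new Deflater();
    deflater.setInput(used);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      out.write(chunk, 0, n);
    }
    deflater.end();

    byte[] payload = out.toByteArray();
    return ByteBuffer.allocate(4 + payload.length).putInt(usedLength).put(payload).array();
  }

  /** Reads a frame produced by serialize and returns a buffer of exactly usedLength bytes. */
  static ByteBuffer deserialize(byte[] frame) {
    int usedLength = ByteBuffer.wrap(frame).getInt();
    byte[] used = new byte[usedLength];
    Inflater inflater = new Inflater();
    inflater.setInput(frame, 4, frame.length - 4);
    try {
      int read = 0;
      while (read < usedLength && !inflater.finished()) {
        int n = inflater.inflate(used, read, usedLength - read);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // no more progress possible: the frame is incomplete
        }
        read += n;
      }
      if (read != usedLength) {
        throw new IllegalArgumentException("truncated frame");
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt frame", e);
    } finally {
      inflater.end();
    }
    return ByteBuffer.wrap(used);
  }

  public static void main(String[] args) {
    // 150 ints need 600 bytes, but the buffer was allocated at the next power of 2.
    ByteBuffer buffer = ByteBuffer.allocate(1024);
    for (int i = 0; i < 150; i++) {
      buffer.putInt(i % 4);
    }
    byte[] frame = serialize(buffer, 600);
    System.out.println("capacity=" + buffer.capacity() + ", frame=" + frame.length);
    System.out.println("restored=" + deserialize(frame).remaining());
  }
}
```

The receiver gets back a buffer sized to the data rather than to the allocation. Whether a format like this belongs in Arrow at all is the question raised in the review comments above; the sketch only shows the size saving being argued for.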