ARROW-363: [Java/C++] integration testing harness, initial integration tests #211
Conversation
@julienledem alright, I found a couple initial issues in the file implementation.
so here is what things look like right now on the C++ side:
and on the java side:
In Java, the metadata size is encoded in the Block component in the footer: https://github.com/apache/arrow/blob/master/format/File.fbs#L36 There are a couple of ways forward.
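For reference, a sketch of the Block struct that the File.fbs link above points at (field names as commonly seen in the Arrow format; treat this as illustrative rather than the authoritative schema of this exact revision):

```
// Sketch of format/File.fbs Block: a fixed-size record in the footer
// that locates one record batch in the file.
struct Block {
  offset: long;          // file offset where the block starts
  metaDataLength: int;   // length of the record batch metadata
  bodyLength: long;      // length of the record batch body
}
```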
The reason we have things as they are right now was so that you could reconstruct a record batch from shared memory having only the offset to the end of the payload. Since we don't have a rigid model for handling IPC metadata (i.e. the …
cc @emkornfield
@julienledem another thing is the record batch metadata itself
I believe this is one of the reasons the metadata is written after the buffers (we need to know the absolute positions of everything before writing out the flatbuffer). The problem with absolute offsets is that the record batches are not relocatable (the metadata would have to be rewritten). This seems like a flaw, so I'm going to change the C++ to use relative offsets, and make the record batch layout conform to the Java implementation as it is right now so we can proceed with the integration tests.
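The relocation argument can be sketched in a few lines. This is not the actual Arrow code: the helper names are hypothetical and 64-byte buffer alignment is an assumption for illustration. The point is that offsets computed relative to the start of the record batch body are invariant under moving the body, so the metadata never needs rewriting.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeOffsets {
  // Round n up to the next multiple of `alignment` (buffers are padded).
  static long align(long n, long alignment) {
    return (n + alignment - 1) / alignment * alignment;
  }

  // Lay out buffers as (offset, length) pairs relative to the body start.
  // Because the offsets are relative, the pairs are identical wherever
  // the body ends up in the file or in shared memory.
  static List<long[]> layout(long[] bufferSizes) {
    List<long[]> out = new ArrayList<>();
    long pos = 0; // relative to the start of the record batch body
    for (long size : bufferSizes) {
      out.add(new long[] { pos, size });
      pos += align(size, 64);
    }
    return out;
  }

  public static void main(String[] args) {
    List<long[]> l = layout(new long[] { 1, 100, 8 });
    System.out.println(l.get(0)[0] + " " + l.get(1)[0] + " " + l.get(2)[0]);
    // prints "0 64 192"
  }
}
```

With absolute offsets, every pair would have to be rewritten whenever the batch moved; with relative ones, only the single base offset in the footer changes.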
@wesm sounds good.
@julienledem I refactored to account for our discussion in ARROW-384. I left a bunch of notes in the …
@julienledem we'll need to make the corresponding changes in the Java format (adding the metadata length prefix to record batches in the file) -- how do you want to proceed?
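A minimal sketch of the length-prefix framing being proposed here. The 4-byte little-endian prefix and all names below are assumptions for illustration, not the spec text: each record batch would be written as a metadata length, then the flatbuffer metadata, then the body, so a reader can skip straight to the body.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LengthPrefix {
  // Frame one record batch as [int32 metadataLength][metadata][body].
  static ByteBuffer frame(byte[] metadata, byte[] body) {
    ByteBuffer buf = ByteBuffer.allocate(4 + metadata.length + body.length)
        .order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(metadata.length); // the prefix lets a reader size the metadata
    buf.put(metadata);
    buf.put(body);
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    ByteBuffer b = frame(new byte[] { 1, 2, 3 }, new byte[] { 9 });
    System.out.println(b.getInt(0)); // prints "3": the metadata length
  }
}
```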
Change-Id: Ifa749bee89654da257a61a6f2f5255abd4ec5a38
Change-Id: I6f98e27f4f106f5919ec1868770866d1e155dd69
Change-Id: Ifce9e8f23d98435b7ea841bdd5de4020d4bdeac3
Change-Id: Ibe6461c0e7673072471e3a613e6d6934759ed6e4
…sion in ARROW-384 Change-Id: Ic119c5647bdb6b1ff1fb31b3712527058c68e40c
Change-Id: I6d77fb2944098e6c8815a3c20871838ab3c62b67
Change-Id: I4da527fb90b66750ddaf54c783414858f2be7150
Change-Id: I057c329090faa0c2eb5fb65f8566c17130dd2a69
This LGTM.
…or now Change-Id: Ib493e982ff6c2d1502212985cec858a1e341c874
Sure, sounds good |
…etadata Change-Id: Ic04ef893957c98ba2747f6ad9cef0e7ebe596958
I finally have a passing minimal integration test:
I suggest merging this and proceeding to write more integration tests in a new patch.
@julienledem this includes some Java changes, if you wouldn't mind taking a look. Thanks!
Change-Id: Id6a5ee0d9c716fcbf92c452b51a249595bc0f90d
Just incremented the metadata version. I'm just using …
…ring test case to simple.json Change-Id: If62adb8cc209868e105f07f276c54140fe366df5
I added a broken string data test case. It fails in C++ with:
Let's address this and the other data type tests in a follow-up JIRA.
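For readers unfamiliar with the integration-test JSON, a string-column test case might look roughly like this. The field names and layout here are assumptions based on later versions of the Arrow integration format, not the actual file from this PR:

```json
{
  "schema": {
    "fields": [
      {"name": "s", "type": {"name": "utf8"}, "nullable": true, "children": []}
    ]
  },
  "batches": [
    {
      "count": 2,
      "columns": [
        {"name": "s", "count": 2,
         "VALIDITY": [1, 1],
         "OFFSET": [0, 2, 5],
         "DATA": ["hi", "foo"]}
      ]
    }
  ]
}
```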
Change-Id: Id0508a1a282738ab0346141b7718a050ac1fb2ba
+1, we can continue to iterate on fine details while writing more tests.
Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#211 from xhochy/PARQUET-819 and squashes the following commits: 4d21407 [Uwe L. Korn] PARQUET-819: Don't try to install no longer existing arrow/utils.h
This also includes format reconciliation as discussed in ARROW-384.