ARROW-697: JAVA Throw exception for record batches > 2GB #597
holdenk wants to merge 3 commits into apache:master
…atches over 2GB in size
I'm still new to the deserializer though, so if I've missed something let me know.
The original problem in ARROW-697 is slightly different. Between 0.2 and 0.3, we changed the record batch lengths to be 64 bits instead of 32: https://github.com/apache/arrow/blob/master/format/Message.fbs#L50. You can see the length being cast to int in my patch here: ced9d76#diff-5b26e7b5a174693cfda85a27e0ade86bR219. If
Ah, that makes sense. From looking through the Message.fbs it seemed like the recordBatchFB.length would have to be less than the message bodyLength size, but the Null type data makes sense. I'll update this PR to add those checks :) |
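To make the failure mode discussed above concrete, here is a minimal sketch of a checked narrowing from a 64-bit length to a Java `int`. The class `LengthChecks` and the helper `checkedInt` are hypothetical names used only for illustration; they are not taken from the patch or from the Arrow code base. The sketch assumes the goal is to raise an `IOException` rather than let a plain `(int)` cast wrap silently.

```java
import java.io.IOException;

public class LengthChecks {
    // Hypothetical helper (not from the Arrow code base): narrow a 64-bit
    // metadata value to int, failing loudly if the value does not fit.
    public static int checkedInt(long value, String what) throws IOException {
        if ((int) value != value) {
            throw new IOException(what + " " + value
                + " does not fit in a Java int (max " + Integer.MAX_VALUE + ")");
        }
        return (int) value;
    }

    public static void main(String[] args) {
        long tooBig = Integer.MAX_VALUE + 1L;

        // A plain cast keeps only the low 32 bits, so the length wraps
        // to a negative number without any error.
        System.out.println("plain cast: " + (int) tooBig);

        try {
            // A value that fits passes through unchanged.
            System.out.println("checked:    " + checkedInt(1024L, "record batch length"));
            // A value past Integer.MAX_VALUE is rejected.
            checkedInt(tooBig, "record batch length");
        } catch (IOException e) {
            System.out.println("rejected:   " + e.getMessage());
        }
    }
}
```

The same helper could be applied to each 64-bit field read from the metadata, such as the record batch length, node length, and null count, before the value is used as an `int`.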
…are larger than Int.MAX_VALUE
can you change the PR title to start with ARROW-697 -- the brackets are fouling up our PR merge tool. Thanks |
Done, thanks :) |
Add a test to verify that we throw a clear error message for record batches over 2GB. This entry point is easiest to test without adding magic bytes to the test suite, since the length is explicit in the input, and the other public entry points for deserialization have the same checks (on values extracted from the metadata).

Author: Holden Karau <holden@us.ibm.com>

Closes apache#597 from holdenk/ARROW-697-java-raise-exception-for-large-batch-size and squashes the following commits:

d2d6b3d [Holden Karau] Merge branch 'master' into ARROW-697-java-raise-exception-for-large-batch-size
d56daab [Holden Karau] Throw IOException if record batch length, node length, or null count are larger than Int.MAX_VALUE
0a96b74 [Holden Karau] Add a test to verify that we throw a clear error message for record batches over 2GB in size
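As a rough illustration of the testing approach described in the commit message, the sketch below exercises an entry point that takes the body length explicitly, so no oversized input has to be constructed. `readBody` is a hypothetical stand-in, not the actual Arrow deserialization method, and the error text is an assumption chosen for the example.

```java
import java.io.IOException;

public class LargeBatchCheck {
    // Hypothetical stand-in for a deserialization entry point whose body
    // length is passed explicitly rather than read from message metadata.
    public static byte[] readBody(long bodyLength) throws IOException {
        if (bodyLength < 0 || bodyLength > Integer.MAX_VALUE) {
            throw new IOException(
                "Cannot deserialize record batches over 2GB (body length " + bodyLength + ")");
        }
        // A real reader would fill this buffer from a channel; the check
        // above runs first, so nothing is allocated for an oversized length.
        return new byte[(int) bodyLength];
    }

    // Returns the IOException message for the given length, or null if none is thrown.
    public static String errorFor(long bodyLength) {
        try {
            readBody(bodyLength);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The oversized length is just a number here; no 2GB buffer is needed.
        String msg = errorFor(Integer.MAX_VALUE + 10L);
        if (msg == null || !msg.contains("2GB")) {
            throw new AssertionError("expected a clear 2GB error, got: " + msg);
        }
        if (errorFor(16L) != null) {
            throw new AssertionError("small batches should be accepted");
        }
        System.out.println("checks passed");
    }
}
```

Because the length is an ordinary argument, the test stays cheap to run; entry points that read the length from metadata would need crafted bytes to reach the same branch.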