Skip to content

ARROW-697: JAVA Throw exception for record batches > 2GB#597

Closed
holdenk wants to merge 3 commits intoapache:masterfrom
holdenk:ARROW-697-java-raise-exception-for-large-batch-size
Closed

ARROW-697: JAVA Throw exception for record batches > 2GB#597
holdenk wants to merge 3 commits intoapache:masterfrom
holdenk:ARROW-697-java-raise-exception-for-large-batch-size

Conversation

@holdenk
Copy link
Contributor

@holdenk holdenk commented Apr 25, 2017

Add a test to verify that we throw a clear error message for record batches over 2GB. This entry point is easist to test without adding some magic bytes to the tests suite since its explicit on the input, and the other public entry points for deserialization have the same checks (just extracted from the metadata).

@holdenk
Copy link
Contributor Author

holdenk commented Apr 25, 2017

I'm still new to the deserializer though so if I've missed something let me know.

@wesm
Copy link
Member

wesm commented Apr 25, 2017

The original problem in ARROW-697 is slightly different. Between 0.2 and 0.3, we changed the record batch lengths to be 64-bits instead of 32:

https://github.com/apache/arrow/blob/master/format/Message.fbs#L50

You can see the length being casted to int in my patch here:

ced9d76#diff-5b26e7b5a174693cfda85a27e0ade86bR219

If recordBatchFB.length() exceeds INT32_MAX, an exception should be thrown there. If we are dealing with Boolean or Null type data, the size of the record batch payload may be less than 2GB. Separately raising on large message bodies like you've done here is a good idea, too (since we will be unable to compute an ArrowBuf slice exceeding 2GB).

@holdenk
Copy link
Contributor Author

holdenk commented Apr 25, 2017

Ah, that makes sense. From looking through the Message.fbs it seemed like the recordBatchFB.length would have to be less than the message bodyLength size, but the Null type data makes sense. I'll update this PR to add those checks :)

@holdenk holdenk changed the title [ARROW-697][JAVA] Throw exception for record batches > 2GB [WIP][ARROW-697][JAVA] Throw exception for record batches > 2GB Apr 25, 2017
@holdenk holdenk changed the title [WIP][ARROW-697][JAVA] Throw exception for record batches > 2GB [ARROW-697][JAVA] Throw exception for record batches > 2GB Apr 26, 2017
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks @holdenk!

@wesm
Copy link
Member

wesm commented Apr 26, 2017

can you change the PR title to start with ARROW-697 -- the brackets are fouling up our PR merge tool. Thanks

@holdenk holdenk changed the title [ARROW-697][JAVA] Throw exception for record batches > 2GB ARROW-697: JAVA Throw exception for record batches > 2GB Apr 26, 2017
@holdenk
Copy link
Contributor Author

holdenk commented Apr 26, 2017

Done, thanks :)

@asfgit asfgit closed this in 8bf61d1 Apr 26, 2017
jeffknupp pushed a commit to jeffknupp/arrow that referenced this pull request Jun 3, 2017
Add a test to verify that we throw a clear error message for record batches over 2GB. This entry point is easist to test without adding some magic bytes to the tests suite since its explicit on the input, and the other public entry points for deserialization have the same checks (just extracted from the metadata).

Author: Holden Karau <holden@us.ibm.com>

Closes apache#597 from holdenk/ARROW-697-java-raise-exception-for-large-batch-size and squashes the following commits:

d2d6b3d [Holden Karau] Merge branch 'master' into ARROW-697-java-raise-exception-for-large-batch-size
d56daab [Holden Karau] Throw IOException if record batch length, node length, or null count are larger than Int.MAX_VALUE
0a96b74 [Holden Karau] Add a test to verify that we throw a clear error message for record batches over 2GB in size
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Add a test to verify that we throw a clear error message for record batches over 2GB. This entry point is easist to test without adding some magic bytes to the tests suite since its explicit on the input, and the other public entry points for deserialization have the same checks (just extracted from the metadata).

Author: Holden Karau <holden@us.ibm.com>

Closes apache#597 from holdenk/ARROW-697-java-raise-exception-for-large-batch-size and squashes the following commits:

d2d6b3d [Holden Karau] Merge branch 'master' into ARROW-697-java-raise-exception-for-large-batch-size
d56daab [Holden Karau] Throw IOException if record batch length, node length, or null count are larger than Int.MAX_VALUE
0a96b74 [Holden Karau] Add a test to verify that we throw a clear error message for record batches over 2GB in size
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…#597)

Bumps
[org.bouncycastle:bcpkix-jdk18on](https://github.com/bcgit/bc-java) from
1.79 to 1.80.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants