
GH-3261: Fix integer overflow in CapacityByteArrayOutputStream#3525

Merged
Fokko merged 1 commit into apache:master from yadavay-amzn:fix/3261-capacity-overflow on May 6, 2026


@yadavay-amzn
Contributor

What changes were made?

Fix an integer overflow in CapacityByteArrayOutputStream.addSlab() that causes an ArithmeticException when writing large ARRAY<STRING> columns (issue #3261).

Root cause (identified by @Kimahriman)

The overflow check in addSlab() used bytesUsed, but write() updates bytesUsed only after addSlab() returns, so the value seen by the check lags behind the bytes actually allocated. In addition, the doubling strategy can make nextSlabSize larger than minimumSize, so checking only bytesUsed + minimumSize was insufficient.

This meant bytesAllocated = Math.addExact(this.bytesAllocated, nextSlabSize) could overflow without being caught by the guard, throwing an uncaught ArithmeticException instead of the intended OutOfMemoryError.
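To make the root cause concrete, here is a minimal, hypothetical Java sketch of the arithmetic. The class and method names (OldGuardDemo, oldGuardPasses, allocationOverflows) and the byte counts are illustrative assumptions, not code or data from Parquet or issue #3261. It also assumes that the doubling strategy proposes a next slab roughly the size of the bytes already used.

```java
// Hypothetical illustration of the old guard; names and numbers are made up
// and do not come from the Parquet code base.
final class OldGuardDemo {

    /** Models the old guard: only bytesUsed + minimumSize is checked. */
    static boolean oldGuardPasses(int bytesUsed, int minimumSize) {
        try {
            Math.addExact(bytesUsed, minimumSize);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    /** Models the later Math.addExact(bytesAllocated, nextSlabSize) update. */
    static boolean allocationOverflows(int bytesAllocated, int nextSlabSize) {
        try {
            Math.addExact(bytesAllocated, nextSlabSize);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // bytesUsed lags behind bytesAllocated because write() updates it
        // only after addSlab() returns.
        int bytesUsed = 1_100_000_000;
        int bytesAllocated = 1_200_000_000;
        int minimumSize = 100;
        // Assumption: the doubling strategy proposes a slab as large as bytesUsed,
        // which is far larger than minimumSize.
        int nextSlabSize = bytesUsed;

        // prints "old guard passes: true"
        System.out.println("old guard passes: " + oldGuardPasses(bytesUsed, minimumSize));
        // prints "addExact overflows: true"
        System.out.println("addExact overflows: " + allocationOverflows(bytesAllocated, nextSlabSize));
    }
}
```

The guard passes while the subsequent Math.addExact call overflows, which is exactly the gap that let the ArithmeticException escape.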

Fix

  1. Use bytesAllocated instead of bytesUsed for the overflow check — bytesAllocated is always up to date when addSlab() is called.
  2. Cap nextSlabSize when bytesAllocated + nextSlabSize would exceed Integer.MAX_VALUE, preventing the uncaught ArithmeticException on the Math.addExact call.
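The two fixes can be sketched as a standalone sizing function. This is a simplified, hypothetical model: SlabSizing and chooseSlabSize are invented names, and proposedSize stands in for whatever the doubling strategy picks. It models only the size arithmetic, not slab allocation, and is not the merged Parquet code.

```java
// Hypothetical sketch of the two fixes; not the actual Parquet implementation.
final class SlabSizing {

    /**
     * Returns the size of the next slab. Throws OutOfMemoryError when even
     * minimumSize cannot be added to bytesAllocated without exceeding
     * Integer.MAX_VALUE. Assumes bytesAllocated is non-negative.
     */
    static int chooseSlabSize(int bytesAllocated, int minimumSize, int proposedSize) {
        // Fix 1: guard on bytesAllocated, which is current when a slab is added.
        try {
            Math.addExact(bytesAllocated, minimumSize);
        } catch (ArithmeticException e) {
            throw new OutOfMemoryError("Size of data exceeded Integer.MAX_VALUE (" + e.getMessage() + ")");
        }
        int nextSlabSize = Math.max(proposedSize, minimumSize);
        // Fix 2: cap the slab so bytesAllocated + nextSlabSize <= Integer.MAX_VALUE.
        int headroom = Integer.MAX_VALUE - bytesAllocated;
        if (nextSlabSize > headroom) {
            nextSlabSize = headroom;
        }
        return nextSlabSize;
    }

    public static void main(String[] args) {
        // Near the limit: the proposed slab would overflow, so it is capped instead.
        int capped = chooseSlabSize(1_200_000_000, 100, 1_100_000_000);
        System.out.println("capped slab size: " + capped); // prints "capped slab size: 947483647"

        // Genuine overflow: even the minimum request does not fit.
        try {
            chooseSlabSize(Integer.MAX_VALUE - 10, 100, 100);
        } catch (OutOfMemoryError expected) {
            System.out.println("OutOfMemoryError: " + expected.getMessage());
        }
    }
}
```

Capping to the remaining headroom keeps a later Math.addExact(bytesAllocated, nextSlabSize) from overflowing. Because the guard has already ensured that bytesAllocated + minimumSize fits, the capped slab is still at least minimumSize bytes.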

Tests

Added TestCapacityByteArrayOutputStreamOverflow with two tests:

  • Verifies that slab allocation near Integer.MAX_VALUE succeeds (previously threw ArithmeticException)
  • Verifies that a true overflow still throws OutOfMemoryError as intended

@yadavay-amzn force-pushed the fix/3261-capacity-overflow branch from 2dab1ed to 29bbc79 on April 22, 2026 01:42
@steveloughran (Contributor) left a comment


LGTM

@yadavay-amzn
Contributor Author

Friendly ping to @wgtmac @Fokko: this is a small overflow fix in CapacityByteArrayOutputStream that @steveloughran has already LGTM'd. Could one of you please take a look when you have a moment? Thanks!

Copilot AI left a comment


Pull request overview

This pull request fixes an integer-overflow edge case in CapacityByteArrayOutputStream.addSlab(int) that could previously surface as an uncaught ArithmeticException when writing very large values (e.g., large ARRAY<STRING> columns), and adds regression tests to validate the corrected behavior.

Changes:

  • Adjust overflow guarding in addSlab() to use bytesAllocated (which is current at slab-allocation time) rather than bytesUsed.
  • Cap the computed nextSlabSize so bytesAllocated + nextSlabSize cannot exceed Integer.MAX_VALUE, preventing Math.addExact from throwing unexpectedly.
  • Add JUnit regression tests covering both the “near max” capping case and the “true overflow” OutOfMemoryError case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Files and descriptions:

  • parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java: fixes the overflow detection and prevents bytesAllocated growth from triggering an uncaught ArithmeticException.
  • parquet-common/src/test/java/org/apache/parquet/bytes/TestCapacityByteArrayOutputStreamOverflow.java: adds regression tests validating slab-size capping near Integer.MAX_VALUE and correct OutOfMemoryError behavior on genuine overflow.


@wgtmac
Member

wgtmac commented May 6, 2026

Thanks @yadavay-amzn for fixing it and @steveloughran for the review! This makes sense to me.

@Fokko changed the title from "PARQUET-3261: Fix integer overflow in CapacityByteArrayOutputStream" to "GH-3261: Fix integer overflow in CapacityByteArrayOutputStream" on May 6, 2026
@Fokko merged commit 26fa353 into apache:master on May 6, 2026
9 checks passed
@Fokko
Contributor

Fokko commented May 6, 2026

Let's move this forward, thanks @yadavay-amzn for working on this, and thanks @steveloughran and @wgtmac for the review 🚀


Labels: None yet
Projects: None yet
5 participants