[FLINK-39754][core] Fix int overflow in DataOutputSerializer.resize #28252

Open
sauliusvl wants to merge 1 commit into apache:master from sauliusvl:FLINK-39754-resize-overflow

@sauliusvl

What is the purpose of the change

Fixes FLINK-39754. DataOutputSerializer.resize() computes buffer.length * 2 in int arithmetic. Once buffer.length exceeds Integer.MAX_VALUE / 2 (~1.07 GB), the doubling overflows to a negative int, so Math.max picks buffer.length + minCapacityAdd instead. Every subsequent resize then grows the buffer by only a handful of bytes rather than doubling, and each call does a full System.arraycopy of the 1+ GB buffer. On large heaps this shows up as a silent O(n²) hang that lasts until buffer.length + minCapacityAdd itself overflows and the existing catch (NegativeArraySizeException) translates it into an IOException.
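The overflow can be reproduced with plain arithmetic, without allocating anything. The snippet below is a minimal sketch that assumes the pre-fix growth rule is `Math.max(length * 2, length + minCapacityAdd)` as described above; the class name `ResizeOverflowDemo` and the method `oldNewLength` are hypothetical and exist only for illustration.

```java
final class ResizeOverflowDemo {

    // Assumed shape of the pre-fix computation: everything stays in int arithmetic.
    static int oldNewLength(int currentLength, int minCapacityAdd) {
        return Math.max(currentLength * 2, currentLength + minCapacityAdd);
    }

    public static void main(String[] args) {
        int currentLength = 1 << 30; // 1_073_741_824, just above Integer.MAX_VALUE / 2
        int minCapacityAdd = 8;

        // The doubling wraps around to a negative value.
        System.out.println(currentLength * 2); // prints -2147483648

        // Math.max therefore falls back to the additive term: growth of only 8 bytes.
        System.out.println(oldNewLength(currentLength, minCapacityAdd)); // prints 1073741832
    }
}
```

Below the threshold the same rule doubles as intended, for example `oldNewLength(1024, 8)` yields 2048.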

Brief change log

  • Extract the size computation from resize(int) into a @VisibleForTesting package-private static helper computeNewBufferLength(int, int).
  • The helper uses long arithmetic, validates the required size against a new MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8 cap (matching java.util.ArrayList), and jumps straight to the cap when doubling would overflow. Serializations that just barely fit under 2 GB therefore still complete instead of grinding through a linear-step resize loop.
  • Remove the now-unreachable catch (NegativeArraySizeException) block from resize. The existing OutOfMemoryError retry path is preserved, because it addresses an independent concern: the doubled size exceeding the available heap.
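The helper described in the change log could look roughly like the following. This is a sketch under stated assumptions rather than the code in this PR: only the name computeNewBufferLength(int, int), the MAX_ARRAY_SIZE value, the use of long arithmetic, and the IOException come from the description above. The enclosing class `BufferGrowthSketch`, the exact branch order, and the exception message are guesses made for illustration.

```java
import java.io.IOException;

final class BufferGrowthSketch {

    // Cap taken from the PR description (the same margin java.util.ArrayList uses).
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Hypothetical reconstruction of the helper; the real implementation may differ.
    static int computeNewBufferLength(int currentLength, int minCapacityAdd) throws IOException {
        // Widen to long before adding so the sum cannot wrap around.
        long required = (long) currentLength + minCapacityAdd;
        if (required > MAX_ARRAY_SIZE) {
            // Assumed message text, not copied from the PR.
            throw new IOException(
                    "Required buffer size " + required + " exceeds the maximum of " + MAX_ARRAY_SIZE);
        }

        long doubled = (long) currentLength * 2;
        if (doubled > MAX_ARRAY_SIZE) {
            // Doubling is no longer possible: take one final step to the cap.
            return MAX_ARRAY_SIZE;
        }

        // Normal case: double, unless the caller needs even more than that.
        return (int) Math.max(doubled, required);
    }
}
```

Under these assumptions, `computeNewBufferLength(16, 4)` returns 32, `computeNewBufferLength(16, 100)` returns 116, and `computeNewBufferLength(1 << 30, 8)` returns MAX_ARRAY_SIZE instead of a buffer that is 8 bytes larger.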

Verifying this change

This change added tests and can be verified as follows:

  • Five pure-arithmetic unit tests on computeNewBufferLength in DataInputOutputSerializerTest covering: normal doubling, minCapacityAdd-dominated growth, jump-to-cap when currentLength * 2 would overflow, exact-cap boundary, and IOException when the required size exceeds the cap. No multi-GB allocations required.
  • Existing DataInputOutputSerializerTest tests continue to pass, confirming the normal write/read paths through resize() are unchanged.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no (DataOutputSerializer is unannotated / internal)
  • The serializers: no (this is the byte-buffer growth path, not record (de)serialization logic)
  • The runtime per-record code paths (performance sensitive): no (the helper runs only on buffer growth, not per record; the buggy linear-step path it replaces is what was previously degrading performance)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no behavior change for serializations < ~1 GB. Serializations that previously hung silently in the O(n²) resize loop near 2 GB now either complete cleanly (one final grow to the cap) or fail with an actionable IOException instead of an opaque NegativeArraySizeException-derived message.
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude (Anthropic, Opus 4.7) via Zed editor

buffer.length * 2 uses int arithmetic and overflows to a negative value once the buffer crosses Integer.MAX_VALUE / 2 (~1.07 GB). Math.max then picks buffer.length + minCapacityAdd, so every subsequent resize grows the buffer by a handful of bytes instead of doubling, doing a full System.arraycopy of the ~1+ GB buffer each call. On large heaps this manifests as a silent O(n^2) hang until buffer.length + minCapacityAdd itself overflows and the existing NegativeArraySizeException catch translates it to an IOException.

Extract the size computation into a package-private static helper that uses long arithmetic, caps at Integer.MAX_VALUE - 8 (matching java.util.ArrayList), and jumps to the cap once doubling would overflow so serializations that just barely fit under 2 GB still complete.
@flinkbot
Collaborator

flinkbot commented May 25, 2026

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
