
[FLINK-32410] Allocate hash-based collections with sufficient capacity for expected size #22841

Merged
StefanRRichter merged 4 commits into apache:master from StefanRRichter:srichter-FLINK-32410-hash-size on Jun 30, 2023

@StefanRRichter
Contributor

What is the purpose of the change

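The JDK API to create hash-based collections with a given capacity is arguably misleading: it does not size the collection to hold the specified number of items, as one might expect. Instead, the argument is used as the initial table capacity (rounded up to a power of two for HashMap), and the collection rehashes once it holds more than load-factor × capacity items, i.e. 75% of the capacity with the default load factor of 0.75.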
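A small model of the behavior described above, written for illustration only (it mirrors, but is not, the JDK source): with the default load factor, HashMap rounds the requested capacity up to a power of two and resizes once the entry count exceeds 75% of that table length. The class and method names are hypothetical.

```java
/**
 * Illustrative model (not JDK source) of how java.util.HashMap interprets
 * its initialCapacity argument when the default load factor of 0.75 is used.
 */
final class HashMapSizingModel {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // HashMap never allocates a table larger than 2^30 slots.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Table length allocated for a requested capacity: next power of two, minimum 1. */
    static int tableSize(int requestedCapacity) {
        int n = 1;
        while (n < requestedCapacity && n < MAXIMUM_CAPACITY) {
            n <<= 1;
        }
        return n;
    }

    /** Number of entries the map holds before the next insert triggers a resize. */
    static int resizeThreshold(int requestedCapacity) {
        return (int) (tableSize(requestedCapacity) * DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {16, 100, 1000}) {
            System.out.printf(
                    "new HashMap<>(%d): table=%d, holds %d entries before resizing%n",
                    requested, tableSize(requested), resizeThreshold(requested));
        }
    }
}
```

For example, the model shows that new HashMap<>(100) allocates a 128-slot table that holds only 96 entries before resizing, so filling it with 100 entries causes a rehash.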

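For the common pattern of allocating a hash-based collection with the number of expected elements to avoid rehashes, this means that a rehash is frequently unavoidable: new HashMap<>(n) rehashes before it holds n entries whenever n exceeds 75% of the power-of-two table size it rounds up to (for example, whenever n is itself a power of two).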
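A worked version of that arithmetic, assuming HashMap's power-of-two rounding and the default load factor of 0.75: c(n) is the table length allocated for new HashMap<>(n), and t(n) is the number of entries it holds before resizing (the symbols are introduced here for illustration).

```math
\begin{aligned}
c(n) &= 2^{\lceil \log_2 n \rceil}, \qquad t(n) = \lfloor 0.75\, c(n) \rfloor, \qquad \text{rehash} \iff n > t(n) \\
n = 16 &: \; c = 16,\; t = 12 < 16 \;\Rightarrow\; \text{rehash} \\
n = 100 &: \; c = 128,\; t = 96 < 100 \;\Rightarrow\; \text{rehash} \\
n = 65 &: \; c = 128,\; t = 96 \ge 65 \;\Rightarrow\; \text{no rehash}
\end{aligned}
```

Whether a rehash happens therefore depends on where n falls relative to the next power of two, which is why the pattern cannot be relied on to avoid one.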

This PR replaces constructor calls that allocate collections for an expected size with helper methods that account for the load factor (similar to Guava's Maps.newHashMapWithExpectedSize(int)).
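A minimal sketch of such helpers, assuming the default load factor of 0.75. The class and method names below are hypothetical illustrations in the style of Guava's helper, not necessarily the names used in the merged Flink code. The idea is to divide the expected size by the load factor and round up, so that the resulting resize threshold is at least the expected size.

```java
import java.util.HashMap;
import java.util.HashSet;

/** Hypothetical helpers modeled on Guava's Maps.newHashMapWithExpectedSize(int). */
final class ExpectedSizeCollections {
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    private ExpectedSizeCollections() {}

    /** Capacity to request so that expectedSize entries fit without exceeding the resize threshold. */
    static int capacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must be >= 0: " + expectedSize);
        }
        // Divide by the load factor and round up; the cast saturates at Integer.MAX_VALUE.
        return (int) Math.ceil(expectedSize / (double) DEFAULT_LOAD_FACTOR);
    }

    static <K, V> HashMap<K, V> newHashMapWithExpectedSize(int expectedSize) {
        return new HashMap<>(capacityFor(expectedSize), DEFAULT_LOAD_FACTOR);
    }

    static <E> HashSet<E> newHashSetWithExpectedSize(int expectedSize) {
        return new HashSet<>(capacityFor(expectedSize), DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        HashMap<Integer, String> map = newHashMapWithExpectedSize(100);
        for (int i = 0; i < 100; i++) {
            map.put(i, "value-" + i);
        }
        // Requesting 134 gives a 256-slot table with threshold 192, so these 100 puts never resize.
        System.out.println(capacityFor(100) + " " + map.size());
    }
}
```

The load factor is passed explicitly so that the capacity computation and the collection agree on it. On JDK 19 and later, HashMap.newHashMap(int) and HashSet.newHashSet(int) provide equivalent sizing in the standard library.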

Verifying this change

This change is already covered by existing tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Collaborator

flinkbot commented Jun 21, 2023

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@StefanRRichter StefanRRichter requested a review from pnowojski June 27, 2023 08:01
@StefanRRichter StefanRRichter force-pushed the srichter-FLINK-32410-hash-size branch from d8f3e1e to 8cf0f15 on June 27, 2023 08:07
Contributor

@pnowojski pnowojski left a comment

LGTM assuming green build/no conflicts

@StefanRRichter StefanRRichter force-pushed the srichter-FLINK-32410-hash-size branch 4 times, most recently from 9535590 to 8211eb8 on June 29, 2023 15:44
@StefanRRichter StefanRRichter force-pushed the srichter-FLINK-32410-hash-size branch from 8211eb8 to 5bc1273 on June 30, 2023 08:05
@StefanRRichter StefanRRichter merged commit 1ef9973 into apache:master Jun 30, 2023

3 participants