Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-32896][Runtime/Coordination] Incorrect Map.computeIfAbsent(..., ...::new) usage which misinterprets key as initial capacity #23518

Merged
merged 1 commit into from
Oct 24, 2023

Conversation

tzy-0x7cf
Copy link

What is the purpose of the change

The are multiple cases in the code which look like this:

map.computeIfAbsent(..., ArrayList::new)

Not only does this create a new collection (here an ArrayList), but computeIfAbsent also passes the map key as argument to the mapping function, so instead of calling the no-args constructor such as new ArrayList<>(), this actually calls the constructor with int initial capacity parameter, such as new ArrayList<>(initialCapacity).

This can lead either to runtime exceptions in case the map key is negative, since the collection constructors reject negative initial capacity values, or it can lead to bad performance if the key (which is misinterpreted as initial capacity) is pretty low, such as 0, or is pretty large and therefore pre-allocates a lot of memory for the collection.

it might be good to replace them with lambda expressions to make this more explicit:

map.computeIfAbsent(..., k -> new ArrayList<>())

Brief change log

  • Replace all map.computeIfAbsent(..., ... ::new) with map.computeIfAbsent(..., k -> new ...)

Verifying this change

Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

This change is already covered by existing tests, such as HsSpillingStrategyUtilsTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@tzy-0x7cf tzy-0x7cf changed the title [FLINK-32896] [Runtime/Coordination] Incorrect Map.computeIfAbsent(..., ...::new) usage which misinterprets key as initial capacity [FLINK-32896][Runtime/Coordination] Incorrect Map.computeIfAbsent(..., ...::new) usage which misinterprets key as initial capacity Oct 12, 2023
@flinkbot
Copy link
Collaborator

flinkbot commented Oct 12, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@tzy-0x7cf
Copy link
Author

@flinkbot run azure

@tzy-0x7cf
Copy link
Author

@wuchong Can you look at it? I think it's a very simple change : )

@itonyli
Copy link

itonyli commented Oct 16, 2023

LGTM

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for picking it up, @tzy-0x7cf and @itonyli for checking it as well. Can you rebase the branch to make CI pass? The error was introduced in master and was fixed already.

@tzy-0x7cf
Copy link
Author

@flinkbot run azure

2 similar comments
@tzy-0x7cf
Copy link
Author

@flinkbot run azure

@tzy-0x7cf
Copy link
Author

@flinkbot run azure

@XComp
Copy link
Contributor

XComp commented Oct 23, 2023

Could squash the commits and rebase the branch to most-recent master? We don't want have merge commits cluttering the git history.

The Flink CI bot is known to have issues with force-pushes. You can try to work around it by pushing an empty commit after you've reorganized and rebased the branch (in a separate push to be on the save side).

@tzy-0x7cf tzy-0x7cf force-pushed the FLINK-32896 branch 2 times, most recently from 1a872e2 to 622108a Compare October 23, 2023 20:55
….., ...::new)` usage which misinterprets key as initial capacity

[FLINK-32896] [Runtime/Coordination] Incorrect `Map.computeIfAbsent(..., ...::new)` usage which misinterprets key as initial capacity
@tzy-0x7cf
Copy link
Author

@flinkbot run azure

2 similar comments
@tzy-0x7cf
Copy link
Author

@flinkbot run azure

@tzy-0x7cf
Copy link
Author

@flinkbot run azure

@XComp XComp merged commit 30e8b3d into apache:master Oct 24, 2023
@tzy-0x7cf
Copy link
Author

Could squash the commits and rebase the branch to most-recent master? We don't want have merge commits cluttering the git history.

The Flink CI bot is known to have issues with force-pushes. You can try to work around it by pushing an empty commit after you've reorganized and rebased the branch (in a separate push to be on the save side).

Thanks Matthias! I'm new to flink and would like to contribute more , so if there's anything about test or simple issues, please feel free to assign them to me!

@XComp
Copy link
Contributor

XComp commented Oct 25, 2023

That's great to hear. You might be lucky if you look for issues that are labeled as "starter" in Flink's Jira. A bit more context around how the Flink community organizes Jira can be found in this Confluence wiki page.

Other sources that might be worth reading (in case you haven't done so, yet):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants