Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-25819][runtime] Reordered requesting and recycling buffers in order to avoid race condition in testIsAvailableOrNotAfterRequestAndRecycleMultiSegments #18865

Merged
merged 3 commits into from Feb 25, 2022

Conversation

akalash
Copy link
Contributor

@akalash akalash commented Feb 21, 2022

What is the purpose of the change

This PR fixes NetworkBufferPoolTest.testIsAvailableOrNotAfterRequestAndRecycleMultiSegments

Brief change log

  • Reordered requesting and recycling buffers in order to avoid race condition in testIsAvailableOrNotAfterRequestAndRecycleMultiSegments
  • *Added new test for 'Insufficient number of network buffers' scenario into NetworkBufferPoolTest *
  • CodeStyle correction for NetworkBufferPoolTest

Verifying this change

It is test

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@akalash
Copy link
Contributor Author

akalash commented Feb 21, 2022

Please pay attention to the last commit where the global test timeout was removed

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 21, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 31fce89 (Mon Feb 21 15:53:54 UTC 2022)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@akalash
Copy link
Contributor Author

akalash commented Feb 22, 2022

@flinkbot run azure


try {
// request another 5 segments
globalPool.requestUnpooledMemorySegments(numberOfSegmentsToRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be good to check that requesting even single one segment should fail to make sure it is empty? Or maybe both requesting one and five?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, perhaps using the one instead of five is good idea

// Starting to requesting next segments only after previous segments were release
// otherwise the race condition between requesting new segments and releasing old ones
// are possible:
asyncRequest.start();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the async request actually make sense here? I'm not sure if the async request adds something relevant to the test case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, isn't the use of ArrayList in a concurrent thread non-thread-safe? (CopyOnWriteArrayList would be a thread-safe alternative.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, right now I see that this test was incorrect. This test supposed that requestUnpooledMemorySegments would be blocked until enough buffers will be available but it is just not true. I don't know it was the initial mistake in the test or the old behavior was different. Anyway I will remove this extra thread that doesn't make sense anymore.

Copy link
Contributor

@smattheis smattheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments in code.

Copy link
Contributor Author

@akalash akalash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smattheis, thanks for the review. I made a couple of changes inside of the current commits. I think it would not be a problem since the commits are small


try {
// request another 5 segments
globalPool.requestUnpooledMemorySegments(numberOfSegmentsToRequest);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, perhaps using the one instead of five is good idea

// Starting to requesting next segments only after previous segments were release
// otherwise the race condition between requesting new segments and releasing old ones
// are possible:
asyncRequest.start();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, right now I see that this test was incorrect. This test supposed that requestUnpooledMemorySegments would be blocked until enough buffers will be available but it is just not true. I don't know it was the initial mistake in the test or the old behavior was different. Anyway I will remove this extra thread that doesn't make sense anymore.

Copy link
Contributor

@smattheis smattheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -757,7 +746,7 @@ public void testBlockingRequestFromMultiLocalBufferPool()
// wait until all available buffers are requested
while (segmentsRequested.size() + segments.size() + exclusiveSegments.size()
< numBuffers) {
Thread.sleep(100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a necessary change? Is it in the right commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary at all. I just notice that 100 ms is too high here considering we have a loop. But perhaps, it is indeed not a good idea to have these changes in this commit. I can remove this change at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can leave it the way it is. Just wanted to check.

@dawidwys dawidwys merged commit 44da40f into apache:master Feb 25, 2022
@dawidwys
Copy link
Contributor

@akalash Do you mind creating a backport for 1.14?

@akalash
Copy link
Contributor Author

akalash commented Feb 25, 2022

@dawidwys, good idea. I will do that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants