Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup #1032

Conversation

patsonluk
Copy link
Contributor

@patsonluk patsonluk commented Sep 21, 2022

https://issues.apache.org/jira/browse/SOLR-16412

Description

Details of the issue is described in the JIRA issue linked above.

Although we could enforce synchronization to prevent threads from purging the same set of child nodes, it might not be desirable to add extra blocking.

Instead, we should be more forgiving SizeLimitedDistributedMap#shrinkIfNeeded if any items/nodes in the map no longer exist.

Solution

Catch KeeperEx1ception.NoNodeException in SizeLimitedDistributedMap#shrinkIfNeeded if such node is already deleted, which is likely triggered by concurrent shrinkIfNeeded call.

Tests

Added unit test case TestSizeLimitedDistributedMap#testConcurrentCleanup, it's shown that without the current fix, the exception will be thrown

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@sonatype-lift
Copy link

sonatype-lift bot commented Oct 19, 2022

⚠️ 312 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

@patsonluk
Copy link
Contributor Author

patsonluk commented Oct 19, 2022

@noblepaul @chatman Can you have a review on this please? 😊

@patsonluk
Copy link
Contributor Author

@risdenk would you mind to take a second look please? Many thanks! 🙇🏼

zookeeper.delete(dir + "/" + child, -1, true);
if (onOverflowObserver != null)
onOverflowObserver.onChildDelete(child.substring(PREFIX.length()));
} catch (KeeperException.NoNodeException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: naming the exception ignore or ignored I think makes IDEs not care that the exception isn't used. It doesn't change the behavior at all just clear we meant to ignore the exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 changed

@patsonluk
Copy link
Contributor Author

Can someone with write access please merge if it's all good? 😊
cc @noblepaul @chatman

@noblepaul noblepaul merged commit babcb9e into apache:main Oct 26, 2022
@noblepaul
Copy link
Contributor

thanks @patsonluk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants