GEODE-8536: Allow limited retries when creating Lucene IndexWriter #5553

Merged
DonalEvans merged 2 commits into apache:develop from feature/GEODE-8536 on Oct 3, 2020


DonalEvans
Contributor

Authored-by: Donal Evans <doevans@vmware.com>

Thank you for submitting a contribution to Apache Geode.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • Is your initial contribution a single, squashed commit?

  • Does gradlew build run cleanly?

  • Have you written or updated unit tests to verify your changes?

  • [N/A] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

Note:

Once the PR is submitted, please check Concourse for build issues and
submit an update to your PR as soon as possible. If you need help, please send an
email to dev@geode.apache.org.

@@ -44,6 +44,7 @@
private static final Logger logger = LogService.getLogger();
public static final String FILE_REGION_LOCK_FOR_BUCKET_ID = "FileRegionLockForBucketId:";
public static final String APACHE_GEODE_INDEX_COMPLETE = "APACHE_GEODE_INDEX_COMPLETE";
protected static final int GET_INDEX_WRITER_MAX_ATTEMPTS = 10;
Contributor

Do you think this number of retries is enough?
Based on the original ticket description, the IOException is thrown because "LuceneEventListener is asynchronously updating the fileAndChunkRegion". Do we know whether the wait is long enough for that update to finish? Is it a problem if IndexWriter creation has to wait longer for the resources to be freed?
Do we know whether "updating the fileAndChunkRegion" usually takes less than a minute to finish, or whether we hit this issue because different threads keep updating the fileAndChunkRegion? If we knew that, we could decide on the number of attempts and whether the waits should use different intervals.
If we do not know the answers to these questions, I think this code change is fine as a fix for the StackOverflowError.

Contributor Author

As I understand it, the timing window in which the IOException can occur is quite small and hard to hit: the problem shows up in only about 1 in 1000 runs of the test I used to diagnose it. If the fileAndChunkRegion were unavailable for a long period, I would expect the issue to reproduce more often. After running some experiments, I was able to increase the number of retries to 200 without any noticeable negative effects. That extends the window during which an IOException must be encountered on every attempt before an exception is thrown to 1 second, which should reduce the chances of hitting it. However, I don't think it's possible to know for certain how long the fileAndChunkRegion might be unavailable, since that could vary with the operation being performed on it, the size of the region, current system resources, etc.
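
For readers following the thread, the bounded-retry idea discussed above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the class `BoundedRetry`, the interface `IoAction`, and the method `retry` are hypothetical names, and the 5 ms pause is an assumption chosen only so that 200 attempts span roughly the 1 second mentioned above.

```java
import java.io.IOException;

// Hypothetical helper illustrating a bounded retry loop; not part of Geode.
final class BoundedRetry {

  // Assumed values for illustration: 200 attempts with a 5 ms pause between
  // them gives a window of about one second.
  static final int MAX_ATTEMPTS = 200;
  static final long SLEEP_MILLIS = 5L;

  // An action that may fail with an IOException, e.g. creating an IndexWriter.
  interface IoAction<T> {
    T run() throws IOException;
  }

  // Runs the action up to maxAttempts times. Returns the first successful
  // result, or rethrows the last IOException once all attempts have failed.
  static <T> T retry(IoAction<T> action, int maxAttempts, long sleepMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    // A loop rather than recursion, so stack depth stays constant no matter
    // how many attempts are needed.
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.run();
      } catch (IOException e) {
        last = e;
        // Pause before the next attempt, but not after the final one.
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }
}
```

A caller would wrap the failing operation, for example `BoundedRetry.retry(() -> createWriter(), BoundedRetry.MAX_ATTEMPTS, BoundedRetry.SLEEP_MILLIS)`, where `createWriter` stands in for whatever call throws the IOException. If the waits should use different intervals, as raised in the review, the fixed `sleepMillis` could be replaced with a value that grows per attempt.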

Authored-by: Donal Evans <doevans@vmware.com>
@DonalEvans DonalEvans merged commit eccd4f0 into apache:develop Oct 3, 2020
@DonalEvans DonalEvans deleted the feature/GEODE-8536 branch October 3, 2020 01:43
onichols-pivotal added a commit that referenced this pull request Oct 3, 2020
DonalEvans pushed a commit that referenced this pull request Oct 3, 2020