GEODE-8536: Allow limited retries when creating Lucene IndexWriter #5553
Conversation
@@ -44,6 +44,7 @@
   private static final Logger logger = LogService.getLogger();
   public static final String FILE_REGION_LOCK_FOR_BUCKET_ID = "FileRegionLockForBucketId:";
   public static final String APACHE_GEODE_INDEX_COMPLETE = "APACHE_GEODE_INDEX_COMPLETE";
+  protected static final int GET_INDEX_WRITER_MAX_ATTEMPTS = 10;
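The added constant suggests a bounded retry loop around IndexWriter creation. A minimal sketch of that pattern is below; this is not Geode's actual implementation, and the method and field names (other than the constant) are hypothetical placeholders:

```java
import java.io.IOException;

public class RetrySketch {
  // Mirrors the constant added in the diff.
  protected static final int GET_INDEX_WRITER_MAX_ATTEMPTS = 10;
  // Assumed wait between attempts; the real interval is not shown in this excerpt.
  static final long RETRY_INTERVAL_MS = 5;

  // Stand-in for the real resource: simulates the fileAndChunkRegion being
  // briefly unavailable by failing twice before succeeding.
  static int failuresRemaining = 2;

  static String openIndexWriter() throws IOException {
    if (failuresRemaining-- > 0) {
      throw new IOException("fileAndChunkRegion busy");
    }
    return "writer";
  }

  // Bounded retry: loop instead of recursing (unbounded recursion risked the
  // StackOverflowError this PR fixes), and rethrow the last IOException once
  // all attempts are exhausted.
  static String getIndexWriterWithRetries() throws IOException, InterruptedException {
    IOException lastException = null;
    for (int attempt = 1; attempt <= GET_INDEX_WRITER_MAX_ATTEMPTS; attempt++) {
      try {
        return openIndexWriter();
      } catch (IOException e) {
        lastException = e;
        Thread.sleep(RETRY_INTERVAL_MS);
      }
    }
    throw lastException;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(getIndexWriterWithRetries());
  }
}
```

The key design point is that the loop converts a potentially unbounded recursive retry into a fixed number of iterations, trading completeness for a predictable failure mode (the original IOException surfaces after the cap).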
Do you think the number of retries is enough?
Based on the original ticket description, the IOException thrown is caused by "LuceneEventListener is asynchronously updating the fileAndChunkRegion". Do we know whether the wait is long enough for the update to finish? Is it a problem if IndexWriter creation needs to wait longer for the resources to be freed?
Do we know that "updating the fileAndChunkRegion" usually takes well under a minute, or did we actually hit this issue because different threads keep updating the fileAndChunkRegion? If so, we could tune the number of attempts and decide whether the waits should use different intervals.
If we do not know the answers to these questions, I think this code change is fine as a fix for the StackOverflowError.
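The reviewer's question about "different intervals" points at a common alternative to a fixed wait: exponential backoff. A hypothetical helper is sketched below; it is not part of this PR, and all names and values are illustrative:

```java
public class BackoffSketch {
  // Hypothetical backoff: double the wait on each failed attempt, capped so a
  // long retry sequence does not produce absurd sleeps. Not Geode code.
  static long backoffMs(int attempt, long baseMs, long capMs) {
    // Clamp the shift amount so the left shift cannot overflow a long.
    long delay = baseMs << Math.min(attempt, 20);
    return Math.min(delay, capMs);
  }

  public static void main(String[] args) {
    // With a 5 ms base and 1000 ms cap, the waits grow 5, 10, 20, ... up to 1000.
    for (int attempt = 0; attempt < 10; attempt++) {
      System.out.println(backoffMs(attempt, 5, 1000));
    }
  }
}
```

Backoff would let the early retries stay cheap while still covering longer outages, at the cost of a less predictable total wait than the fixed-interval loop this PR uses.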
As I understand it, the timing window needed to hit the IOException is quite small and difficult to hit, since the problem only shows up in about 1 in 1000 runs of the test I used to diagnose the issue. If the fileAndChunkRegion were unavailable for a long period of time, I would expect the issue to reproduce more often. After running some experiments, I was able to increase the number of retries to 200 without any noticeable negative effects. That stretches the window during which IOExceptions would have to be consistently encountered before an exception is thrown to about 1 second, which should help reduce the chances of hitting it. However, I don't think it's possible to know for certain how long the fileAndChunkRegion might be unavailable, since that could change based on the operation being performed on it, the size of the region, current system resources, etc.
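The author's numbers imply a per-attempt interval, which in turn gives the unavailability window each retry cap tolerates. The arithmetic below is inferred from the comment (200 attempts spanning roughly 1 second), not confirmed against the source:

```java
public class RetryWindowMath {
  public static void main(String[] args) {
    // 200 retries reportedly span about 1 second, implying roughly
    // 1000 ms / 200 attempts = 5 ms per attempt (inferred, not confirmed).
    long impliedIntervalMs = 1000 / 200;
    System.out.println(impliedIntervalMs);

    // Under that assumption, the default cap of 10 attempts tolerates about
    // 50 ms of continuous unavailability before the IOException is rethrown.
    System.out.println(10 * impliedIntervalMs);
  }
}
```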
Authored-by: Donal Evans <doevans@vmware.com>
Force-pushed from c54594d to 8afd208
Thank you for submitting a contribution to Apache Geode.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [ ] Has your PR been rebased against the latest commit within the target branch (typically develop)?
- [ ] Is your initial contribution a single, squashed commit?
- [ ] Does gradlew build run cleanly?
- [ ] Have you written or updated unit tests to verify your changes?
- [N/A] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Note:
Please ensure that once the PR is submitted, you check Concourse for build issues and
submit an update to your PR as soon as possible. If you need help, please send an
email to dev@geode.apache.org.