GEODE-9881: Oplog not compacted after recovery #7193
Conversation
Force-pushed from ec550cc to e2cf136
 * Verifies that compaction works as expected after region is recovered
 **/
@Test
public void testThatCompactionWorksAfterRegionIsClosedAndThenRecovered()
This test passes even with the code changes in this PR reverted. Would it be possible to add a test that fails with the previous behaviour but passes with the new behaviour?
Thank you for the review! I executed the mentioned test case multiple times with the changes reverted, and it failed every time: the first five Oplogs were never compacted, as expected. It is possible that this test case is flaky in some way, but I cannot reproduce that or see anything suspicious. Could you please send me the logs from a test run where it passes after you revert the changes?
I must have screwed up reverting the changes to Oplog.java when first testing this, because when I check now, the test fails as expected.
/**
 * Verifies that automatic compaction works after cache recovered from oplogs
 */
public class DiskRegionCompactorClearOplogAfterRecoveryJUnitTest {
Per the guidelines in the Geode Wiki, integration test classes should have names ending with "IntegrationTest" rather than "JUnitTest." Could this class be renamed to reflect that please?
kirklund left a comment
I second Donal's request for a test that fails without the change. It would be best to make it a lower-level unit test, but an integration test is OK. Assuming that, I'll approve the PR now. Thanks!
await()
    .untilAsserted(() -> assertTrue(ma.getUsedMemory() > 0));
Please use AssertJ so it prints out a more detailed failure message:

await().untilAsserted(() -> {
  assertThat(ma.getUsedMemory()).isGreaterThan(0);
});
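For reference, the suggested form relies on these static imports (an assumption, since the test's import list isn't shown in the diff):

import static org.assertj.core.api.Assertions.assertThat;
// Geode tests typically route await() through the GeodeAwaitility wrapper,
// which preconfigures the default timeout; plain Awaitility also works:
import static org.apache.geode.test.awaitility.GeodeAwaitility.await;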
Thank you for the review. I have addressed all of your comments.
config.setProperty(MCAST_PORT, "0");
config.setProperty(LOCATORS, "");
These are the default values so specifying them is extraneous. You don't need them.
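In other words, the setup can shrink to an empty Properties object. A minimal sketch, assuming the test builds its cache from these properties:

import java.util.Properties;
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;

// mcast-port defaults to 0 and locators defaults to "", so an empty
// Properties object already produces a standalone (loner) member:
Properties config = new Properties();
Cache cache = new CacheFactory(config).create();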
  cache.close();
} finally {
  DiskStoreImpl.SET_IGNORE_PREALLOCATE = false;
  disconnectAllFromDS();
You don't need to call disconnectAllFromDS. It's a leftover from long ago when cache.close() did not disconnect from DS (it does now).
createDiskStore(30, 10000);
Region<Object, Object> region = createRegion();
DiskStoreImpl diskStore = ((LocalRegion) region).getDiskStore();
Please use or cast to interfaces instead of concrete impls where possible. You can use InternalRegion instead of LocalRegion here. If you change createRegion to return InternalRegion (and perform the cast in that method) then you can avoid the cast here, making it easier to read.
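A sketch of what that refactor could look like. The region shortcut and constants below are assumptions, since createRegion's body isn't shown in the diff, and the getDiskStore() call on InternalRegion follows from the reviewer's suggestion:

private InternalRegion createRegion() {
  RegionFactory<Object, Object> regionFactory =
      cache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT); // shortcut assumed
  regionFactory.setDiskStoreName(DISK_STORE_NAME); // constant assumed
  return (InternalRegion) regionFactory.create(REGION_NAME); // single cast, inside the helper
}

// The call site then reads without a cast:
InternalRegion region = createRegion();
DiskStoreImpl diskStore = region.getDiskStore();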
private void createDiskStoreWithSizeInBytes(String diskStoreName,
    DiskStoreFactory diskStoreFactory,
    long maxOplogSizeInBytes) {
  ((DiskStoreFactoryImpl) diskStoreFactory).setMaxOplogSizeInBytes(maxOplogSizeInBytes);
  ((DiskStoreFactoryImpl) diskStoreFactory).setDiskDirSizesUnit(DiskDirSizesUnit.BYTES);
  diskStoreFactory.create(diskStoreName);
}
I would change this method to require DiskStoreFactoryImpl diskStoreFactory as a parameter just to be a bit cleaner.
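A sketch of the suggested signature change, taken directly from the diff above; with the parameter typed as the impl, the casts disappear:

private void createDiskStoreWithSizeInBytes(String diskStoreName,
    DiskStoreFactoryImpl diskStoreFactory,
    long maxOplogSizeInBytes) {
  diskStoreFactory.setMaxOplogSizeInBytes(maxOplogSizeInBytes);
  diskStoreFactory.setDiskDirSizesUnit(DiskDirSizesUnit.BYTES);
  diskStoreFactory.create(diskStoreName);
}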
Oplogs that are recovered after Region.close() will never be marked as eligible for compaction. This is because the counter unrecoveredRegionCount is not cleared during recovery of the region. When the AbstractDiskRegionInfo object contained in an Oplog is recovered, unrecoveredRegionCount should be decreased. This counter is decreased only if AbstractDiskRegionInfo.unrecovered is set to true for the object that is about to be recovered (see the checkForRecoverableRegion function). The problem is that these objects are deleted in Region.close(DiskRegion) and then recreated during recovery with the unrecovered flag set to false. Because of this, unrecoveredRegionCount for the Oplog is not decreased after the recovery. This issue doesn't happen when the cache/server is restarted, because in that case the counter is cleared.

Solution: During close(), mark all DiskRegion info objects as unrecovered in all Oplogs that are not yet compacted. After the region is recovered, the unrecovered flag is checked and the unrecoveredRegionCount counter will be decreased.
Force-pushed from e2cf136 to 0f1c4d6
Oplogs that are recovered after Region.close() will never be marked as
eligible for compaction. This is because unrecoveredRegionCount
is not cleared from the Oplog during recovery of the region. When a DiskRegion with
the AbstractDiskRegionInfo.unrecovered flag is recovered, unrecoveredRegionCount
should be decreased for the Oplog (see the checkForRecoverableRegion function).
The problem is that the AbstractDiskRegionInfo objects are deleted in
Region.close(DiskRegion) and then recreated during recovery with the
AbstractDiskRegionInfo.unrecovered flag set to false. Because of this, unrecoveredRegionCount
for the Oplog is not decreased during the recovery. This issue doesn't happen when
the cache/server is restarted, because in that case unrecoveredRegionCount is cleared.

Solution:
During close(), create and mark AbstractDiskRegionInfo objects as unrecovered in all Oplogs
that are not yet compacted. This way the AbstractDiskRegionInfo.unrecovered flag will be read
during the recovery and unrecoveredRegionCount will be decreased.
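To make the mechanism concrete, here is an illustrative model of the counter logic described above. The class, field, and method names are simplified stand-ins, not Geode's actual Oplog internals:

import java.util.HashMap;
import java.util.Map;

// Illustrative model only: mirrors the counter behaviour described in the
// commit message, not Geode's real Oplog code.
class OplogCompactionSketch {
  private final Map<Long, RegionInfo> regionInfos = new HashMap<>();
  private int unrecoveredRegionCount;

  private static class RegionInfo {
    boolean unrecovered;
  }

  // Before the fix: close() dropped the info object entirely, so the
  // "unrecovered" flag was lost while the counter still went up.
  void closeRegionBeforeFix(long regionId) {
    regionInfos.remove(regionId);
    unrecoveredRegionCount++;
  }

  // After the fix: close() keeps (or recreates) the info object and marks
  // it unrecovered, so the recovery path can see the flag.
  void closeRegionAfterFix(long regionId) {
    regionInfos.computeIfAbsent(regionId, id -> new RegionInfo()).unrecovered = true;
    unrecoveredRegionCount++;
  }

  // Recovery path (in the spirit of checkForRecoverableRegion): the counter
  // is decremented only when the flag is set. With the pre-fix close(), the
  // recreated info object had unrecovered == false, so the counter stayed
  // above zero and the oplog was never eligible for compaction.
  void recoverRegion(long regionId) {
    RegionInfo info = regionInfos.computeIfAbsent(regionId, id -> new RegionInfo());
    if (info.unrecovered) {
      info.unrecovered = false;
      unrecoveredRegionCount--;
    }
  }

  // Simplified stand-in for the compaction-eligibility check.
  boolean eligibleForCompaction() {
    return unrecoveredRegionCount == 0;
  }
}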
For all changes:
- Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- Has your PR been rebased against the latest commit within the target branch (typically develop)?
- Is your initial contribution a single, squashed commit?
- Does gradlew build run cleanly?
- Have you written or updated unit tests to verify your changes?
- If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?