Skip to content

Conversation

@jvarenina
Copy link
Contributor

@jvarenina jvarenina commented Dec 13, 2021

Oplogs that are recovered after Region.close() will never be marked as
eligible for compaction. This is because unrecoveredRegionCount
is not cleared from Oplog during recovery of the region. When DiskRegion with
AbstractDiskRegionInfo.unrecovered flag is recovered, then unrecoveredRegionCount
should be decreased for the Oplog (check checkForRecoverableRegion function).
Problem here is that AbstractDiskRegionInfo objects are deleted in
Region.close(DiskRegion) and then recreated during recovery with flag
AbstractDiskRegionInfo.unrecovered set to false. Due to this unrecoveredRegionCount
for the Oplog is not decreased during the recovery. This issue doesn't happen when
cache/server is restarted because in that case unrecoveredRegionCount will be cleared.

Solution:
During close() create and mark AbstractDiskRegionInfo object as unrecovered in all Oplog's
that are not yet compacted. This way flag AbstractDiskRegionInfo.unrecovered will be read
during the recovery and unrecoveredRegionCount will be decreased.

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • Is your initial contribution a single, squashed commit?

  • Does gradlew build run cleanly?

  • Have you written or updated unit tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

@jvarenina jvarenina force-pushed the feature/GEODE-9881 branch 2 times, most recently from ec550cc to e2cf136 Compare December 14, 2021 08:25
@jvarenina jvarenina marked this pull request as ready for review December 15, 2021 07:55
* Verifies that compaction works as expected after region is recovered
**/
@Test
public void testThatCompactionWorksAfterRegionIsClosedAndThenRecovered()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes even with the code changes in this PR reverted. Would it be possible to add a test that fails with the previous behaviour but passes with the new behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review! I executed mentioned test case multiple times with reverted changes and test case failed every time. It was always the case that first five Oplogs were not compacted as expected. It is possible that this test case is flaky in some way, but I cannot reproduce it or see anything suspicious. Could you please send me logs from test case execution where it pass after you revert the changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have screwed up reverting the changes to Oplog.java when first testing this, because when I check now, the test fails as expected.

/**
* Verifies that automatic compaction works after cache recovered from oplogs
*/
public class DiskRegionCompactorClearOplogAfterRecoveryJUnitTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the guidelines in the Geode Wiki, integration test classes should have names ending with "IntegrationTest" rather than "JUnitTest." Could this class be renamed to reflect that please?

Copy link
Contributor

@kirklund kirklund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second Donal's request for a test that fails without the change. It would be best to make it a lower level unit test but integration test is ok. Assuming that, I'll approve the PR now. Thanks!

Comment on lines 478 to 479
await()
.untilAsserted(() -> assertTrue(ma.getUsedMemory() > 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use AssertJ so it prints out a more detailed failure message:

await().untilAsserted(() -> {
  assertThat(ma.getUsedMemory()).isGreaterThan(0);
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review. I have applied all of your comments.

Comment on lines 74 to 75
config.setProperty(MCAST_PORT, "0");
config.setProperty(LOCATORS, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the default values so specifying them is extraneous. You don't need them.

cache.close();
} finally {
DiskStoreImpl.SET_IGNORE_PREALLOCATE = false;
disconnectAllFromDS();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to call disconnectAllFromDS. It's a leftover from long ago when cache.close() did not disconnect from DS (it does now).


createDiskStore(30, 10000);
Region<Object, Object> region = createRegion();
DiskStoreImpl diskStore = ((LocalRegion) region).getDiskStore();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use or cast to interfaces instead of concrete impls where possible. You can use InternalRegion instead of LocalRegion here. If you change createRegion to return InternalRegion (and perform the cast in that method) then you can avoid the cast here, making it easier to read.

Comment on lines 183 to 189
private void createDiskStoreWithSizeInBytes(String diskStoreName,
DiskStoreFactory diskStoreFactory,
long maxOplogSizeInBytes) {
((DiskStoreFactoryImpl) diskStoreFactory).setMaxOplogSizeInBytes(maxOplogSizeInBytes);
((DiskStoreFactoryImpl) diskStoreFactory).setDiskDirSizesUnit(DiskDirSizesUnit.BYTES);
diskStoreFactory.create(diskStoreName);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this method to require DiskStoreFactoryImpl diskStoreFactory as a parameter just to be a bit cleaner.

Oplogs that are recovered after Region.close() will never be marked as
eligible for compaction. This is because during recovery of the region
the counter unrecoveredRegionCount is not cleared. When
AbstractDiskRegionInfo object contained in Oplog is recovered then
unrecoveredRegionCount should be decreased. This counter
is decreased only if AbstractDiskRegionInfo.unrecovered
is set to true for the object that is about to be recover (check
checkForRecoverableRegion function). Problem is that
these objects are deleted in Region.close(DiskRegion)
and then recreated during recovery with flag unrecoverd set to false.
Due to this unrecoveredRegionCount for the Oplog is not decreased after
the recovery.This issue doesn't happen when cache/server is restarted
because in that case this counter will be cleared.

Solution:
During close() mark all DiskRegion info objects as unrecovered in all Oplog's
that are not yet compacted. After region is recovered then flag unrecovered
is check and the counter checkForRecoverableRegion will be decreased.
@mkevo mkevo merged commit c0fbe30 into apache:develop Jan 11, 2022
@jvarenina jvarenina deleted the feature/GEODE-9881 branch January 11, 2022 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants