HDDS-6555. Container Deletion should not depend on usedBytes being zero#3276

Merged
errose28 merged 3 commits into apache:master from hanishakoneru:HDDS-6555
Apr 22, 2022

@hanishakoneru
Contributor

What changes were proposed in this pull request?

Container blockCount and usedBytes have not been reliable. HDDS-5359 fixes the issues with how blockCount and usedBytes are updated. HDDS-6234 provides a Container Inspector and Repair tool to fix existing containers with wrong blockCount and usedBytes values in container metadata.

Even after the fix in HDDS-5359, usedBytes cannot be trusted to be an accurate representation of the actual number of bytes in the container. This is because usedBytes is updated in memory first when a chunk is written and then updated in DB during the putBlock call. Also, there could be orphaned chunks in the container which contribute to the usedBytes.
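To make the drift concrete, here is a minimal sketch of the update order described above. It is a simplified model, not the actual datanode code: the class `UsedBytesDrift` and all of its members are hypothetical names, and the "DB" is just a pair of fields.

```java
// Simplified, hypothetical model of the update order described above.
// None of these names come from the Ozone code base.
public class UsedBytesDrift {
    private long inMemoryUsedBytes;   // bumped as soon as a chunk is written
    private long persistedUsedBytes;  // stand-in for the value stored in the container DB
    private long persistedBlockCount; // stand-in for the block count stored in the container DB

    /** Writing a chunk only touches the in-memory counter. */
    void writeChunk(long length) {
        inMemoryUsedBytes += length;
    }

    /** putBlock commits the block and persists the counters. */
    void putBlock() {
        persistedUsedBytes = inMemoryUsedBytes;
        persistedBlockCount++;
    }

    long inMemoryUsedBytes() { return inMemoryUsedBytes; }
    long persistedUsedBytes() { return persistedUsedBytes; }
    long persistedBlockCount() { return persistedBlockCount; }

    public static void main(String[] args) {
        UsedBytesDrift container = new UsedBytesDrift();
        // A chunk is written but putBlock never arrives (an orphaned chunk).
        container.writeChunk(1024);
        System.out.println(container.inMemoryUsedBytes());   // 1024
        System.out.println(container.persistedUsedBytes());  // 0
        System.out.println(container.persistedBlockCount()); // 0
    }
}
```

In this model the container has no committed blocks, yet its in-memory usedBytes is non-zero, which is the kind of mismatch that makes usedBytes a poor signal for emptiness.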

After HDDS-5359, blockCount is reliable for new containers. So SCM should decide whether to delete a container based on blockCount being 0, without checking usedBytes.
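A rough sketch of the SCM-side decision, assuming the previous check required both counters to be zero (as the PR title suggests). `ScmEmptinessCheck`, `ContainerStats`, and both method names are hypothetical and do not mirror the actual ReplicationManager code.

```java
// Hypothetical illustration of the emptiness check; not the real SCM classes.
public class ScmEmptinessCheck {

    /** Minimal stand-in for the counters SCM knows about a container. */
    static final class ContainerStats {
        final long blockCount;
        final long usedBytes;

        ContainerStats(long blockCount, long usedBytes) {
            this.blockCount = blockCount;
            this.usedBytes = usedBytes;
        }
    }

    /** Assumed earlier behavior: both counters had to be zero. */
    static boolean isEmptyBeforeChange(ContainerStats stats) {
        return stats.blockCount == 0 && stats.usedBytes == 0;
    }

    /** Proposed behavior: only blockCount decides, usedBytes is ignored. */
    static boolean isEmptyAfterChange(ContainerStats stats) {
        return stats.blockCount == 0;
    }

    public static void main(String[] args) {
        // No blocks left, but stale or orphaned bytes are still counted.
        ContainerStats stale = new ContainerStats(0, 4096);
        System.out.println(isEmptyBeforeChange(stale)); // false
        System.out.println(isEmptyAfterChange(stale));  // true
    }
}
```

With the old predicate, a container like `stale` would never be deleted; with the new one it becomes eligible for deletion.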

Also, when a DN receives a delete container command from SCM, it should double check that there are no valid blocks in the container before deleting it. This is an extra check on the DN side to avoid deleting a non-empty container.
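The DN-side guard could look roughly like the sketch below. It is an illustration only: `DatanodeDeleteGuard`, `Container`, and `handleDeleteCommand` are hypothetical names, and an in-memory map stands in for the container's block table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the datanode-side double check; not the real handler.
public class DatanodeDeleteGuard {

    /** Minimal stand-in for a container on the datanode. */
    static final class Container {
        // Stand-in for the block table: local block ID -> block length.
        final Map<Long, Long> blocks = new HashMap<>();
        long usedBytes; // deliberately not consulted by the guard
        boolean deleted;
    }

    /** The container is only considered empty if no valid blocks remain. */
    static boolean hasValidBlocks(Container container) {
        return !container.blocks.isEmpty();
    }

    /**
     * Handles a delete command from SCM.
     * Returns true if the container was deleted, false if the command was refused.
     */
    static boolean handleDeleteCommand(Container container) {
        if (hasValidBlocks(container)) {
            // Extra safety check: never delete a non-empty container.
            return false;
        }
        container.deleted = true;
        return true;
    }

    public static void main(String[] args) {
        Container nonEmpty = new Container();
        nonEmpty.blocks.put(1L, 1024L);
        System.out.println(handleDeleteCommand(nonEmpty)); // false

        Container emptyWithStaleBytes = new Container();
        emptyWithStaleBytes.usedBytes = 4096;
        System.out.println(handleDeleteCommand(emptyWithStaleBytes)); // true
    }
}
```

The guard looks at the blocks themselves rather than at either counter, so a wrong blockCount reported to SCM cannot by itself cause data loss in this model.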

What is the link to the Apache JIRA?

(https://issues.apache.org/jira/browse/HDDS-6555)

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests)
(If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)

Contributor

@errose28 errose28 left a comment

Thanks for the improvement @hanishakoneru. I think we should add a unit test to TestReplicationManager to test that deletions are/are not done when bytes and block counts are/are not zero. A modified version of TestReplicationManager#testDeleteCommandTimeout should allow us to do this, although TestReplicationManager#createContainer and HddsTestUtils#getReplicas are setting arbitrary used bytes values by default which may interfere with some cases.

Contributor

@errose28 errose28 left a comment

Thanks for adding the tests @hanishakoneru. One minor comment otherwise LGTM.

@errose28
Contributor

Thanks for the fix @hanishakoneru. Merging since both datanode and SCM side tests have been added to address the comments.

@errose28 errose28 merged commit cb2754f into apache:master Apr 22, 2022