HDDS-7099. Provide a configurable way to cleanup closed unrecoverable container#3657
Xushaohong wants to merge 1 commit into apache:master
Conversation
@DuongNguyen0 can you take a look?
What is the expected behavior if there is only one replica left and it is unhealthy? Unhealthy does not imply that there are no customer-readable keys with data in that replica. Deletion, in general, is an unsafe option, and we need to be sure we do not introduce a data loss scenario.
By default, such containers will be left alone and Ozone will do nothing to them.
Yes, this is for the case where the administrator definitely knows about these containers, may have restored part of the readable keys, and then needs to delete the unrecoverable containers instead of resetting the whole cluster. With the corresponding config enabled, the SCM will send delete commands to the DNs.
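To make the proposed opt-in behavior concrete, here is a minimal sketch (not the actual Ozone code; the class name, flag, and method are hypothetical) of the decision described above: deletion is considered only when the admin has enabled the config and every remaining replica is unhealthy.

```java
import java.util.List;

// Hypothetical sketch of the proposed config-gated cleanup decision.
// None of these names come from the Ozone codebase.
public class UnrecoverableContainerCleanup {

    enum ReplicaState { HEALTHY, UNHEALTHY }

    // Mirrors the proposed admin config: cleanup is off by default.
    private final boolean cleanupEnabled;

    UnrecoverableContainerCleanup(boolean cleanupEnabled) {
        this.cleanupEnabled = cleanupEnabled;
    }

    /**
     * True only when the SCM should send delete commands to the DNs:
     * the admin opted in AND all remaining replicas are unhealthy.
     */
    boolean shouldDelete(List<ReplicaState> replicas) {
        if (!cleanupEnabled || replicas.isEmpty()) {
            return false; // default behavior: leave such containers alone
        }
        return replicas.stream().allMatch(r -> r == ReplicaState.UNHEALTHY);
    }

    public static void main(String[] args) {
        UnrecoverableContainerCleanup on = new UnrecoverableContainerCleanup(true);
        UnrecoverableContainerCleanup off = new UnrecoverableContainerCleanup(false);
        System.out.println(off.shouldDelete(List.of(ReplicaState.UNHEALTHY)));
        System.out.println(on.shouldDelete(List.of(ReplicaState.UNHEALTHY)));
        System.out.println(on.shouldDelete(
            List.of(ReplicaState.HEALTHY, ReplicaState.UNHEALTHY)));
    }
}
```

The key design point is that a single healthy replica anywhere vetoes deletion, which keeps the default path safe against the data-loss concern raised above.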
Hi @Xushaohong, I don't think we want to be deleting containers with all replicas unhealthy automatically, because this will cause divergence between the OM's metadata and the corresponding block storage. If all container replicas are unhealthy, there is no way to recover the containers, and if the admin would like the containers removed, it would be better for the admin to delete the keys with data in those containers. The keys can be found using Recon's REST API. Here you can query an index mapping container ID to keys with blocks in the container from the
@errose28 |
@Xushaohong we discussed this in the weekly open source meeting, and we plan to dive a bit deeper into the overall deletion logic and how this should be operationalized. |
Thanks for raising the issue @Xushaohong. Handling containers where all replicas are in a degenerate state is definitely something the system should improve on. Adding to @kerneltime's response based on other discussions around this issue, it seems the desired solution would be to provide a path to remove containers from the system that have all replicas unhealthy or missing and no keys mapped to them, and that the system should do this automatically without extra configuration. My current understanding of this patch is that it is not doing the two parts in bold. Handling this is going to be a bit involved and may require a design document. I will try to write up some ideas to share out soon.
cc @GeorgeJahad |
@errose28, @kerneltime, I've seen the unrecoverable container condition described in this PR as well, where the unrecoverable container is always reported to be in the 'Closing' state in the SCM while the container is reported as 'Missing' by Recon. I brought this up with @errose28 offline. Attached images:
Thanks @errose28 for the reply; the auto-detection and cleanup is what we need. Currently, neither component alone (OM or SCM) has the mapping of keys to containers. If the service runs on the SCM, it might need another query to the OM to check whether the container still has keys mapped to it.
We need a more complete patch, so closing this one first.


What changes were proposed in this pull request?
Background:
The async write path is still not robust enough; sometimes there will be some unrecoverable containers (no healthy replicas) when the cluster load is too high.
Currently, such an unrecoverable Ratis container goes through the following process:
1. The DN marks the container as unhealthy and reports it to the SCM.
2. The SCM then tries to close the container, and the container state becomes Closing.
3. The DN won't close an unhealthy replica.
4. The SCM ReplicationManager (RM) will not send close commands for those unhealthy containers.
Hence, the unrecoverable container gets stuck in the Closing state.
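The stuck state described above can be illustrated with a toy state machine (a sketch for explanation only, not Ozone source): because a DN only closes healthy replicas, a container whose replicas never recover loops in Closing forever.

```java
// Toy model of the container lifecycle described in the steps above.
// States and transition logic are simplified for illustration.
public class StuckClosingDemo {

    enum ContainerState { OPEN, CLOSING, CLOSED }

    static ContainerState step(ContainerState state, boolean replicaHealthy) {
        switch (state) {
            case OPEN:
                // DN reports the replica unhealthy; SCM reacts by trying to close it.
                return replicaHealthy ? ContainerState.OPEN : ContainerState.CLOSING;
            case CLOSING:
                // A DN only closes a healthy replica, so an unhealthy one stays CLOSING.
                return replicaHealthy ? ContainerState.CLOSED : ContainerState.CLOSING;
            default:
                return state; // CLOSED is terminal here
        }
    }

    public static void main(String[] args) {
        ContainerState s = ContainerState.OPEN;
        for (int i = 0; i < 5; i++) {
            s = step(s, false); // the replica never becomes healthy again
        }
        System.out.println(s); // remains CLOSING, matching the stuck behavior
    }
}
```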
After the admin recovers whatever data is still available in such containers, or simply abandons them, these containers should be closed on purpose.
Under such circumstances, we should provide a configurable way to clean up these closed containers.
After closing the unhealthy container, the unrecoverable container with only unhealthy replicas can be deleted.
This solution can clean up a closed container with any number of replicas (1, 2, or 3).
If all three replicas are unhealthy, the RM deletes one replica to attempt recovery, so the replica count decreases to 2.
If one or two replicas are unhealthy, the RM walks through this PR's logic and can delete the container.
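The two cases above can be sketched as a small decision function (hypothetical names; this is an illustration of the proposed logic, not the patch itself):

```java
// Hypothetical sketch of the replica-count handling described in this PR:
// 3 unhealthy replicas -> RM deletes one replica as a recovery attempt;
// 1 or 2 unhealthy replicas on a closed container -> this PR's cleanup path.
public class ReplicaDecisionSketch {

    enum Action { DELETE_ONE_REPLICA, DELETE_CONTAINER, NONE }

    static Action decide(int unhealthyReplicas, boolean containerClosed) {
        if (unhealthyReplicas >= 3) {
            // Drop to 2 replicas and let recovery be attempted again.
            return Action.DELETE_ONE_REPLICA;
        }
        if (unhealthyReplicas >= 1 && containerClosed) {
            // Cleanup path proposed by this PR for closed containers.
            return Action.DELETE_CONTAINER;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(3, false)); // DELETE_ONE_REPLICA
        System.out.println(decide(2, true));  // DELETE_CONTAINER
        System.out.println(decide(1, false)); // NONE
    }
}
```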
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7099
How was this patch tested?
Unit tests, and verified in a production environment.