HDDS-7099. Provide a configurable way to cleanup closed unrecoverable container#3657
Xushaohong wants to merge 1 commit into apache:master
Conversation
@DuongNguyen0 can you take a look?
What is the expected behavior if there is only one replica left and it is unhealthy? Unhealthy does not imply that there are no customer-readable keys with data in that replica. Deletion, in general, is an unsafe option, and we need to be sure we do not introduce a data loss scenario.
By default, such containers will be left alone and Ozone will do nothing to them.
Yes, this is for the case where the administrator definitely knows about these containers, may have restored part of the readable keys, and then needs to delete the unrecoverable containers instead of resetting the whole cluster. With the corresponding config enabled, the SCM will send delete commands to the DNs.
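To make the proposed opt-in behavior concrete, here is a minimal sketch (not the actual Ozone code; the class name, flag, and method are hypothetical) of the decision described above: deletion is considered only when the admin has enabled the config and every remaining replica is unhealthy.

```java
import java.util.List;

// Hypothetical sketch of the proposed config-gated cleanup decision.
// None of these names come from the Ozone codebase.
public class UnrecoverableContainerCleanup {

    enum ReplicaState { HEALTHY, UNHEALTHY }

    // Mirrors the proposed admin config: cleanup is off by default.
    private final boolean cleanupEnabled;

    UnrecoverableContainerCleanup(boolean cleanupEnabled) {
        this.cleanupEnabled = cleanupEnabled;
    }

    /**
     * True only when the SCM should send delete commands to the DNs:
     * the admin opted in AND all remaining replicas are unhealthy.
     */
    boolean shouldDelete(List<ReplicaState> replicas) {
        if (!cleanupEnabled || replicas.isEmpty()) {
            return false; // default behavior: leave such containers alone
        }
        return replicas.stream().allMatch(r -> r == ReplicaState.UNHEALTHY);
    }

    public static void main(String[] args) {
        UnrecoverableContainerCleanup on = new UnrecoverableContainerCleanup(true);
        UnrecoverableContainerCleanup off = new UnrecoverableContainerCleanup(false);
        System.out.println(off.shouldDelete(List.of(ReplicaState.UNHEALTHY)));
        System.out.println(on.shouldDelete(List.of(ReplicaState.UNHEALTHY)));
        System.out.println(on.shouldDelete(
            List.of(ReplicaState.HEALTHY, ReplicaState.UNHEALTHY)));
    }
}
```

The key design point is that a single healthy replica anywhere vetoes deletion, which keeps the default path safe against the data-loss concern raised above.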
Hi @Xushaohong, I don't think we want to be deleting containers with all replicas unhealthy automatically, because this will cause divergence between the OM's metadata and the corresponding block storage. If all container replicas are unhealthy, there is no way to recover the containers, and if the admin would like the containers removed, it would be better for the admin to delete the keys with data in those containers. The keys can be found using Recon's REST API. Here you can query an index mapping container ID to keys with blocks in the container from the
@errose28 |
@Xushaohong we discussed this in the weekly open source meeting, and we plan to dive a bit deeper into the overall deletion logic and how this should be operationalized. |
Thanks for raising the issue @Xushaohong. Handling containers where all replicas are in a degenerate state is definitely something the system should improve on. Adding to @kerneltime's response based on other discussions around this issue, it seems the desired solution would be to provide a path to remove containers from the system that have all replicas unhealthy or missing and no keys mapped to them, and that the system should do this automatically without extra configuration. My current understanding of this patch is that it is not doing the two parts in bold. Handling this is going to be a bit involved and may require a design document. I will try to write up some ideas to share out soon.
cc @GeorgeJahad |
@errose28, @kerneltime, I've seen the unrecoverable container condition described in this PR as well, where the unrecoverable container is always reported to be in the 'Closing' state in the SCM while the container is reported as 'Missing' by Recon. I brought this up with @errose28 offline. Attached images:
Thanks @errose28 for the reply; the auto-detection and cleanup is what we need. Currently, neither component alone (OM or SCM) has the mapping of keys to containers. If the service runs on the SCM, it might need another query to the OM to check whether the container still has keys mapped to it.
We need a more complete patch, so closing this one first.


What changes were proposed in this pull request?
Background:
The async write path is still not robust enough; sometimes there will be some unrecoverable containers (no healthy replicas) when the cluster load is too high.
Currently, such an unrecoverable Ratis container goes through the following process:
1. The DN marks the container as unhealthy and reports it to the SCM.
2. The SCM then tries to close the container, and the container state becomes Closing.
3. The DN won't close an unhealthy replica.
4. The SCM ReplicationManager (RM) will not send close commands for those unhealthy containers.
Hence, the unrecoverable container gets stuck in the Closing state.
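The stuck state described above can be illustrated with a toy state machine (a sketch for explanation only, not Ozone source): because a DN only closes healthy replicas, a container whose replicas never recover loops in Closing forever.

```java
// Toy model of the container lifecycle described in the steps above.
// States and transition logic are simplified for illustration.
public class StuckClosingDemo {

    enum ContainerState { OPEN, CLOSING, CLOSED }

    static ContainerState step(ContainerState state, boolean replicaHealthy) {
        switch (state) {
            case OPEN:
                // DN reports the replica unhealthy; SCM reacts by trying to close it.
                return replicaHealthy ? ContainerState.OPEN : ContainerState.CLOSING;
            case CLOSING:
                // A DN only closes a healthy replica, so an unhealthy one stays CLOSING.
                return replicaHealthy ? ContainerState.CLOSED : ContainerState.CLOSING;
            default:
                return state; // CLOSED is terminal here
        }
    }

    public static void main(String[] args) {
        ContainerState s = ContainerState.OPEN;
        for (int i = 0; i < 5; i++) {
            s = step(s, false); // the replica never becomes healthy again
        }
        System.out.println(s); // remains CLOSING, matching the stuck behavior
    }
}
```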
After the admin recovers whatever data is still available in such containers, or simply abandons them, these containers should be closed on purpose.
Under such circumstances, we should provide a configurable way to clean up these closed containers.
After closing the unhealthy container, the unrecoverable container with only unhealthy replicas can be deleted.
This solution can clean up a closed container with any number of replicas (1, 2, or 3).
If all three replicas are unhealthy, the RM deletes one replica to attempt recovery, so the replica count decreases to 2.
If one or two replicas are unhealthy, the RM walks through this PR's logic and can delete the container.
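The two cases above can be sketched as a small decision function (hypothetical names; this is an illustration of the proposed logic, not the patch itself):

```java
// Hypothetical sketch of the replica-count handling described in this PR:
// 3 unhealthy replicas -> RM deletes one replica as a recovery attempt;
// 1 or 2 unhealthy replicas on a closed container -> this PR's cleanup path.
public class ReplicaDecisionSketch {

    enum Action { DELETE_ONE_REPLICA, DELETE_CONTAINER, NONE }

    static Action decide(int unhealthyReplicas, boolean containerClosed) {
        if (unhealthyReplicas >= 3) {
            // Drop to 2 replicas and let recovery be attempted again.
            return Action.DELETE_ONE_REPLICA;
        }
        if (unhealthyReplicas >= 1 && containerClosed) {
            // Cleanup path proposed by this PR for closed containers.
            return Action.DELETE_CONTAINER;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(3, false)); // DELETE_ONE_REPLICA
        System.out.println(decide(2, true));  // DELETE_CONTAINER
        System.out.println(decide(1, false)); // NONE
    }
}
```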
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7099
How was this patch tested?
Unit tests, and verified in a production environment.