HDDS-9037. Provide documentation for Decommissioning in Ozone #5146
ChenSammi merged 4 commits into apache:master from
Ozone Manager (OM) decommissioning is the process of gracefully removing one of the OMs from the OM HA ring.
To decommission an OM and remove the node from the OM HA ring, execute the following steps.
1. Stop the OzoneManager process only on the node that is being decommissioned.
**Note -** Do not stop the decommissioning OM if there are only two OMs in the ring, as both OMs would be needed to reach consensus to update the Ratis configuration.
Is stopping the OM a mandatory step before decommissioning?
Yes. Even if we do not stop it, the OM node is stopped automatically once the property `ozone.om.decommissioned.nodes.<omServiceId>` mentioned in step 2 is added.
If stopping the OM is a mandatory step, then it somewhat conflicts with the note: "Do not stop the decommissioning OM if there are only two OMs in the ring as both the OMs would be needed to reach consensus to update the Ratis configuration".
If changing the property `ozone.om.decommissioned.nodes.` causes the OM to stop automatically, then it seems impossible to decommission an OM when only two OM instances remain.
I'm a little confused on this point. Could you explain further?
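The two-OM concern in the comment above comes down to Raft majority arithmetic (Apache Ratis implements Raft). The sketch below is a simplified model, not the Ratis API: it assumes a membership change commits only if a majority of the current configuration, including the OM being removed, is still running. The helpers `majority` and `can_commit_removal` are hypothetical names used only for illustration.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of voting peers that forms a Raft majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def can_commit_removal(cluster_size: int, running_peers: int) -> bool:
    """Hypothetical helper: can a removal commit under the old configuration?

    Simplified model (assumption): the change commits only if a majority of
    the current configuration, including the node being removed, is running.
    """
    return running_peers >= majority(cluster_size)


for size in (2, 3, 5):
    # Model stopping the OM being decommissioned before its removal commits.
    ok = can_commit_removal(size, size - 1)
    print(f"{size} OMs: majority={majority(size)}, removal after stopping one OM: {ok}")
```

Under this model, with two OMs the majority is 2, so stopping one of them first leaves the remaining OM unable to commit the configuration change; with three or more OMs, stopping one still leaves a majority.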
@aryangupta1998, please remove step 1, as it is not a mandatory step for upstream Ozone.
Removed Step 1.
You can obtain the `nodeid` by running the command `ozone admin scm roles`.
**Note -** If you want to decommission a **primordial** SCM, first change the primordial SCM to another node.
Why do we suggest changing the primordial node configuration? AFAIK, the primordial SCM configuration cannot be changed once the cluster has been initialized and has been running for some time.
We can't decommission a primordial SCM directly, so we make another SCM host the primordial SCM node by updating the property `ozone.scm.primordial.node.id`, and then decommission the original primordial SCM. The same is mentioned here.
Is this a limitation imposed by CM? I don't think the community version has such a limitation. You can refer to the `scm-decommission.robot` test.
Hi @nandakumar131, could you please take a look at this statement?
Hi @aryangupta1998, could you create a follow-up JIRA for the Chinese version of this decommissioning document?
Sure.
sodonnel left a comment
LGTM from the Datanode decommissioning perspective. Once you reach a conclusion on the other points, we can commit.
ChenSammi left a comment
Thanks @aryangupta1998, the latest patch LGTM +1.
What changes were proposed in this pull request?
This PR adds documentation for decommissioning OM, SCM, and Datanode.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9037
How was this patch tested?
Tested manually.