HDDS-9037. Provide documentation for Decommissioning in Ozone #5146

Merged
ChenSammi merged 4 commits into apache:master from aryangupta1998:HDDS-9037 on Aug 30, 2023

@aryangupta1998
Contributor

What changes were proposed in this pull request?

This PR provides the documentation for decommissioning of OM, SCM and Datanode.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-9037

How was this patch tested?

Tested manually.

Ozone Manager (OM) decommissioning is the process in which you gracefully remove one of the OM from the OM HA Ring.

To decommission an OM and remove the node from the OM HA ring, the following steps need to be executed.
1. Stop the OzoneManager process only on the node which needs to be decommissioned. <p> **Note -** Do not stop the decommissioning OM if there are

Contributor

Is stopping the OM a required step before decommissioning?

Contributor Author

Yes. Even if we don't stop it, the OM node will be stopped automatically after the property ozone.om.decommissioned.nodes.<omServiceId> mentioned in step 2 is added.

Contributor

If stopping the OM is a required step, then it conflicts with the note: "Do not stop the decommissioning OM if there are only two OMs in the ring as both the OMs would be needed to reach consensus to update the Ratis configuration".
If changing the property ozone.om.decommissioned.nodes. causes the OM to stop automatically, then it looks like it is not feasible to decommission one OM when only two OM instances are left.
I'm a little confused on this point. Could you explain further?
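
To illustrate the quorum concern raised in this comment, here is a minimal Python sketch. It assumes that a Ratis configuration change needs a strict majority of the current ring members to agree (the usual Raft rule). The function names `quorum_size` and `can_commit_config_change` are hypothetical and are not part of Ozone or Ratis.

```python
def quorum_size(ring_size: int) -> int:
    """Smallest number of members that forms a strict majority of the ring."""
    if ring_size < 1:
        raise ValueError("ring_size must be at least 1")
    return ring_size // 2 + 1


def can_commit_config_change(ring_size: int, running: int) -> bool:
    """True if enough members are running to agree on a configuration change.

    Assumption: the change must be acknowledged by a majority of the
    current ring, including the member that is about to be removed.
    """
    return running >= quorum_size(ring_size)


if __name__ == "__main__":
    for ring_size in (2, 3):
        # Model stopping the OM that is being decommissioned before the change.
        running_after_stop = ring_size - 1
        ok = can_commit_config_change(ring_size, running_after_stop)
        print(f"ring={ring_size} quorum={quorum_size(ring_size)} "
              f"running={running_after_stop} can_commit={ok}")
```

Under that assumption, a two-OM ring has a quorum of two, so stopping one OM first leaves the ring unable to commit the configuration change, while a three-OM ring can still proceed with one OM stopped.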

Contributor

@aryangupta1998, please remove step 1, as it is not a required step for upstream Ozone.

Contributor Author

Removed Step 1.

You can obtain the 'nodeid' by executing this command, **"ozone admin scm roles"**

**Note -** If you want to decommission a **primordial** scm, first change the primordial scm.

Contributor

Why do we suggest changing the primordial node configuration? AFAIK, the primordial SCM configuration cannot be changed after the cluster has been initialized and has been running for a period of time.

Contributor Author

We can't directly decommission a primordial SCM, so we make another SCM host the primordial SCM node by updating the property ozone.scm.primordial.node.id, and then decommission the original primordial SCM. The same is mentioned here.

Contributor

@ChenSammi, Aug 23, 2023

Is this a limitation imposed by CM? In the community version, I think there is no such limitation. You can refer to the scm-decommission.robot test.

Contributor

@ChenSammi, Aug 29, 2023

Hi @nandakumar131, could you please take a look at this statement?

@ChenSammi
Contributor

Hi @aryangupta1998, could you help create a follow-up JIRA for a Chinese version of this decommissioning document?

@aryangupta1998
Contributor Author

aryangupta1998 commented Aug 16, 2023

> Hi @aryangupta1998, could you help create a follow-up JIRA for a Chinese version of this decommissioning document?

Sure.
Update: Created HDDS-9181 for documentation of Ozone decommissioning in Mandarin.

Contributor

@sodonnel left a comment

LGTM from the Datanode Decommissioning perspective. Once you reach a conclusion on the other points we can commit.

Contributor

@ChenSammi left a comment

Thanks @aryangupta1998, the latest patch LGTM +1.

@ChenSammi merged commit 5242302 into apache:master on Aug 30, 2023