HDDS-14501. [Website v2] [Docs] [Administrator Guide] Replacing Datanode Disks #285
What changes were proposed in this pull request?
https://ozone-site-v2.staged.apache.org/docs/administrator-guide/operations/disk-replacement/datanodes
A datanode may have multiple data volumes, specified in hdds.datanode.dir. For example,
/data1,/data2,/data3
hdds.datanode.failed.data.volumes.tolerated: The number of data volumes that are allowed to fail before a datanode stops offering service. By default, this value is -1, meaning unlimited.
Similarly, hdds.datanode.failed.metadata.volumes.tolerated allows a number of metadata volumes to fail.
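The tolerance rule described above can be sketched as a small decision function. This is only an illustrative model, not Ozone's implementation: the function name `datanode_keeps_serving` is hypothetical, and the boundary behavior (the datanode keeps serving while the failed count is less than or equal to the tolerated value) is an assumption read from the wording of the description. Any additional requirement, such as needing at least one healthy volume, is not modeled here.

```python
# Illustrative model of the "failed volumes tolerated" rule.
# All names here are hypothetical; this is not Ozone code.

UNLIMITED = -1  # the default value given in the description


def datanode_keeps_serving(failed_volumes: int, tolerated: int = UNLIMITED) -> bool:
    """Return True if the datanode may continue with this many failed volumes.

    Assumption: the datanode keeps serving while failed_volumes <= tolerated,
    and a tolerated value of -1 places no limit on failures.
    """
    if failed_volumes < 0:
        raise ValueError("failed_volumes must be non-negative")
    if tolerated == UNLIMITED:
        return True
    return failed_volumes <= tolerated
```

For example, under these assumptions `datanode_keeps_serving(2, tolerated=1)` returns `False`, while the default setting never stops the datanode because of the failure count alone.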
During startup, the datanode checks each volume to determine whether it has failed. If the datanode is allowed to continue without aborting, the failed volume is taken out of service. After the datanode starts, a periodic disk check runs every 60 minutes (determined by the configuration property hdds.datanode.periodic.disk.check.interval.minutes).
Once a volume is determined to have failed, it is no longer chosen by the volume choosing policy to allocate new containers.
To replace failed disks, shut down the datanode, update hdds.datanode.dir to remove the failed volumes from the directory list, and then restart the datanode.
Note: The Ozone datanode does not support hot swap yet, so the datanode process must be restarted for a change to the disk list to take effect.
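The configuration edit in the replacement procedure amounts to dropping entries from a comma-separated list. A minimal sketch of that step follows, assuming the hdds.datanode.dir value is a plain comma-separated string as in the example above. The helper name `remove_failed_dirs` is hypothetical, and the script only computes the new value; stopping the datanode, editing the configuration file, and restarting remain manual steps.

```python
# Hypothetical helper: compute a new hdds.datanode.dir value with the
# failed directories removed. It does not touch any configuration file.
from typing import Iterable


def remove_failed_dirs(datanode_dir: str, failed: Iterable[str]) -> str:
    """Return the comma-separated directory list without the failed entries."""
    failed_set = set(failed)
    # Split on commas and ignore empty entries and surrounding whitespace.
    dirs = [d.strip() for d in datanode_dir.split(",") if d.strip()]
    remaining = [d for d in dirs if d not in failed_set]
    if not remaining:
        raise ValueError("no data directories would remain after removal")
    return ",".join(remaining)


print(remove_failed_dirs("/data1,/data2,/data3", ["/data2"]))
# prints: /data1,/data3
```

Raising an error when no directory would remain is a design choice of this sketch, so that an empty value is never produced by accident.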
The state of the volumes can be seen in the datanode metrics and web UI.
This PR also includes a few additional add-ons.
What is the link to the Apache Jira?
https://issues.apache.org/jira/browse/HDDS-14501
How was this patch tested?
Check off which of the following tests were done on this change. If additional testing was done, please elaborate here as well.