@Gargi-jais11 Gargi-jais11 commented Jan 25, 2026

What changes were proposed in this pull request?

https://ozone-site-v2.staged.apache.org/docs/administrator-guide/operations/disk-replacement/datanodes

A datanode may have multiple data volumes, specified in hdds.datanode.dir. For example,

/data1,/data2,/data3
hdds.datanode.failed.data.volumes.tolerated: The number of data volumes that are allowed to fail before a datanode stops offering service. By default, this value is -1, meaning unlimited.

Similarly, hdds.datanode.failed.metadata.volumes.tolerated allows a number of metadata volumes to fail.
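The tolerance settings above can be read as a simple threshold check. The following is a minimal sketch in Python, not Ozone's actual implementation. The function name `exceeds_tolerance` is hypothetical, and the logic assumes only what is stated above: -1 means unlimited, and any other value is the maximum number of volumes that may fail.

```python
def exceeds_tolerance(failed_volumes, tolerated=-1):
    """Return True if the datanode should stop offering service.

    Hypothetical helper for illustration only, not part of Ozone.
    Assumes -1 means there is no limit on failed volumes.
    """
    if tolerated == -1:
        return False
    # The datanode tolerates up to `tolerated` failures; one more is too many.
    return failed_volumes > tolerated
```

The same check would apply to metadata volumes with hdds.datanode.failed.metadata.volumes.tolerated as the threshold.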

During startup, the datanode checks each volume to determine whether it has failed. If the datanode is allowed to continue rather than abort, the failed volume is taken out of service. After the datanode starts, a periodic disk check runs every 60 minutes (determined by the configuration property hdds.datanode.periodic.disk.check.interval.minutes).

When a volume is determined to have failed, it is no longer chosen by the volume choosing policy to allocate new containers.
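To illustrate how a choosing policy can leave failed volumes out of new container allocation, here is a rough Python sketch. The `Volume` class, the `choose_volume` function, and the round-robin selection are all assumptions made for illustration; they are not Ozone's actual classes or policy.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical stand-in for a datanode data volume."""
    path: str
    failed: bool = False


def choose_volume(volumes, counter):
    """Pick a volume for a new container, skipping failed volumes.

    Round-robin over the healthy volumes is an assumption, not
    necessarily the policy Ozone uses.
    """
    healthy = [v for v in volumes if not v.failed]
    if not healthy:
        raise RuntimeError("no healthy volume available")
    return healthy[counter % len(healthy)]
```

With /data2 marked as failed, successive calls would alternate between /data1 and /data3 only.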

To replace failed disks, shut down the datanode, update hdds.datanode.dir to remove the failed volumes from the directory list, and then restart the datanode.
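The update to hdds.datanode.dir amounts to dropping the failed paths from the comma-separated list. A small Python sketch of that edit, using a hypothetical helper `remove_failed_dirs` (not an Ozone tool) and the example paths from above:

```python
def remove_failed_dirs(datanode_dir, failed):
    """Return the hdds.datanode.dir value without the failed paths.

    Hypothetical helper for illustration; `datanode_dir` is the
    comma-separated directory list and `failed` is a collection of paths.
    """
    dirs = [d.strip() for d in datanode_dir.split(",") if d.strip()]
    return ",".join(d for d in dirs if d not in failed)
```

For example, removing /data2 from /data1,/data2,/data3 leaves /data1,/data3 as the new value to set before restarting the datanode.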

Note: the Ozone datanode does not support hot swap yet, so the datanode process must be restarted for a change to the disk list to take effect.

The state of volumes can be seen in Datanode metrics and web UI.

This PR also makes a few additional changes to the page.

What is the link to the Apache Jira?

https://issues.apache.org/jira/browse/HDDS-14501

How was this patch tested?

Check off which of the following tests were done on this change. If additional testing was done, please elaborate here as well.

  • The CI checks on my fork are passing
  • I verified the rendered content using a local preview
  • I manually verified the steps provided in this change work as described

Contributor

@jojochuang jojochuang left a comment

+1 looks correct to me. I think this is good enough in any case. We can revisit later if there are minor issues.

@jojochuang jojochuang merged commit 7bc7394 into apache:HDDS-9225-website-v2 Jan 26, 2026
14 checks passed
