HDDS-10720. Datanode volume DU reserved percent should have a non-zero default value. #6561

Merged (5 commits) on May 3, 2024

@errose28 errose28 commented Apr 19, 2024

What changes were proposed in this pull request?

Currently there are two ways to reserve space in datanode volumes:

  1. hdds.datanode.dir.du.reserved.percent allows specifying a percentage of the volume's space to be left unused. It applies to all volumes.
  2. hdds.datanode.dir.du.reserved allows specifying a map of volume name to bytes reserved. Since it depends on a volume path, it cannot have a default value.

By default, Ozone should not allow datanode volumes to become 100% full. A full volume can cause the drive to "lock up": operations that would free up space, such as block delete, still need extra disk space to complete because they must append to the RocksDB WAL. Once encountered, such issues are difficult to resolve. This PR adds a default value for hdds.datanode.dir.du.reserved.percent to prevent this from happening.

A default value of 0.0001f is currently chosen. This is 0.01%, which reserves 1GB out of a 10TB volume, 100MB out of a 1TB volume, etc. Ideally we could reserve a fixed size (like 1GB) regardless of drive size, but we would need to rework the configs before we can do that, which might need more discussion. See HDDS-10721.
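To sanity-check how much space a fractional reservation translates to, here is a minimal sketch. It is not Ozone code: the class and method names are hypothetical, decimal units are assumed (1 TB = 10^12 bytes), and BigDecimal is used only to keep the illustration free of floating-point rounding.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only; not part of Ozone.
final class ReservedSpaceSketch {
    // Fraction from the PR description: 0.0001 is 0.01% of the volume.
    static final BigDecimal RESERVED_FRACTION = new BigDecimal("0.0001");

    // Bytes reserved on a volume of the given capacity, rounded down.
    static long reservedBytes(long capacityBytes) {
        return RESERVED_FRACTION
            .multiply(BigDecimal.valueOf(capacityBytes))
            .longValue();
    }

    public static void main(String[] args) {
        long oneTb = 1_000_000_000_000L; // decimal terabyte (assumption)
        System.out.println(reservedBytes(10 * oneTb)); // 1000000000 (1 GB)
        System.out.println(reservedBytes(oneTb));      // 100000000 (100 MB)
    }
}
```

Because the reservation scales with capacity, smaller drives get proportionally less headroom, which is the motivation for the fixed-size alternative mentioned above.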

This PR also fixes a few other bugs that prevented tests from passing after the change:

  • A non-zero default value for hdds.datanode.dir.du.reserved.percent would not be used.
  • Canonicalization was not done on paths in hdds.datanode.dir.du.reserved. This may have passed in CI but was failing due to my local filesystem setup.
  • Invalid space reserved configurations would fall back to 0 (hardcoded) instead of the default value.
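
The last two fixes can be illustrated with a rough sketch. This is not the code from the patch: the class, method, and constant names below are hypothetical, and the validity range of 0 to 1 is an assumption. It shows the intended behavior only: an invalid value falls back to the configured default rather than a hardcoded 0, and volume paths are compared in canonical form.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; names and validation range are assumptions.
final class ReservedConfigSketch {
    // Default taken from the PR description.
    static final float DEFAULT_RESERVED_FRACTION = 0.0001f;

    // Parse the configured fraction; fall back to the default (not 0)
    // when the value is missing, malformed, or out of range.
    static float parseReservedFraction(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_RESERVED_FRACTION;
        }
        try {
            float value = Float.parseFloat(raw.trim());
            // NaN fails both comparisons and so falls through.
            if (value >= 0f && value <= 1f) {
                return value;
            }
        } catch (NumberFormatException e) {
            // Malformed input: fall through to the default.
        }
        return DEFAULT_RESERVED_FRACTION;
    }

    // Compare two volume paths after canonicalization, so that
    // "data/./vol1" and "data/vol1" refer to the same volume.
    static boolean sameVolume(String a, String b) {
        try {
            return new File(a).getCanonicalPath()
                .equals(new File(b).getCanonicalPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, `parseReservedFraction("abc")` returns the default, while `parseReservedFraction("0.05")` returns the configured value.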

What is the link to the Apache JIRA

HDDS-10720

How was this patch tested?

Unit test added.

@errose28 errose28 marked this pull request as draft April 20, 2024 00:19
@errose28

Looks like there are other tests that were depending on 0 being the default value. Let me get them fixed on my fork before running CI here.

@errose28

Green CI on my fork, ready for review.

@errose28 errose28 marked this pull request as ready for review April 22, 2024 17:36

@adoroszlai adoroszlai left a comment

Thanks @errose28 for the patch, LGTM.

@adoroszlai adoroszlai merged commit 1324e95 into apache:master May 3, 2024
50 checks passed

errose28 commented May 3, 2024

Thanks for the review @adoroszlai

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024