HDDS-15371. [DiskBalancer] DiskBalancer should use monotonic time for delayed replica deletion. #10364
adoroszlai left a comment:
Thanks @slfan1989 for the patch.
Can we use Clock instead of hard-coded time source? (MonotonicClock in prod, TestClock in test to avoid the need for sleep (waitFor).)
@adoroszlai Thanks for the suggestion. I have updated the implementation to inject a |
adoroszlai left a comment:
Thanks @slfan1989 for updating the patch.
Before:
[INFO] Tests run: 56, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.32 s -- in org.apache.hadoop.ozone.container.diskbalancer.TestDiskBalancerTask
After:
[INFO] Tests run: 56, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.95 s -- in org.apache.hadoop.ozone.container.diskbalancer.TestDiskBalancerTask
- pendingDeletionContainers.put(System.currentTimeMillis() + replicaDeletionDelay, container);
+ pendingDeletionContainers.put(clock.millis() + replicaDeletionDelay,
+     container);
nit: This line got shorter, so there was no need to wrap. Please adjust line length in your IDE to 120 to avoid it in the future.
What changes were proposed in this pull request?
DiskBalancer uses System.currentTimeMillis() to schedule and check delayed deletion of old replicas after a successful container move.
The delayed deletion is based on a relative delay configured by replicaDeletionDelay. Since System.currentTimeMillis() is wall-clock time, it can be affected by system clock changes such as NTP adjustment, manual time changes, or VM time drift.
If the system clock moves forward, old replicas may become eligible for deletion earlier than expected. If the system clock moves backward, old replicas may remain in the pending deletion queue longer than expected.
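To make both failure modes concrete, here is a minimal sketch that simulates a wall clock with a plain counter. It is not DiskBalancer code: the class name `WallClockJumpDemo`, the helper `isExpired`, and the 5 minute delay and 10 minute clock steps are all assumptions chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical demo class; it only imitates a deadline check driven by wall-clock time.
class WallClockJumpDemo {

  /** Returns true once the time source has reached the deadline. */
  static boolean isExpired(long deadlineMillis, LongSupplier nowMillis) {
    return nowMillis.getAsLong() >= deadlineMillis;
  }

  public static void main(String[] args) {
    long replicaDeletionDelay = 300_000L; // assumed delay of 5 minutes

    // Case 1: 1 second really elapses, but the clock is stepped forward by 10 minutes.
    AtomicLong wallClock = new AtomicLong(1_000_000L); // simulated wall-clock reading
    long deadline = wallClock.get() + replicaDeletionDelay;
    wallClock.addAndGet(1_000L + 600_000L);
    // The replica looks expired although only 1 second of the delay has passed.
    System.out.println("forward step, expired: " + isExpired(deadline, wallClock::get));

    // Case 2: 10 minutes really elapse, but the clock is stepped backward by 10 minutes.
    wallClock.set(1_000_000L);
    deadline = wallClock.get() + replicaDeletionDelay;
    wallClock.addAndGet(600_000L - 600_000L);
    // The replica still looks pending although twice the delay has passed.
    System.out.println("backward step, expired: " + isExpired(deadline, wallClock::get));
  }
}
```

The first check reports the replica as expired and the second reports it as still pending, which are the early and late deletions described above.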
DiskBalancer already uses Time.monotonicNow() for other relative-time logic, such as bandwidth throttling with nextAvailableTime. The pending deletion delay should follow the same pattern and use monotonic time for both deadline calculation and expiration checks.
This makes delayed replica deletion independent of wall-clock changes and consistent with other DiskBalancer delay/elapsed-time logic.
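The pattern can be sketched roughly as follows, assuming the time source is injected as a `java.time.Clock`. Everything here is illustrative rather than the actual patch: `NanoTimeClock`, `ManualClock`, and `PendingDeletions` are hypothetical stand-ins (they are not Ozone's `MonotonicClock` or `TestClock`), and containers are represented by plain strings.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical demo entry point: drives the queue with a manually advanced clock.
class PendingDeletionDemo {
  public static void main(String[] args) {
    ManualClock clock = new ManualClock(0L);
    PendingDeletions queue = new PendingDeletions(clock, 5_000L);
    queue.schedule("container-1");
    clock.advance(4_999L);
    System.out.println(queue.pollExpired()); // nothing expired yet
    clock.advance(1L);
    System.out.println(queue.pollExpired()); // container-1 is now due
  }
}

/** Hypothetical monotonic clock: readings are only meaningful as differences. */
class NanoTimeClock extends Clock {
  @Override public ZoneId getZone() { return ZoneOffset.UTC; }
  @Override public Clock withZone(ZoneId zone) { return this; }
  @Override public Instant instant() { return Instant.ofEpochMilli(millis()); }
  @Override public long millis() { return System.nanoTime() / 1_000_000L; }
}

/** Hypothetical test clock that moves only when told to, so tests need no sleep. */
class ManualClock extends Clock {
  private long nowMillis;
  ManualClock(long startMillis) { this.nowMillis = startMillis; }
  void advance(long millis) { nowMillis += millis; }
  @Override public ZoneId getZone() { return ZoneOffset.UTC; }
  @Override public Clock withZone(ZoneId zone) { return this; }
  @Override public Instant instant() { return Instant.ofEpochMilli(nowMillis); }
  @Override public long millis() { return nowMillis; }
}

/** Delayed-deletion queue that uses one clock for both the deadline and the check. */
class PendingDeletions {
  private final Clock clock;
  private final long delayMillis;
  // Deadlines are added in non-decreasing order, so a FIFO queue stays sorted.
  private final ArrayDeque<Map.Entry<Long, String>> pending = new ArrayDeque<>();

  PendingDeletions(Clock clock, long delayMillis) {
    this.clock = clock;
    this.delayMillis = delayMillis;
  }

  void schedule(String container) {
    pending.addLast(new SimpleEntry<>(clock.millis() + delayMillis, container));
  }

  List<String> pollExpired() {
    long now = clock.millis();
    List<String> expired = new ArrayList<>();
    while (!pending.isEmpty() && pending.peekFirst().getKey() <= now) {
      expired.add(pending.pollFirst().getValue());
    }
    return expired;
  }
}
```

In production code the queue would be constructed with a monotonic clock such as the `NanoTimeClock` stand-in, while tests pass a manually advanced clock and step it past the delay instead of waiting.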
What is the link to the Apache JIRA?
HDDS-15371. DiskBalancer should use monotonic time for delayed replica deletion.
How was this patch tested?
Existing unit tests.