HDDS-14868. Avoid full scan of container list during refreshAndValidate of ContainerSafemodeRule. by sadanand48 · Pull Request #9953 · apache/ozone

sadanand48 · 2026-03-20T10:35:29Z

What changes were proposed in this pull request?

Periodic refresh: run the refresh on a configurable schedule (about 5s) instead of on every applyTransaction / refresh(false) call.

https://issues.apache.org/jira/browse/HDDS-14868
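For illustration only, here is a minimal Java sketch of the periodic-refresh idea described above. A single daemon thread runs the refresh at a fixed delay read from configuration, with a default of 5s. The class `PeriodicRefresher` and the key `example.safemode.refresh.interval.ms` are hypothetical names. They are not the PR's actual code or a real Ozone property.

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: refresh on a fixed schedule instead of per transaction. */
class PeriodicRefresher implements AutoCloseable {
  /** Hypothetical configuration key, not a real Ozone property. */
  static final String INTERVAL_KEY = "example.safemode.refresh.interval.ms";
  static final long DEFAULT_INTERVAL_MS = 5_000; // ~5s, as in the PR description

  /** Reads the refresh interval, falling back to the 5s default. */
  static long intervalMs(Properties conf) {
    return Long.parseLong(
        conf.getProperty(INTERVAL_KEY, Long.toString(DEFAULT_INTERVAL_MS)));
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "safemode-refresh");
        t.setDaemon(true); // never keep the JVM alive just for refreshing
        return t;
      });

  PeriodicRefresher(Runnable refresh, long intervalMs) {
    // Fixed delay: the next run starts intervalMs after the previous one ends,
    // so runs never overlap. An exception thrown by refresh cancels all later
    // runs, so a real task should catch and log its own failures.
    scheduler.scheduleWithFixedDelay(
        refresh, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** Stops the schedule; declared without a checked exception. */
  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

The sketch uses `scheduleWithFixedDelay` rather than `scheduleAtFixedRate`. With a fixed delay, a slow refresh pushes the next run back instead of causing runs to queue up back to back.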


...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java

szetszwo · 2026-03-26T11:17:37Z

@sadanand48 , thanks for working on this!

How about refreshing the safemode rules every 5s, instead of doing it in applyTransactions?

sadanand48 · 2026-03-26T16:55:45Z

> How about refreshing the safemode rules every 5s, instead of doing it in applyTransactions?

Thanks @szetszwo for the input. We could make this behaviour configurable, i.e. periodic or based on applyTransaction. I'm suggesting this because smaller clusters, or clusters without any pending logs, may be impacted by redundant refresh calls.

szetszwo · 2026-03-26T17:22:32Z

> ... smaller clusters or clusters without any pending logs may be impacted by redundant refresh calls.

Refreshing the safemode rules in applyTransaction is actually a big mistake: applyTransaction is on the critical path of the StateMachine, so adding unnecessary operations there is going to slow down everything.

In contrast, refreshing the safemode rules every 5s is not going to have any measurable performance impact. Hypothetically, if refreshing every 5s were not okay, then refreshing in applyTransaction would be much worse, since there are thousands of applyTransaction ops per second.
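Here is a back-of-the-envelope sketch that puts the comparison in numbers. It assumes 1,000 applyTransaction ops/s, since the comment only says "thousands", and a one-minute safemode window. `RefreshCost` is a hypothetical helper.

```java
/** Back-of-the-envelope count of refresh calls; all inputs are assumptions. */
class RefreshCost {
  /** Refreshes when every applied transaction triggers one. */
  static long perTransaction(long txPerSecond, long windowSeconds) {
    return txPerSecond * windowSeconds;
  }

  /** Approximate refreshes when a periodic task runs every intervalSeconds. */
  static long periodic(long intervalSeconds, long windowSeconds) {
    return windowSeconds / intervalSeconds;
  }
}
```

Under those assumptions, `perTransaction(1_000, 60)` gives 60,000 refreshes per minute, while `periodic(5, 60)` gives about 12. Each refresh of ContainerSafemodeRule in the original code scanned the full container list.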

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java

szetszwo

@sadanand48, thanks for the update.

Since the current code in SCMStateMachine uses SCMSafeModeManager to refresh, it is better to do the refresh in SCMSafeModeManager.
When periodic refresh is enabled, SCMStateMachine should not refresh.
While refreshing, if SCM is NOT in safemode, we can stop the executor. Then we don't need a separate stop method.
It is better to create a non-mock test using MiniOzoneCluster.

See https://issues.apache.org/jira/secure/attachment/13081501/9953_review.patch
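Below is a rough sketch of the structure suggested above. It uses hypothetical names (`SafeModeManagerSketch`, `RuleSketch`, `refreshFromStateMachine`) rather than the real SCMSafeModeManager/SCMStateMachine API:

- The manager owns the refresh for all rules.
- The state-machine path becomes a no-op when periodic refresh is enabled.
- The periodic task shuts its own executor down once SCM leaves safemode, so no stop method is needed.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a safemode rule, not Ozone's rule API. */
interface RuleSketch {
  void refresh();
}

/** Hypothetical manager that owns refreshing for every rule. */
class SafeModeManagerSketch {
  private final List<RuleSketch> rules;
  private final BooleanSupplier inSafeMode;
  private final boolean periodicRefreshEnabled;
  private ScheduledExecutorService executor;

  SafeModeManagerSketch(List<RuleSketch> rules, BooleanSupplier inSafeMode,
      boolean periodicRefreshEnabled) {
    this.rules = rules;
    this.inSafeMode = inSafeMode;
    this.periodicRefreshEnabled = periodicRefreshEnabled;
  }

  /** Starts the periodic task only when periodic refresh is enabled. */
  synchronized void start(long intervalMs) {
    if (!periodicRefreshEnabled) {
      return;
    }
    executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(
        this::refreshOnce, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** One periodic round: refresh all rules, or stop once out of safemode. */
  synchronized void refreshOnce() {
    if (!inSafeMode.getAsBoolean()) {
      if (executor != null) {
        executor.shutdown(); // the task ends itself, so no stop() is needed
      }
      return;
    }
    rules.forEach(RuleSketch::refresh);
  }

  /** Called from the state machine path; skipped when periodic refresh is on. */
  void refreshFromStateMachine() {
    if (!periodicRefreshEnabled) {
      rules.forEach(RuleSketch::refresh);
    }
  }
}
```

Because the state-machine path does nothing while periodic refresh is enabled, the two refresh paths never run together in this sketch.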

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java

sadanand48 · 2026-03-31T13:02:52Z

Thanks @szetszwo for the review; updated as per your patch.

> it is better to do refresh in SCMSafeModeManager.

With this, all the safemode rules will have the same behaviour; I guess that should be okay. I will add a non-mock test.

szetszwo

@sadanand48, thanks for the update!

Quick question:

Would it work if we don't make the changes in AbstractContainerSafeModeRule and other code logic changes such as isScmRatisApplyCaughtUpToCommit?

If it works, this PR should only change when the refresh happens (i.e. periodic refreshing instead of refreshing in SCMStateMachine). Other logic changes/improvements can be done in a separate PR.

szetszwo · 2026-04-03T19:06:43Z

> Would it work if we don't make the changes in AbstractContainerSafeModeRule and other code logic changes such as isScmRatisApplyCaughtUpToCommit?

@sadanand48, any thoughts?

sadanand48 · 2026-04-06T08:32:34Z

Yes @szetszwo, it should work without the other changes. The other change is only about isScmRatisApplyCaughtUpToCommit, where we don't refresh if there are no new pending transactions. This is just an optimization. In the current revision of the patch I have removed it.
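The optimization is only described loosely here. The sketch below shows one way to skip redundant refreshes. It assumes a refresh only needs to rerun when the applied log index has advanced. This is not the removed isScmRatisApplyCaughtUpToCommit logic, and `SkipIfNoProgressRefresher` is a hypothetical name.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: skip a refresh when nothing new has been applied. */
class SkipIfNoProgressRefresher {
  private final LongSupplier lastAppliedIndex; // assumed source of the applied log index
  private final Runnable refresh;
  private boolean refreshedBefore = false;
  private long indexAtLastRefresh;

  SkipIfNoProgressRefresher(LongSupplier lastAppliedIndex, Runnable refresh) {
    this.lastAppliedIndex = lastAppliedIndex;
    this.refresh = refresh;
  }

  /** Runs the refresh only if the applied index moved; returns whether it ran. */
  synchronized boolean maybeRefresh() {
    long current = lastAppliedIndex.getAsLong();
    if (refreshedBefore && current == indexAtLastRefresh) {
      return false; // no new transactions applied since the last refresh
    }
    refresh.run();
    refreshedBefore = true;
    indexAtLastRefresh = current;
    return true;
  }
}
```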

szetszwo

@sadanand48, thanks for the update. Please see the comments inline.

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java

...op-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/SCMSafeModeManager.java

szetszwo · 2026-04-08T17:09:22Z

Just found that there is already a safeModeLogExecutor. We should just use it instead of creating a new ScheduledExecutorService.

@sadanand48, very sorry, this idea is actually bad since the safeModeLogExecutor is a new feature from HDDS-14012, and I'm not sure it is rock solid. Let's not use it. We may simply start a thread instead; see https://issues.apache.org/jira/secure/attachment/13081633/9953_review2.patch
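Here is a minimal sketch of the "simply start a thread" alternative. It assumes a `BooleanSupplier` that reports whether SCM is still in safemode. `SafeModeRefreshThread` and the thread name are hypothetical. The loop exits on its own once safemode ends, in line with the earlier suggestion that no stop method is needed.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: a plain daemon thread instead of an executor. */
class SafeModeRefreshThread {
  static Thread start(Runnable refresh, BooleanSupplier inSafeMode, long intervalMs) {
    Thread t = new Thread(() -> {
      try {
        // Keep refreshing only while still in safemode; exit afterwards.
        while (inSafeMode.getAsBoolean()) {
          refresh.run();
          Thread.sleep(intervalMs);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and exit
      }
    }, "SafeModeRefresh");
    t.setDaemon(true); // do not block JVM shutdown
    t.start();
    return t;
  }
}
```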

szetszwo

@sadanand48, thanks for the update! Please see the comments inline.

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/SafeModeExitRule.java

...op-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/SCMSafeModeManager.java

szetszwo

+1 the change looks good.

The failed test (TestContainerStateMachine) seems unrelated. Please take a look.

sadanand48 added 2 commits March 20, 2026 14:40

HDDS-14868. Avoid full scan of container list during refreshAndValidate of ContainerSafemodeRule.

30f4081

fix 1

30290d6

sumitagrawl reviewed Mar 25, 2026


...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

refresh every 5s

2a9aadd

sadanand48 requested a review from szetszwo March 27, 2026 07:58

szetszwo reviewed Mar 28, 2026


...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

sadanand48 added 3 commits March 30, 2026 14:48

refresh only when transactions pending

d74022e

revert new map additions

66163f1

compile

efdefc9

szetszwo reviewed Mar 30, 2026


...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

sadanand48 added 2 commits March 31, 2026 18:23

address comments

34603b8

code cleanup

e2bfca4

szetszwo reviewed Mar 31, 2026


sadanand48 added 3 commits April 1, 2026 12:03

checkstyle

cafe82a

add to default xml

7e9a8f7

add tests

c3fb5a5

sadanand48 marked this pull request as ready for review April 6, 2026 10:25

szetszwo reviewed Apr 7, 2026


hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java

...op-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/SCMSafeModeManager.java (outdated)

address comments

82ab7de

address comments

aed2ead

sadanand48 marked this pull request as draft April 9, 2026 07:30

sadanand48 marked this pull request as ready for review April 9, 2026 16:45

call in start()

2fc6901

szetszwo reviewed Apr 9, 2026


sadanand48 added 2 commits April 10, 2026 13:39

address comments

02e679e

Merge branch 'master' into HDDS-14868

a7a0ec1

szetszwo approved these changes Apr 10, 2026


fix test

de7c752


3 participants
