HDDS-7989. UnhealthyReplicationProcessor retries failure without delay by adoroszlai · Pull Request #4285 · apache/ozone

adoroszlai · 2023-02-18T07:43:42Z

What changes were proposed in this pull request?

UnhealthyReplicationProcessor#processAll requeues any failed task immediately. The requeued task is attempted again in the same processAll call, before the loop exits, so a task that keeps failing is retried without any delay. This can flood SCM logs until the cause of the error is resolved.

This causes the GitHub Actions environment to run out of disk space within a few minutes of testing EC reconstruction read (the test is being added in HDDS-7982).

This PR proposes to collect failed container health results and requeue them only after exiting the loop.
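The idea can be illustrated with a minimal, self-contained sketch. This is not the actual Ozone code: the class name `DeferredRequeueSketch`, the generic queue, and the `Predicate`-based task are all hypothetical stand-ins for the real processor and its container health results. It only shows the pattern of holding failures in a side list and putting them back after the loop ends.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical illustration only; names do not come from the Ozone code base.
class DeferredRequeueSketch {

    /**
     * Drains the queue once. Items whose task fails (returns false or throws)
     * are collected in a side list and added back to the queue only after the
     * loop has finished, so each item is attempted at most once per call.
     *
     * @return the number of processing attempts made in this call
     */
    static <T> int processAll(Queue<T> queue, Predicate<T> task) {
        List<T> failed = new ArrayList<>();
        int attempts = 0;
        T item;
        while ((item = queue.poll()) != null) {
            attempts++;
            boolean ok;
            try {
                ok = task.test(item);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) {
                // Requeuing here would make the loop pick the item up again
                // right away; defer it instead.
                failed.add(item);
            }
        }
        // Failed items become visible again only to the next processAll call.
        queue.addAll(failed);
        return attempts;
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>(List.of("a", "b", "c"));
        // "b" always fails, standing in for a task with an unresolved error.
        int attempts = processAll(queue, s -> !s.equals("b"));
        System.out.println("attempts=" + attempts + ", requeued=" + queue);
    }
}
```

With this structure a permanently failing item costs one attempt per call rather than an unbounded number, and the retry interval is whatever interval the caller uses between calls.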

https://issues.apache.org/jira/browse/HDDS-7989

How was this patch tested?

Added unit test.

Also verified together with HDDS-7982 (which uncovered the problem without this fix):
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4207471575/jobs/7302558782

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4207414175

adoroszlai · 2023-02-21T22:16:24Z

Thanks @sodonnel for reviewing and committing it.

adoroszlai added 2 commits February 17, 2023 21:44

Reproduce infinite loop in unit test

b9dfbda

HDDS-7989. UnhealthyReplicationProcessor retries failure without delay

3ae2b55

adoroszlai self-assigned this Feb 18, 2023

adoroszlai added the EC label Feb 18, 2023

adoroszlai requested review from sodonnel and umamaheswararao February 18, 2023 10:20

sodonnel approved these changes Feb 21, 2023

sodonnel merged commit 47a68f8 into apache:master Feb 21, 2023

adoroszlai deleted the HDDS-7989 branch February 21, 2023 22:15
