HDDS-15261. Increment UNHEALTHY count when a container reaches sufficient unhealthy replicas #10260

Merged
sumitagrawl merged 1 commit into apache:master from sarvekshayr:HDDS-15261
May 13, 2026
@sarvekshayr
Contributor

What changes were proposed in this pull request?

Fixed a regression introduced by HDDS-14119 in RatisUnhealthyReplicationCheckHandler: the UNHEALTHY container count was no longer incremented in reports. The handler still identified under- and over-replicated containers correctly, but it did not count containers that were sufficiently replicated yet unhealthy. An explicit check, run after the replication health checks, now ensures these containers are counted.
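The branching described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual handler code: the class `UnhealthyCountSketch`, the methods `classify` and `report`, and the plain integer inputs are all hypothetical. The state names are borrowed from the labels in the `ozone admin container report` output, and mapping the under- and over-replicated cases onto `UNHEALTHY_UNDER_REPLICATED` and `UNHEALTHY_OVER_REPLICATED` is an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical model of the counting logic; none of these names are Ozone APIs. */
final class UnhealthyCountSketch {

  /** State names borrowed from the report labels (the mapping is an assumption). */
  enum Health { UNHEALTHY_UNDER_REPLICATED, UNHEALTHY_OVER_REPLICATED, UNHEALTHY }

  /**
   * Classifies a container whose replicas are all unhealthy.
   * unhealthyReplicas: number of UNHEALTHY replicas reported for the container.
   * requiredReplicas: replication factor expected for the container.
   */
  static Health classify(int unhealthyReplicas, int requiredReplicas) {
    if (unhealthyReplicas < requiredReplicas) {
      return Health.UNHEALTHY_UNDER_REPLICATED;
    }
    if (unhealthyReplicas > requiredReplicas) {
      return Health.UNHEALTHY_OVER_REPLICATED;
    }
    // Sufficiently replicated but unhealthy: the case that, per the
    // description above, was not being counted before the fix.
    return Health.UNHEALTHY;
  }

  /** Builds per-state counters for a set of containers, one entry per container. */
  static Map<Health, Integer> report(int[] unhealthyReplicaCounts, int requiredReplicas) {
    Map<Health, Integer> counts = new EnumMap<>(Health.class);
    for (Health h : Health.values()) {
      counts.put(h, 0);
    }
    for (int n : unhealthyReplicaCounts) {
      counts.merge(classify(n, requiredReplicas), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // One container with three UNHEALTHY replicas, assuming a replication factor of 3.
    System.out.println(report(new int[] {3}, 3));
  }
}
```

With one container holding three unhealthy replicas and an assumed replication factor of 3, the sketch counts that container once under `UNHEALTHY`, which corresponds to the `UNHEALTHY: 1` line in the "After fix" report below.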

What is the link to the Apache JIRA?

HDDS-15261

How was this patch tested?

Before fix:

bash-5.1$ ozone admin container info 1
Container id: 1
Pipeline id: 60de5f05-1ac3-4ca7-9250-49a24d89fee0
Write PipelineId: 9c523134-f14d-466f-9a62-46faa585bcb7
Write Pipeline State: CLOSED
Container State: QUASI_CLOSED
SequenceId: 4
Datanodes: [f31498ed-62e5-4ecb-8758-dd65dab6a8cb/ozone-balancer-datanode2-1.ozone-balancer_default,
d2d8fd98-f8fb-46c0-80ec-116e8438f05f/ozone-balancer-datanode3-1.ozone-balancer_default,
c7174500-a46a-4038-82ee-ab0b780f4e1f/ozone-balancer-datanode1-1.ozone-balancer_default]
Replicas: [State: UNHEALTHY; ReplicaIndex: 0; SequenceId: 4; Origin: f31498ed-62e5-4ecb-8758-dd65dab6a8cb; Location: f31498ed-62e5-4ecb-8758-dd65dab6a8cb/ozone-balancer-datanode2-1.ozone-balancer_default,
State: UNHEALTHY; ReplicaIndex: 0; SequenceId: 4; Origin: c7174500-a46a-4038-82ee-ab0b780f4e1f; Location: d2d8fd98-f8fb-46c0-80ec-116e8438f05f/ozone-balancer-datanode3-1.ozone-balancer_default,
State: UNHEALTHY; ReplicaIndex: 0; SequenceId: 4; Origin: c7174500-a46a-4038-82ee-ab0b780f4e1f; Location: c7174500-a46a-4038-82ee-ab0b780f4e1f/ozone-balancer-datanode1-1.ozone-balancer_default]

bash-5.1$ ozone admin container report
Container Summary Report generated at 2026-05-13T10:00:57Z
==========================================================

Container State Summary
=======================
OPEN: 0
CLOSING: 0
QUASI_CLOSED: 1
CLOSED: 0
DELETING: 0
DELETED: 0
RECOVERING: 0

Container Health Summary
========================
HEALTHY: 0
UNDER_REPLICATED: 0
MIS_REPLICATED: 0
OVER_REPLICATED: 0
MISSING: 0
UNHEALTHY: 0
EMPTY: 0
OPEN_UNHEALTHY: 0
QUASI_CLOSED_STUCK: 1
OPEN_WITHOUT_PIPELINE: 0
UNHEALTHY_UNDER_REPLICATED: 0
UNHEALTHY_OVER_REPLICATED: 0
MISSING_UNDER_REPLICATED: 0
QUASI_CLOSED_STUCK_UNDER_REPLICATED: 0
QUASI_CLOSED_STUCK_OVER_REPLICATED: 0
QUASI_CLOSED_STUCK_MISSING: 0

First 100 QUASI_CLOSED_STUCK containers:
#1

After fix:

bash-5.1$ ozone admin container info 1
Container id: 1
Pipeline id: 22af1552-8a15-4beb-88cd-1a01a18539df
Write PipelineId: 67d407a5-07f9-4a53-a7e7-fc1bec2c1fd1
Write Pipeline State: OPEN
Container State: QUASI_CLOSED
SequenceId: 47
Datanodes: [1992c0bf-4e05-482b-a97f-7421d46a550d/ozone-balancer-datanode6-1.ozone-balancer_default,
2b8b19bb-832a-4a2b-9af1-577ea0eaee28/ozone-balancer-datanode2-1.ozone-balancer_default,
5a15ba14-ea5e-4db5-9ff2-25073f14bbda/ozone-balancer-datanode4-1.ozone-balancer_default]
Replicas: [State: UNHEALTHY; ReplicaIndex: 0; SequenceId: 47; Origin: 1992c0bf-4e05-482b-a97f-7421d46a550d; Location: 1992c0bf-4e05-482b-a97f-7421d46a550d/ozone-balancer-datanode6-1.ozone-balancer_default,
State: UNHEALTHY; ReplicaIndex: 0; SequenceId: 47; Origin: 2b8b19bb-832a-4a2b-9af1-577ea0eaee28; Location: 2b8b19bb-832a-4a2b-9af1-577ea0eaee28/ozone-balancer-datanode2-1.ozone-balancer_default,
State: UNHEALTHY; ReplicaIndex: 0; SequenceId: 47; Origin: 5a15ba14-ea5e-4db5-9ff2-25073f14bbda; Location: 5a15ba14-ea5e-4db5-9ff2-25073f14bbda/ozone-balancer-datanode4-1.ozone-balancer_default]

bash-5.1$ ozone admin container report

Container Summary Report generated at 2026-05-13T10:45:00Z
==========================================================

Container State Summary
=======================
OPEN: 0
CLOSING: 0
QUASI_CLOSED: 1
CLOSED: 0
DELETING: 0
DELETED: 0
RECOVERING: 0

Container Health Summary
========================
HEALTHY: 0
UNDER_REPLICATED: 0
MIS_REPLICATED: 0
OVER_REPLICATED: 0
MISSING: 0
UNHEALTHY: 1
EMPTY: 0
OPEN_UNHEALTHY: 0
QUASI_CLOSED_STUCK: 1
OPEN_WITHOUT_PIPELINE: 0
UNHEALTHY_UNDER_REPLICATED: 0
UNHEALTHY_OVER_REPLICATED: 0
MISSING_UNDER_REPLICATED: 0
QUASI_CLOSED_STUCK_UNDER_REPLICATED: 0
QUASI_CLOSED_STUCK_OVER_REPLICATED: 0
QUASI_CLOSED_STUCK_MISSING: 0

First 100 UNHEALTHY containers:
#1

First 100 QUASI_CLOSED_STUCK containers:
#1

sarvekshayr requested a review from sumitagrawl on May 13, 2026, 12:44

@sumitagrawl (Contributor) left a comment

LGTM

sumitagrawl merged commit 9553e1d into apache:master on May 13, 2026
47 checks passed