HDDS-15004. Stabilize TestReconContainerEndpoint#testContainerEndpointForOBSBucket by arunsarin85 · Pull Request #10116 · apache/ozone

arunsarin85 · 2026-04-23T20:17:01Z

What changes were proposed in this pull request?

Fix the intermittent failure in TestReconContainerEndpoint#testContainerEndpointForOBSBucket (HDDS-15004).

The test sometimes failed with expected: <1> but was: <0> on KeysResponse#getTotalCount(). That usually meant one of two things: Recon had not finished updating its container-to-key index, or the test queried the wrong container id.

Please describe your PR in detail:

Use the real container id from OM

OBS: the test no longer hard-codes 1L. It uses OzoneManager#lookupKey (via getContainerIdForKey) to get the container id from the key’s block locations.
FSO: the same helper is used for each key path instead of assuming fixed container ids.

Fail fast when the async “buffer empty” wait breaks

After waitForEventBufferEmpty, the test only waited until the CompletableFuture completed. If the async work failed, the future still counted as “done” (isDone() is also true for a future that completed exceptionally), so the test continued anyway.

The test now calls completableFuture.join() after GenericTestUtils.waitFor(completableFuture::isDone, …) so a failed buffer wait is not ignored.
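
As a minimal, standalone sketch of why join() matters here (the class and method names are made up for illustration and are not part of the patch): a future that completed exceptionally still reports isDone() == true, while join() rethrows the failure wrapped in a CompletionException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical demo class, not code from the PR.
final class FutureJoinDemo {

  /** Returns a future that has already completed exceptionally. */
  static CompletableFuture<Void> failedBufferWait() {
    CompletableFuture<Void> future = new CompletableFuture<>();
    future.completeExceptionally(new IllegalStateException("buffer wait failed"));
    return future;
  }

  /** Returns true if join() surfaced the failure as a CompletionException. */
  static boolean joinSurfacesFailure(CompletableFuture<Void> future) {
    try {
      future.join();
      return false;
    } catch (CompletionException e) {
      // The original exception is available as the cause.
      return e.getCause() instanceof IllegalStateException;
    }
  }

  public static void main(String[] args) {
    CompletableFuture<Void> future = failedBufferWait();
    // Waiting on isDone() alone cannot tell success from failure.
    System.out.println("isDone = " + future.isDone());
    System.out.println("isCompletedExceptionally = " + future.isCompletedExceptionally());
    System.out.println("join surfaced failure = " + joinSurfacesFailure(future));
  }
}
```

In a test, letting the CompletionException from join() propagate is enough to fail the test at the point where the wait broke.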

Wait until Recon actually has the keys (no sleep in TestReconContainerEndpoint)

The OM event buffer can be empty while Recon is still applying that batch (events are removed from the queue before processing finishes). So “buffer empty” is not enough to assert on the container endpoint.
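
The gap described above can be reproduced with a toy model. This is only an illustration using a plain BlockingQueue and latches; it does not use Recon's actual event buffer, and all names are hypothetical. The consumer removes the event first and applies it later, so the queue is observably empty while nothing has been applied yet.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model, not Recon code.
final class EmptyQueueRaceDemo {

  /** Returns {queueSizeWhileProcessing, appliedWhileProcessing, appliedAfterFinish}. */
  static int[] run() {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    AtomicInteger applied = new AtomicInteger();
    CountDownLatch dequeued = new CountDownLatch(1);
    CountDownLatch mayFinish = new CountDownLatch(1);

    Thread consumer = new Thread(() -> {
      try {
        queue.take();              // the event leaves the queue first
        dequeued.countDown();
        mayFinish.await();         // stand-in for slow batch processing
        applied.incrementAndGet(); // only now is the update visible
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    try {
      queue.put("key-created");
      dequeued.await();
      int sizeWhileProcessing = queue.size();     // queue is already empty
      int appliedWhileProcessing = applied.get(); // but nothing is applied yet
      mayFinish.countDown();
      consumer.join();
      return new int[] {sizeWhileProcessing, appliedWhileProcessing, applied.get()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
  }

  public static void main(String[] args) {
    int[] r = run();
    System.out.println("queue size while processing = " + r[0]);
    System.out.println("applied while processing    = " + r[1]);
    System.out.println("applied after finish        = " + r[2]);
  }
}
```

An assertion made at the moment the queue is empty would see zero applied updates, which matches the expected: <1> but was: <0> symptom.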

Instead of sleeping, the test waits until ReconContainerMetadataManager#getKeyCountForContainer shows the expected number of keys per container. That logic lives in TestReconOmMetaManagerUtils.waitUntilReconKeyCounts, which polls until counts match or a timeout is hit.
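
A polling wait of this kind could look roughly like the sketch below. It is not the code of TestReconOmMetaManagerUtils.waitUntilReconKeyCounts; the class name, method signature, and the lookup function standing in for getKeyCountForContainer are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.ToLongFunction;

// Hypothetical helper, not the helper added by the PR.
final class KeyCountPoller {

  /**
   * Polls until every container id in {@code expected} reports the expected
   * key count. Returns true on success, false on timeout or interruption.
   */
  static boolean waitUntilCounts(Map<Long, Long> expected,
                                 ToLongFunction<Long> actualCount,
                                 long pollMillis,
                                 long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      boolean allMatch = true;
      for (Map.Entry<Long, Long> e : expected.entrySet()) {
        if (actualCount.applyAsLong(e.getKey()) != e.getValue()) {
          allMatch = false;
          break;
        }
      }
      if (allMatch) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without seeing the expected counts
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Stand-in for the index that is updated asynchronously.
    Map<Long, Long> index = new ConcurrentHashMap<>();
    Thread writer = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      index.put(7L, 2L);
    });
    writer.start();

    boolean ready = waitUntilCounts(
        Map.of(7L, 2L), id -> index.getOrDefault(id, 0L), 10, 2000);
    System.out.println("index ready = " + ready);
  }
}
```

Compared with a fixed sleep, the poll returns as soon as the condition holds and fails with a clear timeout when it never does.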

The FSO test builds the expected counts from both written keys (so if two keys share a container, it waits for the right total).
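
One way to build such expected counts, sketched under the assumption that the test already knows the container id of each written key (the class and method names are hypothetical), is to merge the ids into a per-container tally:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not code from the PR.
final class ExpectedKeyCounts {

  /** Counts how many written keys landed in each container. */
  static Map<Long, Long> countPerContainer(List<Long> containerIdPerKey) {
    Map<Long, Long> expected = new HashMap<>();
    for (Long containerId : containerIdPerKey) {
      // Two keys in the same container add up to a count of 2.
      expected.merge(containerId, 1L, Long::sum);
    }
    return expected;
  }

  public static void main(String[] args) {
    System.out.println("shared container:    " + countPerContainer(List.of(3L, 3L)));
    System.out.println("separate containers: " + countPerContainer(List.of(3L, 4L)));
  }
}
```

With this tally, the wait condition is the same whether the two keys share a container or not.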

Clean up static state between tests

ContainerKeyMapperHelper uses JVM-wide static flags and maps. When several test methods run in the same JVM, a later method can see state left behind by an earlier one, so this state is now reset between tests.

Safer shutdown in @AfterEach

The client is closed with IOUtils.closeQuietly(client) so a client-close error does not block cluster.shutdown().
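
The teardown ordering can be illustrated with a small standalone sketch. The closeQuietly method below is a local stand-in written for this example, not the IOUtils implementation, and the Runnable stands in for cluster.shutdown().

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not code from the PR.
final class QuietCloseDemo {

  /** Closes the resource and swallows any failure (local stand-in helper). */
  static void closeQuietly(AutoCloseable resource) {
    if (resource == null) {
      return;
    }
    try {
      resource.close();
    } catch (Exception e) {
      System.err.println("Ignoring close failure: " + e.getMessage());
    }
  }

  /** Closes the client first, then always runs the cluster shutdown step. */
  static void tearDown(AutoCloseable client, Runnable clusterShutdown) {
    closeQuietly(client);
    clusterShutdown.run();
  }

  public static void main(String[] args) {
    AtomicBoolean clusterStopped = new AtomicBoolean(false);
    AutoCloseable failingClient = () -> {
      throw new IOException("close failed");
    };
    tearDown(failingClient, () -> clusterStopped.set(true));
    System.out.println("cluster stopped = " + clusterStopped.get());
  }
}
```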

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15004

How was this patch tested?


https://github.com/arunsarin85/ozone/actions/runs/24855010484
https://github.com/arunsarin85/ozone/actions/runs/24855051641

devmadhuu

Thanks @arunsarin85 for the patch. Kindly find comments.

arunsarin85 · 2026-04-29T19:26:21Z

@devmadhuu Thanks for the review. I have added a patch for the above changes and triggered the flaky-test-check:
https://github.com/arunsarin85/ozone/actions/runs/25127529917
https://github.com/arunsarin85/ozone/actions/runs/25127529917/attempts/1

devmadhuu

Thanks @arunsarin85 for improving the patch. It largely looks good to me. Can you revisit your PR description? For example, the point below seems obsolete. Please check the other points too and rephrase them in a cleaner, more understandable way. Currently the language is too complex to understand.

Short settle time after the buffer wait
The OM event queue can be empty while a batch is still being processed (events are dequeued before task processing finishes). A two-second sleep after join() gives in-flight container-key updates time to land before assertions.

arunsarin85 · 2026-05-05T07:12:16Z

Hi @devmadhuu, I have updated the PR description.

devmadhuu

Thanks @arunsarin85 for improving the patch. LGTM +1

adoroszlai · 2026-05-06T07:35:58Z

Thanks @arunsarin85 for updating the patch. Please note checkstyle failure:

hadoop-ozone/integration-test-recon/src/test/java/org/apache/hadoop/ozone/recon/TestReconContainerEndpoint.java
 32: Wrong lexicographical order for 'org.apache.hadoop.hdds.scm.server.OzoneStorageContainerManager' import. Should be before 'org.apache.hadoop.hdds.utils.IOUtils'.

https://github.com/arunsarin85/ozone/actions/runs/25127388560/job/73644085279

adoroszlai · 2026-05-07T10:26:50Z

Thanks @arunsarin85 for the patch, @ArafatKhan2198, @devmadhuu for the review.

HDDS-15004. Stabilize TestReconContainerEndpoint#testContainerEndpoin…

6cd41bb

…tForOBSBucket

adoroszlai requested review from ArafatKhan2198 and devmadhuu April 24, 2026 12:24

devmadhuu reviewed Apr 27, 2026

arunsarin85 added 2 commits April 29, 2026 23:05

HDDS-15004. Addressed review comments

5424549

HDDS-15004. Addressed review comments

67cea40

arunsarin85 requested a review from devmadhuu April 29, 2026 19:26

ArafatKhan2198 approved these changes May 3, 2026

devmadhuu reviewed May 4, 2026

devmadhuu self-requested a review May 5, 2026 08:42

devmadhuu approved these changes May 5, 2026

adoroszlai added test recon labels May 6, 2026

adoroszlai reviewed May 6, 2026

Comment thread ...ation-test-recon/src/test/java/org/apache/hadoop/ozone/recon/TestReconContainerEndpoint.java Outdated

HDDS-15004. Addressed review comments

4448871

adoroszlai merged commit bc89991 into apache:master May 7, 2026
32 checks passed

arunsarin85 deleted the HDDS-15004 branch May 7, 2026 11:48
