…-ready wait Agent-Logs-Url: https://github.com/OpenIdentityPlatform/OpenDJ/sessions/2245c057-9027-491d-aa17-aead4f92162b Co-authored-by: vharseko <6818498+vharseko@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Fix flaky failing GitHub Actions job in OpenDJ" to "Fix flaky testMultiRS: replace fixed sleep with deterministic domain-ready wait" on Apr 1, 2026.
GenerationIdTest#testMultiRS was flaky in CI because it asserted generation IDs across all three replication servers after only a fixed Thread.sleep(2000), before RS-to-RS handshakes and domain creation had necessarily completed, leaving replServer2 at genId=-1 instead of the expected value.
Changes
Added waitForDomainsOnAllReplicationServers(), which polls (up to 30 s, every 100 ms via TestTimer) until getReplicationServerDomain(baseDN) is non-null on replServer1, replServer2, and replServer3, confirming topology convergence before any assertion.
Replaced the Thread.sleep(2000) in testMultiRS (after connectServer1ToReplServer) with a call to this helper.
No behavioral change for passing cases; only the synchronization point becomes deterministic.
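The helper described above could look roughly like this sketch. It is hypothetical and self-contained: DomainLookup stands in for replServerX.getReplicationServerDomain(baseDN), and a plain System.nanoTime/Thread.sleep loop stands in for OpenDJ's TestTimer, whose exact API is not shown in this PR. Only the 30 s budget and the 100 ms interval come from the description.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a deterministic "domain exists on every server" wait.
 * ASSUMPTION: DomainLookup stands in for replServerX.getReplicationServerDomain(baseDN),
 * and the polling loop stands in for OpenDJ's TestTimer.
 */
final class DomainWait {

    /** Hypothetical stand-in for one replication server's domain lookup. */
    @FunctionalInterface
    interface DomainLookup {
        Object getReplicationServerDomain(String baseDN);
    }

    private DomainWait() {
    }

    /** Polls the condition every intervalMs; fails with AssertionError after timeoutMs. */
    static void waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("condition not met within " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                throw new AssertionError("interrupted while waiting", e);
            }
        }
    }

    /** Waits up to 30 s, polling every 100 ms, until every server has a domain for baseDN. */
    static void waitForDomainsOnAllReplicationServers(List<DomainLookup> servers, String baseDN) {
        waitUntil(() -> servers.stream()
                        .allMatch(server -> server.getReplicationServerDomain(baseDN) != null),
                30_000, 100);
    }
}
```

Failing with an AssertionError on timeout, rather than returning a boolean, keeps a topology that never converges from silently reaching the generation-ID assertions, and the failure message points at the wait rather than at a misleading genId=-1.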
Original prompt
Create a pull request to fix the flaky failing GitHub Actions job in OpenIdentityPlatform/OpenDJ (PR #613 run). The job failed in unit test org.opends.server.replication.GenerationIdTest#testMultiRS, where the replServer2 generationId remained -1 instead of the expected value (e.g., 48), indicating that the test asserts before the replication servers finish handshaking and creating their domains. Use commit ref 6a75f55, workflow .github/workflows/build.yml (job: build-maven), and the logs showing the failure.
Implement a deterministic wait in opendj-server-legacy/src/test/java/org/opends/server/replication/GenerationIdTest.java so that testMultiRS waits for the replication domain to exist on all three replication servers (replServer1/2/3) before asserting generation IDs, instead of relying on a fixed Thread.sleep(...).
Suggested approach:
1. Add a helper waitForDomainsOnAllReplicationServers() that uses TestTimer to poll until replServerX.getReplicationServerDomain(baseDN) is non-null for X=1..3.
2. In testMultiRS, after connectServer1ToReplServer(replServer1), call the new helper and remove or reduce the fixed sleep.
3. Validate that the test suite still passes.
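The ordering these steps prescribe (connect, wait for all domains, then assert) can be simulated without a real replication topology. In this sketch, FakeReplicationServer and the handshake delays are hypothetical: each fake server reports a generation ID of -1 until its domain exists, mirroring the replServer2 symptom. Reading the IDs before the wait loop could return -1 for the slowest server; reading them afterwards returns 48 for all three.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical simulation of the testMultiRS ordering: connect, wait for domains, assert.
 * ASSUMPTION: FakeReplicationServer is illustrative only; a real RS creates its domain
 * asynchronously after the RS-to-RS handshake.
 */
final class MultiRsOrderingSketch {

    /** Hypothetical server whose domain (and generation ID) appears after a delay. */
    static final class FakeReplicationServer {
        private volatile Long domainGenId; // null until the "domain" exists

        void completeHandshakeLater(ScheduledExecutorService scheduler, long delayMs, long genId) {
            scheduler.schedule(() -> { domainGenId = genId; }, delayMs, TimeUnit.MILLISECONDS);
        }

        boolean hasDomain() {
            return domainGenId != null;
        }

        /** Mirrors the CI failure mode: -1 while no domain exists yet. */
        long getGenerationId() {
            Long id = domainGenId;
            return id == null ? -1L : id;
        }
    }

    /** Returns each server's generation ID, read only after all domains exist. */
    static long[] connectWaitAndReadGenIds(long timeoutMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            List<FakeReplicationServer> servers = List.of(
                    new FakeReplicationServer(), new FakeReplicationServer(), new FakeReplicationServer());
            // Step 1 (connect): each simulated handshake completes on a background thread
            // after a different delay.
            long[] delaysMs = {50, 400, 150};
            for (int i = 0; i < servers.size(); i++) {
                servers.get(i).completeHandshakeLater(scheduler, delaysMs[i], 48L);
            }
            // Step 2 (wait): poll every 100 ms until every server has its domain.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (!servers.stream().allMatch(FakeReplicationServer::hasDomain)) {
                if (System.nanoTime() - deadline >= 0) {
                    throw new AssertionError("domains not ready within " + timeoutMs + " ms");
                }
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new AssertionError("interrupted while waiting", e);
                }
            }
            // Step 3 (assert): only now read the generation IDs.
            return servers.stream().mapToLong(FakeReplicationServer::getGenerationId).toArray();
        } finally {
            scheduler.shutdownNow(); // cancel any handshakes still pending after a timeout
        }
    }
}
```

With a generous budget such as connectWaitAndReadGenIds(30_000), all three IDs come back as 48; with a budget shorter than the slowest handshake, the call fails at the wait instead of at the generation-ID comparison.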
Reference failing job URL (for context): https://github.com/OpenIdentityPlatform/OpenDJ/actions/runs/23842862927/job/69502680510?pr=613
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: Why did this job fail?
Assistant: The job failed primarily because of failures in the unit test org.opends.server.replication.GenerationIdTest#testMultiRS. The main issue was an assertion failure during replication testing.
Key Details:
Test Failure Details:
For replServer2, the expected generation ID was 48L, but the actual value was -1L.
Related Logs and Issues During the Test:
Summary of Other Failures:
Overall Test Result Summary:
Suggested Actions:
Fix the Generation ID Issue:
Ensure consistent generation IDs across all replication servers.
replServer2 is receiving an incorrect generation ID (-1L) during the test. It may relate to incorrect setup or timing issues.
Check Synchronization and Domain Configurations:
Examine Connection Stability:
Test Isolation:
Rerun the failed test in isolation with logs enabled:
Let me know if you'd like more guidance on how to address these issues!
User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.
Assistant: The failure is a real test bug / race in GenerationIdTest.testMultiRS, not a workflow problem.
What's happening (from the job log)
The failing assertion is:
GenerationIdTest.testMultiRS expected all 3 replication servers to have generationId 48, but replServer2 reported -1 (assertion in GenerationIdTest#assertGenIdEquals, lines 1009–1016).
The log excerpt shows the symptom earlier too:
Root cause: the test assumes RS-to-RS topology + domain initialization is fully established after a fixed Thread.sleep(...), but i...
This pull request was created from Copilot chat.