fix: deflake //rs/tests/idx:ii_delegation_test#9796
Merged
basvandijk merged 1 commit into master on Apr 9, 2026
Increase the retry parameters in `assert_canister_counter_with_retries` from 10 retries × 1s to 30 retries × 2s, giving 60 seconds of total wait time instead of 10 seconds.

Root cause: The `update()` method in `AgentWithDelegation` is fire-and-forget (it sends the HTTP call but doesn't poll for execution). The test then asserts that the counter was incremented, but with only 10 seconds of retry time, the update may not have been processed through consensus yet. In the observed failure on 2026-04-08, the update was sent at 15:25:50 and the assertion timed out at 15:26:01 (~11s), just exceeding the 10-second retry window.
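To make the retry arithmetic concrete, here is a minimal sketch of a polling helper with a fixed retry count and delay. It is not the actual `assert_canister_counter_with_retries` implementation; `retry_budget`, `wait_for_counter`, and the `read_counter` closure are hypothetical names used only for illustration, and the closure stands in for whatever query the real test performs.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Total time spent sleeping if every attempt fails: retries × delay.
/// (Hypothetical helper, not part of the test framework.)
fn retry_budget(retries: u32, delay: Duration) -> Duration {
    delay * retries
}

/// Polls `read_counter` until it returns `expected` or the attempts run out.
/// On success, returns the 1-based attempt number that observed the value.
fn wait_for_counter<F>(
    mut read_counter: F,
    expected: u64,
    retries: u32,
    delay: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> u64,
{
    for attempt in 1..=retries {
        if read_counter() == expected {
            return Ok(attempt);
        }
        // The value is not visible yet; wait before the next attempt.
        sleep(delay);
    }
    Err(format!(
        "counter did not reach {} after {} attempts",
        expected, retries
    ))
}
```

With these definitions, `retry_budget(10, Duration::from_secs(1))` is 10 seconds, which is shorter than the roughly 11 seconds observed in the failure, while `retry_budget(30, Duration::from_secs(2))` is 60 seconds. A successful run still returns on the first attempt that sees the expected value, so the larger budget only costs extra time when the update is slow.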
Pull request overview
This PR deflakes the //rs/tests/idx:ii_delegation_test system test by extending the retry window used when asserting that an update call has been reflected in canister state (i.e., after the update has gone through consensus).
Changes:
- Increase `assert_canister_counter_with_retries` retry count from 10 to 30.
- Increase per-retry wait from 1s to 2s to allow more time for consensus processing.
cgundy approved these changes on Apr 9, 2026
daniel-wong-dfinity-org pushed a commit that referenced this pull request on Apr 15, 2026
## Root Cause

The `update()` method in `AgentWithDelegation` is fire-and-forget: it sends the HTTP call but doesn't poll for execution through consensus. The test then asserts the counter was incremented using `assert_canister_counter_with_retries`, but with only 10 retries × 1s = 10 seconds total wait time, the update may not have been processed through consensus yet.

In the observed failure on 2026-04-08, the update was sent at 15:25:50 and the assertion timed out at 15:26:01 (~11 seconds), just barely exceeding the 10-second retry window.

## Fix

Increased retry parameters from 10 retries × 1s (10s total) to 30 retries × 2s (60s total), giving ample time for the update to go through consensus.

Verified with `--runs_per_test=3 --jobs=3`: all 3 runs passed (avg 67.9s).

---

This PR was created following the steps in `.claude/skills/fix-flaky-tests/SKILL.md`.
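The race described under Root Cause can be reproduced in miniature with plain threads. The sketch below is a simulation only: `FakeCanister`, `update_fire_and_forget`, `query`, and `poll_until` are hypothetical stand-ins, not the `AgentWithDelegation` API, and the artificial `latency` plays the role of consensus and execution delay.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a canister that holds a single counter.
struct FakeCanister {
    counter: Arc<AtomicU64>,
}

impl FakeCanister {
    fn new() -> Self {
        FakeCanister {
            counter: Arc::new(AtomicU64::new(0)),
        }
    }

    /// Fire-and-forget update: returns immediately, and the increment
    /// only becomes visible after `latency` has elapsed.
    fn update_fire_and_forget(&self, latency: Duration) -> thread::JoinHandle<()> {
        let counter = Arc::clone(&self.counter);
        thread::spawn(move || {
            thread::sleep(latency);
            counter.fetch_add(1, Ordering::SeqCst);
        })
    }

    /// Reads the state that is currently visible.
    fn query(&self) -> u64 {
        self.counter.load(Ordering::SeqCst)
    }
}

/// Polls the counter up to `retries` times, sleeping `delay` between
/// attempts. Returns true if `expected` was observed within the budget.
fn poll_until(canister: &FakeCanister, expected: u64, retries: u32, delay: Duration) -> bool {
    for _ in 0..retries {
        if canister.query() == expected {
            return true;
        }
        thread::sleep(delay);
    }
    false
}
```

In this model, a caller that issues `update_fire_and_forget` and then polls with a budget shorter than `latency` sees a stale counter and fails, while the same poll with a larger budget succeeds. That is the same shape as the 10-second window missing an update that took about 11 seconds to land.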