test(sqs): slow throttle refill on no-op SetQueueAttributes test to fix CI flake #819
…ix CI flake

TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket intermittently fails with 'expected 400, got 200' on CI runs (observed multiple times today on unrelated PRs, #813 etc.).

Root cause: the 1-token-per-second refill races the test wall-clock. The test sequence (10 drain sends → sanity send → SetQueueAttributes → post-no-op send) pushes 12 Raft writes through the SQS adapter. Under -race on slow CI runners each write's propose+apply takes 100-250ms, so total elapsed time before the final assertion can reach 1.5-2.5s. At 1 token/sec refill, the bucket has accumulated 1+ tokens by that point and the post-no-op send returns 200 instead of the expected 400, falsely indicating a no-op invalidate-bypass regression.

Fix: drop the refill from 1/sec to 0.01/sec (1 token per 100s) so that no test-window wall-clock can accumulate to a whole token. The test's intent, verifying that a no-op SetQueueAttributes does not reset the bucket state, is independent of the refill rate, so widening the safety margin is in scope.

Tested locally with -race -count=2 (~2.1s wall, both pass). Lint clean.

Caller audit: test-only change. The throttle config validator accepts fractional float64 refill rates (sqs_catalog.go:163, SendRefillPerSecond float64). 0.01 is non-zero, so IsEmpty (line 172) returns false and throttling stays enabled; the test still exercises the throttle path.
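The refill arithmetic behind this root cause can be checked with a short sketch. It assumes the bucket gains tokens linearly (elapsed seconds × refill rate) and ignores capacity; tokensAccrued is a hypothetical helper for illustration, not a function from the adapter. The elapsed values are the 1.5-2.5s window and the 10s bound quoted in this PR.

```go
package main

import "fmt"

// tokensAccrued returns how many tokens a linearly refilling bucket gains
// over elapsedSeconds at refillPerSecond. Hypothetical helper: capacity
// clamping is deliberately left out.
func tokensAccrued(elapsedSeconds, refillPerSecond float64) float64 {
	return elapsedSeconds * refillPerSecond
}

func main() {
	// Wall-clock windows taken from the PR text: the observed 1.5-2.5s
	// range and the 10s worst case mentioned for the new rate.
	for _, elapsed := range []float64{1.5, 2.5, 10} {
		for _, rate := range []float64{1, 0.01} {
			got := tokensAccrued(elapsed, rate)
			fmt.Printf("elapsed=%.1fs refill=%v/s accrued=%.3f wholeToken=%t\n",
				elapsed, rate, got, got >= 1)
		}
	}
}
```

At the old rate every window yields at least one whole token, which is enough to admit the post-no-op send; at 0.01/sec even the 10s window stays well below one token.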
@claude review

New flake-fix PR for the recurring CI flake.

Root cause: 12 Raft writes × 100-250ms each under -race = 1.5-2.5s of test wall-clock. A 1 token/sec refill accumulates 1+ tokens by the post-no-op send, which then returns 200 instead of the expected 400.

Fix: refill 1/sec → 0.01/sec. Tested with -race -count=2.
Claude finished @bootjp's task in 1m 47s

Code Review
Summary: LGTM with one minor style note.

The fix is correct. The root cause diagnosis is accurate, the chosen value is safe, and the test continues to exercise the invariant it was designed to protect.

Correctness

Validator passes:

Safety margin is comfortable:

Design

The named constant

Minor: comment length

// Slow refill (1 token/100s) so wall-clock elapsed under -race can't
// accumulate to a whole token before the post-no-op send.
const slowRefill = "0.01"

Not a blocker: the explanation is genuinely non-obvious and test-only, but it is worth a follow-up squash or a quick edit before merge to stay consistent with the project convention.

Verdict: The fix is mechanically sound, correctly scoped, and consistent with the #818 approach. The comment style is the only deviation from convention, and it's minor.
Code Review
This pull request addresses a race condition in the SQS throttle integration test by significantly reducing the refill rate, ensuring the token bucket does not prematurely refill on slow CI runners. The reviewer suggested centralizing the throttle attribute map into a single variable to improve maintainability and guarantee that the subsequent configuration call is a true no-op.
+ const slowRefill = "0.01"
  mustSetQueueAttributes(t, node, url, map[string]string{
      "ThrottleSendCapacity": "10",
-     "ThrottleSendRefillPerSecond": "1",
+     "ThrottleSendRefillPerSecond": slowRefill,
  })
To ensure the no-op nature of the test and prevent future drift between the two SetQueueAttributes calls, consider defining the attribute map once and reusing it. This makes the test's intent explicit and improves maintainability.
Suggested change:

  const slowRefill = "0.01"
- mustSetQueueAttributes(t, node, url, map[string]string{
-     "ThrottleSendCapacity": "10",
-     "ThrottleSendRefillPerSecond": slowRefill,
- })
+ throttleAttrs := map[string]string{
+     "ThrottleSendCapacity": "10",
+     "ThrottleSendRefillPerSecond": slowRefill,
+ }
+ mustSetQueueAttributes(t, node, url, throttleAttrs)
References
- Centralize common logic to avoid code duplication and improve maintainability.
  mustSetQueueAttributes(t, node, url, map[string]string{
      "ThrottleSendCapacity": "10",
-     "ThrottleSendRefillPerSecond": "1",
+     "ThrottleSendRefillPerSecond": slowRefill,
  })
Gemini medium x2: define the attribute map once and reuse it on both mustSetQueueAttributes calls so the 'no-op' intent is structurally visible. A future drift between the two map literals (typo, stray edit) would now be obviously wrong since they share the same source variable.

No behavior change.

go test -race -count=1 -timeout=120s ./adapter/... in scope: passes
golangci-lint: clean
@claude review

Round 1 addresses Gemini medium x2: extracted
Claude finished @bootjp's task in 1m 4s

Code Review (Round 2)
Summary: Gemini suggestions fully addressed. One residual comment style note.

Gemini Medium Findings: Both Resolved ✓

Suggestion 1 (define map once):

Suggestion 2 (reuse on second call): The second

Both changes are correct and well-scoped.

Residual: Comment Style (still deviates from CLAUDE.md)

My Round 1 review already flagged the 11-line comment block. Round 2 added a second multi-line block (4 lines) explaining the map reuse, and that block contains a PR back-reference:

// "no-op" intent is structurally visible: the two
// mustSetQueueAttributes calls share the same map literal, so
// a future drift between them (e.g., a typo or a stray edit)
// would be obviously wrong (Gemini medium on PR #819). ← CLAUDE.md: "Don't reference the current task, fix, or callers"

CLAUDE.md:

The

// slowRefill: 1 token/100s so wall-clock elapsed under -race can't reach 1.0 before the post-no-op send.
const slowRefill = "0.01"
throttleAttrs := map[string]string{...}

Not a blocker: the rationale is genuinely non-obvious and test-only, and the Gemini attribution will simply go stale rather than cause correctness issues. But it is worth a squash/edit before merge to stay consistent.

Verdict: The functional fix (refill rate + map reuse) is correct and complete. Ready to merge once the comment style is tidied, or merge as-is if you prefer to keep the verbose explanation.
Summary

Fixes the recurring TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket CI flake that has been hitting the same admin PR series (#813, #815, #816) that #818 just unblocked.

Root cause

The test sequence:

1. SetQueueAttributes(capacity=10, refill=1/sec)
2. 10 sends to drain the bucket
3. Sanity send
4. SetQueueAttributes with identical values (no-op)
5. Post-no-op send (expected 400)

Each step goes through a real Raft propose+apply. Under -race on slow CI runners each round-trip takes 100–250ms. Total wall-clock from start to step 5 reaches 1.5–2.5s. At 1 token/sec refill, by step 5 the bucket has accumulated 1+ tokens and the send returns 200, falsely indicating a no-op-invalidate-bypass regression.

The test's intent, verifying that a no-op SetQueueAttributes does not reset bucket state, is independent of the refill rate. Slowing the refill removes the race without changing what's tested.

Fix

refill 1/sec → 0.01/sec (1 token per 100s). Even a 10s test window can't accumulate to a whole token.

The throttle config validator (adapter/sqs_catalog.go:163, SendRefillPerSecond float64) accepts fractional values. 0.01 != 0, so IsEmpty returns false and the throttle path is still exercised; only the refill cadence changes.

Self-review (5 passes)

Test plan

- go test -race -count=2 -timeout=120s -run TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket ./adapter/: passes both (2.1s)
- golangci-lint --config=.golangci.yaml run: clean

Context

Second flake-fix PR today (after #818 for Redis TTL). The admin PR series (#813, #815, #816) keeps hitting unrelated adapter-package flakes because the admin code lives in internal/admin/ and web/admin/, but go test ./... runs the whole tree. Each fix is small and scoped.
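The race described under "Root cause" can be reproduced deterministically with a toy token bucket that takes timestamps as arguments instead of reading the wall clock. This is a minimal sketch under stated assumptions: the bucket type, newBucket, take, and postNoOpSendAllowed are hypothetical names for illustration and do not mirror the adapter's actual throttle implementation. Capacity 10 and the two refill rates come from the PR; 2.5s is the upper end of the observed window.

```go
package main

import (
	"fmt"
	"math"
)

// bucket is a toy token bucket for illustration only. Time is passed in
// explicitly (in seconds) so the outcome does not depend on the wall clock.
type bucket struct {
	capacity float64
	refill   float64 // tokens per second
	tokens   float64
	last     float64 // timestamp of the previous take, in seconds
}

func newBucket(capacity, refill float64) *bucket {
	return &bucket{capacity: capacity, refill: refill, tokens: capacity}
}

// take credits tokens for the time elapsed since the previous call, clamps
// to capacity, then tries to consume one token.
func (b *bucket) take(now float64) bool {
	b.tokens = math.Min(b.capacity, b.tokens+(now-b.last)*b.refill)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// postNoOpSendAllowed drains a capacity-10 bucket at t=0 and reports whether
// a send issued elapsed seconds later is admitted. true models the
// unexpected 200; false models the expected throttled 400.
func postNoOpSendAllowed(refill, elapsed float64) bool {
	b := newBucket(10, refill)
	for i := 0; i < 10; i++ {
		b.take(0)
	}
	return b.take(elapsed)
}

func main() {
	for _, refill := range []float64{1, 0.01} {
		fmt.Printf("refill=%v/s admitted after 2.5s: %t\n",
			refill, postNoOpSendAllowed(refill, 2.5))
	}
}
```

Passing time in as a parameter is only a device to keep the sketch deterministic; the real test runs against wall-clock time, which is why slowing the refill is the simpler fix there.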