Skip to content

test(sqs): slow throttle refill on no-op SetQueueAttributes test to fix CI flake#819

Merged
bootjp merged 2 commits into
mainfrom
fix/sqs-throttle-noop-test-flake
May 23, 2026
Merged

test(sqs): slow throttle refill on no-op SetQueueAttributes test to fix CI flake#819
bootjp merged 2 commits into
mainfrom
fix/sqs-throttle-noop-test-flake

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 23, 2026

Summary

Fixes the recurring TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket CI flake that has been hitting the same admin PR series (#813, #815, #816) that #818 just unblocked.

Root cause

The test sequence:

  1. SetQueueAttributes (capacity=10, refill=1/sec)
  2. 10 drain sends (drain the bucket)
  3. Sanity-check send → expects 400 (drained)
  4. SetQueueAttributes with identical values (no-op)
  5. Post-no-op send → expects 400 ← intermittently returns 200

Each step goes through a real Raft propose+apply. Under -race on slow CI runners each round-trip takes 100–250ms. Total wall-clock from start to step 5 reaches 1.5–2.5s. At 1 token/sec refill, by step 5 the bucket has accumulated 1+ tokens and the send returns 200 — falsely indicating a no-op-invalidate-bypass regression.

The test's intent — verify that a no-op SetQueueAttributes does not reset bucket state — is independent of the refill rate. Slowing the refill removes the race without changing what's tested.

Fix

refill 1/sec → 0.01/sec (1 token per 100s). Even a 10s test window can't accumulate to a whole token.

The throttle config validator (adapter/sqs_catalog.go:163 SendRefillPerSecond float64) accepts fractional values. 0.01 != 0 so IsEmpty returns false and the throttle path is still exercised — only the refill cadence changes.

Self-review (5 passes)

  1. Data loss — none; test-only constant.
  2. Concurrency — closes a wall-clock vs. refill-rate race in the test fixture (same shape as the Redis TTL fix in test(redis): widen TTL window on ExpiredKey_BecomesInvisible to fix CI flake #818).
  3. Performance — no runtime change; the test still completes in ~1–2s.
  4. Consistency — the test still verifies the no-op-preserves-bucket invariant identically. The refill rate isn't load-bearing for the assertion.
  5. Test coverage — unchanged. Same drain → sanity → no-op → recheck sequence; only the refill cadence differs.

Test plan

  • go test -race -count=2 -timeout=120s -run TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket ./adapter/ — passes both (2.1s)
  • golangci-lint --config=.golangci.yaml run — clean
  • CI

Context

Second flake-fix PR today (after #818 for Redis TTL). The admin PR series (#813, #815, #816) keeps hitting unrelated adapter-package flakes because the admin code lives in internal/admin/ / web/admin/ but go test ./... runs the whole tree. Each fix is small and scoped.

…ix CI flake

TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket
intermittently fails with 'expected 400, got 200' on CI runs
(observed multiple times today on unrelated PRs #813, etc).

Root cause: 1-token-per-second refill races the test wall-clock.
The test sequence — 10 drain sends → sanity send →
SetQueueAttributes → post-no-op send — pushes 12 Raft writes
through the SQS adapter. Under -race on slow CI runners each
write's propose+apply takes 100-250ms; total elapsed before the
final assertion can reach 1.5-2.5s. At 1 token/sec refill, by
that point the bucket has accumulated 1+ tokens and the
post-no-op send returns 200 instead of the expected 400 —
falsely indicating a no-op invalidate-bypass regression.

Fix: drop refill from 1/sec to 0.01/sec (1 token per 100s) so
no test-window wall-clock can accumulate to a whole token.
The test's intent — verify that a no-op SetQueueAttributes
does not reset the bucket state — is independent of the
refill rate, so widening the safety margin is in scope.

Tested locally with -race -count=2 (-2.1s wall, both pass).
Lint clean.

Caller audit: test-only change. The throttle config validator
accepts fractional float64 refill rates (sqs_catalog.go:163
SendRefillPerSecond float64). 0.01 is non-zero so IsEmpty
(line 172) returns false and throttling stays enabled — the
test still exercises the throttle path.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 9 minutes and 13 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7adb76dd-5ed7-4b67-bf1f-018ad5911515

📥 Commits

Reviewing files that changed from the base of the PR and between aca0aba and 6e5ad3f.

📒 Files selected for processing (1)
  • adapter/sqs_throttle_integration_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sqs-throttle-noop-test-flake

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

New flake-fix PR for the recurring TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket flake (companion to #818).

Root cause: 12 Raft writes × 100-250ms/each under -race = 1.5-2.5s test wall-clock. 1 token/sec refill accumulates to 1+ tokens by the post-no-op send, returning 200 instead of expected 400.

Fix: refill 1/sec → 0.01/sec. Tested with -race -count=2.

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 1m 47s —— View job


Code Review

  • Gather context and understand the request
  • Read changed file and diff
  • Analyze fix correctness and test integrity
  • Post review feedback

Summary: LGTM with one minor style note.

The fix is correct. The root cause diagnosis is accurate, the chosen value is safe, and the test continues to exercise the invariant it was designed to protect.


Correctness

Validator passes: 0.01 clears every constraint in parseThrottleFloat and validateThrottlePair:

  • Non-negative, finite, below throttleHardCeilingPerSecond (100,000) ✓
  • capacity(10) ≥ refill(0.01)
  • Neither is zero, so the "both-zero-or-both-positive" rule is satisfied ✓

IsEmpty() still returns false: IsEmpty checks all six float fields. With SendCapacity=10 and SendRefillPerSecond=0.01 the throttle path is still fully exercised (adapter/sqs_catalog.go:176). ✓

throttleConfigEqual still detects the no-op correctly: Both mustSetQueueAttributes calls use the same slowRefill constant, so the second call is a genuine same-value no-op and the invariant under test is unchanged. ✓

Safety margin is comfortable: 0.01 tokens/sec × max ~10s CI window = 0.1 tokens — far below the 1.0 threshold. Even a 60s stall can only accumulate 0.6 tokens. The previous 1/sec rate crossed the threshold in ~1.5s. ✓


Design

The named constant const slowRefill = "0.01" is the right call — it eliminates duplication across both mustSetQueueAttributes calls and makes the intent legible at each use site.


Minor: comment length

CLAUDE.md says "Never write multi-paragraph docstrings or multi-line comment blocks — one short line max." The added block is 11 lines. The reasoning is already fully documented in the PR description, so the code comment could be trimmed to one or two lines, e.g.:

// Slow refill (1 token/100s) so wall-clock elapsed under -race can't
// accumulate to a whole token before the post-no-op send.
const slowRefill = "0.01"

Not a blocker — the explanation is genuinely non-obvious and test-only — but worth a follow-up squash or a quick edit before merge to stay consistent with the project convention.


Verdict: The fix is mechanically sound, correctly scoped, and consistent with the #818 approach. The comment style is the only deviation from convention, and it's minor.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a race condition in the SQS throttle integration test by significantly reducing the refill rate, ensuring the token bucket does not prematurely refill on slow CI runners. The reviewer suggested centralizing the throttle attribute map into a single variable to improve maintainability and guarantee that the subsequent configuration call is a true no-op.

Comment on lines 340 to 344
const slowRefill = "0.01"
mustSetQueueAttributes(t, node, url, map[string]string{
"ThrottleSendCapacity": "10",
"ThrottleSendRefillPerSecond": "1",
"ThrottleSendRefillPerSecond": slowRefill,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure the no-op nature of the test and prevent future drift between the two SetQueueAttributes calls, consider defining the attribute map once and reusing it. This makes the test's intent explicit and improves maintainability.

Suggested change
const slowRefill = "0.01"
mustSetQueueAttributes(t, node, url, map[string]string{
"ThrottleSendCapacity": "10",
"ThrottleSendRefillPerSecond": "1",
"ThrottleSendRefillPerSecond": slowRefill,
})
const slowRefill = "0.01"
throttleAttrs := map[string]string{
"ThrottleSendCapacity": "10",
"ThrottleSendRefillPerSecond": slowRefill,
}
mustSetQueueAttributes(t, node, url, throttleAttrs)
References
  1. Centralize common logic to avoid code duplication and improve maintainability.

Comment on lines 365 to 368
mustSetQueueAttributes(t, node, url, map[string]string{
"ThrottleSendCapacity": "10",
"ThrottleSendRefillPerSecond": "1",
"ThrottleSendRefillPerSecond": slowRefill,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Reuse the throttleAttrs map here to guarantee that this call is a true no-op relative to the initial configuration.

	mustSetQueueAttributes(t, node, url, throttleAttrs)
References
  1. Centralize common logic to avoid code duplication and improve maintainability.

Gemini medium x2: define the attribute map once and reuse it
on both mustSetQueueAttributes calls so the 'no-op' intent
is structurally visible. A future drift between the two map
literals (typo, stray edit) would now be obviously wrong since
they share the same source variable.

No behavior change.

go test -race -count=1 -timeout=120s ./adapter/... in scope: passes
golangci-lint: clean
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

Round 1 addresses Gemini medium x2: extracted throttleAttrs map for reuse across both mustSetQueueAttributes calls — the 'no-op' intent is now structurally visible. No behavior change.

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 1m 4s —— View job


Code Review (Round 2)

  • Gather context
  • Read changed file and diff
  • Verify Gemini suggestions addressed
  • Post review feedback

Summary: Gemini suggestions fully addressed. One residual comment style note.

Gemini Medium Findings — Both Resolved ✓

Suggestion 1 (define map once): throttleAttrs is now declared once with const slowRefill = "0.01" and passed directly to the first mustSetQueueAttributes call.

Suggestion 2 (reuse on second call): The second mustSetQueueAttributes call now passes the same throttleAttrs variable. Structural identity guarantees the two calls are a genuine no-op pair — a future edit that changes one but not the other is immediately obvious.

Both changes are correct and well-scoped.


Residual: Comment Style (still deviates from CLAUDE.md)

My Round 1 review already flagged the 11-line comment block. Round 2 added a second multi-line block (4 lines) explaining the map reuse, and that block contains a PR back-reference:

// "no-op" intent is structurally visible: the two
// mustSetQueueAttributes calls share the same map literal, so
// a future drift between them (e.g., a typo or a stray edit)
// would be obviously wrong (Gemini medium on PR #819).  ← CLAUDE.md: "Don't reference the current task, fix, or callers"

CLAUDE.md:

"Don't reference the current task, fix, or callers ("used by X", "added for the Y flow", "handles the case from issue #123") — those belong in the PR description and rot as the codebase evolves."

The (Gemini medium on PR #819) attribution belongs in the PR description (already there), not in the source file. The sentence before it is also redundant — the self-evident structure of reusing a named variable is already sufficient documentation. Both comment blocks could be collapsed to one line each:

// slowRefill: 1 token/100s so wall-clock elapsed under -race can't reach 1.0 before the post-no-op send.
const slowRefill = "0.01"
throttleAttrs := map[string]string{...}

Not a blocker — the rationale is genuinely non-obvious and test-only, and the Gemini attribution will simply go stale rather than cause correctness issues. But worth a squash/edit before merge to stay consistent.


Verdict: The functional fix (refill rate + map reuse) is correct and complete. Ready to merge once the comment style is tidied, or merge as-is if you prefer to keep the verbose explanation.

@bootjp bootjp merged commit 65d8f17 into main May 23, 2026
9 checks passed
@bootjp bootjp deleted the fix/sqs-throttle-noop-test-flake branch May 23, 2026 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant