test(redis): structural fix — eventuallyExpired helper for TTL-expiry tests#903
Conversation
… tests Third flake-root-cause PR, after PR #891 (SQS throttle refill) and PR #898 (Raft readiness probe). Same pattern: a recurring CI-flake mechanism has been fixed per-test for months; this PR generalises the structural fix and migrates all extant sites. Recurring flake (PR #818 lineage): Tests that "set a short TTL, sleep past it, assert the key is gone" race their own wall clock under -race on slow CI runners. The post-sleep read can sometimes still observe an un-expired view because the inter-call pause does not absorb scheduler jitter. PR #818 fixed one instance (ExpiredKey_BecomesInvisible) by switching to require.Eventually with a TTL-derived deadline. The other 8 sites in the test suite kept the fragile time.Sleep(ttl + 50ms) pattern. Structural fix ============== - New helper eventuallyExpired(t, ttl, condition, msg) in test_util.go. Encapsulates the require.Eventually + ttlExpiryHeadroom (3 s) + ttlExpiryPoll (25 ms) idiom so future test writers reach for it instead of inventing fresh time.Sleep margins. - Migrated 8 sites to the helper: * redis_collection_fastpath_test.go: TestRedis_HGET_TTLExpired, TestRedis_HEXISTS_TTLExpired, TestRedis_SISMEMBER_TTLExpired. * redis_lua_collection_fastpath_test.go: TestLua_HGET_HashTTL, TestLua_HEXISTS_HashTTL, TestLua_ZSCORE_*, TestLua_ZRANGEBYSCORE_*. * redis_misskey_compat_test.go: post-EX-1 expiry assertion. - Removed the now-unused waitForTTLExpiry (redis_collection_fastpath _test.go) and waitForLuaTTL (redis_lua_collection_fastpath_test.go) convenience wrappers. Their sole callers were the migrated sites. - Added errors stdlib import to the three test files where the helper's condition closures use errors.Is. Caller audit per /loop semantic-change rule ============================================ - eventuallyExpired: new helper, 8 callers in 3 test files. All test-only. - waitForTTLExpiry / waitForLuaTTL: removed. grep across the repo confirms zero remaining call sites. - collectionFastPathTTL / luaFastPathTTL consts: still used by the migrated sites as the ttl argument to eventuallyExpired. Unchanged. Validation ========== go test ./adapter/ -run 'TestRedis_HGET_TTLExpired|TestRedis_HEXISTS_TTLExpired|TestRedis_SISMEMBER_TTLExpired|TestLua_HGET_HashTTL|TestLua_HEXISTS|TestLua_ZSCORE|TestLua_ZRANGEBYSCORE|TestRedis_Misskey' -race -count=2 -> ok 32.3s gofmt + go vet + golangci-lint run -> 0 issues Closes the "Redis TTL window deadline-multiplier" follow-up noted in PR #891's description.
|
Warning Review limit reached
More reviews will be available in 53 minutes and 13 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new helper function, eventuallyExpired, in adapter/test_util.go to address test flakiness caused by static time.Sleep calls during TTL expiration checks. Various Redis and Lua TTL tests across redis_collection_fastpath_test.go, redis_lua_collection_fastpath_test.go, and redis_misskey_compat_test.go have been refactored to use this helper, which utilizes require.Eventually with a generous deadline and a 25ms polling interval to absorb CI scheduler jitter. There are no review comments, and we have no further feedback to provide.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2b55e38ee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ttlExpiryHeadroom = 3 * time.Second | ||
| ttlExpiryPoll = 25 * time.Millisecond | ||
| ) | ||
| require.Eventually(t, condition, ttl+ttlExpiryHeadroom, ttlExpiryPoll, msg) |
There was a problem hiding this comment.
Wait until the TTL elapses before polling
require.Eventually runs the condition once immediately, so these TTL-expiry assertions can now pass before the TTL has elapsed. In cases where the post-expiry result is also the result for a missing/hidden collection (for example HEXISTS/SISMEMBER returning false or Lua range returning an empty array), a regression that deletes or hides the key as soon as PExpire is called would satisfy the helper on the first poll and the test would never exercise the expired-after-deadline state. Sleep until ttl (or otherwise assert the key is still visible first) before starting the retry window.
Useful? React with 👍 / 👎.
codex P2 on PR #903 (adapter/test_util.go:742): require.Eventually runs the condition once immediately before its first poll tick. For TTL-expiry tests where the "expired" result shape coincides with "missing/hidden" (HEXISTS/SISMEMBER false, Lua ZRANGEBYSCORE empty array), a regression that deletes/hides the key at PExpire time would satisfy the first poll and the test would pass without ever exercising the actual expired-after-deadline state — silently losing the regression- detection power the test was meant to provide. Fix: time.Sleep(ttl) before require.Eventually so the first poll fires AFTER the TTL deadline. Total time bound is unchanged (ttl + ttlExpiryHeadroom); polling just starts later. The PExpire-deletes-key regression now fails the test because the PRE-sleep state has already deleted the key, and after the sleep we would have expected to observe the expiry transition, not a stale already-gone result. Validation: go test ./adapter/ -run "TestRedis_HGET_TTLExpired|TestRedis_HEXISTS_TTLExpired|TestRedis_SISMEMBER_TTLExpired|TestLua_HGET_HashTTL|TestLua_HEXISTS|TestLua_ZSCORE|TestLua_ZRANGEBYSCORE|TestRedis_Misskey" -race -count=2 -> ok 30.3s gofmt + golangci-lint -> 0 issues No new caller audit needed: eventuallyExpired signature unchanged, sole new behavior is the pre-poll sleep window. Existing 8 callers benefit transparently.
|
@claude review Round 1 fix (commit ffff49b) for codex P2:
go test -race -count=2 (8 migrated sites) passes; lint clean. |
|
Claude finished @bootjp's task in 1m 40s —— View job Review: Round 1 fix for codex P2Tasks:
SummaryThe fix correctly addresses the codex P2 concern. Adding
|
…gatorRaceFree (#906) ## Summary Fixes the CI flake observed in [Actions run 26765510693](https://github.com/bootjp/elastickv/actions/runs/26765510693/job/78890657730?pr=902): ``` --- FAIL: TestHotKeysAggregatorRaceFree (0.11s) hot_keys_test.go:356: Expected value not to be nil. ``` Surfaced on PR #902's CI run but **unrelated to PR #902** — the failing test is on `main` in `keyviz/`, which PR #902 (Redis adapter `*Eventually` helper cleanup) doesn't touch. ## Root cause Same wall-clock-racing pattern as the SQS-throttle (PR #891) and Redis-TTL (PR #903) flake series. The test: 1. Launches an aggregator goroutine (Step=5ms tick). 2. Spawns 8 workers each calling `Observe` 500 times. 3. `time.Sleep(50 * time.Millisecond)`. 4. Asserts both per-route snapshots are non-nil. The aggregator publishes a snapshot only on a tick (or on ctx-done drain). Under `-race` on a slow CI runner, the `Run` goroutine itself can be slow to schedule — the 50ms wait may not contain a single actual tick worth of scheduler time, so `hotKeysSnap` stays at its initial `nil` `atomic.Pointer` load and the assertion fires. ## Fix `require.Eventually` with a 3-second budget + 5ms poll cadence, asserting both snapshots non-nil: ```go require.Eventually(t, func() bool { return s.HotKeysSnapshot(1) != nil && s.HotKeysSnapshot(2) != nil }, 3*time.Second, 5*time.Millisecond, "aggregator must publish a snapshot for each route within the budget") ``` The test still pins the load-bearing property (snapshot reads are lock-free; the `atomic.Pointer.Load` works without contention with the publisher); only the **wait mechanism** changes from "sleep blindly and hope" to "poll until ready or fail". Total time bound is at most 3s on a slow runner, ~5–50ms in the common case. ## Validation - `go test ./keyviz/ -run TestHotKeysAggregatorRaceFree -race -count=5 -timeout 120s` → ok 1.1s (5/5 iterations green) - `gofmt` + `golangci-lint run` → 0 issues ## Pattern continuity This is the same kind of structural fix as PR #891 / #903: **replace fixed-time-window sleep with `require.Eventually`** for any condition that depends on a background goroutine's progress under `-race` on CI. Future flakes in this category should follow the same pattern.
Summary
Third flake-root-cause PR, after #891 (SQS throttle refill) and #898 (Raft readiness probe). Same recipe: a recurring CI-flake mechanism has been patched per-test for months; this PR generalises the structural fix and migrates the extant sites.
Recurring flake (PR #818 lineage)
Tests that "set a short TTL, sleep past it, assert the key is gone" race their own wall clock under
-raceon slow CI runners. The post-sleep read sometimes observes an un-expired view because the inter-call pause does not absorb scheduler jitter. PR #818 fixed one instance (TestRedis_ExpiredKey_BecomesInvisible) by switching torequire.Eventuallywith a TTL-derived deadline. The other 8 sites kept the fragiletime.Sleep(ttl + 50ms)pattern.Structural fix
New helper
eventuallyExpired(t, ttl, condition, msg)intest_util.go. Encapsulates therequire.Eventually + ttlExpiryHeadroom(3s) + ttlExpiryPoll(25ms)idiom so future test writers reach for it instead of inventing freshtime.Sleepmargins.Migrated 8 sites to the helper:
redis_collection_fastpath_test.go:TestRedis_HGET_TTLExpired,TestRedis_HEXISTS_TTLExpired,TestRedis_SISMEMBER_TTLExpiredredis_lua_collection_fastpath_test.go:TestLua_HGET_HashTTL,TestLua_HEXISTS_HashTTL,TestLua_ZSCORE_*,TestLua_ZRANGEBYSCORE_*redis_misskey_compat_test.go: post-EX 1expiry assertionRemoved unused
waitForTTLExpiry(redis_collection_fastpath_test.go) andwaitForLuaTTL(redis_lua_collection_fastpath_test.go) convenience wrappers. Sole callers were the migrated sites.Added
errorsstdlib import to the three test files where the helper's condition closures useerrors.Is.Caller audit (per /loop semantic-change rule)
eventuallyExpired: new helper, 8 callers in 3 test files. All test-only.waitForTTLExpiry/waitForLuaTTL: removed. Repo-wide grep confirms zero remaining call sites.collectionFastPathTTL/luaFastPathTTLconsts: still used by the migrated sites as thettlargument toeventuallyExpired. Unchanged.Validation
go test ./adapter/ -run 'TestRedis_HGET_TTLExpired|TestRedis_HEXISTS_TTLExpired|TestRedis_SISMEMBER_TTLExpired|TestLua_HGET_HashTTL|TestLua_HEXISTS|TestLua_ZSCORE|TestLua_ZRANGEBYSCORE|TestRedis_Misskey' -race -count=2→ ok 32.3sgofmt+go vet+golangci-lint run→ 0 issuesScope
Closes the "Redis TTL window deadline-multiplier" follow-up noted in PR #891's description.
Out-of-scope (still on the flake follow-up backlog)