test(redis): wait for follower apply catch-up in MultiExec list tests#582
Conversation
TestRedis_MultiExec_DelThenRPushRecreatesList was intermittently failing under -race on CI (PR #577): after MULTI / DEL list / RPUSH list new1 new2 / EXEC, the final direct store.ScanAt(..., readTS) on nodes[1] returned the pre-EXEC ["old1","old2"] even though the immediately preceding LRANGE via the redis client correctly returned ["new1","new2"]. Why the race exists: - The go-redis client connects to nodes[1], which is not necessarily the raft leader. - LRANGE goes through the adapter's LeaseRead / LinearizableRead path, which blocks until the local apply watermark reaches a safe point, so it observes the EXEC commit even on a slightly-lagging follower. - readTS() (adapter/ts.go:snapshotTS) returns the local store's LastCommitTS() directly, with no wait-apply. On a follower (or a leader that has just responded to the client), LastCommitTS can momentarily trail the raft-committed EXEC because the test reads it on a different code path than the one that drove the client-visible wait. - The direct ScanAt(readTS) therefore sees the stale, pre-EXEC state. Fix: introduce waitForListState(t, node, key, expectedLen, expectedValues) that polls resolveListMeta AND the raw item scan at the node-local readTS via require.Eventually (250ms interval, 5s timeout) until both reflect the expected post-EXEC state. This is a real synchronization on the raft apply progress of the target node, not a timing hack: it succeeds on the first iteration when the node is already caught up and only retries while the apply is genuinely behind. Use it in place of the ad-hoc readTS/resolveListMeta/ScanAt block in TestRedis_MultiExec_DelThenRPushRecreatesList.
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 43 minutes and 41 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a waitForListState helper function in adapter/redis_multi_test.go to handle replication lag in tests by polling for expected list states. A review comment identifies a logic error in the helper: it fails when a list is expected to be empty (length 0) because it incorrectly treats a non-existent key as a failure, preventing its use for verifying deletions.
| if err != nil || !exists || meta.Len != int64(expectedLen) { | ||
| return false | ||
| } |
There was a problem hiding this comment.
The current logic for checking list existence and length in waitForListState is incorrect for cases where the list is expected to be empty or deleted (expectedLen == 0). In Redis, an empty list is typically represented by a non-existent key. The current check !exists || meta.Len != int64(expectedLen) will return false (triggering a retry/timeout) if the key does not exist, even if expectedLen is 0. This prevents the helper from being used to verify that a list has been successfully deleted.
| if err != nil || !exists || meta.Len != int64(expectedLen) { | |
| return false | |
| } | |
| if err != nil || (expectedLen > 0 && !exists) || (exists && meta.Len != int64(expectedLen)) { | |
| return false | |
| } |
#583) ## Summary Drop the `require.GreaterOrEqual(t, pool.Hits(), uint64(1))` line from `TestLua_VMReuseDoesNotLeakGlobals`. It was flaking on CI (PR #582 run `24781704678`), and the assertion is inherently racy on `sync.Pool`. ## Root cause The assertion required the Script-B `pool.get(nil)` to return exactly the `pooledLuaState` that Script-A put back moments earlier. But `sync.Pool` is documented to discard pooled items under GC pressure, which can easily fire between a Put and a Get on the race runtime. The test's own comment above line 80 already acknowledges this non-determinism — but then the Hits assertion at line 112 reintroduces the same source of flake. ``` --- FAIL: TestLua_VMReuseDoesNotLeakGlobals (0.00s) Error: "0" is not greater than or equal to "1" redis_lua_pool_test.go:112 ``` ## Change - Remove the flaky Hits assertion from `TestLua_VMReuseDoesNotLeakGlobals`. - Keep the test's actual purpose intact (no leak of `GLOBAL_LEAK`, `LEAKY_TABLE` across pooled states; whitelisted globals still intact). - Comment points at `TestLua_PoolRecordsReuseVsAllocation` which asserts `hits + misses` deterministically across a controlled loop. ## Verification - `go test -race -count=20 -run TestLua_VMReuseDoesNotLeakGlobals ./adapter/` → 20/20 pass - `go test -race -count=1 -short ./adapter/...` → ok in 63s - `make lint` clean Pair with PR #582 (multi-exec flake) to unblock the CI pipeline on `perf/raft-dispatcher-lanes` and `fix/idempotent-rollback-marker`.
|
Addressed gemini's medium (d63f92b):
Verified: /gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a waitForListState test helper in adapter/redis_multi_test.go to handle eventual consistency when verifying list states across nodes. It also refactors TestRedis_MultiExec_DelThenRPushRecreatesList to use this helper. Feedback was provided to add a validation check within the helper to ensure the length of expectedValues matches expectedLen, which prevents potential out-of-bounds panics during test execution.
| func waitForListState(t *testing.T, n Node, key []byte, expectedLen int, expectedValues []string) { | ||
| t.Helper() | ||
| ctx := context.Background() |
There was a problem hiding this comment.
The waitForListState helper should verify that the length of expectedValues matches expectedLen if it is provided. This prevents a potential out-of-bounds panic at line 71 if the helper is misused in a test with mismatched arguments.
func waitForListState(t *testing.T, n Node, key []byte, expectedLen int, expectedValues []string) {
t.Helper()
if expectedValues != nil && len(expectedValues) != expectedLen {
t.Fatalf("waitForListState: expectedValues length %d does not match expectedLen %d", len(expectedValues), expectedLen)
}
ctx := context.Background()|
Refactored
Commit: 403367e /gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a polling mechanism in the Redis multi-test suite to handle eventual consistency issues where a node might lag in applying Raft commits. Specifically, it adds the waitForListState helper and several supporting functions to ensure tests wait for the expected state before making assertions. Feedback was provided to improve the robustness of the listValuesMatch helper by adding a length check to prevent potential panics when comparing slices of different sizes.
| func listValuesMatch(kvs []*store.KVPair, expectedValues []string) bool { | ||
| if expectedValues == nil { | ||
| return true | ||
| } | ||
| for i, kvp := range kvs { | ||
| if string(kvp.Value) != expectedValues[i] { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
The listValuesMatch function is susceptible to a panic if len(kvs) is greater than len(expectedValues). While scanListItems ensures that len(kvs) == expectedLen, there is no validation that the caller-provided expectedLen matches the actual length of the expectedValues slice. Adding a length check at the beginning of the function makes the helper more robust and prevents confusing panics during test failures.
func listValuesMatch(kvs []*store.KVPair, expectedValues []string) bool {
if expectedValues == nil {
return true
}
if len(kvs) != len(expectedValues) {
return false
}
for i, kvp := range kvs {
if string(kvp.Value) != expectedValues[i] {
return false
}
}
return true
}Fail fast in waitForListState when the caller supplies an expectedValues slice whose length does not match expectedLen. Without this check a misuse could cause an out-of-bounds panic inside listValuesMatch when scanListItems succeeds with expectedLen items. The guard centralizes the invariant at the single public entry point so listValuesMatch and listMetaMatches stay simple.
|
Addressed the latest two review threads in 9e0258d:
Test results on this commit:
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a waitForListState helper function and associated utilities in adapter/redis_multi_test.go to handle eventual consistency during tests. These helpers poll the node's state until it reflects the expected list metadata and values, addressing a race condition where direct store scans might occur before the local node has updated its commit timestamp. The TestRedis_MultiExec_DelThenRPushRecreatesList test was refactored to use this new polling mechanism, making it more robust. I have no feedback to provide.
Summary
TestRedis_MultiExec_DelThenRPushRecreatesListflake: the test readsreadTS()directly onnodes[1], which returns the node-localstore.LastCommitTS(). That value can trail the raft-committed EXEC on a follower (or a leader that just responded), while the client-facing LRANGE path waits for local apply via LeaseRead/LinearizableRead. The directScanAt(readTS)therefore observes the pre-EXEC state.waitForListStatehelper that pollsresolveListMeta+ raw item scan at the node-localreadTSviarequire.Eventually(250ms / 5s) until both reflect the expected post-EXEC state. Real synchronization on raft apply progress, not a sleep.readTS/resolveListMeta/ScanAtblock inTestRedis_MultiExec_DelThenRPushRecreatesListfor the helper.Test plan
go test -race -count=10 -run TestRedis_MultiExec ./adapter/-> 10/10 pass (20.5s)go test -race -count=1 -short ./adapter/...passes (~62s)make lintclean (0 issues)