test(redis): wait for follower apply catch-up in MultiExec list tests by bootjp · Pull Request #582 · bootjp/elastickv

bootjp · 2026-04-22T13:41:13Z

Summary

Root-cause TestRedis_MultiExec_DelThenRPushRecreatesList flake: the test reads readTS() directly on nodes[1], which returns the node-local store.LastCommitTS(). That value can trail the raft-committed EXEC on a follower (or a leader that just responded), while the client-facing LRANGE path waits for local apply via LeaseRead/LinearizableRead. The direct ScanAt(readTS) therefore observes the pre-EXEC state.
Fix by introducing waitForListState helper that polls resolveListMeta + raw item scan at the node-local readTS via require.Eventually (250ms / 5s) until both reflect the expected post-EXEC state. Real synchronization on raft apply progress, not a sleep.
Swap the ad-hoc readTS/resolveListMeta/ScanAt block in TestRedis_MultiExec_DelThenRPushRecreatesList for the helper.

Test plan

go test -race -count=10 -run TestRedis_MultiExec ./adapter/ -> 10/10 pass (20.5s)
go test -race -count=1 -short ./adapter/... passes (~62s)
make lint clean (0 issues)

TestRedis_MultiExec_DelThenRPushRecreatesList was intermittently failing under -race on CI (PR #577): after MULTI / DEL list / RPUSH list new1 new2 / EXEC, the final direct store.ScanAt(..., readTS) on nodes[1] returned the pre-EXEC ["old1","old2"] even though the immediately preceding LRANGE via the redis client correctly returned ["new1","new2"]. Why the race exists: - The go-redis client connects to nodes[1], which is not necessarily the raft leader. - LRANGE goes through the adapter's LeaseRead / LinearizableRead path, which blocks until the local apply watermark reaches a safe point, so it observes the EXEC commit even on a slightly-lagging follower. - readTS() (adapter/ts.go:snapshotTS) returns the local store's LastCommitTS() directly, with no wait-apply. On a follower (or a leader that has just responded to the client), LastCommitTS can momentarily trail the raft-committed EXEC because the test reads it on a different code path than the one that drove the client-visible wait. - The direct ScanAt(readTS) therefore sees the stale, pre-EXEC state. Fix: introduce waitForListState(t, node, key, expectedLen, expectedValues) that polls resolveListMeta AND the raw item scan at the node-local readTS via require.Eventually (250ms interval, 5s timeout) until both reflect the expected post-EXEC state. This is a real synchronization on the raft apply progress of the target node, not a timing hack: it succeeds on the first iteration when the node is already caught up and only retries while the apply is genuinely behind. Use it in place of the ad-hoc readTS/resolveListMeta/ScanAt block in TestRedis_MultiExec_DelThenRPushRecreatesList.

coderabbitai · 2026-04-22T13:41:20Z

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 43 minutes and 41 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 43 minutes and 41 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0dfce6b-2494-49c4-ade2-c6b03b92614c

📥 Commits

Reviewing files that changed from the base of the PR and between 20f97f4 and 9e0258d.

📒 Files selected for processing (1)

adapter/redis_multi_test.go

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch test/fix-multiexec-flake

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a waitForListState helper function in adapter/redis_multi_test.go to handle replication lag in tests by polling for expected list states. A review comment identifies a logic error in the helper: it fails when a list is expected to be empty (length 0) because it incorrectly treats a non-existent key as a failure, preventing its use for verifying deletions.

gemini-code-assist · 2026-04-22T13:46:39Z

+		if err != nil || !exists || meta.Len != int64(expectedLen) {
+			return false
+		}


The current logic for checking list existence and length in waitForListState is incorrect for cases where the list is expected to be empty or deleted (expectedLen == 0). In Redis, an empty list is typically represented by a non-existent key. The current check !exists || meta.Len != int64(expectedLen) will return false (triggering a retry/timeout) if the key does not exist, even if expectedLen is 0. This prevents the helper from being used to verify that a list has been successfully deleted.

Suggested change

if err != nil || !exists || meta.Len != int64(expectedLen) {

return false

}

if err != nil || (expectedLen > 0 && !exists) || (exists && meta.Len != int64(expectedLen)) {

return false

}

#583) ## Summary Drop the `require.GreaterOrEqual(t, pool.Hits(), uint64(1))` line from `TestLua_VMReuseDoesNotLeakGlobals`. It was flaking on CI (PR #582 run `24781704678`), and the assertion is inherently racy on `sync.Pool`. ## Root cause The assertion required the Script-B `pool.get(nil)` to return exactly the `pooledLuaState` that Script-A put back moments earlier. But `sync.Pool` is documented to discard pooled items under GC pressure, which can easily fire between a Put and a Get on the race runtime. The test's own comment above line 80 already acknowledges this non-determinism — but then the Hits assertion at line 112 reintroduces the same source of flake. ``` --- FAIL: TestLua_VMReuseDoesNotLeakGlobals (0.00s) Error: "0" is not greater than or equal to "1" redis_lua_pool_test.go:112 ``` ## Change - Remove the flaky Hits assertion from `TestLua_VMReuseDoesNotLeakGlobals`. - Keep the test's actual purpose intact (no leak of `GLOBAL_LEAK`, `LEAKY_TABLE` across pooled states; whitelisted globals still intact). - Comment points at `TestLua_PoolRecordsReuseVsAllocation` which asserts `hits + misses` deterministically across a controlled loop. ## Verification - `go test -race -count=20 -run TestLua_VMReuseDoesNotLeakGlobals ./adapter/` → 20/20 pass - `go test -race -count=1 -short ./adapter/...` → ok in 63s - `make lint` clean Pair with PR #582 (multi-exec flake) to unblock the CI pipeline on `perf/raft-dispatcher-lanes` and `fix/idempotent-rollback-marker`.

bootjp · 2026-04-22T14:01:11Z

Addressed gemini's medium (d63f92b):

waitForListState now accepts !exists as success when expectedLen == 0. In Redis an empty / deleted list is represented by the absence of its meta key; the previous check was failing post-DEL waits.
For expectedLen > 0, !exists still means "not yet applied" and keeps polling.

Verified: go test -race -count=10 -run TestRedis_MultiExec ./adapter/ 10/10 green.

/gemini review

gemini-code-assist

Code Review

This pull request introduces a waitForListState test helper in adapter/redis_multi_test.go to handle eventual consistency when verifying list states across nodes. It also refactors TestRedis_MultiExec_DelThenRPushRecreatesList to use this helper. Feedback was provided to add a validation check within the helper to ensure the length of expectedValues matches expectedLen, which prevents potential out-of-bounds panics during test execution.

gemini-code-assist · 2026-04-22T14:04:16Z

+func waitForListState(t *testing.T, n Node, key []byte, expectedLen int, expectedValues []string) {
+	t.Helper()
+	ctx := context.Background()


The waitForListState helper should verify that the length of expectedValues matches expectedLen if it is provided. This prevents a potential out-of-bounds panic at line 71 if the helper is misused in a test with mismatched arguments.

func waitForListState(t *testing.T, n Node, key []byte, expectedLen int, expectedValues []string) { t.Helper() if expectedValues != nil && len(expectedValues) != expectedLen { t.Fatalf("waitForListState: expectedValues length %d does not match expectedLen %d", len(expectedValues), expectedLen) } ctx := context.Background()

bootjp · 2026-04-22T14:06:57Z

Refactored waitForListState to satisfy cyclop (was 12, limit 10).

Split the require.Eventually body into four helpers:
- listStateMatches — single-shot predicate
- listMetaMatches — meta + expectedLen==0 empty/absent semantics
- scanListItems — ScanAt + length check
- listValuesMatch — optional value equality
No behavior change; polling/timeouts and the expectedLen==0 branch preserved.
go test -race -count=10 -run TestRedis_MultiExec ./adapter/ → pass (20.7s).
make lint → 0 issues.

Commit: 403367e

/gemini review

gemini-code-assist

Code Review

This pull request introduces a polling mechanism in the Redis multi-test suite to handle eventual consistency issues where a node might lag in applying Raft commits. Specifically, it adds the waitForListState helper and several supporting functions to ensure tests wait for the expected state before making assertions. Feedback was provided to improve the robustness of the listValuesMatch helper by adding a length check to prevent potential panics when comparing slices of different sizes.

gemini-code-assist · 2026-04-22T14:08:47Z

+func listValuesMatch(kvs []*store.KVPair, expectedValues []string) bool {
+	if expectedValues == nil {
+		return true
+	}
+	for i, kvp := range kvs {
+		if string(kvp.Value) != expectedValues[i] {
+			return false
+		}
+	}
+	return true
+}


The listValuesMatch function is susceptible to a panic if len(kvs) is greater than len(expectedValues). While scanListItems ensures that len(kvs) == expectedLen, there is no validation that the caller-provided expectedLen matches the actual length of the expectedValues slice. Adding a length check at the beginning of the function makes the helper more robust and prevents confusing panics during test failures.

func listValuesMatch(kvs []*store.KVPair, expectedValues []string) bool { if expectedValues == nil { return true } if len(kvs) != len(expectedValues) { return false } for i, kvp := range kvs { if string(kvp.Value) != expectedValues[i] { return false } } return true }

Fail fast in waitForListState when the caller supplies an expectedValues slice whose length does not match expectedLen. Without this check a misuse could cause an out-of-bounds panic inside listValuesMatch when scanListItems succeeds with expectedLen items. The guard centralizes the invariant at the single public entry point so listValuesMatch and listMetaMatches stay simple.

bootjp · 2026-04-22T14:18:35Z

Addressed the latest two review threads in 9e0258d:

waitForListState misuse guard (review 14:04 UTC) + listValuesMatch panic (review 14:08 UTC, line 99): Added a single fail-fast check at the top of waitForListState: if expectedValues != nil && len(expectedValues) != expectedLen, the helper t.Fatalfs immediately. This centralizes the invariant at the public entry point, so scanListItems's existing len(kvs) == expectedLen guarantee plus this length check means listValuesMatch cannot panic. Keeping the check in one place instead of duplicating into both leaf helpers preserves readability.
Rejected: empty-list / deleted-list check (review 13:46 UTC): This comment describes the pre-fix code. The current listMetaMatches (lines 60-69) already handles expectedLen == 0 with !exists || meta.Len == 0, landed in commit d63f92b before the review snapshot was taken. No change needed.

Test results on this commit:

go test -race -count=10 -run TestRedis_MultiExec ./adapter/ -> 10/10 pass (20.9s)
go test -race -count=1 -short ./adapter/... -> pass (64.9s)
make lint -> 0 issues

/gemini review

gemini-code-assist

Code Review

This pull request introduces a waitForListState helper function and associated utilities in adapter/redis_multi_test.go to handle eventual consistency during tests. These helpers poll the node's state until it reflects the expected list metadata and values, addressing a race condition where direct store scans might occur before the local node has updated its commit timestamp. The TestRedis_MultiExec_DelThenRPushRecreatesList test was refactored to use this new polling mechanism, making it more robust. I have no feedback to provide.

Merge branch 'main' into test/fix-multiexec-flake

773c718

gemini-code-assist Bot reviewed Apr 22, 2026

View reviewed changes

bootjp mentioned this pull request Apr 22, 2026

test(redis-lua): drop flaky pool.Hits() assertion in VMReuse leak test #583

Merged

bootjp added 2 commits April 22, 2026 22:58

Merge branch 'main' into test/fix-multiexec-flake

5febf06

fix(test): waitForListState accepts !exists when expectedLen==0

d63f92b

gemini-code-assist Bot reviewed Apr 22, 2026

View reviewed changes

refactor(test): split waitForListState to satisfy cyclop

403367e

gemini-code-assist Bot reviewed Apr 22, 2026

View reviewed changes

bootjp merged commit 63feb7d into main Apr 22, 2026
8 checks passed

bootjp deleted the test/fix-multiexec-flake branch April 22, 2026 14:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(redis): wait for follower apply catch-up in MultiExec list tests#582

test(redis): wait for follower apply catch-up in MultiExec list tests#582
bootjp merged 6 commits into
mainfrom
test/fix-multiexec-flake

bootjp commented Apr 22, 2026

Uh oh!

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Uh oh!

bootjp commented Apr 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Uh oh!

bootjp commented Apr 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Uh oh!

bootjp commented Apr 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bootjp commented Apr 22, 2026

Summary

Test plan

Uh oh!

coderabbitai Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading