feat(txn-dedup): re-land M3 EXEC reuse on main + close M2 open + multi-mop #887
…i-mop

PR #884 was merged into the stacked branch docs/txn-idempotency-design (at cbbde3d) but never reached main -- main has only PR #796's M1 + M2 + M3 RPUSH/LPUSH content. This PR re-lands the M3 EXEC reuse code on top of main and bundles three follow-ups that extend the design doc's "still open" / "follow-up" items into "landed":

1. Re-land M3 EXEC reuse (PR #884's content, cherry-picked + rebased)
   - adapter/redis.go: txnContext.prepareDispatch() split out of commit(); reusableExecTxn; dispatchExecReuse; runTransactionWithDedup + firstExecAttempt; gate at the top of runTransaction.
   - adapter/redis_exec_dedup_test.go (originally added in PR #884): 5 tests pinning all four reuse outcomes plus the gate-off legacy equivalence.
   - The cherry-pick required one small adaptation: prepareDispatch() uses Clock().NextFenced() (uint64, error) on current main; the PR #884 version targeted Clock().Next() (uint64). Same downstream semantics; the error return is wired through preparedTxnDispatch.
2. Close M2 open item -- FSM other-txn exactness test (kv/fsm_onephase_dedup_test.go)
   - TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry pins exactness at the FSM apply layer: a third-party version at T_other=20 must not satisfy the FSM probe at T1=30, so the retry falls through and applies at the fresh T2=40. The store-layer pin (store/committed_version_at_test.go) already covers the primitive; this test covers the dispatch path that uses it.
3. M3 multi-mop EXEC dedup test (adapter/redis_exec_dedup_test.go)
   - TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults extends single-mop dedup to a 3-command MULTI/EXEC body (SET a + SET b + DEL c). Validates the design's claim that the mechanism works the same for multi-mop because cached results + OCC readKeys fence are mop-count-agnostic. Without dedup the DEL would re-execute to 0 on the second pass -- the test rejects that.
4. Design doc updates (docs/design/2026_05_21_proposed_txn_secondary_idempotency.md)
   - §M2 "still open" → "LANDED" with the new FSM test reference.
   - §M3 "runTransaction (MULTI/EXEC) -- Still open" → "LANDED via PR #884" with multi-mop test reference and the two intentional deviations from the M1/M2 template that claude[bot] flagged on #884 (readKeys assembly order, fresh per-attempt reuseCtx).
   - §M3 "standalone SET/INCR/HSET" called out as the next follow-up (PR-B in the user-visible roadmap).

Caller audit (per /loop semantic-change rule)
==============================================
- prepareDispatch (newly added, replaces commit()'s body): callers are commit() and firstExecAttempt; both honor the defer prepared.cancel() contract. External behavior of commit() preserved.
- commit(): internal structure changed; external behavior preserved (no test directly invokes it; runTransaction's legacy path continues to call it through the same shape).
- runTransactionWithDedup / firstExecAttempt / dispatchExecReuse / reusableExecTxn: all new symbols, exercised only from the gated runTransaction path.

Validation
==========
- go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC' passes.
- go test ./kv/ ./store/ both pass.
- gofmt, go vet, golangci-lint run all clean (0 issues across adapter/kv/store).

Relation to prior work
======================
- #796 (merged f481f2b): M1 + M2 + M3 RPUSH/LPUSH on main.
- #884 (merged cbbde3d into stacked branch, NOT main): M3 EXEC reuse.
- THIS PR: brings #884's content into main, plus M2 cross-txn FSM test, multi-mop EXEC test, and design doc updates.
- Next (PR-B): standalone SET / INCR / HSET reuse paths.
- Next (PR-C): M4 Jepsen validation infra.
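The exactness requirement in item 2 can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in (the `versionStore` type, `committedAt`, and `applyOnePhase` are illustration names, not the project's store or FSM API); it assumes the probe asks "is there a version at exactly the prior attempt's commit timestamp?" rather than "is there any version at or below it?".

```go
package main

import "fmt"

// versionStore is a hypothetical stand-in for an MVCC store:
// key -> set of commit timestamps that hold a version.
type versionStore map[string]map[uint64]bool

// committedAt reports whether key has a version at exactly ts.
// Indexing a nil inner map is safe in Go and yields false.
func (s versionStore) committedAt(key string, ts uint64) bool {
	return s[key][ts]
}

// put records a version of key at ts.
func (s versionStore) put(key string, ts uint64) {
	if s[key] == nil {
		s[key] = map[uint64]bool{}
	}
	s[key][ts] = true
}

// applyOnePhase writes key at commitTS unless the exact-match probe finds
// the prior attempt's version at prevCommitTS. It returns true if it wrote.
func applyOnePhase(s versionStore, key string, prevCommitTS, commitTS uint64) bool {
	if prevCommitTS != 0 && s.committedAt(key, prevCommitTS) {
		return false // prior attempt landed: treat the retry as a no-op
	}
	s.put(key, commitTS)
	return true
}

func main() {
	s := versionStore{}
	s.put("k", 20) // a third-party txn committed at T_other=20

	// Retry of an attempt that used T1=30 and never landed; fresh T2=40.
	applied := applyOnePhase(s, "k", 30, 40)
	fmt.Println(applied, s.committedAt("k", 40))
}
```

In this model a "latest version at or below T1" probe would wrongly match the T_other=20 version and drop the retry, which is the masking the test name refers to.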
📝 Walkthrough
This PR implements one-phase transaction dedup for MULTI/EXEC (
Changes
One-Phase Transaction Dedup for MULTI/EXEC and Standalone SET
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Code Review
This pull request implements the Option-2 retry and deduplication path for MULTI/EXEC transactions (runTransactionWithDedup), allowing transaction write sets to be reused under a fresh commit timestamp with PrevCommitTS to probe if a prior attempt already landed. It includes comprehensive unit tests validating various deduplication scenarios, updates the design documentation, and adds FSM-level tests for exactness. The review feedback highlights three key areas for improvement: updating the timestamp allocation in dispatchExecReuse to use NextFenced() to respect HLC-4 safety limits, adding a fast-path check for parent context expiration in the retry loop to prevent resource waste, and adding a nil check on readPin to avoid a potential nil pointer dereference panic.
// is the current length" question; the client-visible result IS the
// cached results array.
func (r *RedisServer) dispatchExecReuse(ctx context.Context, pending *reusableExecTxn) (results []redisResult, drop bool, err error) {
	commitTS := r.coordinator.Clock().Next()
The commitTS allocated here uses r.coordinator.Clock().Next(), which bypasses the HLC-4 physical-ceiling fence. According to the HLC-4 safety specification (and the documentation in kv/hlc.go), all persistence-grade allocations (including commitTS for transactions) must go through NextFenced() to ensure that an expired-ceiling allocation fails closed instead of silently issuing a timestamp that could collide with a subsequent leader's window.
Please update this to use NextFenced() and handle the returned error.
	commitTS, err := r.coordinator.Clock().NextFenced()
	if err != nil {
		return nil, false, errors.Wrap(err, "dispatchExecReuse: allocate commitTS")
	}

	if pending != nil {
		reuseCtx, reuseCancel := context.WithTimeout(r.handlerContext(), redisDispatchTimeout)
If the parent dispatchCtx has already expired or been cancelled, starting a new reuse dispatch with a fresh 10-second timeout (redisDispatchTimeout) derived from r.handlerContext() will waste resources and delay the error response to the client.
Consider checking dispatchCtx.Err() at the start of the retry block when pending != nil to fail fast if the parent context is already dead.
	if pending != nil {
		if err := dispatchCtx.Err(); err != nil {
			return err
		}
		reuseCtx, reuseCancel := context.WithTimeout(r.handlerContext(), redisDispatchTimeout)

	readPin := r.pinReadTS(startTS)
	defer readPin.Release()
r.pinReadTS(startTS) can return nil if r.readTracker is not initialized (which is common in minimal test setups or certain configurations). Deferring readPin.Release() directly without a nil check will cause a nil pointer dereference panic.
Please add a nil check before deferring the release.
	readPin := r.pinReadTS(startTS)
	if readPin != nil {
		defer readPin.Release()
	}
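Whether the guard above is required depends on how Release treats a nil receiver. In Go, calling a pointer-receiver method on a nil pointer is legal; it panics only if the method body dereferences the receiver. A minimal sketch, using a hypothetical `token` type rather than the project's read-pin type:

```go
package main

import "fmt"

// token is a hypothetical stand-in for a read-timestamp pin.
type token struct{ released bool }

// Release is written to be nil-safe: it returns early on a nil receiver
// instead of dereferencing it.
func (t *token) Release() {
	if t == nil {
		return
	}
	t.released = true
}

// pin returns nil when tracking is disabled, mimicking a tracker that
// was never initialized.
func pin(enabled bool) *token {
	if !enabled {
		return nil
	}
	return &token{}
}

// run defers Release without a nil check and reports whether it
// completed without panicking.
func run(enabled bool) (ok bool) {
	defer func() { ok = recover() == nil }()
	tok := pin(enabled)
	defer tok.Release() // safe only because Release guards against nil
	return true
}

func main() {
	fmt.Println(run(false), run(true))
}
```

If Release instead touched `t.released` unconditionally, the nil case would panic and the explicit `if readPin != nil` guard would be needed.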
@claude review PR-A: re-land M3 EXEC reuse + close M2 open + multi-mop test. Status:
Please verify the cherry-pick adaptation (NextFenced error wired through preparedTxnDispatch) and the multi-mop test's DEL-result invariance argument. |
Claude finished @bootjp's task in 4m 11s
PR #887 Review -- Re-land M3 EXEC reuse + M2 close + multi-mop test
Tasks:
Requested verification items
1.
	commitTS, err := t.server.coordinator.Clock().NextFenced()
	if err != nil {
		return preparedTxnDispatch{cancel: func() {}}, errors.Wrap(err, "redis txn commit: allocate commitTS")
	}
The
2. Multi-mop DEL=1 invariance -- sound
The argument holds.
Issues
Bug (high) --
gemini HIGH (adapter/redis.go:3112 dispatchExecReuse):
Clock().Next() bypasses the HLC-4 physical-ceiling fence; persistence-grade commit_ts allocation must use NextFenced() to avoid issuing a timestamp that collides with a subsequent leader's window after renewal. Switched dispatchExecReuse to NextFenced() with the standard error-propagation shape (matches prepareDispatch from PR #884 cherry-pick).

gemini MEDIUM (adapter/redis.go:3175 reuseCtx propagation):
The original "fresh ctx from handlerContext per reuse attempt" pattern ignored outer cancellation -- a disconnected client would wait the full 10 s before the reuse returned. Derived reuseCtx from dispatchCtx (the caller's retry-loop ctx) so outer cancellation interrupts mid-attempt. Per-attempt redisDispatchTimeout still caps the dispatch the same way commit does for the first attempt. Matches listPushCoreWithDedup's caller-ctx threading.

gemini MEDIUM (adapter/redis.go:3215 readPin.Release on nil):
False positive. ActiveTimestampToken.Release is documented nil-safe at kv/active_timestamp_tracker.go:58. The pattern readPin := r.pinReadTS(...); defer readPin.Release() is used unchanged in the existing runTransaction (legacy path); no change needed.

Caller audit per /loop semantic-change rule:
- dispatchExecReuse signature unchanged; the new error return is the same (bool, error) tuple position. Single caller is runTransactionWithDedup which already returns dispErr via dropping to the retry loop's error path -- the new NextFenced error reaches the same dispErr branch.
- reuseCtx parent change is local to runTransactionWithDedup; the only observable effect is faster cancellation propagation, which retryRedisWrite already gates on ctx.Done between attempts.

Note on pre-existing Clock().Next() callers:
dispatchListPushReuse (adapter/redis.go:3508) and the listPushCoreWithDedup first-attempt site (adapter/redis.go:3679) -- both shipped on main as part of PR #796 -- still use Clock().Next() and have the same HLC-4 ceiling-bypass exposure. Out of scope for this PR (those are pre-existing on main, not introduced by PR-A's diff), but should be fixed in a follow-up cleanup PR for parity.

Design doc:
M3 "fresh reuseCtx from handlerContext" deviation note is struck and replaced with the dispatchCtx-derived rationale; future readers see why the earlier framing was wrong and what the current pattern is.

Validation:
go test ./adapter/ -run 'ExecDedup|TxnMULTI' passes.
go build ./adapter/... clean.
golangci-lint run ./adapter/... 0 issues.
@claude review Round 1 fixes (commit dd042e9):
go test ./adapter/ -run 'ExecDedup|TxnMULTI' passes; gofmt/vet/golangci-lint clean. |
Claude finished @bootjp's task in 2m 41s
PR #887 Round 1 Review -- Verifying fixes from dd042e9
Tasks:
Round 1 fixes verified -- all correct
HIGH fix (
	commitTS, allocErr := r.coordinator.Clock().NextFenced()
	if allocErr != nil {
		return nil, false, errors.Wrap(allocErr, "redis exec reuse: allocate commitTS")
	}
Return shape
MEDIUM fix (
	reuseCtx, reuseCancel := context.WithTimeout(dispatchCtx, redisDispatchTimeout)
Go's
MEDIUM (readPin nil-safety) -- false positive, correctly documented ✅
Design doc ✅
The strikethrough + "Reverted per PR #887 review" annotation correctly records why the handlerContext framing was dropped. Future readers see the rationale rather than a silent inconsistency with the implementation.
One remaining observation (informational, not blocking)
The pre-existing
Summary
All three issues from the prior review are correctly addressed in dd042e9. The NextFenced error propagation shape is clean, the context threading fix is semantically correct and now matches the list-push dedup pattern, and the nil readPin dismissal is well-documented. LGTM -- no further changes needed for Round 1.
Stacked on PR-A (re-land of M3 EXEC reuse + multi-mop test).
Extends option-2 dedup to standalone SET (dispatched outside MULTI).
When `r.onePhaseTxnDedup` is on, `r.set` wraps the single command as a
1-element redcon.Command queue and dispatches via
runTransactionWithDedup — reusing the M3 EXEC machinery instead of
building a per-handler reusableSetTxn + dispatchSetReuse shape. SET
already has `applySet` on `txnContext`, so this is the "free"
extension to any command whose txn-state-aware apply hook already
exists.
The standalone fast path (`trySetFastPath`) is intentionally bypassed
under the gate: dedup is opt-in, and a non-dedup'd fast path under a
dedup-on cluster would split the idempotency contract.
Why standalone INCR / HSET are out of scope for THIS PR:
========================================================
INCR (in adapter/redis_compat_commands.go) and HSET both lack a
`txnContext.applyIncr` / `applyHSet` implementation, so the
"route through single-mop EXEC" pattern that worked here for SET
cannot apply as-is — `runTransactionWithDedup` would reject the
command at `txn.apply` ("unsupported in MULTI"). Bringing them in
requires implementing `applyIncr` / `applyHSet` first (each
~30–50 LOC for the txn-state-aware read-compute-write shape), then
the standalone handler routing is a one-liner via this same path.
Tracked as separate follow-up PRs; until then, INCR and HSET keep
today's buggy-under-churn behaviour, which is the design doc's
stated default ("everything else keeps today's behaviour until
its hook is added").
Caller audit (per /loop semantic-change rule):
- `r.set` (handler): gate-on takes a new code path; gate-off
preserved verbatim via `setLegacy`. No external callers exist
(it is wired only into the redcon dispatch table).
- `setLegacy` (new): byte-identical extraction of the pre-PR set()
body. No external callers.
- `writeRedisStandaloneResult` (new): translates a single-element
`[]redisResult` from `runTransactionWithDedup` into the bare
redcon reply shape (no WriteArray wrapper). Single caller in
this PR; future SET-pattern callers will reuse it.
Validation:
- adapter/redis_set_dedup_test.go (new):
TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK,
TestStandaloneSetDedup_DisabledKeepsLegacyPath. Existing
TestRedis_SET / dedup suites still pass.
- gofmt, go vet, golangci-lint run all clean (0 issues).
Design doc updated to mark standalone SET as LANDED and call out
INCR/HSET as the next follow-up with the precise reason they
cannot land alongside SET in this PR.
claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135):
set's gate-on branch called r.runTransaction which immediately
re-checked the same r.onePhaseTxnDedup gate and routed to
runTransactionWithDedup. The indirection was misleading -- the PR
description said "dispatches via runTransactionWithDedup" but that
was true only by indirection. Replaced with a direct call to
runTransactionWithDedup; intent is now explicit and the double gate
check is removed.
claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go):
Only the resultString OK path was tested through the dedup route.
Added two regression tests covering the other applySet result types
that go through writeRedisStandaloneResult:
- TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing:
SET k v NX on an existing key returns nil; the cached attempt-1
result must round-trip through writeRedisStandaloneResult ->
WriteNull, leaving conn.bulk == nil.
- TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk
routing: SET k v GET on an existing key returns the prior value
as a bulk reply; conn.bulk must be the prior bytes.
Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator
ambiguousLands=true) so the result MUST come from the cached
attempt-1 array, not a re-execution.
claude[bot] minor observation (shallow-array constraint in
writeRedisStandaloneResult):
Documented in the function's doc comment that the resultArray arm
flattens via WriteBulkString and is NOT safe to reuse for callers
whose applyXxx emits nested arrays. Future HGETALL-pattern callers
must either pre-flatten their result or extend the switch.
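As a rough illustration of the shape such a translator might take, here is a sketch with invented types: `result`, `resultKind`, `recorder`, and `writeStandalone` are hypothetical, and the recorder's methods only log what would be written rather than using the redcon API. It assumes exactly one result and a flat (non-nested) array arm, which is the constraint noted above.

```go
package main

import "fmt"

// resultKind and result are hypothetical stand-ins for the adapter's
// result representation.
type resultKind int

const (
	kindString resultKind = iota
	kindNil
	kindBulk
	kindArray
)

type result struct {
	kind  resultKind
	str   string
	bulk  []byte
	items []string // flat elements only; nested arrays are not representable
}

// recorder logs replies in place of a real connection.
type recorder struct{ log []string }

func (r *recorder) writeString(s string) { r.log = append(r.log, "+"+s) }
func (r *recorder) writeNull()           { r.log = append(r.log, "null") }
func (r *recorder) writeBulk(b []byte)   { r.log = append(r.log, "$"+string(b)) }
func (r *recorder) writeArray(n int)     { r.log = append(r.log, fmt.Sprintf("*%d", n)) }

// writeStandalone emits the single result as a bare reply, without the
// outer array wrapper that an EXEC reply would carry.
func writeStandalone(w *recorder, results []result) error {
	if len(results) != 1 {
		return fmt.Errorf("expected exactly 1 result, got %d", len(results))
	}
	res := results[0]
	switch res.kind {
	case kindString:
		w.writeString(res.str)
	case kindNil:
		w.writeNull()
	case kindBulk:
		w.writeBulk(res.bulk)
	case kindArray:
		// Shallow flattening: each element is written as one bulk string.
		w.writeArray(len(res.items))
		for _, item := range res.items {
			w.writeBulk([]byte(item))
		}
	default:
		return fmt.Errorf("unsupported result kind %d", res.kind)
	}
	return nil
}

func main() {
	w := &recorder{}
	_ = writeStandalone(w, []result{{kind: kindString, str: "OK"}})
	_ = writeStandalone(w, []result{{kind: kindNil}})
	_ = writeStandalone(w, []result{{kind: kindBulk, bulk: []byte("old")}})
	fmt.Println(w.log)
}
```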
Caller audit per /loop semantic-change rule:
- set's gate-on branch is the only call site changed; the new direct
call to runTransactionWithDedup uses the same []redcon.Command{cmd}
shape and the same []redisResult return type. runTransactionWithDedup
is exported within the package and has no other callers outside the
legacy runTransaction path (which is unchanged).
- writeRedisStandaloneResult signature unchanged. Documentation
expanded but no behavior change.
- New tests are pure additions; no existing tests modified.
Validation:
go test ./adapter/ -run StandaloneSetDedup passes (5 tests now).
go build ./adapter/... clean.
golangci-lint run ./adapter/... 0 issues.
claude[bot] (a) attribution prefix per CLAUDE.md convention:
Stripped the inline "// claude[bot] PR #888 review:" prefixes from the two comments added in round 1. The substantive "why" content (double-gate rationale; shallow-array constraint) is kept as that is what future readers need. CLAUDE.md convention is to leave fix attribution in the commit message, not the source.

claude[bot] (b) wroteNull bool field on recordingConn:
Added wroteNull witness to recordingConn (in redis_retry_test.go, the shared test helper). Hardened TestStandaloneSetDedup_NXMissReturnsNil to require.True(conn.wroteNull, ...) so a wrong branch that wrote nothing at all (would also satisfy conn.bulk == nil) is now caught.

Caller audit per /loop semantic-change rule:
- recordingConn is in *_test.go and used by tests only. Adding wroteNull bool default false is strictly additive; no existing test reads it. The only WriteNull semantics change is setting the new field; conn.bulk = nil behavior is preserved.
- Verified by grep: only redis_set_dedup_test.go references wroteNull. All other recordingConn uses (8 instantiations across redis_retry_test.go + redis_lua_linearizable_read_test.go) are unaffected.

Validation:
go test ./adapter/ -run StandaloneSetDedup passes.
go test ./adapter/ -run ExecDedup passes.
gofmt + go vet + golangci-lint clean.
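The zero-value aliasing that motivates the wroteNull witness can be shown with a stripped-down model. The `recordingConn` below is a hypothetical two-field reduction written for illustration, not the shared test helper itself.

```go
package main

import "fmt"

// recordingConn is a reduced, hypothetical model of a test connection
// that records the last reply.
type recordingConn struct {
	bulk      []byte
	wroteNull bool
}

// WriteNull records an explicit null reply.
func (c *recordingConn) WriteNull() {
	c.bulk = nil
	c.wroteNull = true
}

// WriteBulk records a bulk reply.
func (c *recordingConn) WriteBulk(b []byte) { c.bulk = b }

func main() {
	silent := &recordingConn{} // handler wrote nothing at all
	null := &recordingConn{}
	null.WriteNull() // handler wrote an explicit null

	// bulk == nil cannot tell the two apart; the witness can.
	fmt.Println(silent.bulk == nil, null.bulk == nil)
	fmt.Println(silent.wroteNull, null.wroteNull)
}
```

An assertion on `bulk == nil` alone passes for both connections, so only the witness field distinguishes a null reply from a missing one.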
…_test claude[bot] PR #888 round-3 minor cleanup: same CLAUDE.md convention violation that was cleaned from redis.go in round 2 was left in the test file. Removed two references on lines 56 and 78. Substantive rationale (NX semantics, zero-value aliasing problem) kept; the attribution moves to the commit message where it belongs. No functional or test behaviour change. golangci-lint clean.
)

## Summary

Stacked on PR-A (#887). Extends option-2 dedup to **standalone SET** (dispatched outside MULTI).

When `r.onePhaseTxnDedup` is on, `r.set` wraps the single command as a 1-element `redcon.Command` queue and dispatches via `runTransactionWithDedup` -- reusing the M3 EXEC machinery instead of building a per-handler `reusableSetTxn` + `dispatchSetReuse` shape. SET already has `applySet` on `txnContext`, so this is the "free" extension to any command whose txn-state-aware apply hook already exists.

The standalone fast path (`trySetFastPath`) is intentionally bypassed under the gate: dedup is opt-in, and a non-dedup'd fast path under a dedup-on cluster would split the idempotency contract.

Base: `feat/txn-dedup-docs-and-tests` (PR #887).

## Why standalone INCR / HSET are out of scope for THIS PR

INCR (in `adapter/redis_compat_commands.go`) and HSET both lack a `txnContext.applyIncr` / `applyHSet` implementation, so the "route through single-mop EXEC" pattern that worked here for SET cannot apply as-is -- `runTransactionWithDedup` would reject the command at `txn.apply` ("unsupported in MULTI"). Bringing them in requires implementing `applyIncr` / `applyHSet` first (each ~30–50 LOC for the txn-state-aware read-compute-write shape), then the standalone handler routing is a one-liner via this same path. Tracked as separate follow-up PRs; until then, INCR and HSET keep today's buggy-under-churn behaviour -- the design doc's stated default ("everything else keeps today's behaviour until its hook is added").

## Caller audit (per /loop semantic-change rule)

- `r.set` (handler): gate-on takes a new code path; gate-off preserved verbatim via `setLegacy`. No external callers exist (it is wired only into the redcon dispatch table).
- `setLegacy` (new): byte-identical extraction of the pre-PR `set()` body. No external callers.
- `writeRedisStandaloneResult` (new): translates a single-element `[]redisResult` from `runTransactionWithDedup` into the bare redcon reply shape (no `WriteArray` wrapper). Single caller in this PR; future SET-pattern callers will reuse it.

## Tests

- `adapter/redis_set_dedup_test.go` (new):
  - `TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK`: gate-on + ambiguous attempt-1 land + reuse → probe hits → cached "OK" returned, dispatches=2, probeNoOps=1.
  - `TestStandaloneSetDedup_DisabledKeepsLegacyPath`: gate-off path runs no probe.
- Existing `TestRedis_SET*` and dedup suites still pass.

## Validation

- `go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC|TestRedis_SET'` passes
- `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues)

## Design doc

`docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M3 -- standalone SET marked LANDED, standalone INCR/HSET called out as follow-up with the precise reason they cannot land alongside SET in this PR.
@claude review Head moved to 4da85dc after PR #888 (standalone SET dedup) was merged into this branch. The new content is the LGTM'd PR #888 work — The PR #887 round-1 changes (NextFenced fix in dispatchExecReuse + reuseCtx parent switch + design doc deviation note revert) are unchanged; both findings were marked LGTM previously. Please confirm the combined diff against main is mergeable. No new code was authored — just the PR-B work folded in. |
Claude finished @bootjp's task in 5m 50s
PR #887 Round 2 Review -- Combined diff (PR #887 + PR #888 folded in)
Prior LGTM'd changes -- intact
Round-1 fixes confirmed unchanged at their final positions:
PR #888 content --
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
adapter/redis.go (1)
2781-2809: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Preserve the outer EXEC context here.
prepareDispatch() starts a fresh timeout from handlerContext(), so the commit phase can run for another full redisDispatchTimeout after apply/validate have already consumed most of the caller's dispatchCtx budget. That also drops prompt caller cancellation during stream-deletion scans and Dispatch. Use t.ctx when it's set, and only fall back to handlerContext() for minimal test fixtures that leave txnContext.ctx nil.
Suggested change
- ctx, cancel := context.WithTimeout(t.server.handlerContext(), redisDispatchTimeout)
+ parentCtx := t.ctx
+ if parentCtx == nil {
+ 	parentCtx = t.server.handlerContext()
+ }
+ ctx, cancel := context.WithTimeout(parentCtx, redisDispatchTimeout)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@adapter/redis.go` around lines 2781 - 2809, The code in prepareDispatch (the block creating ctx, cancel via context.WithTimeout(t.server.handlerContext(), redisDispatchTimeout)) should preserve an outer EXEC/dispatch context by using t.ctx when present instead of always starting from t.server.handlerContext(); change the parent passed to context.WithTimeout to be t.ctx if t.ctx != nil, falling back to t.server.handlerContext() only when t.ctx is nil (ensure cancel is still returned and used as before); this keeps stream-deletion scans (buildStreamDeletionElems) and the final Dispatch bounded by redisDispatchTimeout while honoring caller cancellation.
🧹 Nitpick comments (1)
adapter/redis_set_dedup_test.go (1)
26-138: ⚡ Quick win
Prefer a table-driven structure for the SET dedup variants.
These four tests duplicate the same harness (store/coordinator/server/conn/invoke/assert baseline). Converting them into table-driven cases will make option growth (NX, XX, GET, gate on/off) easier to maintain and less error-prone.
♻️ Suggested refactor sketch
+func TestStandaloneSetDedup(t *testing.T) {
+	t.Parallel()
+	tests := []struct {
+		name   string
+		dedup  bool
+		args   [][]byte
+		seed   func(ctx context.Context, st store.MVCCStore)
+		assert func(t *testing.T, conn *recordingConn, st store.MVCCStore, coord *dedupTestCoordinator)
+	}{
+		// landed prior attempt -> OK
+		// NX miss -> WriteNull
+		// GET -> old bulk
+		// gate off -> legacy path (no probe)
+	}
+	for _, tc := range tests {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			ctx := context.Background()
+			st := store.NewMVCCStore()
+			if tc.seed != nil {
+				tc.seed(ctx, st)
+			}
+			coord := newDedupTestCoordinator(st, 1, true)
+			srv := &RedisServer{store: st, coordinator: coord, scriptCache: map[string]string{}, onePhaseTxnDedup: tc.dedup}
+			conn := &recordingConn{}
+			srv.set(conn, redcon.Command{Args: tc.args})
+			tc.assert(t, conn, st, coord)
+		})
+	}
+}
As per coding guidelines, **/*_test.go: "prefer table-driven test patterns/cases."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@adapter/redis_set_dedup_test.go` around lines 26 - 138, These four nearly identical tests (TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK, TestStandaloneSetDedup_NXMissReturnsNil, TestStandaloneSetDedup_GETOptionReturnsOldBulk, TestStandaloneSetDedup_DisabledKeepsLegacyPath) should be consolidated into a table-driven test that iterates cases containing the command Args (e.g., []byte{cmdSet, "k", "v1", optional flags}), any pre-seed actions (PutAt), expected recordingConn results (bulk string vs nil, wroteNull bool, err empty), and coordinator/gate expectations (probeNoOps, dispatches); in the loop instantiate the common harness (store.NewMVCCStore, newDedupTestCoordinator, RedisServer{...}, recordingConn), call srv.set(conn, cmd) and assert the per-case expectations, so move shared setup into the single loop and keep case-specific assertions in each case entry to remove the duplicated code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@adapter/redis_exec_dedup_test.go`:
- Around line 116-119: The test currently asserts require.GreaterOrEqual(t,
coord.dispatches, 3) which is too permissive; change it to require.Equal(t, 3,
coord.dispatches) so the test enforces the exact expected retry sequence
(pre-reject, OCC conflict reuse, fresh-snapshot retry success), and keep the
existing require.Equal(t, 0, coord.probeNoOps, ...) check unchanged; locate the
assertions in adapter/redis_exec_dedup_test.go referencing coord.dispatches and
coord.probeNoOps and replace the GreaterOrEqual assertion with an exact equality
assertion.
In `@docs/design/2026_05_21_proposed_txn_secondary_idempotency.md`:
- Around line 481-482: Update the PR reference text for the runTransaction note
so it accurately reflects the landing PR: change the “LANDED via PR `#884`”
mention to reference PR `#887` (or both with `#887` listed as the landing PR) in the
documentation line that reads about runTransaction (MULTI/EXEC) — LANDED via PR
`#884`; ensure the adjacent description of runTransactionWithDedup mirroring
listPushCoreWithDedup remains unchanged.
📒 Files selected for processing (6)
- adapter/redis.go
- adapter/redis_exec_dedup_test.go
- adapter/redis_retry_test.go
- adapter/redis_set_dedup_test.go
- docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
- kv/fsm_onephase_dedup_test.go
coderabbitai (1) test (adapter/redis_exec_dedup_test.go:119):
TestExecDedup_GenuineConflictRebuildsAndApplies asserted GreaterOrEqual(coord.dispatches, 3) but with the single injected pre-reject + single concurrent SET the retry topology is fully deterministic: attempt 1 (pre-reject), reuse (OCC-conflict), fresh retry (success). Tightened to Equal(3) so a regression that adds an extra dispatch is caught.

coderabbitai (2) doc (docs/design/...:481):
"LANDED via PR #884" was ambiguous after the re-land on main via PR #887. Updated to "LANDED via PR #887 (originally PR #884, re-landed against main)" so future readers see the canonical landing PR with the lineage preserved.

No Go callers touched. go test ./adapter/ -run ExecDedup passes.
@claude review Round 2 fixes for coderabbitai 🟡 minors (commit 69db79b):
go test ./adapter/ -run ExecDedup passes. No Go callers touched. |
Claude finished @bootjp's task in 2m 22s
PR #887 Round 2 Review -- Verifying coderabbitai minor fixes (commit 69db79b)
Tasks:
Round 2 fixes verified -- both correct
Fix 1 --
	require.Equal(t, 3, coord.dispatches)
The comment is expanded with a clear explanation of the deterministic retry topology. No
Fix 2 -- PR reference in design doc ✅
The lineage is preserved (
One remaining open item (informational, not blocking this PR)
The coderabbitai "outside diff range" comment on
	parentCtx := t.ctx
	if parentCtx == nil {
		parentCtx = t.server.handlerContext()
	}
	ctx, cancel := context.WithTimeout(parentCtx, redisDispatchTimeout)
The nil guard is needed because the test harness constructs
The table-driven refactor suggestion for
Summary
Both round 2 fixes are correctly implemented. Combined diff against
…t to fix CI flake (#890)

## Summary

Fixes the CI flake observed in [Test workflow run 26678774684](https://github.com/bootjp/elastickv/actions/runs/26678774684):

```
--- FAIL: TestSQSServer_Throttle_SetQueueAttributesInvalidatesBucket (1.79s)
    sqs_throttle_integration_test.go:170: expected throttle, got 200
```

## Root cause

Identical race to the one fixed for `TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket` in commit [54c6cd5](54c6cd56) (PR #819 follow-up): the 1-token-per-second refill rate races the test's own wall clock under `-race` on slow CI runners. For **this** test:

1. `mustSetQueueAttributes(Capacity=10, Refill=1)`
2. `for range 10 { send }`, which drains the bucket
3. Sanity send, which expects HTTP 400 (throttle)

Each send goes through Raft propose+apply at ~100-250ms under `-race`. The 11 writes that precede the sanity send (the attribute set plus the 10 draining sends) take ~1.1-2.75s. At Refill=1/sec the bucket has accumulated ≥1 token by step 3, so the sanity send returns HTTP 200 instead of 400, **falsely** indicating a bucket-invalidation regression that does not exist.

## Fix

Drop the initial Refill from `"1"` to `"0.01"` (1 token per 100 seconds) so that no wall-clock time elapsed within the test window can accumulate a whole token.

The test's intent (*verify that a Capacity/Refill **raise** invalidates the cached bucket on the very next request*) is independent of the **initial** refill rate. The post-set assertion at line 182 is exercised against the fresh `Capacity=20/Refill=20` bucket, which is what the test actually claims to pin.

## Why only this test, not the sibling

`TestSQSServer_Throttle_DeleteQueueInvalidatesBucket` has the same `Refill=1` initial config but **no post-drain sanity assertion**: it just drains without status checks, then verifies fresh capacity post-recreate. No race window there.

## Relation to other open PRs

Unrelated to PRs #887/#888/#889 (option-2 dedup work). The flake surfaced on PR #889's CI run but the fix lives in pre-existing SQS test code that none of the dedup PRs touch.

## Caller audit (per /loop semantic-change rule)

Test-only change. The throttle config validator (`sqs_catalog.go:163`) accepts fractional `float64 SendRefillPerSecond`; `0.01` is non-zero so `IsEmpty` (line 172) returns `false` and throttling stays enabled, so the test still exercises the throttle path. Matches the prior fix's caller-audit conclusion verbatim.

## Validation

- `go test ./adapter/ -run TestSQSServer_Throttle -race -count=3 -timeout 120s` passes (5.3s wall, all three iterations green)
- `gofmt`, `go vet`, `golangci-lint run` all clean

## Summary by CodeRabbit

* **Tests**
  * Updated test configuration and expanded documentation to improve test stability on slow CI environments.
claude[bot] PR #901 minor #2 (CLAUDE.md convention): inline comments should state the underlying reason, not reference issue/PR numbers that rot as the codebase evolves. Dropped "PR #887" / "PR #887 round 1" / "PR #887 verdict" / "PR #887 review" attributions from the three doc blocks. The WHY content is kept (ctx-parent symmetry, HLC-4 fence rationale, ErrCeilingExpired non-retryable shape). The PR description is the durable home for the flake-fix lineage and the related PR refs. No behavior change. Comment-only.
## Summary

Picks up the two follow-up items noted in claude[bot]'s PR #887 verdict but deferred from that PR:

### (a) `prepareDispatch()` ctx-parent symmetric fix

`prepareDispatch()` took its bounded ctx from `t.server.handlerContext()`, which is the server-lifetime ctx and **not** the caller's `dispatchCtx`. The `reuseCtx` fix on PR #887 round 1 corrected `runTransactionWithDedup` to derive from `dispatchCtx`; this PR applies the symmetric fix to `prepareDispatch` so a disconnected client / `retryRedisWrite` timeout interrupts the prepare+dispatch promptly. Nil-guard falls back to `handlerContext` for test fixtures that construct a `txnContext` without setting `ctx`.

### (b) Last two `Clock().Next()` callers in adapter/redis.go → `NextFenced()`

`dispatchListPushReuse` and the `listPushCoreWithDedup` first-attempt site both shipped before the HLC-4 physical-ceiling migration and were missed in that pass. `dispatchExecReuse` already uses `NextFenced` (PR #887 round 1) and so do all first-attempt `commit_ts` allocations through `prepareDispatch`. These two list-push dedup sites were the **last** persistence-grade `Clock().Next()` callers in `adapter/redis.go`, verified by `grep '\.Clock()\.Next\b' adapter/redis.go` now returning only `NextFenced` variants.

Without HLC-4 parity here, a stale-leader window could let the list-push dedup path mint a `commit_ts` that collides with the successor's freshly-fenced range, defeating the same anomaly class option-2 was introduced to prevent.

## Caller audit (per /loop semantic-change rule)

- `prepareDispatch`: sole callers are `commit()` and `firstExecAttempt` (both in adapter/redis.go). Ctx-parent change is additive; same defer-cancel discipline, same error mapping.
- `dispatchListPushReuse`: sole caller `listPushCoreWithDedup`. NextFenced error wired through existing `(newLen, drop, err)` tuple; non-retryable so `retryRedisWrite` surfaces it directly, matching PR #887 round-1's `dispatchExecReuse` NextFenced wiring.
- `listPushCoreWithDedup` first-attempt: surfaces NextFenced error through existing `retryRedisWrite` closure return.

## Validation

- `go test ./adapter/ -run 'ListPushDedup|ExecDedup|StandaloneSetDedup' -race -count=2` → ok 1.2s
- `gofmt` + `go vet` + `golangci-lint run` → 0 issues

## Scope

Closes the two follow-up items explicitly noted in PR #887's round-2 verdict ("Note on pre-existing Clock().Next() callers" + `prepareDispatch` ctx deviation). No new functionality.
Summary

PR #884 was merged into the stacked branch `docs/txn-idempotency-design` (at `cbbde3d7`) but never reached `main`: main has only PR #796's M1 + M2 + M3 RPUSH/LPUSH content. This PR re-lands the M3 EXEC reuse code on top of main and bundles three follow-ups that move the design doc's "still open" / "follow-up" items into "landed".

Base: `main`.

Changes

1. Re-land M3 EXEC reuse (PR #884's content)

- `adapter/redis.go`: `txnContext.prepareDispatch()` split out of `commit()`; `reusableExecTxn`; `dispatchExecReuse`; `runTransactionWithDedup` + `firstExecAttempt`; gate at the top of `runTransaction`.
- `adapter/redis_exec_dedup_test.go` (originally added in PR #884): 5 tests pinning all four reuse outcomes plus the gate-off legacy equivalence.
- `prepareDispatch()` uses `Clock().NextFenced() (uint64, error)` on current main; the PR #884 version targeted `Clock().Next() (uint64)`. Same downstream semantics, error wired through `preparedTxnDispatch`.

2. Close M2 open item: FSM other-txn exactness test

- `TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry` (`kv/fsm_onephase_dedup_test.go`) pins exactness at the FSM apply layer: a third-party version at `T_other=20` must not satisfy the FSM probe at `T1=30`, so the retry falls through and applies at the fresh `T2=40`. The store-layer pin already covers the primitive; this test covers the dispatch path that uses it.

3. M3 multi-mop EXEC dedup test

- `TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults` extends single-mop dedup to a 3-command MULTI/EXEC body (SET a + SET b + DEL c). Validates that cached results + OCC `readKeys` fence are mop-count-agnostic. Without dedup the DEL would re-execute to 0 on the second pass; the test rejects that.

4. Design doc updates

`readKeys` assembly order, fresh per-attempt `reuseCtx`)

Caller audit (per /loop semantic-change rule)

- `prepareDispatch` (newly added): callers are `commit()` and `firstExecAttempt`; both honor `defer prepared.cancel()`. External behavior of `commit()` preserved.
- `commit()`: internal structure changed; external behavior preserved (no test directly invokes it).
- `runTransactionWithDedup` / `firstExecAttempt` / `dispatchExecReuse` / `reusableExecTxn`: all new symbols, exercised only from the gated `runTransaction` path.

Validation

- `go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC'` passes
- `go test ./kv/ ./store/` both pass
- `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues)

Relation to prior work

`f481f2b7)`

`cbbde3d7)`
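The exactness property pinned by the M2 FSM test (a version at `T_other=20` must not satisfy a probe at `T1=30`) can be sketched as follows. The `versions` type and both probe functions are a hypothetical in-memory model for illustration only; they are not the store or FSM API.

```go
package main

import "fmt"

// versions maps commit timestamp -> writer for a single key.
// Hypothetical in-memory model, not the store's data structure.
type versions map[uint64]string

// committedAt is the exact probe: it succeeds only when a version exists at
// precisely ts, which is what lets a retry detect its own prior attempt.
func (v versions) committedAt(ts uint64) bool {
	_, ok := v[ts]
	return ok
}

// latestAtOrBefore is an inexact probe shown for contrast: any older
// version, including another transaction's, would satisfy it.
func (v versions) latestAtOrBefore(ts uint64) (uint64, bool) {
	var best uint64
	found := false
	for t := range v {
		if t <= ts && (!found || t > best) {
			best, found = t, true
		}
	}
	return best, found
}

func main() {
	const tOther, t1, t2 = uint64(20), uint64(30), uint64(40)
	key := versions{tOther: "other-txn"}

	// The inexact probe would wrongly treat the attempt at T1 as landed.
	_, inexactHit := key.latestAtOrBefore(t1)
	fmt.Println("inexact probe hit at T1:", inexactHit)

	// The exact probe misses at T1, so the retry applies at the fresh T2.
	if !key.committedAt(t1) {
		key[t2] = "retry"
	}
	fmt.Println("retry applied at T2:", key.committedAt(t2))
}
```

In this model the choice of an exact-timestamp probe is what prevents a third-party write from masking a retry that never actually landed.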