feat(txn-dedup): M3 EXEC reuse (option-2 dedup for MULTI/EXEC) #884
Follow-up to PR #796 (M1 + M2 + M3-RPUSH/LPUSH). Extends option-2 write-set reuse + exact-ts dedup probe to MULTI/EXEC dispatched through runTransaction. Mirrors listPushCoreWithDedup at the EXEC granularity. Gated on the existing onePhaseTxnDedup flag (default off), so no mixed-version divergence window.

Mechanism
=========

1. txnContext.commit() refactor: split the build-and-dispatch shape into prepareDispatch() returning a preparedTxnDispatch carrying (elems, commitTS, readKeys, ctx, cancel). commit() now calls prepareDispatch() and dispatches; runTransactionWithDedup calls them separately so it can intercept between prepare and dispatch. External behavior of commit() is unchanged (same defer-cancel discipline, same error mapping); the refactor is purely to expose the dispatch-able state to the dedup loop without code duplication.

2. reusableExecTxn: EXEC analogue of reusableListPush. Captures (elems, startTS, commitTS, readKeys, results). The cached results array is the M3 R1 result reconstruction: computed once from attempt 1's startTS snapshot, returned as-is on any reuse. Same invariance argument as RPUSH/LPUSH length: the write set is fixed, so apply-vs-no-op is invisible to the client. Reads in the EXEC body returned values from attempt 1's snapshot; those values are what the client would have observed had attempt 1 not returned an ambiguous error, so caching them is the correct semantics.

3. dispatchExecReuse: one reuse iteration. Dispatches with PrevCommitTS=pending.commitTS; on success returns cached results. On WriteConflict, the self-inflicted-conflict guard probes CommittedVersionAt(probeKey, freshCommitTS). If the probe hits, the apply landed despite the conflict surface (codex P1 round-10 class), so return cached results. Otherwise drop pending so the retry rebuilds from a fresh snapshot.

4. runTransactionWithDedup: the option-2 retry loop. First iteration calls firstExecAttempt to build the txn, capture results, dispatch. Retryable failure stashes pending; subsequent iterations call dispatchExecReuse with PrevCommitTS. Drop-on-conflict falls back to a fresh first attempt.

5. runTransaction gate: when onePhaseTxnDedup is on, route to runTransactionWithDedup; otherwise keep the legacy retry loop byte-identical, pinned by TestExecDedup_DisabledKeepsLegacyPath.

Caller audit (per /loop semantic-change rule)
==============================================

- prepareDispatch (new): callers are commit() and firstExecAttempt; both honor the defer-cancel contract.
- commit(): internal structure changed; external behavior preserved (no test directly invokes it; legacy runTransaction continues to call it through the same path).
- runTransactionWithDedup / firstExecAttempt / dispatchExecReuse / reusableExecTxn: all new symbols, exercised only from the gated runTransaction path.
- prepareDispatchRetry in kv/coordinator.go: unrelated existing helper with a similar name, different package; no collision.

Tests (adapter/redis_exec_dedup_test.go)
=========================================

1. TestExecDedup_LandedPriorAttempt_ReturnsCachedResults: attempt 1 lands then errors; reuse FSM probe hits; cached "OK" returned; value matches.
2. TestExecDedup_PriorAttemptDidNotLand_Applies: attempt 1 pre-rejects; reuse applies fresh; results match.
3. TestExecDedup_GenuineConflictRebuildsAndApplies: concurrent SET advances key past pending.startTS; reuse OCC-conflicts; self-conflict probe misses; pending drops; fresh attempt succeeds.
4. TestExecDedup_SelfInflictedReuseConflict_ReturnsSuccess: reuse lands then surfaces WriteConflict; self-conflict guard probes fresh commitTS; cached results returned (no double-apply).
5. TestExecDedup_DisabledKeepsLegacyPath: gate off; no probe; same result as legacy runTransaction.

Validation
==========

- go test ./adapter/ -run 'Txn|MULTI|EXEC|Dedup|TxnStartTS' passes.
- go test ./kv/ ./store/ both pass.
- gofmt, go vet, golangci-lint run all clean (0 issues across adapter/kv/store).

Scope of this PR
================

- Single-mop EXEC is the conservative scope per the design doc's "Open questions". The mechanism (cache results array; OCC fence on readKeys) works the same for multi-mop EXEC under the existing proof, but the test matrix doubles, so multi-mop validation is a follow-up.
- Per-command reconstruction hooks for SET/INCR/HSET etc. when dispatched OUTSIDE MULTI are still a follow-up; those go through individual command paths, not runTransaction.
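The landed-prior-attempt case from the Mechanism section can be illustrated with a toy model. This is a minimal sketch, not the adapter's code: `toyStore`, `pendingTxn`, `runWithDedup`, and `errAmbiguous` are hypothetical names, the store is an in-memory map, and the sketch assumes the only failure mode is "apply, then report an error". It shows why reusing the write set with the prior commit timestamp avoids a double apply while still returning the results computed on attempt 1.

```go
package main

import (
	"errors"
	"fmt"
)

// Every name in this file is a hypothetical stand-in, not the adapter's API.

var errAmbiguous = errors.New("dispatch failed after the write may have landed")

// pendingTxn plays the role of reusableExecTxn: a fixed write set, the commit
// timestamp of the attempt that may have landed, and the cached results.
type pendingTxn struct {
	writes   map[string]string
	commitTS uint64
	results  []string
}

// toyStore records which commit timestamps actually applied.
type toyStore struct {
	data      map[string]string
	landedAt  map[uint64]bool
	clock     uint64
	applies   int
	failAfter int // number of dispatches that apply and then report an error
}

func newToyStore(failAfter int) *toyStore {
	return &toyStore{
		data:      map[string]string{},
		landedAt:  map[uint64]bool{},
		failAfter: failAfter,
	}
}

func (s *toyStore) nextTS() uint64 {
	s.clock++
	return s.clock
}

// dispatch is a no-op when prevCommitTS already landed (the exact-ts dedup
// probe); otherwise it applies the write set at commitTS.
func (s *toyStore) dispatch(writes map[string]string, commitTS, prevCommitTS uint64) error {
	if prevCommitTS != 0 && s.landedAt[prevCommitTS] {
		return nil
	}
	for k, v := range writes {
		s.data[k] = v
	}
	s.landedAt[commitTS] = true
	s.applies++
	if s.failAfter > 0 {
		s.failAfter--
		return errAmbiguous
	}
	return nil
}

// runWithDedup retries with the same write set and returns the results
// captured on the first attempt, whichever attempt ends up succeeding.
func runWithDedup(s *toyStore, writes map[string]string, maxAttempts int) ([]string, error) {
	var pending *pendingTxn
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		ts := s.nextTS()
		var prev uint64
		if pending == nil {
			pending = &pendingTxn{writes: writes, results: []string{"OK"}}
		} else {
			prev = pending.commitTS
		}
		err := s.dispatch(pending.writes, ts, prev)
		if err == nil {
			return pending.results, nil
		}
		// Sketch assumption: a failed dispatch may have landed at ts, so the
		// next iteration probes that timestamp.
		pending.commitTS = ts
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	s := newToyStore(1) // attempt 1 lands, then reports an error
	results, err := runWithDedup(s, map[string]string{"k": "v"}, 3)
	fmt.Println(results, s.applies, err)
}
```

The sketch deliberately omits the OCC fence on `readKeys` and the drop-on-conflict path; it only models the "attempt 1 landed" outcome.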
Code Review
This pull request introduces a deduplication and retry mechanism for MULTI/EXEC transactions in the Redis adapter. It refactors the transaction commit logic by splitting out a 'prepareDispatch' step and introduces a new 'runTransactionWithDedup' path (gated by 'onePhaseTxnDedup') that allows reusing a transaction's write set under a fresh commit timestamp with 'prev_commit_ts' set. This prevents double-application of commands and returns cached results when a prior attempt has already landed. Comprehensive unit tests are added in 'adapter/redis_exec_dedup_test.go' to verify various success, failure, and conflict scenarios. There are no review comments, so no additional feedback is provided.
@claude review Initial request — PR is stacked on #796 (base branch Status:
Please verify:
Claude finished @bootjp's task in 6m 7s —— View job Code Review — M3 EXEC reuse (option-2 dedup for MULTI/EXEC)Todo:
Verification Results✅ #1 —
gemini HIGH (adapter/redis.go:3112 dispatchExecReuse): Clock().Next() bypasses the HLC-4 physical-ceiling fence; persistence-grade commit_ts allocation must use NextFenced() to avoid issuing a timestamp that collides with a subsequent leader's window after renewal. Switched dispatchExecReuse to NextFenced() with the standard error-propagation shape (matches prepareDispatch from PR #884 cherry-pick).

gemini MEDIUM (adapter/redis.go:3175 reuseCtx propagation): The original "fresh ctx from handlerContext per reuse attempt" pattern ignored outer cancellation -- a disconnected client would wait the full 10 s before the reuse returned. Derived reuseCtx from dispatchCtx (the caller's retry-loop ctx) so outer cancellation interrupts mid-attempt. Per-attempt redisDispatchTimeout still caps the dispatch the same way commit does for the first attempt. Matches listPushCoreWithDedup's caller-ctx threading.

gemini MEDIUM (adapter/redis.go:3215 readPin.Release on nil): False positive. ActiveTimestampToken.Release is documented nil-safe at kv/active_timestamp_tracker.go:58. The pattern readPin := r.pinReadTS(...); defer readPin.Release() is used unchanged in the existing runTransaction (legacy path); no change needed.

Caller audit per /loop semantic-change rule:
- dispatchExecReuse signature unchanged; the new error return is the same (bool, error) tuple position. Single caller is runTransactionWithDedup which already returns dispErr via dropping to the retry loop's error path -- the new NextFenced error reaches the same dispErr branch.
- reuseCtx parent change is local to runTransactionWithDedup; the only observable effect is faster cancellation propagation, which retryRedisWrite already gates on ctx.Done between attempts.

Note on pre-existing Clock().Next() callers: dispatchListPushReuse (adapter/redis.go:3508) and the listPushCoreWithDedup first-attempt site (adapter/redis.go:3679) -- both shipped on main as part of PR #796 -- still use Clock().Next() and have the same HLC-4 ceiling-bypass exposure. Out of scope for this PR (those are pre-existing on main, not introduced by PR-A's diff), but should be fixed in a follow-up cleanup PR for parity.

Design doc: M3 "fresh reuseCtx from handlerContext" deviation note is struck and replaced with the dispatchCtx-derived rationale; future readers see why the earlier framing was wrong and what the current pattern is.

Validation: go test ./adapter/ -run 'ExecDedup|TxnMULTI' passes. go build ./adapter/... clean. golangci-lint run ./adapter/... 0 issues.
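The reuseCtx change can be sketched with the standard `context` package. This is an illustration under stated assumptions, not the adapter's code: `attemptTimeout` stands in for redisDispatchTimeout (the 10 s value is taken from the note above), and `dispatchOnce`, `reuseAttempt`, and `cancelledParentStopsAttempt` are hypothetical helpers. The point is that a per-attempt context derived from the caller's context inherits its cancellation while keeping the per-attempt cap.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// attemptTimeout is a hypothetical stand-in for redisDispatchTimeout.
const attemptTimeout = 10 * time.Second

// dispatchOnce is a hypothetical dispatch that takes `work` to finish unless
// ctx ends first.
func dispatchOnce(ctx context.Context, work time.Duration) error {
	timer := time.NewTimer(work)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// reuseAttempt derives the per-attempt context from the caller's context, so
// outer cancellation interrupts the attempt while the timeout still caps it.
func reuseAttempt(parent context.Context, work time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, attemptTimeout)
	defer cancel()
	return dispatchOnce(ctx, work)
}

// cancelledParentStopsAttempt reports whether an already-cancelled caller
// context makes a long-running attempt return context.Canceled.
func cancelledParentStopsAttempt() bool {
	parent, cancel := context.WithCancel(context.Background())
	cancel() // simulate a client that has already disconnected
	err := reuseAttempt(parent, time.Hour)
	return errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println(cancelledParentStopsAttempt())
	fmt.Println(reuseAttempt(context.Background(), time.Millisecond))
}
```

Had `reuseAttempt` built its context from `context.Background()` instead of `parent`, the first call would block until the work or the timeout finished, which is the behavior the review comment flagged.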
coderabbitai (1) test (adapter/redis_exec_dedup_test.go:119): TestExecDedup_GenuineConflictRebuildsAndApplies asserted GreaterOrEqual(coord.dispatches, 3) but with the single injected pre-reject + single concurrent SET the retry topology is fully deterministic: attempt 1 (pre-reject), reuse (OCC-conflict), fresh retry (success). Tightened to Equal(3) so a regression that adds an extra dispatch is caught.

coderabbitai (2) doc (docs/design/...:481): "LANDED via PR #884" was ambiguous after the re-land on main via PR #887. Updated to "LANDED via PR #887 (originally PR #884, re-landed against main)" so future readers see the canonical landing PR with the lineage preserved.

No Go callers touched. go test ./adapter/ -run ExecDedup passes.
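The deterministic three-dispatch topology can be traced with a scripted fake. This is a hedged sketch: `scriptedCoordinator`, `runGenuineConflict`, and both error values are hypothetical, and the self-conflict probe is assumed to miss, as it does in the genuine-conflict test. It shows why an exact count of 3 is a tighter assertion than a lower bound.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for the two injected failures.
var (
	errPreReject     = errors.New("pre-reject")
	errWriteConflict = errors.New("write conflict")
)

// scriptedCoordinator returns one scripted outcome per dispatch and counts
// how many dispatches were made.
type scriptedCoordinator struct {
	script     []error
	dispatches int
}

func (c *scriptedCoordinator) dispatch() error {
	var err error
	if c.dispatches < len(c.script) {
		err = c.script[c.dispatches]
	}
	c.dispatches++
	return err
}

// runGenuineConflict records which path each attempt took. A retryable
// failure on a fresh attempt stashes pending; a write conflict on reuse
// drops pending (the sketch assumes the self-conflict probe misses).
func runGenuineConflict(c *scriptedCoordinator, maxAttempts int) (trace []string, err error) {
	pending := false
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if !pending {
			trace = append(trace, "fresh")
			if err = c.dispatch(); err == nil {
				return trace, nil
			}
			pending = true
			continue
		}
		trace = append(trace, "reuse")
		if err = c.dispatch(); err == nil {
			return trace, nil
		}
		if errors.Is(err, errWriteConflict) {
			pending = false // rebuild from a fresh snapshot next time
		}
	}
	return trace, err
}

func main() {
	c := &scriptedCoordinator{script: []error{errPreReject, errWriteConflict, nil}}
	trace, err := runGenuineConflict(c, 5)
	fmt.Println(trace, c.dispatches, err)
}
```

With exactly one pre-reject and one conflict injected, any fourth dispatch would indicate a regression, which a `GreaterOrEqual(..., 3)` assertion would not catch.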
…i-mop (#887)

## Summary

PR #884 was merged into the stacked branch `docs/txn-idempotency-design` (at `cbbde3d7`) but never reached `main`. Main has only PR #796's M1 + M2 + M3 RPUSH/LPUSH content. This PR re-lands the M3 EXEC reuse code on top of main and bundles three follow-ups that move the design doc's "still open" / "follow-up" items into "landed".

Base: `main`.

## Changes

### 1. Re-land M3 EXEC reuse (PR #884's content)

- `adapter/redis.go`: `txnContext.prepareDispatch()` split out of `commit()`; `reusableExecTxn`; `dispatchExecReuse`; `runTransactionWithDedup` + `firstExecAttempt`; gate at the top of `runTransaction`.
- `adapter/redis_exec_dedup_test.go` (originally added in PR #884): 5 tests pinning all four reuse outcomes plus the gate-off legacy equivalence.
- Cherry-pick adaptation: `prepareDispatch()` uses `Clock().NextFenced() (uint64, error)` on current main; the PR #884 version targeted `Clock().Next() (uint64)`. Same downstream semantics, error wired through `preparedTxnDispatch`.

### 2. Close M2 open item: FSM other-txn exactness test

`TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry` (`kv/fsm_onephase_dedup_test.go`) pins exactness at the FSM apply layer: a third-party version at `T_other=20` must not satisfy the FSM probe at `T1=30`, so the retry falls through and applies at the fresh `T2=40`. The store-layer pin already covers the primitive; this test covers the dispatch path that uses it.

### 3. M3 multi-mop EXEC dedup test

`TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults` extends single-mop dedup to a 3-command MULTI/EXEC body (`SET a + SET b + DEL c`). Validates that cached results + OCC `readKeys` fence are mop-count-agnostic. Without dedup the DEL would re-execute to 0 on the second pass; the test rejects that.

### 4. Design doc updates

- §M2 "still open" → "LANDED" with the new FSM test reference
- §M3 "runTransaction (MULTI/EXEC) — Still open" → "LANDED via PR #884" with multi-mop test reference and acknowledgement of two intentional deviations from the M1/M2 template that claude[bot] flagged on #884 (`readKeys` assembly order, fresh per-attempt `reuseCtx`)
- §M3 "standalone SET/INCR/HSET" called out as PR-B follow-up

## Caller audit (per /loop semantic-change rule)

- `prepareDispatch` (newly added): callers are `commit()` and `firstExecAttempt`; both honor `defer prepared.cancel()`. External behavior of `commit()` preserved.
- `commit()`: internal structure changed; external behavior preserved (no test directly invokes it).
- `runTransactionWithDedup` / `firstExecAttempt` / `dispatchExecReuse` / `reusableExecTxn`: all new symbols, exercised only from the gated `runTransaction` path.

## Validation

- `go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC'` passes
- `go test ./kv/ ./store/` both pass
- `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues)

## Relation to prior work

| PR | Merged into | Content |
|---|---|---|
| #796 (`f481f2b7`) | main | M1 + M2 + M3 RPUSH/LPUSH |
| #884 (`cbbde3d7`) | stacked branch, NOT main | M3 EXEC reuse (stranded) |
| **This PR** | main | Re-land #884 + M2 cross-txn FSM test + multi-mop test + doc updates |
| PR-B (next) | main | Standalone SET / INCR / HSET reuse paths |
| PR-C (next) | main | M4 Jepsen validation infrastructure |

## Summary by CodeRabbit

## Release Notes

* **Tests**
  * Added comprehensive regression test suite for transaction retry and deduplication handling, covering cached result reuse, conflict detection, and legacy behavior paths
  * Extended test coverage for standalone command transaction consistency
* **Documentation**
  * Updated design documentation to reflect transaction handling improvements and recent test coverage additions
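The exactness property pinned by the M2 FSM test can be shown on a toy version log. This is a minimal sketch with hypothetical names (`versionLog`, `committedExactlyAt`, `committedAtOrBefore`, `retryApply`); it is not the store's or the FSM's API. It uses the timestamps from the description (`T_other=20`, `T1=30`, `T2=40`) and contrasts the exact-timestamp probe with a looser at-or-before check that would wrongly mask the retry.

```go
package main

import "fmt"

// versionLog maps a key to the commit timestamps at which it was written.
// The type and its methods are hypothetical illustrations.
type versionLog map[string][]uint64

// committedExactlyAt reports whether key has a version at exactly ts.
func (v versionLog) committedExactlyAt(key string, ts uint64) bool {
	for _, c := range v[key] {
		if c == ts {
			return true
		}
	}
	return false
}

// committedAtOrBefore is the looser check that exactness rules out: it is
// satisfied by any version at or below ts, including another txn's write.
func (v versionLog) committedAtOrBefore(key string, ts uint64) bool {
	for _, c := range v[key] {
		if c <= ts {
			return true
		}
	}
	return false
}

// retryApply applies at freshTS unless the exact probe shows that the prior
// attempt at prevTS already landed. It returns true when it applied.
func (v versionLog) retryApply(key string, prevTS, freshTS uint64) bool {
	if v.committedExactlyAt(key, prevTS) {
		return false
	}
	v[key] = append(v[key], freshTS)
	return true
}

func main() {
	log := versionLog{"k": {20}} // third-party version at T_other=20
	applied := log.retryApply("k", 30, 40)
	fmt.Println(applied, log["k"], log.committedAtOrBefore("k", 30))
}
```

The last printed value shows that an at-or-before probe at `T1=30` would have been satisfied by the unrelated version at 20, so the retry would have been skipped and the write lost.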
## Summary

Follow-up to #796 (M1 + M2 + M3 RPUSH/LPUSH). Extends option-2 write-set reuse + exact-ts dedup probe to MULTI/EXEC dispatched through `runTransaction`. Mirrors `listPushCoreWithDedup` at the EXEC granularity. Gated on the existing `onePhaseTxnDedup` flag (default off), so no mixed-version divergence window.

This PR is stacked on #796: the base branch is `docs/txn-idempotency-design`. Once #796 merges this rebases onto `main`.

## Mechanism

1. `txnContext.commit()` refactor: split build-and-dispatch into `prepareDispatch()` returning a `preparedTxnDispatch{elems, commitTS, readKeys, ctx, cancel}`. `commit()` now calls `prepareDispatch()` then dispatches; `runTransactionWithDedup` calls them separately so it can intercept between prepare and dispatch. External behavior of `commit()` is unchanged.
2. `reusableExecTxn`: EXEC analogue of `reusableListPush`. The cached `results` array is the M3 R1 result reconstruction: computed once from attempt 1's `startTS` snapshot, returned as-is on any reuse. Same invariance argument as RPUSH/LPUSH `length`.
3. `dispatchExecReuse`: one reuse iteration. Dispatches with `PrevCommitTS=pending.commitTS`. On `WriteConflict`, the self-inflicted-conflict guard probes `CommittedVersionAt(probeKey, freshCommitTS)`; if hit, return cached results (codex P1 round-10 class). Otherwise drop pending.
4. `runTransactionWithDedup` + `firstExecAttempt`: the option-2 retry loop. First iteration builds the txn and dispatches; retryable failure stashes pending; subsequent iterations call `dispatchExecReuse` with `PrevCommitTS`.
5. `runTransaction` gate: when `onePhaseTxnDedup` is on, route to `runTransactionWithDedup`; otherwise keep the legacy retry loop byte-identical.

## Caller audit (per /loop semantic-change rule)

- `prepareDispatch` (new): callers are `commit()` and `firstExecAttempt`; both honor the `defer prepared.cancel()` contract.
- `commit()`: internal structure changed; external behavior preserved (no test directly invokes it).
- `runTransactionWithDedup` / `firstExecAttempt` / `dispatchExecReuse` / `reusableExecTxn`: all new symbols, exercised only from the gated `runTransaction` path.

## Tests (`adapter/redis_exec_dedup_test.go`)

- `TestExecDedup_LandedPriorAttempt_ReturnsCachedResults`: attempt 1 lands then errors; cached results returned.
- `TestExecDedup_PriorAttemptDidNotLand_Applies`: attempt 1 pre-rejects; reuse applies fresh.
- `TestExecDedup_GenuineConflictRebuildsAndApplies`: concurrent SET advances the key past `pending.startTS`.
- `TestExecDedup_SelfInflictedReuseConflict_ReturnsSuccess`: self-conflict guard probes the fresh `commitTS`; cached results returned (no double-apply).
- `TestExecDedup_DisabledKeepsLegacyPath`: gate off; same result as legacy `runTransaction`.

## Validation

- `go test ./adapter/ -run 'Txn|MULTI|EXEC|Dedup|TxnStartTS'` passes
- `go test ./kv/ ./store/` both pass
- `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues across `adapter`/`kv`/`store`)

## Scope

- Single-mop EXEC is the conservative scope per the design doc's "Open questions". The mechanism (cache results array; OCC fence on `readKeys`) works the same for multi-mop EXEC under the existing proof, but the test matrix doubles, so multi-mop validation is a follow-up.
- Per-command reconstruction hooks for SET/INCR/HSET etc. when dispatched outside MULTI are still a follow-up; those go through individual command paths, not `runTransaction`.

## Design doc reference

`docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M3.