Skip to content

feat(txn-dedup): re-land M3 EXEC reuse on main + close M2 open + multi-mop#887

Merged
bootjp merged 8 commits into
mainfrom
feat/txn-dedup-docs-and-tests
May 30, 2026
Merged

feat(txn-dedup): re-land M3 EXEC reuse on main + close M2 open + multi-mop#887
bootjp merged 8 commits into
mainfrom
feat/txn-dedup-docs-and-tests

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 30, 2026

Summary

PR #884 was merged into the stacked branch docs/txn-idempotency-design (at cbbde3d7) but never reached main — main has only PR #796's M1 + M2 + M3 RPUSH/LPUSH content. This PR re-lands the M3 EXEC reuse code on top of main and bundles three follow-ups that move the design doc's "still open" / "follow-up" items into "landed".

Base: main.

Changes

1. Re-land M3 EXEC reuse (PR #884's content)

2. Close M2 open item — FSM other-txn exactness test

TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry (kv/fsm_onephase_dedup_test.go) pins exactness at the FSM apply layer: a third-party version at T_other=20 must not satisfy the FSM probe at T1=30, so the retry falls through and applies at the fresh T2=40. The store-layer pin already covers the primitive; this test covers the dispatch path that uses it.

3. M3 multi-mop EXEC dedup test

TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults extends single-mop dedup to a 3-command MULTI/EXEC body (SET a + SET b + DEL c). Validates that cached results + OCC readKeys fence are mop-count-agnostic. Without dedup the DEL would re-execute to 0 on the second pass — the test rejects that.

4. Design doc updates

Caller audit (per /loop semantic-change rule)

  • prepareDispatch (newly added): callers are commit() and firstExecAttempt; both honor defer prepared.cancel(). External behavior of commit() preserved.
  • commit(): internal structure changed; external behavior preserved (no test directly invokes it).
  • runTransactionWithDedup / firstExecAttempt / dispatchExecReuse / reusableExecTxn: all new symbols, exercised only from the gated runTransaction path.

Validation

  • go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC' passes
  • go test ./kv/ ./store/ both pass
  • gofmt, go vet, golangci-lint run all clean (0 issues)

Relation to prior work

PR Merged into Content
#796 (f481f2b7) main M1 + M2 + M3 RPUSH/LPUSH
#884 (cbbde3d7) stacked branch, NOT main M3 EXEC reuse (stranded)
This PR main Re-land #884 + M2 cross-txn FSM test + multi-mop test + doc updates
PR-B (next) main Standalone SET / INCR / HSET reuse paths
PR-C (next) main M4 Jepsen validation infrastructure

Summary by CodeRabbit

Release Notes

  • Tests

    • Added comprehensive regression test suite for transaction retry and deduplication handling, covering cached result reuse, conflict detection, and legacy behavior paths
    • Extended test coverage for standalone command transaction consistency
  • Documentation

    • Updated design documentation to reflect transaction handling improvements and recent test coverage additions

Review Change Stack

…i-mop

PR #884 was merged into the stacked branch docs/txn-idempotency-design
(at cbbde3d) but never reached main — main has only PR #796's M1 + M2 +
M3 RPUSH/LPUSH content. This PR re-lands the M3 EXEC reuse code on top
of main and bundles three follow-ups that extend the design doc's
"still open" / "follow-up" items into "landed":

1. Re-land M3 EXEC reuse (PR #884's content, cherry-picked + rebased)

   - adapter/redis.go: txnContext.prepareDispatch() split out of
     commit(); reusableExecTxn; dispatchExecReuse; runTransactionWithDedup
     + firstExecAttempt; gate at the top of runTransaction.
   - adapter/redis_exec_dedup_test.go (originally added in PR #884):
     5 tests pinning all four reuse outcomes plus the gate-off legacy
     equivalence.
   - The cherry-pick required one small adaptation: prepareDispatch()
     uses Clock().NextFenced() (uint64, error) on current main; the PR
     #884 version targeted Clock().Next() (uint64). Same downstream
     semantics; the error return is wired through preparedTxnDispatch.

2. Close M2 open item — FSM other-txn exactness test
   (kv/fsm_onephase_dedup_test.go)

   - TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry pins exactness at
     the FSM apply layer: a third-party version at T_other=20 must not
     satisfy the FSM probe at T1=30, so the retry falls through and
     applies at the fresh T2=40. The store-layer pin
     (store/committed_version_at_test.go) already covers the primitive;
     this test covers the dispatch path that uses it.

3. M3 multi-mop EXEC dedup test
   (adapter/redis_exec_dedup_test.go)

   - TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults
     extends single-mop dedup to a 3-command MULTI/EXEC body
     (SET a + SET b + DEL c). Validates the design's claim that the
     mechanism works the same for multi-mop because cached results +
     OCC readKeys fence are mop-count-agnostic. Without dedup the DEL
     would re-execute to 0 on the second pass — the test rejects that.

4. Design doc updates
   (docs/design/2026_05_21_proposed_txn_secondary_idempotency.md)

   - §M2 "still open" → "LANDED" with the new FSM test reference.
   - §M3 "runTransaction (MULTI/EXEC) — Still open" → "LANDED via
     PR #884" with multi-mop test reference and the two intentional
     deviations from the M1/M2 template that claude[bot] flagged on
     #884 (readKeys assembly order, fresh per-attempt reuseCtx).
   - §M3 "standalone SET/INCR/HSET" called out as the next follow-up
     (PR-B in the user-visible roadmap).

Caller audit (per /loop semantic-change rule)
==============================================

- prepareDispatch (newly added, replaces commit()'s body): callers are
  commit() and firstExecAttempt; both honor the defer prepared.cancel()
  contract. External behavior of commit() preserved.
- commit(): internal structure changed; external behavior preserved
  (no test directly invokes it; runTransaction's legacy path continues
  to call it through the same shape).
- runTransactionWithDedup / firstExecAttempt / dispatchExecReuse /
  reusableExecTxn: all new symbols, exercised only from the gated
  runTransaction path.

Validation
==========

- go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC' passes.
- go test ./kv/ ./store/ both pass.
- gofmt, go vet, golangci-lint run all clean (0 issues across
  adapter/kv/store).

Relation to prior work
======================

- #796 (merged f481f2b): M1 + M2 + M3 RPUSH/LPUSH on main.
- #884 (merged cbbde3d into stacked branch, NOT main): M3 EXEC reuse.
- THIS PR: brings #884's content into main, plus M2 cross-txn FSM
  test, multi-mop EXEC test, and design doc updates.
- Next (PR-B): standalone SET / INCR / HSET reuse paths.
- Next (PR-C): M4 Jepsen validation infra.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 50 minutes and 16 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a3d17e4-09d8-4334-afee-90ffb45f77bc

📥 Commits

Reviewing files that changed from the base of the PR and between 4da85dc and 69db79b.

📒 Files selected for processing (2)
  • adapter/redis_exec_dedup_test.go
  • docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
📝 Walkthrough

Walkthrough

This PR implements one-phase transaction dedup for MULTI/EXEC (runTransaction) and standalone SET commands, refactoring the transaction commit path to enable dedup interception and adding comprehensive test coverage for caching, reuse, conflict detection, and legacy path preservation.

Changes

One-Phase Transaction Dedup for MULTI/EXEC and Standalone SET

Layer / File(s) Summary
Transaction dispatch refactoring foundation
adapter/redis.go
preparedTxnDispatch type and prepareDispatch() method pre-assemble dispatch-ready write sets, commit timestamps, and read keys under a cancelable timeout context. txnContext.commit() refactored to use prepared dispatch, enabling dedup logic to intercept between preparation and coordinator dispatch.
MULTI/EXEC dedup implementation
adapter/redis.go
reusableExecTxn captures attempt-1 write set, read set, and cached results. dispatchExecReuse dispatches under fresh commitTS with PrevCommitTS flag and probes committed versions on conflicts. runTransactionWithDedup orchestrates retry loop. firstExecAttempt builds snapshot, applies commands, validates reads, and prepares dispatch. runTransaction routes to dedup path when onePhaseTxnDedup enabled.
Standalone SET dedup integration
adapter/redis.go
set() gates standalone SET through runTransactionWithDedup when enabled, with pre-dedup logic extracted to setLegacy. writeRedisStandaloneResult helper translates single-element dedup results into redcon reply forms.
MULTI/EXEC dedup test coverage
adapter/redis_exec_dedup_test.go
Six integration tests exercise landed prior attempts with cached reuse, prior attempts that did not land, genuine external conflicts triggering rebuild, self-inflicted reuse conflicts, multi-command transactions with result array reuse, and legacy path preservation when dedup is disabled.
Standalone SET dedup tests with infrastructure
adapter/redis_retry_test.go, adapter/redis_set_dedup_test.go
recordingConn enhanced with wroteNull field. Four tests validate SET dedup: landed attempts return cached OK without error, NX option returns null-style response without overwrite, GET option returns cached prior value while committing new value, and legacy path when dedup disabled.
FSM exactness validation for foreign versions
kv/fsm_onephase_dedup_test.go
New test verifies unrelated committed versions at different timestamps do not satisfy exact-timestamp probes during idempotency retry, ensuring foreign transactions do not mask dedup detection.
Design documentation updates
docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
M2 section records FSM-level "other transaction" exactness test as landed. M3 section marks MULTI/EXEC dedup as landed with cached-result behavior, multi-mop regression tests, and routing updates for standalone SET via runTransactionWithDedup.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • bootjp/elastickv#871: Modifies txnContext.commit commitTS allocation to use Clock().NextFenced() and propagate errors alongside this PR's commit dispatch refactoring.
  • bootjp/elastickv#518: Modifies the Redis transaction commit flow (txnContext.commit) to remove TTL elems from Raft txn and flush TTL via ttlBuffer alongside this PR's commit dispatch changes.
  • bootjp/elastickv#796: Builds on shared onePhaseTxnDedup and PrevCommitTS coordinator/FSM infrastructure; this PR extends dedup to standalone SET and MULTI/EXEC, while the retrieved PR focused on list-push operations.

Poem

🐰 A rabbit hops through transactions with care,
Dedup caches results beyond compare,
MULTI/EXEC and SET now reuse with grace,
When conflicts arise, we probe the right place,
No retries needed—cached answers we share!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: re-landing M3 EXEC reuse, closing M2 gaps, and adding multi-mop test coverage.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/txn-dedup-docs-and-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the Option-2 retry and deduplication path for MULTI/EXEC transactions (runTransactionWithDedup), allowing transaction write sets to be reused under a fresh commit timestamp with PrevCommitTS to probe if a prior attempt already landed. It includes comprehensive unit tests validating various deduplication scenarios, updates the design documentation, and adds FSM-level tests for exactness. The review feedback highlights three key areas for improvement: updating the timestamp allocation in dispatchExecReuse to use NextFenced() to respect HLC-4 safety limits, adding a fast-path check for parent context expiration in the retry loop to prevent resource waste, and adding a nil check on readPin to avoid a potential nil pointer dereference panic.

Comment thread adapter/redis.go Outdated
// is the current length" question; the client-visible result IS the
// cached results array.
func (r *RedisServer) dispatchExecReuse(ctx context.Context, pending *reusableExecTxn) (results []redisResult, drop bool, err error) {
commitTS := r.coordinator.Clock().Next()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The commitTS allocated here uses r.coordinator.Clock().Next(), which bypasses the HLC-4 physical-ceiling fence. According to the HLC-4 safety specification (and the documentation in kv/hlc.go), all persistence-grade allocations (including commitTS for transactions) must go through NextFenced() to ensure that an expired-ceiling allocation fails closed instead of silently issuing a timestamp that could collide with a subsequent leader's window.

Please update this to use NextFenced() and handle the returned error.

commitTS, err := r.coordinator.Clock().NextFenced()
if err != nil {
	return nil, false, errors.Wrap(err, "dispatchExecReuse: allocate commitTS")
}

Comment thread adapter/redis.go Outdated
Comment on lines +3174 to +3175
if pending != nil {
reuseCtx, reuseCancel := context.WithTimeout(r.handlerContext(), redisDispatchTimeout)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the parent dispatchCtx has already expired or been cancelled, starting a new reuse dispatch with a fresh 10-second timeout (redisDispatchTimeout) derived from r.handlerContext() will waste resources and delay the error response to the client.

Consider checking dispatchCtx.Err() at the start of the retry block when pending != nil to fail fast if the parent context is already dead.

if pending != nil {
	if err := dispatchCtx.Err(); err != nil {
		return err
	}
	reuseCtx, reuseCancel := context.WithTimeout(r.handlerContext(), redisDispatchTimeout)

Comment thread adapter/redis.go
Comment on lines +3214 to +3215
readPin := r.pinReadTS(startTS)
defer readPin.Release()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

r.pinReadTS(startTS) can return nil if r.readTracker is not initialized (which is common in minimal test setups or certain configurations). Deferring readPin.Release() directly without a nil check will cause a nil pointer dereference panic.

Please add a nil check before deferring the release.

readPin := r.pinReadTS(startTS)
if readPin != nil {
	defer readPin.Release()
}

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

PR-A: re-land M3 EXEC reuse + close M2 open + multi-mop test.

Status:

  • PR feat(txn-dedup): M3 EXEC reuse — option-2 dedup for MULTI/EXEC #884 (M3 EXEC reuse) merged into stacked branch but never reached main; this PR re-lands its content on main with the Clock().NextFenced merge fix.
  • TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry closes M2 open item.
  • TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults proves multi-mop dedup works (SET+SET+DEL body).
  • Design doc §M2 "open" → LANDED, §M3 EXEC → LANDED with deviations doc'd.
  • gofmt + go vet + golangci-lint run clean.

Please verify the cherry-pick adaptation (NextFenced error wired through preparedTxnDispatch) and the multi-mop test's DEL-result invariance argument.

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 4m 11s —— View job


PR #887 Review — Re-land M3 EXEC reuse + M2 close + multi-mop test

Tasks:

  • Read adapter/redis.go changes
  • Read adapter/redis_exec_dedup_test.go
  • Read kv/fsm_onephase_dedup_test.go
  • Read design doc changes
  • Verify NextFenced error wiring through preparedTxnDispatch
  • Analyze multi-mop DEL-result invariance argument
  • Check Gemini review comments (Next vs NextFenced, ctx check, nil readPin)
  • Post full review

Requested verification items

1. NextFenced error wiring through prepareDispatch() — correct

adapter/redis.go:2685 correctly uses Clock().NextFenced() and both error paths return a no-op cancel stub:

commitTS, err := t.server.coordinator.Clock().NextFenced()
if err != nil {
    return preparedTxnDispatch{cancel: func() {}}, errors.Wrap(err, "redis txn commit: allocate commitTS")
}

The buildStreamDeletionElems error path also calls cancel() before returning, avoiding a context leak. The cherry-pick adaptation is correct.

2. Multi-mop DEL=1 invariance — sound

The argument holds. applyDel calls stagedKeyType(key)trackTypeReadKeys(key) (which tracks redisStrKey(key), list meta, hash, set, etc.) and stageKeyDeletion(key)t.load(key)trackReadKey(storageKey). Both paths register the key in t.readKeys. So the OCC fence covers the deleted key: any concurrent write to "c" between startTS and commitTS triggers WriteConflictdrop pending → recompute from fresh snapshot. The cached DEL=1 result is only returned when the OCC fence passes, meaning "c" was not concurrently modified — at which point either attempt 1 already deleted it (probe hits, no-op) or attempt 1 didn't land and the reuse deletes it at the fresh commitTS, again yielding count 1.


Issues

Bug (high) — dispatchExecReuse uses Clock().Next() for a persistence-grade commitTS

adapter/redis.go:3112:

commitTS := r.coordinator.Clock().Next()

The HLC documentation at kv/hlc.go:107-111 is explicit:

NEW persistence-grade allocations (everything that ends up as a startTS / commitTS, an MVCC write timestamp, or a lease/expiry boundary) MUST go through NextFenced instead — Next bypasses the HLC-4 (iii) ceiling fence and can therefore issue a timestamp inside a stale leader window after lease renewal stops.

This is a real allocation of a commitTS that is passed to coordinator.Dispatch, so it must use NextFenced(). The prepareDispatch() (line 2685) correctly uses NextFenced(); the reuse path should follow the same pattern:

commitTS, err := r.coordinator.Clock().NextFenced()
if err != nil {
    return nil, false, errors.Wrap(err, "dispatchExecReuse: allocate commitTS")
}

Context: The pre-existing dispatchListPushReuse at line 3508 has the same bug — commitTS := r.coordinator.Clock().Next(). The new code mirrors that pattern, but it should mirror the NextFenced() pattern from prepareDispatch() and applyListPush (line 3438) instead. Fix this →

Non-issue — Gemini's nil readPin concern

adapter/redis.go:3214-3215:

readPin := r.pinReadTS(startTS)
defer readPin.Release()

pinReadTS can return nil when r.readTracker == nil. However, ActiveTimestampToken.Release() at kv/active_timestamp_tracker.go:57-60 already has:

func (t *ActiveTimestampToken) Release() {
    if t == nil || t.tracker == nil || t.id == 0 {
        return
    }

Calling .Release() on a nil pointer is valid Go and returns immediately. The identical pattern exists in the legacy runTransaction at line 3032-3033 and is working correctly. Gemini's concern is a false positive.

Minor — fresh reuseCtx bypasses expired dispatchCtx

adapter/redis.go:3172-3175:

reuseCtx, reuseCancel := context.WithTimeout(r.handlerContext(), redisDispatchTimeout)
defer reuseCancel()

The fresh context from handlerContext() means an already-expired dispatchCtx is bypassed. If the outer context has already cancelled (client disconnected), a full 10-second reuse attempt will proceed on their behalf. This is documented as intentional deviation #2 in the design doc, so it's an informed choice, but checking dispatchCtx.Err() first would avoid wasted work for the common case where the client is no longer waiting:

if pending != nil {
    if err := dispatchCtx.Err(); err != nil {
        return err
    }
    reuseCtx, reuseCancel := ...

Not a blocking issue given the documentation.


Design doc & tests

The design doc updates are clean: §M2 and §M3 transitions from "Still open" to "LANDED" are accurate, and the two intentional deviations are correctly documented. The TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry test scenario (T_other=20, T1=30, T2=40) correctly exercises exactness at the FSM dispatch layer beyond what the store-layer suite already covers. The 5 exec dedup tests cover all four reuse outcomes plus the gate-off legacy equivalence path.

Summary

The cherry-pick adaptation is correct (NextFenced properly wired in prepareDispatch()), and the multi-mop DEL invariance argument is sound. One real fix needed: dispatchExecReuse:3112 must use NextFenced() not Next() for its commitTS allocation — a persistence-grade timestamp must respect the HLC-4 ceiling fence. The nil-readPin concern from Gemini is a false positive. The fresh-context deviation is documented and acceptable but adding a pre-check on dispatchCtx.Err() is worth considering.
| Branch

gemini HIGH (adapter/redis.go:3112 dispatchExecReuse):
  Clock().Next() bypasses the HLC-4 physical-ceiling fence;
  persistence-grade commit_ts allocation must use NextFenced() to
  avoid issuing a timestamp that collides with a subsequent leader's
  window after renewal. Switched dispatchExecReuse to NextFenced()
  with the standard error-propagation shape (matches prepareDispatch
  from PR #884 cherry-pick).

gemini MEDIUM (adapter/redis.go:3175 reuseCtx propagation):
  The original "fresh ctx from handlerContext per reuse attempt"
  pattern ignored outer cancellation -- a disconnected client would
  wait the full 10 s before the reuse returned. Derived reuseCtx from
  dispatchCtx (the caller's retry-loop ctx) so outer cancellation
  interrupts mid-attempt. Per-attempt redisDispatchTimeout still caps
  the dispatch the same way commit does for the first attempt.
  Matches listPushCoreWithDedup's caller-ctx threading.

gemini MEDIUM (adapter/redis.go:3215 readPin.Release on nil):
  False positive. ActiveTimestampToken.Release is documented
  nil-safe at kv/active_timestamp_tracker.go:58. The pattern
  readPin := r.pinReadTS(...); defer readPin.Release() is used
  unchanged in the existing runTransaction (legacy path); no change
  needed.

Caller audit per /loop semantic-change rule:

- dispatchExecReuse signature unchanged; the new error return is
  the same (bool, error) tuple position. Single caller is
  runTransactionWithDedup which already returns dispErr via dropping
  to the retry loop's error path -- the new NextFenced error reaches
  the same dispErr branch.
- reuseCtx parent change is local to runTransactionWithDedup; the
  only observable effect is faster cancellation propagation, which
  retryRedisWrite already gates on ctx.Done between attempts.

Note on pre-existing Clock().Next() callers:

dispatchListPushReuse (adapter/redis.go:3508) and the
listPushCoreWithDedup first-attempt site (adapter/redis.go:3679) --
both shipped on main as part of PR #796 -- still use Clock().Next()
and have the same HLC-4 ceiling-bypass exposure. Out of scope for
this PR (those are pre-existing on main, not introduced by PR-A's
diff), but should be fixed in a follow-up cleanup PR for parity.

Design doc:
  M3 "fresh reuseCtx from handlerContext" deviation note is struck
  and replaced with the dispatchCtx-derived rationale; future
  readers see why the earlier framing was wrong and what the
  current pattern is.

Validation:
  go test ./adapter/ -run 'ExecDedup|TxnMULTI' passes.
  go build ./adapter/... clean.
  golangci-lint run ./adapter/... 0 issues.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

Round 1 fixes (commit dd042e9):

  • gemini HIGH: dispatchExecReuse now uses Clock().NextFenced() (HLC-4 ceiling honored). NextFenced error wired through (nil, false, err) tuple.
  • gemini MEDIUM (reuseCtx): switched from r.handlerContext() to dispatchCtx as the parent so outer cancellation propagates promptly. Per-attempt redisDispatchTimeout still caps the dispatch.
  • gemini MEDIUM (readPin.Release): documented as false positive — ActiveTimestampToken.Release is nil-safe at kv/active_timestamp_tracker.go:58.
  • Design doc §M3 deviation note updated: "fresh reuseCtx from handlerContext" struck and replaced with dispatchCtx-derived rationale.
  • Note: pre-existing dispatchListPushReuse + listPushCoreWithDedup also use Clock().Next() with the same HLC-4 exposure; flagged for a separate cleanup PR (out of scope for PR-A's diff).

go test ./adapter/ -run 'ExecDedup|TxnMULTI' passes; gofmt/vet/golangci-lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 2m 41s —— View job


PR #887 Round 1 Review — Verifying fixes from dd042e9

Tasks:

  • Load tools
  • Read adapter/redis.go — verify NextFenced() wiring in dispatchExecReuse
  • Read adapter/redis.go — verify dispatchCtx as parent for reuseCtx
  • Check for any new issues introduced by the fixes
  • Verify design doc update reflects actual code state
  • Post review

Round 1 fixes verified — all correct

HIGH fix (dispatchExecReuseNextFenced)

adapter/redis.go:3118-3120 is correct:

commitTS, allocErr := r.coordinator.Clock().NextFenced()
if allocErr != nil {
    return nil, false, errors.Wrap(allocErr, "redis exec reuse: allocate commitTS")
}

Return shape (nil, false, err) is correct: drop=false means the caller does not nil pending (nothing to drop from a timestamp allocation failure); the error is non-retryable (isRetryableRedisTxnErr only matches ErrWriteConflict / ErrTxnLocked, line adapter/redis_retry.go:44), so retryRedisWrite immediately surfaces it to the client. No retry amplification possible.

MEDIUM fix (reuseCtx parent → dispatchCtx)

adapter/redis.go:3194:

reuseCtx, reuseCancel := context.WithTimeout(dispatchCtx, redisDispatchTimeout)

Go's context.WithTimeout takes the minimum of the parent's remaining deadline and the new timeout, so the total budget across all attempts is now capped by the outer dispatchCtx (itself handlerContext() + redisDispatchTimeout). Previously each reuse attempt could independently hold a fresh full 10 s even after dispatchCtx expired — this is corrected. The fix matches listPushCoreWithDedup's threading pattern.

MEDIUM (readPin nil-safety) — false positive, correctly documented

ActiveTimestampToken.Release() at kv/active_timestamp_tracker.go:57-60 has an explicit if t == nil guard. The pattern defer readPin.Release() with a potentially-nil readPin is used identically at adapter/redis.go:3032-3033 in the existing (shipped) runTransaction path. No change required.

Design doc

The strikethrough + "Reverted per PR #887 review" annotation correctly records why the handlerContext framing was dropped. Future readers see the rationale rather than a silent inconsistency with the implementation.


One remaining observation (informational, not blocking)

The pre-existing dispatchListPushReuse (adapter/redis.go:3527) and listPushCoreWithDedup first-attempt site (:3698) still use Clock().Next() — same HLC-4 ceiling-bypass exposure that was just fixed in dispatchExecReuse. These are correctly called out in the commit message as out-of-scope for this PR (they shipped on main in PR #796). A follow-up cleanup PR to bring them to NextFenced() parity is the right call. No action needed here.


Summary

All three issues from the prior review are correctly addressed in dd042e9. The NextFenced error propagation shape is clean, the context threading fix is semantically correct and now matches the list-push dedup pattern, and the nil readPin dismissal is well-documented. LGTM — no further changes needed for Round 1.
| Branch

bootjp added 5 commits May 30, 2026 16:44
Stacked on PR-A (re-land of M3 EXEC reuse + multi-mop test).

Extends option-2 dedup to standalone SET (dispatched outside MULTI).
When `r.onePhaseTxnDedup` is on, `r.set` wraps the single command as a
1-element redcon.Command queue and dispatches via
runTransactionWithDedup — reusing the M3 EXEC machinery instead of
building a per-handler reusableSetTxn + dispatchSetReuse shape. SET
already has `applySet` on `txnContext`, so this is the "free"
extension to any command whose txn-state-aware apply hook already
exists.

The standalone fast path (`trySetFastPath`) is intentionally bypassed
under the gate: dedup is opt-in, and a non-dedup'd fast path under a
dedup-on cluster would split the idempotency contract.

Why standalone INCR / HSET are out of scope for THIS PR:
========================================================

INCR (in adapter/redis_compat_commands.go) and HSET both lack a
`txnContext.applyIncr` / `applyHSet` implementation, so the
"route through single-mop EXEC" pattern that worked here for SET
cannot apply as-is — `runTransactionWithDedup` would reject the
command at `txn.apply` ("unsupported in MULTI"). Bringing them in
requires implementing `applyIncr` / `applyHSet` first (each
~30–50 LOC for the txn-state-aware read-compute-write shape), then
the standalone handler routing is a one-liner via this same path.
Tracked as separate follow-up PRs; until then, INCR and HSET keep
today's buggy-under-churn behaviour, which is the design doc's
stated default ("everything else keeps today's behaviour until
its hook is added").

Caller audit (per /loop semantic-change rule):

- `r.set` (handler): gate-on takes a new code path; gate-off
  preserved verbatim via `setLegacy`. No external callers exist
  (it is wired only into the redcon dispatch table).
- `setLegacy` (new): byte-identical extraction of the pre-PR set()
  body. No external callers.
- `writeRedisStandaloneResult` (new): translates a single-element
  `[]redisResult` from `runTransactionWithDedup` into the bare
  redcon reply shape (no WriteArray wrapper). Single caller in
  this PR; future SET-pattern callers will reuse it.

Validation:

- adapter/redis_set_dedup_test.go (new):
  TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK,
  TestStandaloneSetDedup_DisabledKeepsLegacyPath. Existing
  TestRedis_SET / dedup suites still pass.
- gofmt, go vet, golangci-lint run all clean (0 issues).

Design doc updated to mark standalone SET as LANDED and call out
INCR/HSET as the next follow-up with the precise reason they
cannot land alongside SET in this PR.
claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135):
  set's gate-on branch called r.runTransaction which immediately
  re-checked the same r.onePhaseTxnDedup gate and routed to
  runTransactionWithDedup. The indirection was misleading -- the PR
  description said "dispatches via runTransactionWithDedup" but that
  was true only by indirection. Replaced with a direct call to
  runTransactionWithDedup; intent is now explicit and the double gate
  check is removed.

claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go):
  Only the resultString OK path was tested through the dedup route.
  Added two regression tests covering the other applySet result types
  that go through writeRedisStandaloneResult:

  - TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing:
    SET k v NX on an existing key returns nil; the cached attempt-1
    result must round-trip through writeRedisStandaloneResult ->
    WriteNull, leaving conn.bulk == nil.

  - TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk
    routing: SET k v GET on an existing key returns the prior value
    as a bulk reply; conn.bulk must be the prior bytes.

  Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator
  ambiguousLands=true) so the result MUST come from the cached
  attempt-1 array, not a re-execution.

claude[bot] minor observation (shallow-array constraint in
writeRedisStandaloneResult):
  Documented in the function's doc comment that the resultArray arm
  flattens via WriteBulkString and is NOT safe to reuse for callers
  whose applyXxx emits nested arrays. Future HGETALL-pattern callers
  must either pre-flatten their result or extend the switch.

Caller audit per /loop semantic-change rule:

- set's gate-on branch is the only call site changed; the new direct
  call to runTransactionWithDedup uses the same []redcon.Command{cmd}
  shape and the same []redisResult return type. runTransactionWithDedup
  is exported within the package and has no other callers outside the
  legacy runTransaction path (which is unchanged).
- writeRedisStandaloneResult signature unchanged. Documentation
  expanded but no behavior change.
- New tests are pure additions; no existing tests modified.

Validation:
  go test ./adapter/ -run StandaloneSetDedup passes (5 tests now).
  go build ./adapter/... clean.
  golangci-lint run ./adapter/... 0 issues.
claude[bot] (a) attribution prefix per CLAUDE.md convention:
  Stripped the inline "// claude[bot] PR #888 review:" prefixes from
  the two comments added in round 1. The substantive "why" content
  (double-gate rationale; shallow-array constraint) is kept as that
  is what future readers need. CLAUDE.md convention is to leave
  fix attribution in the commit message, not the source.

claude[bot] (b) wroteNull bool field on recordingConn:
  Added wroteNull witness to recordingConn (in redis_retry_test.go,
  the shared test helper). Hardened TestStandaloneSetDedup_NXMissReturnsNil
  to require.True(conn.wroteNull, ...) so a wrong branch that wrote
  nothing at all (would also satisfy conn.bulk == nil) is now caught.

Caller audit per /loop semantic-change rule:
  - recordingConn is in *_test.go and used by tests only. Adding
    wroteNull bool default false is strictly additive; no existing
    test reads it. The only WriteNull semantics change is setting
    the new field; conn.bulk = nil behavior is preserved.
  - Verified by grep: only redis_set_dedup_test.go references
    wroteNull. All other recordingConn uses (8 instantiations
    across redis_retry_test.go + redis_lua_linearizable_read_test.go)
    are unaffected.

Validation:
  go test ./adapter/ -run StandaloneSetDedup passes.
  go test ./adapter/ -run ExecDedup passes.
  gofmt + go vet + golangci-lint clean.
…_test

claude[bot] PR #888 round-3 minor cleanup: same CLAUDE.md convention
violation that was cleaned from redis.go in round 2 was left in the
test file. Removed two references on lines 56 and 78. Substantive
rationale (NX semantics, zero-value aliasing problem) kept; the
attribution moves to the commit message where it belongs.

No functional or test behaviour change. golangci-lint clean.
)

## Summary

Stacked on PR-A (#887). Extends option-2 dedup to **standalone SET**
(dispatched outside MULTI). When `r.onePhaseTxnDedup` is on, `r.set`
wraps the single command as a 1-element `redcon.Command` queue and
dispatches via `runTransactionWithDedup` — reusing the M3 EXEC machinery
instead of building a per-handler `reusableSetTxn` + `dispatchSetReuse`
shape. SET already has `applySet` on `txnContext`, so this is the "free"
extension to any command whose txn-state-aware apply hook already
exists.

The standalone fast path (`trySetFastPath`) is intentionally bypassed
under the gate: dedup is opt-in, and a non-dedup'd fast path under a
dedup-on cluster would split the idempotency contract.

Base: `feat/txn-dedup-docs-and-tests` (PR #887).

## Why standalone INCR / HSET are out of scope for THIS PR

INCR (in `adapter/redis_compat_commands.go`) and HSET both lack a
`txnContext.applyIncr` / `applyHSet` implementation, so the "route
through single-mop EXEC" pattern that worked here for SET cannot apply
as-is — `runTransactionWithDedup` would reject the command at
`txn.apply` ("unsupported in MULTI"). Bringing them in requires
implementing `applyIncr` / `applyHSet` first (each ~30–50 LOC for the
txn-state-aware read-compute-write shape), then the standalone handler
routing is a one-liner via this same path. Tracked as separate follow-up
PRs; until then, INCR and HSET keep today's buggy-under-churn behaviour
— the design doc's stated default ("everything else keeps today's
behaviour until its hook is added").

## Caller audit (per /loop semantic-change rule)

- `r.set` (handler): gate-on takes a new code path; gate-off preserved
verbatim via `setLegacy`. No external callers exist (it is wired only
into the redcon dispatch table).
- `setLegacy` (new): byte-identical extraction of the pre-PR `set()`
body. No external callers.
- `writeRedisStandaloneResult` (new): translates a single-element
`[]redisResult` from `runTransactionWithDedup` into the bare redcon
reply shape (no `WriteArray` wrapper). Single caller in this PR; future
SET-pattern callers will reuse it.

## Tests

- `adapter/redis_set_dedup_test.go` (new):
- `TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK`: gate-on +
ambiguous attempt-1 land + reuse → probe hits → cached "OK" returned,
dispatches=2, probeNoOps=1.
- `TestStandaloneSetDedup_DisabledKeepsLegacyPath`: gate-off path runs
no probe.
- Existing `TestRedis_SET*` and dedup suites still pass.

## Validation

- `go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC|TestRedis_SET'` passes
- `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues)

## Design doc

`docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M3 —
standalone SET marked LANDED, standalone INCR/HSET called out as
follow-up with the precise reason they cannot land alongside SET in this
PR.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

Head moved to 4da85dc after PR #888 (standalone SET dedup) was merged into this branch. The new content is the LGTM'd PR #888 work — adapter/redis.go set() routing + setLegacy + writeRedisStandaloneResult + redis_set_dedup_test.go.

The PR #887 round-1 changes (NextFenced fix in dispatchExecReuse + reuseCtx parent switch + design doc deviation note revert) are unchanged; both findings were marked LGTM previously.

Please confirm the combined diff against main is mergeable. No new code was authored — just the PR-B work folded in.

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 5m 50s —— View job


PR #887 Round 2 Review — Combined diff (PR #887 + PR #888 folded in)

  • Load tools
  • Get combined diff against origin/main
  • Read new PR feat(txn-dedup): standalone SET routes through option-2 dedup loop #888 content: set() routing, setLegacy, writeRedisStandaloneResult
  • Read adapter/redis_set_dedup_test.go
  • Verify correctness and HLC compliance of the standalone SET dedup path
  • Confirm all prior LGTM'd changes (NextFenced, reuseCtx) are intact
  • Post review

Prior LGTM'd changes — intact

Round-1 fixes confirmed unchanged at their final positions:

  • dispatchExecReuse uses commitTS, allocErr := r.coordinator.Clock().NextFenced()
  • reuseCtx, reuseCancel := context.WithTimeout(dispatchCtx, redisDispatchTimeout) (outer dispatchCtx, not r.handlerContext()) ✓
  • Design doc deviation note 2 reflects the fix with "fresh ctx from handlerContext — Reverted per PR feat(txn-dedup): re-land M3 EXEC reuse on main + close M2 open + multi-mop #887 review" ✓

PR #888 content — set() routing + setLegacy + writeRedisStandaloneResult

set() routing logic — correct.

The gate at r.onePhaseTxnDedup routes to runTransactionWithDedup([]redcon.Command{cmd}), then writeRedisStandaloneResult. The gate-off branch falls through to setLegacy. Argument safety is guaranteed before set() is reached: redis_command_specs.go registers SET with Arity: -3 and validateCmd rejects commands with fewer than 3 args — no unsafe cmd.Args[1] access possible when set() is invoked.

setLegacy — byte-identical to the pre-PR set() body. Extraction is clean; the comment accurately describes the intent.

writeRedisStandaloneResult — correct. All result types handled:

  • resultNilWriteNull()
  • resultErrorwriteRedisError(conn, res.err)
  • resultBulkWriteBulk(res.bulk)
  • resultStringWriteString(res.str)
  • resultArrayWriteArray(len) + WriteBulkString per element ✓ (flat-string-array constraint documented)
  • resultIntWriteInt64(res.integer)
  • default → WriteNull()

The guarded len(results) != 1 → WriteNull is the right safety valve — a standalone handler wrapping a single-mop EXEC result should always have exactly 1 element, and any misuse gets a null rather than a malformed reply.

redis_set_dedup_test.go — all four tests sound:

  • TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK: attempt 1 lands → pending stashed → reuse dispatch → maybeProbe fires → probeNoOps++ → cached resultString("OK") returned. WriteString("OK")conn.bulk = []byte("OK"). Dispatch and probe counts (2 / 1) correctly describe the two-attempt flow. ✓
  • TestStandaloneSetDedup_NXMissReturnsNil: NX miss → applySet returns resultNil without staging any write → prepared.elems empty → firstExecAttempt returns early with nextResults=[resultNil], nil, nil (no reuse, no dispatch retry) → writeRedisStandaloneResult calls WriteNull()wroteNull=true, bulk=nil. The wroteNull witness field (newly added to recordingConn) is the airtight check that WriteNull was actually called and not silently skipped. ✓
  • TestStandaloneSetDedup_GETOptionReturnsOldBulk: existing key "prior" → applySet with GET option stages the write and captures resultBulk("prior") → dispatch 1 lands then errors → reuse probe hits → cached resultBulk("prior") returned → WriteBulk([]byte("prior"))conn.bulk = []byte("prior"). New value "v1" in store. ✓
  • TestStandaloneSetDedup_DisabledKeepsLegacyPath: onePhaseTxnDedup=falsesetLegacy path → coord.probeNoOps == 0. ✓

recordingConn change in redis_retry_test.go — minimal and correct. Adding wroteNull bool + setting it in WriteNull() is a clean, non-breaking extension needed by the NX miss test.

Design doc — accurate. M2 open → LANDED with test reference; §M3 EXEC → LANDED with runTransactionWithDedup description; standalone SET → LANDED with TestStandaloneSetDedup_* reference; standalone INCR/HSET correctly noted as still-open follow-up. The deviation-note-2 update correctly reflects the Round-1 reuseCtx fix.


No new issues

The combined diff introduces no new HLC violations, no context threading issues, no nil-safety gaps. The self-inflicted-conflict guard in dispatchExecReuse (probe at the fresh commitTS after a WriteConflict) mirrors dispatchListPushReuse's established pattern and is correctly scoped to that function's locally-issued commitTS.

LGTM — combined diff is mergeable.
| Branch

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
adapter/redis.go (1)

2781-2809: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the outer EXEC context here.

prepareDispatch() starts a fresh timeout from handlerContext(), so the commit phase can run for another full redisDispatchTimeout after apply/validate have already consumed most of the caller's dispatchCtx budget. That also drops prompt caller cancellation during stream-deletion scans and Dispatch. Use t.ctx when it's set, and only fall back to handlerContext() for minimal test fixtures that leave txnContext.ctx nil.

Suggested change
-	ctx, cancel := context.WithTimeout(t.server.handlerContext(), redisDispatchTimeout)
+	parentCtx := t.ctx
+	if parentCtx == nil {
+		parentCtx = t.server.handlerContext()
+	}
+	ctx, cancel := context.WithTimeout(parentCtx, redisDispatchTimeout)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@adapter/redis.go` around lines 2781 - 2809, The code in prepareDispatch (the
block creating ctx, cancel via context.WithTimeout(t.server.handlerContext(),
redisDispatchTimeout)) should preserve an outer EXEC/dispatch context by using
t.ctx when present instead of always starting from t.server.handlerContext();
change the parent passed to context.WithTimeout to be t.ctx if t.ctx != nil,
falling back to t.server.handlerContext() only when t.ctx is nil (ensure cancel
is still returned and used as before); this keeps stream-deletion scans
(buildStreamDeletionElems) and the final Dispatch bounded by
redisDispatchTimeout while honoring caller cancellation.
🧹 Nitpick comments (1)
adapter/redis_set_dedup_test.go (1)

26-138: ⚡ Quick win

Prefer a table-driven structure for the SET dedup variants.

These four tests duplicate the same harness (store/coordinator/server/conn/invoke/assert baseline). Converting them into table-driven cases will make option growth (NX, XX, GET, gate on/off) easier to maintain and less error-prone.

♻️ Suggested refactor sketch
+func TestStandaloneSetDedup(t *testing.T) {
+	t.Parallel()
+	tests := []struct {
+		name string
+		dedup bool
+		args [][]byte
+		seed func(ctx context.Context, st store.MVCCStore)
+		assert func(t *testing.T, conn *recordingConn, st store.MVCCStore, coord *dedupTestCoordinator)
+	}{
+		// landed prior attempt -> OK
+		// NX miss -> WriteNull
+		// GET -> old bulk
+		// gate off -> legacy path (no probe)
+	}
+	for _, tc := range tests {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			ctx := context.Background()
+			st := store.NewMVCCStore()
+			if tc.seed != nil { tc.seed(ctx, st) }
+			coord := newDedupTestCoordinator(st, 1, true)
+			srv := &RedisServer{store: st, coordinator: coord, scriptCache: map[string]string{}, onePhaseTxnDedup: tc.dedup}
+			conn := &recordingConn{}
+			srv.set(conn, redcon.Command{Args: tc.args})
+			tc.assert(t, conn, st, coord)
+		})
+	}
+}

As per coding guidelines, **/*_test.go: "prefer table-driven test patterns/cases."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@adapter/redis_set_dedup_test.go` around lines 26 - 138, These four nearly
identical tests (TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK,
TestStandaloneSetDedup_NXMissReturnsNil,
TestStandaloneSetDedup_GETOptionReturnsOldBulk,
TestStandaloneSetDedup_DisabledKeepsLegacyPath) should be consolidated into a
table-driven test that iterates cases containing the command Args (e.g.,
[]byte{cmdSet, "k", "v1", optional flags}), any pre-seed actions (PutAt),
expected recordingConn results (bulk string vs nil, wroteNull bool, err empty),
and coordinator/gate expectations (probeNoOps, dispatches); in the loop
instantiate the common harness (store.NewMVCCStore, newDedupTestCoordinator,
RedisServer{...}, recordingConn), call srv.set(conn, cmd) and assert the
per-case expectations, so move shared setup into the single loop and keep
case-specific assertions in each case entry to remove the duplicated code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@adapter/redis_exec_dedup_test.go`:
- Around line 116-119: The test currently asserts require.GreaterOrEqual(t,
coord.dispatches, 3) which is too permissive; change it to require.Equal(t, 3,
coord.dispatches) so the test enforces the exact expected retry sequence
(pre-reject, OCC conflict reuse, fresh-snapshot retry success), and keep the
existing require.Equal(t, 0, coord.probeNoOps, ...) check unchanged; locate the
assertions in adapter/redis_exec_dedup_test.go referencing coord.dispatches and
coord.probeNoOps and replace the GreaterOrEqual assertion with an exact equality
assertion.

In `@docs/design/2026_05_21_proposed_txn_secondary_idempotency.md`:
- Around line 481-482: Update the PR reference text for the runTransaction note
so it accurately reflects the landing PR: change the “LANDED via PR `#884`”
mention to reference PR `#887` (or both with `#887` listed as the landing PR) in the
documentation line that reads about runTransaction (MULTI/EXEC) — LANDED via PR
`#884`; ensure the adjacent description of runTransactionWithDedup mirroring
listPushCoreWithDedup remains unchanged.

---

Outside diff comments:
In `@adapter/redis.go`:
- Around line 2781-2809: The code in prepareDispatch (the block creating ctx,
cancel via context.WithTimeout(t.server.handlerContext(), redisDispatchTimeout))
should preserve an outer EXEC/dispatch context by using t.ctx when present
instead of always starting from t.server.handlerContext(); change the parent
passed to context.WithTimeout to be t.ctx if t.ctx != nil, falling back to
t.server.handlerContext() only when t.ctx is nil (ensure cancel is still
returned and used as before); this keeps stream-deletion scans
(buildStreamDeletionElems) and the final Dispatch bounded by
redisDispatchTimeout while honoring caller cancellation.

---

Nitpick comments:
In `@adapter/redis_set_dedup_test.go`:
- Around line 26-138: These four nearly identical tests
(TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK,
TestStandaloneSetDedup_NXMissReturnsNil,
TestStandaloneSetDedup_GETOptionReturnsOldBulk,
TestStandaloneSetDedup_DisabledKeepsLegacyPath) should be consolidated into a
table-driven test that iterates cases containing the command Args (e.g.,
[]byte{cmdSet, "k", "v1", optional flags}), any pre-seed actions (PutAt),
expected recordingConn results (bulk string vs nil, wroteNull bool, err empty),
and coordinator/gate expectations (probeNoOps, dispatches); in the loop
instantiate the common harness (store.NewMVCCStore, newDedupTestCoordinator,
RedisServer{...}, recordingConn), call srv.set(conn, cmd) and assert the
per-case expectations, so move shared setup into the single loop and keep
case-specific assertions in each case entry to remove the duplicated code paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5daab0e-1591-4731-8bab-73a72fc03247

📥 Commits

Reviewing files that changed from the base of the PR and between 76e9151 and 4da85dc.

📒 Files selected for processing (6)
  • adapter/redis.go
  • adapter/redis_exec_dedup_test.go
  • adapter/redis_retry_test.go
  • adapter/redis_set_dedup_test.go
  • docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
  • kv/fsm_onephase_dedup_test.go

Comment thread adapter/redis_exec_dedup_test.go Outdated
Comment thread docs/design/2026_05_21_proposed_txn_secondary_idempotency.md Outdated
coderabbitai (1) test (adapter/redis_exec_dedup_test.go:119):
  TestExecDedup_GenuineConflictRebuildsAndApplies asserted
  GreaterOrEqual(coord.dispatches, 3) but with the single injected
  pre-reject + single concurrent SET the retry topology is fully
  deterministic: attempt 1 (pre-reject), reuse (OCC-conflict), fresh
  retry (success). Tightened to Equal(3) so a regression that adds
  an extra dispatch is caught.

coderabbitai (2) doc (docs/design/...:481):
  "LANDED via PR #884" was ambiguous after the re-land on main via
  PR #887. Updated to "LANDED via PR #887 (originally PR #884,
  re-landed against main)" so future readers see the canonical
  landing PR with the lineage preserved.

No Go callers touched. go test ./adapter/ -run ExecDedup passes.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

Round 2 fixes for coderabbitai 🟡 minors (commit 69db79b):

go test ./adapter/ -run ExecDedup passes. No Go callers touched.

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 2m 22s —— View job


PR #887 Round 2 Review — Verifying coderabbitai minor fixes (commit 69db79b)

Tasks:

  • Load tools
  • Read adapter/redis_exec_dedup_test.go — verify Equal(3) assertion
  • Read design doc — verify PR reference update
  • Check for any remaining open coderabbitai comments not yet addressed
  • Post final review

Round 2 fixes verified — both correct

Fix 1 — GreaterOrEqualEqual(3) in TestExecDedup_GenuineConflictRebuildsAndApplies

adapter/redis_exec_dedup_test.go:122:

require.Equal(t, 3, coord.dispatches)

The comment is expanded with a clear explanation of the deterministic retry topology. No GreaterOrEqual on dispatch counts remains anywhere in either dedup test file (redis_exec_dedup_test.go, redis_set_dedup_test.go).

Fix 2 — PR reference in design doc

docs/design/2026_05_21_proposed_txn_secondary_idempotency.md:481:

LANDED via PR #887 (originally PR #884, re-landed against main)

The lineage is preserved (#884 still credited) and the canonical landing PR (#887) is now unambiguous. Only one occurrence of the LANDED note for runTransaction; correct.


One remaining open item (informational, not blocking this PR)

The coderabbitai "outside diff range" comment on prepareDispatch() (line 2786) — using handlerContext() as parent context instead of t.ctx — was not included in the round 2 targets, and it doesn't block merge. Summary for the record:

prepareDispatch() creates ctx from r.server.handlerContext(), which ignores the already-bounded t.ctx (= dispatchCtx). This means stream-deletion scans + Dispatch in the prepare phase can outlive a cancelled dispatchCtx by up to a full redisDispatchTimeout. The fix is symmetric with the reuseCtx correction from round 1:

parentCtx := t.ctx
if parentCtx == nil {
    parentCtx = t.server.handlerContext()
}
ctx, cancel := context.WithTimeout(parentCtx, redisDispatchTimeout)

The nil guard is needed because the test harness constructs txnContext without setting ctx. Recommend tracking this in a follow-up PR (same batch as the dispatchListPushReuse / listPushCoreWithDedup Next()NextFenced() cleanup).

The table-driven refactor suggestion for redis_set_dedup_test.go (coderabbitai nitpick) is a style preference and not blocking.


Summary

Both round 2 fixes are correctly implemented. Combined diff against main is clean. LGTM — ready to merge.
| Branch

@bootjp bootjp merged commit 3fc9786 into main May 30, 2026
9 checks passed
@bootjp bootjp deleted the feat/txn-dedup-docs-and-tests branch May 30, 2026 08:55
bootjp added a commit that referenced this pull request May 30, 2026
…t to fix CI flake (#890)

## Summary

Fixes the CI flake observed in [Test workflow run
26678774684](https://github.com/bootjp/elastickv/actions/runs/26678774684):

```
--- FAIL: TestSQSServer_Throttle_SetQueueAttributesInvalidatesBucket (1.79s)
    sqs_throttle_integration_test.go:170: expected throttle, got 200
```

## Root cause

Identical race to the one fixed for
`TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket` in commit
[54c6cd5](54c6cd56) (PR #819
follow-up): the 1-token-per-second refill rate races the test's own wall
clock under `-race` on slow CI runners.

For **this** test:
1. `mustSetQueueAttributes(Capacity=10, Refill=1)`
2. `for range 10 { send }` — drains the bucket
3. Sanity send — expects HTTP 400 (throttle)

Each send goes through Raft propose+apply at ~100-250ms under `-race`.
The 11 writes from steps 1-3 elapse ~1.1-2.75s. At Refill=1/sec the
bucket has accumulated ≥1 token by step 3, so the sanity send returns
HTTP 200 instead of 400 — **falsely** indicating a bucket-invalidation
regression that does not exist.

## Fix

Drop the initial Refill from `"1"` to `"0.01"` (1 token per 100 seconds)
so no test-window wall-clock can accumulate to a whole token. The test's
intent — *verify that a Capacity/Refill **raise** invalidates the cached
bucket on the very next request* — is independent of the **initial**
refill rate. The post-set assertion at line 182 is exercised against the
fresh `Capacity=20/Refill=20` bucket, which is what the test actually
claims to pin.

## Why only this test, not the sibling

`TestSQSServer_Throttle_DeleteQueueInvalidatesBucket` has the same
`Refill=1` initial config but **no post-drain sanity assertion** — it
just drains without status checks, then verifies fresh capacity
post-recreate. No race window there.

## Relation to other open PRs

Unrelated to PRs #887/#888/#889 (option-2 dedup work). The flake
surfaced on PR #889's CI run but the fix lives in pre-existing SQS test
code that none of the dedup PRs touch.

## Caller audit (per /loop semantic-change rule)

Test-only change. The throttle config validator (`sqs_catalog.go:163`)
accepts fractional `float64 SendRefillPerSecond`; `0.01` is non-zero so
`IsEmpty` (line 172) returns `false` and throttling stays enabled — the
test still exercises the throttle path. Matches the prior fix's
caller-audit conclusion verbatim.

## Validation

- `go test ./adapter/ -run TestSQSServer_Throttle -race -count=3
-timeout 120s` passes (5.3s wall, all three iterations green)
- `gofmt`, `go vet`, `golangci-lint run` all clean


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Updated test configuration and expanded documentation to improve test
stability on slow CI environments.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/890?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
bootjp added a commit that referenced this pull request Jun 2, 2026
claude[bot] PR #901 minor #2 (CLAUDE.md convention): inline comments
should state the underlying reason, not reference issue/PR numbers
that rot as the codebase evolves. Dropped "PR #887" / "PR #887
round 1" / "PR #887 verdict" / "PR #887 review" attributions from
the three doc blocks. The WHY content is kept (ctx-parent symmetry,
HLC-4 fence rationale, ErrCeilingExpired non-retryable shape).

The PR description is the durable home for the flake-fix lineage
and the related PR refs.

No behavior change. Comment-only.
bootjp added a commit that referenced this pull request Jun 2, 2026
## Summary

Picks up the two follow-up items noted in claude[bot]'s PR #887 verdict
but deferred from that PR:

### (a) `prepareDispatch()` ctx-parent symmetric fix

`prepareDispatch()` took its bounded ctx from
`t.server.handlerContext()` — the server-lifetime ctx, **not** the
caller's `dispatchCtx`. The `reuseCtx` fix on PR #887 round 1 corrected
`runTransactionWithDedup` to derive from `dispatchCtx`; this PR applies
the symmetric fix to `prepareDispatch` so a disconnected client /
`retryRedisWrite` timeout interrupts the prepare+dispatch promptly.

Nil-guard falls back to `handlerContext` for test fixtures that
construct a `txnContext` without setting `ctx`.

### (b) Last two `Clock().Next()` callers in adapter/redis.go →
`NextFenced()`

`dispatchListPushReuse` and the `listPushCoreWithDedup` first-attempt
site both shipped before the HLC-4 physical-ceiling migration and were
missed in that pass. `dispatchExecReuse` already uses `NextFenced` (PR
#887 round 1) and so do all first-attempt `commit_ts` allocations
through `prepareDispatch`.

These two list-push dedup sites were the **last** persistence-grade
`Clock().Next()` callers in `adapter/redis.go` — verified by `grep
'\.Clock()\.Next\b' adapter/redis.go` now returning only `NextFenced`
variants.

Without HLC-4 parity here, a stale-leader window could let the list-push
dedup path mint a `commit_ts` that collides with the successor's
freshly-fenced range — defeating the same anomaly class option-2 was
introduced to prevent.

## Caller audit (per /loop semantic-change rule)

- `prepareDispatch`: sole callers are `commit()` and `firstExecAttempt`
(both in adapter/redis.go). Ctx-parent change is additive; same
defer-cancel discipline, same error mapping.
- `dispatchListPushReuse`: sole caller `listPushCoreWithDedup`.
NextFenced error wired through existing `(newLen, drop, err)` tuple;
non-retryable so `retryRedisWrite` surfaces it directly — matches PR
#887 round-1's `dispatchExecReuse` NextFenced wiring.
- `listPushCoreWithDedup` first-attempt: surfaces NextFenced error
through existing `retryRedisWrite` closure return.

## Validation

- `go test ./adapter/ -run 'ListPushDedup|ExecDedup|StandaloneSetDedup'
-race -count=2` → ok 1.2s
- `gofmt` + `go vet` + `golangci-lint run` → 0 issues

## Scope

Closes the two follow-up items explicitly noted in PR #887's round-2
verdict ("Note on pre-existing Clock().Next() callers" +
`prepareDispatch` ctx deviation). No new functionality.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant