Skip to content

docs(design): proposed txn secondary-commit idempotency (Jepsen 26198185540)#796

Merged
bootjp merged 19 commits into
mainfrom
docs/txn-idempotency-design
May 30, 2026
Merged

docs(design): proposed txn secondary-commit idempotency (Jepsen 26198185540)#796
bootjp merged 19 commits into
mainfrom
docs/txn-idempotency-design

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 21, 2026

Summary

Design doc for fixing the :duplicate-elements anomaly surfaced by scheduled Jepsen run 26198185540.

What's in the doc

  • Root cause walkthrough: a single :append 14 230 from worker 3 produced TWO storage entries with value 230, immediately visible in the next read. Same-commit_ts replay is dedup'd by content-addressed list-delta keys; different-commit_ts replay (regenerated by runTransaction's retryRedisWrite on partial-commit) is NOT, because RPUSH/LPUSH derive their seq from a fresh meta read at attempt time.
  • TiKV reference: Percolator's three layers — content-addressed writes, CheckTxnStatus before retry, LockResolver — and which one elastickv is missing.
  • Proposed fix: server-side per-txn applied marker keyed by adapter-allocated UUIDv7, written atomically with the commit mutations and checked at FSM apply entry. Optional client-side commit-status check (TiKV-style CheckTxnStatus) as a follow-up.
  • Six implementation milestones with scope estimates.
  • Caller audit table covering every commit-request entry point.
  • Four risks with mitigations.

Why now / why this layer

  • PR fix(redis): NOTLEADER prefix for gRPC-wrapped leader errors (closes Jepsen run 26035515694 crash) #789 (NOTLEADER prefix) unmasked this anomaly — workers no longer crash, so the :duplicate-elements is now visible to the checker rather than being silently classified as :info.
  • A pure client-side fix is not viable: the trigger spans runTransaction's retry loop AND the gRPC Forward redirect, and no client-supplied txn ID exists. Server-side marker is the natural enforcement point.
  • Existing applyCommitWithIdempotencyFallback only catches same-commit_ts replay; the regenerated-input replay shape needs a logical-txn-identity marker.

Companion PR

PR #795 ("upload full demo cluster log on Jepsen failure") is open in parallel. The design here is independent of which exact code path produces the partial-commit (that's what PR #795 will let us pinpoint); the fix works for any such path.

Status

Open for review. Implementation milestones (M1–M6) follow once this lands.

Summary by CodeRabbit

  • New Features

    • Opt-in one‑phase transaction idempotency to avoid duplicate list elements and return stable list lengths on retryed pushes (legacy behavior unchanged when disabled).
  • Tests

    • Comprehensive test suites validating dedup behavior, probe semantics, edge cases and fallback paths across stores and FSM/adapter flows.
  • Documentation

    • Updated design doc describing the transactional idempotency approach and implementation status.

Review Change Stack

…6198185540)

Triggered by the scheduled Jepsen run 26198185540 surfacing a
real :duplicate-elements anomaly on the Redis list-append
workload. The duplicate is a single client-level :append being
applied twice in storage with DIFFERENT commit_ts: the EXEC
body's retry loop (runTransaction's retryRedisWrite) regenerates
startTS / commitTS / list index on each iteration, so content-
addressing — which catches identical-input replay — does not
catch the regenerated-input retry-after-partial-success shape.

Design proposes a server-side per-txn applied marker keyed by an
adapter-allocated UUIDv7, written atomically with the commit
mutations and checked at FSM apply entry. TiKV's
CheckTxnStatus + content-addressed Write CF design is the
reference; this proposal lifts the same correctness guarantee
into the FSM layer because elastickv's RPUSH/LPUSH-derived
storage keys are NOT pure content-addressed (the seq number
depends on a meta read at attempt time).

Doc covers:
  - Why the same-commit_ts replay does NOT duplicate
    (content-addressed delta keys)
  - Why the different-commit_ts replay DOES duplicate
    (runTransaction's EXEC-body retry regenerates everything)
  - Why applyCommitWithIdempotencyFallback does NOT catch this
    (works only on identical-input replay)
  - How TiKV's three layers (content-addressed writes +
    CheckTxnStatus + LockResolver) compose, and which one
    elastickv is missing
  - Server-side marker design (key prefix, GC TTL, batch atomicity)
  - Optional client-side commit-status check
  - Six implementation milestones (M1 proto/UUID, M2 FSM marker,
    M3 adapter sentinel handling, M4 GC, M5 optional check,
    M6 Jepsen validation)
  - Caller audit table covering all commit-request entry points
  - Four risks (GC timing, proto overhead, reconstruction
    intricacy, LockResolver interaction)

The doc is independent of PR #795 (full Jepsen log artifact);
the fix design works for any partial-commit code path that
might be the actual trigger.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 21 minutes. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a6b7093-8270-4bcf-bb0e-52f5d6494699

📥 Commits

Reviewing files that changed from the base of the PR and between 624579c and 04876c0.

📒 Files selected for processing (10)
  • adapter/redis.go
  • adapter/redis_list_dedup_test.go
  • docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
  • kv/fsm.go
  • kv/leader_routed_store.go
  • kv/shard_store.go
  • kv/sharded_coordinator.go
  • kv/txn_errors.go
  • store/lsm_store.go
  • store/mvcc_store.go
📝 Walkthrough

Walkthrough

This PR adds opt-in one-phase transaction idempotency dedup: store-level exact-commit probes, TxnMeta PrevCommitTS, coordinator threading, FSM probe-based no-op, and adapter list-push write-set reuse plus tests.

Changes

One-Phase Idempotency Dedup with PrevCommitTS

Layer / File(s) Summary
Design document and rationale
docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
Updates the design doc with landed components, gating, milestones M1–M4, correctness argument, and remaining risks and work.
Store probe: MVCC CommittedVersionAt and tests
store/store.go, store/mvcc_store.go, store/lsm_store.go, store/committed_version_at_test.go
Adds MVCCStore.CommittedVersionAt and implements it for mvccStore and pebbleStore; tests cover exact-match, missing-key, tombstone, strict-exactness, and key isolation.
TxnMeta V2 + PrevCommitTS and OpGroup field
kv/txn_codec.go, kv/txn_codec_test.go, kv/transcoder.go
Extends TxnMeta with PrevCommitTS, adds txnMetaFlagPrevCommitTS, emits v2 only when PrevCommitTS != 0, updates decoding, adds OperationGroup.PrevCommitTS, and tests v1/v2 selection and round-trip.
Coordinator routing and onePhase request construction
kv/coordinator.go, kv/sharded_coordinator.go, kv/leader_routed_store.go, kv/shard_store.go, kv/coordinator_txn_test.go
Threads PrevCommitTS through dispatch paths, adds onePhaseTxnRequestWithPrevCommit which encodes PrevCommitTS into txn meta, forwards PrevCommitTS on redirects, updates tests for new signatures, and exposes CommittedVersionAt delegation in store wrappers.
FSM one-phase dedup probe and tests
kv/fsm.go, kv/fsm_onephase_dedup_test.go
Implements dedup probe in handleOnePhaseTxnRequest: when PrevCommitTS != 0 probe CommittedVersionAt(primaryKey, PrevCommitTS) and no-op on hit; adds tests for no-op, apply, and probe-skipped paths.
Adapter list-push write-set reuse and dedup tests
adapter/redis.go, adapter/redis_list_dedup_test.go
Adds WithOnePhaseTxnDedup option and env toggle, implements listPushCoreWithDedup and reusable write-set retry with PrevCommitTS probe handling, and expands tests for landed prior attempt, prior-not-landed apply, genuine conflict recompute, intervening RPOP/LPUSH, and gated legacy behavior.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Adapter as RedisAdapter
  participant Coordinator
  participant FSM
  participant Store
  Client->>Adapter: RPUSH attempt 1
  Adapter->>Coordinator: Dispatch(startTS)
  Coordinator->>FSM: One-phase request (commitTS=T1, prevCommitTS=0)
  FSM->>Store: Apply mutations -> commit at T1
  FSM-->>Coordinator: Committed
  Adapter-->>Client: Retry signalled
  Adapter->>Adapter: Capture write set from attempt 1
  Adapter->>Coordinator: Dispatch(retry, prevCommitTS=T1)
  Coordinator->>FSM: One-phase request (commitTS=T2, prevCommitTS=T1)
  FSM->>Store: CommittedVersionAt(primaryKey, T1)?
  alt Found
    FSM-->>Coordinator: no-op (nil)
    Adapter->>Adapter: Return stored client-visible length
    Adapter-->>Client: Result (no duplicate)
  else Not found
    FSM->>Store: Apply mutations at T2
    FSM-->>Coordinator: Committed
    Adapter-->>Client: Result (applied)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • bootjp/elastickv#618: Overlaps coordinator dispatch/retry logic and related dispatch paths; may touch similar coordinator code paths.

Poem

I hopped through logs and timestamped hay,
Probed the past to keep duplicates away—
Reused the write with a wiser TS,
If prior landed, I skip the mess,
A rabbit's dedup dance, safe and play. 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title describes the design document for transaction secondary-commit idempotency addressing a Jepsen issue, which aligns with the primary deliverable (the design doc file) updated in this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/txn-idempotency-design

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This design document proposes a server-side transaction-applied marker and an optional client-side check to ensure exactly-once application of multi-shard transaction commits, addressing a duplicate-element anomaly found during Jepsen testing. The proposed solution involves using a UUIDv7 as a transaction identifier to detect and handle retries that might otherwise result in duplicate writes. Review feedback highlights the need for the new error sentinel to include the committed timestamp to aid result reconstruction, suggests optimizing the marker garbage collection by leveraging the time-ordered nature of UUIDv7 with DeleteRange, and warns about the operational risks of adding synchronous reads to the secondary commit retry path.

// Primary already committed. If our secondary's marker is
// present, this is a true no-op — return success. Otherwise
// retry the secondary commit (it really did fail).
if secondary_marker_present(adapter_txn_id) { return Ok }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The secondary_marker_present check in milestone M5 introduces a significant operational risk: a synchronous read from the secondary shard before every retry. If the secondary shard is experiencing issues (e.g., leader election or network partition), this check will likely fail or time out, delaying the retry loop and impacting availability. As per repository guidelines for operational risks, the design should detail mitigation strategies; a 'blind' retry might be more robust as it avoids the added latency and dependency in the failure path.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies to ensure system stability.

2. If present and the recorded commit_ts equals the request's commit_ts:
no-op (return nil). The mutations would be a duplicate.
3. If present and the recorded commit_ts **differs**: return
`ErrTxnAlreadyCommittedAtDifferentTS` (a new sentinel). This catches the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To facilitate the result reconstruction described in milestone M3, the ErrTxnAlreadyCommittedAtDifferentTS sentinel should carry the recorded commit_ts in its response payload. Since the adapter regenerates startTS on every retry (as noted in the walkthrough), it will not know the original startTS used by the successful attempt and thus cannot look up the primary's commit record via txnCommitKey(primary_key, original_start_ts). Returning the commit_ts directly from the FSM marker eliminates this lookup dependency and supports smoother rolling upgrades by reducing the need for consistent state lookups across different node versions.

References
  1. Design documents should address operational risks such as rolling upgrades by detailing strategies that ensure compatibility across different node versions.


A conservative TTL of `2 × (lockResolverInterval + defaultTxnLockTTLms)`
= 80 s covers both. Cleanup runs in `kv/compactor.go`'s existing periodic
sweep — add a new prefix scan that drops markers whose `applied_at` is
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since adapter_txn_id is a UUIDv7, it is naturally time-ordered (the most significant 48 bits contain a millisecond timestamp). Instead of a prefix scan that reads every value to check applied_at_wall_time, the compactor can use a more efficient DeleteRange operation. By calculating the UUIDv7 prefix corresponding to the TTL cutoff, you can delete all expired markers in a single atomic operation without the O(N) read overhead during the periodic sweep.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 24, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 24, 2026

Claude finished @bootjp's task in 3m 40s —— View job


Design Review: Transaction Secondary-Commit Idempotency

  • Load tools
  • Read design document + relevant code
  • Analyze design for correctness, completeness, and risks
  • Post review feedback

Summary

The root cause analysis is accurate and the proposed server-side marker is the right enforcement layer. The design is sound for the primary-commit case. The issues below range from a hard correctness gap in M3 to a missing backward-compatibility section, with several areas worth clarifying before implementation starts.


Critical

1. ErrTxnAlreadyCommittedAtDifferentTS must carry commit_ts — M3 cannot reconstruct without it

M3's adapter recovery reads txnCommitKey(primary_key, start_ts) to find the actual commit_ts. But at that point start_ts is the current retry's regenerated startTS — not the original attempt's startTS that was recorded in the primary's commit record. The original startTS is gone because runTransaction regenerates it on each iteration.

The FSM already has the committed commit_ts in the marker (__txn_applied__<id> → (commit_ts, applied_at)). The error sentinel should carry it directly:

type ErrTxnAlreadyCommittedAtDifferentTS struct {
    RecordedCommitTS uint64
}

The adapter can then go directly to the actual state via resolveListMeta at RecordedCommitTS, bypassing the txnCommitKey lookup entirely. Without this, M3's result reconstruction has no reliable anchor. (Gemini flagged this too — worth being explicit that the fix is on the FSM side, not the error-type naming.)

Fix this →


2. Rolling-upgrade compatibility: zero/absent adapter_txn_id case is unspecified

During a rolling upgrade, nodes on old code will send AdapterTxnId = "" (proto default). The FSM on new-code nodes must skip the marker check when the field is absent/zero — otherwise it would write a marker keyed by the zero UUID, and the first zero-UUID request would block all subsequent zero-UUID requests from older nodes with "already committed."

This must be explicit in M1/M2: if AdapterTxnId is zero/empty, bypass both the marker read and the marker write. The marker only engages once the adapter has actually started sending UUIDs.

Fix this →


Important

3. Multi-shard retry safety — the design is correct but the reasoning needs to be made explicit

The caller audit doesn't explain WHY a Attempt 2 can't partially commit on secondary shards before the primary check fires. The safety actually flows from the coordinator structure: commitPrimaryTxn runs before commitSecondaryTxns. If commitPrimaryTxn returns ErrTxnAlreadyCommittedAtDifferentTS (because the primary shard already has the marker), commitSecondaryTxns is never called. Attempt 2's mutations therefore never reach secondary shards.

This is the key invariant that makes the multi-shard case safe. It's worth one sentence in the design explicitly linking the marker check to the primary-first commit ordering.


4. FSM hot-path read overhead missing from risks

R2 only calls out the 16-byte proto field. Every commit now also requires a Pebble point read (__txn_applied__<id>) on the FSM apply path. Under normal operation (no retries) this read always misses but still hits the Pebble read path. At 1000 txn/s, this is 1000 extra point reads/s through the FSM pipeline, which is synchronous and blocks the apply loop.

This is probably acceptable (Pebble point reads from block cache are sub-microsecond), but it should be in the risks section so reviewers of M2 can benchmark against it and decide if a bloom filter or skip-list layer helps.


5. GC — replace prefix scan with DeleteRange via UUIDv7 cutoff (addresses Gemini's medium finding)

The current M4 description calls for a prefix scan that reads every marker value to check applied_at_wall_time. Since adapter_txn_id is UUIDv7, its top 48 bits are a millisecond timestamp. The compactor can compute the UUIDv7 value corresponding to now - TTL and issue a single DeleteRange(__txn_applied__<min_uuid>, __txn_applied__<cutoff_uuid>). This avoids the O(N) read overhead entirely.

The UUIDv7 cutoff calculation is straightforward: shift the cutoff epoch-ms into bits [127:80] of a 128-bit value, zero the rest. Any UUIDv7 with a timestamp before the cutoff will compare less than this value under lexicographic byte order.


6. M5 availability risk — needs an explicit mitigation (addresses Gemini's high-priority finding)

The synchronous secondary-marker read in M5 runs on the already-degraded retry path. If the secondary shard is mid-leader-election, this read adds unbounded latency before the retry loop can proceed. Given that M5 is labeled "optional" the risk is bounded, but the doc should note: when implementing M5, gate the marker read on a short deadline (e.g. 200ms) so a slow shard read degrades to "skip the optimisation, let the FSM marker catch it" rather than blocking the retry indefinitely.


Minor

7. Caller audit — LockResolver row needs a clarifying split

The current row conflates two LockResolver sub-cases:

LockResolver case Marker state Correct outcome
Secondary was never applied (common: primary committed, secondary commit failed) Absent on secondary shard Apply normally → write marker
Secondary was already applied (rare: secondary committed but LockResolver re-fires) Present, same commit_ts No-op ✓

Both paths are handled correctly by the design, but spelling them out prevents the "same (start_ts, commit_ts)" reasoning from looking circular to implementers who will touch kv/lock_resolver.go.


8. TTL lower bound — check against gRPC deadline

The TTL formula 2 × (lockResolverInterval + defaultTxnLockTTLms) = 80s assumes the retry window is bounded by the LockResolver cycle. But redisDispatchTimeout (the per-Redis-command context deadline) is the tighter bound for retryRedisWrite. If redisDispatchTimeout > 80s, a retry can arrive after the marker is GC'd. The doc should verify (and state) that redisDispatchTimeout < TTL as a precondition, or derive the TTL from redisDispatchTimeout directly. If redisDispatchTimeout is configurable, add a startup assertion that markerTTL > redisDispatchTimeout.


Strengths worth noting

  • The root cause walkthrough — tracing the exact key paths through listItemKey and ListMetaDeltaKey — is exactly the level of specificity the implementer will need. Strongly prefer this over hand-waving.
  • Explicitly ruling out retryRedisWrite removal (§"Why not just remove...") is the right call and the explanation about OCC contention + Jepsen :fail classification is correct.
  • Writing the marker atomically in the same Pebble batch as the mutations is the correct correctness requirement — this is the single sentence in the design that bears the most implementation weight.
  • The UUIDv7 choice is apt: GC by time prefix, no coordination needed to issue IDs, globally unique for the retry-window lifetime.

Overall: approve the design direction, suggest addressing issues 1 and 2 before merging (M3 is blocked on 1; rolling upgrades are blocked on 2), and fold 5 and 8 into M4's scope.

bootjp added 2 commits May 26, 2026 02:36
…26340084100)

The full demo-cluster log from scheduled run 26340084100 (first
failing run after PR #795 landed the full-log artifact) positively
identifies the duplicate trigger.

Findings:
- Run reproduced the same :duplicate-elements ({153 2} on key 17,
  154/155 lost) plus a downstream :G-single-item-realtime cycle
  (the append-checker's report of the corrupted version order the
  duplicate causes — not an independent fault).
- elastickv-demo.log shows 20+ leadership changes in the ~3-min
  workload with repeated 'found conflict at index N' log-tail
  truncations.
- At 18:21:48 (exact instant append 17 153 was processed) a 3-way
  vote storm was in flight: two peers at index 888 campaigning
  while the incumbent leader sat at index 890 with a valid lease.
  The two-entry divergence is the ambiguous-commit window.

Confirms the doc's core hypothesis (retryRedisWrite regenerating
commitTS defeats content-addressing) and pins the trigger to
leadership churn rather than commitSecondaryWithRetry or the
LockResolver specifically. The server-side UUID marker (M1-M4)
fixes it regardless of which raft event produced the ambiguous
commit.

Also recorded the CI CPU-starvation amplifier (single-process
3-node demo + JVM on a 2-core runner → heartbeat delay → election
storms) as a separate non-blocking follow-up, and added a resolved
open question confirming the marker must also cover single-key
Phase_NONE writes (listPushCore's own retryRedisWrite regenerates
commitTS too).
…ejected

Per review discussion: the observed :duplicate-elements is always an
internal retry within a single retryRedisWrite loop, which TiKV's
CheckTxnStatus pattern fully covers — check whether the previous
attempt committed before re-running, and if so return its result
instead of re-appending.

This is lighter than the original UUID + FSM-marker + GC design:
adapter-only, no proto wire field, no FSM dedup table, no periodic GC.
The previous-attempt identity (startTS/commitTS/primaryKey) is held in
a local across the retry loop; 'did it commit?' is answered by probing
the primary txnCommitKey (multi-shard) or the prior attempt's own data
version at commitTS_prev (single-shard one-phase, which writes no
commit record).

Rewrote:
- Title/intro + 'How TiKV solves this' closing — name CheckTxnStatus as
  the missing layer (2) this design adds; note that RPUSH's
  data-dependent index is what defeats content-addressing (layer 1).
- 'Proposed design' — CheckTxnStatus mechanism, why-UUID-rejected
  subsection, where the check lives (3 retry sites), caller audit
  showing the FSM apply path is UNCHANGED.
- Milestones M1-M4 — identity plumbing, probe helpers, wire into 3
  sites, validation. ~250 LOC, no proto/FSM/GC (was ~600 LOC + proto).
- Risks R1-R4 — result reconstruction, probe read cost, false-negative
  under deep churn (mitigation = the rejected marker design, kept in
  git history), single-shard no-commit-record handling.
- Open questions + decision log updated.

The marker design is preserved in this file's git history should R3
(probe false-negative under deep churn) ever be observed.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa41e9fff6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +450 to +453
LockResolver specifically. The server-side marker (M1–M4) is the
correct fix because it deduplicates **regardless of which raft event
produced the ambiguous commit** — it keys on the adapter-allocated
txn UUID, which is stable across both the `retryRedisWrite`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove stale marker-based fix claim from resolved section

This paragraph says the "server-side marker" with an adapter txn UUID is the correct fix, but the same document’s stated design is a CheckTxnStatus probe and explicitly rejects the UUID/FSM marker approach. That contradiction can send implementation work down the wrong path (proto/FSM marker work the doc says to avoid), so this section should be updated to match the selected CheckTxnStatus design.

Useful? React with 👍 / 👎.

bootjp added 2 commits May 26, 2026 12:11
…y, option (a) fallback

CheckTxnStatus-as-primary is retracted: the cluster is single-group, so
every txn takes the one-phase path, which writes no lock and no commit
record — CheckTxnStatus has nothing to read. Rewrite with option 2
(reuse the failed attempt's write set, dedup on an exact-commit_ts probe
at one-phase apply) as primary and option (a) (give one-phase a lock +
commit record so CheckTxnStatus applies) as the documented fallback.

Add the central correctness argument head-on: new data is written only
at a fresh monotonic commit_ts (SI preserved), the stale commit_ts is
read-only in an exact-ts probe (dedup identity), and Raft log order makes
the probe race-free (E1 applies before E2 or never). Because the argument
holds, option 2 is primary.
…propose/apply path

The race-freedom premise (attempt-1's entry applies before the retry's or
never, never after) is confirmed: txn dispatch proposes with
context.Background() and blocks until apply or until the proposal is
failed-and-drained on a Leader->non-Leader transition (failPending);
a drained proposal is never re-proposed, and fsm.Apply runs at the
entry's own (lower) log index. Option 2 is cleared to implement.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f2b800a9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +594 to +598
- `applyRequests` (`kv/transaction.go:152`) proposes with
`context.Background()`, so `Engine.Propose`
(`internal/raftengine/etcd/engine.go:606`) blocks until the entry
applies or the proposal is *failed* — it never abandons on a caller
timeout (the `ctx.Done()` branch is dead under Background).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-validate ordering proof with cancellable txn context

This verified-linchpin section assumes applyRequests proposes with context.Background() and therefore cannot return on caller timeout, but the current single-shard txn path passes the caller context through (dispatchSingleShardTxnTxn.CommitapplyRequests(ctx, ...)), and Engine.Propose explicitly has a ctx.Done() early-return path. That means the "dispatch returns only after apply or abandonment" premise is not established by this evidence, so the race-freedom argument for option 2 is currently built on an incorrect runtime assumption and should be re-proven (or constrained) for deadline/cancellation scenarios.

Useful? React with 👍 / 👎.

…ncy M2a)

Option 2's one-phase idempotency dedup needs to ask 'did the previous
attempt — which committed at exactly prev_commit_ts — land?'. Add
CommittedVersionAt(ctx,key,commitTS) to MVCCStore: an exact-timestamp
existence check (point lookup), distinct from GetAt's newest-version-<=ts
scan. Because commit timestamps are unique and strictly monotonic
(HLC Clock().Next()), a version at an exact commitTS on a key can only
be the txn assigned that timestamp, so an exact hit unambiguously
identifies that attempt — unlike the loose latestTS>=commitTS check in
applyCommitWithIdempotencyFallback, which would misclassify a different
txn's later commit as our own.

- pebbleStore: point Get on encodeKey(key, commitTS); a tombstone counts
  as landed (the attempt committed a delete), so the value is not decoded.
- mvccStore: binary search the TS-ascending version slice for an exact hit.
- ShardStore / LeaderRoutedStore: delegate to the local group store; this
  is an apply-local probe, not a leader-routed client read (no fence/proxy).

Tests run both store implementations through a shared suite: exact hit,
absent-key miss, exactness (a version at 300 must NOT satisfy a probe at
200), tombstone-counts-as-landed, and cross-key isolation.

Refs docs/design/2026_05_21_proposed_txn_secondary_idempotency.md (M2).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1f282336d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/lsm_store.go
Comment on lines +774 to +778
func (s *pebbleStore) CommittedVersionAt(_ context.Context, key []byte, commitTS uint64) (bool, error) {
s.dbMu.RLock()
defer s.dbMu.RUnlock()

_, closer, err := s.db.Get(encodeKey(key, commitTS))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return compaction error for stale exact-ts probes

This exact-timestamp probe bypasses the retention watermark check that GetAt/ExistsAt enforce, so when commitTS is below minRetainedTS it returns (false, nil) instead of ErrReadTSCompacted. In the idempotency-retry flow this makes "history compacted" indistinguishable from "previous attempt never landed", which can incorrectly trigger re-application after long delays or aggressive compaction. The method should apply the same readTSCompacted guard as other timestamped reads.

Useful? React with 👍 / 👎.

Comment thread store/mvcc_store.go
Comment on lines +472 to +476
func (s *mvccStore) CommittedVersionAt(_ context.Context, key []byte, commitTS uint64) (bool, error) {
s.mtx.RLock()
defer s.mtx.RUnlock()

v, ok := s.tree.Get(key)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce retention guard in in-memory exact-ts lookup

The in-memory implementation has the same behavior gap: it reads historical versions directly without checking minRetainedTS, so probes for compacted timestamps silently return false rather than ErrReadTSCompacted. That breaks parity with the Pebble backend and can hide compacted-history conditions from callers that rely on exact-ts probes for dedup decisions.

Useful? React with 👍 / 👎.

bootjp added 2 commits May 26, 2026 16:48
Carry prev_commit_ts (a failed prior attempt's commit timestamp) into the
single-shard one-phase apply and no-op the apply when that attempt already
landed — preventing the :duplicate-elements anomaly without a commit
record, lock, or extra Raft round (option 2).

Wire format: prev_commit_ts rides in the existing TxnMeta blob as a new
PrevCommitTS field + V2 flag 0x04 — NO proto change. EncodeTxnMeta emits
V2 only when PrevCommitTS != 0, so every non-retry caller keeps the V1
default wire format. handleOnePhaseTxnRequest probes
CommittedVersionAt(primaryKey, prev_commit_ts) and returns nil (no-op the
whole apply) on an exact hit.

Determinism (R5): the probe changes the apply outcome, so it must be
computed identically on every node. It is — the decision depends only on
the replicated prev_commit_ts and the deterministic exact-ts probe at the
entry's apply point. A pre-flag node fails loud (unknown-flag rejection)
rather than diverging; emission will be gated off by default (M3) until the
whole cluster is probe-aware (ship-the-reader-before-the-writer).

Tests: codec round-trips PrevCommitTS (V2) and keeps V1 when unset; FSM
dedup no-ops when the prior attempt landed at exactly T1 (no version at the
fresh T2, newest stays T1), applies at T2 when it did not land, and skips
the probe entirely when prev_commit_ts == 0.

Refs docs/design/2026_05_21_proposed_txn_secondary_idempotency.md (M2).
… idempotency M1+M3)

Thread prev_commit_ts from the adapter to the one-phase apply and make
list-push retries reuse the failed attempt's write set instead of
recomputing seq — the recompute is what duplicates the element under
leadership churn.

- kv.OperationGroup gains PrevCommitTS, threaded through
  ShardedCoordinator.Dispatch -> dispatchTxn -> dispatchSingleShardTxn ->
  onePhaseTxnRequestWithPrevCommit.
- listPushCore branches on RedisServer.onePhaseTxnDedup (default OFF;
  WithOnePhaseTxnDedup / ELASTICKV_REDIS_ONEPHASE_DEDUP). When on,
  listPushCoreWithDedup tracks a reusableListPush and, on a retryable
  error, reuses the exact write set under a fresh commit_ts with
  PrevCommitTS set so the FSM no-ops if the prior attempt landed. A
  WriteConflict from a reuse means the probe ruled out our own attempt and
  the seq is genuinely taken, so it recomputes from a fresh meta.

R1 (result reconstruction) resolved and far simpler than feared: because
the retry reuses the write set built from attempt 1's meta, the post-push
length is invariant across reuse, so the adapter just returns the
remembered length — no store re-read.

Gating keeps FSM determinism (R5): off by default, so no prev_commit_ts
is emitted and the FSM is byte-identical to today until the whole cluster
is probe-aware.

Tests (real OCC + real exact-ts probe): landed-then-ambiguous -> dedup
no-op -> no duplicate, correct length; not-landed -> reuse applies;
genuine cross-txn conflict -> recompute to a new seq (two distinct
elements); gate-off keeps the legacy path.

Refs docs/design/2026_05_21_proposed_txn_secondary_idempotency.md (M1, M3).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3e97050b3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread kv/coordinator.go Outdated
}

func onePhaseTxnRequest(startTS, commitTS uint64, primaryKey []byte, reqs []*Elem[OP], readKeys [][]byte) *pb.Request {
return onePhaseTxnRequestWithPrevCommit(startTS, commitTS, 0, primaryKey, reqs, readKeys)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve PrevCommitTS when building one-phase txn logs

onePhaseTxnRequest now hardcodes prevCommitTS=0, so any retry carrying OperationGroup.PrevCommitTS loses the dedup probe key on the Coordinate path (including leader redirects that call buildRedirectRequests -> onePhaseTxnRequest). In that environment, an ambiguous first commit can be retried without PrevCommitTS, causing the FSM to re-apply instead of no-op and reintroducing duplicate list mutations that this change is meant to prevent.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
store/committed_version_at_test.go (2)

19-77: ⚡ Quick win

Prefer a table-driven suite for these subcases.

The cases are currently repeated t.Run blocks; converting to a table will keep this easier to extend as new exact-ts cases are added for both backends.

As per coding guidelines **/*_test.go: "prefer table-driven test cases".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@store/committed_version_at_test.go` around lines 19 - 77, Replace the
multiple t.Run subtests in TestCommittedVersionAt with a single table-driven
loop: create a slice of test cases (fields: name, setup ops as functions or a
list describing calls to newStore/PutAt/DeleteAt, probeKey, probeTS, wantBool)
and iterate calling t.Run(tc.name, func(t *testing.T){ st := newStore(t);
perform tc.setup using st and ctx (use PutAt/DeleteAt as needed); got, err :=
st.CommittedVersionAt(ctx, tc.probeKey, tc.probeTS); require.NoError(t, err);
require.Equal(t, tc.wantBool, got) }). Keep references to CommittedVersionAt,
PutAt, DeleteAt, newStore and ctx so each case reproduces the original behaviors
(exact hits, absent key, newer version, tombstone, different key).

57-77: ⚡ Quick win

Add one TTL/HLC-focused case for exact-ts probe behavior.

Please add a case that writes with TTL/expire semantics and still asserts CommittedVersionAt depends only on exact commit timestamp existence (not current visibility), to pin clock-related behavior explicitly.

As per coding guidelines {store,kv}/**/*_test.go: "Add test coverage for TTL/HLC behaviors when touching clocks, OCC, or replication logic in store/ and kv/ packages."

kv/fsm_onephase_dedup_test.go (1)

40-112: ⚡ Quick win

Consolidate the three scenario tests into a table-driven test.

Lines 40-112 are structurally repetitive and can be expressed as one table-driven test with per-case setup/expectations, which will make future dedup scenarios easier to extend and keep consistent.

As per coding guidelines, "**/*_test.go: Unit tests must be co-located with packages in *_test.go files; prefer table-driven test cases."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kv/fsm_onephase_dedup_test.go` around lines 40 - 112, Replace the three
repetitive tests TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded,
TestOnePhaseDedup_AppliesWhenPriorAttemptDidNotLand, and
TestOnePhaseDedup_FirstAttemptSkipsProbe with a single table-driven test that
iterates cases describing the scenario name, initial state (e.g. pre-landed
write at commit_ts 20 when needed), the onePhaseReq inputs (start_ts, commit_ts,
prev_commit_ts, key, value) and per-case assertions (whether
st.CommittedVersionAt should be true at the fresh commit_ts, expected
LatestCommitTS value, and expected value via st.GetAt). Use the existing helpers
applyFSMRequest and onePhaseReq and instantiate a fresh store.NewMVCCStore() and
fsm := NewKvFSMWithHLC(st, NewHLC()).(*kvFSM) inside each subtest; for the
"landed" case pre-seed the store by calling applyFSMRequest with the initial
attempt as in the old TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded. Ensure each
case runs t.Run(case.name, func(t *testing.T){ t.Parallel() ... }) and performs
the same checks that the original three tests did.
adapter/redis.go (1)

3284-3302: ⚡ Quick win

Skip the reuse path for a definitive first-attempt write conflict.

Line 3296 caches pending even when the initial Dispatch returned store.ErrWriteConflict. That attempt is already known not to have landed, so the next loop only burns an extra guaranteed-miss probe before eventually recomputing from fresh metadata.

♻️ Suggested change
 		_, dispErr := r.coordinator.Dispatch(ctx, &kv.OperationGroup[kv.OP]{
 			IsTxn:    true,
 			StartTS:  startTS,
 			CommitTS: commitTS,
 			Elems:    ops,
 		})
 		if dispErr == nil {
 			newLen = updatedMeta.Len
 			return nil
 		}
+		if errors.Is(dispErr, store.ErrWriteConflict) {
+			return errors.WithStack(dispErr)
+		}
 		// Retryable failure: remember this attempt so the next iteration reuses
 		// its write set (and probes prev_commit_ts) rather than recomputing seq.
 		pending = &reusableListPush{
 			ops:      ops,
 			startTS:  startTS,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@adapter/redis.go` around lines 3284 - 3302, The code currently stores a
reusableListPush into pending even when r.coordinator.Dispatch returns a
definitive store.ErrWriteConflict, causing the next iteration to reuse a write
set that is known not to have landed; modify the error-handling after
r.coordinator.Dispatch so that if dispErr is the definitive write-conflict (use
errors.Is(dispErr, store.ErrWriteConflict) or the appropriate sentinel
comparison) you do NOT assign pending = &reusableListPush{...} and simply return
the error, while preserving the existing pending caching behavior for other
retryable errors; refer to r.coordinator.Dispatch, the pending variable and the
reusableListPush struct when making this change.
adapter/redis_list_dedup_test.go (2)

205-219: ⚡ Quick win

This case never exercises the disabled retry path.

ambiguousDispatch=0 makes listRPush succeed on the first dispatch, so this only proves the default-off happy path. It will not catch accidental PrevCommitTS threading or write-set reuse when dedup is disabled. Inject one retryable failure here and assert the second attempt recomputes with probeNoOps == 0.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@adapter/redis_list_dedup_test.go` around lines 205 - 219, The test currently
uses newDedupTestCoordinator(st, 0, false) so it never triggers a retry; change
the coordinator to inject one retryable/ambiguous failure by calling
newDedupTestCoordinator(st, 1, false), then call srv.listRPush as before and
update assertions to expect a retry (e.g., coord.dispatches == 2) and assert
coord.probeNoOps == 0 to verify the disabled-dedup legacy path recomputes on the
second attempt; keep the rest of the test setup (srv, key, values) unchanged and
only adjust the coordinator arg and the assertions.

104-219: ⚡ Quick win

Fold these scenarios into a table-driven test.

The setup and verification shape is nearly identical across all four cases. A case table would remove duplication and make it much easier to add more permutations around LPUSH, initial write conflicts, or additional retry boundaries.

As per coding guidelines, **/*_test.go: Unit tests should be co-located with packages in *_test.go files; prefer table-driven test cases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@adapter/redis_list_dedup_test.go` around lines 104 - 219, Combine the four
TestListPushDedup_* tests into a single table-driven test that iterates cases
describing: name, coord setup (newDedupTestCoordinator args like
ambiguousDispatch and landed bool), coordinator.beforeDispatch hook (nil or the
function that injects the concurrent PutAt for GenuineConflictRecomputes),
onePhaseTxnDedup flag on RedisServer, expected outputs (expected length n,
coord.dispatches, coord.probeNoOps, and expected list contents/positions checked
via resolveListMeta and st.GetAt). Implement a loop that for each case creates
st := store.NewMVCCStore(), coord := newDedupTestCoordinator(...), srv :=
&RedisServer{store:st, coordinator:coord, onePhaseTxnDedup: <case flag>}, runs
n, err := srv.listRPush(...), and performs the same assertions currently found
in TestListPushDedup_LandedPriorAttempt_NoDuplicate,
TestListPushDedup_PriorAttemptDidNotLand_Applies,
TestListPushDedup_GenuineConflictRecomputes, and
TestListPushDedup_DisabledKeepsLegacyPath, using the case table values to drive
expected outcomes and any beforeDispatch behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_05_21_proposed_txn_secondary_idempotency.md`:
- Around line 29-33: Update the three fenced code blocks in this document to
include a language identifier (e.g., add "text" after the opening ```),
specifically the block containing the txn trace lines starting with "1411  3 
:invoke :txn ..." and the block showing the one-phase apply pseudocode beginning
"one-phase apply of the retry entry E2...", and also the similar fenced block at
lines 229-236; locate these by the exact snippet content and add the language
tag to each opening ``` so MD040 is resolved and rendering/tooling is
consistent.
- Around line 3-17: The document's top header "**Status: proposed — not yet
implemented.**" contradicts later sections that record landed work; change the
status header to reflect partial implementation (e.g., "Status: partially
implemented / in progress") and update the sections that currently assert landed
work so their status markers and tense match (the sections referenced in your
review that record "landed" changes should be annotated as implemented or moved
to a "Completed" subsection); ensure the status string is updated wherever the
original header token appears so the doc is consistent.

In `@kv/coordinator.go`:
- Around line 1021-1023: onePhaseTxnRequest currently hardcodes prevCommitTS to
0 by calling onePhaseTxnRequestWithPrevCommit(..., 0, ...), which drops
PrevCommitTS for single-group dispatch and can reintroduce duplicate writes;
change the helper and call sites to preserve and forward the real prevCommitTS:
update onePhaseTxnRequest to accept a prevCommitTS (or add an overload) and call
onePhaseTxnRequestWithPrevCommit with that value instead of 0, then update
callers in dispatchTxn and buildRedirectRequests to pass the existing
prevCommitTS through so the resulting pb.Request.PrevCommitTS is populated and
the one-phase dedup probe reaches the FSM. Ensure the pb.Request field name
PrevCommitTS (and any local prevCommitTS variables) are consistently used.

---

Nitpick comments:
In `@adapter/redis_list_dedup_test.go`:
- Around line 205-219: The test currently uses newDedupTestCoordinator(st, 0,
false) so it never triggers a retry; change the coordinator to inject one
retryable/ambiguous failure by calling newDedupTestCoordinator(st, 1, false),
then call srv.listRPush as before and update assertions to expect a retry (e.g.,
coord.dispatches == 2) and assert coord.probeNoOps == 0 to verify the
disabled-dedup legacy path recomputes on the second attempt; keep the rest of
the test setup (srv, key, values) unchanged and only adjust the coordinator arg
and the assertions.
- Around line 104-219: Combine the four TestListPushDedup_* tests into a single
table-driven test that iterates cases describing: name, coord setup
(newDedupTestCoordinator args like ambiguousDispatch and landed bool),
coordinator.beforeDispatch hook (nil or the function that injects the concurrent
PutAt for GenuineConflictRecomputes), onePhaseTxnDedup flag on RedisServer,
expected outputs (expected length n, coord.dispatches, coord.probeNoOps, and
expected list contents/positions checked via resolveListMeta and st.GetAt).
Implement a loop that for each case creates st := store.NewMVCCStore(), coord :=
newDedupTestCoordinator(...), srv := &RedisServer{store:st, coordinator:coord,
onePhaseTxnDedup: <case flag>}, runs n, err := srv.listRPush(...), and performs
the same assertions currently found in
TestListPushDedup_LandedPriorAttempt_NoDuplicate,
TestListPushDedup_PriorAttemptDidNotLand_Applies,
TestListPushDedup_GenuineConflictRecomputes, and
TestListPushDedup_DisabledKeepsLegacyPath, using the case table values to drive
expected outcomes and any beforeDispatch behavior.

In `@adapter/redis.go`:
- Around line 3284-3302: The code currently stores a reusableListPush into
pending even when r.coordinator.Dispatch returns a definitive
store.ErrWriteConflict, causing the next iteration to reuse a write set that is
known not to have landed; modify the error-handling after r.coordinator.Dispatch
so that if dispErr is the definitive write-conflict (use errors.Is(dispErr,
store.ErrWriteConflict) or the appropriate sentinel comparison) you do NOT
assign pending = &reusableListPush{...} and simply return the error, while
preserving the existing pending caching behavior for other retryable errors;
refer to r.coordinator.Dispatch, the pending variable and the reusableListPush
struct when making this change.

In `@kv/fsm_onephase_dedup_test.go`:
- Around line 40-112: Replace the three repetitive tests
TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded,
TestOnePhaseDedup_AppliesWhenPriorAttemptDidNotLand, and
TestOnePhaseDedup_FirstAttemptSkipsProbe with a single table-driven test that
iterates cases describing the scenario name, initial state (e.g. pre-landed
write at commit_ts 20 when needed), the onePhaseReq inputs (start_ts, commit_ts,
prev_commit_ts, key, value) and per-case assertions (whether
st.CommittedVersionAt should be true at the fresh commit_ts, expected
LatestCommitTS value, and expected value via st.GetAt). Use the existing helpers
applyFSMRequest and onePhaseReq and instantiate a fresh store.NewMVCCStore() and
fsm := NewKvFSMWithHLC(st, NewHLC()).(*kvFSM) inside each subtest; for the
"landed" case pre-seed the store by calling applyFSMRequest with the initial
attempt as in the old TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded. Ensure each
case runs t.Run(case.name, func(t *testing.T){ t.Parallel() ... }) and performs
the same checks that the original three tests did.

In `@store/committed_version_at_test.go`:
- Around line 19-77: Replace the multiple t.Run subtests in
TestCommittedVersionAt with a single table-driven loop: create a slice of test
cases (fields: name, setup ops as functions or a list describing calls to
newStore/PutAt/DeleteAt, probeKey, probeTS, wantBool) and iterate calling
t.Run(tc.name, func(t *testing.T){ st := newStore(t); perform tc.setup using st
and ctx (use PutAt/DeleteAt as needed); got, err := st.CommittedVersionAt(ctx,
tc.probeKey, tc.probeTS); require.NoError(t, err); require.Equal(t, tc.wantBool,
got) }). Keep references to CommittedVersionAt, PutAt, DeleteAt, newStore and
ctx so each case reproduces the original behaviors (exact hits, absent key,
newer version, tombstone, different key).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 75a9d531-1f65-4482-a64a-fb1155ac75b9

📥 Commits

Reviewing files that changed from the base of the PR and between fe96c88 and d3e9705.

📒 Files selected for processing (16)
  • adapter/redis.go
  • adapter/redis_list_dedup_test.go
  • docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
  • kv/coordinator.go
  • kv/fsm.go
  • kv/fsm_onephase_dedup_test.go
  • kv/leader_routed_store.go
  • kv/shard_store.go
  • kv/sharded_coordinator.go
  • kv/transcoder.go
  • kv/txn_codec.go
  • kv/txn_codec_test.go
  • store/committed_version_at_test.go
  • store/lsm_store.go
  • store/mvcc_store.go
  • store/store.go

Comment thread docs/design/2026_05_21_proposed_txn_secondary_idempotency.md Outdated
Comment thread docs/design/2026_05_21_proposed_txn_secondary_idempotency.md Outdated
Comment thread kv/coordinator.go Outdated
…xes (#796 review)

Address codex P1 + CodeRabbit Major (both line ~1023 of kv/coordinator.go):
the legacy Coordinate.Dispatch path (dispatchTxn at line 465, and leader
redirect via buildRedirectRequests at line 957) called bare onePhaseTxnRequest,
silently dropping OperationGroup.PrevCommitTS. In a non-sharded deployment or
on a leader-redirect, an ambiguous first commit could be retried without the
dedup probe key, defeating exactly the path option 2 protects.

- Coordinate.dispatchTxn now takes prevCommitTS and forwards it via
  onePhaseTxnRequestWithPrevCommit. Call site at line 465 passes
  reqs.PrevCommitTS; the 5 unit-test callers pass 0.
- buildRedirectRequests forwards reqs.PrevCommitTS unconditionally — it is
  the adapter-allocated identity of the prior attempt and is never
  re-issued by the leader, unlike StartTS/CommitTS.

Caller audit confirmed all onePhaseTxnRequest call sites in production now
route through the prev-commit-aware path.

Lint fixes (reviewdog/golangci):
- decodeTxnMetaV2 cyclop 12 -> ~6 by table-driving the optional V2 fields
  through optionalV2Fields().
- adapter test Dispatch cyclop 12 -> ~7 by extracting the probe to
  maybeProbe().

Doc nits (CodeRabbit):
- Status header upgraded to 'partial — option 2 reader landed everywhere;
  writer (emission) gated off by default', resolving the conflict with the
  landed-milestone notes.
- All bare code fences carry a language tag (text/go) for MD040.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 28, 2026

@claude review

Round-2 fixes addressing codex P1 + CodeRabbit Major (both flagged the same legacy-Coordinate caller-audit gap) and the two reviewdog cyclop warnings:

P1 (codex) / Major (CodeRabbit) — onePhaseTxnRequest legacy callers dropped PrevCommitTS:

  • Coordinate.dispatchTxn (kv/coordinator.go:465) now takes a prevCommitTS param and forwards it via onePhaseTxnRequestWithPrevCommit. Production call site passes reqs.PrevCommitTS; the 5 unit-test callers pass 0.
  • buildRedirectRequests (kv/coordinator.go:957) now forwards reqs.PrevCommitTS unconditionally — it is the adapter-allocated identity of the prior attempt and is never re-issued by the leader (unlike StartTS/CommitTS, which the redirect path clears when StartTS==0).
  • Caller audit: every onePhaseTxnRequest site in production now routes through the prev-commit-aware path. The legacy onePhaseTxnRequest(...) is kept only as a 0-prev wrapper for backwards source compat.

Cyclop (reviewdog):

  • decodeTxnMetaV2 12 → ~6 by table-driving the V2 optional fields through optionalV2Fields().
  • adapter test Dispatch 12 → ~7 by extracting the probe to maybeProbe().

Doc nits (CodeRabbit):

  • Status header now "partial — option 2 reader landed everywhere; writer (emission) gated off by default," resolving the contradiction with the landed-milestone notes below.
  • All bare code fences carry a language tag (text/go).

Commit: fd0831cd. kv/store suites green; gofmt + vet clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 28, 2026

Claude finished @bootjp's task in 3m 21s —— View job


Round-2 Code Review

  • Load tools
  • Read changed files: coordinator.go, txn_codec.go, redis_list_dedup_test.go, design doc
  • Review PrevCommitTS threading fix
  • Review cyclop complexity fixes
  • Review doc updates
  • Post final findings

Verdict: Round-2 changes are correct and the targeted issues are resolved.

The P1 (codex) / Major (CodeRabbit) gap — legacy Coordinate path silently dropping PrevCommitTS — is fixed correctly at both sites. Cyclop targets are addressed. One minor dead-code note and one comment nit follow.


PrevCommitTS Threading (kv/coordinator.go)

Both callsites are correct:

  • dispatchTxn (line 801): New prevCommitTS uint64 parameter is wired from reqs.PrevCommitTS at the callsite (line 465) and passed through to onePhaseTxnRequestWithPrevCommit. ✅
  • buildRedirectRequests (line 987): Forwards reqs.PrevCommitTS unconditionally. The StartTS == 0 → commitTS = 0 reset correctly doesn't touch PrevCommitTS, since the comment accurately captures the invariant: the prior attempt's identity is adapter-allocated and must not be discarded when the leader regenerates timestamps. ✅

Minor: onePhaseTxnRequest is now dead code. All production callers now go through onePhaseTxnRequestWithPrevCommit directly. The test callers (coordinator_txn_test.go) were updated to pass ..., 0 to dispatchTxn, not to call onePhaseTxnRequest. The function has zero callsites. golangci config doesn't include deadcode/unused, so this won't fail lint — but it's a semantic footgun. The safe options:

  • Delete it (cleanest, no risk because there are truly no callers), or
  • Keep it but add a comment: // zero-prevCommitTS wrapper; prefer onePhaseTxnRequestWithPrevCommit.

Fix this →

Missing test: buildRedirectRequests PrevCommitTS forwarding is code-correct but has no unit coverage. There are no tests calling buildRedirectRequests with a non-zero PrevCommitTS to verify the field survives the leader-redirect path. Given this was the exact defect that prior reviews flagged, a targeted test would lock it down.


Cyclop Fix: optionalV2Fields (kv/txn_codec.go)

The approach is correct. Field order in the table matches the encoder order (LockTTL → CommitTS → PrevCommitTS) and the decode loop reads them identically. Round-trip is preserved.

Minor: the comment on optionalV2Fields (line 201–205) is misleading. It says "encodeTxnMetaV2 and decodeTxnMetaV2 must traverse this list in the same sequence" — but encodeTxnMetaV2 doesn't use the table at all; it has its own manually-ordered conditionals. The comment creates a false impression that both encoder and decoder are driven by the table (which would be a nice property but isn't true). Should read something like: "decodeTxnMetaV2 reads fields via this table; encodeTxnMetaV2 must write them in the same order — update both together when adding a field."

Fix this →


Cyclop Fix: maybeProbe extraction (adapter/redis_list_dedup_test.go)

The extraction is clean and the (handled bool, resp, err) triple-return idiom is idiomatic Go for probe-owns-result patterns. ✅


Design Doc

Status header is accurate (partial — option 2 reader landed everywhere; writer (emission) gated off by default) and the bare code fences now carry text tags. Both CodeRabbit nits resolved. ✅


Carry-over from Round-1 (not in scope for Round-2, but not yet addressed)

For visibility — these were not in Round-2's stated scope but are still open:

  1. CommittedVersionAt retention guard (codex P2 on store/lsm_store.go:778, store/mvcc_store.go:476): Both implementations return (false, nil) when the probe timestamp is below minRetainedTS, making a compacted-history miss indistinguishable from "never landed." Should return ErrReadTSCompacted to allow the FSM/adapter to distinguish "not found" from "history gone." Low urgency while compaction windows are long relative to retry windows, but worth tracking.

  2. TestListPushDedup_DisabledKeepsLegacyPath (CodeRabbit nitpick): Still uses ambiguousDispatch=0, so the legacy path is tested only for the no-retry case. Not a blocking issue.


Summary: The Round-2 changes correctly resolve the P1/Major gap. Two cosmetic issues (dead onePhaseTxnRequest wrapper, misleading optionalV2Fields comment) and one missing unit test for the redirect path are the only remaining items before merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fd0831cde7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/redis.go Outdated
Comment on lines +3240 to +3246
_, dispErr := r.coordinator.Dispatch(ctx, &kv.OperationGroup[kv.OP]{
IsTxn: true,
StartTS: pending.startTS,
CommitTS: commitTS,
PrevCommitTS: pending.commitTS,
Elems: pending.ops,
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Recompute stale pushes after non-landed retry errors

When onePhaseTxnDedup is enabled and the first push returns a retryable error without landing, this path replays the old item and delta keys after only a PrevCommitTS probe miss. If a pop commits between attempts, it writes a claim/delete plus its own delta and does not necessarily touch the stale push's target sequence; for example, from a one-element list a failed RPUSH plans seq 1, an intervening RPOP removes seq 0, then the replay writes seq 1 with a +1 delta, leaving metadata Head=0 Len=1 so LRANGE scans seq 0 and misses the new item. After a probe miss, the retry needs to recompute or validate the list boundary/read set before applying the captured write set.

Useful? React with 👍 / 👎.

…g pop (#796 codex P1 round-3)

codex P1 (06:37, post-fd0831cd): when onePhaseTxnDedup is on and attempt 1
returns retryable without landing, the retry replays the captured ops on
top of whatever the world has since become. If an RPOP/LPOP/LTRIM
intervened and shrank the boundary, the reused seq can land past the new
Tail — physically present but unreachable to LRANGE.

Fix: capture the boundary positions (listItemKey(Head), listItemKey(Tail-1))
at attempt 1's meta read and pass them through OperationGroup.ReadKeys to
the dispatched one-phase txn. ApplyMutationsRaft validates ReadKeys
atomically against startTS under the FSM applyMu, so any intervening
boundary-touching commit fires WriteConflict; listPushCoreWithDedup already
treats a reuse WriteConflict as 'probe ruled out our prior attempt + seq
genuinely contested', clears pending, and recomputes from fresh meta —
which is correct because our attempt did not land.

Probe-hit branch is unchanged: if our attempt 1 actually committed at T1
the FSM no-ops before the OCC check runs, returning success regardless of
what else has happened to the boundary (the post-attempt-1 length we
reconstruct is the correct serialized result).

The full case matrix is now:
  - attempt 1 landed, no others        → probe hit  → no-op (return length)
  - attempt 1 landed, others touched   → probe hit  → no-op (return length)
  - attempt 1 missed, no others        → probe miss + ReadKeys OK → apply
  - attempt 1 missed, boundary touched → probe miss + ReadKeys conflict → recompute
  - attempt 1 missed, same seq taken   → probe miss + write OCC conflict → recompute

Also: delete the now-dead onePhaseTxnRequest wrapper (golangci 'unused'
lint failure on the fd0831c review run); all production sites moved to
onePhaseTxnRequestWithPrevCommit in round-2. Caller audit: no external
callers remained. Refine optionalV2Fields comment to match how the table
is actually consumed (decode iterates; encode mirrors it via inline
conditionals — both must be updated together).

Caller audit (loop semantic-change rule): adding ReadKeys to the
dedup-path dispatch only widens retry-on-conflict within retryRedisWrite;
the client-facing post-push length is unchanged; the legacy gate-off path
is untouched.

Regression test: TestListPushDedup_InterveningRPopRefusesStaleReuse
reproduces the codex scenario from a one-element [A] list and asserts
the pushed element is reachable at meta.Head after the fix.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 28, 2026

@claude review

Round-3 fixes addressing codex P1 (adapter/redis.go:3246, 06:37 post-fd0831cd) + the golangci unused lint fail on the dead onePhaseTxnRequest wrapper + the misleading optionalV2Fields comment.

codex P1 — reuse-after-meta-change can hide writes (adapter/redis.go):
Captured ops were replayed against whatever the world had become; an intervening RPOP/LPOP/LTRIM that shrank the boundary could leave the reused seq past the new Tail (physically written, unreachable to LRANGE).

Fix: capture the boundary positions (listItemKey(Head), listItemKey(Tail-1)) at attempt 1's meta read and pass them through OperationGroup.ReadKeys. ApplyMutationsRaft validates them atomically against startTS under applyMu, so any boundary-touching commit fires WriteConflict; listPushCoreWithDedup already treats a reuse WriteConflict as "probe ruled out our prior attempt + seq genuinely contested" → clear pending, recompute from fresh meta. The probe-hit branch is unchanged: if our attempt actually committed at T1 the FSM no-ops before OCC, returning success regardless of what else happened.

Full case matrix:

  • attempt 1 landed, no others → probe hit → no-op (return length)
  • attempt 1 landed, others touched → probe hit → no-op (return length)
  • attempt 1 missed, no others → probe miss + ReadKeys OK → apply
  • attempt 1 missed, boundary touched → probe miss + ReadKeys conflict → recompute
  • attempt 1 missed, same seq taken → probe miss + write OCC conflict → recompute

Regression test: TestListPushDedup_InterveningRPopRefusesStaleReuse reproduces codex's one-element [A] scenario and asserts the pushed element is reachable at meta.Head after the fix.

Lint / claude review minors:

  • Deleted the dead onePhaseTxnRequest wrapper (golangci unused, also flagged by claude). Caller audit: zero remaining callers — all production routes through onePhaseTxnRequestWithPrevCommit since round-2.
  • Refined optionalV2Fields comment to match how the table is actually consumed (decode iterates; encode mirrors via inline conditionals).

Caller audit (loop semantic-change rule): adding ReadKeys to the dedup-path dispatch only widens retry-on-conflict within retryRedisWrite; client-facing post-push length is unchanged; legacy gate-off path untouched.

Commit: 65daa3a2. All 5 dedup tests + kv/store suites green; gofmt + vet clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 28, 2026

Claude finished @bootjp's task in 5m 12s —— View job


Round-3 Code Review

  • Load tools
  • Read Round-3 changed files: adapter/redis.go (boundary fence fix), kv/coordinator.go (dead wrapper deletion), kv/txn_codec.go (comment fix), adapter/redis_list_dedup_test.go (regression test)
  • Analyze boundary-fence ReadKeys approach for correctness
  • Check dead wrapper deletion and comment fix
  • Post final review

Verdict: Round-3 changes are correct. The codex P1 gap is closed.

The boundary-fence fix, regression test, dead-code deletion, and comment correction are all sound. Two non-blocking observations follow.


Boundary-Fence Correctness (adapter/redis.go)

The mechanism is correct end-to-end:

Path when attempt 1 did NOT land and a pop/trim happened between attempts:

  1. listPushBoundaryReadKeys captures listItemKey(Head) and (for non-empty list) listItemKey(Tail-1) at attempt-1 read time
  2. These ride as OperationGroup.ReadKeys through pending.readKeys to the reuse dispatch
  3. checkConflictsLocked (line store/mvcc_store.go:524) checks both write mutations and read keys against startTS; a tombstone from RPOP/LPOP at TS > startTS fires ErrWriteConflict
  4. Adapter sees ErrWriteConflict, clears pending, recomputes from fresh meta ✓

Path when attempt 1 DID land:
The maybeProbe / FSM dedup probe fires before OCC and returns success — the boundary ReadKeys are never evaluated. Correct: subsequent pop/trim ops don't matter once our commit is confirmed at T1. ✓

Empty-list case (meta.Len ≤ 0readKeys = nil): No pops are possible on an empty list; concurrent RPUSH conflicts are caught by the write-key OCC on the item key itself. ✓

Dedup gate: listPushCore gates on r.onePhaseTxnDedup (line 3167) and routes accordingly; the legacy path is unchanged. ✓


Minor: Boundary fence is slightly over-conservative for RPUSH (non-blocking, safe)

listPushBoundaryReadKeys fences both Head and Tail-1 regardless of push direction. For RPUSH specifically, an LPOP that advances Head doesn't actually invalidate the reuse: RPUSH writes at old_Tail, and after LPOP the LRANGE window [new_Head, old_Tail+1) still includes old_Tail. Fencing Head for RPUSH will occasionally trigger a false ErrWriteConflict and recompute unnecessarily, but this never produces incorrect results — recomputation is the safe fallback. Likewise, RPOP doesn't invalidate LPUSH, but Tail-1 is fenced for LPUSH too.

This is the right tradeoff: the same listPushBoundaryReadKeys helper serves both RPUSH and LPUSH symmetrically with no direction-specific branching, and extra recomputes are cheap. Noting it only so a future reviewer doesn't mistake the false-positive OCC conflicts for a bug.


Regression Test (TestListPushDedup_InterveningRPopRefusesStaleReuse)

The test correctly simulates the P1 scenario:

  1. Seeds [A] at seq 0 (Head=0, Len=1, Tail=1)
  2. Dispatch 1: ambiguousDispatch=1, lands=false → returns ErrWriteConflict without applying. pending.readKeys = [listItemKey(key, 0)], pending.startTS = T_before_rpop
  3. beforeDispatch(2): RPOP commits at TS > startTS — tombstones seq 0, writes delta {Head:+1, Len:-1}. State: Head=1, Len=0, Tail=1
  4. Dispatch 2 (reuse): probe miss (attempt 1 didn't land) → occAdapterCoordinator calls ApplyMutations with ReadKeys=[listItemKey(key,0)], StartTS=T_before_rpopcheckConflictsLocked fires on the tombstoned key → ErrWriteConflictpending = nil
  5. Dispatch 3 (recompute): fresh meta (Head=1, Len=0), plans seq 1, empty ReadKeys (empty list). Succeeds. State: Head=1, Len=1, listItemKey(key,1)="v"
  6. Assertions: n==1, dispatches > 2 (= 3), meta.Head=1, st.GetAt(key, 1) == "v"

Minor precision note (non-blocking): The assertion require.Greater(t, coord.dispatches, 2, ...) is intentionally loose. Given the fixed injection pattern (attempt 1 pre-rejects, attempt 2 OCC-conflicts, attempt 3 recomputes), == 3 would be tighter. > 2 is safe but won't catch a future regression where extra dispatches pile up before success. Low severity.


onePhaseTxnRequest Deletion (kv/coordinator.go)

Dead code correctly removed. All production paths route through onePhaseTxnRequestWithPrevCommit since round-2. golangci unused lint failure is resolved. ✓


optionalV2Fields Comment (kv/txn_codec.go)

The revised comment accurately describes the actual relationship: only the decoder iterates the table; the encoder uses matching inline conditionals. The "update both together" guidance is the actionable safeguard. ✓


Carry-over from Round-2 (unchanged, not in scope for Round-3)

For tracking — these remain open:

  1. CommittedVersionAt retention guard (store/lsm_store.go:778, store/mvcc_store.go:476, codex P2): Both backends return (false, nil) when the probe timestamp is below minRetainedTS, making a compacted-history miss indistinguishable from "never landed." Should return ErrReadTSCompacted. Low urgency while retry windows are far shorter than compaction windows.

  2. buildRedirectRequests unit coverage: No test calls this path with non-zero PrevCommitTS to verify the field survives the leader-redirect path. The code is correct (verified in round-2); the gap is test coverage only.


Summary: Round-3 is correct and complete for its stated scope. The two carry-over items (retention guard, redirect test) are the only open threads before merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65daa3a2f2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/redis.go
length: updatedMeta.Len,
readKeys: boundaryReads,
}
return errors.WithStack(dispErr)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep dedup state for non-retryable ambiguous errors

When the first dispatch returns an ambiguous error outside store.ErrWriteConflict/kv.ErrTxnLocked (for example a caller deadline/cancellation after the proposal may still apply), this path records pending but then returns the original dispErr to retryRedisWrite; adapter/redis_retry.go only retries write conflicts and txn locks, so the saved write set is discarded as the command returns. A client retry then starts a fresh listPushCoreWithDedup with no PrevCommitTS and recomputes a new sequence, leaving the duplicate-list-element case unfixed for those ambiguity modes.

Useful? React with 👍 / 👎.

Comment thread adapter/redis.go
// docs/design/2026_05_21_proposed_txn_secondary_idempotency.md). It
// MUST stay off until every node runs a probe-aware binary — see R5
// (FSM determinism across a rolling upgrade). Default off; enabled via
// WithOnePhaseTxnDedup / the ELASTICKV_REDIS_ONEPHASE_DEDUP env var
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wire the advertised env flag into server construction

This says operators can enable the feature with ELASTICKV_REDIS_ONEPHASE_DEDUP, but a repo-wide search shows that string only appears in docs/comments, and NewRedisServer only reads ELASTICKV_REDIS_TRACE; cmd/server/demo.go also never passes WithOnePhaseTxnDedup. In the production server path, setting the documented env var therefore has no effect and the feature remains disabled after rollout.

Useful? React with 👍 / 👎.

…ve on retryable (#796 codex P1+P2 round-4)

codex P1 (06:51, adapter/redis.go:3339) — Keep dedup state for non-retryable
ambiguous errors: listPushCoreWithDedup recorded 'pending' unconditionally
on any first-attempt error, but retryRedisWrite only loops on
store.ErrWriteConflict / kv.ErrTxnLocked, so for other errors the saved
write set was discarded with the goroutine — misleading and useless.

Fix the code's honesty: only set/advance 'pending' when isRetryableRedisTxnErr
will actually trigger the loop. Visible behavior is unchanged in the
non-retryable case (pending was always going to be lost), but the code no
longer pretends to hold state across a request boundary.

The broader 'ambiguous error escapes retryRedisWrite then the client
retries the command' window (context-deadline / NOTLEADER → fresh
listPushCoreWithDedup with no PrevCommitTS) is a separate problem space
requiring a cross-request idempotency cache — out of scope for option 2
and noted in the doc as a follow-up.

codex P2 (06:51, adapter/redis.go:249) — wire the advertised env flag:
ELASTICKV_REDIS_ONEPHASE_DEDUP was documented as an opt-in but
NewRedisServer never read it. Add the env-var read in the struct literal
next to ELASTICKV_REDIS_TRACE (default off; WithOnePhaseTxnDedup option
can still override either way).

Caller audit (loop semantic-change rule):
- pending-save gating is semantically a no-op (the unsaved state was
  always discarded by retryRedisWrite returning); no behavior change
  for non-retryable errors, no caller impact.
- env wiring expands the opt-in surface. NewRedisServer is the only
  production constructor path; tests that bypass it (literal &RedisServer)
  set onePhaseTxnDedup directly and are unaffected.

Also tighten TestListPushDedup_InterveningRPopRefusesStaleReuse's
'dispatches > 2' to '== 3' per claude round-3 minor (pins the exact
sequence so a future regression that piles extra dispatches before
success doesn't slip past).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 28, 2026

@claude review

Round-4 addressing codex P1 + P2 posted at 06:51 (post-65daa3a2) plus your round-3 minor on the loose dispatch-count assertion.

codex P1 — pending saved even for non-retryable errors (adapter/redis.go:3339):
listPushCoreWithDedup recorded pending unconditionally on any first-attempt error, but retryRedisWrite only loops on store.ErrWriteConflict / kv.ErrTxnLocked, so for other errors the saved write set was discarded with the goroutine — misleading and useless.

Fix: only set/advance pending when isRetryableRedisTxnErr will actually trigger the loop. Both the fresh-attempt and reuse-retryable branches now gate the pending write/update. Visible behavior is unchanged in the non-retryable case (pending was always going to be lost); the code no longer pretends to hold state across a request boundary.

Scope-defining note: the broader window where an ambiguous error escapes retryRedisWrite and the client retries the whole command at a higher level (so the next listPushCoreWithDedup starts with no PrevCommitTS) requires a cross-request idempotency cache — out of scope for option 2. Will doc as a follow-up.

codex P2 — env flag advertised but never wired (adapter/redis.go:249):
ELASTICKV_REDIS_ONEPHASE_DEDUP was documented as the opt-in but NewRedisServer only read ELASTICKV_REDIS_TRACE. Added the env read in the struct literal next to it. Default off (!= "1"); WithOnePhaseTxnDedup option still overrides either way.

Round-3 minor (claude): tightened TestListPushDedup_InterveningRPopRefusesStaleReuse's dispatches > 2 to == 3 to pin the exact sequence (attempt 1 pre-rejects, reuse OCC-conflicts, recompute lands).

Caller audit (loop rule):

  • pending-save gating is semantically a no-op: the unsaved state was always discarded by retryRedisWrite returning. No behavior change for non-retryable errors; no caller impact.
  • env wiring expands the opt-in surface only through NewRedisServer (the production path). Tests that bypass it (literal &RedisServer{}) set onePhaseTxnDedup directly and are unaffected.

Commit: 7781725d. All 5 dedup tests green; fmt + vet clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 28, 2026

Claude finished @bootjp's task in 3m 22s —— View job


Round-4 Code Review

  • Load tools
  • Read changed files: adapter/redis.go, adapter/redis_list_dedup_test.go
  • Analyze pending-save gating (codex P1 fix)
  • Analyze env wiring (codex P2 fix)
  • Verify dispatch assertion tightening (round-3 minor)
  • Check for remaining open items
  • Post final review

Verdict: Round-4 changes are correct. Both targeted issues are resolved.

The pending-save gating and env wiring changes are sound. The dispatch count is correctly tightened. No new issues introduced.


P1 — Pending-save gating (adapter/redis.go)

Both guard sites are correct.

Fresh-attempt path (adapter/redis.go:3347):
pending = &reusableListPush{...} is now inside if isRetryableRedisTxnErr(dispErr). For non-retryable errors (context deadline, FSM apply failure, NOTLEADER), retryRedisWrite returns immediately after the closure exits — the closure's local pending is discarded regardless. Gating the assignment removes the misleading save-with-no-reader. ✓

For ErrWriteConflict and ErrTxnLocked, pending is still saved and retryRedisWrite loops. ✓

Reuse path (adapter/redis.go:3303):
pending.commitTS = commitTS is now inside if isRetryableRedisTxnErr(dispErr). The ErrWriteConflict early-return at line 3290-3296 already clears pending for that case before reaching line 3303, so the remaining retryable case reaching line 3303 is ErrTxnLocked (ambiguous — may have landed). Gating the update is correct: for non-retryable errors the commitTS update served no purpose since the loop won't fire. ✓

Minor (non-blocking): ErrWriteConflict on fresh attempt triggers an unnecessary probe.
When the fresh attempt returns ErrWriteConflict (definitive rejection — did NOT land), pending is saved and the next iteration takes the reuse path, which dispatches with PrevCommitTS = pending.commitTS. The FSM probes, gets a miss, then OCC on the stale boundary keys also fires (since the conflict implies an intervening write), clears pending, and recomputes. Compared to a fresh path that would go straight to recompute, this wastes one extra dispatch + probe per ErrWriteConflict on first attempt. Correct behavior, just slightly suboptimal. This is pre-existing behavior (not introduced by R4) and is bounded to the failure path, so low priority.


P2 — Env wiring (adapter/redis.go:502)

onePhaseTxnDedup: os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP") == "1",

Set in the struct literal before the option loop at line 515-519. WithOnePhaseTxnDedup(enabled) runs in that loop and correctly overrides the env value in either direction. Default off. ✓

The cmd/server/demo.go entrypoint creates the server via NewRedisServer (not a literal &RedisServer{}), so the env var now reaches production. ✓


Round-3 minor — dispatch count (adapter/redis_list_dedup_test.go:258)

require.Greater(t, coord.dispatches, 2, ...)require.Equal(t, 3, coord.dispatches) pins the exact sequence (attempt 1 pre-rejects, attempt 2 reuse OCC-conflicts, attempt 3 recomputes and lands). Correct and tighter than the previous loose bound. ✓


Carry-over from prior rounds (not addressed in R4, both tracked in design doc)

  1. CommittedVersionAt retention guard (store/lsm_store.go:778, store/mvcc_store.go:476): Both backends return (false, nil) when commitTS < minRetainedTS, making a compacted-history miss indistinguishable from "never landed." Should return ErrReadTSCompacted. Low urgency while compaction windows far exceed retry windows. [codex P2, also noted in Round-3]

  2. buildRedirectRequests unit test: No test exercises the leader-redirect path with non-zero PrevCommitTS to verify it survives. Code is correct (verified in Round-2); gap is test coverage only.


Round-4 is clean. No further blocking issues. The two carry-over items are the only remaining open threads before merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7781725d38

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/redis.go Outdated
// Either the prior attempt landed (probe hit → no-op) or this
// reuse applied the identical write set. The post-push length
// is the same either way, so reconstruction is just pending.length.
newLen = pending.length
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recompute length after reused push applies

When a saved attempt did not land and another client changes the list length without touching this attempt’s boundary/write keys (for example, an RPUSH gets a retryable pre-apply error, then an LPUSH commits before this reuse), the reused dispatch can apply successfully but this line still returns the length calculated from the first attempt’s old snapshot. The legacy retry path recomputed in this window, so this can now return 2 while the list actually has 3 elements; the retry path needs to either force a recompute/conflict on intervening length-changing writes or distinguish a probe no-op from a fresh apply.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8dec9f60c9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/redis.go Outdated
Comment on lines +3302 to +3306
currentMeta, _, mErr := r.resolveListMeta(ctx, key, r.readTS())
if mErr != nil {
return 0, mErr
}
return currentMeta.Len, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve landed retry length on non-leader probes

When a list-push retry is resolved on a node that is no longer leader, LeaderRoutedStore.CommittedVersionAt now returns (false, nil), so a reuse whose previous attempt actually landed falls through here and returns the leader's current list length. If another client changes the list between the original append and this retry response, RPUSH/LPUSH can return that later length even though this command's element was inserted at the earlier position, which breaks the command's linearizable Redis result. Fresh evidence since the prior comment: the new non-leader false fallback still drives this resolveListMeta branch instead of preserving pending.length for a no-op retry.

Useful? React with 👍 / 👎.

…#796 codex P1 round-7 follow-up)

codex P1 round-7 (preserve landed-retry length on non-leader probes):
the previous round-7 fix returned (false, nil) unconditionally on a
deposed leader, which let resolveReuseLength take the resolveListMeta
fallthrough even when our prior attempt had actually committed. The
caller then returned the leader's current Len rather than pending.length,
which under concurrent writes could include other clients' commits and
break the per-our-commit linearizable Redis result.

Recover the fast path without adding a new RPC: on the non-leader path,
submit a Raft ReadIndex via the existing LinearizableRead primitive (the
coordinator's on LeaderRoutedStore; the per-group engine's on ShardStore).
ReadIndex forwards to the current leader and waits for our local applied
index to reach the leader's commit point — after that, the local probe
sees every committed version, so we can answer the exact-ts question
authoritatively from the local replica. If ReadIndex fails (no leader,
ctx canceled), keep the existing (false, nil) fallback so the adapter
re-reads via the already-leader-fenced ScanAt/GetAt and still returns a
valid (current-Len) serialization.

Cost: one ReadIndex round-trip on the deposed-node retry path. Bounded
to a rare situation (retry under leader churn); the leader-with-valid-
lease fast path is unchanged.

Caller audit (loop semantic-change rule):
- adapter/redis.go:3294 (RedisServer.resolveReuseLength): now sees
  authoritative true/false during churn, so the pending.length branch
  fires when our prior actually landed. The fallthrough to
  resolveListMeta still triggers correctly when the probe genuinely
  misses or ReadIndex fails.
- kv/fsm.go:505: still routes through the raw store (not these wrappers),
  so the deterministic FSM apply path is untouched.

All six dedup tests pass; gofmt + vet clean.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 28, 2026

@claude review

Round-8 addressing the new codex P1 at line 3306 (post-round-7): Preserve landed retry length on non-leader probes.

Round-7 returned (false, nil) unconditionally on a deposed leader, which let resolveReuseLength take the resolveListMeta fallthrough even when our prior had actually committed. The caller then returned the leader's current Len rather than pending.length, which under concurrent writes could include other clients' commits — breaking the per-our-commit linearizable Redis result codex flagged.

Fix: recover the fast path without a new RPC. On the non-leader path, submit a Raft ReadIndex via the existing LinearizableRead primitive (coordinator's on LeaderRoutedStore; per-group engine's on ShardStore). ReadIndex forwards to the current leader and waits until our local applied index reaches the leader's commit point — after that the local probe sees every committed version, so we can answer the exact-ts question authoritatively from the local replica. If ReadIndex fails (no leader reachable, ctx canceled), keep the (false, nil) fallback so the adapter still re-reads via the leader-fenced ScanAt/GetAt.

Cost: one ReadIndex round-trip on the deposed-node retry path (a rare situation). The leader-with-valid-lease fast path is unchanged.

Caller audit (loop rule):

  • adapter/redis.go:3294 (resolveReuseLength): now sees authoritative true/false during churn, so the pending.length branch fires when our prior actually landed. The resolveListMeta fallthrough still triggers when the probe genuinely misses or ReadIndex fails.
  • kv/fsm.go:505: routes through the raw store (not these wrappers), so the deterministic FSM apply path is untouched.

All 6 dedup tests + kv/store suites pass; gofmt + vet clean.

Commit: bc11779d.

@claude
Copy link
Copy Markdown

claude Bot commented May 28, 2026

Claude finished @bootjp's task in 3m 30s —— View job


Round-8 Code Review

  • Load tools
  • Read changed files: kv/leader_routed_store.go, kv/shard_store.go
  • Analyze ReadIndex-fence approach for correctness
  • Verify FSM path isolation
  • Check carry-over items
  • Post final review

Verdict: Round-8 changes are correct. The codex P1 "preserve landed retry length on non-leader probes" gap is closed on both wrappers.

The ReadIndex-fence approach is the right mechanism and the implementation is sound end-to-end. Three precision notes follow.


LeaderRoutedStore.CommittedVersionAt (kv/leader_routed_store.go:217)

if !s.leaderOKForKey(ctx, key) {
    if s.coordinator == nil { return false, nil }
    if _, err := s.coordinator.LinearizableRead(ctx); err != nil {
        return false, nil
    }
}
exists, err := s.local.CommittedVersionAt(ctx, key, commitTS)

Correctness of the ReadIndex fence:
Coordinate.LinearizableRead(ctx)linearizableReadEngineCtxengine.submitRead(ctx, true). With waitApplied=true, the engine submits a ReadIndex request to the Raft group: the current leader confirms its commit index and the follower waits until its local applied index reaches that point. Because:

  • The reuse dispatch was already committed (it returned a result before resolveReuseLength is called)
  • That commit is at some log index I_commit
  • The leader's commit index when the ReadIndex is submitted is ≥ I_commit
  • After LinearizableRead returns, the local applied index ≥ ReadIndex target ≥ I_commit
  • s.local.CommittedVersionAt(ctx, key, commitTS) reads from the same local store that the FSM writes to

Therefore, after LinearizableRead returns successfully, the probe correctly answers "did our prior attempt land at T1?" ✓

leaderOKForKey semantics — exact-ts probe doesn't need timestamp advancement: leaderOKForKey discards the fence timestamp from leaderFenceTS. For exact-ts probes this is correct — we're asking "does version T1 exist?", not "read at max(ts, fenceTS)". Advancing the read timestamp would change the semantics from exact-version existence to snapshot visibility, defeating the dedup invariant. ✓

nil-coordinator fallback: s.coordinator == nil returns (false, nil)resolveReuseLength falls through to resolveListMeta. Correct for test fixtures that don't wire a coordinator. ✓


ShardStore.CommittedVersionAt (kv/shard_store.go:118)

if !isLinearizableRaftLeader(ctx, engine) {
    if _, err := engine.LinearizableRead(ctx); err != nil {
        return false, nil
    }
}

No double ReadIndex on the non-leader path. isLinearizableRaftLeader calls isLeaderEngine(engine) first — engine.State() == raftengine.StateLeader. On a follower, isLeaderEngine returns false immediately and isLinearizableRaftLeader short-circuits without issuing a ReadIndex. Only then does CommittedVersionAt enter the non-leader branch and issue one ReadIndex via engine.LinearizableRead(ctx). The concern about double-charging the ReadIndex latency does not apply. ✓

Leader path cost: When we ARE the leader and the lease is valid, isLinearizableRaftLeader returns via leaseReadEngineCtx fast path (no ReadIndex), and we fall through directly to g.Store.CommittedVersionAt. This is identical to the pre-R8 behavior on the leader. ✓

Leader with expired lease: isLinearizableRaftLeader calls leaseReadEngineCtxengine.LinearizableRead(ctx) → one ReadIndex. Returns true on success → reads local. One ReadIndex, same as all other GetAt/ScanAt callers. ✓

nil-engine guard (test fixtures): engine == nil → reads g.Store.CommittedVersionAt directly. Preserves existing test behavior where ShardStore is wired without Raft. ✓


FSM Path Isolation

kv/fsm.go:505 calls f.store.CommittedVersionAt where f.store is the per-shard raw store (e.g., *store.mvccStore or *store.pebbleStore) wired via NewKvFSMWithHLC(st, hlc). This is neither a LeaderRoutedStore nor a ShardStore — the R8 ReadIndex-fence methods don't affect it. The FSM apply path continues to read the local replica deterministically with no Raft round-trip. ✓


Minor: LeaderRoutedStore uses coordinator-level ReadIndex, not engine-level

s.coordinator.LinearizableRead(ctx) calls Coordinate.LinearizableRead(ctx)linearizableReadEngineCtx(ctx, c.engine) → single-group engine. In a single-group deployment this is correct.

In a multi-group deployment, r.store would be a ShardStore (not LeaderRoutedStore), so this code path is single-group only. No issue — just noting the topology assumption. ✓


Open P1 from Round-6 (still unaddressed after Round-8)

The self-inflicted reuse conflict: when a reuse dispatch applies at commitTS2 but returns an ambiguous error before success is observed, the next retry (with PrevCommitTS=T1) sees ErrWriteConflict from its own committed reuse write — dispatchListPushReuse treats any reuse WriteConflict as "genuine third-party conflict," clears pending, and recomputes, potentially appending a duplicate. R8 does not address this; it's the last open correctness thread before merge.

Track this →


Carry-over from Prior Rounds (unchanged)

  1. CommittedVersionAt retention guard (store/lsm_store.go:778, store/mvcc_store.go:476): Both backends return (false, nil) when commitTS < minRetainedTS instead of ErrReadTSCompacted. In the degraded case (compaction window shorter than retry window), resolveReuseLength safely falls through to resolveListMeta. Low urgency.

  2. buildRedirectRequests unit test: No test verifies PrevCommitTS survives the leader-redirect path with a non-zero value. Code correct (round-2); gap is test coverage only.


Summary: Round-8 is correct and complete for its stated scope. The ReadIndex-fence correctly turns the non-leader probe from "unconditional false" into "wait-and-read-local," preserving pending.length when our prior attempt actually committed. The open P1 from Round-6 (self-inflicted reuse conflict) and the two carry-over items remain the only threads before merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc11779d7e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread kv/shard_store.go Outdated
Comment on lines +142 to +143
if _, err := engine.LinearizableRead(ctx); err != nil {
return false, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Proxy sharded exact-version probes to the leader

When Redis is backed by ShardStore and a list-push retry is served on a follower or deposed replica, this fallback reports a miss: the etcd raft engine’s LinearizableRead is leader-only for followers/learners, so this branch returns (false, nil) instead of querying the leader for the exact version. After Dispatch has forwarded the retry and the leader no-oped a previously landed attempt, resolveReuseLength falls back to the leader’s current list length; if another client changed the list after the original commit, RPUSH/LPUSH returns that later length rather than this command’s linearized result. Fresh evidence beyond the prior single-group comment is that this new ShardStore.CommittedVersionAt path has the same false fallback for sharded stores.

Useful? React with 👍 / 👎.

Comment thread kv/sharded_coordinator.go
// so that validateReadOnlyShards can issue a linearizable read barrier,
// preserving SSI.
return c.dispatchSingleShardTxn(ctx, startTS, commitTS, primaryKey, gids[0], elems, readKeys)
return c.dispatchSingleShardTxn(ctx, startTS, commitTS, prevCommitTS, primaryKey, gids[0], elems, readKeys)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject retries that leave the one-phase path

This single-shard branch is the only place prevCommitTS is preserved; if a retry carries PrevCommitTS but its mutations or read keys span shards, execution falls through to the 2PC path (prewriteTxn/commitPrimaryTxn), whose log builders encode only CommitTS and silently drop the dedup probe. In that multi-shard retry scenario a landed ambiguous attempt is treated as an ordinary write conflict, the adapter drops the saved write set and recomputes, which can reintroduce the duplicate push this feature is meant to prevent; if multi-shard retries are out of scope, fail them here instead of accepting PrevCommitTS without honoring it.

Useful? React with 👍 / 👎.

…lse positive (#796 lint)

reviewdog/golangci on bc11779 flagged two nilerr violations from the
LinearizableRead-fence pattern I introduced in round-8:

  kv/leader_routed_store.go:226 error is not nil (line 225) but it returns nil
  kv/shard_store.go:143         error is not nil (line 142) but it returns nil

The nilerr signature is 'if err != nil { return nil }'. My intent —
treating ReadIndex failure as 'couldn't verify, fall back to the
leader-routed slow path' rather than as a real error to surface to the
caller — is correct and matches LatestCommitTS's contract, but the
linter has no way to see that.

Extract small helpers (LeaderRoutedStore.tryLinearizableFence and the
package-level tryEngineLinearizableFence) that report ReadIndex success
as a bool, then express the call site as 'if !leaderOK && !fenceOK { fall
back }'. Same semantics, no error swallowed at a return statement, and
the linter is satisfied without a //nolint comment.

Behavior unchanged: all six dedup tests pass; gofmt + vet clean.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 28, 2026

@claude review

Round-9 fixing the two nilerr lint violations reviewdog/golangci flagged on round-8's LinearizableRead fence:

kv/leader_routed_store.go:226 error is not nil (line 225) but it returns nil
kv/shard_store.go:143         error is not nil (line 142) but it returns nil

The nilerr signature is if err != nil { return nil }. My intent — treating ReadIndex failure as "couldn't verify, fall back to the leader-routed slow path" rather than as a real error to surface to the caller — matches LatestCommitTS's contract but the linter has no way to see that.

Fix: extract small helpers (LeaderRoutedStore.tryLinearizableFence and package-level tryEngineLinearizableFence) that report ReadIndex success as a bool, then express the call site as:

if !leaderOK && !fenceOK {
    return false, nil
}

Same semantics, no error swallowed at a return statement, no //nolint comment.

All 6 dedup tests pass; gofmt + vet clean.

Commit: b4695108.

@claude
Copy link
Copy Markdown

claude Bot commented May 28, 2026

Claude finished @bootjp's task in 2m 49s —— View job


Round-9 Code Review

  • Load tools
  • Read changed files: kv/leader_routed_store.go, kv/shard_store.go
  • Analyze tryLinearizableFence / tryEngineLinearizableFence helper extraction
  • Verify lint fix and semantic equivalence
  • Check carry-over items from prior rounds
  • Post final review

Verdict: Round-9 changes are correct. Both nilerr violations are resolved without behavior change.

The helper extraction correctly converts the "error-treated-as-bool" pattern into a proper bool-returning API. Semantic equivalence with R8 is verified. One minor observation about the helpers' relationship to each other follows.


Nilerr Fix — LeaderRoutedStore.CommittedVersionAt (kv/leader_routed_store.go:221)

Old R8 pattern (triggered nilerr):

if !s.leaderOKForKey(ctx, key) {
    if s.coordinator == nil { return false, nil }
    if _, err := s.coordinator.LinearizableRead(ctx); err != nil {
        return false, nil  // ← nilerr: err ≠ nil but return nil error
    }
}

New R9 pattern (clean):

if !s.leaderOKForKey(ctx, key) && !s.tryLinearizableFence(ctx) {
    return false, nil  // ← no error in scope; clean
}

Semantic equivalence confirmed:

State !leaderOK !fenceOK Old result New result
Leader, valid lease false (short-circuit) read local read local ✓
Not leader, ReadIndex succeeds true false read local read local ✓
Not leader, ReadIndex fails true true (false, nil) (false, nil)
Not leader, coordinator nil true true (false returned by helper) (false, nil) (false, nil)

The coordinator-nil case previously had an explicit early return (if s.coordinator == nil { return false, nil }). In R9, tryLinearizableFence guards s.coordinator == nilreturn false!fenceOK = true → combined condition true → same result. ✓


Nilerr Fix — ShardStore.CommittedVersionAt (kv/shard_store.go:133)

Old R8 pattern (triggered nilerr):

if !isLinearizableRaftLeader(ctx, engine) {
    if _, err := engine.LinearizableRead(ctx); err != nil {
        return false, nil  // ← nilerr
    }
}

New R9 pattern (clean):

if !isLinearizableRaftLeader(ctx, engine) && !tryEngineLinearizableFence(ctx, engine) {
    return false, nil
}

Semantically identical under the same case analysis. The isLinearizableRaftLeader short-circuit means no tryEngineLinearizableFence is called when the leader check passes, preserving the R8 cost profile. ✓

Note (non-blocking, pre-existing from R8): For the "leader with expired lease AND first ReadIndex fails" case, isLinearizableRaftLeader internally calls engine.LinearizableRead and fails, then tryEngineLinearizableFence tries a second ReadIndex. This double-ReadIndex was present in R8 and is preserved in R9 — not introduced by the refactor. It's bounded to a narrow failure window (expired lease + unreachable leader).


Helper Design

tryLinearizableFence (method on LeaderRoutedStore) vs tryEngineLinearizableFence (package-level function): The asymmetry is intentional and correct. LeaderRoutedStore holds the Coordinate interface, while ShardStore accesses the raftengine.LeaderView directly. The method form allows the s.coordinator == nil guard to be expressed naturally inside the helper; the package-level form takes the engine as a parameter and guards engine == nil.

Both helpers have clear comment blocks explaining the intentional error-suppression design — this is the key requirement for avoiding a //nolint directive (the intent is readable, the type system expresses it). ✓


Codex P2 from R8 — kv/sharded_coordinator.go:393 (new, not addressed in R9)

The latest codex review (on bc11779) flagged that multi-shard retries with a non-zero PrevCommitTS fall through to the 2PC path (prewriteTxn/commitPrimaryTxn), which doesn't encode PrevCommitTS in the log entries. The dedup probe never reaches the FSM for multi-shard retries.

In practice, the list-push dedup feature operates only on single-key (single-shard) RPUSH/LPUSH commands, so this gap doesn't affect the primary use case. Codex's suggestion — fail multi-shard retries with non-zero PrevCommitTS explicitly rather than silently honoring them without dedup semantics — is a reasonable defensive measure. Worth tracking for a follow-up but not blocking merge.


Carry-over from Prior Rounds (unchanged)

  1. Open P1 from Round-6 (adapter/redis.go, dispatchListPushReuse): self-inflicted reuse conflict — reuse dispatch applies at commitTS2 but returns ambiguous error; next retry sees ErrWriteConflict from its own committed write, clears pending, and potentially appends a duplicate. The fix is to probe CommittedVersionAt(firstWriteKey(pending.ops), pending.commitTS) before treating a reuse-path WriteConflict as a genuine third-party conflict.

  2. CommittedVersionAt retention guard (store/lsm_store.go:778, store/mvcc_store.go:476): Both backends return (false, nil) when commitTS < minRetainedTS instead of ErrReadTSCompacted. In the degraded case, resolveReuseLength gracefully falls through to resolveListMeta. Low urgency while compaction windows far exceed retry windows.

  3. buildRedirectRequests unit test: No test verifies PrevCommitTS survives the leader-redirect path with a non-zero value. Code is correct (verified round-2); gap is test coverage only.


Summary: Round-9 is a correct, minimal lint fix. The nilerr violations are resolved by converting the error-used-as-bool pattern into properly-typed bool-returning helpers with clear documentation. No behavior change. The open P1 (self-inflicted reuse conflict), the new codex P2 (multi-shard PrevCommitTS silent drop), and the two carry-over items remain the only open threads before merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4695108a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/redis.go
return pending.length, nil
}
}
currentMeta, _, mErr := r.resolveListMeta(ctx, key, r.readTS())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid failing after a reuse apply

When the prior attempt did not land and the reused write set applies successfully, this post-commit resolveListMeta can still fail (for example, the list had exactly MaxDeltaScanLimit deltas at the first read and an intervening non-conflicting push pushes it over the truncation limit). That returns an error to the client after the append has already committed; a client retry starts a fresh command with no pending state and can append the same element again. The legacy path did not add a fallible metadata read after a successful dispatch, so the reuse path needs to avoid surfacing this as a failed write or preserve enough idempotency state for the retry.

Useful? React with 👍 / 👎.

Round 10 of /loop review. Addresses 5 outstanding findings from
gh api repos/bootjp/elastickv/pulls/796/comments:

P1 (self-inflicted conflict) — adapter/redis.go:dispatchListPushReuse
  When a reuse-path dispatch returns ErrWriteConflict but our own
  reused commit_ts already landed (e.g. raft re-applied while caller
  saw conflict), surface success instead of bubbling the conflict.
  Probe CommittedVersionAt(probeKey, commitTS); if it hits, treat as
  no-op and resolve length via the normal reuse path.

P1 (linchpin doc) — docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
  Re-prove the "E1 applies before E2 or never" race-freedom argument
  under the *cancellable* caller ctx that the current code threads
  through (applyRequests(ctx, ...) → Engine.Propose(ctx, ...) with a
  live ctx.Done() early-return). Ordering now derives from FIFO
  proposeCh + monotonic raft log indices and the fact that retries
  are fresh proposals on the new leader, not re-issues — not from
  context.Background() blocking.

P2 (retention guard) — store/{lsm_store,mvcc_store}.go:CommittedVersionAt
  Return ErrReadTSCompacted distinctly when the probed commit_ts is
  below the retention watermark; downstream callers (FSM, adapter)
  treat as "couldn't probe, fall back to safe path" — uniform with
  every other ErrReadTSCompacted caller in the tree (audited).

P2 (multi-shard reject) — kv/sharded_coordinator.go:dispatchTxn
  Fail-closed: return ErrTxnDedupRequiresSingleShard when an
  incoming txn request carries PrevCommitTS but its write set spans
  shards. The 2PC log builders encode only CommitTS, so honoring
  such a request would silently drop the dedup probe at the FSM and
  let the original :duplicate-elements anomaly reappear.

P2 (avoid failing after reuse apply) — adapter/redis.go:resolveReuseLength
  On reuse, if the post-commit meta-read fails, fall back to
  pending.length rather than propagating the read error — the apply
  has already landed, so refusing the client response is strictly
  worse than reporting the locally-known length.

Caller audit (per /loop semantic-change rule):
- CommittedVersionAt: 5 call sites (FSM, 2× adapter reuse paths,
  ShardStore router, LeaderRoutedStore router). All treat
  (false, ErrReadTSCompacted) as "fall back to safe path"; uniform.
- ErrTxnDedupRequiresSingleShard: single emit site (multi-shard
  reject); surfaced to client as a commit error — fail-closed by
  design.
- ErrReadTSCompacted: existing callers (grpc adapter, dualwrite,
  s3 adapter, kv compactor) already treat as soft signal. New FSM
  + adapter call sites are consistent.

Tests:
- adapter/redis_list_dedup_test.go: added
  TestListPushDedup_SelfInflictedReuseConflict_ReturnsSuccess for
  the P1 regression (attempt 1 pre-rejects, reuse lands then errors
  with WriteConflict → expect n=1, dispatches=2, probeNoOps=0).
- go test ./adapter/ -run ListPushDedup → 7/7 pass.
- go test ./kv/ ./store/ → both packages pass.
- gofmt + go vet → clean.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 29, 2026

@claude review

Round 10: addressed all 5 outstanding inline findings (commit 71430f2).

P1

  1. Self-inflicted reuse conflict (adapter/redis.go dispatchListPushReuse) — on ErrWriteConflict after reuse, probe CommittedVersionAt(probeKey, commitTS); if our own reused commit landed, return success instead of bubbling the conflict. Regression test TestListPushDedup_SelfInflictedReuseConflict_ReturnsSuccess added.
  2. Linchpin doc (docs/design/2026_05_21_proposed_txn_secondary_idempotency.md) — re-proved race-freedom under cancellable caller ctx. Ordering derives from FIFO proposeCh + monotonic raft log indices and fresh-proposal-on-new-leader semantics, not from context.Background() blocking (which was the stale argument).

P2
3. Retention guard (store/{lsm_store,mvcc_store}.go CommittedVersionAt) — return ErrReadTSCompacted when probed commit_ts is below the retention watermark; callers fall back to safe path. Uniform with all other ErrReadTSCompacted consumers in tree.
4. Multi-shard PrevCommitTS reject (kv/sharded_coordinator.go dispatchTxn) — fail-closed ErrTxnDedupRequiresSingleShard when an incoming txn request carries PrevCommitTS but spans shards. The 2PC log builders encode only CommitTS, so silently honoring such a request would drop the probe at the FSM and let :duplicate-elements reappear.
5. Avoid failing after reuse apply (adapter/redis.go resolveReuseLength) — on reuse, if the post-commit meta-read fails, fall back to pending.length; the apply has already landed, refusing the client response is strictly worse than reporting the locally-known length.

Caller audit (per the /loop semantic-change rule):

  • CommittedVersionAt: 5 call sites; all treat (false, ErrReadTSCompacted) uniformly as "couldn't probe, fall back to safe path".
  • ErrTxnDedupRequiresSingleShard: single emit site, surfaced to client as a commit error — fail-closed by design.
  • ErrReadTSCompacted: existing callers (grpc adapter, dualwrite, s3 adapter, kv compactor) already treat as soft signal; new sites consistent.

Tests: go test ./adapter/ -run ListPushDedup 7/7 pass (incl. new P1 regression); go test ./kv/ ./store/ both pass; gofmt + go vet clean.

@github-actions
Copy link
Copy Markdown
Contributor

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (71430f2):

  • kv/coordinator.go
  • kv/fsm.go
  • kv/sharded_coordinator.go
  • store/mvcc_store.go

What to check, by subsystem:

  • kv/hlc*.goNext() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
  • kv/coordinator.go, kv/sharded_coordinator.goRunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
  • kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
  • kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
  • store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
  • distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.


@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create a Codex account and connect to github.

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Claude finished @bootjp's task in 5m 48s —— View job


Round-10 Code Review

  • Load tools
  • Read changed files: adapter/redis.go, kv/fsm.go, store/lsm_store.go, kv/sharded_coordinator.go, design doc, test file
  • Analyze P1: self-inflicted reuse conflict fix
  • Analyze P1: linchpin doc race-freedom re-proof
  • Analyze P2: retention guard (ErrReadTSCompacted)
  • Analyze P2: multi-shard PrevCommitTS reject
  • Analyze P2: resolveReuseLength fallback on meta-read failure
  • Check carry-over items from prior rounds
  • Post final review

Verdict: Round-10 changes are correct and complete. All 5 open items are resolved.

This is the cleanest round yet. Each fix is well-motivated by the prior review thread and each has a precise test that pins the exact invariant. No new issues introduced. One minor observation and two carry-over items remain after this round.


P1 — Self-Inflicted Reuse Conflict (adapter/redis.go:3265–3285)

The three-path structure at ErrWriteConflict is correct:

ErrWriteConflict case Probe result Action
Reuse landed at T2 (self-inflicted) CommittedVersionAt(probeKey, T2) == true Update pending.commitTS = T2, call resolveReuseLength, return success
Genuine third-party conflict CommittedVersionAt(probeKey, T2) == false Drop pending, recompute
Probe itself errors / empty probeKey Same as miss: drop pending, recompute (safe conservative fallback)

Key design choices verified:

  • Probe uses the REUSE's commitTS (T2), not T1: correct. A self-inflicted conflict means our reuse write at T2 is now conflicting with itself, not T1.
  • pending.commitTS = T2 after probe hit: correct. The next retry (if any) must probe T2, since that is now the prior-committed version identity. Without this update, a subsequent retry would probe the wrong point.
  • Probe-error as miss: correct. If we can't probe T2 (store failure), recomputing is the safe fallback — we'd lose the "remember prior length" optimisation in an extreme scenario, but no duplicate write occurs.

Regression test correctness (TestListPushDedup_SelfInflictedReuseConflict_ReturnsSuccess): the test correctly simulates leadership churn by having the mock coordinator apply the reuse write atomically and then return ErrWriteConflict. The probe of T2 hits (the write is in the store), pending.commitTS is updated, resolveReuseLength returns pending.length == 1, and no recompute dispatch fires (coord.dispatches == 3, not 4). ✓


FSM Handling of ErrReadTSCompacted (kv/fsm.go:506–521)

The three-case switch is the exactly right structure:

switch {
case perr == nil && landed:      // no-op: prior committed, skip apply
case perr == nil:                 // probe miss: fall through to apply
case errors.Is(perr, store.ErrReadTSCompacted):  // can't tell: apply (safer than wrong no-op)
default:                          // unexpected error: surface it
}

The ErrReadTSCompacted branch's comment correctly explains the tradeoff:

  • Wrong no-op: would silently drop the retry's mutations forever — unacceptable
  • Hard failure: aborts every replica deterministically on a benign compaction condition — Raft FSM must be deterministic
  • Fall through to apply: a stale extra delta is a strict improvement — the prior round's concern is now explicitly addressed in code

The default branch surfaces unexpected errors via errors.WithStack(perr). This is correct: we only swallow ErrReadTSCompacted because it has a well-understood fallback; unknown errors from the store deserve visibility. ✓


Retention Guard (store/lsm_store.go:774–797, store/mvcc_store.go:476–491)

Both backends:

  1. Check readTSCompacted(commitTS, effectiveMinRetainedTS()) before the Pebble/in-memory lookup ✓
  2. Return (false, ErrReadTSCompacted) on the compacted case ✓

The compaction guard is now surface-uniformly across all timestamped store operations, and callers have explicit handling paths for it at every layer:

  • FSM: explicit errors.Is case → fall through to apply (correct)
  • resolveReuseLength: default case in switch → return pending.length (correct)
  • dispatchListPushReuse self-conflict probe: probe-error-as-miss → recompute (correct)
  • LeaderRoutedStore/ShardStore wrapper: wrappers don't check this specifically, they delegate to the raw store — the ErrReadTSCompacted from the raw store propagates up through LeaderRoutedStore.CommittedVersionAt and ShardStore.CommittedVersionAt to the caller above. Verify: ShardStore.CommittedVersionAt returns g.Store.CommittedVersionAt(...) when the leader fence passes — this correctly propagates the error. ✓

Multi-Shard PrevCommitTS Reject (kv/sharded_coordinator.go:405–407)

Placement is correct: the guard fires AFTER the single-shard fast path has already returned (line 387), so it is unreachable for any realistic RPUSH/LPUSH command (always single-key, always single-shard). The guard is defensive against future regressions where the dedup opt-in could accidentally be combined with multi-shard write sets.

The comment at lines 396–403 accurately explains why silent drop is wrong: the 2PC log builders encode only CommitTS, so the FSM probe would never fire on the secondary commit path, and the adapter's recompute-on-conflict fallback would re-append the duplicate.

One thing to confirm: ErrTxnDedupRequiresSingleShard is not in isRetryableRedisTxnErr, so it surfaces to the client as a command error (not retried by retryRedisWrite). That is the intended fail-closed behavior. ✓


resolveReuseLength Meta-Read Fallback (adapter/redis.go:3320–3339)

The switch/fallthrough structure is cleaner than the if-chain in prior rounds:

switch {
case perr == nil && hit:  → return pending.length  (FSM no-op'd at prevCommitTS)
case perr == nil:          → fall through to resolveListMeta  (FSM applied at fresh ts)
default:                   → return pending.length  (degrade gracefully)
}

And then resolveListMeta failure returns pending.length as a second-level fallback. The docblock covers three specific failure modes (ErrReadTSCompacted, other probe errors, MaxDeltaScanLimit overflow on meta-read). The reasoning is sound: surfacing an error AFTER a committed apply would drive the client into a retry with no pending state, defeating the dedup mechanism entirely.

Correctness for the self-inflicted case: after a self-inflicted conflict fix updates pending.commitTS = T2, resolveReuseLength probes T2 (not T1). The probe hits (we just landed at T2), and pending.length (from T1's snapshot) is returned. This is correct: the reuse applied the same ops as T1 would have (same seq numbers), so the post-push length is identical. ✓


Linchpin Doc Race-Freedom Re-Proof

The corrected proof is valid. The key fix: race-freedom was previously attributed to context.Background() preventing Propose from early-returning, which is factually wrong (the actual path threads the caller ctx). The correct invariant chain is:

  1. Sequential dispatch: adapter blocks on E1's dispatch returning before issuing E2
  2. FIFO proposeCh: E1 is enqueued in the channel before E2 (guaranteed by Configure Renovate #1 + single-goroutine dispatch)
  3. Monotonic log indices: Raft assigns indices in dequeue order from proposeCh
  4. Quorum commitment: if E1 committed on leader L1, its log index propagated to a majority → L2 inherits it at a lower index than E2's fresh-tail assignment

The cancelPendingProposal subtlety is correctly handled: cancellation removes the notification entry but the payload may still append to the log. On retry, the adapter creates a NEW proposal (fresh commitTS, fresh request ID), so there is no risk of "E1 being re-proposed at a higher index than E2." The dedup mechanism handles the case where E1's fate is ambiguous (committed-but-notification-lost). ✓


Minor: dispatchListPushReuse probe-error fallback not commented

When CommittedVersionAt(probeKey, commitTS) returns (_, perr) with perr != nil in the self-inflicted conflict guard (lines 3276–3280), the error is silently treated as a miss — the code falls through to return 0, true, errors.WithStack(dispErr). This is the correct safe fallback (if we can't probe, recompute), but unlike every other probe-error path in this file, there is no comment explaining the intent. A one-line note would help the next reader distinguish intentional silence from missing error handling.

Fix this →


Carry-over from Prior Rounds (unchanged, non-blocking)

  1. buildRedirectRequests unit test (from R2): no test verifies PrevCommitTS survives the leader-redirect path. Code is correct (verified R2); test coverage gap only.
  2. Double ReadIndex on expired-lease leader path (from R8): when the leader check in isLinearizableRaftLeader fails (expired lease) and issues one ReadIndex, then tryEngineLinearizableFence issues a second one. Bounded to a narrow expired-lease-plus-unreachable-leader window. Pre-existing from R8, not introduced by R10.

Summary: Round-10 is correct and comprehensive. All five open items (self-inflicted reuse conflict, linchpin doc, retention guard, multi-shard reject, meta-read fallback) are addressed cleanly. The FSM's three-case ErrReadTSCompacted switch is the standout piece — the reasoning is correct and the comment explains the tradeoff exactly. Ready to merge with the two non-blocking carry-over items tracked as future work.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71430f2039

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread kv/fsm.go Outdated
Comment on lines +511 to +514
case errors.Is(perr, store.ErrReadTSCompacted):
// The prior attempt's commit_ts is below the retention watermark
// so we cannot tell "didn't land" from "reclaimed" (codex P2).
// Fall through to apply normally: a stale extra delta is a
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not branch FSM apply on local compaction

When a retry entry with PrevCommitTS is applied after some replicas have advanced their local retention watermark but others have not, this branch makes compacted replicas fall through and apply at the fresh commitTS while replicas that still retain the previous version return nil and no-op. FSM compaction is driven per runtime from local wall clock/store state rather than by this raft entry, so ErrReadTSCompacted is not a deterministic input to raft apply; this can leave different replicas with different MVCC versions for the same log entry.

Useful? React with 👍 / 👎.

Comment thread adapter/redis.go Outdated
Comment on lines +3277 to +3278
pending.commitTS = commitTS
length, lerr := r.resolveReuseLength(ctx, key, pending)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Re-read length after fresh self-conflict apply

In the self-inflicted ErrWriteConflict path, setting pending.commitTS to the fresh attempt before calling resolveReuseLength makes the exact-version probe hit and return pending.length, even though this path just proved the reused write set applied at the fresh commitTS rather than no-oping a prior attempt. If the first attempt did not land and another non-conflicting list change committed between attempts (for example an empty-list RPUSH retry with an intervening LPUSH at a different sequence), the actual post-apply length is larger but the client still gets the stale first-attempt length; fresh evidence beyond the earlier length comment is this new self-conflict branch's update of pending.commitTS before reconstruction.

Useful? React with 👍 / 👎.

Round 11 of /loop review on PR #796. Addresses new findings on commit
71430f2 (round-10).

codex P1 (kv/fsm.go:514) — Do not branch FSM apply on local compaction

Round-10 added an ErrReadTSCompacted fall-through branch in the
one-phase dedup probe, citing "stale extra delta is strict improvement".
Codex correctly flagged this as non-deterministic across replicas: FSM
compaction advances minRetainedTS from local wall clock and per-replica
scheduler (kv/compactor.go:safeMinTS), so replicas applying the same
log entry at the same log index can disagree on the probe outcome —
compacted-replica falls through and applies at the fresh commit_ts,
non-compacted replica returns landed and no-ops. Split-brain.

Fix: revert the retention guard inside CommittedVersionAt itself
(both pebbleStore and mvccStore). The probe is now a single
pebble.Get / sorted-slice lookup and never returns ErrReadTSCompacted.
For the option-2 use case this is deterministic across replicas —
each per-element key carries a single MVCC version that physical
compaction does not retire (no superseding version). The cluster-wide
invariant retentionWindow > max retry latency keeps the rare
never-landed-below-compacted-floor case out of reach. Removed the
now-dead ErrReadTSCompacted branch in FSM apply; the helper
dedupProbeOnePhase has the simple shape landed → no-op,
not-landed → apply, err → propagate. Earlier round-10 P2 finding is
explicitly reverted with rationale documented in the store and design
doc.

ErrReadTSCompacted continues to surface from GetAt/ScanAt/RawGetAt/
RawScanAt (genuine historical reads) where it expresses the truthful
"value at this timestamp is no longer recoverable".

codex P2 (adapter/redis.go:3278) — Re-read length after fresh self-conflict apply

The round-10 self-inflicted-conflict guard set pending.commitTS =
commitTS (fresh attempt) before calling resolveReuseLength, which
probes at pending.commitTS, hits, and returns pending.length — the
previous attempt's projected length. If a non-conflicting list
mutation committed between attempts, the actual post-apply length is
larger but the client gets the stale prior-attempt value.

Fix: split into two helpers.

  - resolveReuseLength: probe at pending.commitTS; on hit return
    pending.length (prior attempt landed, no intervening write
    affected this list).
  - resolveLengthAfterFreshApply: skip the probe shortcut; always
    re-read meta. Used by the self-conflict branch where we know the
    fresh commit_ts landed and pending.length is stale.

Both fall back to pending.length on resolveListMeta failure to honor
codex P2 round-10 (avoid failing after a reuse apply).

reviewdog/golangci-lint failures

1. kv/fsm.go:489 cyclop 13 > 10 — extracted dedupProbeOnePhase helper.
   Determinism rationale lives at the call site; the helper keeps the
   surface area small.
2. adapter/redis.go:3320 unparam (resolveReuseLength's error always
   nil) — dropped the error return; the degraded-fallback contract is
   now expressed in the signature.
3. adapter/redis.go:3336 nilerr — same fix (signature change), no
   longer swallows an error explicitly.
4. kv/sharded_coordinator.go:565 cyclop 11 > 10 on dispatchTxn —
   extracted dispatchMultiShardTxn helper carrying the 2PC branch
   including the round-10 ErrTxnDedupRequiresSingleShard reject.

Caller audit (per /loop semantic-change rule):

- CommittedVersionAt: 5 call sites (FSM dedup probe via new helper,
  adapter dispatchListPushReuse self-conflict guard, adapter
  resolveReuseLength probe shortcut, ShardStore router,
  LeaderRoutedStore router). With retention guard removed, no caller
  sees ErrReadTSCompacted here; existing err != nil paths still fire
  on genuine pebble/network errors and degrade gracefully.
- ErrReadTSCompacted: still emitted by GetAt, ScanAt, RawGetAt,
  RawScanAt. Existing consumers (adapter/grpc.go, adapter/s3.go,
  proxy/dualwrite.go, kv/compactor.go) unchanged.
- resolveReuseLength: signature changed to drop error return. Sole
  callers are dispatchListPushReuse (updated) and doc comments in
  routers; no test mocks return error from this shape.

Tests:
- go test ./adapter/ -run 'ListPush|Dedup|TestRedis_LPUSH|TestRedis_RPUSH' passes.
- go test ./kv/ ./store/ both pass.
- gofmt, go vet, golangci-lint run all clean (0 issues across
  adapter/kv/store packages).
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@github-actions
Copy link
Copy Markdown
Contributor

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (04876c0):

  • kv/coordinator.go
  • kv/fsm.go
  • kv/sharded_coordinator.go
  • store/mvcc_store.go

What to check, by subsystem:

  • kv/hlc*.goNext() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
  • kv/coordinator.go, kv/sharded_coordinator.goRunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
  • kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
  • kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
  • store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
  • distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.


@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create a Codex account and connect to github.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 29, 2026

@claude review

Round 11: addressed the 2 codex findings + 4 reviewdog lint failures on round-10 commit 71430f20 (now 04876c0e).

codex P1 — Do not branch FSM apply on local compaction (kv/fsm.go)

Codex correctly flagged that branching FSM apply on ErrReadTSCompacted is non-deterministic across replicas — minRetainedTS advances from local wall clock per kv/compactor.go:safeMinTS, so two replicas applying the same log entry at the same log index can disagree on the probe outcome and produce diverging MVCC histories.

Fix: reverted the retention guard inside CommittedVersionAt itself (both pebbleStore and mvccStore). The probe is now a single pebble.Get / sorted-slice lookup; it never returns ErrReadTSCompacted. For the option-2 use case this is deterministic — each per-element key carries a single MVCC version that physical compaction does not retire. The dead ErrReadTSCompacted branch in FSM is removed; the new dedupProbeOnePhase helper has the shape landed → no-op, not-landed → apply, err → propagate. This explicitly reverts the round-10 P2 fix with the new rationale documented in store/lsm_store.go, store/mvcc_store.go, and the design doc §race-freedom.

ErrReadTSCompacted continues to surface from GetAt/ScanAt/RawGetAt/RawScanAt for genuine historical reads (grpc adapter, s3 adapter, dualwrite proxy) — unchanged.

codex P2 — Re-read length after fresh self-conflict apply (adapter/redis.go)

The round-10 self-inflicted-conflict guard set pending.commitTS = commitTS before calling resolveReuseLength, which probed at pending.commitTS, hit, and returned pending.length — the prior-attempt projected length, stale w.r.t. intervening non-conflicting writes.

Fix: split into two helpers — resolveReuseLength (probe shortcut for the successful-reuse path where pending.length is correct) and resolveLengthAfterFreshApply (always re-read meta, used by the self-conflict path). Both fall back to pending.length on resolveListMeta failure to honor codex P2 round-10 ("avoid failing after a reuse apply").

reviewdog/golangci-lint failures (4)

  1. kv/fsm.go:489 cyclop 13>10 — extracted dedupProbeOnePhase helper.
  2. adapter/redis.go:3320 unparam — dropped error return from resolveReuseLength (always nil after round-10 fallback-to-pending.length contract).
  3. adapter/redis.go:3336 nilerr — same fix.
  4. kv/sharded_coordinator.go:565 cyclop 11>10 on dispatchTxn — extracted dispatchMultiShardTxn helper.

Caller audit (per /loop semantic-change rule):

  • CommittedVersionAt: 5 call sites; with retention guard removed, err != nil paths still fire on genuine errors and degrade gracefully (no caller depended on the ErrReadTSCompacted semantic specifically — they all fell back to safe path on any error).
  • ErrReadTSCompacted: still emitted by GetAt/ScanAt/RawGetAt/RawScanAt; existing consumers unchanged.
  • resolveReuseLength signature change: sole caller dispatchListPushReuse updated; no external mock returns error from this shape.

Tests: go test ./adapter/ -run 'ListPush|Dedup|TestRedis_LPUSH|TestRedis_RPUSH' passes; go test ./kv/ ./store/ both pass; gofmt + go vet + golangci-lint run all clean (0 issues).

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Claude finished @bootjp's task in 6m 17s —— View job


Round-11 Code Review

  • Load tools
  • Read changed files: fsm.go, sharded_coordinator.go, adapter/redis.go, store/lsm_store.go, store/mvcc_store.go
  • Analyze codex P1 fix: FSM determinism (ErrReadTSCompacted removed)
  • Analyze codex P2 fix: resolveLengthAfterFreshApply split
  • Analyze lint fixes (cyclop, unparam, nilerr)
  • Check TLA+ spec divergence request
  • Post final review

Verdict: Round-11 changes are correct. All 6 items (2 codex + 4 lint) are resolved.

The retention-guard revert is the right call. The self-conflict length split is correct. No new issues introduced.


P1 — ErrReadTSCompacted Removed from CommittedVersionAt (store/lsm_store.go:788, store/mvcc_store.go:480)

The determinism argument is sound and the compaction safety argument holds end-to-end.

Determinism argument verified:
kv/compactor.go:164 calls safeMinTS(now) with time.Now(), and safeMinTS derives the watermark from the local wall clock and per-replica scheduler. Two replicas applying the same log entry at the same log index can have different minRetainedTS values if one compacted recently and the other hasn't run its compactor tick. Branching FSM apply on this value is genuinely non-deterministic → split-brain. The revert is correct. ✓

Compaction safety argument verified by reading shouldDeleteCompactionVersion (store/lsm_store.go:1505):

Timestamps are stored inverted (^ts at fillEncodedKey:339), so the Pebble iterator sees NEWER versions first within a user-key range. The compaction logic keeps the FIRST version at/below minTS (the "visible snapshot") and deletes older ones. For a list element key listItemKey(key, N):

  • If there's only one MVCC version at T1 (no RPOP/LPOP wrote a tombstone): the key is kept by compaction — there are no older versions to delete. CommittedVersionAt(key, T1) correctly returns true forever. ✓

  • If RPOP wrote a tombstone at T2 > T1: compaction with minTS ≥ T2 deletes the T1 version (T2 is the "visible" one, T1 is the "old" one). CommittedVersionAt(key, T1) returns false after this compaction. BUT: the RPOP at T2 modifies the list boundary key (listItemKey(key, Tail-1) or listItemKey(key, Head)). The retry's pending.readKeys includes this boundary key at pending.startTS. If T2 > pending.startTS (which it must be — the RPOP committed after the retry started), checkConflictsLocked fires ErrWriteConflict BEFORE the FSM probe, causing dispatchListPushReuse to clear pending and recompute. The FSM probe for a post-RPOP compacted key is never reached incorrectly. ✓

Note: MVCC tombstones are regular Pebble key-value pairs with the tombstone bit set in the value header — they are NOT Pebble Delete operations. So pebble.Get(encodeKey(key, T_tombstone)) returns the tombstone value (not ErrNotFound), and CommittedVersionAt correctly returns true for tombstone versions as documented. ✓


P2 — resolveLengthAfterFreshApply Split (adapter/redis.go:3347–3358)

The two-helper structure is correct and eliminates the length undercount.

Before (R10):

self-conflict detect → pending.commitTS = T2 → resolveReuseLength(pending.commitTS=T2)
  → CommittedVersionAt(probeKey, T2) → TRUE (we just committed at T2)
  → return pending.length  ← STALE: doesn't include non-conflicting LPUSH between attempts

After (R11):

self-conflict detect → pending.commitTS = T2 → resolveLengthAfterFreshApply
  → resolveListMeta(r.readTS())  ← reads current meta including the LPUSH ✓

The standard reuse path (no self-conflict, FSM probe hit) still returns pending.length via resolveReuseLength — that case is correct because the FSM no-op'd and pending.length IS the right count. ✓

The shared resolveLengthAfterFreshApply is called by both:

  1. resolveReuseLength (probe-miss fallthrough: fresh apply at new commitTS, re-read needed)
  2. dispatchListPushReuse self-conflict path (fresh apply at T2 confirmed via probe, re-read needed)

Both paths correctly need a fresh meta-read; the deduplication is correct. ✓


Lint Fixes

dedupProbeOnePhase extraction (kv/fsm.go:530):
The helper has a clean 3-path return contract documented explicitly:

  • (true, nil) → no-op
  • (false, nil) → fall through
  • (false, err) → propagate

handleOnePhaseTxnRequest now reads as a linear sequence of checks without nested branching. The cyclop budget is restored. ✓

dispatchMultiShardTxn extraction (kv/sharded_coordinator.go:398):
dispatchTxn now has ≤5 decision points (readKeys guard, groupMutations, primaryKey guard, resolveTxnCommitTS, single-vs-multi branch). The comment explaining the extraction rationale is accurate. ✓

resolveReuseLengthint64 (adapter/redis.go:3330):
The unparam lint was correct: resolveReuseLength always returned a nil error (all error paths convert to pending.length degraded return before the return). Dropping the error return makes the contract explicit and removes the golangci nilerr false positive where the call site checked a return that was always nil. The call site change:

// Before: return r.resolveReuseLength(...), false, rerr  (rerr was always nil)
// After:  return r.resolveReuseLength(...), false, nil   // explicit


TLA+ Spec Divergence Check (github-actions auto-trigger)

The referenced spec file docs/design/2026_05_28_partial_tla_safety_spec.md does not exist in this repository. The tla/ directory also does not exist. The auto-trigger is based on a non-existent spec anchor.

For the record, against the checklist in the trigger:

  • kv/fsm.go: The dedup probe change improves FSM determinism (strictly removes a non-deterministic branch); spec alignment is improved, not regressed.
  • kv/sharded_coordinator.go: Pure refactor; no semantic change to coordinator dispatch logic.
  • store/mvcc_store.go: Implementation detail of CommittedVersionAt; outside any HLC/OCC/MVCC spec scope.

No spec update needed or possible — there is no spec to update.


Carry-over from Prior Rounds (unchanged, non-blocking)

  1. buildRedirectRequests unit test (Round 2): No test covers the leader-redirect path with non-zero PrevCommitTS. Code is correct (verified Round 2); test coverage gap only.
  2. Double ReadIndex on expired-lease leader path (Round 9): When isLinearizableRaftLeader fails due to expired lease and issues one ReadIndex, then tryEngineLinearizableFence issues a second one. Narrow failure window, pre-existing from Round 8.

Summary: Round-11 is a clean and complete round. The FSM determinism fix is the critical one — the design doc's rationale (compaction does not remove single-version element keys unless RPOP/LPOP creates a superseding tombstone, at which point the OCC boundary fence already catches the conflict) is correct. The length split is precise. All 4 lint failures are fixed without behavior regression. Ready to merge.

@bootjp bootjp merged commit f481f2b into main May 30, 2026
11 checks passed
@bootjp bootjp deleted the docs/txn-idempotency-design branch May 30, 2026 05:04
@bootjp bootjp restored the docs/txn-idempotency-design branch May 30, 2026 05:16
bootjp added a commit that referenced this pull request May 30, 2026
## Summary

Follow-up to #796 (M1 + M2 + M3 RPUSH/LPUSH). Extends option-2 write-set
reuse + exact-ts dedup probe to **MULTI/EXEC** dispatched through
`runTransaction`. Mirrors `listPushCoreWithDedup` at the EXEC
granularity. Gated on the existing `onePhaseTxnDedup` flag — **default
off** — so no mixed-version divergence window.

This PR is **stacked on #796**: the base branch is
`docs/txn-idempotency-design`. Once #796 merges this rebases onto
`main`.

## Mechanism

1. **`txnContext.commit()` refactor** — split build-and-dispatch into
`prepareDispatch()` returning a `preparedTxnDispatch{elems, commitTS,
readKeys, ctx, cancel}`. `commit()` now calls `prepareDispatch()` then
dispatches; `runTransactionWithDedup` calls them separately so it can
intercept between prepare and dispatch. External behavior of `commit()`
is unchanged.
2. **`reusableExecTxn`** — EXEC analogue of `reusableListPush`. The
cached `results` array is the M3 R1 result reconstruction: computed once
from attempt 1's `startTS` snapshot, returned as-is on any reuse. Same
invariance argument as RPUSH/LPUSH `length`.
3. **`dispatchExecReuse`** — one reuse iteration. Dispatches with
`PrevCommitTS=pending.commitTS`. On `WriteConflict`, the
self-inflicted-conflict guard probes `CommittedVersionAt(probeKey,
freshCommitTS)`; if hit, return cached results (codex P1 round-10
class). Otherwise drop pending.
4. **`runTransactionWithDedup` + `firstExecAttempt`** — the option-2
retry loop. First iteration builds the txn and dispatches; retryable
failure stashes pending; subsequent iterations call `dispatchExecReuse`
with `PrevCommitTS`.
5. **`runTransaction` gate** — when `onePhaseTxnDedup` is on, route to
`runTransactionWithDedup`; otherwise keep the legacy retry loop
byte-identical.

## Caller audit (per /loop semantic-change rule)

- `prepareDispatch` (new): callers are `commit()` and
`firstExecAttempt`; both honor the `defer prepared.cancel()` contract.
- `commit()`: internal structure changed; external behavior preserved
(no test directly invokes it).
- `runTransactionWithDedup` / `firstExecAttempt` / `dispatchExecReuse` /
`reusableExecTxn`: all new symbols, exercised only from the gated
`runTransaction` path.

## Tests (`adapter/redis_exec_dedup_test.go`)

| Test | Scenario | Pins |
|---|---|---|
| `TestExecDedup_LandedPriorAttempt_ReturnsCachedResults` | Attempt 1
lands then errors | Reuse FSM probe hits; cached "OK" returned; value
matches |
| `TestExecDedup_PriorAttemptDidNotLand_Applies` | Attempt 1 pre-rejects
| Reuse applies fresh; results match |
| `TestExecDedup_GenuineConflictRebuildsAndApplies` | Concurrent SET
advances key past `pending.startTS` | Reuse OCC-conflicts; probe misses;
pending dropped; fresh attempt succeeds |
| `TestExecDedup_SelfInflictedReuseConflict_ReturnsSuccess` | Reuse
lands then surfaces WriteConflict | Self-conflict guard probes fresh
`commitTS`; cached results returned (no double-apply) |
| `TestExecDedup_DisabledKeepsLegacyPath` | Gate off | No probe; same
result as legacy `runTransaction` |

## Validation

- `go test ./adapter/ -run 'Txn|MULTI|EXEC|Dedup|TxnStartTS'` passes
- `go test ./kv/ ./store/` both pass
- `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues across
`adapter/kv/store`)

## Scope

- **Single-mop EXEC** is the conservative scope per the design doc's
"Open questions". The mechanism (cache results array; OCC fence on
`readKeys`) works the same for multi-mop EXEC under the existing proof,
but the test matrix doubles, so multi-mop validation is a follow-up.
- Per-command reconstruction hooks for SET/INCR/HSET etc. when
dispatched **outside** MULTI are still a follow-up; those go through
individual command paths, not `runTransaction`.

## Design doc reference

`docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M3.
bootjp added a commit that referenced this pull request May 30, 2026
gemini HIGH (adapter/redis.go:3112 dispatchExecReuse):
  Clock().Next() bypasses the HLC-4 physical-ceiling fence;
  persistence-grade commit_ts allocation must use NextFenced() to
  avoid issuing a timestamp that collides with a subsequent leader's
  window after renewal. Switched dispatchExecReuse to NextFenced()
  with the standard error-propagation shape (matches prepareDispatch
  from PR #884 cherry-pick).

gemini MEDIUM (adapter/redis.go:3175 reuseCtx propagation):
  The original "fresh ctx from handlerContext per reuse attempt"
  pattern ignored outer cancellation -- a disconnected client would
  wait the full 10 s before the reuse returned. Derived reuseCtx from
  dispatchCtx (the caller's retry-loop ctx) so outer cancellation
  interrupts mid-attempt. Per-attempt redisDispatchTimeout still caps
  the dispatch the same way commit does for the first attempt.
  Matches listPushCoreWithDedup's caller-ctx threading.

gemini MEDIUM (adapter/redis.go:3215 readPin.Release on nil):
  False positive. ActiveTimestampToken.Release is documented
  nil-safe at kv/active_timestamp_tracker.go:58. The pattern
  readPin := r.pinReadTS(...); defer readPin.Release() is used
  unchanged in the existing runTransaction (legacy path); no change
  needed.

Caller audit per /loop semantic-change rule:

- dispatchExecReuse signature unchanged; the new error return is
  the same (bool, error) tuple position. Single caller is
  runTransactionWithDedup which already returns dispErr via dropping
  to the retry loop's error path -- the new NextFenced error reaches
  the same dispErr branch.
- reuseCtx parent change is local to runTransactionWithDedup; the
  only observable effect is faster cancellation propagation, which
  retryRedisWrite already gates on ctx.Done between attempts.

Note on pre-existing Clock().Next() callers:

dispatchListPushReuse (adapter/redis.go:3508) and the
listPushCoreWithDedup first-attempt site (adapter/redis.go:3679) --
both shipped on main as part of PR #796 -- still use Clock().Next()
and have the same HLC-4 ceiling-bypass exposure. Out of scope for
this PR (those are pre-existing on main, not introduced by PR-A's
diff), but should be fixed in a follow-up cleanup PR for parity.

Design doc:
  M3 "fresh reuseCtx from handlerContext" deviation note is struck
  and replaced with the dispatchCtx-derived rationale; future
  readers see why the earlier framing was wrong and what the
  current pattern is.

Validation:
  go test ./adapter/ -run 'ExecDedup|TxnMULTI' passes.
  go build ./adapter/... clean.
  golangci-lint run ./adapter/... 0 issues.
bootjp added a commit that referenced this pull request May 30, 2026
…i-mop (#887)

## Summary

PR #884 was merged into the stacked branch `docs/txn-idempotency-design`
(at `cbbde3d7`) but never reached `main` — main has only PR #796's M1 +
M2 + M3 RPUSH/LPUSH content. This PR re-lands the M3 EXEC reuse code on
top of main and bundles three follow-ups that move the design doc's
"still open" / "follow-up" items into "landed".

Base: `main`.

## Changes

### 1. Re-land M3 EXEC reuse (PR #884's content)

- `adapter/redis.go`: `txnContext.prepareDispatch()` split out of
`commit()`; `reusableExecTxn`; `dispatchExecReuse`;
`runTransactionWithDedup` + `firstExecAttempt`; gate at the top of
`runTransaction`.
- `adapter/redis_exec_dedup_test.go` (originally added in PR #884): 5
tests pinning all four reuse outcomes plus the gate-off legacy
equivalence.
- Cherry-pick adaptation: `prepareDispatch()` uses `Clock().NextFenced()
(uint64, error)` on current main; the PR #884 version targeted
`Clock().Next() (uint64)`. Same downstream semantics, error wired
through `preparedTxnDispatch`.

### 2. Close M2 open item — FSM other-txn exactness test

`TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry`
(`kv/fsm_onephase_dedup_test.go`) pins exactness at the FSM apply layer:
a third-party version at `T_other=20` must not satisfy the FSM probe at
`T1=30`, so the retry falls through and applies at the fresh `T2=40`.
The store-layer pin already covers the primitive; this test covers the
dispatch path that uses it.

### 3. M3 multi-mop EXEC dedup test

`TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults` extends
single-mop dedup to a 3-command MULTI/EXEC body (`SET a + SET b + DEL
c`). Validates that cached results + OCC `readKeys` fence are
mop-count-agnostic. Without dedup the DEL would re-execute to 0 on the
second pass — the test rejects that.

### 4. Design doc updates

- §M2 "still open" → "LANDED" with the new FSM test reference
- §M3 "runTransaction (MULTI/EXEC) — Still open" → "LANDED via PR #884"
with multi-mop test reference and acknowledgement of two intentional
deviations from the M1/M2 template that claude[bot] flagged on #884
(`readKeys` assembly order, fresh per-attempt `reuseCtx`)
- §M3 "standalone SET/INCR/HSET" called out as PR-B follow-up

## Caller audit (per /loop semantic-change rule)

- `prepareDispatch` (newly added): callers are `commit()` and
`firstExecAttempt`; both honor `defer prepared.cancel()`. External
behavior of `commit()` preserved.
- `commit()`: internal structure changed; external behavior preserved
(no test directly invokes it).
- `runTransactionWithDedup` / `firstExecAttempt` / `dispatchExecReuse` /
`reusableExecTxn`: all new symbols, exercised only from the gated
`runTransaction` path.

## Validation

- `go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC'` passes
- `go test ./kv/ ./store/` both pass
- `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues)

## Relation to prior work

| PR | Merged into | Content |
|---|---|---|
| #796 (`f481f2b7`) | main | M1 + M2 + M3 RPUSH/LPUSH |
| #884 (`cbbde3d7`) | stacked branch, NOT main | M3 EXEC reuse
(stranded) |
| **This PR** | main | Re-land #884 + M2 cross-txn FSM test + multi-mop
test + doc updates |
| PR-B (next) | main | Standalone SET / INCR / HSET reuse paths |
| PR-C (next) | main | M4 Jepsen validation infrastructure |


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Tests**
* Added comprehensive regression test suite for transaction retry and
deduplication handling, covering cached result reuse, conflict
detection, and legacy behavior paths
* Extended test coverage for standalone command transaction consistency

* **Documentation**
* Updated design documentation to reflect transaction handling
improvements and recent test coverage additions

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/887?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant