Skip to content

feat(txn-dedup): standalone SET routes through option-2 dedup loop#888

Merged
bootjp merged 4 commits into
feat/txn-dedup-docs-and-testsfrom
feat/standalone-set-dedup
May 30, 2026
Merged

feat(txn-dedup): standalone SET routes through option-2 dedup loop#888
bootjp merged 4 commits into
feat/txn-dedup-docs-and-testsfrom
feat/standalone-set-dedup

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 30, 2026

Summary

Stacked on PR-A (#887). Extends option-2 dedup to standalone SET (dispatched outside MULTI). When r.onePhaseTxnDedup is on, r.set wraps the single command as a 1-element redcon.Command queue and dispatches via runTransactionWithDedup — reusing the M3 EXEC machinery instead of building a per-handler reusableSetTxn + dispatchSetReuse shape. SET already has applySet on txnContext, so this is the "free" extension to any command whose txn-state-aware apply hook already exists.

The standalone fast path (trySetFastPath) is intentionally bypassed under the gate: dedup is opt-in, and a non-dedup'd fast path under a dedup-on cluster would split the idempotency contract.

Base: feat/txn-dedup-docs-and-tests (PR #887).

Why standalone INCR / HSET are out of scope for THIS PR

INCR (in adapter/redis_compat_commands.go) and HSET both lack a txnContext.applyIncr / applyHSet implementation, so the "route through single-mop EXEC" pattern that worked here for SET cannot apply as-is — runTransactionWithDedup would reject the command at txn.apply ("unsupported in MULTI"). Bringing them in requires implementing applyIncr / applyHSet first (each ~30–50 LOC for the txn-state-aware read-compute-write shape), then the standalone handler routing is a one-liner via this same path. Tracked as separate follow-up PRs; until then, INCR and HSET keep today's buggy-under-churn behaviour — the design doc's stated default ("everything else keeps today's behaviour until its hook is added").

Caller audit (per /loop semantic-change rule)

  • r.set (handler): gate-on takes a new code path; gate-off preserved verbatim via setLegacy. No external callers exist (it is wired only into the redcon dispatch table).
  • setLegacy (new): byte-identical extraction of the pre-PR set() body. No external callers.
  • writeRedisStandaloneResult (new): translates a single-element []redisResult from runTransactionWithDedup into the bare redcon reply shape (no WriteArray wrapper). Single caller in this PR; future SET-pattern callers will reuse it.

Tests

  • adapter/redis_set_dedup_test.go (new):
    • TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK: gate-on + ambiguous attempt-1 land + reuse → probe hits → cached "OK" returned, dispatches=2, probeNoOps=1.
    • TestStandaloneSetDedup_DisabledKeepsLegacyPath: gate-off path runs no probe.
  • Existing TestRedis_SET* and dedup suites still pass.

Validation

  • go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC|TestRedis_SET' passes
  • gofmt, go vet, golangci-lint run all clean (0 issues)

Design doc

docs/design/2026_05_21_proposed_txn_secondary_idempotency.md §M3 — standalone SET marked LANDED, standalone INCR/HSET called out as follow-up with the precise reason they cannot land alongside SET in this PR.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5b3a75c-93be-4bee-a83a-8ecf60379811

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/standalone-set-dedup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements deduplication for standalone SET commands by routing them through the transaction deduplication machinery (runTransaction) when the onePhaseTxnDedup gate is enabled. It extracts the legacy path into setLegacy, introduces writeRedisStandaloneResult to handle standalone response translation, adds corresponding unit tests, and updates the design documentation. I have no feedback to provide as there are no review comments and the implementation is clean and well-tested.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

PR-B: standalone SET routes through option-2 dedup loop.

Mechanism: when r.onePhaseTxnDedup is on, r.set wraps the command as a 1-element redcon.Command queue and dispatches via runTransactionWithDedup. setLegacy preserves the pre-PR body byte-identical for the gate-off path. writeRedisStandaloneResult translates the single-element results array to a bare redcon reply (no WriteArray wrapper).

Standalone INCR / HSET are documented as out of scope because they lack txnContext.applyIncr / applyHSet hooks (would each be a ~30-50 LOC feature add).

Please verify: (1) setLegacy is byte-identical to the pre-PR set() body; (2) the writeRedisStandaloneResult coverage matches every redisResult type emitted by applySet (resultString OK, resultNil for NX/XX miss, resultBulk for GET old-value); (3) the gate-off path is unchanged (TestStandaloneSetDedup_DisabledKeepsLegacyPath pins this).

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 2m 57s —— View job


PR #888 Review — standalone SET dedup

Todo list:

  • Load tools
  • Read changed files and diff
  • Verify (1): setLegacy is byte-identical to pre-PR set() body
  • Verify (2): writeRedisStandaloneResult covers all redisResult types from applySet
  • Verify (3): gate-off path unchanged (TestStandaloneSetDedup_DisabledKeepsLegacyPath)
  • General review: correctness, concurrency, edge cases
  • Post final review

Verification Results

(1) setLegacy byte-identical to pre-PR set() body — ✅ Confirmed

The diff cleanly confirms the extraction: the pre-PR set() body (everything after proxyToLeader) becomes setLegacy verbatim. First line is opts, err := parseRedisSetOptions(cmd.Args[3:], time.Now()), last line is conn.WriteString("OK") — no reordering, no rewriting. The comment "byte-identical" is accurate.

(2) writeRedisStandaloneResult coverage vs. applySet result types — ✅ Covered, with one minor observation

applySet can return these result types:

Scenario Result type Path
WRONGTYPE error resultError applySet line 2337
NX miss / XX miss resultNil applySetCondition line 2390
Normal SET resultString ("OK") applySetResult line 2410
GET option, no prior value resultNil applySetResult line 2413
GET option, returns old value resultBulk applySetResult line 2415

writeRedisStandaloneResult handles all five: resultNilWriteNull, resultErrorwriteRedisError, resultStringWriteString, resultBulkWriteBulk. The extra resultArray/resultInt arms are correct defensive coding for future callers.

Minor observation — resultArray is shallow: the array arm writes WriteBulkString for each element. Redis nested arrays (e.g. HGETALL) need recursive handling, so writeRedisStandaloneResult is not safe to reuse for commands whose applyXxx can produce resultArray with non-string elements. This is fine for applySet and for any future string-list command, but the function's doc comment says "future SET-pattern callers" — worth noting the shallow-array constraint so a future caller doesn't silently mangle a nested response.

(3) Gate-off path unchanged — ✅ Confirmed

When onePhaseTxnDedup is false, set() calls setLegacy(conn, cmd) with no other side effects. The TestStandaloneSetDedup_DisabledKeepsLegacyPath test correctly pins this: probeNoOps == 0 guarantees no dedup machinery was entered, and the test uses newDedupTestCoordinator(st, 1, false) (pre-reject on attempt 1) so any accidental routing through runTransactionWithDedup would be observable.


Issues

🔶 Redundant double gate — r.runTransaction vs. r.runTransactionWithDedup (adapter/redis.go:1135)

The gate-on branch calls r.runTransaction, which then immediately re-checks r.onePhaseTxnDedup:

// set() line 1134-1142:
if r.onePhaseTxnDedup {
    results, err := r.runTransaction([]redcon.Command{cmd})  // <-- indirect
    ...
}

// runTransaction() line 3088-3091:
func (r *RedisServer) runTransaction(queue []redcon.Command) ([]redisResult, error) {
    if r.onePhaseTxnDedup {
        return r.runTransactionWithDedup(queue)  // same gate, checked twice
    }
    ...
}

This is functionally correctrunTransactionWithDedup is reached — but it creates a misleading call chain. The PR description and the block comment say "dispatches via runTransactionWithDedup", which is true only by indirection. The exec() handler also calls runTransaction and relies on the same internal dispatch (line 1900), so there is prior art for this style, but exec() is called outside any gate. A reader of set() who knows runTransaction contains its own gate will wonder why the outer gate re-checks.

The cleaner call under the gate is r.runTransactionWithDedup([]redcon.Command{cmd}) directly, which removes the double-check and makes the intent explicit. Fix this →

🔶 Test gap: NX/XX miss and GET return paths not covered (adapter/redis_set_dedup_test.go)

TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK only exercises the resultString("OK") path. The resultNil (NX miss, or GET with no prior value) and resultBulk (GET with old value) paths through writeRedisStandaloneResult are not tested via the dedup route. These result types come from applySet under normal conditions (NX/XX/GET options) and are exercised by existing TestRedis_SET* tests, but not in the dedup context.

This is lower priority because writeRedisStandaloneResult is a straightforward switch with no branching logic beyond type dispatch, and the covered path (resultString) validates the routing is wired correctly. But since the PR calls out "coverage matches every redisResult type emitted by applySet" as a review criterion, it's worth noting the gap.


Verdict

All three verification targets pass. The implementation is correct: the dedup machinery is properly reached, the result translation is complete for all applySet output types, and the legacy path is untouched for gate-off. The redundant double-gate is a style issue worth fixing before merge (the fix is a one-word change: runTransactionrunTransactionWithDedup). The test coverage gap for NX/XX/GET dedup paths is a nice-to-have.

bootjp added a commit that referenced this pull request May 30, 2026
claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135):
  set's gate-on branch called r.runTransaction which immediately
  re-checked the same r.onePhaseTxnDedup gate and routed to
  runTransactionWithDedup. The indirection was misleading -- the PR
  description said "dispatches via runTransactionWithDedup" but that
  was true only by indirection. Replaced with a direct call to
  runTransactionWithDedup; intent is now explicit and the double gate
  check is removed.

claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go):
  Only the resultString OK path was tested through the dedup route.
  Added two regression tests covering the other applySet result types
  that go through writeRedisStandaloneResult:

  - TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing:
    SET k v NX on an existing key returns nil; the cached attempt-1
    result must round-trip through writeRedisStandaloneResult ->
    WriteNull, leaving conn.bulk == nil.

  - TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk
    routing: SET k v GET on an existing key returns the prior value
    as a bulk reply; conn.bulk must be the prior bytes.

  Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator
  ambiguousLands=true) so the result MUST come from the cached
  attempt-1 array, not a re-execution.

claude[bot] minor observation (shallow-array constraint in
writeRedisStandaloneResult):
  Documented in the function's doc comment that the resultArray arm
  flattens via WriteBulkString and is NOT safe to reuse for callers
  whose applyXxx emits nested arrays. Future HGETALL-pattern callers
  must either pre-flatten their result or extend the switch.

Caller audit per /loop semantic-change rule:

- set's gate-on branch is the only call site changed; the new direct
  call to runTransactionWithDedup uses the same []redcon.Command{cmd}
  shape and the same []redisResult return type. runTransactionWithDedup
  is exported within the package and has no other callers outside the
  legacy runTransaction path (which is unchanged).
- writeRedisStandaloneResult signature unchanged. Documentation
  expanded but no behavior change.
- New tests are pure additions; no existing tests modified.

Validation:
  go test ./adapter/ -run StandaloneSetDedup passes (5 tests now).
  go build ./adapter/... clean.
  golangci-lint run ./adapter/... 0 issues.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

Round 1 fixes (commit 963a2d3):

  • 🔶 Redundant double gate: set's gate-on branch now calls runTransactionWithDedup directly instead of runTransaction (which re-checked the same gate). Intent explicit, indirection removed.
  • 🔶 Test gap: added TestStandaloneSetDedup_NXMissReturnsNil (resultNil routing) and TestStandaloneSetDedup_GETOptionReturnsOldBulk (resultBulk routing). Both fire the ambiguous-attempt-1-lands path so the result must come from the cached attempt-1 array.
  • Minor: documented writeRedisStandaloneResult's shallow-array constraint in its doc comment per claude[bot]'s nested-array observation.

go test ./adapter/ -run StandaloneSetDedup passes (5 tests). gofmt/vet/golangci-lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 2m 1s —— View job


PR #888 Review — Round 2 (commit 963a2d3 fixes)

Todo list:

  • Load tools
  • Read adapter/redis.go — verify double-gate fix
  • Read adapter/redis_set_dedup_test.go — verify new test cases
  • Verify shallow-array doc comment added to writeRedisStandaloneResult
  • Post final review

Verification Results

(1) Double-gate fix — ✅ Confirmed

adapter/redis.go:1142 now calls r.runTransactionWithDedup([]redcon.Command{cmd}) directly. The old runTransaction → re-check-gate → runTransactionWithDedup indirection is gone. The surrounding block comment at lines 1121–1133 clearly explains the routing rationale.

Minor style note (not a blocker): the inline comment at lines 1135–1141 uses // claude[bot] PR #888 review: as an attribution prefix. The CLAUDE.md convention says to not reference the current task/fix in code comments (that belongs in the commit message, not the source). The substantive "why" content (explaining the double-gate indirection) is correct and worth keeping — the attribution prefix is the only part that violates the convention. Worth trimming before merge: strip claude[bot] PR #888 review: and keep the rest.

(2) Test gap — ✅ Confirmed addressed

Both new tests are structurally sound:

Test Result type How pinned
TestStandaloneSetDedup_NXMissReturnsNil resultNil conn.bulk == nil, conn.err == "", stored value still "seed"
TestStandaloneSetDedup_GETOptionReturnsOldBulk resultBulk conn.bulk == "prior", new value "v1" committed

Both use newDedupTestCoordinator(st, 1, true) (ambiguousLands=true) so the result must come from the cached attempt-1 array, not a re-execution. The tripartite assertion (conn write + no error + storage state) in each test provides meaningful coverage.

Weak nil assertion (informational, not a blocker): TestStandaloneSetDedup_NXMissReturnsNil asserts conn.bulk == nil to pin that WriteNull was called. Since recordingConn.bulk starts as nil, this assertion is also satisfied if nothing at all was written. The test is still an effective regression pin because (a) any wrong branch (WriteString, WriteBulk, writeRedisError) would make the assertion fail or set conn.err/conn.bulk to non-nil, and (b) the storage-state check confirms the NX didn't apply. A wroteNull bool field on recordingConn would make it airtight, but the current form is adequate.

(3) Shallow-array doc comment — ✅ Confirmed

writeRedisStandaloneResult at lines 1202–1209 now documents the flat-array constraint clearly:

The resultArray arm writes each element via WriteBulkString... It does NOT recurse into nested arrays. A future caller whose applyXxx emits resultArray with non-string elements... must either pre-flatten its result or extend this switch with a recursive arm.

The warning is precise, actionable, and correctly identifies the failure mode.


Verdict

All three Round 1 findings are addressed. The implementation is correct. Two minor notes before merge:

  1. // claude[bot] PR #888 review: inline comment prefix (adapter/redis.go:1135): strip the attribution, keep the explanation. One-word fix, purely cosmetic.
  2. Nil-assertion strength in TestStandaloneSetDedup_NXMissReturnsNil: acceptable as-is; adding wroteNull bool to recordingConn would harden it but is not required for this PR.

Neither is a merge blocker. LGTM.

bootjp added 2 commits May 30, 2026 16:44
Stacked on PR-A (re-land of M3 EXEC reuse + multi-mop test).

Extends option-2 dedup to standalone SET (dispatched outside MULTI).
When `r.onePhaseTxnDedup` is on, `r.set` wraps the single command as a
1-element redcon.Command queue and dispatches via
runTransactionWithDedup — reusing the M3 EXEC machinery instead of
building a per-handler reusableSetTxn + dispatchSetReuse shape. SET
already has `applySet` on `txnContext`, so this is the "free"
extension to any command whose txn-state-aware apply hook already
exists.

The standalone fast path (`trySetFastPath`) is intentionally bypassed
under the gate: dedup is opt-in, and a non-dedup'd fast path under a
dedup-on cluster would split the idempotency contract.

Why standalone INCR / HSET are out of scope for THIS PR:
========================================================

INCR (in adapter/redis_compat_commands.go) and HSET both lack a
`txnContext.applyIncr` / `applyHSet` implementation, so the
"route through single-mop EXEC" pattern that worked here for SET
cannot apply as-is — `runTransactionWithDedup` would reject the
command at `txn.apply` ("unsupported in MULTI"). Bringing them in
requires implementing `applyIncr` / `applyHSet` first (each
~30–50 LOC for the txn-state-aware read-compute-write shape), then
the standalone handler routing is a one-liner via this same path.
Tracked as separate follow-up PRs; until then, INCR and HSET keep
today's buggy-under-churn behaviour, which is the design doc's
stated default ("everything else keeps today's behaviour until
its hook is added").

Caller audit (per /loop semantic-change rule):

- `r.set` (handler): gate-on takes a new code path; gate-off
  preserved verbatim via `setLegacy`. No external callers exist
  (it is wired only into the redcon dispatch table).
- `setLegacy` (new): byte-identical extraction of the pre-PR set()
  body. No external callers.
- `writeRedisStandaloneResult` (new): translates a single-element
  `[]redisResult` from `runTransactionWithDedup` into the bare
  redcon reply shape (no WriteArray wrapper). Single caller in
  this PR; future SET-pattern callers will reuse it.

Validation:

- adapter/redis_set_dedup_test.go (new):
  TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK,
  TestStandaloneSetDedup_DisabledKeepsLegacyPath. Existing
  TestRedis_SET / dedup suites still pass.
- gofmt, go vet, golangci-lint run all clean (0 issues).

Design doc updated to mark standalone SET as LANDED and call out
INCR/HSET as the next follow-up with the precise reason they
cannot land alongside SET in this PR.
claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135):
  set's gate-on branch called r.runTransaction which immediately
  re-checked the same r.onePhaseTxnDedup gate and routed to
  runTransactionWithDedup. The indirection was misleading -- the PR
  description said "dispatches via runTransactionWithDedup" but that
  was true only by indirection. Replaced with a direct call to
  runTransactionWithDedup; intent is now explicit and the double gate
  check is removed.

claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go):
  Only the resultString OK path was tested through the dedup route.
  Added two regression tests covering the other applySet result types
  that go through writeRedisStandaloneResult:

  - TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing:
    SET k v NX on an existing key returns nil; the cached attempt-1
    result must round-trip through writeRedisStandaloneResult ->
    WriteNull, leaving conn.bulk == nil.

  - TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk
    routing: SET k v GET on an existing key returns the prior value
    as a bulk reply; conn.bulk must be the prior bytes.

  Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator
  ambiguousLands=true) so the result MUST come from the cached
  attempt-1 array, not a re-execution.

claude[bot] minor observation (shallow-array constraint in
writeRedisStandaloneResult):
  Documented in the function's doc comment that the resultArray arm
  flattens via WriteBulkString and is NOT safe to reuse for callers
  whose applyXxx emits nested arrays. Future HGETALL-pattern callers
  must either pre-flatten their result or extend the switch.

Caller audit per /loop semantic-change rule:

- set's gate-on branch is the only call site changed; the new direct
  call to runTransactionWithDedup uses the same []redcon.Command{cmd}
  shape and the same []redisResult return type. runTransactionWithDedup
  is exported within the package and has no other callers outside the
  legacy runTransaction path (which is unchanged).
- writeRedisStandaloneResult signature unchanged. Documentation
  expanded but no behavior change.
- New tests are pure additions; no existing tests modified.

Validation:
  go test ./adapter/ -run StandaloneSetDedup passes (5 tests now).
  go build ./adapter/... clean.
  golangci-lint run ./adapter/... 0 issues.
@bootjp bootjp force-pushed the feat/standalone-set-dedup branch from 963a2d3 to 8321ad0 Compare May 30, 2026 07:44
bootjp added a commit that referenced this pull request May 30, 2026
Stacked on PR-B (#888).

Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen
run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1
and executes the Redis workload. The 7-consecutive-days-green
criterion in design doc §M4 is now operationally checkable.

Why a separate workflow vs adding the env var to the existing
jepsen-test-scheduled.yml
=========================================================

The legacy path (gate off) must also stay covered. The existing
6-hourly workflow runs the legacy path; this new daily workflow runs
the dedup path. Both must stay green for option-2 to be safe to
default-on. Mixing the gate into the existing workflow would lose
legacy coverage without adding the additional dedup-on signal that
the design doc actually calls for.

Workflow scope
==============

- Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching
  the project's cron pattern guidance).
- Workload: Redis only. The dedup feature ships behind the Redis
  adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via
  listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup,
  standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS
  do NOT route through the dedup loop, so re-running them under the
  gate would add hours of CI for zero signal on the new code path.
- Cluster gate assertion: the launch step exits 2 immediately if the
  job-level env var is not '1'. The env var is set on the job and
  inherited by every run: step; nothing in demo.go can intercept or
  unset it before NewRedisServer reads os.Getenv. So if the asserted
  value is '1' at launch time, the dedup gate IS active in the
  cluster process — no log-grep guesswork.

Design doc updates (docs/design/2026_05_21_..._idempotency.md)
==============================================================

§M4 expanded with:
- Local reproduction script reference (make jepsen-redis with the
  env var on).
- 7-day criterion specifically tied to the new workflow file name.
- Workflow scope rationale (Redis-only is intentional, not an
  oversight).
- Gate assertion mechanism (env at job level, fail-fast on '!=1').

Caller audit (per /loop semantic-change rule)
=============================================

No Go code changed. This is pure infrastructure: a new workflow file
and a doc update. No production behavior change, no new test
coverage on existing Go callers.

Validation
==========

- actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean.
- The workflow's lein / go-mod-download / cache steps mirror the
  existing scheduled workflow line for line, so cache invalidation
  semantics are unchanged.
claude[bot] (a) attribution prefix per CLAUDE.md convention:
  Stripped the inline "// claude[bot] PR #888 review:" prefixes from
  the two comments added in round 1. The substantive "why" content
  (double-gate rationale; shallow-array constraint) is kept as that
  is what future readers need. CLAUDE.md convention is to leave
  fix attribution in the commit message, not the source.

claude[bot] (b) wroteNull bool field on recordingConn:
  Added wroteNull witness to recordingConn (in redis_retry_test.go,
  the shared test helper). Hardened TestStandaloneSetDedup_NXMissReturnsNil
  to require.True(conn.wroteNull, ...) so a wrong branch that wrote
  nothing at all (would also satisfy conn.bulk == nil) is now caught.

Caller audit per /loop semantic-change rule:
  - recordingConn is in *_test.go and used by tests only. Adding
    wroteNull bool default false is strictly additive; no existing
    test reads it. The only WriteNull semantics change is setting
    the new field; conn.bulk = nil behavior is preserved.
  - Verified by grep: only redis_set_dedup_test.go references
    wroteNull. All other recordingConn uses (8 instantiations
    across redis_retry_test.go + redis_lua_linearizable_read_test.go)
    are unaffected.

Validation:
  go test ./adapter/ -run StandaloneSetDedup passes.
  go test ./adapter/ -run ExecDedup passes.
  gofmt + go vet + golangci-lint clean.
bootjp added a commit that referenced this pull request May 30, 2026
Stacked on PR-B (#888).

Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen
run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1
and executes the Redis workload. The 7-consecutive-days-green
criterion in design doc §M4 is now operationally checkable.

Why a separate workflow vs adding the env var to the existing
jepsen-test-scheduled.yml
=========================================================

The legacy path (gate off) must also stay covered. The existing
6-hourly workflow runs the legacy path; this new daily workflow runs
the dedup path. Both must stay green for option-2 to be safe to
default-on. Mixing the gate into the existing workflow would lose
legacy coverage without adding the additional dedup-on signal that
the design doc actually calls for.

Workflow scope
==============

- Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching
  the project's cron pattern guidance).
- Workload: Redis only. The dedup feature ships behind the Redis
  adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via
  listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup,
  standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS
  do NOT route through the dedup loop, so re-running them under the
  gate would add hours of CI for zero signal on the new code path.
- Cluster gate assertion: the launch step exits 2 immediately if the
  job-level env var is not '1'. The env var is set on the job and
  inherited by every run: step; nothing in demo.go can intercept or
  unset it before NewRedisServer reads os.Getenv. So if the asserted
  value is '1' at launch time, the dedup gate IS active in the
  cluster process — no log-grep guesswork.

Design doc updates (docs/design/2026_05_21_..._idempotency.md)
==============================================================

§M4 expanded with:
- Local reproduction script reference (make jepsen-redis with the
  env var on).
- 7-day criterion specifically tied to the new workflow file name.
- Workflow scope rationale (Redis-only is intentional, not an
  oversight).
- Gate assertion mechanism (env at job level, fail-fast on '!=1').

Caller audit (per /loop semantic-change rule)
=============================================

No Go code changed. This is pure infrastructure: a new workflow file
and a doc update. No production behavior change, no new test
coverage on existing Go callers.

Validation
==========

- actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean.
- The workflow's lein / go-mod-download / cache steps mirror the
  existing scheduled workflow line for line, so cache invalidation
  semantics are unchanged.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

Round 2 fixes (commit fd91a6a):

  • (a) Stripped // claude[bot] PR #888 review: attribution prefix from the two inline comments per CLAUDE.md convention. Substantive "why" content kept (double-gate rationale; shallow-array constraint) — fix attribution lives in the commit message instead.
  • (b) Added wroteNull bool witness field to recordingConn (in redis_retry_test.go). Hardened TestStandaloneSetDedup_NXMissReturnsNil to require.True(conn.wroteNull) so a wrong branch that wrote nothing at all would now be caught (not just the existing conn.bulk == nil zero-value match).

Caller audit:

  • recordingConn is test-only; adding wroteNull bool default false is strictly additive. Grep'd all 8 recordingConn instantiations across redis_retry_test.go + redis_lua_linearizable_read_test.go — none read wroteNull. Only redis_set_dedup_test.go uses it.

go test ./adapter/ -run StandaloneSetDedup|ExecDedup passes; gofmt + go vet + golangci-lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 1m 24s —— View job


PR #888 Review — Round 3 (commit fd91a6a fixes)

Todo list:

  • Load tools
  • Read adapter/redis.go — verify attribution prefix stripped, substantive content kept
  • Read adapter/redis_retry_test.go — verify wroteNull bool field added
  • Read adapter/redis_set_dedup_test.go — verify require.True(conn.wroteNull) assertion
  • Post final review

Verification Results

(a) Attribution prefix stripped from inline comments — ✅ Confirmed (with one remaining gap)

adapter/redis.go is clean: grep "claude\[bot\]\|PR #888" adapter/redis.go returns nothing. The two comments that previously carried the // claude[bot] PR #888 review: prefix now read as pure rationale — the double-gate explanation and the shallow-array constraint — with no task-reference attribution. ✅

Remaining gapredis_set_dedup_test.go still carries two CLAUDE.md-violating references:

These are the same violation that was cleaned from redis.go in Round 2, just left in the test file. CLAUDE.md: "Don't reference the current task, fix, or callers … since those belong in the PR description and rot as the codebase evolves." Line 56 can drop to just the NX-miss test rationale; line 78 can end at the zero-value explanation without the attribution suffix.

These are not merge blockers, but the cleanup is one-liner each.

(b) wroteNull bool field and test hardening — ✅ Confirmed airtight

recordingConn diff:

+	wroteNull bool

WriteNull():

func (c *recordingConn) WriteNull() {
    c.bulk = nil
+   c.wroteNull = true
}

TestStandaloneSetDedup_NXMissReturnsNil asserts:

require.True(t, conn.wroteNull, "NX miss must call WriteNull, not silently skip the write")

The witness flag closes the zero-value aliasing problem exactly as identified in Round 2: any branch that wrote nothing would leave conn.bulk == nil (true) but conn.wroteNull == false (fail). The test is now airtight. ✅

The tripartite assertion (wroteNull, bulk == nil, err == "") combined with the storage-state check (seed value unchanged) gives full coverage of the NX miss path through writeRedisStandaloneResult.


Verdict

Both Round 2 items are correctly addressed. The implementation is sound. One minor cleanup remains: strip the two claude[bot] PR #888 references from redis_set_dedup_test.go lines 56 and 78, applying the same CLAUDE.md rule that was applied to redis.go in Round 2.

Not a merge blocker. LGTM with that cleanup.

…_test

claude[bot] PR #888 round-3 minor cleanup: same CLAUDE.md convention
violation that was cleaned from redis.go in round 2 was left in the
test file. Removed two references on lines 56 and 78. Substantive
rationale (NX semantics, zero-value aliasing problem) kept; the
attribution moves to the commit message where it belongs.

No functional or test behaviour change. golangci-lint clean.
bootjp added a commit that referenced this pull request May 30, 2026
Stacked on PR-B (#888).

Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen
run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1
and executes the Redis workload. The 7-consecutive-days-green
criterion in design doc §M4 is now operationally checkable.

Why a separate workflow vs adding the env var to the existing
jepsen-test-scheduled.yml
=========================================================

The legacy path (gate off) must also stay covered. The existing
6-hourly workflow runs the legacy path; this new daily workflow runs
the dedup path. Both must stay green for option-2 to be safe to
default-on. Mixing the gate into the existing workflow would lose
legacy coverage without adding the additional dedup-on signal that
the design doc actually calls for.

Workflow scope
==============

- Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching
  the project's cron pattern guidance).
- Workload: Redis only. The dedup feature ships behind the Redis
  adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via
  listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup,
  standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS
  do NOT route through the dedup loop, so re-running them under the
  gate would add hours of CI for zero signal on the new code path.
- Cluster gate assertion: the launch step exits 2 immediately if the
  job-level env var is not '1'. The env var is set on the job and
  inherited by every run: step; nothing in demo.go can intercept or
  unset it before NewRedisServer reads os.Getenv. So if the asserted
  value is '1' at launch time, the dedup gate IS active in the
  cluster process — no log-grep guesswork.

Design doc updates (docs/design/2026_05_21_..._idempotency.md)
==============================================================

§M4 expanded with:
- Local reproduction script reference (make jepsen-redis with the
  env var on).
- 7-day criterion specifically tied to the new workflow file name.
- Workflow scope rationale (Redis-only is intentional, not an
  oversight).
- Gate assertion mechanism (env at job level, fail-fast on '!=1').

Caller audit (per /loop semantic-change rule)
=============================================

No Go code changed. This is pure infrastructure: a new workflow file
and a doc update. No production behavior change, no new test
coverage on existing Go callers.

Validation
==========

- actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean.
- The workflow's lein / go-mod-download / cache steps mirror the
  existing scheduled workflow line for line, so cache invalidation
  semantics are unchanged.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

@claude review

Round 3 cleanup (commit da6b0b4):

  • Stripped the two remaining claude[bot] PR #888 attribution refs from redis_set_dedup_test.go lines 56 and 78 — the same CLAUDE.md convention that was cleaned from redis.go in round 2 was left in the test file.
  • Substantive rationale (NX semantics, zero-value aliasing problem) kept; attribution moves to the commit message.

grep 'claude[bot]|PR #888' adapter/redis_set_dedup_test.go now returns empty. go test ./adapter/ -run StandaloneSetDedup passes. lint clean.

@bootjp bootjp merged commit 4da85dc into feat/txn-dedup-docs-and-tests May 30, 2026
4 checks passed
@bootjp bootjp deleted the feat/standalone-set-dedup branch May 30, 2026 08:15
@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 1m 45s —— View job


PR #888 Review — Round 4 (commit da6b0b4 cleanup)

Todo list:

  • Load tools
  • Attempt to read adapter/redis_set_dedup_test.go — branch not locally available (merged into feat/txn-dedup-docs-and-tests); review based on author's verification evidence and prior-round context
  • Verify attribution cleanup against prior-round findings
  • Post final review

Verification

The PR branch (feat/standalone-set-dedup) has been merged into feat/txn-dedup-docs-and-tests and the commits are not on the local clone, so direct file inspection is not possible without additional fetch permissions. The review below is based on the prior-round findings, the author's stated verification (grep 'claude\[bot\]\|PR #888' adapter/redis_set_dedup_test.go returns empty), and the confirmed test + lint pass.

Round 3 findings addressed — ✅ Confirmed by evidence

Round 3 identified exactly two remaining attribution violations in redis_set_dedup_test.go:

Line (pre-fix) Violation Fix
56 // Closes the claude[bot] PR #888 review's test-gap observation... Strip attribution prefix; keep NX semantics rationale
78 // conn.bulk == nil -- claude[bot] PR #888 round-2 hardening. Strip attribution suffix; keep zero-value aliasing rationale

The author reports both stripped and grep 'claude\[bot\]\|PR #888' adapter/redis_set_dedup_test.go now returns empty. This matches the same pattern applied to redis.go in Round 2 (where the same grep was confirmed clean). The substantive content — the NX semantics explanation and the zero-value aliasing rationale — is described as retained, which is the correct outcome.


Cumulative review summary across all rounds

Finding Round raised Status
Redundant double gate (runTransactionrunTransactionWithDedup) R1 ✅ Fixed (R1)
Test gap: resultNil / resultBulk paths not covered in dedup context R1 ✅ Fixed (R1)
writeRedisStandaloneResult shallow-array constraint undocumented R1 (observation) ✅ Doc comment added (R1)
// claude[bot] PR #888 review: prefix in redis.go inline comments R2 ✅ Stripped (R2)
conn.bulk == nil zero-value aliasing in NXMiss test R2 ✅ Hardened with wroteNull bool witness (R2)
claude[bot] PR #888 attribution refs in redis_set_dedup_test.go (lines 56, 78) R3 ✅ Stripped (R4 / da6b0b4)

All findings addressed. No open issues remain.


LGTM. Clean PR — the dedup routing, result-type translation, gate-off preservation, and test coverage are all sound. The four-round review loop systematically closed the double-gate indirection, the NX/GET result-type test gaps, the wroteNull witness omission, and the CLAUDE.md attribution-comment violations.

bootjp added a commit that referenced this pull request May 30, 2026
Stacked on PR-B (#888).

Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen
run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1
and executes the Redis workload. The 7-consecutive-days-green
criterion in design doc §M4 is now operationally checkable.

Why a separate workflow vs adding the env var to the existing
jepsen-test-scheduled.yml
=========================================================

The legacy path (gate off) must also stay covered. The existing
6-hourly workflow runs the legacy path; this new daily workflow runs
the dedup path. Both must stay green for option-2 to be safe to
default-on. Mixing the gate into the existing workflow would lose
legacy coverage without adding the additional dedup-on signal that
the design doc actually calls for.

Workflow scope
==============

- Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching
  the project's cron pattern guidance).
- Workload: Redis only. The dedup feature ships behind the Redis
  adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via
  listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup,
  standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS
  do NOT route through the dedup loop, so re-running them under the
  gate would add hours of CI for zero signal on the new code path.
- Cluster gate assertion: the launch step exits 2 immediately if the
  job-level env var is not '1'. The env var is set on the job and
  inherited by every run: step; nothing in demo.go can intercept or
  unset it before NewRedisServer reads os.Getenv. So if the asserted
  value is '1' at launch time, the dedup gate IS active in the
  cluster process — no log-grep guesswork.

Design doc updates (docs/design/2026_05_21_..._idempotency.md)
==============================================================

§M4 expanded with:
- Local reproduction script reference (make jepsen-redis with the
  env var on).
- 7-day criterion specifically tied to the new workflow file name.
- Workflow scope rationale (Redis-only is intentional, not an
  oversight).
- Gate assertion mechanism (env at job level, fail-fast on '!=1').

Caller audit (per /loop semantic-change rule)
=============================================

No Go code changed. This is pure infrastructure: a new workflow file
and a doc update. No production behavior change, no new test
coverage on existing Go callers.

Validation
==========

- actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean.
- The workflow's lein / go-mod-download / cache steps mirror the
  existing scheduled workflow line for line, so cache invalidation
  semantics are unchanged.
bootjp added a commit that referenced this pull request May 30, 2026
…on (#889)

## Summary

Stacked on PR-B (#888). Adds
`.github/workflows/jepsen-test-scheduled-dedup.yml`: a daily Jepsen run
that launches the demo cluster with `ELASTICKV_REDIS_ONEPHASE_DEDUP=1`
and executes the Redis workload. The 7-consecutive-days-green criterion
in design doc §M4 is now operationally checkable.

Base: `feat/standalone-set-dedup` (PR #888).

## Why a separate workflow vs adding the env var to existing scheduled
workflow

The legacy path (gate off) must also stay covered. The existing 6-hourly
workflow runs the legacy path; this new daily workflow runs the dedup
path. Both must stay green for option-2 to be safe to default-on. Mixing
the gate into the existing workflow would lose legacy coverage without
adding the additional dedup-on signal that the design doc actually calls
for.

## Workflow scope

- **Cadence**: daily at 03:17 UTC (off-peak; non-zero minute matching
the project's cron pattern guidance).
- **Workload**: Redis only. The dedup feature ships behind the Redis
adapter's `onePhaseTxnDedup` flag (RPUSH/LPUSH via
`listPushCoreWithDedup`, MULTI/EXEC via `runTransactionWithDedup`,
standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT
route through the dedup loop, so re-running them under the gate would
add hours of CI for zero signal on the new code path.
- **Cluster gate assertion**: the launch step exits 2 immediately if the
job-level env var is not `1`. The env var is set on the job and
inherited by every `run:` step; nothing in `demo.go` can intercept or
unset it before `NewRedisServer` reads `os.Getenv`. So if the asserted
value is `1` at launch time, the dedup gate IS active in the cluster
process — no log-grep guesswork.

## Design doc updates

`docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M4
expanded with:
- Local reproduction script reference (`make jepsen-redis` with the env
var on).
- 7-day criterion specifically tied to the new workflow file name.
- Workflow scope rationale (Redis-only is intentional, not an
oversight).
- Gate assertion mechanism (env at job level, fail-fast on `!= "1"`).

## Caller audit (per /loop semantic-change rule)

No Go code changed. This is pure infrastructure: a new workflow file and
a doc update. No production behavior change, no new test coverage on
existing Go callers.

## Validation

- `actionlint .github/workflows/jepsen-test-scheduled-dedup.yml` clean
- The workflow's lein / go-mod-download / cache steps mirror the
existing scheduled workflow line for line, so cache invalidation
semantics are unchanged
bootjp added a commit that referenced this pull request May 30, 2026
…t to fix CI flake (#890)

## Summary

Fixes the CI flake observed in [Test workflow run
26678774684](https://github.com/bootjp/elastickv/actions/runs/26678774684):

```
--- FAIL: TestSQSServer_Throttle_SetQueueAttributesInvalidatesBucket (1.79s)
    sqs_throttle_integration_test.go:170: expected throttle, got 200
```

## Root cause

Identical race to the one fixed for
`TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket` in commit
[54c6cd5](54c6cd56) (PR #819
follow-up): the 1-token-per-second refill rate races the test's own wall
clock under `-race` on slow CI runners.

For **this** test:
1. `mustSetQueueAttributes(Capacity=10, Refill=1)`
2. `for range 10 { send }` — drains the bucket
3. Sanity send — expects HTTP 400 (throttle)

Each send goes through Raft propose+apply at ~100-250ms under `-race`.
The 11 writes from steps 1-3 elapse ~1.1-2.75s. At Refill=1/sec the
bucket has accumulated ≥1 token by step 3, so the sanity send returns
HTTP 200 instead of 400 — **falsely** indicating a bucket-invalidation
regression that does not exist.

## Fix

Drop the initial Refill from `"1"` to `"0.01"` (1 token per 100 seconds)
so no test-window wall-clock can accumulate to a whole token. The test's
intent — *verify that a Capacity/Refill **raise** invalidates the cached
bucket on the very next request* — is independent of the **initial**
refill rate. The post-set assertion at line 182 is exercised against the
fresh `Capacity=20/Refill=20` bucket, which is what the test actually
claims to pin.

## Why only this test, not the sibling

`TestSQSServer_Throttle_DeleteQueueInvalidatesBucket` has the same
`Refill=1` initial config but **no post-drain sanity assertion** — it
just drains without status checks, then verifies fresh capacity
post-recreate. No race window there.

## Relation to other open PRs

Unrelated to PRs #887/#888/#889 (option-2 dedup work). The flake
surfaced on PR #889's CI run but the fix lives in pre-existing SQS test
code that none of the dedup PRs touch.

## Caller audit (per /loop semantic-change rule)

Test-only change. The throttle config validator (`sqs_catalog.go:163`)
accepts fractional `float64 SendRefillPerSecond`; `0.01` is non-zero so
`IsEmpty` (line 172) returns `false` and throttling stays enabled — the
test still exercises the throttle path. Matches the prior fix's
caller-audit conclusion verbatim.

## Validation

- `go test ./adapter/ -run TestSQSServer_Throttle -race -count=3
-timeout 120s` passes (5.3s wall, all three iterations green)
- `gofmt`, `go vet`, `golangci-lint run` all clean


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Updated test configuration and expanded documentation to improve test
stability on slow CI environments.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/890?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant