feat(txn-dedup): standalone SET routes through option-2 dedup loop by bootjp · Pull Request #888 · bootjp/elastickv

bootjp · 2026-05-30T07:30:55Z

Summary

Stacked on PR-A (#887). Extends option-2 dedup to standalone SET (dispatched outside MULTI). When r.onePhaseTxnDedup is on, r.set wraps the single command as a 1-element redcon.Command queue and dispatches via runTransactionWithDedup — reusing the M3 EXEC machinery instead of building a per-handler reusableSetTxn + dispatchSetReuse shape. SET already has applySet on txnContext, so this is the "free" extension to any command whose txn-state-aware apply hook already exists.

The standalone fast path (trySetFastPath) is intentionally bypassed under the gate: dedup is opt-in, and a non-dedup'd fast path under a dedup-on cluster would split the idempotency contract.

Base: feat/txn-dedup-docs-and-tests (PR #887).

Why standalone INCR / HSET are out of scope for THIS PR

INCR (in adapter/redis_compat_commands.go) and HSET both lack a txnContext.applyIncr / applyHSet implementation, so the "route through single-mop EXEC" pattern that worked here for SET cannot apply as-is — runTransactionWithDedup would reject the command at txn.apply ("unsupported in MULTI"). Bringing them in requires implementing applyIncr / applyHSet first (each ~30–50 LOC for the txn-state-aware read-compute-write shape), then the standalone handler routing is a one-liner via this same path. Tracked as separate follow-up PRs; until then, INCR and HSET keep today's buggy-under-churn behaviour — the design doc's stated default ("everything else keeps today's behaviour until its hook is added").

Caller audit (per /loop semantic-change rule)

r.set (handler): gate-on takes a new code path; gate-off preserved verbatim via setLegacy. No external callers exist (it is wired only into the redcon dispatch table).
setLegacy (new): byte-identical extraction of the pre-PR set() body. No external callers.
writeRedisStandaloneResult (new): translates a single-element []redisResult from runTransactionWithDedup into the bare redcon reply shape (no WriteArray wrapper). Single caller in this PR; future SET-pattern callers will reuse it.

Tests

adapter/redis_set_dedup_test.go (new):
- TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK: gate-on + ambiguous attempt-1 land + reuse → probe hits → cached "OK" returned, dispatches=2, probeNoOps=1.
- TestStandaloneSetDedup_DisabledKeepsLegacyPath: gate-off path runs no probe.
Existing TestRedis_SET* and dedup suites still pass.

Validation

go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC|TestRedis_SET' passes
gofmt, go vet, golangci-lint run all clean (0 issues)

Design doc

docs/design/2026_05_21_proposed_txn_secondary_idempotency.md §M3 — standalone SET marked LANDED, standalone INCR/HSET called out as follow-up with the precise reason they cannot land alongside SET in this PR.

chatgpt-codex-connector · 2026-05-30T07:30:59Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

coderabbitai · 2026-05-30T07:31:02Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5b3a75c-93be-4bee-a83a-8ecf60379811

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/standalone-set-dedup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request implements deduplication for standalone SET commands by routing them through the transaction deduplication machinery (runTransaction) when the onePhaseTxnDedup gate is enabled. It extracts the legacy path into setLegacy, introduces writeRedisStandaloneResult to handle standalone response translation, adds corresponding unit tests, and updates the design documentation. I have no feedback to provide as there are no review comments and the implementation is clean and well-tested.

bootjp · 2026-05-30T07:35:21Z

@claude review

PR-B: standalone SET routes through option-2 dedup loop.

Mechanism: when r.onePhaseTxnDedup is on, r.set wraps the command as a 1-element redcon.Command queue and dispatches via runTransactionWithDedup. setLegacy preserves the pre-PR body byte-identical for the gate-off path. writeRedisStandaloneResult translates the single-element results array to a bare redcon reply (no WriteArray wrapper).

Standalone INCR / HSET are documented as out of scope because they lack txnContext.applyIncr / applyHSet hooks (would each be a ~30-50 LOC feature add).

Please verify: (1) setLegacy is byte-identical to the pre-PR set() body; (2) the writeRedisStandaloneResult coverage matches every redisResult type emitted by applySet (resultString OK, resultNil for NX/XX miss, resultBulk for GET old-value); (3) the gate-off path is unchanged (TestStandaloneSetDedup_DisabledKeepsLegacyPath pins this).

claude · 2026-05-30T07:35:33Z

Claude finished @bootjp's task in 2m 57s —— View job

PR #888 Review — standalone SET dedup

Todo list:

Load tools
Read changed files and diff
Verify (1): setLegacy is byte-identical to pre-PR set() body
Verify (2): writeRedisStandaloneResult covers all redisResult types from applySet
Verify (3): gate-off path unchanged (TestStandaloneSetDedup_DisabledKeepsLegacyPath)
General review: correctness, concurrency, edge cases
Post final review

Verification Results

(1) `setLegacy` byte-identical to pre-PR `set()` body — ✅ Confirmed

The diff cleanly confirms the extraction: the pre-PR set() body (everything after proxyToLeader) becomes setLegacy verbatim. First line is opts, err := parseRedisSetOptions(cmd.Args[3:], time.Now()), last line is conn.WriteString("OK") — no reordering, no rewriting. The comment "byte-identical" is accurate.

(2) `writeRedisStandaloneResult` coverage vs. `applySet` result types — ✅ Covered, with one minor observation

applySet can return these result types:

Scenario	Result type	Path
WRONGTYPE error	`resultError`	`applySet` line 2337
NX miss / XX miss	`resultNil`	`applySetCondition` line 2390
Normal SET	`resultString` (`"OK"`)	`applySetResult` line 2410
GET option, no prior value	`resultNil`	`applySetResult` line 2413
GET option, returns old value	`resultBulk`	`applySetResult` line 2415

writeRedisStandaloneResult handles all five: resultNil → WriteNull, resultError → writeRedisError, resultString → WriteString, resultBulk → WriteBulk. The extra resultArray/resultInt arms are correct defensive coding for future callers.

Minor observation — resultArray is shallow: the array arm writes WriteBulkString for each element. Redis nested arrays (e.g. HGETALL) need recursive handling, so writeRedisStandaloneResult is not safe to reuse for commands whose applyXxx can produce resultArray with non-string elements. This is fine for applySet and for any future string-list command, but the function's doc comment says "future SET-pattern callers" — worth noting the shallow-array constraint so a future caller doesn't silently mangle a nested response.

(3) Gate-off path unchanged — ✅ Confirmed

When onePhaseTxnDedup is false, set() calls setLegacy(conn, cmd) with no other side effects. The TestStandaloneSetDedup_DisabledKeepsLegacyPath test correctly pins this: probeNoOps == 0 guarantees no dedup machinery was entered, and the test uses newDedupTestCoordinator(st, 1, false) (pre-reject on attempt 1) so any accidental routing through runTransactionWithDedup would be observable.

Issues

🔶 Redundant double gate — `r.runTransaction` vs. `r.runTransactionWithDedup` (adapter/redis.go:1135)

The gate-on branch calls r.runTransaction, which then immediately re-checks r.onePhaseTxnDedup:

// set() line 1134-1142:
if r.onePhaseTxnDedup {
    results, err := r.runTransaction([]redcon.Command{cmd})  // <-- indirect
    ...
}

// runTransaction() line 3088-3091:
func (r *RedisServer) runTransaction(queue []redcon.Command) ([]redisResult, error) {
    if r.onePhaseTxnDedup {
        return r.runTransactionWithDedup(queue)  // same gate, checked twice
    }
    ...
}

This is functionally correct — runTransactionWithDedup is reached — but it creates a misleading call chain. The PR description and the block comment say "dispatches via runTransactionWithDedup", which is true only by indirection. The exec() handler also calls runTransaction and relies on the same internal dispatch (line 1900), so there is prior art for this style, but exec() is called outside any gate. A reader of set() who knows runTransaction contains its own gate will wonder why the outer gate re-checks.

The cleaner call under the gate is r.runTransactionWithDedup([]redcon.Command{cmd}) directly, which removes the double-check and makes the intent explicit. Fix this →

🔶 Test gap: NX/XX miss and GET return paths not covered (adapter/redis_set_dedup_test.go)

TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK only exercises the resultString("OK") path. The resultNil (NX miss, or GET with no prior value) and resultBulk (GET with old value) paths through writeRedisStandaloneResult are not tested via the dedup route. These result types come from applySet under normal conditions (NX/XX/GET options) and are exercised by existing TestRedis_SET* tests, but not in the dedup context.

This is lower priority because writeRedisStandaloneResult is a straightforward switch with no branching logic beyond type dispatch, and the covered path (resultString) validates the routing is wired correctly. But since the PR calls out "coverage matches every redisResult type emitted by applySet" as a review criterion, it's worth noting the gap.

Verdict

All three verification targets pass. The implementation is correct: the dedup machinery is properly reached, the result translation is complete for all applySet output types, and the legacy path is untouched for gate-off. The redundant double-gate is a style issue worth fixing before merge (the fix is a one-word change: runTransaction → runTransactionWithDedup). The test coverage gap for NX/XX/GET dedup paths is a nice-to-have.

claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135): set's gate-on branch called r.runTransaction which immediately re-checked the same r.onePhaseTxnDedup gate and routed to runTransactionWithDedup. The indirection was misleading -- the PR description said "dispatches via runTransactionWithDedup" but that was true only by indirection. Replaced with a direct call to runTransactionWithDedup; intent is now explicit and the double gate check is removed. claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go): Only the resultString OK path was tested through the dedup route. Added two regression tests covering the other applySet result types that go through writeRedisStandaloneResult: - TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing: SET k v NX on an existing key returns nil; the cached attempt-1 result must round-trip through writeRedisStandaloneResult -> WriteNull, leaving conn.bulk == nil. - TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk routing: SET k v GET on an existing key returns the prior value as a bulk reply; conn.bulk must be the prior bytes. Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator ambiguousLands=true) so the result MUST come from the cached attempt-1 array, not a re-execution. claude[bot] minor observation (shallow-array constraint in writeRedisStandaloneResult): Documented in the function's doc comment that the resultArray arm flattens via WriteBulkString and is NOT safe to reuse for callers whose applyXxx emits nested arrays. Future HGETALL-pattern callers must either pre-flatten their result or extend the switch. Caller audit per /loop semantic-change rule: - set's gate-on branch is the only call site changed; the new direct call to runTransactionWithDedup uses the same []redcon.Command{cmd} shape and the same []redisResult return type. runTransactionWithDedup is exported within the package and has no other callers outside the legacy runTransaction path (which is unchanged). - writeRedisStandaloneResult signature unchanged. Documentation expanded but no behavior change. - New tests are pure additions; no existing tests modified. Validation: go test ./adapter/ -run StandaloneSetDedup passes (5 tests now). go build ./adapter/... clean. golangci-lint run ./adapter/... 0 issues.

bootjp · 2026-05-30T07:43:00Z

@claude review

Round 1 fixes (commit 963a2d3):

🔶 Redundant double gate: set's gate-on branch now calls runTransactionWithDedup directly instead of runTransaction (which re-checked the same gate). Intent explicit, indirection removed.
🔶 Test gap: added TestStandaloneSetDedup_NXMissReturnsNil (resultNil routing) and TestStandaloneSetDedup_GETOptionReturnsOldBulk (resultBulk routing). Both fire the ambiguous-attempt-1-lands path so the result must come from the cached attempt-1 array.
Minor: documented writeRedisStandaloneResult's shallow-array constraint in its doc comment per claude[bot]'s nested-array observation.

go test ./adapter/ -run StandaloneSetDedup passes (5 tests). gofmt/vet/golangci-lint clean.

claude · 2026-05-30T07:43:11Z

Claude finished @bootjp's task in 2m 1s —— View job

PR #888 Review — Round 2 (commit `963a2d3` fixes)

Todo list:

Load tools
Read adapter/redis.go — verify double-gate fix
Read adapter/redis_set_dedup_test.go — verify new test cases
Verify shallow-array doc comment added to writeRedisStandaloneResult
Post final review

Verification Results

(1) Double-gate fix — ✅ Confirmed

adapter/redis.go:1142 now calls r.runTransactionWithDedup([]redcon.Command{cmd}) directly. The old runTransaction → re-check-gate → runTransactionWithDedup indirection is gone. The surrounding block comment at lines 1121–1133 clearly explains the routing rationale.

Minor style note (not a blocker): the inline comment at lines 1135–1141 uses // claude[bot] PR #888 review: as an attribution prefix. The CLAUDE.md convention says to not reference the current task/fix in code comments (that belongs in the commit message, not the source). The substantive "why" content (explaining the double-gate indirection) is correct and worth keeping — the attribution prefix is the only part that violates the convention. Worth trimming before merge: strip claude[bot] PR #888 review: and keep the rest.

(2) Test gap — ✅ Confirmed addressed

Both new tests are structurally sound:

Test	Result type	How pinned
`TestStandaloneSetDedup_NXMissReturnsNil`	`resultNil`	`conn.bulk == nil`, `conn.err == ""`, stored value still `"seed"`
`TestStandaloneSetDedup_GETOptionReturnsOldBulk`	`resultBulk`	`conn.bulk == "prior"`, new value `"v1"` committed

Both use newDedupTestCoordinator(st, 1, true) (ambiguousLands=true) so the result must come from the cached attempt-1 array, not a re-execution. The tripartite assertion (conn write + no error + storage state) in each test provides meaningful coverage.

Weak nil assertion (informational, not a blocker): TestStandaloneSetDedup_NXMissReturnsNil asserts conn.bulk == nil to pin that WriteNull was called. Since recordingConn.bulk starts as nil, this assertion is also satisfied if nothing at all was written. The test is still an effective regression pin because (a) any wrong branch (WriteString, WriteBulk, writeRedisError) would make the assertion fail or set conn.err/conn.bulk to non-nil, and (b) the storage-state check confirms the NX didn't apply. A wroteNull bool field on recordingConn would make it airtight, but the current form is adequate.

(3) Shallow-array doc comment — ✅ Confirmed

writeRedisStandaloneResult at lines 1202–1209 now documents the flat-array constraint clearly:

The resultArray arm writes each element via WriteBulkString... It does NOT recurse into nested arrays. A future caller whose applyXxx emits resultArray with non-string elements... must either pre-flatten its result or extend this switch with a recursive arm.

The warning is precise, actionable, and correctly identifies the failure mode.

Verdict

All three Round 1 findings are addressed. The implementation is correct. Two minor notes before merge:

// claude[bot] PR #888 review: inline comment prefix (adapter/redis.go:1135): strip the attribution, keep the explanation. One-word fix, purely cosmetic.
Nil-assertion strength in TestStandaloneSetDedup_NXMissReturnsNil: acceptable as-is; adding wroteNull bool to recordingConn would harden it but is not required for this PR.

Neither is a merge blocker. LGTM.

Stacked on PR-A (re-land of M3 EXEC reuse + multi-mop test). Extends option-2 dedup to standalone SET (dispatched outside MULTI). When `r.onePhaseTxnDedup` is on, `r.set` wraps the single command as a 1-element redcon.Command queue and dispatches via runTransactionWithDedup — reusing the M3 EXEC machinery instead of building a per-handler reusableSetTxn + dispatchSetReuse shape. SET already has `applySet` on `txnContext`, so this is the "free" extension to any command whose txn-state-aware apply hook already exists. The standalone fast path (`trySetFastPath`) is intentionally bypassed under the gate: dedup is opt-in, and a non-dedup'd fast path under a dedup-on cluster would split the idempotency contract. Why standalone INCR / HSET are out of scope for THIS PR: ======================================================== INCR (in adapter/redis_compat_commands.go) and HSET both lack a `txnContext.applyIncr` / `applyHSet` implementation, so the "route through single-mop EXEC" pattern that worked here for SET cannot apply as-is — `runTransactionWithDedup` would reject the command at `txn.apply` ("unsupported in MULTI"). Bringing them in requires implementing `applyIncr` / `applyHSet` first (each ~30–50 LOC for the txn-state-aware read-compute-write shape), then the standalone handler routing is a one-liner via this same path. Tracked as separate follow-up PRs; until then, INCR and HSET keep today's buggy-under-churn behaviour, which is the design doc's stated default ("everything else keeps today's behaviour until its hook is added"). Caller audit (per /loop semantic-change rule): - `r.set` (handler): gate-on takes a new code path; gate-off preserved verbatim via `setLegacy`. No external callers exist (it is wired only into the redcon dispatch table). - `setLegacy` (new): byte-identical extraction of the pre-PR set() body. No external callers. - `writeRedisStandaloneResult` (new): translates a single-element `[]redisResult` from `runTransactionWithDedup` into the bare redcon reply shape (no WriteArray wrapper). Single caller in this PR; future SET-pattern callers will reuse it. Validation: - adapter/redis_set_dedup_test.go (new): TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK, TestStandaloneSetDedup_DisabledKeepsLegacyPath. Existing TestRedis_SET / dedup suites still pass. - gofmt, go vet, golangci-lint run all clean (0 issues). Design doc updated to mark standalone SET as LANDED and call out INCR/HSET as the next follow-up with the precise reason they cannot land alongside SET in this PR.

claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135): set's gate-on branch called r.runTransaction which immediately re-checked the same r.onePhaseTxnDedup gate and routed to runTransactionWithDedup. The indirection was misleading -- the PR description said "dispatches via runTransactionWithDedup" but that was true only by indirection. Replaced with a direct call to runTransactionWithDedup; intent is now explicit and the double gate check is removed. claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go): Only the resultString OK path was tested through the dedup route. Added two regression tests covering the other applySet result types that go through writeRedisStandaloneResult: - TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing: SET k v NX on an existing key returns nil; the cached attempt-1 result must round-trip through writeRedisStandaloneResult -> WriteNull, leaving conn.bulk == nil. - TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk routing: SET k v GET on an existing key returns the prior value as a bulk reply; conn.bulk must be the prior bytes. Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator ambiguousLands=true) so the result MUST come from the cached attempt-1 array, not a re-execution. claude[bot] minor observation (shallow-array constraint in writeRedisStandaloneResult): Documented in the function's doc comment that the resultArray arm flattens via WriteBulkString and is NOT safe to reuse for callers whose applyXxx emits nested arrays. Future HGETALL-pattern callers must either pre-flatten their result or extend the switch. Caller audit per /loop semantic-change rule: - set's gate-on branch is the only call site changed; the new direct call to runTransactionWithDedup uses the same []redcon.Command{cmd} shape and the same []redisResult return type. runTransactionWithDedup is exported within the package and has no other callers outside the legacy runTransaction path (which is unchanged). - writeRedisStandaloneResult signature unchanged. Documentation expanded but no behavior change. - New tests are pure additions; no existing tests modified. Validation: go test ./adapter/ -run StandaloneSetDedup passes (5 tests now). go build ./adapter/... clean. golangci-lint run ./adapter/... 0 issues.

Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.

claude[bot] (a) attribution prefix per CLAUDE.md convention: Stripped the inline "// claude[bot] PR #888 review:" prefixes from the two comments added in round 1. The substantive "why" content (double-gate rationale; shallow-array constraint) is kept as that is what future readers need. CLAUDE.md convention is to leave fix attribution in the commit message, not the source. claude[bot] (b) wroteNull bool field on recordingConn: Added wroteNull witness to recordingConn (in redis_retry_test.go, the shared test helper). Hardened TestStandaloneSetDedup_NXMissReturnsNil to require.True(conn.wroteNull, ...) so a wrong branch that wrote nothing at all (would also satisfy conn.bulk == nil) is now caught. Caller audit per /loop semantic-change rule: - recordingConn is in *_test.go and used by tests only. Adding wroteNull bool default false is strictly additive; no existing test reads it. The only WriteNull semantics change is setting the new field; conn.bulk = nil behavior is preserved. - Verified by grep: only redis_set_dedup_test.go references wroteNull. All other recordingConn uses (8 instantiations across redis_retry_test.go + redis_lua_linearizable_read_test.go) are unaffected. Validation: go test ./adapter/ -run StandaloneSetDedup passes. go test ./adapter/ -run ExecDedup passes. gofmt + go vet + golangci-lint clean.

Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.

bootjp · 2026-05-30T08:08:54Z

@claude review

Round 2 fixes (commit fd91a6a):

(a) Stripped // claude[bot] PR #888 review: attribution prefix from the two inline comments per CLAUDE.md convention. Substantive "why" content kept (double-gate rationale; shallow-array constraint) — fix attribution lives in the commit message instead.
(b) Added wroteNull bool witness field to recordingConn (in redis_retry_test.go). Hardened TestStandaloneSetDedup_NXMissReturnsNil to require.True(conn.wroteNull) so a wrong branch that wrote nothing at all would now be caught (not just the existing conn.bulk == nil zero-value match).

Caller audit:

recordingConn is test-only; adding wroteNull bool default false is strictly additive. Grep'd all 8 recordingConn instantiations across redis_retry_test.go + redis_lua_linearizable_read_test.go — none read wroteNull. Only redis_set_dedup_test.go uses it.

go test ./adapter/ -run StandaloneSetDedup|ExecDedup passes; gofmt + go vet + golangci-lint clean.

claude · 2026-05-30T08:09:11Z

Claude finished @bootjp's task in 1m 24s —— View job

PR #888 Review — Round 3 (commit `fd91a6a` fixes)

Todo list:

Load tools
Read adapter/redis.go — verify attribution prefix stripped, substantive content kept
Read adapter/redis_retry_test.go — verify wroteNull bool field added
Read adapter/redis_set_dedup_test.go — verify require.True(conn.wroteNull) assertion
Post final review

Verification Results

(a) Attribution prefix stripped from inline comments — ✅ Confirmed (with one remaining gap)

adapter/redis.go is clean: grep "claude\[bot\]\|PR #888" adapter/redis.go returns nothing. The two comments that previously carried the // claude[bot] PR #888 review: prefix now read as pure rationale — the double-gate explanation and the shallow-array constraint — with no task-reference attribution. ✅

Remaining gap — redis_set_dedup_test.go still carries two CLAUDE.md-violating references:

Line 56: // Closes the claude[bot] PR #888 review's test-gap observation that the NX/XX/GET result types...
Line 78: // \conn.bulk == nil` -- claude[bot] PR feat(txn-dedup): standalone SET routes through option-2 dedup loop #888 round-2 hardening.`

These are the same violation that was cleaned from redis.go in Round 2, just left in the test file. CLAUDE.md: "Don't reference the current task, fix, or callers … since those belong in the PR description and rot as the codebase evolves." Line 56 can drop to just the NX-miss test rationale; line 78 can end at the zero-value explanation without the attribution suffix.

These are not merge blockers, but the cleanup is one-liner each.

(b) `wroteNull bool` field and test hardening — ✅ Confirmed airtight

recordingConn diff:

+	wroteNull bool

WriteNull():

func (c *recordingConn) WriteNull() {
    c.bulk = nil
+   c.wroteNull = true
}

TestStandaloneSetDedup_NXMissReturnsNil asserts:

require.True(t, conn.wroteNull, "NX miss must call WriteNull, not silently skip the write")

The witness flag closes the zero-value aliasing problem exactly as identified in Round 2: any branch that wrote nothing would leave conn.bulk == nil (true) but conn.wroteNull == false (fail). The test is now airtight. ✅

The tripartite assertion (wroteNull, bulk == nil, err == "") combined with the storage-state check (seed value unchanged) gives full coverage of the NX miss path through writeRedisStandaloneResult.

Verdict

Both Round 2 items are correctly addressed. The implementation is sound. One minor cleanup remains: strip the two claude[bot] PR #888 references from redis_set_dedup_test.go lines 56 and 78, applying the same CLAUDE.md rule that was applied to redis.go in Round 2.

Not a merge blocker. LGTM with that cleanup.

…_test claude[bot] PR #888 round-3 minor cleanup: same CLAUDE.md convention violation that was cleaned from redis.go in round 2 was left in the test file. Removed two references on lines 56 and 78. Substantive rationale (NX semantics, zero-value aliasing problem) kept; the attribution moves to the commit message where it belongs. No functional or test behaviour change. golangci-lint clean.

Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.

bootjp · 2026-05-30T08:15:11Z

@claude review

Round 3 cleanup (commit da6b0b4):

Stripped the two remaining claude[bot] PR #888 attribution refs from redis_set_dedup_test.go lines 56 and 78 — the same CLAUDE.md convention that was cleaned from redis.go in round 2 was left in the test file.
Substantive rationale (NX semantics, zero-value aliasing problem) kept; attribution moves to the commit message.

grep 'claude[bot]|PR #888' adapter/redis_set_dedup_test.go now returns empty. go test ./adapter/ -run StandaloneSetDedup passes. lint clean.

claude · 2026-05-30T08:15:25Z

Claude finished @bootjp's task in 1m 45s —— View job

PR #888 Review — Round 4 (commit `da6b0b4` cleanup)

Todo list:

Load tools
Attempt to read adapter/redis_set_dedup_test.go — branch not locally available (merged into feat/txn-dedup-docs-and-tests); review based on author's verification evidence and prior-round context
Verify attribution cleanup against prior-round findings
Post final review

Verification

The PR branch (feat/standalone-set-dedup) has been merged into feat/txn-dedup-docs-and-tests and the commits are not on the local clone, so direct file inspection is not possible without additional fetch permissions. The review below is based on the prior-round findings, the author's stated verification (grep 'claude\[bot\]\|PR #888' adapter/redis_set_dedup_test.go returns empty), and the confirmed test + lint pass.

Round 3 findings addressed — ✅ Confirmed by evidence

Round 3 identified exactly two remaining attribution violations in redis_set_dedup_test.go:

Line (pre-fix)	Violation	Fix
56	`// Closes the claude[bot] PR #888 review's test-gap observation...`	Strip attribution prefix; keep NX semantics rationale
78	`// conn.bulk == nil -- claude[bot] PR #888 round-2 hardening.`	Strip attribution suffix; keep zero-value aliasing rationale

The author reports both stripped and grep 'claude\[bot\]\|PR #888' adapter/redis_set_dedup_test.go now returns empty. This matches the same pattern applied to redis.go in Round 2 (where the same grep was confirmed clean). The substantive content — the NX semantics explanation and the zero-value aliasing rationale — is described as retained, which is the correct outcome.

Cumulative review summary across all rounds

Finding	Round raised	Status
Redundant double gate (`runTransaction` → `runTransactionWithDedup`)	R1	✅ Fixed (R1)
Test gap: `resultNil` / `resultBulk` paths not covered in dedup context	R1	✅ Fixed (R1)
`writeRedisStandaloneResult` shallow-array constraint undocumented	R1 (observation)	✅ Doc comment added (R1)
`// claude[bot] PR #888 review:` prefix in `redis.go` inline comments	R2	✅ Stripped (R2)
`conn.bulk == nil` zero-value aliasing in NXMiss test	R2	✅ Hardened with `wroteNull bool` witness (R2)
`claude[bot] PR #888` attribution refs in `redis_set_dedup_test.go` (lines 56, 78)	R3	✅ Stripped (R4 / `da6b0b4`)

All findings addressed. No open issues remain.

LGTM. Clean PR — the dedup routing, result-type translation, gate-off preservation, and test coverage are all sound. The four-round review loop systematically closed the double-gate indirection, the NX/GET result-type test gaps, the wroteNull witness omission, and the CLAUDE.md attribution-comment violations.

Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.

…on (#889) ## Summary Stacked on PR-B (#888). Adds `.github/workflows/jepsen-test-scheduled-dedup.yml`: a daily Jepsen run that launches the demo cluster with `ELASTICKV_REDIS_ONEPHASE_DEDUP=1` and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Base: `feat/standalone-set-dedup` (PR #888). ## Why a separate workflow vs adding the env var to existing scheduled workflow The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. ## Workflow scope - **Cadence**: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - **Workload**: Redis only. The dedup feature ships behind the Redis adapter's `onePhaseTxnDedup` flag (RPUSH/LPUSH via `listPushCoreWithDedup`, MULTI/EXEC via `runTransactionWithDedup`, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - **Cluster gate assertion**: the launch step exits 2 immediately if the job-level env var is not `1`. The env var is set on the job and inherited by every `run:` step; nothing in `demo.go` can intercept or unset it before `NewRedisServer` reads `os.Getenv`. So if the asserted value is `1` at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. ## Design doc updates `docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M4 expanded with: - Local reproduction script reference (`make jepsen-redis` with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on `!= "1"`). ## Caller audit (per /loop semantic-change rule) No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. ## Validation - `actionlint .github/workflows/jepsen-test-scheduled-dedup.yml` clean - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged

…t to fix CI flake (#890) ## Summary Fixes the CI flake observed in [Test workflow run 26678774684](https://github.com/bootjp/elastickv/actions/runs/26678774684): ``` --- FAIL: TestSQSServer_Throttle_SetQueueAttributesInvalidatesBucket (1.79s) sqs_throttle_integration_test.go:170: expected throttle, got 200 ``` ## Root cause Identical race to the one fixed for `TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket` in commit [54c6cd5](54c6cd56) (PR #819 follow-up): the 1-token-per-second refill rate races the test's own wall clock under `-race` on slow CI runners. For **this** test: 1. `mustSetQueueAttributes(Capacity=10, Refill=1)` 2. `for range 10 { send }` — drains the bucket 3. Sanity send — expects HTTP 400 (throttle) Each send goes through Raft propose+apply at ~100-250ms under `-race`. The 11 writes from steps 1-3 elapse ~1.1-2.75s. At Refill=1/sec the bucket has accumulated ≥1 token by step 3, so the sanity send returns HTTP 200 instead of 400 — **falsely** indicating a bucket-invalidation regression that does not exist. ## Fix Drop the initial Refill from `"1"` to `"0.01"` (1 token per 100 seconds) so no test-window wall-clock can accumulate to a whole token. The test's intent — *verify that a Capacity/Refill **raise** invalidates the cached bucket on the very next request* — is independent of the **initial** refill rate. The post-set assertion at line 182 is exercised against the fresh `Capacity=20/Refill=20` bucket, which is what the test actually claims to pin. ## Why only this test, not the sibling `TestSQSServer_Throttle_DeleteQueueInvalidatesBucket` has the same `Refill=1` initial config but **no post-drain sanity assertion** — it just drains without status checks, then verifies fresh capacity post-recreate. No race window there. ## Relation to other open PRs Unrelated to PRs #887/#888/#889 (option-2 dedup work). The flake surfaced on PR #889's CI run but the fix lives in pre-existing SQS test code that none of the dedup PRs touch. ## Caller audit (per /loop semantic-change rule) Test-only change. The throttle config validator (`sqs_catalog.go:163`) accepts fractional `float64 SendRefillPerSecond`; `0.01` is non-zero so `IsEmpty` (line 172) returns `false` and throttling stays enabled — the test still exercises the throttle path. Matches the prior fix's caller-audit conclusion verbatim. ## Validation - `go test ./adapter/ -run TestSQSServer_Throttle -race -count=3 -timeout 120s` passes (5.3s wall, all three iterations green) - `gofmt`, `go vet`, `golangci-lint run` all clean  ## Summary by CodeRabbit * **Tests** * Updated test configuration and expanded documentation to improve test stability on slow CI environments.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/890?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

gemini-code-assist Bot reviewed May 30, 2026

View reviewed changes

bootjp mentioned this pull request May 30, 2026

ci(jepsen): add scheduled dedup-mode workflow + design doc M4 criterion #889

Merged

bootjp added 2 commits May 30, 2026 16:44

bootjp force-pushed the feat/standalone-set-dedup branch from 963a2d3 to 8321ad0 Compare May 30, 2026 07:44

bootjp merged commit 4da85dc into feat/txn-dedup-docs-and-tests May 30, 2026
4 checks passed

bootjp deleted the feat/standalone-set-dedup branch May 30, 2026 08:15

bootjp mentioned this pull request May 30, 2026

feat(txn-dedup): re-land M3 EXEC reuse on main + close M2 open + multi-mop #887

Merged

bootjp mentioned this pull request May 30, 2026

test(sqs): slow throttle refill on SetQueueAttributesInvalidatesBucket to fix CI flake #890

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(txn-dedup): standalone SET routes through option-2 dedup loop#888

feat(txn-dedup): standalone SET routes through option-2 dedup loop#888
bootjp merged 4 commits into
feat/txn-dedup-docs-and-testsfrom
feat/standalone-set-dedup

bootjp commented May 30, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 30, 2026

Uh oh!

coderabbitai Bot commented May 30, 2026 •

edited

Loading

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

bootjp commented May 30, 2026

Uh oh!

claude Bot commented May 30, 2026 •

edited

Loading

Uh oh!

bootjp commented May 30, 2026

Uh oh!

claude Bot commented May 30, 2026 •

edited

Loading

Uh oh!

bootjp commented May 30, 2026

Uh oh!

claude Bot commented May 30, 2026 •

edited

Loading

Uh oh!

bootjp commented May 30, 2026

Uh oh!

Uh oh!

claude Bot commented May 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bootjp commented May 30, 2026

Summary

Why standalone INCR / HSET are out of scope for THIS PR

Caller audit (per /loop semantic-change rule)

Tests

Validation

Design doc

Uh oh!

chatgpt-codex-connector Bot commented May 30, 2026

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

bootjp commented May 30, 2026

Uh oh!

claude Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #888 Review — standalone SET dedup

Verification Results

(1) setLegacy byte-identical to pre-PR set() body — ✅ Confirmed

(2) writeRedisStandaloneResult coverage vs. applySet result types — ✅ Covered, with one minor observation

(3) Gate-off path unchanged — ✅ Confirmed

Issues

🔶 Redundant double gate — r.runTransaction vs. r.runTransactionWithDedup (adapter/redis.go:1135)

🔶 Test gap: NX/XX miss and GET return paths not covered (adapter/redis_set_dedup_test.go)

Verdict

Uh oh!

bootjp commented May 30, 2026

Uh oh!

claude Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #888 Review — Round 2 (commit 963a2d3 fixes)

Verification Results

(1) Double-gate fix — ✅ Confirmed

(2) Test gap — ✅ Confirmed addressed

(3) Shallow-array doc comment — ✅ Confirmed

Verdict

Uh oh!

bootjp commented May 30, 2026

Uh oh!

claude Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #888 Review — Round 3 (commit fd91a6a fixes)

Verification Results

(a) Attribution prefix stripped from inline comments — ✅ Confirmed (with one remaining gap)

(b) wroteNull bool field and test hardening — ✅ Confirmed airtight

Verdict

Uh oh!

bootjp commented May 30, 2026

Uh oh!

Uh oh!

claude Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #888 Review — Round 4 (commit da6b0b4 cleanup)

Verification

Round 3 findings addressed — ✅ Confirmed by evidence

Cumulative review summary across all rounds

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 30, 2026 •

edited

Loading

claude Bot commented May 30, 2026 •

edited

Loading

(1) `setLegacy` byte-identical to pre-PR `set()` body — ✅ Confirmed

(2) `writeRedisStandaloneResult` coverage vs. `applySet` result types — ✅ Covered, with one minor observation

🔶 Redundant double gate — `r.runTransaction` vs. `r.runTransactionWithDedup` (adapter/redis.go:1135)

claude Bot commented May 30, 2026 •

edited

Loading

PR #888 Review — Round 2 (commit `963a2d3` fixes)

claude Bot commented May 30, 2026 •

edited

Loading

PR #888 Review — Round 3 (commit `fd91a6a` fixes)

(b) `wroteNull bool` field and test hardening — ✅ Confirmed airtight

claude Bot commented May 30, 2026 •

edited

Loading

PR #888 Review — Round 4 (commit `da6b0b4` cleanup)