feat(txn-dedup): standalone SET routes through option-2 dedup loop#888
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements deduplication for standalone SET commands by routing them through the transaction deduplication machinery (runTransaction) when the onePhaseTxnDedup gate is enabled. It extracts the legacy path into setLegacy, introduces writeRedisStandaloneResult to handle standalone response translation, adds corresponding unit tests, and updates the design documentation. I have no feedback to provide as there are no review comments and the implementation is clean and well-tested.
|
@claude review PR-B: standalone SET routes through option-2 dedup loop. Mechanism: when r.onePhaseTxnDedup is on, r.set wraps the command as a 1-element redcon.Command queue and dispatches via runTransactionWithDedup. setLegacy preserves the pre-PR body byte-identical for the gate-off path. writeRedisStandaloneResult translates the single-element results array to a bare redcon reply (no WriteArray wrapper). Standalone INCR / HSET are documented as out of scope because they lack txnContext.applyIncr / applyHSet hooks (would each be a ~30-50 LOC feature add). Please verify: (1) setLegacy is byte-identical to the pre-PR set() body; (2) the writeRedisStandaloneResult coverage matches every redisResult type emitted by applySet (resultString OK, resultNil for NX/XX miss, resultBulk for GET old-value); (3) the gate-off path is unchanged (TestStandaloneSetDedup_DisabledKeepsLegacyPath pins this). |
|
Claude finished @bootjp's task in 2m 57s —— View job PR #888 Review — standalone SET dedupTodo list:
Verification Results(1)
|
| Scenario | Result type | Path |
|---|---|---|
| WRONGTYPE error | resultError |
applySet line 2337 |
| NX miss / XX miss | resultNil |
applySetCondition line 2390 |
| Normal SET | resultString ("OK") |
applySetResult line 2410 |
| GET option, no prior value | resultNil |
applySetResult line 2413 |
| GET option, returns old value | resultBulk |
applySetResult line 2415 |
writeRedisStandaloneResult handles all five: resultNil → WriteNull, resultError → writeRedisError, resultString → WriteString, resultBulk → WriteBulk. The extra resultArray/resultInt arms are correct defensive coding for future callers.
Minor observation — resultArray is shallow: the array arm writes WriteBulkString for each element. Redis nested arrays (e.g. HGETALL) need recursive handling, so writeRedisStandaloneResult is not safe to reuse for commands whose applyXxx can produce resultArray with non-string elements. This is fine for applySet and for any future string-list command, but the function's doc comment says "future SET-pattern callers" — worth noting the shallow-array constraint so a future caller doesn't silently mangle a nested response.
(3) Gate-off path unchanged — ✅ Confirmed
When onePhaseTxnDedup is false, set() calls setLegacy(conn, cmd) with no other side effects. The TestStandaloneSetDedup_DisabledKeepsLegacyPath test correctly pins this: probeNoOps == 0 guarantees no dedup machinery was entered, and the test uses newDedupTestCoordinator(st, 1, false) (pre-reject on attempt 1) so any accidental routing through runTransactionWithDedup would be observable.
Issues
🔶 Redundant double gate — r.runTransaction vs. r.runTransactionWithDedup (adapter/redis.go:1135)
The gate-on branch calls r.runTransaction, which then immediately re-checks r.onePhaseTxnDedup:
// set() line 1134-1142:
if r.onePhaseTxnDedup {
results, err := r.runTransaction([]redcon.Command{cmd}) // <-- indirect
...
}
// runTransaction() line 3088-3091:
func (r *RedisServer) runTransaction(queue []redcon.Command) ([]redisResult, error) {
if r.onePhaseTxnDedup {
return r.runTransactionWithDedup(queue) // same gate, checked twice
}
...
}This is functionally correct — runTransactionWithDedup is reached — but it creates a misleading call chain. The PR description and the block comment say "dispatches via runTransactionWithDedup", which is true only by indirection. The exec() handler also calls runTransaction and relies on the same internal dispatch (line 1900), so there is prior art for this style, but exec() is called outside any gate. A reader of set() who knows runTransaction contains its own gate will wonder why the outer gate re-checks.
The cleaner call under the gate is r.runTransactionWithDedup([]redcon.Command{cmd}) directly, which removes the double-check and makes the intent explicit. Fix this →
🔶 Test gap: NX/XX miss and GET return paths not covered (adapter/redis_set_dedup_test.go)
TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK only exercises the resultString("OK") path. The resultNil (NX miss, or GET with no prior value) and resultBulk (GET with old value) paths through writeRedisStandaloneResult are not tested via the dedup route. These result types come from applySet under normal conditions (NX/XX/GET options) and are exercised by existing TestRedis_SET* tests, but not in the dedup context.
This is lower priority because writeRedisStandaloneResult is a straightforward switch with no branching logic beyond type dispatch, and the covered path (resultString) validates the routing is wired correctly. But since the PR calls out "coverage matches every redisResult type emitted by applySet" as a review criterion, it's worth noting the gap.
Verdict
All three verification targets pass. The implementation is correct: the dedup machinery is properly reached, the result translation is complete for all applySet output types, and the legacy path is untouched for gate-off. The redundant double-gate is a style issue worth fixing before merge (the fix is a one-word change: runTransaction → runTransactionWithDedup). The test coverage gap for NX/XX/GET dedup paths is a nice-to-have.
claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135):
set's gate-on branch called r.runTransaction which immediately
re-checked the same r.onePhaseTxnDedup gate and routed to
runTransactionWithDedup. The indirection was misleading -- the PR
description said "dispatches via runTransactionWithDedup" but that
was true only by indirection. Replaced with a direct call to
runTransactionWithDedup; intent is now explicit and the double gate
check is removed.
claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go):
Only the resultString OK path was tested through the dedup route.
Added two regression tests covering the other applySet result types
that go through writeRedisStandaloneResult:
- TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing:
SET k v NX on an existing key returns nil; the cached attempt-1
result must round-trip through writeRedisStandaloneResult ->
WriteNull, leaving conn.bulk == nil.
- TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk
routing: SET k v GET on an existing key returns the prior value
as a bulk reply; conn.bulk must be the prior bytes.
Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator
ambiguousLands=true) so the result MUST come from the cached
attempt-1 array, not a re-execution.
claude[bot] minor observation (shallow-array constraint in
writeRedisStandaloneResult):
Documented in the function's doc comment that the resultArray arm
flattens via WriteBulkString and is NOT safe to reuse for callers
whose applyXxx emits nested arrays. Future HGETALL-pattern callers
must either pre-flatten their result or extend the switch.
Caller audit per /loop semantic-change rule:
- set's gate-on branch is the only call site changed; the new direct
call to runTransactionWithDedup uses the same []redcon.Command{cmd}
shape and the same []redisResult return type. runTransactionWithDedup
is exported within the package and has no other callers outside the
legacy runTransaction path (which is unchanged).
- writeRedisStandaloneResult signature unchanged. Documentation
expanded but no behavior change.
- New tests are pure additions; no existing tests modified.
Validation:
go test ./adapter/ -run StandaloneSetDedup passes (5 tests now).
go build ./adapter/... clean.
golangci-lint run ./adapter/... 0 issues.
|
@claude review Round 1 fixes (commit 963a2d3):
go test ./adapter/ -run StandaloneSetDedup passes (5 tests). gofmt/vet/golangci-lint clean. |
|
Claude finished @bootjp's task in 2m 1s —— View job PR #888 Review — Round 2 (commit 963a2d3 fixes)Todo list:
Verification Results(1) Double-gate fix — ✅ Confirmed
Minor style note (not a blocker): the inline comment at lines 1135–1141 uses (2) Test gap — ✅ Confirmed addressedBoth new tests are structurally sound:
Both use Weak nil assertion (informational, not a blocker): (3) Shallow-array doc comment — ✅ Confirmed
The warning is precise, actionable, and correctly identifies the failure mode. VerdictAll three Round 1 findings are addressed. The implementation is correct. Two minor notes before merge:
Neither is a merge blocker. LGTM. |
Stacked on PR-A (re-land of M3 EXEC reuse + multi-mop test).
Extends option-2 dedup to standalone SET (dispatched outside MULTI).
When `r.onePhaseTxnDedup` is on, `r.set` wraps the single command as a
1-element redcon.Command queue and dispatches via
runTransactionWithDedup — reusing the M3 EXEC machinery instead of
building a per-handler reusableSetTxn + dispatchSetReuse shape. SET
already has `applySet` on `txnContext`, so this is the "free"
extension to any command whose txn-state-aware apply hook already
exists.
The standalone fast path (`trySetFastPath`) is intentionally bypassed
under the gate: dedup is opt-in, and a non-dedup'd fast path under a
dedup-on cluster would split the idempotency contract.
Why standalone INCR / HSET are out of scope for THIS PR:
========================================================
INCR (in adapter/redis_compat_commands.go) and HSET both lack a
`txnContext.applyIncr` / `applyHSet` implementation, so the
"route through single-mop EXEC" pattern that worked here for SET
cannot apply as-is — `runTransactionWithDedup` would reject the
command at `txn.apply` ("unsupported in MULTI"). Bringing them in
requires implementing `applyIncr` / `applyHSet` first (each
~30–50 LOC for the txn-state-aware read-compute-write shape), then
the standalone handler routing is a one-liner via this same path.
Tracked as separate follow-up PRs; until then, INCR and HSET keep
today's buggy-under-churn behaviour, which is the design doc's
stated default ("everything else keeps today's behaviour until
its hook is added").
Caller audit (per /loop semantic-change rule):
- `r.set` (handler): gate-on takes a new code path; gate-off
preserved verbatim via `setLegacy`. No external callers exist
(it is wired only into the redcon dispatch table).
- `setLegacy` (new): byte-identical extraction of the pre-PR set()
body. No external callers.
- `writeRedisStandaloneResult` (new): translates a single-element
`[]redisResult` from `runTransactionWithDedup` into the bare
redcon reply shape (no WriteArray wrapper). Single caller in
this PR; future SET-pattern callers will reuse it.
Validation:
- adapter/redis_set_dedup_test.go (new):
TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK,
TestStandaloneSetDedup_DisabledKeepsLegacyPath. Existing
TestRedis_SET / dedup suites still pass.
- gofmt, go vet, golangci-lint run all clean (0 issues).
Design doc updated to mark standalone SET as LANDED and call out
INCR/HSET as the next follow-up with the precise reason they
cannot land alongside SET in this PR.
claude[bot] 🔶 (redundant double gate, adapter/redis.go:1135):
set's gate-on branch called r.runTransaction which immediately
re-checked the same r.onePhaseTxnDedup gate and routed to
runTransactionWithDedup. The indirection was misleading -- the PR
description said "dispatches via runTransactionWithDedup" but that
was true only by indirection. Replaced with a direct call to
runTransactionWithDedup; intent is now explicit and the double gate
check is removed.
claude[bot] 🔶 (test gap, adapter/redis_set_dedup_test.go):
Only the resultString OK path was tested through the dedup route.
Added two regression tests covering the other applySet result types
that go through writeRedisStandaloneResult:
- TestStandaloneSetDedup_NXMissReturnsNil pins resultNil routing:
SET k v NX on an existing key returns nil; the cached attempt-1
result must round-trip through writeRedisStandaloneResult ->
WriteNull, leaving conn.bulk == nil.
- TestStandaloneSetDedup_GETOptionReturnsOldBulk pins resultBulk
routing: SET k v GET on an existing key returns the prior value
as a bulk reply; conn.bulk must be the prior bytes.
Both fire the ambiguous-attempt-1-lands path (newDedupTestCoordinator
ambiguousLands=true) so the result MUST come from the cached
attempt-1 array, not a re-execution.
claude[bot] minor observation (shallow-array constraint in
writeRedisStandaloneResult):
Documented in the function's doc comment that the resultArray arm
flattens via WriteBulkString and is NOT safe to reuse for callers
whose applyXxx emits nested arrays. Future HGETALL-pattern callers
must either pre-flatten their result or extend the switch.
Caller audit per /loop semantic-change rule:
- set's gate-on branch is the only call site changed; the new direct
call to runTransactionWithDedup uses the same []redcon.Command{cmd}
shape and the same []redisResult return type. runTransactionWithDedup
is exported within the package and has no other callers outside the
legacy runTransaction path (which is unchanged).
- writeRedisStandaloneResult signature unchanged. Documentation
expanded but no behavior change.
- New tests are pure additions; no existing tests modified.
Validation:
go test ./adapter/ -run StandaloneSetDedup passes (5 tests now).
go build ./adapter/... clean.
golangci-lint run ./adapter/... 0 issues.
963a2d3 to
8321ad0
Compare
Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.
claude[bot] (a) attribution prefix per CLAUDE.md convention: Stripped the inline "// claude[bot] PR #888 review:" prefixes from the two comments added in round 1. The substantive "why" content (double-gate rationale; shallow-array constraint) is kept as that is what future readers need. CLAUDE.md convention is to leave fix attribution in the commit message, not the source. claude[bot] (b) wroteNull bool field on recordingConn: Added wroteNull witness to recordingConn (in redis_retry_test.go, the shared test helper). Hardened TestStandaloneSetDedup_NXMissReturnsNil to require.True(conn.wroteNull, ...) so a wrong branch that wrote nothing at all (would also satisfy conn.bulk == nil) is now caught. Caller audit per /loop semantic-change rule: - recordingConn is in *_test.go and used by tests only. Adding wroteNull bool default false is strictly additive; no existing test reads it. The only WriteNull semantics change is setting the new field; conn.bulk = nil behavior is preserved. - Verified by grep: only redis_set_dedup_test.go references wroteNull. All other recordingConn uses (8 instantiations across redis_retry_test.go + redis_lua_linearizable_read_test.go) are unaffected. Validation: go test ./adapter/ -run StandaloneSetDedup passes. go test ./adapter/ -run ExecDedup passes. gofmt + go vet + golangci-lint clean.
Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.
|
@claude review Round 2 fixes (commit fd91a6a):
Caller audit:
go test ./adapter/ -run StandaloneSetDedup|ExecDedup passes; gofmt + go vet + golangci-lint clean. |
|
Claude finished @bootjp's task in 1m 24s —— View job PR #888 Review — Round 3 (commit fd91a6a fixes)Todo list:
Verification Results(a) Attribution prefix stripped from inline comments — ✅ Confirmed (with one remaining gap)
Remaining gap —
These are the same violation that was cleaned from These are not merge blockers, but the cleanup is one-liner each. (b)
|
…_test claude[bot] PR #888 round-3 minor cleanup: same CLAUDE.md convention violation that was cleaned from redis.go in round 2 was left in the test file. Removed two references on lines 56 and 78. Substantive rationale (NX semantics, zero-value aliasing problem) kept; the attribution moves to the commit message where it belongs. No functional or test behaviour change. golangci-lint clean.
Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.
|
@claude review Round 3 cleanup (commit da6b0b4):
grep 'claude[bot]|PR #888' adapter/redis_set_dedup_test.go now returns empty. go test ./adapter/ -run StandaloneSetDedup passes. lint clean. |
|
Claude finished @bootjp's task in 1m 45s —— View job PR #888 Review — Round 4 (commit da6b0b4 cleanup)Todo list:
VerificationThe PR branch ( Round 3 findings addressed — ✅ Confirmed by evidenceRound 3 identified exactly two remaining attribution violations in
The author reports both stripped and Cumulative review summary across all rounds
All findings addressed. No open issues remain. LGTM. Clean PR — the dedup routing, result-type translation, gate-off preservation, and test coverage are all sound. The four-round review loop systematically closed the double-gate indirection, the NX/GET result-type test gaps, the |
Stacked on PR-B (#888). Adds .github/workflows/jepsen-test-scheduled-dedup.yml: a daily Jepsen run that launches the demo cluster with ELASTICKV_REDIS_ONEPHASE_DEDUP=1 and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Why a separate workflow vs adding the env var to the existing jepsen-test-scheduled.yml ========================================================= The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. Workflow scope ============== - Cadence: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - Workload: Redis only. The dedup feature ships behind the Redis adapter's onePhaseTxnDedup flag (RPUSH/LPUSH via listPushCoreWithDedup, MULTI/EXEC via runTransactionWithDedup, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - Cluster gate assertion: the launch step exits 2 immediately if the job-level env var is not '1'. The env var is set on the job and inherited by every run: step; nothing in demo.go can intercept or unset it before NewRedisServer reads os.Getenv. So if the asserted value is '1' at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. Design doc updates (docs/design/2026_05_21_..._idempotency.md) ============================================================== §M4 expanded with: - Local reproduction script reference (make jepsen-redis with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on '!=1'). Caller audit (per /loop semantic-change rule) ============================================= No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. Validation ========== - actionlint .github/workflows/jepsen-test-scheduled-dedup.yml clean. - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged.
…on (#889) ## Summary Stacked on PR-B (#888). Adds `.github/workflows/jepsen-test-scheduled-dedup.yml`: a daily Jepsen run that launches the demo cluster with `ELASTICKV_REDIS_ONEPHASE_DEDUP=1` and executes the Redis workload. The 7-consecutive-days-green criterion in design doc §M4 is now operationally checkable. Base: `feat/standalone-set-dedup` (PR #888). ## Why a separate workflow vs adding the env var to existing scheduled workflow The legacy path (gate off) must also stay covered. The existing 6-hourly workflow runs the legacy path; this new daily workflow runs the dedup path. Both must stay green for option-2 to be safe to default-on. Mixing the gate into the existing workflow would lose legacy coverage without adding the additional dedup-on signal that the design doc actually calls for. ## Workflow scope - **Cadence**: daily at 03:17 UTC (off-peak; non-zero minute matching the project's cron pattern guidance). - **Workload**: Redis only. The dedup feature ships behind the Redis adapter's `onePhaseTxnDedup` flag (RPUSH/LPUSH via `listPushCoreWithDedup`, MULTI/EXEC via `runTransactionWithDedup`, standalone SET via single-mop EXEC routing). DynamoDB / S3 / SQS do NOT route through the dedup loop, so re-running them under the gate would add hours of CI for zero signal on the new code path. - **Cluster gate assertion**: the launch step exits 2 immediately if the job-level env var is not `1`. The env var is set on the job and inherited by every `run:` step; nothing in `demo.go` can intercept or unset it before `NewRedisServer` reads `os.Getenv`. So if the asserted value is `1` at launch time, the dedup gate IS active in the cluster process — no log-grep guesswork. ## Design doc updates `docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M4 expanded with: - Local reproduction script reference (`make jepsen-redis` with the env var on). - 7-day criterion specifically tied to the new workflow file name. - Workflow scope rationale (Redis-only is intentional, not an oversight). - Gate assertion mechanism (env at job level, fail-fast on `!= "1"`). ## Caller audit (per /loop semantic-change rule) No Go code changed. This is pure infrastructure: a new workflow file and a doc update. No production behavior change, no new test coverage on existing Go callers. ## Validation - `actionlint .github/workflows/jepsen-test-scheduled-dedup.yml` clean - The workflow's lein / go-mod-download / cache steps mirror the existing scheduled workflow line for line, so cache invalidation semantics are unchanged
…t to fix CI flake (#890) ## Summary Fixes the CI flake observed in [Test workflow run 26678774684](https://github.com/bootjp/elastickv/actions/runs/26678774684): ``` --- FAIL: TestSQSServer_Throttle_SetQueueAttributesInvalidatesBucket (1.79s) sqs_throttle_integration_test.go:170: expected throttle, got 200 ``` ## Root cause Identical race to the one fixed for `TestSQSServer_Throttle_NoOpSetQueueAttributesPreservesBucket` in commit [54c6cd5](54c6cd56) (PR #819 follow-up): the 1-token-per-second refill rate races the test's own wall clock under `-race` on slow CI runners. For **this** test: 1. `mustSetQueueAttributes(Capacity=10, Refill=1)` 2. `for range 10 { send }` — drains the bucket 3. Sanity send — expects HTTP 400 (throttle) Each send goes through Raft propose+apply at ~100-250ms under `-race`. The 11 writes from steps 1-3 elapse ~1.1-2.75s. At Refill=1/sec the bucket has accumulated ≥1 token by step 3, so the sanity send returns HTTP 200 instead of 400 — **falsely** indicating a bucket-invalidation regression that does not exist. ## Fix Drop the initial Refill from `"1"` to `"0.01"` (1 token per 100 seconds) so no test-window wall-clock can accumulate to a whole token. The test's intent — *verify that a Capacity/Refill **raise** invalidates the cached bucket on the very next request* — is independent of the **initial** refill rate. The post-set assertion at line 182 is exercised against the fresh `Capacity=20/Refill=20` bucket, which is what the test actually claims to pin. ## Why only this test, not the sibling `TestSQSServer_Throttle_DeleteQueueInvalidatesBucket` has the same `Refill=1` initial config but **no post-drain sanity assertion** — it just drains without status checks, then verifies fresh capacity post-recreate. No race window there. ## Relation to other open PRs Unrelated to PRs #887/#888/#889 (option-2 dedup work). The flake surfaced on PR #889's CI run but the fix lives in pre-existing SQS test code that none of the dedup PRs touch. ## Caller audit (per /loop semantic-change rule) Test-only change. The throttle config validator (`sqs_catalog.go:163`) accepts fractional `float64 SendRefillPerSecond`; `0.01` is non-zero so `IsEmpty` (line 172) returns `false` and throttling stays enabled — the test still exercises the throttle path. Matches the prior fix's caller-audit conclusion verbatim. ## Validation - `go test ./adapter/ -run TestSQSServer_Throttle -race -count=3 -timeout 120s` passes (5.3s wall, all three iterations green) - `gofmt`, `go vet`, `golangci-lint run` all clean <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Updated test configuration and expanded documentation to improve test stability on slow CI environments. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/890?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Stacked on PR-A (#887). Extends option-2 dedup to standalone SET (dispatched outside MULTI). When
r.onePhaseTxnDedupis on,r.setwraps the single command as a 1-elementredcon.Commandqueue and dispatches viarunTransactionWithDedup— reusing the M3 EXEC machinery instead of building a per-handlerreusableSetTxn+dispatchSetReuseshape. SET already hasapplySetontxnContext, so this is the "free" extension to any command whose txn-state-aware apply hook already exists.The standalone fast path (
trySetFastPath) is intentionally bypassed under the gate: dedup is opt-in, and a non-dedup'd fast path under a dedup-on cluster would split the idempotency contract.Base:
feat/txn-dedup-docs-and-tests(PR #887).Why standalone INCR / HSET are out of scope for THIS PR
INCR (in
adapter/redis_compat_commands.go) and HSET both lack atxnContext.applyIncr/applyHSetimplementation, so the "route through single-mop EXEC" pattern that worked here for SET cannot apply as-is —runTransactionWithDedupwould reject the command attxn.apply("unsupported in MULTI"). Bringing them in requires implementingapplyIncr/applyHSetfirst (each ~30–50 LOC for the txn-state-aware read-compute-write shape), then the standalone handler routing is a one-liner via this same path. Tracked as separate follow-up PRs; until then, INCR and HSET keep today's buggy-under-churn behaviour — the design doc's stated default ("everything else keeps today's behaviour until its hook is added").Caller audit (per /loop semantic-change rule)
r.set(handler): gate-on takes a new code path; gate-off preserved verbatim viasetLegacy. No external callers exist (it is wired only into the redcon dispatch table).setLegacy(new): byte-identical extraction of the pre-PRset()body. No external callers.writeRedisStandaloneResult(new): translates a single-element[]redisResultfromrunTransactionWithDedupinto the bare redcon reply shape (noWriteArraywrapper). Single caller in this PR; future SET-pattern callers will reuse it.Tests
adapter/redis_set_dedup_test.go(new):TestStandaloneSetDedup_LandedPriorAttempt_ReturnsOK: gate-on + ambiguous attempt-1 land + reuse → probe hits → cached "OK" returned, dispatches=2, probeNoOps=1.TestStandaloneSetDedup_DisabledKeepsLegacyPath: gate-off path runs no probe.TestRedis_SET*and dedup suites still pass.Validation
go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC|TestRedis_SET'passesgofmt,go vet,golangci-lint runall clean (0 issues)Design doc
docs/design/2026_05_21_proposed_txn_secondary_idempotency.md§M3 — standalone SET marked LANDED, standalone INCR/HSET called out as follow-up with the precise reason they cannot land alongside SET in this PR.