kv(composed1): M3 — verifyComposed1 apply-time gate + retry sentinels by bootjp · Pull Request #895 · bootjp/elastickv

bootjp · 2026-05-31T09:57:39Z

Summary

Third milestone (M3) of the Composed-1 cross-group commit-time guard per docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md §M3.

Wires the safety property at apply time using the M2 versioned-snapshot ring and the M1 ObservedRouteVersion field.

Stacked on PR #894 (M2 plumbing).

What's enforced

(a) Observed-version owner — the spec-level Composed-1 check from tla/composed/Composed.tla. Every write key of a Composed-1-pinned txn must be owned by THIS Raft group at the txn's observed catalog version. Refusal returns ErrComposed1Violation.
(b) Current-version cross-version-read fence — the §4.4 / §3 codex P1 trace. Even when (a) passes, a route shift between BeginTxn and Commit can leave the write landing on the OLD owner while readers at the new version route to the NEW owner and miss the write. The current-version fence refuses that case. Refusal returns ErrComposed1Violation with a different diagnostic prefix (current-version vs observed-version) so the M4 retry path can distinguish.
(c) Retention-miss fail-closed — when the txn's observed version has been evicted from the M2 ring (long-running txn or high catalog churn), return ErrComposed1VersionGCd. Per design doc §4.3, the not-found semantics is a hard retryable error, NOT a soft pass, because soft-pass would let the gate be bypassed exactly in the cases where the cross-version-read hazard is most likely.

Short-circuits

Three legacy / not-applicable cases that the gate explicitly bypasses:

FSM constructed without WithRouteHistory (legacy / test seam): routes == nil, gate returns nil.
Request carries ObservedRouteVersion == 0 (unpinned — pre-M1 caller, or 2PC ABORT request that doesn't carry the version): gate returns nil. This is what keeps M3 from regressing every caller that has not migrated to set OperationGroup.ObservedRouteVersion.
Engine.Current returns (zero, false) — engine has no history (bare-struct test seam): the (b) fence falls through.

Changes

distribution/engine.go — adds Engine.Current() returning the snapshot at the current catalogVersion. Adds SetHistoryDepthForTest as the cross-package test seam that lets kv-side tests trigger eviction without bypassing the package boundary (claude review on PR kv(composed1): M2 — versioned-snapshot ring + kvFSM RouteHistory wiring #894 — fragile-but-test-local lock contract documented inline).
kv/fsm.go — defines ErrComposed1Violation and ErrComposed1VersionGCd sentinels with the wrapped-error diagnostic shape M4 retry will use. Adds verifyComposed1 and verifyOwnerFromSnapshot. Wires verifyComposed1 into handleTxnRequest at the top so PREPARE / COMMIT / ABORT / NONE all pass through the gate (ABORT carries ObservedRouteVersion=0 and falls through naturally).
kv/route_history.go — RouteHistory interface gains Current(). Adapter forwards to distribution.Engine.Current().

Tests (`kv/fsm_composed1_test.go`)

Test	Maps to design doc M3 "Done when"
`TestVerifyComposed1_StaleObservedVersionWithMovedKeyFails`	(i) observed-version snapshot resolves the key to a different group than this FSM
`TestVerifyComposed1_ObservedVersionOlderThanRingFails`	(ii) observed version is outside the ring → `ErrComposed1VersionGCd`
`TestVerifyComposed1_ObservedPassesButCurrentDiffersFails`	(iii) / §3 codex P1 trace: observed-version check passes but current-version differs → `ErrComposed1Violation` (`current-version` diagnostic)
`TestVerifyComposed1_ObservedVersionZeroSkipsGate`	Legacy caller bypasses the gate
`TestVerifyComposed1_NilRouteHistorySkipsGate`	Unwired FSM bypasses the gate

Verification

go build ./... — clean
go vet ./... — clean
go test -race -count=1 ./kv ./distribution — 10.2 s + 1.0 s, pass
make lint — 0 issues

Self-review (5 lenses)

Data loss — the gate refuses commits but cannot lose them. Every rejection emits a sentinel the M4 coordinator path will convert into a successful commit on the correct owner.
Concurrency — the gate runs under the FSM's apply loop (Raft applies are serial), so no new lock ordering. RouteHistory reads take the engine's read lock; SnapshotAt and Current each acquire and release in a single call.
Performance — two map lookups + a per-mutation linear scan of the snapshot's sorted routes per Composed-1-pinned commit. OwnerOf short-circuits via the M2 round-1 break optimisation, so the per-mutation cost is bounded by the first non-covering gap rather than the full routes slice. Legacy callers (ObservedRouteVersion=0) pay one branch per commit and exit.
Data consistency — closes the Composed-1 (observed) and Composed-1a (current) gaps end-to-end. Spec correspondence: tla/composed/Composed.tla's Commit precondition is now enforced at apply time; the §4.4 fence matches Composed1a_CommitToCurrentOwner from PR tla: Composed-1a current-version fence + currentfence_gap (#870 follow-up) #878.
Test coverage — 5 new unit tests cover the three design doc criteria + the two legacy short-circuit paths. Existing handleTxnRequest tests pass unchanged (their requests carry ObservedRouteVersion=0 from M1's behaviour-neutral default).

Test plan

go test -race ./kv ./distribution
make lint
kv(composed1): M2 — versioned-snapshot ring + kvFSM RouteHistory wiring #894 (M2) must merge first — this PR's branch is stacked on it.
(Follow-up PR) M4 — coordinator retry path. When either Composed-1 sentinel returns, the coordinator re-reads the route cache, re-routes the txn against the new owning group, and re-issues it once.

Resolves

The M3 row in the Composed-1 design doc.

Summary by CodeRabbit

Bug Fixes
- Introduced validation to prevent cross-group transaction commits when route ownership has shifted between observed and current catalog versions.
- Added specific error handling for stale observed versions and unavailable route history scenarios.
Tests
- Added comprehensive test coverage for transaction validation behavior across version and ownership scenarios.

Third milestone of the Composed-1 cross-group commit-time guard per docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md §M3. Wires the safety property at apply time using the M2 versioned-snapshot ring and the M1 ObservedRouteVersion field. Stacked on PR #894 (M2 plumbing). What's enforced: (a) Observed-version owner — the spec-level Composed-1 check from tla/composed/Composed.tla. Every write key of a Composed-1- pinned txn must be owned by THIS Raft group at the txn's observed catalog version (the version it read its read-set at, set in M1 via OperationGroup.ObservedRouteVersion). Refusal returns ErrComposed1Violation. (b) Current-version cross-version-read fence — the §4.4 / §3 codex P1 trace. Even when (a) passes, a route shift between BeginTxn and Commit can leave the write landing on the OLD owner while readers at the new version route to the NEW owner and miss the write. The current-version fence refuses that case so M4 retry can re-route. Refusal returns ErrComposed1Violation with a different diagnostic prefix ("current-version" vs "observed-version") so the retry path can distinguish. (c) Retention-miss fail-closed — when the txn's observed version has been evicted from the M2 ring (long-running txn or high catalog churn), return ErrComposed1VersionGCd. Per design doc §4.3, the not-found semantics is a hard retryable error, NOT a soft pass, because soft-pass would let the gate be bypassed exactly in the cases where the cross-version-read hazard is most likely. Short-circuits cleanly in three legacy / not-applicable cases: * FSM constructed without WithRouteHistory (legacy / test seam): routes == nil, gate returns nil. * Request carries ObservedRouteVersion == 0 (unpinned — pre-M1 caller, or 2PC ABORT request that doesn't carry the version): gate returns nil. This is what keeps M3 from regressing every caller that has not migrated to set OperationGroup. ObservedRouteVersion. * Engine.Current returns (zero, false) — engine has no history (bare-struct test seam): the (b) fence falls through. Changes: * distribution/engine.go — adds Engine.Current() returning the snapshot at the current catalogVersion (used by the M3 fence). Adds SetHistoryDepthForTest as the cross-package test seam that lets kv-side tests trigger eviction without bypassing the package boundary (claude review on PR #894 — fragile-but- test-local lock contract documented inline). * kv/fsm.go — defines ErrComposed1Violation and ErrComposed1VersionGCd sentinels with the wrapped-error diagnostic shape M4 retry will use. Adds verifyComposed1 and verifyOwnerFromSnapshot. Wires verifyComposed1 into handleTxnRequest at the top so PREPARE / COMMIT / ABORT / NONE all pass through the gate (ABORT carries ObservedRouteVersion=0 and falls through naturally). * kv/route_history.go — RouteHistory interface gains Current(). Adapter forwards to distribution.Engine.Current(). Tests (kv/fsm_composed1_test.go): * TestVerifyComposed1_StaleObservedVersionWithMovedKeyFails — design doc M3 "done when" criterion (i): observed-version snapshot resolves the key to a different group than this FSM. * TestVerifyComposed1_ObservedVersionOlderThanRingFails — criterion (ii): observed version is outside the ring → ErrComposed1VersionGCd. Uses depth=2 via the cross-package SetHistoryDepthForTest seam. * TestVerifyComposed1_ObservedPassesButCurrentDiffersFails — criterion (iii) / the §3 codex P1 trace: observed-version check passes (routes[1][k1]=g1) but the current snapshot at v=2 has moved k1 to g2 → ErrComposed1Violation with the "current-version" diagnostic prefix. * TestVerifyComposed1_ObservedVersionZeroSkipsGate — legacy caller (ObservedRouteVersion=0) bypasses the gate. * TestVerifyComposed1_NilRouteHistorySkipsGate — unwired FSM (no WithRouteHistory option) bypasses the gate; matches the pre-feature posture. Verification: * go build ./... — clean * go vet ./... — clean * go test -race -count=1 ./kv ./distribution — 10.2 s + 1.0 s, pass Self-review (5 lenses): 1. Data loss — the gate refuses commits but cannot lose them. Every rejection emits a sentinel the M4 coordinator path will convert into a successful commit on the correct owner. 2. Concurrency — the gate runs under the FSM's apply loop (Raft applies are serial), so no new lock ordering. RouteHistory reads take the engine's read lock; SnapshotAt and Current each acquire and release in a single call. 3. Performance — two map lookups + a per-mutation linear scan of the snapshot's sorted routes per Composed-1-pinned commit. OwnerOf short-circuits via the M2 round-1 break optimisation, so the per-mutation cost is bounded by the first non-covering gap rather than the full routes slice. Legacy callers (ObservedRouteVersion=0) pay one branch per commit and exit. 4. Data consistency — closes the Composed-1 (observed) and Composed-1a (current) gaps end-to-end. Spec correspondence: tla/composed/Composed.tla's Commit precondition is now enforced at apply time; the §4.4 fence matches Composed1a_CommitToCurrentOwner from PR #878. 5. Test coverage — 5 new unit tests cover the three design doc criteria + the two legacy short-circuit paths. Existing handleTxnRequest tests pass unchanged (their requests carry ObservedRouteVersion=0 from M1's behaviour-neutral default). Next milestone (separate PR per design doc §6): M4 — coordinator retry path. When either Composed-1 sentinel returns, the coordinator re-reads the route cache, re-routes the txn against the new owning group, and re-issues it once.

bootjp · 2026-05-31T09:57:47Z

@claude review

coderabbitai · 2026-05-31T09:57:49Z

📝 Walkthrough

Walkthrough

This PR introduces a Composed-1 gate that enforces transaction correctness by validating ownership at both historical and current catalog versions. It extends the route-history mechanism with Current() snapshots and implements dual ownership checks for transaction mutations before apply, rejecting stale commits and preventing cross-group consistency violations.

Changes

Composed-1 Gate for Transaction Consistency

Layer / File(s)	Summary
Engine history snapshot APIs `distribution/engine.go`, `kv/route_history.go`	`Engine.Current()` fetches the current catalogVersion route snapshot with `(RouteHistorySnapshot, bool)` return, `Engine.SetHistoryDepthForTest()` controls FIFO ring eviction, and `distributionEngineAdapter.Current()` adapts the engine method to the `RouteHistory` interface.
RouteHistory interface contract `kv/fsm.go`	`RouteHistory` interface is documented to expose `Current()` alongside `SnapshotAt()`, signaling availability of current snapshots for ownership validation.
Composed-1 gate logic and errors `kv/fsm.go`	`ErrComposed1Violation` and `ErrComposed1VersionGCd` errors distinguish ownership drift from evicted history, `verifyComposed1()` validates each non-internal mutation is owned by the FSM's shard group at both the observed (via `SnapshotAt`) and current (via `Current()`) versions, and `handleTxnRequest` invokes the gate before phase dispatch.
Composed-1 gate test suite `kv/fsm_composed1_test.go`	Test helpers (`applyComposed1Snapshot`, `newComposed1FSM`, `commitTxnRequest`) wire a `kvFSM` with route history and apply catalog snapshots to an in-memory engine, followed by 5 tests covering stale observed + moved key, evicted observed version, current-version fence override, legacy zero-version bypass, and nil-routes bypass.

Sequence Diagram

sequenceDiagram
  participant Client
  participant FSM as kvFSM<br/>(handleTxnRequest)
  participant Gate as verifyComposed1()
  participant Observed as RouteHistory<br/>(SnapshotAt)
  participant Current as RouteHistory<br/>(Current)

  Client->>FSM: Txn with ObservedRouteVersion
  FSM->>Gate: verifyComposed1(request)
  Gate->>Observed: SnapshotAt(ObservedVersion)
  alt Observed version evicted
    Observed-->>Gate: nil, false
    Gate-->>FSM: ErrComposed1VersionGCd
  else Observed version found
    Observed-->>Gate: RouteSnapshot
    Gate->>Gate: Check key owners vs shard group
    alt Owner mismatch at observed
      Gate-->>FSM: ErrComposed1Violation(observed-version)
    else Owner match at observed
      Gate->>Current: Current()
      Current-->>Gate: RouteSnapshot
      Gate->>Gate: Check key owners vs shard group
      alt Owner mismatch at current
        Gate-->>FSM: ErrComposed1Violation(current-version)
      else All checks pass
        Gate-->>FSM: nil
        FSM->>FSM: Proceed to phase dispatch
      end
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

bootjp/elastickv#881: Propagates ObservedRouteVersion into pb.Request across shards during PREPARE/COMMIT, which the Composed-1 gate consumes to validate historical ownership.
bootjp/elastickv#894: Introduces versioned route-history ring (history, SnapshotAt(), RouteHistorySnapshot), which this PR extends with Current() and gates for ownership validation.

Poem

🐰 A gate guards the shard's precious keys,
Checking owners at observed and now—
No stale commits slip through the breeze,
Each mutation speaks its vow.
Cross-group chaos bows!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly summarizes the PR's primary objective—implementing the M3 verifyComposed1 gate with retry sentinels—and is specific, concise, and clearly related to all major changes across distribution/engine.go, kv/fsm.go, and supporting files.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/composed1-m3-verify-gate

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-31T09:57:51Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (c94e0d1):

distribution/engine.go
kv/fsm.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

chatgpt-codex-connector · 2026-05-31T09:57:58Z

To use Codex here, create a Codex account and connect to github.

claude · 2026-05-31T09:58:00Z

Claude finished @bootjp's task in 3m 24s —— View job

Code Review — kv(composed1): M3 — verifyComposed1 apply-time gate + retry sentinels

Gather context and understand the request
Read changed files
Review design doc for M3 spec correspondence
Review lens 1: Data loss
Review lens 2: Concurrency / distributed failures
Review lens 3: Performance
Review lens 4: Data consistency
Review lens 5: Test coverage

Overall the implementation is solid. The gate logic matches the TLA+ spec (§4.2(a)) and the cross-version fence (§4.4) is correctly wired. No data-loss paths, no lock-ordering issues, no safety bugs found. A few issues worth addressing are called out below in order of severity.

Issues

1. Misleading comment in `SetHistoryDepthForTest` — contradicts its own implementation

distribution/engine.go:217–228

// SetHistoryDepthForTest overrides the FIFO ring depth from outside
// the package.  Test-only — callers MUST set the depth before the
// Engine is shared with any concurrent reader (no internal
// synchronisation here …
func (e *Engine) SetHistoryDepthForTest(depth int) {
    e.mu.Lock()       // ← lock IS acquired
    defer e.mu.Unlock()
    e.historyDepth = depth
}

The comment says "no internal synchronisation here" but the function acquires e.mu.Lock(). A future maintainer reading the comment will think concurrent calls are unsafe when they are actually protected. The comment was likely copied from the in-package direct field write (TestEngineSnapshotAt_FIFOEviction) and not updated.

Suggested fix: change "no internal synchronisation here" → "caller should set the depth before the Engine receives concurrent Snapshots to avoid test-interleaving surprises, but the write itself is lock-protected." Fix this →

2. Missing "happy path" test — gate can silently block valid txns undetected

kv/fsm_composed1_test.go

All 5 tests assert that the gate fires (violation, GC'd) or short-circuits (nil routes, zero version). There is no test that asserts the gate passes cleanly when the txn's observed version correctly identifies this group as the owner at both the observed and current versions. Without a passing-gate test, an over-aggressive check (e.g., an off-by-one in OwnerOf or an inadvertent negation of owner != f.shardGroupID) could incorrectly reject every valid commit and the test suite would not catch it.

Suggested addition — a test along these lines:

func TestVerifyComposed1_ValidOwnershipPassesGate(t *testing.T) {
    t.Parallel()
    e := distribution.NewEngine()
    // v=1 and v=2 both assign k to g1 — no route shift.
    applyComposed1Snapshot(t, e, 1, []distribution.RouteDescriptor{
        {RouteID: 100, Start: []byte(""), End: nil, GroupID: 1, State: distribution.RouteStateActive},
    })
    applyComposed1Snapshot(t, e, 2, []distribution.RouteDescriptor{
        {RouteID: 100, Start: []byte(""), End: nil, GroupID: 1, State: distribution.RouteStateActive},
    })
    fsm := newComposed1FSM(t, e, 1)

    err := fsm.verifyComposed1(commitTxnRequest(1, "k"))
    require.NoError(t, err, "gate must pass when this group owns the key at both observed and current version")
}

Fix this →

3. Minor: `ErrComposed1Violation` doc comment overstates the scope of the check

kv/fsm.go:215

// ErrComposed1Violation is returned … because the
// txn's read-set or write-set keys are not owned by this group …

verifyOwnerFromSnapshot iterates r.GetMutations() (the write-set only). r.ReadKeys is not checked — consistent with the TLA+ spec where Composed1_CommitToOwningGroup quantifies over txnWriteSet. The doc comment's "read-set or write-set" is misleading.

Fix this →

4. Minor: Design doc Q1 not addressed at M3

Design doc §9 Q1 says: "allow unpinned at M1, log a warning at M3, error at M5."

The M3 implementation short-circuits ObservedRouteVersion==0 with return nil and no log. Keeping the gate silent is probably the right call right now (to avoid log spam from all legacy callers), but it drifts from the design doc's stated M3 plan. Either update the doc to defer the warning to M4/M5, or add a single slog.WarnContext at the observedVer == 0 short-circuit. This is a documentation/tracking issue, not a correctness issue.

5. Nit: Self-correction prose in test comment

kv/fsm_composed1_test.go:73–77

// Wait — re-read the design doc: criterion (i) is …

The "Wait — re-read the design doc" reads like a debugging note. The clarification is useful but could be written as a direct statement. This is purely cosmetic.

Minor optimisation opportunity (non-blocking)

When observedVer == currentSnap.Version() (the catalog hasn't advanced since txn-begin), verifyOwnerFromSnapshot is called twice against the same snapshot. Adding an early exit after the observed check passes avoids the redundant linear scan:

// (b) Current-version fence.
currentSnap, ok := f.routes.Current()
if !ok {
    return nil
}
// Fast path: if the current version is the same as the observed version,
// the observed-version check already validated current ownership.
if currentSnap.Version() == observedVer {
    return nil
}
return f.verifyOwnerFromSnapshot(r.GetMutations(), currentSnap, currentSnap.Version(), "current")

This is O(|write-set|) savings and entirely optional, but it's a clean win for the common steady-state case.

Lens summary

Lens	Result
Data loss	Clean — gate can only reject, never silently drop committed writes
Concurrency	Clean — gate runs under the serial Raft apply loop; `SnapshotAt`/`Current` acquire and release `e.mu.RLock` atomically
Performance	Two map lookups + O(
Data consistency	Spec correspondence correct; write-set only check matches `Composed1_CommitToOwningGroup`; fail-closed on ring miss is correct per §4.3
Test coverage	5 failure/short-circuit tests present; missing happy-path test (see issue 2 above); no multi-key test

gemini-code-assist

Code Review

This pull request implements the apply-time Composed-1 gate to prevent cross-version-read anomalies during route shifts. It introduces the Current() method to retrieve the latest route catalog snapshot, defines ErrComposed1Violation and ErrComposed1VersionGCd errors, and adds verification logic in kvFSM.handleTxnRequest to validate key ownership against both the observed and current catalog versions. Comprehensive unit tests are also added. The reviewer identified two critical issues: first, ABORT requests must bypass the Composed-1 gate to avoid leaking MVCC locks, and a short-circuit is needed when shardGroupID is unset (0); second, reducing history depth via SetHistoryDepthForTest on an already populated engine can cause a runtime panic unless excess history entries are immediately truncated.

gemini-code-assist · 2026-05-31T09:59:22Z

+func (f *kvFSM) verifyComposed1(r *pb.Request) error {
+	if f.routes == nil {
+		return nil
+	}


There are two critical issues here:

ABORT requests must not be blocked by the Composed-1 gate: If an ABORT request is rejected due to a route shift (ErrComposed1Violation) or history eviction (ErrComposed1VersionGCd), the transaction's locks and intents will be permanently leaked in the MVCC store, blocking future writes. Aborts should always bypass the gate to ensure successful cleanup.

Missing short-circuit for unset shardGroupID: The comments state that a shardGroupID of 0 represents an unset/legacy state and should short-circuit the gate. However, the code does not check for f.shardGroupID == 0, which will cause spurious ErrComposed1Violation errors on unset groups.

func (f *kvFSM) verifyComposed1(r *pb.Request) error { if r.Phase == pb.Phase_ABORT { return nil } if f.routes == nil || f.shardGroupID == 0 { return nil }

gemini-code-assist · 2026-05-31T09:59:22Z

+func (e *Engine) SetHistoryDepthForTest(depth int) {
+	e.mu.Lock()
+	defer e.mu.Unlock()
+	e.historyDepth = depth
+}


If SetHistoryDepthForTest is called to reduce the history depth after the history ring has already been populated with more entries than the new depth, a subsequent call to recordHistorySnapshotLocked will trigger a runtime panic. This is because make([]uint64, len(e.historyOrder)-1, e.historyDepth) will be invoked with a length greater than its capacity. To prevent this, truncate the excess history entries immediately when updating the depth.

Suggested change

func (e *Engine) SetHistoryDepthForTest(depth int) {

e.mu.Lock()

defer e.mu.Unlock()

e.historyDepth = depth

}

func (e *Engine) SetHistoryDepthForTest(depth int) {

e.mu.Lock()

defer e.mu.Unlock()

e.historyDepth = depth

if len(e.historyOrder) > depth {

excess := len(e.historyOrder) - depth

for i := 0; i < excess; i++ {

delete(e.history, e.historyOrder[i])

}

retained := make([]uint64, depth)

copy(retained, e.historyOrder[excess:])

e.historyOrder = retained

}

}

github-actions · 2026-05-31T09:59:48Z

+// docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md
+// §4.2(a) + §4.4.  Runs two checks before the txn's writes land:
+//
+//   (a) Observed-version owner — the txn's read-set was captured


🚫 [golangci] _{reported by reviewdog 🐶}
File is not properly formatted (gci)

github-actions · 2026-05-31T09:59:48Z

+// in main.go's buildShardGroups; this helper short-circuits to the
+// test-only fixture without spinning up a real Raft group.
+//
+//nolint:unparam // shardGroupID is currently always 1 in tests but


🚫 [golangci] _{reported by reviewdog 🐶}
File is not properly formatted (gci)

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

kv/fsm_composed1_test.go (1)
32-35: ⚡ Quick win

Drop the new //nolint:unparam and refactor the helper instead.

This suppression is avoidable in new test code: either bake in the single currently used group ID, or make the group ID vary through actual table cases when you need that coverage.

As per coding guidelines, "Avoid //nolint annotations — refactor instead."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kv/fsm_composed1_test.go` around lines 32 - 35, Remove the //nolint:unparam
annotation and refactor newComposed1FSM to stop accepting the unused
shardGroupID parameter: either bake the single used group ID (1) directly into
newComposed1FSM (replace the shardGroupID parameter with the constant 1 inside
the function) and update all call sites to call newComposed1FSM(t, e)
accordingly, or convert the tests that need varying group IDs into table-driven
cases that construct the FSM with different explicit group IDs (so the parameter
becomes used). Ensure all references to function newComposed1FSM are updated and
delete the //nolint comment.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@distribution/engine.go`:
- Around line 225-229: SetHistoryDepthForTest should validate the incoming depth
and reject non-positive values to fail fast; inside
Engine.SetHistoryDepthForTest (protected by e.mu), add a check like if depth <=
0 { panic("SetHistoryDepthForTest: depth must be > 0") } (or return an
error/report) before assigning e.historyDepth so recordHistorySnapshotLocked and
the eviction path never encounter an empty historyOrder; update any tests that
call SetHistoryDepthForTest to provide positive depths.

In `@kv/fsm.go`:
- Around line 582-589: verifyComposed1 currently only short-circuits when
f.routes == nil but the documented behavior says shardGroupID == 0 should also
bypass; update kvFSM.verifyComposed1 to return nil immediately when
f.shardGroupID == 0 (before any route/version checks) so callers that install
routes before setting shardGroupID preserve the legacy skip-for-group-0
behavior; keep the existing observedVer handling unchanged.

---

Nitpick comments:
In `@kv/fsm_composed1_test.go`:
- Around line 32-35: Remove the //nolint:unparam annotation and refactor
newComposed1FSM to stop accepting the unused shardGroupID parameter: either bake
the single used group ID (1) directly into newComposed1FSM (replace the
shardGroupID parameter with the constant 1 inside the function) and update all
call sites to call newComposed1FSM(t, e) accordingly, or convert the tests that
need varying group IDs into table-driven cases that construct the FSM with
different explicit group IDs (so the parameter becomes used). Ensure all
references to function newComposed1FSM are updated and delete the //nolint
comment.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: af79a132-4d95-4f0c-b28e-0e8008f5f03d

📥 Commits

Reviewing files that changed from the base of the PR and between 2b118a4 and c94e0d1.

📒 Files selected for processing (4)

distribution/engine.go
kv/fsm.go
kv/fsm_composed1_test.go
kv/route_history.go

coderabbitai · 2026-05-31T10:02:28Z

+func (e *Engine) SetHistoryDepthForTest(depth int) {
+	e.mu.Lock()
+	defer e.mu.Unlock()
+	e.historyDepth = depth
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject non-positive history depths here.

Passing depth <= 0 makes the next recordHistorySnapshotLocked() hit the eviction path with an empty historyOrder, so the helper panics later with an index-out-of-range instead of failing fast at the call site.

Suggested fix

func (e *Engine) SetHistoryDepthForTest(depth int) { e.mu.Lock() defer e.mu.Unlock() + if depth <= 0 { + panic("SetHistoryDepthForTest: depth must be > 0") + } e.historyDepth = depth }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (e *Engine) SetHistoryDepthForTest(depth int) {

e.mu.Lock()

defer e.mu.Unlock()

e.historyDepth = depth

}

func (e *Engine) SetHistoryDepthForTest(depth int) {

e.mu.Lock()

defer e.mu.Unlock()

if depth <= 0 {

panic("SetHistoryDepthForTest: depth must be > 0")

}

e.historyDepth = depth

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@distribution/engine.go` around lines 225 - 229, SetHistoryDepthForTest should validate the incoming depth and reject non-positive values to fail fast; inside Engine.SetHistoryDepthForTest (protected by e.mu), add a check like if depth <= 0 { panic("SetHistoryDepthForTest: depth must be > 0") } (or return an error/report) before assigning e.historyDepth so recordHistorySnapshotLocked and the eviction path never encounter an empty historyOrder; update any tests that call SetHistoryDepthForTest to provide positive depths.

coderabbitai · 2026-05-31T10:02:28Z

+func (f *kvFSM) verifyComposed1(r *pb.Request) error {
+	if f.routes == nil {
+		return nil
+	}
+	observedVer := r.GetObservedRouteVersion()
+	if observedVer == 0 {
+		return nil
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor the documented shardGroupID == 0 bypass.

kvFSM.shardGroupID is documented as “unset => short-circuit”, but verifyComposed1() only checks routes == nil. A partially wired caller that installs route history before the group ID will start rejecting every pinned txn against group 0 instead of preserving the legacy skip behavior.

Suggested fix

func (f *kvFSM) verifyComposed1(r *pb.Request) error { - if f.routes == nil { + if f.routes == nil || f.shardGroupID == 0 { return nil } observedVer := r.GetObservedRouteVersion() if observedVer == 0 { return nil

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kv/fsm.go` around lines 582 - 589, verifyComposed1 currently only short-circuits when f.routes == nil but the documented behavior says shardGroupID == 0 should also bypass; update kvFSM.verifyComposed1 to return nil immediately when f.shardGroupID == 0 (before any route/version checks) so callers that install routes before setting shardGroupID preserve the legacy skip-for-group-0 behavior; keep the existing observedVer handling unchanged.

gemini-code-assist Bot reviewed May 31, 2026

View reviewed changes

github-actions Bot reviewed May 31, 2026

View reviewed changes

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

Conversation

bootjp commented May 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's enforced

Short-circuits

Changes

Tests (kv/fsm_composed1_test.go)

Verification

Self-review (5 lenses)

Test plan

Resolves

Summary by CodeRabbit

Uh oh!

bootjp commented May 31, 2026

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

github-actions Bot commented May 31, 2026

TLA+ spec divergence review (auto-triggered)

Uh oh!

chatgpt-codex-connector Bot commented May 31, 2026

Uh oh!

claude Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — kv(composed1): M3 — verifyComposed1 apply-time gate + retry sentinels

Issues

1. Misleading comment in SetHistoryDepthForTest — contradicts its own implementation

2. Missing "happy path" test — gate can silently block valid txns undetected

3. Minor: ErrComposed1Violation doc comment overstates the scope of the check

4. Minor: Design doc Q1 not addressed at M3

5. Nit: Self-correction prose in test comment

Minor optimisation opportunity (non-blocking)

Lens summary

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bootjp commented May 31, 2026 •

edited by coderabbitai Bot

Loading

Tests (`kv/fsm_composed1_test.go`)

coderabbitai Bot commented May 31, 2026 •

edited

Loading

claude Bot commented May 31, 2026 •

edited

Loading

1. Misleading comment in `SetHistoryDepthForTest` — contradicts its own implementation

3. Minor: `ErrComposed1Violation` doc comment overstates the scope of the check