Skip to content

docs(design): propose Composed-1 cross-group commit-time ownership guard#870

Merged
bootjp merged 2 commits into
mainfrom
docs/composed1-cross-group-guard
May 29, 2026
Merged

docs(design): propose Composed-1 cross-group commit-time ownership guard#870
bootjp merged 2 commits into
mainfrom
docs/composed1-cross-group-guard

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 29, 2026

Summary

Forward-looking design proposal for the Composed-1 cross-group commit-time ownership guard, the implementation that would be required before introducing any cross-group route mutation (cross-group SplitRange, MoveRange, group decommission).

Today the implementation is vacuously safe with respect to Composed-1 (proved in tla/composed/Composed.tla / MCComposed.cfg) because SplitRange is same-group only and no other route-mutating RPC exists. This doc designs the apply-time check that any cross-group RPC would need to land alongside — review-and-ready instead of patched in under time pressure.

What the doc covers

  • The Composed-1 invariant from Composed.tla and the canonical 4-step gap counterexample at depth 4.
  • Three structural facts that make today's implementation vacuously safe (same-group-only SplitRange; no MoveRange/Rebalance; leader-side-cached routes).
  • The hazard if cross-group route mutation is introduced — G1c-style lost-write across a group boundary, invisible to current Jepsen workloads because the RPC surface doesn't expose the trigger.
  • Proposed plumbing:
    • ObservedRouteVersion field on OperationGroup propagated through dispatchTxn into pb.Request.
    • FSM verifyComposed1 at apply time + new ErrComposed1Violation sentinel.
    • Coordinator retry on the sentinel (re-read routes, re-route txn, retry once).
    • Catalog version retention ring in distribution/Engine (default depth 32; soft-failure on miss).
  • TLA+-to-implementation correspondence — the apply check directly refines the spec's Commit precondition.
  • Five milestones M1..M5 (plumbing → ring → check → retry → Jepsen workload), each independently testable.
  • Five-lens self-review per CLAUDE.md.
  • Three open questions (deprecation of unpinned txns; routeHistoryDepth default; hard-vs-soft missing-version behaviour).

Why now, why this PR

CLAUDE.md mandates "design-doc-first" for changes touching routing, OCC, or FSM apply semantics. The implementation does not land with this PR — only the proposal. Landing the doc first lets the eventual implementation PR cite a reviewed contract instead of inventing one inline.

Test plan

  • Reviewer cross-checks §4.2 (FSM check) against the spec's Commit predicate in tla/composed/Composed.tla.
  • Reviewer confirms the milestone ordering allows M1+M2 to land independently without a behaviour change.
  • Reviewer agrees with the three open questions or proposes alternatives.
  • No code changes expected from this PR.

Forward-looking design proposal. Today the implementation is
vacuously safe wrt Composed-1 (the TLA+ cross-module invariant
proved in tla/composed) because SplitRange is same-group only.
The doc designs the apply-time ownership re-validation that any
future cross-group route mutation (cross-group SplitRange,
MoveRange, group decommission) would need *before* it can land —
review-and-ready instead of patched in under time pressure.

What the doc covers:

  * The Composed-1 invariant from Composed.tla and the canonical
    4-step gap counterexample.
  * Why the current implementation is vacuously safe.
  * The hazard if cross-group route mutation is introduced —
    structurally a G1c-style lost-write across a group boundary,
    invisible to current Jepsen workloads because the RPC surface
    doesn't expose the trigger.
  * The proposed plumbing: ObservedRouteVersion field on
    OperationGroup → propagated through dispatchTxn into
    pb.Request → FSM verifyComposed1 check at apply time →
    coordinator-side retry on ErrComposed1Violation.
  * Catalog version retention (small versioned-snapshot ring in
    distribution/Engine, default depth 32, soft-failure on miss).
  * TLA+-to-implementation correspondence — the apply check
    refines exactly the Commit precondition the spec asserts.
  * Five milestones M1..M5 (plumbing → ring → check → retry →
    Jepsen workload), each independently testable.
  * Five-lens self-review per CLAUDE.md.
  * Three open questions (deprecation of unpinned txns,
    routeHistoryDepth default, hard-vs-soft missing-version
    behaviour).

This is the "design-doc-first" artefact required by CLAUDE.md
for changes touching routing / OCC / FSM apply semantics. No
implementation lands with this PR.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 29, 2026

@claude review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 15 minutes and 19 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be7f1292-bec4-4d39-8be3-57ac4dc001ab

📥 Commits

Reviewing files that changed from the base of the PR and between cbc4c28 and 124a36e.

📒 Files selected for processing (1)
  • docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/composed1-cross-group-guard

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Claude finished @bootjp's task in 2m 46s —— View job


PR Review: Composed-1 Cross-Group Commit-Time Ownership Guard

Tasks

  • Load tools
  • Read design document
  • Read TLA+ spec (tla/composed/Composed.tla) for cross-check
  • Review existing distribution/routing code for alignment
  • Compile review feedback
  • Post final review

Overall Assessment

This is a well-reasoned forward-looking proposal. The TLA+ alignment is correct, the vacuous-safety argument is sound, and the milestone ordering is clean. Several implementation gaps in the pseudocode would need to be resolved before M3 starts; raising them here so the implementation PR doesn't surface them as surprises.


TLA+ Alignment ✅

The spec's Commit(t, g) precondition (lines 183–184 of Composed.tla) is:

\/ \A k \in txnWriteSet[t] : OwnerAt(txnObservedVer[t], k) = g

The proposed verifyComposed1 in §4.2 is a faithful Go refinement: it iterates r.Mutations, looks up snap.OwnerOf(mut.Key) at observedVer, and rejects if owner != f.shardGroupID. The gap counterexample at doc §1 step 4 is exactly the path TLC surfaces under MCComposed_gap.cfg. The spec correspondence section (§5) is accurate.

The BeginTxn(t) pinning of txnObservedVer[t] = catalogVersion (spec line 145) maps correctly to the proposed coordinator-side RouteEngine.Version() read at transaction start.


Issues

1. f.routes and f.shardGroupID don't exist on kvFSM

kv/fsm.go currently has no field for the route engine or the shard group ID. The distribution.Engine is owned by ShardedCoordinator, not kvFSM. The pseudocode in §4.2 calls f.routes.SnapshotAt(observedVer) and compares against f.shardGroupID, but both require new wiring.

This is not a design flaw, but it is a non-trivial M1/M3 prerequisite that the doc should acknowledge explicitly. The WithHLC/NewKvFSMWithHLC pattern (already in kv/fsm.go per CLAUDE.md) is the right template — a WithRouteEngine(e *distribution.Engine, groupID uint64) option or constructor parameter following the same style would complete M1 wiring. Suggest adding one sentence to M1 calling this out.

Fix this →

2. Proto wire-format change not mentioned in M1

proto/internal.proto Request message (line 37) currently has five fields; observed_route_version doesn't exist. M1 says "encode in pb.Request" but doesn't mention the proto schema change or the protobuf regeneration step. Per CLAUDE.md the toolchain is version-pinned (libprotoc 29.3, protoc-gen-go v1.36.11). This is a concrete M1 deliverable that belongs in the milestone table.

3. Multi-shard (two-phase) transactions not addressed

dispatchTxn handles both single-shard (Phase_NONE / onePhaseTxnRequest) and multi-shard (Phase_PREPARE + Phase_COMMIT) paths. The doc only mentions onePhaseTxnRequest in M1's "Done when" column. For two-phase txns:

  • If verifyComposed1 only runs at Phase_NONE (one-phase path), multi-shard txns bypass the guard entirely.
  • If it runs at Phase_PREPARE, a violation occurs after intent writes have already been persisted; aborting is correct but requires the abort path to clean up intents, which is a more complex error path than the one-phase rejection.
  • If it runs at Phase_COMMIT, the Composed-1 gap is already open (the intent is live, another reader might see the stale ownership).

The TLA+ spec's Commit(t, g) is a single atomic action that covers both paths. The design should clarify which phase(s) carry the verifyComposed1 call and why.

4. Soft-failure silently weakens the safety guarantee

§4.2's SnapshotAt(v) == nil returns nil — the commit proceeds as if unpinned. This is acknowledged in Q3 but appears in the self-review lens 4 (§8, data consistency) only as "The retry path preserves linearisability", without noting that ring exhaustion produces a silent safety bypass.

With a default depth of 32, the ring can be exhausted if: (a) many route changes occur in rapid succession, or (b) a very long-running transaction outlives the ring. For the "ring miss = permit" semantics to be acceptable, the doc should bound the conditions under which this can happen and confirm that "operator-controlled, infrequent" churn keeps the probability negligible. Otherwise Q3's "hard error" option looks more attractive and should be the default.

5. Retry-once in M4 is under-specified for cross-shard txns

"Coordinator re-reads the route cache, re-routes the txn, and retries once" works cleanly for single-shard txns (the whole txn moves to the new owner). For multi-shard txns where some keys stay in g1 and only the moved key now belongs to g2, the retry means rebuilding the entire grouped map[uint64][]Mutation and potentially issuing a new two-phase commit across a different set of groups. A single "retry once" may still fail if another route change lands during the retry. The word "once" is fine as a pragmatic bound; add a note clarifying that a second consecutive ErrComposed1Violation is surfaced as a client error (rather than retried again), which is the correct behavior for an adversarial route-churn scenario.


Minor Points

  • Q1 deprecation timeline is vague. "Log a warning at M3, error at M5" means the grace period for unpinned callers spans all five milestones. Since all adapters (Redis, gRPC, DynamoDB) go through OperationGroup, they would all need to set ObservedRouteVersion before M5 errors. Suggest adding explicit adapter migration notes to M1 ("update adapters to set ObservedRouteVersion") so the error at M5 isn't a surprise.

  • SnapshotAt method name. The design calls this on f.routes but distribution.Engine today has no such method (it stores only the current snapshot in e.routes). M2 adds the ring, but the pseudocode in §4.2 reads as if SnapshotAt is a pre-existing interface. A brief // new method added in M2 note in the pseudocode would clarify this for reviewers.

  • dispatchTxn signature propagation. The call chain Dispatch → dispatchTxn → onePhaseTxnRequest → pb.Request encoding all needs to thread ObservedRouteVersion. The doc mentions adding the field to OperationGroup and encoding it in pb.Request, but not that the intermediate functions (dispatchTxn at sharded_coordinator.go:565, onePhaseTxnRequest at :662) also need new parameters. Fine to defer to the implementation PR, but worth a note.


Checklist per PR test plan

  • §4.2 FSM check matches spec's Commit predicate — confirmed correct (see above).
  • M1+M2 can land independently without behaviour change — confirmed (pure additions, no FSM logic until M3).
  • Three open questions — see comments above: Q3 soft-vs-hard failure warrants stronger framing; Q1 deprecation needs a migration path per adapter.

No code changes in this PR. The proposal is approvable with the understanding that M1's scope description should be expanded to cover FSM wiring and the proto change before the implementation PR opens.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request proposes a design for a cross-group commit-time ownership guard (Composed-1) to prevent lost-write anomalies (G1c) when cross-group route mutations are introduced. The review feedback points out that the proposed verifyComposed1 method relies on fields (routes and shardGroupID) that do not currently exist on the kvFSM struct, requiring structural and constructor updates. Additionally, it is suggested to treat a missing version snapshot as a hard, retryable error rather than a soft failure to strictly guarantee safety under aggressive catalog garbage collection.

Comment on lines +153 to +176
func (f *kvFSM) verifyComposed1(r *pb.Request, observedVer uint64) error {
if observedVer == 0 {
return nil // unpinned — caller opted out (read-only, legacy)
}
snap := f.routes.SnapshotAt(observedVer)
if snap == nil {
// We don't retain this version anymore — the catalog GC
// ran. This is a soft failure: legacy and fits the
// "treat as unpinned" fallback. Logged for observability.
return nil
}
for _, mut := range r.Mutations {
if isTxnInternalKey(mut.Key) {
continue
}
owner := snap.OwnerOf(mut.Key)
if owner != f.shardGroupID {
return errors.Wrapf(ErrComposed1Violation,
"key %q owned by group %d at version %d but committing on group %d",
mut.Key, owner, observedVer, f.shardGroupID)
}
}
return nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The proposed verifyComposed1 method references f.routes and f.shardGroupID, but neither of these fields currently exists on the kvFSM struct in kv/fsm.go.

To implement this:

  1. Extend kvFSM: Add routes (e.g., a history-retaining route engine or snapshot provider) and shardGroupID (the ID of the shard group this FSM belongs to) to the kvFSM struct.
  2. Update Constructor: Update NewKvFSMWithHLC (or introduce a new option/constructor) to initialize these fields.

Without these fields, the proposed FSM-level check cannot be compiled or executed.

// Note: This requires adding routes (history provider) and shardGroupID fields to kvFSM.
func (f *kvFSM) verifyComposed1(r *pb.Request, observedVer uint64) error {
    if observedVer == 0 {
        return nil // unpinned — caller opted out (read-only, legacy)
    }
    snap := f.routes.SnapshotAt(observedVer)
    if snap == nil {
        // We don't retain this version anymore — the catalog GC
        // ran. This is a soft failure: legacy and fits the
        // "treat as unpinned" fallback. Logged for observability.
        return nil
    }
    for _, mut := range r.Mutations {
        if isTxnInternalKey(mut.Key) {
            continue
        }
        owner := snap.OwnerOf(mut.Key)
        if owner != f.shardGroupID {
            return errors.Wrapf(ErrComposed1Violation,
                "key %q owned by group %d at version %d but committing on group %d",
                mut.Key, owner, observedVer, f.shardGroupID)
        }
    }
    return nil
}

Comment on lines +295 to +298
- **Q3.** Should the soft-failure on `SnapshotAt(v) == nil` be a
hard error instead? Today it is "permit and log" because the
catalog GC is operator-controlled and unlikely to drop a recent
version; revisit if this becomes a routine path.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To strictly guarantee the Composed-1 safety invariant and prevent G1c anomalies under all conditions (including long-running transactions or aggressive catalog GC), a missing version snapshot should be treated as a hard error (triggering a transaction abort and retry) rather than a soft failure.

Since the coordinator already handles retries on ErrComposed1Violation by re-reading the catalog and re-routing, treating a missing version as a retryable error is safe, robust, and avoids silent safety degradation.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7f9f9bf1f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +168 to +170
owner := snap.OwnerOf(mut.Key)
if owner != f.shardGroupID {
return errors.Wrapf(ErrComposed1Violation,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fence late commits to the old owner

In the §3 sequence where a transaction observes k1 on g1, MoveRange changes the current owner to g2, and the transaction still commits on g1, this check passes because SnapshotAt(observedVer).OwnerOf(k1) returns g1. The write is then accepted on the old group, while new readers route to g2 and can miss it, which is the G1c case the proposal says this guard prevents. The design needs an additional fence against route changes after the observed version, or it must specify a move protocol that drains/forwards in-flight old-owner commits.

Useful? React with 👍 / 👎.

Comment on lines +194 to +198
Versions older than the ring fall through to the soft-failure
path (§4.2): logged but not rejected. This matches the design
decision in the partial hotspot-split doc — the catalog is
authoritative at the apply boundary, and stale-version commits
are an acceptable retry pattern.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail closed when the observed route version is gone

Allowing SnapshotAt(observedVer) == nil to fall through means the guard is bypassed exactly for long-running transactions or high catalog churn that evicts the observed version from the proposed ring. With cross-group movement enabled, a transaction that observed an evicted version can commit to a non-owning group without any ErrComposed1Violation, so the stated Composed-1 protection depends on a best-effort retention depth rather than the invariant. This path should be a hard retry/error or the catalog must pin versions while transactions are in flight.

Useful? React with 👍 / 👎.

…Cd versions (codex P1 / P2 + gemini mediums on PR #870)

Round-2 revision of the Composed-1 cross-group commit-time
ownership guard, addressing four substantive comments on the
first revision of the doc:

1. **codex P1 — late-commit fence to old owner is missing.**

   The first revision proposed only an observed-version check
   `routes[txnObservedVer[t]][k] = txnCommitGroup[t]`. This
   refines the spec's literal `Commit` precondition but does not
   prevent the §3 G1c trace: a txn that observes `routes[v_obs]`
   commits on `g1` (`routes[v_obs][k1] = g1` — observed check
   passes!) while `MoveRange` has shifted `k1` to `g2` at
   `v_obs+1`, leaving readers at `v_obs+1` routed to `g2` and
   blind to the write.

   Added a new §4.4 specifying a **current-version cross-version-
   read fence** that the apply-time FSM also runs alongside the
   observed-version check. The two together cover both the
   spec-level Composed-1 invariant AND the implementation-level
   cross-version-read consistency. Documented the `MoveRange`
   drain protocol as the rejected alternative.

   Updated §3 to call out explicitly that an observed-version-only
   check is insufficient and §5 to clarify that §4.4 is a strictly
   stronger contract than the literal spec predicate (with a
   note that extending `Composed.tla` to model `currentVersion`
   is a tracked follow-up).

2. **gemini medium / codex P2 — fail closed on GCd version.**

   The first revision proposed soft-passing
   `SnapshotAt(observedVer) == nil` (catalog GC dropped the
   version). That bypasses the guard exactly in the cases (long-
   running txns, high catalog churn) where the cross-version-read
   hazard is most likely.

   §4.2 and §4.3 now return a new `ErrComposed1VersionGCd`
   sentinel on the retention miss; the coordinator's retry path
   handles it identically to `ErrComposed1Violation` — re-read
   catalog, re-route, retry. The retention depth becomes a
   *liveness* knob (too small → more retries), not a *safety*
   knob.

3. **gemini medium — kvFSM fields don't exist.**

   The first revision's `verifyComposed1` snippet referenced
   `f.routes` and `f.shardGroupID`, neither of which exist on
   today's `kvFSM` struct.

   Added an explicit prerequisite comment block in §4.2 calling
   out the struct extension, and merged the engine ring and
   `kvFSM` extension into a single M2 milestone (they are
   tightly coupled through the constructor wiring) — was
   previously M2 (engine ring) and an implicit M3 dependency.

4. **Self-review and open-question maintenance.**

   §8 lens 1 updated to describe the three-way guard and credit
   the failing-closed fix. §9 Q2 clarified as a liveness-only
   knob. §9 Q3 replaced — the soft-failure question is resolved
   (hard error), and the new open question is "apply-time fence
   vs. MoveRange drain protocol" pending real operational data
   from the first cross-group RPC.

No code changes. Implementation deliberately not landing with
this PR — the doc is the artefact under review.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 29, 2026

All four review comments addressed in 124a36e:

codex P1 — late commit to old owner. Single biggest finding — the first revision's observed-version-only check refines the spec's literal Commit predicate but does NOT prevent the §3 MoveRange G1c trace (a commit lands on g1 because routes[v_obs][k1] = g1 passes observed-check, while readers at v_obs+1 route to g2 and miss it). Added §4.4 specifying a current-version cross-version-read fence the FSM also runs alongside the observed-version check at apply time. Updated §3 to call out the gap explicitly and §5 to flag the apply-time fence as strictly stronger than the spec predicate (with extending Composed.tla to model currentVersion noted as a follow-up).

gemini medium #2 + codex P2 — fail closed on GCd version. §4.2 / §4.3 now return a new ErrComposed1VersionGCd sentinel on SnapshotAt(observedVer) == nil. Coordinator's retry path handles it identically to ErrComposed1Violation. Retention depth is now a liveness knob, not a safety knob.

gemini medium #1 — kvFSM fields don't exist. Added explicit prerequisite comment block in §4.2 covering the struct extension (routes kv.RouteHistory + shardGroupID uint64). Merged the engine ring and kvFSM extension into a single M2 milestone since they're tightly coupled through constructor wiring.

Self-review / open questions. §8 lens 1 updated to describe the three-way guard. §9 Q2 clarified as liveness-only. §9 Q3 replaced — soft-failure question is resolved (hard error); new Q3 is apply-time fence vs. MoveRange drain protocol, pending real operational data.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Claude finished @bootjp's task in 3m 39s —— View job


PR Review: Composed-1 Cross-Group Guard (r2)

Tasks

  • Load tools
  • Read updated design document (r2)
  • Cross-check TLA+ spec (tla/composed/Composed.tla)
  • Verify all 4 prior review findings were addressed
  • Check for new issues introduced by r2 amendments
  • Post final review

Overall Assessment

All four findings called out in the r2 commit message are correctly addressed. The fail-closed approach for GC'd versions is the right call, and the new §4.4 correctly identifies the gap in the observed-version-only check. Two concerns remain: one is new and introduced by §4.4's reliance on the watcher; the other carries over from the r1 review (two-phase transactions).


r2 Findings Verified ✅

codex P1 — late commit to old owner. §3 now explicitly flags the insufficiency of the observed-version check for the MoveRange G1c trace, and §4.4 adds the current-version fence. The two-part check (verifyComposed1 calling verifyCurrentVersionOwner) correctly addresses the case where routes[v_obs][k1] = g1 passes but the current catalog has since moved k1 to g2. §5 is accurate that §4.4 is strictly stronger than the spec predicate and that the TLC-verified invariant doesn't cover it yet.

gemini medium #2 + codex P2 — fail closed on GC'd version. ErrComposed1VersionGCd replaces the silent nil return. The §4.3 framing of ring depth as a liveness dial rather than a safety dial is the right framing. Fail-closed is the correct default.

gemini medium #1 — kvFSM fields. The prerequisite block in §4.2 explicitly names the two missing fields and calls out NewKvFSMWithHLC/NewKvFSMWithOptions as the wiring target. M2 merging the engine ring and kvFSM extension is the right call.

Self-review / Q2 / Q3. Lens 1 in §8 correctly describes the three-way guard. Q2 as liveness-only is accurate. Q3 (apply-time fence vs. drain protocol) is now properly framed as an operational decision deferred to when real data is available.


Issues

1. §4.4 has a watcher-propagation lag window (new — introduced in r2)

The verifyCurrentVersionOwner call in §4.4 reads f.routes.Current(). In the implementation, f.routes would be backed by distribution/Engine, which is updated by CatalogWatcher at a poll interval of 100 ms (distribution/watcher.go:11, defaultCatalogWatcherInterval = 100 * time.Millisecond).

The G1c trace §4.4 was designed to prevent can still occur within that 100 ms window:

T_0:    MoveRange commits to default Raft group (catalog: routes[v+1][k1] = g2)
T_0+ε:  T1 is proposed to g1's Raft log (proposed before watcher delivers v+1)
T_0+δ:  g1's FSM applies T1's entry — watcher hasn't fired yet
         → f.routes.Current() still shows routes[v][k1] = g1
         → verifyCurrentVersionOwner passes
         → T1 committed on g1 (lost write)
T_0+100ms: watcher delivers v+1 to the engine
         → new readers route to g2, miss T1's write → G1c

The doc's §4.4 rationale ("the FSM is the single serialisation point that observes the authoritative catalog state via the route-engine watcher") is correct in principle but the via the watcher clause means the FSM sees catalog state with a bounded lag, not instantaneously. The fence closes the gap asymptotically but not completely under the current architecture.

The doc already documents the drain protocol as an alternative (§4.4 last paragraph), and that alternative would close this window (MoveRange quiesces g1's in-flight commits before activating g2 as owner). I'm not asking for the design to change — but the doc should acknowledge the lag window explicitly and explain why it is acceptable or how it is bounded. As written, §4.4 reads as if the fence is complete, when it is actually best-effort within the watcher interval.

Suggested addition (one sentence in §4.4): "The FSM's view of the current catalog version is bounded by the watcher poll interval (defaultCatalogWatcherInterval = 100 ms); within that window a late commit may pass the current-version fence. The drain protocol alternative documented below closes this window entirely at the cost of head-of-line blocking on MoveRange."

Fix this →

2. Two-phase transaction path still unaddressed (carried from r1)

The M3 unit test scenarios in §6 all describe single-shard (one-phase) transactions. The doc doesn't specify whether verifyComposed1 runs at Phase_NONE only or also at Phase_PREPARE / Phase_COMMIT. This matters because:

  • If the check runs only on the one-phase path, cross-shard txns (which use two-phase) bypass the guard entirely.
  • If it runs at Phase_PREPARE, the OCC intent writes have already been persisted when the violation is detected; an abort path is required.
  • If it runs at Phase_COMMIT, the intent is already live and visible to readers.

The TLA+ spec's Commit(t, g) is a single atomic action — the implementation's two-phase path needs a corresponding answer. This doesn't need to be a full design, but M3's "Done when" criteria should state which phase(s) carry the check.


Minor Points

  • Current() method not in M2 scope. M2's "Done when" column mentions SnapshotAt(v) but the new §4.4 also calls f.routes.Current(). The RouteHistory interface needs both methods, and M2 should name both explicitly.

  • Proto wire-format change still absent from M1. proto/internal.proto Request needs an observed_route_version field. M1's "Done when" says "Field round-trips end-to-end" but doesn't mention the schema change or the pinned regeneration step (libprotoc 29.3, protoc-gen-go v1.36.11). Worth one line in the M1 row.

  • Q1 adapter migration path. Q1 says "error at M5" for unpinned callers but no adapter is listed as needing migration. Since all adapters go through OperationGroup, they all need to set ObservedRouteVersion before M5. Adding a migration checklist to M1 (or a separate table) would make the M5 error non-surprising.


Checklist per PR test plan

  • §4.2 FSM check matches spec's Commit predicate — confirmed correct (§4.2(a) refines OwnerAt(txnObservedVer[t], k) = g exactly; §4.4 is a strictly stronger implementation-level addition).
  • M1+M2 can land independently without behaviour change — confirmed (M1 is pure plumbing; M2 adds struct fields and ring with no FSM logic until M3).
  • [~] Three open questions — Q2 (liveness-only) and Q3 (fence vs. drain) are well-framed. Q1 (deprecation) still lacks a per-adapter migration path.

The doc is approvable for a proposed state. The watcher-lag acknowledgment (issue #1) is the one addition that would prevent the §4.4 guarantee from being over-stated when the eventual implementation PR opens.

@bootjp bootjp merged commit 860ea38 into main May 29, 2026
8 of 9 checks passed
@bootjp bootjp deleted the docs/composed1-cross-group-guard branch May 29, 2026 14:56
bootjp added a commit that referenced this pull request May 30, 2026
…w-up) (#878)

## Summary

#870 follow-up: extends the M5 Composed module so the design doc §4.4
apply-time cross-version-read fence is spec-checked by TLC. The first
revision of `tla/composed` proved the spec-level Composed-1 predicate
(commit goes to the owning group at the txn's observed catalog version)
but — as codex P1 surfaced on PR #870 — that predicate alone is **not**
sufficient to prevent the G1c trace where `ProposeRouteChange` moves a
key after `WriteIntent`: observed-version check passes because
`routes[v_obs][k]` still records the old owner, yet readers at `v_obs+1`
route to the new owner and miss the write.

This PR adds the strictly stronger **Composed-1a** invariant:

```
Composed1a_CommitToCurrentOwner ==
  \A t \in CommittedTxns :
    \A k \in txnWriteSet[t] :
      OwnerAt(currentVersionAtCommit[t], k) = txnCommitGroup[t]
```

New state variable `currentVersionAtCommit[t]` records the catalog
version the FSM observed at apply time (set on `Commit`). A new CONSTANT
`EnableCurrentFence` gates the matching Commit-action guard so the
implementation-level fence can be toggled independently of the
spec-level `EnableSafety` guard.

## Three configurations now exercise the module

| Config | Toggles | Outcome |
|---|---|---|
| `MCComposed.cfg` | safety: ON, currentfence: ON | PASS |
| `MCComposed_gap.cfg` | safety: OFF, currentfence: ON | FAIL on
`Composed1_CommitToOwningGroup` (canonical 4-step counterexample) |
| `MCComposed_currentfence_gap.cfg` (**new**) | safety: ON,
currentfence: OFF | FAIL on `Composed1a_CommitToCurrentOwner` — **the
codex P1 trace at depth 5** |

The codex P1 trace TLC found:

1. `BeginTxn(t1)` → `txnObservedVer[t1] = 0`
2. `WriteIntent(t1, k1)`
3. `ProposeRouteChange(k1, g2)` → `catalogVersion = 1`, `routes[1][k1] =
g2`
4. `Commit(t1, g1)` — observed-version check passes (`routes[0][k1] =
g1`); current-fence gate is OFF, so Commit fires
5. Invariant fails: `currentVersionAtCommit[t1] = 1`, `OwnerAt(1, k1) =
g2 ≠ g1`

## Other changes

- `scripts/tla-check.sh` refactored from a single-gap-per-module to a
list-of-gaps-per-module. Per-module entry point `gap_configs_for`
returns `<cfg-stem>|<invariant-string>` lines, one per gap; modules with
a single guard (HLC, OCC, MVCC, Routes) are unchanged.
- `tla/README.md` documents Composed-1a alongside Composed-1 in the
module description and the invariant table.

## Verification

```
$ make tla-check
HLC      safe pass + HLC-4 gap fail
OCC      safe pass + OCC-1 gap fail
MVCC     safe pass + MVCC-4 gap fail
Routes   safe pass + Routes-4 gap fail
Composed safe pass + Composed-1 gap fail + Composed-1a currentfence_gap fail (depth 5)
tla-check: all model-check outcomes match the design contract.
```

11 outcomes total (5 safe PASS + 6 gap FAIL), including the new codex P1
trace.

## Self-review (5 lenses)

1. **Data loss** — out of scope (spec-only).
2. **Concurrency** — Composed-1a captures the concurrency seam between
OCC commit and Routes route mutation that Composed-1 alone missed; the
gap counterexample IS an interleaving.
3. **Performance** — TLC state space grew from 1,684 → 5,170 distinct
states on the safe config (still ~2s wall-clock), well under §8.1's <10
min target.
4. **Data consistency** — Composed-1a is strictly stronger than
Composed-1. Spec → impl correspondence: §4.4 of the design doc.
5. **Test coverage** — `make tla-check` runs the new gap config
automatically; no new Go tests (spec-only PR).

## Test plan

- [x] `make tla-check` PASS for all 11 outcomes
- [x] codex P1 counterexample at depth 5 reproduced on
`MCComposed_currentfence_gap.cfg`
- [ ] Reviewer cross-checks `Composed1a` against §4.4 of the design doc
- [ ] Reviewer confirms `EnableCurrentFence = TRUE` on the safe config
preserves the existing `MCComposed.cfg` outcome (yes — TLC PASS verified
locally)

## Resolves

The "extending `Composed.tla` to model `currentVersion`" follow-up
flagged in §5 of
`docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md`.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added a stronger apply-time ownership fence safety property that
verifies committed writes' ownership against the current catalog version
at commit time.

* **Testing**
* Enhanced the verification framework to support multiple gap-check
configurations per module, enabling more comprehensive safety
verification.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/878?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant