Skip to content

tla: Composed-1a current-version fence + currentfence_gap (#870 follow-up)#878

Merged
bootjp merged 3 commits into
mainfrom
tla/composed1a-current-version
May 30, 2026
Merged

tla: Composed-1a current-version fence + currentfence_gap (#870 follow-up)#878
bootjp merged 3 commits into
mainfrom
tla/composed1a-current-version

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 29, 2026

Summary

#870 follow-up: extends the M5 Composed module so the design doc §4.4 apply-time cross-version-read fence is spec-checked by TLC. The first revision of tla/composed proved the spec-level Composed-1 predicate (commit goes to the owning group at the txn's observed catalog version) but — as codex P1 surfaced on PR #870 — that predicate alone is not sufficient to prevent the G1c trace where ProposeRouteChange moves a key after WriteIntent: observed-version check passes because routes[v_obs][k] still records the old owner, yet readers at v_obs+1 route to the new owner and miss the write.

This PR adds the strictly stronger Composed-1a invariant:

Composed1a_CommitToCurrentOwner ==
  \A t \in CommittedTxns :
    \A k \in txnWriteSet[t] :
      OwnerAt(currentVersionAtCommit[t], k) = txnCommitGroup[t]

New state variable currentVersionAtCommit[t] records the catalog version the FSM observed at apply time (set on Commit). A new CONSTANT EnableCurrentFence gates the matching Commit-action guard so the implementation-level fence can be toggled independently of the spec-level EnableSafety guard.

Three configurations now exercise the module

Config Toggles Outcome
MCComposed.cfg safety: ON, currentfence: ON PASS
MCComposed_gap.cfg safety: OFF, currentfence: ON FAIL on Composed1_CommitToOwningGroup (canonical 4-step counterexample)
MCComposed_currentfence_gap.cfg (new) safety: ON, currentfence: OFF FAIL on Composed1a_CommitToCurrentOwnerthe codex P1 trace at depth 5

The codex P1 trace TLC found:

  1. BeginTxn(t1)txnObservedVer[t1] = 0
  2. WriteIntent(t1, k1)
  3. ProposeRouteChange(k1, g2)catalogVersion = 1, routes[1][k1] = g2
  4. Commit(t1, g1) — observed-version check passes (routes[0][k1] = g1); current-fence gate is OFF, so Commit fires
  5. Invariant fails: currentVersionAtCommit[t1] = 1, OwnerAt(1, k1) = g2 ≠ g1

Other changes

  • scripts/tla-check.sh refactored from a single-gap-per-module to a list-of-gaps-per-module. Per-module entry point gap_configs_for returns <cfg-stem>|<invariant-string> lines, one per gap; modules with a single guard (HLC, OCC, MVCC, Routes) are unchanged.
  • tla/README.md documents Composed-1a alongside Composed-1 in the module description and the invariant table.

Verification

$ make tla-check
HLC      safe pass + HLC-4 gap fail
OCC      safe pass + OCC-1 gap fail
MVCC     safe pass + MVCC-4 gap fail
Routes   safe pass + Routes-4 gap fail
Composed safe pass + Composed-1 gap fail + Composed-1a currentfence_gap fail (depth 5)
tla-check: all model-check outcomes match the design contract.

11 outcomes total (5 safe PASS + 6 gap FAIL), including the new codex P1 trace.

Self-review (5 lenses)

  1. Data loss — out of scope (spec-only).
  2. Concurrency — Composed-1a captures the concurrency seam between OCC commit and Routes route mutation that Composed-1 alone missed; the gap counterexample IS an interleaving.
  3. Performance — TLC state space grew from 1,684 → 5,170 distinct states on the safe config (still ~2s wall-clock), well under §8.1's <10 min target.
  4. Data consistency — Composed-1a is strictly stronger than Composed-1. Spec → impl correspondence: §4.4 of the design doc.
  5. Test coveragemake tla-check runs the new gap config automatically; no new Go tests (spec-only PR).

Test plan

  • make tla-check PASS for all 11 outcomes
  • codex P1 counterexample at depth 5 reproduced on MCComposed_currentfence_gap.cfg
  • Reviewer cross-checks Composed1a against §4.4 of the design doc
  • Reviewer confirms EnableCurrentFence = TRUE on the safe config preserves the existing MCComposed.cfg outcome (yes — TLC PASS verified locally)

Resolves

The "extending Composed.tla to model currentVersion" follow-up flagged in §5 of docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md.

Summary by CodeRabbit

  • New Features

    • Added a stronger apply-time ownership fence safety property that verifies committed writes' ownership against the current catalog version at commit time.
  • Testing

    • Enhanced the verification framework to support multiple gap-check configurations per module, enabling more comprehensive safety verification.

Review Change Stack

…w-up)

Extends the M5 Composed module so the design doc §4.4 apply-time
cross-version-read fence is also spec-checked by TLC.  The first
revision of tla/composed proved the spec-level Composed-1 predicate
(commit goes to the owning group at the txn's observed catalog
version) but — as codex P1 surfaced on PR #870 — that predicate is
NOT sufficient to prevent the G1c trace where ProposeRouteChange
moves a key after WriteIntent: observed-version check passes
because routes[v_obs][k] still records the old owner, yet readers
at v_obs+1 route to the new owner and miss the write.

This commit adds the strictly stronger Composed-1a invariant:

  Composed1a_CommitToCurrentOwner ==
    \A t \in CommittedTxns :
      \A k \in txnWriteSet[t] :
        OwnerAt(currentVersionAtCommit[t], k) = txnCommitGroup[t]

The new state variable currentVersionAtCommit[t] records the
catalog version observed by the FSM at apply time (set on Commit).
A new CONSTANT EnableCurrentFence gates the matching Commit-action
guard so the implementation-level fence can be toggled independently
of the spec-level EnableSafety guard.

Three configurations now exercise the module:

  * MCComposed.cfg                  — safe: both guards ON, PASS
  * MCComposed_gap.cfg               — original Composed-1 gap (EnableSafety=FALSE),
                                       FAIL on Composed1_CommitToOwningGroup
  * MCComposed_currentfence_gap.cfg  — NEW Composed-1a gap
                                       (EnableSafety=TRUE, EnableCurrentFence=FALSE),
                                       FAIL on Composed1a_CommitToCurrentOwner —
                                       this is the codex P1 trace surfaced as a
                                       TLC counterexample at depth 5

scripts/tla-check.sh refactored from a single gap-per-module to a
list-of-gaps-per-module so the new config is exercised automatically
alongside the original.  Per-module entry point gap_configs_for
returns "<cfg-stem>|<invariant-string>" lines, one per gap; modules
with a single guard (HLC, OCC, MVCC, Routes) are unaffected.

tla/README.md documents Composed-1a alongside Composed-1 in the
module description and the invariant table.

Verification: `make tla-check` now runs 11 outcomes (5 safe PASS + 6
gap FAIL) and produces the design-contract counterexamples for all
six gaps, including the new codex P1 trace at depth 5 for
Composed1a_CommitToCurrentOwner.

Resolves the "Composed.tla currentVersion extension" follow-up
called out in §5 of
docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 29, 2026

@claude review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: edf766f9-c360-4eab-a9c3-a9a3915fd5fd

📥 Commits

Reviewing files that changed from the base of the PR and between 446a60f and 91bf2b9.

📒 Files selected for processing (6)
  • scripts/tla-check.sh
  • tla/README.md
  • tla/composed/Composed.tla
  • tla/composed/MCComposed.cfg
  • tla/composed/MCComposed_currentfence_gap.cfg
  • tla/composed/MCComposed_gap.cfg

📝 Walkthrough

Walkthrough

This PR extends the Composed TLA+ model with a stronger apply-time ownership fence. It introduces a EnableCurrentFence constant and currentVersionAtCommit variable to track catalog version at commit time, adds a new invariant Composed1a_CommitToCurrentOwner, and updates the test harness to support multiple gap configurations per module.

Changes

Current-Version Ownership Fence

Layer / File(s) Summary
Composed Model: Constants and Variables
tla/composed/Composed.tla, tla/README.md
EnableCurrentFence constant gates an additional Commit-time ownership check; currentVersionAtCommit variable records the catalog version at apply time for each transaction. Both are initialized and documented alongside the new Composed-1a milestone.
State Preservation Across Actions
tla/composed/Composed.tla
All non-commit actions (ProposeRouteChange, BeginTxn, WriteIntent, Abort) preserve currentVersionAtCommit through state transitions.
Commit Action and Invariant
tla/composed/Composed.tla
Commit(t, g) records currentVersionAtCommit[t] and optionally enforces ownership against the current catalog version when EnableCurrentFence is true. TypeOK is extended to include the variable's domain. New invariant Composed1a_CommitToCurrentOwner asserts every committed write is owned by the commit group at the recorded apply-time version.
TLC Configurations
tla/composed/MCComposed.cfg, tla/composed/MCComposed_gap.cfg, tla/composed/MCComposed_currentfence_gap.cfg
Main configuration enables the fence and checks the new invariant. Existing gap configuration remains enabled to isolate Composed-1 violations. New gap configuration (MCComposed_currentfence_gap.cfg) disables the fence to isolate Composed-1a violations.
Test Harness: Multi-Entry Gap Configs
scripts/tla-check.sh
Replaces gap_invariant_for() with gap_configs_for() to return one or more `

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • bootjp/elastickv#858: Both PRs modify scripts/tla-check.sh's gap-check orchestration, where this PR's multi-entry gap_configs_for() replaces the framework from PR #858.
  • bootjp/elastickv#861: Both PRs refactor scripts/tla-check.sh's gap-config mapping, but this PR moves to a flexible multi-entry registry while #861 extended the single-entry model to support additional invariants.

Poem

🐰 A fence that sees the catalog whole,
Not just the past that txns stole—
At commit time we capture its view,
Gap configs test both old and new! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main change: adding Composed-1a current-version fence and currentfence_gap configuration as a follow-up to PR #870, directly matching the PR's core objectives of extending the Composed TLA+ module with a new safety invariant and gap configuration.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch tla/composed1a-current-version

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Claude finished @bootjp's task in 3m 52s —— View job


Code Review

Tasks:

  • Gather context and understand the request
  • Read changed files
  • Review tla/composed/Composed.tla
  • Review scripts/tla-check.sh
  • Review TLA+ config files
  • Review tla/README.md
  • Cross-check against design doc §4.4

Overall Assessment

This is a well-structured spec-only PR. The new Composed1a_CommitToCurrentOwner invariant is correctly stated, maps precisely to the design doc §4.4 apply-time fence, and the three-config test matrix exercises both guards independently. No correctness bugs found.


tla/composed/Composed.tla

Spec correctness — PASS

The core invariant at line 311–314 is a faithful TLA+ encoding of the §4.4 fence:

Composed1a_CommitToCurrentOwner ==
    \A t \in CommittedTxns :
        \A k \in txnWriteSet[t] :
            OwnerAt(currentVersionAtCommit[t], k) = txnCommitGroup[t]

The guard ↔ invariant correspondence is tight. Commit(t, g) atomically evaluates the guard OwnerAt(catalogVersion, k) = g (line 208) and sets currentVersionAtCommit' = catalogVersion (line 213). Because TLA+ actions are atomic, the same catalogVersion value is used for both; when EnableCurrentFence = TRUE, the invariant holds by construction from the guard.

currentVersionAtCommit initialization (line 118):

/\ currentVersionAtCommit = [t \in TxnIds |-> 0]

Safe: CommittedTxns is empty in Init, so the invariant is vacuously satisfied in the initial state. The default 0 value is unreachable for committed txns.

TypeOK range (line 254):

/\ currentVersionAtCommit \in [TxnIds -> 0..MaxVersions]

Range is correct — 0 for init, up to MaxVersions from Commit.

opCount not advanced in Commit (line 214 — pre-existing):
The UNCHANGED clause includes opCount, meaning Commit doesn't count toward the op budget. This is intentional: opCount bounds "generative" actions (BeginTxn, WriteIntent, ProposeRouteChange) but not terminal ones (Commit, Abort). Not a bug, but worth noting for readers.


Three-config test matrix

The config isolation is correct. The key invariant is:

Config EnableSafety EnableCurrentFence Which guard can be disabled
MCComposed.cfg TRUE TRUE Neither → PASS
MCComposed_gap.cfg FALSE TRUE Observed-version check → Composed-1 FAIL
MCComposed_currentfence_gap.cfg TRUE FALSE Current-version fence → Composed-1a FAIL

MCComposed_gap.cfgComposed1a behaviour worth documenting:

The config comment at lines 28–33 says Composed1a_CommitToCurrentOwner is NOT violated in the Composed-1 gap trace. This is true, but the reason is subtle: with EnableCurrentFence = TRUE, Commit(t, g) can only fire if OwnerAt(catalogVersion, k) = g holds. In the canonical 4-step trace (Commit(t, g2) after ProposeRouteChange(k1, g2)), the current owner IS g2, so Composed-1a passes by construction. It's not that TLC finds Composed-1 "before" Composed-1a in some priority order — rather, Composed-1a cannot fail when EnableCurrentFence = TRUE. The comment is accurate but slightly ambiguous about this distinction. Consider: \* Composed1a holds by construction when EnableCurrentFence = TRUE — the guard prevents any commit where the current owner differs.

MCComposed_currentfence_gap.cfg — trace depth:

The PR description says the gap config produces a "4-step counterexample" and this config produces a "depth 5" trace. Both actually have the same 4 actions from Init (BeginTxn → WriteIntent → ProposeRouteChange → Commit). TLC reports "depth 5" because it counts the initial state as depth 1. The discrepancy is informal phrasing in the PR, not a bug.


scripts/tla-check.sh

Refactoring is correct and backward-compatible. The gap_configs_for function using printf to output stem|invariant lines, consumed by while IFS='|' read -r gap_stem gap_inv, is idiomatic bash. The [ -z "$gap_stem" ] && continue guard handles trailing newlines.

Existing safety invariants preserved: The if ! mc="$(mc_basename "$module")" and if ! gap_list="$(gap_configs_for "$module")" patterns from PR #858/#862 carry forward correctly.

Exit code 64 in gap_configs_for error path (line 53):

exit 64

This exits the subshell from $(gap_configs_for "$module"). The parent check if ! gap_list="$(...)" catches the non-zero return, so the loop correctly skips and sets overall_rc=1. Correct.


Minor observations (non-blocking)

1. tla/README.md M5 status row:

Line 18 still reads:

| M5 | `composed/Composed.tla` (Composed-1 + Composed-3; …) | Landed |

The description block and invariant table (lines 252–301) were updated to mention Composed-1a, but the M5 status-table row was not. Suggest adding "Composed-1a" there for consistency.

Fix this →

2. Design doc §5 open follow-up:

docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md §5 documents the Composed-1a extension as a "natural follow-up" open task. This PR delivers that follow-up, but the design doc is unchanged. A brief note in §5 (e.g., "Resolved by PR #878") would close the loop. The doc status stays Proposed because the implementation milestones (M1–M5) remain open — only the spec extension is done, which is correct.

3. Composed.tla module header (lines 32–38):

The module header describes EnableSafety but doesn't mention EnableCurrentFence. The CONSTANTS block (line 62) correctly lists it, but a sentence in the module header describing the new constant's role would help readers who read the spec header first. Minor doc gap.


Correctness claim for the reviewer checklist

Reviewer cross-checks Composed1a against §4.4 of the design doc

Verified. The invariant Composed1a_CommitToCurrentOwner (line 311–314) is the direct TLA+ encoding of the §4.4 predicate: "the FSM also checks the current owner" at apply time. The currentVersionAtCommit[t] variable corresponds to the FSM's f.routes.Current() snapshot version; the guard in Commit (line 207–208) corresponds to verifyCurrentVersionOwner.

Reviewer confirms EnableCurrentFence = TRUE on the safe config preserves the existing MCComposed.cfg outcome

Structurally sound. As noted above, when EnableCurrentFence = TRUE, Composed1a holds by construction from the guard. Adding Composed1a_CommitToCurrentOwner to the INVARIANTS list in MCComposed.cfg can only tighten the check — it cannot cause a spurious failure when the guard is active.


Summary

No correctness bugs. The spec accurately encodes the §4.4 fence, the three-config matrix isolates each guard independently, and the shell script refactoring is clean. Three minor doc suggestions above (M5 table row, design doc §5 note, module header) — none are blockers. Approve when ready.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new TLA+ invariant Composed1a_CommitToCurrentOwner representing an apply-time fence, along with a corresponding gap configuration model to isolate and verify the codex P1 trace. The tla-check.sh script is updated to support running multiple gap configurations per module. Feedback on the script highlights a potential issue where a malformed gap configuration entry with an empty invariant string could lead to silent false positives in the test harness, and suggests adding a validation check to prevent this.

Comment thread scripts/tla-check.sh
Comment on lines +134 to +136
while IFS='|' read -r gap_stem gap_inv; do
[ -z "$gap_stem" ] && continue
gap_cfg="${mc}${gap_stem}.cfg"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If a line in gap_list is malformed or does not contain the | delimiter, read will assign the entire line to gap_stem and leave gap_inv empty. If gap_inv is empty, the subsequent grep -qF "$gap_inv" check will always succeed (since an empty string matches any output), leading to a silent false positive where a run is considered successful even if the expected invariant was not violated.

Adding a validation check to ensure gap_inv is non-empty prevents this failure mode.

Suggested change
while IFS='|' read -r gap_stem gap_inv; do
[ -z "$gap_stem" ] && continue
gap_cfg="${mc}${gap_stem}.cfg"
while IFS='|' read -r gap_stem gap_inv; do
[ -z "$gap_stem" ] && continue
if [ -z "$gap_inv" ]; then
echo "ERROR: Malformed gap config entry for module '${module}': '${gap_stem}' (missing expected invariant string)." >&2
overall_rc=1
continue
fi
gap_cfg="${mc}${gap_stem}.cfg"

bootjp added 2 commits May 30, 2026 02:50
…review on PR #878)

The previous comment said Composed-1a 'is not violated' in the Composed-1 gap trace but did not explain that this holds by construction from EnableCurrentFence=TRUE — the current-version guard in Commit prevents any commit where the current owner differs from the committing group. Updates the inline comment to make this explicit.

Doc-only; no spec or harness changes.
…ini medium on PR #878)

Closes a silent-false-positive class in the gap-config loop introduced
by the multi-gap refactor in this PR's parent commit.

The loop does:
  while IFS='|' read -r gap_stem gap_inv; do
    ...
    if printf '%s\n' "$gap_out" | grep -qF "$gap_inv"; then
      OK
    fi
  done

If a gap_configs_for() entry is malformed (missing the '|' delimiter,
e.g. a typo dropped the invariant string), `read` puts the entire
line in gap_stem and leaves gap_inv empty.  Then `grep -qF ""` matches
every line of TLC output unconditionally, so the gap config 'passes'
even when TLC failed for an unrelated reason — exactly the
silent-false-positive class codex P2 on PR #856 round 2 introduced
the gap-invariant-string match to prevent.

Fix: validate both fields are non-empty before the body runs.  A
trailing blank line (the existing skip case) is still skipped; a
malformed entry now produces a clear ERROR with the stem / inv it
read so the maintainer can fix gap_configs_for() directly.

Verified: `make tla-check` still produces the 11 expected outcomes
(5 safe PASS + 5 gap FAIL + 1 currentfence gap FAIL on Composed-1a)
after the guard.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 30, 2026

Gemini medium addressed:

Validation guard added before the gap-config body runs. A malformed entry (missing the | delimiter) now produces a clear ERROR identifying the offending stem/inv so gap_configs_for() can be fixed directly, instead of silently false-positiving via grep -qF "" matching every line.

Trailing-blank-line skip preserved (the existing [ -z "$gap_stem" ] && [ -z "$gap_inv" ] && continue case).

make tla-check still produces 11 expected outcomes (5 safe + 5 gap + 1 currentfence gap on Composed-1a).

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 30, 2026

Claude finished @bootjp's task in 1m 50s —— View job


Claude finished @bootjp's task ——


Code Review

Tasks:

  • Gather context and understand the request
  • Read scripts/tla-check.sh (post-Gemini-fix state)
  • Verify the Gemini P1 fix is correct and complete
  • Full review of remaining changed files
  • Post findings

Gemini P1 fix — PASS (and improved beyond the suggestion)

The implemented fix at lines 142–150 of scripts/tla-check.sh is correct and strictly stronger than the Gemini suggestion on two axes:

1. Blank-line skip is more precise

Gemini suggested [ -z "$gap_stem" ] && continue, which silently skips any line where the stem is empty — including a genuinely malformed |Invariant X entry (missing stem). The author's fix:

[ -z "$gap_stem" ] && [ -z "$gap_inv" ] && continue

Only skips when both are empty (the trailing blank that <<< "$gap_list" appends). A |Invariant X entry now hits the error branch. ✅

2. Validates both fields

if [ -z "$gap_stem" ] || [ -z "$gap_inv" ]; then

Gemini only checked gap_inv. The || catches stem-only entries too. Error message includes both fields, so the offending entry is immediately locatable. ✅


One stale reference in tla-check.sh (non-blocking)

Line 101 comment still references the old function name:

# so `exit 64` from mc_basename / gap_invariant_for in a command

Should be gap_configs_for. The function was renamed in this PR — the comment body just wasn't updated.

Fix this →


tla/composed/Composed.tla — PASS

The spec changes are correct:

  • currentVersionAtCommit init ([t \in TxnIds |-> 0]): safe — CommittedTxns is empty in Init, so the invariant is vacuously satisfied. The default 0 is unreachable for committed txns.
  • TypeOK range (0..MaxVersions): matches the domain of catalogVersion. ✅
  • Commit atomicity: Both the EnableCurrentFence guard (OwnerAt(catalogVersion, k) = g) and the assignment (currentVersionAtCommit' = catalogVersion) reference the same catalogVersion within the same action. Since TLA+ actions are atomic, no interleaving can split them. ✅
  • UNCHANGED maintenance: All four non-Commit actions (ProposeRouteChange, BeginTxn, WriteIntent, Abort) now include currentVersionAtCommit in their UNCHANGED clauses. ✅
  • Composed1a_CommitToCurrentOwner (lines 303–314): correctly quantifies over CommittedTxns using currentVersionAtCommit[t] (apply-time version) rather than txnObservedVer[t] (BeginTxn snapshot). This is the direct TLA+ encoding of the §4.4 fence. ✅

Config files — PASS

Config EnableSafety EnableCurrentFence Expected TLC outcome Verified
MCComposed.cfg TRUE TRUE PASS (both guards active)
MCComposed_gap.cfg FALSE TRUE FAIL on Composed-1 (current-fence holds by construction)
MCComposed_currentfence_gap.cfg TRUE FALSE FAIL on Composed-1a (codex P1, depth 5)

The comment added to MCComposed_gap.cfg (lines 28–33) correctly explains why Composed-1a cannot fail when EnableCurrentFence = TRUE — the guard construction itself prevents any Commit where the current owner differs, not some TLC priority rule. This closes the ambiguity flagged in the previous review. ✅


tla/README.md — M5 status row still not updated

The previous review noted this and it remains:

| M5 | `composed/Composed.tla` (Composed-1 + Composed-3; …) | Landed |

Composed-1a is now documented in the description block and invariant table, but the M5 row at line 18 still doesn't mention it. Minor cosmetic gap.

Fix this →


Summary

The Gemini medium fix is correct and improves on the original suggestion. The TLA+ spec is sound — the guard↔invariant correspondence is tight, currentVersionAtCommit is correctly maintained across all actions, and the three-config matrix independently isolates each guard. Two non-blocking nits remain: the stale gap_invariant_for comment reference in tla-check.sh and the M5 README row. Approve when ready.

@bootjp bootjp merged commit 958ddcf into main May 30, 2026
10 checks passed
@bootjp bootjp deleted the tla/composed1a-current-version branch May 30, 2026 05:05
bootjp added a commit that referenced this pull request May 31, 2026
Third milestone of the Composed-1 cross-group commit-time guard per
docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md
§M3.  Wires the safety property at apply time using the M2
versioned-snapshot ring and the M1 ObservedRouteVersion field.

Stacked on PR #894 (M2 plumbing).

What's enforced:

  (a) Observed-version owner — the spec-level Composed-1 check from
      tla/composed/Composed.tla.  Every write key of a Composed-1-
      pinned txn must be owned by THIS Raft group at the txn's
      observed catalog version (the version it read its read-set
      at, set in M1 via OperationGroup.ObservedRouteVersion).
      Refusal returns ErrComposed1Violation.

  (b) Current-version cross-version-read fence — the §4.4 / §3
      codex P1 trace.  Even when (a) passes, a route shift between
      BeginTxn and Commit can leave the write landing on the OLD
      owner while readers at the new version route to the NEW
      owner and miss the write.  The current-version fence refuses
      that case so M4 retry can re-route.  Refusal returns
      ErrComposed1Violation with a different diagnostic prefix
      ("current-version" vs "observed-version") so the retry path
      can distinguish.

  (c) Retention-miss fail-closed — when the txn's observed version
      has been evicted from the M2 ring (long-running txn or high
      catalog churn), return ErrComposed1VersionGCd.  Per design
      doc §4.3, the not-found semantics is a hard retryable error,
      NOT a soft pass, because soft-pass would let the gate be
      bypassed exactly in the cases where the cross-version-read
      hazard is most likely.

Short-circuits cleanly in three legacy / not-applicable cases:

  * FSM constructed without WithRouteHistory (legacy / test seam):
    routes == nil, gate returns nil.
  * Request carries ObservedRouteVersion == 0 (unpinned — pre-M1
    caller, or 2PC ABORT request that doesn't carry the version):
    gate returns nil.  This is what keeps M3 from regressing every
    caller that has not migrated to set OperationGroup.
    ObservedRouteVersion.
  * Engine.Current returns (zero, false) — engine has no history
    (bare-struct test seam): the (b) fence falls through.

Changes:

  * distribution/engine.go — adds Engine.Current() returning the
    snapshot at the current catalogVersion (used by the M3 fence).
    Adds SetHistoryDepthForTest as the cross-package test seam
    that lets kv-side tests trigger eviction without bypassing
    the package boundary (claude review on PR #894 — fragile-but-
    test-local lock contract documented inline).

  * kv/fsm.go — defines ErrComposed1Violation and
    ErrComposed1VersionGCd sentinels with the wrapped-error
    diagnostic shape M4 retry will use.  Adds verifyComposed1 and
    verifyOwnerFromSnapshot.  Wires verifyComposed1 into
    handleTxnRequest at the top so PREPARE / COMMIT / ABORT /
    NONE all pass through the gate (ABORT carries
    ObservedRouteVersion=0 and falls through naturally).

  * kv/route_history.go — RouteHistory interface gains Current().
    Adapter forwards to distribution.Engine.Current().

Tests (kv/fsm_composed1_test.go):

  * TestVerifyComposed1_StaleObservedVersionWithMovedKeyFails —
    design doc M3 "done when" criterion (i): observed-version
    snapshot resolves the key to a different group than this FSM.
  * TestVerifyComposed1_ObservedVersionOlderThanRingFails —
    criterion (ii): observed version is outside the ring →
    ErrComposed1VersionGCd.  Uses depth=2 via the cross-package
    SetHistoryDepthForTest seam.
  * TestVerifyComposed1_ObservedPassesButCurrentDiffersFails —
    criterion (iii) / the §3 codex P1 trace: observed-version
    check passes (routes[1][k1]=g1) but the current snapshot at
    v=2 has moved k1 to g2 → ErrComposed1Violation with the
    "current-version" diagnostic prefix.
  * TestVerifyComposed1_ObservedVersionZeroSkipsGate — legacy
    caller (ObservedRouteVersion=0) bypasses the gate.
  * TestVerifyComposed1_NilRouteHistorySkipsGate — unwired FSM
    (no WithRouteHistory option) bypasses the gate; matches the
    pre-feature posture.

Verification:
  * go build ./...                            — clean
  * go vet ./...                              — clean
  * go test -race -count=1 ./kv ./distribution — 10.2 s + 1.0 s, pass

Self-review (5 lenses):

  1. Data loss — the gate refuses commits but cannot lose them.
     Every rejection emits a sentinel the M4 coordinator path will
     convert into a successful commit on the correct owner.
  2. Concurrency — the gate runs under the FSM's apply loop (Raft
     applies are serial), so no new lock ordering.  RouteHistory
     reads take the engine's read lock; SnapshotAt and Current
     each acquire and release in a single call.
  3. Performance — two map lookups + a per-mutation linear scan
     of the snapshot's sorted routes per Composed-1-pinned commit.
     OwnerOf short-circuits via the M2 round-1 break optimisation,
     so the per-mutation cost is bounded by the first non-covering
     gap rather than the full routes slice.  Legacy callers
     (ObservedRouteVersion=0) pay one branch per commit and exit.
  4. Data consistency — closes the Composed-1 (observed) and
     Composed-1a (current) gaps end-to-end.  Spec correspondence:
     tla/composed/Composed.tla's Commit precondition is now
     enforced at apply time; the §4.4 fence matches
     Composed1a_CommitToCurrentOwner from PR #878.
  5. Test coverage — 5 new unit tests cover the three design doc
     criteria + the two legacy short-circuit paths.  Existing
     handleTxnRequest tests pass unchanged (their requests carry
     ObservedRouteVersion=0 from M1's behaviour-neutral default).

Next milestone (separate PR per design doc §6): M4 — coordinator
retry path.  When either Composed-1 sentinel returns, the
coordinator re-reads the route cache, re-routes the txn against
the new owning group, and re-issues it once.
bootjp added a commit that referenced this pull request May 31, 2026
Third milestone of the Composed-1 cross-group commit-time guard per
docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md
§M3.  Wires the safety property at apply time using the M2
versioned-snapshot ring and the M1 ObservedRouteVersion field.

Stacked on PR #894 (M2 plumbing).

What's enforced:

  (a) Observed-version owner — the spec-level Composed-1 check from
      tla/composed/Composed.tla.  Every write key of a Composed-1-
      pinned txn must be owned by THIS Raft group at the txn's
      observed catalog version (the version it read its read-set
      at, set in M1 via OperationGroup.ObservedRouteVersion).
      Refusal returns ErrComposed1Violation.

  (b) Current-version cross-version-read fence — the §4.4 / §3
      codex P1 trace.  Even when (a) passes, a route shift between
      BeginTxn and Commit can leave the write landing on the OLD
      owner while readers at the new version route to the NEW
      owner and miss the write.  The current-version fence refuses
      that case so M4 retry can re-route.  Refusal returns
      ErrComposed1Violation with a different diagnostic prefix
      ("current-version" vs "observed-version") so the retry path
      can distinguish.

  (c) Retention-miss fail-closed — when the txn's observed version
      has been evicted from the M2 ring (long-running txn or high
      catalog churn), return ErrComposed1VersionGCd.  Per design
      doc §4.3, the not-found semantics is a hard retryable error,
      NOT a soft pass, because soft-pass would let the gate be
      bypassed exactly in the cases where the cross-version-read
      hazard is most likely.

Short-circuits cleanly in three legacy / not-applicable cases:

  * FSM constructed without WithRouteHistory (legacy / test seam):
    routes == nil, gate returns nil.
  * Request carries ObservedRouteVersion == 0 (unpinned — pre-M1
    caller, or 2PC ABORT request that doesn't carry the version):
    gate returns nil.  This is what keeps M3 from regressing every
    caller that has not migrated to set OperationGroup.
    ObservedRouteVersion.
  * Engine.Current returns (zero, false) — engine has no history
    (bare-struct test seam): the (b) fence falls through.

Changes:

  * distribution/engine.go — adds Engine.Current() returning the
    snapshot at the current catalogVersion (used by the M3 fence).
    Adds SetHistoryDepthForTest as the cross-package test seam
    that lets kv-side tests trigger eviction without bypassing
    the package boundary (claude review on PR #894 — fragile-but-
    test-local lock contract documented inline).

  * kv/fsm.go — defines ErrComposed1Violation and
    ErrComposed1VersionGCd sentinels with the wrapped-error
    diagnostic shape M4 retry will use.  Adds verifyComposed1 and
    verifyOwnerFromSnapshot.  Wires verifyComposed1 into
    handleTxnRequest at the top so PREPARE / COMMIT / ABORT /
    NONE all pass through the gate (ABORT carries
    ObservedRouteVersion=0 and falls through naturally).

  * kv/route_history.go — RouteHistory interface gains Current().
    Adapter forwards to distribution.Engine.Current().

Tests (kv/fsm_composed1_test.go):

  * TestVerifyComposed1_StaleObservedVersionWithMovedKeyFails —
    design doc M3 "done when" criterion (i): observed-version
    snapshot resolves the key to a different group than this FSM.
  * TestVerifyComposed1_ObservedVersionOlderThanRingFails —
    criterion (ii): observed version is outside the ring →
    ErrComposed1VersionGCd.  Uses depth=2 via the cross-package
    SetHistoryDepthForTest seam.
  * TestVerifyComposed1_ObservedPassesButCurrentDiffersFails —
    criterion (iii) / the §3 codex P1 trace: observed-version
    check passes (routes[1][k1]=g1) but the current snapshot at
    v=2 has moved k1 to g2 → ErrComposed1Violation with the
    "current-version" diagnostic prefix.
  * TestVerifyComposed1_ObservedVersionZeroSkipsGate — legacy
    caller (ObservedRouteVersion=0) bypasses the gate.
  * TestVerifyComposed1_NilRouteHistorySkipsGate — unwired FSM
    (no WithRouteHistory option) bypasses the gate; matches the
    pre-feature posture.

Verification:
  * go build ./...                            — clean
  * go vet ./...                              — clean
  * go test -race -count=1 ./kv ./distribution — 10.2 s + 1.0 s, pass

Self-review (5 lenses):

  1. Data loss — the gate refuses commits but cannot lose them.
     Every rejection emits a sentinel the M4 coordinator path will
     convert into a successful commit on the correct owner.
  2. Concurrency — the gate runs under the FSM's apply loop (Raft
     applies are serial), so no new lock ordering.  RouteHistory
     reads take the engine's read lock; SnapshotAt and Current
     each acquire and release in a single call.
  3. Performance — two map lookups + a per-mutation linear scan
     of the snapshot's sorted routes per Composed-1-pinned commit.
     OwnerOf short-circuits via the M2 round-1 break optimisation,
     so the per-mutation cost is bounded by the first non-covering
     gap rather than the full routes slice.  Legacy callers
     (ObservedRouteVersion=0) pay one branch per commit and exit.
  4. Data consistency — closes the Composed-1 (observed) and
     Composed-1a (current) gaps end-to-end.  Spec correspondence:
     tla/composed/Composed.tla's Commit precondition is now
     enforced at apply time; the §4.4 fence matches
     Composed1a_CommitToCurrentOwner from PR #878.
  5. Test coverage — 5 new unit tests cover the three design doc
     criteria + the two legacy short-circuit paths.  Existing
     handleTxnRequest tests pass unchanged (their requests carry
     ObservedRouteVersion=0 from M1's behaviour-neutral default).

Next milestone (separate PR per design doc §6): M4 — coordinator
retry path.  When either Composed-1 sentinel returns, the
coordinator re-reads the route cache, re-routes the txn against
the new owning group, and re-issues it once.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant