tla: M2 — OCC.tla (OCC-1..OCC-5 + 2 action properties) + module-aware tla-check#858
Conversation
Per §8.1 of docs/design/2026_05_28_partial_tla_safety_spec.md (M2 deliverable): the OCC layer encoded as a Percolator-style 2PC model. New module: - tla/occ/OCC.tla — transaction lifecycle Idle → Active → Prepared → Committed/Aborted, lock map (key, lock_ts) → start_ts, LockResolve action that drains abandoned locks (motivates OCC-4 / OCC-L1). - tla/occ/MCOCC.tla — TLC model module with txn + key symmetry hints. - tla/occ/MCOCC.cfg — safe config (EnableSafety = TRUE; expected PASS). - tla/occ/MCOCC_gap.cfg — Commit's OCC-1 guard removed (expected FAIL on OCC1_CommitTsAboveStart with a counterexample motivating the spec, per the same gap-evidence pattern used for HLC in M1). Invariants encoded: - OCC-1 commit_ts > start_ts for every committed txn. - OCC-2 no write-write conflict on intersecting write sets (distinct commit_ts and one txn started after the other committed). - OCC-3 snapshot-read stability under the lock encoding (reads are blocked by uncommitted locks at lock_ts ≤ start_ts; LockResolve drains them). - OCC-4 no stranded lock at quiescence (bounded-safety form). - OCC-5 start_ts == read_ts for every operation (structural). - OCC5_Action / CommitTsAssignedOnce (PROPERTIES) — start_ts and commit_ts are each assigned once and never updated. Tooling refactor: - scripts/tla-check.sh — module-aware orchestrator. Adding M3..M5 requires only an entry in TLA_MODULES + a `case` line for the gap- invariant string; Makefile target shrinks to one `bash` invocation. - Makefile tla-check now delegates to that script. - .gitignore: TLA artefact glob widened from MCHLC_TTrace_* to MC*_TTrace_* so M2/M3/M4/M5 are covered without further edits. - tla/README.md: M2 status moved from "Not started" to "Landed"; added the OCC module description block + invariant table mirroring the HLC entry. Verified locally with `make tla-check`: - HLC safe: 3,594 distinct states, no errors. - HLC gap: counterexample on HLC4_NoRegressionAcrossTerms at depth 5. - OCC safe: 4,217 distinct states, no errors. - OCC gap: counterexample on OCC1_CommitTsAboveStart at depth ≈ 5.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Warning Review limit reached
More reviews will be available in 28 minutes and 2 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR adds an OCC (Optimistic Concurrency Control) specification in TLA+ with five safety invariants modeling 2PC commit flow, generalizes the TLC test infrastructure from hardcoded Makefile rules to a modular Bash script supporting multiple modules, and updates documentation to mark the OCC milestone as complete. ChangesOCC specification and generalized model-check infrastructure
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces the TLA+ specification for the Optimistic Concurrency Control (OCC) layer (Milestone M2), including the core model OCC.tla, model-checking configurations, and a new orchestrator script scripts/tla-check.sh to run TLC checks. Feedback on these changes highlights several critical issues: the LockResolve action in OCC.tla is currently unreachable (dead code) because locks are cleared atomically during commit and abort; a bash scripting bug in tla-check.sh could cause gap checks to silently succeed if a command substitution fails; the OCC3_ReadSnapshotStability invariant is defined too weakly and does not assert the intended equality consistency; and the TypeOK invariant is missing type checks for versions and lockMap.lockTs.
| LockResolve(k) == | ||
| /\ HasLock(k) | ||
| /\ LET t == lockMap[k].owner IN | ||
| /\ txnState[t] \in {"Committed", "Aborted"} | ||
| /\ IF txnState[t] = "Committed" | ||
| THEN versions' = [versions EXCEPT ![k] = @ \cup | ||
| {[commitTs |-> commitTs[t], owner |-> t]}] | ||
| ELSE versions' = versions | ||
| /\ lockMap' = [lockMap EXCEPT ![k] = NoLock] | ||
| /\ UNCHANGED <<tsCounter, txnState, startTs, commitTs, writeSet, | ||
| readObs, opCount>> |
There was a problem hiding this comment.
In the current TLA+ specification, the LockResolve action is completely dead (it can never be enabled in any reachable state).
Why this happens:
Commit(t)(lines 178-179) atomically clears all locks owned bytby setting them toNoLockat the exact same step it transitionstxnState[t]to"Committed".Abort(t)(lines 188-189) also atomically clears all locks owned bytwhen transitioningtxnState[t]to"Aborted".- Consequently, there is no reachable state where
txnState[t] \in {"Committed", "Aborted"}andlockMap[k].owner = tfor any keyk. - Therefore, the guard
txnState[t] \in {"Committed", "Aborted"}inLockResolve(k)(line 202) can never be satisfied, making the entire action unreachable.
Impact:
This means the model does not actually exercise or verify the asynchronous lock resolution protocol (where locks are left behind and resolved by concurrent readers or a background resolver). Instead, locks are always cleaned up atomically during the commit/abort step. This also makes the OCC4_NoStrandedLockAtQuiescence invariant trivially satisfied by the atomic cleanup rather than by the lock resolution logic.
Recommendation:
To properly model Percolator-style 2PC where locks can be orphaned and resolved asynchronously:
- Define a "primary key" for each transaction (or designate one key in the write set as the primary).
- In
Commit(t), only transitiontxnState[t]to"Committed"and write the version/clear the lock for the primary key. - Leave the locks on the secondary keys intact.
- Allow secondary locks to be resolved asynchronously either by a separate commit-secondary action or by the
LockResolveaction. - Alternatively, if this simplified atomic model is intentional for M2 to keep the state space small, this limitation should be explicitly documented in the comments, and
LockResolveshould either be removed or marked as a placeholder for M5/M6.
| mc="$(mc_basename "$module")" | ||
| safe_cfg="${mc}.cfg" | ||
| gap_cfg="${mc}_gap.cfg" | ||
| gap_inv="$(gap_invariant_for "$module")" |
There was a problem hiding this comment.
Because set -e is not set in this script, a failure inside the command substitution gap_inv="$(gap_invariant_for "$module")" will not cause the script to terminate.
If gap_invariant_for hits the wildcard * case and calls exit 64, only the subshell exits. The main script will continue executing with gap_inv set to an empty string. Later, grep -qF "$gap_inv" (line 99) will run with an empty string, which always matches, causing the gap check to silently succeed.
Recommendation:
Check the exit status of the command substitution to ensure errors are caught immediately.
| gap_inv="$(gap_invariant_for "$module")" | |
| if ! gap_inv="$(gap_invariant_for "$module")"; then | |
| overall_rc=1 | |
| continue | |
| fi |
| OCC3_ReadSnapshotStability == | ||
| \A t \in TxnIds, k \in Keys : | ||
| (txnState[t] \in {"Active", "Prepared", "Committed"} /\ readObs[t][k] # 0) => | ||
| readObs[t][k] <= startTs[t] |
There was a problem hiding this comment.
The invariant OCC3_ReadSnapshotStability is defined as readObs[t][k] <= startTs[t], which is virtually identical to (but weaker than) OCC5_StartTsConsistency (lines 294-296).
According to the comment on lines 266-275, this invariant is intended to assert that the recorded observation is consistent with the current visible version:
"We assert the simpler equality form: the recorded observation is consistent with the current visible version..."
However, the actual formula written does not assert equality or consistency with LatestVisible. It only asserts that the observed version's timestamp is bounded by startTs[t].
Recommendation:
To actually assert snapshot stability (i.e., that the read value remains consistent with the latest visible version at the transaction's snapshot timestamp), the invariant should be defined using equality.
OCC3_ReadSnapshotStability ==
\A t \in TxnIds, k \in Keys :
(txnState[t] \in {"Active", "Prepared", "Committed"} /\ readObs[t][k] # 0) =>
readObs[t][k] = LatestVisible(k, startTs[t])
| TypeOK == | ||
| /\ tsCounter \in 0..MaxTs | ||
| /\ txnState \in [TxnIds -> States] | ||
| /\ startTs \in [TxnIds -> 0..MaxTs] | ||
| /\ commitTs \in [TxnIds -> 0..MaxTs] | ||
| /\ writeSet \in [TxnIds -> SUBSET Keys] | ||
| /\ readObs \in [TxnIds -> [Keys -> 0..MaxTs]] | ||
| /\ opCount \in 0..MaxOps | ||
| /\ \A k \in Keys : lockMap[k].owner \in TxnIds \cup {"none"} |
There was a problem hiding this comment.
The TypeOK invariant is missing type checks for the versions variable and the lockTs field of lockMap.
To ensure complete type safety and prevent any unexpected state-space issues during model checking, TypeOK should be updated to cover all variables.
TypeOK ==
/\ tsCounter \in 0..MaxTs
/\ txnState \in [TxnIds -> States]
/\ startTs \in [TxnIds -> 0..MaxTs]
/\ commitTs \in [TxnIds -> 0..MaxTs]
/\ writeSet \in [TxnIds -> SUBSET Keys]
/\ readObs \in [TxnIds -> [Keys -> 0..MaxTs]]
/\ opCount \in 0..MaxOps
/\ \A k \in Keys :
/\ lockMap[k].owner \in TxnIds \cup {"none"}
/\ lockMap[k].lockTs \in 0..MaxTs
/\ versions \in [Keys -> SUBSET [commitTs: 0..MaxTs, owner: TxnIds]]
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tla/README.md (1)
49-60:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate run-behavior wording to match module-aware
tla-check.This section still describes a single module pair (“both configurations” for HLC), but
make tla-checknow runs safe+gap configs per registered module (currently HLC and OCC). Please update wording so operators know to expect multiple module runs.Suggested doc patch
-This runs TLC against both configurations and prints whether the outcome -matches the design contract: +This runs TLC across each registered module, executing both the safe and +gap configurations per module, and prints whether outcomes match the +design contract: -1. `tla/hlc/MCHLC.cfg` — the **correct design**, with HLC-4 +1. `tla/hlc/MCHLC.cfg` — the **correct design**, with HLC-4 preconditions encoded as ASSUMEs / action guards. TLC must finish with no invariant violation. -2. `tla/hlc/MCHLC_gap.cfg` — the **gap configuration**, with the +2. `tla/hlc/MCHLC_gap.cfg` — the **gap configuration**, with the preconditions disabled (`EnableSafety = FALSE`). TLC must produce a counterexample on `HLC4_NoRegressionAcrossTerms` — this is the motivating evidence that strategy (c) handoff + the ceiling fence are necessary. The `tla-check` target inverts the exit code for this run so a TLC FAILURE here counts as PASS for CI. +3. `tla/occ/MCOCC.cfg` / `tla/occ/MCOCC_gap.cfg` follow the same + pattern for OCC (`OCC1_CommitTsAboveStart` in gap mode).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tla/README.md` around lines 49 - 60, Update the README text describing the tla-check behavior to indicate that make tla-check runs the safe and gap configurations for each registered module (not just the HLC pair), e.g. it will run MCHLC.cfg and MCHLC_gap.cfg for HLC and the corresponding safe+gap configs for OCC (and any other registered modules), and that for each module the gap run is expected to produce a counterexample on HLC4_NoRegressionAcrossTerms (or the module's equivalent) with tla-check inverting the exit code so a TLC failure on the gap run counts as a CI pass; replace phrases like “both configurations” and “the HLC” with wording that operators should expect multiple module runs (safe+gap per registered module).
🧹 Nitpick comments (1)
tla/occ/OCC.tla (1)
43-43: ⚡ Quick winFix contradictory
tsCountersemantics in the variable comment.Line 43 says
tsCounteradvances on Tick + Commit, butCommitexplicitly leavestsCounterunchanged (and Lines 157-159 document that correctly). Please align the Line 43 comment to avoid misleading spec reasoning.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tla/occ/OCC.tla` at line 43, The file comment for tsCounter is misleading: update the variable comment for tsCounter (referenced as tsCounter, and the actions Tick and Commit) so it matches the spec semantics documented later—i.e., state that tsCounter is an abstract HLC counter that advances on Tick and is left unchanged by Commit (or simply "advances on Tick only"), ensuring consistency with the Commit definition and lines 157–159.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tla/README.md`:
- Around line 49-60: Update the README text describing the tla-check behavior to
indicate that make tla-check runs the safe and gap configurations for each
registered module (not just the HLC pair), e.g. it will run MCHLC.cfg and
MCHLC_gap.cfg for HLC and the corresponding safe+gap configs for OCC (and any
other registered modules), and that for each module the gap run is expected to
produce a counterexample on HLC4_NoRegressionAcrossTerms (or the module's
equivalent) with tla-check inverting the exit code so a TLC failure on the gap
run counts as a CI pass; replace phrases like “both configurations” and “the
HLC” with wording that operators should expect multiple module runs (safe+gap
per registered module).
---
Nitpick comments:
In `@tla/occ/OCC.tla`:
- Line 43: The file comment for tsCounter is misleading: update the variable
comment for tsCounter (referenced as tsCounter, and the actions Tick and Commit)
so it matches the spec semantics documented later—i.e., state that tsCounter is
an abstract HLC counter that advances on Tick and is left unchanged by Commit
(or simply "advances on Tick only"), ensuring consistency with the Commit
definition and lines 157–159.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c93b99d4-2903-4597-bd9e-cf234952be18
📒 Files selected for processing (8)
.gitignoreMakefilescripts/tla-check.shtla/README.mdtla/occ/MCOCC.cfgtla/occ/MCOCC.tlatla/occ/MCOCC_gap.cfgtla/occ/OCC.tla
…on removal Four findings from gemini on PR #858 commit 4b8d1b9. - HIGH (OCC.tla LockResolve dead-action): Commit and Abort atomically drain every lock they own, so LockResolve(k) — gated on HasLock(k) /\ txnState[owner] in {Committed, Aborted} — is unreachable in this model. Removed LockResolve entirely. Added a block-comment in its place documenting the M2 abstraction: the atomic-drain subsumes the resolver's effect on the durable state without modelling the asynchronous interleaving. M5 (composed) is the right place to model the real async resolver — cross-module interactions with OCC-3 (lock encoding for reads) and Routes (where the resolver issues secondary RPCs) belong together. - HIGH (scripts/tla-check.sh gap_invariant_for swallowed exit): Without `set -e`, a non-zero exit from gap_invariant_for inside the command substitution only terminated the subshell — the parent loop continued with gap_inv="" and the later `grep -qF "$gap_inv"` matched the empty pattern, silently passing the gap check. Added an explicit `if !` check so a future undeclared module fails the harness instead of fake-passing. - medium (OCC.tla OCC-3 too weak): the previous `readObs[t][k] <= startTs[t]` overlapped with OCC-5 and didn't actually assert snapshot stability. Strengthened to the canonical form `readObs[t][k] = LatestVisible(k, startTs[t])`. Soundness in the safe config now rests on monotonic ts allocation: BeginTxn and Commit both advance tsCounter, so every fresh startTs > all prior startTs, and every commit_ts > all prior startTs. This means no future Commit can produce a commit_ts <= startTs[t] for any t past BeginTxn, so LatestVisible(k, startTs[t]) cannot change after the read — strict equality is preserved. - medium (OCC.tla TypeOK incomplete): extended TypeOK to cover `lockMap[k].lockTs \in 0..MaxTs` and `versions \in [Keys -> SUBSET [commitTs: 0..MaxTs, owner: TxnIds]]`. Without these, a buggy action could smuggle in a malformed lock-ts or version record undetected. Structural changes to support strict OCC-3: - Removed the `Tick` action; tsCounter now advances only as part of BeginTxn and Commit, which makes the monotonic-ts allocation faithful to the real HLC behaviour. This also collapses the safe config's state space from 4,217 to 150 distinct states — the removed nondeterminism was Tick interleaving, not safety-relevant exploration. - BeginTxn now sets `tsCounter' = tsCounter + 1` and `startTs[t] = tsCounter + 1`. - Commit under EnableSafety advances tsCounter and uses the new value as commit_ts (yielding commit_ts > start_ts as a consequence of the allocation rather than an explicit guard); under the gap toggle, Commit borrows the *current* tsCounter without advancing, which is exactly == startTs[t] (no Tick to bump it), violating OCC-1 as the spec doc predicts. - MCOCC.cfg / MCOCC_gap.cfg: MaxTs raised from 4 to 6 to keep the bounded model from saturating tsCounter under auto-advance. Verified `make tla-check`: - HLC safe (3,594 states, unchanged) + HLC gap (HLC-4 ce at depth 5). - OCC safe (150 distinct states, no invariant violation). - OCC gap: OCC1_CommitTsAboveStart counterexample, matches the pinned string scripts/tla-check.sh greps for.
|
Round 1 fixes in
Bumped Verified
CI Spec-only PR, no Go code touched, no caller audit required. @claude review |
|
Claude finished @bootjp's task in 6m 6s —— View job Code Review — Round 2 (after
|
Six stale-documentation findings from claude bot on commit a26f650. All are doc-only drift from round 1's structural changes (Tick removal, LockResolve removal, OCC-3 strengthened to strict equality). OCC.tla: - Module header (lines 6-9) no longer describes the LockResolver action — replaced with a forward reference to the M2 atomic-drain abstraction and the M5 deferral note already living next to where LockResolve used to be. - tsCounter variable comment now says "advances on BeginTxn + Commit" instead of "Tick + Commit". - OCC4_NoStrandedLockAtQuiescence comment no longer says "LockResolve drains them"; instead points at the atomic-drain abstraction and the M5 deferral. tla/README.md: - "Run" section generalised: enumerates safe + gap per module rather than naming MCHLC alone; manual-run snippet now notes the same pattern applies to OCC and future M3/M4/M5 modules. - OCC module description (under "What each module proves") no longer mentions LockResolve; explicit forward reference to the OCC.tla block comment + M5 deferral. - OCC3_ReadSnapshotStability table entry updated to the strict equality form readObs[t][k] = LatestVisible(k, start_ts[t]); the previous commit_ts <= start_ts description matched the pre-round-1 weak form that round 1 strengthened. Verified make tla-check end-to-end still passes: - HLC safe (3,594 states) + HLC gap (HLC-4 ce). - OCC safe (150 states) + OCC gap (OCC-1 ce).
|
Round 2 doc polish in
Skipped (per claude bot's "not strictly required for M2"): the minor SYMMETRY observation that Verified Spec-only doc fixes, no semantics changed. Pre-commit @claude review |
|
Claude finished @bootjp's task in 1m 38s —— View job Code Review — Round 3 (after
|
claude bot round 3 confirmed all 6 round-2 doc fixes landed and the PR is "ready to merge" with one remaining loose end at OCC.tla:116. The ReadKey action comment described the block path as "LockResolve fires first" — but LockResolve was removed in round 1 alongside Tick. With the atomic-drain abstraction, the lock holder's Commit or Abort is what eventually unblocks the read by clearing the lock in the same atomic step. Updated the comment to say so. Verified make tla-check end-to-end unchanged: - HLC safe (3,594 states) + HLC gap (HLC-4 ce). - OCC safe (150 states) + OCC gap (OCC-1 ce).
|
Round 3 fix in
@claude review |
|
Claude finished @bootjp's task in 1m 19s —— View job Code Review — Round 4 (after
|
| Round | Finding | Status |
|---|---|---|
| R1 — Gemini HIGH | LockResolve dead action — removed, block comment added |
✓ |
| R1 — Gemini HIGH | tla-check.sh swallowed subshell exit — if ! gap_inv="$(…)" guard added |
✓ |
| R1 — Gemini MEDIUM | OCC3_ReadSnapshotStability too weak — strengthened to strict equality LatestVisible(k, startTs[t]) |
✓ |
| R1 — Gemini MEDIUM | TypeOK missing lockMap[k].lockTs + versions — both added |
✓ |
| R2 — doc drift | OCC.tla module header still named LockResolve — removed, M5 forward ref added |
✓ |
| R2 — doc drift | OCC.tla:45 tsCounter comment "advances on Tick + Commit" — updated to "BeginTxn + Commit" |
✓ |
| R2 — doc drift | OCC.tla OCC-4 comment "LockResolve drains them" — removed, points to atomic-drain abstraction |
✓ |
| R2 — doc drift | README.md Run section described only HLC configs — generalized to safe+gap per module |
✓ |
| R2 — doc drift | README.md OCC description named LockResolve — replaced with block-comment forward ref |
✓ |
| R2 — doc drift | README.md OCC-3 table entry described old weak <= form — updated to strict equality |
✓ |
| R3 — stale | OCC.tla:116 ReadKey comment "LockResolve fires first" — updated to atomic-drain |
✓ |
| R2 — minor (deferred) | MCOCC.tla symmetry group missing combined txn+key swap element — noted, deferred to M3/M4 |
deferred (acceptable) |
Final correctness re-check (no changes needed)
- OCC-1:
BeginTxn→tsCounter + 1,Commit(safe) →tsCounter + 1. Strict-greater by construction, any interleaving only raises the counter further. Gap config borrowstsCounterwithout advancing, reproducingcommit_ts == start_ts. ✓ - OCC-2:
Preparerequires~HasLock(k)for every write key; two concurrent writers cannot both acquire the same key. ✓ - OCC-3 stability: Monotonic ts allocation in
BeginTxn/Commitguarantees no futureCommitcan producecommit_ts ≤ startTs[t], soLatestVisible(k, startTs[t])is immutable after assignment. ✓ - OCC-4: Trivially satisfied by construction (atomic drain in
Commit/Abort); the block comment at lines 201-224 correctly explains this and defers the async resolver to M5. ✓ - OCC-5:
readObs[t][k] ≤ startTs[t]holds becauseReadKeyassignsLatestVisible(k, startTs[t])and that value is by definition≤ startTs[t]. ✓ tla-check.shexit handling:if ! gap_inv="$(gap_invariant_for "$module")"(line 73) catches subshell exits; gap path captures both output and RC separately; specific invariant-string match prevents parse errors or JVM crashes from silently passing. ✓TypeOK: CoverstsCounter,txnState,startTs,commitTs,writeSet,readObs,opCount,lockMap[k].owner,lockMap[k].lockTs,versions. Complete. ✓
This PR is ready to merge. All findings from Rounds 1-3 are resolved. No new issues found in Round 4.
gemini MEDIUM on PR #862 line 61: without set -e, an exit 64 from mc_basename (added for the routes module) only terminates the subshell. The parent loop would continue with mc=\"\" and the downstream files .tla / .cfg would be syntactically empty file paths that TLC mishandles silently. Same subshell-exit propagation gap that the gap_invariant_for if-check (gemini HIGH on PR #858) closed for the gap-invariant case. Added explicit if-checks at both call sites: - main loop where mc / safe_cfg / gap_cfg are derived - run_tlc inner call (used for both safe and gap runs) Verified make tla-check end-to-end: HLC 3594 / OCC 150 / MVCC 79 / Routes 29 distinct states on safe; all 4 gap configs surface their pinned invariant.
## Summary M3 of the [TLA+ safety spec](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_05_28_partial_tla_safety_spec.md) — the MVCC layer encoded as a per-key version chain with a `Compact` action and a ghost variable that preserves the original `Write` history so MVCC-4 is checkable. Follows PR #858 (M2 OCC spec). ## What lands - `tla/mvcc/MVCC.tla` — `Write(k, v)` allocates a fresh `tsCounter` and appends to both `versions[k]` and the ghost `originalVersions[k]`; `Compact(newMin)` advances the retention watermark and (under `EnableSafety`) retains the largest `commit_ts` below `newMin` per key so reads at `read_ts ≥ minRetained` still see the original value. - `tla/mvcc/MCMVCC.tla` — TLC model module with key + value symmetry. - `tla/mvcc/MCMVCC.cfg` / `MCMVCC_gap.cfg` — safe (PASS) + gap (FAIL on MVCC-4). - `scripts/tla-check.sh` — `TLA_MODULES` gains `"mvcc"`; `gap_invariant_for` gains the pinned MVCC-4 string. - `tla/README.md` — M3 status `Not started` → `Landed`; MVCC module description + invariant table. ## Invariants | # | Statement | Form | |---|---|---| | MVCC-1 | No two distinct version records on the same key share a `commit_ts` | INVARIANT | | MVCC-4 | For every key `k` and every `read_ts ∈ [minRetained, MaxTs]`, `LatestBefore(versions[k], ts) = LatestBefore(originalVersions[k], ts)` | INVARIANT | | `MVCC_TsMonotonic` | `tsCounter` weakly increases across every step | PROPERTY | | `MVCC_GhostMonotonic` | `originalVersions[k]` only grows; `Compact` leaves it `UNCHANGED` | PROPERTY | ## Deferred to M5 - **MVCC-2** (no version below the HLC physical ceiling) is properly an `HLC.tla` property — the ceiling discipline lives there. M5 will check the cross-spec form once `HLC.tla` is `INSTANCE`-d into the composed model. - **MVCC-3** (cross-node read consistency) requires a multi-node model that the single-node M3 spec cannot express. ## Verification (local) ```text $ make tla-check ... TLC: tla/hlc/MCHLC.cfg (correct design, expected PASS) ... 3594 distinct states found, 0 left on queue ... TLC: tla/hlc/MCHLC_gap.cfg (...expected FAIL on HLC4...) OK: MCHLC_gap.cfg failed as designed (Invariant HLC4_NoRegressionAcrossTerms is violated) ... TLC: tla/occ/MCOCC.cfg (correct design, expected PASS) ... 150 distinct states found, 0 left on queue ... TLC: tla/occ/MCOCC_gap.cfg (...expected FAIL on OCC1...) OK: MCOCC_gap.cfg failed as designed (Invariant OCC1_CommitTsAboveStart is violated) ... TLC: tla/mvcc/MCMVCC.cfg (correct design, expected PASS) ... 79 distinct states found, 0 left on queue ... TLC: tla/mvcc/MCMVCC_gap.cfg (...expected FAIL on MVCC4...) OK: MCMVCC_gap.cfg failed as designed (Invariant MVCC4_NoLostCommitOnSnapshotInstall is violated) tla-check: all model-check outcomes match the design contract. ``` The MVCC-4 counterexample is short (depth 3): one `Write` at `commit_ts = 1`, one `Compact(newMin = 2)`, and the read at `read_ts ∈ [2, MaxTs]` now sees `NotFound` where the ghost still shows the `Write`'s value. ## Modelling notes - **`Compact` retention semantics.** Under `EnableSafety` the per-key prune keeps every version with `commit_ts ≥ newMin` AND the single largest version with `commit_ts < newMin` (so any later read in `[newMin, MaxTs]` still sees the same value the original log would have shown). Under the gap toggle, the prune drops every version with `commit_ts < newMin`, including the latest-before-newMin retention. - **`originalVersions` as a ghost.** Carrying every `Write` in a parallel set that `Compact` never touches lets MVCC-4 be a direct equality between the actual store and the canonical "what should be visible" set. It is the simplest faithful encoding of "no commit is lost" for a single-node spec. - **HLC abstraction.** `tsCounter` is a single monotonic counter; M5 will replace it with an `INSTANCE` of `HLC.tla`'s real 48/16 layout. This keeps M3's state space tight (79 distinct states under the default bounds). - **Symmetry.** Keys and values are both symmetric in MVCC.tla because they participate only as indices in the invariants — neither has an ordering. `MCMVCC.tla` declares both via `Permutations`. ## Self-review (5-lens per CLAUDE.md) 1. **Data loss** — exactly the property MVCC-4 captures. Verified by gap-config counterexample. 2. **Concurrency** — single-node M3 model; multi-node interactions deferred to M5. 3. **Performance** — `make tla-check` end-to-end runs in under 5 seconds at default bounds. 4. **Data consistency** — MVCC-1 (uniqueness) + MVCC-4 (preserved visibility) + two PROPERTY transitions. MVCC-2 / MVCC-3 deferred to M5 with rationale. 5. **Test coverage** — no Go tests added (spec-only). `make tla-check` is the new coverage layer. ## Test plan - [ ] `tla-check` CI workflow runs green (watches `tla/**`) - [ ] `tla-spec-ai-review` does NOT fire — spec-only PR - [ ] Reviewer cross-checks MVCC-1 / MVCC-4 invariants against §5.3 of the design doc - [ ] Reviewer runs `make tla-check` and confirms 6 outcomes (HLC / OCC / MVCC safe pass + gap fail) ## Out of scope - MVCC-2 (HLC ceiling discipline) — checked in M5. - MVCC-3 (multi-node read consistency) — checked in M5. - M4 (Routes) and M5 (Composed) — separate future PRs. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added a formal MVCC model capturing per-key version chains, compaction behavior, and snapshot safety checks. * **Documentation** * Marked milestone M3 (MVCC) as landed and expanded docs describing model scope, invariants, modes, and expected gap-counterexample behavior. * **Tests** * Extended verification harness to run MVCC in both safety and gap modes, including an expected counterexample check for the MVCC-4 invariant. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/861?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…#865) ## Summary M5 of the [TLA+ safety spec](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_05_28_partial_tla_safety_spec.md) — the cross-module safety properties, capping the spec-only roadmap (M1..M5). M1..M4 each modelled a single subsystem in isolation. M5 captures the **seams** between them — the two safety properties that no single-module spec can express. ## What lands - `tla/composed/Composed.tla` — recreates the minimal cross-module state needed for Composed-1 (catalog history `routes[v]`, txn observed catalog version `txnObservedVer[t]`, committing group `txnCommitGroup[t]`) and Composed-3 (monotonic `tsCounter` advancing on every Commit). **Does NOT `INSTANCE` M1..M4** — that would explode the state space well past M5's `<10 min at default bounds` target from §8.1. - `tla/composed/MCComposed.tla` — TLC model with key + group + txn symmetry. - `tla/composed/MCComposed.cfg` / `MCComposed_gap.cfg` — safe (PASS) + gap (FAIL on Composed-1). - `scripts/tla-check.sh` — `TLA_MODULES` gains `"composed"`; `gap_invariant_for` + `mc_basename` gain the corresponding cases. - `tla/README.md` — M5 status `Not started` → `Landed`; Composed module description + invariant table. ## Invariants | # | Statement | Form | |---|---|---| | Composed-1 | Every committed write key was owned by the committing group at the txn's observed catalog version | INVARIANT | | Composed-2 | Vacuously TRUE in this abstraction — `SplitRange` is same-group only per CLAUDE.md | INVARIANT | | Composed-3 | Distinct committed txns have distinct `commit_ts` | INVARIANT | | `Composed_CatalogMonotonic` | `catalogVersion` weakly increases on every step | PROPERTY | | `Composed_TsMonotonic` | `tsCounter` weakly increases on every step | PROPERTY | | `Composed3_TsAction` | Every Commit strictly raises `tsCounter` | PROPERTY | ## Why no `INSTANCE`? Each M1..M4 module tightly bounds its own state for tractable TLC. Bringing all four `INSTANCE`s into one product spec would multiply the state spaces — even a 100x growth lands well past M5's `<10 min at default bounds` target. M5 instead recreates the minimal cross-module state needed to express the two seam invariants. The integration claim is that this projection preserves the invariants the full product would assert; reviewers can sanity-check each Composed action against the corresponding M1..M4 action (e.g., `ProposeRouteChange` mirrors `Routes.tla`, `BeginTxn` + `Commit` mirror `OCC.tla` with a cross-module observed-catalog-version pin). ## Verification (local) ```text $ make tla-check HLC safe 3,594 distinct states + HLC-4 gap ce OCC safe 150 distinct states + OCC-1 gap ce MVCC safe 79 distinct states + MVCC-4 gap ce Routes safe 29 distinct states + Routes-4 gap ce Composed safe 1,684 distinct states + Composed-1 gap ce tla-check: all model-check outcomes match the design contract. ``` The Composed-1 counterexample is the canonical 4-step regression: 1. `BeginTxn(t)` — `txn` observes `catalogVersion = 0`, where `routes[0][k1] = g1`. 2. `WriteIntent(t, k1)`. 3. `ProposeRouteChange(k1, g2)` — `catalogVersion → 1`, `routes[1][k1] = g2`. 4. `Commit(t, g2)` (gap mode) — committing group is `g2`, but `routes[0][k1] = g1 ≠ g2`. **Composed-1 fails.** ## Self-review (5-lens per CLAUDE.md) 1. **Data loss** — out of scope (cross-module invariants check group ownership, not durability). 2. **Concurrency** — Composed-1 is exactly the concurrency seam between OCC and Routes; the gap counterexample is one such interleaving. 3. **Performance** — `make tla-check` end-to-end runs in ~10 s on a dev laptop, well under the 10-min target. 4. **Data consistency** — Composed-1 (cross-module write/route consistency) + Composed-3 (cross-txn ts uniqueness) + 3 PROPERTY transitions. 5. **Test coverage** — no Go tests added (spec-only). `make tla-check` is the coverage layer. ## Roadmap status (post-merge) | Milestone | Status | |---|---| | M1 HLC | Landed (PR #856) | | M2 OCC | Landed (PR #858) | | M3 MVCC | Landed (PR #861) | | M4 Routes | Landed (PR #861 via stacked #862) | | **M5 Composed** | **This PR** | | M6 liveness (OPTIONAL) | Not started — can follow as a separate PR | | M1 Go follow-up (ceiling fence) | Not started — separate Go PR with caller audit | ## Test plan - [ ] `tla-check` CI runs green (watches `tla/**`) - [ ] `tla-spec-ai-review` does NOT fire — spec-only PR - [ ] Reviewer cross-checks Composed-1 / Composed-3 against §5.5 of the design doc - [ ] Reviewer runs `make tla-check` and confirms 10 outcomes (5 safe pass + 5 gap fail) - [ ] Reviewer sanity-checks the `INSTANCE`-free projection claim — each Composed action mirrors the corresponding M1..M4 action ## Out of scope - M6 liveness (OCC-L1, Routes-L1) — separate PR if needed - HLC-4 (iii) ceiling fence Go implementation — separate code PR - Cross-group `SplitRange` in the real implementation (currently same-group only)
Summary
M2 of the TLA+ safety spec — the OCC layer encoded as a Percolator-style 2PC model.
Follows PR #856 (M1 HLC spec).
What lands
tla/occ/OCC.tla— transaction lifecycleIdle → Active → Prepared → Committed/Aborted, lock map(key, lock_ts) → start_ts,LockResolveaction that drains abandoned locks.tla/occ/MCOCC.tla— TLC model module with txn + key symmetry.tla/occ/MCOCC.cfg— safe config (EnableSafety = TRUE; expected PASS).tla/occ/MCOCC_gap.cfg— Commit's OCC-1 guard removed (expected FAIL onOCC1_CommitTsAboveStart).scripts/tla-check.sh— module-aware orchestrator. Adding M3/M4/M5 needs only a new entry inTLA_MODULES+ a one-linecasefor the gap-invariant string.Makefile tla-checkdelegates to the script..gitignoreTLA artefact glob widened fromMCHLC_TTrace_*toMC*_TTrace_*(M2 + M3..M5 covered).tla/README.md: M2 statusNot started→Landed; OCC module description block + invariant table.Invariants encoded
commit_ts(T) > start_ts(T)for every committed txncommit_tsand one started after the other committedstart_ts(T) == read_ts(T)for every operationstart_ts[t]is assigned once at BeginTxn, never updatedcommit_ts[t]is assigned once at Commit, never updatedOCC-L1 (liveness) is deferred to M6 per §8.1.
Verification (local)
Modelling notes
tsCounter, not the full 48/16 HLC. M5 (composed) will INSTANCEHLC.tlafor the real layout. This keeps the M2 state space manageable.Commitborrows ts.commit_ts = tsCounter(does NOT advance). OnlyTickadvances ts, so the bounded model stays inside0..MaxTs. OCC-1'scommit_ts > start_tsthen requires a Tick between BeginTxn and Commit.start_ts = Tsees every version withcommit_ts ≤ T, sostart_ts = commit_ts(earlier)is a valid "later runs in the world after earlier committed" boundary (<=, not<).ENABLEDguard onReadKey— TLC simply does not explore reads in a state where a conflicting lock exists, mirroring the spec's "OCC-3 is asserted only in states where no uncommitted lock at lock_ts ≤ read_ts exists".Self-review (5-lens per CLAUDE.md)
make tla-checkend-to-end runs in well under 5 seconds on a developer laptop at default bounds.make tla-checkis the new coverage layer.Test plan
tla-checkCI workflow runs green (it watchestla/**)tla-spec-ai-reviewdoes NOT fire — this PR touches no Go anchor (correct: spec-only changes don't need an AI impl-divergence review)make tla-checkand confirms 4 outcomes (HLC safe pass, HLC gap fail, OCC safe pass, OCC gap fail)Out of scope
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores