tla: M2 — OCC.tla (OCC-1..OCC-5 + 2 action properties) + module-aware tla-check by bootjp · Pull Request #858 · bootjp/elastickv

bootjp · 2026-05-28T19:13:03Z

Summary

M2 of the TLA+ safety spec — the OCC layer encoded as a Percolator-style 2PC model.

Follows PR #856 (M1 HLC spec).

What lands

tla/occ/OCC.tla — transaction lifecycle Idle → Active → Prepared → Committed/Aborted, lock map (key, lock_ts) → start_ts, LockResolve action that drains abandoned locks.
tla/occ/MCOCC.tla — TLC model module with txn + key symmetry.
tla/occ/MCOCC.cfg — safe config (EnableSafety = TRUE; expected PASS).
tla/occ/MCOCC_gap.cfg — Commit's OCC-1 guard removed (expected FAIL on OCC1_CommitTsAboveStart).
scripts/tla-check.sh — module-aware orchestrator. Adding M3/M4/M5 needs only a new entry in TLA_MODULES + a one-line case for the gap-invariant string.
Makefile tla-check delegates to the script.
.gitignore TLA artefact glob widened from MCHLC_TTrace_* to MC*_TTrace_* (M2 + M3..M5 covered).
tla/README.md: M2 status Not started → Landed; OCC module description block + invariant table.

Invariants encoded

#	Statement	Form
OCC-1	`commit_ts(T) > start_ts(T)` for every committed txn	INVARIANT
OCC-2	Two committed txns sharing a write key have distinct `commit_ts` and one started after the other committed	INVARIANT
OCC-3	Snapshot-read stability under the lock encoding (reads blocked by conflicting uncommitted locks; LockResolve drains them)	INVARIANT
OCC-4	No stranded lock at quiescence (bounded-safety form)	INVARIANT
OCC-5	`start_ts(T) == read_ts(T)` for every operation	INVARIANT
OCC5_Action	`start_ts[t]` is assigned once at BeginTxn, never updated	PROPERTY
CommitTsAssignedOnce	`commit_ts[t]` is assigned once at Commit, never updated	PROPERTY

OCC-L1 (liveness) is deferred to M6 per §8.1.

Verification (local)

$ make tla-check
... TLC: tla/hlc/MCHLC.cfg  (correct design, expected PASS)
... 3594 distinct states found, 0 left on queue
... TLC: tla/hlc/MCHLC_gap.cfg  (...expected FAIL on HLC4...)
OK: MCHLC_gap.cfg failed as designed
... TLC: tla/occ/MCOCC.cfg  (correct design, expected PASS)
... 4217 distinct states found, 0 left on queue
... TLC: tla/occ/MCOCC_gap.cfg  (...expected FAIL on OCC1...)
OK: MCOCC_gap.cfg failed as designed (Invariant OCC1_CommitTsAboveStart is violated)
tla-check: all model-check outcomes match the design contract.

Modelling notes

HLC abstraction. OCC.tla uses a single global monotonic tsCounter, not the full 48/16 HLC. M5 (composed) will INSTANCE HLC.tla for the real layout. This keeps the M2 state space manageable.
Commit borrows ts. commit_ts = tsCounter (does NOT advance). Only Tick advances ts, so the bounded model stays inside 0..MaxTs. OCC-1's commit_ts > start_ts then requires a Tick between BeginTxn and Commit.
OCC-2 non-strict serialization. A txn with start_ts = T sees every version with commit_ts ≤ T, so start_ts = commit_ts(earlier) is a valid "later runs in the world after earlier committed" boundary (<=, not <).
OCC-3 lock encoding is enforced as an ENABLED guard on ReadKey — TLC simply does not explore reads in a state where a conflicting lock exists, mirroring the spec's "OCC-3 is asserted only in states where no uncommitted lock at lock_ts ≤ read_ts exists".

Self-review (5-lens per CLAUDE.md)

Data loss — n/a, spec-only PR, no persistence paths.
Concurrency / distributed failures — the spec is exactly modelling these. Safe config (4,217 states) explored without invariant violation; gap config produces the documented counterexample.
Performance — make tla-check end-to-end runs in well under 5 seconds on a developer laptop at default bounds.
Data consistency — OCC-1..OCC-5 + the two PROPERTY transitions are the M2 contract from the design doc. The non-strict OCC-2 form is justified in the spec comment.
Test coverage — no Go tests added (spec-only). make tla-check is the new coverage layer.

Test plan

tla-check CI workflow runs green (it watches tla/**)
tla-spec-ai-review does NOT fire — this PR touches no Go anchor (correct: spec-only changes don't need an AI impl-divergence review)
Reviewer cross-checks OCC-1..OCC-5 invariants against §5.2 of the design doc
Reviewer runs make tla-check and confirms 4 outcomes (HLC safe pass, HLC gap fail, OCC safe pass, OCC gap fail)

Out of scope

OCC-L1 (liveness) — deferred to M6.
M3 (MVCC), M4 (Routes), M5 (Composed) — separate future PRs.
Integrating OCC with the real HLC.tla — happens in M5.

Summary by CodeRabbit

Release Notes

New Features
- Added OCC (Optimistic Concurrency Control) specification module with two-phase commit lifecycle, five core safety invariants, and automated model-checking verification suite
Documentation
- Updated project README with detailed OCC module documentation and marked the OCC milestone as completed
Chores
- Refactored verification build infrastructure with improved scripts and updated trace-file ignore patterns

Per §8.1 of docs/design/2026_05_28_partial_tla_safety_spec.md (M2 deliverable): the OCC layer encoded as a Percolator-style 2PC model. New module: - tla/occ/OCC.tla — transaction lifecycle Idle → Active → Prepared → Committed/Aborted, lock map (key, lock_ts) → start_ts, LockResolve action that drains abandoned locks (motivates OCC-4 / OCC-L1). - tla/occ/MCOCC.tla — TLC model module with txn + key symmetry hints. - tla/occ/MCOCC.cfg — safe config (EnableSafety = TRUE; expected PASS). - tla/occ/MCOCC_gap.cfg — Commit's OCC-1 guard removed (expected FAIL on OCC1_CommitTsAboveStart with a counterexample motivating the spec, per the same gap-evidence pattern used for HLC in M1). Invariants encoded: - OCC-1 commit_ts > start_ts for every committed txn. - OCC-2 no write-write conflict on intersecting write sets (distinct commit_ts and one txn started after the other committed). - OCC-3 snapshot-read stability under the lock encoding (reads are blocked by uncommitted locks at lock_ts ≤ start_ts; LockResolve drains them). - OCC-4 no stranded lock at quiescence (bounded-safety form). - OCC-5 start_ts == read_ts for every operation (structural). - OCC5_Action / CommitTsAssignedOnce (PROPERTIES) — start_ts and commit_ts are each assigned once and never updated. Tooling refactor: - scripts/tla-check.sh — module-aware orchestrator. Adding M3..M5 requires only an entry in TLA_MODULES + a `case` line for the gap- invariant string; Makefile target shrinks to one `bash` invocation. - Makefile tla-check now delegates to that script. - .gitignore: TLA artefact glob widened from MCHLC_TTrace_* to MC*_TTrace_* so M2/M3/M4/M5 are covered without further edits. - tla/README.md: M2 status moved from "Not started" to "Landed"; added the OCC module description block + invariant table mirroring the HLC entry. Verified locally with `make tla-check`: - HLC safe: 3,594 distinct states, no errors. - HLC gap: counterexample on HLC4_NoRegressionAcrossTerms at depth 5. - OCC safe: 4,217 distinct states, no errors. - OCC gap: counterexample on OCC1_CommitTsAboveStart at depth ≈ 5.

chatgpt-codex-connector · 2026-05-28T19:13:08Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

coderabbitai · 2026-05-28T19:13:19Z

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 28 minutes and 2 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf59d7e8-ea5e-4ce4-937f-f238602f7309

📥 Commits

Reviewing files that changed from the base of the PR and between 4b8d1b9 and b602383.

📒 Files selected for processing (5)

scripts/tla-check.sh
tla/README.md
tla/occ/MCOCC.cfg
tla/occ/MCOCC_gap.cfg
tla/occ/OCC.tla

📝 Walkthrough

Walkthrough

This PR adds an OCC (Optimistic Concurrency Control) specification in TLA+ with five safety invariants modeling 2PC commit flow, generalizes the TLC test infrastructure from hardcoded Makefile rules to a modular Bash script supporting multiple modules, and updates documentation to mark the OCC milestone as complete.

Changes

OCC specification and generalized model-check infrastructure

Layer / File(s)	Summary
Generalized TLA+ test infrastructure `.gitignore`, `Makefile`, `scripts/tla-check.sh`	TLC orchestration delegates to a new Bash harness (`scripts/tla-check.sh`) that runs safe and gap configs for all modules in `TLA_MODULES` array, enforces exit-code expectations, and validates gap-mode output for module-specific invariant strings. `.gitignore` patterns generalize from `MCHLC_TTrace_` to `MC_TTrace_*` to support all modules.
OCC specification with state, actions, and invariants `tla/occ/OCC.tla`	Defines 2PC optimistic concurrency control layer with transaction lifecycle (Idle→Active→Prepared/Committed/Aborted), per-key versioning, lock maps, and snapshot-read conflict detection. Five OCC safety invariants: commit_ts > start_ts (gated by `EnableSafety`), write-write conflict avoidance, snapshot-read consistency, no stranded locks, and start_ts/read consistency. Actions progress time, begin txns, gate reads on conflicts, buffer writes, prepare/commit with optional strict-greater guard, abort, and resolve orphan locks.
OCC model-check instance and configurations `tla/occ/MCOCC.tla`, `tla/occ/MCOCC.cfg`, `tla/occ/MCOCC_gap.cfg`	`MCOCC.tla` extends OCC with TLC symmetry permutations for transaction IDs and keys. `MCOCC.cfg` (safe mode) verifies all OCC invariants and `OCC5_Action` property. `MCOCC_gap.cfg` disables `EnableSafety` to intentionally violate OCC-1 and validate the harness detects the correct counterexample. Both disable deadlock checking and share constant bounds.
Status and specification documentation `tla/README.md`	Updates project status table marking OCC (M2) as "Landed". Adds "What each module proves" section documenting OCC layer model (2PC lifecycle, five invariants, `EnableSafety` gating), MCOCC model-check setup, and expected gap-mode OCC-1 counterexample.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through time and key,
With stamps and locks and guarantees,
Five invariants, strong and true—
OCC proves what systems do! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly summarizes the main changes: OCC.tla with five invariants and action properties, plus module-aware tla-check tooling. It matches the core deliverables of M2.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/tla-m2-occ-spec

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces the TLA+ specification for the Optimistic Concurrency Control (OCC) layer (Milestone M2), including the core model OCC.tla, model-checking configurations, and a new orchestrator script scripts/tla-check.sh to run TLC checks. Feedback on these changes highlights several critical issues: the LockResolve action in OCC.tla is currently unreachable (dead code) because locks are cleared atomically during commit and abort; a bash scripting bug in tla-check.sh could cause gap checks to silently succeed if a command substitution fails; the OCC3_ReadSnapshotStability invariant is defined too weakly and does not assert the intended equality consistency; and the TypeOK invariant is missing type checks for versions and lockMap.lockTs.

gemini-code-assist · 2026-05-28T19:14:44Z

+LockResolve(k) ==
+    /\ HasLock(k)
+    /\ LET t == lockMap[k].owner IN
+        /\ txnState[t] \in {"Committed", "Aborted"}
+        /\ IF txnState[t] = "Committed"
+           THEN versions' = [versions EXCEPT ![k] = @ \cup
+                              {[commitTs |-> commitTs[t], owner |-> t]}]
+           ELSE versions' = versions
+        /\ lockMap' = [lockMap EXCEPT ![k] = NoLock]
+    /\ UNCHANGED <<tsCounter, txnState, startTs, commitTs, writeSet,
+                   readObs, opCount>>


In the current TLA+ specification, the LockResolve action is completely dead (it can never be enabled in any reachable state).

Why this happens:

Commit(t) (lines 178-179) atomically clears all locks owned by t by setting them to NoLock at the exact same step it transitions txnState[t] to "Committed".

Abort(t) (lines 188-189) also atomically clears all locks owned by t when transitioning txnState[t] to "Aborted".

Consequently, there is no reachable state where txnState[t] \in {"Committed", "Aborted"} and lockMap[k].owner = t for any key k.

Therefore, the guard txnState[t] \in {"Committed", "Aborted"} in LockResolve(k) (line 202) can never be satisfied, making the entire action unreachable.

Impact:

This means the model does not actually exercise or verify the asynchronous lock resolution protocol (where locks are left behind and resolved by concurrent readers or a background resolver). Instead, locks are always cleaned up atomically during the commit/abort step. This also makes the OCC4_NoStrandedLockAtQuiescence invariant trivially satisfied by the atomic cleanup rather than by the lock resolution logic.

Recommendation:

To properly model Percolator-style 2PC where locks can be orphaned and resolved asynchronously:

Define a "primary key" for each transaction (or designate one key in the write set as the primary).

In Commit(t), only transition txnState[t] to "Committed" and write the version/clear the lock for the primary key.

Leave the locks on the secondary keys intact.

Allow secondary locks to be resolved asynchronously either by a separate commit-secondary action or by the LockResolve action.

Alternatively, if this simplified atomic model is intentional for M2 to keep the state space small, this limitation should be explicitly documented in the comments, and LockResolve should either be removed or marked as a placeholder for M5/M6.

gemini-code-assist · 2026-05-28T19:14:44Z

+    mc="$(mc_basename "$module")"
+    safe_cfg="${mc}.cfg"
+    gap_cfg="${mc}_gap.cfg"
+    gap_inv="$(gap_invariant_for "$module")"


Because set -e is not set in this script, a failure inside the command substitution gap_inv="$(gap_invariant_for "$module")" will not cause the script to terminate.

If gap_invariant_for hits the wildcard * case and calls exit 64, only the subshell exits. The main script will continue executing with gap_inv set to an empty string. Later, grep -qF "$gap_inv" (line 99) will run with an empty string, which always matches, causing the gap check to silently succeed.

Recommendation:

Check the exit status of the command substitution to ensure errors are caught immediately.

Suggested change

gap_inv="$(gap_invariant_for "$module")"

if ! gap_inv="$(gap_invariant_for "$module")"; then

overall_rc=1

continue

fi

gemini-code-assist · 2026-05-28T19:14:44Z

+OCC3_ReadSnapshotStability ==
+    \A t \in TxnIds, k \in Keys :
+        (txnState[t] \in {"Active", "Prepared", "Committed"} /\ readObs[t][k] # 0) =>
+            readObs[t][k] <= startTs[t]


The invariant OCC3_ReadSnapshotStability is defined as readObs[t][k] <= startTs[t], which is virtually identical to (but weaker than) OCC5_StartTsConsistency (lines 294-296).

According to the comment on lines 266-275, this invariant is intended to assert that the recorded observation is consistent with the current visible version:

"We assert the simpler equality form: the recorded observation is consistent with the current visible version..."

However, the actual formula written does not assert equality or consistency with LatestVisible. It only asserts that the observed version's timestamp is bounded by startTs[t].

Recommendation:

To actually assert snapshot stability (i.e., that the read value remains consistent with the latest visible version at the transaction's snapshot timestamp), the invariant should be defined using equality.

OCC3_ReadSnapshotStability == \A t \in TxnIds, k \in Keys : (txnState[t] \in {"Active", "Prepared", "Committed"} /\ readObs[t][k] # 0) => readObs[t][k] = LatestVisible(k, startTs[t])

gemini-code-assist · 2026-05-28T19:14:44Z

+TypeOK ==
+    /\ tsCounter \in 0..MaxTs
+    /\ txnState  \in [TxnIds -> States]
+    /\ startTs   \in [TxnIds -> 0..MaxTs]
+    /\ commitTs  \in [TxnIds -> 0..MaxTs]
+    /\ writeSet  \in [TxnIds -> SUBSET Keys]
+    /\ readObs   \in [TxnIds -> [Keys -> 0..MaxTs]]
+    /\ opCount   \in 0..MaxOps
+    /\ \A k \in Keys : lockMap[k].owner \in TxnIds \cup {"none"}


The TypeOK invariant is missing type checks for the versions variable and the lockTs field of lockMap.

To ensure complete type safety and prevent any unexpected state-space issues during model checking, TypeOK should be updated to cover all variables.

TypeOK == /\ tsCounter \in 0..MaxTs /\ txnState \in [TxnIds -> States] /\ startTs \in [TxnIds -> 0..MaxTs] /\ commitTs \in [TxnIds -> 0..MaxTs] /\ writeSet \in [TxnIds -> SUBSET Keys] /\ readObs \in [TxnIds -> [Keys -> 0..MaxTs]] /\ opCount \in 0..MaxOps /\ \A k \in Keys : /\ lockMap[k].owner \in TxnIds \cup {"none"} /\ lockMap[k].lockTs \in 0..MaxTs /\ versions \in [Keys -> SUBSET [commitTs: 0..MaxTs, owner: TxnIds]]

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tla/README.md (1)

49-60: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update run-behavior wording to match module-aware tla-check.

This section still describes a single module pair (“both configurations” for HLC), but make tla-check now runs safe+gap configs per registered module (currently HLC and OCC). Please update wording so operators know to expect multiple module runs.

Suggested doc patch

-This runs TLC against both configurations and prints whether the outcome
-matches the design contract:
+This runs TLC across each registered module, executing both the safe and
+gap configurations per module, and prints whether outcomes match the
+design contract:
 
-1. `tla/hlc/MCHLC.cfg` — the **correct design**, with HLC-4
+1. `tla/hlc/MCHLC.cfg` — the **correct design**, with HLC-4
    preconditions encoded as ASSUMEs / action guards. TLC must finish
    with no invariant violation.
-2. `tla/hlc/MCHLC_gap.cfg` — the **gap configuration**, with the
+2. `tla/hlc/MCHLC_gap.cfg` — the **gap configuration**, with the
    preconditions disabled (`EnableSafety = FALSE`). TLC must produce a
    counterexample on `HLC4_NoRegressionAcrossTerms` — this is the
    motivating evidence that strategy (c) handoff + the ceiling fence
    are necessary. The `tla-check` target inverts the exit code for
    this run so a TLC FAILURE here counts as PASS for CI.
+3. `tla/occ/MCOCC.cfg` / `tla/occ/MCOCC_gap.cfg` follow the same
+   pattern for OCC (`OCC1_CommitTsAboveStart` in gap mode).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tla/README.md` around lines 49 - 60, Update the README text describing the
tla-check behavior to indicate that make tla-check runs the safe and gap
configurations for each registered module (not just the HLC pair), e.g. it will
run MCHLC.cfg and MCHLC_gap.cfg for HLC and the corresponding safe+gap configs
for OCC (and any other registered modules), and that for each module the gap run
is expected to produce a counterexample on HLC4_NoRegressionAcrossTerms (or the
module's equivalent) with tla-check inverting the exit code so a TLC failure on
the gap run counts as a CI pass; replace phrases like “both configurations” and
“the HLC” with wording that operators should expect multiple module runs
(safe+gap per registered module).

🧹 Nitpick comments (1)

tla/occ/OCC.tla (1)
43-43: ⚡ Quick win

Fix contradictory tsCounter semantics in the variable comment.

Line 43 says tsCounter advances on Tick + Commit, but Commit explicitly leaves tsCounter unchanged (and Lines 157-159 document that correctly). Please align the Line 43 comment to avoid misleading spec reasoning.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tla/occ/OCC.tla` at line 43, The file comment for tsCounter is misleading:
update the variable comment for tsCounter (referenced as tsCounter, and the
actions Tick and Commit) so it matches the spec semantics documented later—i.e.,
state that tsCounter is an abstract HLC counter that advances on Tick and is
left unchanged by Commit (or simply "advances on Tick only"), ensuring
consistency with the Commit definition and lines 157–159.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tla/README.md`:
- Around line 49-60: Update the README text describing the tla-check behavior to
indicate that make tla-check runs the safe and gap configurations for each
registered module (not just the HLC pair), e.g. it will run MCHLC.cfg and
MCHLC_gap.cfg for HLC and the corresponding safe+gap configs for OCC (and any
other registered modules), and that for each module the gap run is expected to
produce a counterexample on HLC4_NoRegressionAcrossTerms (or the module's
equivalent) with tla-check inverting the exit code so a TLC failure on the gap
run counts as a CI pass; replace phrases like “both configurations” and “the
HLC” with wording that operators should expect multiple module runs (safe+gap
per registered module).

---

Nitpick comments:
In `@tla/occ/OCC.tla`:
- Line 43: The file comment for tsCounter is misleading: update the variable
comment for tsCounter (referenced as tsCounter, and the actions Tick and Commit)
so it matches the spec semantics documented later—i.e., state that tsCounter is
an abstract HLC counter that advances on Tick and is left unchanged by Commit
(or simply "advances on Tick only"), ensuring consistency with the Commit
definition and lines 157–159.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c93b99d4-2903-4597-bd9e-cf234952be18

📥 Commits

Reviewing files that changed from the base of the PR and between 0a6cafe and 4b8d1b9.

📒 Files selected for processing (8)

.gitignore
Makefile
scripts/tla-check.sh
tla/README.md
tla/occ/MCOCC.cfg
tla/occ/MCOCC.tla
tla/occ/MCOCC_gap.cfg
tla/occ/OCC.tla

…on removal Four findings from gemini on PR #858 commit 4b8d1b9. - HIGH (OCC.tla LockResolve dead-action): Commit and Abort atomically drain every lock they own, so LockResolve(k) — gated on HasLock(k) /\ txnState[owner] in {Committed, Aborted} — is unreachable in this model. Removed LockResolve entirely. Added a block-comment in its place documenting the M2 abstraction: the atomic-drain subsumes the resolver's effect on the durable state without modelling the asynchronous interleaving. M5 (composed) is the right place to model the real async resolver — cross-module interactions with OCC-3 (lock encoding for reads) and Routes (where the resolver issues secondary RPCs) belong together. - HIGH (scripts/tla-check.sh gap_invariant_for swallowed exit): Without `set -e`, a non-zero exit from gap_invariant_for inside the command substitution only terminated the subshell — the parent loop continued with gap_inv="" and the later `grep -qF "$gap_inv"` matched the empty pattern, silently passing the gap check. Added an explicit `if !` check so a future undeclared module fails the harness instead of fake-passing. - medium (OCC.tla OCC-3 too weak): the previous `readObs[t][k] <= startTs[t]` overlapped with OCC-5 and didn't actually assert snapshot stability. Strengthened to the canonical form `readObs[t][k] = LatestVisible(k, startTs[t])`. Soundness in the safe config now rests on monotonic ts allocation: BeginTxn and Commit both advance tsCounter, so every fresh startTs > all prior startTs, and every commit_ts > all prior startTs. This means no future Commit can produce a commit_ts <= startTs[t] for any t past BeginTxn, so LatestVisible(k, startTs[t]) cannot change after the read — strict equality is preserved. - medium (OCC.tla TypeOK incomplete): extended TypeOK to cover `lockMap[k].lockTs \in 0..MaxTs` and `versions \in [Keys -> SUBSET [commitTs: 0..MaxTs, owner: TxnIds]]`. Without these, a buggy action could smuggle in a malformed lock-ts or version record undetected. Structural changes to support strict OCC-3: - Removed the `Tick` action; tsCounter now advances only as part of BeginTxn and Commit, which makes the monotonic-ts allocation faithful to the real HLC behaviour. This also collapses the safe config's state space from 4,217 to 150 distinct states — the removed nondeterminism was Tick interleaving, not safety-relevant exploration. - BeginTxn now sets `tsCounter' = tsCounter + 1` and `startTs[t] = tsCounter + 1`. - Commit under EnableSafety advances tsCounter and uses the new value as commit_ts (yielding commit_ts > start_ts as a consequence of the allocation rather than an explicit guard); under the gap toggle, Commit borrows the *current* tsCounter without advancing, which is exactly == startTs[t] (no Tick to bump it), violating OCC-1 as the spec doc predicts. - MCOCC.cfg / MCOCC_gap.cfg: MaxTs raised from 4 to 6 to keep the bounded model from saturating tsCounter under auto-advance. Verified `make tla-check`: - HLC safe (3,594 states, unchanged) + HLC gap (HLC-4 ce at depth 5). - OCC safe (150 distinct states, no invariant violation). - OCC gap: OCC1_CommitTsAboveStart counterexample, matches the pinned string scripts/tla-check.sh greps for.

bootjp · 2026-05-29T07:33:43Z

Round 1 fixes in a26f650f. All four gemini findings on 4b8d1b9f:

HIGH (OCC.tla LockResolve dead-action) — Commit and Abort atomically drain locks, so LockResolve(k) (gated on HasLock(k) /\ txnState[owner] ∈ {Committed, Aborted}) was unreachable. Removed LockResolve and added a block comment explaining the M2 abstraction: atomic-drain subsumes the resolver's effect on durable state. M5 (composed) is the right milestone for the real async resolver — cross-module interactions with OCC-3 (lock encoding) and Routes (resolver's secondary RPCs) belong together.
HIGH (scripts/tla-check.sh swallowed exit) — without set -e, a non-zero exit from gap_invariant_for inside the command substitution only terminated the subshell; the parent loop continued with gap_inv="" and the later grep -qF "" always matched, silently passing the gap check. Added an explicit if ! gap_inv="$(...)" check.
medium (OCC.tla OCC-3 too weak) — strengthened to canonical strict equality readObs[t][k] = LatestVisible(k, startTs[t]). To make this hold, restructured BeginTxn and Commit to advance tsCounter (monotonic ts allocation), so no future Commit can produce commit_ts ≤ startTs[t] for any t past BeginTxn. The Tick action was removed (tsCounter now advances only as part of BeginTxn / Commit). Safe config state space collapses from 4,217 to 150 distinct states because the removed nondeterminism was Tick interleaving, not safety-relevant exploration.
medium (OCC.tla TypeOK incomplete) — extended TypeOK to cover lockMap[k].lockTs ∈ 0..MaxTs and versions ∈ [Keys → SUBSET [commitTs: 0..MaxTs, owner: TxnIds]].

Bumped MaxTs from 4 to 6 in both MCOCC.cfg and MCOCC_gap.cfg to keep the bounded model from saturating tsCounter under auto-advance.

Verified make tla-check:

HLC safe (3,594 states, unchanged) + HLC gap (HLC-4 ce at depth 5)
OCC safe (150 distinct states, no violation)
OCC gap: OCC1_CommitTsAboveStart counterexample — matches the pinned string that scripts/tla-check.sh greps for

CI Analyze (go) FAILURE on the prior commit was an infrastructure flake (codeql-action@v4 download failed mid-checkout) — unrelated to spec content.

Spec-only PR, no Go code touched, no caller audit required.

@claude review

claude · 2026-05-29T07:33:56Z

Claude finished @bootjp's task in 6m 6s —— View job

Code Review — Round 2 (after `a26f650f`)

Gather context and understand request
Read tla/occ/OCC.tla
Read scripts/tla-check.sh
Read tla/occ/MCOCC.tla, MCOCC.cfg, MCOCC_gap.cfg
Read Makefile, .gitignore, tla/README.md
Analyze and write review

The round 1 fixes (commit a26f650f) correctly addressed all four Gemini findings. The spec's core correctness is sound. The following observations remain for this review pass.

Stale documentation (fixes recommended before merge)

1. OCC.tla:43 — tsCounter comment still mentions removed Tick action

tsCounter,        \* Nat. Abstract HLC counter; advances on Tick + Commit.

Tick was removed in round 1; the counter now advances in BeginTxn and Commit. Should read:

tsCounter,        \* Nat. Abstract HLC counter; advances on BeginTxn + Commit.

Fix this →

2. OCC.tla:6-9 — module header still references the removed LockResolve action

(* Models the transaction lifecycle Idle -> Active -> Prepared ->          *)
(* Committed/Aborted, the lock-map (key, lock_ts) -> start_ts, and the     *)
(* LockResolve action that turns abandoned locks into versions or         *)
(* clears them.  ...

LockResolve was removed in round 1. The phrase "and the LockResolve action that turns abandoned locks into versions or clears them" should be dropped from the header. The per-action block comment at line 199 already explains the removal correctly.

3. OCC.tla:309 — OCC-4 comment body is stale

\* OCC-4 — no stranded lock at quiescence ... LockResolve drains them.

Last sentence is stale (LockResolve removed). The actual mechanism is now: Commit/Abort atomically drain locks, so at quiescence all locks are already gone.

4. tla/README.md:127-128 — OCC module description references LockResolve

Lines 127-128:

the LockResolve action that turns abandoned locks into versions or clears them

Same stale reference. The block comment in OCC.tla at line 199-221 has the correct explanation; the README should be updated to match.

5. tla/README.md:140 — OCC-3 invariant table describes the old weak form

Current table entry:

| `OCC3_ReadSnapshotStability` | Every read observation's `commit_ts <= start_ts` of the reader (lock-encoded reads only) |

After round 1's fix, the invariant is strict equality: readObs[t][k] = LatestVisible(k, startTs[t]). The description should say "equals LatestVisible(k, startTs[t])" rather than "commit_ts ≤ start_ts".

6. tla/README.md:49-60 — "Run" section still describes only HLC configs

This runs TLC against both configurations…

tla/hlc/MCHLC.cfg — …

tla/hlc/MCHLC_gap.cfg — …

Since tla-check.sh now runs safe+gap for all modules, the section should enumerate HLC and OCC configs (and note that M3/M4/M5 will add more). Also the manual-run snippet below (lines 64-75) only shows the HLC module and should note that the pattern extends to OCC/other modules.

Spec correctness (no issues found)

OCC-1 under EnableSafety: BeginTxn allocates startTs[t] = tsCounter + 1 (post-increment). Commit under EnableSafety allocates commitTs[t] = tsCounter + 1 (another post-increment). So commitTs[t] > startTs[t] holds by construction — any interleaving between BeginTxn and Commit only raises tsCounter further. ✓
OCC-1 gap counterexample: With EnableSafety = FALSE, Commit borrows tsCounter without advancing. If no other BeginTxn fires between BeginTxn(t) and Commit(t), then commitTs[t] = tsCounter = startTs[t], violating OCC-1. TLC finds this short trace. ✓
OCC-3 stability soundness: The comment on lines 292-301 correctly argues that LatestVisible(k, startTs[t]) is stable after allocation because all future Commit invocations produce commitTs = tsCounter + 1 > tsCounter >= startTs[t]. The precondition readObs[t][k] # 0 conflating "not yet read" with "read but saw nothing" is acceptable for the same reason (no future version can appear with commitTs ≤ startTs[t]). ✓
OCC-2 enforcement: Prepare requires ~HasLock(k) for every key in the write set. Two transactions cannot both hold a lock on the same key simultaneously, so concurrent overlapping writes are serialized at Prepare time. OCC-2 invariant correctly reflects this. ✓
OCC-4 trivially satisfied: With atomic lock drain at Commit/Abort, quiescence (all txns terminal) always means no locks remain. The invariant is not vacuous in the spec sense — it is correctly checked — but is guaranteed by construction rather than by a resolver action. The block comment at line 199 explains this well. ✓
TypeOK completeness: Now covers lockMap[k].lockTs ∈ 0..MaxTs and versions ∈ [Keys → SUBSET [commitTs: 0..MaxTs, owner: TxnIds]]. Complete. ✓
tla-check.sh exit handling: The if ! gap_inv="$(gap_invariant_for "$module")" guard (line 73) correctly catches subshell exits. The safe-config path uses if ! run_tlc ...; the gap-config path captures both output and exit code separately. Both patterns are correct. ✓

Minor observation — symmetry group completeness

MCOCC.tla:

TxnSymmetry == Permutations(TxnIds)
KeySymmetry == Permutations(Keys)
Symmetry    == TxnSymmetry \cup KeySymmetry

Permutations({t1, t2}) \cup Permutations({k1, k2}) (after TLC's universe-extension) gives {id, txn-swap, key-swap} — missing the combined txn-swap ∘ key-swap element. The full product group of independent txn- and key-symmetries has 4 elements; the union only generates 3. TLC applies each permutation independently without closing under composition, so the combined equivalence class is never folded. This is safe (no unsound reduction) but means TLC explores a small number of redundant state pairs that differ only by simultaneous txn+key swap. For a 150-state model this is negligible; worth noting for M3/M4 if the state space grows.

The standard fix for TLC is to declare model values as a named SYMMETRY type directly in the .cfg file or to explicitly form the product set in TLA+. Neither is strictly required for M2.

Summary

Round 1 correctly addressed all Gemini HIGH and MEDIUM findings. The spec logic, gap-config counterexample evidence, and harness exit-code handling are all correct. The remaining items above (1-6) are documentation drift from the LockResolve removal and OCC-3 strengthening — I'd recommend fixing the stale comments and README entries before merge since they'll be the primary reference for M3/M4 contributors.

Six stale-documentation findings from claude bot on commit a26f650. All are doc-only drift from round 1's structural changes (Tick removal, LockResolve removal, OCC-3 strengthened to strict equality). OCC.tla: - Module header (lines 6-9) no longer describes the LockResolver action — replaced with a forward reference to the M2 atomic-drain abstraction and the M5 deferral note already living next to where LockResolve used to be. - tsCounter variable comment now says "advances on BeginTxn + Commit" instead of "Tick + Commit". - OCC4_NoStrandedLockAtQuiescence comment no longer says "LockResolve drains them"; instead points at the atomic-drain abstraction and the M5 deferral. tla/README.md: - "Run" section generalised: enumerates safe + gap per module rather than naming MCHLC alone; manual-run snippet now notes the same pattern applies to OCC and future M3/M4/M5 modules. - OCC module description (under "What each module proves") no longer mentions LockResolve; explicit forward reference to the OCC.tla block comment + M5 deferral. - OCC3_ReadSnapshotStability table entry updated to the strict equality form readObs[t][k] = LatestVisible(k, start_ts[t]); the previous commit_ts <= start_ts description matched the pre-round-1 weak form that round 1 strengthened. Verified make tla-check end-to-end still passes: - HLC safe (3,594 states) + HLC gap (HLC-4 ce). - OCC safe (150 states) + OCC gap (OCC-1 ce).

bootjp · 2026-05-29T07:46:50Z

Round 2 doc polish in 372d9ca8. claude bot's review on a26f650f confirmed the spec correctness ("Round 1 correctly addressed all Gemini findings… The spec logic, gap-config counterexample evidence, and harness exit-code handling are all correct") and flagged 6 stale documentation drift from round 1's structural changes (Tick removal, LockResolve removal, OCC-3 strict equality):

OCC.tla:6-9 module header — removed the "LockResolver action that turns abandoned locks into versions" sentence; replaced with a forward reference to the M2 atomic-drain abstraction.
OCC.tla:43 tsCounter variable comment — "advances on Tick + Commit" → "advances on BeginTxn + Commit".
OCC.tla:309 OCC4_NoStrandedLockAtQuiescence comment — removed "LockResolve drains them"; points at the atomic-drain abstraction + M5 deferral.
tla/README.md:49-60 "Run" section — generalised to enumerate safe + gap per module rather than naming MCHLC alone; manual-run snippet now notes the same pattern applies to OCC and future modules.
tla/README.md:127-128 OCC description — removed the LockResolve reference, added forward pointer to the OCC.tla block comment + M5 deferral.
tla/README.md:140 OCC-3 table entry — updated to strict equality readObs[t][k] = LatestVisible(k, start_ts(t)); the previous commit_ts <= start_ts matched the round-0 weak form.

Skipped (per claude bot's "not strictly required for M2"): the minor SYMMETRY observation that Permutations(TxnIds) \cup Permutations(Keys) doesn't fold the combined txn-swap ∘ key-swap element. The reduction is sound (no missed counterexamples); the cost is at most a few redundant state pairs in a 150-state model. Will revisit if M3/M4 state-space growth exposes the inefficiency.

Verified make tla-check: same 4 outcomes (HLC safe 3,594, HLC gap HLC-4 ce, OCC safe 150, OCC gap OCC-1 ce).

Spec-only doc fixes, no semantics changed. Pre-commit make lint (which scans the whole tree) flagged 3 issues in unrelated cmd/dynamodb-legacy-migrate/main.go left over from another local branch; those files are NOT part of this PR.

@claude review

claude · 2026-05-29T07:47:02Z

Claude finished @bootjp's task in 1m 38s —— View job

Code Review — Round 3 (after `372d9ca8`)

Gather context and understand request
Read tla/occ/OCC.tla — verify 3 doc fixes (module header, tsCounter comment, OCC-4 comment)
Read tla/README.md — verify 3 doc fixes (Run section, OCC description, OCC-3 table)
Cross-check all 6 fixes from round 2 report
Check for any remaining issues
Write final review

All six round-2 doc fixes landed correctly. Confirming each:

OCC.tla fixes

Module header (OCC.tla:6-11) — LockResolve description removed; replaced with forward reference to M2 atomic-drain abstraction and M5 deferral. ✓ Correct.
tsCounter variable comment (OCC.tla:45) — Now reads "advances on BeginTxn + Commit". ✓ Matches BeginTxn (tsCounter' = tsCounter + 1 at line 107) and Commit under EnableSafety (line 178).
OCC4_NoStrandedLockAtQuiescence comment (OCC.tla:309-315) — "LockResolve drains them" removed; now correctly points at the atomic-drain abstraction and M5 deferral. ✓

tla/README.md fixes

"Run" section (README.md:49-65) — Generalized to "safe + gap configuration pair for every module declared in scripts/tla-check.sh". Both safe and gap bullet entries now enumerate HLC and OCC examples. Manual-run snippet notes the OCC pattern at line 82-83. ✓
OCC module description (README.md:133-142) — LockResolve mention removed; replaced with forward reference to the block comment in OCC.tla and the M5 deferral. ✓
OCC-3 table entry (README.md:151) — Now reads "Every read observation equals LatestVisible(k, start_ts(t))". ✓ Matches the strict-equality invariant at OCC.tla:304-307.

One remaining minor stale reference

OCC.tla:116 — ReadKey action comment still says "LockResolve fires first":

(* outstanding, the read is blocked (modelled here as the action being    *)
(* disabled; LockResolve fires first).                                    *)

With LockResolve removed, the lock holder's Commit or Abort is what eventually unblocks the read (by draining the lock in that same atomic step). A one-word fix:

(* outstanding, the read is blocked (modelled here as the action being    *)
(* disabled until the lock holder's Commit or Abort drains the lock).     *)

This is a nit — the spec behavior is unambiguous from the actual action guard ~ConflictingLockExists(k, startTs[t]) — so it's fine to fix as a follow-on or fold into a later commit.

Spec correctness (unchanged from round 2 assessment)

All prior correctness findings carry over: OCC-1 through OCC-5 invariant logic is sound, the gap-config counterexample evidence is correct, tla-check.sh exit-code handling is robust, and TypeOK is complete. No new concerns introduced by the round 2 doc-only changes.

Overall: The PR is ready to merge. The one remaining stale phrase at OCC.tla:116 is the only loose end — low enough risk to address in a follow-on or squash before merge per your preference.

claude bot round 3 confirmed all 6 round-2 doc fixes landed and the PR is "ready to merge" with one remaining loose end at OCC.tla:116. The ReadKey action comment described the block path as "LockResolve fires first" — but LockResolve was removed in round 1 alongside Tick. With the atomic-drain abstraction, the lock holder's Commit or Abort is what eventually unblocks the read by clearing the lock in the same atomic step. Updated the comment to say so. Verified make tla-check end-to-end unchanged: - HLC safe (3,594 states) + HLC gap (HLC-4 ce). - OCC safe (150 states) + OCC gap (OCC-1 ce).

bootjp · 2026-05-29T07:54:21Z

Round 3 fix in b6023839. claude bot Round 3 confirmed all 6 round-2 fixes landed and rated the PR "ready to merge" with one remaining loose end at OCC.tla:116 — the ReadKey action comment said "LockResolve fires first" but LockResolve was removed in round 1. Updated to:

the read is blocked (modelled here as the action being disabled until the lock holder's Commit or Abort drains the lock in that same atomic step).

make tla-check end-to-end unchanged (HLC 3,594 / HLC-4 ce / OCC 150 / OCC-1 ce).

@claude review

claude · 2026-05-29T07:54:33Z

Claude finished @bootjp's task in 1m 19s —— View job

Code Review — Round 4 (after `b6023839`)

Read tla/occ/OCC.tla — verify Round 3 fix at lines 112-117 (ReadKey comment)
Final pass over all changed files (scripts/tla-check.sh, tla/README.md, MCOCC.tla)
Write review

Round 3 fix confirmed. OCC.tla:112-117 now reads:

(* ReadKey: snapshot read on key k at startTs[t].  Gated by OCC-3 lock     *)
(* encoding — if any uncommitted lock with lockTs <= startTs[t] is         *)
(* outstanding, the read is blocked (modelled here as the action being    *)
(* disabled until the lock holder's Commit or Abort drains the lock in    *)
(* that same atomic step).                                                 *)

The stale "LockResolve fires first" phrase is gone; the replacement correctly describes the M2 atomic-drain semantics. ✓

Full-history reconciliation (all four rounds)

Round	Finding	Status
R1 — Gemini HIGH	`LockResolve` dead action — removed, block comment added	✓
R1 — Gemini HIGH	`tla-check.sh` swallowed subshell exit — `if ! gap_inv="$(…)"` guard added	✓
R1 — Gemini MEDIUM	`OCC3_ReadSnapshotStability` too weak — strengthened to strict equality `LatestVisible(k, startTs[t])`	✓
R1 — Gemini MEDIUM	`TypeOK` missing `lockMap[k].lockTs` + `versions` — both added	✓
R2 — doc drift	`OCC.tla` module header still named `LockResolve` — removed, M5 forward ref added	✓
R2 — doc drift	`OCC.tla:45` `tsCounter` comment "advances on Tick + Commit" — updated to "BeginTxn + Commit"	✓
R2 — doc drift	`OCC.tla` OCC-4 comment "LockResolve drains them" — removed, points to atomic-drain abstraction	✓
R2 — doc drift	`README.md` Run section described only HLC configs — generalized to safe+gap per module	✓
R2 — doc drift	`README.md` OCC description named `LockResolve` — replaced with block-comment forward ref	✓
R2 — doc drift	`README.md` OCC-3 table entry described old weak `<=` form — updated to strict equality	✓
R3 — stale	`OCC.tla:116` ReadKey comment "LockResolve fires first" — updated to atomic-drain	✓
R2 — minor (deferred)	`MCOCC.tla` symmetry group missing combined txn+key swap element — noted, deferred to M3/M4	deferred (acceptable)

Final correctness re-check (no changes needed)

OCC-1: BeginTxn → tsCounter + 1, Commit (safe) → tsCounter + 1. Strict-greater by construction, any interleaving only raises the counter further. Gap config borrows tsCounter without advancing, reproducing commit_ts == start_ts. ✓
OCC-2: Prepare requires ~HasLock(k) for every write key; two concurrent writers cannot both acquire the same key. ✓
OCC-3 stability: Monotonic ts allocation in BeginTxn / Commit guarantees no future Commit can produce commit_ts ≤ startTs[t], so LatestVisible(k, startTs[t]) is immutable after assignment. ✓
OCC-4: Trivially satisfied by construction (atomic drain in Commit/Abort); the block comment at lines 201-224 correctly explains this and defers the async resolver to M5. ✓
OCC-5: readObs[t][k] ≤ startTs[t] holds because ReadKey assigns LatestVisible(k, startTs[t]) and that value is by definition ≤ startTs[t]. ✓
tla-check.sh exit handling: if ! gap_inv="$(gap_invariant_for "$module")" (line 73) catches subshell exits; gap path captures both output and RC separately; specific invariant-string match prevents parse errors or JVM crashes from silently passing. ✓
TypeOK: Covers tsCounter, txnState, startTs, commitTs, writeSet, readObs, opCount, lockMap[k].owner, lockMap[k].lockTs, versions. Complete. ✓

This PR is ready to merge. All findings from Rounds 1-3 are resolved. No new issues found in Round 4.

gemini MEDIUM on PR #862 line 61: without set -e, an exit 64 from mc_basename (added for the routes module) only terminates the subshell. The parent loop would continue with mc=\"\" and the downstream files .tla / .cfg would be syntactically empty file paths that TLC mishandles silently. Same subshell-exit propagation gap that the gap_invariant_for if-check (gemini HIGH on PR #858) closed for the gap-invariant case. Added explicit if-checks at both call sites: - main loop where mc / safe_cfg / gap_cfg are derived - run_tlc inner call (used for both safe and gap runs) Verified make tla-check end-to-end: HLC 3594 / OCC 150 / MVCC 79 / Routes 29 distinct states on safe; all 4 gap configs surface their pinned invariant.

## Summary M3 of the [TLA+ safety spec](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_05_28_partial_tla_safety_spec.md) — the MVCC layer encoded as a per-key version chain with a `Compact` action and a ghost variable that preserves the original `Write` history so MVCC-4 is checkable. Follows PR #858 (M2 OCC spec). ## What lands - `tla/mvcc/MVCC.tla` — `Write(k, v)` allocates a fresh `tsCounter` and appends to both `versions[k]` and the ghost `originalVersions[k]`; `Compact(newMin)` advances the retention watermark and (under `EnableSafety`) retains the largest `commit_ts` below `newMin` per key so reads at `read_ts ≥ minRetained` still see the original value. - `tla/mvcc/MCMVCC.tla` — TLC model module with key + value symmetry. - `tla/mvcc/MCMVCC.cfg` / `MCMVCC_gap.cfg` — safe (PASS) + gap (FAIL on MVCC-4). - `scripts/tla-check.sh` — `TLA_MODULES` gains `"mvcc"`; `gap_invariant_for` gains the pinned MVCC-4 string. - `tla/README.md` — M3 status `Not started` → `Landed`; MVCC module description + invariant table. ## Invariants | # | Statement | Form | |---|---|---| | MVCC-1 | No two distinct version records on the same key share a `commit_ts` | INVARIANT | | MVCC-4 | For every key `k` and every `read_ts ∈ [minRetained, MaxTs]`, `LatestBefore(versions[k], ts) = LatestBefore(originalVersions[k], ts)` | INVARIANT | | `MVCC_TsMonotonic` | `tsCounter` weakly increases across every step | PROPERTY | | `MVCC_GhostMonotonic` | `originalVersions[k]` only grows; `Compact` leaves it `UNCHANGED` | PROPERTY | ## Deferred to M5 - **MVCC-2** (no version below the HLC physical ceiling) is properly an `HLC.tla` property — the ceiling discipline lives there. M5 will check the cross-spec form once `HLC.tla` is `INSTANCE`-d into the composed model. - **MVCC-3** (cross-node read consistency) requires a multi-node model that the single-node M3 spec cannot express. ## Verification (local) ```text $ make tla-check ... TLC: tla/hlc/MCHLC.cfg (correct design, expected PASS) ... 3594 distinct states found, 0 left on queue ... TLC: tla/hlc/MCHLC_gap.cfg (...expected FAIL on HLC4...) OK: MCHLC_gap.cfg failed as designed (Invariant HLC4_NoRegressionAcrossTerms is violated) ... TLC: tla/occ/MCOCC.cfg (correct design, expected PASS) ... 150 distinct states found, 0 left on queue ... TLC: tla/occ/MCOCC_gap.cfg (...expected FAIL on OCC1...) OK: MCOCC_gap.cfg failed as designed (Invariant OCC1_CommitTsAboveStart is violated) ... TLC: tla/mvcc/MCMVCC.cfg (correct design, expected PASS) ... 79 distinct states found, 0 left on queue ... TLC: tla/mvcc/MCMVCC_gap.cfg (...expected FAIL on MVCC4...) OK: MCMVCC_gap.cfg failed as designed (Invariant MVCC4_NoLostCommitOnSnapshotInstall is violated) tla-check: all model-check outcomes match the design contract. ``` The MVCC-4 counterexample is short (depth 3): one `Write` at `commit_ts = 1`, one `Compact(newMin = 2)`, and the read at `read_ts ∈ [2, MaxTs]` now sees `NotFound` where the ghost still shows the `Write`'s value. ## Modelling notes - **`Compact` retention semantics.** Under `EnableSafety` the per-key prune keeps every version with `commit_ts ≥ newMin` AND the single largest version with `commit_ts < newMin` (so any later read in `[newMin, MaxTs]` still sees the same value the original log would have shown). Under the gap toggle, the prune drops every version with `commit_ts < newMin`, including the latest-before-newMin retention. - **`originalVersions` as a ghost.** Carrying every `Write` in a parallel set that `Compact` never touches lets MVCC-4 be a direct equality between the actual store and the canonical "what should be visible" set. It is the simplest faithful encoding of "no commit is lost" for a single-node spec. - **HLC abstraction.** `tsCounter` is a single monotonic counter; M5 will replace it with an `INSTANCE` of `HLC.tla`'s real 48/16 layout. This keeps M3's state space tight (79 distinct states under the default bounds). - **Symmetry.** Keys and values are both symmetric in MVCC.tla because they participate only as indices in the invariants — neither has an ordering. `MCMVCC.tla` declares both via `Permutations`. ## Self-review (5-lens per CLAUDE.md) 1. **Data loss** — exactly the property MVCC-4 captures. Verified by gap-config counterexample. 2. **Concurrency** — single-node M3 model; multi-node interactions deferred to M5. 3. **Performance** — `make tla-check` end-to-end runs in under 5 seconds at default bounds. 4. **Data consistency** — MVCC-1 (uniqueness) + MVCC-4 (preserved visibility) + two PROPERTY transitions. MVCC-2 / MVCC-3 deferred to M5 with rationale. 5. **Test coverage** — no Go tests added (spec-only). `make tla-check` is the new coverage layer. ## Test plan - [ ] `tla-check` CI workflow runs green (watches `tla/**`) - [ ] `tla-spec-ai-review` does NOT fire — spec-only PR - [ ] Reviewer cross-checks MVCC-1 / MVCC-4 invariants against §5.3 of the design doc - [ ] Reviewer runs `make tla-check` and confirms 6 outcomes (HLC / OCC / MVCC safe pass + gap fail) ## Out of scope - MVCC-2 (HLC ceiling discipline) — checked in M5. - MVCC-3 (multi-node read consistency) — checked in M5. - M4 (Routes) and M5 (Composed) — separate future PRs.  ## Summary by CodeRabbit * **New Features** * Added a formal MVCC model capturing per-key version chains, compaction behavior, and snapshot safety checks. * **Documentation** * Marked milestone M3 (MVCC) as landed and expanded docs describing model scope, invariants, modes, and expected gap-counterexample behavior. * **Tests** * Extended verification harness to run MVCC in both safety and gap modes, including an expected counterexample check for the MVCC-4 invariant.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/861?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

…#865) ## Summary M5 of the [TLA+ safety spec](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_05_28_partial_tla_safety_spec.md) — the cross-module safety properties, capping the spec-only roadmap (M1..M5). M1..M4 each modelled a single subsystem in isolation. M5 captures the **seams** between them — the two safety properties that no single-module spec can express. ## What lands - `tla/composed/Composed.tla` — recreates the minimal cross-module state needed for Composed-1 (catalog history `routes[v]`, txn observed catalog version `txnObservedVer[t]`, committing group `txnCommitGroup[t]`) and Composed-3 (monotonic `tsCounter` advancing on every Commit). **Does NOT `INSTANCE` M1..M4** — that would explode the state space well past M5's `<10 min at default bounds` target from §8.1. - `tla/composed/MCComposed.tla` — TLC model with key + group + txn symmetry. - `tla/composed/MCComposed.cfg` / `MCComposed_gap.cfg` — safe (PASS) + gap (FAIL on Composed-1). - `scripts/tla-check.sh` — `TLA_MODULES` gains `"composed"`; `gap_invariant_for` + `mc_basename` gain the corresponding cases. - `tla/README.md` — M5 status `Not started` → `Landed`; Composed module description + invariant table. ## Invariants | # | Statement | Form | |---|---|---| | Composed-1 | Every committed write key was owned by the committing group at the txn's observed catalog version | INVARIANT | | Composed-2 | Vacuously TRUE in this abstraction — `SplitRange` is same-group only per CLAUDE.md | INVARIANT | | Composed-3 | Distinct committed txns have distinct `commit_ts` | INVARIANT | | `Composed_CatalogMonotonic` | `catalogVersion` weakly increases on every step | PROPERTY | | `Composed_TsMonotonic` | `tsCounter` weakly increases on every step | PROPERTY | | `Composed3_TsAction` | Every Commit strictly raises `tsCounter` | PROPERTY | ## Why no `INSTANCE`? Each M1..M4 module tightly bounds its own state for tractable TLC. Bringing all four `INSTANCE`s into one product spec would multiply the state spaces — even a 100x growth lands well past M5's `<10 min at default bounds` target. M5 instead recreates the minimal cross-module state needed to express the two seam invariants. The integration claim is that this projection preserves the invariants the full product would assert; reviewers can sanity-check each Composed action against the corresponding M1..M4 action (e.g., `ProposeRouteChange` mirrors `Routes.tla`, `BeginTxn` + `Commit` mirror `OCC.tla` with a cross-module observed-catalog-version pin). ## Verification (local) ```text $ make tla-check HLC safe 3,594 distinct states + HLC-4 gap ce OCC safe 150 distinct states + OCC-1 gap ce MVCC safe 79 distinct states + MVCC-4 gap ce Routes safe 29 distinct states + Routes-4 gap ce Composed safe 1,684 distinct states + Composed-1 gap ce tla-check: all model-check outcomes match the design contract. ``` The Composed-1 counterexample is the canonical 4-step regression: 1. `BeginTxn(t)` — `txn` observes `catalogVersion = 0`, where `routes[0][k1] = g1`. 2. `WriteIntent(t, k1)`. 3. `ProposeRouteChange(k1, g2)` — `catalogVersion → 1`, `routes[1][k1] = g2`. 4. `Commit(t, g2)` (gap mode) — committing group is `g2`, but `routes[0][k1] = g1 ≠ g2`. **Composed-1 fails.** ## Self-review (5-lens per CLAUDE.md) 1. **Data loss** — out of scope (cross-module invariants check group ownership, not durability). 2. **Concurrency** — Composed-1 is exactly the concurrency seam between OCC and Routes; the gap counterexample is one such interleaving. 3. **Performance** — `make tla-check` end-to-end runs in ~10 s on a dev laptop, well under the 10-min target. 4. **Data consistency** — Composed-1 (cross-module write/route consistency) + Composed-3 (cross-txn ts uniqueness) + 3 PROPERTY transitions. 5. **Test coverage** — no Go tests added (spec-only). `make tla-check` is the coverage layer. ## Roadmap status (post-merge) | Milestone | Status | |---|---| | M1 HLC | Landed (PR #856) | | M2 OCC | Landed (PR #858) | | M3 MVCC | Landed (PR #861) | | M4 Routes | Landed (PR #861 via stacked #862) | | **M5 Composed** | **This PR** | | M6 liveness (OPTIONAL) | Not started — can follow as a separate PR | | M1 Go follow-up (ceiling fence) | Not started — separate Go PR with caller audit | ## Test plan - [ ] `tla-check` CI runs green (watches `tla/**`) - [ ] `tla-spec-ai-review` does NOT fire — spec-only PR - [ ] Reviewer cross-checks Composed-1 / Composed-3 against §5.5 of the design doc - [ ] Reviewer runs `make tla-check` and confirms 10 outcomes (5 safe pass + 5 gap fail) - [ ] Reviewer sanity-checks the `INSTANCE`-free projection claim — each Composed action mirrors the corresponding M1..M4 action ## Out of scope - M6 liveness (OCC-L1, Routes-L1) — separate PR if needed - HLC-4 (iii) ceiling fence Go implementation — separate code PR - Cross-group `SplitRange` in the real implementation (currently same-group only)

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

bootjp merged commit ebce174 into main May 29, 2026
10 checks passed

bootjp deleted the feat/tla-m2-occ-spec branch May 29, 2026 08:09

bootjp mentioned this pull request May 29, 2026

tla: M3 — MVCC.tla (MVCC-1 + MVCC-4) + mvcc wired into tla-check #861

Merged

4 tasks

claude Bot mentioned this pull request May 29, 2026

tla: M4 — Routes.tla (Routes-1 + Routes-4) + routes wired into tla-check #862

Merged

4 tasks

bootjp mentioned this pull request May 29, 2026

tla: M5 — Composed.tla (Composed-1 + Composed-3) wired into tla-check #865

Merged

5 tasks

claude Bot mentioned this pull request May 29, 2026

tla: Composed-1a current-version fence + currentfence_gap (#870 follow-up) #878

Merged

4 tasks

coderabbitai Bot mentioned this pull request May 30, 2026

tla: M6 liveness — OCC-L1 + Routes-L1 + tla-check.sh live-config wiring #880

Merged

5 tasks

-    gap_inv="$(gap_invariant_for "$module")"
+    if ! gap_inv="$(gap_invariant_for "$module")"; then
+        overall_rc=1
+        continue
+    fi

Conversation

bootjp commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What lands

Invariants encoded

Verification (local)

Modelling notes

Self-review (5-lens per CLAUDE.md)

Test plan

Out of scope

Summary by CodeRabbit

Release Notes

Uh oh!

chatgpt-codex-connector Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Why this happens:

Impact:

Recommendation:

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Recommendation:

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Recommendation:

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 29, 2026

Uh oh!

claude Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — Round 2 (after a26f650f)

Stale documentation (fixes recommended before merge)

Spec correctness (no issues found)

Minor observation — symmetry group completeness

Summary

Uh oh!

bootjp commented May 29, 2026

Uh oh!

claude Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — Round 3 (after 372d9ca8)

One remaining minor stale reference

Spec correctness (unchanged from round 2 assessment)

Uh oh!

bootjp commented May 29, 2026

Uh oh!

claude Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — Round 4 (after b6023839)

Full-history reconciliation (all four rounds)

Final correctness re-check (no changes needed)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

bootjp commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading

claude Bot commented May 29, 2026 •

edited

Loading

Code Review — Round 2 (after `a26f650f`)

claude Bot commented May 29, 2026 •

edited

Loading

Code Review — Round 3 (after `372d9ca8`)

claude Bot commented May 29, 2026 •

edited

Loading

Code Review — Round 4 (after `b6023839`)