Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 42 additions & 14 deletions scripts/tla-check.sh
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,16 @@ if [ ! -f "$TLA_JAR" ]; then
fi

# Modules to check, in dependency order (libs before consumers).
TLA_MODULES=( "hlc" "occ" )
TLA_MODULES=( "hlc" "occ" "mvcc" "routes" )

# The invariant the gap config is expected to break, per module.
# These strings are matched against TLC stdout via grep -F (literal).
gap_invariant_for() {
case "$1" in
hlc) echo "Invariant HLC4_NoRegressionAcrossTerms is violated" ;;
occ) echo "Invariant OCC1_CommitTsAboveStart is violated" ;;
hlc) echo "Invariant HLC4_NoRegressionAcrossTerms is violated" ;;
occ) echo "Invariant OCC1_CommitTsAboveStart is violated" ;;
mvcc) echo "Invariant MVCC4_NoLostCommitOnSnapshotInstall is violated" ;;
routes) echo "Invariant Routes4_NoEngineRegression is violated" ;;
*)
echo "ERROR: no gap-invariant string registered for module '$1'." \
"Add a case to scripts/tla-check.sh." >&2
Expand All @@ -42,15 +44,36 @@ gap_invariant_for() {
}

mc_basename() {
# tla/<dir>/MC<UPPER>.tla — module name in UPPERCASE
printf 'MC%s' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
# tla/<dir>/MC<MODULE>.tla — per-module spelling. Acronym modules
# (HLC, OCC, MVCC) are spelled in uppercase; word-modules (Routes)
# use TitleCase. Override per module rather than a one-size-fits-
# all `tr a-z A-Z`, which would produce ugly `MCROUTES` for the
# word case.
case "$1" in
hlc) printf 'MCHLC' ;;
occ) printf 'MCOCC' ;;
mvcc) printf 'MCMVCC' ;;
routes) printf 'MCRoutes' ;;
*)
echo "ERROR: no mc_basename mapping for module '$1'." \
"Add a case to scripts/tla-check.sh." >&2
exit 64
;;
esac
}

run_tlc() {
local module="$1"
local cfg="$2"
local mc
mc="$(mc_basename "$module")"
# Same subshell-exit propagation concern as the main loop —
# without `set -e` an `exit 64` from mc_basename would leave
# mc="" and the subsequent java invocation would try to load
# `.tla`, which TLC parses as a malformed path. Fail explicitly.
if ! mc="$(mc_basename "$module")"; then
echo "ERROR: mc_basename failed for module '${module}'." >&2
return 64
fi
( cd "${REPO_ROOT}/tla/${module}" && \
java -XX:+UseParallelGC \
-cp "${TLA_JAR}" -DTLA-Library=../lib \
Expand All @@ -60,16 +83,21 @@ run_tlc() {
overall_rc=0

for module in "${TLA_MODULES[@]}"; do
mc="$(mc_basename "$module")"
# `set -e` is not in effect (the script uses `set -uo pipefail`),
# so `exit 64` from mc_basename / gap_invariant_for in a command
# substitution only terminates the subshell — the parent loop
# would otherwise continue with an empty `mc` / `gap_inv` and the
# downstream `grep -qF` would match the empty pattern, silently
# passing the gap check. Check both explicitly (gemini HIGH on
# PR #858 for gap_invariant_for; gemini MEDIUM on PR #862 for
# mc_basename).
if ! mc="$(mc_basename "$module")"; then
echo "ERROR: mc_basename failed for module '${module}' — see error above." >&2
overall_rc=1
continue
fi
safe_cfg="${mc}.cfg"
gap_cfg="${mc}_gap.cfg"
# `set -e` is not in effect (the script uses `set -uo pipefail`).
# Without explicit checking, a non-zero exit from gap_invariant_for
# inside the command substitution would only terminate the
# subshell — the parent loop would continue with gap_inv="" and the
# later `grep -qF "$gap_inv"` would always match (empty pattern),
# silently passing the gap check. Check explicitly (gemini HIGH
# on PR #858).
if ! gap_inv="$(gap_invariant_for "$module")"; then
echo "ERROR: gap_invariant_for failed for module '${module}' — see error above." >&2
overall_rc=1
Expand Down
102 changes: 96 additions & 6 deletions tla/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@ run it without touching anything under the Go source tree.
|---|---|---|
| M1 | `lib/Raft.tla`, `lib/Env.tla`, `hlc/HLC.tla`, `make tla-check` | Landed |
| M2 | `occ/OCC.tla` (safety invariants OCC-1..OCC-5) | Landed |
| M3 | `mvcc/MVCC.tla` (MVCC-1..MVCC-4) | Not started |
| M4 | `routes/Routes.tla` (Routes-1..Routes-4) | Not started |
| M3 | `mvcc/MVCC.tla` (MVCC-1 + MVCC-4; MVCC-2 / MVCC-3 deferred to M5) | Landed |
| M4 | `routes/Routes.tla` (Routes-1 + Routes-4; Routes-2 / Routes-3 trivially-satisfied in M4 abstraction, full range modelling deferred to M5) | Landed |
| M5 | `composed/Composed.tla` + CI integration | Not started |
| M6 | Liveness checking (`tla-check-deep`) | Not started |

Expand Down Expand Up @@ -47,9 +47,9 @@ make tla-check
```

This runs TLC against the safe + gap configuration pair for every
module declared in `scripts/tla-check.sh` (currently HLC and OCC; M3 /
M4 / M5 will be added as they land), and prints whether the outcome
matches the design contract:
module declared in `scripts/tla-check.sh` (currently HLC, OCC, MVCC,
and Routes; M5 will be added as it lands), and prints whether the
outcome matches the design contract:

- **Safe config (`tla/<module>/MC<MODULE>.cfg`)** — the correct
design, with each module's preconditions or commit guards enabled.
Expand Down Expand Up @@ -80,7 +80,8 @@ java -XX:+UseParallelGC -cp ../../$JAR -DTLA-Library=../lib \
```

The OCC module follows the same pattern under `tla/occ/` with
`MCOCC.cfg` / `MCOCC_gap.cfg`; future M3 / M4 / M5 modules will too.
`MCOCC.cfg` / `MCOCC_gap.cfg`; MVCC follows under `tla/mvcc/` (see
below). Future M4 / M5 modules will use the same pattern.

State-space bounds are set in the `.cfg` files (M1 defaults: 3 nodes,
2 terms, ≤ 3 IssueTimestamp ops, wall clock ≤ 3 ms, `LogicalMax = 1`;
Expand Down Expand Up @@ -159,6 +160,95 @@ TLC model-check instance for OCC. Same one-module / two-config layout
as MCHLC. The gap config disables the OCC-1 commit guard; TLC
produces an `OCC1_CommitTsAboveStart` counterexample at depth ≈ 5.

### `mvcc/MVCC.tla`
The MVCC layer. Single-node model of a per-key version chain
(`versions[k]`) with a `Compact` action that drains older versions
to bound storage growth. Carries a ghost variable
`originalVersions[k]` that records every version ever written and
is never pruned — MVCC-4 compares the two to assert that no
committed entry has been lost across compaction. HLC is abstracted
to a single monotonic counter (`tsCounter`); M5 (composed) will
INSTANCE `HLC.tla` for the real 48/16 layout. The `EnableSafety`
CONSTANT gates the per-key "retain the latest commit_ts below the
new minRetained" rule inside `Compact` — under the gap config that
guard is removed and TLC surfaces a `MVCC-4` counterexample where a
read at `read_ts ≥ minRetained` now misses a value the original log
would have shown.

MVCC-2 (no committed version below the HLC physical ceiling) and
MVCC-3 (cross-node read consistency) are deferred to M5:

- MVCC-2 is properly an HLC.tla property — the ceiling discipline
lives there. M5 will integrate the modules and check the cross-
spec form.
- MVCC-3 requires a multi-node model that the single-node M3 spec
cannot express. M5 is the right place for it.

Invariants asserted:

| Invariant | Statement |
|---|---|
| `TypeOK` | Variable types are well-formed |
| `MVCC1_VisibleVersionUnique` | No two distinct version records in `versions[k]` share a `commit_ts` |
| `MVCC4_NoLostCommitOnSnapshotInstall` | For every key and every `read_ts ≥ minRetained`, the visible version in `versions[k]` equals the visible version in the ghost `originalVersions[k]` |
| `MVCC_TsMonotonic` (PROPERTY) | Transition form: `tsCounter` weakly increases on every step |
| `MVCC_GhostMonotonic` (PROPERTY) | Transition form: `originalVersions[k]` only grows; `Compact` leaves it `UNCHANGED` |

### `mvcc/MCMVCC.tla` + `MCMVCC.cfg` / `MCMVCC_gap.cfg`
TLC model-check instance for MVCC. Same one-module / two-config
layout as MCHLC and MCOCC. The gap config disables the retention
guard inside `Compact`; TLC produces a
`MVCC4_NoLostCommitOnSnapshotInstall` counterexample at depth ≈ 3
(one `Write`, one `Compact(newMin > commit_ts)`).

### `routes/Routes.tla`
The route catalog + CatalogWatcher layer. Models the durable catalog
as a single monotonic `catalogVersion` counter and each node's
`RouteEngine` as a per-node `engineVersion[n]` that re-syncs via the
`CatalogWatcherSync(n)` action (mirroring
`distribution/watcher.go SyncOnce`). A ghost variable
`engineMaxObserved[n]` records the highest version each node has
ever observed, so Routes-4 (watcher fan-out monotonicity) can be
stated as a state invariant
(`engineVersion[n] ≥ engineMaxObserved[n]`). The `EnableSafety`
CONSTANT gates the monotonicity guard inside `CatalogWatcherSync`
— under the gap config that guard is removed and TLC surfaces a
Routes-4 counterexample where a node fetches an older snapshot
after a newer one.

Routes-2 (coverage and disjointness of ranges) and Routes-3
(SplitRange catalog atomicity) are trivially satisfied in this M4
abstraction:

- Routes-2 — M4 does not model individual ranges explicitly. Full
range modelling (a `routes : Keys → GroupId` function whose
totality implies both properties) is deferred to M5 (composed),
where SplitRange exercising key-boundary changes is in scope.
- Routes-3 — every catalog update is a single TLA+ action
(`ProposeRouteChange`) that bumps `catalogVersion` and the
routes in one step, so atomicity is structural. Asynchronous
cross-node propagation is captured by Routes-4.

Invariants asserted:

| Invariant | Statement |
|---|---|
| `TypeOK` | Variable types are well-formed |
| `Routes1_VersionInRange` | `catalogVersion ∈ 0..MaxVersions`; combined with `Routes1_Action` this captures Routes-1 monotonicity |
| `Routes2_CoverageDisjoint` | Vacuously TRUE in M4 (range partition not explicitly modelled; see comment in `Routes.tla`) |
| `Routes3_SplitAtomicity` | Vacuously TRUE in M4 (single-step `ProposeRouteChange`; see comment in `Routes.tla`) |
| `Routes4_NoEngineRegression` | For every node `n`, `engineVersion[n] ≥ engineMaxObserved[n]` |
| `Routes1_Action` (PROPERTY) | Transition form: `catalogVersion` weakly increases on every step |
| `Routes4_GhostMonotonic` (PROPERTY) | Transition form: `engineMaxObserved[n]` weakly increases on every step |

### `routes/MCRoutes.tla` + `MCRoutes.cfg` / `MCRoutes_gap.cfg`
TLC model-check instance for Routes. Same one-module / two-config
layout as the other modules. The gap config disables the monotonicity
guard inside `CatalogWatcherSync`; TLC produces a
`Routes4_NoEngineRegression` counterexample at depth ≈ 3 (one
`ProposeRouteChange (v → 1)`, one `CatalogWatcherSync(n)` fetching
`v = 1`, one `CatalogWatcherSync(n)` fetching `v = 0` — regression).

## How to interpret a TLC failure

When TLC finds a counterexample on an invariant it prints:
Expand Down
29 changes: 29 additions & 0 deletions tla/mvcc/MCMVCC.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
\* TLC config: MVCC safety model — Compact retains the latest version
\* below the new retention watermark per key. Expected outcome: TLC
\* explores the bounded state space and finds no invariant violation.

SPECIFICATION Spec

CONSTANTS
Keys = {k1, k2}
Vals = {v1, v2}
MaxTs = 4
MaxOps = 3
EnableSafety = TRUE

SYMMETRY Symmetry

INVARIANTS
TypeOK
MVCC1_VisibleVersionUnique
MVCC4_NoLostCommitOnSnapshotInstall

PROPERTIES
MVCC_TsMonotonic
MVCC_GhostMonotonic

CONSTRAINT StateConstraint

\* Quiescence (no Write/Compact enabled) is a valid MVCC end state.
CHECK_DEADLOCK
FALSE
24 changes: 24 additions & 0 deletions tla/mvcc/MCMVCC.tla
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
-------------------------------- MODULE MCMVCC -------------------------------
(***************************************************************************)
(* Model-check instance for MVCC.tla. *)
(* *)
(* Two .cfg files point at this module: *)
(* MCMVCC.cfg — EnableSafety = TRUE (correct design; TLC passes) *)
(* MCMVCC_gap.cfg — EnableSafety = FALSE (Compact drops the latest- *)
(* before-newMin retention; TLC fails on *)
(* MVCC4_NoLostCommitOnSnapshotInstall and emits a *)
(* counterexample, motivating the retention guard) *)
(* *)
(* Run via `make tla-check` from the repository root. *)
(***************************************************************************)

EXTENDS MVCC, TLC

\* Keys are symmetric in MVCC.tla per §6.3 of the design doc — they
\* participate only as indices, not in any ordering on the invariants.
\* Vals are symmetric for the same reason.
KeySymmetry == Permutations(Keys)
ValSymmetry == Permutations(Vals)
Symmetry == KeySymmetry \cup ValSymmetry

=============================================================================
37 changes: 37 additions & 0 deletions tla/mvcc/MCMVCC_gap.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
\* TLC config: MVCC gap model — EnableSafety = FALSE. Compact's
\* per-key retention drops every version below the new minRetained
\* watermark INCLUDING the latest-before-newMin entry. TLC is expected
\* to FAIL the MVCC4_NoLostCommitOnSnapshotInstall invariant and emit a
\* counterexample (a key written below newMin, no later write, and a
\* read at read_ts in [newMin, MaxTs] that would now miss the value).
\*
\* The make tla-check harness inverts the exit code for this config AND
\* greps for the specific MVCC-4 violation string, so any other failure
\* mode (parse error, deadlock, JVM crash, different invariant) fails
\* rather than being silently treated as the expected counterexample.

SPECIFICATION Spec

CONSTANTS
Keys = {k1, k2}
Vals = {v1, v2}
MaxTs = 4
MaxOps = 3
EnableSafety = FALSE

SYMMETRY Symmetry

INVARIANTS
TypeOK
MVCC1_VisibleVersionUnique
MVCC4_NoLostCommitOnSnapshotInstall

PROPERTIES
MVCC_TsMonotonic
MVCC_GhostMonotonic

CONSTRAINT StateConstraint

\* See MCMVCC.cfg — quiescence is a valid end state, not a deadlock.
CHECK_DEADLOCK
FALSE
Loading
Loading