Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 86 additions & 0 deletions .github/workflows/tla-check.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
name: TLA+ model check

# Runs `make tla-check` on PRs (and pushes to main) that touch the
# subsystems modelled by the spec or the spec / tooling themselves.
# Per docs/design/2026_05_28_partial_tla_safety_spec.md §7.3.
#
# The job runs both MCHLC.cfg (correct design — expected PASS) and
# MCHLC_gap.cfg (preconditions disabled — expected FAIL with the HLC-4
# counterexample); the Makefile target greps for the specific HLC-4
# violation string so the gap evidence is validated, not just the exit
# code.

on:
pull_request:
types: [opened, synchronize, reopened]
paths:
- 'tla/**'
- 'Makefile'
- 'kv/hlc*.go'
- 'kv/coordinator.go'
- 'kv/sharded_coordinator.go'
- 'kv/transaction.go'
- 'kv/fsm.go'
- 'kv/lock_resolver.go'
- 'store/mvcc_store.go'
- 'distribution/**'
- '.github/workflows/tla-check.yml'
push:
branches: [main]
paths:
- 'tla/**'
- 'Makefile'
- 'kv/hlc*.go'
- 'kv/coordinator.go'
- 'kv/sharded_coordinator.go'
- 'kv/transaction.go'
- 'kv/fsm.go'
- 'kv/lock_resolver.go'
- 'store/mvcc_store.go'
- 'distribution/**'
- '.github/workflows/tla-check.yml'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-tla-check
cancel-in-progress: false

permissions:
contents: read

jobs:
tla-check:
name: tla-check
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Check out
uses: actions/checkout@v6

- name: Set up JDK 21 (Temurin)
uses: actions/setup-java@v5
with:
distribution: temurin
java-version: '21'

# Cache the checksum-pinned tla2tools.jar so we don't re-download
# GitHub release artefacts on every run. Key includes the Makefile
# hash so a version-pin change invalidates the cache automatically
# (TLA_VERSION + TLA_SHA256 both live in the Makefile).
#
# NO restore-keys: a prefix-match restore would bring back an older
# jar after a TLA_VERSION/TLA_SHA256 bump. Because the Makefile's
# $(TLA_JAR) recipe is gated on file existence, the restored stale
# jar would skip checksum verification entirely — defeating the
# whole point of the pin (codex P2 on PR #856 round 4). A full
# miss-and-redownload is correct here.
- name: Cache tla2tools.jar
uses: actions/cache@v5
with:
path: .cache/tla
key: ${{ runner.os }}-tla-tools-${{ hashFiles('Makefile') }}

- name: Pre-fetch tla2tools.jar (logs separately from tla-check)
run: make tla-tools

- name: Run make tla-check
run: make tla-check
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,11 @@ jepsen/.ssh/
.golangci-cache/
server

# TLA+ TLC artifacts (counterexample traces, state directories)
tla/**/MCHLC_TTrace_*.tla
tla/**/MCHLC_TTrace_*.bin
tla/**/states/

# Admin SPA build outputs. The placeholder internal/admin/dist/index.html
# is committed so `go build` succeeds in a fresh clone (the //go:embed
# directive needs at least one matching file). Everything else under
Expand Down
90 changes: 90 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,93 @@ lint:

gen:
@$(MAKE) -C proto gen

# === TLA+ model-check support (M1 deliverable) ===
# Per docs/design/2026_05_28_partial_tla_safety_spec.md §7.2.
# Downloads a checksum-pinned tla2tools.jar on first use, then runs TLC on
# both MCHLC.cfg (correct design — expected PASS) and MCHLC_gap.cfg
# (preconditions disabled — expected FAIL on HLC-4 with a counterexample
# that motivates the implementation gaps).
TLA_VERSION := v1.8.0
TLA_JAR := .cache/tla/tla2tools.jar
TLA_SHA256 := 237332bdcc79a35c7d26efa7b82c77c85c2744591c5598673a8a45085ff2a4fb
TLA_URL := https://github.com/tlaplus/tlaplus/releases/download/$(TLA_VERSION)/tla2tools.jar
TLA_LIB := ../lib

.PHONY: tla-check tla-tools

tla-tools: $(TLA_JAR)

$(TLA_JAR):
@mkdir -p $(dir $(TLA_JAR))
@echo "Downloading tla2tools.jar $(TLA_VERSION)..."
@curl -fsSL -o $(TLA_JAR).tmp $(TLA_URL)
@# Prefer sha256sum (GNU coreutils, universal on Linux); fall back to
@# shasum -a 256 (default on macOS). Either yields the same hex
@# digest in the first whitespace-delimited field.
@if command -v sha256sum >/dev/null 2>&1; then \
actual=$$(sha256sum $(TLA_JAR).tmp | awk '{print $$1}'); \
elif command -v shasum >/dev/null 2>&1; then \
actual=$$(shasum -a 256 $(TLA_JAR).tmp | awk '{print $$1}'); \
else \
echo "ERROR: neither sha256sum nor shasum is available."; \
rm -f $(TLA_JAR).tmp; \
exit 1; \
fi; \
if [ "$$actual" != "$(TLA_SHA256)" ]; then \
echo "ERROR: tla2tools.jar SHA-256 mismatch."; \
echo " expected: $(TLA_SHA256)"; \
echo " actual: $$actual"; \
rm -f $(TLA_JAR).tmp; \
exit 1; \
fi
@mv $(TLA_JAR).tmp $(TLA_JAR)
@echo "tla2tools.jar ready at $(TLA_JAR) (SHA-256 $(TLA_SHA256))"

tla-check: $(TLA_JAR)
@echo "================================================================"
@echo " TLC: tla/hlc/MCHLC.cfg (correct design, expected PASS)"
@echo "================================================================"
@cd tla/hlc && java -XX:+UseParallelGC \
-cp ../../$(TLA_JAR) -DTLA-Library=$(TLA_LIB) \
tlc2.TLC -nowarning -config MCHLC.cfg MCHLC.tla
@echo
@echo "================================================================"
@echo " TLC: tla/hlc/MCHLC_gap.cfg (no preconditions, expected FAIL)"
@echo "================================================================"
@# Capture the gap run output so we can validate not just that TLC
@# exited non-zero but that the failure reason is exactly the HLC-4
@# invariant violation we expect. Without this check a parse error,
@# deadlock, JVM crash, or different invariant violation would silently
@# count as "expected gap counterexample" (codex P2 on PR #856 round 2).
@cd tla/hlc && \
out=$$(java -XX:+UseParallelGC \
-cp ../../$(TLA_JAR) -DTLA-Library=$(TLA_LIB) \
tlc2.TLC -nowarning -config MCHLC_gap.cfg MCHLC.tla 2>&1) ; \
rc=$$? ; \
printf '%s\n' "$$out" ; \
if [ "$$rc" -eq 0 ]; then \
echo ; \
echo "ERROR: MCHLC_gap.cfg unexpectedly passed."; \
echo " The gap configuration disables HLC-4 preconditions (ii)+(iii);"; \
echo " TLC was supposed to surface a counterexample showing why those"; \
echo " preconditions are necessary. A clean pass means either the spec"; \
echo " no longer encodes the gap correctly or the safety guards leaked"; \
echo " past the EnableSafety toggle."; \
exit 1; \
fi ; \
if printf '%s\n' "$$out" | grep -qF "Invariant HLC4_NoRegressionAcrossTerms is violated"; then \
echo ; \
echo "OK: MCHLC_gap.cfg failed as designed (HLC-4 counterexample)."; \
else \
echo ; \
echo "ERROR: MCHLC_gap.cfg failed, but the reason is NOT a HLC-4 violation."; \
echo " Expected substring in TLC output:"; \
echo " 'Invariant HLC4_NoRegressionAcrossTerms is violated'"; \
echo " The non-zero exit may indicate a parse error, deadlock, JVM"; \
echo " crash, or a different invariant breaking — review the output"; \
echo " above before treating this as a regression in the gap evidence."; \
exit 1; \
fi
@echo
@echo "tla-check: all model-check outcomes match the design contract."
Original file line number Diff line number Diff line change
@@ -1,9 +1,17 @@
# TLA+ Safety Specification for elastickv

Status: Proposed
Status: Partial
Author: bootjp
Date: 2026-05-28

> **Implementation status.** M1 landed in PR #TBD (the PR opening this
> document's rename to `partial`). The HLC TLA+ module + `make tla-check`
> + the gap-config counterexample evidence are now part of the build.
> M2–M5 (OCC, MVCC, Routes, Composed) remain open per §8. The follow-up
> Go code changes that strategy (c) and the ceiling fence require (per
> §5.1 HLC-4 (ii) and (iii)) are tracked as separate issues against
> `kv/hlc.go`, `kv/sharded_coordinator.go`, and `kv/fsm.go`.

---

## 1. Background and Motivation
Expand Down Expand Up @@ -536,23 +544,30 @@ Both targets shell out to `tla2tools.jar` which is downloaded on first use
into `.cache/tla/` (matching the existing `.cache/` pattern from CLAUDE.md).
The downloaded jar is checksum-pinned. No system-wide install required.

### 7.3 CI integration (deferred)
### 7.3 CI integration

`tla-check` is intended to run in CI on PRs that touch:
`tla-check` runs in CI via `.github/workflows/tla-check.yml` on every PR
(and push to `main`) that touches:

- `tla/**` and `Makefile` (the spec and the harness themselves).
- `kv/hlc*.go`, `kv/coordinator.go`, `kv/sharded_coordinator.go`,
`kv/transaction.go`, `kv/fsm.go`, `kv/lock_resolver.go` — the HLC
renewal scheduler, the timestamp-issuance dispatch path, and the
OCC commit-ts assignment all live in these files (per Section 3
anchors); a change here can violate HLC/OCC invariants and must
re-run `tla-check`.
- `store/mvcc_store.go`
- `distribution/`
- the spec files themselves

The exact CI wiring (workflow file, path filter) is deferred to the M1 PR
so it can be reviewed alongside a concrete spec rather than in the
abstract.
- `distribution/**`
- `.github/workflows/tla-check.yml` (the workflow file itself).

The workflow uses Temurin JDK 21, caches `.cache/tla/tla2tools.jar`
keyed on the Makefile hash so version-pin changes invalidate the cache
automatically, and inherits the Makefile's gap-config validation: a
non-zero TLC exit on `MCHLC_gap.cfg` only counts as success if the
output contains the exact string
`Invariant HLC4_NoRegressionAcrossTerms is violated`. Any other failure
mode (parse error, deadlock, JVM crash, different invariant) fails the
job. The full M1 run completes in well under a minute.

---

Expand Down
Loading
Loading