Spec Triangulator

Multi-Tool Triangulation for Formal Specification Validation
Apart Research Hackathon — Secure Program Synthesis — May 2026 — Track 2: Specification Validation

What This Is

Formal verification has a triage problem: engineers face a bewildering array of tools (Tamarin, ProVerif, TLA+, Lean, Dafny, Kani) with no systematic guidance on which to use — and no way to tell if their specification actually captures what they intended.

Spec Triangulator addresses this with two components:

Classifier — Routes natural-language system descriptions to the right formal tool pair. 93.8% accuracy, 79.8% multi-tool recall on 48 real-world examples. Zero fine-tuning.
Triangulation Engine — Runs TLC (explicit-state model checking) and Z3 (SMT inductive invariant proof) on the same TLA+ specification and compares verdicts. Disagreement surfaces specification gaps.

The Key Insight

Two tools with different mathematical foundations checking the same property:

Verdict	Meaning
Both VERIFIED	High confidence — reachability + inductive proof agree
TLC ✅ Z3 ❌	Invariant not inductive — spec needs strengthening
TLC ❌ Z3 ✅	State space too large for TLC — inductive proof scales
Both ❌	Genuine transition bug — fix before proceeding

We validate this with 40 controlled experiments covering all four quadrants.

Results

Metric	Value
Classifier accuracy	93.8% (45/48)
Multi-tool recall	79.8%
Specs triangulated	10 real TLA+ specs
TLC states checked	117,278
Z3 inductive proofs	10/10 UNSAT in <50ms
Confusion matrix cases	40/40 in correct quadrant

Repository Structure

spec-triangulator/
├── src/
│   ├── tool_runner.py        # TLC + Z3 triangulation engine (self-contained)
│   ├── confusion_matrix.py   # 40-case confusion matrix experiments
│   ├── eval.py               # Classifier evaluation pipeline
│   └── build_dataset.py      # Dataset construction
├── data/
│   ├── dataset.json          # 48 labeled examples with spec file paths
│   └── specs_manifest.json   # Spec URL/path mapping for all 48 examples
├── results/
│   ├── triangulation_results.json    # 10-spec triangulation output
│   └── confusion_matrix_results.json # 40-case confusion matrix output
├── docs/
│   └── confusion_matrix.md   # Detailed explanation of all 4 quadrants
├── notebooks/
│   └── spec_triangulator.ipynb  # Kaggle-ready notebook
├── requirements.txt
└── README.md

Quickstart

Install

pip install z3-solver groq

TLC (tla2tools.jar) is downloaded automatically by tool_runner.py on first run.

Run Triangulation (10 real TLA+ specs)

cd src
python tool_runner.py
# Output: results/triangulation_results.json

Run Confusion Matrix (40 experiments)

cd src
python confusion_matrix.py
# Output: results/confusion_matrix_results.json

Run Classifier Evaluation

export GROQ_API_KEY=your_key_here
cd src
python eval.py

Kaggle

Upload notebooks/spec_triangulator.ipynb to Kaggle. Add your GROQ_API_KEY as a Kaggle Secret. The notebook installs all dependencies and runs the full pipeline.

How Triangulation Works

TLC and Z3 verify the same property by completely different means:

TLC performs breadth-first search over all reachable states from Init, checking TypeOK in every state:

∀ s reachable from Init: TypeOK(s)

Z3 proves the invariant is inductive — preserved by every possible transition:

Check: TypeOK(s) ∧ Next(s,s') → TypeOK(s')
i.e., ¬(TypeOK(s) ∧ Next(s,s') ∧ ¬TypeOK(s')) must be UNSAT

TLC only considers reachable states. Z3 considers all mathematically possible pre-states. This difference is what makes disagreement informative.

Example: Two-Phase Commit

TLC checks 1,146 reachable states — all satisfy TPTypeOK. ✅

Z3 checks inductiveness of TypeOK ∧ TCConsistent ∧ ProtocolInv:

Without ProtocolInv (rm_committed → tm_committed): Z3 finds spurious pre-state rm=committed, tm=init — unreachable from TPInit but not ruled out inductively. ❌ → Q2: spec needs strengthening
With full invariant: UNSAT — invariant is inductive. ✅ → Q1: high confidence

Dataset

48 real formally-verified systems across 4 categories, sourced from published papers:

Category	Count	Tools	Example Systems
Protocol Security	12	Tamarin + ProVerif	TLS 1.3, 5G AKA, Needham-Schroeder
State-based / Concurrent	11	TLC + Z3	Etcd Raft, Paxos, Two-Phase Commit
Language-specific	12	Verus + Kani	Rust Vec::push, Binary Search (Dafny)
Foundational ITP	13	Lean + Rocq	Merge Sort (Mathlib), Prime Number Theorem

27/48 examples have local spec file paths. All labels sourced from original papers — not generated by LLMs.

Sources: SysMoBench, TLA+ Examples repository, TLS13Tamarin, AWS Rust Std verification, Mathlib4, FVAPPS, Vericoding

Confusion Matrix (40 Cases)

	Z3 = VERIFIED	Z3 = COUNTEREXAMPLE
TLC = VERIFIED	Q1: 10/10 — both agree ✅	Q2: 10/10 — invariant not inductive
TLC = TIMEOUT/CE	Q3: 10/10 — Z3 scales ✅	Q4: 10/10 — genuine bugs

Q2 pattern: Correct invariant on reachable states, but Z3 finds a spurious pre-state the invariant doesn't inductively exclude. Fix: strengthen the invariant with protocol-level clauses.

Q3 pattern: Readers-Writers at N≥6 (3^N states) times out TLC in 8s. Z3 inductive proof: O(N²) clauses, <100ms. Dining Philosophers N≥14 shows the same pattern.

Q4 pattern: Real transition bugs — missing preconditions. Both TLC (finds trace) and Z3 (finds non-inductive step) independently detect the fault.

Specs Used

All TLA+ specs from github.com/tlaplus/Examples (Apache 2.0):

Spec	File	States
Two-Phase Commit	`transaction_commit/TwoPhase.tla`	1,146
Paxos	`SimplifiedFastPaxos/Paxos.tla`	55,079
Dining Philosophers	`DiningPhilosophers/DiningPhilosophers.tla`	58
Die Hard	`DieHard/DieHard.tla`	97
Game of Life	`GameOfLife/GameOfLife.tla`	1,024
Coffee Can	`CoffeeCan/CoffeeCan.tla`	74
Readers-Writers	`ReadersWriters/ReadersWriters.tla`	59,674
Transaction Commit	`transaction_commit/TCommit.tla`	94
Async Channel	`SpecifyingSystems/AsynchronousInterface/Channel.tla`	16
Async Interface	`SpecifyingSystems/AsynchronousInterface/AsynchInterface.tla`	16

References

Basin et al. (2018). A Formal Analysis of 5G Authentication. CCS 2018.
Cremers et al. (2017). A Comprehensive Symbolic Analysis of TLS 1.3. CCS 2017.
Dong et al. (2024). SysMoBench. arXiv 2024.
Lamport (2002). Specifying Systems. Addison-Wesley.
de Moura & Bjørner (2008). Z3: An Efficient SMT Solver. TACAS 2008.
Newcombe et al. (2015). How AWS Uses Formal Methods. CACM 58(4).

License

MIT License. TLA+ spec files retain their original Apache 2.0 license from the TLA+ Examples repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Spec Triangulator

What This Is

The Key Insight

Results

Repository Structure

Quickstart

Install

Run Triangulation (10 real TLA+ specs)

Run Confusion Matrix (40 experiments)

Run Classifier Evaluation

Kaggle

How Triangulation Works

Example: Two-Phase Commit

Dataset

Confusion Matrix (40 Cases)

Specs Used

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
data		data
docs		docs
notebooks		notebooks
results		results
src		src
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Spec Triangulator

What This Is

The Key Insight

Results

Repository Structure

Quickstart

Install

Run Triangulation (10 real TLA+ specs)

Run Confusion Matrix (40 experiments)

Run Classifier Evaluation

Kaggle

How Triangulation Works

Example: Two-Phase Commit

Dataset

Confusion Matrix (40 Cases)

Specs Used

References

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages