Skip to content

CornerCore-AI/Spec-Triangulator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spec Triangulator

Multi-Tool Triangulation for Formal Specification Validation
Apart Research Hackathon — Secure Program Synthesis — May 2026 — Track 2: Specification Validation


What This Is

Formal verification has a triage problem: engineers face a bewildering array of tools (Tamarin, ProVerif, TLA+, Lean, Dafny, Kani) with no systematic guidance on which to use — and no way to tell if their specification actually captures what they intended.

Spec Triangulator addresses this with two components:

  1. Classifier — Routes natural-language system descriptions to the right formal tool pair. 93.8% accuracy, 79.8% multi-tool recall on 48 real-world examples. Zero fine-tuning.

  2. Triangulation Engine — Runs TLC (explicit-state model checking) and Z3 (SMT inductive invariant proof) on the same TLA+ specification and compares verdicts. Disagreement surfaces specification gaps.

The Key Insight

Two tools with different mathematical foundations checking the same property:

Verdict Meaning
Both VERIFIED High confidence — reachability + inductive proof agree
TLC ✅ Z3 ❌ Invariant not inductive — spec needs strengthening
TLC ❌ Z3 ✅ State space too large for TLC — inductive proof scales
Both ❌ Genuine transition bug — fix before proceeding

We validate this with 40 controlled experiments covering all four quadrants.


Results

Metric Value
Classifier accuracy 93.8% (45/48)
Multi-tool recall 79.8%
Specs triangulated 10 real TLA+ specs
TLC states checked 117,278
Z3 inductive proofs 10/10 UNSAT in <50ms
Confusion matrix cases 40/40 in correct quadrant

Repository Structure

spec-triangulator/
├── src/
│   ├── tool_runner.py        # TLC + Z3 triangulation engine (self-contained)
│   ├── confusion_matrix.py   # 40-case confusion matrix experiments
│   ├── eval.py               # Classifier evaluation pipeline
│   └── build_dataset.py      # Dataset construction
├── data/
│   ├── dataset.json          # 48 labeled examples with spec file paths
│   └── specs_manifest.json   # Spec URL/path mapping for all 48 examples
├── results/
│   ├── triangulation_results.json    # 10-spec triangulation output
│   └── confusion_matrix_results.json # 40-case confusion matrix output
├── docs/
│   └── confusion_matrix.md   # Detailed explanation of all 4 quadrants
├── notebooks/
│   └── spec_triangulator.ipynb  # Kaggle-ready notebook
├── requirements.txt
└── README.md

Quickstart

Install

pip install z3-solver groq

TLC (tla2tools.jar) is downloaded automatically by tool_runner.py on first run.

Run Triangulation (10 real TLA+ specs)

cd src
python tool_runner.py
# Output: results/triangulation_results.json

Run Confusion Matrix (40 experiments)

cd src
python confusion_matrix.py
# Output: results/confusion_matrix_results.json

Run Classifier Evaluation

export GROQ_API_KEY=your_key_here
cd src
python eval.py

Kaggle

Upload notebooks/spec_triangulator.ipynb to Kaggle. Add your GROQ_API_KEY as a Kaggle Secret. The notebook installs all dependencies and runs the full pipeline.


How Triangulation Works

TLC and Z3 verify the same property by completely different means:

TLC performs breadth-first search over all reachable states from Init, checking TypeOK in every state:

∀ s reachable from Init: TypeOK(s)

Z3 proves the invariant is inductive — preserved by every possible transition:

Check: TypeOK(s) ∧ Next(s,s') → TypeOK(s')
i.e., ¬(TypeOK(s) ∧ Next(s,s') ∧ ¬TypeOK(s')) must be UNSAT

TLC only considers reachable states. Z3 considers all mathematically possible pre-states. This difference is what makes disagreement informative.

Example: Two-Phase Commit

TLC checks 1,146 reachable states — all satisfy TPTypeOK. ✅

Z3 checks inductiveness of TypeOK ∧ TCConsistent ∧ ProtocolInv:

  • Without ProtocolInv (rm_committed → tm_committed): Z3 finds spurious pre-state rm=committed, tm=init — unreachable from TPInit but not ruled out inductively. ❌ → Q2: spec needs strengthening
  • With full invariant: UNSAT — invariant is inductive. ✅ → Q1: high confidence

Dataset

48 real formally-verified systems across 4 categories, sourced from published papers:

Category Count Tools Example Systems
Protocol Security 12 Tamarin + ProVerif TLS 1.3, 5G AKA, Needham-Schroeder
State-based / Concurrent 11 TLC + Z3 Etcd Raft, Paxos, Two-Phase Commit
Language-specific 12 Verus + Kani Rust Vec::push, Binary Search (Dafny)
Foundational ITP 13 Lean + Rocq Merge Sort (Mathlib), Prime Number Theorem

27/48 examples have local spec file paths. All labels sourced from original papers — not generated by LLMs.

Sources: SysMoBench, TLA+ Examples repository, TLS13Tamarin, AWS Rust Std verification, Mathlib4, FVAPPS, Vericoding


Confusion Matrix (40 Cases)

Z3 = VERIFIED Z3 = COUNTEREXAMPLE
TLC = VERIFIED Q1: 10/10 — both agree ✅ Q2: 10/10 — invariant not inductive
TLC = TIMEOUT/CE Q3: 10/10 — Z3 scales ✅ Q4: 10/10 — genuine bugs

Q2 pattern: Correct invariant on reachable states, but Z3 finds a spurious pre-state the invariant doesn't inductively exclude. Fix: strengthen the invariant with protocol-level clauses.

Q3 pattern: Readers-Writers at N≥6 (3^N states) times out TLC in 8s. Z3 inductive proof: O(N²) clauses, <100ms. Dining Philosophers N≥14 shows the same pattern.

Q4 pattern: Real transition bugs — missing preconditions. Both TLC (finds trace) and Z3 (finds non-inductive step) independently detect the fault.


Specs Used

All TLA+ specs from github.com/tlaplus/Examples (Apache 2.0):

Spec File States
Two-Phase Commit transaction_commit/TwoPhase.tla 1,146
Paxos SimplifiedFastPaxos/Paxos.tla 55,079
Dining Philosophers DiningPhilosophers/DiningPhilosophers.tla 58
Die Hard DieHard/DieHard.tla 97
Game of Life GameOfLife/GameOfLife.tla 1,024
Coffee Can CoffeeCan/CoffeeCan.tla 74
Readers-Writers ReadersWriters/ReadersWriters.tla 59,674
Transaction Commit transaction_commit/TCommit.tla 94
Async Channel SpecifyingSystems/AsynchronousInterface/Channel.tla 16
Async Interface SpecifyingSystems/AsynchronousInterface/AsynchInterface.tla 16

References

  • Basin et al. (2018). A Formal Analysis of 5G Authentication. CCS 2018.
  • Cremers et al. (2017). A Comprehensive Symbolic Analysis of TLS 1.3. CCS 2017.
  • Dong et al. (2024). SysMoBench. arXiv 2024.
  • Lamport (2002). Specifying Systems. Addison-Wesley.
  • de Moura & Bjørner (2008). Z3: An Efficient SMT Solver. TACAS 2008.
  • Newcombe et al. (2015). How AWS Uses Formal Methods. CACM 58(4).

License

MIT License. TLA+ spec files retain their original Apache 2.0 license from the TLA+ Examples repository.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors