A Lean-verified access gateway for agentic data lakehouses.
Research artifact for Track 3 of the Apart Research Secure Program Synthesis Hackathon, 2026-05-22 → 2026-05-24.
Problem. Per-source authorization is non-compositional under
ETL fusion. When
Solution. Postern mediates every read at the plan boundary
against a column-grant policy. Its core — Plan IR, policy DSL,
rewriter — is mechanized in Lean 4, with nine sorry-free
theorems certifying that every accepted plan's output schema and
filter-predicate read-set are contained in the policy-allowed
columns. A Rust implementation mirrors the algorithm and is
conformance-tested against the Lean reference (18 / 18 cases).
| Path | What |
|---|---|
paper/ |
Pandoc-Markdown paper + BibTeX. Build with paper/build.sh (needs pandoc + xelatex). |
verifier/lean/ |
Lean 4 spec, nine fully-proved theorems, axiom audit, corpus emitter. |
prototype/ |
Rust workspace mirroring the Lean types (postern-core) and the conformance harness (postern-diff). |
scenarios/financial-institution/ |
Kaggle transactions-fraud-datasets case study, three departments. |
scripts/reproduce.sh |
One-shot reproduction. |
-
Fully proved Lean 4 theorem(s). Nine theorems span output- column soundness, filter-predicate soundness (closes the
WHERE ssn = ?side-channel), schema subset, monotonicity in policy, idempotence, and explicit-refusal lemmas for unknown relations and forbidden filter columns. Nosorry.CheckAxioms.leanreports the per-theorem axiom set is bounded by{propext, Quot.sound}— Lean's built-in foundational axioms; two theorems depend on none. -
Conformance testing.
postern-diffruns the Rust rewriter against the Lean-emitted reference corpus; 18 / 18 cases pass on the demo scenario (15 accept, 3 refuse — including regression cases for known attack shapes).
scripts/reproduce.shExpected tail:
18/18 cases pass (Lean reference == Rust impl)
==> All green.
The script also runs the axiom audit (#print axioms on every
load-bearing theorem) and checks the committed corpus matches what
Lean emits today (catches drift).
Toolchains: Lean 4.29.1 (pinned in verifier/lean/lean-toolchain),
Rust stable (tested with 1.93). Runs in under two minutes on an
M-series Mac on a warm cache.
-
Policy is a list of column-grants
$\langle p, r, C\rangle$ — "principal$p$ may read columns$C$ on relation$r$ ". Fail-closed, monotone grant-only (no deny-lists by design; see paper §6). -
Plan IR is
Scan(rel) | Project(plan, cols) | Filter(plan, col)— single-relation by design so soundness stays small. -
Rewriter returns
Option Plan. Refuses on unknown relation or forbidden filter column; on accept wraps the plan in aProjectofschema(q) ∩ allowed(P, prin, touched(q)). - Capability distribution is via biscuit tokens (prototype-side, outside the proof).
- Surface in the prototype is an MCP server over Polars / DuckDB.
See paper/paper.md §3–§5 for the full design and §6 for open
challenges (joins, aggregation + DP, biscuit attenuation in-proof).
Postern is the plan-boundary layer — last chance to bound what a query can read. Pair it with an agent-code-boundary layer like the Scala-3-capture-checking approach from Odersky et al. 2026 (arXiv:2603.00991), which makes capabilities first-class program variables so agent-emitted code cannot exfiltrate data it doesn't hold a capability for. The two layers compose without re-verifying each other's TCB; see paper §3 "Defense-in-depth with capability tracking".
This is a research artifact, not production code. The Lean spec covers single-relation plans; the Rust impl carries joins by per-leg rewriting but those compositions are not yet under proof. Aggregation, filter-timing side-channels, and the planner→executor lowering step are out of scope for the current theorem set — see paper §2 (threat model) and §6.
MIT OR Apache-2.0 (workspace-wide).