ProveKit: closed-loop bug-report → verified bundle with substrate self-extension by TSavo · Pull Request #4 · TSavo/provekit

TSavo · 2026-04-24T21:38:37Z

Summary

ProveKit: The Kit to Prove It's Fixed.

Implements the complete fix-loop plan (docs/plans/2026-04-23-fix-loop.md) plus the capture-the-change architectural shift, non-code artifact kinds, product rename, and documentation pass. Takes a bug report → produces a mechanically-verified bundle (patch + regression test + principle + optional new capability) gated by 18 oracles. Every LLM output feeds at least one mechanical verifier. The system extends its own substrate through the same atomic-bundle machinery that fixes bugs.

Architecture at a glance

5 primitive registries (data, not code): capabilities, DSL relations, intake adapters, remediation layers, artifact kinds
18 oracles gate every bundle across generation + verification
2 bundle types (fix / substrate); substrate bundles self-extend the capability set
Capture-the-change pipeline: Claude edits files in a scratch overlay via agent(); git diff captures the structured change; oracles verify the result
14 artifact kinds covering code, tests, docs, config, dependencies, prompts, lint rules, migrations — every kind of real-world fix is first-class

See README.md, ARCHITECTURE.md, RETROSPECTIVE.md.

What landed

Core plan (23 tasks + splits)

Section A (11 sub-tasks): SAST substrate (nodes/children/capabilities/data-flow/dominance/reindex), DSL parser/compiler/evaluator, capability + relation registries, 14/23 seed principles migrated (9 documented capability gaps)
Section B (5 tasks): intake adapter registry, locate, remediation layer registry, CLI (provekit fix <ref>), orchestrator scaffold
Section C (6 tasks): invariant formulator (Harness synthesis + property-test + 5 existing-loop refinements #1), scratch overlay, fix candidate (Triangle oracle: test-suite execution via per-framework adapters #2), complementary (Marketing: brutal honesty pass on PITCH.md + LANDING.md #3), mutation-verified tests (docs: substrate-not-blockchain manifesto #9), principle/capability candidate (Cut v1.3.0 catalog (additive over v1.2.0; signed) #6 feat(py): port binaryCid + targetProofCid + sourceContractCid + EvidenceTerm to Python kit #14 feat(self-contracts): encode 5-7 catalog-format spec rules as contract mementos #16 feat(ts): port binaryCid + targetProofCid + sourceContractCid + EvidenceTerm to TypeScript kit #17 feat(go): port binaryCid + targetProofCid + sourceContractCid + EvidenceTerm to Go kit #18)
Section D (5 tasks): artifact-kind registry, 18-oracle runner, transactional apply with substrate rollback, learning layer, e2e + dogfood

Architectural follow-ups

Oracle feat(self-contracts): encode 5-7 catalog-format spec rules as contract mementos #16 full execution: transpiles LLM-proposed extractor via ts.transpileModule → dynamic-imports → runs against positive/negative fixtures in a scratch SAST DB
Dogfood test (src/fix/dogfood.empty-catch.test.ts): proved the substrate-extension path closes the empty-catch gap end-to-end; oracle feat(py): port binaryCid + targetProofCid + sourceContractCid + EvidenceTerm to Python kit #14 rejected a DROP TABLE migration
CLI completion: provekit fix <ref> [--apply] [--max-sites N] runs the whole loop end-to-end with prDraft or autoApply modes
LLMProvider.agent(): Claude edits files in the overlay directly; C3–C6 migrated to capture-the-change; JSON-patch paths retained as backward-compat fallbacks
Artifact-kind expansion: 8 → 14 kinds covering test fixes, config updates, doc updates, dependency updates, prompt updates, lint rules, migration fixes
Rename: neurallog → provekit across the entire codebase (108 files, no legacy fallback); tagline wired into --version and CLI help
Documentation: README + ARCHITECTURE + RETROSPECTIVE for new readers

Stats

~45 commits on this branch
560 tests passing + 3 intentionally skipped + 1 flaky (apply.test.ts test 7 under parallel-load tmpdir race; passes in isolation)
tsc clean on src/fix/, src/sast/, src/dsl/, src/db/, src/llm/

Test plan

Full suite green (modulo the one known flake)
Dogfood test passes: src/fix/dogfood.empty-catch.test.ts → substrate-extension path end-to-end with oracle feat(py): port binaryCid + targetProofCid + sourceContractCid + EvidenceTerm to Python kit #14 and oracle feat(self-contracts): encode 5-7 catalog-format spec rules as contract mementos #16 full execution both firing
E2E test passes: src/e2eFixLoop.test.ts → fix-bundle happy path through C4
All 5 registries live and tested
provekit --version prints tagline
Real-LLM dogfood (follow-up): wire ClaudeAgentProvider from env; point at one of the 8 remaining A8 capability gaps

Known MVP pass-throughs (documented in `RETROSPECTIVE.md`)

Oracles v1.4.0 spec drafts: opacity-coverage consensus + binary-attestation #7 (witness replay), fix(ci): resolve sqrt name collision in mint-rust self-contracts #12 (DSL regressions), docs(manifesto): substrate is Turing complete (DAG ordering equivalence) #15 (cross-codebase regression): pass-through stubs; real implementations deferred
data_flow_transitive bipartite-graph limitation (revisit when a consumer needs chain reachability)
captures extractor best-effort (needs tsconfig-backed ts-morph Project to tighten)
code_patch.isPresent overlaps with more-specific kinds (test_fix etc.) → D1b oracle routing is over-inclusive; routing dedup is a follow-up

🤖 Generated with Claude Code

Accepted design for a persistent-relational rebuild of the pipeline: SQLite via Drizzle as the data layer, shadow AST with capability tables over ts-morph, Datalog-flavored principle DSL compiling to Drizzle queries, multi-level structural hashes for incremental invalidation, instrumented execution + encoding-gap detector as the thesis-distinctive oracle layer. Scope honestly named: 4-6 months of focused work, phased A-G with each phase inspectable on its own. Phase D ships the encoding-gap localization capability THESIS.md commits to. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four parallel review agents surfaced real issues; each addressed: CAPABILITY REVIEW (showstopper): spec's flagship DSL example referenced decides.kind=="division" but no division is in the enum. Added seven capability tables (node_arithmetic, node_assigns, node_returns, node_member_access, node_non_null_assertion, node_truthiness, node_narrows), two edge tables (signal_interpolations, post_dominance), and new columns on existing capability tables (node_calls.callee_name, node_binding.kind, node_decides.consequent/alternate, node_iterates.executes_at_least_once + collection_source, node_throws.is_inside_handler). Collapsed redundancies (try_handler into node_throws; error_throw signal renamed to throw_message). Declared data_flow.slot vocabulary as closed set. DSL REVIEW: added reusable predicate definitions (predicate zero_guard(...)) and taint abstractions (source/sink/sanitizer); added missing primitives (branch_reaches, always_exits, literal_value, call_arity, method_name, compound_assignment, encloses, mutates); renamed dominates() to directional $a before $b reading; made parser lenient (no/not/not exists synonyms; newlines/and both accepted); added compile-time enum/column/arity checks; added LLM-generation test (20/23 threshold) as freeze gate. PHASING REVIEW: re-sequenced from A-B-C-D to A-thin -> D-core -> A-full -> B -> D-validation -> C -> E/F/G-parallel. Thesis-distinctive capability ships week 3 riding on v1's existing harness/instrumented execution instead of month 3 on greenfield foundation. Total time unchanged; risk retirement order reversed. INDEX REVIEW: added eight indexes (three driver indexes: node_signal(signal_kind,node_id), node_smt(sort,node_id), contracts(stale_flag); four FK cascade indexes to keep file delete O(n) not O(n^2); plus node_calls(callee_name,node_id) for method-name principles). Materialize transitive data-flow closure at SAST build time (symmetric with dominance) to avoid recursive-CTE cost on cyclic inputs. FOLLOW-UP from user ("no JSON columns anywhere; we have a data model, leverage it"): promoted all runtime values to a relational graph (runtime_values + runtime_value_object_members + runtime_value_array_elements). trace_values, runtime_observation_values, gap_reports.smt_value/runtime_value, clause_witnesses.model_value, and traces.outcome all now reference the graph instead of storing JSON. Validator evidence split from one JSON column into three structured tables per validator kind (semantic_classifier, semantic_diff, adversarial). Design decision updated: no JSON columns in the schema, period. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Installs better-sqlite3, drizzle-orm, drizzle-kit, and @types/better-sqlite3. Adds openDb() with WAL mode and foreign-key enforcement; gitignores the DB files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Task 3: TDD pass — traces (no FK to clauses yet) + trace_values with composite PK (traceId, nodeId, iterationIndex) and cascade delete. Migration 0001 generated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds tv_single_point_unique — a partial unique index on (trace_id, node_id) WHERE iteration_index IS NULL — to catch duplicate writes for the common single-point snapshot case that SQLite's composite PK silently allows when iterationIndex is NULL. Column shape unchanged; test coverage added. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds three tables for SMT clause storage: clauses (with verdict enum proven/violation/unknown/vacuous), clause_bindings (composite PK on clauseId+smtConstant, cascade delete), and clause_witnesses (composite PK, NOT NULL modelValueId FK to runtime_values, cw_by_value index). Migration: drizzle/0003_workable_zarek.sql. 93/93 tests passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove `sort` from `clause_witnesses` (it duplicated `clause_bindings.sort` for the same (clauseId, smtConstant) pair, allowing silent drift). - Add composite FK (clauseId, smtConstant) → clause_bindings with ON DELETE CASCADE, replacing the direct clauseId → clauses FK. Cascade chain is intact: clauses → clause_bindings (cascade) → clause_witnesses (cascade). - Close DB handle before rmSync in afterEach to avoid file-handle leaks. - Add tests: FK rejects witness with no binding; cascade-deletes witness on binding delete. - Migration 0004 uses SQLite table-rebuild (PRAGMA foreign_keys=OFF, temp table, INSERT SELECT, rename) since ALTER TABLE can't add composite FKs or drop cols. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…values Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… truncation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ion + persistence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nts + persistence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…urce_expr, sort) Adds a Binding Metadata section to the invariant derivation prompt. Every declared SMT constant must come with a binding record pointing at the source line and expression it represents, plus its SMT sort. The downstream gap detector uses these to validate Z3's verdict against runtime behaviour; without bindings, proofs are unactionable. Teaches the LLM via: - concrete field semantics (smt_constant must match declare-const name, source_line is 1-indexed, source_expr is a whitespace-normalised substring hint, sort matches the declared sort) - an <abstract> sentinel for pure-model constants with no source line - a worked example pairing an smt2 block with its bindings block - the per-block pairing rule when a signal produces multiple blocks Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The prompt test was placed in prompts/ and also mirrored in src/. The vitest config only covered src/**/*.test.ts so npx vitest run prompts/invariant_derivation.test.ts returned no-files-found. Extending the include glob makes both the test path specified in the task and the full-suite run find it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The snapshot hook used a fixed globalThis.__neurallog_snapshot__, which two parallel runHarnessWithTrace calls clobbered: run A installed its array-pushing function, run B overwrote it with its own, run A's instrumented code then pushed into run B's array, and run A returned with zero captures. The full test suite exposed this when gapDetection.test.ts and harness.captureTrace.test.ts ran in parallel. Fix: - instrumentForSnapshot accepts snapshotFnName (default kept for tests of the module in isolation; harness overrides it). - runHarnessWithTrace generates an 8-byte random suffix per call, uses it for both the global name and the on-disk instrumented file path (which also raced when two runs targeted the same source). - New test: two Promise.all()-parallel harness calls against different sources must each see their own captures. Regression-tests this bug deterministically rather than relying on pool-ordering luck. Does not address the Math.random / Date.now re-entrance race (concurrent runs can leave the stubs installed beyond their intended scope). That has no active test signal and is noted for follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds explainGaps(db, contractKey) to src/cli.ts. Joins gap_reports to clauses by contract key and, for each row, renders a header naming the location and SMT constant, the Z3 model value and runtime observation looked up through the runtime_values graph, the cause explanation, and the gap kind. Wires a --gaps flag into the existing `neurallog explain <file>:<line>` command. When present, after the v1 ContractStore resolves the contract by file+line (for its key), we short-circuit and read from the new SQLite store instead of v1's proven/violations text. Two paths because the encoding-gap output is meaningless for v1 contracts that predate Phase D's binding metadata. Covers three render cases: a normal IEEE gap with both SMT and runtime values, the empty-store case ('No encoding gaps for X'), and a gap with no values (invalid_binding) that should still render cleanly. 135/135 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wires the full Phase D pipeline against a committed fixture: examples/division-by-zero.ts runs divide(0, 0), which produces NaN (IEEE 754) despite the naive Real-arithmetic encoding that says the quotient is finite 0.0. The test drives validator → Z3 model parser → instrumented harness → value serializer → IEEE agent → gap_reports insert → explainGaps render, and asserts the rendered output contains 'encoding-gap', '— q', 'NaN', and 'ieee_specials'. This is the deliverable promised in THESIS.md: the tool catches a specific encoding gap and attributes it to a specific binding with a specific cause (SMT Real does not model IEEE's NaN). Phase A-thin + D-core is functionally complete. 136/136 tests pass across 3 consecutive full-suite runs (race fix from f103ca3 held). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ships the thesis-distinctive encoding-gap capability on top of the existing v1 harness. The pipeline now catches IEEE 754 / Real-arithmetic divergences and attributes them to specific bindings with specific causes. Added: - SQLite data layer via Drizzle (8 tables: runtime_values graph with object_members and array_elements, traces, trace_values, clauses, clause_bindings, clause_witnesses, gap_reports; 6 migrations). - Z3 model parser (Real/Int/Bool/String plus NaN/Infinity/div_by_zero specials; BigInt for Int to avoid precision loss). - Runtime value serializer (JS values → relational graph; circular refs; string/array truncation). - ts-morph snapshot instrumentation with unique-per-call globals and on-disk paths (race-safe for concurrent harness runs). - runHarnessWithTrace wiring on the existing harness. - Line-based binding validator (transitional; SAST-based validation lands in Phase B). - Three comparator agents: IEEE specials, outcome mismatch, path-not-taken. Independent files; new sort agents drop in. - Gap detection orchestrator tying validator → Z3 parse → harness → agents → gap_reports persist. - LLM derivation prompt extended to emit smt_bindings per declared constant, with a worked example and abstract-sentinel rule. - CLI explain --gaps reads gap_reports + runtime_values and renders THESIS-style output. - End-to-end fixture (examples/division-by-zero.ts) driving the full pipeline to an IEEE NaN gap report. Review loops caught and fixed: - NULL-in-composite-PK silent duplicates on trace_values → partial unique index. - Sort duplication and missing structural FK on clause_witnesses → drop duplicate column, add composite FK with cascade chain. - Global-state race on __neurallog_snapshot__ across concurrent harness runs → unique per-call name + per-call instrumented file path, with a parallel-runs regression test. Deferred (flagged in commits): - Math.random / Date.now stub re-entrance race (no active test signal). - Object-truncation marker asymmetry with array branch. - persistWitness transaction wrapping for Phase B scale. Spec: docs/specs/2026-04-23-provekit-v2-design.md Plan: docs/plans/2026-04-23-phase-ad-core.md 136/136 tests pass.

8-task plan wiring Phase D gap detection into `neurallog analyze`. Bindings are derived mechanically from template match data (no LLM per signal). Dormant invariant_derivation.md loader gets removed. New GapDetectionPhase runs between derivation and axiom phases. Also lands the SmtBinding type + optional smt_bindings field on Violation as the foundation all subsequent tasks build on. Baseline: 136/136 tests pass.

buildPrompt, loadTemplate, and compiledTemplate had zero call sites. The pipeline runs entirely on mechanical templates via templateEngine; invariant_derivation.md is reference documentation, not runtime input.

… data Each TemplateResult now carries bindings mapping the emitted SMT constants to their source positions. Derived from AST match metadata the engine already computes; no per-principle authoring. Binary-expr principles covered: division-by-zero, modulo-by-zero, addition-overflow, subtraction-underflow, multiplication-overflow, falsy-default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two concurrent runHarnessWithTrace calls racing on Math.random/Date.now: Run A installed its stub, Run B overwrote with its own, Run A's restore wrote B's stub back, leaving Math.random permanently stubbed as 0.5 after both exited. Post-snapshot-name fix this was silently broken — not caught by the fixed-name race test because the snapshot fn is now per-call but the globals stayed shared. Fix: module-level ref count + origRandom/origNow pinned on 0→1 transition, restored on N→0. Concurrent callers cooperate; the last to exit restores the reals. 8/8 consecutive full-suite runs green after the fix; flake observed previously on 1-in-5 interleaved runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 185a1783f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-24T21:43:42Z

+  // two concurrent runs against the same source would race on the path.
+  const callId = randomBytes(8).toString("hex");
+  const snapshotFnName = `__neurallog_snapshot_${callId}__`;
+  const instrumentedPath = sourcePath.replace(/\.ts$/, `.__instrumented_${callId}__.ts`);


Avoid deleting non-.ts source files in trace harness

instrumentedPath is derived with sourcePath.replace(/\.ts$/, ...), which is a no-op for .tsx, .js, or .jsx inputs. In those cases writeFileSync(instrumentedPath, instrumented) overwrites the original source file, and the finally block unconditionally unlinkSyncs that same path, deleting the user’s source file after gap detection. This is a destructive bug whenever contracts come from non-.ts files.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-24T21:43:42Z

+    const contracts = new ContractStore(config.projectRoot).getAll();
+    const { data: gapDetection } = await this.gapDetectionPhase.execute(
+      { derivation, projectRoot: config.projectRoot, contracts },


Restrict gap detection input to newly derived contracts

runFull passes ContractStore.getAll() into GapDetectionPhase, so any run that has at least one stale bundle reprocesses every historical contract, not just this derivation’s output. Because GapDetectionPhase inserts fresh clauses/gap_reports rows per violation, this inflates duplicate rows and runtime on incremental runs unrelated to most contracts. Feed only the contracts derived in this run (or add deduping) to avoid repeated re-ingestion.

Useful? React with 👍 / 👎.

After plan confirmation, the CLI now invokes runFixLoop end-to-end: - Default: prDraft mode, writes ./neurallog-fix.patch + ./neurallog-fix.md - --apply: autoApply mode, cherry-picks onto target branch after verification - --max-sites N: override maxComplementarySites (default 10) Test seam: runFixArgs.runFixLoopFn is injectable; defaults to the real orchestrator. CLI tests mock it to exercise flag parsing + stdout shape + exit codes without re-running the C1-D3 pipeline. Exit codes: 0 success, 1 intake, 2 locate, 3 classify, 4 user-declined, 5 orchestrator failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add LLMProvider.agent() as an optional method to both LLMProvider interfaces (llm/Provider.ts and fix/types.ts). ClaudeAgentProvider implements it via the claude-agent-sdk query() with cwd/allowedTools/ maxTurns routing. StubLLMProvider gets an optional agentResponses constructor arg that, when supplied, assigns the agent() field. New captureChange.ts reconstructs a CodePatch from git diff after the agent completes, excluding .neurallog/ overlay internals. The C3 stage (generateFixCandidate.ts) dispatches to the agent path when llm.agent is defined, falling back to the existing JSON-patch path for providers without agent() support. One retry with oracle-#2 feedback is included. 18/18 tests pass (15 existing + 3 new captureChange + 2 new candidateGen). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…N fallback) C4 (complementary): adds proposeChangeForSiteViaAgent in complementary.ts. Per-site baseline snapshot captures all files changed vs HEAD before each agent call; after the agent runs, new-since-baseline files are detected, then reverted to HEAD or baseline so verifySiteChange can apply the patch from scratch. generateComplementary dispatches on llm.agent?. C5 (regression test): adds generateTestCodeViaAgent in testGen.ts. Agent writes the test file directly; applyPatchToOverlay idempotently keeps overlay.modifiedFiles in sync. generateRegressionTest dispatches on llm.agent?; oracle #9 unchanged. C6 (capability substrate): adds proposeCapabilitySpecViaAgent in capabilityGen.ts. Agent writes capability files under .neurallog/capability-proposal/<name>/ (already filtered by runAgentInOverlay). proposeCapabilitySpec accepts optional overlay and dispatches; overlay is threaded through orchestrator → generatePrincipleCandidate → proposeWithCapability → proposeCapabilitySpec. Tests: agent path + JSON backward-compat path added for all three stages. All 46 tests pass (1 slow integration test remains skipped). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…c/dep/prompt/lint/migration) Un-stub startup_assert, documentation, and observability_hook with real isPresent logic based on complementary change kinds and file path patterns. Add 6 new artifact kinds (test_fix, config_update, dependency_update, prompt_update, lint_rule, migration_fix) with a shared helpers.ts for file-pattern detection. Registry grows 8 → 14 kinds. Tests grow 9 → 17 in artifactKindRegistry.test.ts; full suite 560 pass (apply.test.ts worktree-cleanup flake is pre-existing, unrelated to this change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Hard rename across all source, tests, config, and docs: - CLI bin: neurallog → provekit (package.json + package-lock.json) - Directory convention: .neurallog/ → .provekit/ (37 principle files mv'd via git mv) - DB filename: neurallog.db → provekit.db - Ignore file: .neurallogignore → .provekitignore - Snapshot global: __neurallog_snapshot__ → __provekit_snapshot__ - Artifact outputs: neurallog-fix.patch/md → provekit-fix.patch/md - Transport class: NeurallogTransform → ProvekitTransform, createNeurallogTransport → createProvekitTransport - All path joins, error messages, comments, doc strings updated - Root-level .md product docs (PITCH, LANDING, PRODUCT, THESIS, SIGNALS, POSTMORTEM) fully renamed - SPEC.md and docs/plans/** archived with top-of-file note - Version banner: "provekit v0.3.0 / The Kit to Prove It's Fixed." added to --version, printHelp(), and init command No legacy fallback paths; no "try .neurallog/ if .provekit/ doesn't exist" logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…closes #211) Side A substrate-conformance work for the swift kit. Two related changes, both required for issue #211 acceptance: 1. SwiftSyntax-based lifter (Path B): replaces regex-v0 in `SwiftLifter.swift` with a real AST walker using Apple's official swift-syntax 600.x library. The wire shape (declarations, callEdges, warnings) is byte-preserved; LSPTests' 11 integration tests pass unmodified. Adds InitializerDeclSyntax handling (regex-v0 never matched `init`); the addition is additive and downstream-safe. 2. swift-self-contracts lift surface + RPC mode (Path A): mirrors the PR #220 daemon-lifecycle pattern (ts Side A) and PR #217 (cpp Side A wiring). The swift mint binary now speaks the lift-plugin protocol over NDJSON-on-stdio: initialize -> lift (proof-envelope shape) -> shutdown, with stdin-EOF graceful exit per architect rule #3. * `MintSwiftSelfContracts/main.swift`: thin entry-point that branches on `--rpc` argv. Slab moved to `Slab.swift`; RPC handlers in `RPC.swift`. SPM compiles all three into one module. * `Slab.swift`: pure function `swiftSelfContracts() -> [Declaration]` authoring all 11 lift-plugin-protocol contracts (C1-C8, 11 facets), callable from both human and RPC entry points. * `RPC.swift`: NDJSON loop; `handleLift` walks the slab, computes content-meaningful CIDs, emits a proof-envelope. * `.provekit/lift/swift-self-contracts/manifest.toml`: declares surface + capabilities + binary path. Mirrors the ts/go/cpp peers. * `cmd_mint.rs`: KIT_TABLE swift entry routes to swift-self-contracts. Cross-kit IR byte-equality (per dispatch directive): - contractSetCid: byte-equivalent. Computed via the canonical formula blake3_512(JCS(sorted([contract_cid(c) for c in slab]))) where each contract_cid = blake3_512(JCS({name, outBinding, pre?, post?, inv?})), matching `provekit-claim-envelope::contract_cid` and `provekit-claim-envelope::compute_contract_set_cid` exactly. - proof-envelope bytes: NOT byte-equivalent. The swift kit emits a JCS-JSON catalog with self-identifying `kind: "swift-self-contracts-catalog-phase3-pending"` rather than the cross-kit CBOR-signed catalog. This is consistent with the existing Phase 3 deferral noted in `MintSwiftSelfContracts/main.swift:208-211` (now `main.swift:46-48`); a follow-up PR can land the full CBOR + Ed25519 envelope (mirroring PR #221 for python). The dispatcher only requires `kind:"proof-envelope"`, non-empty `filename_cid`, and decodable `bytes_base64` — all satisfied. The filename_cid changes whenever the slab changes, satisfying acceptance gate #2. Acceptance gate #4 ("provekit prove --kit=swift exits 0"): provekit prove doesn't accept --kit (see `provekit-cli/src/main.rs:119`); the intent is read as "mint pipeline succeeds end-to-end on macOS." `make mint-swift` runs cleanly and `verify-self-contracts` returns OK against the new attestation, demonstrating the full pipeline. - [x] `swift run conformance` (10/10 PASS) - [x] `swift run test-swift-lsp` (11/11 PASS — wire shape preserved) - [x] `swift run mint-swift-self-contracts` human mode (CID matches) - [x] `mint-swift-self-contracts --rpc` direct subprocess (initialize/lift/shutdown trip) - [x] `cargo test cmd_mint` (9/9 unit tests pass) - [x] `cargo test mint_kit_integration swift_kit_pins_expected_contract_set_cid` (PASS) - [x] `make mint-swift` end-to-end: contractSetCid pinned, attestation OK - [x] Three independent paths agree on contractSetCid blake3-512:272543ef... - swift contractSetCid: blake3-512:272543efe7c47b911659e1fc6a7368431b6eaa6010d2560a5d3e6717fcd470b50b24b607b481272941764b731d890d6973ab88e6000bde96fd306163a5742c56 Pre-existing test failures unrelated to #211: rust/cpp self-contracts binaries are not built in this worktree, causing `rust_kit_contract_set_cid_is_pinned_to_self_contracts_canonical` and `cpp_kit_contract_set_cid_is_pinned_to_self_contracts_canonical` to fall back to empty-set CIDs. Not a regression introduced by #211. Closes #211. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…370, #384) Fixes all P1/major/minor issues flagged by Copilot + CodeRabbit: walk.rs: - Bug #1: nested-block let-bindings now substituted at nested callsites; `CallsiteHit` carries `preceding_inner_stmts` (innermost-first) populated during the forward traversal; backward walk applies them before the outer- body statements. - Bug #3: actual-arg lift now delegates to `crate::lift::lift_expr_to_term` (semantic, structure-stable) instead of the old token-string fallback; binary-expr actuals produce Ctor(op, ...) not `<expr:40 + 2>` tokens. - Bug #4: arity mismatch now logs a warning and skips the callsite rather than silently zip-truncating. New test asserts the skip. - Bug #11: stale MVP-scope comment updated (if-strengthening is live). lift.rs: - Bug #5: `debug_assert!` removed from the macro match arm; it is compiled out in release builds and must not contribute to contracts. - Bug #6: `lift_function_postcondition` now scans for later `let` bindings that shadow an entry assertion's free variables and drops unsound asserts. Narrowed to LATER rebindings only (a preceding `let y = x; assert!(y ≥ 5)` is sound). - Bug #7: explicit `return expr;` tails now derive `result = <lifted expr>` postcondition (previously only trailing-no-semicolon expressions matched). canonical.rs: - Bug #8: non-i64 JSON Numbers (floats, large u64) are now encoded as a tagged object `{"__provekit_non_i64_number__": "42.5"}` instead of a plain string, preventing numeric constants and string literals of the same text from sharing a CID. walk_emit.rs / walk_demo.rs: - Bug #9: `all_param_names` now synthesizes stable placeholder names (`__self`, `__arg{i}`) for non-Ident patterns so formal arity stays aligned with actual-argument count. - Bug #10: NaN% division guarded with `if second_run_total > 0`. 11 new unit tests added (156 lib tests total, up from 145; 12 integration tests all still passing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#596/#598/#599/#600 Per-kit changes: - rust/provekit-cli: extend KNOWN_SURFACES anchor test to assert ruby-source, csharp-source, swift-source, zig-source are all registered (were in the array, not in the test) - Makefile: wire test-ruby and test-php into test-all so the new lift kits run under ci (boy-scout -- flagged in #598 review) - typescript-language-signature/mint.sh: replace find with explicit enumeration to match c11/rust pattern and remove portability footgun (PR #595 nit #3) - ruby/ruby_source.rb: add instance_variable_get, instance_variable_set, const_get, const_set to METAPROGRAMMING_CALLS (PR #598 nit #4) - php/PhpSourceCompiler.php: simplify isUnit() to remove precedence-reliant double-condition (PR #600 nit #2) Already swept on origin/main (codex/lift-kit-nits-cleanup, empty cherry-picks confirmed): - effect sort-key alignment: 5:unresolved_call -> 5:unresolved: (ruby + swift) - csharp-source backfill in KNOWN_SURFACES - swift: emit Refusal for unparseable function signatures + ThrowStmt API fix Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…599/#600 (#632) * chore(lift-kits): sweep accumulated MERGE-WITH-NITS nits from PRs #595/#596/#598/#599/#600 Per-kit changes: - rust/provekit-cli: extend KNOWN_SURFACES anchor test to assert ruby-source, csharp-source, swift-source, zig-source are all registered (were in the array, not in the test) - Makefile: wire test-ruby and test-php into test-all so the new lift kits run under ci (boy-scout -- flagged in #598 review) - typescript-language-signature/mint.sh: replace find with explicit enumeration to match c11/rust pattern and remove portability footgun (PR #595 nit #3) - ruby/ruby_source.rb: add instance_variable_get, instance_variable_set, const_get, const_set to METAPROGRAMMING_CALLS (PR #598 nit #4) - php/PhpSourceCompiler.php: simplify isUnit() to remove precedence-reliant double-condition (PR #600 nit #2) Already swept on origin/main (codex/lift-kit-nits-cleanup, empty cherry-picks confirmed): - effect sort-key alignment: 5:unresolved_call -> 5:unresolved: (ruby + swift) - csharp-source backfill in KNOWN_SURFACES - swift: emit Refusal for unparseable function signatures + ThrowStmt API fix Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ruby-language-signature): regen op_source-unit.spec.json with notes field The generator `generate_assets.py` was updated to emit a `notes` field on the `source-unit` op spec but the committed file was never regenerated. CI gate at Makefile:606 (`generate_assets.py --check`) caught the staleness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: add composer to conformance job apt block (fixes test-php exit 127) test-php runs `composer install && composer test`; the conformance job installed php-cli but never composer, so the command was not found. Adds the `composer` Ubuntu apt package to the shared system-deps step so make test-all (and make ci) can reach it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…pperMemento, validate_against, ParametricRealizationMemento Blocker #1: Mint RealizationPlanMemento after each kit dispatch in realize_function / apply_canonical_rewrite. Blocker #2: Call plan.validate_against(&realization) after construction; propagate RealizationPlanError. Blocker #3: Build synthetic ParametricRealizationMemento inline (one slot per param) via mint_realization_artifacts; catalog lookup path acknowledged as future enhancement per cmd_bind.rs gap comment. Blocker #4: Mint ObservationWrapperMemento when mode in {witness, monitor, dispatcher} and kit returns observation_wrapper_emission_record; call wrapper.validate before persisting; set plan.observation_wrapper_cid. Nit: used_sugars subset check in invoke_realize returns ext:unauthorized-sugar error when kit returns a CID not in the cited set. Add observation_wrapper_emission_record to RealizedSource (both kit_dispatch and cmd_transport layers) extracted from kit JSON response. Extend EngineResult with realization_plan_mementos and observation_wrapper_mementos; write to .provekit/bindings/ realization-plans/ and observation-wrappers/. Add three unit tests in cmd_transport for mint_realization_artifacts asserting: plan IS minted, validate_against passes, and wrapper IS minted for mode=witness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Route contract sugar through realize kits * Add four substrate-side mints: RealizationPlanMemento, ObservationWrapperMemento, validate_against, ParametricRealizationMemento Blocker #1: Mint RealizationPlanMemento after each kit dispatch in realize_function / apply_canonical_rewrite. Blocker #2: Call plan.validate_against(&realization) after construction; propagate RealizationPlanError. Blocker #3: Build synthetic ParametricRealizationMemento inline (one slot per param) via mint_realization_artifacts; catalog lookup path acknowledged as future enhancement per cmd_bind.rs gap comment. Blocker #4: Mint ObservationWrapperMemento when mode in {witness, monitor, dispatcher} and kit returns observation_wrapper_emission_record; call wrapper.validate before persisting; set plan.observation_wrapper_cid. Nit: used_sugars subset check in invoke_realize returns ext:unauthorized-sugar error when kit returns a CID not in the cited set. Add observation_wrapper_emission_record to RealizedSource (both kit_dispatch and cmd_transport layers) extracted from kit JSON response. Extend EngineResult with realization_plan_mementos and observation_wrapper_mementos; write to .provekit/bindings/ realization-plans/ and observation-wrappers/. Add three unit tests in cmd_transport for mint_realization_artifacts asserting: plan IS minted, validate_against passes, and wrapper IS minted for mode=witness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Code <agentwopr@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

TSavo and others added 30 commits April 23, 2026 12:21

plan: Phase A-thin + D-core (18 tasks, TDD, bite-sized steps)

becc798

chore: add better-sqlite3 + drizzle; wire DB connection

400932c

Installs better-sqlite3, drizzle-orm, drizzle-kit, and @types/better-sqlite3. Adds openDb() with WAL mode and foreign-key enforcement; gitignores the DB files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

db: runtime_values graph with object_members and array_elements

6241680

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

db: traces + trace_values schema

b6497dc

Task 3: TDD pass — traces (no FK to clauses yet) + trace_values with composite PK (traceId, nodeId, iterationIndex) and cascade delete. Migration 0001 generated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

db: gap_reports schema (references clauses + traces + runtime_values)

2513cc1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

z3: model parser (Real/Int/Bool/String + div_by_zero/infinity specials)

c0dab95

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

z3: persistWitness writes parsed model to clause_witnesses + runtime_…

e2bf785

…values Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

runtime: value serializer (primitives + objects + arrays + circular +…

a9c1d7f

… truncation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

runtime: snapshot instrumentation via ts-morph AST rewrite

3f0fbd9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

harness: runHarnessWithTrace wires instrumentation + value serializat…

2d9e706

…ion + persistence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bindings: line-based validator (transitional, pre-SAST)

6bfd45f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

comparator: core types + IEEE specials agent (NaN/Infinity/div_by_zero)

a97d8b4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

comparator: outcome-mismatch agent (return vs throw disagreement)

022c098

comparator: path-not-taken agent

937a9f5

gap-detection: orchestrator wires validator + z3 parser + trace + age…

1e08c9f

…nts + persistence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(derivation): remove dormant LLM-per-signal prompt loader

4c58b24

buildPrompt, loadTemplate, and compiledTemplate had zero call sites. The pipeline runs entirely on mechanical templates via templateEngine; invariant_derivation.md is reference documentation, not runtime input.

chatgpt-codex-connector Bot reviewed Apr 24, 2026

View reviewed changes

TSavo and others added 6 commits April 24, 2026 14:47

docs: README + ARCHITECTURE + RETROSPECTIVE for ProveKit

c1d195d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

TSavo changed the title ~~fix-loop: closed-loop bug-report → verified bundle with substrate self-extension~~ ProveKit: closed-loop bug-report → verified bundle with substrate self-extension Apr 24, 2026

TSavo merged commit 042c20a into main Apr 24, 2026
1 check passed

TSavo mentioned this pull request May 28, 2026

examples/voltron-demo: multi-vendor M × N consumer (gap-exposure artifact) #1573

Merged

TSavo mentioned this pull request Jun 5, 2026

pandas: testing lift seat + showcase + the 2×4 guard-shape matrix #1937

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ProveKit: closed-loop bug-report → verified bundle with substrate self-extension#4

ProveKit: closed-loop bug-report → verified bundle with substrate self-extension#4
TSavo merged 89 commits into
mainfrom
fix-loop

TSavo commented Apr 24, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 24, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

TSavo commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture at a glance

What landed

Core plan (23 tasks + splits)

Architectural follow-ups

Stats

Test plan

Known MVP pass-throughs (documented in RETROSPECTIVE.md)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

TSavo commented Apr 24, 2026 •

edited

Loading

Known MVP pass-throughs (documented in `RETROSPECTIVE.md`)