Harness synthesis + property-test + 5 existing-loop refinements by TSavo · Pull Request #1 · TSavo/provekit

TSavo · 2026-04-21T03:42:39Z

Summary

Five improvements on the existing derivation/verification loop: REASON field + LLM-as-judge, template-use telemetry with auto-retirement, strength checker (drops asserts to detect vacuous proofs), CEGAR witness refinement, and one-shot retry on Z3 error/unknown.
New verification layer: property-test. Three paths — primitive model-to-param mapping, complex-type harness synthesis (LLM writes the JS test), and violation replay against Z3 witnesses. Boundary-value augmentation per numeric parameter (guarded by precondition consistency). LLM judge layer on top flips and persists confidence on the underlying contract with an auditable judge_note.
Harness synthesis specifically: prompts/harness_synthesis.md is a 650-line teaching prompt (two rounds of Opus critique applied). src/harness.ts handles synthesis + vm.createContext sandboxed execution + content-addressed caching. src/moduleLoader.ts exposes private module-scope bindings via tree-sitter + ts.transpileModule + Module._compile, so private top-level functions become reachable.
First tool-driven bug fix in the wild: src/cli.ts/runDiff used || where the SMT encoder assumed ?? semantics. The harness flagged the divergence; a one-character fix closed the loop (tool now reports pass).

Env gates for the new paths (all opt-in, off by default):

NEURALLOG_PROPERTY_TEST=1 — enable primitive property test
NEURALLOG_PROPERTY_TEST_JUDGE=1 — enable judge layer
NEURALLOG_HARNESS_SYNTHESIS=1 — enable harness synthesis path
NEURALLOG_HARNESS_LIMIT=N / NEURALLOG_HARNESS_TIMEOUT_MS=N / NEURALLOG_HARNESS_MODEL=... — tune

Test plan

npm run build clean on every commit
neurallog analyze examples/inventory.ts — no regressions in existing proven/violation counts
NEURALLOG_PROPERTY_TEST=1 NEURALLOG_PROPERTY_TEST_JUDGE=1 node dist/cli.js verify against the full 939-contract snapshot — 6 passes, judge annotations persist to contract JSON
scripts/harness-probe.ts end-to-end on 8 contracts spanning: exported class methods (EntailmentChecker.check), private module-scope functions (cli.ts/runDiff, printSummary, resolveProjectRoot), untestable control-flow claims (TemplateChecker.checkStateMutation)
Before/after demonstration on runDiff: harness reports encoding-gap on pre-fix source, pass on post-fix source, with cache invalidation via content hash
Reviewer smoke: run analyze on a new file and observe the full 5-phase pipeline including the new harness path

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added proof strength analysis to evaluate assertion load-bearing in verification results.
- Added property-based testing framework with LLM-driven harness generation and execution.
- Added automated proof refinement and judging capabilities for improved validation accuracy.
- Added example arithmetic functions demonstrating verification patterns.
Documentation
- Added comprehensive harness synthesis guidelines.
- Updated invariant derivation requirements with code-cite reasoning tags.
Chores
- Enhanced verification pipeline with principle usage tracking and retirement management.
- Improved contract path normalization and file handling.

Lifts from prior art in the LLM+SMT verification genre. 1. REASON field + judge (Lemur) - prompts/invariant_derivation.md: every smt2 block must emit a ; REASON: tag citing specific code, not a restatement of the claim. - verifier.ts: extractReason parses ; REASON: comments; VerificationResult carries reason, judgeRejected, judgeNote. - judge.ts: judgeReasoning runs an LLM check that the stated reason justifies the claim given the code and expected z3 verdict. judgeTeachingExample wraps it for principle teaching-example SMT. - principles.ts classifyAndGeneralize: gated behind judge — if rejected, principle returns validated: false with judge note. - DerivationPhase pattern-gap synthesis: newPrinciple only added to store if judge validates its teaching example. - contracts.ts: ProvenProperty / Violation carry reason, confidence, judge_note for persistence. 2. Template telemetry (Code2Inv) - principles.ts: Principle.stats { uses, proven, violations, errors, lastUsedAt, retiredAt, retiredReason }. - PrincipleStore.recordUse(tag, verdict), persistStats, evaluateRetirements, retire(id, reason) moves file to principles/retired/. - DerivationPhase records a use for every templateVerification and persists / evaluates retirement at end of run. - Retirement rule: uses >= 20 AND errors/uses > 0.5 AND proven == 0. 3. Strength checker (SpecGen) - checkers/StrengthChecker.ts: for each proven property, drop each assert one at a time, re-run z3; count load-bearing asserts. verdict = proven if any assert is load-bearing, violation (vacuous) if the claim holds after removing every premise. - Wired into AxiomPhase. 4. CEGAR witness refinement (Lemur) - DerivationPhase.cegarRefineViolations runs after the main DAG. For up to 10 violations with a Z3 witness, asks the LLM whether the counterexample is reachable or spurious: REACHABLE -> mark confidence: high, attach attack-path judge note. revised smt2 block -> re-verify. unsat flips to proven.contract; sat updates violation with tightened encoding. - ContractStore.put persists any contract that mutated. 5. One-shot retry on Z3 error / unknown - refiner.ts: refineErrorBlock asks the LLM to fix a block Z3 couldn't evaluate, preserving semantics; accepts revision only if Z3 returns sat or unsat. - DerivationPhase: per-function budget of 2 retries on template verifications with z3Result error / unknown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- principles.ts: import unlinkSync at top instead of require("fs") inline in retire(). - checkers/PropertyTestChecker.ts (new, opt-in via NEURALLOG_PROPERTY_TEST=1): For each proven contract, strip the goal assert, ask Z3 (get-model) for a concrete input satisfying the preconditions, require() the function via ts-node/register, match SMT variable names to tree-sitter-extracted parameter names, call, and record the outcome. - returned cleanly -> verdict: proven (Z3 and runtime agree). - threw -> verdict: violation (suspected encoding drift; Z3 said unsat but the function crashed on Z3's own witness input). Gated by env var. NEURALLOG_PROPERTY_TEST_LIMIT caps total calls (default 10). Best-effort; skips silently on unmapped params, load failures, non-primitive args, or Z3 errors. No LLM judge step in this checker — outcomes are mechanical only. A followup pass could feed (code, claim, inputs, outcome) to an LLM for semantic divergence checks, but that's a separate change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Differential test now has both halves: mechanical runtime capture and semantic judgment of whether the runtime behaviour is consistent with Z3's proof claim. - judge.ts: judgeRuntimeOutcome takes {functionSource, claim, smt2, inputsSummary, outcome}, asks the LLM to reply VALID or INVALID. INVALID means Z3 said "proven" but the runtime result contradicts the claim — encoding inconsistency. - PropertyTestChecker: - runOne now returns a PropertyTestContext (CheckResult plus the structured inputs/outcome/source needed for judging). - lastRun keeps each context so the async judge pass can annotate it. - judgeResults(model?) runs only when NEURALLOG_PROPERTY_TEST_JUDGE=1. Uses createProvider() directly; model defaults to NEURALLOG_JUDGE_MODEL or claude-haiku-4-5. - Flips verdict on disagreement: proven + judge INVALID -> violation (encoding-inconsistent) violation + judge VALID -> proven (runtime threw but the throw was consistent with the claim) - AxiomPhase.execute is now async. After the synchronous checker loop, it finds PropertyTestChecker and awaits judgeResults() when the env gate is set. Recomputes proven/violation/error totals. - Pipeline.runVerifyOnly and all axiomPhase.execute call sites awaited. cli.ts awaits pipeline.runVerifyOnly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Contracts stored file field as an absolute path, which made PropertyTestChecker unable to require() them from a different checkout and silently walked all 932 stored contracts, taking 9m to produce zero results. - contracts.ts: new normalizeContractFile(file, projectRoot). On load, absolute file fields are rewritten to projectRoot-relative, with a fallback that strips everything before /src/, /examples/, /lib/, /app/, or /packages/ so contracts authored in a different checkout still resolve. - DerivationPhase.buildContracts: new contracts now store fn.relativePath, not fn.filePath. Over time all contracts migrate to relative on rewrite. PropertyTestChecker fixes so a fully-failing run no longer burns minutes: - Takes projectRoot in constructor (passed from AxiomPhase.execute options). - resolvePath resolves contract.file against projectRoot when relative. - fileRequireFailures: Set<string> — once require() throws for a file, every subsequent contract from that file skips before Z3, not after. Previously we'd extract a Z3 model for each contract before discovering the file couldn't be loaded. - Reordered: loadFunction happens before extractModel. No Z3 call when the file can't be loaded. - Limit now caps attempts, not successes. Previously this.used only incremented on successful runs, so a 100% failure rate walked every contract regardless of LIMIT. - Summary log at end of check(): attempted / ran / skipped counts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After the path fix, LIMIT=5 attempts produced 0 successful runs because every proven contract was for a method on a class. loadFunction only looked for module[fnName] and missed them. - PropertyTestChecker.findMethodOwner scans the required module for constructors whose prototype carries the target method. If the constructor takes no required args, instantiate and bind. Classes whose ctors throw (e.g. ContractStore needing projectRoot) are silently skipped. - cli.ts: guard main() behind `if (require.main === module)`. Before this, PropertyTestChecker's require() of transitively-cli-importing files re-entered main() and started a second full verify run from inside the first — the recursion caused verify to hang at 4000+ log lines. - Skip reasons are now collected and printed when a run yields zero results, so the operator can see why. Empirical result on this repo (20 proven contracts, LIMIT=20): attempted 20, ran 1, skipped 19 [division-by-zero] formatDuration:152 — with ms=0 → returned "0ms" property-test judge: 1 judged, 0 verdict flips, 0 violations confirmed judge note: "The function returns '0ms' for input ms=0 without reaching any division operation, consistent with the claim that division by zero is prevented at formatDuration." 19 skipped because the proven-contract methods live on classes with required constructor args (ContractStore(projectRoot), TemplateEngine, etc). That's a real architectural limit — testing those needs a synthetic-instance protocol; left as future work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ance args Extended PropertyTestChecker to replay violations against the code, not just proven claims. For each violation that carries a Z3 witness, the checker executes the function with the witness values and judges the outcome: threw -> violation reproduced (bug confirmed) clean -> possible false positive; judge decides runOne now takes a source flag ("proven" | "violation") and branches on it. Model extraction uses the stored witness for violations, and runs z3 (get-model) for proven blocks. resolveCallable now tries several constructor-arg patterns when instantiating a class whose method we need: new Cls() (zero-arg) new Cls(projectRoot) (common in this codebase) new Cls({}) new Cls({ projectRoot }) new Cls(projectRoot, false) new Cls({ projectRoot, verbose: false }) Static methods are now detected via tree-sitter and invoked via Cls[fnName] without instantiation. Iteration order: pass 1 iterates all proven entries across all contracts, pass 2 iterates violations. This prevents violation-heavy contracts from exhausting the LIMIT before proven entries get a chance. Skip reasons are printed for every distinct bucket when a run ends (not just top 8). examples/arithmetic.ts: pure-primitive demo functions (safeDivide, divide, absDiff, clamp, addPositive). Every parameter is Int/Real so Z3 models map directly to runtime calls. Added specific-file entries to .neurallogignore instead of the catch-all examples/** so this file is analyzed and verified. Empirical run (NEURALLOG_PROPERTY_TEST=1 NEURALLOG_PROPERTY_TEST_JUDGE=1 LIMIT=200, 939 contracts): attempted 200, ran 5, skipped 195 1. [subtraction-underflow] absDiff:18 — x=0 y=0 → 0 (judge VALID) 2. [division-by-zero] safeDivide:8 — a=0 b=1 → 0 (judge VALID) 3. [division-by-zero] formatDuration:152 — ms=0 → "0ms" (VALID) 4. [division-by-zero] formatDuration:166 — ms=0 → "0ms" (VALID) 5. [division-by-zero] VIOLATION-REPLAY divide:13 — a=0 b=0 → null judge INVALID: "JavaScript division by zero (including 0/0) returns NaN rather than causing a runtime error, so the claimed violation doesn't fire at runtime" -> verdict flipped to encoding-inconsistent The fifth result is the payload: Z3 proved divide has a bug; runtime disagrees; judge correctly labels it as an encoding/semantics mismatch between the Z3 abstraction and JavaScript's NaN semantics. That's the entire differential-test premise working end-to-end. 195 skips are all legitimate — methods taking Contract[]/Map<> that SMT can't model as primitives, private CLI functions, or arithmetic functions where the SMT encoding used different variable names than the parameter list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New inferConstructorArgs method extracts class ctor param types via tree-sitter and synthesizes mock values per param (string defaults, number 0, boolean false, arrays [], Map/Set new, object/alias {}). Used as first-attempt path in resolveCallable; hardcoded ctor fallback patterns kept as safety net. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds three signal-improvement layers on top of ctor-introspection: 1. matchParamToModel — if the SMT model lacks an exact key for a parameter, try case-fold equality, suffix-strip (_condition, _value, _returns, _guard, _consequent, _alternate, _old, _new, _var), then substring match. Preference order: exact > normalized > stripped > substring. 2. isControlFlowModel — if every key in the SMT model looks like a control-flow abstraction (guard_condition, code_after_reached, result_consequent, branch_taken, etc.) AND none match a parameter name, classify as control-flow-only and skip with a specific reason rather than the generic "SMT model missing param". Principles like guard-narrowing and ternary-branch-collapse encode control state, not data values — they're inherently un-property-testable via input-value mapping. 3. constructor skip — contract.function === "constructor" now short- circuits. Calling a class constructor as a regular function throws ("cannot be invoked without 'new'"), producing noise. Property- testing constructors would need a new-invocation protocol. Empirical run (NEURALLOG_PROPERTY_TEST=1 JUDGE=1 LIMIT=200): Before fuzzy/tagging: attempted 200, ran 5 After: attempted 200, ran 7 3 verdict flips by judge, 1 violation confirmed New hits: [null-assertion] findSimilarByAST:77 with obs=0 → [] — judge flagged encoding-inconsistent (code actually guards the ! operations). [subtraction-underflow] absDiff:18 — already running, now flagged encoding-inconsistent by judge (claim text says VIOLATION but Z3 proved unsat). Skip reasons now bucket control-flow contracts clearly — ~60 of the 195 skips are flagged as principle-is-control-flow, which is an architectural limit not a bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Boundary augmentation (#1) runBoundarySamples runs each function not only with Z3's primary witness but with boundary-variant args (candidates: 0, 1, -1) substituted per numeric parameter. Each variant is only tried when Z3 says the variant still satisfies the contract's preconditions (isConsistentWithPreconditions pins the param value via an extra (assert (= name lit)) and re-checks sat). A proven contract whose primary run is clean BUT any boundary variant threw flips verdict to "violation" with a specific error. Conversely, "all N boundary cases clean" is a confidence signal on the primary run. Description line now includes a summary: [+N boundary cases all clean] or [+N boundary cases: M clean, K threw]. Cap: 8 boundary runs per contract so we don't blow up on N-param functions. ## Judge-driven contract confidence (#2) judgeResults now persists verdict-derived confidence to the underlying contract via ContractStore.put, keyed off smt2 equality: Z3 proven + judge VALID -> confidence: "high" judge_note: VALID Z3 proven + judge INVALID -> confidence: "low" judge_note: INVALID Z3 violation + judge INVALID -> confidence: "high" judge_note: confirmed bug Z3 violation + judge VALID -> confidence: "low" judge_note: false positive These get written back to .neurallog/contracts/*.json. Downstream tooling (CI gate, report summary) can now distinguish judge-corroborated proofs and violations from raw Z3 verdicts. ## Empirical result Against the same 939-contract snapshot: Before: attempted 200, ran 7 After: attempted 200, ran 6 (constructor-skip change retired one) More importantly, the 6 runs now carry boundary annotations, and on disk: divide[13] violation: confidence=high (judge confirmed bug) safeDivide[8] proven: confidence=high (judge VALID) absDiff[18] proven: confidence=low (judge INVALID — encoding-inconsistent) cegar-flipped contracts: confidence=high (propagated from earlier CEGAR) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sn't confused by the literal 'VIOLATION' label Before: CEGAR-flipped claim kept 'VIOLATION: Subtraction Underflow' text, which the property-test judge read as asserting the bug exists — it kept flagging proven contracts as 'encoding-inconsistent' because the label said VIOLATION but Z3 proved unsat. After: 'PROVEN: Subtraction Underflow is prevented (CEGAR-refined encoding)'. Judge now correctly reads the claim as stating prevention and accepts the unsat proof as consistent. Migration: stale on-disk proven contracts with 'VIOLATION:' prefix rewritten once. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Opus agent reviewed v1 and flagged specific defects. Applied surgical fixes: Must-fix correctness - Example 3 was broken — Number.MIN_VALUE / 4 evaluates to 0 at expression time so the fixture guard fired before exercising the function. Replaced with a correct IEEE 754 ulp-collapse exhibit (distinct-positive-reals claim, (1 + EPSILON/2) === 1). Also models the harness-error vs encoding-gap prefix distinction in-code. Critical teaching gaps - Added Step 1 subsection on SMT negation direction: unsat on (assert (not goal)) means goal is proven; harness must probe the goal direction not the negated one. Not taught anywhere in v1. Opus predicted this as the single biggest gap. - Added Step 1.5 "Write the falsification hypothesis": force the LLM to commit, in a comment, to the specific runtime observation that would falsify the claim, before constructing any fixture. Operationalizes the honest-falsification disposition v1 asserted but did not cultivate. - Expanded Step 5 with explicit harness-error: vs encoding-gap: prefix distinction. Two categorically different failure kinds — the judge treats them opposite ways, and a mislabeled prefix poisons the signal. - Expanded A2 with counter-teaching: three legitimate sources for expected values (first-principle reimplementation, mathematical invariant, independent oracle) and one to avoid (the SMT claim itself). Anti-pattern sharpening - A7 rewritten to draw the crisp distinction from A2 (expected-from-claim vs fixture-contains-answer). - A10 replaced: silent-TS-coercion was not an LLM-gravitational failure mode. New A10 is "Wrong thing passed where a value was expected" (Promise-wrapping, callback confusion) — the actual LLM failure mode. Trap additions and cuts - Added: JS integer division (5/2 = 2.5 disagrees with SMT Int div), ulp collapse, sparse arrays, JSON.stringify drops undefined/BigInt/symbols, precondition vacuity, non-determinism (clocks/randomness/scheduling). - Cut: prototype pollution (production-security not encoding-faithfulness). - Demoted: cross-realm instanceof now a footnote under Prototype section. Decoration cuts (per Opus L5/L40/L470/L572) - Removed "Read that paragraph again" and drumroll meta-instructions. - Removed "steps below walk through this in the right order" connective. - Removed self-advertising "densest concentration of leverage" sentence. - Removed final moralizing paragraph duplicating the temptation warning from the thesis. 539 → 647 lines. Added content outweighs trimmed decoration by ~110 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Opus audited v2 and found v2 ready-with-fixes. Applied the four items: 1. Example 5 prefix regression (L351) — 'fixture error:' → 'harness-error:'. The regression contradicted the Step 5 teaching. 2. L54 conflation — tightened wording so the vacuity source is properly named (reproducing a counterexample Z3 ruled out, not 'preconditions rule out the negated case'). The preconditions/claim distinction was blurred. 3. Step 1.5 judge-audit claim softened — v2 said 'the judge reads and audits hypothesis vs assertions.' The judge in the runner doesn't yet do that specific audit, so the prompt was asserting unearned capability. Rewrote to frame the hypothesis as discipline for the author and the human reader, with self-inconsistency as the diagnostic signal. 4. New Step 2.5-equivalent — Precondition checks in-line. Opus's new top prediction was 'precondition-violating fixtures that still produce throw-free passes.' Added a short paragraph requiring inline harness-error: guards for each verifiable precondition. Opus's readiness assessment: ready-with-one-trivial-fix. v3 addresses that fix plus the structural concerns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/harness.ts (new) - synthesizeHarness(input, provider, model, projectRoot): assembles the prompt (loads prompts/harness_synthesis.md + appends task-specific claim/SMT/source) and calls the LLM. Parses response for one of: (a) a single ```javascript code block -> returns harness string (b) a // UNTESTABLE: <reason> line -> returns untestable reason (c) neither -> returns null on both (synthesis failed). - runHarness(harnessCode, fn, fnClass, timeoutMs): executes the harness in vm.createContext with functionUnderTest and functionUnderTestClass bound, plus a curated set of globals (Promise, Error, Array/Object/ Number/String/Boolean, JSON, Math, Date, Map/Set/WeakMap/WeakSet, Reflect, BigInt, timers). Wraps harness body in async IIFE so top- level await works. Races against a timeout. Classifies throws by message prefix: encoding-gap: / harness-error: / __HARNESS_TIMEOUT__ / unclassified. - HarnessCache: content-addressed at .neurallog/harnesses/<hash>.json by sha256(smt2 + '\n---\n' + functionSource). Avoids re-synthesis across runs when nothing changed. PropertyTestChecker - Track harnessCandidates during sync check(): when primitive param- mapping fails with "SMT model missing param" (complex-type case), the contract becomes a candidate for harness synthesis. Control-flow principles and require-failures do NOT become candidates — they're fundamentally un-fixable by harness synthesis. - New async synthesizeAndRunHarnesses() method: iterates candidates up to NEURALLOG_HARNESS_LIMIT (default 5), checks cache, synthesizes, runs. Records outcomes into harnessResults and mutates the contract's ProvenProperty confidence via the same channel as the judge: pass (proven) -> confidence: high encoding-gap -> confidence: low, judge_note: harness-encoding-gap untestable -> judge_note: harness-untestable other -> judge_note: harness-<kind> Persists via ContractStore.put on touched contracts. AxiomPhase - After judgeResults, awaits synthesizeAndRunHarnesses() when NEURALLOG_HARNESS_SYNTHESIS=1. Merges harnessResults into the property-test CheckResult array and recomputes proven/violation/error counts. Final summary line now distinguishes judge and harness LLM usage separately. Env gates NEURALLOG_HARNESS_SYNTHESIS=1 enable the synthesis path NEURALLOG_HARNESS_LIMIT=N cap synthesis calls (default 5) NEURALLOG_HARNESS_TIMEOUT_MS=N per-harness execution timeout (3000) NEURALLOG_HARNESS_MODEL=name synthesis model (default sonnet-4-6) tsc clean. Not yet run empirically against arithmetic.ts + full contract set — that's the next step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The first probe run hit ts-node strict-mode typecheck failures (TS7006) on EntailmentChecker.ts when loading via require() — because ts-node's default config tries to fully typecheck every loaded file, which fails when sibling modules aren't found in isolation. Transpile-only mode strips types, skips typechecking, and just emits runtime JS — which is exactly what the sandbox needs. - PropertyTestChecker.ensureTsNode now registers ts-node with transpileOnly: true (falling back to ts-node/register/transpile-only if the explicit register call fails). - scripts/harness-probe.ts: dev tool that runs the full synthesis + execution pipeline on a single contract. Usage: npx ts-node scripts/harness-probe.ts '<contract-key>' [--violation] Prints SMT, function source, raw LLM response, parsed harness, and execution outcome. Supports --no-cache and --no-synth flags. Empirical first run: contract: src/checkers/EntailmentChecker.ts/check[11] claim: VIOLATION: Empty Collection Loop at check (empty contracts array should not execute loop body) function takes: (contracts: Contract[], callGraph: Map<string, string[]>) — complex types primitive-mapping cannot touch synthesis: 26s, sonnet-4-6 emitted a 60-line harness that: - named the falsification hypothesis in a comment - wrote an inline harness-error: precondition guard - stubbed the this-binding (fakeThis) to satisfy the method's access to this.name - used two assertions with correct encoding-gap: vs harness-error: prefixes execution: 3ms in vm sandbox, verdict pass interpretation: the function's empty-collection path is empirically consistent with the SMT claim — Z3's unsat was faithful, runtime agrees, proof upgraded to observable corroboration. This is the class of contracts the whole property-test layer could not touch before today. It works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Private top-level functions in a CommonJS file (e.g. non-exported helpers in cli.ts — runDiff, printSummary, resolveProjectRoot) are invisible to require() because only module.exports survives the CJS wrapper closure. The harness runner therefore could not execute them even though the LLM synthesized perfectly reasonable harnesses for them. Fix: new src/moduleLoader.ts provides loadModuleWithPrivates(filePath). How it works: 1. Read the source file. 2. Tree-sitter enumerates every top-level declaration: function, generator, class, lexical/variable declarations, type/interface/enum. export_statement wrappers are unwrapped. 3. Append one exporter shim per name at end of source: try { if (typeof N !== 'undefined') exports['N'] = N } catch {} 4. Transpile the augmented source via ts.transpileModule (CJS/ES2022, esModuleInterop, skipLibCheck). 5. Instantiate a fresh Module, wire its filename and paths, then _compile the transpiled JS directly. This executes the file outside the normal require pipeline but with a real Module parent, so nested require() calls in the source resolve correctly. 6. Return mod.exports — which now contains every top-level binding the original source declared, not just those it chose to export. PropertyTestChecker.loadFunction and scripts/harness-probe.ts both try ordinary require() first (fast path, normal case), then fall back to loadModuleWithPrivates when the target binding isn't reachable via exports. The fallback is transparent to the rest of the pipeline — once a function is resolvable, the harness runner treats it identically regardless of how it was obtained. Empirical result — three previously-unloadable cli.ts contracts: printSummary[522] pass (claim corroborated) resolveProjectRoot[600] pass (claim corroborated) runDiff[265] encoding-gap (REAL FINDING) The runDiff result is substantive: the encoder modelled a default-value expression as ?? semantics (preserves falsy) while the code actually uses || (replaces "" with the default). The harness passed args=[""], observed ref="HEAD~1" where the claim said ref=="", and threw encoding-gap with full diagnostic context. This is the first genuine encoder-vs-runtime semantic gap the synthesis layer has caught on a non-toy function. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The harness synthesis pipeline flagged an encoding-faithfulness gap on src/cli.ts/runDiff[265]. Z3 proved the falsy-default-trap claim under the assumption that the default expression used nullish coalescing semantics (?? — only replaces on null/undefined), but the code actually used || (replaces on any falsy value, including ""). The harness passed args=[""], observed the runtime produce ref="HEAD~1" where the claim expected ref="" (empty preserved), and threw encoding-gap with full context. Minimum-change fix: use ?? instead of ||. Matches the encoder's assumed semantic. Empty-string args are now preserved — if a user really passes "" as a ref, git will surface the error itself rather than the CLI silently substituting HEAD~1. Post-fix probe (cache invalidated by source hash change, LLM re- synthesized a harness against the new code): VERDICT: pass First real-world fix driven by the harness synthesis layer. The before/after round-trip demonstrates the tool: broken state produces encoding-gap, code change produces pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-21T03:43:27Z

Important

Review skipped

Too many files!

This PR contains 284 files, which is 134 over the limit of 150.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 006d8133-57a1-4e5a-94e5-5fb1766b7154

📥 Commits

Reviewing files that changed from the base of the PR and between 655ee1e and f59e528.

📒 Files selected for processing (284)

.gitignore
.neurallog/classification.json
.neurallog/contexts/bundles.json
.neurallog/contracts/src/checkers/Checker.ts/extractDeclaredVars[52].json
.neurallog/contracts/src/checkers/Checker.ts/extractDeclaredVars[54].json
.neurallog/contracts/src/checkers/Checker.ts/extractDeclaredVars[56].json
.neurallog/contracts/src/checkers/Checker.ts/extractPreconditions[41].json
.neurallog/contracts/src/checkers/Checker.ts/extractPreconditions[46].json
.neurallog/contracts/src/checkers/Checker.ts/namespaceBlock[63].json
.neurallog/contracts/src/checkers/Checker.ts/runCheck[33].json
.neurallog/contracts/src/checkers/Checker.ts/runCheck[35].json
.neurallog/contracts/src/checkers/ConsistencyChecker.ts/check[12].json
.neurallog/contracts/src/checkers/ConsistencyChecker.ts/check[13].json
.neurallog/contracts/src/checkers/ConsistencyChecker.ts/check[18].json
.neurallog/contracts/src/checkers/ConsistencyChecker.ts/check[22].json
.neurallog/contracts/src/checkers/ConsistencyChecker.ts/check[34].json
.neurallog/contracts/src/checkers/ConsistencyChecker.ts/check[43].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[11].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[13].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[14].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[17].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[19].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[21].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[25].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[27].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[28].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[30].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[32].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[33].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[35].json
.neurallog/contracts/src/checkers/EntailmentChecker.ts/check[54].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[12].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[13].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[14].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[18].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[21].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[23].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[35].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[43].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[53].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[58].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[60].json
.neurallog/contracts/src/checkers/IndependenceChecker.ts/check[76].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[11].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[13].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[14].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[18].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[19].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[20].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[21].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[25].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[26].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[30].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[32].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[38].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[40].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[42].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[47].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[48].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[50].json
.neurallog/contracts/src/checkers/ReachabilityChecker.ts/check[60].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[11].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[13].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[14].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[18].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[19].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[20].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[21].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[25].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[26].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[29].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[33].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[35].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[38].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[39].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[41].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[44].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[49].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[51].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[57].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[59].json
.neurallog/contracts/src/checkers/StrengtheningChecker.ts/check[63].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[55].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[57].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[58].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[61].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[62].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[65].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[66].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[67].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[69].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/checkStateMutation[71].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[10].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[11].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[12].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[14].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[26].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[27].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[29].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[31].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[32].json
.neurallog/contracts/src/checkers/TemplateChecker.ts/check[33].json
.neurallog/contracts/src/cli.ts/buildSignalRegistry[551].json
.neurallog/contracts/src/cli.ts/buildSignalRegistry[557].json
.neurallog/contracts/src/cli.ts/fileIssues[543].json
.neurallog/contracts/src/cli.ts/fileIssues[544].json
.neurallog/contracts/src/cli.ts/fileIssues[546].json
.neurallog/contracts/src/cli.ts/findProjectRoot[608].json
.neurallog/contracts/src/cli.ts/findProjectRoot[610].json
.neurallog/contracts/src/cli.ts/getConfidence[418].json
.neurallog/contracts/src/cli.ts/getConfidence[419].json
.neurallog/contracts/src/cli.ts/getFlag[596].json
.neurallog/contracts/src/cli.ts/printSummary[521].json
.neurallog/contracts/src/cli.ts/printSummary[522].json
.neurallog/contracts/src/cli.ts/resolveProjectRoot[600].json
.neurallog/contracts/src/cli.ts/resolveProjectRoot[601].json
.neurallog/contracts/src/cli.ts/runDiff[265].json
.neurallog/contracts/src/cli.ts/runDiff[273].json
.neurallog/contracts/src/cli.ts/runDiff[294].json
.neurallog/contracts/src/cli.ts/runDiff[297].json
.neurallog/contracts/src/cli.ts/runDiff[308].json
.neurallog/contracts/src/cli.ts/runDiff[310].json
.neurallog/contracts/src/cli.ts/runExplain[319].json
.neurallog/contracts/src/cli.ts/runExplain[320].json
.neurallog/contracts/src/cli.ts/runExplain[326].json
.neurallog/contracts/src/cli.ts/runExplain[343].json
.neurallog/contracts/src/cli.ts/runExplain[356].json
.neurallog/contracts/src/cli.ts/runExplain[358].json
.neurallog/contracts/src/cli.ts/runExplain[359].json
.neurallog/contracts/src/cli.ts/runExplain[364].json
.neurallog/contracts/src/cli.ts/runExplain[375].json
.neurallog/contracts/src/cli.ts/runExplain[377].json
.neurallog/contracts/src/cli.ts/runExplain[378].json
.neurallog/contracts/src/cli.ts/runExplain[383].json
.neurallog/contracts/src/cli.ts/runHook[487].json
.neurallog/contracts/src/cli.ts/runHook[491].json
.neurallog/contracts/src/cli.ts/runHook[495].json
.neurallog/contracts/src/cli.ts/runOverride[505].json
.neurallog/contracts/src/cli.ts/runReport[405].json
.neurallog/contracts/src/cli.ts/runReport[413].json
.neurallog/contracts/src/cli.ts/runReport[418].json
.neurallog/contracts/src/cli.ts/runReport[419].json
.neurallog/contracts/src/cli.ts/runReport[434].json
.neurallog/contracts/src/cli.ts/runReport[438].json
.neurallog/contracts/src/cli.ts/runReport[440].json
.neurallog/contracts/src/cli.ts/runReport[450].json
.neurallog/contracts/src/cli.ts/runReport[455].json
.neurallog/contracts/src/cli.ts/runReport[468].json
.neurallog/contracts/src/cli.ts/runReport[471].json
.neurallog/contracts/src/cli.ts/runReport[472].json
.neurallog/contracts/src/cli.ts/runReport[473].json
.neurallog/contracts/src/contracts.ts/contractHash[50].json
.neurallog/contracts/src/contracts.ts/contractHash[51].json
.neurallog/contracts/src/contracts.ts/contractHash[52].json
.neurallog/contracts/src/contracts.ts/contractHash[53].json
.neurallog/contracts/src/contracts.ts/findStale[156].json
.neurallog/contracts/src/contracts.ts/findStale[157].json
.neurallog/contracts/src/contracts.ts/findStale[158].json
.neurallog/contracts/src/contracts.ts/findStale[169].json
.neurallog/contracts/src/contracts.ts/findStale[170].json
.neurallog/contracts/src/contracts.ts/findStale[171].json
.neurallog/contracts/src/contracts.ts/findStale[172].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[186].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[190].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[195].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[197].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[198].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[202].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[204].json
.neurallog/contracts/src/contracts.ts/formatForPrompt[205].json
.neurallog/contracts/src/contracts.ts/recordWitness[222].json
.neurallog/contracts/src/contracts.ts/recordWitness[224].json
.neurallog/contracts/src/contracts.ts/recordWitness[225].json
.neurallog/contracts/src/contracts.ts/remove[137].json
.neurallog/contracts/src/contracts.ts/signalKeyToPath[46].json
.neurallog/contracts/src/contracts.ts/signalKeyToPath[48].json
.neurallog/contracts/src/contracts.ts/signalKey[41].json
.neurallog/contracts/src/contracts.ts/signalKey[43].json
.neurallog/contracts/src/contracts.ts/writeSingle[214].json
.neurallog/contracts/src/contracts.ts/writeSingle[215].json
.neurallog/contracts/src/contracts.ts/writeSingle[216].json
.neurallog/contracts/src/contracts.ts/writeSingle[217].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/constructor[15].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/getHead[70].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/isGitRepo[57].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/parseDiff[100].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/parseDiff[102].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/parseDiff[104].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/parseDiff[105].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/parseDiff[106].json
.neurallog/contracts/src/git/DiffAnalyzer.ts/parseDiff[111].json
.neurallog/contracts/src/git/HookInstaller.ts/constructor[30].json
.neurallog/contracts/src/git/HookInstaller.ts/getHooksDir[104].json
.neurallog/contracts/src/git/HookInstaller.ts/install[35].json
.neurallog/contracts/src/git/HookInstaller.ts/install[42].json
.neurallog/contracts/src/git/HookInstaller.ts/install[44].json
.neurallog/contracts/src/git/HookInstaller.ts/isInstalled[95].json
.neurallog/contracts/src/git/HookInstaller.ts/isInstalled[98].json
.neurallog/contracts/src/git/HookInstaller.ts/uninstall[61].json
.neurallog/contracts/src/git/HookInstaller.ts/uninstall[66].json
.neurallog/contracts/src/git/HookInstaller.ts/uninstall[71].json
.neurallog/contracts/src/git/HookInstaller.ts/uninstall[78].json
.neurallog/contracts/src/git/HookInstaller.ts/uninstall[86].json
.neurallog/contracts/src/git/IgnoreFilter.ts/constructor[9].json
.neurallog/contracts/src/git/IgnoreFilter.ts/isIgnored[29].json
.neurallog/contracts/src/git/IgnoreFilter.ts/isIgnored[31].json
.neurallog/contracts/src/git/IgnoreFilter.ts/isIgnored[35].json
.neurallog/contracts/src/git/IgnoreFilter.ts/isIgnored[36].json
.neurallog/contracts/src/git/IgnoreFilter.ts/isIgnored[37].json
.neurallog/contracts/src/git/IgnoreFilter.ts/load[15].json
.neurallog/contracts/src/git/IgnoreFilter.ts/load[23].json
.neurallog/contracts/src/git/IgnoreFilter.ts/matches[44].json
.neurallog/contracts/src/git/IgnoreFilter.ts/matches[51].json
.neurallog/contracts/src/git/IgnoreFilter.ts/matches[52].json
.neurallog/contracts/src/git/IgnoreFilter.ts/matches[53].json
.neurallog/contracts/src/git/IgnoreFilter.ts/matches[54].json
.neurallog/contracts/src/git/IgnoreFilter.ts/matches[55].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[37].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[41].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[42].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[45].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[51].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[52].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[58].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[64].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[65].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[74].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[75].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[84].json
.neurallog/contracts/src/git/ProofDiff.ts/computeDiff[85].json
.neurallog/contracts/src/git/ProofDiff.ts/constructor[19].json
.neurallog/contracts/src/git/ProofDiff.ts/indexContracts[97].json
.neurallog/contracts/src/git/ProofDiff.ts/loadContractsAtRef[107].json
.neurallog/contracts/src/git/ProofDiff.ts/loadContractsAtRef[113].json
.neurallog/contracts/src/git/ProofDiff.ts/loadContractsAtRef[114].json
.neurallog/contracts/src/git/ProofDiff.ts/loadContractsAtRef[122].json
.neurallog/contracts/src/issues.ts/buildIssue[37].json
.neurallog/contracts/src/issues.ts/buildIssue[45].json
.neurallog/contracts/src/issues.ts/collectViolationIssues[94].json
.neurallog/contracts/src/issues.ts/collectViolationIssues[96].json
.neurallog/contracts/src/issues.ts/extractClaim[15].json
.neurallog/contracts/src/issues.ts/fileIssue[129].json
.neurallog/contracts/src/issues.ts/fileViolationIssues[150].json
.neurallog/contracts/src/issues.ts/fileViolationIssues[151].json
.neurallog/contracts/src/issues.ts/fileViolationIssues[157].json
.neurallog/contracts/src/issues.ts/fileViolationIssues[168].json
.neurallog/contracts/src/issues.ts/issueExists[116].json
.neurallog/contracts/src/issues.ts/issueExists[121].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/complete[12].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/complete[19].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/complete[21].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/complete[28].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/stream[38].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/stream[46].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/stream[48].json
.neurallog/contracts/src/llm/ClaudeAgentProvider.ts/stream[52].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/complete[32].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/complete[40].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/complete[41].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/complete[45].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/constructor[16].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/constructor[17].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/post[56].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/post[70].json
.neurallog/contracts/src/llm/OpenAIProvider.ts/post[71].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[22].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[23].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[24].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[25].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[37].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[38].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[39].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[40].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[42].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[47].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[48].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[58].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[61].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/complete[70].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/constructor[14].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/ensureClient[117].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/ensureClient[119].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/ensureSession[125].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/ensureSession[127].json

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Walkthrough

The pull request introduces a comprehensive property-testing and verification refinement framework. It adds new checker implementations, LLM-driven harness synthesis for runtime validation, CEGAR-based violation refinement, principle lifecycle management with retirement logic, and supporting infrastructure for dynamic module loading and SMT proof refinement. Multiple pipeline phases are updated to support async execution and post-processing workflows.

Changes

Cohort / File(s)	Summary
Configuration `.gitignore`, `.neurallogignore`	Updated ignore patterns to exclude `*.worktrees/` directories and refine example file filtering to include specific example files explicitly.
Metadata `.neurallog/derivation.json`, `.neurallog/graph.json`, `.neurallog/principles/addition-overflow.json`	Updated derivation timestamps, contract counts, and project root references; added new principle configuration for overflow detection.
Contract Artifacts `.neurallog/contracts/examples/arithmetic.ts/`, `.neurallog/contracts/examples/inventory.ts/`, `.neurallog/contracts/src/.../...json`	Added 20+ contract JSON artifacts recording proof/violation analysis results, SMT encodings, and witness models for arithmetic and inventory functions, plus updated existing contract metadata with normalized file paths and confidence annotations.
Example Code `examples/arithmetic.ts`	New module exporting five pure arithmetic functions: `safeDivide`, `divide`, `absDiff`, `clamp`, and `addPositive` with guards/edge-case handling for demonstration.
Documentation `prompts/harness_synthesis.md`, `prompts/invariant_derivation.md`	Added comprehensive harness synthesis guide defining empirical falsification methodology, error classification, and runtime validation patterns; updated invariant derivation to mandate `REASON:` tags in SMT-LIB blocks.
Checker Implementations `src/checkers/StrengthChecker.ts`, `src/checkers/PropertyTestChecker.ts`, `src/checkers/index.ts`	Introduced `StrengthChecker` for load-bearing assertion analysis and comprehensive `PropertyTestChecker` supporting property-based runtime validation, LLM judging, and harness synthesis with timeout isolation; updated exports.
Harness & VM Infrastructure `src/harness.ts`, `scripts/harness-probe.ts`	New harness synthesis and VM-sandboxed execution framework with timeout handling, cache support, and structured outcome classification; added probe script for contract validation and harness testing.
Core Utilities `src/judge.ts`, `src/refiner.ts`, `src/moduleLoader.ts`	Added LLM-driven verdict judges for reasoning/teaching-examples/runtime outcomes; introduced SMT block refinement via LLM correction; implemented dynamic TypeScript module loading with private export extraction.
Contract & Type Definitions `src/contracts.ts`, `src/verifier.ts`	Extended `ProvenProperty` and `Violation` with optional `reason`/`confidence`/`judge_note` fields; added path normalization; enhanced `VerificationResult` with reason extraction from SMT comments; added `extractReason` helper.
Principle Management `src/principles.ts`	Introduced `PrincipleStats` tracking usage/proof/violation counts; added stat persistence and principle retirement lifecycle; integrated teaching-example judging into classification.
Pipeline Updates `src/pipeline/AxiomPhase.ts`, `src/pipeline/DerivationPhase.ts`, `src/pipeline/Pipeline.ts`, `src/cli.ts`	Converted axiom phase to async; integrated `StrengthChecker` and `PropertyTestChecker` with conditional post-processing (LLM judging, harness synthesis); implemented CEGAR-based violation refinement; added principle validation and stats maintenance; propagated async execution throughout pipeline; updated CLI entrypoint guard.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Poem

🐰 Hops through contracts, checking proofs with care,
Harnesses woven from Z3's thorough air,
CEGAR refines each violation found,
While judges and principles dance all around,
Runtime whispers truth where logic once paled! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: harness synthesis, property-test checker, and five refinement features in the existing verification loop.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refinements

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR expands the verification pipeline with (opt-in) runtime property testing plus harness synthesis, and strengthens the existing LLM↔Z3 derivation loop with reasoning/judge metadata, CEGAR-style witness refinement, principle telemetry/auto-retirement, and error/unknown retries.

Changes:

Add REASON extraction + judge metadata plumbing across verifier/contracts, and integrate LLM-judge checks for teaching examples and runtime outcomes.
Introduce property-test + harness synthesis execution (sandboxed + cached) and a module loader that exposes private top-level bindings for testing.
Improve pipeline robustness/telemetry: principle usage stats + retirement, CEGAR refinement of violations, and one-shot refinement retry on Z3 error/unknown; fix runDiff ||→?? semantics.

Reviewed changes

Copilot reviewed 41 out of 44 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
src/verifier.ts	Adds `REASON` parsing and carries `reason` into `VerificationResult`.
src/refiner.ts	New LLM-based SMT block “repair” path for Z3 error/unknown blocks.
src/principles.ts	Adds principle usage stats + retirement and applies teaching-example judging.
src/pipeline/Pipeline.ts	Awaits Phase 5 execution; makes `runVerifyOnly` async.
src/pipeline/DerivationPhase.ts	Adds retry/refinement on Z3 error/unknown, principle telemetry, CEGAR refinement, and normalizes contract `file` to relative path.
src/pipeline/AxiomPhase.ts	Makes Phase 5 async; adds `StrengthChecker` + `PropertyTestChecker` and optional judge/harness flows.
src/moduleLoader.ts	New loader that transpiles and exports top-level private bindings for testing.
src/judge.ts	New judge layer for reasoning, teaching examples, and runtime outcomes.
src/harness.ts	New harness synthesis + `vm` execution + content-addressed cache.
src/contracts.ts	Normalizes stored contract file paths and adds confidence/judge metadata fields.
src/cli.ts	Awaits verify-only pipeline; fixes `runDiff` default ref semantics; guards `main()` for import use.
src/checkers/index.ts	Exports new `StrengthChecker` and `PropertyTestChecker`.
src/checkers/StrengthChecker.ts	New “load-bearing assert” strength check (but currently doesn’t match `Checker` interface).
src/checkers/PropertyTestChecker.ts	New property-test checker + harness synthesis plumbing (but currently doesn’t match `Checker` interface).
scripts/harness-probe.ts	Adds an end-to-end harness synthesis/execution probe script.
prompts/invariant_derivation.md	Requires `; REASON:` in SMT blocks.
prompts/harness_synthesis.md	Adds the large harness synthesis teaching prompt.
examples/arithmetic.ts	Adds primitive examples to exercise property testing.
.neurallogignore	Stops ignoring all `examples/**`; keeps most examples ignored except arithmetic demo.
.neurallog/principles/addition-overflow.json	Adds a new principle definition.
.neurallog/graph.json	Updates graph artifact (currently includes machine-local absolute paths).
.neurallog/derivation.json	Updates derivation artifact.
.neurallog/contracts/**	Updates/adds generated contract artifacts (some include absolute paths).
.gitignore	Ignores `.worktrees/`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -0,0 +1,11 @@
+{
+  "key": "examples/inventory.ts/releaseStock[33]",
+  "file": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts",


+export class StrengthChecker implements Checker {
+  readonly name = "strength";
+
+  check(contracts: Contract[]): CheckResult[] {
+    const results: CheckResult[] = [];
+


+export async function runHarness(
+  harnessCode: string,
+  fn: (...args: any[]) => any,
+  fnClass: any,
+  timeoutMs: number = 3000
+): Promise<HarnessOutcome> {
+  const sandbox: any = {
+    functionUnderTest: fn,
+    functionUnderTestClass: fnClass,
+    console: {


    const checkers: Checker[] = [
      new TemplateChecker(),
      new ConsistencyChecker(),
      new EntailmentChecker(),
      new ReachabilityChecker(),
      new StrengtheningChecker(),
      new IndependenceChecker(),
+      new StrengthChecker(),
+      new PropertyTestChecker(options.projectRoot),


@@ -0,0 +1,11 @@
+{
+  "key": "examples/inventory.ts/reserveStock[24]",
+  "file": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts",


@@ -0,0 +1,30 @@
+{
+  "key": "examples/inventory.ts/reserveStock[20]",
+  "file": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts",


@@ -0,0 +1,11 @@
+{
+  "key": "examples/inventory.ts/reserveStock[21]",
+  "file": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts",


+export class PropertyTestChecker implements Checker {
+  readonly name = "property-test";
+
+  private tsNodeLoaded = false;
+  private fnCache = new Map<string, ExtractedFn | null>();
+  private fileRequireFailures = new Set<string>();
+  private ctorParamsCache = new Map<string, CtorParamInfo[] | null>();
+  private ctorArgsCache = new Map<string, any[] | null>();
+  private limit: number;
+  private attempted = 0;
+  lastRun: PropertyTestContext[] = [];
+  private harnessCandidates: Array<{
+    contract: Contract;
+    prop: ProvenProperty | Violation;
+    source: "proven" | "violation";
+    skipReason: string;
+  }> = [];
+  harnessResults: CheckResult[] = [];
+
+  constructor(private projectRoot: string = process.cwd()) {
+    const raw = process.env.NEURALLOG_PROPERTY_TEST_LIMIT;
+    this.limit = raw ? Math.max(1, parseInt(raw, 10) || 10) : 10;
+  }
+
+  check(contracts: Contract[]): CheckResult[] {
+    if (process.env.NEURALLOG_PROPERTY_TEST !== "1") return [];
+


+  "root": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts",
+  "projectRoot": "/Users/tsavo/provekit/.worktrees/refinements",
  "files": [


@@ -0,0 +1,30 @@
+{
+  "key": "examples/inventory.ts/reserveStock[23]",
+  "file": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts",


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 655ee1e306

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-21T03:48:31Z

      contracts.push({
        key,
-        file: fn.filePath,
+        file: fn.relativePath,


Keep contract file path consistent with stale-prune lookup

This change stores new contracts with file: fn.relativePath, but stale-contract cleanup still filters with c.file === fn.filePath (absolute path) in the same phase. After this, previously stored entries for a function no longer match the pruning predicate, so obsolete contracts are not removed when signals move/disappear, and stale proofs/violations can continue to influence later verification runs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-21T03:48:31Z

+      verdict = outcome.kind === "threw" ? "violation" : "proven";
+      error = outcome.kind === "threw"
+        ? "violation reproduced: function threw on the Z3 witness — bug confirmed"
+        : "violation did not reproduce: function ran cleanly on the Z3 witness — possible false positive";


Derive violation replay verdict from claim, not exception state

In the violation replay path, any non-throwing execution is marked proven, regardless of whether the violated property is value-based (e.g., arithmetic underflow/overflow) rather than exception-based. That means real counterexamples that return a bad value but do not throw will be reported as false positives, which can incorrectly flip violations to proven and corrupt confidence/judge annotations.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-21T03:48:31Z

+
+  private parseZ3Model(text: string): Record<string, number | boolean> {
+    const model: Record<string, number | boolean> = {};
+    const modelRegex = /\(define-fun\s+(\S+)\s+\(\)\s+(Int|Real|Bool)\s+([^\s)]+(?:\s+[\d.\-]+)?)\s*\)/g;


Parse parenthesized Z3 Real literals in model extraction

The model parser regex/value handling only accepts very simple scalar tokens and a/b-style text, but Z3 commonly emits Real values as parenthesized expressions such as (/ 1 2) (and negative variants). Those entries are dropped, so Real parameters go unmapped and property-test execution is skipped or misclassified for contracts with real-valued models.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

prompts/invariant_derivation.md (1)

507-513: ⚠️ Potential issue | 🟠 Major

Update the worked examples to include ; REASON: too.

Line 512 makes ; REASON: mandatory, but the SMT-LIB examples above still omit it. Because this is a production prompt and src/verifier.ts only extracts reason text from ; REASON: comments, the examples should model the required format or the LLM may keep emitting reason-less blocks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prompts/invariant_derivation.md` around lines 507 - 513, The worked SMT-LIB
examples in invariant_derivation.md are missing the required `; REASON:` comment
that src/verifier.ts expects; update each example SMT-LIB block (the fenced
```smt2 examples and their accompanying explanatory comments) to include a `;
REASON: <one or two short sentences>` line that cites specific variables/lines
in the code (e.g., reference the signal/variable names used in the example and
the corresponding source line numbers mentioned in the example) and explains
concisely why the invariant should hold; ensure every example block also keeps
the existing tags `; PRINCIPLE: ...`, `; LINE: <number>`, includes
`(check-sat)`, and a brief comment about what sat/unsat means so verifier.ts can
extract the reason text correctly.

src/contracts.ts (1)

147-150: ⚠️ Potential issue | 🟠 Major

Root cause: normalize contract.file on write, not only on load.

normalizeContractFile is applied in loadFromDisk but put/writeSingle persist contract.file verbatim. That's why the inventory artifacts committed in this PR (reserveStock[20].json, releaseStock[36].json, and likely others) contain absolute /Users/tsavo/... paths while the arithmetic ones — presumably written from a clean checkout — are relative. New contracts created on any contributor's machine will keep leaking local paths into git. Normalize before writing so behavior is consistent regardless of how the in-memory contract.file got populated.
Proposed diff
   put(contract: Contract): void {
+    contract.file = normalizeContractFile(contract.file, this.projectRoot);
+    contract.key = signalKey(contract.file, contract.function, contract.line);
     this.contracts.set(contract.key, contract);
     this.writeSingle(contract);
   }
Optionally also re-derive contract.key here (as above) so a contract constructed with an absolute path can't get stored under a differently-shaped key than the one load produces.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/contracts.ts` around lines 147 - 150, The put/write path currently
persists contract.file verbatim; normalizeContractFile is only used in
loadFromDisk so local absolute paths get committed—update the put method (and/or
writeSingle) to call normalizeContractFile(contract) before writing and persist
the normalized contract.file, and also re-derive contract.key from the
normalized file (e.g., assign contract.key = deriveKeyFrom(contract.file) or the
existing key derivation logic) so stored keys match loadFromDisk behavior;
ensure you reference the normalizeContractFile, put, writeSingle, and
contract.key symbols when making the change.

src/pipeline/DerivationPhase.ts (2)

596-605: ⚠️ Potential issue | 🟠 Major

Persist reason and judge notes into newly built contracts.

verifyAll() and retry refinement now populate VerificationResult.reason/judgeNote, but buildContracts() drops them. That prevents the new REASON/judge metadata from reaching stored proven and violations entries.

Suggested fix

         if (v.z3Result === "unsat") {
-          proven.push({ principle: v.principle, principle_hash: pHash, claim, smt2: v.smt2 });
+          proven.push({
+            principle: v.principle,
+            principle_hash: pHash,
+            claim,
+            smt2: v.smt2,
+            reason: v.reason,
+            confidence: v.confidence,
+            judge_note: v.judgeNote,
+          });
         } else if (v.z3Result === "sat") {
-          violations.push({ principle: v.principle, principle_hash: pHash, claim, smt2: v.smt2, witness: v.witness, complexity: v.complexity, confidence: v.confidence });
+          violations.push({
+            principle: v.principle,
+            principle_hash: pHash,
+            claim,
+            smt2: v.smt2,
+            witness: v.witness,
+            complexity: v.complexity,
+            confidence: v.confidence,
+            reason: v.reason,
+            judge_note: v.judgeNote,
+          });
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/pipeline/DerivationPhase.ts` around lines 596 - 605, The buildContracts()
loop drops VerificationResult.reason and judgeNote when converting `toUse` items
into `proven`/`violations`; update the push objects inside the loop in
DerivationPhase.buildContracts (the for...of over `toUse`) to also copy
`v.reason` -> `reason` and `v.judgeNote` -> `judgeNote` (or use
`v?.reason`/`v?.judgeNote` if optional) so that the resulting arrays `proven`
and `violations` include those fields and the metadata from verifyAll()/retry
refinement is preserved.

282-285: ⚠️ Potential issue | 🟠 Major

Compare stale contracts using the same normalized path form.

New contracts are stored with fn.relativePath on Line 610, but stale cleanup still compares existing c.file to absolute fn.filePath. Previously generated contracts for removed signals in the same function can survive indefinitely.

Suggested fix

       const existingForFn = store.getAll().filter((c) =>
-        c.file === fn.filePath && c.function === fn.functionName && !newKeys.has(c.key)
+        c.file === fn.relativePath && c.function === fn.functionName && !newKeys.has(c.key)
       );

Also applies to: 610-610

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/pipeline/DerivationPhase.ts` around lines 282 - 285, The stale-contracts
filter compares existing contract.c.file to fn.filePath (absolute) while new
contracts use fn.relativePath; update the comparison in the existingForFn
computation to use the same normalized/relative path as new contracts (e.g.,
compare c.file to fn.relativePath or normalize both sides) so that contracts are
correctly detected as stale; adjust the predicate in the store.getAll().filter
that builds existingForFn (and any other similar cleanup that references
fn.filePath) to use fn.relativePath or a normalized path form consistently with
where newKeys/contracts are stored.

🟠 Major comments (18)

.neurallog/principles/addition-overflow.json-14-20 (1)

14-20: ⚠️ Potential issue | 🟠 Major

Guard patterns are too broad and can hide real violations.

Using generic tokens like "overflow" and especially "capacity" as guard indicators can mark unsafe additions as “guarded” even when no bound check exists. This risks systematic false negatives.

Proposed tightening

       "guardPatterns": [
-        "< MAX",
-        "<= MAX",
-        "Number.MAX_SAFE_INTEGER",
-        "overflow",
-        "capacity"
+        "< MAX",
+        "<= MAX",
+        "< totalCapacity",
+        "<= totalCapacity",
+        "Number.MAX_SAFE_INTEGER"
       ]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.neurallog/principles/addition-overflow.json around lines 14 - 20, The
guardPatterns list in addition-overflow.json is too broad: remove or tighten the
generic tokens "overflow" and "capacity" (and any similarly generic words) so
they don't cause false negatives; instead keep explicit, unambiguous checks such
as "Number.MAX_SAFE_INTEGER", "< MAX", "<= MAX" and add more precise patterns if
needed (e.g. function names like "checkOverflow", "ensureCapacity" or regexes
matching explicit bound checks like "\\b<=\\s*MAX\\b" or
"Number\\.MAX_SAFE_INTEGER") so the detector only treats genuine bound checks as
guards.

.neurallog/principles/addition-overflow.json-23-27 (1)

23-27: ⚠️ Potential issue | 🟠 Major

SMT template and teaching example model different bug classes.

The template only proves result > 9007199254740991, while the teaching example is about exceeding domain capacity (new_total > total_capacity). This inconsistency can make validation/judging unreliable for the intended rule.

Suggested direction

-  "smt2Template": "(declare-const {{base}} Int)\n(declare-const {{delta}} Int)\n(declare-const {{max_safe}} Int)\n(declare-const {{result}} Int)\n(assert (= {{result}} (+ {{base}} {{delta}})))\n(assert (> {{delta}} 0))\n(assert (> {{base}} 0))\n(assert (= {{max_safe}} 9007199254740991))\n(assert (> {{result}} {{max_safe}}))\n(check-sat)",
+  "smt2Template": "(declare-const {{base}} Int)\n(declare-const {{delta}} Int)\n(declare-const {{limit}} Int)\n(declare-const {{result}} Int)\n(assert (= {{result}} (+ {{base}} {{delta}})))\n(assert (> {{delta}} 0))\n(assert (>= {{base}} 0))\n(assert (> {{limit}} 0))\n(assert (> {{result}} {{limit}}))\n(check-sat)",

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.neurallog/principles/addition-overflow.json around lines 23 - 27, The SMT
template and the teachingExample target different properties; align them by
making the smt2Template assert overflow against a domain capacity like the
example (use symbols such as booked_seats, batch_size, total_capacity,
new_total) or alternatively change the teachingExample to match the current
large-number overflow template (result and max_safe = 9007199254740991); update
either smt2Template or teachingExample so both assert the same bug condition
(e.g., replace smt2Template to declare booked_seats, batch_size, total_capacity,
new_total and assert (= new_total (+ booked_seats batch_size)) and (> new_total
total_capacity) to match the teachingExample).

.neurallog/principles/addition-overflow.json-10-13 (1)

10-13: ⚠️ Potential issue | 🟠 Major

parentTypes field is unsupported and will be silently ignored at runtime.

The ASTPattern interface in src/principles.ts:8-16 does not include a parentTypes field. The pattern matcher in src/templates/TemplateEngine.ts checks only nodeType, operator, method, requiresParamRef, guardPatterns, pairMethod, and checkPaths—never parent types. This means the constraint scoping intended here (expression_statement or lexical_declaration parents only) will have no effect, causing the principle to match all binary_expression nodes with operator + regardless of parent context, broadening matches unexpectedly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.neurallog/principles/addition-overflow.json around lines 10 - 13, The JSON
uses a non-supported "parentTypes" constraint which is ignored because
ASTPattern (src/principles.ts) and the TemplateEngine matcher
(src/templates/TemplateEngine.ts) never handle it; either remove the field from
the JSON or add support: extend the ASTPattern interface to include parentTypes:
string[] and update the TemplateEngine's pattern-matching routine (the function
that currently checks
nodeType/operator/method/requiresParamRef/guardPatterns/pairMethod/checkPaths)
to also obtain the AST node's parent type and require parentTypes to contain it
(treat absent parent as failing the constraint unless parentTypes includes a
sentinel). Ensure you reference ASTPattern and the matching function in
TemplateEngine when implementing the check so parent-scoped patterns only match
when the immediate parent node type is in parentTypes.

.neurallog/graph.json-2-25 (1)

2-25: ⚠️ Potential issue | 🟠 Major

Do not commit a graph rooted in a private local worktree.

Lines 2, 3, 6, 14, and 20 point to /Users/tsavo/provekit/.worktrees/refinements, which is non-portable, leaks local path information, and will break consumers that try to resolve the graph outside that machine. Please regenerate this artifact from the repository checkout root, use repository-relative paths if supported, or omit this local run artifact from the PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.neurallog/graph.json around lines 2 - 25, The graph.json file contains
non-portable absolute local worktree paths in the "root", "projectRoot", and
"files[].path" fields (and possibly "files[].relativePath"); regenerate or
replace this artifact so it uses repository-relative paths (or omits the local
run artifact entirely) and ensure "root" and "projectRoot" no longer reference
/Users/tsavo/.worktrees/refinements; update or regenerate the graph from the
repository checkout root so all file paths are relative (or remove the file from
the PR if this artifact should not be committed).

prompts/harness_synthesis.md-234-263 (1)

234-263: ⚠️ Potential issue | 🟠 Major

Remove or correct this flawed example: it violates its own precondition, creating a template for false encoding-gap reports.

The harness assigns const a = 1 + Number.EPSILON / 2 and const b = 1. At runtime, both evaluate to 1 (verified: a === b), so the fixture passes equal values to areDistinct(a, b). This violates the stated precondition a ≠ b. Per the prompt's own guidance on precondition-vacuity (lines 95, 601-604), a fixture that fails its precondition should throw harness-error or UNTESTABLE, not unconditionally report encoding-gap. As written, this example teaches LLM-synthesized harnesses to misclassify precondition violations as encoding gaps, which falsely downgrades valid proofs.

Either remove this example, or rewrite it to:

Use a fixture that actually satisfies the precondition a ≠ b at runtime, or

Explicitly check the precondition at the start and classify a violation as harness-error/UNTESTABLE.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prompts/harness_synthesis.md` around lines 234 - 263, The harness assigns a =
1 + Number.EPSILON / 2 which rounds to 1 at runtime, violating the precondition
a ≠ b and causing a false "encoding-gap" report; fix by either (A) ensuring
runtime distinctness (e.g., set a to 1 + Number.EPSILON or any pair that
preserves a !== b at runtime) or (B) explicitly check the precondition up-front
and classify violations as harness-error/UNTESTABLE (e.g., if (a === b) throw
harness-error/return UNTESTABLE) before calling functionUnderTest or asserting
areDistinct, so the harness does not mislabel precondition failures as encoding
gaps.

.neurallog/contracts/examples/inventory.ts/releaseStock[34].json-3-3 (1)

3-3: ⚠️ Potential issue | 🟠 Major

Normalize contract file paths before writing to disk.

Multiple contract JSON artifacts in .neurallog/contracts/ carry absolute local paths, including at least 7 entries in inventory.ts/ (releaseStock[33,34,36], reserveStock[20,21,23,24]) and additional entries under src/checkers/Checker.ts/. The root cause: writeSingle() in src/contracts.ts:235 serializes contracts without calling normalizeContractFile(), while the constructor's load path (lines 111, 121) does normalize.

Impact: Breaks reproducibility across machines, pollutes diffs, and persists maintainer-specific absolute paths in committed artifacts.
🔧 Fix

At src/contracts.ts:235, normalize the file path before write:
  private writeSingle(contract: Contract): void {
    const filePath = join(this.contractsDir, signalKeyToPath(contract.key));
    const dir = dirname(filePath);
    mkdirSync(dir, { recursive: true });
+   contract.file = normalizeContractFile(contract.file, this.projectRoot);
    writeFileSync(filePath, JSON.stringify(contract, null, 2));
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.neurallog/contracts/examples/inventory.ts/releaseStock[34].json at line 3,
The serialized contract artifacts are being written with absolute local paths
because writeSingle() in src/contracts.ts does not call normalizeContractFile()
before writing; update writeSingle() to run the contract's file path through
normalizeContractFile() (the same normalization used in the class
constructor/load path) and use that normalized path when serializing/writing the
JSON so persisted artifacts contain only normalized, reproducible file paths.

.neurallog/contracts/examples/inventory.ts/reserveStock[20].json-3-3 (1)

3-3: ⚠️ Potential issue | 🟠 Major

Absolute user-specific path committed to the repo.

"file": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts" leaks a local filesystem path and will not resolve on any other machine or in CI. The arithmetic contracts in this same PR correctly store "file": "examples/arithmetic.ts". Root cause is in src/contracts.ts (ContractStore.put doesn't normalize before write) — see comment there. Once that's fixed, regenerate these inventory.ts/* artifacts so the committed files use repo-relative paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.neurallog/contracts/examples/inventory.ts/reserveStock[20].json at line 3,
ContractStore.put is writing absolute, user-specific file paths into contract
artifacts; update src/contracts.ts so ContractStore.put normalizes the stored
path to a repo-relative path (e.g., use path.relative(projectRoot, originalPath)
or a similar normalization function) before writing the JSON, and ensure any
helper that computes the stored "file" value (inside ContractStore.put) handles
undefined/empty inputs robustly. After changing ContractStore.put, regenerate
the inventory.ts artifacts so the committed files contain repo-relative paths
like "examples/inventory.ts" instead of an absolute user path.

src/pipeline/DerivationPhase.ts-485-498 (1)

485-498: ⚠️ Potential issue | 🟠 Major

Recompute newViolations after CEGAR mutates contracts.

allNewViolations is collected before cegarRefineViolations(). If CEGAR flips or refines a [NEW] violation, the returned DerivationOutput.newViolations can still reference the pre-CEGAR violation state.

Suggested fix

-    const output: DerivationOutput = {
+    const currentNewViolations = allContracts.flatMap((c) =>
+      c.violations
+        .filter((v) => v.principle?.toUpperCase().includes("NEW"))
+        .map((v) => ({
+          violation: {
+            smt2: v.smt2,
+            z3Result: "sat" as const,
+            principle: v.principle,
+            error: undefined,
+            witness: v.witness,
+            complexity: v.complexity,
+            reason: v.reason,
+          },
+          context: c.key,
+        }))
+    );
+
+    const output: DerivationOutput = {
       contracts: allContracts,
-      newViolations: allNewViolations,
+      newViolations: currentNewViolations,
       derivedAt: new Date().toISOString(),
     };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/pipeline/DerivationPhase.ts` around lines 485 - 498, The newViolations
value is computed before calling cegarRefineViolations so it can contain stale
violation objects that CEGAR later flips/refines; after calling
this.cegarRefineViolations(...) recompute the "new" violations from the mutated
contracts (allContracts) and assign that fresh array to
DerivationOutput.newViolations instead of using the precomputed
allNewViolations. Specifically, after cegarRefineViolations returns, iterate
over allContracts (or use the existing helper that produced allNewViolations) to
collect violations whose status is still NEW and use that result when building
the output object.

src/pipeline/DerivationPhase.ts-210-229 (1)

210-229: ⚠️ Potential issue | 🟠 Major

Fix metadata filtering when extracting the retry claim.

Line 215 strips ; before testing a regex that still expects ;, so PRINCIPLE, LINE, or REASON comments can be selected as the claim sent to the refiner.

Suggested fix

-        const claim = tv.smt2.split("\n").find((l) => l.trim().startsWith(";") && !/^;\s*(PRINCIPLE|LINE|REASON):/i.test(l.trim().replace(/^;\s*/, "")))?.trim().replace(/^;\s*/, "") || "(no claim)";
+        const claim = tv.smt2
+          .split("\n")
+          .map((l) => l.trim().replace(/^;\s*/, ""))
+          .find((l) => l && !/^(PRINCIPLE|LINE|REASON):/i.test(l)) || "(no claim)";

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/pipeline/DerivationPhase.ts` around lines 210 - 229, The claim-extraction
logic in the templateVerifications loop mistakenly removes the leading ";"
before testing the metadata regex, so lines like "; PRINCIPLE:" pass the
negative test and become claims; fix by applying the metadata regex to the
original trimmed line (with the leading ";") instead of the semicolon-stripped
string — e.g., in the find callback use l.trim().startsWith(";") &&
!/^;\s*(PRINCIPLE|LINE|REASON):/i.test(l.trim()) to filter metadata, then only
call .replace(/^;\s*/, "") when building the claim to strip the semicolon before
storing/sending to refineErrorBlock; update references around
templateVerifications, claim and refineErrorBlock accordingly.

src/principles.ts-209-210 (1)

209-210: ⚠️ Potential issue | 🟠 Major

Avoid reusing active IDs after retirement.

Removing retired principles from this.principles makes the length-based nextId() unsafe. For example, active P1,P3 after retiring P2 gives length + 1 === P3, colliding with an existing active principle.

Suggested fix

   nextId(): string {
-    const existing = this.principles.length + 1;
-    return `P${existing}`;
+    const maxNumericId = this.principles.reduce((max, p) => {
+      const match = p.id.match(/^P(\d+)$/i);
+      return match ? Math.max(max, parseInt(match[1]!, 10)) : max;
+    }, 0);
+    return `P${maxNumericId + 1}`;
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/principles.ts` around lines 209 - 210, Retiring a principle by filtering
it out breaks the length-based nextId() allocation and can cause ID reuse
collisions; instead, stop removing entries from this.principles and mark them
retired (e.g., add/ensure a retired boolean on the principle object) or
implement a monotonic id generator property (e.g., this.nextIdCounter) that
nextId() increments and returns; update the method that currently does
this.principles = this.principles.filter((x) => x.id !== id) to set the retired
flag on the matching principle (by id) and ensure nextId() uses the counter or
computes max existing numeric id + 1 to guarantee uniqueness.

src/principles.ts-126-130 (1)

126-130: ⚠️ Potential issue | 🟠 Major

Track all principle IDs, not only P<N> IDs.

recordUse() drops any non-P<number> principle IDs, but DerivationPhase can add slug IDs for pattern-gap principles. Those principles will never accumulate stats, so telemetry and auto-retirement silently miss them.

Suggested fix

-      .split(/[,+&\s]+/)
+      .split(/[,+&\s]+/)
       .map((s) => s.trim())
-      .filter((s) => /^P\d+$/i.test(s));
+      .filter((s) => this.principles.some((p) => p.id === s));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/principles.ts` around lines 126 - 130, The code in the principleTag
parsing drops any non-`P<number>` IDs so slug IDs produced by DerivationPhase
never get recorded; update the parsing in the principleTag handling (the code
that builds `ids`) so it does not filter out non-`P...` tokens — e.g. accept any
non-empty trimmed token (or explicitly allow both /^P\d+$/i and slug patterns)
instead of `.filter((s) => /^P\d+$/i.test(s))`, so that `recordUse()` receives
and accumulates stats for slug IDs as well.

src/checkers/PropertyTestChecker.ts-414-422 (1)

414-422: ⚠️ Potential issue | 🟠 Major

Don’t let the judge erase boundary failures it did not see.

judgeRuntimeOutcome receives only the primary outcome, but Line 436 can flip a violation caused solely by boundaryRuns back to proven. Include boundary failures in the judge input or guard this flip.

🐛 Proposed guard until the judge prompt supports boundary runs

     for (const ctx of this.lastRun) {
+      const boundaryFailures = ctx.boundaryRuns.filter((b) => b.outcome.kind === "threw");
       const verdict = await judgeRuntimeOutcome(
         {
           functionSource: ctx.functionSource,
           claim: ctx.claim,
@@
-      } else if (verdict.valid && ctx.result.verdict === "violation") {
+      } else if (verdict.valid && ctx.result.verdict === "violation") {
+        if (boundaryFailures.length > 0) {
+          ctx.result.error = ctx.result.error
+            ? `${ctx.result.error}; property-judge did not evaluate ${boundaryFailures.length} boundary failure(s)`
+            : `property-judge did not evaluate ${boundaryFailures.length} boundary failure(s)`;
+          confirmed++;
+          continue;
+        }
         ctx.result.verdict = "proven";
         ctx.result.error = `runtime threw but judge considered outcome consistent with claim: ${verdict.note}`;
         flipped++;

Also applies to: 436-439

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/checkers/PropertyTestChecker.ts` around lines 414 - 422, The judge call
is only given the primary outcome (via judgeRuntimeOutcome) so verdicts can
wrongly flip to proven when failures exist only in boundaryRuns; update the code
that iterates this.lastRun to either include boundaryRuns (or their aggregated
failure info) in the object passed to judgeRuntimeOutcome (e.g., add a
boundaryRuns or boundaryFailures field alongside functionSource, claim, smt2,
inputsSummary, outcome) or add a guard before accepting a "proven" verdict that
checks ctx.boundaryRuns (or ctx.boundaryFailures) for any failures and prevents
flipping to proven if any boundary failure is present; target the call to
judgeRuntimeOutcome and the logic that assigns/accepts verdict to ensure
boundary failures are considered.

src/checkers/PropertyTestChecker.ts-730-743 (1)

730-743: ⚠️ Potential issue | 🟠 Major

Avoid caching bound instance methods across property runs.

For instance methods, resolveCallable binds a method to one constructed object, and fnCache stores that bound function. Stateful methods can leak mutations between the primary run, boundary runs, and later contracts.

Cache FunctionInfo/constructor metadata instead, and create a fresh instance per test invocation. If a quick mitigation is needed, skip caching instance-bound methods:

🐛 Minimal mitigation

     if (fn) {
       const result = { fn, paramNames: info.paramNames, source: info.source };
-      this.fnCache.set(cacheKey, result);
+      if (!info.className || info.isStatic) {
+        this.fnCache.set(cacheKey, result);
+      }
       return result;
     }
@@
       if (fn) {
         const result = { fn, paramNames: info.paramNames, source: info.source };
-        this.fnCache.set(cacheKey, result);
+        if (!info.className || info.isStatic) {
+          this.fnCache.set(cacheKey, result);
+        }
         return result;
       }

Also applies to: 774-782, 798-801

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/checkers/PropertyTestChecker.ts` around lines 730 - 743, The code is
caching instance-bound methods (via resolveCallable) into fnCache, which can
leak state between runs; change the caching strategy so you do NOT cache bound
instance methods: in the resolution logic around resolveCallable,
loadModuleWithPrivates, cacheKey and fnCache.set, detect when the returned fn is
an instance-bound method (or when FunctionInfo indicates a constructor/instance
method) and either (a) store only the FunctionInfo/constructor metadata
(paramNames, source and constructor reference) in the cache and instantiate a
fresh object per test invocation before calling the method, or (b) as a minimal
mitigation skip fnCache.set entirely for instance-bound methods so each run
resolves a new bound instance; update all three spots that currently call
fnCache.set (the blocks using resolveCallable and loadModuleWithPrivates) to
follow this approach.

src/checkers/PropertyTestChecker.ts-506-506 (1)

506-506: ⚠️ Potential issue | 🟠 Major

Resolve the callable by contract line, not only by name.

contract.line is available but not used when extracting/loading the target. In files with duplicate method/function names, Line 1127 selects the first match, and Line 710 caches all same-name targets together.

🐛 Proposed direction

-      const info = this.extractFunctionInfo(absPath, cand.contract.function);
+      const info = this.extractFunctionInfo(absPath, cand.contract.function, cand.contract.line);

-      const extracted = this.loadFunction(absPath, cand.contract.function);
+      const extracted = this.loadFunction(absPath, cand.contract.function, cand.contract.line);

-  private loadFunction(filePath: string, fnName: string): ExtractedFn | null {
-    const cacheKey = `${filePath}::${fnName}`;
+  private loadFunction(filePath: string, fnName: string, line?: number): ExtractedFn | null {
+    const cacheKey = `${filePath}::${fnName}::${line ?? "any"}`;

-    const info = this.extractFunctionInfo(filePath, fnName);
+    const info = this.extractFunctionInfo(filePath, fnName, line);

-  private extractFunctionInfo(filePath: string, fnName: string): FunctionInfo | null {
+  private extractFunctionInfo(filePath: string, fnName: string, line?: number): FunctionInfo | null {

Inside extractFunctionInfo, only accept a name match when the target node spans the contract line:

-          if (name === fnName) target = node;
+          const spansContractLine =
+            line === undefined ||
+            (node.startPosition.row + 1 <= line && node.endPosition.row + 1 >= line);
+          if (name === fnName && spansContractLine) target = node;

Also applies to: 548-548, 709-713, 1111-1138

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/checkers/PropertyTestChecker.ts` at line 506, The code currently resolves
a callable only by name (calling extractFunctionInfo(absPath,
cand.contract.function)) which breaks when duplicates exist; modify
extractFunctionInfo to require that a name match also spans the contract.line
(use cand.contract.line) by checking the node's source location
(node.loc.start.line <= contract.line <= node.loc.end.line) before accepting it,
and update the caching/lookup that groups same-name targets to include the line
in its key (e.g. `${name}@${line}`) so callers that currently use
extractFunctionInfo (and any find-by-name helpers) return the specific node for
that contract.line instead of the first same-name match.

src/checkers/PropertyTestChecker.ts-388-394 (1)

388-394: ⚠️ Potential issue | 🟠 Major

Constrain fuzzy substring parameter matching.

Line 391 can map short params like id, i, or n to unrelated model keys such as valid or condition. That feeds the runtime checker arbitrary values and can flip verdicts incorrectly.

🐛 Proposed guard

     const substringMatch = keys.find(
       (k) => {
         const n = k.toLowerCase().replace(/_/g, "");
-        return n.includes(target) || target.includes(n);
+        if (target.length < 3 || n.length < 3) return false;
+        return (
+          n.startsWith(target) ||
+          n.endsWith(target) ||
+          target.startsWith(n) ||
+          target.endsWith(n)
+        );
       }
     );

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/checkers/PropertyTestChecker.ts` around lines 388 - 394, The fuzzy
substring match is too permissive and maps short targets like "id" or "n" to
unrelated keys; tighten the predicate in the keys.find used to produce
substringMatch (the block that computes n and returns n.includes(target) ||
target.includes(n)) so it only considers fuzzy matches when both the normalized
target and the normalized candidate key are reasonably long (e.g., length >= 3)
or when the target matches a whole token boundary; update the predicate to check
target.length >= 3 && n.length >= 3 (or a whole-word boundary test) before
returning true, then keep the existing return of model[substringMatch].

src/checkers/PropertyTestChecker.ts-269-275 (1)

269-275: ⚠️ Potential issue | 🟠 Major

Don't treat unresolved Promises as successful returns.

Line 271 records an async function's returned Promise as a normal value, and rejected Promises bypass this try/catch. This can mark async runtime failures as proven.

🐛 Minimal guard if async execution is not supported yet

     try {
       const value = fn(...args);
+      if (value && typeof value.then === "function") {
+        return {
+          kind: "threw",
+          error: "async function returned a Promise; property-test does not await async targets",
+        };
+      }
       return { kind: "returned", value: this.formatValue(value) };

Longer term, make check/runOne/callFunction async and await thenables.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/checkers/PropertyTestChecker.ts` around lines 269 - 275, callFunction
currently treats returned Promise objects as successful returns and doesn't
catch rejected promises; update callFunction to handle thenables: either
(preferred) make callFunction async and await the result (and propagate/format
rejections), and update callers check and runOne to be async as well, or
(minimal) detect thenables after calling fn(...args) (e.g., check typeof
value?.then === "function") and return a { kind: "threw", error: "unresolved
Promise" } (or similar) instead of treating it as a normal return; reference the
callFunction function and the check/runOne call sites so all async flows
properly await and handle promise rejections.

src/checkers/PropertyTestChecker.ts-326-333 (1)

326-333: ⚠️ Potential issue | 🟠 Major

Insert boundary pinnings before (check-sat).

The prop.smt2 always ends with (check-sat) (see extractSmt2Blocks in verifier.ts, lines 83–84). When the code appends pinnings after filtering out the goal assertion, the boundary assertions appear after (check-sat), causing Z3 to evaluate satisfiability before those pinnings are visible. This makes invalid boundary variants appear precondition-consistent.

Fix: Insert pinnings before the first (check-sat):

Proposed implementation

-    const block = lines
-      .filter((_, i) => i !== goalIdx)
-      .concat(pinnings)
-      .join("\n");
+    const preconditionLines = lines.filter((_, i) => i !== goalIdx);
+    const checkSatIdx = preconditionLines.findIndex((line) => line.trim() === "(check-sat)");
+    const blockLines = checkSatIdx >= 0
+      ? [
+          ...preconditionLines.slice(0, checkSatIdx),
+          ...pinnings,
+          ...preconditionLines.slice(checkSatIdx),
+        ]
+      : [...preconditionLines, ...pinnings, "(check-sat)"];
+    const block = blockLines.join("\n");

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/checkers/PropertyTestChecker.ts` around lines 326 - 333, In
PropertyTestChecker (the block that builds `block` from `lines`, `goalIdx`, and
`pinnings`), the code currently appends `pinnings` after filtering out the goal
which can place them after the file's final `(check-sat)`; change the assembly
so that `pinnings` are inserted before the first `(check-sat)` token in the
resulting SMT2 block (i.e., find the index of the first line that matches
`/^\s*\(check-sat\)/` and splice `pinnings` there, or if none exists append at
the end), then run Z3 with that reordered `block` so boundary assertions are
visible to the solver when it evaluates `check-sat` (references: variables
`lines`, `goalIdx`, `pinnings` in PropertyTestChecker).

src/checkers/PropertyTestChecker.ts-671-699 (1)

671-699: ⚠️ Potential issue | 🟠 Major

Handle Z3 Real values emitted as S-expressions.

Z3 commonly emits Real model values as S-expressions like (/ 1.0 2.0) and (- (/ 1.0 2.0)) for exact representation. The current modelRegex and negRegex patterns explicitly exclude parentheses via [^\s)]+, so they miss these nested forms. Real-valued parameters then vanish from the model, causing property tests to skip or misclassify.

Replace regex-only parsing with a small S-expression parser for the value portion of define-fun bodies.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/checkers/PropertyTestChecker.ts` around lines 671 - 699, The current
parseZ3Model uses regexes that fail to capture Real values emitted as
S-expressions (e.g. "(/ 1 2)" or "(- (/ 1 2))"), so update parseZ3Model and
parseValue to handle S-expression bodies instead of relying on `[^\s)]+`
matches: in parseZ3Model locate the define-fun name and sort, then extract the
full parenthesized body (handling nested parens) and pass that string to
parseValue; in parseValue accept both atomic numerics and S-exprs and implement
a tiny S-expression evaluator that recognizes (/ a b) -> a/b and unary - applied
to atoms or nested S-exprs, returning a number for Int/Real and boolean for Bool
(or undefined on parse errors) so Real rationals and negations are correctly
parsed into the model.

🟡 Minor comments (4)

.neurallog/contracts/examples/arithmetic.ts/safeDivide[7].json-11-13 (1)
11-13: ⚠️ Potential issue | 🟡 Minor

Keep reason code-cited and move CEGAR details to judge_note.

Line 13 describes the prior spurious encoding rather than why safeDivide satisfies the guard property. Since judge_note already captures cegar-flipped, the reason should be the short code-level explanation, e.g. that Line 7 returns immediately on b === 0, so Line 8 is unreachable on that branch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.neurallog/contracts/examples/arithmetic.ts/safeDivide[7].json around lines
11 - 13, Update the JSON so the "reason" field contains a short, code-level
explanation that Line 7 in safeDivide returns immediately when b===0 (so Line
8/code_after is unreachable on that branch) and keep the existing CEGAR
debugging details (the spurious encoding, missing constraints, and cegar-flipped
note) moved into or preserved in "judge_note"; reference the symbols
guard_condition, guard_returns and code_after_reached when explaining the
immediate return and unreachable code so it’s clear which assertions correspond
to the early-return semantics.
.neurallog/contracts/src/observations.ts/findSimilarByAST[77].json-14-14 (1)
14-14: ⚠️ Potential issue | 🟡 Minor

judge_note is truncated to 200 characters without a sentinel, causing incomplete reasoning.

The judge pipeline (src/judge.ts lines 68 and 169) truncates notes via .slice(0, 200), and the CEGAR phase (src/pipeline/DerivationPhase.ts line 855) applies the same limit. No truncation indicator (e.g., [...] or …) is appended, so downstream readers encounter incomplete sentences appearing as valid text. This degrades the diagnostic value of judge notes when developers investigate rejected proofs or reconciliation decisions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.neurallog/contracts/src/observations.ts/findSimilarByAST[77].json at line
14, The judge notes are being truncated with .slice(0,200) in the judge pipeline
(judge.ts) and in the CEGAR phase (DerivationPhase.ts) without a sentinel,
producing incomplete sentences; replace those inline slices with a single
utility (e.g., truncateWithEllipsis(text, max=200)) and use it from both places
so that when text.length > max it returns text.slice(0, max - 1) + '…' (or '
[...]') otherwise returns text unchanged; add/adjust unit tests for judge note
formatting to assert the sentinel is present when truncation occurs.
.neurallog/contracts/examples/inventory.ts/reserveStock[20].json-17-17 (1)
17-17: ⚠️ Potential issue | 🟡 Minor

judge_note is truncated mid-word.

The note ends with "no guard c" — looks like the judge output was sliced at a hard character limit. If there's a .slice(0, N) upstream on judge text, consider either raising the cap or slicing at a word boundary / appending "…" so persisted notes stay readable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.neurallog/contracts/examples/inventory.ts/reserveStock[20].json at line 17,
The persisted judge_note for reserveStock is being cut mid-word; locate where
the judge output is truncated into the "judge_note" field (the code that slices
the judge text before saving, used for entries like reserveStock/judge_note) and
change it to either increase the cap or perform safe truncation: slice at the
last word boundary under the limit and append an ellipsis ("…") if truncated.
Ensure the truncation logic always preserves whole words and uses a clear
ellipsis indicator when shortening the judge output.
src/refiner.ts-57-64 (1)
57-64: ⚠️ Potential issue | 🟡 Minor

Tighten the "no-op refinement" and fence checks.

A few small correctness/robustness gaps in this block:

revised === smt2 compares the trimmed revised block against the raw original. If the original has any leading/trailing whitespace (common when the block came from a template or a previous LLM response), a truly-identical refinement will slip through and trigger a redundant Z3 call. Compare trimmed-to-trimmed.

The fence regex accepts smt-lib / smtlib2 but the system prompt only instructs the model to use smt2 fences. That's fine, but also consider rejecting blocks that contain additional prose/backticks before the fence gets picked up — match returns the first, so a commentary fenced block earlier in the response would be taken as the refinement.

No guard against empty revised (regex can match with whitespace-only body if the model replies with an empty fence); (check-sat) check catches most cases but a single literal (check-sat) with no decls will also get handed to Z3. Cheap to add a minimum-length / declare-const sanity check.
Proposed diff
-  const m = response.text.match(/```(?:smt2|smt-lib|smtlib2)?\s*\n([\s\S]*?)```/i);
-  if (!m) return null;
-  const revised = m[1]!.trim();
-  if (!revised.includes("(check-sat)")) return null;
-  if (revised === smt2) return null;
+  const m = response.text.match(/```(?:smt2|smt-lib|smtlib2)?\s*\n([\s\S]*?)```/i);
+  if (!m) return null;
+  const revised = m[1]!.trim();
+  if (!revised.includes("(check-sat)")) return null;
+  if (!/\(declare-(const|fun)\b/.test(revised)) return null;
+  if (revised === smt2.trim()) return null;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/refiner.ts` around lines 57 - 64, The current refinement extraction can
wrongly accept non-matching or empty fences and performs a raw-string equality
check; update the logic around response.text matching and the revised/smt2
checks: ensure the regex for the code fence still targets smt2 but guard against
earlier commentary by verifying the matched fence body is non-empty and contains
meaningful declarations (e.g., require a declare-const or declare-fun token)
before proceeding, trim both sides when comparing revised and smt2 (use revised
=== smt2.trim()), and keep calling verifyBlock(revised) only after these guards
pass; search for the variables/expressions response.text, revised, smt2 and the
verifyBlock call to make the changes.

🧹 Nitpick comments (5)

src/moduleLoader.ts (2)
67-107: Consider memoizing by filePath to avoid repeated parse/transpile/compile.

loadModuleWithPrivates is invoked from PropertyTestChecker and the harness probe, potentially many times against the same target file. Each call re-reads the file, re-runs tree-sitter + ts.transpileModule, and re-executes the module's top-level code via _compile. In addition to being slow, re-executing top-level side effects (logger init, DB connection setup, etc.) per contract can perturb subsequent harness runs and inflate cost.
♻️ Suggested cache
+const moduleCache = new Map<string, any>();
+
 export function loadModuleWithPrivates(filePath: string, parentModule?: NodeModule): any {
+  const cached = moduleCache.get(filePath);
+  if (cached) return cached;
   const ts = getTs();
   if (!ts) throw new Error("typescript not available");
@@
   mod._compile(transpiled.outputText, filePath);
+  moduleCache.set(filePath, mod.exports);
   return mod.exports;
 }
If cache invalidation on file mtime matters for long-running processes, key on filePath + mtimeMs instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/moduleLoader.ts` around lines 67 - 107, Add a simple in-memory cache in
loadModuleWithPrivates keyed by filePath (optionally file mtime via
fs.statSync(filePath).mtimeMs for invalidation) to avoid repeated
read/parse/transpile/compile; on entry check the cache and return cached.exports
if present, otherwise proceed to
readFileSync/collectTopLevelNames/transpileModule, compile into a new Module
(mod) as currently done, store mod.exports (and relevant meta like mtimeMs if
used) into the cache, and return it—ensure keys reference loadModuleWithPrivates
and the cache lookup happens before calling getTs/ts.transpileModule and the
cache set happens after mod._compile completes.
40-60: Add abstract_class_declaration case and remove invalid variable_statement case from the switch statement.

handleDeclaration currently misses abstract_class_declaration and includes variable_statement, which is not a valid tree-sitter-typescript node type. Variables are parsed as lexical_declaration, not variable_statement, so that case will never match. The abstract_class_declaration type is distinct from class_declaration in tree-sitter-typescript and must be explicitly handled to avoid silently missing top-level abstract class declarations.
Suggested changes:
       case "function_declaration":
       case "generator_function_declaration":
       case "class_declaration":
+      case "abstract_class_declaration":
       case "interface_declaration":
       case "type_alias_declaration":
       case "enum_declaration":
         addName(node.childForFieldName("name"));
         break;
       case "lexical_declaration":
       case "variable_declaration":
-      case "variable_statement": {
Consider also adding cases for module_declaration and internal_module to capture TypeScript namespace and module declarations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/moduleLoader.ts` around lines 40 - 60, Update handleDeclaration in
moduleLoader.ts: remove the invalid "variable_statement" case, add
"abstract_class_declaration" alongside "class_declaration" so top-level abstract
classes are captured by addName(node.childForFieldName("name")), and ensure
variable handling remains under "lexical_declaration" and "variable_declaration"
with the existing variable_declarator loop. Optionally add "module_declaration"
and "internal_module" cases to capture TypeScript namespaces/modules using
addName on their name field as well.
src/refiner.ts (1)
9-64: Consider a budget / telemetry hook.

refineErrorBlock is now called from DerivationPhase on every error/unknown verification. There's no caller-visible retry cap or logging of the refined-vs-original SMT, so a persistent Z3 bug can silently cost one LLM call per contract per run, and when refinement succeeds there's no trace of what the model changed. Even a console.log behind verbose, or returning the original z3Error alongside the result, would make debugging much easier. Deferrable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/refiner.ts` around lines 9 - 64, refineErrorBlock currently makes an LLM
call on every Z3 error/unknown with no telemetry, retry cap, or visibility into
what changed; update refineErrorBlock to (1) accept or create a simple
budget/telemetry hook (e.g., a retry limit or a callback) and enforce a max
number of refinement attempts, (2) log or surface both the original smt2/z3Error
and the revised smt2 when a refinement succeeds (use the existing
provider.complete call and the extracted revised variable), and (3) propagate
the original z3Error alongside the returned RefineResult (or return a richer
object) so callers like DerivationPhase can record the LLM call and compare
changes; reference refineErrorBlock, provider.complete, the revised variable,
and verifyBlock to locate where to add the retry counter, logging/telemetry
callback, and modified return shape.
src/contracts.ts (1)
5-14: Normalization edge cases.

A couple of quiet footguns in normalizeContractFile:

The marker loop uses POSIX separators (/src/, /examples/, …). On Windows, an absolute path like C:\Users\...\examples\foo.ts will miss every marker and fall through to return file, leaving an absolute, backslash-separated path. signalKey only strips leading /, not drive letters or backslashes, so the resulting key will be broken. Normalize separators before the marker scan (file.replace(/\\/g, "/")).

When relative(projectRoot, file) returns a ..-prefixed path (file lives outside the project root), the fallback marker list is hardcoded. For monorepos with non-listed top-level dirs (e.g. services/, apps/, crates/), you silently keep the absolute path. Consider making the markers configurable, or at a minimum documenting the assumption.

isAbsolute(rel) on the result of relative() can never be true on POSIX; the check is harmless but dead. Minor.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/contracts.ts` around lines 5 - 14, The normalizeContractFile function
fails on Windows paths and can miss monorepo top-level dirs: first, normalize
separators at the top of normalizeContractFile by replacing backslashes with
forward slashes (e.g., work on a local normalized variable like fileNorm =
file.replace(/\\/g, "/")) and use that for the marker scan and for computing
relative paths; second, make the marker list configurable (e.g., extract the
hardcoded array into a shared DEFAULT_MARKERS constant and add an optional
markers parameter to normalizeContractFile or accept injected markers) so
non-listed top-level dirs (services/apps/crates) are handled; lastly, drop the
dead isAbsolute(rel) check since relative() won’t return an absolute path.
Ensure you update all uses of normalizeContractFile and any downstream logic
that expects the original path format to handle normalized forward-slash keys.
src/checkers/StrengthChecker.ts (1)
47-67: Parse full assertion ranges to handle potential multi-line assertions.

The code only detects single-line (assert ...) forms by checking parenthesis balance on individual lines. Multi-line assertions would be silently skipped, potentially causing incorrect load-bearing counts. While all current stored contracts use single-line assertions, the code should robustly handle multi-line cases since nothing enforces the single-line constraint.
Suggested fix direction
-    const assertIndices: number[] = [];
+    const assertRanges: Array<{ start: number; end: number }> = [];
     for (let i = 0; i < lines.length; i++) {
       const trimmed = lines[i]!.trim();
       if (!trimmed.startsWith("(assert")) continue;
       let depth = 0;
-      for (const ch of lines[i]!) {
-        if (ch === "(") depth++;
-        else if (ch === ")") depth--;
+      for (let j = i; j < lines.length; j++) {
+        for (const ch of lines[j]!) {
+          if (ch === "(") depth++;
+          else if (ch === ")") depth--;
+        }
+        if (depth === 0) {
+          assertRanges.push({ start: i, end: j });
+          i = j;
+          break;
+        }
       }
-      if (depth === 0) assertIndices.push(i);
     }
 
-    if (assertIndices.length === 0) {
+    if (assertRanges.length === 0) {
       return { loadBearing: 0, totalAsserts: 0, status: "ok" };
     }
@@
-    for (const idx of assertIndices) {
-      const without = lines.filter((_, i) => i !== idx).join("\n");
+    for (const range of assertRanges) {
+      const without = lines.filter((_, i) => i < range.start || i > range.end).join("\n");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/checkers/StrengthChecker.ts` around lines 47 - 67, The current loop in
StrengthChecker that builds assertIndices by checking balanced parentheses per
line misses multi-line assertions; update the logic to scan lines for assertion
start tokens (e.g., trimmed.startsWith("(assert")) then walk forward across
subsequent lines accumulating parenthesis depth until depth returns to zero to
record the full start and end line range (store ranges instead of single
indices), and when testing each assertion use those ranges to create the
`without` variant (filter or splice out all lines in the recorded range) before
calling `verifyBlock`; refer to the existing symbols `lines`, `assertIndices`
(convert to ranges), and `verifyBlock` to locate and replace the per-line
balance check and removal logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6febe2a1-2898-4574-a94e-8e88e2ff38f8

📥 Commits

Reviewing files that changed from the base of the PR and between a814845 and 655ee1e.

📒 Files selected for processing (44)

.gitignore
.neurallog/contexts/bundles.json
.neurallog/contracts/examples/arithmetic.ts/absDiff[18].json
.neurallog/contracts/examples/arithmetic.ts/absDiff[19].json
.neurallog/contracts/examples/arithmetic.ts/addPositive[31].json
.neurallog/contracts/examples/arithmetic.ts/addPositive[32].json
.neurallog/contracts/examples/arithmetic.ts/clamp[24].json
.neurallog/contracts/examples/arithmetic.ts/clamp[25].json
.neurallog/contracts/examples/arithmetic.ts/divide[13].json
.neurallog/contracts/examples/arithmetic.ts/safeDivide[7].json
.neurallog/contracts/examples/arithmetic.ts/safeDivide[8].json
.neurallog/contracts/examples/inventory.ts/releaseStock[33].json
.neurallog/contracts/examples/inventory.ts/releaseStock[34].json
.neurallog/contracts/examples/inventory.ts/releaseStock[36].json
.neurallog/contracts/examples/inventory.ts/reserveStock[20].json
.neurallog/contracts/examples/inventory.ts/reserveStock[21].json
.neurallog/contracts/examples/inventory.ts/reserveStock[23].json
.neurallog/contracts/examples/inventory.ts/reserveStock[24].json
.neurallog/contracts/src/llm/OpenCodeProvider.ts/formatDuration[152].json
.neurallog/contracts/src/observations.ts/findSimilarByAST[77].json
.neurallog/contracts/src/pipeline/AxiomPhase.ts/formatDuration[166].json
.neurallog/derivation.json
.neurallog/graph.json
.neurallog/principles/addition-overflow.json
.neurallog/report.json
.neurallogignore
examples/arithmetic.ts
prompts/harness_synthesis.md
prompts/invariant_derivation.md
scripts/harness-probe.ts
src/checkers/PropertyTestChecker.ts
src/checkers/StrengthChecker.ts
src/checkers/index.ts
src/cli.ts
src/contracts.ts
src/harness.ts
src/judge.ts
src/moduleLoader.ts
src/pipeline/AxiomPhase.ts
src/pipeline/DerivationPhase.ts
src/pipeline/Pipeline.ts
src/principles.ts
src/refiner.ts
src/verifier.ts

coderabbitai · 2026-04-21T03:52:20Z

+  "file": "/Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts",
+  "function": "reserveStock",
+  "line": 21,
+  "signal_hash": "0e15c5038ab1561c7cf41e475816378338b60c7d27747ac6cd99fe6eff1ec8be",


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find every contract artifact in the PR that stores an absolute 'file' path. fd -e json . .neurallog/contracts --exec sh -c ' f="$1" path=$(jq -r ".file // empty" "$f" 2>/dev/null) case "$path" in /*|[A-Za-z]:*) echo "ABSOLUTE: $f -> $path" ;; esac ' _ {}

Repository: TSavo/provekit

Length of output: 50371

🏁 Script executed:

fd -e json . .neurallog/contracts/examples/arithmetic.ts --exec sh -c ' f="$1" file_path=$(jq -r ".file // empty" "$f" 2>/dev/null) echo "File: $f" echo " Path: $file_path" ' _ {} | head -20

Repository: TSavo/provekit

Length of output: 946

Contract artifacts across the entire .neurallog/contracts/ directory contain absolute, machine-specific paths—a systemic issue affecting 256+ artifacts and poisoning signal_hash reproducibility.

This is not localized to inventory.ts. The initial verification script found absolute paths in over 256 committed contract artifacts spanning:

.neurallog/contracts/src/verifier.ts/ (60+)

.neurallog/contracts/src/principles.ts/ (20+)

.neurallog/contracts/src/signals/ (multiple)

.neurallog/contracts/src/templates/ (40+)

.neurallog/contracts/src/parser.ts/, observations.ts, pipeline/ (10+ each)

.neurallog/contracts/examples/inventory.ts/ (6, with path /Users/tsavo/provekit/.worktrees/refinements/examples/inventory.ts)

In contrast, .neurallog/contracts/examples/arithmetic.ts/ correctly stores relative paths like examples/arithmetic.ts.

Two concrete problems:

Non-portable & leaks filesystem layout. Any other checkout, CI system, or developer will have mismatching absolute paths and cannot locate or re-verify these artifacts.

signal_hash is not reproducible across checkouts. Per src/signals/Signal.ts (lines 27–42), computeSignalHash incorporates signal.file into the SHA256. These absolute paths will produce different hashes on any other machine, defeating hash-based dedup and change detection.

The root cause: contract path normalization was not applied when these artifacts were generated (likely across different checkouts/worktrees with different base paths).

Remediation options:

Regenerate all affected contract artifacts so file stores relative paths (repo-root-relative), or

Delete all stale artifacts and let the pipeline regenerate them post-normalization fix.

Prevent regression: Add a guard in the artifact writer (likely DerivationPhase or ContextPhase) that asserts !isAbsolute(file) before persisting any contract, to catch future violations early.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.neurallog/contracts/examples/inventory.ts/reserveStock[21].json around lines 3 - 6, The committed contract artifacts contain absolute machine-specific "file" paths which break Signal.computeSignalHash reproducibility; fix by normalizing contract file paths to repo-root-relative before persisting artifacts (e.g., convert signal.file to a relative path) in the artifact writer used by DerivationPhase/ContextPhase, then regenerate or delete stale artifacts so new ones use relative paths; additionally add an assertion/guard in the writer (in DerivationPhase or ContextPhase) that throws or fails the write if path.isAbsolute(signal.file) to prevent future commits of absolute paths.

coderabbitai · 2026-04-21T03:52:20Z

+  const sandbox: any = {
+    functionUnderTest: fn,
+    functionUnderTestClass: fnClass,
+    console: {
+      log: () => {},
+      warn: () => {},
+      error: () => {},
+    },
+    setTimeout,
+    clearTimeout,
+    setInterval,
+    clearInterval,
+    Promise,
+    Error,
+    TypeError,
+    RangeError,
+    SyntaxError,
+    Array,
+    Object,
+    Number,
+    String,
+    Boolean,
+    JSON,
+    Math,
+    Date,
+    Symbol,
+    Map,
+    Set,
+    WeakMap,
+    WeakSet,
+    Reflect,
+    BigInt,
+    isNaN,
+    isFinite,
+    parseInt,
+    parseFloat,
+    encodeURIComponent,
+    decodeURIComponent,
+    globalThis: {},
+  };
+  sandbox.globalThis = sandbox;
+
+  vm.createContext(sandbox);
+
+  const wrapped = `(async () => {\n${harnessCode}\n})()`;
+
+  let promise: Promise<any>;
+  try {
+    const script = new vm.Script(wrapped, { filename: "<harness>" });
+    promise = script.runInContext(sandbox);
+  } catch (err: any) {
+    return {
+      kind: "harness-error",
+      message: `harness failed to parse/initialize: ${err?.message?.slice(0, 200) || "unknown"}`,
+      harnessCode,
+    };
+  }
+
+  let timer: NodeJS.Timeout | null = null;
+  try {
+    await Promise.race([
+      promise,
+      new Promise((_, reject) => {
+        timer = setTimeout(() => reject(new Error("__HARNESS_TIMEOUT__")), timeoutMs);
+      }),
+    ]);


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Node.js vm module not a security mechanism Script.runInContext timeout synchronous code documentation

💡 Result:

The Node.js vm module is explicitly not a security mechanism and should not be used to run untrusted code. This is stated at the top of the official documentation. The timeout option in methods like vm.runInContext(code, context, options) or script.runInContext(context, options) specifies milliseconds before terminating execution, throwing ERR_SCRIPT_EXECUTION_TIMEOUT if exceeded. It applies to the synchronous execution of the script. For synchronous code (e.g., infinite while loop), timeout works reliably and terminates the execution. However, timeout does not cover asynchronous tasks. Promises, async functions, process.nextTick, queueMicrotask, setTimeout, etc., schedule tasks that run after the synchronous script execution completes, escaping the timeout. Example from docs: vm.runInNewContext( 'Promise.resolve.then( => { while(1); });', { console }, { timeout: 5 } ); The promise .then callback runs after timeout, indefinitely. To include microtasks in timeout scope, use microtaskMode: 'afterEvaluate' in options (added v14.6.0). Microtasks (Promise.then, async/await) run immediately after script evaluation, under timeout: vm.runInNewContext( 'Promise.resolve.then( => { while(1); });', { console }, { timeout: 5, microtaskMode: 'afterEvaluate' } ); This terminates the microtask loop. Note: Sharing promises across contexts with this mode can cause issues, as inner microtasks may not drain properly. Macro-tasks (setTimeout, etc.) use global queues shared across contexts, uncontrollable by vm timeout. Using timeout incurs performance overhead (new event loops/threads). Official Node.js docs (v25.9.0) confirm these behaviors and limitations.

Citations:

1: https://nodejs.org/api/vm.html

2: https://nodejs.org/api/vm.md

3: https://nodejs.org/docs/latest-v20.x/api/vm.html

4: https://beta.docs.nodejs.org/vm.html

5: https://nodejs.org/docs/v20.14.0/api/vm.html

Move harness execution to a separate, killable process with strict timeout and minimal environment.

The Node.js vm module is not a security mechanism. The code does not use the timeout option in script.runInContext(sandbox), leaving synchronous code unprotected. Any synchronous infinite loop or blocking operation before the first await will hang the host process—the Promise.race timeout is installed after runInContext() completes, so it cannot catch synchronous attacks.

Additionally, the sandbox exposes dangerous constructors (Error, Array, Object, Function via Reflect, Symbol, BigInt, etc.) and timers (setTimeout, setInterval), enabling escape vectors and potentially preventing process termination.

If keeping vm temporarily, use runInContext(..., { timeout, microtaskMode: 'afterEvaluate' }) to enforce synchronous bounds and cover microtasks. However, this does not protect against exposed constructors or macro-task abuse. The correct fix is spawning harnesses in isolated worker threads or subprocesses with a strict time limit and IPC-only communication for results.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/harness.ts` around lines 130 - 195, The current harness runs untrusted code via vm.Script.runInContext(sandbox) (wrapped/harnessCode) leaving synchronous execution unbounded and exposing dangerous globals (sandbox, setTimeout, setInterval, Error, Array, Object, Reflect, Symbol, BigInt, etc.); replace this by executing the harness in a separate, killable worker/subprocess that receives harnessCode via IPC, runs it in a minimal global environment, returns results/errors over IPC, and the parent enforces timeoutMs by forcibly terminating the worker if it exceeds the deadline; if you must keep vm for a short-term patch, change the run to script.runInContext(sandbox, { timeout: <small ms>, microtaskMode: 'afterEvaluate' }) and remove/omit dangerous entries from sandbox (setTimeout, setInterval, Error, Function/Reflect accessors, Symbol, BigInt, globalThis shims) so synchronous/microtask work is bounded until you implement the isolated worker/subprocess approach using the same timeoutMs and promise cancellation logic.

coderabbitai · 2026-04-21T03:52:20Z

+        const smtMatch = text.match(/```(?:smt2|smt-lib|smtlib2)?\s*\n([\s\S]*?)```/i);
+        if (!smtMatch) continue;
+        const revisedSmt = smtMatch[1]!.trim();
+        if (!revisedSmt.includes("(check-sat)")) continue;
+        if (revisedSmt === v.smt2) continue;
+
+        const { result, witness: newWitness } = verifyBlock(revisedSmt);
+        if (result === "unsat") {
+          const bare = v.claim.replace(/^VIOLATION:\s*/i, "").trim();
+          const flippedClaim = `PROVEN: ${bare} is prevented (CEGAR-refined encoding)`;
+          contract.proven.push({
+            principle: v.principle,
+            principle_hash: v.principle_hash,
+            claim: flippedClaim,
+            smt2: revisedSmt,
+            reason: extractReason(revisedSmt) || "cegar-refined precondition added",
+            confidence: "high",
+            judge_note: "cegar-flipped: violation became proof after tightening encoding",
+          });


⚠️ Potential issue | 🔴 Critical

Do not flip violations to proven on an unchecked LLM-refined block.

A revised block can become unsat by adding an over-tight or contradictory precondition. Since this path immediately moves the violation into contract.proven, a bad refinement can corrupt stored verification results.

At minimum, reject refinements that change the principle/line tags, are vacuous/trivial, or lack an auditable REASON; ideally keep these as low-confidence until a second mechanical/judge check validates the added precondition against source code.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/pipeline/DerivationPhase.ts` around lines 861 - 879, The code currently flips an LLM-refined SMT block to proven immediately after verifyBlock returns "unsat"; instead, add validation before pushing to contract.proven: after obtaining revisedSmt and newWitness from verifyBlock, ensure the refinement did not change principle/line tags (compare v.principle and v.principle_hash against any tags in revisedSmt or metadata), reject vacuous/trivial refinements (e.g., precondition that trivially contradicts or always false, or that removes original assertions), and require a non-empty auditable reason from extractReason(revisedSmt); if any check fails, do not push into contract.proven—either append to a separate low-confidence queue (e.g., contract.refinements or set confidence:"low" and judge_note indicating "awaiting second check") or mark for mechanical/judge re-validation instead of marking as proven.

… harness synthesis The full-run at LIMIT=10 exposed a pipeline bug: primitive property-test was calling functions with wrong-typed arguments. SMT can only declare Int/Real/Bool, so for a function like runDiff(args: string[]), the SMT would declare (declare-const args Int) and the matcher would obediently pass a JavaScript number to args.find(...), which then threw "args.find is not a function." That TypeError was labeled as a violation and entered the verdict counts as a false-positive finding. Fix: - extractFunctionInfo now returns paramTypes alongside paramNames, pulled from each parameter's type annotation via tree-sitter (child field "type"). Unannotated params get "unknown" and pass through. - New isCompatibleWithTsType(tsType, jsValue) helper. Allows the JS value if the declared TS type is: - any / unknown / missing - a primitive keyword (number, boolean, string) matching typeof - a numeric literal type and value is number - a string literal type and value is string - a union type where at least one member is compatible Rejects everything else (Contract[], Map<>, custom types, etc). - In runOne, after matchParamToModel resolves a value, we run isCompatibleWithTsType. On mismatch: push a harness-synthesis candidate with the specific reason and skip the primitive path. Empirical: the four functions that produced false positives in the LIMIT=10 run (runDiff, runHook, buildIssue, buildCallSiteContext) now skip with the new reason and become harness-synthesis candidates, where the LLM constructs proper fixtures for their actual types. The false positives are gone from the primitive-path verdicts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## CEGAR soundness — validate premises before flipping (CodeRabbit critical) DerivationPhase.cegarRefineViolations: when the LLM-refined block returns unsat, previously we accepted the flip (violation -> proven) unconditionally. A refined block can go unsat two ways: (a) genuine refinement — new preconditions make the code path inconsistent with the bug, which is what we want; (b) vacuous refinement — new preconditions are self-contradictory, so the block is unsat for reasons unrelated to the claim. Both look identical to Z3. The fix: after getting unsat on the refined block, strip the negated goal and re-check Z3 on the premises alone. If the premises are not sat, the refinement is vacuous and we keep the original violation. Only accept the flip when premises are independently consistent. Added private stripNegatedGoal(smt2) helper that drops the last (assert) line. Used here and available for future reuse. ## Checker interface signatures (Copilot) StrengthChecker.check and PropertyTestChecker.check now match the Checker interface: check(contracts: Contract[], _callGraph: Map<string, string[]>) TypeScript structural typing was letting this slide; the explicit signature is clearer and guards against future interface changes. ## Gitignore the build-derived .neurallog artifacts (Copilot, CodeRabbit) Contract JSONs, graph.json, report.json, derivation.json, contexts/, observations/, harnesses/, classification.json are all machine-specific artifacts (absolute paths, timestamps, per-machine hash indices) and should never have been versioned. Moved to .gitignore. Kept .neurallog/principles/ in git — the learned knowledge base is the one artifact intended to be versioned. 962 files untracked from the working tree. ## Violation-replay verdict on clean runs (Codex P1) Previously: violation replay + clean runtime -> verdict "proven" with error: "possible false positive." That auto-corroborates a proof on evidence that was neither a corroboration nor a refutation. Now: violation replay + clean runtime -> verdict "error" with a clear message saying the replay did not reproduce and the judge must decide whether the claim is a false positive (flip to proven) or the encoding misrepresents the semantic (flip to violation). The judge already handles both directions; only the initial verdict changes. ## vm sandbox trust-boundary comment (CodeRabbit) Added a doc block above runHarness explaining that vm.createContext is a containment mechanism, not a security boundary, and that the trust boundary is "same as running the tool on this repo" — acceptable for the intended use case of testing the user's own source. ## Parenthesized Z3 Real literal parsing (Codex P2) parseZ3Model previously missed sexp-form values like (define-fun x () Real (/ 5 2)) (define-fun x () Real (- (/ 5 2))) because the regex only handled plain literals and (- N). Added parseSexpValue recursive handler for (- x) and (/ a b), and a splitSexpArgs helper for parenthesis-balanced tokenization. ## StrengthChecker perf gate (Copilot) StrengthChecker runs Z3 once per top-level (assert) per proven property — O(N × K) calls. Gated behind NEURALLOG_STRENGTH_CHECK=1 so AxiomPhase doesn't pay the cost unless the operator opts in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ti-model Z3 Four improvements landed together because they share test surface. ## #17 Harness code audit before execution New judgeHarnessCode in judge.ts: audits the synthesized JavaScript harness *before* it runs. Single LLM call that reads the harness, the claim, the SMT block, and the function source, then decides whether the harness is a genuine empirical test or rigged by construction. The audit prompt names six specific rigging mechanisms: 1. Function re-implementation (testing a copy instead of functionUnderTest) 2. Tautological assertion (expect(true).toBe(true) shape) 3. Claim-restated expected value (transliterated from the claim) 4. Fixture contains the answer (function reads .expected and passes through) 5. encoding-gap thrown on orthogonal conditions (TypeErrors, null derefs) 6. Hypothesis-assertion drift (comment and checks don't match) Response format: VALID or INVALID with a one-line critique. Bias toward INVALID — a falsely-validated rigged harness poisons the whole corroboration signal. Wired into synthesizeAndRunHarnesses between synthesis and execution. Rejected harnesses become harness-error outcomes with the auditor's critique. Cached via HarnessCache.putAudit so the audit isn't re-run on subsequent verify invocations (same source → same audit). Env: NEURALLOG_HARNESS_AUDIT=0 disables (default on). Model via NEURALLOG_AUDIT_MODEL or NEURALLOG_JUDGE_MODEL (haiku by default). This addresses Opus's top prediction: that the LLM would emit harnesses technically executing but not actually testing the claim. 9/10 pass rate on the previous full run couldn't distinguish corroboration from rigging. Now it can. ## #18 Judge verdict cache New src/judgeCache.ts persists judgeRuntimeOutcome verdicts at .neurallog/judge-cache/<hash>.json keyed by sha256 of (smt2, functionSource, inputsSummary, outcomeSig, source). judgeResults now checks the cache before calling the LLM. Cache hits skip the ~20-30s per-entry LLM round. Stats surface the hit count in the AxiomPhase summary line: property-test judge: 10 judged (7 cache hits), 2 flips, 3 confirmed Gitignored like the harness cache — build-derived, not versioned. ## #19 Parallel harness synthesis synthesizeAndRunHarnesses previously looped candidates sequentially: ~75-95s per LLM call × 10 candidates ≈ 15 minutes. No serial dependency between candidates — synthesis of contract A doesn't block contract B. Refactored the loop into a two-phase pipeline: Phase 1 (parallel, LLM-bound): synthesize + audit each candidate concurrently via a parallelMap helper. Produces a prepared record per candidate (either "ready" with harness and info, or "skip" with the outcome kind and message). Phase 2 (sequential, in-process): iterate prepared records, run the harness in the vm sandbox one at a time. Execution is fast (<100ms typical) and shares the event loop; parallelism here would only add contention. NEURALLOG_HARNESS_CONCURRENCY=N gates phase 1 (default 3). With N=3, 10 candidates drop from ~15 min to ~5 min. ## #20 Multi-model Z3 sampling For each proven contract, previously we took one Z3 (get-model) witness. Adversarial bugs often hide in corners the solver's first-satisfying input doesn't touch. New extractModels(smt2, k) asks Z3 for k distinct witnesses by iteratively adding (assert (or (not (= x v_prev1)) (not (= y v_prev2)) ...)) for each prior witness in the query. Stops early if Z3 returns unsat (no more distinct models exist) or produces a duplicate. In check(), the proven loop now iterates up to NEURALLOG_Z3_SAMPLES models per proven property (default 1, back- compatible). Each sample is a separate attempt counted against the LIMIT. runOne accepts a modelOverride arg so the sampling happens in the outer loop, not inside runOne. Combined with harness synthesis and the judge, each proven claim now gets K independent empirical checks instead of one. A claim that holds under Z3's first witness but fails at a different corner of the input space is now detectable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nvalidation ## #22 Non-determinism handling Two-part fix. Primitive path (src/checkers/PropertyTestChecker.ts): detectsNonDeterminism() scans the function source for references to Math.random, Date.now, performance.now, new Date(), process.hrtime, crypto.randomUUID/randomBytes/getRandomValues. When any of these are found, the contract is skipped with a specific reason and routed to the harness synthesis candidate queue — the LLM can consciously stub the nondeterminism source, the primitive path cannot. Harness sandbox (src/harness.ts): runHarness now installs deterministic default stubs: Math.random -> constant 0.5 (via Object.create(Math) + defineProperty) Date.now -> constant 0 (via Proxy on Date with custom get) Both preserve constructor behaviour and all non-stubbed members; only the time/randomness surface is pinned. Harnesses that need real time or real randomness can override inside their own code. The rationale: a claim proven by Z3 is about the function's pure logic. Any runtime-specific value the function pulls from outside that logic (clock, RNG) is a variable the proof didn't constrain. Pinning it to a constant makes the empirical check deterministic without loss of generality for the claim's scope. ## #21 Transitive-source cache invalidation HarnessCache key was sha256(smt2 + functionSource). If the function calls a helper whose implementation changes, the harness becomes stale but the cache hits. Fix: cache key now optionally includes depsSource — the concatenated source of the function's containing file plus depth-1 relative imports. New moduleLoader.collectTransitiveSource(filePath, projectRoot, depth=1) handles the collection. PropertyTestChecker memoizes depsSource per absPath within a single synthesizeAndRunHarnesses invocation so each file is read at most once. Pass the depsSource to cache.get / cache.put / cache.putAudit. Existing caches keyed without depsSource still resolve (same hash when depsSource is absent) — no migration needed, next write attaches dep info. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Added vitest (one dev dep) and 61 tests across two files. Scope: src/checkers/PropertyTestChecker.test.ts (53 tests) - isCompatibleWithTsType: 27 case-table entries covering primitives, literal types, unions with null/undefined, Contract[]/Map<> rejection, any/unknown/empty pass-through. - matchParamToModel: exact, case-fold, underscore-strip, suffix-strip (_condition, _guard), substring fallback, preference ordering. - isControlFlowModel: all-control-keys → true, mixed with param → false, empty → false, result_consequent/result_alternate recognition. - parseZ3Model: plain Int, negative via (- N), Real fraction via (/ a b), negated fraction via (- (/ a b)), Bool, multiple bindings. - detectsNonDeterminism: Math.random / Date.now / performance.now / new Date() / crypto detection + clean case. src/moduleLoader.test.ts (8 tests) - collectTopLevelNames across function_declaration, async functions, class_declaration, lexical declarations (const/let/var), type aliases and interfaces, export_statement wrappers, dedup, empty source. Bug found + fixed while writing tests: addName() only accepted nodes of type 'identifier' or 'property_identifier', but tree-sitter-typescript emits 'type_identifier' for class / interface / type_alias name fields. classes/types/interfaces were silently skipped by collectTopLevelNames — which meant a fallback loadModuleWithPrivates on a file containing classes would fail to expose them. Now fixed: type_identifier added to the accepted node types. vitest.config.ts restricts test collection to src/**/*.test.ts so the compiled dist/ output isn't picked up (vitest can't import itself in CJS-compiled .js). test script wired into package.json: `npm test`. Note: larger integration-style tests for runHarness (vm sandbox behavior, timeout enforcement, prefix classification) and for the CEGAR premise-consistency check are not included in this pass — they require Z3 / LLM mocks or integration harness and are a separate investment. The helpers tested here are the pure-logic surface where regressions silently slip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New src/lessons.ts: LessonStore persists each property-judge-flagged encoding-gap as a lesson entry at .neurallog/lessons/<hash>.json with: - contractKey: where the gap was observed - claim: the textual claim the encoder wrote - judgeNote: the property-judge's critique describing the mismatch - principleId: which principle the encoder was applying (if derivable) - addedAt: timestamp formatForPrompt(limit) renders the most recent N lessons as a "Known Encoding Gaps — Avoid These Mistakes" section suitable for injection into encoder prompts. Wired into two points in the encoder pipeline: 1. DerivationPhase pattern-gap synthesis (the prompt that asks the LLM to produce a new atomic principle with smt2Template + astPatterns when N unverified signals cluster). LessonStore.formatForPrompt() is appended to the prompt so the encoder sees specifically which SMT encodings it got wrong before. 2. principles.classifyAndGeneralize (the prompt that generalises a [NEW]-tagged violation into a reusable principle). Same injection; projectRoot now an optional parameter so the lesson store can be opened. Feedback loop flow: harness runs -> judge flags encoding-gap -> lesson stored -> next encoder invocation reads lessons -> encoder avoids the mistake Storage gitignored. On projects that regularly encounter new encoding patterns, the lesson store accumulates into a de-facto institutional memory about which SMT abstractions have been empirically unfaithful. 61 tests still passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## #24 Runtime observation seeding (Daikon-style) New src/runtime.ts exposes two APIs for users who want to seed the verification pipeline with real values their code sees in production or under tests. observe(signalKey, values, projectRoot?) Instruments a call site. Values get sanitized (depth-limited, non-finite numbers tagged, references flattened, Maps/Sets/Errors serialized to tagged wrappers) and appended as NDJSON to .neurallog/witnesses/<slug>.ndjson. Hard-capped at 16KB per line and 10MB per file. Opt-in via NEURALLOG_OBSERVE=1 — zero-cost no-op in production without the gate. readObservations(signalKey, projectRoot?) Returns the persisted Observation array for a signal. listObservedSignals(projectRoot?) Enumerates signals with recorded witnesses. PropertyTestChecker uses readObservations as an alternative input source when NEURALLOG_USE_RUNTIME_OBSERVATIONS=1. For each proven contract, before falling back to Z3-model sampling, it queries the runtime observation store for the function's signal. If real values exist, it extracts their primitive subset (number/boolean only — SMT's domain) and tests the function with those directly. Falls through to Z3 models when no observations are available. The architectural shift: inputs now come from three sources, in order of precedence: 1. NEURALLOG_USE_RUNTIME_OBSERVATIONS=1 → real values the code saw 2. Z3 (get-model) with optional multi-sampling 3. LLM-synthesized harness for contracts that need complex types Gives operators a dial from "abstract proof" to "empirical proof conditioned on real usage." Daikon's core insight, adapted. ## #26 Triangle oracle — test suite as prompt context New src/testOracle.ts detects the project's test framework and finds existing tests referencing the target function. Used as advisory context in harness synthesis — the LLM sees how the project's own tests exercise the function and can align its harness accordingly. Detection (detectTestFramework): Recognizes vitest, jest, mocha, ava, tap, node-test. First from dependencies/devDependencies in package.json, then from scripts.test patterns. Returns null when no test infrastructure exists. Discovery (findTestsForFunction): Walks __tests__/, test/, tests/, spec/ directories plus project root, matching .test.ts/.spec.ts/.test.js/.spec.js files. For each file containing a whole-word match on the function name, extracts test blocks (describe/it/test) that reference the function. Returns structured TestReference records with file, line range, block snippet, and test name. Integration (SynthesisInput.testOracleContext): When harness synthesis runs for a candidate, PropertyTestChecker calls findTestsForFunction and appends the formatted references to the synthesis prompt under "Existing tests for this function (oracle context)". The synthesis prompt now instructs the LLM to use the tests as calibration without copying assertions verbatim (the anti-A2 guardrail is preserved). No test execution — the tests are read-only oracle context. Running the project's test runner with specific inputs and cross-referencing outcomes is a separate architectural step; this is the minimum surface needed to get harnesses conditioning on existing evidence. Unit tests: 9 new tests in src/testOracle.test.ts. Uses tmp dirs to exercise framework detection and test discovery without polluting the worktree. 70 tests total pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Step #1 of the substrate-types wiring sequence (per admissibility-spine #796 next phase). Adapts every Rust caller of compose_chain_contracts to the new Result<_, CompositionRefusalMemento> signature landed in #827. Call sites updated: - libprovekit/src/ffi.rs:390 — FfiError::ComposeRefused carries the refusal memento, formats CID + serialized refusal artifact - provekit-cli/src/cmd_compose.rs:343 — compose RPC maps refusal into refusal_cid + full refusal JSON payload - provekit-walk/src/chain.rs:85 — walk method-chain wrapper now returns Result<Option<_>, CompositionRefusalMemento> Integration tests added: - ffi_smoke::rust_jcs_entrypoint_impure_input_returns_stable_refusal_cid - compose_rpc_smoke::compose_rpc_impure_input_returns_refusal_memento_with_stable_cid - compose_method_chain_refusal::compose_method_chain_impure_input_returns_stable_refusal_cid Each test exercises an impure-input failure and asserts the refusal CID is non-empty AND deterministic across two runs (per spec §0.1 determinism guarantee). Pre-existing failures (NOT introduced here): - provekit-bridgeworks::smoke::checked_add_exhibit_passes (also fails on clean main without this PR applied) - provekit-cli::cli_surface::lift_zig_shows_production_composes_but_unit_tests_conflict (Zig .zig-cache PermissionDenied, environment flake; matches the ETXTBSY/zig flake notes in memory #72/#82) Co-authored-by: T Savo <agentwopr@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…838) Step #2 of the substrate-types wiring sequence. provekit prove now persists content-addressed ProofRunMemento + StageReceipt artifacts for each verifier run under .provekit/runs. Per stage: captures input_cids (sorted ascending), output_cids (sorted), refusal_cids (sorted), diagnostics (preserve order), started_at/finished_at ISO-8601 timestamps, verdict in {ok, warned, refused, skipped}, stage_name, kind="stage-receipt", schemaVersion="1". header.cid via StageReceipt::recompute_header_cid(). Per run: aggregates stage_receipt_cids in execution order (preserve order in CID input per spec §2.1), input_artifact_cids sorted, output_artifact_cids sorted (set semantics), proof_envelope_cid + link_bundle_cid + plugin_registry_cid + verifier_pipeline_cid (deterministic placeholder blake3_512(JCS(stage_vocabulary)) until VerifierPipelineMemento #799 follow-up lands), verdict in {admissible, refused, partial}. Integration test: prove_run_emits_durable_content_addressed_run_and_stage_receipts asserts the .proof bundle contains a ProofRunMemento + stage receipts, stage count matches VERIFIER_STAGE_VOCABULARY, receipt CIDs recompute, JCS round-trips, run CID recomputes, input artifact CIDs recorded, and the bundle reloads cleanly through load_all_proofs. cargo test -p provekit-verifier -p provekit-ir-types -p libprovekit: 427 passed, 0 failed. Note: cargo check -p provekit-cli currently blocked by a pre-existing provekit-walk/src/chain.rs Result vs Option mismatch — that's PR #837 (step #1, currently open) which adapts the CCP caller in provekit-walk. Co-authored-by: T Savo <agentwopr@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…pperMemento, validate_against, ParametricRealizationMemento Blocker #1: Mint RealizationPlanMemento after each kit dispatch in realize_function / apply_canonical_rewrite. Blocker #2: Call plan.validate_against(&realization) after construction; propagate RealizationPlanError. Blocker #3: Build synthetic ParametricRealizationMemento inline (one slot per param) via mint_realization_artifacts; catalog lookup path acknowledged as future enhancement per cmd_bind.rs gap comment. Blocker #4: Mint ObservationWrapperMemento when mode in {witness, monitor, dispatcher} and kit returns observation_wrapper_emission_record; call wrapper.validate before persisting; set plan.observation_wrapper_cid. Nit: used_sugars subset check in invoke_realize returns ext:unauthorized-sugar error when kit returns a CID not in the cited set. Add observation_wrapper_emission_record to RealizedSource (both kit_dispatch and cmd_transport layers) extracted from kit JSON response. Extend EngineResult with realization_plan_mementos and observation_wrapper_mementos; write to .provekit/bindings/ realization-plans/ and observation-wrappers/. Add three unit tests in cmd_transport for mint_realization_artifacts asserting: plan IS minted, validate_against passes, and wrapper IS minted for mode=witness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Route contract sugar through realize kits * Add four substrate-side mints: RealizationPlanMemento, ObservationWrapperMemento, validate_against, ParametricRealizationMemento Blocker #1: Mint RealizationPlanMemento after each kit dispatch in realize_function / apply_canonical_rewrite. Blocker #2: Call plan.validate_against(&realization) after construction; propagate RealizationPlanError. Blocker #3: Build synthetic ParametricRealizationMemento inline (one slot per param) via mint_realization_artifacts; catalog lookup path acknowledged as future enhancement per cmd_bind.rs gap comment. Blocker #4: Mint ObservationWrapperMemento when mode in {witness, monitor, dispatcher} and kit returns observation_wrapper_emission_record; call wrapper.validate before persisting; set plan.observation_wrapper_cid. Nit: used_sugars subset check in invoke_realize returns ext:unauthorized-sugar error when kit returns a CID not in the cited set. Add observation_wrapper_emission_record to RealizedSource (both kit_dispatch and cmd_transport layers) extracted from kit JSON response. Extend EngineResult with realization_plan_mementos and observation_wrapper_mementos; write to .provekit/bindings/ realization-plans/ and observation-wrappers/. Add three unit tests in cmd_transport for mint_realization_artifacts asserting: plan IS minted, validate_against passes, and wrapper IS minted for mode=witness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Code <agentwopr@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…e metadata via @substrate-signature The honest path #1: java lift produces what the rust lower needs. No more injection of rust-side metadata at integration time. LIFT (provekit-lift-java-term-shape): - Computes term_shape_cid as blake3-512 of JCS(body_shape) via BouncyCastle Blake3Digest. Stable substrate-canonical identity. - Output IR entry contains: visibility, generic_params, original_param_types, param_sort_cids, return_sort_cid, source_return_type, term_shape_cid — all recoverable from the @substrate-signature comment alone. - Bouncycastle dep added to pom. LOWER (provekit-realize-java-core): - @substrate-signature comment emitted FIRST, concept header IMMEDIATELY before the method (so JavaParser's getComment() returns the concept header for substrate identity recognition). - New concept handlers: concept:while-let, concept:if-let, concept:destructure-tuple, concept:destructure-struct, concept:try, concept:item-decl. Substrate principle satisfied: every target's realize plugin handles every catalog concept. - extractPatternBinding helper parses rust patterns (Some(x), Ok(v)) to extract the java binding variable. HONEST CYCLE RESULT on libprovekit-rpc-cross-platform: rust → ProofIR_A → java (with @substrate-signature) → ProofIR_B (full metadata recovery) → rust' (NO injection) 10/10 signatures recovered with correct visibility/generics/refs 7 functions emit with loss (java body_shape has different structural detail than rust source's term_shape — substrate- honest gap: the lift through java preserves CONCEPT identity but not all rust-side structural form choices) Zero loss in the java emission step (rust→java lossless via citations + carriers). The substrate's true scorecard for this cycle: identity-preserving, metadata-recovering, NO external injection. The 7 functions with loss are honest evidence of where rust→java→rust loses structural detail — concrete substrate-side gaps to close. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tive instead The rust kit's self-contract now comes from LIFTING native source + tests (the config-driven `provekit mint` → rust-lift), not from a bespoke .invariant DSL run by an orchestrator. Proven byte-substantial: 2.2 MB proof, contractSetCid blake3-512:e6c68587...; deterministic; pin test green. - Delete all 14 `*.invariant.rs` files (the DSL surface). - Gut `provekit-self-contracts/src/lib.rs`: remove the `#[path]` includes, `author_all_invariants`, `mint_self_proof`, MintResult/AuthoredSlab/SelfBridge, and the orchestrator-only tests. KEEP `lift_plugin_protocol` + `catalog_format` (cmd_prove's C1-C8 path) and the independent parse/serialize/marshal tests. - Delete the dead `mint-self-contracts` bin + `scan_self` (orchestrator-dependent) + the `rust-self-contracts` lift manifest; drop the `[[bin]]` entries. - Repoint `--kit=rust` (KIT_TABLE) off the orchestrator surface to the lifter; rewire `make mint-rust` to the config-driven lift. - Re-pin `rust.json` + the mint_kit_integration constant to the lifted CID (e6c68587; was the DSL CID). The DSL files are byte-neutral to the lift — deleting them leaves the contract set identical-in-kind (only self-contracts' own contracts drop). Kit #1 of the .invariant elimination. The substrate proves itself by lifting its own native tests, not by a private contract language. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

TSavo and others added 17 commits April 20, 2026 14:29

chore: gitignore .worktrees/

0d57c8f

Copilot AI review requested due to automatic review settings April 21, 2026 03:42

Copilot started reviewing on behalf of TSavo April 21, 2026 03:43 View session

Copilot AI reviewed Apr 21, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 21, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 21, 2026

View reviewed changes

TSavo and others added 7 commits April 20, 2026 21:13

TSavo mentioned this pull request May 13, 2026

Route contract sugar through realize kits #840

Merged

This was referenced May 24, 2026

feat(cli): real provekit verify <kit> GATE verb — lift→dispatch→witness→receipt #1433

Merged

Full slow-lane green: python sidecar types + honesty guard, citation reconciliation, D7 visibility #1448

Merged

TSavo mentioned this pull request May 26, 2026

chore(rust-kit): eliminate the .invariant self-contract DSL — lift native instead #1541

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harness synthesis + property-test + 5 existing-loop refinements#1

Harness synthesis + property-test + 5 existing-loop refinements#1
TSavo merged 24 commits into
mainfrom
refinements

TSavo commented Apr 21, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 21, 2026 •

edited

Loading

Review skipped

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Uh oh!

coderabbitai Bot Apr 21, 2026

Uh oh!

coderabbitai Bot Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

TSavo commented Apr 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

TSavo commented Apr 21, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 21, 2026 •

edited

Loading