Skip to content

Draft Pattern Predicate Protocol (PPP) v0.1.0#514

Merged
TSavo merged 17 commits into
mainfrom
ts-substrate-barrage
May 9, 2026
Merged

Draft Pattern Predicate Protocol (PPP) v0.1.0#514
TSavo merged 17 commits into
mainfrom
ts-substrate-barrage

Conversation

@TSavo
Copy link
Copy Markdown
Owner

@TSavo TSavo commented May 9, 2026

Summary

Drafts a new extension protocol that names the missing edge between an editorially-named bug class and a Fix Receipt Protocol receipt: how a producer authors the policy a fix receipt cites, with cryptographic identity, in a form a verifier can re-run locally.

The wire-level pipeline:

Editorial pattern (nefariousplan)
  -> predicate (CID, schema-bound, deterministic SQL)
  -> query application memento (predicate, substrate, lifter, result-set CIDs)
  -> closure witness (deterministic delta over pre/post applications)
  -> FRP receipt (signed, attached to commit .proof root)
  -> proofchain head (carries closure under verifier policy)

Existing protocols cover the rest of the stack: proofchain (what a witness IS), FRP (what a receipt IS), proof substrate (what a lifter fills), lift-plugin-protocol (C1-C8 conformance). PPP gives FRP's policyCid and closureWitnessCid their content shape and names the artifacts in between.

Load-bearing observation

Derived empirically from this morning's V4bel-RxRPC patch experiment (PR #510 substrate + borrowed-pages-as-scratch-v2.sql):

The predicate's substrate binding determines what patch shapes can witness closure under it.

A v2 predicate over call_edges witnesses patches that add or remove calls. V4bel's gate-widening patch (if (skb_cloned(skb)) to if (skb_cloned(skb) || skb->data_len)) does not change call edges and is invisible to v2. A v3 predicate over a future substrate-schema version that exposes gate conditions WOULD witness it. Section 8 walks the example end to end. Section 9.4 makes this an explicit growth function rather than a defect.

The closure-witness shape table in Section 5 is the structural keystone:

pre rows post rows shape
unchanged (vacuous)
R, R ≠ ∅ empty (full closure)
R S, S ⊊ R strict-subset (partial closure)
R R unchanged
R S, S ⊋ R grew
R S, S ∩ R ≠ ∅ ∧ S \ R ≠ ∅ ∧ R \ S ≠ ∅ non-monotonic

grew and non-monotonic are merge-blocking regression signals even when no fix receipt is claimed.

Sections

  • 0 Purpose
  • 1 Relation to existing protocols (proofchain / FRP / substrate / lift-plugin / ORP)
  • 2 Pattern, predicate, compilation arrow (signed memento)
  • 3 Predicate authoring (determinism, schema-binding, closed extension; v1 schema)
  • 4 Query application memento (mandatory lifterCid)
  • 5 Closure witness (shape table)
  • 6 Composition with FRP (one receipt per (predicate, gap) pair)
  • 7 Cross-language federation (per-language callee-mapping mementos)
  • 8 Worked example: borrowed-pages-as-scratch end to end
  • 9 Failure modes (lift drift, predicate drift, substrate drift, soundness, determinism, federation mismatch)
  • 10 Operational notes (distribution, revocation, catalog signing, receipt provenance)
  • 11 Pipeline diagram
  • A Canonical SQL admitted in v1 (SQLite-3 + JSON1, explicit function set)
  • B Reference implementation surface (provekit pp compile|run|witness|receipt)

Empirical grounding

Backed by today's experiments on linux@7.1.0-rc2:

Predicate Pre-patch matches Post V4bel-patch matches
borrowed-pages-as-scratch.sql (v1) 5 (40% TP) 5 (predicate sees no change; matches v3 schema gap)
borrowed-pages-as-scratch-v2.sql 2 (100% TP) 2 (same; gate-widening invisible at call_edges)

The two TPs are rxkad_verify_packet_1 (CVE-2026-43500, named in V4bel's disclosure) and rxkad_verify_packet_2 (substrate-discovered same-class sibling, not named in V4bel's disclosure). Section 8 walks the receipt structure that would close on both, and the substrate extension required to make V4bel's actual patch bytes witness closure.

Deferred to v0.2

  • Soundness-memento shape (currently a vague reference to "additional mementos"; should compose with the truth-discharge protocol).
  • Per-language callee-mapping table schema for federation.
  • Formal typed binding of predicate as a function from substrate(schema_v1) to result_set(locus_shape).

None are blocking; all are named gaps in the spec.

Test plan

  • No em-dashes / en-dashes in the spec text.
  • Section 8's worked example uses real CIDs and result sets observed today.
  • Section 5's closure-shape table is exhaustive (every pre/post pair classified).
  • Section 9 enumerates the failure modes that arose during the empirical work.
  • Reviewer to validate that v1 substrate schema (Section 3.1) matches what the C lifter emits today.
  • Reviewer to validate that the FRP composition in Section 6 doesn't conflict with FRP's nontriviality rule.
  • CI to confirm the spec markdown lints cleanly.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added function effect extraction (reads, writes, I/O, unsafe, panics, unresolved calls) to C kernel lifter.
    • Implemented contract composition system with deterministic content identifiers.
    • Introduced pattern predicate system for automated kernel security pattern detection.
    • Created Python tooling for kernel source lifting and predicate analysis automation.
  • Documentation

    • Added Contract Composition Protocol specification.
    • Added Pattern Predicate Protocol specification.
    • Added kernel security pattern research notes.
  • Tests

    • Added effect extraction integration tests.
    • Added contract composition determinism validation.

Review Change Stack

PPP specifies how an editorially-named bug class (e.g. a nefariousplan
pattern) becomes a content-addressed substrate query, how that query's
output becomes a memento, and how the output's delta across two
substrates discharges a fix receipt's policy.

Existing protocols name what a witness IS (proofchain), what a fix
receipt IS (FRP), what a substrate IS (proof substrate), and how lifters
fill it (lift-plugin-protocol). PPP names the missing edge: how a
producer authors the policy a fix receipt cites, with cryptographic
identity, in a form a verifier can re-run locally.

The pipeline:
  pattern (editorial)
    -> predicate (mechanical, schema-bound, deterministic)
    -> query application (predicate, substrate, lifter, result-set CIDs)
    -> closure witness (deterministic delta over pre/post applications)
    -> FRP receipt (signed, attached to commit .proof root)
    -> proofchain head (carries closure under verifier policy)

Borrowed-pages-as-scratch is the worked example (Section 8). The spec
includes the empirical observation surfaced by the V4bel/RxRPC patch
experiment: the predicate's substrate binding determines what patch
shapes can witness closure under it. A v2 predicate over call_edges
witnesses patches that change the call graph; gate-condition patches
require a substrate-schema extension. This is presented as the
operational growth function, not as a defect.

Sections:
  0  Purpose
  1  Relation to existing protocols
  2  Pattern, predicate, compilation
  3  Predicate authoring (determinism, schema-binding, closed extension)
  4  Query application
  5  Closure witness (shape table)
  6  Composition with FRP
  7  Cross-language federation
  8  Worked example: borrowed-pages-as-scratch
  9  Failure modes (lift drift, predicate drift, substrate drift,
     soundness, determinism violations, federation mismatch)
 10  Operational notes (distribution, revocation, catalog signing)
 11  Pipeline diagram
  A  Canonical SQL admitted in v1
  B  Reference implementation surface

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 9, 2026 16:47
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 9, 2026

Warning

Rate limit exceeded

@TSavo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 58 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a5297b85-4734-49ed-aa06-9f168039e4d5

📥 Commits

Reviewing files that changed from the base of the PR and between e92f7da and 1855b04.

⛔ Files ignored due to path filters (1)
  • implementations/rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (46)
  • .provekit/ci/accepted/c/blake3-512:de2c1208b6ca445560e1a044327bc8a780709865c9a068871d894996c2b3d6cc35f0aa9ea8b0d8aa93318148ac7f2f2f15b06736a865202a256c961e070bceeb.job-result.json
  • .provekit/ci/accepted/c/blake3-512:f384ad03c7615c8fb450ac010d90b7075cab204ea39650b0f0d3f26f3500f93c01370b528be5b2c14618b8649d0eec110548d155871d162679e43aac122975ea.job-result.json
  • .provekit/ci/accepted/cpp/blake3-512:2ac37c29afe73131d4ed26879284230b88958b90cf693ad6a5facd4b8f3f25037cbd11f90a6b7368b3a4549401c09bc36cba317b67d85835c878e6f58018cb6f.job-result.json
  • .provekit/ci/accepted/cpp/blake3-512:4ad9afa4ee154a70caa78e21a31eab46d9c070b8de817c841002c291162256a027406ee515ea9831b1ce2d92b3e70b566f734b8150c1c9eea88ea9bd02f34fbc.job-result.json
  • .provekit/ci/accepted/csharp/blake3-512:20061dbf5e03ba7550a716f46d777f97d918299e93e84d3657e15ff950ac20cfc08ef40b70335b4a8fbc49b2e8c2eef1d083391a3800cb878053b236eed3c510.job-result.json
  • .provekit/ci/accepted/csharp/blake3-512:578295c0bb0afb28c340b40acf3c3e98cba28220f18773988615b93a1e020383d2cee6d740730549a50b00c450ee4a1f121a5043c3642790eaa661ac5d7069c2.job-result.json
  • .provekit/ci/accepted/go/blake3-512:1429648a5debe7fc4c76af353413d0378ab64b12c8c06f1abdcc700c964a0ea47c1899288acab63fd763f3b4f4101cb937afc80e4b00561fa12d732c57421051.job-result.json
  • .provekit/ci/accepted/go/blake3-512:177f0a9c3d6f39ea3fa1520668df9676c8a891ad7f2f65b44c524005f0612fa4c671a262ac4c5ef30a43c1e1ca63ad6586a3c083e800e35b5cb19a3f4febfb1f.job-result.json
  • .provekit/ci/accepted/java/blake3-512:7b52d628aae35dbc7335df0522b68d8990a962c1252f4b10c26a3a257d75614be537b85e3b0f3d0eca8d626a4a795f8b54f4ae053b863b5f2bc155d6fb74b819.job-result.json
  • .provekit/ci/accepted/java/blake3-512:fb1d8750fd07ae4b651d8ca911378a91d0fc74865d13d27e0c851f9794b11df7c582e6366839c2181c0241da3d424cb294d5a88319ab1c1cd046c34d1253f00d.job-result.json
  • .provekit/ci/accepted/python/blake3-512:09ea25611717aa598bc7e62d60b6075b235df800c3bc59d0f32ef2944a3ce4eb0c598699464e93c24153916b390ba457477568779e1519d297a2ac641f9d8a5f.job-result.json
  • .provekit/ci/accepted/python/blake3-512:9b61a402902e9c8b323c9e990b27c68581766713f1b0204edc990ac920187ba4ca82930cbaeb95350bb217de510af46840bb0d226d8d4a37738788787b6c0397.job-result.json
  • .provekit/ci/accepted/ruby/blake3-512:4c17f3978436947cc5d8d0e24506459cd44e3838100919c0ec973b19e66beff07ecef3d617536cdefccd57fbbe75cfd34a9e0d7e11974c8ab0aa56d790abfa07.job-result.json
  • .provekit/ci/accepted/ruby/blake3-512:8f23849dbf9d4cb3580fcab23a1d2ce415bf6cf3153f235215564392dd7d503ab995194ae450df871999025a5d8c1ce2bf529364ac329d5b6e58cffa7cf67793.job-result.json
  • .provekit/ci/accepted/rust/blake3-512:70ed2618a92ef0d09e5bef07132e5d0bd9fdb1ce201fc51c499a6d37a3508a0f0ed781b9ab8187e1ff7ce07b86c7165a3c3014cfe42e3e6f33f82b193b2a69a3.job-result.json
  • .provekit/ci/accepted/rust/blake3-512:cb5386fcdfe3202e7b1c702c00fed18403920eb55a02883d44293dcb9a676f65acd6023ba3611df89540cb3c498c1149fd6ab9f7bdf6b892a737ee4e92e1b2b8.job-result.json
  • .provekit/ci/accepted/swift/blake3-512:26ed22b0ca752bac02c263196af2be8cc54469f59845166e291bc565cff36628cc03ce66bb3e1fa92721ff36c306f331fb861b5e72e0ec4a5185fbf9acc7207c.job-result.json
  • .provekit/ci/accepted/swift/blake3-512:5cb75d546d5680a872e0233e4a42e118f0e3cfb6b6d22014f48919e020e738970d6f5f10bc908c87799bdcebd781f3c8a47b77e2d2908bec0d7b278ecf5f925d.job-result.json
  • .provekit/ci/accepted/ts/blake3-512:815b4808e6c5b5501b5630b338e55eecd82138f9861f47359f8db39f3f4371c19b64339c556ac344f2068c4b107f87c1b8f2eaf53ca4c658a991ee79dd86f8e3.job-result.json
  • .provekit/ci/accepted/ts/blake3-512:d434917f35f477c7d5f177ae5deecabb41b490a8287048014a63ce52fc34006850ebaa6a022eff04d4057042830ff6785e5199b7e6c6f4a65ddf47ab1eefafc7.job-result.json
  • .provekit/ci/accepted/zig/blake3-512:2b628143573f4b1cf89c409b139138d8022a687be0d56ba897326b88178fee8dea5b58cd42aa55753c3979ae3f137a0713b3501155bd0bbdcccebda84298a961.job-result.json
  • .provekit/ci/accepted/zig/blake3-512:78ee8397975f1a278deddbe1032ed8b846598e7625c28b87da4668369805d13cce915aeb94eb191446c91c3223387ffc1218023874c3d91411dd2db0ebd358f9.job-result.json
  • implementations/c/provekit-lift-c-kernel-doc/.gitignore
  • implementations/c/provekit-lift-c-kernel-doc/Makefile
  • implementations/c/provekit-lift-c-kernel-doc/src/composition.c
  • implementations/c/provekit-lift-c-kernel-doc/src/kernel_doc.c
  • implementations/c/provekit-lift-c-kernel-doc/tests/fixtures/composition_basic.c
  • implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh
  • implementations/c/provekit-lift-core/src/clang_ast.c
  • implementations/rust/libprovekit/Cargo.toml
  • implementations/rust/libprovekit/include/provekit-compose.h
  • implementations/rust/libprovekit/src/ffi.rs
  • implementations/rust/libprovekit/src/lib.rs
  • implementations/rust/libprovekit/tests/ffi_smoke.rs
  • implementations/rust/provekit-cli/Cargo.toml
  • implementations/rust/provekit-cli/src/cmd_compose.rs
  • implementations/rust/provekit-cli/src/main.rs
  • implementations/rust/provekit-cli/tests/compose_rpc_smoke.rs
  • menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/.gitignore
  • menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/README.md
  • menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/c/chain.c
  • menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/c/chain.h
  • menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/rust/Cargo.toml
  • menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/rust/src/lib.rs
  • menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/runner.sh
  • protocol/specs/2026-05-09-pattern-predicate-protocol.md

Walkthrough

PR introduces three major subsystems: (1) conservative C function effect extraction (reads, writes, IO, unsafe, panics, unresolved calls) integrated into the libclang-based lifter; (2) Rust canonical contract composition protocol with opacity discharge and capture-avoiding substitution; (3) SQLite-based pattern-predicate infrastructure (Python tools, SQL templates, kernel experiments) for deterministic vulnerability detection. Two protocol specifications document CCP and PPP architectures.

Changes

Effect Extraction & Composition with Pattern Predicates

Layer / File(s) Summary
C Effect Type Definitions & Public API
implementations/c/provekit-lift-core/include/provekit/c_lift_core.h
New pk_c_effect_kind enum and pk_c_function_effect struct; pk_c_function_fact extended with effects array; public APIs for emission, addition, and kind-to-string mapping.
C Effect Extraction Implementation
implementations/c/provekit-lift-core/src/effects.c
Conservative AST walker tags function effects: detects reads/writes via lvalue resolution, IO/panics via name matching, unsafe constructs, and unresolved calls via callee shape inspection. 735 lines of analysis logic.
C Effect Integration & Emission
implementations/c/provekit-lift-core/src/clang_ast.c, src/parser.c, Makefile
clang_ast invokes extraction during traversal; parser deduplicates and emits JSON; Makefile conditionally includes effects.c.
C Effect Fixtures & Integration Tests
implementations/c/provekit-lift-c-kernel-doc/tests/fixtures/effects_basic.c, tests/integration.sh
Seven test functions exercise all effect kinds; integration assertions validate effect-kind JSON output.
Rust Dependencies & Module Setup
implementations/rust/libprovekit/Cargo.toml, src/lib.rs, implementations/rust/provekit-walk/Cargo.toml
libprovekit adds provekit-ir-types dependency; declares compose module; provekit-walk adds libprovekit dependency.
Rust Effect & Contract Models
implementations/rust/libprovekit/src/compose.rs (lines 34–520)
Effect enum (reads/writes/IO/unsafe/panics/unresolved), EffectSet with purity checks, AliasingMemento, Locus, FunctionContractMemento bundling all metadata with canonical bytes/CID.
Rust Opacity Discharge & Validation
implementations/rust/libprovekit/src/compose.rs (lines 526–683)
OpacityMementoLookup trait, EmptyOpacityPool, OpacityError, and EffectSet::check_opacity_effects for validation.
Rust Composition Primitives & Algorithms
implementations/rust/libprovekit/src/compose.rs (lines 688–1370)
compose_function_contracts, compose_function_contracts_checked, compose_with_composed, compose_chain_contracts; capture-avoiding substitution; free-variable analysis; JCS bytes and CID computation.
Rust Composition Smoke Test
implementations/rust/libprovekit/tests/compose_smoke.rs
Tests compose_chain_contracts on pure identity contracts; asserts CID prefix, component count, determinism, and pins CID constant.
Rust Walk Refactoring
implementations/rust/provekit-walk/src/contract.rs, src/locus.rs, src/type_decl.rs
contract.rs re-exports libprovekit::compose; locus.rs re-exports Locus and adds from_span constructor; type_decl.rs imports LocusFromSpanExt trait.
Pattern Predicate Python Tools
menagerie/pattern-predicate-protocol/tools/lift_kernel.py, tools/run_predicates.py
lift_kernel.py walks source tree, invokes C lifter, persists callEdges to SQLite; run_predicates.py executes all .sql predicates, computes blake3-512 CIDs, prints results.
Pattern Predicate SQL Definitions
menagerie/pattern-predicate-protocol/predicates/*.sql
Six SQL predicates: borrowed-pages-as-scratch (v1 & v2), copy_from_user-no-bounds, spin-lock-no-unlock, kmalloc-no-free-locally, rcu-read-lock-no-unlock.
PPP Experimental Records & Notifications
menagerie/pattern-predicate-protocol/experiments/2026-05-09-*.md, notifications/2026-05-09-*.md
Three experiment/notification artifacts documenting net-only and wide-kernel runs, predicate results, triage, empirical observations, and vulnerability notification coordination.
PPP Reference Documentation
menagerie/pattern-predicate-protocol/README.md
Explains exploratory tooling vs spec reference, lists repository layout, provides quickstart, enumerates caveats, describes predicate provenance.
Contract Composition Protocol Specification
protocol/specs/2026-05-09-contract-composition-protocol.md
CCP specification: compose_chain_contracts primitive contract, per-language effect kinds, eager/lazy materialization, three cross-language binding modes, formal algebra, worked example, reference implementation scope (libprovekit compose.rs, C FFI, CLI, corpus).
Pattern Predicate Protocol Specification
protocol/specs/2026-05-09-pattern-predicate-protocol.md
PPP v0.1.0 specification: pattern→predicate→substrate→query-application→closure-witness→FRP-receipt pipeline; determinism/schema binding; federation requirements; worked example; failure modes; operational guidance; reference implementation commands.
Research & Protocol Documentation
docs/research/sparse-annotation-as-predicate.md
Research note documenting current sparse annotation lifting and five predicate strategy ideas with priority recommendations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • TSavo/provekit#495: Related C lifter family changes (provekit-lift-core, sparse lifter, clang AST, Makefile, test integration).
  • TSavo/provekit#370: Related Rust composition algebra refactoring and provekit-walk re-exports (FunctionContractMemento, Effect, EffectSet, Locus, compose functions).

Poem

🐰 Effects now traced in every call,
Contracts composed through functions all,
Patterns dance in SQLite rows—
Where the kernel's heart of vulnerability shows!

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ts-substrate-barrage

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 622ba97ed3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +485 to +488
admitted built-in function set is the SQLite core plus JSON1's
`json_extract`, `json_each`, `json_array`, `json_object`,
`json_array_length`, `json_type`, `json_valid`. Functions outside this
set break determinism guarantees and disqualify a query from being a
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict v1 predicate functions to deterministic SQLite built-ins

This whitelist currently permits the entire SQLite core function set, which includes non-deterministic functions like random()/randomblob(). That conflicts with Section 3's determinism requirement and allows "valid" v1 predicates that cannot reproduce bit-identical result sets across runs, leading to verifier fail-closed behavior for otherwise compliant receipts. Narrow the whitelist to deterministic core functions (or explicitly exclude non-deterministic ones) so the admitted set matches the protocol’s reproducibility guarantees.

Useful? React with 👍 / 👎.

TSavo and others added 2 commits May 9, 2026 09:52
Companion exploratory artifact set for the PPP spec at
protocol/specs/2026-05-09-pattern-predicate-protocol.md (PR #514).

This destination is the experimental record + tooling that produced
the spec, not yet the v1 reference implementation referenced in PPP
Appendix B (which should be a Rust extension to provekit-cli exposing
`provekit pp compile|run|witness|receipt`).

What's here:

  predicates/                       Six SQL predicates, one per pattern
    borrowed-pages-as-scratch.sql              v1, function-local
    borrowed-pages-as-scratch-v2.sql           v2, caller-aware
    copy_from_user-no-bounds.sql               userspace-bounds heuristic
    spin-lock-no-unlock.sql                    lock-pairing heuristic
    rcu-read-lock-no-unlock.sql                RCU read-side pairing
    kmalloc-no-free-locally.sql                alloc/free pairing

  tools/
    lift_kernel.py                  Walk a kernel subtree, lift to SQLite
    run_predicates.py               Run every .sql against a substrate

  experiments/
    2026-05-09-linux-kernel-net.md  Receipt of the run that produced
                                    the spec: linux@7.1.0-rc2 net/ subtree,
                                    347 files, 48,701 call edges, five
                                    predicates fired, full triage of the
                                    borrowed-pages-as-scratch results
                                    (v1: 5 candidates 40% TP rate;
                                     v2: 2 candidates 100% TP rate),
                                    and the empirical observation that
                                    grounds PPP Section 8: V4bel's
                                    gate-widening patch is invisible to
                                    a call_edges predicate, motivating
                                    a substrate-schema extension to
                                    expose gate conditions.

The README is explicit about caveats versus the spec: predicate CIDs
use blake2b instead of BLAKE3, no signed query-application or closure-
witness mementos are emitted, the lifterCid is not recorded.  These
are non-architectural; the proper reference impl will close them.

Substrate-discovered finding: rxkad_verify_packet_2 (net/rxrpc/rxkad.c)
is the same-class sibling of CVE-2026-43500 (rxkad_verify_packet_1)
and is not named in V4bel's public disclosure. The cluster predicate
identifies it from structure alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second-pass receipt over a 12.5×-larger kernel surface (4332 files,
621,100 call edges across net/, crypto/, security/, drivers/net/,
mac80211/, wireless/, bluetooth/).

Findings:

  - borrowed-pages-as-scratch v1+v2 produce IDENTICAL match sets on
    the wide substrate as on net/-only. The class is concentrated in
    net/rxrpc; outside that subsystem the lifted surface does not
    contain receivers of this shape. That is itself a useful
    structural claim about the kernel, mechanically established.

  - copy_from_user-no-bounds widens 9 -> 28: PPP I/O paths, wireless
    debugfs writers, WWAN core, security-module userspace interfaces.
    Heuristic; candidates for editorial follow-up triage.

  - spin-lock-no-unlock widens 2 -> 5: ethernet driver
    acquire-and-return-locked patterns, paired across function
    boundaries; predicate's function-level scope can't see them.
    Likely all FPs.

  - rcu-read-lock-no-unlock new finding: smk_seq_start in
    security/smack. Standard seq_file lock-start / unlock-stop
    pattern; predicate can't see across vtable. FP.

  - kmalloc-no-free-locally widens 43 -> 459: most are
    return-the-allocation patterns. Predicate too coarse.

The borrowed-pages predicate's stability across 12.5× more files is a
positive signal: the predicate is not hallucinating across the wider
surface, and the kernel does not have a long tail of borrowed-pages
instances hiding in unaudited subsystems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

TSavo and others added 7 commits May 9, 2026 10:11
Adds menagerie/pattern-predicate-protocol/notifications/
2026-05-09-rxkad-verify-packet-2.md as the receipt-chain marker for
the responsible-disclosure step that follows the
2026-05-09-linux-kernel-net.md experimental record.

Captured: recipient (imv4bel@gmail.com), subject, signing key
fingerprint (5FD21B4FE7E4A3CA7971CB09DE6639788E091026, Kevlar), body
BLAKE2b-512 hash, 48-hour publication hold start.

The signed email body itself is private correspondence and is NOT
committed; the hash lets the recipient (or any future auditor) verify
against the bytes received.

The substantive claim communicated is summarized: rxkad_verify_packet_2
shares the in-place pattern of rxkad_verify_packet_1
(CVE-2026-43500), is dispatched from the same parent on a different
security_level branch (ENCRYPT vs AUTH), with a per-fire write
primitive of sp->len rather than a fixed 8 bytes. V4bel's submitted
upstream gate-widening patch covers both via skb_unshare upstream of
the security-ops dispatch.

This is the responsible-disclosure step in the receipt chain that the
PPP spec describes (pattern -> predicate -> substrate -> result ->
triage -> notification -> public writeup -> FRP receipt). Each step
is content-addressable; the notification's hash anchors the email
without leaking the body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CCP defines how atomic contract mementos compose across function call
sites into ComposedFunctionContract mementos whose CIDs are the
algebraic composition of their atomic constituents. The composed CID
is what the handshake algorithm's tier 2 cache reuses for O(1)
discharge of structurally-equivalent chains across any future program
in any language.

Architectural call (made jointly during today's session under Supra
omnia, rectum): composition MUST be a single canonical primitive in
libprovekit, exposed via FFI / CLI / direct-link to every lifter and
verifier. Multi-implementation across N languages is rejected because
it multiplies the trust surface and breaks federation under any
divergence.

The build order:

  1. Per-language effects extraction (lifter prerequisite)
  2. Extract compose_chain_contracts from provekit-walk into libprovekit
  3. Expose libprovekit compose via Rust direct-link, C ABI FFI, and
     a `provekit compose` CLI subprocess speaking JSON-RPC
  4. Wire each lifter to call the canonical compose
  5. Bug-zoo cross-language equivalence specimen as the federation
     guarantee in executable form

Sections:
   0  Purpose
   1  Relation to existing protocols
   2  The composition function (signature, determinism, refusal rules)
   3  Effects (per-language prerequisite, extraction notes)
   4  Materialization timing (eager / lazy, both produce same CIDs)
   5  Canonical implementation in libprovekit
   6  Binding modes (Rust direct, C ABI FFI, JSON-RPC subprocess)
   7  Cross-language equivalence (the bug-zoo specimen)
   8  Failure modes (impure, schema-version, effect incompatibility,
      determinism, lifter effects-tracking gap)
   9  Composition algebra (8 formal rules)
  10  Worked example: kernel C function chain (kmalloc/memset/zalloc)
  11  Versioning and revocation
  12  Pipeline diagram
   A  Canonical encoding for compose inputs
   B  Reference implementation surface

CCP composes with the existing protocols (memento envelope grammar,
handshake algorithm, contract merge semantics, lift-plugin protocol,
PPP, FRP) without schema changes. Implementation is multi-PR: extract
compose into libprovekit, expose bindings, add per-language effects
extraction to each lifter family.

Under Supra omnia, rectum: this is the architecturally correct shape
for cross-language composition. Federation across languages requires
byte-identical composition CIDs; only a single canonical
implementation guarantees that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Contract Composition Protocol (CCP) v1.0.0 specified at
protocol/specs/2026-05-09-contract-composition-protocol.md refuses
to compose any subtree containing an impure atom. Without effects
extraction the C lifter emits atoms that look pure to the composer,
which would either silently produce unsound composed contracts
(liberal, broken) or be globally refused (the lift becomes useless
to composition). CCP section 3 names the per-language effects
extraction as the prerequisite for sound composition.

The Rust lifter (provekit-walk) already extracts the six canonical
effect kinds from MIR. Bring the C lifter to parity using the
libclang AST. The new pass walks each FunctionDecl body cursor
after the existing structural visit and records:

  - Reads { target } on MemberRefExpr / ArraySubscriptExpr /
    UnaryOperator(deref) used as an rvalue.
  - Writes { target } on the LHS of an assignment / compound
    assignment / ++ / --. The assignment operator is identified by
    tokenizing the cursor extent and locating the punctuation token
    that sits between the first and second children, since libclang's
    BinaryOperatorKind accessor is version-gated.
  - Io on calls to a sorted allowlist of kernel-style entry points
    (copy_from_user, copy_to_user, kmalloc family, kfree, vmalloc
    family, printk, pr_/dev_ prefix families, sysfs_/debugfs_/netlink_
    prefix families, register_chrdev, ioctl).
  - Unsafe on AsmStmt and on CStyleCastExpr that targets a non-void
    pointer type (type punning).
  - Panics on calls and macro instantiations of BUG, BUG_ON,
    BUG_ON_ONCE, WARN, WARN_ON, WARN_ON_ONCE, panic, abort, assert.
  - UnresolvedCall { name } on calls whose callee resolves to a
    struct field (ops-table dispatch like x->ops->method(...)) or a
    VarDecl/ParmDecl (function-pointer variable).

Conservative tagging is acceptable per CCP section 3; liberal is
unsound. The walker tags Writes with target "<unknown>" when the
LHS root cannot be named, prefers the Reads tag on any complex
expression that might be a read, and degrades gracefully on partial
ASTs produced by libclang's KeepGoing + Incomplete modes by
null-checking every clang_getCursorReferenced result.

The new pk_c_function_fact carries an effects array; the new
pk_c_emit_function_effects emits a function-effects declaration
into the IR-document for each function with a body, matching the
JSON shape of CCP section 3 ({"kind":"Reads","target":"x"} etc.).
Existing contract emission is untouched; effects are additive.

Effects extraction lives in provekit-lift-core/src/effects.c and
is only compiled when libclang is available (the existing stub
fallback path remains effects-free, and consumers see the
established ast-backend-unavailable opacity entry that already
flags the soundness gap on stub builds).

Test: tests/fixtures/effects_basic.c provides one function per
effect kind plus pure_function with the empty effect set.
integration.sh asserts each effect kind appears on the right
function. The assertions live inside the existing libclang
availability gate so make test stays meaningful on stub builds.

Two libclang quirks worth flagging for follow-up agents: tokenizing
a BinaryOperator cursor's full extent returns LHS punctuation
(member access dots) before the operator, so the operator probe
must skip past the first child's end offset; and macro-expansion
panic detection relies on the kernel headers not being on the
include path, which is true for our standalone fixtures but may
need rework when the kernel compile context lands real headers.
The Contract Composition Protocol (CCP) v1.0.0 specified at
protocol/specs/2026-05-09-contract-composition-protocol.md mandates
that compose_chain_contracts be a single canonical primitive in
libprovekit, callable identically from every language lifter and
verifier. CCP sections 2, 5, and 9 name the function, its
determinism guarantees, and the eight algebra rules; section 5
explicitly places the implementation at libprovekit/src/compose.rs.

Until now the function lived inside provekit-walk/src/contract.rs,
which meant only the Rust walking lifter could call it. The C
lifter, future Java / Go / TypeScript / Python lifters, and the
provekit-cli compose subcommand could not reach the canonical
implementation. Federation across languages depends on
byte-identical composed CIDs from a single source of truth.

This commit is plumbing only: move plus rewire plus smoke test.
The composition algebra is unchanged; CIDs are byte-identical
pre-refactor and post-refactor (the existing 238 walk lib tests
continue to pass without modification, including the deterministic
CID assertions that would catch any algebra drift).

Moved to libprovekit::compose:

  - compose_chain_contracts, ChainStep
  - compose_function_contracts, compose_function_contracts_checked,
    compose_with_composed
  - FunctionContractMemento, ComposedFunctionContract
  - EffectSet, Effect, AtomicKind
  - AliasingMemento, AliasingStatus
  - OpacityError, OpacityMementoLookup, EmptyOpacityPool,
    PinInvariantMementoView
  - Locus (data type only; the syn-driven from_span constructor
    stays in walk so libprovekit doesn't gain a syn dependency)
  - build_value, build_memento_value, sort_to_value
  - The capture-avoiding substitute_in_formula and the JCS / CID
    canonical-encoding glue (formula_to_canonical, cid_of_value,
    jcs_bytes_of_value), duplicated from walk's wp.rs and
    canonical.rs so the compose module is self-contained. Walk's
    pre-existing copies remain untouched; both impls operate over
    identical types from provekit-ir-types and produce
    byte-equivalent output by construction.

Walk's contract.rs is now a thin shim: pub use everything from
libprovekit::compose, plus the syn-walking helpers
(build_function_contract, detect_effects, scan_*_for_effects,
extract_formals, infer_sort, is_io_method, is_known_pure_method,
is_known_pure_call) and the existing 32 tests. Existing
'use crate::contract::FunctionContractMemento' paths in walk's
other modules (chain.rs, marriage.rs, llbc_calls.rs, llbc_lift.rs,
envelope.rs, type_decl.rs, charon_runner.rs) all continue to
resolve through the re-export, so no churn outside locus.rs and
contract.rs.

Walk's locus.rs adopts libprovekit::compose::Locus via re-export
and supplies a syn-using from_span free function plus a
LocusFromSpanExt trait so the existing five Locus::from_span
call sites in type_decl.rs keep working with one extra import.

provekit-verifier and provekit-cli only refer to the compose
primitive in comments today, so no rewiring was needed there.
They continue to build clean.

Smoke test at libprovekit/tests/compose_smoke.rs composes two
trivial pure atoms via compose_chain_contracts and asserts the
composed CID is byte-stable across runs against a pinned hex
value. Per CCP section 5 ("Test corpus") and section 11
("Versioning and revocation"), any future change that alters this
pinned CID requires a CCP version bump; the test is the
conformance witness for v1.0.0 of the algebra.

No FFI work; the C ABI wrapper at libprovekit/include/provekit-compose.h
and the JSON-RPC subprocess CLI subcommand are follow-up commits
(spec sections 6.2 and 6.3, both marked planned in Appendix B).
The C lifter's federation hookup is a separate agent's PR in
parallel.

Build verified:
  cargo build --release -p libprovekit -p provekit-walk -p provekit-cli
  cargo test -p libprovekit                 # 1 passed (compose_smoke)
  cargo test -p provekit-walk --lib         # 238 passed
  cargo test -p provekit-walk               # 238 + 12 integration files passed

Pinned smoke-test CID for the two-atom identity-result fixture:
blake3-512:36212b7bf7b9ccf264950940a33d64e1cfe88b6f4d8a47c01949fc64d9359d1813d6147aa2e1afe82b01e6e7ebcbe0a413683284b5f47ffef5bf364213304665

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captured during the contract-as-predicate exploration agent fanout
(2026-05-09). Five concrete predicate proposals that need substrate
extensions to compile: __user direct deref, __rcu without
rcu_dereference, __force cast manifest, __iomem direct deref,
__bitwise contamination. Sources lifter analysis at
implementations/c/provekit-lift-c-sparse/ and identifies the
parameter-name binding gap and missing deref-site facts as the
shared substrate extension that unblocks 3 of the 5 predicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…CP section 7

This specimen is the load-bearing federation test for the entire Contract
Composition Protocol. It pairs a tiny pure Rust chain with a structurally
equivalent C chain and asserts that the canonical compose_chain_contracts
primitive (libprovekit, CCP section 5) produces byte-identical
ComposedFunctionContract CIDs from both sides.

Layout follows the brief flat shape: README, lab/rust, lab/c, runner.sh.
The Rust crate exposes vec_double_then_filter_positive_then_sum composed
of three pure single-line helpers (double, keep_positive, sum). The C
source mirrors the same algebra and arithmetic with the same per-helper
pre/post comments, so the only meaningful difference between the two
inputs is the surface language. Any divergence in the composed CID is
therefore attributable to the lifters or to the canonical compose
primitive, not to the reference algebra.

The runner stubs cleanly to PENDING (exit 2) when the C ABI FFI binding
or the Rust lifter CLI surface is not yet wired, per CCP section 6.2 and
section 6.3. This is the accepted v0 state: the specimen is scaffolded so
the assertion can be filled in as the bindings land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
menagerie/pattern-predicate-protocol/predicates/rcu-read-lock-no-unlock.sql (1)

15-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Correlate lock/unlock by both function and path, not function name alone.

Line 17 and Line 22 use only caller_function; this conflates same-named functions across files and distorts results.

Key by `(caller_function, callsite_path)`
 WHERE NOT EXISTS (
   SELECT 1 FROM call_edges g
   WHERE g.caller_function = r.caller_function
+    AND g.callsite_path = r.callsite_path
     AND g.callee_name IN (
       'rcu_read_unlock', 'rcu_read_unlock_bh', 'rcu_read_unlock_sched'
     )
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@menagerie/pattern-predicate-protocol/predicates/rcu-read-lock-no-unlock.sql`
around lines 15 - 23, The WHERE clause currently correlates lock/unlock only by
caller_function which conflates same-named functions; update the join to key on
both caller_function and callsite_path so the EXISTS checks the same callsite:
in the subquery on call_edges (alias g) add AND g.callsite_path =
r.callsite_path alongside g.caller_function = r.caller_function (and keep
g.callee_name IN (...) as-is) so the predicate matches by (caller_function,
callsite_path).
🧹 Nitpick comments (2)
menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-wide.md (1)

64-68: 💤 Low value

Acronym collision: "PPP" overloaded with Pattern Predicate Protocol.

This document otherwise uses "PPP" exclusively for Pattern Predicate Protocol (e.g. "PPP-canonical receipt machinery", "PPP Appendix B"). Using "PPP I/O paths" to mean the Linux Point-to-Point Protocol driver in the same paragraph is going to confuse a reader who jumped straight to this section. Disambiguating once is enough.

📝 Suggested wording
-- **PPP I/O paths** (`drivers/net/ppp/ppp_*.c`): `ppp_async_ioctl`,
+- **Point-to-Point Protocol driver I/O paths** (`drivers/net/ppp/ppp_*.c`): `ppp_async_ioctl`,

And in the table on line 28:

-| `copy_from_user-no-bounds.sql` | 9 | 28 | +19 (drivers/net PPP, TUN, debugfs; security/apparmor; security/keys) |
+| `copy_from_user-no-bounds.sql` | 9 | 28 | +19 (drivers/net ppp driver, TUN, debugfs; security/apparmor; security/keys) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-wide.md`
around lines 64 - 68, The paragraph uses "PPP" for both Pattern Predicate
Protocol and Linux Point-to-Point Protocol, causing confusion; update the
mention of the kernel drivers and function list (ppp_async_ioctl, ppp_ioctl,
ppp_set_compress, ppp_write, ppp_sync_ioctl) to disambiguate by expanding the
acronym once—e.g., change "PPP I/O paths" to "PPP (Point-to-Point Protocol) I/O
paths" or "Linux PPP I/O paths" and keep the rest of the paragraph as-is so
readers know this PPP refers to the kernel driver rather than the document's
Pattern Predicate Protocol.
implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh (1)

317-357: 💤 Low value

Effect assertions are coupled to JSON key emission order.

Each grep -qE pins the object key sequence to function, kind, effects. The effects emitter uses fixed insertion order in its snprintf; if this were refactored to use JCS-canonical output (alphabetical keys: effects, function, kind, locus) — which is the convention used elsewhere in this PR for content-addressed mementos — every one of these assertions would silently regress to a misleading FAIL.

Two hardenings to consider:

  • Match each key independently with per-key grep against a per-function JSON object slice, or
  • Use jq to parse and assert against a normalized projection.

Not a blocker for the draft; brittleness worth addressing while the emitter format is still in flux.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh` around
lines 317 - 357, The test assertions currently check EFFECTS_RESPONSE with
ordered regexes (e.g., grep -qE
'"function":"pure_function","kind":"function-effects","effects":\[\]') which
couples tests to JSON key order; update the checks to be order-independent by
parsing the JSON output and asserting per-function properties: either
(preferred) use jq to find the object with .function == "pure_function" and
assert .effects == [] or .effects[].kind contains the expected kind, or
(simpler) isolate the JSON object for each function name from EFFECTS_RESPONSE
and run independent grep checks for the "function", "kind", and "effects" keys;
target the test harness symbol EFFECTS_RESPONSE and the function names
pure_function, writes_function, reads_function, io_function, unsafe_function,
panics_function, unresolved_call_function when applying the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/research/sparse-annotation-as-predicate.md`:
- Around line 119-123: The paragraph starts with "#1" which triggers MD018 and
may be parsed as a malformed heading; change the leading "#1" marker to a normal
numbered list token (e.g., "1.") or escape/code-format it (e.g., "`#1`") so it
isn't interpreted as a heading, keeping the rest of the text (mentions of "#2",
"#3", "#4", "#5" and "__user") unchanged; update the same sentence that begins
with "#1" to use the new marker to avoid the MD018 lint error.

In `@implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh`:
- Around line 297-298: Update the inline comment "Per-function effects
extraction per CCP v1.0.0 section 3." to match this PR's protocol version by
changing "v1.0.0" to "v0.1.0" or remove the explicit version string entirely;
locate the comment in the tests/integration.sh block that contains that exact
text and edit it so it references CCP v0.1.0 (or omits the version) to keep the
comment consistent with the PR and the spec file.

In `@implementations/c/provekit-lift-core/src/clang_ast.c`:
- Around line 166-180: pk_c_clang_find_function currently returns the first fact
matching name, which can be the forward-declaration (has_body==0) so effects get
attached to the prototype and later ignored; change the lookup to prefer a fact
with the same name and has_body == 1 (return that if found), and only if no such
body-entry exists fall back to returning a prototype fact (has_body == 0); apply
the same fix to the other lookup at lines 579-586 (i.e., ensure callers that
search functions by name prefer the has_body==1 fact or use the updated
pk_c_clang_find_function behavior).
- Around line 769-774: The call to
pk_c_emit_function_effects(facts->extraction_result, facts) currently ignores
its return value; change it to check the return and propagate failure like the
AST-walk failure path: if pk_c_emit_function_effects returns an error
(non-success), propagate that to the caller (e.g., return an error code or NULL
from the current parse function) or mark facts->extraction_result as failed and
abort parsing rather than continuing; locate the block that creates
facts->extraction_result (pk_c_lift_result_new) and the subsequent
pk_c_emit_function_effects call and implement the same
failure-handling/propagation logic used by the AST-walk routine so missing
function-effects are surfaced.

In `@implementations/c/provekit-lift-core/src/effects.c`:
- Around line 723-735: pk_c_extract_function_effects currently ignores
clang_visitChildren()'s return so failures from pk_c_effects_walk (which returns
CXChildVisit_Break when pk_c_function_fact_add_effect() fails) are swallowed;
change pk_c_extract_function_effects to return an int error code (0 on success,
nonzero on failure), call clang_visitChildren and return its nonzero result to
the caller, and update callers to treat nonzero as an extraction failure and
mark the corresponding pk_c_function_fact as opaque/closed; ensure references to
pk_c_effects_walk and pk_c_function_fact_add_effect remain unchanged except for
their existing break-on-failure behavior so the over-approximation guarantee is
preserved.

In `@implementations/rust/libprovekit/src/compose.rs`:
- Around line 1018-1042: serde_to_canonical currently maps serde_json::Number
that don't fit in i64 into a synthetic object which violates JCS number
canonicalization; change the behavior in serde_to_canonical (and related Value
type) so numbers are handled as true JSON numbers: either (A) make
serde_to_canonical return a CompositionError for non-i64 numbers (propagate
CompositionError from serde_to_canonical callers, e.g., where
FunctionContractMemento bodies are composed), or (B, preferred) add numeric
variants to provekit_canonicalizer::Value (e.g., f64/u64 or a generic Number
variant), update the Value type used by encode_jcs to emit ECMA-262 canonical
number serialization, and map JsonValue::Number -> the new numeric Variant in
serde_to_canonical; update callers and encode_jcs accordingly to ensure
cross-language JCS compliance.

In
`@menagerie/pattern-predicate-protocol/notifications/2026-05-09-rxkad-verify-packet-2.md`:
- Around line 18-20: The file
menagerie/pattern-predicate-protocol/notifications/2026-05-09-rxkad-verify-packet-2.md
currently publishes direct personal identifiers (the Recipient
"imv4bel@gmail.com (Hyunwoo Kim, V4bel)" and the Signing key owner "Kevlar
<evilgenius@nefariousplan.com>, fp 5FD2...1026"); redact or replace these with
neutral placeholders (e.g., "[redacted email]" or "[Name/handle redacted]") or
reference documented consent, update the Recipient and Signing key lines
accordingly, and ensure any fingerprint or email strings (imv4bel@gmail.com,
Hyunwoo Kim, V4bel, evilgenius@nefariousplan.com, fingerprint
5FD21B4FE7E4A3CA7971CB09DE6639788E091026) are removed or masked.

In
`@menagerie/pattern-predicate-protocol/predicates/borrowed-pages-as-scratch-v2.sql`:
- Around line 40-46: The candidate_parents CTE is under-keyed because it joins
call_edges c to inplace_skb_frag_receivers r only on function name
(c.callee_name = r.caller_function), which can cross-link identically named
functions; tighten the join by adding the module/binary or address-level keys
available in your schema (e.g., c.callee_module or c.callee_binary and/or
c.callee_addr vs r.caller_module/r.caller_binary/r.caller_addr) so the join
matches both function name and its owning module/object/address, keeping the
DISTINCT and same selected aliases (candidate and parent).

In `@menagerie/pattern-predicate-protocol/predicates/kmalloc-no-free-locally.sql`:
- Around line 20-24: The NOT EXISTS free-check is only matching on
caller_function which can collide across files; update the EXISTS subquery that
references call_edges (alias g) and the caller rows (alias a) to match both the
function name and the file/path column (e.g. add a condition like g.caller_file
= a.caller_file or g.caller_path = a.caller_path depending on the actual column
name) so the check is scoped to the same function in the same file and avoids
cross-file symbol masking.

In `@menagerie/pattern-predicate-protocol/predicates/spin-lock-no-unlock.sql`:
- Around line 21-24: The query in predicates/spin-lock-no-unlock.sql uses only
g.caller_function = l.caller_function to correlate unlocks and can conflate
same-named functions across different paths; update the NOT EXISTS subquery on
table call_edges (aliases g and l) to also compare path columns so correlations
require both caller_path and callee_path to match (e.g., add g.caller_path =
l.caller_path and ensure g.callee_path matches the corresponding l path when
checking g.callee_name IN (...)); this will scope matches by full symbol path as
well as name and avoid false positives from same-named functions.

In `@menagerie/pattern-predicate-protocol/tools/lift_kernel.py`:
- Around line 113-115: The JSON decode of edges currently does json.loads(e)
inside the block that handles string edges (in lift_kernel.py near the code that
appends to norm), which can raise JSONDecodeError and abort the run; wrap the
json.loads call in a try/except catching json.JSONDecodeError (or
json.decoder.JSONDecodeError), and on failure log or record the malformed edge
and skip it (i.e., do not append a broken value to norm) so ingestion continues;
keep the rest of the branch (the isinstance(e, str) check and the
norm.append(e)) intact but only append when decoding succeeds.
- Line 15: The LIFTER constant is insecurely hardcoded to a /tmp path; replace
the literal assignment to LIFTER with code that reads a configured path (e.g.
from an environment variable like PROVEKIT_LIFTER_PATH) and otherwise resolves
the binary via shutil.which('provekit-lift-c-kernel-doc'), then validate the
resolved path (os.access(..., os.X_OK)) and check file ownership/permissions
(os.stat to ensure it is not in a world-writable directory and not
world-writable itself) before using it; update lift_kernel.py to set LIFTER from
that lookup/validation and raise a clear error if validation fails.

In `@menagerie/pattern-predicate-protocol/tools/run_predicates.py`:
- Around line 13-15: The CID prefix in function cid is incorrect: it labels the
hash as "blake3-512" but uses hashlib.blake2b; either change the prefix to match
the implemented algorithm (e.g., "blake2b-512:") or switch the implementation to
compute a BLAKE3-512 digest (use a blake3 library and its .hexdigest() for
64-byte digest). Update the cid function accordingly so the prefix and the hash
algorithm (hashlib.blake2b vs blake3) are consistent.

In `@protocol/specs/2026-05-09-contract-composition-protocol.md`:
- Around line 56-58: Update the spec wording to match the canonical Rust
implementation: rename inner-result variables using the impl convention
result__<last_12_hex_of_cid> (double underscore + last 12 hex chars of the CID)
rather than result_<full_cid>, document the truncation rule and collision
rationale (why 12 hex chars is sufficient and colons are excluded), and make the
naming consistent in both §2 and §9 Rule 2 (use the same label, e.g. G = inner)
so the spec and the implementation functions
FunctionContractMemento::result_var_name and find_namespaced_result refer to the
identical byte-level form used when hashing composed pre/post formulas.

In `@protocol/specs/2026-05-09-pattern-predicate-protocol.md`:
- Around line 169-171: The fenced code blocks showing the tuple/list and
application/result-set examples (e.g., the line with applicationCid =
blake3-512(canonical(predicateCid, substrateCid))) should include a language
hint to fix MD040 and restore syntax highlighting; add "text" for tuple/list
style blocks and "json" for application/result-set or result examples, and apply
the same change to the other similar fences in this document that contain
tuple/list examples and application/result-set examples.
- Around line 484-489: The current wording "SQLite core plus JSON1" is too broad
and admits nondeterministic core functions; update the text in the v1 query
language paragraph (Appendix A / Section 3 reference) to either (A) replace the
phrase with an explicit deterministic allowlist that enumerates only
deterministic SQLite functions plus the JSON1 functions `json_extract`,
`json_each`, `json_array`, `json_object`, `json_array_length`, `json_type`,
`json_valid`, or (B) keep the core-but-JSON1 phrasing and add an explicit ban
list calling out known nondeterministic SQLite functions (e.g. `random()`,
`randomblob()`, `changes()`, `last_insert_rowid()`, and time helpers like
`date()`, `time()`, `datetime()`, `julianday()`, `strftime()`) and state that
any function not on the allowlist disqualifies a v1 predicate; ensure the edited
paragraph references "v1 query language" and "v1 predicate" so readers can
locate the rule.
- Around line 339-343: The example text is inconsistent about where the patch
was applied; update the second site reference so both places point to the same
functions/files used in the experiment record—either change the mention of
net/rxrpc/call_event.c and net/rxrpc/conn_event.c to refer to the functions
rxrpc_input_call_event and rxrpc_verify_response, or change the earlier function
names to the file-centric references; ensure references to
rxrpc_input_call_event, rxrpc_verify_response, net/rxrpc/call_event.c and
net/rxrpc/conn_event.c are consistent so readers can replay the example without
ambiguity.

---

Outside diff comments:
In `@menagerie/pattern-predicate-protocol/predicates/rcu-read-lock-no-unlock.sql`:
- Around line 15-23: The WHERE clause currently correlates lock/unlock only by
caller_function which conflates same-named functions; update the join to key on
both caller_function and callsite_path so the EXISTS checks the same callsite:
in the subquery on call_edges (alias g) add AND g.callsite_path =
r.callsite_path alongside g.caller_function = r.caller_function (and keep
g.callee_name IN (...) as-is) so the predicate matches by (caller_function,
callsite_path).

---

Nitpick comments:
In `@implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh`:
- Around line 317-357: The test assertions currently check EFFECTS_RESPONSE with
ordered regexes (e.g., grep -qE
'"function":"pure_function","kind":"function-effects","effects":\[\]') which
couples tests to JSON key order; update the checks to be order-independent by
parsing the JSON output and asserting per-function properties: either
(preferred) use jq to find the object with .function == "pure_function" and
assert .effects == [] or .effects[].kind contains the expected kind, or
(simpler) isolate the JSON object for each function name from EFFECTS_RESPONSE
and run independent grep checks for the "function", "kind", and "effects" keys;
target the test harness symbol EFFECTS_RESPONSE and the function names
pure_function, writes_function, reads_function, io_function, unsafe_function,
panics_function, unresolved_call_function when applying the change.

In
`@menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-wide.md`:
- Around line 64-68: The paragraph uses "PPP" for both Pattern Predicate
Protocol and Linux Point-to-Point Protocol, causing confusion; update the
mention of the kernel drivers and function list (ppp_async_ioctl, ppp_ioctl,
ppp_set_compress, ppp_write, ppp_sync_ioctl) to disambiguate by expanding the
acronym once—e.g., change "PPP I/O paths" to "PPP (Point-to-Point Protocol) I/O
paths" or "Linux PPP I/O paths" and keep the rest of the paragraph as-is so
readers know this PPP refers to the kernel driver rather than the document's
Pattern Predicate Protocol.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 55acde0d-e0b4-4b3d-b60a-5ebb3c89bb30

📥 Commits

Reviewing files that changed from the base of the PR and between f8be0e0 and e92f7da.

⛔ Files ignored due to path filters (1)
  • implementations/rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (30)
  • docs/research/sparse-annotation-as-predicate.md
  • implementations/c/provekit-lift-c-kernel-doc/Makefile
  • implementations/c/provekit-lift-c-kernel-doc/tests/fixtures/effects_basic.c
  • implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh
  • implementations/c/provekit-lift-core/include/provekit/c_lift_core.h
  • implementations/c/provekit-lift-core/src/clang_ast.c
  • implementations/c/provekit-lift-core/src/effects.c
  • implementations/c/provekit-lift-core/src/parser.c
  • implementations/rust/libprovekit/Cargo.toml
  • implementations/rust/libprovekit/src/compose.rs
  • implementations/rust/libprovekit/src/lib.rs
  • implementations/rust/libprovekit/tests/compose_smoke.rs
  • implementations/rust/provekit-walk/Cargo.toml
  • implementations/rust/provekit-walk/src/contract.rs
  • implementations/rust/provekit-walk/src/locus.rs
  • implementations/rust/provekit-walk/src/type_decl.rs
  • menagerie/pattern-predicate-protocol/README.md
  • menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-net.md
  • menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-wide.md
  • menagerie/pattern-predicate-protocol/notifications/2026-05-09-rxkad-verify-packet-2.md
  • menagerie/pattern-predicate-protocol/predicates/borrowed-pages-as-scratch-v2.sql
  • menagerie/pattern-predicate-protocol/predicates/borrowed-pages-as-scratch.sql
  • menagerie/pattern-predicate-protocol/predicates/copy_from_user-no-bounds.sql
  • menagerie/pattern-predicate-protocol/predicates/kmalloc-no-free-locally.sql
  • menagerie/pattern-predicate-protocol/predicates/rcu-read-lock-no-unlock.sql
  • menagerie/pattern-predicate-protocol/predicates/spin-lock-no-unlock.sql
  • menagerie/pattern-predicate-protocol/tools/lift_kernel.py
  • menagerie/pattern-predicate-protocol/tools/run_predicates.py
  • protocol/specs/2026-05-09-contract-composition-protocol.md
  • protocol/specs/2026-05-09-pattern-predicate-protocol.md

Comment on lines +119 to +123
#1 (`__user` deref) gives the biggest CVE-class blast radius and forces the substrate
extension every other predicate also needs (parameter-binding + deref-site facts). Land
that scaffolding first; #2 and #4 then collapse to one-day predicates each. #3 is
cheap and high-value as a *manifest* even before bug detection. #5 is the lowest
priority but is the canonical demonstration that ProvekIt preserves discipline no other
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid #1 at the start of the paragraph.

Line 119 trips MD018 and can be parsed ambiguously as a malformed heading. Please rewrite this as a normal numbered list item (1.) or escape/code-format the #1 marker.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 119-119: No space after hash on atx style heading

(MD018, no-missing-space-atx)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/research/sparse-annotation-as-predicate.md` around lines 119 - 123, The
paragraph starts with "#1" which triggers MD018 and may be parsed as a malformed
heading; change the leading "#1" marker to a normal numbered list token (e.g.,
"1.") or escape/code-format it (e.g., "`#1`") so it isn't interpreted as a
heading, keeping the rest of the text (mentions of "#2", "#3", "#4", "#5" and
"__user") unchanged; update the same sentence that begins with "#1" to use the
new marker to avoid the MD018 lint error.

Comment on lines +297 to +298

# Per-function effects extraction per CCP v1.0.0 section 3.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Version typo: CCP is v0.1.0 in this PR, not v1.0.0.

The PR objectives and the spec filename (protocol/specs/2026-05-09-contract-composition-protocol.md) both name CCP v0.1.0. Either drop the version from the comment or update it to match.

📝 Suggested wording
-    # Per-function effects extraction per CCP v1.0.0 section 3.
+    # Per-function effects extraction per CCP v0.1.0 section 3.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Per-function effects extraction per CCP v1.0.0 section 3.
# Per-function effects extraction per CCP v0.1.0 section 3.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh` around
lines 297 - 298, Update the inline comment "Per-function effects extraction per
CCP v1.0.0 section 3." to match this PR's protocol version by changing "v1.0.0"
to "v0.1.0" or remove the explicit version string entirely; locate the comment
in the tests/integration.sh block that contains that exact text and edit it so
it references CCP v0.1.0 (or omits the version) to keep the comment consistent
with the PR and the spec file.

Comment thread implementations/c/provekit-lift-core/src/clang_ast.c
Comment on lines +769 to +774
if (facts->extraction_result == NULL) {
facts->extraction_result = pk_c_lift_result_new();
}
if (facts->extraction_result != NULL) {
(void)pk_c_emit_function_effects(facts->extraction_result, facts);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate pk_c_emit_function_effects() failures.

Line 773 can fail, but the return value is discarded. That leaves the parse looking successful even though some function-effects declarations may be missing. Please surface this the same way the AST-walk failure path does, or fail the parse outright.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@implementations/c/provekit-lift-core/src/clang_ast.c` around lines 769 - 774,
The call to pk_c_emit_function_effects(facts->extraction_result, facts)
currently ignores its return value; change it to check the return and propagate
failure like the AST-walk failure path: if pk_c_emit_function_effects returns an
error (non-success), propagate that to the caller (e.g., return an error code or
NULL from the current parse function) or mark facts->extraction_result as failed
and abort parsing rather than continuing; locate the block that creates
facts->extraction_result (pk_c_lift_result_new) and the subsequent
pk_c_emit_function_effects call and implement the same
failure-handling/propagation logic used by the AST-walk routine so missing
function-effects are surfaced.

Comment on lines +723 to +735
void pk_c_extract_function_effects(
pk_c_function_fact *fact,
void *function_cursor_ptr
) {
CXCursor *cursor = (CXCursor *)function_cursor_ptr;
pk_c_effects_visit_ctx ctx;

if (fact == NULL || cursor == NULL) {
return;
}
ctx.fact = fact;
(void)clang_visitChildren(*cursor, pk_c_effects_walk, &ctx);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow extraction failures in a pass that must over-approximate.

pk_c_effects_walk() uses CXChildVisit_Break when pk_c_function_fact_add_effect() fails, but pk_c_extract_function_effects() is void and ignores clang_visitChildren()’s return. An OOM or add-effect failure therefore looks like success and emits a partial effect set, which is unsound per the file header. Please make this API return an error and have the caller record opacity/fail closed on nonzero.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@implementations/c/provekit-lift-core/src/effects.c` around lines 723 - 735,
pk_c_extract_function_effects currently ignores clang_visitChildren()'s return
so failures from pk_c_effects_walk (which returns CXChildVisit_Break when
pk_c_function_fact_add_effect() fails) are swallowed; change
pk_c_extract_function_effects to return an int error code (0 on success, nonzero
on failure), call clang_visitChildren and return its nonzero result to the
caller, and update callers to treat nonzero as an extraction failure and mark
the corresponding pk_c_function_fact as opaque/closed; ensure references to
pk_c_effects_walk and pk_c_function_fact_add_effect remain unchanged except for
their existing break-on-failure behavior so the over-approximation guarantee is
preserved.

Comment on lines +13 to +15
def cid(b: bytes) -> str:
return "blake3-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify claimed vs actual algorithm usage in this script.
rg -n -C2 'def cid|blake3-512|blake2b' menagerie/pattern-predicate-protocol/tools/run_predicates.py

Repository: TSavo/provekit

Length of output: 180


CID algorithm label mismatch: prefix claims blake3-512: but implementation uses blake2b.

Line 14 advertises BLAKE3 in the CID prefix but computes BLAKE2b, creating inconsistency that breaks downstream verification logic relying on the algorithm label.

Fix
 def cid(b: bytes) -> str:
-    return "blake3-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()
+    return "blake2b-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()

Alternatively, implement BLAKE3 instead of BLAKE2b if that algorithm is truly required.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def cid(b: bytes) -> str:
return "blake3-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()
def cid(b: bytes) -> str:
return "blake2b-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@menagerie/pattern-predicate-protocol/tools/run_predicates.py` around lines 13
- 15, The CID prefix in function cid is incorrect: it labels the hash as
"blake3-512" but uses hashlib.blake2b; either change the prefix to match the
implemented algorithm (e.g., "blake2b-512:") or switch the implementation to
compute a BLAKE3-512 digest (use a blake3 library and its .hexdigest() for
64-byte digest). Update the cid function accordingly so the prefix and the hash
algorithm (hashlib.blake2b vs blake3) are consistent.

Comment on lines +56 to +58
**Singular formal substitution.** For an atom F with formals `[a, b, c]` called as `F(x, G(y), z)`, only the second argument `G(y)` triggers composition (G's contract substitutes into F's at the b-position). The first and third arrivals (`x`, `z`) are leaf substitutions that do not introduce new contract-level composition.

**CID-namespaced result variable.** Each atom's `post` formula references a free variable (conventionally `result`). When composing F into a caller's pre, F's `result` is renamed to `result_<F.cid>` to avoid free-variable collision across nested composition.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Inner-result rename rule diverges from the canonical implementation.

Spec §2 says inner's result is renamed to result_<F.cid> and §9 Rule 2 repeats the rule as result_<G.cid> (single underscore, full CID). The reference implementation at implementations/rust/libprovekit/src/compose.rs (FunctionContractMemento::result_var_name and find_namespaced_result) uses result__<last_12_hex_of_cid> instead — double underscore and a 12-hex-char truncation. The literal spec form is also impractical because CIDs include : (e.g. blake3-512:...), which is awkward inside an IR variable name.

Because the inner-result variable name appears verbatim inside the canonical pre/post formulas hashed into the composed CID, this is exactly the kind of byte-level disagreement that will surface as a federation failure in the §7 cross-language equivalence specimen the moment a second-language binding is implemented faithful to the spec text. Recommend amending the spec to match the impl convention (and document the truncation length + collision argument) before CCP v1.0.0 is locked, so the smoke-test pinned CID stays stable. Note also the F/G label flip between §2 and §9 Rule 2 for the same role (the inner contract).

📝 Suggested spec wording
-When composing F into a caller's pre, F's `result` is renamed to `result_<F.cid>` to avoid free-variable collision across nested composition.
+When composing inner atom G into a caller's pre, G's `result` is renamed to
+`result__<tail>`, where `<tail>` is the trailing 12 hex characters of `G.cid`.
+Truncation is bounded so variable names stay readable; collision probability
+at 12 hex chars (≈2^48) is acceptable for chains of realistic depth.

Apply the same wording to §9 Rule 2 and unify on a single label (G = inner) across both sections.

Also applies to: 221-223

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@protocol/specs/2026-05-09-contract-composition-protocol.md` around lines 56 -
58, Update the spec wording to match the canonical Rust implementation: rename
inner-result variables using the impl convention result__<last_12_hex_of_cid>
(double underscore + last 12 hex chars of the CID) rather than
result_<full_cid>, document the truncation rule and collision rationale (why 12
hex chars is sufficient and colons are excluded), and make the naming consistent
in both §2 and §9 Rule 2 (use the same label, e.g. G = inner) so the spec and
the implementation functions FunctionContractMemento::result_var_name and
find_namespaced_result refer to the identical byte-level form used when hashing
composed pre/post formulas.

Comment on lines +169 to +171
```
applicationCid = blake3-512(canonical(predicateCid, substrateCid))
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add languages to the fenced code blocks.

These unlabeled fences trigger MD040 and lose syntax highlighting. text fits the tuple/list examples, and json fits the application/result-set examples.

Also applies to: 306-317, 328-337

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 169-169: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@protocol/specs/2026-05-09-pattern-predicate-protocol.md` around lines 169 -
171, The fenced code blocks showing the tuple/list and application/result-set
examples (e.g., the line with applicationCid =
blake3-512(canonical(predicateCid, substrateCid))) should include a language
hint to fix MD040 and restore syntax highlighting; add "text" for tuple/list
style blocks and "json" for application/result-set or result examples, and apply
the same change to the other similar fences in this document that contain
tuple/list examples and application/result-set examples.

Comment thread protocol/specs/2026-05-09-pattern-predicate-protocol.md
Comment on lines +484 to +489
The v1 query language is SQLite-3 with the JSON1 module enabled. The
admitted built-in function set is the SQLite core plus JSON1's
`json_extract`, `json_each`, `json_array`, `json_object`,
`json_array_length`, `json_type`, `json_valid`. Functions outside this
set break determinism guarantees and disqualify a query from being a
v1 predicate.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Appendix A currently admits nondeterministic SQLite functions.

“SQLite core plus JSON1” is too broad for Section 3’s determinism guarantee: SQLite core includes functions like random(), randomblob(), changes(), last_insert_rowid(), and time helpers that can vary across runs. As written, a predicate can be v1-admissible while still violating PPP determinism. Please replace this with an explicit deterministic allowlist, or explicitly ban the nondeterministic core functions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@protocol/specs/2026-05-09-pattern-predicate-protocol.md` around lines 484 -
489, The current wording "SQLite core plus JSON1" is too broad and admits
nondeterministic core functions; update the text in the v1 query language
paragraph (Appendix A / Section 3 reference) to either (A) replace the phrase
with an explicit deterministic allowlist that enumerates only deterministic
SQLite functions plus the JSON1 functions `json_extract`, `json_each`,
`json_array`, `json_object`, `json_array_length`, `json_type`, `json_valid`, or
(B) keep the core-but-JSON1 phrasing and add an explicit ban list calling out
known nondeterministic SQLite functions (e.g. `random()`, `randomblob()`,
`changes()`, `last_insert_rowid()`, and time helpers like `date()`, `time()`,
`datetime()`, `julianday()`, `strftime()`) and state that any function not on
the allowlist disqualifies a v1 predicate; ensure the edited paragraph
references "v1 query language" and "v1 predicate" so readers can locate the
rule.

TSavo and others added 7 commits May 9, 2026 12:17
Exposes the canonical compose primitive (extracted in commit f06ce69)
through a stable C ABI so the C lifter family and any other native
consumer share libprovekit's implementation. Marshaling is JCS-encoded
JSON across the boundary; libprovekit owns the canonical encoding per
CCP Appendix A. No algebra change; this is wiring, not new physics.

Surface (matches CCP §6.2 verbatim):

  pk_composition_result *pk_compose_chain_contracts(
      const char *atoms_jcs, const char *effects_jcs,
      size_t atoms_len, size_t effects_len);
  const char *pk_composition_result_cid(const pk_composition_result *r);
  const char *pk_composition_result_body_jcs(const pk_composition_result *r);
  const char *pk_composition_result_error(const pk_composition_result *r);
  void pk_composition_result_free(pk_composition_result *r);

Two intentional, documented deviations from the literal §6.2 text:

  1. The Rust algebra needs a per-step `formal_idx` (which formal of
     the outer atom the inner's result feeds). §6.2's signature has no
     slot for it. Resolution: each atom JSON carries an outer
     `formalIdx` field alongside the canonical `memento` body shape:
     `{"memento": <canonical body>, "formalIdx": N}`. The C signature
     itself is byte-identical to §6.2.

  2. Effects appear twice: embedded in each memento body (per
     `build_value`) and in the parallel `effects_jcs` array. The
     embedded copy is authoritative; the parallel array is required by
     the spec signature and MUST equal-by-value, otherwise the FFI
     returns a typed `EffectsMismatch` error. Single source of truth,
     spec signature preserved, cross-check enforced.

The crate now produces `cdylib` and `staticlib` artifacts in addition
to `lib`, so existing in-workspace Rust consumers (provekit-cli, walk,
verifier) keep working unchanged while C consumers can link the static
or dynamic library.

Implementation lives in `src/ffi.rs`. The Rust-side helper
`compose_chain_contracts_jcs(&str, &str) -> Result<(cid, body), msg>`
is the testable layer below the extern "C" wrapper; the four
`pk_*` functions are thin lifecycle plumbing on top. No panics escape
the FFI boundary; all errors flow through `pk_composition_result_error`.

The C-callable header is hand-written at
`include/provekit-compose.h` (5 declarations; cbindgen would be
overkill).

Smoke test (`tests/ffi_smoke.rs`) exercises both the Rust JCS entry
point and the extern "C" entry point with the same two pure identity
atoms as the existing `tests/compose_smoke.rs`. Both paths produce the
pinned CID
blake3-512:36212b7bf7b9ccf264950940a33d64e1cfe88b6f4d8a47c01949fc64d9359d1813d6147aa2e1afe82b01e6e7ebcbe0a413683284b5f47ffef5bf364213304665
byte-for-byte. Same algebra, different binding mode, identical CID:
this is the §6.2 federation guarantee in test form. Two additional
tests cover invalid JSON and null-pointer inputs.

Verified:
  cargo build --release -p libprovekit  →  liblibprovekit.{a,dylib,rlib}
  cargo test --release -p libprovekit   →  13 tests pass (4 new + 9 existing)

Refs CCP §5 (canonical implementation), §6.2 (C ABI), §9 (algebra),
Appendix A (canonical encoding), Appendix B (reference surface).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the JSON-RPC subprocess transport for the canonical compose
primitive. The CLI is a thin stdin/stdout wrapper over libprovekit's
compose_chain_contracts; cross-language consumers that cannot link
Rust (TypeScript, Python, Ruby, PHP lifters) drive composition by
spawning `provekit compose --rpc` and exchanging initialize / compose
/ shutdown messages over stdio.

A new compose_rpc_smoke test spawns the binary, drives the protocol,
and asserts the composed CID matches the pinned hex value libprovekit's
own compose_smoke test pins for the same algebra. Same canonical
primitive, byte-identical bodies, byte-identical CID; the test is the
federation witness that the third binding mode (after direct Rust
linking and the C ABI FFI) preserves CCP determinism.

Two documented deviations from section 6.3:

  * Each entry in the request's `atoms` array carries an optional
    per-step `formal_idx` (default 0). The canonical primitive takes
    `&[ChainStep { contract, formal_idx }]`; without it the wire
    format cannot reproduce CIDs for chains whose composition fires
    at any formal other than zero. The default-zero path is the
    libprovekit smoke conformance witness.

  * The top-level `effects` parameter is preserved per spec but is
    advisory: each FunctionContractMemento already carries its own
    effects field. When both are supplied and disagree, the request
    is rejected with an effects_mismatch error so a careless caller
    cannot silently lose effect information.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After atomic contract + per-function effects extraction, the C lifter
now walks the call_sites graph, identifies pure subtrees (every member
has empty effect set, chain length >= 2), and invokes
pk_compose_chain_contracts via libprovekit's C ABI per CCP v1.0.0
section 4 (eager materialization) and section 6.2 (C ABI FFI). The
resulting ComposedFunctionContract is emitted as an additional
declaration in the IR-document under kind="composed-contract" with the
composed CID, JCS body, and the leaf-first atom list.

Implementation notes (also in src/composition.c header):

  - The existing C "contract" envelopes (kind="contract") are
    kernel-doc preconditions, not full FunctionContractMementos. The
    FFI's MementoBody DTO requires fnName, formals, formalSorts, pre,
    post, effects, locus, etc. per build_value in compose.rs, so this
    pass synthesises pure-identity mementos (post: result = formal_0)
    for each composable function. The composed CID reflects chain
    structure (number of atoms, formal-index linkage), which is what
    BZ-COMPOSITION-001 cross-language equivalence exercises.

  - formalIdx=0 uniformly. Real argument-position resolution is
    future work.

  - Determinism: functions walked in libclang source order; first pure
    callee per source-order callsite chosen for the chain. Cycle guard
    on repeated function names.

  - Composition refusal from the FFI is treated as data: the chain is
    silently skipped and the rest of the lifter output is unaffected.

Federation property empirically demonstrated: the C lifter's
composed CID for [double_it, add_one] produced via the FFI equals
blake3-512:0d9bbbaf...969ac56, byte-identical to the same CID produced
from Rust by feeding equivalent JCS envelopes through
compose_chain_contracts_jcs. Same algebra, same CID across binding
modes, per CCP section 6.2's federation guarantee.

Makefile changes:

  - Links liblibprovekit.a + provekit-compose.h when libclang AST is
    enabled (the regex-only stub does not extract effects, so it
    cannot soundly classify pure subtrees).

  - Conditional system libs: macOS frameworks (Security,
    CoreFoundation, iconv) vs Linux (-lpthread -ldl -lm) selected via
    uname -s, since the Rust staticlib drags in libstd.

  - Lazy build of liblibprovekit.a: only invoked when the artifact is
    missing. Explicit `make libprovekit` refreshes.

Tests:

  - New fixture tests/fixtures/composition_basic.c with three pure
    helpers chained via direct calls; double_it + add_one +
    compose_three is the longest pure chain.

  - Integration test asserts composition_basic.c yields at least one
    composed-contract declaration, the compose_three composed CID is
    byte-stable across runs, has the blake3-512 prefix, and matches a
    pinned CID c636517a...94b6e40 to guard against silent regressions
    that would break Rust/C federation.

  - Existing tests (kernel_doc_basic.c, structural_basic.c,
    recovery_call.c, effects_basic.c) all still pass.

BZ-COMPOSITION-001 runner now drives the C lifter directly via its
JSON-RPC parse method instead of the never-implemented `provekit lift
--lifter ${C_LIFTER}` shape. Verdict logic typed: PENDING-RUST,
PENDING-C, PENDING-OTHER, EQUAL, DIVERGENT. Today's verdict on the
existing lab is PENDING-OTHER: lab/c/chain.c uses xs[i] reads + acc =
writes that the conservative effects walker tags impure, AND the Rust
side has no --emit-composed wiring yet. The C wiring itself is
verified by composition_basic.c via `make test`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh
…chema

PPP v0.1.0 was drafted earlier on 2026-05-09 before CCP existed. Under
Supra omnia, rectum the spec was stale: it neither cited CCP nor listed
the substrate relations CCP populates. This commit corrects that.

Pre-publication amendment: PPP has not been merged to main, has not
been published as a stable catalog CID, no consumer has pinned its
catalog CID. Per PEP semantics this is draft-stage spec amendment, not
formal protocol evolution. Future post-merge changes go through
provekit protocol evolve.

Additive changes only; existing v1 predicates over call_edges /
functions / contracts / lifted_files remain valid byte-for-byte:

  - Section 1: add CCP to the relation table; note FRP policyCid MAY
    reference a ComposedFunctionContract CID for chain-level policies.

  - Section 3.1: add effects relation (per-function effect set,
    populated by CCP §3 prerequisite) and composed_contracts relation
    (ComposedFunctionContract emissions per CCP §4 materialization).
    Both are non-breaking additions to v1 schema landed alongside CCP
    v1.0.0.

  - Section 7: cite BZ-COMPOSITION-001 cross-language equivalence
    specimen as the empirical federation guarantee for predicates that
    join over composed_contracts.

  - Section 9.6: add 'composition not materialized' as a failure mode
    so producers know a composition-bound predicate needs eager-or-lazy
    materialization to fire.

  - Section 9.7: renumber the prior 9.6 (federation mismatch) to make
    room.

PPP version stays v0.1.0; the changes are additive non-breaking schema
extensions plus references to the now-existing CCP spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants