Draft Pattern Predicate Protocol (PPP) v0.1.0 by TSavo · Pull Request #514 · TSavo/provekit

TSavo · 2026-05-09T16:47:56Z

Summary

Drafts a new extension protocol that names the missing edge between an editorially-named bug class and a Fix Receipt Protocol receipt: how a producer authors the policy a fix receipt cites, with cryptographic identity, in a form a verifier can re-run locally.

The wire-level pipeline:

Editorial pattern (nefariousplan)
  -> predicate (CID, schema-bound, deterministic SQL)
  -> query application memento (predicate, substrate, lifter, result-set CIDs)
  -> closure witness (deterministic delta over pre/post applications)
  -> FRP receipt (signed, attached to commit .proof root)
  -> proofchain head (carries closure under verifier policy)

Existing protocols cover the rest of the stack: proofchain (what a witness IS), FRP (what a receipt IS), proof substrate (what a lifter fills), lift-plugin-protocol (C1-C8 conformance). PPP gives FRP's policyCid and closureWitnessCid their content shape and names the artifacts in between.

Load-bearing observation

Derived empirically from this morning's V4bel-RxRPC patch experiment (PR #510 substrate + borrowed-pages-as-scratch-v2.sql):

The predicate's substrate binding determines what patch shapes can witness closure under it.

A v2 predicate over call_edges witnesses patches that add or remove calls. V4bel's gate-widening patch (if (skb_cloned(skb)) to if (skb_cloned(skb) || skb->data_len)) does not change call edges and is invisible to v2. A v3 predicate over a future substrate-schema version that exposes gate conditions WOULD witness it. Section 8 walks the example end to end. Section 9.4 makes this an explicit growth function rather than a defect.
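The observation can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions: the two-column `call_edges` schema, the function names `decrypt_in_place` and `unshare`, and the predicate text are hypothetical stand-ins. They are not the v1 substrate schema or the real borrowed-pages-as-scratch predicate. The sketch only shows why a patch that changes a gate condition without adding or removing calls produces the same result set.

```python
import sqlite3

# Hypothetical substrate schema; the real one is defined in the spec (Section 3.1).
SCHEMA = "CREATE TABLE call_edges (caller TEXT NOT NULL, callee TEXT NOT NULL)"

# Hypothetical predicate: callers of `decrypt_in_place` that never call `unshare`.
PREDICATE = """
SELECT DISTINCT e.caller
FROM call_edges AS e
WHERE e.callee = 'decrypt_in_place'
  AND NOT EXISTS (
    SELECT 1 FROM call_edges AS u
    WHERE u.caller = e.caller AND u.callee = 'unshare'
  )
ORDER BY e.caller
"""


def run_predicate(edges):
    """Load edges into a fresh in-memory substrate and return the matching callers."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(SCHEMA)
        con.executemany("INSERT INTO call_edges VALUES (?, ?)", edges)
        return [row[0] for row in con.execute(PREDICATE)]
    finally:
        con.close()


# Pre-patch: the function checks skb_cloned() and then works in place.
PRE_EDGES = [
    ("verify_packet", "skb_cloned"),
    ("verify_packet", "decrypt_in_place"),
]

# Gate-widening patch: only the `if` condition changes, so the edges are identical.
POST_GATE_EDGES = list(PRE_EDGES)

# A patch that adds a call does change the edges.
POST_CALL_EDGES = PRE_EDGES + [("verify_packet", "unshare")]

if __name__ == "__main__":
    print("pre:       ", run_predicate(PRE_EDGES))
    print("post-gate: ", run_predicate(POST_GATE_EDGES))
    print("post-call: ", run_predicate(POST_CALL_EDGES))
```

Under these assumptions the gate-widening variant is indistinguishable from the pre-patch substrate, while the call-adding variant empties the result set.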

The closure-witness shape table in Section 5 is the structural keystone:

| pre rows | post rows | shape |
| --- | --- | --- |
| ∅ | ∅ | unchanged (vacuous) |
| R, R ≠ ∅ | ∅ | empty (full closure) |
| R | S, S ⊊ R | strict-subset (partial closure) |
| R | R | unchanged |
| R | S, S ⊋ R | grew |
| R | S, S ∩ R ≠ ∅ ∧ S \ R ≠ ∅ ∧ R \ S ≠ ∅ | non-monotonic |

grew and non-monotonic are merge-blocking regression signals even when no fix receipt is claimed.
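A minimal Python sketch of the shape classification, treating each result set as a set of hashable row identifiers. The function names are hypothetical. One assumption goes beyond the table: when the pre and post sets are both non-empty and fully disjoint, no listed row applies (the non-monotonic row requires S ∩ R ≠ ∅), so the sketch folds that case into `non-monotonic`.

```python
def closure_shape(pre_rows, post_rows):
    """Classify the delta between a pre-patch and post-patch result set."""
    r, s = frozenset(pre_rows), frozenset(post_rows)
    if r == s:
        # Covers both the vacuous case (both empty) and R -> R.
        return "unchanged"
    if not s:
        # r is non-empty here, because r != s.
        return "empty"
    if s < r:
        return "strict-subset"
    if s > r:
        # Includes the case where the pre set was empty.
        return "grew"
    # Rows were both added and removed. ASSUMPTION: fully disjoint
    # non-empty sets are also reported as non-monotonic.
    return "non-monotonic"


def is_merge_blocking(shape):
    """grew and non-monotonic are regression signals."""
    return shape in {"grew", "non-monotonic"}


if __name__ == "__main__":
    print(closure_shape({"f1", "f2"}, set()))
    print(closure_shape({"f1", "f2"}, {"f1"}))
    print(closure_shape({"f1"}, {"f1", "f3"}))
```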

Sections

0 Purpose
1 Relation to existing protocols (proofchain / FRP / substrate / lift-plugin / ORP)
2 Pattern, predicate, compilation arrow (signed memento)
3 Predicate authoring (determinism, schema-binding, closed extension; v1 schema)
4 Query application memento (mandatory lifterCid)
5 Closure witness (shape table)
6 Composition with FRP (one receipt per (predicate, gap) pair)
7 Cross-language federation (per-language callee-mapping mementos)
8 Worked example: borrowed-pages-as-scratch end to end
9 Failure modes (lift drift, predicate drift, substrate drift, soundness, determinism, federation mismatch)
10 Operational notes (distribution, revocation, catalog signing, receipt provenance)
11 Pipeline diagram
A Canonical SQL admitted in v1 (SQLite-3 + JSON1, explicit function set)
B Reference implementation surface (provekit pp compile|run|witness|receipt)

Empirical grounding

Backed by today's experiments on linux@7.1.0-rc2:

| Predicate | Pre-patch matches | Post V4bel-patch matches |
| --- | --- | --- |
| `borrowed-pages-as-scratch.sql` (v1) | 5 (40% TP) | 5 (predicate sees no change; matches v3 schema gap) |
| `borrowed-pages-as-scratch-v2.sql` | 2 (100% TP) | 2 (same; gate-widening invisible at call_edges) |

The two TPs are rxkad_verify_packet_1 (CVE-2026-43500, named in V4bel's disclosure) and rxkad_verify_packet_2 (substrate-discovered same-class sibling, not named in V4bel's disclosure). Section 8 walks the receipt structure that would close on both, and the substrate extension required to make V4bel's actual patch bytes witness closure.

Deferred to v0.2

Soundness-memento shape (currently a vague reference to "additional mementos"; should compose with the truth-discharge protocol).
Per-language callee-mapping table schema for federation.
Formal typed binding of predicate as a function from substrate(schema_v1) to result_set(locus_shape).

None are blocking; all are named gaps in the spec.

Test plan

No em-dashes / en-dashes in the spec text.
Section 8's worked example uses real CIDs and result sets observed today.
Section 5's closure-shape table is exhaustive (every pre/post pair classified).
Section 9 enumerates the failure modes that arose during the empirical work.
Reviewer to validate that v1 substrate schema (Section 3.1) matches what the C lifter emits today.
Reviewer to validate that the FRP composition in Section 6 doesn't conflict with FRP's nontriviality rule.
CI to confirm the spec markdown lints cleanly.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added function effect extraction (reads, writes, I/O, unsafe, panics, unresolved calls) to C kernel lifter.
- Implemented contract composition system with deterministic content identifiers.
- Introduced pattern predicate system for automated kernel security pattern detection.
- Created Python tooling for kernel source lifting and predicate analysis automation.
Documentation
- Added Contract Composition Protocol specification.
- Added Pattern Predicate Protocol specification.
- Added kernel security pattern research notes.
Tests
- Added effect extraction integration tests.
- Added contract composition determinism validation.

PPP specifies how an editorially-named bug class (e.g. a nefariousplan pattern) becomes a content-addressed substrate query, how that query's output becomes a memento, and how the output's delta across two substrates discharges a fix receipt's policy.

Existing protocols name what a witness IS (proofchain), what a fix receipt IS (FRP), what a substrate IS (proof substrate), and how lifters fill it (lift-plugin-protocol). PPP names the missing edge: how a producer authors the policy a fix receipt cites, with cryptographic identity, in a form a verifier can re-run locally.

The pipeline:

pattern (editorial)
  -> predicate (mechanical, schema-bound, deterministic)
  -> query application (predicate, substrate, lifter, result-set CIDs)
  -> closure witness (deterministic delta over pre/post applications)
  -> FRP receipt (signed, attached to commit .proof root)
  -> proofchain head (carries closure under verifier policy)

Borrowed-pages-as-scratch is the worked example (Section 8). The spec includes the empirical observation surfaced by the V4bel/RxRPC patch experiment: the predicate's substrate binding determines what patch shapes can witness closure under it. A v2 predicate over call_edges witnesses patches that change the call graph; gate-condition patches require a substrate-schema extension. This is presented as the operational growth function, not as a defect.

Sections:

0 Purpose
1 Relation to existing protocols
2 Pattern, predicate, compilation
3 Predicate authoring (determinism, schema-binding, closed extension)
4 Query application
5 Closure witness (shape table)
6 Composition with FRP
7 Cross-language federation
8 Worked example: borrowed-pages-as-scratch
9 Failure modes (lift drift, predicate drift, substrate drift, soundness, determinism violations, federation mismatch)
10 Operational notes (distribution, revocation, catalog signing)
11 Pipeline diagram
A Canonical SQL admitted in v1
B Reference implementation surface

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-09T16:48:06Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a5297b85-4734-49ed-aa06-9f168039e4d5

📥 Commits

Reviewing files that changed from the base of the PR and between e92f7da and 1855b04.

⛔ Files ignored due to path filters (1)

implementations/rust/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (46)

.provekit/ci/accepted/c/blake3-512:de2c1208b6ca445560e1a044327bc8a780709865c9a068871d894996c2b3d6cc35f0aa9ea8b0d8aa93318148ac7f2f2f15b06736a865202a256c961e070bceeb.job-result.json
.provekit/ci/accepted/c/blake3-512:f384ad03c7615c8fb450ac010d90b7075cab204ea39650b0f0d3f26f3500f93c01370b528be5b2c14618b8649d0eec110548d155871d162679e43aac122975ea.job-result.json
.provekit/ci/accepted/cpp/blake3-512:2ac37c29afe73131d4ed26879284230b88958b90cf693ad6a5facd4b8f3f25037cbd11f90a6b7368b3a4549401c09bc36cba317b67d85835c878e6f58018cb6f.job-result.json
.provekit/ci/accepted/cpp/blake3-512:4ad9afa4ee154a70caa78e21a31eab46d9c070b8de817c841002c291162256a027406ee515ea9831b1ce2d92b3e70b566f734b8150c1c9eea88ea9bd02f34fbc.job-result.json
.provekit/ci/accepted/csharp/blake3-512:20061dbf5e03ba7550a716f46d777f97d918299e93e84d3657e15ff950ac20cfc08ef40b70335b4a8fbc49b2e8c2eef1d083391a3800cb878053b236eed3c510.job-result.json
.provekit/ci/accepted/csharp/blake3-512:578295c0bb0afb28c340b40acf3c3e98cba28220f18773988615b93a1e020383d2cee6d740730549a50b00c450ee4a1f121a5043c3642790eaa661ac5d7069c2.job-result.json
.provekit/ci/accepted/go/blake3-512:1429648a5debe7fc4c76af353413d0378ab64b12c8c06f1abdcc700c964a0ea47c1899288acab63fd763f3b4f4101cb937afc80e4b00561fa12d732c57421051.job-result.json
.provekit/ci/accepted/go/blake3-512:177f0a9c3d6f39ea3fa1520668df9676c8a891ad7f2f65b44c524005f0612fa4c671a262ac4c5ef30a43c1e1ca63ad6586a3c083e800e35b5cb19a3f4febfb1f.job-result.json
.provekit/ci/accepted/java/blake3-512:7b52d628aae35dbc7335df0522b68d8990a962c1252f4b10c26a3a257d75614be537b85e3b0f3d0eca8d626a4a795f8b54f4ae053b863b5f2bc155d6fb74b819.job-result.json
.provekit/ci/accepted/java/blake3-512:fb1d8750fd07ae4b651d8ca911378a91d0fc74865d13d27e0c851f9794b11df7c582e6366839c2181c0241da3d424cb294d5a88319ab1c1cd046c34d1253f00d.job-result.json
.provekit/ci/accepted/python/blake3-512:09ea25611717aa598bc7e62d60b6075b235df800c3bc59d0f32ef2944a3ce4eb0c598699464e93c24153916b390ba457477568779e1519d297a2ac641f9d8a5f.job-result.json
.provekit/ci/accepted/python/blake3-512:9b61a402902e9c8b323c9e990b27c68581766713f1b0204edc990ac920187ba4ca82930cbaeb95350bb217de510af46840bb0d226d8d4a37738788787b6c0397.job-result.json
.provekit/ci/accepted/ruby/blake3-512:4c17f3978436947cc5d8d0e24506459cd44e3838100919c0ec973b19e66beff07ecef3d617536cdefccd57fbbe75cfd34a9e0d7e11974c8ab0aa56d790abfa07.job-result.json
.provekit/ci/accepted/ruby/blake3-512:8f23849dbf9d4cb3580fcab23a1d2ce415bf6cf3153f235215564392dd7d503ab995194ae450df871999025a5d8c1ce2bf529364ac329d5b6e58cffa7cf67793.job-result.json
.provekit/ci/accepted/rust/blake3-512:70ed2618a92ef0d09e5bef07132e5d0bd9fdb1ce201fc51c499a6d37a3508a0f0ed781b9ab8187e1ff7ce07b86c7165a3c3014cfe42e3e6f33f82b193b2a69a3.job-result.json
.provekit/ci/accepted/rust/blake3-512:cb5386fcdfe3202e7b1c702c00fed18403920eb55a02883d44293dcb9a676f65acd6023ba3611df89540cb3c498c1149fd6ab9f7bdf6b892a737ee4e92e1b2b8.job-result.json
.provekit/ci/accepted/swift/blake3-512:26ed22b0ca752bac02c263196af2be8cc54469f59845166e291bc565cff36628cc03ce66bb3e1fa92721ff36c306f331fb861b5e72e0ec4a5185fbf9acc7207c.job-result.json
.provekit/ci/accepted/swift/blake3-512:5cb75d546d5680a872e0233e4a42e118f0e3cfb6b6d22014f48919e020e738970d6f5f10bc908c87799bdcebd781f3c8a47b77e2d2908bec0d7b278ecf5f925d.job-result.json
.provekit/ci/accepted/ts/blake3-512:815b4808e6c5b5501b5630b338e55eecd82138f9861f47359f8db39f3f4371c19b64339c556ac344f2068c4b107f87c1b8f2eaf53ca4c658a991ee79dd86f8e3.job-result.json
.provekit/ci/accepted/ts/blake3-512:d434917f35f477c7d5f177ae5deecabb41b490a8287048014a63ce52fc34006850ebaa6a022eff04d4057042830ff6785e5199b7e6c6f4a65ddf47ab1eefafc7.job-result.json
.provekit/ci/accepted/zig/blake3-512:2b628143573f4b1cf89c409b139138d8022a687be0d56ba897326b88178fee8dea5b58cd42aa55753c3979ae3f137a0713b3501155bd0bbdcccebda84298a961.job-result.json
.provekit/ci/accepted/zig/blake3-512:78ee8397975f1a278deddbe1032ed8b846598e7625c28b87da4668369805d13cce915aeb94eb191446c91c3223387ffc1218023874c3d91411dd2db0ebd358f9.job-result.json
implementations/c/provekit-lift-c-kernel-doc/.gitignore
implementations/c/provekit-lift-c-kernel-doc/Makefile
implementations/c/provekit-lift-c-kernel-doc/src/composition.c
implementations/c/provekit-lift-c-kernel-doc/src/kernel_doc.c
implementations/c/provekit-lift-c-kernel-doc/tests/fixtures/composition_basic.c
implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh
implementations/c/provekit-lift-core/src/clang_ast.c
implementations/rust/libprovekit/Cargo.toml
implementations/rust/libprovekit/include/provekit-compose.h
implementations/rust/libprovekit/src/ffi.rs
implementations/rust/libprovekit/src/lib.rs
implementations/rust/libprovekit/tests/ffi_smoke.rs
implementations/rust/provekit-cli/Cargo.toml
implementations/rust/provekit-cli/src/cmd_compose.rs
implementations/rust/provekit-cli/src/main.rs
implementations/rust/provekit-cli/tests/compose_rpc_smoke.rs
menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/.gitignore
menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/README.md
menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/c/chain.c
menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/c/chain.h
menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/rust/Cargo.toml
menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/lab/rust/src/lib.rs
menagerie/bug-zoo/species/BZ-COMPOSITION-001-cross-language-equivalence/runner.sh
protocol/specs/2026-05-09-pattern-predicate-protocol.md

Walkthrough

PR introduces three major subsystems: (1) conservative C function effect extraction (reads, writes, IO, unsafe, panics, unresolved calls) integrated into the libclang-based lifter; (2) Rust canonical contract composition protocol with opacity discharge and capture-avoiding substitution; (3) SQLite-based pattern-predicate infrastructure (Python tools, SQL templates, kernel experiments) for deterministic vulnerability detection. Two protocol specifications document CCP and PPP architectures.

Changes

Effect Extraction & Composition with Pattern Predicates

| Layer / File(s) | Summary |
| --- | --- |
| C Effect Type Definitions & Public API `implementations/c/provekit-lift-core/include/provekit/c_lift_core.h` | New `pk_c_effect_kind` enum and `pk_c_function_effect` struct; `pk_c_function_fact` extended with effects array; public APIs for emission, addition, and kind-to-string mapping. |
| C Effect Extraction Implementation `implementations/c/provekit-lift-core/src/effects.c` | Conservative AST walker tags function effects: detects reads/writes via lvalue resolution, IO/panics via name matching, unsafe constructs, and unresolved calls via callee shape inspection. 735 lines of analysis logic. |
| C Effect Integration & Emission `implementations/c/provekit-lift-core/src/clang_ast.c`, `src/parser.c`, `Makefile` | clang_ast invokes extraction during traversal; parser deduplicates and emits JSON; Makefile conditionally includes effects.c. |
| C Effect Fixtures & Integration Tests `implementations/c/provekit-lift-c-kernel-doc/tests/fixtures/effects_basic.c`, `tests/integration.sh` | Seven test functions exercise all effect kinds; integration assertions validate effect-kind JSON output. |
| Rust Dependencies & Module Setup `implementations/rust/libprovekit/Cargo.toml`, `src/lib.rs`, `implementations/rust/provekit-walk/Cargo.toml` | libprovekit adds `provekit-ir-types` dependency; declares `compose` module; provekit-walk adds libprovekit dependency. |
| Rust Effect & Contract Models `implementations/rust/libprovekit/src/compose.rs` (lines 34–520) | Effect enum (reads/writes/IO/unsafe/panics/unresolved), EffectSet with purity checks, AliasingMemento, Locus, FunctionContractMemento bundling all metadata with canonical bytes/CID. |
| Rust Opacity Discharge & Validation `implementations/rust/libprovekit/src/compose.rs` (lines 526–683) | OpacityMementoLookup trait, EmptyOpacityPool, OpacityError, and EffectSet::check_opacity_effects for validation. |
| Rust Composition Primitives & Algorithms `implementations/rust/libprovekit/src/compose.rs` (lines 688–1370) | compose_function_contracts, compose_function_contracts_checked, compose_with_composed, compose_chain_contracts; capture-avoiding substitution; free-variable analysis; JCS bytes and CID computation. |
| Rust Composition Smoke Test `implementations/rust/libprovekit/tests/compose_smoke.rs` | Tests compose_chain_contracts on pure identity contracts; asserts CID prefix, component count, determinism, and pins CID constant. |
| Rust Walk Refactoring `implementations/rust/provekit-walk/src/contract.rs`, `src/locus.rs`, `src/type_decl.rs` | contract.rs re-exports libprovekit::compose; locus.rs re-exports Locus and adds from_span constructor; type_decl.rs imports LocusFromSpanExt trait. |
| Pattern Predicate Python Tools `menagerie/pattern-predicate-protocol/tools/lift_kernel.py`, `tools/run_predicates.py` | lift_kernel.py walks source tree, invokes C lifter, persists callEdges to SQLite; run_predicates.py executes all .sql predicates, computes blake3-512 CIDs, prints results. |
| Pattern Predicate SQL Definitions `menagerie/pattern-predicate-protocol/predicates/*.sql` | Six SQL predicates: borrowed-pages-as-scratch (v1 & v2), copy_from_user-no-bounds, spin-lock-no-unlock, kmalloc-no-free-locally, rcu-read-lock-no-unlock. |
| PPP Experimental Records & Notifications `menagerie/pattern-predicate-protocol/experiments/2026-05-09-.md`, `notifications/2026-05-09-.md` | Three experiment/notification artifacts documenting net-only and wide-kernel runs, predicate results, triage, empirical observations, and vulnerability notification coordination. |
| PPP Reference Documentation `menagerie/pattern-predicate-protocol/README.md` | Explains exploratory tooling vs spec reference, lists repository layout, provides quickstart, enumerates caveats, describes predicate provenance. |
| Contract Composition Protocol Specification `protocol/specs/2026-05-09-contract-composition-protocol.md` | CCP specification: compose_chain_contracts primitive contract, per-language effect kinds, eager/lazy materialization, three cross-language binding modes, formal algebra, worked example, reference implementation scope (libprovekit compose.rs, C FFI, CLI, corpus). |
| Pattern Predicate Protocol Specification `protocol/specs/2026-05-09-pattern-predicate-protocol.md` | PPP v0.1.0 specification: pattern→predicate→substrate→query-application→closure-witness→FRP-receipt pipeline; determinism/schema binding; federation requirements; worked example; failure modes; operational guidance; reference implementation commands. |
| Research & Protocol Documentation `docs/research/sparse-annotation-as-predicate.md` | Research note documenting current sparse annotation lifting and five predicate strategy ideas with priority recommendations. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

TSavo/provekit#495: Related C lifter family changes (provekit-lift-core, sparse lifter, clang AST, Makefile, test integration).
TSavo/provekit#370: Related Rust composition algebra refactoring and provekit-walk re-exports (FunctionContractMemento, Effect, EffectSet, Locus, compose functions).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 622ba97ed3

chatgpt-codex-connector · 2026-05-09T16:50:17Z

+admitted built-in function set is the SQLite core plus JSON1's
+`json_extract`, `json_each`, `json_array`, `json_object`,
+`json_array_length`, `json_type`, `json_valid`. Functions outside this
+set break determinism guarantees and disqualify a query from being a


Restrict v1 predicate functions to deterministic SQLite built-ins

This whitelist currently permits the entire SQLite core function set, which includes non-deterministic functions like random()/randomblob(). That conflicts with Section 3's determinism requirement and allows "valid" v1 predicates that cannot reproduce bit-identical result sets across runs, leading to verifier fail-closed behavior for otherwise compliant receipts. Narrow the whitelist to deterministic core functions (or explicitly exclude non-deterministic ones) so the admitted set matches the protocol’s reproducibility guarantees.
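One way to enforce the "explicitly exclude non-deterministic ones" option mechanically is SQLite's authorizer hook, which Python exposes as `Connection.set_authorizer`. The sketch below is an assumption about how a checker could be built, not something the spec or the PR's tooling defines. The deny set is illustrative and not exhaustive, and `is_admissible` is a hypothetical name.

```python
import sqlite3

# Illustrative, NOT exhaustive: core functions whose results can differ
# between runs or depend on connection state.
NON_DETERMINISTIC = frozenset(
    {"random", "randomblob", "changes", "total_changes", "last_insert_rowid"}
)

# Authorizer action code for function use; 31 is SQLite's SQLITE_FUNCTION value.
SQLITE_FUNCTION = getattr(sqlite3, "SQLITE_FUNCTION", 31)


def _authorizer(action, arg1, arg2, db_name, source):
    """Deny statements that reference a function in the deny set."""
    # For SQLITE_FUNCTION, arg2 carries the function name.
    if action == SQLITE_FUNCTION and arg2 and arg2.lower() in NON_DETERMINISTIC:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def is_admissible(sql):
    """Return True if the query runs without touching a denied function.

    Any other SQL error (syntax, missing table) also yields False here.
    Date/time functions with the 'now' modifier are not caught, because
    the check is by function name only.
    """
    con = sqlite3.connect(":memory:")
    con.set_authorizer(_authorizer)
    try:
        con.execute(sql).fetchall()
        return True
    except sqlite3.DatabaseError:
        return False
    finally:
        con.close()


if __name__ == "__main__":
    print(is_admissible("SELECT length('abc')"))
    print(is_admissible("SELECT random()"))
```

The authorizer runs while the statement is prepared, so a rejected predicate never executes against the substrate.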

Companion exploratory artifact set for the PPP spec at protocol/specs/2026-05-09-pattern-predicate-protocol.md (PR #514). This destination is the experimental record + tooling that produced the spec, not yet the v1 reference implementation referenced in PPP Appendix B (which should be a Rust extension to provekit-cli exposing `provekit pp compile|run|witness|receipt`).

What's here:

predicates/    Six SQL predicates, one per pattern
  borrowed-pages-as-scratch.sql       v1, function-local
  borrowed-pages-as-scratch-v2.sql    v2, caller-aware
  copy_from_user-no-bounds.sql        userspace-bounds heuristic
  spin-lock-no-unlock.sql             lock-pairing heuristic
  rcu-read-lock-no-unlock.sql         RCU read-side pairing
  kmalloc-no-free-locally.sql         alloc/free pairing

tools/
  lift_kernel.py       Walk a kernel subtree, lift to SQLite
  run_predicates.py    Run every .sql against a substrate

experiments/
  2026-05-09-linux-kernel-net.md
    Receipt of the run that produced the spec: linux@7.1.0-rc2 net/ subtree, 347 files, 48,701 call edges, five predicates fired, full triage of the borrowed-pages-as-scratch results (v1: 5 candidates 40% TP rate; v2: 2 candidates 100% TP rate), and the empirical observation that grounds PPP Section 8: V4bel's gate-widening patch is invisible to a call_edges predicate, motivating a substrate-schema extension to expose gate conditions.

The README is explicit about caveats versus the spec: predicate CIDs use blake2b instead of BLAKE3, no signed query-application or closure-witness mementos are emitted, the lifterCid is not recorded. These are non-architectural; the proper reference impl will close them.

Substrate-discovered finding: rxkad_verify_packet_2 (net/rxrpc/rxkad.c) is the same-class sibling of CVE-2026-43500 (rxkad_verify_packet_1) and is not named in V4bel's public disclosure. The cluster predicate identifies it from structure alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
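For orientation, a content identifier over a predicate's bytes with BLAKE2b can be computed using only the standard library. This is a minimal sketch, not the exploratory tooling's code: the `blake2b-512:` prefix is an assumption modeled on the `blake3-512:` prefixes visible in the PR's file names, `predicate_cid` is a hypothetical name, and no canonicalization of the SQL text is attempted.

```python
import hashlib


def predicate_cid(sql_bytes):
    """Return a content identifier for the exact bytes of a predicate file.

    ASSUMPTION: the "blake2b-512:" prefix mirrors the "blake3-512:" style
    seen elsewhere in the PR; the spec's own CID rules may differ.
    """
    digest = hashlib.blake2b(sql_bytes, digest_size=64).hexdigest()
    return "blake2b-512:" + digest


if __name__ == "__main__":
    example = b"SELECT caller FROM call_edges ORDER BY caller;\n"
    print(predicate_cid(example))
```

Because the hash covers raw bytes, even a whitespace-only edit to a predicate yields a different identifier.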

Second-pass receipt over a 12.5×-larger kernel surface (4332 files, 621,100 call edges across net/, crypto/, security/, drivers/net/, mac80211/, wireless/, bluetooth/).

Findings:

- borrowed-pages-as-scratch v1+v2 produce IDENTICAL match sets on the wide substrate as on net/-only. The class is concentrated in net/rxrpc; outside that subsystem the lifted surface does not contain receivers of this shape. That is itself a useful structural claim about the kernel, mechanically established.
- copy_from_user-no-bounds widens 9 -> 28: PPP I/O paths, wireless debugfs writers, WWAN core, security-module userspace interfaces. Heuristic; candidates for editorial follow-up triage.
- spin-lock-no-unlock widens 2 -> 5: ethernet driver acquire-and-return-locked patterns, paired across function boundaries; predicate's function-level scope can't see them. Likely all FPs.
- rcu-read-lock-no-unlock new finding: smk_seq_start in security/smack. Standard seq_file lock-start / unlock-stop pattern; predicate can't see across vtable. FP.
- kmalloc-no-free-locally widens 43 -> 459: most are return-the-allocation patterns. Predicate too coarse.

The borrowed-pages predicate's stability across 12.5× more files is a positive signal: the predicate is not hallucinating across the wider surface, and the kernel does not have a long tail of borrowed-pages instances hiding in unaudited subsystems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds menagerie/pattern-predicate-protocol/notifications/2026-05-09-rxkad-verify-packet-2.md as the receipt-chain marker for the responsible-disclosure step that follows the 2026-05-09-linux-kernel-net.md experimental record.

Captured: recipient (imv4bel@gmail.com), subject, signing key fingerprint (5FD21B4FE7E4A3CA7971CB09DE6639788E091026, Kevlar), body BLAKE2b-512 hash, 48-hour publication hold start. The signed email body itself is private correspondence and is NOT committed; the hash lets the recipient (or any future auditor) verify against the bytes received.

The substantive claim communicated is summarized: rxkad_verify_packet_2 shares the in-place pattern of rxkad_verify_packet_1 (CVE-2026-43500), is dispatched from the same parent on a different security_level branch (ENCRYPT vs AUTH), with a per-fire write primitive of sp->len rather than a fixed 8 bytes. V4bel's submitted upstream gate-widening patch covers both via skb_unshare upstream of the security-ops dispatch.

This is the responsible-disclosure step in the receipt chain that the PPP spec describes (pattern -> predicate -> substrate -> result -> triage -> notification -> public writeup -> FRP receipt). Each step is content-addressable; the notification's hash anchors the email without leaking the body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CCP defines how atomic contract mementos compose across function call sites into ComposedFunctionContract mementos whose CIDs are the algebraic composition of their atomic constituents. The composed CID is what the handshake algorithm's tier 2 cache reuses for O(1) discharge of structurally-equivalent chains across any future program in any language.

Architectural call (made jointly during today's session under Supra omnia, rectum): composition MUST be a single canonical primitive in libprovekit, exposed via FFI / CLI / direct-link to every lifter and verifier. Multi-implementation across N languages is rejected because it multiplies the trust surface and breaks federation under any divergence.

The build order:

1. Per-language effects extraction (lifter prerequisite)
2. Extract compose_chain_contracts from provekit-walk into libprovekit
3. Expose libprovekit compose via Rust direct-link, C ABI FFI, and a `provekit compose` CLI subprocess speaking JSON-RPC
4. Wire each lifter to call the canonical compose
5. Bug-zoo cross-language equivalence specimen as the federation guarantee in executable form

Sections:

0 Purpose
1 Relation to existing protocols
2 The composition function (signature, determinism, refusal rules)
3 Effects (per-language prerequisite, extraction notes)
4 Materialization timing (eager / lazy, both produce same CIDs)
5 Canonical implementation in libprovekit
6 Binding modes (Rust direct, C ABI FFI, JSON-RPC subprocess)
7 Cross-language equivalence (the bug-zoo specimen)
8 Failure modes (impure, schema-version, effect incompatibility, determinism, lifter effects-tracking gap)
9 Composition algebra (8 formal rules)
10 Worked example: kernel C function chain (kmalloc/memset/zalloc)
11 Versioning and revocation
12 Pipeline diagram
A Canonical encoding for compose inputs
B Reference implementation surface

CCP composes with the existing protocols (memento envelope grammar, handshake algorithm, contract merge semantics, lift-plugin protocol, PPP, FRP) without schema changes. Implementation is multi-PR: extract compose into libprovekit, expose bindings, add per-language effects extraction to each lifter family.

Under Supra omnia, rectum: this is the architecturally correct shape for cross-language composition. Federation across languages requires byte-identical composition CIDs; only a single canonical implementation guarantees that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Contract Composition Protocol (CCP) v1.0.0 specified at protocol/specs/2026-05-09-contract-composition-protocol.md refuses to compose any subtree containing an impure atom. Without effects extraction the C lifter emits atoms that look pure to the composer, which would either silently produce unsound composed contracts (liberal, broken) or be globally refused (the lift becomes useless to composition). CCP section 3 names the per-language effects extraction as the prerequisite for sound composition.

The Rust lifter (provekit-walk) already extracts the six canonical effect kinds from MIR. Bring the C lifter to parity using the libclang AST. The new pass walks each FunctionDecl body cursor after the existing structural visit and records:

- Reads { target } on MemberRefExpr / ArraySubscriptExpr / UnaryOperator(deref) used as an rvalue.
- Writes { target } on the LHS of an assignment / compound assignment / ++ / --. The assignment operator is identified by tokenizing the cursor extent and locating the punctuation token that sits between the first and second children, since libclang's BinaryOperatorKind accessor is version-gated.
- Io on calls to a sorted allowlist of kernel-style entry points (copy_from_user, copy_to_user, kmalloc family, kfree, vmalloc family, printk, pr_/dev_ prefix families, sysfs_/debugfs_/netlink_ prefix families, register_chrdev, ioctl).
- Unsafe on AsmStmt and on CStyleCastExpr that targets a non-void pointer type (type punning).
- Panics on calls and macro instantiations of BUG, BUG_ON, BUG_ON_ONCE, WARN, WARN_ON, WARN_ON_ONCE, panic, abort, assert.
- UnresolvedCall { name } on calls whose callee resolves to a struct field (ops-table dispatch like x->ops->method(...)) or a VarDecl/ParmDecl (function-pointer variable).

Conservative tagging is acceptable per CCP section 3; liberal is unsound. The walker tags Writes with target "<unknown>" when the LHS root cannot be named, prefers the Reads tag on any complex expression that might be a read, and degrades gracefully on partial ASTs produced by libclang's KeepGoing + Incomplete modes by null-checking every clang_getCursorReferenced result.

The new pk_c_function_fact carries an effects array; the new pk_c_emit_function_effects emits a function-effects declaration into the IR-document for each function with a body, matching the JSON shape of CCP section 3 ({"kind":"Reads","target":"x"} etc.). Existing contract emission is untouched; effects are additive.

Effects extraction lives in provekit-lift-core/src/effects.c and is only compiled when libclang is available (the existing stub fallback path remains effects-free, and consumers see the established ast-backend-unavailable opacity entry that already flags the soundness gap on stub builds).

Test: tests/fixtures/effects_basic.c provides one function per effect kind plus pure_function with the empty effect set. integration.sh asserts each effect kind appears on the right function. The assertions live inside the existing libclang availability gate so make test stays meaningful on stub builds.

Two libclang quirks worth flagging for follow-up agents: tokenizing a BinaryOperator cursor's full extent returns LHS punctuation (member access dots) before the operator, so the operator probe must skip past the first child's end offset; and macro-expansion panic detection relies on the kernel headers not being on the include path, which is true for our standalone fixtures but may need rework when the kernel compile context lands real headers.
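The purity gate described above can be sketched in a few lines of Python. This is a simplified illustration, not the libprovekit implementation: the six effect-kind names and the `{"kind": ..., "target": ...}` JSON shape come from the commit message, while `ImpureAtomError`, `check_composable`, and the rule "any non-empty effect set is impure" are assumptions made for the sketch.

```python
import json

# The six effect kinds named in the commit message.
EFFECT_KINDS = frozenset(
    {"Reads", "Writes", "Io", "Unsafe", "Panics", "UnresolvedCall"}
)


class ImpureAtomError(ValueError):
    """Hypothetical error: raised when a chain contains an atom with effects."""


def check_composable(atoms):
    """Refuse a chain if any atom carries a recorded effect.

    `atoms` is a list of (function_name, effects) pairs, where `effects`
    is a list of dicts shaped like {"kind": "Reads", "target": "x"}.
    ASSUMPTION: any non-empty effect set counts as impure.
    """
    for name, effects in atoms:
        for effect in effects:
            if effect.get("kind") not in EFFECT_KINDS:
                raise ValueError(f"unknown effect kind on {name}: {effect!r}")
        if effects:
            kinds = sorted({e["kind"] for e in effects})
            raise ImpureAtomError(f"{name} is impure: {kinds}")


if __name__ == "__main__":
    # Effects as a lifter might emit them, parsed from JSON text.
    reads_x = json.loads('[{"kind": "Reads", "target": "x"}]')
    check_composable([("pure_function", [])])
    try:
        check_composable([("pure_function", []), ("reader", reads_x)])
    except ImpureAtomError as exc:
        print("refused:", exc)
```

A lifter that omits effects entirely would pass this gate for every atom, which is the unsound case the commit message warns about.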

The Contract Composition Protocol (CCP) v1.0.0 specified at protocol/specs/2026-05-09-contract-composition-protocol.md mandates that compose_chain_contracts be a single canonical primitive in libprovekit, callable identically from every language lifter and verifier. CCP sections 2, 5, and 9 name the function, its determinism guarantees, and the eight algebra rules; section 5 explicitly places the implementation at libprovekit/src/compose.rs.

Until now the function lived inside provekit-walk/src/contract.rs, which meant only the Rust walking lifter could call it. The C lifter, future Java / Go / TypeScript / Python lifters, and the provekit-cli compose subcommand could not reach the canonical implementation. Federation across languages depends on byte-identical composed CIDs from a single source of truth.

This commit is plumbing only: move plus rewire plus smoke test. The composition algebra is unchanged; CIDs are byte-identical pre-refactor and post-refactor (the existing 238 walk lib tests continue to pass without modification, including the deterministic CID assertions that would catch any algebra drift).

Moved to libprovekit::compose:

- compose_chain_contracts, ChainStep
- compose_function_contracts, compose_function_contracts_checked, compose_with_composed
- FunctionContractMemento, ComposedFunctionContract
- EffectSet, Effect, AtomicKind
- AliasingMemento, AliasingStatus
- OpacityError, OpacityMementoLookup, EmptyOpacityPool, PinInvariantMementoView
- Locus (data type only; the syn-driven from_span constructor stays in walk so libprovekit doesn't gain a syn dependency)
- build_value, build_memento_value, sort_to_value
- The capture-avoiding substitute_in_formula and the JCS / CID canonical-encoding glue (formula_to_canonical, cid_of_value, jcs_bytes_of_value), duplicated from walk's wp.rs and canonical.rs so the compose module is self-contained. Walk's pre-existing copies remain untouched; both impls operate over identical types from provekit-ir-types and produce byte-equivalent output by construction.

Walk's contract.rs is now a thin shim: pub use everything from libprovekit::compose, plus the syn-walking helpers (build_function_contract, detect_effects, scan_*_for_effects, extract_formals, infer_sort, is_io_method, is_known_pure_method, is_known_pure_call) and the existing 32 tests. Existing 'use crate::contract::FunctionContractMemento' paths in walk's other modules (chain.rs, marriage.rs, llbc_calls.rs, llbc_lift.rs, envelope.rs, type_decl.rs, charon_runner.rs) all continue to resolve through the re-export, so no churn outside locus.rs and contract.rs.

Walk's locus.rs adopts libprovekit::compose::Locus via re-export and supplies a syn-using from_span free function plus a LocusFromSpanExt trait so the existing five Locus::from_span call sites in type_decl.rs keep working with one extra import.

provekit-verifier and provekit-cli only refer to the compose primitive in comments today, so no rewiring was needed there. They continue to build clean.

Smoke test at libprovekit/tests/compose_smoke.rs composes two trivial pure atoms via compose_chain_contracts and asserts the composed CID is byte-stable across runs against a pinned hex value. Per CCP section 5 ("Test corpus") and section 11 ("Versioning and revocation"), any future change that alters this pinned CID requires a CCP version bump; the test is the conformance witness for v1.0.0 of the algebra.

No FFI work; the C ABI wrapper at libprovekit/include/provekit-compose.h and the JSON-RPC subprocess CLI subcommand are follow-up commits (spec sections 6.2 and 6.3, both marked planned in Appendix B). The C lifter's federation hookup is a separate agent's PR in parallel.

Build verified:

    cargo build --release -p libprovekit -p provekit-walk -p provekit-cli
    cargo test -p libprovekit            # 1 passed (compose_smoke)
    cargo test -p provekit-walk --lib    # 238 passed
    cargo test -p provekit-walk          # 238 + 12 integration files passed

Pinned smoke-test CID for the two-atom identity-result fixture:

    blake3-512:36212b7bf7b9ccf264950940a33d64e1cfe88b6f4d8a47c01949fc64d9359d1813d6147aa2e1afe82b01e6e7ebcbe0a413683284b5f47ffef5bf364213304665

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
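To make the "byte-identical CIDs" requirement in this commit message concrete, here is a minimal sketch of why a canonical encoding step makes a content identifier independent of how each lifter happened to build its value. It rests on loud assumptions: sorted-key compact JSON is only a simplified stand-in for full JCS, BLAKE2b-512 from the standard library stands in for BLAKE3-512 (so the prefix says `blake2b-512`), and the field names `pre`, `post`, and `effects` are hypothetical rather than the real memento shape.

```python
import hashlib
import json


def jcs_like_bytes(value) -> bytes:
    # Simplified stand-in for JCS: sorted keys, no insignificant whitespace, UTF-8.
    # It does not implement the full number and string rules of the real scheme.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def cid_of_value(value) -> str:
    # BLAKE2b-512 is used only because it ships with Python; the prefix names
    # the algorithm actually applied so the label and the digest agree.
    digest = hashlib.blake2b(jcs_like_bytes(value), digest_size=64).hexdigest()
    return "blake2b-512:" + digest


# The same hypothetical contract, built with different key insertion orders,
# as two independent lifters might produce it.
built_by_one_lifter = {"pre": "true", "post": "result = x", "effects": []}
built_by_another = {"effects": [], "post": "result = x", "pre": "true"}

same_cid = cid_of_value(built_by_one_lifter) == cid_of_value(built_by_another)

# Any change to the content changes the identifier.
altered = dict(built_by_one_lifter, post="result = x + 1")
different_cid = cid_of_value(altered) != cid_of_value(built_by_one_lifter)
```

A pinned-value test in the style the message describes would compare `cid_of_value(...)` against a stored string; the stored value is omitted here because it depends on the exact encoding and hash in use.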

Captured during the contract-as-predicate exploration agent fanout (2026-05-09). Five concrete predicate proposals that need substrate extensions to compile: __user direct deref, __rcu without rcu_dereference, __force cast manifest, __iomem direct deref, __bitwise contamination. Sources lifter analysis at implementations/c/provekit-lift-c-sparse/ and identifies the parameter-name binding gap and missing deref-site facts as the shared substrate extension that unblocks 3 of the 5 predicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…CP section 7

This specimen is the load-bearing federation test for the entire Contract Composition Protocol. It pairs a tiny pure Rust chain with a structurally equivalent C chain and asserts that the canonical compose_chain_contracts primitive (libprovekit, CCP section 5) produces byte-identical ComposedFunctionContract CIDs from both sides.

Layout follows the brief flat shape: README, lab/rust, lab/c, runner.sh. The Rust crate exposes vec_double_then_filter_positive_then_sum composed of three pure single-line helpers (double, keep_positive, sum). The C source mirrors the same algebra and arithmetic with the same per-helper pre/post comments, so the only meaningful difference between the two inputs is the surface language. Any divergence in the composed CID is therefore attributable to the lifters or to the canonical compose primitive, not to the reference algebra.

The runner stubs cleanly to PENDING (exit 2) when the C ABI FFI binding or the Rust lifter CLI surface is not yet wired, per CCP section 6.2 and section 6.3. This is the accepted v0 state: the specimen is scaffolded so the assertion can be filled in as the bindings land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

menagerie/pattern-predicate-protocol/predicates/rcu-read-lock-no-unlock.sql (1)

15-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Correlate lock/unlock by both function and path, not function name alone.

Line 17 and Line 22 use only caller_function; this conflates same-named functions across files and distorts results.

Key by `(caller_function, callsite_path)`

 WHERE NOT EXISTS (
   SELECT 1 FROM call_edges g
   WHERE g.caller_function = r.caller_function
+    AND g.callsite_path = r.callsite_path
     AND g.callee_name IN (
       'rcu_read_unlock', 'rcu_read_unlock_bh', 'rcu_read_unlock_sched'
     )
 )
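The effect of the extra correlation key can be reproduced on a toy table. The following is a sketch only: the three-column `call_edges` table is a cut-down assumption about the substrate schema (only the column names `caller_function`, `callsite_path`, and `callee_name` come from the comment above), and the outer query is a hypothetical simplification of the real predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE call_edges (caller_function TEXT, callsite_path TEXT, callee_name TEXT)"
)
conn.executemany(
    "INSERT INTO call_edges VALUES (?, ?, ?)",
    [
        # Two unrelated functions that happen to share the name `probe`.
        ("probe", "drivers/a.c", "rcu_read_lock"),
        ("probe", "drivers/a.c", "rcu_read_unlock"),
        ("probe", "drivers/b.c", "rcu_read_lock"),  # never unlocks
    ],
)

# {extra} is either empty (name-only correlation) or the additional path key.
TEMPLATE = """
SELECT DISTINCT r.caller_function, r.callsite_path
FROM call_edges r
WHERE r.callee_name = 'rcu_read_lock'
  AND NOT EXISTS (
    SELECT 1 FROM call_edges g
    WHERE g.caller_function = r.caller_function
      {extra}
      AND g.callee_name IN (
        'rcu_read_unlock', 'rcu_read_unlock_bh', 'rcu_read_unlock_sched'
      )
  )
ORDER BY 1, 2
"""

name_only = conn.execute(TEMPLATE.format(extra="")).fetchall()
name_and_path = conn.execute(
    TEMPLATE.format(extra="AND g.callsite_path = r.callsite_path")
).fetchall()
```

With name-only correlation the unlock in `drivers/a.c` masks the missing unlock in `drivers/b.c`, so the query returns nothing; with the path key added, the `drivers/b.c` row is reported.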

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@menagerie/pattern-predicate-protocol/predicates/rcu-read-lock-no-unlock.sql`
around lines 15 - 23, The WHERE clause currently correlates lock/unlock only by
caller_function which conflates same-named functions; update the join to key on
both caller_function and callsite_path so the EXISTS checks the same callsite:
in the subquery on call_edges (alias g) add AND g.callsite_path =
r.callsite_path alongside g.caller_function = r.caller_function (and keep
g.callee_name IN (...) as-is) so the predicate matches by (caller_function,
callsite_path).

🧹 Nitpick comments (2)

menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-wide.md (1)
64-68: 💤 Low value

Acronym collision: "PPP" overloaded with Pattern Predicate Protocol.

This document otherwise uses "PPP" exclusively for Pattern Predicate Protocol (e.g. "PPP-canonical receipt machinery", "PPP Appendix B"). Using "PPP I/O paths" to mean the Linux Point-to-Point Protocol driver in the same paragraph is going to confuse a reader who jumped straight to this section. Disambiguating once is enough.
📝 Suggested wording
-- **PPP I/O paths** (`drivers/net/ppp/ppp_*.c`): `ppp_async_ioctl`,
+- **Point-to-Point Protocol driver I/O paths** (`drivers/net/ppp/ppp_*.c`): `ppp_async_ioctl`,
And in the table on line 28:
-| `copy_from_user-no-bounds.sql` | 9 | 28 | +19 (drivers/net PPP, TUN, debugfs; security/apparmor; security/keys) |
+| `copy_from_user-no-bounds.sql` | 9 | 28 | +19 (drivers/net ppp driver, TUN, debugfs; security/apparmor; security/keys) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-wide.md`
around lines 64 - 68, The paragraph uses "PPP" for both Pattern Predicate
Protocol and Linux Point-to-Point Protocol, causing confusion; update the
mention of the kernel drivers and function list (ppp_async_ioctl, ppp_ioctl,
ppp_set_compress, ppp_write, ppp_sync_ioctl) to disambiguate by expanding the
acronym once—e.g., change "PPP I/O paths" to "PPP (Point-to-Point Protocol) I/O
paths" or "Linux PPP I/O paths" and keep the rest of the paragraph as-is so
readers know this PPP refers to the kernel driver rather than the document's
Pattern Predicate Protocol.
implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh (1)
317-357: 💤 Low value

Effect assertions are coupled to JSON key emission order.

Each grep -qE pins the object key sequence to function, kind, effects. The effects emitter uses fixed insertion order in its snprintf; if this were refactored to use JCS-canonical output (alphabetical keys: effects, function, kind, locus) — which is the convention used elsewhere in this PR for content-addressed mementos — every one of these assertions would silently regress to a misleading FAIL.

Two hardenings to consider:

Match each key independently with per-key grep against a per-function JSON object slice, or

Use jq to parse and assert against a normalized projection.

Not a blocker for the draft; brittleness worth addressing while the emitter format is still in flux.
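As an illustration of the second hardening, here is a minimal sketch of an order-independent check written in Python rather than jq. It assumes the lifter response can be parsed as a JSON array of declaration objects; the sample payload and the helper name `effects_for` are hypothetical, while the `function-effects` kind, the `function` and `effects` keys, and the `{"kind":"Reads","target":"x"}` effect shape follow the text of this PR.

```python
import json

# Hypothetical response: the two objects deliberately use different key orders.
EFFECTS_RESPONSE = """
[
  {"function": "pure_function", "kind": "function-effects", "effects": []},
  {"effects": [{"kind": "Reads", "target": "x"}],
   "function": "reads_function",
   "kind": "function-effects"}
]
"""


def effects_for(response_text: str, function_name: str) -> list:
    """Return the effects array for one function, regardless of key order."""
    for decl in json.loads(response_text):
        if decl.get("kind") == "function-effects" and decl.get("function") == function_name:
            return decl["effects"]
    raise AssertionError(f"no function-effects declaration for {function_name}")


def effect_kinds(response_text: str, function_name: str) -> set:
    return {effect["kind"] for effect in effects_for(response_text, function_name)}


pure = effects_for(EFFECTS_RESPONSE, "pure_function")
reads = effect_kinds(EFFECTS_RESPONSE, "reads_function")
```

Because the check goes through a JSON parser, a later switch of the emitter to alphabetical key order would leave these assertions unaffected.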
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh` around
lines 317 - 357, The test assertions currently check EFFECTS_RESPONSE with
ordered regexes (e.g., grep -qE
'"function":"pure_function","kind":"function-effects","effects":\[\]') which
couples tests to JSON key order; update the checks to be order-independent by
parsing the JSON output and asserting per-function properties: either
(preferred) use jq to find the object with .function == "pure_function" and
assert .effects == [] or .effects[].kind contains the expected kind, or
(simpler) isolate the JSON object for each function name from EFFECTS_RESPONSE
and run independent grep checks for the "function", "kind", and "effects" keys;
target the test harness symbol EFFECTS_RESPONSE and the function names
pure_function, writes_function, reads_function, io_function, unsafe_function,
panics_function, unresolved_call_function when applying the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/research/sparse-annotation-as-predicate.md`:
- Around line 119-123: The paragraph starts with "#1" which triggers MD018 and
may be parsed as a malformed heading; change the leading "#1" marker to a normal
numbered list token (e.g., "1.") or escape/code-format it (e.g., "`#1`") so it
isn't interpreted as a heading, keeping the rest of the text (mentions of "#2",
"#3", "#4", "#5" and "__user") unchanged; update the same sentence that begins
with "#1" to use the new marker to avoid the MD018 lint error.

In `@implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh`:
- Around line 297-298: Update the inline comment "Per-function effects
extraction per CCP v1.0.0 section 3." to match this PR's protocol version by
changing "v1.0.0" to "v0.1.0" or remove the explicit version string entirely;
locate the comment in the tests/integration.sh block that contains that exact
text and edit it so it references CCP v0.1.0 (or omits the version) to keep the
comment consistent with the PR and the spec file.

In `@implementations/c/provekit-lift-core/src/clang_ast.c`:
- Around line 166-180: pk_c_clang_find_function currently returns the first fact
matching name, which can be the forward-declaration (has_body==0) so effects get
attached to the prototype and later ignored; change the lookup to prefer a fact
with the same name and has_body == 1 (return that if found), and only if no such
body-entry exists fall back to returning a prototype fact (has_body == 0); apply
the same fix to the other lookup at lines 579-586 (i.e., ensure callers that
search functions by name prefer the has_body==1 fact or use the updated
pk_c_clang_find_function behavior).
- Around line 769-774: The call to
pk_c_emit_function_effects(facts->extraction_result, facts) currently ignores
its return value; change it to check the return and propagate failure like the
AST-walk failure path: if pk_c_emit_function_effects returns an error
(non-success), propagate that to the caller (e.g., return an error code or NULL
from the current parse function) or mark facts->extraction_result as failed and
abort parsing rather than continuing; locate the block that creates
facts->extraction_result (pk_c_lift_result_new) and the subsequent
pk_c_emit_function_effects call and implement the same
failure-handling/propagation logic used by the AST-walk routine so missing
function-effects are surfaced.

In `@implementations/c/provekit-lift-core/src/effects.c`:
- Around line 723-735: pk_c_extract_function_effects currently ignores
clang_visitChildren()'s return so failures from pk_c_effects_walk (which returns
CXChildVisit_Break when pk_c_function_fact_add_effect() fails) are swallowed;
change pk_c_extract_function_effects to return an int error code (0 on success,
nonzero on failure), call clang_visitChildren and return its nonzero result to
the caller, and update callers to treat nonzero as an extraction failure and
mark the corresponding pk_c_function_fact as opaque/closed; ensure references to
pk_c_effects_walk and pk_c_function_fact_add_effect remain unchanged except for
their existing break-on-failure behavior so the over-approximation guarantee is
preserved.

In `@implementations/rust/libprovekit/src/compose.rs`:
- Around line 1018-1042: serde_to_canonical currently maps serde_json::Number
that don't fit in i64 into a synthetic object which violates JCS number
canonicalization; change the behavior in serde_to_canonical (and related Value
type) so numbers are handled as true JSON numbers: either (A) make
serde_to_canonical return a CompositionError for non-i64 numbers (propagate
CompositionError from serde_to_canonical callers, e.g., where
FunctionContractMemento bodies are composed), or (B, preferred) add numeric
variants to provekit_canonicalizer::Value (e.g., f64/u64 or a generic Number
variant), update the Value type used by encode_jcs to emit ECMA-262 canonical
number serialization, and map JsonValue::Number -> the new numeric Variant in
serde_to_canonical; update callers and encode_jcs accordingly to ensure
cross-language JCS compliance.

In
`@menagerie/pattern-predicate-protocol/notifications/2026-05-09-rxkad-verify-packet-2.md`:
- Around line 18-20: The file
menagerie/pattern-predicate-protocol/notifications/2026-05-09-rxkad-verify-packet-2.md
currently publishes direct personal identifiers (the Recipient
"imv4bel@gmail.com (Hyunwoo Kim, V4bel)" and the Signing key owner "Kevlar
<evilgenius@nefariousplan.com>, fp 5FD2...1026"); redact or replace these with
neutral placeholders (e.g., "[redacted email]" or "[Name/handle redacted]") or
reference documented consent, update the Recipient and Signing key lines
accordingly, and ensure any fingerprint or email strings (imv4bel@gmail.com,
Hyunwoo Kim, V4bel, evilgenius@nefariousplan.com, fingerprint
5FD21B4FE7E4A3CA7971CB09DE6639788E091026) are removed or masked.

In
`@menagerie/pattern-predicate-protocol/predicates/borrowed-pages-as-scratch-v2.sql`:
- Around line 40-46: The candidate_parents CTE is under-keyed because it joins
call_edges c to inplace_skb_frag_receivers r only on function name
(c.callee_name = r.caller_function), which can cross-link identically named
functions; tighten the join by adding the module/binary or address-level keys
available in your schema (e.g., c.callee_module or c.callee_binary and/or
c.callee_addr vs r.caller_module/r.caller_binary/r.caller_addr) so the join
matches both function name and its owning module/object/address, keeping the
DISTINCT and same selected aliases (candidate and parent).

In `@menagerie/pattern-predicate-protocol/predicates/kmalloc-no-free-locally.sql`:
- Around line 20-24: The NOT EXISTS free-check is only matching on
caller_function which can collide across files; update the EXISTS subquery that
references call_edges (alias g) and the caller rows (alias a) to match both the
function name and the file/path column (e.g. add a condition like g.caller_file
= a.caller_file or g.caller_path = a.caller_path depending on the actual column
name) so the check is scoped to the same function in the same file and avoids
cross-file symbol masking.

In `@menagerie/pattern-predicate-protocol/predicates/spin-lock-no-unlock.sql`:
- Around line 21-24: The query in predicates/spin-lock-no-unlock.sql uses only
g.caller_function = l.caller_function to correlate unlocks and can conflate
same-named functions across different paths; update the NOT EXISTS subquery on
table call_edges (aliases g and l) to also compare path columns so correlations
require both caller_path and callee_path to match (e.g., add g.caller_path =
l.caller_path and ensure g.callee_path matches the corresponding l path when
checking g.callee_name IN (...)); this will scope matches by full symbol path as
well as name and avoid false positives from same-named functions.

In `@menagerie/pattern-predicate-protocol/tools/lift_kernel.py`:
- Around line 113-115: The JSON decode of edges currently does json.loads(e)
inside the block that handles string edges (in lift_kernel.py near the code that
appends to norm), which can raise JSONDecodeError and abort the run; wrap the
json.loads call in a try/except catching json.JSONDecodeError (or
json.decoder.JSONDecodeError), and on failure log or record the malformed edge
and skip it (i.e., do not append a broken value to norm) so ingestion continues;
keep the rest of the branch (the isinstance(e, str) check and the
norm.append(e)) intact but only append when decoding succeeds.
- Line 15: The LIFTER constant is insecurely hardcoded to a /tmp path; replace
the literal assignment to LIFTER with code that reads a configured path (e.g.
from an environment variable like PROVEKIT_LIFTER_PATH) and otherwise resolves
the binary via shutil.which('provekit-lift-c-kernel-doc'), then validate the
resolved path (os.access(..., os.X_OK)) and check file ownership/permissions
(os.stat to ensure it is not in a world-writable directory and not
world-writable itself) before using it; update lift_kernel.py to set LIFTER from
that lookup/validation and raise a clear error if validation fails.

In `@menagerie/pattern-predicate-protocol/tools/run_predicates.py`:
- Around line 13-15: The CID prefix in function cid is incorrect: it labels the
hash as "blake3-512" but uses hashlib.blake2b; either change the prefix to match
the implemented algorithm (e.g., "blake2b-512:") or switch the implementation to
compute a BLAKE3-512 digest (use a blake3 library and its .hexdigest() for
64-byte digest). Update the cid function accordingly so the prefix and the hash
algorithm (hashlib.blake2b vs blake3) are consistent.

In `@protocol/specs/2026-05-09-contract-composition-protocol.md`:
- Around line 56-58: Update the spec wording to match the canonical Rust
implementation: rename inner-result variables using the impl convention
result__<last_12_hex_of_cid> (double underscore + last 12 hex chars of the CID)
rather than result_<full_cid>, document the truncation rule and collision
rationale (why 12 hex chars is sufficient and colons are excluded), and make the
naming consistent in both §2 and §9 Rule 2 (use the same label, e.g. G = inner)
so the spec and the implementation functions
FunctionContractMemento::result_var_name and find_namespaced_result refer to the
identical byte-level form used when hashing composed pre/post formulas.

In `@protocol/specs/2026-05-09-pattern-predicate-protocol.md`:
- Around line 169-171: The fenced code blocks showing the tuple/list and
application/result-set examples (e.g., the line with applicationCid =
blake3-512(canonical(predicateCid, substrateCid))) should include a language
hint to fix MD040 and restore syntax highlighting; add "text" for tuple/list
style blocks and "json" for application/result-set or result examples, and apply
the same change to the other similar fences in this document that contain
tuple/list examples and application/result-set examples.
- Around line 484-489: The current wording "SQLite core plus JSON1" is too broad
and admits nondeterministic core functions; update the text in the v1 query
language paragraph (Appendix A / Section 3 reference) to either (A) replace the
phrase with an explicit deterministic allowlist that enumerates only
deterministic SQLite functions plus the JSON1 functions `json_extract`,
`json_each`, `json_array`, `json_object`, `json_array_length`, `json_type`,
`json_valid`, or (B) keep the core-but-JSON1 phrasing and add an explicit ban
list calling out known nondeterministic SQLite functions (e.g. `random()`,
`randomblob()`, `changes()`, `last_insert_rowid()`, and time helpers like
`date()`, `time()`, `datetime()`, `julianday()`, `strftime()`) and state that
any function not on the allowlist disqualifies a v1 predicate; ensure the edited
paragraph references "v1 query language" and "v1 predicate" so readers can
locate the rule.
- Around line 339-343: The example text is inconsistent about where the patch
was applied; update the second site reference so both places point to the same
functions/files used in the experiment record—either change the mention of
net/rxrpc/call_event.c and net/rxrpc/conn_event.c to refer to the functions
rxrpc_input_call_event and rxrpc_verify_response, or change the earlier function
names to the file-centric references; ensure references to
rxrpc_input_call_event, rxrpc_verify_response, net/rxrpc/call_event.c and
net/rxrpc/conn_event.c are consistent so readers can replay the example without
ambiguity.



📥 Commits

Reviewing files that changed from the base of the PR and between f8be0e0 and e92f7da.

⛔ Files ignored due to path filters (1)

implementations/rust/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (30)

docs/research/sparse-annotation-as-predicate.md
implementations/c/provekit-lift-c-kernel-doc/Makefile
implementations/c/provekit-lift-c-kernel-doc/tests/fixtures/effects_basic.c
implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh
implementations/c/provekit-lift-core/include/provekit/c_lift_core.h
implementations/c/provekit-lift-core/src/clang_ast.c
implementations/c/provekit-lift-core/src/effects.c
implementations/c/provekit-lift-core/src/parser.c
implementations/rust/libprovekit/Cargo.toml
implementations/rust/libprovekit/src/compose.rs
implementations/rust/libprovekit/src/lib.rs
implementations/rust/libprovekit/tests/compose_smoke.rs
implementations/rust/provekit-walk/Cargo.toml
implementations/rust/provekit-walk/src/contract.rs
implementations/rust/provekit-walk/src/locus.rs
implementations/rust/provekit-walk/src/type_decl.rs
menagerie/pattern-predicate-protocol/README.md
menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-net.md
menagerie/pattern-predicate-protocol/experiments/2026-05-09-linux-kernel-wide.md
menagerie/pattern-predicate-protocol/notifications/2026-05-09-rxkad-verify-packet-2.md
menagerie/pattern-predicate-protocol/predicates/borrowed-pages-as-scratch-v2.sql
menagerie/pattern-predicate-protocol/predicates/borrowed-pages-as-scratch.sql
menagerie/pattern-predicate-protocol/predicates/copy_from_user-no-bounds.sql
menagerie/pattern-predicate-protocol/predicates/kmalloc-no-free-locally.sql
menagerie/pattern-predicate-protocol/predicates/rcu-read-lock-no-unlock.sql
menagerie/pattern-predicate-protocol/predicates/spin-lock-no-unlock.sql
menagerie/pattern-predicate-protocol/tools/lift_kernel.py
menagerie/pattern-predicate-protocol/tools/run_predicates.py
protocol/specs/2026-05-09-contract-composition-protocol.md
protocol/specs/2026-05-09-pattern-predicate-protocol.md

coderabbitai · 2026-05-09T19:13:03Z

+#1 (`__user` deref) gives the biggest CVE-class blast radius and forces the substrate
+extension every other predicate also needs (parameter-binding + deref-site facts). Land
+that scaffolding first; #2 and #4 then collapse to one-day predicates each. #3 is
+cheap and high-value as a *manifest* even before bug detection. #5 is the lowest
+priority but is the canonical demonstration that ProvekIt preserves discipline no other


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid #1 at the start of the paragraph.

Line 119 trips MD018 and can be parsed ambiguously as a malformed heading. Please rewrite this as a normal numbered list item (1.) or escape/code-format the #1 marker.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 119-119: No space after hash on atx style heading

(MD018, no-missing-space-atx)


coderabbitai · 2026-05-09T19:13:03Z

+
+    # Per-function effects extraction per CCP v1.0.0 section 3.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Version typo: CCP is v0.1.0 in this PR, not v1.0.0.

The PR objectives and the spec filename (protocol/specs/2026-05-09-contract-composition-protocol.md) both name CCP v0.1.0. Either drop the version from the comment or update it to match.

📝 Suggested wording

-    # Per-function effects extraction per CCP v1.0.0 section 3.
+    # Per-function effects extraction per CCP v0.1.0 section 3.



coderabbitai · 2026-05-09T19:13:03Z

+    if (facts->extraction_result == NULL) {
+        facts->extraction_result = pk_c_lift_result_new();
+    }
+    if (facts->extraction_result != NULL) {
+        (void)pk_c_emit_function_effects(facts->extraction_result, facts);
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate pk_c_emit_function_effects() failures.

Line 773 can fail, but the return value is discarded. That leaves the parse looking successful even though some function-effects declarations may be missing. Please surface this the same way the AST-walk failure path does, or fail the parse outright.


coderabbitai · 2026-05-09T19:13:03Z

+void pk_c_extract_function_effects(
+    pk_c_function_fact *fact,
+    void *function_cursor_ptr
+) {
+    CXCursor *cursor = (CXCursor *)function_cursor_ptr;
+    pk_c_effects_visit_ctx ctx;
+
+    if (fact == NULL || cursor == NULL) {
+        return;
+    }
+    ctx.fact = fact;
+    (void)clang_visitChildren(*cursor, pk_c_effects_walk, &ctx);
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow extraction failures in a pass that must over-approximate.

pk_c_effects_walk() uses CXChildVisit_Break when pk_c_function_fact_add_effect() fails, but pk_c_extract_function_effects() is void and ignores clang_visitChildren()’s return. An OOM or add-effect failure therefore looks like success and emits a partial effect set, which is unsound per the file header. Please make this API return an error and have the caller record opacity/fail closed on nonzero.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@implementations/c/provekit-lift-core/src/effects.c` around lines 723 - 735, pk_c_extract_function_effects currently ignores clang_visitChildren()'s return so failures from pk_c_effects_walk (which returns CXChildVisit_Break when pk_c_function_fact_add_effect() fails) are swallowed; change pk_c_extract_function_effects to return an int error code (0 on success, nonzero on failure), call clang_visitChildren and return its nonzero result to the caller, and update callers to treat nonzero as an extraction failure and mark the corresponding pk_c_function_fact as opaque/closed; ensure references to pk_c_effects_walk and pk_c_function_fact_add_effect remain unchanged except for their existing break-on-failure behavior so the over-approximation guarantee is preserved.

coderabbitai · 2026-05-09T19:13:04Z

+def cid(b: bytes) -> str:
+    return "blake3-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify claimed vs actual algorithm usage in this script.
rg -n -C2 'def cid|blake3-512|blake2b' menagerie/pattern-predicate-protocol/tools/run_predicates.py

Repository: TSavo/provekit

Length of output: 180

CID algorithm label mismatch: prefix claims blake3-512: but implementation uses blake2b.

Line 14 advertises BLAKE3 in the CID prefix but computes BLAKE2b, creating inconsistency that breaks downstream verification logic relying on the algorithm label.

Fix

 def cid(b: bytes) -> str:
-    return "blake3-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()
+    return "blake2b-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()

Alternatively, implement BLAKE3 instead of BLAKE2b if that algorithm is truly required.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 def cid(b: bytes) -> str:
-    return "blake3-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()
+    return "blake2b-512:" + hashlib.blake2b(b, digest_size=64).hexdigest()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@menagerie/pattern-predicate-protocol/tools/run_predicates.py` around lines 13 - 15, The CID prefix in function cid is incorrect: it labels the hash as "blake3-512" but uses hashlib.blake2b; either change the prefix to match the implemented algorithm (e.g., "blake2b-512:") or switch the implementation to compute a BLAKE3-512 digest (use a blake3 library and its .hexdigest() for 64-byte digest). Update the cid function accordingly so the prefix and the hash algorithm (hashlib.blake2b vs blake3) are consistent.
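The mismatch flagged above matters most to a verifier that recomputes a CID from its own label. A minimal sketch of that check follows, assuming a hypothetical `DIGESTS` registry and `verify` helper (neither is in the repository). Only BLAKE2b is registered because it ships with `hashlib`; BLAKE3 would need a third-party package and is left out here.

```python
import hashlib

# Hypothetical registry: algorithm label -> function returning a hex digest.
# Only BLAKE2b is registered because it is in the standard library.
DIGESTS = {
    "blake2b-512": lambda b: hashlib.blake2b(b, digest_size=64).hexdigest(),
}


def cid(b: bytes, algo: str = "blake2b-512") -> str:
    """Label the digest with the algorithm that actually produced it."""
    return f"{algo}:{DIGESTS[algo](b)}"


def verify(b: bytes, claimed: str) -> bool:
    """Recompute a CID from its own label; unknown labels fail closed."""
    algo, _, hexdigest = claimed.partition(":")
    fn = DIGESTS.get(algo)
    return fn is not None and fn(b) == hexdigest
```

Under this sketch, a CID whose prefix says `blake3-512:` but whose digest came from BLAKE2b is rejected instead of silently accepted, which is the downstream failure the comment warns about.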

coderabbitai · 2026-05-09T19:13:04Z

+**Singular formal substitution.** For an atom F with formals `[a, b, c]` called as `F(x, G(y), z)`, only the second argument `G(y)` triggers composition (G's contract substitutes into F's at the b-position). The first and third arrivals (`x`, `z`) are leaf substitutions that do not introduce new contract-level composition.
+
+**CID-namespaced result variable.** Each atom's `post` formula references a free variable (conventionally `result`). When composing F into a caller's pre, F's `result` is renamed to `result_<F.cid>` to avoid free-variable collision across nested composition.


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Inner-result rename rule diverges from the canonical implementation.

Spec §2 says inner's result is renamed to result_<F.cid> and §9 Rule 2 repeats the rule as result_<G.cid> (single underscore, full CID). The reference implementation at implementations/rust/libprovekit/src/compose.rs (FunctionContractMemento::result_var_name and find_namespaced_result) uses result__<last_12_hex_of_cid> instead — double underscore and a 12-hex-char truncation. The literal spec form is also impractical because CIDs include : (e.g. blake3-512:...), which is awkward inside an IR variable name.

Because the inner-result variable name appears verbatim inside the canonical pre/post formulas hashed into the composed CID, this is exactly the kind of byte-level disagreement that will surface as a federation failure in the §7 cross-language equivalence specimen the moment a second-language binding is implemented faithful to the spec text. Recommend amending the spec to match the impl convention (and document the truncation length + collision argument) before CCP v1.0.0 is locked, so the smoke-test pinned CID stays stable. Note also the F/G label flip between §2 and §9 Rule 2 for the same role (the inner contract).

📝 Suggested spec wording

-When composing F into a caller's pre, F's `result` is renamed to `result_<F.cid>` to avoid free-variable collision across nested composition.
+When composing inner atom G into a caller's pre, G's `result` is renamed to
+`result__<tail>`, where `<tail>` is the trailing 12 hex characters of `G.cid`.
+Truncation is bounded so variable names stay readable; collision probability
+at 12 hex chars (≈2^48) is acceptable for chains of realistic depth.

Apply the same wording to §9 Rule 2 and unify on a single label (G = inner) across both sections.

Also applies to: 221-223

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@protocol/specs/2026-05-09-contract-composition-protocol.md` around lines 56 - 58, Update the spec wording to match the canonical Rust implementation: rename inner-result variables using the impl convention result__<last_12_hex_of_cid> (double underscore + last 12 hex chars of the CID) rather than result_<full_cid>, document the truncation rule and collision rationale (why 12 hex chars is sufficient and colons are excluded), and make the naming consistent in both §2 and §9 Rule 2 (use the same label, e.g. G = inner) so the spec and the implementation functions FunctionContractMemento::result_var_name and find_namespaced_result refer to the identical byte-level form used when hashing composed pre/post formulas.
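For readers comparing the two naming forms, here is a small sketch of the convention the comment attributes to the Rust implementation (double underscore plus the last 12 hex characters of the CID). The Python function names and the token-list formula representation are illustrative assumptions. This is not a port of `compose.rs`.

```python
HEX = set("0123456789abcdef")


def result_var_name(cid: str, tail_len: int = 12) -> str:
    """Build `result__<tail>` from the trailing hex characters of a CID."""
    hex_part = cid.rsplit(":", 1)[-1]  # drop the "blake3-512:" style prefix
    tail = hex_part[-tail_len:]
    if len(tail) != tail_len or not set(tail) <= HEX:
        raise ValueError(f"CID has no {tail_len}-char lowercase hex tail: {cid!r}")
    return "result__" + tail


def rename_result(tokens: list, inner_cid: str) -> list:
    """Rename the free variable `result` in a tokenised formula (toy IR)."""
    new_name = result_var_name(inner_cid)
    return [new_name if t == "result" else t for t in tokens]
```

Because the renamed variable lands verbatim in the hashed formula, any binding that used `result_<full cid>` instead would produce different bytes and therefore a different composed CID.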

coderabbitai · 2026-05-09T19:13:04Z

+```
+applicationCid = blake3-512(canonical(predicateCid, substrateCid))
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add languages to the fenced code blocks.

These unlabeled fences trigger MD040 and lose syntax highlighting. text fits the tuple/list examples, and json fits the application/result-set examples.

Also applies to: 306-317, 328-337

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 169-169: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@protocol/specs/2026-05-09-pattern-predicate-protocol.md` around lines 169 - 171, The fenced code blocks showing the tuple/list and application/result-set examples (e.g., the line with applicationCid = blake3-512(canonical(predicateCid, substrateCid))) should include a language hint to fix MD040 and restore syntax highlighting; add "text" for tuple/list style blocks and "json" for application/result-set or result examples, and apply the same change to the other similar fences in this document that contain tuple/list examples and application/result-set examples.
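The `applicationCid` line quoted in this comment can be read as "hash a canonical encoding of the two input CIDs". A rough sketch under stated assumptions: compact JSON stands in for the spec's canonical encoding, and BLAKE2b stands in for BLAKE3 because only the former is in the standard library, so the label below says `blake2b-512` to match what is actually computed. The function names are hypothetical.

```python
import hashlib
import json


def canonical(*parts: str) -> bytes:
    """Stand-in canonical encoding: a compact JSON array, UTF-8 encoded."""
    return json.dumps(list(parts), separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def application_cid(predicate_cid: str, substrate_cid: str) -> str:
    """Derive an application identifier from the ordered pair of input CIDs."""
    digest = hashlib.blake2b(canonical(predicate_cid, substrate_cid), digest_size=64)
    return "blake2b-512:" + digest.hexdigest()
```

The pair is ordered, so swapping predicate and substrate yields a different identifier, and the same inputs always yield the same one.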

coderabbitai · 2026-05-09T19:13:04Z

+The v1 query language is SQLite-3 with the JSON1 module enabled. The
+admitted built-in function set is the SQLite core plus JSON1's
+`json_extract`, `json_each`, `json_array`, `json_object`,
+`json_array_length`, `json_type`, `json_valid`. Functions outside this
+set break determinism guarantees and disqualify a query from being a
+v1 predicate.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Appendix A currently admits nondeterministic SQLite functions.

“SQLite core plus JSON1” is too broad for Section 3’s determinism guarantee: SQLite core includes functions like random(), randomblob(), changes(), last_insert_rowid(), and time helpers that can vary across runs. As written, a predicate can be v1-admissible while still violating PPP determinism. Please replace this with an explicit deterministic allowlist, or explicitly ban the nondeterministic core functions.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@protocol/specs/2026-05-09-pattern-predicate-protocol.md` around lines 484 - 489, The current wording "SQLite core plus JSON1" is too broad and admits nondeterministic core functions; update the text in the v1 query language paragraph (Appendix A / Section 3 reference) to either (A) replace the phrase with an explicit deterministic allowlist that enumerates only deterministic SQLite functions plus the JSON1 functions `json_extract`, `json_each`, `json_array`, `json_object`, `json_array_length`, `json_type`, `json_valid`, or (B) keep the core-but-JSON1 phrasing and add an explicit ban list calling out known nondeterministic SQLite functions (e.g. `random()`, `randomblob()`, `changes()`, `last_insert_rowid()`, and time helpers like `date()`, `time()`, `datetime()`, `julianday()`, `strftime()`) and state that any function not on the allowlist disqualifies a v1 predicate; ensure the edited paragraph references "v1 query language" and "v1 predicate" so readers can locate the rule.
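One way a verifier could enforce an allowlist like the one requested is SQLite's authorizer hook, which Python exposes as `Connection.set_authorizer`. The sketch below is an assumption about enforcement, not something the spec prescribes. The core-function subset in `ALLOWED_FUNCTIONS` is illustrative only, and `guarded_connection` is a hypothetical helper name.

```python
import sqlite3

ALLOWED_FUNCTIONS = frozenset({
    # Illustrative subset of deterministic core functions (an assumption,
    # not the spec's list).
    "abs", "coalesce", "length", "lower", "upper", "substr", "replace",
    "instr", "min", "max", "count", "sum",
    # JSON1 functions named in the spec paragraph under review.
    "json_extract", "json_each", "json_array", "json_object",
    "json_array_length", "json_type", "json_valid",
})


def _authorize(action, arg1, arg2, db_name, source):
    # For SQLITE_FUNCTION, the function name arrives in arg2.
    if action == sqlite3.SQLITE_FUNCTION and (arg2 or "").lower() not in ALLOWED_FUNCTIONS:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def guarded_connection(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection that rejects any function outside the allowlist."""
    conn = sqlite3.connect(path)
    conn.set_authorizer(_authorize)
    return conn
```

With this in place a query such as `SELECT random()` fails when the statement is prepared, so a nondeterministic predicate is rejected before it produces a result set.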

Exposes the canonical compose primitive (extracted in commit f06ce69) through a stable C ABI so the C lifter family and any other native consumer share libprovekit's implementation. Marshaling is JCS-encoded JSON across the boundary; libprovekit owns the canonical encoding per CCP Appendix A. No algebra change; this is wiring, not new physics.

Surface (matches CCP §6.2 verbatim):

    pk_composition_result *pk_compose_chain_contracts(
        const char *atoms_jcs, const char *effects_jcs,
        size_t atoms_len, size_t effects_len);
    const char *pk_composition_result_cid(const pk_composition_result *r);
    const char *pk_composition_result_body_jcs(const pk_composition_result *r);
    const char *pk_composition_result_error(const pk_composition_result *r);
    void pk_composition_result_free(pk_composition_result *r);

Two intentional, documented deviations from the literal §6.2 text:

1. The Rust algebra needs a per-step `formal_idx` (which formal of the outer atom the inner's result feeds). §6.2's signature has no slot for it. Resolution: each atom JSON carries an outer `formalIdx` field alongside the canonical `memento` body shape: `{"memento": <canonical body>, "formalIdx": N}`. The C signature itself is byte-identical to §6.2.
2. Effects appear twice: embedded in each memento body (per `build_value`) and in the parallel `effects_jcs` array. The embedded copy is authoritative; the parallel array is required by the spec signature and MUST equal-by-value, otherwise the FFI returns a typed `EffectsMismatch` error. Single source of truth, spec signature preserved, cross-check enforced.

The crate now produces `cdylib` and `staticlib` artifacts in addition to `lib`, so existing in-workspace Rust consumers (provekit-cli, walk, verifier) keep working unchanged while C consumers can link the static or dynamic library.

Implementation lives in `src/ffi.rs`. The Rust-side helper `compose_chain_contracts_jcs(&str, &str) -> Result<(cid, body), msg>` is the testable layer below the extern "C" wrapper; the four `pk_*` functions are thin lifecycle plumbing on top. No panics escape the FFI boundary; all errors flow through `pk_composition_result_error`. The C-callable header is hand-written at `include/provekit-compose.h` (5 declarations; cbindgen would be overkill).

Smoke test (`tests/ffi_smoke.rs`) exercises both the Rust JCS entry point and the extern "C" entry point with the same two pure identity atoms as the existing `tests/compose_smoke.rs`. Both paths produce the pinned CID

    blake3-512:36212b7bf7b9ccf264950940a33d64e1cfe88b6f4d8a47c01949fc64d9359d1813d6147aa2e1afe82b01e6e7ebcbe0a413683284b5f47ffef5bf364213304665

byte-for-byte. Same algebra, different binding mode, identical CID: this is the §6.2 federation guarantee in test form. Two additional tests cover invalid JSON and null-pointer inputs.

Verified:

    cargo build --release -p libprovekit → liblibprovekit.{a,dylib,rlib}
    cargo test --release -p libprovekit → 13 tests pass (4 new + 9 existing)

Refs CCP §5 (canonical implementation), §6.2 (C ABI), §9 (algebra), Appendix A (canonical encoding), Appendix B (reference surface).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
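The two deviations described in this commit message can be modelled in a few lines. This is a sketch, assuming the parallel effects array carries one entry per atom in the same order. The helper names are hypothetical, and sorted compact JSON is only an approximation of JCS (RFC 8785).

```python
import json


def jcs_like(value) -> str:
    """Approximate canonical JSON: sorted keys, no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def wrap_atom(memento: dict, formal_idx: int) -> dict:
    """Envelope shape from the commit message: memento body plus formalIdx."""
    return {"memento": memento, "formalIdx": formal_idx}


def check_effects(atoms: list, effects: list) -> None:
    """Cross-check embedded effects against the parallel array, by value."""
    if len(atoms) != len(effects):
        raise ValueError("EffectsMismatch: array lengths differ")
    for i, (atom, parallel) in enumerate(zip(atoms, effects)):
        embedded = atom["memento"].get("effects")
        if embedded != parallel:  # embedded copy is authoritative
            raise ValueError(f"EffectsMismatch at atom {i}")
```

A caller would build `atoms_jcs` as `jcs_like([wrap_atom(m, 0) for m in mementos])` and run `check_effects` before crossing the boundary, mirroring the cross-check the FFI is described as enforcing.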

Wires the JSON-RPC subprocess transport for the canonical compose primitive. The CLI is a thin stdin/stdout wrapper over libprovekit's compose_chain_contracts; cross-language consumers that cannot link Rust (TypeScript, Python, Ruby, PHP lifters) drive composition by spawning `provekit compose --rpc` and exchanging initialize / compose / shutdown messages over stdio.

A new compose_rpc_smoke test spawns the binary, drives the protocol, and asserts the composed CID matches the pinned hex value libprovekit's own compose_smoke test pins for the same algebra. Same canonical primitive, byte-identical bodies, byte-identical CID; the test is the federation witness that the third binding mode (after direct Rust linking and the C ABI FFI) preserves CCP determinism.

Two documented deviations from section 6.3:

* Each entry in the request's `atoms` array carries an optional per-step `formal_idx` (default 0). The canonical primitive takes `&[ChainStep { contract, formal_idx }]`; without it the wire format cannot reproduce CIDs for chains whose composition fires at any formal other than zero. The default-zero path is the libprovekit smoke conformance witness.
* The top-level `effects` parameter is preserved per spec but is advisory: each FunctionContractMemento already carries its own effects field. When both are supplied and disagree, the request is rejected with an effects_mismatch error so a careless caller cannot silently lose effect information.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After atomic contract + per-function effects extraction, the C lifter now walks the call_sites graph, identifies pure subtrees (every member has empty effect set, chain length >= 2), and invokes pk_compose_chain_contracts via libprovekit's C ABI per CCP v1.0.0 section 4 (eager materialization) and section 6.2 (C ABI FFI). The resulting ComposedFunctionContract is emitted as an additional declaration in the IR-document under kind="composed-contract" with the composed CID, JCS body, and the leaf-first atom list.

Implementation notes (also in src/composition.c header):

- The existing C "contract" envelopes (kind="contract") are kernel-doc preconditions, not full FunctionContractMementos. The FFI's MementoBody DTO requires fnName, formals, formalSorts, pre, post, effects, locus, etc. per build_value in compose.rs, so this pass synthesises pure-identity mementos (post: result = formal_0) for each composable function. The composed CID reflects chain structure (number of atoms, formal-index linkage), which is what BZ-COMPOSITION-001 cross-language equivalence exercises.
- formalIdx=0 uniformly. Real argument-position resolution is future work.
- Determinism: functions walked in libclang source order; first pure callee per source-order callsite chosen for the chain. Cycle guard on repeated function names.
- Composition refusal from the FFI is treated as data: the chain is silently skipped and the rest of the lifter output is unaffected.

Federation property empirically demonstrated: the C lifter's composed CID for [double_it, add_one] produced via the FFI equals blake3-512:0d9bbbaf...969ac56, byte-identical to the same CID produced from Rust by feeding equivalent JCS envelopes through compose_chain_contracts_jcs. Same algebra, same CID across binding modes, per CCP section 6.2's federation guarantee.

Makefile changes:

- Links liblibprovekit.a + provekit-compose.h when libclang AST is enabled (the regex-only stub does not extract effects, so it cannot soundly classify pure subtrees).
- Conditional system libs: macOS frameworks (Security, CoreFoundation, iconv) vs Linux (-lpthread -ldl -lm) selected via uname -s, since the Rust staticlib drags in libstd.
- Lazy build of liblibprovekit.a: only invoked when the artifact is missing. Explicit `make libprovekit` refreshes.

Tests:

- New fixture tests/fixtures/composition_basic.c with three pure helpers chained via direct calls; double_it + add_one + compose_three is the longest pure chain.
- Integration test asserts composition_basic.c yields at least one composed-contract declaration, the compose_three composed CID is byte-stable across runs, has the blake3-512 prefix, and matches a pinned CID c636517a...94b6e40 to guard against silent regressions that would break Rust/C federation.
- Existing tests (kernel_doc_basic.c, structural_basic.c, recovery_call.c, effects_basic.c) all still pass.

BZ-COMPOSITION-001 runner now drives the C lifter directly via its JSON-RPC parse method instead of the never-implemented `provekit lift --lifter ${C_LIFTER}` shape. Verdict logic typed: PENDING-RUST, PENDING-C, PENDING-OTHER, EQUAL, DIVERGENT. Today's verdict on the existing lab is PENDING-OTHER: lab/c/chain.c uses xs[i] reads + acc = writes that the conservative effects walker tags impure, AND the Rust side has no --emit-composed wiring yet. The C wiring itself is verified by composition_basic.c via `make test`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
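The chain-selection rule in this commit message (source-order walk, first pure callee per callsite, cycle guard, minimum length 2, leaf-first output) can be sketched as below. This is a toy model, not the C implementation: the dictionary-based call graph, the `pure_chain` name, and the treatment of unknown functions as impure are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def pure_chain(start: str,
               callees: Dict[str, List[str]],
               effects: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Follow the first pure callee from `start`; return the chain leaf-first."""

    def is_pure(fn: str) -> bool:
        # Conservative: a function with no recorded effect set is not pure.
        return fn in effects and not effects[fn]

    if not is_pure(start):
        return None
    chain, seen, current = [start], {start}, start
    while True:
        # callees[current] is assumed to be in source order.
        nxt = next((c for c in callees.get(current, []) if is_pure(c)), None)
        if nxt is None or nxt in seen:  # cycle guard on repeated names
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    if len(chain) < 2:  # a single atom is not a composition
        return None
    return chain[::-1]  # leaf-first, matching the emitted atom list order
```

In this model a caller whose only callee carries any effect yields no chain at all, which corresponds to the commit's choice to skip rather than partially compose.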

# Conflicts: # implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh

…chema

PPP v0.1.0 was drafted earlier on 2026-05-09 before CCP existed. Under Supra omnia, rectum the spec was stale: it neither cited CCP nor listed the substrate relations CCP populates. This commit corrects that.

Pre-publication amendment: PPP has not been merged to main, has not been published as a stable catalog CID, no consumer has pinned its catalog CID. Per PEP semantics this is draft-stage spec amendment, not formal protocol evolution. Future post-merge changes go through provekit protocol evolve.

Additive changes only; existing v1 predicates over call_edges / functions / contracts / lifted_files remain valid byte-for-byte:

- Section 1: add CCP to the relation table; note FRP policyCid MAY reference a ComposedFunctionContract CID for chain-level policies.
- Section 3.1: add effects relation (per-function effect set, populated by CCP §3 prerequisite) and composed_contracts relation (ComposedFunctionContract emissions per CCP §4 materialization). Both are non-breaking additions to v1 schema landed alongside CCP v1.0.0.
- Section 7: cite BZ-COMPOSITION-001 cross-language equivalence specimen as the empirical federation guarantee for predicates that join over composed_contracts.
- Section 9.6: add 'composition not materialized' as a failure mode so producers know a composition-bound predicate needs eager-or-lazy materialization to fire.
- Section 9.7: renumber the prior 9.6 (federation mismatch) to make room.

PPP version stays v0.1.0; the changes are additive non-breaking schema extensions plus references to the now-existing CCP spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 9, 2026 16:47

Copilot started reviewing on behalf of TSavo May 9, 2026 16:48

chatgpt-codex-connector Bot reviewed May 9, 2026


TSavo and others added 2 commits May 9, 2026 09:52

Copilot AI reviewed May 9, 2026


TSavo and others added 7 commits May 9, 2026 10:11

Scrub residual em-dashes in CCP Appendix B

7a89822

coderabbitai Bot reviewed May 9, 2026


TSavo and others added 7 commits May 9, 2026 12:17

Merge remote-tracking branch 'origin/main' into HEAD

6dca730

# Conflicts: # implementations/c/provekit-lift-c-kernel-doc/tests/integration.sh

Refresh CICP accepted witnesses

0373a85

Refresh CICP witnesses after PPP spec update

1855b04

TSavo merged commit 424ff58 into main May 9, 2026
17 checks passed

TSavo mentioned this pull request May 9, 2026

Mint protocol catalog v1.6.4 (PEP extension-only: PPP + CCP) #516

Merged

6 tasks

This was referenced May 10, 2026

chore(rust): provekit-verifier clippy -D warnings + dash cleanup #568

Merged

feat(libprovekit): wp_rule schema + substitute/apply formula nodes + the wp evaluator (wp-as-formula PR 1/N) #620

Merged


