feat(rust): self-lift provekit-canonicalizer (substrate dogfoods itself) by TSavo · Pull Request #38 · TSavo/provekit

TSavo · 2026-05-02T17:55:14Z

Sir's framing

The framework writes lifters and uses them on test fixtures; this is the first time it lifts itself. The canonicalizer is the right starting point because every CID in the system hashes over JCS-canonical bytes the canonicalizer produces. If the canonicalizer is wrong, everything is wrong; it is the substrate's first invariant.

What the Rust lifter actually does

provekit-lift (crate implementations/rust/provekit-lift/) walks a Rust workspace, parses every .rs file with syn, dispatches each parsed file to nine adapter crates, collects ContractDecl values, mints each as a signed contract memento via provekit_claim_envelope::mint_contract, and bundles them into a single .proof envelope addressed by its BLAKE3-512 CID.

Annotation sources the lifter reads:

proptest! { #[test] fn ... } blocks (adapter proptest)
#[contracts::requires] / #[contracts::ensures] attrs (contracts)
kani::proof / prusti / creusot / flux / verus attrs (one adapter each)
#[quickcheck] (quickcheck)
Plain #[test] and #[tokio::test] functions whose bodies contain assert!/assert_eq!/assert_ne!/assert_matches! (rust-tests Layer 0)
for x in lo..hi { assert!(...) } bounded loops, helper-function inlining, multi-assertion characterization (rust-tests Layer 2)

Outputs: a <cid>.proof file (deterministic CBOR) under the requested target dir. CLI flags:

provekit-lift --workspace <dir> --target-dir <out> (direct invocation)
cargo provekit-lift ... (Cargo subcommand form)
provekit-lift --rpc (NDJSON-on-stdio plugin mode)

The lifter has no --config flag; it picks up every .rs file under --workspace.

How I ran it

Exact command:

implementations/rust/target/release/provekit-lift \
  --workspace implementations/rust/provekit-canonicalizer \
  --target-dir .provekit/self-lifts/canonicalizer

Output (verbatim):

provekit-lift: scanned 9 .rs files
  adapter `proptest`: seen 0, lifted 0, skipped 0
  adapter `contracts`: seen 0, lifted 0, skipped 0
  adapter `kani`: seen 0, lifted 0, skipped 0
  adapter `prusti`: seen 0, lifted 0, skipped 0
  adapter `creusot`: seen 0, lifted 0, skipped 0
  adapter `flux`: seen 0, lifted 0, skipped 0
  adapter `quickcheck`: seen 0, lifted 0, skipped 0
  adapter `verus`: seen 0, lifted 0, skipped 0
  adapter `rust-tests`: seen 82, lifted 13, skipped 69
  adapter `rust-tests-layer2`: seen 11, lifted 0, skipped 11
provekit-lift: wrote .provekit/self-lifts/canonicalizer/blake3-512:79ef1067621f7a3bb3ad31acd157be9ec9e087eb6aa585acd9ea6c9d0965adb02e7f1037a3146dcbc3f6a4c284796d6eadee7e0265d3eeab75791b79628632de.proof (13 members)

Determinism: confirmed across two runs into separate temp dirs (default seed [0x42; 32]); both produced the same CID byte for byte.

What got lifted (13 contract mementos)

Each name is <test_fn>::<assert_index>. Atom shape is a single binary comparison whose operands are identifiers, integer/string literals, or single-arg ctor calls.

Contract	Asserts (informally)
`prefix_constant_matches_spec::0`	`BLAKE3_512_PREFIX = PREFIX` (the spec literal "blake3-512:")
`cid_string_form_for_empty_is_well_known::0`	the pinned BLAKE3-512 empty-input vector
`cid_regex_compliance::0`	`count = 128` (hex character count)
`deterministic_across_calls::0`	`a = b` (identity of two same-input calls)
`cid_distinguishes_byte_strings_from_text::0`	`blake3_512_hex(s) = blake3_512_of(b)` (str/bytes parity)
`mixed_ascii_and_unicode_preserved::0`	`encoded = "x \u{2265} 0"` (UTF-8 round-trip witness)
`unicode_in_object_key_and_value::0`	`encoded = '{"name":"\u{2265}"}'`
`non_ascii_strings_round_trip_byte_faithful::0`, `::2`	per-symbol round-trip witnesses for `\u{2265}` and `\u{65E5}\u{672C}\u{8A9E}`
`realistic_envelope_shape_round_trips::1`, `::2`, `::3`	envelope-shape determinism witnesses
`single_bit_flip_changes_most_output_bits::0`	identifier equality from the avalanche regression test

All 13 survive provekit verify clean (zero callsites, zero load errors; this is a pure-contract bundle, no bridges).

What couldn't be lifted (69 + 11 = 80 skips, honest gap-finding)

rust-tests adapter (69 skips):

53 hits expression shape outside the v0 whitelist. The canonicalizer's JCS tests are uniformly shaped as assert_eq!(encode_jcs(&Value::object([...])), "<canonical>"). The lhs is a method call; the rhs's operand is a multi-arg constructor. Both fall outside the "identifier, literal, single-arg ctor" whitelist.
14 hits assert! body must be a binary comparison. The CID-shape tests use assert!(matches!(...)) and assert!(h.starts_with(PREFIX)) heavily.
2 hits byte-string literals (b"") are not in the v0 literal set, so assert_eq!(blake3_512_hex(s), blake3_512_of(b"")) skips.

rust-tests-layer2 adapter (11 skips):

3 hits bounded-loop pattern with a multi-statement body. for input in [...] { let h = ...; let hex = ...; assert_eq!(...) } exceeds the v0 single-stmt cap.
8 hits characterization-conjunction with zero of N atoms liftable (every assert in the body is an encode_jcs(&v) == "..." shape, none of which Layer 0 can lift; Layer 2 then releases all eight tests back to Layer 0, which skips them).

Spec coverage against protocol/specs/2026-04-30-canonicalization-grammar.md:

Spec property	Lifter reach
RFC 8785 §3.2.3 sorted object keys	NONE (every test's lhs is a method call)
No whitespace	NONE
`\u00XX` escapes for U+0000..U+001F	NONE
Integer plain decimal	NONE
BLAKE3-512 length 11 + 128	PARTIAL (`count=128`, empty-vector pin)
Self-identifying prefix `blake3-512:`	PARTIAL (constant matches spec)
Determinism (same input -> same output)	YES
Distinct inputs distinct hashes	NO (`b""` not whitelisted)
UTF-8 byte-faithful for non-ASCII	PARTIAL (per-symbol witnesses only)

Full per-skip-reason taxonomy with file paths is in .provekit/self-lifts/canonicalizer/lift-report.txt.

Make target

make self-lift-canonicalizer

builds provekit-lift (release), wipes any prior .proof under .provekit/self-lifts/canonicalizer/, re-runs the lift, and writes the same CID. Idempotent. NOT wired into make conformance or make ci; this is an experiment, not core conformance, and the task explicitly cautioned against pinning a self-lift CID into the gate while the lifter is still v0.

Cross-impl leverage

The same skip patterns will surface on the TS, Go, and C++ canonicalizer test files when their respective lifters point at them, because the test idiom is the same across kits: expect(encodeJcs(...)).toEqual("..."), assert.Equal(t, EncodeJcs(...), "..."), EXPECT_EQ(encode_jcs(...), "..."). The shape gap is in the lifter's expression whitelist, not in any one language's idioms. Concrete next moves once the v0 whitelist grows (single-arg method-call lhs, multi-arg ctor rhs):

TS: provekit-lift-vitest against implementations/typescript/src/canonicalizer/
Go: provekit-lift-go-tests against implementations/go/.../canonicalizer/
C++: future C++ lifter against implementations/cpp/provekit/canonicalizer/

Each will recover roughly the same shape of mementos this run produced and skip roughly the same shape of tests with the same structured warnings.

Don't-touch list (followed)

No changes to provekit-canonicalizer/ source. The point was to surface what the lifter reaches as-is.
No changes to provekit-self-contracts (it ships a pinned CID under make conformance; conflating the experiment with core conformance would have wrecked it).
No changes to other kits.
No changes to protocol/specs/ text.
No changes to .github/workflows/.

Test plan

make self-lift-canonicalizer succeeds and prints the same CID across two runs
provekit verify <proof> loads the lifted catalog clean (0 errors, 0 violations)
git status clean after the make target re-runs (idempotent)
Reviewer sanity-check: open .provekit/self-lifts/canonicalizer/lift-report.txt and skim the skip taxonomy

The framework writes lifters and uses them on test fixtures; this is the first time it lifts itself. `provekit-lift` runs against the `provekit-canonicalizer` crate and emits a durable `.proof` plus a honest skip taxonomy at `.provekit/self-lifts/canonicalizer/`. Outcome (rust-tests adapter on canonicalizer source): files_scanned: 9 rust-tests seen=82 lifted=13 skipped=69 rust-tests-l2 seen=11 lifted=0 skipped=11 proptest/contracts/kani/prusti/creusot/flux/quickcheck/verus: zero (the crate uses no external annotation library) 13 contract mementos lifted: prefix_constant, deterministic_across_calls, empty-input vector pin, count=128, Unicode round-trip witnesses, etc. All survive `provekit verify` clean. Skip causes (honest gap-finding): 53 hits expression shape outside v0 whitelist (method calls, constructors, multi-arg calls) 14 hits assert! body not a binary comparison 2 hits byte-string literals (b"") not in the v0 literal set Spec coverage (canonicalization-grammar.md): partial. The lifter reaches counter and identifier equalities, the prefix constant, and a handful of literal pins. The byte-faithful claims (sorted keys, no whitespace, escape sequences, integer formatting) all skip because every test on the lhs is encode_jcs(&v), a method call. That is the operationally enforced bulk of the canonicalizer's invariants and it is currently unreachable by the auto-lifter; surfacing the gap is the point. Idempotent: default seed [0x42; 32] produces a byte-deterministic catalog CID (verified across two runs into separate temp dirs). make self-lift-canonicalizer regenerates the .proof at the same CID. NOT wired into make conformance because this is an experiment, not core conformance. Cross-impl leverage: the same skip patterns will hit the TS, Go, C++ canonicalizers when their respective lifters point at their canonicalizer crates. The shape gap is in the lifter's expression whitelist, not in any one language's idioms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector · 2026-05-02T17:55:19Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

coderabbitai · 2026-05-02T17:55:20Z

Warning

Rate limit exceeded

@TSavo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 9 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c40a69bf-1501-4629-91d5-4cd55af3244c

📥 Commits

Reviewing files that changed from the base of the PR and between c2fe48e and 94aebc3.

📒 Files selected for processing (4)

.gitignore
.provekit/self-lifts/canonicalizer/blake3-512:79ef1067621f7a3bb3ad31acd157be9ec9e087eb6aa585acd9ea6c9d0965adb02e7f1037a3146dcbc3f6a4c284796d6eadee7e0265d3eeab75791b79628632de.proof
.provekit/self-lifts/canonicalizer/lift-report.txt
Makefile

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/self-lift-canonicalizer

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 27 minutes and 9 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR adds a self-lift experiment for the Rust canonicalizer crate, checking in the resulting lifted .proof artifact and a human-readable report so the repository can track what provekit-lift currently reaches on first-party canonicalizer tests. It extends the top-level build orchestration with a dedicated make target and updates ignore rules so the experiment artifact can live in source control.

Changes:

Adds make self-lift-canonicalizer and help text for running the Rust lifter against implementations/rust/provekit-canonicalizer.
Checks in a generated self-lift .proof bundle plus a companion lift-report.txt summarizing lifted and skipped assertions.
Updates .gitignore to allow versioning self-lift .proof artifacts under .provekit/self-lifts/.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.

File	Description
Makefile	Adds help text and a new self-lift experiment target for the Rust canonicalizer crate.
.provekit/self-lifts/canonicalizer/lift-report.txt	Adds a checked-in textual report describing lifted contracts, skip taxonomy, and spec coverage.
.provekit/self-lifts/canonicalizer/blake3-512:79ef1067621f7a3bb3ad31acd157be9ec9e087eb6aa585acd9ea6c9d0965adb02e7f1037a3146dcbc3f6a4c284796d6eadee7e0265d3eeab75791b79628632de.proof	Adds the generated proof envelope produced by the self-lift run.
.gitignore	Unignores self-lift `.proof` artifacts and documents that they are intended to be committed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+	@out=$$($(PROVEKIT_LIFT) \
+		--workspace implementations/rust/provekit-canonicalizer \
+		--target-dir $(SELF_LIFT_DIR) --quiet); \
+	  echo "  cid: $$out"; \
+	  test -f $(SELF_LIFT_DIR)/$$out.proof || \
+	    (echo "FAIL: lifter did not write $(SELF_LIFT_DIR)/$$out.proof" && exit 1); \
+	  echo "  proof: $(SELF_LIFT_DIR)/$$out.proof"
+	@echo "  report: $(SELF_LIFT_DIR)/lift-report.txt"


+      (object-key Unicode preservation witness)
+
+  non_ascii_strings_round_trip_byte_faithful::0
+      i = "≥"


+      jcs = ?
+  realistic_envelope_shape_round_trips::2
+      cid = ?
+  realistic_envelope_shape_round_trips::3
+      cid_again = ?


+      a = b
+      (avalanche regression test, lifted as identifier-equality)


+     bodies. Three of the canonicalizer's parameterized loops use a
+     three-line shape (input -> let -> assert) and skip cleanly with
+     a structured warning.


+  proptest          seen=0   lifted=0   skipped=0
+  contracts         seen=0   lifted=0   skipped=0
+  kani              seen=0   lifted=0   skipped=0
+  prusti            seen=0   lifted=0   skipped=0
+  creusot           seen=0   lifted=0   skipped=0
+  flux              seen=0   lifted=0   skipped=0
+  quickcheck        seen=0   lifted=0   skipped=0
+  verus             seen=0   lifted=0   skipped=0
+  rust-tests        seen=82  lifted=13  skipped=69
+  rust-tests-l2     seen=11  lifted=0   skipped=11


…from lift IR Proves the substrate's lift IR carries enough data to emit a complete java compilation unit (interface, constants, @boundary stubs) WITHOUT hand-writing the wrapper at integration time. Reads provekit lift --library-bindings output, extracts bind-lift-entry signatures, maps rust types via a small source-aliases table (the substrate-honest version walks the kit's catalog), emits: - package + imports - class header + static fields/constants - AdapterLifter interface - @boundary primitive stubs with byte-identical signatures to source The substrate-honest version of this script is the java realize plugin emitting the same when invoked over the whole IR (via the provekit.plugin.assemble RPC method — pending task #38). This Python script is the proof of concept: the IR carries the data; only the emission needs threading through cmd_lower or a new assemble RPC. What this retires: the hand-written CrossPlatform.java wrapper in the demo's test harness — the LAST hand-written java code in the rust → java → rust cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@sugar

…es hand-written boundary code The python wrapper-emitter is now retired. cmd_lower --target java auto-emits the full java compilation unit when boundary entries are present in the lift IR: package com.provekit.crossplatform; imports (Jackson, runtime carriers, java.nio.file.Path) public final class CrossPlatform { static final ObjectMapper MAPPER; static String PLUGIN_VERSION; ... (constants) public interface AdapterLifter { name(); surface(); lift(...); } // @boundary primitives auto-emitted from rust @boundary declarations: public static <Type> json_parse(String s) { throw UnsupportedOp; } ... (9 primitives, signatures derived via map_rust_type_to_java) // @sugar functions follow (each as a static method inside the class): public static JsonNode ok_response(...) { ... } ... } CHANGES: - NamedTermDocument gains boundary_entries: Vec<Json> field (defaults empty, populated from bind-lift-entry records in the IR). - named_term_document_from_ir_document collects bind-lift-entry items that aren't also @sugar (i.e. true @boundary functions). - lower_named_document detects target=="java" + non-empty boundary_entries and emits emit_java_module_preamble() before, '}' after. - map_rust_type_to_java handles &str, Value, [u8;N], Result<T,E>, etc. - strip_transported_class_wrapper peels per-function 'final class XxxTransported' wrappers so the @sugar methods sit directly inside CrossPlatform. EMPIRICAL: provekit lower --target java now emits a single compilation unit with package + imports + CrossPlatform class + 9 @boundary stubs + all @sugar methods. The python wrapper-emitter script is retired (kept as proof-of-concept demonstration). Substrate-honest note: this lives in rust cmd_lower because the provekit.plugin.assemble RPC method-shape is still pending (#38). The ideal home is the java realize plugin (it owns the target's file layout). Moving this code there is a future refactor; the current shape is functional and demonstrates the substrate emits its own wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 2, 2026 17:55

Copilot started reviewing on behalf of TSavo May 2, 2026 17:55 View session

TSavo merged commit 279fee0 into main May 2, 2026
5 of 6 checks passed

Copilot AI reviewed May 2, 2026

View reviewed changes

This was referenced May 2, 2026

feat(lift): extend v0 whitelist to method calls + multi-arg ctors #55

Merged

feat: cross-language conformance suite (rebased from PR #50) #56

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(rust): self-lift provekit-canonicalizer (substrate dogfoods itself)#38

feat(rust): self-lift provekit-canonicalizer (substrate dogfoods itself)#38
TSavo merged 1 commit into
mainfrom
feat/self-lift-canonicalizer

TSavo commented May 2, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 2, 2026

Uh oh!

coderabbitai Bot commented May 2, 2026

Rate limit exceeded

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		a = b
		(avalanche regression test, lifted as identifier-equality)

Conversation

TSavo commented May 2, 2026

Sir's framing

What the Rust lifter actually does

How I ran it

What got lifted (13 contract mementos)

What couldn't be lifted (69 + 11 = 80 skips, honest gap-finding)

Make target

Cross-impl leverage

Don't-touch list (followed)

Test plan

Uh oh!

chatgpt-codex-connector Bot commented May 2, 2026

Uh oh!

coderabbitai Bot commented May 2, 2026

Rate limit exceeded

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants