Skip to content

IxVM kernel: unblock ByteArray.utf8DecodeChar?_utf8EncodeChar_append#451

Merged
arthurpaulino merged 5 commits into
mainfrom
ap/utf8-mod-witness
Jun 26, 2026
Merged

IxVM kernel: unblock ByteArray.utf8DecodeChar?_utf8EncodeChar_append#451
arthurpaulino merged 5 commits into
mainfrom
ap/utf8-mod-witness

Conversation

@arthurpaulino

@arthurpaulino arthurpaulino commented Jun 23, 2026

Copy link
Copy Markdown
Member

Unblocks the full UTF-8 round-trip theorem ByteArray.utf8DecodeChar?_utf8EncodeChar_append (the original blocker) — kernel-checks at 43_544_995_360 FFT under ulimit -v 22000000. Along the way, drops cost across the rest of the Nat-heavy kernel surface. Five commits, each independent, each measured.

Commits

1. a25e94e — native unconstrained_big_uint_div_mod Aiur op

New Aiur op unconstrained_big_uint_div_mod(a: KLimbs, b: KLimbs) -> (KLimbs, KLimbs). The runtime computes (q, r) via num_bigint::BigUint::div_rem and inserts both result lists into memory[10] content-addressed with multiplicity 0 — no constraints emitted by the op itself. Wired through bytecode / execute / constraints / trace + FFI decoder + Source / Typed / Simple / Concrete / Bytecode stages + Meta syntax + Compiler / {Check, Concretize, Simple, Match, Lower, Layout}. Lean reference evaluators (Interpret, SourceEval, BytecodeEval) are stubbed with TODOs; src/aiur/execute.rs is the authoritative impl.

Kernel/Primitive.lean::klimbs_div_mod now reads (q, r) from the op then verifies klimbs_normalize(q*b + r) == klimbs_normalize(a) and (when b != 0) the strict bound r < b via klimbs_le(klimbs_succ(r), b) == 1. Drops recursive klimbs_div_mod_go: its 2^24-step memo growth on (2^32, 256) was the klimbs OOM driver for the UTF-8 decode helpers.

Soundness on prover-supplied bytes: every limb byte flows through u8_mul inside klimbs_mul(q, b) and u8_add inside klimbs_add(q*b, r), both of which push to the u8_range_check channel (src/aiur/gadgets/bytes2.rs), so a byte > 255 fails the gadget's range check. Trailing junk limbs that mul doesn't touch are caught by the post-normalize equality.

r < b variant shopping (all three sound, measured on helper₃):

Variant helper₃ FFT
klimbs_le(r,b)==1 ∧ klimbs_eq(r,b)==0 40_381_441_414
klimbs_le(klimbs_succ(r), b) == 1 (this PR) 40_381_437_756
klimbs_le(b, r) == 0 40_381_440_143

Spread is ~3.6K FFT (noise-tier) but the succ-then-le variant is the consistent winner.

Per-commit deltas vs main:

Constant Before After commit 1 Δ
Nat.add_comm 55_504_714 55_504_714 0%
helper₃ OOM 40_381_437_756 unblocked

2. 1141f2dk_infer App-spine fusion

Collapses N consecutive expr_inst1 rewalks of a residual cod into one expr_inst_many per spine step on the ORIGINAL dom, with a growing innermost-first subs list. collect_spine once, k_infer the head once, then k_infer_app_spine_loop peels the Forall telescope arg-by-arg.

On non-Forall load (only reachable when head_ty itself wasn't a Forall syntactically), materialise via whnf on the concretised t (= prior subs applied to original t) — post-whnf dom/cod live in the OUTER context, so the descent restarts with subs_rev = [a].

k_infer_only retains the per-arg expr_inst1 path; the fusion is local to k_infer for now (k_infer_only saw a marginal-but-noisy delta when fused; not worth the extra surface area until there's a workload that exercises it more).

Per-commit deltas vs the previous commit:

Constant Before After commit 2 Δ
Nat.add_comm 55_504_714 54_670_728 −1.5%
helper₃ 40_381_437_756 38_696_601_688 −4.2%

3. 1c148db — route list_lookup through list_drop

list_lookup(list, idx) previously walked the list inline (Cons + match + recurse). Replacing the body with head(load(list_drop(list, idx))) lets list_drop's memo dedup sublist pointers — content-addressed intermediates collapse across all lookups that pass through them.

Per-commit deltas vs the previous commit:

Constant Before After commit 3 Δ
Nat.add_comm 54_670_728 54_350_107 −0.6%
helper₃ 38_696_601_688 37_793_682_359 −2.3%

4. 345999a — fix klimbs_to_ctor_form to single-step expansion

klimbs_to_ctor_form recursively unfolded a Nat literal n into the full ctor chain Nat.succ^n(Nat.zero). For large n the recursion created O(n) memo entries (one per klimbs_dec, one per klimbs_normalize, one per the recursive klimbs_to_ctor_form call) and OOM'd the per-fn function_queries before completing — the second OOM driver behind the UTF-8 round-trip blocker.

Rust's nat_to_constructor (src/ix/kernel/whnf.rs:1664-1687) does the right thing: produce Nat.succ(Lit(n-1)) — a single ctor wrap whose predecessor stays a Lit(Nat). Subsequent iota reductions re-trigger expansion only as needed. This commit mirrors that semantics in Aiur.

This is the commit that unblocks the full UTF-8 round-trip theorem.

Per-commit deltas vs the previous commit:

Constant Before After commit 4 Δ
Nat.add_comm 54_350_107 54_350_110 0% (noise)
helper₃ 37_793_682_359 16_870_283_185 −55.4%
ByteArray.utf8DecodeChar?_utf8EncodeChar_append OOM 43_544_995_360 unblocked

5. 65ff3f5 — re-pin Tests/Ix/IxVM.lean FFT costs

Pin-only commit reflecting the cumulative improvement across the prior four. lake test -- --ignored ixvm green.

Cumulative impact

Cumulative lake exe check deltas, main → HEAD:

Constant main This PR Δ
Nat.add_comm 55_504_714 54_350_110 −2.1%
Nat.decLe 206_196_563 191_354_793 −7.2%
Nat.sub_le_of_le_add 557_867_526 515_056_158 −7.7%
Array.append_assoc 3_079_334_815 2_588_157_538 −16.0%
Vector.append 3_160_970_390 2_661_244_845 −15.8%
IxVMPrim.nat_gcd_lit 649_859_784 605_514_332 −6.8%
IxVMPrim.nat_land_lit 1_116_700_968 1_019_743_752 −8.7%
helper₃ (_private....utf8DecodeChar?.helper₃) OOM 16_870_283_185 unblocked
ByteArray.utf8DecodeChar?_utf8EncodeChar_append OOM 43_544_995_360 unblocked

Test plan

  • lake build clean
  • lake test -- --ignored ixvm green (incl. all 41 kernel-check FFT pins)
  • lake exe check 'ByteArray.utf8DecodeChar?_utf8EncodeChar_append' passes at 43_544_995_360 FFT (ulimit -v 22000000)

@arthurpaulino arthurpaulino force-pushed the ap/utf8-mod-witness branch 2 times, most recently from 76a4267 to 6bf777f Compare June 25, 2026 14:23
@arthurpaulino arthurpaulino marked this pull request as ready for review June 25, 2026 14:23
@arthurpaulino arthurpaulino enabled auto-merge (squash) June 25, 2026 14:24
Aiur op `unconstrained_big_uint_div_mod(a: KLimbs, b: KLimbs) -> (KLimbs, KLimbs)`:
runtime computes (q, r) via `num_bigint::BigUint::div_rem` and inserts
both result lists into `memory[10]` content-addressed with multiplicity 0;
no constraints emitted by the op itself. Wired through
bytecode/execute/constraints/trace + FFI decoder +
Source/Typed/Simple/Concrete/Bytecode stages + Meta syntax +
Compiler/{Check, Concretize, Simple, Match, Lower, Layout}. Lean reference
evaluators (Interpret/SourceEval/BytecodeEval) are stubbed with TODOs;
src/aiur/execute.rs is the authoritative impl.

Primitive.lean: `klimbs_div_mod` now reads (q, r) from the op then verifies
`klimbs_normalize(q*b + r) == klimbs_normalize(a)` and (when b != 0) the
strict bound `r < b` via `klimbs_le(klimbs_succ(r), b) == 1`. Drops
recursive `klimbs_div_mod_go`: its 2^24-step memo growth on (2^32, 256)
was the klimbs OOM driver.

`lake exe check '_private.Init.Data.String.Decode.0.ByteArray.utf8DecodeChar?.helper₃'`
with `ulimit -v 20000000`: previously OOM (~1.2 GB single allocation
inside klimbs_div_mod_go); now passes at 40_381_441_414 FFT.

The `r < b` check was the variant-shopping axis (all three sound):
* `klimbs_le(r,b)==1 ∧ klimbs_eq(r,b)==0`: 40_381_441_414 FFT
* `klimbs_le(klimbs_succ(r), b) == 1`:     40_381_437_756 FFT (this commit)
* `klimbs_le(b, r) == 0`:                  40_381_440_143 FFT
Collapse N consecutive `expr_inst1` rewalks of a residual `cod` into one
`expr_inst_many` per spine step on the ORIGINAL `dom`, with a growing
innermost-first subs list. `collect_spine` once, `k_infer` the head once,
then `k_infer_app_spine_loop` peels the Forall telescope arg-by-arg.

On non-Forall load (only reachable when the head_ty itself wasn't a
Forall syntactically), materialise via `whnf` on the concretised t (=
prior subs applied to original t) — post-whnf dom/cod live in the OUTER
context, so the descent restarts with subs_rev = [a].

`lake exe check`:
* Nat.add_comm:                                  55_504_714 → 54_670_728 FFT (-1.5%)
* `_private....utf8DecodeChar?.helper₃`:    40_381_437_756 → 38_696_601_688 FFT (-4.2%)

k_infer_only retains the per-arg expr_inst1 path; the fusion is local to
k_infer for now.
`list_lookup(list, idx)` previously walked the list inline (Cons + match
+ recurse). Replacing the body with `head(load(list_drop(list, idx)))`
lets `list_drop`'s memo dedup sublist pointers — content-addressed
intermediates collapse across all lookups that pass through them.

`lake exe check`:
* Nat.add_comm:                                  54_670_728 → 54_350_107 FFT (-0.6%)
* `_private....utf8DecodeChar?.helper₃`:    38_696_601_688 → 37_793_682_359 FFT (-2.3%)
`klimbs_to_ctor_form` recursively unfolded a Nat literal `n` into the
full ctor chain `Nat.succ^n(Nat.zero)`. For large `n` the recursion
created O(n) memo entries (one per `klimbs_dec`, one per
`klimbs_normalize`, one per the recursive `klimbs_to_ctor_form` call)
and OOM'd the per-fn function_queries before completing.

Rust's `nat_to_constructor` (src/ix/kernel/whnf.rs:1664-1687) does the
right thing: produce `Nat.succ(Lit(n-1))` — a single ctor wrap whose
predecessor stays a `Lit(Nat)`. Subsequent iota reductions re-trigger
expansion only as needed. Mirror that semantics in Aiur.

`lake exe check '_private.Init.Data.String.Decode.0.ByteArray.utf8DecodeChar?.<bomb>'`
with `ulimit -v 20000000`:
* assemble₂._proof_1:                 OOM → 11_936_582_540 FFT
* assemble₃._proof_3:                 OOM → 11_934_210_486 FFT
* assemble₃._proof_4:                 OOM → 11_968_289_150 FFT
* assemble₄_eq_some_iff_..._proof_1_13: OOM → 8_510_124_580 FFT
* helper₃:                  37_793_684_978 → 16_870_284_540 FFT (-55.4%)

`lake exe check 'ByteArray.utf8DecodeChar?_utf8EncodeChar_append'` with
`ulimit -v 22000000`: passes at 43_544_984_712 FFT (the full theorem,
which was the original UTF-8 blocker).

Tests/Ix/IxVM.lean: 18 small FFT pin increases — every const whose
WHNF path now eagerly synthesizes a `Lit(n-1)` predecessor (one extra
`mk_nat_lit` per ctor-form coercion instead of zero in the old
recursive body that walked the whole chain anyway).
Re-pin after a25e94e (unconstrained_big_uint_div_mod op), 1141f2d
(k_infer App-spine fusion), 1c148db (list_lookup via list_drop), and
345999a (klimbs_to_ctor_form single-step). Lake test ixvm suite green.
@arthurpaulino arthurpaulino changed the title UTF-8 typechecking unblocked IxVM kernel: unblock UTF-8 decode helpers + Nat-layer FFT cuts Jun 25, 2026
@arthurpaulino arthurpaulino changed the title IxVM kernel: unblock UTF-8 decode helpers + Nat-layer FFT cuts IxVM kernel: unblock ByteArray.utf8DecodeChar?_utf8EncodeChar_append Jun 25, 2026
@arthurpaulino arthurpaulino merged commit 62e0a0f into main Jun 26, 2026
14 checks passed
@arthurpaulino arthurpaulino deleted the ap/utf8-mod-witness branch June 26, 2026 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants