rt: add 128 / (<=32-bit divisor) fast path to udivmod128 by heifner · Pull Request #61 · Wire-Network/wire-cdt

heifner · 2026-05-19T14:07:12Z

Change Description

udivmod128 (backing __udivti3/__umodti3/__divti3/__modti3) fell back to a 128-iteration shift/subtract loop whenever the dividend or divisor exceeded 64 bits. Most contract 128-bit divisions have a small divisor (asset/token math, fixed-point scaling, percentages), so this adds a fast path for 128-bit dividend ÷ (≤ 32-bit divisor): schoolbook base-2³² long division over the dividend's four 32-bit digits, which costs 4 native i64.div_u instead of 128 loop iterations.
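The digit-by-digit division described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `U128` and `DivResult` structs and the name `div128_by_u32` are hypothetical, and the sketch assumes a nonzero divisor that already fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word container; the PR text does not show the real representation.
struct U128 {
    uint64_t hi;
    uint64_t lo;
};

// Hypothetical result type: 128-bit quotient plus a remainder that is < v.
struct DivResult {
    U128 quot;
    uint32_t rem;
};

// Schoolbook long division in base 2^32 of a 128-bit dividend by a
// divisor v with 1 <= v <= 0xFFFFFFFF. Only 64-bit operations are used.
inline DivResult div128_by_u32(U128 n, uint32_t v) {
    assert(v != 0);  // this sketch does not model division by zero

    // The four base-2^32 digits of the dividend, most significant first.
    const uint32_t digits[4] = {
        static_cast<uint32_t>(n.hi >> 32),
        static_cast<uint32_t>(n.hi),
        static_cast<uint32_t>(n.lo >> 32),
        static_cast<uint32_t>(n.lo),
    };

    uint64_t q[4];
    uint64_t r = 0;  // running remainder, always < v
    for (int i = 0; i < 4; ++i) {
        // r < v <= 2^32 - 1, so this value is < v * 2^32 and fits in 64 bits.
        const uint64_t cur = (r << 32) | digits[i];
        q[i] = cur / v;  // per-digit quotient, < 2^32
        r = cur % v;
    }

    // Each q[i] is < 2^32, so packing two of them into one word is exact.
    return {{(q[0] << 32) | q[1], (q[2] << 32) | q[3]}, static_cast<uint32_t>(r)};
}
```

For example, dividing the all-ones 128-bit value by 7 with this sketch yields the quotient 0x24924924924924924924924924924924 and remainder 3, with the remainder carried from digit to digit as 3, 1, 0, 3.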

Routing is now: 64 / 64 (1 division) → 128 / (≤32-bit) (4 divisions) → 128-iteration loop (unchanged).
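The routing order can be illustrated with a small classifier. The enum, the function name, and the word-pair parameters are hypothetical and chosen for illustration only; the sketch assumes a nonzero divisor and says nothing about how the real code handles division by zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the three paths named in the routing line.
enum class DivPath { Native64, SmallDivisor, SlowLoop };

// Decide which path a division n / d would take, given the high and low
// 64-bit words of each operand. The checks run in the order listed in the PR.
inline DivPath choose_path(uint64_t n_hi, uint64_t n_lo, uint64_t d_hi, uint64_t d_lo) {
    (void)n_lo;                      // the low dividend word does not affect routing
    assert((d_hi | d_lo) != 0);      // sketch assumes a nonzero divisor

    if (n_hi == 0 && d_hi == 0) {
        return DivPath::Native64;    // 64 / 64: one native division
    }
    if (d_hi == 0 && d_lo <= 0xFFFFFFFFull) {
        return DivPath::SmallDivisor;  // 128 / (<= 32-bit): four native divisions
    }
    return DivPath::SlowLoop;        // everything else: 128-iteration loop
}
```

Under these assumptions a divisor of exactly 2³² with a dividend above 64 bits is classified as `SlowLoop`, which matches the fall-through case the new tests pin.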

Why it's safe

The running remainder satisfies r < v ≤ 2³²−1 after every r %= v, which gives the two invariants the path relies on:

(r << 32) | digit < 2⁶⁴ → fits a native uint64_t, so the divide is i64.div_u and does not recurse back into __udivti3 (the hard constraint in this file — librt is the contract's own compiler-rt provider, no host fallback to break a cycle).
(r << 32) | digit < v · 2³² → each per-digit quotient is < 2³², so the (hi << 32) | lo packing is exact.
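Both invariants follow from one bound, written out here as a short derivation. The symbols c (the per-step partial dividend) and d (the current 32-bit digit) are introduced only for this note. Because the low 32 bits of r << 32 are zero, (r << 32) | d equals r·2³² + d, and r = 0 at the first step satisfies r < v since v ≥ 1.

```latex
\begin{align*}
&\text{Assume } 1 \le v \le 2^{32}-1,\quad 0 \le r \le v-1,\quad 0 \le d \le 2^{32}-1,\quad c = r\cdot 2^{32} + d.\\
&c \;\le\; (v-1)\,2^{32} + (2^{32}-1) \;=\; v\cdot 2^{32} - 1 \;<\; v\cdot 2^{32} \;\le\; (2^{32}-1)\,2^{32} \;<\; 2^{64},\\
&\left\lfloor c/v \right\rfloor \;\le\; \frac{v\cdot 2^{32}-1}{v} \;<\; 2^{32}.
\end{align*}
```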

Only uint64_t divide/mod/shift/or are used — the same operations the existing 64/64 fast path already uses. No __int128 multi-word ops.

Verification

In-repo suite (compiler_builtins_tests): all pass, including two new tests; no regression in the existing 18.
Standalone fuzz vs. native unsigned __int128: 16M random + boundary inputs, 0 failures, and bit-identical to the slow loop with the fast path disabled. This is a pure optimization — no determinism change on the consensus path.

New tests udivti3_small_divisor_fastpath / umodti3_small_divisor_fastpath pin the digit-carry chain (MAX/7), the 0xFFFFFFFF upper boundary, v=1 identity, a near-2³² divisor, and the v = 2³² just-over case that must fall through to the slow loop.
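A host-side differential check in the spirit of the verification above might look like the sketch below. It is not the harness used for the PR, which is not shown: all function names are hypothetical, `slow_divmod` is a generic shift/subtract loop standing in for the repository's slow path, and the near-2³² divisor 0xFFFFFFFE is an assumed value. The sketch uses `unsigned __int128` (a GCC/Clang extension) freely because it runs on the host as an oracle, unlike the librt code itself.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

using u128 = unsigned __int128;  // host-side oracle type (GCC/Clang extension)

// Stand-in for the slow path: 128-iteration shift/subtract division.
inline void slow_divmod(u128 n, u128 d, u128& q, u128& r) {
    assert(d != 0);
    q = 0;
    r = 0;
    for (int i = 127; i >= 0; --i) {
        const bool carry = (r >> 127) != 0;  // bit shifted out of the remainder
        r = (r << 1) | ((n >> i) & 1);
        if (carry || r >= d) {               // with carry, the true value exceeds d
            r -= d;                          // wraps to the correct remainder
            q |= static_cast<u128>(1) << i;
        }
    }
}

// Fast path under test: base-2^32 long division, 1 <= v <= 0xFFFFFFFF.
inline void fast_divmod(u128 n, uint32_t v, u128& q, u128& r) {
    assert(v != 0);
    uint64_t rem = 0;
    q = 0;
    for (int shift = 96; shift >= 0; shift -= 32) {
        const uint64_t cur = (rem << 32) | static_cast<uint32_t>(n >> shift);
        q |= static_cast<u128>(cur / v) << shift;
        rem = cur % v;
    }
    r = rem;
}

// True if the slow loop matches native division and, when the divisor is in
// the fast path's range, the fast path matches both.
inline bool agrees(u128 n, uint64_t d) {
    u128 sq, sr;
    slow_divmod(n, d, sq, sr);
    if (sq != n / d || sr != n % d) return false;
    if (d > 0xFFFFFFFFull) return true;  // v = 2^32 and above: slow loop only
    u128 fq, fr;
    fast_divmod(n, static_cast<uint32_t>(d), fq, fr);
    return fq == sq && fr == sr;
}

// Count disagreements over a few pinned boundaries plus random inputs.
inline uint64_t count_mismatches(uint64_t seed, uint64_t iterations) {
    const u128 max = ~static_cast<u128>(0);
    const uint64_t pinned[] = {7, 0xFFFFFFFFull, 1, 0xFFFFFFFEull, 0x100000000ull};
    uint64_t bad = 0;
    for (uint64_t d : pinned) {
        if (!agrees(max, d)) ++bad;
    }
    std::mt19937_64 rng(seed);
    for (uint64_t i = 0; i < iterations; ++i) {
        const uint64_t hi = rng();
        const uint64_t lo = rng();
        uint32_t v = static_cast<uint32_t>(rng());
        if (v == 0) v = 1;  // keep the divisor nonzero
        if (!agrees((static_cast<u128>(hi) << 64) | lo, v)) ++bad;
    }
    return bad;
}
```

Calling `count_mismatches` with a larger iteration count would approximate the scale of the run reported above; a result of zero means the sketch's fast path, its slow loop, and native division all agree on the sampled inputs.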

udivmod128 fell back to a 128-iteration shift/subtract loop whenever the dividend or divisor exceeded 64 bits. Most contract 128-bit divisions have a small divisor (asset math, fixed-point scaling), so add a schoolbook base-2^32 long division: the 128-bit dividend is processed as four 32-bit digits -- 4 native i64.div_u instead of 128 loop iterations.

The running remainder satisfies r < v <= 2^32-1 after each step, so (r << 32) | digit < 2^64 (fits a native uint64, no recursion back into __udivti3) and < v * 2^32 (each per-digit quotient < 2^32, so the (hi<<32)|lo packing is exact).

Routing is now 64/64 -> 128/(<=32-bit) -> 128-iteration loop. Verified bit-identical to the slow loop over 16M random + boundary inputs, so this is a pure optimization with no determinism change.

Adds udivti3/umodti3_small_divisor_fastpath covering the digit-carry chain, the 0xFFFFFFFF boundary, and the v==2^32 slow-loop fall-through.

heifner requested a review from brianjohnson5972 May 19, 2026 14:07

huangminghuang approved these changes May 19, 2026


heifner merged commit 9267e8e into master May 19, 2026
4 checks passed

heifner deleted the perf/udivmod128-small-divisor-fastpath branch May 19, 2026 16:44
