rt: add 128 / (<=32-bit divisor) fast path to udivmod128 #61
Merged
udivmod128 fell back to a 128-iteration shift/subtract loop whenever the dividend or divisor exceeded 64 bits. Most contract 128-bit divisions have a small divisor (asset math, fixed-point scaling), so add a schoolbook base-2^32 long division: the 128-bit dividend is processed as four 32-bit digits -- 4 native i64.div_u instead of 128 loop iterations. The running remainder satisfies r < v <= 2^32-1 after each step, so (r << 32) | digit < 2^64 (fits a native uint64, no recursion back into __udivti3) and < v * 2^32 (each per-digit quotient < 2^32, so the (hi<<32)|lo packing is exact). Routing is now 64/64 -> 128/(<=32-bit) -> 128-iteration loop. Verified bit-identical to the slow loop over 16M random + boundary inputs, so this is a pure optimization with no determinism change. Adds udivti3/umodti3_small_divisor_fastpath covering the digit-carry chain, the 0xFFFFFFFF boundary, and the v==2^32 slow-loop fall-through.
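The two bounds stated above follow from a short chain of inequalities. As a worked derivation, write d for the current 32-bit digit and n for the per-step dividend; both symbols are introduced here only for illustration. Because the low 32 bits of `r << 32` are zero, `(r << 32) | digit` equals r · 2^32 + d.

```math
\begin{aligned}
&1 \le v \le 2^{32} - 1, \qquad 0 \le r \le v - 1, \qquad 0 \le d \le 2^{32} - 1 \\
&n = r \cdot 2^{32} + d \;\le\; (v - 1) \cdot 2^{32} + (2^{32} - 1) \;=\; v \cdot 2^{32} - 1 \\
&n < v \cdot 2^{32} \;\le\; (2^{32} - 1) \cdot 2^{32} \;<\; 2^{64} \\
&\left\lfloor \frac{n}{v} \right\rfloor \;\le\; \left\lfloor \frac{v \cdot 2^{32} - 1}{v} \right\rfloor \;\le\; 2^{32} - 1
\end{aligned}
```

The first step starts from r = 0, which satisfies r ≤ v − 1 for any nonzero v, and each subsequent `r %= v` restores the same bound, so the argument applies to all four digits.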
huangminghuang approved these changes on May 19, 2026
Change Description
`udivmod128` (backing `__udivti3` / `__umodti3` / `__divti3` / `__modti3`) fell back to a 128-iteration shift/subtract loop whenever the dividend or divisor exceeded 64 bits. The overwhelming majority of contract 128-bit divisions have a small divisor (asset/token math, fixed-point scaling, percentages), so this adds a fast path for 128-bit dividend ÷ (≤ 32-bit divisor): schoolbook base-2³² long division over the dividend's four 32-bit digits, i.e. 4 native `i64.div_u` instead of 128 loop iterations.

Routing is now:

`64 / 64` (1 division) → `128 / (≤32-bit)` (4 divisions) → 128-iteration loop (unchanged).

Why it's safe

The running remainder satisfies `r < v ≤ 2³²−1` after every `r %= v`, which gives the two invariants the path relies on:

- `(r << 32) | digit < 2⁶⁴` → fits a native `uint64_t`, so the divide is `i64.div_u` and does not recurse back into `__udivti3` (the hard constraint in this file: librt is the contract's own compiler-rt provider, with no host fallback to break a cycle).
- `(r << 32) | digit < v · 2³²` → each per-digit quotient is `< 2³²`, so the `(hi << 32) | lo` packing is exact.

Only `uint64_t` divide/mod/shift/or are used, the same operations the existing 64/64 fast path already uses. No `__int128` multi-word ops.

Verification

- `compiler_builtins_tests`: all pass, including two new tests; no regression in the existing 18.
- `unsigned __int128`: 16M random + boundary inputs, 0 failures, and bit-identical to the slow loop with the fast path disabled. This is a pure optimization with no determinism change on the consensus path.

New tests

`udivti3_small_divisor_fastpath` / `umodti3_small_divisor_fastpath` pin the digit-carry chain (`MAX/7`), the `0xFFFFFFFF` upper boundary, the `v=1` identity, a near-2³² divisor, and the `v = 2³²` just-over case that must fall through to the slow loop.
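For readers who want to see the shape of the fast path, here is a minimal sketch of base-2³² long division using only `uint64_t` operations. It is not the code merged in this PR: the `u128_parts` struct, the function name `udivmod128_small`, and the return-value convention (0 meaning "not handled, use the slow loop") are all hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical representation of a 128-bit unsigned value as two halves. */
typedef struct { uint64_t hi, lo; } u128_parts;

/*
 * Sketch of a 128 / (<=32-bit) divide. Returns 1 and fills *quot and *rem
 * when the divisor qualifies (nonzero and <= 0xFFFFFFFF); returns 0
 * otherwise so the caller can fall through to a slower general path.
 */
static int udivmod128_small(u128_parts n, uint64_t v,
                            u128_parts *quot, uint64_t *rem)
{
    if (v == 0 || v > UINT32_MAX)
        return 0;

    /* The dividend as four base-2^32 digits, most significant first. */
    const uint64_t digit[4] = {
        n.hi >> 32, n.hi & UINT32_MAX,
        n.lo >> 32, n.lo & UINT32_MAX
    };

    uint64_t q[4];
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        /* r < v <= 2^32 - 1, so cur < v * 2^32 < 2^64: no overflow. */
        uint64_t cur = (r << 32) | digit[i];
        q[i] = cur / v;   /* per-digit quotient, always < 2^32 */
        r    = cur % v;   /* restores r < v for the next digit */
    }

    /* Each q[i] fits in 32 bits, so packing two per word is exact. */
    quot->hi = (q[0] << 32) | q[1];
    quot->lo = (q[2] << 32) | q[3];
    *rem = r;
    return 1;
}
```

A caller would try this after the 64/64 check and before the 128-iteration loop. On a WebAssembly target each `cur / v` is a 64-bit unsigned divide, which is what keeps the routine from calling back into the 128-bit helpers.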