Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
return (new UInt128(highRes, lowRes), remainder >> shift);
}

static (ulong Quotient, ulong Remainder) Divide128BitsBy64BitsCore(ulong hi, ulong lo, ulong divisor)
I guess an alternative name for this could be Divide64BitsBy64BitsWithCarry given the restriction hi < divisor?
In Knuth’s Algorithm D, this part is typically described as 128/64 bits.
In the Burnikel–Ziegler algorithm used by BigInteger, the corresponding part is referred to as 2n/n bits (cf. #96895).
I don’t think we should deliberately use a name that deviates from this convention.
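For readers following the naming discussion, here is a minimal sketch of what the `hi < divisor` restriction buys: the dividend is a full 128-bit value, yet the quotient is guaranteed to fit in 64 bits. `Div128By64Sketch` and `DivideCore` are hypothetical names used only for illustration; this is a reference check built on `UInt128` arithmetic, not the implementation in this PR.

```csharp
using System;

static class Div128By64Sketch
{
    // Hypothetical reference helper (not the PR's code).
    // Divides the 128-bit value hi:lo by a 64-bit divisor.
    public static (ulong Quotient, ulong Remainder) DivideCore(ulong hi, ulong lo, ulong divisor)
    {
        // With hi < divisor, the dividend is smaller than divisor * 2^64,
        // so the quotient is smaller than 2^64 and fits in a ulong.
        if (hi >= divisor)
        {
            throw new ArgumentOutOfRangeException(nameof(hi), "hi must be less than divisor");
        }

        UInt128 dividend = new UInt128(hi, lo);
        UInt128 quotient = dividend / divisor;
        UInt128 remainder = dividend % divisor;

        // Both conversions are lossless because of the precondition above.
        return (checked((ulong)quotient), checked((ulong)remainder));
    }
}
```

For example, `DivideCore(1, 0, 2)` divides 2^64 by 2 and returns the quotient 2^63 with remainder 0.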
return (q, rem >> shift);
}

static (UInt128 Quotient, ulong Remainder) Divide128BitsBy64Bits(UInt128 left, ulong divisor)
This method could be implemented by executing X86Base.X64.DivRem twice (if supported)
Are you referring to the previous implementation?
X86Base.X64.DivRem(left._upper, 0, right._lower) returns the same result as ulong.DivRem(left._upper, right._lower).
Is there any benefit to calling X86Base.X64.DivRem twice?
Yes, exactly. I mean the shifting part below could be skipped in that case, since shifting would not affect Divide128BitsBy64BitsCore performance.
Sorry for not being clear. By calling it twice, I mean this method could simply be (given X86Base.X64.IsSupported):
(ulong hires, ulong rem) = X86Base.X64.DivRem(left._upper, 0, divisor); // or ulong.DivRem
(ulong lowres, rem) = X86Base.X64.DivRem(left._lower, rem, divisor); // Or Divide128BitsBy64BitsCore
return (new UInt128(hires, lowres), rem);
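The two-call idea above is schoolbook long division with 64-bit digits: divide the upper half first, then carry its remainder in as the upper half of the second dividend. Below is a portable sketch of that flow, assuming a hypothetical helper `DivRem128By64` that stands in for the hardware intrinsic and follows the (lower, upper, divisor) argument order used in the snippet above. The helper is emulated with `UInt128` arithmetic, so this only illustrates the structure and says nothing about performance.

```csharp
using System;

static class TwoStepDivisionSketch
{
    // Hypothetical stand-in for a 128-by-64 divide primitive.
    // Requires upper < divisor so that the quotient fits in 64 bits.
    static (ulong Quotient, ulong Remainder) DivRem128By64(ulong lower, ulong upper, ulong divisor)
    {
        UInt128 dividend = new UInt128(upper, lower);
        return (unchecked((ulong)(dividend / divisor)), unchecked((ulong)(dividend % divisor)));
    }

    public static (UInt128 Quotient, ulong Remainder) Divide(UInt128 left, ulong divisor)
    {
        ulong upper = unchecked((ulong)(left >> 64));
        ulong lower = unchecked((ulong)left);

        // Step 1: divide the upper digit; the high half of the dividend is 0,
        // which is always less than a nonzero divisor.
        (ulong hiRes, ulong rem) = DivRem128By64(upper, 0, divisor);

        // Step 2: the remainder from step 1 is less than divisor,
        // so it is a valid upper half for the second division.
        (ulong loRes, ulong finalRem) = DivRem128By64(lower, rem, divisor);

        return (new UInt128(hiRes, loRes), finalRem);
    }
}
```

The second call is safe only because the first remainder is always smaller than the divisor, which is the same `hi < divisor` precondition discussed earlier in the thread.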
This is not a scenario we care about; it is an edge case that exists only for testing, not for production purposes.
These ones we do care about, but it'd be helpful to separate them out and make it clearer where the improvements are coming from.
I’ve added an explanation and benchmarks on an Apple M1 Mac.
On Intel CPUs, UInt128 division is accelerated using |
UInt128 division uses the same algorithm as BigInteger division (before it was changed to nuint in #125799), but it includes unnecessary work for the 128-bit range. I optimized it by rewriting the division algorithm specifically for 128-bit values.
Improvements:
Made DivRem the core implementation. Since the division algorithm naturally computes the remainder alongside the quotient, this eliminates unnecessary multiplications.
[…] DOTNET_EnableHWIntrinsic=0).
benchmark
Intel Windows
Apple M1
Intel full
benchmark code