
Refactor UInt128 division#126543

Open
kzrnm wants to merge 3 commits intodotnet:mainfrom
kzrnm:UInt128.Divide

Conversation

@kzrnm
Contributor

@kzrnm kzrnm commented Apr 4, 2026

UInt128 division uses the same algorithm as BigInteger division (before it was changed to nuint in #125799), but it includes unnecessary work for the 128-bit range. I optimized it by rewriting the division algorithm specifically for 128-bit values.

Improvements:

  • The previous implementation only used hardware intrinsics for 128-bit/64-bit division; this PR extends the intrinsic-accelerated path to 128-bit/128-bit division as well.
  • Made DivRem the core implementation. Since the division algorithm naturally computes the remainder alongside the quotient, this eliminates unnecessary multiplications.
  • Reduced code size.
  • Improved performance on ARM CPUs (this corresponds to running on Intel CPUs with hardware intrinsics disabled via DOTNET_EnableHWIntrinsic=0).
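
As a rough illustration of the DivRem point above (a sketch only, not code from this PR): when only `/` is available, the remainder has to be recovered with an extra multiply and subtract, whereas `UInt128.DivRem` returns both values from one call. The operands are the ones used in the benchmark code; the variable names are made up for the example.

```csharp
using System;

// Operands borrowed from the benchmark code in this PR.
UInt128 dividend = UInt128.MaxValue;
UInt128 divisor = new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);

// Remainder recovered from the quotient: costs an extra 128-bit multiply and subtract.
UInt128 q1 = dividend / divisor;
UInt128 r1 = dividend - q1 * divisor;

// Quotient and remainder produced together by a single DivRem call.
(UInt128 q2, UInt128 r2) = UInt128.DivRem(dividend, divisor);

Console.WriteLine(q1 == q2 && r1 == r2); // prints True
```

Both approaches must agree; the difference is only in how much work is done to obtain the remainder.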
benchmark

Intel Windows


BenchmarkDotNet v0.16.0-nightly.20260320.467, Windows 11 (10.0.26200.8039/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i5-13500 2.50GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 31.75 GB Total, 10.17 GB Available
.NET SDK 11.0.100-preview.3.26170.106
  [Host]   : .NET 11.0.0 (11.0.0-preview.3.26170.106, 11.0.26.17106), X64 RyuJIT x86-64-v3
  ShortRun : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT x86-64-v3

Job=ShortRun  EnvironmentVariables=DOTNET_EnableHWIntrinsic=1  IterationCount=3  
LaunchCount=1  WarmupCount=3  Allocated=-  
Alloc Ratio=NA  

| Method | Toolchain | Mean | Ratio | Code Size |
|---|---|---:|---:|---:|
| DivSame128BitsConst | main | 0.8791 ns | 1.00 | 16 B |
| DivSame128BitsConst | pr | 0.8877 ns | 1.01 | 16 B |
| DivSame128BitsVar | main | 1.1602 ns | 1.00 | 75 B |
| DivSame128BitsVar | pr | 1.1586 ns | 1.00 | 92 B |
| DivSame64BitsConst | main | 0.8909 ns | 1.00 | 16 B |
| DivSame64BitsConst | pr | 0.8807 ns | 0.99 | 16 B |
| DivSame64BitsVar | main | 2.1560 ns | 1.00 | 233 B |
| DivSame64BitsVar | pr | 2.2125 ns | 1.03 | 113 B |
| Div128BitsBy128BitsConst | main | 8.0961 ns | 1.00 | 991 B |
| Div128BitsBy128BitsConst | pr | 2.1759 ns | 0.27 | 359 B |
| Div128BitsBy128BitsVar | main | 13.2269 ns | 1.00 | 1,737 B |
| Div128BitsBy128BitsVar | pr | 3.6883 ns | 0.28 | 513 B |
| Div128BitsBy64BitsConst | main | 4.3891 ns | 1.00 | 72 B |
| Div128BitsBy64BitsConst | pr | 2.1862 ns | 0.50 | 58 B |
| Div128BitsBy64BitsVar | main | 4.4881 ns | 1.00 | 265 B |
| Div128BitsBy64BitsVar | pr | 4.3591 ns | 0.97 | 221 B |
| Div128BitsBy32BitsConst | main | 4.4128 ns | 1.00 | 67 B |
| Div128BitsBy32BitsConst | pr | 2.1829 ns | 0.49 | 48 B |
| Div128BitsBy32BitsVar | main | 4.4785 ns | 1.00 | 265 B |
| Div128BitsBy32BitsVar | pr | 4.3851 ns | 0.98 | 221 B |
| Div64BitsBy128BitsConst | main | 0.8831 ns | 1.00 | 12 B |
| Div64BitsBy128BitsConst | pr | 0.8684 ns | 0.98 | 12 B |
| Div64BitsBy128BitsVar | main | 1.3792 ns | 1.00 | 235 B |
| Div64BitsBy128BitsVar | pr | 1.3648 ns | 0.99 | 180 B |
| Div64BitsBy64BitsConst | main | 0.8888 ns | 1.00 | 16 B |
| Div64BitsBy64BitsConst | pr | 1.0119 ns | 1.14 | 16 B |
| Div64BitsBy64BitsVar | main | 2.2363 ns | 1.00 | 276 B |
| Div64BitsBy64BitsVar | pr | 2.1882 ns | 0.98 | 206 B |

Apple M1


BenchmarkDotNet v0.16.0-nightly.20260320.467, macOS Tahoe 26.3.1 (25D2128) [Darwin 25.3.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
Memory: 8 GB Total, 0.11 GB Available
.NET SDK 11.0.100-preview.3.26170.106
  [Host]   : .NET 11.0.0 (11.0.0-preview.3.26170.106, 11.0.26.17106), Arm64 RyuJIT armv8.0-a
  ShortRun : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a

Job=ShortRun  EnvironmentVariables=DOTNET_EnableHWIntrinsic=1  IterationCount=3  
LaunchCount=1  WarmupCount=3  

| Method | Toolchain | Mean | Ratio |
|---|---|---:|---:|
| DivSame128BitsConst | main | 0.8697 ns | 1.00 |
| DivSame128BitsConst | pr | 0.8702 ns | 1.00 |
| DivSame128BitsVar | main | 0.9243 ns | 1.00 |
| DivSame128BitsVar | pr | 0.8031 ns | 0.87 |
| DivSame64BitsConst | main | 0.8801 ns | 1.00 |
| DivSame64BitsConst | pr | 0.8705 ns | 0.99 |
| DivSame64BitsVar | main | 1.7937 ns | 1.00 |
| DivSame64BitsVar | pr | 1.4766 ns | 0.82 |
| Div128BitsBy128BitsConst | main | 14.0894 ns | 1.00 |
| Div128BitsBy128BitsConst | pr | 4.4355 ns | 0.31 |
| Div128BitsBy128BitsVar | main | 13.6011 ns | 1.00 |
| Div128BitsBy128BitsVar | pr | 5.7115 ns | 0.42 |
| Div128BitsBy64BitsConst | main | 29.6213 ns | 1.00 |
| Div128BitsBy64BitsConst | pr | 3.4257 ns | 0.12 |
| Div128BitsBy64BitsVar | main | 24.1194 ns | 1.00 |
| Div128BitsBy64BitsVar | pr | 4.4648 ns | 0.19 |
| Div128BitsBy32BitsConst | main | 22.6457 ns | 1.00 |
| Div128BitsBy32BitsConst | pr | 2.9554 ns | 0.13 |
| Div128BitsBy32BitsVar | main | 23.8699 ns | 1.00 |
| Div128BitsBy32BitsVar | pr | 4.4613 ns | 0.19 |
| Div64BitsBy128BitsConst | main | 0.8717 ns | 1.00 |
| Div64BitsBy128BitsConst | pr | 0.8782 ns | 1.01 |
| Div64BitsBy128BitsVar | main | 1.0032 ns | 1.00 |
| Div64BitsBy128BitsVar | pr | 1.0029 ns | 1.00 |
| Div64BitsBy64BitsConst | main | 0.8690 ns | 1.00 |
| Div64BitsBy64BitsConst | pr | 0.8690 ns | 1.00 |
| Div64BitsBy64BitsVar | main | 1.8038 ns | 1.00 |
| Div64BitsBy64BitsVar | pr | 1.5351 ns | 0.85 |

Intel full

BenchmarkDotNet v0.16.0-nightly.20260320.467, Windows 11 (10.0.26200.8039/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i5-13500 2.50GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 31.75 GB Total, 12.91 GB Available
.NET SDK 11.0.100-preview.3.26170.106
  [Host]   : .NET 11.0.0 (11.0.0-preview.3.26170.106, 11.0.26.17106), X64 RyuJIT
  ShortRun : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  Allocated=-  Alloc Ratio=NA  

| Method | EnvironmentVariables | Toolchain | Mean | Ratio | Code Size |
|---|---|---|---:|---:|---:|
| DivSame128BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.8903 ns | 0.99 | 15 B |
| DivSame128BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.8982 ns | 1.00 | 15 B |
| DivSame128BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 0.8959 ns | 1.00 | 16 B |
| DivSame128BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.9064 ns | 1.01 | 16 B |
| DivSame128BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 1.0765 ns | 0.92 | 71 B |
| DivSame128BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 1.1523 ns | 0.98 | 92 B |
| DivSame128BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 1.1735 ns | 1.00 | 75 B |
| DivSame128BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 1.1839 ns | 1.01 | 92 B |
| DivSame64BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.9085 ns | 1.03 | 15 B |
| DivSame64BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.8794 ns | 1.00 | 15 B |
| DivSame64BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 0.8814 ns | 1.00 | 16 B |
| DivSame64BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.8847 ns | 1.00 | 16 B |
| DivSame64BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 2.1817 ns | 1.00 | 233 B |
| DivSame64BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 2.1708 ns | 1.00 | 188 B |
| DivSame64BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 2.1733 ns | 1.00 | 233 B |
| DivSame64BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 2.1962 ns | 1.01 | 183 B |
| Div128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 8.7614 ns | 1.09 | 894 B |
| Div128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 8.3035 ns | 1.03 | 757 B |
| Div128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 8.0590 ns | 1.00 | 991 B |
| Div128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 2.2195 ns | 0.28 | 359 B |
| Div128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 14.6620 ns | 1.10 | 1,687 B |
| Div128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 12.7556 ns | 0.96 | 956 B |
| Div128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 13.3261 ns | 1.00 | 1,737 B |
| Div128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 3.7866 ns | 0.28 | 513 B |
| Div128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 18.0115 ns | 4.13 | 992 B |
| Div128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 4.9383 ns | 1.13 | 443 B |
| Div128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 4.3657 ns | 1.00 | 72 B |
| Div128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 2.2170 ns | 0.51 | 58 B |
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 26.2594 ns | 5.92 | 1,686 B |
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 9.6416 ns | 2.17 | 680 B |
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 4.4375 ns | 1.00 | 265 B |
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 4.4012 ns | 0.99 | 279 B |
| Div128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 15.9377 ns | 3.65 | 771 B |
| Div128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 4.6957 ns | 1.07 | 416 B |
| Div128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 4.3709 ns | 1.00 | 67 B |
| Div128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 2.1944 ns | 0.50 | 58 B |
| Div128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 22.4470 ns | 5.16 | 1,684 B |
| Div128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 9.6501 ns | 2.22 | 666 B |
| Div128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 4.3526 ns | 1.00 | 265 B |
| Div128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 4.4152 ns | 1.01 | 279 B |
| Div64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.8956 ns | 1.03 | 10 B |
| Div64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.9337 ns | 1.07 | 10 B |
| Div64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 0.8729 ns | 1.00 | 12 B |
| Div64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.9053 ns | 1.04 | 12 B |
| Div64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 1.5862 ns | 1.13 | 203 B |
| Div64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 1.3953 ns | 1.00 | 180 B |
| Div64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 1.4004 ns | 1.00 | 235 B |
| Div64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 1.4006 ns | 1.00 | 180 B |
| Div64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.8945 ns | 1.02 | 15 B |
| Div64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.8977 ns | 1.03 | 15 B |
| Div64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 0.8757 ns | 1.00 | 16 B |
| Div64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.8942 ns | 1.02 | 16 B |
| Div64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 2.2000 ns | 1.00 | 240 B |
| Div64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 2.2255 ns | 1.02 | 295 B |
| Div64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 2.1905 ns | 1.00 | 276 B |
| Div64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 2.2296 ns | 1.02 | 288 B |
| ModSame128BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.9050 ns | 0.67 | 10 B |
| ModSame128BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.8868 ns | 0.65 | 10 B |
| ModSame128BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 1.3598 ns | 1.00 | 106 B |
| ModSame128BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.8932 ns | 0.66 | 12 B |
| ModSame128BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 2.6174 ns | 1.43 | 207 B |
| ModSame128BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 1.0635 ns | 0.58 | 81 B |
| ModSame128BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 1.8337 ns | 1.00 | 136 B |
| ModSame128BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 0.9741 ns | 0.53 | 81 B |
| ModSame64BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.9173 ns | 0.85 | 10 B |
| ModSame64BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.8749 ns | 0.81 | 10 B |
| ModSame64BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 1.0760 ns | 1.00 | 56 B |
| ModSame64BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.9091 ns | 0.84 | 12 B |
| ModSame64BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 3.2634 ns | 1.32 | 377 B |
| ModSame64BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 2.1683 ns | 0.88 | 193 B |
| ModSame64BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 2.4665 ns | 1.00 | 295 B |
| ModSame64BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 2.1834 ns | 0.89 | 185 B |
| Mod128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 10.8463 ns | 1.24 | 1,110 B |
| Mod128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 9.2494 ns | 1.06 | 812 B |
| Mod128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 8.7267 ns | 1.00 | 1,093 B |
| Mod128BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 2.5108 ns | 0.29 | 408 B |
| Mod128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 18.1935 ns | 1.06 | 2,164 B |
| Mod128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 17.5810 ns | 1.02 | 1,065 B |
| Mod128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 17.2354 ns | 1.00 | 2,175 B |
| Mod128BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 4.1911 ns | 0.24 | 614 B |
| Mod128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 20.6537 ns | 4.74 | 1,179 B |
| Mod128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 4.5795 ns | 1.05 | 421 B |
| Mod128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 4.3543 ns | 1.00 | 124 B |
| Mod128BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 2.1958 ns | 0.50 | 57 B |
| Mod128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 29.9440 ns | 6.81 | 2,193 B |
| Mod128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 8.7176 ns | 1.98 | 706 B |
| Mod128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 4.3984 ns | 1.00 | 331 B |
| Mod128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 4.3848 ns | 1.00 | 328 B |
| Mod128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 17.2415 ns | 3.93 | 953 B |
| Mod128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 4.6249 ns | 1.05 | 397 B |
| Mod128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 4.3865 ns | 1.00 | 150 B |
| Mod128BitsBy32BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 2.2215 ns | 0.51 | 57 B |
| Mod128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 27.5504 ns | 6.35 | 2,191 B |
| Mod128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 8.5561 ns | 1.97 | 696 B |
| Mod128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 4.3371 ns | 1.00 | 331 B |
| Mod128BitsBy32BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 4.4553 ns | 1.03 | 328 B |
| Mod64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.9039 ns | 0.86 | 15 B |
| Mod64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.8903 ns | 0.85 | 15 B |
| Mod64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 1.0468 ns | 1.00 | 53 B |
| Mod64BitsBy128BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.8914 ns | 0.85 | 16 B |
| Mod64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 3.0250 ns | 1.50 | 361 B |
| Mod64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 1.1598 ns | 0.58 | 165 B |
| Mod64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 2.0158 ns | 1.00 | 304 B |
| Mod64BitsBy128BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 1.1524 ns | 0.57 | 165 B |
| Mod64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | main | 0.9123 ns | 0.88 | 15 B |
| Mod64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=0 | pr | 0.9024 ns | 0.87 | 15 B |
| Mod64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | main | 1.0419 ns | 1.00 | 56 B |
| Mod64BitsBy64BitsConst | DOTNET_EnableHWIntrinsic=1 | pr | 0.9045 ns | 0.87 | 16 B |
| Mod64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 3.3517 ns | 1.39 | 388 B |
| Mod64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 2.2079 ns | 0.91 | 285 B |
| Mod64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 2.4175 ns | 1.00 | 339 B |
| Mod64BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 2.1942 ns | 0.91 | 280 B |

benchmark code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;

[DisassemblyDiagnoser]
[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD", "Allocated", "Alloc Ratio")]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByMethod)]
public class UInt128Divide
{
    UInt128 UInt128MaxValue, UInt64MaxValue, UInt32MaxValue, UInt64Middle, UInt128Middle;

    [GlobalSetup]
    public void Setup()
    {
        UInt128MaxValue = UInt128.MaxValue;
        UInt64MaxValue = ulong.MaxValue;
        UInt32MaxValue = uint.MaxValue;
        UInt64Middle = 0x_1990_62AD_C61A_9C62ul;
        UInt128Middle = new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);
    }

    [Benchmark] public UInt128 DivSame128BitsConst() => new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62) / new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 DivSame128BitsVar() => UInt128Middle / UInt128Middle;

    [Benchmark] public UInt128 DivSame64BitsConst() => new UInt128(0, ulong.MaxValue) / new UInt128(0, ulong.MaxValue);
    [Benchmark] public UInt128 DivSame64BitsVar() => UInt64MaxValue / UInt64MaxValue;

    [Benchmark] public UInt128 Div128BitsBy128BitsConst() => UInt128.MaxValue / new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Div128BitsBy128BitsVar() => UInt128MaxValue / UInt128Middle;

    [Benchmark] public UInt128 Div128BitsBy64BitsConst() => UInt128.MaxValue / new UInt128(0, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Div128BitsBy64BitsVar() => UInt128MaxValue / UInt64Middle;

    [Benchmark] public UInt128 Div128BitsBy32BitsConst() => new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62) / new UInt128(0, uint.MaxValue);
    [Benchmark] public UInt128 Div128BitsBy32BitsVar() => UInt128Middle / UInt32MaxValue;

    [Benchmark] public UInt128 Div64BitsBy128BitsConst() => new UInt128(0, ulong.MaxValue) / new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Div64BitsBy128BitsVar() => UInt64MaxValue / UInt128Middle;

    [Benchmark] public UInt128 Div64BitsBy64BitsConst() => new UInt128(0, ulong.MaxValue) / new UInt128(0, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Div64BitsBy64BitsVar() => UInt64MaxValue / UInt64Middle;


    [Benchmark] public UInt128 ModSame128BitsConst() => new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62) % new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 ModSame128BitsVar() => UInt128Middle % UInt128Middle;

    [Benchmark] public UInt128 ModSame64BitsConst() => new UInt128(0, ulong.MaxValue) % new UInt128(0, ulong.MaxValue);
    [Benchmark] public UInt128 ModSame64BitsVar() => UInt64MaxValue % UInt64MaxValue;

    [Benchmark] public UInt128 Mod128BitsBy128BitsConst() => UInt128.MaxValue % new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Mod128BitsBy128BitsVar() => UInt128MaxValue % UInt128Middle;

    [Benchmark] public UInt128 Mod128BitsBy64BitsConst() => UInt128.MaxValue % new UInt128(0, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Mod128BitsBy64BitsVar() => UInt128MaxValue % UInt64Middle;

    [Benchmark] public UInt128 Mod128BitsBy32BitsConst() => new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62) % new UInt128(0, uint.MaxValue);
    [Benchmark] public UInt128 Mod128BitsBy32BitsVar() => UInt128Middle % UInt32MaxValue;

    [Benchmark] public UInt128 Mod64BitsBy128BitsConst() => new UInt128(0, ulong.MaxValue) % new UInt128(0x_F0A4_EB02_3567, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Mod64BitsBy128BitsVar() => UInt64MaxValue % UInt128Middle;

    [Benchmark] public UInt128 Mod64BitsBy64BitsConst() => new UInt128(0, ulong.MaxValue) % new UInt128(0, 0x_1990_62AD_C61A_9C62);
    [Benchmark] public UInt128 Mod64BitsBy64BitsVar() => UInt64MaxValue % UInt64Middle;
}

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 4, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

return (new UInt128(highRes, lowRes), remainder >> shift);
}

static (ulong Quotient, ulong Remainder) Divide128BitsBy64BitsCore(ulong hi, ulong lo, ulong divisor)
Contributor

@lilinus lilinus Apr 5, 2026


I guess an alternative name for this could be Divide64BitsBy64BitsWithCarry given the restriction hi < divisor?

Contributor Author


In Knuth’s Algorithm D, this part is typically described as 128/64 bits.
In the Burnikel–Ziegler algorithm used by BigInteger, the corresponding part is referred to as 2n/n bits (cf. #96895).

I don’t think we should deliberately use a name that deviates from this convention.
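
For readers unfamiliar with the 128/64 step mentioned above, here is a minimal portable sketch in the style of the classic textbook routine (two 32-bit quotient digits after normalizing the divisor). It is not the PR's `Divide128BitsBy64BitsCore`; the name `DivideCoreSketch` and its body are assumptions made for illustration. It relies on the same precondition discussed in this thread, `hi < divisor`, which guarantees the quotient fits in 64 bits.

```csharp
using System;
using System.Numerics;

// Hypothetical helper (not the PR's code). Precondition: hi < divisor.
static (ulong Quotient, ulong Remainder) DivideCoreSketch(ulong hi, ulong lo, ulong divisor)
{
    unchecked
    {
        const ulong B = 1UL << 32; // digit base

        // Normalize so the divisor's top bit is set.
        int shift = BitOperations.LeadingZeroCount(divisor);
        ulong v = divisor << shift;
        ulong vn1 = v >> 32, vn0 = v & 0xFFFF_FFFF;

        // Shift the dividend by the same amount (C# masks shift counts, so handle 0 explicitly).
        ulong un32 = shift == 0 ? hi : (hi << shift) | (lo >> (64 - shift));
        ulong un10 = lo << shift;
        ulong un1 = un10 >> 32, un0 = un10 & 0xFFFF_FFFF;

        // First quotient digit: estimate, then correct at most twice.
        ulong q1 = un32 / vn1;
        ulong rhat = un32 - q1 * vn1;
        while (q1 >= B || q1 * vn0 > B * rhat + un1)
        {
            q1--;
            rhat += vn1;
            if (rhat >= B) break;
        }
        ulong un21 = un32 * B + un1 - q1 * v; // wraps modulo 2^64 by design

        // Second quotient digit.
        ulong q0 = un21 / vn1;
        rhat = un21 - q0 * vn1;
        while (q0 >= B || q0 * vn0 > B * rhat + un0)
        {
            q0--;
            rhat += vn1;
            if (rhat >= B) break;
        }

        // Undo the normalization on the remainder.
        ulong rem = (un21 * B + un0 - q0 * v) >> shift;
        return (q1 * B + q0, rem);
    }
}

// Usage: divide 2^64 (hi = 1, lo = 0) by 2.
var (q, r) = DivideCoreSketch(1, 0, 2);
Console.WriteLine($"{q} {r}");
```

On x64 with intrinsics enabled, this whole routine collapses to a single hardware divide, which is why the naming convention treats it as one "128/64" primitive.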

return (q, rem >> shift);
}

static (UInt128 Quotient, ulong Remainder) Divide128BitsBy64Bits(UInt128 left, ulong divisor)
Contributor


This method could be implemented by executing X86Base.X64.DivRem twice (if supported).

Contributor Author


Are you referring to the previous implementation?

X86Base.X64.DivRem(left._upper, 0, right._lower) returns the same result as ulong.DivRem(left._upper, right._lower).
Is there any benefit to calling X86Base.X64.DivRem twice?

Contributor


Yes, exactly. I mean the shifting part below could be skipped in that case, since shifting would not affect Divide128BitsBy64BitsCore performance.

Sorry for not being clear. By calling it twice, I mean this method could simply be (given X86Base.X64.IsSupported):

(ulong hires, ulong rem) = X86Base.X64.DivRem(left._upper, 0, divisor); // or ulong.DivRem
(ulong lowres, rem) = X86Base.X64.DivRem(left._lower, rem, divisor); // Or Divide128BitsBy64BitsCore
return (new UInt128(hires, lowres), rem);
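
The two-step shape of that suggestion can be sketched portably as follows. This is only an illustration under stated assumptions: the name `DivideByUInt64Sketch` is hypothetical, and the second step uses the built-in `UInt128` operators as a stand-in for the hardware 128/64 divide (or `Divide128BitsBy64BitsCore`), so it shows the structure rather than the performance. The key invariant is that the remainder of the first step is smaller than the divisor, so the second quotient fits in 64 bits.

```csharp
using System;

// Hypothetical helper illustrating schoolbook division by a 64-bit divisor.
static (UInt128 Quotient, ulong Remainder) DivideByUInt64Sketch(UInt128 left, ulong divisor)
{
    ulong upper = (ulong)(left >> 64);
    ulong lower = (ulong)left; // truncating conversion keeps the low 64 bits

    // Step 1: divide the upper word; rem < divisor afterwards.
    (ulong hiQ, ulong rem) = Math.DivRem(upper, divisor);

    // Step 2: divide (rem:lower) by the divisor. Because rem < divisor,
    // the quotient fits in 64 bits. Stand-in for the 128/64 primitive.
    UInt128 partial = new UInt128(rem, lower);
    ulong loQ = (ulong)(partial / divisor);
    ulong loR = (ulong)(partial % divisor);

    return (new UInt128(hiQ, loQ), loR);
}

// Usage with the benchmark's operands.
var (quotient, remainder) = DivideByUInt64Sketch(UInt128.MaxValue, 0x_1990_62AD_C61A_9C62UL);
Console.WriteLine($"{quotient} {remainder}");
```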

@kzrnm
Contributor Author

kzrnm commented Apr 5, 2026

| Method | EnvironmentVariables | Toolchain | Mean | Ratio | Code Size |
|---|---|---|---:|---:|---:|
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | main | 24.634 ns | 5.76 | 1,717 B |
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=0 | pr | 9.790 ns | 2.29 | 680 B |
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | main | 4.280 ns | 1.00 | 265 B |
| Div128BitsBy64BitsVar | DOTNET_EnableHWIntrinsic=1 | pr | 4.441 ns | 1.04 | 221 B |

@tannergooding
Member

> DOTNET_EnableHWIntrinsic=0

This is not a scenario we care about; it is an edge case that exists only for testing, not for production.

> DOTNET_EnableHWIntrinsic=1

These are the ones we do care about, but it would help to separate them out and make it clearer where the improvements come from.

@kzrnm
Contributor Author

kzrnm commented Apr 6, 2026

I’ve added an explanation and benchmarks on an Apple M1 Mac.

> This is not a scenario we care about; it is an edge case that exists only for testing, not for production.

On Intel CPUs, UInt128 division is accelerated using X86Base.X64 instructions. Since there is no equivalent on ARM, I disable hardware intrinsics in my primary development environment (an Intel CPU) to reproduce that scenario.
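
To see which situation a given machine or configuration falls into, a small check like the following can help. It is only a sketch: it queries the public `IsSupported` properties and prints which kind of path a division routine guarded by `X86Base.X64.IsSupported` would be expected to take. The description strings are made up for the example, and the expectation that `DOTNET_EnableHWIntrinsic=0` makes `X86Base.X64.IsSupported` report false is an assumption based on how that switch is described in this thread.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Query the runtime for the instruction-set classes relevant to this discussion.
bool x64 = X86Base.X64.IsSupported;
bool arm64 = ArmBase.Arm64.IsSupported;

Console.WriteLine($"X86Base.X64.IsSupported   = {x64}");
Console.WriteLine($"ArmBase.Arm64.IsSupported = {arm64}");

// A routine guarded by X86Base.X64.IsSupported falls back to portable code otherwise,
// which is the case on ARM and (by assumption) under DOTNET_EnableHWIntrinsic=0.
string path = x64 ? "x64 DivRem intrinsic path" : "portable fallback path";
Console.WriteLine($"Expected UInt128 division path: {path}");
```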


Labels

area-System.Runtime.Intrinsics community-contribution Indicates that the PR has been added by a community member
