Can "Math.DivRem" do division operation only once? #4155

ygc369 · 2015-04-21T13:00:04Z

Currently it does division twice, but most CPUs support to get the two values by doing division only once.
Besides,this method should support uint and ulong too.

category:cq
theme:div-mod-rem
skill-level:expert
cost:large
impact:medium

jkotas · 2015-04-21T14:37:47Z

Math.DivRem is not in the .NET Core public surface, but I agree that it would be nice for the code generators to recognize the pattern and perform the division just one.

cc @CarolEidt @cmckinsey

stephentoub · 2015-04-21T15:05:31Z

cc: @KrzysztofCwalina, as IIRC, Krzysztof bumped up against this as a measurable cost when doing some recent fomatting/parsing experimentation.

CarolEidt · 2015-04-21T15:52:16Z

This would not be a very difficult pattern to recognize: a div and a rem with incoming value numbers that match, though it doesn't "fit" with any existing (or other common) optimization patterns. The thing that would be challenging would be to deal with the resulting IR node that would produce two values in registers. Although that would not be all that difficult, it would require more pervasive changes.

tannergooding · 2015-04-21T15:57:45Z

👍

In short term, couldn't Math.DivRem be treated as a special-case (intrinsic)? I wouldn't think that this optimization wouldn't really require many changes.

Long term though, adding an optimization pattern to recognize / and % with matching value members would be great.

CarolEidt · 2015-04-21T16:03:04Z

@tannergooding - you would think that it would be simpler, but actually it requires the messiest part of the work (handling the node in the IR that produces two values), while only relieving the JIT of what is probably the more straightforward part of the implementation. It might actually be easier on x86 (32 bit) where we already have to deal with longs that occupy multiple registers.

tannergooding · 2017-06-07T20:01:11Z

@CarolEidt, I'm actually wondering why this would require more pervasive changes?

I would think that instructions like idiv should already document (via some metadata) that they end up writing two registers, correct? As without this being carried around by the code generator, it would potentially end up overwriting the second argument of the parameter list whenever it generates the idiv instruction.

I've not looked at the code, but I would guess that various instructions do indicate what registers end up being input/output by the instruction and that the changes here would be to allow the node to actually carry the second value (which could probably be done via inheritance and type checking). I actually think this is in a similar vein to having an intrinsic that takes 3 arguments where the GenTreeOp currently only allows 2 arguments.

ygc369 · 2020-03-29T02:04:12Z

@CarolEidt
Is there any progress about this?

ygc369 · 2020-08-26T13:16:24Z

@tannergooding @CarolEidt
After more than five years, has this issue been solved?

tannergooding · 2020-08-27T15:27:14Z

No, but .NET 5 brought about some register allocator changes that should (hopefully) make it simpler to support now.

caihongxu · 2020-09-21T22:37:10Z

Before this is addressed in .NET 5, any remedies we could do ourselves. e.g. is it possible to roll our own DivRem method to achieve doing division only once?

ygc369 · 2021-02-18T11:38:05Z

@tannergooding
From this blog Announcing .NET 6 Preview 1, I see that there will be a new overload of DivRem for .Net 6, which returns a tuple. Will this new overload be able to do division only once?

tannergooding · 2021-02-18T15:44:11Z

The new API surface is setup in a way that would enable that optimization on x86/x64. I don't believe there is any instruction to compute quotient and remainder simultaneously on ARM32/ARM64.

Whether or not it gets enabled likely depends on prioritization and availability from the dev's familiar enough with that area of the codebase.

Bip901 · 2022-03-09T12:00:57Z

@tannergooding @CarolEidt After more than five years, has this issue been solved?

Seems like it's doing a single division now. Still not using the x86 optimization though.

ygc369 · 2022-07-25T13:32:26Z

I wonder why this issue so difficult to fix? .Net 7 is coming soon, when will this issue be fixed? @tannergooding

tannergooding · 2022-07-25T17:04:43Z

The issue is complex because you have to take an instruction with very explicit register usage and then map that back to a higher level concept.

In particular, div and idiv are really two instructions in one. They compute the quotient and the remainder "simultaneously". They also don't do simple 32 / 32 = 32 or 64 / 64 = 64, they do 128 / 64 = 64 and 64 / 32 = 32. They have a requirement that the input be in RDX:RAX (or EDX:EAX, or DX:AX, or AH:AL) and then return the result in left (Quotient, so RAX, EAX, AX, or AH) and right (Remainder, so RDX, EDX, DX, or AL).

This then needs to be mapped, at a higher level, back to T DivRem(T left, T right, out T remainder) or (T Quotient, T Remainder) DivRem(T left, T right).

In the case of the first, the Quotient is simple because the "natural" return register is RAX/EAX. The output is slightly problematic because what you have is really an address (so a GC tracked T*).

In the case of the latter, what you have is a struct containing two fields. On some platforms this will be "one register" (often RAX/EAX), on other platforms it will be "two registers" (often RAX:RDX/EAX:EDX), and on others still it will actually be an "out buffer" (so the signature is really ref ValueTuple<T, T> DivRem(out ValueTuple<T, T> result, T left, T right)).

In each of these cases, the JIT has some complexities it has to work around and deal with such as eliding the pointers/indirections and ensuring the data can be consumed directly.

There is also the note that this is an x86/x64 specific problem. Other platforms just have "div" and if you also need the remainder, you compute it exactly as x86/x64 is already in DivRem (remainder = left - (quotient * right)). On modern CPUs this isn't going to be terrible either because modern x86/x64 can optimize the case where you don't consume the remainder. Likewise, if right (the divisor) is a constant, you can often skip using div/idiv entirely and use an alternative sequence that is faster instead (at which point computing the remainder manually is often strictly better). -- One of the main optimizations an x86/x64 compiler can do is avoiding actually emitting div/idiv however it can because it is that expensive (which is something we do already do, so the light up opportunities for actually using a single div/idiv are rarer in the first place).

Now, the support was added to the JIT to better support instructions that "return" multiple registers. However, it is still quite a complex area due to all the scenarios, edge cases, and considerations listed above. We'll get it eventually, but its not the highest priority consideration since its likely not the bottleneck in most applications and when we do get to it, we have to ensure that codegen is good and not worse than the "fallback" we're generating today.

colejohnson66 · 2024-05-21T17:53:58Z

For an update, in 8.0, DivRem only does one division, then computes the remainder from that.

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunGAKAK4BLAHYYGARQEQMQ7GLQNhYhgCUY+bKIAmMKAEoGAESEA3dfkGjx2RcvHBDnaj1dMA7AwCy2DAAsAOhNzDT5bBkcAbi4eAF8aWKA=

tannergooding · 2024-05-24T16:06:02Z

DivRem has been implemented to do a division and then compute the remainder using multiplication for many releases now.

On non-xarch platforms, this is the only way to compute the remainder as well, notably. There is no single instruction that computes div+rem simultaneously.

colejohnson66 · 2024-05-24T17:58:25Z

Yes. So the main complaint that it performs two divisions is no longer valid.

tannergooding · 2024-05-24T18:34:54Z

And the overall history of the issue, which needs to be taken into account, covers that such a change went in. This issue then still exists to track the underlying spirit of the bug which is that its doing more work than necessary on some platforms.

ygc369 · 2024-05-29T11:24:37Z

@tannergooding
Using multiplication is also extra work, although maybe faster than an extra division. Isn't it possible to get div+rem at the same time just by doing division once and no other operation? Maybe this cannot be done for all platforms as some platforms do not support it by hardware. But there must be platforms that support it, so, can it be done just on supported platforms?

bazzilic · 2024-05-29T11:47:17Z

multiplication is also extra work, although maybe faster than an extra division

it is indeed significantly faster

can it be done just on supported platforms

some comments above answer this question in great detail, e.g.: #4155 (comment)

colejohnson66 · 2024-05-29T12:04:35Z

On x86, a division can take 15-30 clock cycles (sometimes over 40) and can only start one ever three or four. Multiplication, on the other hand, takes about eight and can start a new one every cycle.

tannergooding · 2024-05-29T13:25:24Z

Isn't it possible to get div+rem at the same time just by doing division once and no other operation

For x86/x64, yes. It isn't possible on other platforms as they don't provide such support. However, even for x86/x64 the benefit there is really using 1 instruction, it isn't actually much faster (than doing div, mul, sub explicitly). In many cases it is actually more desirable to break it apart to avoid a dependency chain or use an alternative sequence (especially if one of the inputs is known constant).

Modern CPUs (almost any in the past 12-14 years) account for these things and often have special div/idiv support that breaks it apart internally to what is functionally div, mul, sub already so that the quotient can be available prior to the remainder being available. They may even skip the internal microcode for computing the remainder in some cases if they can see its unused. The time is also often dependent on the number of significant bits in the dividend, although even newer processors have changed implementations and now it is often based on the number of significant bits in the quotient instead.

On x86, a division can take 15-30 clock cycles (sometimes over 40) and can only start one ever three or four. Multiplication, on the other hand, takes about eight and can start a new one every cycle.

Notably this very much depends on microarchitecture, whether you're doing signed or unsigned, and it has massively improved in even more recent years

Haswell (2013)
- idiv (32-bit) took around 20-28 cycles with 8 reciprocal throughput
- idiv (64-bit) took around 37-101 cycles with 24 reciprocal throughput.
Cannon Lake (2018)
- idiv (32-bit) took around 10-15 cycles with 6 reciprocal throughput
- idiv (64-bit) took around 14-18 cycles with 10 reciprocal throughput.

Your multiplication info is also quite a bit off, both 32-bit and 64-bit have been 3 cycles with 1 reciprocal throughput since the early 2000s. If you're doing a big multiplication (64x64=128) then it used to take 7-9 or 3-10 cycles for pre 2011 chips, but that's not typical.

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

TIHan self-assigned this Oct 31, 2022

TIHan removed the JitUntriaged CLR JIT issues needing additional triage label Oct 31, 2022

TIHan modified the milestones: Future, 8.0.0 Jan 17, 2023

TIHan mentioned this issue Jan 28, 2023

[RyuJIT] Div then Mod (or viceversa) executes idiv twice instead of using the value in the register #5213

Open

TIHan removed their assignment Jan 28, 2023

jeffhandley modified the milestones: 8.0.0, Future Mar 25, 2023

sgorozco mentioned this issue Mar 28, 2024

For many small stream writes, RecyclableMemoryStream is slower than other implementations microsoft/Microsoft.IO.RecyclableMemoryStream#339

Open

ygc369 closed this as completed May 30, 2024

github-actions bot locked and limited conversation to collaborators Jun 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can "Math.DivRem" do division operation only once? #4155

Can "Math.DivRem" do division operation only once? #4155

ygc369 commented Apr 21, 2015 •

edited by BruceForstall

Loading

jkotas commented Apr 21, 2015

stephentoub commented Apr 21, 2015

CarolEidt commented Apr 21, 2015

tannergooding commented Apr 21, 2015

CarolEidt commented Apr 21, 2015

tannergooding commented Jun 7, 2017

ygc369 commented Mar 29, 2020

ygc369 commented Aug 26, 2020

tannergooding commented Aug 27, 2020

caihongxu commented Sep 21, 2020

ygc369 commented Feb 18, 2021

tannergooding commented Feb 18, 2021

Bip901 commented Mar 9, 2022

ygc369 commented Jul 25, 2022

tannergooding commented Jul 25, 2022 •

edited by danmoseley

Loading

colejohnson66 commented May 21, 2024

tannergooding commented May 24, 2024

colejohnson66 commented May 24, 2024

tannergooding commented May 24, 2024

ygc369 commented May 29, 2024

bazzilic commented May 29, 2024

colejohnson66 commented May 29, 2024

tannergooding commented May 29, 2024

Can "Math.DivRem" do division operation only once? #4155

Can "Math.DivRem" do division operation only once? #4155

Comments

ygc369 commented Apr 21, 2015 • edited by BruceForstall Loading

jkotas commented Apr 21, 2015

stephentoub commented Apr 21, 2015

CarolEidt commented Apr 21, 2015

tannergooding commented Apr 21, 2015

CarolEidt commented Apr 21, 2015

tannergooding commented Jun 7, 2017

ygc369 commented Mar 29, 2020

ygc369 commented Aug 26, 2020

tannergooding commented Aug 27, 2020

caihongxu commented Sep 21, 2020

ygc369 commented Feb 18, 2021

tannergooding commented Feb 18, 2021

Bip901 commented Mar 9, 2022

ygc369 commented Jul 25, 2022

tannergooding commented Jul 25, 2022 • edited by danmoseley Loading

colejohnson66 commented May 21, 2024

tannergooding commented May 24, 2024

colejohnson66 commented May 24, 2024

tannergooding commented May 24, 2024

ygc369 commented May 29, 2024

bazzilic commented May 29, 2024

colejohnson66 commented May 29, 2024

tannergooding commented May 29, 2024

ygc369 commented Apr 21, 2015 •

edited by BruceForstall

Loading

tannergooding commented Jul 25, 2022 •

edited by danmoseley

Loading