Ensure Log2 can actually be imported as intrinsic#128678
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR updates the CoreCLR JIT importer for the NI_PRIMITIVE_Log2 primitive intrinsic to correctly distinguish signed vs. unsigned inputs using the precise type, enabling the intrinsic expansion for unsigned arguments and adding a conditional throw path for signed arguments.
Changes:
- Fixes signed/unsigned detection for Log2 by using
JitType2PreciseVarType(baseJitType)rather than the non-precisebaseType. - Implements the Log2 expansion via
LeadingZeroCount(value | 1)to satisfy the0 -> 0contract. - Adds a conditional throw for negative signed inputs using a qmark-based fallback path.
f76559c to
eaeb042
Compare
bbe9576 to
d70190d
Compare
d70190d to
e005a7f
Compare
|
CC. @dotnet/jit-contrib, @EgorBo for review. This one fixes the The diffs here are significant. We see The vast bulk of this is of course in The diffs are then actually meaningful as well, often impacting key libraries APIs that are common in most apps like - movz x1, #0xD1FFAB1E // code for <unknown method>
- movk x1, #0xD1FFAB1E LSL #16
- movk x1, #0xD1FFAB1E LSL #32
- ldr x1, [x1]
- blr x1 // code for <unknown method>
- ;; size=24 bbWeight=1 PerfScore 7.50
+ orr w0, w0, #1
+ clz w0, w0
+ eor w0, w0, #31The places where we have diff regressions then mostly appear to be places where we get additional inlining, CSE, and other optimizations being done due to having the 3 instruction sequence imported directly (they aren't taking part of the budget anymore, which means other code can use it instead). We see the same on x64 and x86 as well, with the vast bulk of the wins just coming from replacing the calls with the actual IR. |
| assert(!varTypeIsUnsigned(JitType2PreciseVarType(baseJitType))); | ||
|
|
||
| GenTree* fallback = | ||
| gtNewMustThrowException(CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION, baseType, NO_CLASS_HANDLE); |
There was a problem hiding this comment.
This is not full equalent of the managed code that is
throw new ArgumentOutOfRangeException("value", SR.ThrowArgument_ValueNonNegative);There was a problem hiding this comment.
is this optimization worth loosing user-friendly (and localized) message that the provided value "The provided value must be non-negative." ?
There was a problem hiding this comment.
I think so, yes. There's exactly one failure that the method can throw and its incredibly unlikely to be hit.
We already shouldn't be expanding this in MinOpts (Debug) since it isn't a "mustExpand" intrinsic, so debug code will still get the appropriate exception message where it matters.
There was a problem hiding this comment.
Change the managed code then? why do we have a different behavior for Debug or Release JITs or JIT vs Interpeter.
There was a problem hiding this comment.
Because we already regularly have all manner of different semantics for stack traces between debug vs release and as we found out a few weeks ago even between methods marked AggressiveInlining vs not.
It is okay for some key scenarios to streamline the experience a little bit for release while leaving the debug experience "optimal". The alternative is we either add support for customizing exception messages for these cases in the JIT or explicitly make the debug UX worse; neither of which is actually an improvement.
I am doubtful that QMARK will ever fully go away as they have a significant benefit in that they allow introducing block like functionality (namely a
Most compilers do this for the fundamental math APIs like this, we're not special and even have stricter timing requirements which means there is more reason for us to handle such APIs intrinsically as it significantly reduces the amount of work the JIT has to do. The inliner is always going to have limitations and there's no reason to contribute to its pessimizations for these foundational APIs everything else is built on.
This is the case with most diffs, as
I don't think this is a fix and it just pushes things down the road more. The general problem here is already known as well and we've discussed it many times, which is the inliner has a budget and it means we still give up on trivially getters, setters, and other direct calls like this.
L_0000: ldarg.0
IL_0001: call int32 [System.Runtime]System.Numerics.BitOperations::Log2(uint32)
IL_0006: retBut it does give up anyways, because we have dozens of heuristics and other checks for something that ultimately looks expensive but really isn't. We skip all those checks, all the eating of the budget, and all the expensive handling by treating these key APIs as intrinsic. It completely bypasses the other expensive transforms and checks which ultimately simplifies what the JIT has to do, by adding a few lines to the importer to handle the known special scenarios. |
Many phases between importer and global morph just bail on QMARKs (e.g. escape analysis), also, it introduces a very unobvious execution order (esp. nested qmarks) in trees all phases have to keep in mind, so removing it eventually will be a nice simplification. I am not saying it's bad, just needs a justification. E.g. I almost removed QMARKs for early cast expansion, my next goal is to remove the entire importervectorization.cpp, move it to a late phase.
Why don't we intrinsify everything then, where is the line? Is Log2 that important for .NET users? I think we need to continue investing into improving inliner instead. Making it much more reliable for things we care about. Are you going to continue pushing various small methods as JIT intrinsics?
All I see is a lot of pretty trivial tests for Log2 there which are now e.g. folded into constants in Tier0 thanks to this change.
I do think we need to stop using it as an excuse. I agree on importing things as special opcodes so then JIT can emit those opcodes as part of other transformations and rely on proper expansion - that made sense to me, but it's no the case here as we just mimic inliner's work, not more than that.
L_0000: ldarg.0 I think we need to study why we give up, it might be something fundamental like abstract generic resolution in importer. |
My point here was rather that we have a fundamental need to introduce IR that represents So even if they aren't the best IR today, the code is logically correct and will maintain roughly the current shape even when QMARKs disappear.
Because intrinsifying everything is both impossible and a negative. Most methods are not foundational in that sense and it would effectively cause "too much inlining", regress throughput, regress codegen size, etc.
I agree, but we also know this is much more complex work and that it will always have limits. So for foundational cases that are common to other compilers and where we can avoid some of the redundant work, it makes sense.
The short answer is not really, fixing the existing I'm not and have never been looking to intrinsify the world, only the key foundational math APIs like this one, i.e. the building blocks for the rest of .NET. Most of these were already handled and Of the foundational math APIs that aren't intrinsified, we only have integral At best we might consider having
Does the I had downloaded the full diffs and saw most of the impactful changes in code that was actually proper fullopts or T1, i.e. the asmdiff says One of the more robust cases is: going from But for non tests it improves the codegen around string formatting for all integer primitives (via This is also why there are so many triggers in tests, because |

Previously this was code was never being hit because we were checking against the non-precise var_type and so we always saw
TYP_INTeven for unsigned inputs. Now we correctly handle checking for unsigned inputs and allow importing signed ones as well.