Ensure Log2 can actually be imported as intrinsic#128678

Open
tannergooding wants to merge 3 commits into dotnet:main from tannergooding:better-log2

Conversation

@tannergooding
Member

Previously, this code was never being hit because we were checking against the non-precise var_type, so we always saw TYP_INT even for unsigned inputs. Now we correctly check for unsigned inputs and also allow importing signed ones.
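A minimal sketch of why the old check could never fire. It uses a hypothetical, simplified model of the JIT's type handling (the `VarType` enum and the helper functions are illustrative assumptions, not the JIT's real declarations): if unsigned types are collapsed onto their signed counterparts before the signedness test runs, the test can never succeed.

```cpp
#include <cassert>

// Hypothetical, simplified model: these names are illustrative only and are
// not the JIT's real types or helpers.
enum class VarType { Int, UInt, Long, ULong };

// "Non-precise" type: signedness is dropped, so unsigned types collapse onto
// their signed counterparts (uint -> int, ulong -> long).
VarType NonPreciseType(VarType type)
{
    switch (type)
    {
        case VarType::UInt:
            return VarType::Int;
        case VarType::ULong:
            return VarType::Long;
        default:
            return type;
    }
}

// "Precise" type: the original signedness is preserved.
VarType PreciseType(VarType type)
{
    return type;
}

bool IsUnsigned(VarType type)
{
    return type == VarType::UInt || type == VarType::ULong;
}

// Old behavior: the unsigned check runs on the non-precise type, so it can
// never succeed and the unsigned expansion is unreachable.
bool OldCheckSeesUnsigned(VarType argType)
{
    return IsUnsigned(NonPreciseType(argType));
}

// Fixed behavior: the check runs on the precise type.
bool NewCheckSeesUnsigned(VarType argType)
{
    return IsUnsigned(PreciseType(argType));
}
```

In this model, `OldCheckSeesUnsigned` returns false for every input, which matches the description of the importer path never being reached.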

Copilot AI review requested due to automatic review settings May 28, 2026 03:48
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 28, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the CoreCLR JIT importer for the NI_PRIMITIVE_Log2 primitive intrinsic to correctly distinguish signed vs. unsigned inputs using the precise type, enabling the intrinsic expansion for unsigned arguments and adding a conditional throw path for signed arguments.

Changes:

  • Fixes signed/unsigned detection for Log2 by using JitType2PreciseVarType(baseJitType) rather than the non-precise baseType.
  • Implements the Log2 expansion via LeadingZeroCount(value | 1) to satisfy the 0 -> 0 contract.
  • Adds a conditional throw for negative signed inputs using a qmark-based fallback path.
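The LeadingZeroCount(value | 1) formulation can be sketched in C++ to show why it satisfies the 0 -> 0 contract. This illustrates only the arithmetic; it is not the importer's code. The helper names are hypothetical, and a portable loop stands in for hardware LZCNT (x64) or CLZ (Arm64).

```cpp
#include <cassert>
#include <cstdint>

// Portable leading-zero count; returns 32 for an input of 0, like LZCNT.
int LeadingZeroCount32(uint32_t value)
{
    int count = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (value & mask) == 0; mask >>= 1)
    {
        count++;
    }
    return count;
}

// Portable leading-zero count; returns 64 for an input of 0.
int LeadingZeroCount64(uint64_t value)
{
    int count = 0;
    for (uint64_t mask = 0x8000000000000000ull; mask != 0 && (value & mask) == 0; mask >>= 1)
    {
        count++;
    }
    return count;
}

// OR-ing in 1 guarantees a non-zero input, so the count never reaches the
// full bit width and Log2(0) == Log2(1) == 0 without a separate branch.
uint32_t Log2U32(uint32_t value)
{
    return 31u - static_cast<uint32_t>(LeadingZeroCount32(value | 1u));
}

uint64_t Log2U64(uint64_t value)
{
    return 63u - static_cast<uint64_t>(LeadingZeroCount64(value | 1u));
}
```

The `| 1` only affects inputs 0 and 1 (both map to 0); for any larger value, bit 0 is never the highest set bit, so the result is unchanged.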

Comment thread src/coreclr/jit/importercalls.cpp Outdated
@tannergooding tannergooding force-pushed the better-log2 branch 2 times, most recently from f76559c to eaeb042 Compare May 28, 2026 06:38
Copilot AI review requested due to automatic review settings May 28, 2026 06:38
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/jit/importercalls.cpp
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/jit/importercalls.cpp
Comment thread src/coreclr/jit/importercalls.cpp Outdated
Copilot AI review requested due to automatic review settings May 28, 2026 16:43
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread src/coreclr/jit/importercalls.cpp Outdated
@tannergooding tannergooding marked this pull request as ready for review May 29, 2026 02:37
Copilot AI review requested due to automatic review settings May 29, 2026 02:37
@tannergooding
Member Author

tannergooding commented May 29, 2026

CC. @dotnet/jit-contrib, @EgorBo for review. This one fixes the Log2 importation so it actually works and simultaneously adds handling for signed log2 to be directly imported as well.

The diffs here are significant. We see -321k bytes of codegen on Linux Arm64 and -299k bytes on Linux x64, with similar numbers on Windows Arm64 (-331k bytes) and a smaller number (-156k bytes) on Windows x64.

The vast bulk of this is, of course, in tests, but a decent amount is also found in the benchmarks and libraries codegen. It also comes with a significant TP improvement to FullOpts, of up to -0.32% for the tests. This also has similar benefits on x86.

The diffs are also genuinely meaningful, often impacting key library APIs that are common in most apps, like FormattingHelpers.CountDigits. In many cases Log2 was not being inlined and was instead left as a call, so we were failing to get any optimizations in these areas, and many of the diffs look like this:

-            movz    x1, #0xD1FFAB1E      // code for <unknown method>
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
-            blr     x1      // code for <unknown method>
-						;; size=24 bbWeight=1 PerfScore 7.50
+            orr     w0, w0, #1
+            clz     w0, w0
+            eor     w0, w0, #31

The places where we have diff regressions mostly appear to be places where we get additional inlining, CSE, and other optimizations because the 3-instruction sequence is imported directly (it no longer consumes part of the inlining budget, which means other code can use that budget instead). We see the same on x64 and x86, with the vast bulk of the wins coming from replacing the calls with the actual IR.
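The three-instruction Arm64 sequence in the diff above (orr, clz, eor) can be emulated step by step. The sketch below uses hypothetical helper names and a portable loop standing in for CLZ. It also checks why `eor` with 31 can replace a subtraction: once the input is non-zero, CLZ on a W register yields a value in [0, 31], and for any such n, `n ^ 31 == 31 - n` because 31 is all ones in 5 bits, so the subtraction never borrows.

```cpp
#include <cassert>
#include <cstdint>

// Emulates Arm64 CLZ on a 32-bit W register (returns 32 for an input of 0).
uint32_t Clz32(uint32_t w)
{
    uint32_t count = 0;
    while (count < 32 && (w & (0x80000000u >> count)) == 0)
    {
        count++;
    }
    return count;
}

// Mirrors the sequence from the diff, one instruction per line.
uint32_t Log2Sequence(uint32_t w0)
{
    w0 = w0 | 1u;    // orr w0, w0, #1   -> input is now non-zero
    w0 = Clz32(w0);  // clz w0, w0       -> result is in [0, 31]
    w0 = w0 ^ 31u;   // eor w0, w0, #31  -> same as 31 - clz for this range
    return w0;
}

// Exhaustively confirms n ^ 31 == 31 - n over the possible CLZ results.
bool XorMatchesSubtractForClzRange()
{
    for (uint32_t n = 0; n <= 31; n++)
    {
        if ((n ^ 31u) != 31u - n)
        {
            return false;
        }
    }
    return true;
}
```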

Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/jit/importercalls.cpp
@EgorBo
Member

EgorBo commented May 29, 2026

I am not a fan of these intrinsifications (especially when it has to import yet another QMARK) and move all possible math functions to JIT intrinsics while it should just naturally be handled in the inliner.

> The diffs here are significant. We see -321k bytes of codegen on Linux Arm64

Literally 99% of diffs are in libraries_tests.run, and all example diffs imply that

[Intrinsic]
public static uint Log2(uint value) => (uint)BitOperations.Log2(value);

is not inlined. I suspect it might be either PGO-driven or it needs an AggressiveInlining.

In my opinion, we should either look into why, e.g., in such a small method we don't inline it without AggressiveInlining on it:
[screenshot]

or just slap AggressiveInlining everywhere

cc @dotnet/jit-contrib for opinions

assert(!varTypeIsUnsigned(JitType2PreciseVarType(baseJitType)));

GenTree* fallback =
gtNewMustThrowException(CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION, baseType, NO_CLASS_HANDLE);
Member

This is not the full equivalent of the managed code, which is:

throw new ArgumentOutOfRangeException("value", SR.ThrowArgument_ValueNonNegative);

Member

Is this optimization worth losing the user-friendly (and localized) message "The provided value must be non-negative."?

Member Author

I think so, yes. There's exactly one failure that the method can throw, and it's incredibly unlikely to be hit.

We already shouldn't be expanding this in MinOpts (Debug) since it isn't a "mustExpand" intrinsic, so debug code will still get the appropriate exception message where it matters.

Member

Change the managed code, then? Why do we have different behavior between Debug and Release JITs, or between the JIT and the interpreter?

Member Author

Because we already regularly have all manner of different semantics for stack traces between debug and release and, as we found out a few weeks ago, even between methods marked AggressiveInlining and those that aren't.

It is okay for some key scenarios to streamline the experience a little bit for release while leaving the debug experience "optimal". The alternative is we either add support for customizing exception messages for these cases in the JIT or explicitly make the debug UX worse; neither of which is actually an improvement.

Comment thread src/coreclr/jit/importercalls.cpp
@tannergooding
Member Author

> I am not a fan of these intrinsifications (especially when it has to import yet another QMARK)

I am doubtful that QMARK will ever fully go away as they have a significant benefit in that they allow introducing block-like functionality (namely a compare + branch to throw an exception) into the code. If anything, QMARK will just be replaced with a helper that creates the if/else blocks at the point where you're currently creating the QMARK node. They exist because they simplify some of the handling in the early phases and even allow some constant folding without forcing the block to actually be produced.

> move all possible math functions to JIT intrinsics while it should just naturally be handled in the inliner.

Most compilers do this for fundamental math APIs like this one. We're not special, and we even have stricter timing requirements, which means there is more reason for us to handle such APIs intrinsically, since it significantly reduces the amount of work the JIT has to do. The inliner is always going to have limitations, and there's no reason to contribute to its pessimizations for these foundational APIs that everything else is built on.

> Literally 99% of diffs are in libraries_tests.run.

This is the case with most diffs, as libraries_tests is a significant amount of code compared to the others. That's why I explicitly called out the hits we also have in other areas. But that's also representative of this code being "hot" and common to code that most apps and binaries end up using, even if indirectly.

> In my opinion, we should either look into why, e.g., in such a small method we don't inline it without AggressiveInlining on it:

I don't think this is a fix and it just pushes things down the road more.

The general problem here is already known as well and we've discussed it many times, which is the inliner has a budget and it means we still give up on trivial getters, setters, and other direct calls like this.

AggressiveInlining should even do nothing here because we are below the skipBudgetChecksSize (currently 12):

IL_0000: ldarg.0
IL_0001: call int32 [System.Runtime]System.Numerics.BitOperations::Log2(uint32)
IL_0006: ret

But it does give up anyways, because we have dozens of heuristics and other checks for something that ultimately looks expensive but really isn't. We skip all those checks, all the eating of the budget, and all the expensive handling by treating these key APIs as intrinsic. It completely bypasses the other expensive transforms and checks which ultimately simplifies what the JIT has to do, by adding a few lines to the importer to handle the known special scenarios.
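As a worked check of the "below skipBudgetChecksSize" claim, the offsets in the IL listing give the body size directly: `ldarg.0` is a 1-byte opcode, `call` is a 1-byte opcode followed by a 4-byte metadata token, and `ret` is a 1-byte opcode. The constant names below are this sketch's own, not JIT identifiers, and the threshold of 12 is simply the value quoted in the comment.

```cpp
#include <cassert>
#include <cstddef>

// Encoded sizes of the three IL instructions in the wrapper body.
constexpr std::size_t kLdarg0Size = 1;     // ldarg.0: 1-byte opcode
constexpr std::size_t kCallSize = 1 + 4;   // call: opcode + 4-byte method token
constexpr std::size_t kRetSize = 1;        // ret: 1-byte opcode

constexpr std::size_t kWrapperILSize = kLdarg0Size + kCallSize + kRetSize;

// Threshold as quoted in the discussion ("currently 12").
constexpr std::size_t kQuotedSkipBudgetChecksSize = 12;

// The listing's offsets agree: call starts at IL_0001 and ret at IL_0006.
static_assert(kLdarg0Size == 0x0001, "call should begin at IL_0001");
static_assert(kLdarg0Size + kCallSize == 0x0006, "ret should begin at IL_0006");
static_assert(kWrapperILSize == 7, "the wrapper body is 7 bytes of IL");
static_assert(kWrapperILSize < kQuotedSkipBudgetChecksSize, "below the quoted threshold");
```

So the wrapper is 7 bytes of IL, comfortably under the quoted threshold, which is why the budget itself should not be the reason the inline is rejected.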

@EgorBo
Member

EgorBo commented May 29, 2026

> I am doubtful that QMARK will ever fully go away as they have a significant benefit

Many phases between importer and global morph just bail on QMARKs (e.g. escape analysis), also, it introduces a very unobvious execution order (esp. nested qmarks) in trees all phases have to keep in mind, so removing it eventually will be a nice simplification. I am not saying it's bad, just needs a justification. E.g. I almost removed QMARKs for early cast expansion, my next goal is to remove the entire importervectorization.cpp, move it to a late phase.

> The inliner is always going to have limitations

Why don't we intrinsify everything then, where is the line? Is Log2 that important for .NET users? I think we need to continue investing into improving inliner instead. Making it much more reliable for things we care about. Are you going to continue pushing various small methods as JIT intrinsics?

> This is the case with most diffs, as libraries_tests is a significant amount of code compared to the others.

All I see is a lot of pretty trivial tests for Log2 there which are now e.g. folded into constants in Tier0 thanks to this change.

> The general problem here is already known as well and we've discussed it many times, which is the inliner has a budget and it means we still give up on trivial getters, setters, and other direct calls like this.

I do think we need to stop using it as an excuse. I agree with importing things as special opcodes so that the JIT can emit those opcodes as part of other transformations and rely on proper expansion; that made sense to me, but it's not the case here, as we just mimic the inliner's work, nothing more.
Another downside of all of these intrinsics is that we keep adding binary size to the JIT, while the original C# code can be trimmed.

> AggressiveInlining should even do nothing here because we are below the skipBudgetChecksSize (currently 12):
>
> IL_0000: ldarg.0
> IL_0001: call int32 [System.Runtime]System.Numerics.BitOperations::Log2(uint32)
> IL_0006: ret
> But it does give up anyways, because we have dozens of heuristics and other checks for something that ultimately looks expensive but really isn't

I think we need to study why we give up, it might be something fundamental like abstract generic resolution in importer.
Or you just copied a diff from Tier0 that is not inlined. What is the Tier1 method we should look at?

Comment thread src/coreclr/jit/importercalls.cpp
@tannergooding
Member Author

tannergooding commented May 29, 2026

> Many phases between importer and global morph just bail on QMARKs (e.g. escape analysis), also, it introduces a very unobvious execution order (esp. nested qmarks) in trees all phases have to keep in mind, so removing it eventually will be a nice simplification. I am not saying it's bad, just needs a justification. E.g. I almost removed QMARKs for early cast expansion, my next goal is to remove the entire importervectorization.cpp, move it to a late phase.

My point here was rather that we have a fundamental need to introduce IR that represents x = cond ? throw : y and right now that is handled via QMARK. Even if we remove QMARK, it will just have to be replaced by something else that puts in the relevant blocks directly instead.

So even if they aren't the best IR today, the code is logically correct and will maintain roughly the current shape even when QMARKs disappear.
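C++ happens to allow a throw-expression as an operand of `?:`, so the `x = cond ? throw : y` shape can be written literally. The sketch below contrasts that expression form with the explicit compare-and-branch form that would replace a QMARK once blocks are created directly. The function names are hypothetical, a simple shift loop stands in for the LZCNT-based expansion, and `std::out_of_range` stands in for `ArgumentOutOfRangeException`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical unsigned Log2 with Log2(0) == 0 (shift loop for brevity).
uint32_t Log2U32(uint32_t value)
{
    uint32_t v = value | 1u;
    uint32_t result = 0;
    while (v >>= 1)
    {
        result++;
    }
    return result;
}

// Expression form, analogous to a QMARK: cond ? throw : y.
int32_t Log2SignedExpr(int32_t value)
{
    return value < 0 ? throw std::out_of_range("value")
                     : static_cast<int32_t>(Log2U32(static_cast<uint32_t>(value)));
}

// Block form: a compare-and-branch to a separate (cold) throw block, with
// the non-negative case falling through to the unsigned computation.
int32_t Log2SignedBlocks(int32_t value)
{
    if (value < 0)
    {
        throw std::out_of_range("value");
    }
    return static_cast<int32_t>(Log2U32(static_cast<uint32_t>(value)));
}

// Test helper: true when fn(value) throws std::out_of_range.
bool ThrowsOutOfRange(int32_t (*fn)(int32_t), int32_t value)
{
    try
    {
        (void)fn(value);
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
    return false;
}
```

Both forms compute the same result; the difference is only in how the control flow is represented, which is the point being made about the IR shape surviving QMARK's removal.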

> Why don't we intrinsify everything then, where is the line? Is Log2 that important for .NET users?

Because intrinsifying everything is both impossible and a negative. Most methods are not foundational in that sense and it would effectively cause "too much inlining", regress throughput, regress codegen size, etc.

Log2 is one such foundational helper; it and many of the other APIs on the primitive types are the building blocks for all other algorithms. Accordingly, it is an API that most compilers explicitly recognize and handle as an intrinsic in some fashion.

> I think we need to continue investing into improving inliner instead. Making it much more reliable for things we care about.

I agree, but we also know this is much more complex work and that it will always have limits. So for foundational cases that are common to other compilers and where we can avoid some of the redundant work, it makes sense.

> Are you going to continue pushing various small methods as JIT intrinsics?

The short answer is: not really. Fixing the existing Log2 handling rounds out the set.

I'm not and have never been looking to intrinsify the world, only the key foundational math APIs like this one, i.e. the building blocks for the rest of .NET. Most of these were already handled and Log2 was one where we notably had handling but it was never firing because we were only checking TYP_INT and not TYP_UINT.

Of the foundational math APIs that aren't intrinsified, we only have integral Abs, Log10, Max, and Min. However, these likely aren't ever going to get handling because they aren't common to other compilers, don't have trivial hardware acceleration available, or do not have the size/ir complexity that other intrinsics have.

At best, we might consider having Max/Min directly import as GT_CONDITIONAL to save on inlining and on converting to those anyway. But then we really need the JIT to have better GT_CONDITIONAL support in the first place, and because Max/Min lack the size/IR complexity that other APIs have and we also want DPGO to work, I find it much less likely that we do that.

> All I see is a lot of pretty trivial tests for Log2 there which are now e.g. folded into constants in Tier0 thanks to this change.

Does the FullOpts metric include T0? If it does, we should probably split that out so we can actually differentiate MinOpts vs. T0 vs. T1 vs. FullOpts. I had assumed it was just T1+FullOpts.

I had downloaded the full diffs and saw most of the impactful changes in code that was actually proper FullOpts or T1, i.e. where the asmdiff says ; FullOpts code.

One of the more robust cases is:

- ; 93 inlinees with PGO data; 314 single block inlinees; 54 inlinees without PGO data
+ ; 41 inlinees with PGO data; 85 single block inlinees; 11 inlinees without PGO data

It goes from 611 locals with a frame size of 632 down to 243 locals with a frame size of 104, and from 10112 bytes of codegen down to 1847 bytes of codegen.

But for non-test code, it improves the codegen around string formatting for all integer primitives (via CountDigits and CountHexDigits), array sorting and the various Sorted* collections, some BigInteger handling (particularly around ModPow), etc.
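To illustrate why formatting helpers benefit, a hex-digit count falls straight out of Log2: each hex digit covers 4 bits, so the index of the highest set bit divided by 4, plus one, is the digit count. This is a sketch of the idea under the assumption that Log2(0) returns 0 (so 0 formats as one digit); the names are hypothetical and this is not the library's actual CountHexDigits source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit Log2 with the Log2(0) == 0 contract (shift loop).
uint32_t Log2U64(uint64_t value)
{
    uint64_t v = value | 1u;
    uint32_t result = 0;
    while (v >>= 1)
    {
        result++;
    }
    return result;
}

// Number of hexadecimal digits needed to print `value` (at least one).
int CountHexDigitsSketch(uint64_t value)
{
    return static_cast<int>(Log2U64(value) >> 2) + 1;
}
```

With an intrinsic Log2, the whole function reduces to a handful of instructions (or a constant, when the input is known), instead of a call.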

This is also why there are so many triggers in tests, because Log2 is used in APIs that most apps, even trivial ones, end up using, so it triggers for a lot of code.
