-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert RuntimeHelpers intrinsics to JIT ones #88543
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsSimplifies implementations of those at the cost of a single new JIT-EE API for Further unifies @jkotas This is the change I've mentioned in #88006 (comment)
|
|
||
GenTree* op1Clone; | ||
GenTree* op2Clone; | ||
op1 = impCloneExpr(op1, &op1Clone, CHECK_SPILL_ALL, nullptr DEBUGARG("EnumCompareTo arg1")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is less efficient that the existing implementation for 8-bit and 16-bit enums.
Also, this may be better marked as mustExpand
so that there are no observable behavior differences between the expanded and non-expanded version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing implementation that just called CompareTo on the underlying type feels more robust, less duplicative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is less efficient that the existing implementation for 8-bit and 16-bit enums.
Fixed.
Also, this may be better marked as
mustExpand
so that there are no observable behavior differences between the expanded and non-expanded version.
It's marked as betterExpand
now, which means that it'll expand unless in debug AFAIR. I don't think we should have any behaviour differences between the two paths.
The existing implementation that just called CompareTo on the underlying type feels more robust, less duplicative.
But it also relies on that hardcoded table of tables for ILs for selected types there and it relies more on the inliner. This logic resembles the Mono one where it's expanded directly in the intrinsic too. We could potentially change this to PrimiviteCompareTo
and use it for all primitives in their CompareTo
s or we could just remove the intrinsic and rewrite the code in managed with the Enum.GetUnderlyingType
intrinsic instead (but that could potentially run into inlining issues due to IL size).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's marked as betterExpand now, which means that it'll expand unless in debug AFAIR.
It means that the debug code will allocate vs. release code will not. The performance difference between debug and release is going to be order of magnitude more than what it is today. Performance regressions like this are not acceptable for debug code. It is not unusual to see issues filled on large debug code performance regressions like this by our customers.
relies on that hardcoded table of tables for ILs for selected types there
It is not hard to fix the code so that it works for all enum underlying types.
it relies more on the inliner
The method being inlined is very small. There are thousands of places in this repo where we have small methods to make the code easier to maintain and depend on them being inlined without an issue. Why should this one be different? Enum comparers are not that common for inliner avoidance to make difference.
we could just remove the intrinsic and rewrite the code in managed with the Enum.GetUnderlyingType intrinsic instead (but that could potentially run into inlining issues due to IL size).
Right, this approach would not work well.
...veaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.NativeAot.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Comparer.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@@ -2354,6 +2354,9 @@ class ICorStaticInfo | |||
// Quick check whether the type is a value class. Returns the same value as getClassAttribs(cls) & CORINFO_FLG_VALUECLASS, except faster. | |||
virtual bool isValueClass(CORINFO_CLASS_HANDLE cls) = 0; | |||
|
|||
// Checks if type can be compared for equality as bytes. | |||
virtual bool isBitwiseEquatable(CORINFO_CLASS_HANDLE cls) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feeling about introducing a new JIT/EE interface method just to support an expansion of one-off BCL method in the JIT. Providing the expansion as IL on the VM is less spread through the system.
This direction makes sense only if we believe that 100% of the IL intrinsics are going to move to the JIT eventually. We would need to have a number of new one-off methods like this to pull it off.
@dotnet/jit-contrib Opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that the this proper recognition of this intrinsic as a constant in the JIT is important since it makes sure it can properly skip importing branches guarded by it, similar to how IsKnownConstant
and IsReferenceOrContainsReferences
are used. It's also tracked for exposal in #75642.
As far as I could see, the only IL intrinsics left in the CoreCLR VM are the Unsafe ones (tracked for removal in #69220), Interlocked.CompareExchange(T)
which can be removed with no impact today since it already has an equivalent managed implementation with Unsafe.As
and Activator.CreateInstance<T>
which seems to be the only problematic one.
Since "bitwise equality" is a somewhat fundamental type property (and could potentially be useful for other optimizations in the JIT), I've considered making this a flag in getClassAttribs
, but I've refrained from doing so since it's I think relatively expensive to fetch and wouldn't be used that often.
However, citing @SingleAccretion from Discord:
As Tanner notes, it may be good to design a more general mechanism to ask questions about facts like that.
For example, an API that takes flags with the "requests" and returns answers to only the questions asked.
maybe it'd make sense to introduce getExtendedClassAttribs(uint32_t mask)
that'd only fetch more rarely used/expensive flags (with a new separate enum for them) like this or #88136 based on the mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This direction makes sense only if we believe that 100% of the IL intrinsics are going to move to the JIT eventually
Do we have a list of the APIs that would still require moving and how complex they are to handle? That is, are many of them effectively simple flags like this one which are often trivial to lookup and which may get used on generic hot paths or do we have a number of substantially more complex APIs where the benefits of special-casing them in the JIT are more questionable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can properly skip importing branches guarded by it, similar to how IsKnownConstant and IsReferenceOrContainsReferences are used
FWIW, this is a workaround for a deficiency in the current JIT inlining strategy. It would be better to fix it in more general way for everybody out there instead of using it to justify implementing more methods as JIT intrinsics.
Interlocked.CompareExchange(T) which can be removed with no impact today
I expect that we would see performance regressions in shared generic code callsites. It is a workaround for the current shared generic code inlining limitation.
maybe it'd make sense to introduce getExtendedClassAttribs(uint32_t mask)
getExtendedClassAttribs(uint32_t mask)
would make sense if the typically caller needs most of the flags and when there is an efficiency benefit from computing and returning multiple flags together. If the typically caller needs just one flag as is the case for expanding one-off Intrinsics, it is better to have an API that returns the one property directly and avoids the extra level of abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a list of the APIs that would still require moving and how complex they are to handle?
The current CoreCLR and native AOT intrinsics that would required introducing new special new JIT-EE interface APIs. You can generally assume that it is one new JIT/EE interface method for each:
Activaror.CreateInstanceOf<T>
EqualityCompaparer/Comparer.Create<T>
EqualityComparerHelpers.GetComparerForReferenceTypesOnly/StructOnlyEquals
Interlocked.CompareExchange<T>(ref)
RuntimeAugments.GetCanonType
MethodTable.SupportsRelativePointers
System.IO.Stream.HasOverriddenBeginEndRead/HasOverriddenBeginEndReadHasOverriddenBeginEndWrite
We like introducing intrinsics to do various tricks, so this list would grow over time. Also, some of these would require special-handling in the native AOT scanner if they are not IL intrinsics anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is a workaround for a deficiency in the current JIT inlining strategy. It would be better to fix it in more general way for everybody out there instead of using it to justify implementing more methods as JIT intrinsics.
Even then, having just a temporary workaround is better than not having one.
I expect that we would see performance regressions in shared generic code callsites. It is a workaround for the current shared generic code inlining limitation.
I don't really think that's the case, I think the only difference is the smaller IL size due to no Unsafe.As calls and the inlining boost from being an intrinsic.
runtime/src/coreclr/System.Private.CoreLib/src/System/Threading/Interlocked.CoreCLR.cs
Lines 120 to 128 in 3110693
// Note that getILIntrinsicImplementationForInterlocked() in vm\jitinterface.cpp replaces | |
// the body of the following method with the following IL: | |
// ldarg.0 | |
// ldarg.1 | |
// ldarg.2 | |
// call System.Threading.Interlocked::CompareExchange(ref Object, Object, Object) | |
// ret | |
// The workaround is no longer strictly necessary now that we have Unsafe.As but it does | |
// have the advantage of being less sensitive to JIT's inliner decisions. |
The current CoreCLR and native AOT intrinsics that would required introducing new special new JIT-EE interface APIs. You can generally assume that it is one new JIT/EE interface method for each
I think that it only makes sense to move apis that are: performance sensitive (like the three Is
APIs here), more valid (like the GetMethodTable
one) or ones that have big implementations in both runtimes (like the enum ones). I don't think it makes much sense to have runtime specific, rarely used intrinsics like HasOverriddenBeginEndRead
that'd just call into the VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having just a temporary workaround is better than not having one.
Can you come up with a real-world looking example where isBitwiseEquatable
expansion in the JIT makes a difference to show that it is worth doing?
more valid (like the GetMethodTable one)
Yes, the GetMethodTable one looks good.
ones that have big implementations in both runtimes (like the enum ones)
The implementations in each runtime are about 30 lines of code each. I would not call that big. The inlining expansion in the JIT is not reducing runtime complexity much overall.
The expansion in the JIT makes the code less maintainable by creating a copy of CompareTo implementation from 10 different .cs files into the JIT. To make this explicit, you would want to add a comment "This logic is duplicated in the JIT in EnumCompare intrinsic expansion" to CompareTo implementation in all Byte.cs, UByte.cs, ...., etc. I do not think it would be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interlocked.CompareExchange(T) which can be removed with no impact today
I don't really think that's the case, I think the only difference is the smaller IL size due to no Unsafe.As calls and the inlining boost from being an intrinsic.
I don't really think that's the case, I think the only difference is the smaller IL size due to no Unsafe.As calls and the inlining boost from being an intrinsic.
I have done a few quick tests and I agree that the IL intrinsic for Interlocked.CompareExchange does not seem to be making a difference. Feel free to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dotnet/jit-contrib Opinions?
I don't understand the motivation for these changes. It seems like a lot of code churn late in a release cycle with a decent risk of introducing correctness or perf issues.
If we think further unification / simplification here is sufficient motivation, perhaps it can wait until .NET 9?
@@ -1058,7 +1058,7 @@ mini_emit_inst_for_method (MonoCompile *cfg, MonoMethod *cmethod, MonoMethodSign | |||
EMIT_NEW_TEMPLOAD (cfg, ins, span->inst_c0); | |||
return ins; | |||
} else | |||
return NULL; | |||
return emit_runtime_helpers_intrinsics (cfg, cmethod, fsig, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Half of the runtime helpers is handled inline here by existing code and the other half is in emit_runtime_helpers_intrinsics
now. Move everything to emit_runtime_helpers_intrinsics
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I've planned to unify that but:
a) I wanted to ask the mono maintainers whether i should move everything into that method or out of it
b) I've also wanted to get everything working first with minimal changes
@jkotas @AndyAyersMS I've moved the 3 non-controversial, most impactful intrinsics here to a separate PR #88860 that should be fine for .Net 8 since they are simple, the rest we can leave here up for discussion for .Net 9. |
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
Simplifies implementations of those at the cost of a single new JIT-EE API for
isBitwiseEquatable
.Further unifies
Enum(Equality)Comparer
logic.@jkotas This is the change I've mentioned in #88006 (comment)