New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Fold typeof(T).TypeHandle.Value #85804
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsIt turns out to be an easy fix with all the "frozen" infra we've built previously: static nint Test()
{
return typeof(int).TypeHandle.Value;
} emits: ; Method Test():long
mov rax, 0x7FFF9EC7E7E8
ret
; Total bytes of code: 11 Closes #5973, might slighly improve GC.Allocate* APIs, see here.
|
int[] Foo()
{
return GC.AllocateUninitializedArray<int>(10000, true);
} codegen: ; Method Program:Foo():int[]:this
sub rsp, 40
mov rcx, 0x7FF7A4FAA2A8 ;;; type handle
mov r8d, 80 ;;; gc flags
mov edx, 0x2710 ;;; 10000 length
call System.GC:AllocateNewArray(long,int,int):System.Array ;;; Internal call
nop
add rsp, 40
ret
; Total bytes of code: 36
|
Can we make it not close this? I think this will be awkward to do like this for NativeAOT even if RuntimeTypes become frozen because there's still a reloc to deal with. .NET Native did optimize this and it was useful, especially because type handle != RuntimeType for .NET Native or NativeAOT. We did it at the IL level by just patternmatching and replacing the whole sequence of ldtoken/GetTypeFromHandle/get_Handle with ldtoken. |
Sure thing! Changed wording |
Current codegen on NativeAOT for ; Method Program:Test():long
sub rsp, 40
lea rcx, [(reloc 0x4000000000420450)]
call CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE
mov rcx, rax
mov rax, qword ptr [rax]
call [rax+D0H]System.Type:get_TypeHandle():System.RuntimeTypeHandle:this
nop
add rsp, 40
ret
; Total bytes of code: 34 |
@jkotas PTAL, alternatively we can walk all fields in |
I think it would be better to do this optimization as intrinsic pattern match - to handle the shared generic case, the native AOT case (and the unloadable assemblies case). Is the pattern match difficult to do for some reason? |
I assume yes? I did this change for delegates and just noticed that it also handles this case - it is effectively 1 line of change + clean up |
for the intrinsic did you mean @GrabYourPitchforks's branch with GrabYourPitchforks@7a6e22e ? Any reason it didn't land as a PR (and is it enough?) |
Yes, something like, but do that for methods on the public surface so that all code out there that wants to convert type to unmanaged pointer can benefit, it is not limited to the internal methods in CoreLib. |
It seems like this PR is still needed then. E.g. it can handle
while importer-level intrinsic will give up here |
It does not hurt, but I would be surprised if you can find pattern where it helps in any code out there (after the importer expansion is done). |
274b8e9
to
52065e1
Compare
@jkotas What we have after import:
I'm a bit struggling to understand how I can fold this in JIT for CoreCLR without what I have in this PR already - do we want to intrinsify two methods basically (
to
It should be easier for NativeAOT since |
CoreCLR: public unsafe partial struct RuntimeTypeHandle : IEquatable<RuntimeTypeHandle>, ISerializable
{
internal RuntimeType m_type; // gc object
public IntPtr Value => m_type?.m_handle ?? 0;
} NativeAOT: public unsafe struct RuntimeTypeHandle : IEquatable<RuntimeTypeHandle>, ISerializable
{
internal IntPtr RawValue
{
get
{
return _value;
}
}
private IntPtr _value;
} So my suggestion is - intrinsify only NativeAOT's |
We sort of have intrinsics on CoreCLR for all this already - Ideally, the intrinsic expansion would kick for all cases where we can figure out the raw type.
Is the spill into local before |
Pushed a change for NativeAOT, now codegen is: lea rax, [(reloc 0x4000000000421030)] when we know the class handle in JIT, and this for runtime lookup: mov rax, qword ptr [rdx] So from my understanding we're only left with runtime lookup case on CoreCLR and unloadble contexts (ALC). Sadly, in the BCL we have just one use-case (
We don't have a good liveness analysis in the importer so we'll have to delay this transformation to e.g. morph (after forward sub). |
lvaSetStruct(structLcl, sig->retTypeClass, false); | ||
GenTree* realHandle = op1->AsCall()->gtArgs.GetUserArgByIndex(0)->GetNode(); | ||
GenTreeLclFld* handleFld = gtNewLclFldNode(structLcl, realHandle->TypeGet(), 0); | ||
GenTree* asgHandleFld = gtNewAssignNode(handleFld, realHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to create this as CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPEHANDLE
helper call that takes the first arg of op1
as an input?
We have other places in the JIT that look for CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPEHANDLE
. I do not think they will be able to recognize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense but we'll need to fold it back to a constant somewhere later, I tried to do this and hit no diffs (I'm running ILC for a complex app) so maybe we better teach jit to recognize raw handles if we ever find a use case?
Presumably, there is nothing JIT can do additionally for typeof(T).TypeHandle.Value;
if it's compared against a different type handle - it will be folded (if possible) anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canonical shape for creating RuntimeTypeHandle from raw handle used in other places is CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPEHANDLE
helper. I think we should stick to canonical shapes where possible, to allow pattern matching optimizations to compose.
Cleaning up CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPEHANDLE
in later stages applies to all places that use canonical shape for creating RuntimeTypeHandle from raw handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the change you wanted to see EgorBo@55279bc
It seems to be not needed for CoreCLR since we have to wrap RuntimeType object and I've ran SPMI searching for CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPEHANDLE(class handle)
and found 0 cases.
I spent some time trying to find a case where this helps and wasn't able to do so, I only see a worse codegen (a stack spill I can't explain) - my understanding that we promote structs right after importer so it's better to expand them there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the right way to do this optimization is without any AOT special-casing and without RuntimeType
special-casing in getObjectContent
. I think you are saying that it is complicated and not worth it, and that you would prefer to either do nothing or do some of the special casing.
@dotnet/jit-contrib Any opinions about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I just feel that the energy is better to spent elsewhere since:
- Does NativeAOT needs this optimization? It doesn't look so since GC.NativeAOT.cs doesn't use it (although. it's fully implemented in this PR including shared generics)
- Is this pattern popular? Only one hit in
GC.CoreCLR.cs
in our repo + a few hits in OSS judging by grep.app
So we only miss the shared generic case on CoreCLR and unloadable ALC (on CoreCLR). To support both of them we need to mark two methods with [Intrinsic] (Type.get_TypeHandle
and RuntimeTypeHandle.get_Value
(before it is inlined) - in this case we also will need some hacky way to work with a spilled struct RuntimeTypeHandle
before we can detect the shape from two intrinsics.
STMT00000 ( 0x000[E-] ... 0x00F )
[000003] I-C-G------ * CALL struct System.RuntimeType:get_TypeHandle():System.RuntimeTypeHandle:this (exactContextHnd=0x00007FF7A7BA11C9)
[000002] --C-G------ this \--* CALL help ref CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE
[000000] H---------- arg0 \--* CNS_INT(h) long 0x7ff7a7cfa2a8 class
***** BB01
STMT00001 ( 0x000[E-] ... ??? )
[000006] -AC-------- * ASG struct (copy)
[000005] D------N--- +--* LCL_VAR struct<System.RuntimeTypeHandle, 8> V01 loc0
[000004] --C-------- \--* RET_EXPR struct(for [000003])
***** BB01
STMT00002 ( 0x010[E-] ... 0x017 )
[000008] I-C-G------ * CALL long System.RuntimeTypeHandle:get_Value():long:this (exactContextHnd=0x00007FF7A7D21919)
[000007] ----------- this \--* LCL_ADDR byref V01 loc0 [+0]
***** BB01
STMT00003 ( 0x010[E-] ... ??? )
[000010] --C-------- * RETURN long
[000009] --C-------- \--* RET_EXPR long (for [000008])
(as a proof that it is not trivial to do anything with this in importer - we need:
- make sure the local is single-use
- find its def
- make sure the def is "before" use (which is possible)
this sort of things we avoid doing in importer
We can sort of do this after importer and ForwardSub but we'll need to disable inlining for them which is not great either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case we also will need some hacky way to work with a spilled struct RuntimeTypeHandle
If it helps, I would be ok with ignoring RuntimeTypeHandle.get_Value
and tell people to use RuntimeTypeHandle.ToIntPtr
for best results. We have added RuntimeTypeHandle.ToIntPtr/FromIntPtr
family of APIs recently, as a managed equivalent of some Mono embedding APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case we also will need some hacky way to work with a spilled struct RuntimeTypeHandle
If it helps, I would be ok with ignoring
RuntimeTypeHandle.get_Value
and tell people to useRuntimeTypeHandle.ToIntPtr
for best results. We have addedRuntimeTypeHandle.ToIntPtr/FromIntPtr
family of APIs recently, as a managed equivalent of some Mono embedding APIs.
Yes, that one is at least do-able in importer, I've just added it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some special spill avoidance in the importer to make recognizing these kinds of multi-call patterns easier; you could extend this if needed.
runtime/src/coreclr/jit/importercalls.cpp
Lines 1414 to 1442 in 7d54c05
// For non-candidates we must also spill, since we | |
// might have locals live on the eval stack that this | |
// call can modify. | |
// | |
// Suppress this for certain well-known call targets | |
// that we know won't modify locals, eg calls that are | |
// recognized in gtCanOptimizeTypeEquality. Otherwise | |
// we may break key fragile pattern matches later on. | |
bool spillStack = true; | |
if (call->IsCall()) | |
{ | |
GenTreeCall* callNode = call->AsCall(); | |
if ((callNode->gtCallType == CT_HELPER) && (gtIsTypeHandleToRuntimeTypeHelper(callNode) || | |
gtIsTypeHandleToRuntimeTypeHandleHelper(callNode))) | |
{ | |
spillStack = false; | |
} | |
else if ((callNode->gtCallMoreFlags & GTF_CALL_M_SPECIAL_INTRINSIC) != 0) | |
{ | |
spillStack = false; | |
} | |
} | |
if (spillStack) | |
{ | |
impSpillSideEffects(true, CHECK_SPILL_ALL DEBUGARG("non-inline candidate call")); | |
} | |
} | |
} |
Interesting, there is a test that is written in IL that changes |
I have submitted #85861 for this part. |
We can say "fixes #5973" again. The |
Can we also include the simplification of AllocateUninitializedArray in this change? |
public virtual RuntimeTypeHandle TypeHandle => throw new NotSupportedException(); | ||
public virtual RuntimeTypeHandle TypeHandle | ||
{ | ||
#if NATIVEAOT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. I thought that the native AOT special casing is not going to be needed. Do we really need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed for RuntimeTypeHandle.ToIntPtr()
but it is still needed for old pattern typeof(int).TypeHandle.Value;
as @MichalStrehovsky has just mentioned that this pattern is used in COM so I decided to get it back. If we recommend users to always use RuntimeTypeHandle.ToIntPtr
I guess we can remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed for RuntimeTypeHandle.ToIntPtr() but it is still needed for old pattern typeof(int).TypeHandle.Value
RuntimeTypeHandle.ToIntPtr
pattern is RuntimeTypeHandle.ToIntPtr(typeof(int).TypeHandle)
. What makes it work without get_TypeHandle
being treated as intrinsic?
In any case, could you please delete the ifdef around the attribute, and centralize the differences in the JIT? Intrinsic annotations are not meant to be 100% precise. We have number of them where the method is treated as Intrinsic on one runtime flavor, but not on the other runtime flavor. We do not encode this detail using ifdefs in .cs files. (In other words, everything should work fine if all CoreLib methods are reported to the JIT as intrinsics.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RuntimeTypeHandle.ToIntPtr pattern is RuntimeTypeHandle.ToIntPtr(typeof(int).TypeHandle). What makes it work without get_TypeHandle being treated as intrinsic?
For RuntimeTypeHandle.ToIntPtr
we don't have to expand get_TypeHandle
we only need to know that the arg of ToIntPtr
is a "non-expanded get_TypeHandle
" call (if it is expanded (only on NAOT) - not a big deal, it will be folded naturally since ToIntPtr
is a simple method that returns struct's value so it will be inlined as is.
In any case, could you please delete the ifdef around the attribute, and centralize the differences in the JIT?
Sure
should work fine if all CoreLib methods are reported to the JIT as intrinsics.
I bet it's not the case today as many existing intrinsic expansions just assume (via asserts) that they work with a specific overload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typeof(int).TypeHandle.Value; as @MichalStrehovsky has just mentioned that this pattern is used in COM so I decided to get it back
The pattern in COM generators is typeof(Foo).TypeHandle
without the Value part:
runtime/src/libraries/System.Runtime.InteropServices/gen/ComInterfaceGenerator/ComClassGenerator.cs
Line 138 in 5eef91d
// details = StrategyBasedComWrappers.DefaultIUnknownInterfaceDetailsStrategy.GetIUnknownDerivedDetails(typeof(<ifaceName>).TypeHandle); |
src/coreclr/jit/importercalls.cpp
Outdated
@@ -2589,7 +2590,7 @@ GenTree* Compiler::impIntrinsic(GenTree* newobjThis, | |||
case NI_System_Runtime_CompilerServices_RuntimeHelpers_IsKnownConstant: | |||
|
|||
// We need these to be able to fold "typeof(...) == typeof(...)" | |||
case NI_System_RuntimeTypeHandle_ToIntPtr: | |||
case NI_System_RuntimeType_get_TypeHandle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NI_System_RuntimeType_get_TypeHandle
should not be involved in typeof(...) == typeof(...)
folding.
Should NI_System_RuntimeType_get_TypeHandle
rather be in the lightweight intrinsics set below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for bearing with me!
Somebody from @dotnet/jit-contrib should signoff on the JIT changes.
lvaSetStruct(structLcl, sig->retTypeClass, false); | ||
GenTree* realHandle = op1->AsCall()->gtArgs.GetUserArgByIndex(0)->GetNode(); | ||
GenTreeLclFld* handleFld = gtNewLclFldNode(structLcl, realHandle->TypeGet(), 0); | ||
GenTree* asgHandleFld = gtNewAssignNode(handleFld, realHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some special spill avoidance in the importer to make recognizing these kinds of multi-call patterns easier; you could extend this if needed.
runtime/src/coreclr/jit/importercalls.cpp
Lines 1414 to 1442 in 7d54c05
// For non-candidates we must also spill, since we | |
// might have locals live on the eval stack that this | |
// call can modify. | |
// | |
// Suppress this for certain well-known call targets | |
// that we know won't modify locals, eg calls that are | |
// recognized in gtCanOptimizeTypeEquality. Otherwise | |
// we may break key fragile pattern matches later on. | |
bool spillStack = true; | |
if (call->IsCall()) | |
{ | |
GenTreeCall* callNode = call->AsCall(); | |
if ((callNode->gtCallType == CT_HELPER) && (gtIsTypeHandleToRuntimeTypeHelper(callNode) || | |
gtIsTypeHandleToRuntimeTypeHandleHelper(callNode))) | |
{ | |
spillStack = false; | |
} | |
else if ((callNode->gtCallMoreFlags & GTF_CALL_M_SPECIAL_INTRINSIC) != 0) | |
{ | |
spillStack = false; | |
} | |
} | |
if (spillStack) | |
{ | |
impSpillSideEffects(true, CHECK_SPILL_ALL DEBUGARG("non-inline candidate call")); | |
} | |
} | |
} |
Co-authored-by: Andy Ayers <andya@microsoft.com>
For some reason I can't reply directly on your comment on github 😐 |
It turns out to be an easy fix with all the "frozen" infra we've built previously:
emits:
Addresses #5973 for CoreCLR, might slighly improve GC.Allocate* APIs, see here.
PS: not relevant for NativeAOT since type objects are not frozen
UPD: handles NativeAOT as well:
was:
now: