-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve static field access for NativeAOT #63620
Conversation
NativeAOT currently goes through a helper call whenever we need to access static fields. For this simple program: ```csharp class Program { static int s1, s2; static int Main() => s1 + s2; } ``` We generate: ```nasm repro_Program__Main: sub rsp,28h call __GetNonGCStaticBase_repro_Program mov edx,dword ptr [rax] add edx,dword ptr [rax+4] mov eax,edx add rsp,28h ret __GetNonGCStaticBase_repro_Program: lea rax,[?__NONGCSTATICS@repro_Program@@] ret ``` It's not terrible, but also could be better. As far as I can see, JitInterface cannot express the exact way static fields are accessed in NativeAOT. I had a previous attempt to use the existing JitInterface facilities in dotnet/corert#5131 (misusing the facilities that exist to support RVA static fields), but it didn't generate "nice" addressing modes and couldn't support GC statics (see the disassembly there). I'm adding a way to do that with "nice" addressing modes. After this change, the above program compiles into: ```nasm repro_Program__Main: mov eax,dword ptr [?__NONGCSTATICS@repro_Program@@] lea rdx,[?__NONGCSTATICS@repro_Program@@] add eax,dword ptr [rdx+4] ret ``` There's room for improvement: * It would be nice if we could similarly inline these lookups when we're in shared generic code * I think the addressing mode could be more compact if RyuJIT generated a reloc with a delta on x64. * It would be nice if we could do this if there's a static constructor. .NET Native could inline the "did static constructor already run?" checks. There was a previous attempt at that in dotnet/coreclr#12420 (and corresponding dotnet/corert#3962).
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsNativeAOT currently goes through a helper call whenever we need to access static fields. For this simple program: class Program
{
static int s1, s2;
static int Main() => s1 + s2;
} We generate: repro_Program__Main:
sub rsp,28h
call __GetNonGCStaticBase_repro_Program
mov edx,dword ptr [rax]
add edx,dword ptr [rax+4]
mov eax,edx
add rsp,28h
ret
__GetNonGCStaticBase_repro_Program:
lea rax,[?__NONGCSTATICS@repro_Program@@]
ret It's not terrible, but also could be better. As far as I can see, JitInterface cannot express the exact way static fields are accessed in NativeAOT. I had a previous attempt to use the existing JitInterface facilities in dotnet/corert#5131 (misusing the facilities that exist to support RVA static fields), but it didn't generate "nice" addressing modes and couldn't support GC statics (see the disassembly there). I'm adding a way to do that with "nice" addressing modes. After this change, the above program compiles into: repro_Program__Main:
mov eax,dword ptr [?__NONGCSTATICS@repro_Program@@]
lea rdx,[?__NONGCSTATICS@repro_Program@@]
add eax,dword ptr [rdx+4]
ret There's room for improvement:
|
// Prevent CSE from adding the offset to the base | ||
op1->gtFlags |= GTF_DONT_CSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why prevent CSE unconditionally? At least for the IAT_PVALUE
case it seems it could be valuable, for the IAT_VALUE
less so I suppose, except on ARM/64.
Is the issue here that we cannot fold Icon + Offset
and end up CSEing it?
(If so would be worth a comment to that effect; seems like a bug in gtSetEvalOrder
's logic for setting ADDRMODE_NO_CSE
-- perhaps that's what the existing comment meant to say?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was seeing RyuJIT fold the constant handle with the constant and that created an invalid handle (the handle is not an address when we're AOT compiling. It's an actual handle that should not be done constant propagation math on). Could be that this is a bug in CSE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think I know what the issue is (it should be a VN + assertion propagation bug). Could you open a bug on this and add a TODO-Bug: workaround for <link>
here?
@@ -1649,6 +1649,7 @@ enum CORINFO_FIELD_ACCESSOR | |||
CORINFO_FIELD_STATIC_ADDR_HELPER, // static field accessed using address-of helper (argument is FieldDesc *) | |||
CORINFO_FIELD_STATIC_TLS, // unmanaged TLS access | |||
CORINFO_FIELD_STATIC_READYTORUN_HELPER, // static field access using a runtime lookup helper | |||
CORINFO_FIELD_STATIC_DATASEGMENT, // static field access from the data segment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have documentation above on what the various accessors mean, it would be nice to update it to cover this new value.
(I realize some of it is rather outdated)
@@ -8231,6 +8231,35 @@ GenTree* Compiler::impImportStaticFieldAccess(CORINFO_RESOLVED_TOKEN* pResolvedT | |||
} | |||
break; | |||
|
|||
case CORINFO_FIELD_STATIC_DATASEGMENT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does look a lot like getFieldAddress
with the second argument not set to a dummy value.
I realize it won't work as-is because the Jit assumes getFieldAddress
will always return a direct address today (it should not be hard to fix that), it just does not feel great to have the duplicated functionality.
Edit: on second thought, plumbing this through getFieldAddress
is not that great as it will require getFieldAddress
to become more like getBaseFieldAddress
. So I agree a new accessor is the better way to have things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference from getFieldAddress
is that this gives RyuJIT a base+offset view instead of giving a single symbol that RyuJIT can't do much about (getFieldAddress
returns the address with the base factored in).
It was the feedback from dotnet/corert#5131 (comment) (my earlier attempt) that ended up going the getFieldAddress
path.
else | ||
{ | ||
assert(pFieldInfo->fieldLookup.accessType == InfoAccessType::IAT_PVALUE); | ||
op1 = gtNewIndOfIconHandleNode(TYP_BYREF, (size_t)pFieldInfo->fieldLookup.addr, GTF_ICON_GLOBAL_PTR, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GTF_ICON_GLOBAL_PTR
,isInvariant: false
What is the scenario where the address of the static can change?
Note that some code in the Jit depends on the addresses of statics being invariant. It will not matter for this change now, I believe, but I have changes in progress that will make this dependency more visible.
(To be clear: mutable addresses can absolutely be made to work, I just need to understand the scenarios I will need to add as tests to account for them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gtNewIndOfIconHandleNode asserts if GTF_ICON_GLOBAL_PTR is used with isInvariant = true. This was just me hammering on it until it worked. Maybe we need a new GTF_
value for this? I'm not particularly experienced in the RyuJIT code base; I know just enough to CTRL-C/CTRL-V from various places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a new handle kind would be the best solution (GTF_STATIC_ADDR_PTR
say...). Using GTF_ICON_CONST_PTR
will work too but is less desirable. When adding a new handle kind, please also add printing for it to cTreeFlags
and gtDispConst
.
Why a new handle kind?
A "raw" GTF_ICON_CONST_PTR
will work as an invariant indirection, but a dedicated handle kind will allow VN to treat it specially (and more precisely):
// We will have staticPtr = IND(STATIC_ADDR_PTR) and give "staticPtr" a "VNF_PtrToStatic" VN.
ref var staticPtr = ref staticField;
staticPtr = { ... }; // The compiler will be able to see that this store is just "staticField = { ... }"
It will be very similar to GTF_ICON_STATIC_BOX_PTR
.
@jkotas Would there be any advantage for CoreCLR to also give RyuJIT visibility into base+offset form of static fields? I'm wondering whether we should just do something with getFieldAddress and then use the same codepath for both JIT and AOT. |
It would not hurt to have the JIT/EE interface setup for this. It is done today using heuristic for JIT-case only - see #39096. I agree with @SingleAccretion that the new |
I think taking the same codepaths as in the JITted case will be a lot more feasible once we get rid of GT_VAR as outlined in #63845. At that point we can construct the right tree using Right now we kind of lose the addr+offset information when we make a GT_CLS_VAR (and then we need to query it with getFieldAddress`). So I'll just wait with this for that. |
Don't have time for this now. Tracked in #64242. |
NativeAOT currently goes through a helper call whenever we need to access static fields. For this simple program:
We generate:
It's not terrible, but also could be better. As far as I can see, JitInterface cannot express the exact way static fields are accessed in NativeAOT. I had a previous attempt to use the existing JitInterface facilities in dotnet/corert#5131 (misusing the facilities that exist to support RVA static fields), but it didn't generate "nice" addressing modes and couldn't support GC statics (see the disassembly there).
I'm adding a way to do that with "nice" addressing modes. After this change, the above program compiles into:
There's room for improvement: