New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import KEEPALIVE(BOX)
as KEEPALIVE(ADDR(TEMP))
#54412
Conversation
1d83882
to
e05bf6c
Compare
For more context, this is for a mechanism to implement |
Running the diffs, there were some hits (to my surprise, I have to say). The test hit has been fixed in one of the commits. Another case is BindSasl, where there is a struct being kept alive, in all apparency so that freeing it in the |
cc: @dotnet/jit-contrib |
@sandreenko @dotnet/jit-contrib PTAL community PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more context, this is for a mechanism to implement KeepAlive semantics for user-defined types with custom marshalling (like structs with delegate fields) in the DllImportGenerator project without allocating.
was there a discord discussion or something about it?
public byte AnotherByteField; | ||
|
||
[FieldOffset(24)] | ||
public SimpleStructWithExplicitLayout StructWithRef; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you can make an intersection and declare two object fields with the same offset if you want.
src/coreclr/jit/importer.cpp
Outdated
{ | ||
if (layout->IsGCPtr(slot)) | ||
{ | ||
// This is a very verbose way of saying gtNewLclFldNode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why importation was chosen for this transformation? Won't it work better after implicit byrefs resolution somewhere in morph? Are these IND(ADD(ADDR(LCL_VAR), OFFSET)
always transformed as GT_LCL_FLD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why importation was chosen for this transformation?
Primary reason is because that's where the original code was. I can think of a few reasons that make it beneficial to do in import:
- Adding statements is "natural", and
KEEPALIVE
s require new statements (except if one usesCOMMA
s I suppose, but I'd think it is better not to use them if we can). - Boxes are "fresh", i. e. there is little chance that some subsequent transform changed their shape somehow.
- Implicit byrefs and normal locals are "the same" (no need to write code that handles both - though perhaps that code is trivial enough for this not to matter).
However, I am in now way as experienced as you in this area. What would be the advantage of doing it late(er)?
Are these
IND(ADD(ADDR(LCL_VAR), OFFSET)
always transformed asGT_LCL_FLD
?
For locals, yes. I expect most of these structs will be implicit byrefs though, for them it turns into IND(ADD(LCL_VAR byref, offset))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implicit byrefs and normal locals are "the same" (no need to write code that handles both - though perhaps that code is trivial enough for this not to matter).
for implicit byrefs do we need to do anything? are not they reported as alive from the parent method, for example:
struct S // more than 16 bytes, so implicit byref
{
long a, b, c;
}
void Foo()
{
S s = new S();
Do(s); // <- implicit byref, s is on the stack and reported live for this line
}
void Do(S s) // <- s is implicit byref so we don't care about its liveness because the caller is reporting it.
{
CallUnmanagedApiWithSuppressGCTransition(s); // no need to report s as live in Do, it will be reported in Foo.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for implicit byrefs do we need to do anything?
Sorry for not being explicit. I agree there are liveness concerns here, it is just convenient that the implicit byrefs haven't been expanded yet, so the copy elision code (the if (OperIsLocal()) { reuse the existing local }
branch) "just works".
On that note this copy elision is actually bugged, it must be if (OperIs(GT_LCL_VAR))
.
src/coreclr/jit/importer.cpp
Outdated
GenTree* field = gtNewIndir(TYP_REF, fieldAddr); | ||
GenTree* keepalive = gtNewKeepAliveNode(field); | ||
|
||
Statement* stmt = gtNewStmt(keepalive, impCurStmtOffs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect such structs to always be on the stack (because they have GC fields and used as a whole, so no independent promotion) so we don't have precise fields' liveness for them, we do zero-init and always report them as alive, don't we?
So my theory for such test:
S s = struct with GC fields;
nativeCallThatTakesStructAndDoesSomethingWithGCField(s);
GC.KeepAlive(s);
it will currently work even if you delete GC.KeepAlive(s)
just because the struct S
will be allocated on the stack and will be reported as live for the whole method. The only exception, in theory, could be a struct with 1 byref field
struct S
{
Object o;
}
in this case Jit can replace the struct with its field and do precise liveness for it.
Am I correct and it will work even without KeepAlive ?
However, the previous example is just a current limitation.
What I don't understand is why we can't do this:
if (layout->HasGCPtr())
{
GenTree* boxTemp = gtNewLclvNode(boxTempNum, boxSrc->TypeGet());
GenTree* boxSrcAddr = gtNewOperNode(GT_ADDR, TYP_BYREF, boxTemp);
GenTree* keepalive = gtNewKeepAliveNode(boxSrcAddr);
addStmt.
}
and keep one GT_KEEPALIVE for the stuct, not for each GC field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct and it will work even without KeepAlive ?
Some ad-hoc testing for me says yes, larger structs survive, while the "one object" one dies.
What I don't understand is why we can't do this:
Put simply I didn't know it could be done that way (I was also trying to emulate how the manual KeepAlive
on each field would be imported, since that's known to work, presumably, and the only testing we'll ever have for this optimization is the little test I wrote here).
I suppose one possible concern here is from the following scenario.
Say the source for the box is an implicit byref. Let's also presume that the method we are compiling is a reverse PInvoke one, so the location of the actual shadow copy will be in an unmanaged frame. Let's also presume that the user code is keeping the object
fields of the struct in question alive via some other means (and they're pinned, obviously).
Now, in the method, the user code does this:
// Edit: UnmanagedCallersOnly only allows blittable types, but let's assume `MyStruct`
// consists of `void*`s and we `Unsafe.As` the reference to it as one to a struct with `object`s
void CallerIsUnmanaged(MyStruct myStructWithRefFields)
{
TurnOffGlobalKeepAliveForMyStructFields(); // Clears static fields or whatever. Keeps the fields pinned.
CallUnmanagedApiWithSuppressGCTransition(&myStructWithRefFields); // No transition so no spilling.
GC.KeepAlive(myStructWithRefFields);
}
Do we risk not keeping the fields alive or not?
This is a completely made-up scenario from me, and I do not actually know if it would work or not (even with the current version of the code). I will test it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after thinking about this and some testing, I've come to the conclusion that the concern that fields in structs in foreign frames won't be "kept alive" is "real", but also per-existing and not something that much can (should) be done about.
Structs in the frame of the method being compiled will be kept alive by virtue of being address-exposed if we take the address of the whole struct, while implicit byrefs will be kept alive by their owner method.
In terms of CQ, the current approach of field-by-field importation produces sequences of dummy loads into a register like mov rax, [fld1]
, mov rax, [fld2]
, etc. Address-exposing the struct will avoid that, but will obviously impede other possible optimizations, though presumably that will matter little. It does seem a little weird to generate a dummy use of a byref just so that the struct is marked address-exposed, thus indirectly making it live for the whole method, but perhaps that's by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I am a little behind, how do we get address exposed
set for the struct? Do we set address exposed
when using your current approach with fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I am a little behind, how do we get address exposed set for the struct? Do we set address exposed when using your current approach with fields?
So if we use the address of the whole struct, it is as if we passed it to a call as an argument - that is why we'll have it address-exposed.
Do we set address exposed when using your current approach with fields?
It appears so...
LocalAddressVisitor
sets it, and only later does morph fold it back to a LCL_FLD
. I initially got confused between the two :(. Given this fact, I see no advantage to keeping the field approach and will use the whole struct as you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LocalAddressVisitor sets it, and only later does morph fold it back to a LCL_FLD. I initially got confused between the two :(. Given this fact, I see no advantage to keeping the field approach and will use the whole struct as you suggested.
It is not important when we fold them, because if the whole struct is marked as address exposed we can't do anything with the fields even if they are all in LCL_FLD
form.
So if we use the address of the whole struct, it is as if we passed it to a call as an argument - that is why we'll have it address-exposed.
what if the struct is passed by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not important when we fold them, because if the whole struct is marked as address exposed we can't do anything with the fields even if they are all in LCL_FLD form.
Yes I understand. I just thought that it was the address visitor that folded them (and didn't set the address exposed flag in doing so), for some reason.
what if the struct is passed by value?
You mean to KEEPALIVE
? That would require codegen support, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, if the struct is 8-byte size or 16-byte on x64 unix/arm and it is passed to unmanaged call by value, is it marked as address exposed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, as you say, it is not. I think I misunderstood your original question...
Instead, import KEEPALIVE(BOX) as a sequence of KEEPALIVE for object fields (flattened).
Move the importation to its own method. Add debug output. Rename a few things.
Basic verification of the new optimization.
It relies on a precise GC.
Running diffs for win-x64, there was exactly one result, in this test. GC.KeepAlive here is used as a generic "make sure that there is a use for the variable beyond an assigment" function. The new optimization eliminates this use as the struct in question has no GC fields. Fix the test by using an actual generic "use me" function. Will be broken by advanced interprocediral analysis that is apparently not as far away as one would expect, but will do for now.
It is possible for a GC to occur just after the KeepAlive calls and for CheckSuccess to wrongly conclude the objects weren't kept alive.
Heh, yes, a small one :). @jkoritzinsky asked on the C# discord (#lowlevel channel) if the Jit performed box elision for |
Specifically, I was looking at mechanisms for ensuring that the DllImportGenerator could provide support for keeping delegate fields alive in structs. Due to limited type information in ref assemblies, the generator cannot expect marshaled types to be accurately described. To ensure that all cases where a KeepAlive is needed are covered, the best option is to emit a KeepAlive in all cases. However, the cost of the box makes that not an option. @SingleAccretion was kind enough to take a stab at eliding the box, which would make KeepAlive usable in my scenario. |
If the source is a GT_LCL_FLD, then we would only want too keep alive the fields it encloses (or, since TYP_STRUCT local fields are currently not supported, nothing).
The field-by-field approach results in the struct getting address-exposed anyway, so there is no good reason to bloat the IR. Just have the address escape directly.
5a1ec76
to
9cfb36d
Compare
Since this is a bit of an old branch, I've gone ahead and rebased to as to be able to build with my current runtime fork. |
KEEPALIVE(BOX)
as a sequence of KEEPALIVE(FIELD)
KEEPALIVE(BOX)
as a sequence of KEEPALIVE(ADDR(TEMP))
KEEPALIVE(BOX)
as a sequence of KEEPALIVE(ADDR(TEMP))
KEEPALIVE(BOX)
as KEEPALIVE(ADDR(TEMP))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation looks good to me based on my knowledge of the JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good.
Nice to see another customer for gtTryRemoveBoxUpstreamEffects
.
Hello @jkoritzinsky! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
This is a specialized optimization for a targeted use case, where the intent is to keep the fields of the struct alive across a native call in generated code. Since the generated code does not have access to private fields, it cannot do it on its own without boxing and thus incurring an allocation (and a copy).
The transform itself is quite simple: we import multiple
KEEPALIVE
s, one for each field, via indirections, and insert a copy if necessary. It is identical to the user writingGC.KeepAlive(struct.ObjectField)
themselves (well - not quite - I do not generateGT_FIELD
s, but they do eventually fold to the sameLCL_FLD
s in morph and result in the same codegen).I included a small test to validate the basic functionality, but would appreciate feedback on more interesting scenarios (and ways to make the test itself more robust).