Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import KEEPALIVE(BOX) as KEEPALIVE(ADDR(TEMP)) #54412

Merged
13 commits merged into from Sep 3, 2021

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Jun 18, 2021

This is a specialized optimization for a targeted use case, where the intent is to keep the fields of the struct alive across a native call in generated code. Since the generated code does not have access to private fields, it cannot do it on its own without boxing and thus incurring an allocation (and a copy).

The transform itself is quite simple: we import multiple KEEPALIVEs, one for each field, via indirections, and insert a copy if necessary. It is identical to the user writing GC.KeepAlive(struct.ObjectField) themselves (well - not quite - I do not generate GT_FIELDs, but they do eventually fold to the same LCL_FLDs in morph and result in the same codegen).

I included a small test to validate the basic functionality, but would appreciate feedback on more interesting scenarios (and ways to make the test itself more robust).

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 18, 2021
@jkoritzinsky
Copy link
Member

For more context, this is for a mechanism to implement KeepAlive semantics for user-defined types with custom marshalling (like structs with delegate fields) in the DllImportGenerator project without allocating.

@SingleAccretion SingleAccretion marked this pull request as ready for review June 19, 2021 07:04
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Jun 19, 2021

Running the diffs, there were some hits (to my surprise, I have to say). The test hit has been fixed in one of the commits.

Another case is BindSasl, where there is a struct being kept alive, in all apparency so that freeing it in the finally block is ok...

@jkoritzinsky
Copy link
Member

cc: @dotnet/jit-contrib

@JulieLeeMSFT
Copy link
Member

@sandreenko @dotnet/jit-contrib PTAL community PR.

@terrajobst terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021
Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context, this is for a mechanism to implement KeepAlive semantics for user-defined types with custom marshalling (like structs with delegate fields) in the DllImportGenerator project without allocating.

was there a discord discussion or something about it?

public byte AnotherByteField;

[FieldOffset(24)]
public SimpleStructWithExplicitLayout StructWithRef;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can make an intersection and declare two object fields with the same offset if you want.

{
if (layout->IsGCPtr(slot))
{
// This is a very verbose way of saying gtNewLclFldNode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why importation was chosen for this transformation? Won't it work better after implicit byrefs resolution somewhere in morph? Are these IND(ADD(ADDR(LCL_VAR), OFFSET) always transformed as GT_LCL_FLD?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why importation was chosen for this transformation?

Primary reason is because that's where the original code was. I can think of a few reasons that make it beneficial to do in import:

  1. Adding statements is "natural", and KEEPALIVEs require new statements (except if one uses COMMAs I suppose, but I'd think it is better not to use them if we can).
  2. Boxes are "fresh", i. e. there is little chance that some subsequent transform changed their shape somehow.
  3. Implicit byrefs and normal locals are "the same" (no need to write code that handles both - though perhaps that code is trivial enough for this not to matter).

However, I am in now way as experienced as you in this area. What would be the advantage of doing it late(er)?

Are these IND(ADD(ADDR(LCL_VAR), OFFSET) always transformed as GT_LCL_FLD?

For locals, yes. I expect most of these structs will be implicit byrefs though, for them it turns into IND(ADD(LCL_VAR byref, offset)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicit byrefs and normal locals are "the same" (no need to write code that handles both - though perhaps that code is trivial enough for this not to matter).

for implicit byrefs do we need to do anything? are not they reported as alive from the parent method, for example:

struct S // more than 16 bytes, so implicit byref
{
 long a, b, c;
}

void Foo()
{
  S s = new S();
  Do(s); // <- implicit byref, s is on the stack and reported live for this line
}

void Do(S s) // <- s is implicit byref so we don't care about its liveness because the caller is reporting it.
{
  CallUnmanagedApiWithSuppressGCTransition(s); // no need to report s as live in Do, it will be reported in Foo.
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for implicit byrefs do we need to do anything?

Sorry for not being explicit. I agree there are liveness concerns here, it is just convenient that the implicit byrefs haven't been expanded yet, so the copy elision code (the if (OperIsLocal()) { reuse the existing local } branch) "just works".

On that note this copy elision is actually bugged, it must be if (OperIs(GT_LCL_VAR)).

GenTree* field = gtNewIndir(TYP_REF, fieldAddr);
GenTree* keepalive = gtNewKeepAliveNode(field);

Statement* stmt = gtNewStmt(keepalive, impCurStmtOffs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect such structs to always be on the stack (because they have GC fields and used as a whole, so no independent promotion) so we don't have precise fields' liveness for them, we do zero-init and always report them as alive, don't we?

So my theory for such test:

S s = struct with GC fields;
nativeCallThatTakesStructAndDoesSomethingWithGCField(s);
GC.KeepAlive(s);

it will currently work even if you delete GC.KeepAlive(s) just because the struct S will be allocated on the stack and will be reported as live for the whole method. The only exception, in theory, could be a struct with 1 byref field

struct S
{
  Object o;
}

in this case Jit can replace the struct with its field and do precise liveness for it.
Am I correct and it will work even without KeepAlive ?

However, the previous example is just a current limitation.

What I don't understand is why we can't do this:

if (layout->HasGCPtr())
{
  GenTree* boxTemp     = gtNewLclvNode(boxTempNum, boxSrc->TypeGet());
  GenTree* boxSrcAddr  = gtNewOperNode(GT_ADDR, TYP_BYREF, boxTemp);
  GenTree* keepalive = gtNewKeepAliveNode(boxSrcAddr);
  addStmt.
}

and keep one GT_KEEPALIVE for the stuct, not for each GC field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct and it will work even without KeepAlive ?

Some ad-hoc testing for me says yes, larger structs survive, while the "one object" one dies.

What I don't understand is why we can't do this:

Put simply I didn't know it could be done that way (I was also trying to emulate how the manual KeepAlive on each field would be imported, since that's known to work, presumably, and the only testing we'll ever have for this optimization is the little test I wrote here).

I suppose one possible concern here is from the following scenario.

Say the source for the box is an implicit byref. Let's also presume that the method we are compiling is a reverse PInvoke one, so the location of the actual shadow copy will be in an unmanaged frame. Let's also presume that the user code is keeping the object fields of the struct in question alive via some other means (and they're pinned, obviously).

Now, in the method, the user code does this:

// Edit: UnmanagedCallersOnly only allows blittable types, but let's assume `MyStruct`
// consists of `void*`s and we `Unsafe.As` the reference to it as one to a struct with `object`s
void CallerIsUnmanaged(MyStruct myStructWithRefFields)
{
    TurnOffGlobalKeepAliveForMyStructFields(); // Clears static fields or whatever. Keeps the fields pinned.
    CallUnmanagedApiWithSuppressGCTransition(&myStructWithRefFields); // No transition so no spilling.
    GC.KeepAlive(myStructWithRefFields);
}

Do we risk not keeping the fields alive or not?

This is a completely made-up scenario from me, and I do not actually know if it would work or not (even with the current version of the code). I will test it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So after thinking about this and some testing, I've come to the conclusion that the concern that fields in structs in foreign frames won't be "kept alive" is "real", but also per-existing and not something that much can (should) be done about.

Structs in the frame of the method being compiled will be kept alive by virtue of being address-exposed if we take the address of the whole struct, while implicit byrefs will be kept alive by their owner method.

In terms of CQ, the current approach of field-by-field importation produces sequences of dummy loads into a register like mov rax, [fld1], mov rax, [fld2], etc. Address-exposing the struct will avoid that, but will obviously impede other possible optimizations, though presumably that will matter little. It does seem a little weird to generate a dummy use of a byref just so that the struct is marked address-exposed, thus indirectly making it live for the whole method, but perhaps that's by design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I am a little behind, how do we get address exposed set for the struct? Do we set address exposed when using your current approach with fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I am a little behind, how do we get address exposed set for the struct? Do we set address exposed when using your current approach with fields?

So if we use the address of the whole struct, it is as if we passed it to a call as an argument - that is why we'll have it address-exposed.

Do we set address exposed when using your current approach with fields?

It appears so...
LocalAddressVisitor sets it, and only later does morph fold it back to a LCL_FLD. I initially got confused between the two :(. Given this fact, I see no advantage to keeping the field approach and will use the whole struct as you suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalAddressVisitor sets it, and only later does morph fold it back to a LCL_FLD. I initially got confused between the two :(. Given this fact, I see no advantage to keeping the field approach and will use the whole struct as you suggested.

It is not important when we fold them, because if the whole struct is marked as address exposed we can't do anything with the fields even if they are all in LCL_FLD form.

So if we use the address of the whole struct, it is as if we passed it to a call as an argument - that is why we'll have it address-exposed.

what if the struct is passed by value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not important when we fold them, because if the whole struct is marked as address exposed we can't do anything with the fields even if they are all in LCL_FLD form.

Yes I understand. I just thought that it was the address visitor that folded them (and didn't set the address exposed flag in doing so), for some reason.

what if the struct is passed by value?

You mean to KEEPALIVE? That would require codegen support, wouldn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, if the struct is 8-byte size or 16-byte on x64 unix/arm and it is passed to unmanaged call by value, is it marked as address exposed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as you say, it is not. I think I misunderstood your original question...

@sandreenko sandreenko modified the milestones: 6.0.0, 7.0.0 Jul 23, 2021
Instead, import KEEPALIVE(BOX) as a sequence of
KEEPALIVE for object fields (flattened).
Move the importation to its own method.
Add debug output.
Rename a few things.
Basic verification of the new optimization.
It relies on a precise GC.
Running diffs for win-x64, there was exactly one result,
in this test. GC.KeepAlive here is used as a generic
"make sure that there is a use for the variable
beyond an assigment" function. The new optimization
eliminates this use as the struct in question has
no GC fields. Fix the test by using an actual generic
"use me" function.

Will be broken by advanced interprocediral analysis
that is apparently not as far away as one would expect,
but will do for now.
It is possible for a GC to occur just after
the KeepAlive calls and for CheckSuccess to
wrongly conclude the objects weren't kept alive.
@SingleAccretion
Copy link
Contributor Author

was there a discord discussion or something about it?

Heh, yes, a small one :). @jkoritzinsky asked on the C# discord (#lowlevel channel) if the Jit performed box elision for KeepAlive, and I decided to see how hard would it be to make it do that.

@jkoritzinsky
Copy link
Member

Specifically, I was looking at mechanisms for ensuring that the DllImportGenerator could provide support for keeping delegate fields alive in structs. Due to limited type information in ref assemblies, the generator cannot expect marshaled types to be accurately described. To ensure that all cases where a KeepAlive is needed are covered, the best option is to emit a KeepAlive in all cases. However, the cost of the box makes that not an option. @SingleAccretion was kind enough to take a stab at eliding the box, which would make KeepAlive usable in my scenario.

If the source is a GT_LCL_FLD, then we would only want
too keep alive the fields it encloses (or, since TYP_STRUCT
local fields are currently not supported, nothing).
The field-by-field approach results in the struct getting
address-exposed anyway, so there is no good reason to bloat
the IR. Just have the address escape directly.
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Jul 30, 2021

Since this is a bit of an old branch, I've gone ahead and rebased to as to be able to build with my current runtime fork.

@SingleAccretion SingleAccretion changed the title Import KEEPALIVE(BOX) as a sequence of KEEPALIVE(FIELD) Import KEEPALIVE(BOX) as a sequence of KEEPALIVE(ADDR(TEMP)) Aug 4, 2021
@SingleAccretion SingleAccretion changed the title Import KEEPALIVE(BOX) as a sequence of KEEPALIVE(ADDR(TEMP)) Import KEEPALIVE(BOX) as KEEPALIVE(ADDR(TEMP)) Aug 4, 2021
Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation looks good to me based on my knowledge of the JIT.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good.

Nice to see another customer for gtTryRemoveBoxUpstreamEffects.

@ghost
Copy link

ghost commented Sep 2, 2021

Hello @jkoritzinsky!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 4173838 into dotnet:main Sep 3, 2021
@SingleAccretion SingleAccretion deleted the Keepalive-Box-Opt branch September 7, 2021 20:50
@dotnet dotnet locked as resolved and limited conversation to collaborators Oct 7, 2021
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants