Comment out a false-positive assert that is firing and preventing jit-diffs #52462

tannergooding · 2021-05-07T19:38:43Z

Putting this up as suggested by @AndyAyersMS on Discord.

As per the comment in code and the analysis in #51728, we have many scenarios where a pointer is converted to a byref. These conversions can range from something as simple as ref *x to things like ref T Unsafe.AsRef<T>(void*) to even new Span<T>(void*, int).

This conversion is correctly tracked and represented in metadata. However, if the pointer is enregistered and "last use" the register allocator may decide the byref will exist in the same register. There are many places in codegen that have logic similar to:

if (tgtReg != srcReg)
{
    emit(INS_mov, tgtReg, srcReg);
}

Due to how many locations this check exists, it is non trivial to go and fix this to be handled correctly everywhere today. Additionly, for most scenarios, this is a "safe" optimization. However, when that move is for a pointer<->byref conversion, eliding this move means that the liveness update will never occur and so asserts such as the one listed will fire due to the mismatch.

…-diffs

AndyAyersMS · 2021-05-07T19:45:20Z

cc @dotnet/jit-contrib

I think we should take this to unblock other work.

BruceForstall · 2021-05-07T21:36:20Z

So a byref can be constructed "out of thin air" and needs to be reported to the GC? In the case above, where the mov is elided, if a byref is going live, do we report it going live to the GC? Or, conversely, if it is going dead (because a byref is being converted to a non-GC object pointer?), do we report it going dead?

src/coreclr/jit/emitxarch.cpp

tannergooding · 2021-05-07T22:06:31Z

So a byref can be constructed "out of thin air" and needs to be reported to the GC?

@BruceForstall That's my understanding. We have a pointer (possibly to the stack or to native memory) and we turn it into a byref. In the case of Unsafe.AsRef<T> this is often simply done via ldarg.0; ret.

When we have inlining, struct promotion, and other things at play, we will end up with something like the following, where V01 arg1 long (V01 is a pointer) and V40 tmp31 byref V14._value(offs=0x00) P-INDEP "field V14._value (fldOffset=0x0)" (V40 is the promoted byref field for Span<T>):

Generating: N015 (  1,  1) [000085] ------------        t85 =    LCL_VAR   byref  V01 arg1         u:1 rdx (last use) REG rdx $c0
                                                              /--*  t85    byref
Generating: N017 (  5,  4) [000065] DA----------              *  STORE_LCL_VAR byref  V40 tmp31        d:1 rdx REG rdx

Since they are both assigned rdx, one of the codegen paths decides to not emit the move.

In the case above, where the mov is elided, if a byref is going live, do we report it going live to the GC? Or, conversely, if it is going dead (because a byref is being converted to a non-GC object pointer?), do we report it going dead?

We only do liveness updates for emitted code, eliding the move in codegen means that liveness update never happens.

Updating codegen to still do the liveness update for such moves is non-trivial because we do not centrally elide the moves today. Instead, we have dozens of if (tgtReg != srcReg) checks scattered throughout each codegen file and sometimes in the emitter itself.

I started a prototype to centralize the move elision handling: tannergooding@875cb1f, but it isn't done and still needs more work

BruceForstall · 2021-05-07T22:26:31Z

I guess I'm having a hard time understanding if this is really a "false-positive" assert. From the scenario you describe, a byref goes live but we don't tell the GC about it. If there's a GC at the object pointed to is not otherwise rooted, then the GC can collect it, and if we don't report the GC byref, the GC doesn't need to update it. So it seems like it's actually a GC hole.

(How byrefs "appear out of thin air" I guess is not worth considering since this is all unsafe code and the user sure better know what they're doing?)

saucecontrol · 2021-05-07T22:32:49Z

You don't have to be unsafe to create a byref out of thin air. e.g. Span<byte> span = stackalloc byte[64]

Edit: never mind, that could still be a real ref 🤦 , but yeah, we magic byrefs out unmanaged pointers with Span all the time.

tannergooding · 2021-05-07T22:54:00Z

I guess I'm having a hard time understanding if this is really a "false-positive" assert. From the scenario you describe, a byref goes live but we don't tell the GC about it. If there's a GC at the object pointed to is not otherwise rooted, then the GC can collect it, and if we don't report the GC byref, the GC doesn't need to update it. So it seems like it's actually a GC hole.

AFAICT, this scenario can only happen when going from a pointer to a byref or vice-versa. For the first scenario, going from a pointer to a byref, since pointers are not GC tracked, either the user introduced their own GC-hole (such as from void* Unsafe.AsPointer(ref T) where the ref T is not already pinned) or it is not a real byref and so it doesn't actually matter if its reported, its just good for us to correctly track it. Likewise for the latter, going from a byref to a pointer, this is only valid if the byref is pinned.

Now, I could conceive a possible GC hole in a scenario where you go from pinned byref->pointer->byref and then kill the pinned byref. Imagine for example something like:

var a = new int[30];               // a is a live gcref
ref var x = ref a[0];              // x is a live byref
a = null;                          // a is a dead gcref, the backing array is kept alive due to the x byref

fixed (int* p = &x)                // p is a live pinned byref
{
    x = ref Unsafe.NullRef<int>(); // x is a dead byref, the backing array is kept alive due to the p pinned byref
    int* t = p;                    // t is a pointer, not a pinned byref
    x = ref Unsafe.AsRef<int>(t);  // x is a live byref, it still points to the pinned backing array
} // p is a dead pinned byref, x should be keeping the backing array alive

GC.Collect();
Console.WriteLine(x);

Now, if t is given a register (say edx) and then x is given the same register (also edx) the JIT would elide the move and not update the liveness and so the "last tracked reference" to the created array would go dead at the end of the fixed block. While in practice, it should be kept alive by x since that's a byref and should be considered live.

It's possible that this isn't happening in practice due to register allocation or a number of other scenarios. But I could conceive it being an actual hole in some edge case scenario like this. But maybe someone more knowledgeable here could weigh in.

AndyAyersMS · 2021-05-10T16:23:11Z

@BruceForstall I have many of the same concerns from you. It is not easy to understand how a reg-reg mov can be the source of a GC ref. But presumably we're already "handling" this case today in cases where the registers are different, it's just when they are the same that we don't do the GC liveness updates.

Tanner has a preliminary set of changes where we make the GC updates in the same reg-reg mov case, but they're extensive and still don't resolve all the cases of this issue. I've also seen cases where the allocator uses commutativity to reorder a last-use add to a byref and introduces a small window where there is no live byref, and the assert fires (see #10821 (comment)).

So I think the assert is intrinsically flawed and is now giving us a lot of false positive cases and so we should disable it for now as we work on a better solution.

My thinking is that the only comfortable resolution to this problem is to walk back up the data flow, and we'll find either pointer being retyped as a byref or a pin, or else something we can't track further (say an arg); once we have that we can retype the entire lifetime as pointer or byref as appropriate.

My understanding is this sort of "gc rooting" flow analysis is akin to how NUTC did (does) gc reporting, instead of the gc pointer kind distinctions we try and maintain throughout the jit today.

But all that requires dataflow. Figuring out where this can/should happen is not straightforward. Since most of these issues seem to arise from inlining, the natural place for this analysis is perhaps the object stack allocation phase; it happens to already run a very similar analysis since it needs to change GC types.

tannergooding · 2021-05-12T22:26:57Z

@BruceForstall, @AndyAyersMS; #52661 is an alternative to this.

It centralizes the move management and results in zero PMI diffs for x64 S.P.Corelib (still validating for tests and other platforms). It's relatively small (+854, -820 lines) and ensures that the relevant liveness updates happen so also resolves the assert.

tannergooding · 2021-06-03T18:27:28Z

Closing this in favor of #53684, which is the second part to #52661

Comment out a false-positive assert that is firing and preventing jit…

4981e41

…-diffs

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 7, 2021

Applying formatting patch

3e18032

BruceForstall reviewed May 7, 2021

View reviewed changes

src/coreclr/jit/emitxarch.cpp Outdated Show resolved Hide resolved

Adding CLANG_FORMAT_COMMENT_ANCHOR

25ef348

tannergooding mentioned this pull request May 13, 2021

Refactor codegen and emit to centralize mov reg, reg handling #52661

Merged

tannergooding closed this Jun 3, 2021

dotnet locked as resolved and limited conversation to collaborators Jul 3, 2021

tannergooding deleted the workaround-51728 branch November 11, 2022 15:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comment out a false-positive assert that is firing and preventing jit-diffs #52462

Comment out a false-positive assert that is firing and preventing jit-diffs #52462

tannergooding commented May 7, 2021 •

edited

AndyAyersMS commented May 7, 2021

BruceForstall commented May 7, 2021

tannergooding commented May 7, 2021 •

edited

BruceForstall commented May 7, 2021

saucecontrol commented May 7, 2021 •

edited

tannergooding commented May 7, 2021 •

edited

AndyAyersMS commented May 10, 2021

tannergooding commented May 12, 2021

tannergooding commented Jun 3, 2021 •

edited

Comment out a false-positive assert that is firing and preventing jit-diffs #52462

Comment out a false-positive assert that is firing and preventing jit-diffs #52462

Conversation

tannergooding commented May 7, 2021 • edited

AndyAyersMS commented May 7, 2021

BruceForstall commented May 7, 2021

tannergooding commented May 7, 2021 • edited

BruceForstall commented May 7, 2021

saucecontrol commented May 7, 2021 • edited

tannergooding commented May 7, 2021 • edited

AndyAyersMS commented May 10, 2021

tannergooding commented May 12, 2021

tannergooding commented Jun 3, 2021 • edited

tannergooding commented May 7, 2021 •

edited

tannergooding commented May 7, 2021 •

edited

saucecontrol commented May 7, 2021 •

edited

tannergooding commented May 7, 2021 •

edited

tannergooding commented Jun 3, 2021 •

edited