Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: tail call stress assert in importer on Linux #7586

Closed
BruceForstall opened this issue Mar 9, 2017 · 18 comments
Closed

RyuJIT: tail call stress assert in importer on Linux #7586

BruceForstall opened this issue Mar 9, 2017 · 18 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions JitStress CLR JIT issues involving JIT internal stress modes JitUntriaged CLR JIT issues needing additional triage
Milestone

Comments

@BruceForstall
Copy link
Member

With COMPlus_TailcallStress=1, on non-Windows (Ubuntu, Mac, CentOS), the following assertion fires:

Assertion failed 'impTailCallRetTypeCompatible(info.compRetNativeType, info.compMethodInfo->args.retTypeClass, (var_types)callee->gtReturnType, callee->callSig->retTypeClass)' in 'System.Numerics.Vector:SquareRoot(struct):struct' (IL size 7)

in test JIT/SIMD/SqrtGeneric_r/SqrtGeneric_r.sh

You can trigger these with:

@dotnet-bot test CentOS7.1 tailcallstress
@dotnet-bot test OSX tailcallstress
@dotnet-bot test Ubuntu tailcallstress

category:correctness
theme:tail-call
skill-level:intermediate
cost:medium

@BruceForstall
Copy link
Member Author

@AndyAyersMS : @sivarv said you recent touched code in this area. Perhaps that is at fault? Want to investigate? (I don't know if this is a regression or not)

@AndyAyersMS
Copy link
Member

Seems likely it was my change dotnet/coreclr#9405 which identifies implicit tail call candidates from inlined code. A quick look at the code makes me wonder if it is a stress mode only bug -- a stress-mode specific update may be missing for the new cases.

Will take a look, though it may be a few days before I get around to it.

@AndyAyersMS
Copy link
Member

Having trouble getting testing going again on my Ubuntu box... not sure what changed but coreclr fails to initialize.

@AndyAyersMS
Copy link
Member

Ok, finally can repro.... not sure this has anything to do with my changes but will investigate.

@AndyAyersMS
Copy link
Member

Issue is that with UNIX ABI, for methods returning SIMD16 types, compRetType and compRetNativeType differ. The check in the importer compares compRetType to callee return type, while the check in morph uses compRetNativeType. If the importer check passes (which gates enabling tail call at given call site) the morph check will fail.

@AndyAyersMS
Copy link
Member

If we think the assert adds value, then we can fix it by using compRetType for the caller in both places, or by trying to find the appropriate callee native return type. The latter seems more in keeping with the spirit of the assert since we seem to be trying to double-check ABI compatibility.

And if we want to find the appropriate ABI type on the callee side we'll need to duplicate/refactor the logic in lvaInitTypeRef that updates compRetNativeType.

@AndyAyersMS
Copy link
Member

@BruceForstall any thoughts on where we should go here? Options are:

  • remove assert
  • re-express in terms of logical types (stop looking at compRetNativeType)
  • re-express in terms of ABI types (map callee's type to ABI type, likely entails some refactoring)

@BruceForstall
Copy link
Member Author

It feels to me we should just use the precise compRetType in both places.

@AndyAyersMS
Copy link
Member

Unfortunately comparing compRetType for caller and callee is not always the right thing either.

@BruceForstall
Copy link
Member Author

Do you have an example where that is true?

@AndyAyersMS
Copy link
Member

Sure. Looks like if the return type gets passed back by reference then callee's type has already been updated.

Assert failure(PID 12664 [0x00003178], Thread: 12664 [0x3178]): Assertion failed 'impTailCallRetTypeCompatible(info.compRetType, info.compMethodInfo->args.retTypeClass, (var_types)callee->gtReturnType, callee->callSig->retTypeClass)' in 'System.Type:GetTypeHandleInternal():struct:this' (IL size 7)

    File: /home/andy/repos/coreclr/src/jit/morph.cpp Line: 6754
    Image: /home/andy/repos/coreclr/bin/Product/Linux.x64.Checked/crossgen

(lldb) print this->info.compRetType
(var_types) $0 = TYP_STRUCT

(lldb) print this->info.compRetNativeType
(var_types) $1 = TYP_REF

(lldb) print (var_types)callee->gtReturnType
(var_types) $2 = TYP_REF

(lldb) expr this->gtDispTree(callee, 0, 0, 0, 0)
               [000002] --C-G-------             *  callv ind ref    System.Type.get_TypeHandle
               [000001] ------------ this in rdi \--*  lclVar    ref    V00 this 

(lldb) print info.compMethodInfo->args.retTypeClass
(CORINFO_CLASS_HANDLE) $3 = 0x00007ffff5ab1280

(lldb) print callee->callSig->retTypeClass
(CORINFO_CLASS_HANDLE) $4 = 0x00007ffff5ab1280

@RussKeldorph
Copy link
Contributor

@AndyAyersMS @BruceForstall Is this just a case of an incorrect assertion? Should we disable/weaken the assertion or disable a test for now to get things green(er) and leave this issue open to resolve it later?

@AndyAyersMS
Copy link
Member

Yes, it seems most likely that the assertion is incorrect. ABI lowering happens at different "rates" for the root method and its callees so getting the assert right here may need to take that more fully into account. However it is hard to reconcile the assert-is-incorrect hypothesis with just one test failure -- you'd expect more widespread impact.

Perhaps the fact that just one test fails is is telling us that test case ABI coverage for SIMD types is not as diverse as we'd hope. But tail call stress is kind of an odd stress mode and I can't yet rule out that it may be an issue with the way the stress mode is implemented or that the assert failure is hinting at an actual problem.

So I think we should keep looking at this.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Apr 21, 2017

Realized why SquareRoot is special -- for Vector<T> it is an intrinsic only for float and double. For integer types the static square root just wraps the instance square root, and so is potentially tail-callable. In "stress mode" we pretend it's marked with the tail prefix, and for such explicit tail calls we do additional checking, and for SIMD types this checking is not quite right, triggering the assert above.

So all that explains why the assert only happens in this one case, and that there's no underlying bug and nothing else nefarious going on that I can see.

A simple fix is to modify impTailCallRetTypeCompatible to return true if the caller and callee return type class handles match. I'll send out a PR.

[edit: realized that we only tail call here on Windows, not on Unix, so reworded the first paragraph a bit; see below].

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Apr 21, 2017

Wasn't entirely satisfied yet so kept looking.

The root cause of the problem is that in impFixupCallStructReturn under FEATURE_UNIX_AMD64_STRUCT_PASSING, we change the gtReturnType of the call. It's not clear to me that this is entirely necessary or desirable or matches what the comment seems to imply.

#ifdef FEATURE_UNIX_AMD64_STRUCT_PASSING
    ...
    // The return type will remain as the incoming struct type unless normalized to a
    // single eightbyte return type below.
    call->gtReturnType = call->gtType;

The net effect is that for simd types we change the type from simd to struct here. So this leads to the assert in morph firing because the callee's return type has diverged from the caller's.

So one fix is to do what I suggested above and update impTailCallRetTypeCompatible to return true if the handles match. Another is to remove the update of gtReturnType above. Both fix the assert.

In both cases we ultimately (on Unix) don't tail call, whereas we do on Windows. I don't know this ABI as well as I do the Windows case so I can't say if we're overly conservative here or not. This might be something worth investigating because it seems like we're introducing an unnecessary copy. But perhaps it has to be this way.

Fixing impTailCallRetTypeCompatible is certainly less risky so we could do that for now and come back later and see if the other fix is the one we ultimately prefer.

AndyAyersMS referenced this issue in AndyAyersMS/coreclr Apr 22, 2017
If class handles match, the return types are compatible, no matter
what the jit types indicate.

Addresses #10047.
@AndyAyersMS
Copy link
Member

Will propose the simple fix for now.

@AndyAyersMS
Copy link
Member

Assert will no longer fire in this case, but we might want to revisit this down the road.

So moving to future and keeping it open.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@jakobbotsch
Copy link
Member

Closing this since it seems stale and we had a fix already.

@jakobbotsch jakobbotsch closed this as not planned Won't fix, can't repro, duplicate, stale Dec 15, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jan 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions JitStress CLR JIT issues involving JIT internal stress modes JitUntriaged CLR JIT issues needing additional triage
Projects
None yet
Development

No branches or pull requests

5 participants