Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose a new CallConvSuppressGCTransition so SuppressGCTransition works for function pointers #38134

Closed
tannergooding opened this issue Jun 19, 2020 · 15 comments · Fixed by #46343
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Milestone

Comments

@tannergooding
Copy link
Member

Expose a new CallConvSuppressGCTransition type so function pointers can also opt into the SuppressGCTransitionAttribute behavior.

Proposed API

namespace System.Runtime.CompilerServices
{
    public class CallConvSuppressGCTransition
    {
        // This type has no members and is identical in structure to other `CallConv*` types
    }
}
@tannergooding
Copy link
Member Author

CC. @333fred, @davidwrighton, @AaronRobinsonMSFT, @jkotas

I've marked this milestone future as it sounds like it isn't a requirement for .NET 5 and is instead a potential nice to have if and when we start getting customer asks.

@tannergooding tannergooding added this to the Future milestone Jun 19, 2020
@AaronRobinsonMSFT AaronRobinsonMSFT removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2020
@carlossanlop carlossanlop added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review labels Jul 6, 2020
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 6, 2020
@terrajobst
Copy link
Member

terrajobst commented Aug 6, 2020

Video

  • Looks good as proposed
namespace System.Runtime.CompilerServices
{
    public class CallConvSuppressGCTransition
    {
    }
}

@GrabYourPitchforks
Copy link
Member

I think I have a scenario for this within CoreLib. Is there an experimental branch where this is implemented where I can run some tests?

@AaronRobinsonMSFT
Copy link
Member

@GrabYourPitchforks Nope. I don't think we will be creating an experimental branch for this work. What scenario are you thinking of?

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Nov 15, 2020

@AaronRobinsonMSFT I have a branch where I'm experimenting resolving #13091, which is the DateTime.UtcNow perf regression. Since this is a hot path, I'm also wondering if it's possible to suppress the GC transition on the p/invoke into GetSystemTimePreciseAsFileTime (see calli here).

The line I highlighted above is a calli into an unmanaged p/invoke: either GetSystemTimeAsFileTime or GetSystemTimePreciseAsFileTime. GetSystemTimeAsFileTime's implementation is trivial; it dereferences a well-known memory address and returns immediately. However, GetSystemTimePreciseAsFileTime has a small amount of synchronization inside of it. It doesn't take a lock or otherwise enter kernel space, but it does in very rare occasions execute a pause instruction as part of a spinwait. The total execution time of the GSTPAFT method is otherwise expected to be ~20 nanoseconds.

Edit: Using the managed calli hack in the sample above was just a proof of concept. If I leave the function pointer as delegate* unmanaged[StdCall]<...>, the sample still works, but there's an extra few nanoseconds of overhead I'm trying to shave off. Since the function we're p/invoking might not exist on the current OS I can't use a standard [DllImport, SuppressGCTransition] per this comment.

@AaronRobinsonMSFT
Copy link
Member

Using the managed calli hack in the sample above was just a proof of concept.

@GrabYourPitchforks Can I assume the trickery with the calli demonstrates (i.e. reduces overhead) justification for this work? If so, I am convinced we can prioritize this work for .NET 6.

@GrabYourPitchforks
Copy link
Member

@AaronRobinsonMSFT Yeah, that trickery was just to demonstrate that suppressing the GC transition saves a few nanoseconds. Since this is a hot path I'll take whatever win I can get.

The only difference in the runs below is whether the delegate is typed as delegate*<...> (mimicking suppress GC transition) or delegate* unmanaged[StdCall]<...>.

|    Method |        Job |       Toolchain |     Mean |    Error |   StdDev | Ratio |
|---------- |----------- |---------------- |---------:|---------:|---------:|------:|
| GetUtcNow | Job-CCKDPC | suppressgctrans | 27.14 ns | 0.082 ns | 0.073 ns |  0.94 |
| GetUtcNow | Job-OKNQNN |     withgctrans | 28.81 ns | 0.249 ns | 0.233 ns |  1.00 |

@AaronRobinsonMSFT
Copy link
Member

< 2 ns is a bit hard for me to justify anything actually. However, since there is a measurable difference and I don't think this work will be too hard I will mark this as .NET 6.

@AaronRobinsonMSFT AaronRobinsonMSFT modified the milestones: Future, 6.0.0 Nov 15, 2020
@GrabYourPitchforks
Copy link
Member

Yeah, 2ns isn't going to make or break this. It's more to point out that if this capability existed, we'd take advantage of it.

Related question: the Win32 API might - under some rare edge cases - call a pause instruction as part of a simple spinwait construct. There are otherwise no synchronization primitives used, and there are no kernel transitions in the code as far as I can tell. Would this pause instruction violate the spirit of "don't use [SuppressGCTransition] when the target performs synchronization work"?

@GrabYourPitchforks
Copy link
Member

As an interesting note, there are some cases where the calli instruction might target GetSystemTimeAsFileTime instead of GetSystemTimePreciseAsFileTime. In these cases, the difference is more pronounced (~5 ns, not ~2 ns). I don't know why this is, but it does repro over multiple runs for me.

Method Job Toolchain Mean Error StdDev Ratio
GetUtcNow Job-VDLCLB suppressgctrans 4.682 ns 0.0402 ns 0.0336 ns 0.47
GetUtcNow Job-CEQACO withgctrans 10.058 ns 0.0748 ns 0.0699 ns 1.00

@jkotas
Copy link
Member

jkotas commented Nov 15, 2020

pause instruction

pause instruction is fine with [SuppressGCTransition]

@AaronRobinsonMSFT
Copy link
Member

Would this pause instruction violate the spirit of "don't use [SuppressGCTransition] when the target performs synchronization work"?

Not to my understanding. The pause instruction is for CPU memory pipelining issues during speculative execution - at least that was how it was explained to me years ago. This usage isn't a concern to me.

In these cases, the difference is more pronounced (~5 ns, not ~2 ns)

Even that doesn't really move me much in these kind of APIs. What I am far more sensitive to is the reduced error and standard deviation. I think the high order bit should be predictable execution followed immediately by performance. Obviously there are nuances here but in general a steady and predictable platform is what I want.

@AaronRobinsonMSFT
Copy link
Member

The following is where one would start in the runtime to add support for this:

//----------------------------------------------------------
// Gets the unmanaged calling convention by reading any modopts.
//
// Returns:
// S_OK - Calling convention was read from modopt
// S_FALSE - Calling convention was not read from modopt
// COR_E_BADIMAGEFORMAT - Signature had an invalid format
// COR_E_INVALIDPROGRAM - Program is considered invalid (more
// than one calling convention specified)
//----------------------------------------------------------
static HRESULT TryGetUnmanagedCallingConventionFromModOpt(
_In_ Module *pModule,
_In_ PCCOR_SIGNATURE pSig,
_In_ ULONG cSig,
_Out_ CorUnmanagedCallingConvention *callConvOut,
_Out_ UINT *errorResID);

The various uses would need to pass the CORINFO_SIGFLAG_SUPPRESS_GC_TRANSITION flag to the JIT. A quick look here indicates one would likely begin by indicating to the IL Stub MethodDesc this state during creation rather than having it querying for a non-existent attribute.

@AaronRobinsonMSFT AaronRobinsonMSFT added this to To do in Interop-CoreCLR 6.0 via automation Nov 15, 2020
@GrabYourPitchforks
Copy link
Member

Just to clarify - this issue isn't blocking me. We're able to find significant perf wins, even without this feature.

@AaronRobinsonMSFT
Copy link
Member

@GrabYourPitchforks Appreciate the clarification. I like to have some breadcrumbs in issues in case someone else does the work or I pick this up 4 months from now. Since it was in my head I took 15 minutes to verify my understanding and add notes for future me or some industrious soul.

@jkoritzinsky jkoritzinsky self-assigned this Dec 22, 2020
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 22, 2020
Interop-CoreCLR 6.0 automation moved this from To do to Done Jan 7, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 7, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Projects
No open projects
8 participants