Call to unmanaged function pointer always emits full transition frame #45077
Comments
This massively impacts Silk.NET, where function pointers are used in 99% of the functions it exposes. Eliminating the transition frame would be a pleasant performance boost library-wide. Context: (Don't worry, a lot of the ugliness in this file is there to work around various other JIT quirks)
That is by design. It is required for precise GC scanning. We have #38134 open on exposing the SuppressGCTransition calling convention for function pointers. However, this optimization can only be used in very specific situations, as described in https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.suppressgctransitionattribute?view=net-5.0
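For reference, a minimal sketch of how the attribute mentioned above is applied to an ordinary P/Invoke declaration. The class name `NativeMethods` is hypothetical, and `GetTickCount` from `kernel32.dll` is only an assumed illustration of a short, non-blocking native function that never calls back into managed code; whether a given function actually qualifies should be checked against the linked documentation.

```csharp
using System.Runtime.InteropServices;

// Hypothetical container class, for illustration only.
public static class NativeMethods
{
    // Assumption: GetTickCount is a trivial, non-blocking native call,
    // so it is used here as a plausible candidate for the attribute.
    // The declaration is Windows-only because it targets kernel32.dll.
    [DllImport("kernel32.dll")]
    [SuppressGCTransition]
    public static extern uint GetTickCount();
}
```

The attribute only changes how the runtime performs the call; the declaration and the call site (`NativeMethods.GetTickCount()`) otherwise look like any other P/Invoke.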
Out of curiosity, why is this needed for a fnptr but not a pinvoke?
It is needed for both fptr and PInvoke.
This is not a correct statement.
@jkotas I'll admit I'm not an absolute expert in this area, was just quoting @tannergooding there (from chatting in Discord).
I was specifically referring to this last bit where he said P/Invoke methods got this additional optimization on .NET 5. I'd love to understand more about what is happening here, as the resulting codegen is at the very least surprising, I'd say.
@jkotas, could you elaborate on why all non-volatile registers need to be saved? Is that a hard limitation in how the GC does tracking today, and is it something that would be reasonable to track adjusting in the future to make the transition lower cost? Likewise, is there something that could be optimized here for multiple transitions in a single call? Now that function pointers and P/Invoke transitions can be somewhat inlined, it would seem that if you have:
    public unsafe void M(delegate* unmanaged[Stdcall]<int, void> test) {
        test(1);
        test(2);
    }
It would be beneficial to do effectively:
rather than the following that it looks to do:
The non-volatile registers can contain object references, and the stackwalk needs to find them if the GC runs while the PInvoke is executing. I do not see how we can adjust it without hurting performance in other places. Yes, it should be possible to optimize out some of the in-between transitions in a sequence of PInvokes with no code between them. This optimization is not specific to function pointers in any way. It would likely require changes in a number of places; for example, the debugger may need work to make stepping work in the presence of this optimization.
Inlining of P/Invoke transitions has been around since .NET Framework 2.0 (it was probably in .NET Framework 1.x too; I am just not 100% sure). We have not changed anything fundamental here recently.
https://github.com/dotnet/runtime/blob/master/docs/design/coreclr/botr/intro-to-clr.md#the-stack-unwinding-problem has a high-level description of this problem.
Description
Stumbled upon this missed optimization while analyzing some codegen from @tannergooding's TerraFx library, which relies very heavily on unmanaged function pointers. It seems that no matter what signature is used, the JIT will always emit a full transition frame when calling an unmanaged function pointer, backing up all local registers on the stack, etc. They basically behave similarly to P/Invoke calls up to .NET Core 3.1.
Consider this example:
This results in the following (using disasmo from a local CoreCLR build from release/5.0):
That's... quite a lot of codegen for just invoking a single unmanaged function pointer 🤔
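For readers who want to reproduce the pattern, here is a minimal sketch of an unmanaged function pointer call in C# 9. It is not the example from the original report: the class `FunctionPointerSketch` and the `Square` method are hypothetical, and an `UnmanagedCallersOnly` method stands in for a real native export so that the snippet stays self-contained. It assumes .NET 5 with unsafe code enabled (`AllowUnsafeBlocks`).

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static unsafe class FunctionPointerSketch
{
    // Hypothetical stand-in for a native export: a managed method that may
    // only be invoked through an unmanaged function pointer.
    [UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl) })]
    private static int Square(int value) => value * value;

    public static int CallSquare(int value)
    {
        // Taking the address of an UnmanagedCallersOnly method yields an
        // unmanaged function pointer with the declared calling convention.
        delegate* unmanaged[Cdecl]<int, int> square = &Square;

        // This is the kind of call site the issue is about: the JIT wraps
        // it in a managed-to-unmanaged transition.
        return square(value);
    }
}
```

Calling `FunctionPointerSketch.CallSquare(7)` returns 49; inspecting the JIT output for `CallSquare` with a tool such as disasmo should show the transition code around the indirect call.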
Configuration
release.5.0 (from ea56d0c)
build -c checked clr+libs -os Windows_NT -a x64
Any chance a fix for this might be serviced in a .NET 5.x update?
Since function pointers are especially useful for high performance scenarios, this seems like a relatively big missed optimization, especially considering standard P/Invoke methods don't have this issue anymore when running on .NET 5?