Allow async interruptions on safepoints #95565
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
Diff results for #95565

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64
- Overall (+0.12% to +0.38%)
- MinOpts (+0.02% to +1.26%)
- FullOpts (+0.08% to +0.38%)

Throughput diffs for linux/x64 ran on windows/x64
- Overall (+0.12% to +0.35%)
- MinOpts (+0.02% to +1.36%)
- FullOpts (+0.07% to +0.35%)

Throughput diffs for osx/arm64 ran on windows/x64
- Overall (+0.15% to +0.38%)
- MinOpts (+0.01% to +1.29%)
- FullOpts (+0.05% to +0.38%)

Throughput diffs for windows/arm64 ran on windows/x64
- Overall (+0.15% to +0.38%)
- MinOpts (+0.02% to +1.29%)
- FullOpts (+0.08% to +0.38%)

Throughput diffs for windows/x64 ran on windows/x64
- Overall (+0.13% to +0.36%)
- MinOpts (+0.02% to +1.24%)
- FullOpts (+0.06% to +0.34%)

Details here

Throughput diffs for linux/arm ran on windows/x86
- Overall (+0.14% to +0.42%)
- MinOpts (+0.02% to +1.66%)
- FullOpts (+0.10% to +0.42%)

Details here

Throughput diffs for linux/arm64 ran on linux/x64
- Overall (+0.11% to +0.33%)
- MinOpts (+0.05% to +0.90%)
- FullOpts (+0.07% to +0.33%)

Throughput diffs for linux/x64 ran on linux/x64
- Overall (+0.10% to +0.31%)
- MinOpts (+0.06% to +1.02%)
- FullOpts (+0.06% to +0.31%)

Details here
I think this is ready for a discussion.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
The reason why this is NativeAOT is just that NativeAOT is more straightforward in this area and as such more approachable for experimenting. I did experiment with CoreCLR locally. It worked and passed libraries tests, but I was not confident enough to include those changes in the same PR.
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Rebased onto recent main, resolved conflicts.
Seems like Bruce is out for a while longer still. @kunalspathak are you going to review?
I asked @jakobbotsch last week to take a look.
@jakobbotsch looked through the changes after that and signed off. (Thanks, Jakob!!)
I think we are good.
I will update the JIT GUID and proceed to merging this, if nothing comes up.
The rest is green.
Thanks everybody for reviewing and helping with this!!!
* allow async interruptions on safepoints
* ARM64 TODO
* report GC ref/byref returns at partially interruptible callsites
* enable on all platforms
* tweak
* fix after rebasing
* do not record tailcalls
* IsInterruptibleSafePoint
* update gccover
* turn on new behavior on a gcinfo version
* tailcalls tweak
* do not report unused returns
* CORINFO_HELP_FAIL_FAST should not be a safepoint
* treat tailcalls as emitNoGChelper
* versioning tweak
* enable in CoreCLR (not just for GC stress scenarios)
* fix x86 build
* other architectures
* added a knob DOTNET_InterruptibleCallSites
* moved DOTNET_InterruptibleCallSites check to the code manager
* JIT_StackProbe should not be a safepoint (stack is not cleaned yet)
* Hooked up GCInfo version to R2R file version
* formatting
* GCStress support for RISC architectures
* Update src/coreclr/inc/gcinfo.h (Co-authored-by: Jan Kotas <jkotas@microsoft.com>)
* make InterruptibleSafePointsEnabled static
* fix linux-x86 build
* ARM32 actually can`t return 2 GC references, so can filter out R1 early
* revert unnecessary change
* Update src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/Amd64/GcInfo.cs (Co-authored-by: Filip Navara <filip.navara@gmail.com>)
* removed GCINFO_VERSION cons from GcInfo.cs
* Use RBM_INTRET/RBM_INTRET_1
* Update src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/Amd64/GcInfo.cs (Co-authored-by: Jan Kotas <jkotas@microsoft.com>)
* do not skip safe points twice (stress failure)
* revert unnecessary change in gccover
* fix after rebase
* make sure to check `idIsNoGC` on all codepaths in `emitOutputInstr`
* make CORINFO_HELP_CHECK_OBJ a no-gc helper (because we can)
* mark a test that tests WaitForPendingFinalizers as GCStressIncompatible
* NOP
* these helpers should not form GC safe points
* require that the new block has BBF_HAS_LABEL
* Apply suggestions from code review (Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>)
* updated JITEEVersionIdentifier GUID

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Filip Navara <filip.navara@gmail.com>
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
To check the final effect on suspension robustness and outliers in CoreCLR from this change plus the ported NativeAOT thread suspension algorithm (re: #101782), I tried the current bits from the main branch with the same repro as used above (#95565 (comment)).
The benchmark prints out GC pauses in milliseconds; smaller is better.
CoreCLR on Windows 10. Before this PR and #101782 we saw multi-minute pauses, as measured in #95565 (comment).
With bits from current main I see:
The suspension happens in the sub-millisecond range and is thus below the sensitivity of the benchmark.
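(The exact repro is the one linked above and is not reproduced here. As a rough idea of how such pauses can be observed, here is a minimal, hypothetical sketch, not the actual benchmark: one thread timestamps itself in a tight loop and prints any gap above a threshold while another thread allocates.)

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class PauseMonitor
{
    static void Main()
    {
        // Background thread that keeps the GC busy with allocations of varying sizes.
        new Thread(() =>
        {
            var rnd = new Random();
            while (true)
            {
                GC.KeepAlive(new byte[rnd.Next(100, 100_000)]);
            }
        })
        { IsBackground = true }.Start();

        // Timestamp in a tight loop; any large gap includes time this thread
        // spent suspended (e.g. for GC), plus ordinary scheduling noise.
        var sw = Stopwatch.StartNew();
        long last = sw.ElapsedMilliseconds;
        while (true)
        {
            long now = sw.ElapsedMilliseconds;
            if (now - last > 10)
                Console.WriteLine($"pause ~{now - last} ms");
            last = now;
        }
    }
}
```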
TL;DR
== What is going on here:
Currently we have roughly two kinds of methods: fully and partially interruptible. In fully interruptible methods we can both initiate stackwalks and stackwalk through them. Partially interruptible methods only allow stackwalking through them, as only call-return sites have info about the GC content of the stack and nonvolatile registers (and volatile registers are conveniently dead at return sites).
The advantage of partially interruptible methods is that they carry less GC info, so everything that can be partially interruptible is emitted as such. The ratio of partially to fully interruptible code varies, but partially interruptible code is generally the vast majority.
A good question to ask is: can we initiate a stack walk when stopped on a call-return site in partially interruptible code?
The current answer is "almost". We already store info about nonvolatile registers and stack locations for these sites, so in theory we have enough information to start a stackwalk. Almost enough.
The only missing piece is the content of the return registers. Unlike when walking through virtually unwound callers, where the return registers are not live yet, at an actual return they contain return values, and if those are object refs they must be reported to the GC. Once the GC info can tell whether the return registers are live after a call and interesting to the GC, call-return sites become interruptible.
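To make the missing piece concrete, here is a hedged conceptual sketch; the enum and record below are illustrative stand-ins, not the real GC info encoding or decoder API. The idea is simply that each call-return safe point additionally says whether the return register(s) hold anything the GC cares about.

```csharp
// Conceptual sketch only; the actual encoding lives in the JIT-produced GC info.
enum ReturnRegisterKind
{
    Scalar,    // return value is not interesting to the GC
    ObjectRef, // return register holds an object reference
    ByRef      // return register holds a managed pointer (byref)
}

// A call-return safe point: nonvolatile registers and stack slots are already
// recorded today; the return-kind information is the new piece.
record SafePoint(int NativeOffset, ReturnRegisterKind ReturnKind);

static class StackWalkSketch
{
    // When a thread is asynchronously stopped exactly at a call-return site,
    // the walker must also report the return registers if they are GC-interesting.
    public static bool MustReportReturnRegisters(SafePoint safePoint) =>
        safePoint.ReturnKind != ReturnRegisterKind.Scalar;
}
```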
== What is gained:
#2 is actually more interesting. There are known cases that present a challenge to suspension. One of them is a "short call in a loop": a relatively tight loop that performs call(s) to short methods. At compile time we only know that there are calls, so we do not put GC polls in the loop. As a result we end up with a loop that can be suspended only when we manage to catch the thread at one of the few instructions in the callee. Superscalar, multiple-issue CPUs can process several instructions at a time, which makes this situation worse, as it effectively reduces the number of possible hijackable points in the callee. We may see the IP of a stopped thread at the call-return site far more often than inside the callee's body.
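A minimal C# illustration of the pattern (the names here are hypothetical; only the shape matters):

```csharp
using System.Runtime.CompilerServices;

class ShortCallInLoop
{
    static int s_sum;

    // A callee that is only a handful of instructions long; the window in
    // which a thread can be caught inside it is tiny.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int ShortCallee(int x) => x + 1;

    static void Main()
    {
        // Because the loop contains a call, the JIT does not insert an explicit
        // GC poll here; before this change the loop could be suspended only by
        // catching the thread inside ShortCallee.
        for (long i = 0; i < 1_000_000_000; i++)
        {
            s_sum = ShortCallee(s_sum);
        }
    }
}
```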
== Possible further improvements:
Once we have this, hijacking could be simplified as well.
Currently, while installing a hijack, we need to consult the GC info about the GC-refness of the hijacked method's return and stash that information in a transition frame, so that, if the hijack is hit, the stack crawler knows if/how to report the return registers.
All this special casing would become unnecessary, as we could put the synchronous interrupt (hijacked) case and the asynchronous interrupt case (signals or OS suspension) on the same footing with respect to GC reporting: just ask the decoder to report volatile registers for the leaf frame, of which only the return registers would be live at a return site, same as in the asynchronous case. The only difference would be how the interruption happened and how the register display was formed (i.e. pushed by the hijack probe or by the OS).
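As a hedged sketch of what the unified path could look like (every type and name below is an invented stand-in for the native code manager, not a real API): both interruption kinds funnel into the same leaf-frame reporting, and only the way the register display was captured differs.

```csharp
// Illustrative stand-ins only; the real logic lives in the native code manager.
enum InterruptionKind
{
    Hijack,      // synchronous: return address patched by the runtime
    AsyncSignal  // asynchronous: signal or OS suspension
}

// Register values captured at the interruption point, whether pushed by the
// hijack probe or by the OS.
class RegisterDisplay { }

static class SuspensionSketch
{
    public static void ReportLeafFrame(InterruptionKind kind, RegisterDisplay registers)
    {
        // No special casing on 'kind': in both cases the GC info decoder is
        // simply asked to report volatile registers for the leaf frame, of
        // which only the return registers can be live at a call-return site.
        ReportVolatileRegistersFromGcInfo(registers);
    }

    static void ReportVolatileRegistersFromGcInfo(RegisterDisplay registers)
    {
        // Stand-in for the GC info decoder call.
    }
}
```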
A sizable chunk of architecture- and platform-specific C++/asm code that deals with return flags could be gone, like a good part of this file:
runtime/src/coreclr/nativeaot/Runtime/ICodeManager.h (lines 25 to 38 at 9a3cacd)
Also, we may be able to drop the return kind bits in the GC info entirely; I think there are no other uses.